In [1]:
# check scikit-learn version
import sklearn
print(sklearn.__version__)

0.22.2.post1


In [2]:
# example of multioutput regression test problem
from sklearn.datasets import make_regression
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# summarize dataset
print(X.shape, y.shape)

(1000, 10) (1000, 2)
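
To see what the two targets are made of, the sketch below asks `make_regression` to also return the coefficients it used to generate `y`, via its `coef=True` argument. The variable names `coef` and `n_used` are illustrative and do not appear in the original notebook. The sketch relies on the default `noise=0.0` and `bias=0.0`, under which each target should be an exact linear combination of the inputs.

```python
# sketch: inspect the ground-truth coefficients behind the test problem
import numpy as np
from sklearn.datasets import make_regression

# coef=True additionally returns the coefficients used to generate y
X, y, coef = make_regression(n_samples=1000, n_features=10, n_informative=5,
                             n_targets=2, random_state=1, coef=True)
# one column of coefficients per target
print(coef.shape)
# count the input features that influence at least one target
n_used = int(np.count_nonzero(np.any(coef != 0, axis=1)))
print(n_used)
# with no noise and no bias, y should be reproduced exactly by X @ coef
print(np.allclose(X @ coef, y))
```

This matters for reading the later results: on a noise-free linear problem, linear models can be expected to fit almost perfectly, while tree-based and neighbor-based models only approximate the relationship.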


## ***Inherently Multioutput Regression Algorithms***
Some regression machine learning algorithms support multiple outputs directly.

This includes most of the popular machine learning algorithms implemented in the scikit-learn library, such as:

- LinearRegression (and related)
- KNeighborsRegressor
- DecisionTreeRegressor
- RandomForestRegressor (and related)

Let’s look at a few examples to make this concrete.
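
As a quick check of the list above, the following sketch fits one estimator from each listed family on the same test problem and confirms that `predict` returns one column per target. The `random_state` values, the reduced `n_estimators=10`, and the `shapes` dictionary are choices made for this illustration only.

```python
# sketch: each listed estimator accepts a 2D target and predicts one column per output
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# create dataset with two targets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# one model per family named in the list
models = [
    LinearRegression(),
    KNeighborsRegressor(),
    DecisionTreeRegressor(random_state=1),
    RandomForestRegressor(n_estimators=10, random_state=1),
]
shapes = {}
for model in models:
    # fit on both targets at once
    model.fit(X, y)
    # predict the first three rows and record the output shape
    yhat = model.predict(X[:3])
    shapes[type(model).__name__] = yhat.shape
    print(type(model).__name__, yhat.shape)
```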

## **Linear Regression for Multioutput Regression**
The example below fits a linear regression model on the multioutput regression dataset, then makes a single prediction with the fit model.

In [3]:
# linear regression for multioutput regression
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define model
model = LinearRegression()
# fit model
model.fit(X, y)
# make a prediction
data_in = [[-2.02220122, 0.31563495, 0.82797464, -0.30620401, 0.16003707, -1.44411381, 0.87616892, -0.50446586, 0.23009474, 0.76201118]]
yhat = model.predict(data_in)
# summarize prediction
print(yhat[0])

[-93.147146    23.26985013]
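
To make the multioutput behavior of the linear model more tangible, the sketch below inspects the fitted parameters. With a two-column target, `coef_` holds one row of coefficients per output and `intercept_` holds one value per output, so a prediction can be reproduced by hand. The names `row` and `manual` are illustrative.

```python
# sketch: inspect the fitted parameters of a multioutput linear regression
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# create dataset with two targets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# fit a single model on both targets
model = LinearRegression().fit(X, y)
# coefficients are stored as (n_targets, n_features)
print(model.coef_.shape)
# one intercept per target
print(model.intercept_.shape)
# reproduce the prediction for one row by hand
row = X[:1]
manual = row @ model.coef_.T + model.intercept_
print(np.allclose(manual, model.predict(row)))
```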


## **k-Nearest Neighbors for Multioutput Regression**
The example below fits a k-nearest neighbors model on the multioutput regression dataset, then makes a single prediction with the fit model.

In [4]:
# k-nearest neighbors for multioutput regression
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define model
model = KNeighborsRegressor()
# fit model
model.fit(X, y)
# make a prediction
data_in = [[-2.02220122, 0.31563495, 0.82797464, -0.30620401, 0.16003707, -1.44411381, 0.87616892, -0.50446586, 0.23009474, 0.76201118]]
yhat = model.predict(data_in)
# summarize prediction
print(yhat[0])

[-109.74862659    0.38754079]
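
With the default settings (`n_neighbors=5`, `weights='uniform'`), a multioutput k-nearest neighbors prediction is the column-wise mean of the target rows of the nearest training examples. The sketch below checks this by looking up the neighbors with `kneighbors` and averaging their targets by hand; `idx` and `manual` are illustrative names.

```python
# sketch: reproduce a multioutput kNN prediction by averaging the neighbors' targets
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor

# create dataset with two targets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# fit with default settings (5 neighbors, uniform weights)
model = KNeighborsRegressor().fit(X, y)
# same input row as in the example above
data_in = [[-2.02220122, 0.31563495, 0.82797464, -0.30620401, 0.16003707, -1.44411381, 0.87616892, -0.50446586, 0.23009474, 0.76201118]]
# indices of the nearest training rows
_, idx = model.kneighbors(data_in)
# average both target columns over those rows
manual = y[idx[0]].mean(axis=0)
print(manual)
print(np.allclose(manual, model.predict(data_in)[0]))
```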


## **Random Forest for Multioutput Regression**
The example below fits a random forest model on the multioutput regression dataset, then makes a single prediction with the fit model.

In [5]:
# random forest for multioutput regression
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define model
model = RandomForestRegressor()
# fit model
model.fit(X, y)
# make a prediction
data_in = [[-2.02220122, 0.31563495, 0.82797464, -0.30620401, 0.16003707, -1.44411381, 0.87616892, -0.50446586, 0.23009474, 0.76201118]]
yhat = model.predict(data_in)
# summarize prediction
print(yhat[0])

[-78.88689906  30.03893177]
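
Because the random forest above is created without a `random_state`, its prediction will likely differ slightly from run to run. A minimal sketch of making the result repeatable is shown below; the seed value `1` and the reduced `n_estimators=20` are arbitrary choices for this illustration, not settings from the original notebook.

```python
# sketch: fixing random_state makes the random forest prediction repeatable
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# create dataset with two targets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
row = X[:1]
# fit the same configuration twice with the same seed
first = RandomForestRegressor(n_estimators=20, random_state=1).fit(X, y).predict(row)
second = RandomForestRegressor(n_estimators=20, random_state=1).fit(X, y).predict(row)
# both fits should yield the same two-output prediction
print(first[0])
print(np.allclose(first, second))
```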


## **Evaluate Multioutput Regression With Cross-Validation**
We may want to evaluate a multioutput regression model using k-fold cross-validation.

This can be achieved in the same way as evaluating any other machine learning model.

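We will fit and evaluate a DecisionTreeRegressor model on the test problem using 10-fold cross-validation with three repeats. We will use the mean absolute error (MAE) performance metric as the score. scikit-learn reports it as a negated score (`neg_mean_absolute_error`) so that larger values are better, and by default averages the error uniformly across the outputs; the example takes the absolute value of the scores before summarizing them.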

The complete example is listed below.

In [0]:
# evaluate multioutput regression model with k-fold cross-validation
from numpy import absolute
from numpy import mean
from numpy import std
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold

In [7]:
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define model
model = DecisionTreeRegressor()
# evaluate model
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')
# summarize performance
n_scores = absolute(n_scores)
print('Result: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Result: 51.537 (3.181)
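
The single score above hides how well each output is predicted. One possible way to break it down is sketched here: collect out-of-fold predictions with `cross_val_predict` and pass `multioutput='raw_values'` to `mean_absolute_error`. This sketch uses a plain shuffled `KFold` rather than `RepeatedKFold`, because `cross_val_predict` needs each sample to be predicted exactly once; the seeds are arbitrary.

```python
# sketch: per-output MAE from out-of-fold predictions
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.metrics import mean_absolute_error

# create dataset with two targets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# a single pass of 10-fold cross-validation (each row is predicted once)
cv = KFold(n_splits=10, shuffle=True, random_state=1)
yhat = cross_val_predict(DecisionTreeRegressor(random_state=1), X, y, cv=cv)
# one MAE value per output column
per_output = mean_absolute_error(y, yhat, multioutput='raw_values')
# default behavior: uniform average over the outputs
overall = mean_absolute_error(y, yhat)
print(per_output)
print(overall)
```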


## **Wrapper Multioutput Regression Algorithms**
Not all regression algorithms support multioutput regression.

One example is the support vector machine, which for regression is referred to as support vector regression, or SVR.

The scikit-learn implementations (SVR and LinearSVR) do not support multiple outputs for a regression problem and will raise an error if the target has more than one column. We can demonstrate this with LinearSVR in the example listed below.

In [10]:
# failure of support vector regression for multioutput regression
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
print(X.shape, y.shape)

(1000, 10) (1000, 2)


In [13]:
# define model
model = LinearSVR()
# fit model
model.fit(X, y)

ValueError: ignored
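
The notebook output above does not show the error message itself. A small sketch that catches the exception and prints whatever message the installed scikit-learn version produces is given below; the exact wording depends on the version, so none is assumed here. The variable `caught` is illustrative.

```python
# sketch: capture the error raised when LinearSVR is given a two-column target
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

# create dataset with two targets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
caught = None
try:
    # LinearSVR expects a 1D target, so this call is expected to fail
    LinearSVR().fit(X, y)
except ValueError as err:
    # keep the exception and show its version-specific message
    caught = err
    print(type(err).__name__, err)
```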

There are two workarounds that we can adopt in order to use an algorithm like SVR for multioutput regression.

## **Separate Model for Each Output (MultiOutputRegressor)**
We can create a separate model for each output of the problem.

This assumes that the outputs are independent of each other, which might not be a correct assumption. Nevertheless, this approach can provide surprisingly effective predictions on a range of problems and may be worth trying, at least as a performance baseline.

You never know. The outputs for your problem may, in fact, be mostly independent, if not completely independent, and this strategy can help you find out.

This approach is supported by the MultiOutputRegressor class that takes a regression model as an argument. It will then create one instance of the provided model for each output in the problem.

The example below demonstrates using the MultiOutputRegressor class with linear SVR for the test problem.

In [0]:
# example of linear SVR with the MultiOutputRegressor wrapper for multioutput regression
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR

In [15]:
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define model
model = LinearSVR()
wrapper = MultiOutputRegressor(model)
# fit model
wrapper.fit(X, y)
# make a prediction
data_in = [[-2.02220122, 0.31563495, 0.82797464, -0.30620401, 0.16003707, -1.44411381, 0.87616892, -0.50446586, 0.23009474, 0.76201118]]
yhat = wrapper.predict(data_in)
# summarize prediction
print(yhat[0])

[-93.147146    23.26985013]
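
To see that the wrapper really holds one model per output, the sketch below inspects its `estimators_` attribute after fitting and rebuilds the wrapper's prediction by calling each fitted model separately. The `random_state=1` and `max_iter=10000` settings are additions for this illustration (to make the fit repeatable and reduce convergence warnings), not part of the example above.

```python
# sketch: MultiOutputRegressor keeps one fitted model per output in estimators_
import numpy as np
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR

# create dataset with two targets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# wrap a single-output model and fit on both targets
wrapper = MultiOutputRegressor(LinearSVR(random_state=1, max_iter=10000)).fit(X, y)
# one fitted LinearSVR per target column
print(len(wrapper.estimators_))
# predict with each model separately and place the results side by side
rows = X[:5]
stacked = np.column_stack([est.predict(rows) for est in wrapper.estimators_])
print(np.allclose(stacked, wrapper.predict(rows)))
```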


## **Chained Models for Each Output (RegressorChain)**
Another approach to using single-output regression models for multioutput regression is to create a linear sequence of models.

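The first model in the sequence uses the input and predicts one output; the second model uses the input and the first output to make a prediction; the third model uses the input and the first two outputs to make a prediction, and so on. By default (`cv=None`), scikit-learn fits each model in the chain on the true values of the earlier outputs and substitutes the earlier models' predictions only when making predictions.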

This can be achieved using the RegressorChain class in the scikit-learn library.

In [16]:
# example of fitting a chain of linear SVR for multioutput regression
from sklearn.datasets import make_regression
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# define model
model = LinearSVR()
wrapper = RegressorChain(model)
# fit model
wrapper.fit(X, y)
# make a prediction
data_in = [[-2.02220122, 0.31563495, 0.82797464, -0.30620401, 0.16003707, -1.44411381, 0.87616892, -0.50446586, 0.23009474, 0.76201118]]
yhat = wrapper.predict(data_in)
# summarize prediction
print(yhat[0])

[-93.147146    23.26754808]
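
The chaining can be verified by hand. In the sketch below, the second fitted model should have one more coefficient than the first, because it receives the first output as an extra input feature, and the wrapper's prediction is rebuilt step by step from the individual models. As before, `random_state=1` and `max_iter=10000` are additions for this illustration, and the names `sizes`, `first`, `second` and `manual` are hypothetical.

```python
# sketch: walk through a RegressorChain prediction step by step
import numpy as np
from sklearn.datasets import make_regression
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR

# create dataset with two targets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# fit a chain using the default output order (0, then 1)
wrapper = RegressorChain(LinearSVR(random_state=1, max_iter=10000)).fit(X, y)
# number of coefficients per model: the second also sees the first output
sizes = [np.ravel(est.coef_).size for est in wrapper.estimators_]
print(sizes)
rows = X[:5]
# step 1: the first model predicts output 0 from the inputs alone
first = wrapper.estimators_[0].predict(rows)
# step 2: the second model predicts output 1 from the inputs plus the predicted output 0
second = wrapper.estimators_[1].predict(np.hstack([rows, first.reshape(-1, 1)]))
manual = np.column_stack([first, second])
print(np.allclose(manual, wrapper.predict(rows)))
```

`RegressorChain` also accepts an `order` argument (for example `order=[1, 0]`) to change which output is predicted first, which may be worth trying when one output is thought to depend on another.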
