# Introduction
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-warning">
<font color=black>

**What?** Multi-output regression

</font>
</div>

# Imports
<hr style = "border:2px solid black" ></hr>

In [12]:
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR
from sklearn.multioutput import RegressorChain

# Theoretical recalls
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-info">
<font color=black>
    
- In multioutput regression, the outputs typically depend on the input and often on each other. Because the outputs are often **not independent**, the problem may call for a model that predicts all outputs together, or one that predicts each output conditional on the other outputs.

- Multi-step time series forecasting **may be considered** a type of multiple-output regression, where a sequence of future values is predicted and each predicted value depends on the prior values in the sequence.
    
- **If outputs may be correlated, why would anyone use a regressor for each target?** Building one regressor per target assumes that the outputs are independent of each other, which might not be a correct assumption. **Nevertheless**, this approach can produce effective predictions on a range of problems and may be worth trying, at least as a performance baseline. If the targets are mostly independent, this strategy can help you find out.

</font>
</div>
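
To make the time-series framing above concrete, here is a minimal sketch of how a univariate series could be reshaped into a multi-output regression problem. The helper `make_windows` and the parameter names `n_lags` and `n_steps` are hypothetical (they are not part of scikit-learn), and the toy series is only there to make the shapes easy to read.

```python
import numpy as np


def make_windows(series, n_lags, n_steps):
    """Hypothetical helper: slice a 1-D series into (inputs, multi-step targets)."""
    X_rows, y_rows = [], []
    # Each sample uses n_lags past values to predict the next n_steps values
    for start in range(len(series) - n_lags - n_steps + 1):
        X_rows.append(series[start:start + n_lags])
        y_rows.append(series[start + n_lags:start + n_lags + n_steps])
    return np.array(X_rows), np.array(y_rows)


# Toy series 0, 1, ..., 9 (illustrative only)
series = np.arange(10, dtype=float)
X_ts, y_ts = make_windows(series, n_lags=3, n_steps=2)

print(X_ts.shape, y_ts.shape)  # (6, 3) (6, 2)
print(X_ts[0], y_ts[0])        # inputs [0, 1, 2] with targets [3, 4]
```

The resulting `y_ts` has one column per forecast step, so any of the multioutput approaches in this notebook could in principle be fitted on `(X_ts, y_ts)`.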

# Create dummy dataset
<hr style = "border:2px solid black" ></hr>

In [2]:
X, y = make_regression(n_samples=1000, n_features=10,
                       n_informative=5, n_targets=2, random_state=1, noise=0.5)

print(X.shape, y.shape)

(1000, 10) (1000, 2)


# Linear Regression for Multioutput Regression
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-info">
<font color=black>

- Some regression machine learning algorithms support multiple outputs **directly**. 
- This means no wrapper that assigns a separate regressor to each target is needed.
- These include:

    - [x] LinearRegression
    - [ ] KNeighborsRegressor
    - [ ] DecisionTreeRegressor
    - [ ] RandomForestRegressor

</font>
</div>

In [4]:
# Instantiate the model
model = LinearRegression()

# fit model
model.fit(X, y)

# make a prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303,
       -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])

# summarize prediction
print(yhat[0])

[50.06781717 64.564973  ]
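
One way to see what "direct" support means for ordinary least squares is to compare a joint fit with one fit per target. The sketch below is illustrative only: it rebuilds the same dummy data under the assumed names `X_demo` and `y_demo` so that it is self-contained, then checks that the jointly fitted coefficients match those of separate single-target fits.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Same settings as the dummy dataset in this notebook (names are illustrative)
X_demo, y_demo = make_regression(n_samples=1000, n_features=10,
                                 n_informative=5, n_targets=2,
                                 random_state=1, noise=0.5)

# One joint fit on the 2-column target
joint = LinearRegression().fit(X_demo, y_demo)
print(joint.coef_.shape)       # (2, 10): one row of coefficients per target
print(joint.intercept_.shape)  # (2,): one intercept per target

# One separate fit per target column
per_target = [LinearRegression().fit(X_demo, y_demo[:, k])
              for k in range(y_demo.shape[1])]
stacked_coef = np.vstack([m.coef_ for m in per_target])

# For plain least squares the two routes should agree up to numerical error
same_coef = np.allclose(joint.coef_, stacked_coef)
print(same_coef)
```

For plain linear regression, then, native multioutput support is mainly a convenience: the targets are still modelled independently given the inputs.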


# k-Nearest Neighbors for Multioutput Regression
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-info">
<font color=black>

- Some regression machine learning algorithms support multiple outputs **directly**. 
- This means no wrapper that assigns a separate regressor to each target is needed.
- These include:

    - [ ] LinearRegression
    - [x] KNeighborsRegressor
    - [ ] DecisionTreeRegressor
    - [ ] RandomForestRegressor

</font>
</div>

In [6]:
# create datasets
X, y = make_regression(n_samples=1000, n_features=10,
                       n_informative=5, n_targets=2, random_state=1, noise=0.5)

# define model
model = KNeighborsRegressor()

# fit model
model.fit(X, y)

# make a prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303,
       -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])

# summarize prediction
print(yhat[0])

[-11.73511093  52.78406297]
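
A single prediction says little about model quality. As a hedged sketch of how a multioutput model could be evaluated, the block below runs repeated k-fold cross-validation with mean absolute error, which scikit-learn averages across the targets by default. The fold settings (10 splits, 3 repeats) and the names `X_demo` and `y_demo` are assumptions chosen for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Same settings as the dummy dataset in this notebook (names are illustrative)
X_demo, y_demo = make_regression(n_samples=1000, n_features=10,
                                 n_informative=5, n_targets=2,
                                 random_state=1, noise=0.5)

# Assumed evaluation protocol: 10-fold cross-validation repeated 3 times
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

# scikit-learn reports the negated MAE so that higher is always better
scores = cross_val_score(KNeighborsRegressor(), X_demo, y_demo,
                         scoring="neg_mean_absolute_error", cv=cv)

# Flip the sign back to get an ordinary error value
mae = np.abs(scores)
print(f"MAE: {mae.mean():.3f} ({mae.std():.3f})")
```

The same loop can be reused for the other models in this notebook by swapping the estimator passed to `cross_val_score`.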


# Decision Tree for Multioutput Regression
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-info">
<font color=black>

- Some regression machine learning algorithms support multiple outputs **directly**. 
- This means no wrapper that assigns a separate regressor to each target is needed.
- These include:

    - [ ] LinearRegression
    - [ ] KNeighborsRegressor
    - [x] DecisionTreeRegressor
    - [ ] RandomForestRegressor

</font>
</div>

In [None]:
# define model
model = DecisionTreeRegressor()

# fit model
model.fit(X, y)

# make a prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303,
       -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])

# summarize prediction
print(yhat[0])
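
`RandomForestRegressor` appears in the checklist above but is not demonstrated. The following sketch, which assumes 50 trees and a fixed `random_state` purely for illustration, shows that it can be fitted on the two-column target without any wrapper. The names `X_demo`, `y_demo` and `forest` are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Same settings as the dummy dataset in this notebook (names are illustrative)
X_demo, y_demo = make_regression(n_samples=1000, n_features=10,
                                 n_informative=5, n_targets=2,
                                 random_state=1, noise=0.5)

# Assumed hyperparameters: 50 trees, fixed seed for repeatable results
forest = RandomForestRegressor(n_estimators=50, random_state=1)

# The 2-column target is accepted directly
forest.fit(X_demo, y_demo)

# Predict for the first sample; the result has one column per target
yhat_forest = forest.predict(X_demo[:1])
print(yhat_forest.shape)  # (1, 2)
print(yhat_forest[0])
```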

# SVR
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-info">
<font color=black>

- Not all regression algorithms support multioutput regression. One example is support vector regression, or SVR.
- There are two main approaches to making such an algorithm handle multiple outputs:
    - **Direct Multioutput**: Fit an independent model for each target.
    - **Chained Multioutput**: Develop a sequence of dependent models to match the number of numerical values to be predicted. The first model in the sequence uses the input and predicts one output; the second model uses the input and the output from the first model to make a prediction; the third model uses the input and output from the first two models to make a prediction, and so on.

</font>
</div>
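
To illustrate why a wrapper is needed, the sketch below tries to fit a bare `LinearSVR` on the two-column target. In the scikit-learn versions this was written against, the call is rejected with a `ValueError` because the estimator expects a 1-D target; the exact message may differ between versions. The names `X_demo`, `y_demo` and `raised` are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

# Same settings as the dummy dataset in this notebook (names are illustrative)
X_demo, y_demo = make_regression(n_samples=1000, n_features=10,
                                 n_informative=5, n_targets=2,
                                 random_state=1, noise=0.5)

raised = False
try:
    # y_demo has shape (1000, 2), but LinearSVR expects a 1-D target
    LinearSVR().fit(X_demo, y_demo)
except ValueError as err:
    raised = True
    print(f"LinearSVR rejected the 2-D target: {err}")

print(raised)
```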

## Direct multioutput

In [9]:
# define base model
model = LinearSVR()

# define the direct multioutput wrapper model
wrapper = MultiOutputRegressor(model)

# fit the model on the whole dataset
wrapper.fit(X, y)

# make a single prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303,
       -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]

yhat = wrapper.predict([row])
# summarize the prediction
print('Predicted: %s' % yhat[0])

Predicted: [50.02804639 64.51260413]


In [10]:
dir(wrapper)

['__abstractmethods__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_impl',
 '_check_feature_names',
 '_check_n_features',
 '_estimator_type',
 '_get_param_names',
 '_get_tags',
 '_more_tags',
 '_repr_html_',
 '_repr_html_inner',
 '_repr_mimebundle_',
 '_required_parameters',
 '_validate_data',
 'estimator',
 'estimators_',
 'fit',
 'get_params',
 'n_features_in_',
 'n_jobs',
 'partial_fit',
 'predict',
 'score',
 'set_params']

In [11]:
# Get the two independent regressors
wrapper.estimators_

[LinearSVR(), LinearSVR()]
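
The two entries in `estimators_` are ordinary fitted models, one per target column. As a rough sketch of what the wrapper does, the block below fits one `LinearSVR` per column by hand and compares the stacked predictions with those of `MultiOutputRegressor`. The helper `make_base`, the fixed `random_state=0` and the raised `max_iter` are assumptions added so that both routes are repeatable and comparable.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR

# Same settings as the dummy dataset in this notebook (names are illustrative)
X_demo, y_demo = make_regression(n_samples=1000, n_features=10,
                                 n_informative=5, n_targets=2,
                                 random_state=1, noise=0.5)


def make_base():
    """Hypothetical helper: a LinearSVR with a fixed seed for repeatability."""
    return LinearSVR(random_state=0, max_iter=10000)


# Route 1: let the wrapper clone and fit one model per target
direct = MultiOutputRegressor(make_base()).fit(X_demo, y_demo)
wrapper_pred = direct.predict(X_demo[:5])

# Route 2: fit one model per target column by hand, then stack the predictions
manual_models = [make_base().fit(X_demo, y_demo[:, k])
                 for k in range(y_demo.shape[1])]
manual_pred = np.column_stack([m.predict(X_demo[:5]) for m in manual_models])

# Each fitted sub-model carries its own coefficient vector
for k, est in enumerate(direct.estimators_):
    print(f"target {k}: coef_ shape {est.coef_.shape}")

# With identical seeds the two routes are expected to agree closely
routes_agree = np.allclose(wrapper_pred, manual_pred, rtol=1e-4, atol=1e-6)
print(routes_agree)
```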

## Chained multioutput

In [14]:
# define base model
model = LinearSVR()

# define the chained multioutput wrapper model
wrapper = RegressorChain(model)


# fit the model on the whole dataset
wrapper.fit(X, y)

# make a single prediction
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303,
       -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]

yhat = wrapper.predict([row])

# summarize the prediction
print('Predicted: %s' % yhat[0])

Predicted: [50.03218906 64.45419582]




In [15]:
wrapper.estimators_

[LinearSVR(), LinearSVR()]
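
The chain can also be written out by hand, which makes the dependency between the models explicit. The sketch below assumes the default behaviour of `RegressorChain` (true target values are appended as extra features during fitting, predicted values during prediction) and a fixed chain order of `[0, 1]`. The helper `make_base` and the names `first`, `second`, `X_demo` and `y_demo` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR

# Same settings as the dummy dataset in this notebook (names are illustrative)
X_demo, y_demo = make_regression(n_samples=1000, n_features=10,
                                 n_informative=5, n_targets=2,
                                 random_state=1, noise=0.5)


def make_base():
    """Hypothetical helper: a LinearSVR with a fixed seed for repeatability."""
    return LinearSVR(random_state=0, max_iter=10000)


# Library route: target 0 is predicted first, then target 1
chain = RegressorChain(make_base(), order=[0, 1]).fit(X_demo, y_demo)
chain_pred = chain.predict(X_demo[:5])

# Manual route, step 1: the first model sees only the inputs
first = make_base().fit(X_demo, y_demo[:, 0])

# Step 2: the second model sees the inputs plus the true first target
X_aug = np.hstack([X_demo, y_demo[:, [0]]])
second = make_base().fit(X_aug, y_demo[:, 1])

# At prediction time the true first target is unknown, so its prediction is used
y0_hat = first.predict(X_demo[:5])
y1_hat = second.predict(np.hstack([X_demo[:5], y0_hat.reshape(-1, 1)]))
manual_pred = np.column_stack([y0_hat, y1_hat])

# The second model has one extra coefficient for the appended target
print(first.coef_.shape, second.coef_.shape)  # (10,) (11,)

chains_agree = np.allclose(chain_pred, manual_pred, rtol=1e-4, atol=1e-6)
print(chains_agree)
```

Changing `order` changes which target is treated as an input for the other, so when the targets are strongly related it can be worth comparing more than one order.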

# References
<hr style = "border:2px solid black" ></hr>

<div class="alert alert-warning">
<font color=black>

- [How to Develop Multi-Output Regression Models with Python](https://machinelearningmastery.com/multi-output-regression-models-with-python/)

</font>
</div>