Fit Ridge and HuberRegressor on a dataset with outliers.

The example shows that Ridge predictions are strongly influenced by the outliers in the dataset. The Huber regressor is less affected because it applies a linear loss, rather than a squared loss, to the samples it treats as outliers. As the parameter epsilon increases, the decision function of the Huber regressor approaches that of Ridge.
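To make the loss-based explanation above concrete, here is a minimal sketch of a Huber-style per-sample loss next to the squared loss. It assumes the piecewise form documented for `HuberRegressor` (quadratic for small residuals, linear beyond `epsilon`) and deliberately leaves out the scale estimate and the L2 penalty that the real estimator also optimizes. The helper names `huber_loss` and `squared_loss` are hypothetical and not part of scikit-learn.

```python
import numpy as np


def huber_loss(residual, epsilon=1.35):
    """Illustrative per-sample loss: quadratic up to epsilon, linear beyond it."""
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = r ** 2
    linear = 2.0 * epsilon * r - epsilon ** 2
    return np.where(r <= epsilon, quadratic, linear)


def squared_loss(residual):
    """Squared loss, which grows quadratically for every residual."""
    return np.asarray(residual, dtype=float) ** 2


# Hypothetical residuals: two small ones and two outlier-sized ones.
residuals = np.array([0.5, 1.0, 5.0, 20.0])

for eps in (1.35, 5.0, 50.0):
    print("epsilon=%s:" % eps, huber_loss(residuals, eps))
print("squared:", squared_loss(residuals))
```

With a small `epsilon`, the large residuals are penalized far less than under the squared loss, which is why outliers pull the fit less. Once `epsilon` exceeds every residual, the two losses coincide, matching the observation that the Huber fit approaches the Ridge fit as `epsilon` grows.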

#### New to Plotly?
Plotly's Python library is free and open source! [Get started](https://plot.ly/python/getting-started/) by downloading the client and [reading the primer](https://plot.ly/python/getting-started/).
<br>You can set up Plotly to work in [online](https://plot.ly/python/getting-started/#initialization-for-online-plotting) or [offline](https://plot.ly/python/getting-started/#initialization-for-offline-plotting) mode, or in [jupyter notebooks](https://plot.ly/python/getting-started/#start-plotting-online).
<br>We also have a quick-reference [cheatsheet](https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf) (new!) to help you get started!

### Version

In [1]:
import sklearn
sklearn.__version__

'0.18.1'

### Imports

This tutorial imports [make_regression](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html#sklearn.datasets.make_regression), [HuberRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html#sklearn.linear_model.HuberRegressor) and [Ridge](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge).

In [2]:
print(__doc__)

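# Note: in plotly 4 and later, this module moved to the separate
# chart_studio package (import chart_studio.plotly as py).
import plotly.plotly as py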
import plotly.graph_objs as go

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import HuberRegressor, Ridge

Automatically created module for IPython interactive environment


### Calculations

In [3]:
# Generate toy data.
rng = np.random.RandomState(0)
X, y = make_regression(n_samples=20, n_features=1, random_state=0, noise=4.0,
                       bias=100.0)

# Add four strong outliers to the dataset.
X_outliers = rng.normal(0, 0.5, size=(4, 1))
y_outliers = rng.normal(0, 2.0, size=4)
X_outliers[:2, :] += X.max() + X.mean() / 4.
X_outliers[2:, :] += X.min() - X.mean() / 4.
y_outliers[:2] += y.min() - y.mean() / 4.
y_outliers[2:] += y.max() + y.mean() / 4.
X = np.vstack((X, X_outliers))
y = np.concatenate((y, y_outliers))

### Plot Results

In [4]:
def data_to_plotly(x):
    # Flatten the (n_samples, 1) feature array into a plain list of x values.
    return [row[0] for row in x]

In [5]:
data = []

p1 = go.Scatter(x=data_to_plotly(X), y=y,
                mode='markers',
                showlegend=False,
                marker=dict(color='blue', size=6)
               )
data.append(p1)
# Fit the huber regressor over a series of epsilon values.
colors = ['red', 'blue', 'yellow', 'magenta']

x = np.linspace(X.min(), X.max(), 7)
epsilon_values = [1.35, 1.5, 1.75, 1.9]

for k, epsilon in enumerate(epsilon_values):
    huber = HuberRegressor(fit_intercept=True, alpha=0.0, max_iter=100,
                           epsilon=epsilon)
    huber.fit(X, y)
    coef_ = huber.coef_ * x + huber.intercept_
    p2 = go.Scatter(x=x, y=coef_,
                    mode='lines',
                    line=dict(color=colors[k], width=1), 
                    name="huber loss, %s" % epsilon)
    data.append(p2)

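# Fit a ridge regressor to compare it to the Huber regressor.
# Note: the `normalize` parameter was removed in scikit-learn 1.2; drop it
# when running on a newer release than the 0.18.1 used here.
ridge = Ridge(fit_intercept=True, alpha=0.0, random_state=0, normalize=True)
ridge.fit(X, y)
# Evaluate the fitted ridge line on the plotting grid.
coef_ = ridge.coef_ * x + ridge.intercept_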
p3 = go.Scatter(x=x, y=coef_,
                mode='lines',
                line=dict(color='green', width=1),  
                name="ridge regression")
data.append(p3)

layout = go.Layout(title="Comparison of HuberRegressor vs Ridge",
                   xaxis=dict(title='X', zeroline=False, showgrid=False),
                   yaxis=dict(title='Y', zeroline=False, showgrid=False),
                   hovermode='closest'
                  )
fig = go.Figure(data=data, layout=layout)

In [6]:
py.iplot(fig)
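
The comparison in the plot can also be checked numerically. The following is a minimal sketch, assuming a scikit-learn release in which `HuberRegressor` exposes the fitted `outliers_` mask; it rebuilds the same toy data, fits both models without plotting, and prints the slopes alongside the number of samples the Huber fit flags as outliers. `Ridge` is called without `normalize` so that the sketch also runs on recent releases.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import HuberRegressor, Ridge

# Rebuild the toy data with four strong outliers, as in the example above.
rng = np.random.RandomState(0)
X, y = make_regression(n_samples=20, n_features=1, random_state=0,
                       noise=4.0, bias=100.0)
X_outliers = rng.normal(0, 0.5, size=(4, 1))
y_outliers = rng.normal(0, 2.0, size=4)
X_outliers[:2, :] += X.max() + X.mean() / 4.
X_outliers[2:, :] += X.min() - X.mean() / 4.
y_outliers[:2] += y.min() - y.mean() / 4.
y_outliers[2:] += y.max() + y.mean() / 4.
X = np.vstack((X, X_outliers))
y = np.concatenate((y, y_outliers))

# Fit both models with no L2 penalty so only the loss differs.
huber = HuberRegressor(alpha=0.0, epsilon=1.35).fit(X, y)
ridge = Ridge(alpha=0.0).fit(X, y)

# outliers_ is a boolean mask over the training samples.
n_flagged = int(huber.outliers_.sum())
print("Huber slope: %.2f" % huber.coef_[0])
print("Ridge slope: %.2f" % ridge.coef_[0])
print("Samples flagged as outliers by Huber: %d of %d" % (n_flagged, len(y)))
```

The four injected points occupy the last four rows of `X`, so `huber.outliers_[-4:]` shows whether each of them was flagged.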

### License

Authors: 

        Manoj Kumar mks542@nyu.edu

License:

        BSD 3 clause

