Here a sine function is fit with a polynomial of order 3, for values close to zero.
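A cubic is a reasonable model for a sine near zero because the Taylor series of sin(x) starts with x - x**3/6. The sketch below is an illustration only: it uses NumPy's `np.polyfit` rather than the estimators in this tutorial, and the interval [-1, 1] is an assumption. It compares a least-squares cubic fit with that Taylor polynomial. The tutorial's own X values are drawn from a standard normal distribution, so they extend further from zero than this interval.

```python
import numpy as np

# Sample sin(x) on an assumed interval around zero
x = np.linspace(-1.0, 1.0, 201)
y = np.sin(x)

# Least-squares cubic fit; np.polyfit returns coefficients highest degree first
coeffs = np.polyfit(x, y, 3)
fit_error = np.max(np.abs(np.polyval(coeffs, x) - y))

# Third-order Taylor polynomial of sin around 0, for comparison
taylor_error = np.max(np.abs((x - x**3 / 6) - y))

print(coeffs)
print(fit_error, taylor_error)
```

Because sin is an odd function and the sample points are symmetric, the fitted x**2 and constant coefficients come out essentially zero. The least-squares fit spreads its error over the whole interval instead of concentrating it at the ends, as the Taylor polynomial does.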

Robust fitting is demoed in different situations:

* No measurement errors, only modelling errors (fitting a sine with a polynomial)
* Measurement errors in X
* Measurement errors in y

The mean squared error on new, uncorrupted data is used to judge the quality of the prediction.
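Mean squared error (what the code below computes with `mean_squared_error`) and median-based errors react very differently to a few bad predictions. The following minimal NumPy sketch uses made-up residuals. The two helper functions are hypothetical and were written for this illustration: four predictions are slightly off and one is far off.

```python
import numpy as np

def mean_squared_error_np(y_true, y_pred):
    # Average of squared residuals: a single large residual dominates
    residuals = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(residuals ** 2))

def median_absolute_error_np(y_true, y_pred):
    # Median of absolute residuals: ignores a minority of large residuals
    residuals = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.median(np.abs(residuals)))

# Made-up example: four small prediction errors and one large one
y_true = np.zeros(5)
y_pred = np.array([0.1, -0.1, 0.1, -0.1, 5.0])

print(mean_squared_error_np(y_true, y_pred))     # about 5.008
print(median_absolute_error_np(y_true, y_pred))  # 0.1
```

scikit-learn provides both scores as `sklearn.metrics.mean_squared_error` and `sklearn.metrics.median_absolute_error`.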

From the plots we can see that:

* RANSAC is good for strong outliers in the y direction.
* Theil-Sen is good for small outliers, in both the X and y directions, but has a breakdown point above which it performs worse than OLS.
* The scores of HuberRegressor should not be compared directly to those of Theil-Sen and RANSAC, because it does not try to filter out the outliers completely but only to lessen their effect.
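HuberRegressor lessens the effect of outliers instead of removing them. It does this through the Huber loss, which is quadratic for small residuals and linear for large ones. The sketch below is the textbook form of that loss. `epsilon=1.35` matches HuberRegressor's default, but scikit-learn also estimates a scale parameter during fitting, so this function is not its exact objective.

```python
import numpy as np

def huber_loss(residuals, epsilon=1.35):
    # Textbook Huber loss: quadratic for |r| <= epsilon, linear beyond it.
    # epsilon=1.35 mirrors HuberRegressor's default; sklearn additionally
    # fits a scale parameter, so this is a simplified illustration.
    r = np.abs(np.asarray(residuals, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = epsilon * r - 0.5 * epsilon ** 2
    return np.where(r <= epsilon, quadratic, linear)

residuals = np.array([0.5, 1.0, 5.0, 10.0])
print(huber_loss(residuals))   # grows linearly for the large residuals
print(0.5 * residuals ** 2)    # squared loss (as in OLS) grows quadratically
```

A residual of 10 costs about 12.6 under the Huber loss but 50 under the squared loss. An outlier therefore still pulls on the fit, just much less than it does for OLS.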


#### New to Plotly?
Plotly's Python library is free and open source! [Get started](https://plot.ly/python/getting-started/) by downloading the client and [reading the primer](https://plot.ly/python/getting-started/).
<br>You can set up Plotly to work in [online](https://plot.ly/python/getting-started/#initialization-for-online-plotting) or [offline](https://plot.ly/python/getting-started/#initialization-for-offline-plotting) mode, or in [jupyter notebooks](https://plot.ly/python/getting-started/#start-plotting-online).
<br>We also have a quick-reference [cheatsheet](https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf) (new!) to help you get started!

### Version

In [1]:
import sklearn
sklearn.__version__

'0.18.1'

### Imports

This tutorial imports [LinearRegression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression), [TheilSenRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.TheilSenRegressor.html#sklearn.linear_model.TheilSenRegressor), [RANSACRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RANSACRegressor.html#sklearn.linear_model.RANSACRegressor), [HuberRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html#sklearn.linear_model.HuberRegressor), [mean_squared_error](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error), [PolynomialFeatures](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html#sklearn.preprocessing.PolynomialFeatures) and [make_pipeline](http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html#sklearn.pipeline.make_pipeline).
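To build intuition for TheilSenRegressor, it helps to look at the classic one-dimensional Theil-Sen estimator. It takes the slope as the median of the slopes between all pairs of points, and the intercept as a median as well. The function below is a hypothetical helper for simple linear regression only. scikit-learn's TheilSenRegressor generalizes the idea to several features using a spatial median and may work on random subsets, so this is not its implementation.

```python
import itertools
import numpy as np

def theil_sen_1d(x, y):
    # Slope: median of slopes over all pairs of points with distinct x
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in itertools.combinations(range(len(x)), 2)
              if x[j] != x[i]]
    slope = np.median(slopes)
    # Intercept: median of the offsets y - slope * x
    intercept = np.median(np.asarray(y) - slope * np.asarray(x))
    return slope, intercept

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[[2, 7]] = 40.0          # corrupt 2 of 10 points (20%)
slope, intercept = theil_sen_1d(x, y)
print(slope, intercept)
```

Even with 20% of the points corrupted, most pairwise slopes come from clean pairs, so the medians recover slope 2 and intercept 1 exactly. If enough points are corrupted, the corrupted slopes take over the median. That is the breakdown point mentioned above.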

In [2]:
import plotly.plotly as py  # in Plotly 4+ this module moved to chart_studio.plotly
import plotly.graph_objs as go
import numpy as np

from sklearn.linear_model import (
    LinearRegression, TheilSenRegressor, RANSACRegressor, HuberRegressor)
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

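Later, each estimator is chained after `PolynomialFeatures(3)` with `make_pipeline`, so every linear model actually sees the columns 1, x, x**2 and x**3. The quick check below shows that expansion on two arbitrary inputs.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Expand a single input column into [1, x, x**2, x**3]
features = PolynomialFeatures(3).fit_transform(np.array([[2.0], [-1.0]]))
print(features)  # rows: [1, 2, 4, 8] and [1, -1, 1, -1]
```

Because of this expansion, a "linear" estimator fits a cubic polynomial in the original X.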

### Calculations

In [3]:

np.random.seed(42)

X = np.random.normal(size=400)
y = np.sin(X)
# Make sure that X is 2D: scikit-learn expects shape (n_samples, n_features)
X = X[:, np.newaxis]

X_test = np.random.normal(size=200)
y_test = np.sin(X_test)
X_test = X_test[:, np.newaxis]

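# Corrupt every third target value with a small deviant
y_errors = y.copy()
y_errors[::3] = 3

# Corrupt every third input value with a small deviant
X_errors = X.copy()
X_errors[::3] = 3

# Same corruption pattern, but with large deviants
y_errors_large = y.copy()
y_errors_large[::3] = 10

X_errors_large = X.copy()
X_errors_large[::3] = 10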

estimators = [('OLS', LinearRegression()),
              ('Theil-Sen', TheilSenRegressor(random_state=42)),
              ('RANSAC', RANSACRegressor(random_state=42)),
              ('HuberRegressor', HuberRegressor())]
colors = {'OLS': 'turquoise', 'Theil-Sen': 'gold', 'RANSAC': 'lightgreen', 'HuberRegressor': 'black'}
linestyle = {'OLS': 'dash', 'Theil-Sen': 'dashdot', 'RANSAC': 'dot', 'HuberRegressor': 'dot'}
lw = 3

x_plot = np.linspace(X.min(), X.max())

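Because the slicing `[::3]` overwrites every third sample, each corrupted scenario contains about one third outliers. The sketch below counts them for the 400 training samples used above.

```python
import numpy as np

n_samples = 400                  # same size as X in the cell above
y_demo = np.zeros(n_samples)
y_demo[::3] = 3                  # same slicing pattern as y_errors
n_corrupt = int(np.count_nonzero(y_demo == 3))
print(n_corrupt, n_corrupt / n_samples)   # 134 0.335
```

A corrupted fraction of 33.5% is above the roughly 29.3% breakdown point that scikit-learn's documentation gives for Theil-Sen in simple linear regression. With the polynomial features, the effective breakdown point is presumably lower still. This fits the observation that Theil-Sen degrades once the deviants become large.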
### Plot Results

In [4]:
plots = []

for title, this_X, this_y in [
        ('Modeling Errors Only', X, y),
        ('Corrupt X, Small Deviants', X_errors, y),
        ('Corrupt y, Small Deviants', X, y_errors),
        ('Corrupt X, Large Deviants', X_errors_large, y),
        ('Corrupt y, Large Deviants', X, y_errors_large)]:
    
    data = []
    trace = go.Scatter(x=this_X[:, 0], y=this_y, 
                       mode='markers', 
                       marker=dict(color='blue', size=3),
                       showlegend=False)
    data.append(trace)
    
    for name, estimator in estimators:
        model = make_pipeline(PolynomialFeatures(3), estimator)
        model.fit(this_X, this_y)
        # Score on clean held-out data; sklearn metrics take (y_true, y_pred)
        mse = mean_squared_error(y_test, model.predict(X_test))
        y_plot = model.predict(x_plot[:, np.newaxis])
        
        trace = go.Scatter(x=x_plot, y=y_plot, 
                           mode='lines',
                           line=dict(color=colors[name], dash=linestyle[name],
                                     width=lw),
                           name='%s: error = %.3f' % (name, mse))
        data.append(trace)
        
    layout = go.Layout(title=title,
                       xaxis=dict(range=[-4, 10], showgrid=False,
                                  zeroline=False),
                       yaxis=dict(range=[-2, 10], showgrid=False,
                                  zeroline=False)
                      )
    fig = go.Figure(data=data, layout=layout)
    
    plots.append(fig)
    
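To get only the numbers shown in the plot legends, you can run the same loop without Plotly. The sketch below rebuilds the data exactly as the Calculations cell does. It also reports `sklearn.metrics.median_absolute_error` next to the mean squared error, so the two scores can be compared. The helpers `corrupt` and `make_estimators` are hypothetical names introduced here, and HuberRegressor may emit convergence warnings on these features.

```python
import numpy as np
from sklearn.linear_model import (
    LinearRegression, TheilSenRegressor, RANSACRegressor, HuberRegressor)
from sklearn.metrics import mean_squared_error, median_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Rebuild the data exactly as in the Calculations cell
np.random.seed(42)
X = np.random.normal(size=400)
y = np.sin(X)
X = X[:, np.newaxis]
X_test = np.random.normal(size=200)
y_test = np.sin(X_test)
X_test = X_test[:, np.newaxis]

def corrupt(values, level):
    # Overwrite every third entry (row) with a constant deviant value
    out = values.copy()
    out[::3] = level
    return out

scenarios = [('Modeling Errors Only', X, y),
             ('Corrupt X, Small Deviants', corrupt(X, 3), y),
             ('Corrupt y, Small Deviants', X, corrupt(y, 3)),
             ('Corrupt X, Large Deviants', corrupt(X, 10), y),
             ('Corrupt y, Large Deviants', X, corrupt(y, 10))]

def make_estimators():
    # Fresh, unfitted estimators for each scenario
    return [('OLS', LinearRegression()),
            ('Theil-Sen', TheilSenRegressor(random_state=42)),
            ('RANSAC', RANSACRegressor(random_state=42)),
            ('HuberRegressor', HuberRegressor())]

scores = {}
for title, this_X, this_y in scenarios:
    for name, estimator in make_estimators():
        model = make_pipeline(PolynomialFeatures(3), estimator)
        model.fit(this_X, this_y)
        y_pred = model.predict(X_test)
        scores[(title, name)] = (mean_squared_error(y_test, y_pred),
                                 median_absolute_error(y_test, y_pred))

for (title, name), (mse, medae) in sorted(scores.items()):
    print('%-28s %-15s MSE=%.3f MedAE=%.3f' % (title, name, mse, medae))
```

Creating fresh estimators per scenario is not strictly required, because `fit` refits from scratch. It does make it obvious that no state carries over between scenarios.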

### Modeling Errors Only

In [5]:
py.iplot(plots[0])

### Corrupt X, Small Deviants

In [6]:
py.iplot(plots[1])

### Corrupt y, Small Deviants

In [7]:
py.iplot(plots[2])

### Corrupt X, Large Deviants

In [8]:
py.iplot(plots[3])

### Corrupt y, Large Deviants

In [9]:
py.iplot(plots[4])

