Illustration of how an estimator's performance on unseen data (test data) differs from its performance on training data. As regularization increases, performance on the training set decreases, while performance on the test set peaks within a range of values of the regularization parameter. The example uses an Elastic-Net regression model, and performance is measured with the explained variance, a.k.a. R^2.
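
As a rough illustration of the performance measure, the sketch below computes R^2 by hand as 1 - SS_res / SS_tot, which is the quantity the `score` method of scikit-learn regressors reports. The helper name `r_squared` and the toy arrays are assumptions made for this illustration only; they are not part of the original example.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Hypothetical helper: coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Toy targets, assumed for illustration
y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r_squared(y_true, y_true)                      # exact predictions
baseline = r_squared(y_true, np.full(4, y_true.mean()))  # always predict the mean
worse = r_squared(y_true, y_true[::-1])                  # worse than the mean

print(perfect, baseline, worse)
```

A score of 1 means perfect prediction, 0 matches a model that always predicts the mean of the targets, and negative values are possible when the model does worse than that baseline. This is why the test curve can drop well below the train curve.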

#### New to Plotly?
Plotly's Python library is free and open source! [Get started](https://plot.ly/python/getting-started/) by downloading the client and [reading the primer](https://plot.ly/python/getting-started/).
<br>You can set up Plotly to work in [online](https://plot.ly/python/getting-started/#initialization-for-online-plotting) or [offline](https://plot.ly/python/getting-started/#initialization-for-offline-plotting) mode, or in [jupyter notebooks](https://plot.ly/python/getting-started/#start-plotting-online).
<br>We also have a quick-reference [cheatsheet](https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf) (new!) to help you get started!

### Version

In [1]:
import sklearn
sklearn.__version__

'0.18.1'

### Imports

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools

import numpy as np
from sklearn import linear_model

### Calculations

Generate sample data

In [3]:
n_samples_train, n_samples_test, n_features = 75, 150, 500
np.random.seed(0)
coef = np.random.randn(n_features)
coef[50:] = 0.0  # only the first 50 features affect the target
X = np.random.randn(n_samples_train + n_samples_test, n_features)
y = np.dot(X, coef)

# Split train and test data
X_train, X_test = X[:n_samples_train], X[n_samples_train:]
y_train, y_test = y[:n_samples_train], y[n_samples_train:]
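
The data set above has far more features (500) than training samples (75), and only the first 50 coefficients are nonzero, which is the setting where regularization matters. A minimal sketch of that sparsity, using a smaller sample count and the illustrative names `n_informative`, `y_informative`, `n_nonzero` and `same_target` (all assumptions, not part of the original example):

```python
import numpy as np

# Smaller sample count than the example, assumed here to keep the check quick
rng = np.random.RandomState(0)
n_samples, n_features, n_informative = 20, 500, 50

coef = rng.randn(n_features)
coef[n_informative:] = 0.0  # zero out every coefficient after the first 50

X = rng.randn(n_samples, n_features)
y = np.dot(X, coef)

# The target built from all 500 columns equals the one built from the
# first 50 columns only, because the remaining coefficients are zero.
y_informative = np.dot(X[:, :n_informative], coef[:n_informative])

n_nonzero = int(np.count_nonzero(coef))
same_target = bool(np.allclose(y, y_informative))
print(n_nonzero, same_target)
```

The remaining 450 columns are pure noise with respect to `y`, so a model that is not regularized enough can fit them on the training set without any benefit on the test set.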

Compute train and test errors

In [4]:
alphas = np.logspace(-5, 1, 60)
enet = linear_model.ElasticNet(l1_ratio=0.7)
# Despite their names, these lists store R^2 scores (higher is better)
train_errors = list()
test_errors = list()
for alpha in alphas:
    enet.set_params(alpha=alpha)
    enet.fit(X_train, y_train)
    train_errors.append(enet.score(X_train, y_train))
    test_errors.append(enet.score(X_test, y_test))

i_alpha_optim = np.argmax(test_errors)
alpha_optim = alphas[i_alpha_optim]
print("Optimal regularization parameter : %s" % alpha_optim)

# Estimate the coef_ on full data with optimal regularization parameter
enet.set_params(alpha=alpha_optim)
coef_ = enet.fit(X, y).coef_


Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.



Optimal regularization parameter : 0.000335292414925
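
For reference, the objective that scikit-learn documents for `ElasticNet` can be written as below, with $\alpha$ standing for `alpha` and $\rho$ for `l1_ratio` (0.7 in this example). The symbols $w$, $\rho$ and $n_{\text{samples}}$ are notation chosen for this note, not taken from the original text.

```latex
\min_{w} \; \frac{1}{2\, n_{\text{samples}}} \lVert y - Xw \rVert_2^2
  \;+\; \alpha \rho \, \lVert w \rVert_1
  \;+\; \frac{\alpha (1 - \rho)}{2} \, \lVert w \rVert_2^2
```

Increasing $\alpha$ puts more weight on the two penalty terms relative to the data-fit term, which is consistent with the training score falling as regularization grows. For very small $\alpha$ the problem is close to unpenalized least squares with more features than samples, the regime in which the convergence warning shown above appears.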


### Plot Results 

In [5]:
fig = tools.make_subplots(rows=2, cols=1)

This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x2,y2 ]



In [6]:
p1 = go.Scatter(x=alphas, y=train_errors, 
                name='Train', mode='lines',
                line=dict(width=1))
fig.append_trace(p1, 1, 1)
p2 = go.Scatter(x=alphas, y=test_errors, 
                name='Test', mode='lines',
                line=dict(width=1))
fig.append_trace(p2, 1, 1)
p3 = go.Scatter(x=2*[alpha_optim],y=[0, np.max(test_errors)],
                mode='lines',
                line=dict(width=3, color='black'),
                name='Optimum on test')
fig.append_trace(p3, 1, 1)

# x axis holds the alphas (log scale), y axis holds the R^2 scores
fig['layout']['xaxis1'].update(title='Regularization parameter', showgrid=False,
                               type='log')
fig['layout']['yaxis1'].update(title='Performance',
                               showgrid=False)

# Show estimated coef_ vs true coef
p4 = go.Scatter(y=coef, name='True coef',
                mode='lines',
                line=dict(width=1))
fig.append_trace(p4, 2, 1)

p5 = go.Scatter(y=coef_, name='Estimated coef',
                mode='lines',
                line=dict(width=1))
fig.append_trace(p5, 2, 1)


In [7]:
py.iplot(fig)

### License

Author: Alexandre Gramfort <alexandre.gramfort@inria.fr>

License: BSD 3 clause



