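This example chains a PCA and a logistic regression in a single pipeline: the PCA performs unsupervised dimensionality reduction, and the logistic regression performs the prediction. A grid search then selects the number of PCA components and the regularization strength of the logistic regression.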

### Version

In [1]:
import sklearn
sklearn.__version__

'0.18'

### Imports

This tutorial imports [Pipeline](http://scikit-learn.org/0.18/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline) and [GridSearchCV](http://scikit-learn.org/0.18/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV).

In [2]:
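# Note: in plotly 4 and later, this module is provided by the separate
# chart-studio package as chart_studio.plotly.
import plotly.plotly as py
import plotly.graph_objs as go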

import numpy as np
from sklearn import linear_model, decomposition, datasets
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

### Calculations

In [3]:
# Chain an unsupervised PCA step with a logistic regression classifier.
logistic = linear_model.LogisticRegression()

pca = decomposition.PCA()
pipe = Pipeline(steps=[('pca', pca), ('logistic', logistic)])

# Handwritten digits: 8x8 images flattened to 64 features, labels 0-9.
digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target
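
As a rough sketch of what fitting such a pipeline amounts to: the PCA is fitted and used to transform the data, then the logistic regression is fitted on the reduced features. The settings `n_components=20`, `svd_solver='full'`, and `max_iter=200`, along with the helper `make_steps`, are illustrative assumptions and not values taken from this tutorial.

```python
import numpy as np
from sklearn import datasets, decomposition, linear_model
from sklearn.pipeline import Pipeline

X, y = datasets.load_digits(return_X_y=True)


def make_steps():
    # Hypothetical helper: fresh, identically configured estimators.
    pca = decomposition.PCA(n_components=20, svd_solver='full')
    logistic = linear_model.LogisticRegression(max_iter=200)
    return pca, logistic


# 1) Let the pipeline chain the two steps.
pca_a, logistic_a = make_steps()
pipe = Pipeline(steps=[('pca', pca_a), ('logistic', logistic_a)])
pipe.fit(X, y)
pipeline_pred = pipe.predict(X)

# 2) Do the same chaining by hand.
pca_b, logistic_b = make_steps()
X_reduced = pca_b.fit_transform(X)   # unsupervised: y is not used here
logistic_b.fit(X_reduced, y)         # supervised: fitted on reduced features
manual_pred = logistic_b.predict(pca_b.transform(X))

agreement = np.mean(pipeline_pred == manual_pred)
print("Reduced shape:", X_reduced.shape)
print("Fraction of identical predictions:", agreement)
```

The pipeline adds no modelling of its own; its value is that the two steps can be fitted, cross-validated, and tuned as one estimator.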


### PCA Spectrum Plot

In [4]:
# Fit PCA on its own to inspect the variance explained by each component.
pca.fit(X_digits)

trace1 = go.Scatter(y=pca.explained_variance_,
                    mode="lines",
                    line=dict(width=2, color='blue'),
                    name="PCA Spectrum")
layout1 = go.Layout(xaxis=dict(title="n_components"),
                    yaxis=dict(title="explained_variance_"))
fig1 = go.Figure(data=[trace1], layout=layout1)
py.iplot(fig1, filename="PCA-Spectrum")
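
For readers wondering what the spectrum measures, here is a minimal sketch. It assumes the usual relationship between PCA and the singular value decomposition: the variance along each principal axis is proportional to the squared singular value of the centered data. The comparison uses `explained_variance_ratio_` because the normalizing constant of `explained_variance_` has differed between scikit-learn releases. Variable names are illustrative.

```python
import numpy as np
from sklearn import datasets, decomposition

X = datasets.load_digits().data

pca = decomposition.PCA().fit(X)

# Center the data and take its singular values.
X_centered = X - X.mean(axis=0)
singular_values = np.linalg.svd(X_centered, compute_uv=False)

# Share of the total variance carried by each principal axis.
ratio_by_hand = singular_values ** 2 / np.sum(singular_values ** 2)

matches = np.allclose(ratio_by_hand, pca.explained_variance_ratio_)
print("Matches PCA.explained_variance_ratio_:", matches)

# Cumulative share of the variance kept by the first k components.
cumulative_ratio = np.cumsum(pca.explained_variance_ratio_)
print("Variance kept by the first 20 components:", cumulative_ratio[19])
```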


### Prediction Plot

In [5]:
# Candidate values for the grid search.
n_components = [20, 40, 64]
Cs = np.logspace(-4, 4, 3)

# Parameters of pipelines can be set using '__' separated parameter names:

estimator = GridSearchCV(pipe,
                         dict(pca__n_components=n_components,
                              logistic__C=Cs))

estimator.fit(X_digits, y_digits)
# Number of PCA components selected by the grid search.
x_ = estimator.best_estimator_.named_steps['pca'].n_components

# Vertical dotted line at the selected number of components.
trace2 = go.Scatter(x=[x_, x_], y=[0, 1],
                    mode="lines",
                    line=dict(width=2, dash='dot'),
                    name="n_components chosen")
layout2 = go.Layout(showlegend=True)
fig2 = go.Figure(data=[trace2], layout=layout2)

py.iplot(fig2, filename="Prediction")
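
The `step__parameter` naming used in the grid above can be tried directly on a pipeline. This is a small sketch, assuming a pipeline built like the one in this tutorial; the values passed to `set_params` are arbitrary.

```python
import numpy as np
from sklearn import decomposition, linear_model
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[('pca', decomposition.PCA()),
                       ('logistic', linear_model.LogisticRegression())])

# '<step name>__<parameter>' routes a value to one step of the pipeline.
pipe.set_params(pca__n_components=20, logistic__C=0.01)
print(pipe.named_steps['pca'].n_components)
print(pipe.named_steps['logistic'].C)

# The same names appear among the pipeline's tunable parameters.
params = pipe.get_params()
print('pca__n_components' in params, 'logistic__C' in params)

# The grid from this tutorial: 3 values of n_components x 3 values of C.
Cs = np.logspace(-4, 4, 3)  # three values, log-spaced from 1e-4 to 1e4
grid = ParameterGrid(dict(pca__n_components=[20, 40, 64], logistic__C=Cs))
print(len(grid), "parameter combinations")
```

`GridSearchCV` applies each combination in the same way before cross-validating the whole pipeline.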

### Combined Plot

In [6]:
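# Redraw the marker so it spans the PCA spectrum: 178 is roughly the
# largest explained variance on the digits data.
trace2 = go.Scatter(x=[x_, x_], y=[0, 178],
                    mode="lines",
                    line=dict(width=1,
                              dash='dot',
                              color="rgb(10, 10, 240)"),
                    name="n_components chosen")
layout3 = go.Layout(xaxis=dict(title="n_components"),
                    yaxis=dict(title="explained_variance_"))
# Overlay the spectrum (trace1) and the selected n_components (trace2).
fig3 = go.Figure(data=[trace1, trace2], layout=layout3)
py.iplot(fig3, filename="pipeline")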
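
Once a search has been fitted, the selected settings can also be read from `best_params_` instead of from the fitted steps. The sketch below uses a reduced problem so that it runs quickly: the 500-sample subset, `cv=3`, the smaller grid, and `max_iter=200` are assumptions made for speed and are not this tutorial's settings, so the selected values may differ from those plotted above.

```python
from sklearn import datasets, decomposition, linear_model
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = datasets.load_digits(return_X_y=True)
X, y = X[:500], y[:500]  # subset for speed (assumption, see above)

pipe = Pipeline(steps=[
    ('pca', decomposition.PCA()),
    ('logistic', linear_model.LogisticRegression(max_iter=200)),
])

# Smaller illustrative grid than the one used in the tutorial.
param_grid = dict(pca__n_components=[20, 40], logistic__C=[0.01, 1.0])
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best mean cross-validated accuracy:", search.best_score_)

# The refitted pipeline carries the same selected value.
chosen = search.best_estimator_.named_steps['pca'].n_components
print("n_components chosen:", chosen)
```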

### License

Code source: Gaël Varoquaux

License: BSD 3 clause

