This example shows how to perform univariate feature selection before running an SVC (support vector classifier) to improve the classification scores.

#### New to Plotly?
Plotly's Python library is free and open source! [Get started](https://plot.ly/python/getting-started/) by downloading the client and [reading the primer](https://plot.ly/python/getting-started/).

You can set up Plotly to work in [online](https://plot.ly/python/getting-started/#initialization-for-online-plotting) or [offline](https://plot.ly/python/getting-started/#initialization-for-offline-plotting) mode, or in [Jupyter notebooks](https://plot.ly/python/getting-started/#start-plotting-online).

We also have a quick-reference [cheatsheet](https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf) to help you get started!

### Version

```python
import sklearn
sklearn.__version__  # '0.18.1' when this notebook was run
```

### Imports

This tutorial imports [Pipeline](http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline) and [cross_val_score](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score).
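As a quick illustration of how these two pieces fit together, the sketch below builds a two-step pipeline, shows that each step's parameters are exposed under `<step>__<param>` names, and scores it with `cross_val_score`. The synthetic dataset from `make_classification` and the names `X_toy`, `y_toy` and `pipe` are illustrative assumptions, not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Small synthetic problem (illustrative only)
X_toy, y_toy = make_classification(n_samples=100, n_features=20,
                                   n_informative=5, random_state=0)

pipe = Pipeline([('anova', SelectPercentile(f_classif)), ('svc', SVC(C=1.0))])

# Step parameters are exposed as '<step name>__<parameter name>'
print('anova__percentile' in pipe.get_params())  # True
pipe.set_params(anova__percentile=50)
print(pipe.named_steps['anova'].percentile)  # 50

# cross_val_score refits the whole pipeline (selector + SVC) on each training fold
scores = cross_val_score(pipe, X_toy, y_toy, cv=3)
print(scores.shape)  # (3,)
```

Because the selector lives inside the pipeline, feature scores are recomputed from the training folds only, so the held-out fold never influences which features are kept.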

```python
import plotly.plotly as py  # moved to chart_studio.plotly in plotly >= 4.0
import plotly.graph_objs as go

import numpy as np
from sklearn import svm, datasets, feature_selection
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
```


### Calculations

Load some data to play with: the first 200 samples of the digits dataset, augmented with 200 random, non-informative features.


```python
digits = datasets.load_digits()
y = digits.target
# Keep only 200 samples so that we are in a curse-of-dimensionality setting
y = y[:200]
X = digits.data[:200]
n_samples = len(y)
# Flatten to (n_samples, n_features); digits.data is already 2-D, so this is a no-op
X = X.reshape((n_samples, -1))
# Append 200 non-informative features drawn uniformly from [0, 2)
X = np.hstack((X, 2 * np.random.random((n_samples, 200))))
```
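The digits dataset stores each 8x8 image as 64 pixel features, so after appending the 200 noise columns the design matrix has 264 features for only 200 samples. The check below rebuilds a matrix of the same shape under new names (`X_pixels`, `noise`, `X_demo`) so it does not overwrite `X`; the seeded `np.random.RandomState(0)` is an assumption added for reproducibility and is not part of the original example.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
n_samples = 200

# One column per pixel of the 8x8 image
X_pixels = digits.data[:n_samples]
# Seeded here only for reproducibility; the original uses np.random directly
rng = np.random.RandomState(0)
# 200 uniform noise columns in [0, 2)
noise = 2 * rng.random_sample((n_samples, 200))
X_demo = np.hstack((X_pixels, noise))

print(X_pixels.shape)  # (200, 64)
print(X_demo.shape)    # (200, 264)
print(X_demo.shape[1] > X_demo.shape[0])  # True: more features than samples
```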

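Create a feature-selection transform and an SVM instance, and chain them in a `Pipeline` to obtain a full-blown estimator. The transform scores each feature with the ANOVA F-value (`f_classif`) and keeps only the top percentile of features (10% by default).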


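```python
# Univariate selection: score each feature with the ANOVA F-value and
# keep the top `percentile` percent of features (10 by default)
transform = feature_selection.SelectPercentile(feature_selection.f_classif)

# Chain the selector and the classifier; the steps are named 'anova' and 'svc'
clf = Pipeline([('anova', transform), ('svc', svm.SVC(C=1.0))])
```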
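`f_classif` scores each feature independently with the one-way ANOVA F-statistic: the between-class mean square (spread of the class means around the overall mean, weighted by class size, divided by K - 1) over the within-class mean square (spread of samples around their own class mean, divided by N - K), for K classes and N samples. The NumPy sketch below computes that ratio by hand on a tiny, made-up two-feature example and checks it against `f_classif`; the helper `anova_f` and the toy arrays are hypothetical and only for illustration.

```python
import numpy as np
from sklearn.feature_selection import f_classif


def anova_f(X, y):
    """Hypothetical helper: one-way ANOVA F-statistic for each column of X."""
    classes = np.unique(y)
    n, k = len(y), len(classes)
    grand_mean = X.mean(axis=0)
    # Between-class sum of squares: size-weighted spread of the class means
    ss_between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand_mean) ** 2
                     for c in classes)
    # Within-class sum of squares: spread of samples around their class mean
    ss_within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                    for c in classes)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: feature 0 separates the two classes, feature 1 does not
X_toy = np.array([[1.0, 5.0], [1.2, 3.0], [0.9, 4.0],
                  [3.0, 4.5], [3.1, 3.5], [2.8, 4.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

f_manual = anova_f(X_toy, y_toy)
f_sklearn, p_values = f_classif(X_toy, y_toy)
print(np.allclose(f_manual, f_sklearn))  # True
print(f_manual[0] > f_manual[1])         # True: the informative feature scores higher
```

`SelectPercentile` then simply ranks the features by these scores; it never looks at how features interact, which is what makes the selection "univariate".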

### Plot Results

```python
score_means = list()
score_stds = list()
percentiles = (1, 3, 6, 10, 15, 20, 30, 40, 60, 80, 100)

for percentile in percentiles:
    # Parameters of a pipeline step are addressed as <step>__<param>
    clf.set_params(anova__percentile=percentile)
    # Compute cross-validation score using 1 CPU
    this_scores = cross_val_score(clf, X, y, n_jobs=1)
    score_means.append(this_scores.mean())
    score_stds.append(this_scores.std())

# Symmetric error bars of +/- one standard deviation across the CV folds
p1 = go.Scatter(x=percentiles, y=score_means,
                mode='lines',
                error_y=dict(type='data',
                             array=np.array(score_stds),
                             visible=True))

layout = go.Layout(title='Performance of the SVM-Anova varying the percentile of features selected',
                   xaxis=dict(title='Percentile'),
                   yaxis=dict(title='Prediction rate'))
fig = go.Figure(data=[p1], layout=layout)
```
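The manual loop above can also be written with `GridSearchCV`, which runs the same cross-validated sweep over `anova__percentile` and remembers the best setting. This is an alternative formulation, not what the original example uses. The sketch rebuilds the data with a fixed seed under new names (`X_gs`, `y_gs`, `pipe_gs`) so it is self-contained and leaves `X`, `y` and `clf` untouched, and it sets `cv=3` explicitly to match the 3-fold default of scikit-learn 0.18 (newer releases default to 5 folds).

```python
import numpy as np
from sklearn import datasets, feature_selection, svm
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Rebuild the same kind of data: 200 digits plus 200 noise features (seeded here)
digits = datasets.load_digits()
rng = np.random.RandomState(0)
X_gs = np.hstack((digits.data[:200], 2 * rng.random_sample((200, 200))))
y_gs = digits.target[:200]

pipe_gs = Pipeline([
    ('anova', feature_selection.SelectPercentile(feature_selection.f_classif)),
    ('svc', svm.SVC(C=1.0)),
])

# Same grid of percentiles as the manual loop
param_grid = {'anova__percentile': [1, 3, 6, 10, 15, 20, 30, 40, 60, 80, 100]}
search = GridSearchCV(pipe_gs, param_grid, cv=3)
search.fit(X_gs, y_gs)

print(search.best_params_)
# Per-percentile mean and std of the fold scores, analogous to score_means / score_stds
means = search.cv_results_['mean_test_score']
stds = search.cv_results_['std_test_score']
```

Some digit pixels are constant across the samples, so `f_classif` may emit runtime warnings for those columns; `SelectPercentile` still ranks the remaining features.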

```python
py.iplot(fig)
```

