# Introduction to Scikit-learn

Keep the [Scikit-learn Documentation](http://scikit-learn.org/stable/documentation.html) page handy while working through this notebook. Its [User Guide](http://scikit-learn.org/stable/user_guide.html) section covers many topics with examples.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

In [2]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

In [3]:
X_train.shape

(1347, 64)
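
Each of the 64 columns is one pixel of an 8x8 grayscale image, flattened row by row, and `train_test_split` holds out 25% of the samples when no `test_size` is given. A minimal sketch that checks both points; the variable name `first_image` is introduced here only for illustration:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# Each row of digits.data is an 8x8 image flattened to 64 features.
first_image = digits.data[0].reshape(8, 8)
print(np.array_equal(first_image, digits.images[0]))

# With the default test_size, 25% of the 1797 samples are held out.
print(digits.data.shape, X_train.shape, X_test.shape)
```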

In [4]:
np.bincount(y_train)

array([141, 139, 133, 138, 143, 134, 129, 131, 126, 133])

Really Simple API
-------------------
0) Import your model class

In [5]:
from sklearn.svm import LinearSVC

1) Instantiate an object and set the parameters

In [6]:
svm = LinearSVC()

2) Fit the model

In [7]:
svm.fit(X_train, y_train)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)

3) Apply / evaluate

In [8]:
print(svm.predict(X_train))
print(y_train)

[2 9 9 ..., 7 7 8]
[2 8 9 ..., 7 7 8]


In [9]:
# Compare predict(X_train) against y_train; score returns the mean accuracy
svm.score(X_train, y_train)

0.98960653303637713
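
For a classifier, `score` is the fraction of samples whose predicted label matches the true label. A minimal sketch of that equivalence, rebuilt from scratch so it runs on its own; `random_state=0` on the model and the names `clf` and `manual_accuracy` are assumptions added for reproducibility and illustration, not part of the cells above:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

clf = LinearSVC(random_state=0).fit(X_train, y_train)

# Fraction of training samples where the prediction equals the true label.
manual_accuracy = np.mean(clf.predict(X_train) == y_train)

# score() on a classifier reports the same mean accuracy.
print(manual_accuracy, clf.score(X_train, y_train))
```

The exact value depends on the scikit-learn version and solver convergence, so it may differ slightly from the output shown above.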

In [10]:
svm.score(X_test, y_test)

0.93111111111111111

And again
---------

In [11]:
from sklearn.ensemble import RandomForestClassifier

In [12]:
rf = RandomForestClassifier(n_estimators=50)

In [13]:
rf.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=50, n_jobs=1, oob_score=False, random_state=None,
            verbose=0, warm_start=False)

In [14]:
rf.score(X_test, y_test)

0.96888888888888891
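
Because every estimator exposes the same `fit` and `score` methods, the two models can be compared in a single loop. A rough sketch under that assumption; the dictionary names `models` and `test_scores` and the fixed `random_state=0` are illustrative additions, and the scores will vary with the scikit-learn version:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# Different model classes, identical interface.
models = {
    "LinearSVC": LinearSVC(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(
        n_estimators=50, random_state=0),
}

test_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                       # same call for every model
    test_scores[name] = model.score(X_test, y_test)   # mean accuracy on held-out data
    print(name, test_scores[name])
```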

# Exercises
Load the iris dataset from the ``sklearn.datasets`` module using the ``load_iris`` function.

Split it into training and test sets using ``train_test_split``.

Then train and evaluate a classifier of your choice. Try ``sklearn.neighbors.KNeighborsClassifier``, for example.


In [None]:
# %load solutions/train_iris.py