# scikit-learn

`scikit-learn` is one of the most popular machine learning packages in Python. The official site is [here](http://scikit-learn.org/stable/).

## An introduction to machine learning with scikit-learn

In [1]:
from sklearn import datasets

First we load the __iris__ and __digits__ datasets. A dataset is a dictionary-like object that holds all the data and some metadata about it. The data is stored in the __.data__ member, which is an array of shape `(n_samples, n_features)`. In the case of a supervised problem, one or more response variables are stored in the __.target__ member.

In [2]:
iris = datasets.load_iris()
digits = datasets.load_digits()
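As a quick check of the layout described above, the following sketch prints the shapes of the __.data__ and __.target__ members for both datasets. It uses only the loaders already shown; the shape values in the comments are those of the datasets bundled with scikit-learn.

```python
from sklearn import datasets

iris = datasets.load_iris()
digits = datasets.load_digits()

# .data is a 2-D array: one row per sample, one column per feature
print(iris.data.shape)      # (150, 4)
print(digits.data.shape)    # (1797, 64)

# .target holds one label per sample
print(iris.target.shape)    # (150,)
print(digits.target.shape)  # (1797,)

# The dictionary-like object also lists the fields it provides
print(sorted(iris.keys()))
```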

For instance, in the case of the __digits__ dataset, __digits.data__ gives access to the features that can be used to classify the digit samples, and __digits.target__ gives the ground truth for the __digits__ dataset, that is, the number corresponding to each digit image that we are trying to learn.

In [6]:
print(digits.data)

[[  0.   0.   5. ...,   0.   0.   0.]
 [  0.   0.   0. ...,  10.   0.   0.]
 [  0.   0.   0. ...,  16.   9.   0.]
 ..., 
 [  0.   0.   1. ...,   6.   0.   0.]
 [  0.   0.   2. ...,  12.   0.   0.]
 [  0.   0.  10. ...,  12.   1.   0.]]


In [7]:
digits.target

array([0, 1, 2, ..., 8, 9, 8])
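Each row of __digits.data__ is a flattened 8x8 image. The dataset also exposes the unflattened images through the `images` member, which this notebook has not used so far. A minimal sketch of how the two relate:

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Original images: one 8x8 array per sample
print(digits.images.shape)  # (1797, 8, 8)

# Flattening each image into 64 values reproduces digits.data
flattened = digits.images.reshape((len(digits.images), -1))
print(np.array_equal(flattened, digits.data))  # True
```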

In [10]:
iris.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
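The integers in __iris.target__ are class indices. As a small sketch, the `target_names` member maps them to species names, and `np.bincount` confirms that the three classes are balanced:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Class index -> species name
print(iris.target_names)         # ['setosa' 'versicolor' 'virginica']

# Number of samples in each class
print(np.bincount(iris.target))  # [50 50 50]
```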

In [13]:
from sklearn import svm
# Support vector classifier; gamma and C are set by hand here
clf = svm.SVC(gamma=0.001, C=100.)

In [15]:
# Train on every sample except the last one, which is kept for prediction
clf.fit(digits.data[:-1], digits.target[:-1])

SVC(C=100.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [16]:
# Predict the digit for the last sample, which was left out of training
clf.predict(digits.data[-1:])

array([8])
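Predicting a single held-out sample says little about how well the classifier generalizes. The sketch below is not part of the original walkthrough: it holds out a quarter of the digits with `train_test_split` and reports accuracy on that portion. The `test_size` and `random_state` values are arbitrary choices made for illustration.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Hold out 25% of the samples for evaluation (split parameters are assumptions)
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Same hyperparameters as the classifier above
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)

# Fraction of held-out digits that are classified correctly
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"held-out accuracy: {accuracy:.3f}")
```

The exact score depends on the split, so no expected value is given here.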

To be continued ...

This notebook follows the official tutorial: http://scikit-learn.org/stable/tutorial/basic/tutorial.html