In [2]:
from sklearn import svm
from sklearn import datasets

In the case of the digits dataset, the task is to predict, given an image, which digit it represents. We are given samples of each of the 10 possible classes (the digits zero through nine) on which we fit an estimator to be able to predict the classes to which unseen samples belong.

In scikit-learn, an estimator for classification is a Python object that implements the methods fit(X, y) and predict(T).
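To make the fit/predict interface concrete, here is a minimal sketch of a toy class that follows it. `MajorityClassifier` is a hypothetical name invented for this illustration and is not part of scikit-learn; it ignores the features and always predicts the most frequent training label.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        # Learn the most common label; this toy model ignores the features in X.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self  # returning self mirrors what scikit-learn estimators do

    def predict(self, T):
        # Return one prediction per sample in T.
        return [self.majority_ for _ in T]


toy = MajorityClassifier().fit([[0], [1], [2]], ["a", "b", "b"])
toy_predictions = toy.predict([[5], [6]])
print(toy_predictions)
```

Any object with these two methods can be used in the same way as the classifier fitted later in this section, which is what makes estimators easy to swap.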

An example of an estimator is the class sklearn.svm.SVC, which implements support vector classification. The constructor of an estimator takes the model's parameters as arguments, but for the time being we will treat the estimator as a black box.
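As a small sketch of how constructor arguments become model parameters, the snippet below builds an SVC and reads the parameters back with `get_params()`, then changes one with `set_params()`. The variable name `clf_demo` and the specific values are chosen only for illustration.

```python
from sklearn import svm

# Parameters passed to the constructor are stored on the estimator.
clf_demo = svm.SVC(gamma=0.001, C=100.0)

# get_params() returns them as a dictionary, including the defaults.
params = clf_demo.get_params()
print(params["C"], params["gamma"], params["kernel"])

# set_params() changes a parameter after construction.
clf_demo.set_params(C=10.0)
print(clf_demo.get_params()["C"])
```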

In [3]:
digits = datasets.load_digits()

In [4]:
print(digits.data)

[[ 0.  0.  5. ...  0.  0.  0.]
 [ 0.  0.  0. ... 10.  0.  0.]
 [ 0.  0.  0. ... 16.  9.  0.]
 ...
 [ 0.  0.  1. ...  6.  0.  0.]
 [ 0.  0.  2. ... 12.  0.  0.]
 [ 0.  0. 10. ... 12.  1.  0.]]


In [5]:
# The response variable (the digit each image represents) is stored in digits.target
print(digits.target)

[0 1 2 ... 8 9 8]
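The printed arrays are truncated, so it can help to inspect their shapes. The sketch below assumes the same `load_digits()` dataset; it shows that each row of `digits.data` is an 8x8 image flattened into 64 features, and that the targets cover the ten digit classes.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

print(digits.data.shape)    # (1797, 64): one row of 64 pixel values per image
print(digits.images.shape)  # (1797, 8, 8): the same images before flattening
print(digits.target.shape)  # (1797,): one label per image

# Each row of digits.data is the corresponding 8x8 image flattened.
flattened = digits.images.reshape(len(digits.images), -1)
print(np.array_equal(flattened, digits.data))

# The labels are the ten digits 0 through 9.
classes = np.unique(digits.target)
print(classes)
```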


In [6]:
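# Create the support vector classifier with hand-picked parameters
clf = svm.SVC(gamma=0.001, C=100.)
# Fit it on every sample except the last, which is held out for prediction
clf.fit(digits.data[:-1], digits.target[:-1])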

SVC(C=100.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

We used all the digits except the last one to train the classifier. Let's see how well it predicts the last digit.

In [7]:
clf.predict(digits.data[-1:])

array([8])

In [8]:
digits.target[-1]

8

The predicted label matches the true label, so the classifier got the last digit right.
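A single held-out sample gives only a very rough impression of how well the classifier works. One possible way to get a broader estimate, sketched below, is to hold out a larger share of the data with `train_test_split` and measure accuracy with `score`. The 25% test size and `random_state=0` are arbitrary choices for this illustration, not something the original example prescribes.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Hold out a quarter of the samples for testing (an arbitrary split for illustration).
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Same parameters as the classifier above, fitted only on the training part.
clf_split = svm.SVC(gamma=0.001, C=100.0)
clf_split.fit(X_train, y_train)

# score() returns the fraction of test samples classified correctly.
accuracy = clf_split.score(X_test, y_test)
print(accuracy)
```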

## Saving models 

A fitted model can be saved with Python's built-in pickle module, or with joblib when writing it to disk.

In [12]:
# joblib is a standalone package (older scikit-learn versions exposed it as sklearn.externals.joblib)
import joblib
import pickle

clf = svm.SVC()
iris = datasets.load_iris()
x,y = iris.data, iris.target

# Fit on the training data (in this case, it's the whole dataset)
clf.fit(x,y)

# Serialize to an in-memory byte string and restore a copy
s = pickle.dumps(clf)
clf2 = pickle.loads(s)

# Dump to disk, then reload the .pkl file
joblib.dump(clf, 'test.pkl')
clf = joblib.load('test.pkl')
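To check that a saved model survives the round trip, one option is to compare the predictions of the restored copies with those of the original. The sketch below does this for both pickle and joblib; it writes to a temporary directory rather than the working directory, and the names `clf_from_bytes` and `clf_from_disk` are chosen only for this illustration.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn import datasets, svm

iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC().fit(X, y)

# Round trip through an in-memory byte string.
clf_from_bytes = pickle.loads(pickle.dumps(clf))

# Round trip through a file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.pkl")
    joblib.dump(clf, path)
    clf_from_disk = joblib.load(path)

# The restored models should predict exactly what the original predicts.
same_from_bytes = np.array_equal(clf.predict(X), clf_from_bytes.predict(X))
same_from_disk = np.array_equal(clf.predict(X), clf_from_disk.predict(X))
print(same_from_bytes, same_from_disk)
```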

## Conventions 

Most scikit-learn classifiers expose a fit() method for training and a predict() method for making predictions on new samples.
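Because of this shared interface, different classifiers can be trained and used with identical code. The sketch below runs the same loop over an SVC and a `KNeighborsClassifier`; the choice of these two models and of the iris data is only for illustration.

```python
from sklearn import datasets, svm
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

results = {}
for model in (svm.SVC(), KNeighborsClassifier()):
    fitted = model.fit(X, y)         # fit() returns the estimator itself
    predictions = model.predict(X)   # one predicted label per sample
    results[type(model).__name__] = (fitted is model, predictions.shape)

print(results)
```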