# scikit-learn with radanalytics.io

This brief notebook shows how to install packages (in this case, scikit-learn) in the radanalytics.io Jupyter notebook image. Once we've installed scikit-learn, we'll demonstrate it by [running through part of its tutorial](http://scikit-learn.org/stable/tutorial/basic/tutorial.html). If there are packages you use often, you'll probably want to build them into your notebook image, but it's handy to be able to install packages without building a new image while you're trying new things.

First, we'll use `pip install --user` to install scikit-learn in our user's home directory:

In [1]:
!pip install --user scikit-learn

Collecting scikit-learn
  Using cached https://files.pythonhosted.org/packages/c4/b8/eb447f84e0012b0bce97d12d1bc6ea6882b4ed9eb7faaca00e8f627733fb/scikit_learn-0.19.1-cp27-cp27mu-manylinux1_x86_64.whl
Installing collected packages: scikit-learn
Successfully installed scikit-learn-0.19.1
You are using pip version 9.0.1, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
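If you rerun the notebook often, a small guard avoids reinstalling a package that is already importable. The sketch below is a hypothetical helper, `ensure_installed`, which is not part of the radanalytics.io image; it assumes `pip` can be run as a module of the kernel's own interpreter (`sys.executable -m pip`) and allows for the import name and the pip package name to differ (for scikit-learn they are `sklearn` and `scikit-learn`).

```python
import importlib
import subprocess
import sys


def ensure_installed(module_name, pip_name=None):
    """Hypothetical helper: import `module_name`, or `pip install --user` it if missing.

    Returns True if an install was attempted, False if the module was already importable.
    """
    try:
        importlib.import_module(module_name)
        return False
    except ImportError:
        # Install with the same interpreter that runs this kernel, into the user's home directory
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "--user", pip_name or module_name]
        )
        return True
```

In a notebook cell you would call `ensure_installed("sklearn", pip_name="scikit-learn")`. A freshly installed package may still need the user-level `site-packages` directory on `sys.path` before it can be imported.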


Next, we'll make sure that the user-level `site-packages` directory that `pip install --user` wrote to is on the module search path (`sys.path`), so the kernel can import the new package:

In [2]:
import sys
# Per-user install location for the nbuser account under Python 2.7
sys.path.append("/home/nbuser/.local/lib/python2.7/site-packages")
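Hard-coding `/home/nbuser/.local/lib/python2.7/site-packages` ties the notebook to one user name and one Python version. A more portable sketch asks the standard library's `site` module for the per-user directory that `pip install --user` targets; the function name `add_user_site_to_path` is our own, not part of any library.

```python
import site
import sys


def add_user_site_to_path():
    """Append the per-user site-packages directory to sys.path if it is missing."""
    # site.getusersitepackages() returns e.g. ~/.local/lib/pythonX.Y/site-packages on Linux
    user_site = site.getusersitepackages()
    if user_site not in sys.path:
        sys.path.append(user_site)
    return user_site
```

Checking for membership first keeps the function safe to call from a cell that is run more than once.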

Now we'll load two of the example datasets included with scikit-learn, the iris and digits datasets, and inspect them:

In [3]:
from sklearn import datasets

# Each loader returns a dict-like Bunch holding the data, target labels, and metadata
iris = datasets.load_iris()
digits = datasets.load_digits()
print(iris)
print(digits)

{'target_names': array(['setosa', 'versicolor', 'virginica'], 
      dtype='|S10'), 'data': array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2],
       [ 5.4,  3.9,  1.7,  0.4],
       [ 4.6,  3.4,  1.4,  0.3],
       [ 5. ,  3.4,  1.5,  0.2],
       [ 4.4,  2.9,  1.4,  0.2],
       [ 4.9,  3.1,  1.5,  0.1],
       [ 5.4,  3.7,  1.5,  0.2],
       [ 4.8,  3.4,  1.6,  0.2],
       [ 4.8,  3. ,  1.4,  0.1],
       [ 4.3,  3. ,  1.1,  0.1],
       [ 5.8,  4. ,  1.2,  0.2],
       [ 5.7,  4.4,  1.5,  0.4],
       [ 5.4,  3.9,  1.3,  0.4],
       [ 5.1,  3.5,  1.4,  0.3],
       [ 5.7,  3.8,  1.7,  0.3],
       [ 5.1,  3.8,  1.5,  0.3],
       [ 5.4,  3.4,  1.7,  0.2],
       [ 5.1,  3.7,  1.5,  0.4],
       [ 4.6,  3.6,  1. ,  0.2],
       [ 5.1,  3.3,  1.7,  0.5],
       [ 4.8,  3.4,  1.9,  0.2],
       [ 5. ,  3. ,  1.6,  0.2],
       [ 5. ,  3.4,  1.6,  0.4],
       [ 5.2,  3.
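Printing a whole dataset produces a lot of output. A more compact way to inspect it, assuming the attribute names of scikit-learn's dataset `Bunch` objects (`data`, `target`, `target_names`, and, for digits, `images`), is to look at array shapes and label names:

```python
from sklearn import datasets

iris = datasets.load_iris()
digits = datasets.load_digits()

# iris: one row per flower, four measurements (sepal/petal length and width)
print(iris.data.shape)
print(iris.target_names)

# digits: each 8x8 image is also flattened into a 64-feature row of `data`
print(digits.images.shape)
print(digits.data.shape)
print(digits.target[:10])
```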

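We can then train a multiclass support vector machine classifier (`svm.SVC`) to recognize the digits. We train on every sample except the last one, which we hold back so that we can ask the classifier to predict it: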

In [4]:
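from sklearn import svm
# RBF-kernel SVM; gamma and C are set by hand here, as in the scikit-learn tutorial
clf = svm.SVC(gamma=0.001, C=100.)

# Train on every sample except the last, which is held out for prediction
clf.fit(digits.data[:-1], digits.target[:-1])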

SVC(C=100.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
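Training on all but one sample says little about how well the classifier generalizes. As a sketch that goes beyond this notebook, you could hold out a random quarter of the digits with `sklearn.model_selection.train_test_split` (available since scikit-learn 0.18) and report mean accuracy with the estimator's `score` method; the 25% split and `random_state=0` are arbitrary choices.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Hold out 25% of the samples for evaluation; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Same hyperparameters as the classifier above
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)

# Mean accuracy on samples the classifier never saw during training
accuracy = clf.score(X_test, y_test)
print(accuracy)
```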

Finally, we can predict the digit for the held-out last sample, which the classifier did not see during training:

In [5]:
clf.predict(digits.data[-1:])

array([8])
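To check the prediction, you can compare it with the true label stored in `digits.target` and look at the corresponding 8x8 image in `digits.images`. The sketch below retrains the same classifier so that it runs on its own, and prints the image as a grid of pixel intensities (0 to 16) rather than plotting it, to avoid assuming matplotlib is installed.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# Train on every sample except the last, as in the cells above
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(digits.data[:-1], digits.target[:-1])

# predict() expects a 2-D array, so the slice [-1:] keeps a batch of one sample
predicted = clf.predict(digits.data[-1:])[0]
actual = digits.target[-1]
print("predicted %d, actual %d" % (predicted, actual))

# The same sample as an 8x8 grid of pixel intensities
for row in digits.images[-1]:
    print(" ".join("%2d" % int(v) for v in row))
```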