# Supervised Learning
In **Supervised Learning**, we have a dataset consisting of both features and labels.
The task is to construct an estimator which is able to predict the label of an object
given the set of features. A relatively simple example is predicting the species of an
iris given a set of measurements of its flower.
Some more complicated examples are:

- given a multicolor image of an object through a telescope, determine
  whether that object is a star, a quasar, or a galaxy.
- given a photograph of a person, identify the person in the photo.
- given a list of movies a person has watched and their personal rating
  of each movie, recommend a list of movies they would like
  (so-called *recommender systems*; a famous example is the [Netflix Prize](http://en.wikipedia.org/wiki/Netflix_prize)).

What these tasks have in common is that there are one or more unknown
quantities associated with the object which need to be determined from other
observed quantities.

Supervised learning is further broken down into two categories, **classification** and **regression**.
In classification, the label is discrete, while in regression, the label is continuous. For example,
in astronomy, the task of determining whether an object is a star, a galaxy, or a quasar is a
classification problem: the label is from three distinct categories. On the other hand, we might
wish to estimate the age of an object based on such observations: this would be a regression problem,
because the label (age) is a continuous quantity.
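
The difference between the two kinds of label can be seen directly in data. The sketch below is illustrative only: it assumes the bundled `load_diabetes` dataset (not used elsewhere in this section) as a stand-in regression problem, and compares its labels with the discrete iris labels.

```python
import numpy as np
from sklearn.datasets import load_diabetes, load_iris

# Classification: the iris labels are a small set of discrete class ids.
iris_labels = load_iris().target
print(np.unique(iris_labels))

# Regression: the diabetes labels are a quantitative score that takes
# many different values (assumed here as an example regression target).
diabetes_labels = load_diabetes().target
print(diabetes_labels[:5])
print(len(np.unique(diabetes_labels)))
```

A classifier would be the natural choice for the first kind of label and a regressor for the second.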

## Scikit-Learn Interface
In scikit-learn, almost all operations are done through an estimator object.

For example, a decision tree classifier can be instantiated as follows:

In [11]:
from sklearn import tree
dtree = tree.DecisionTreeClassifier()
print(dtree)

# Supervised Learning: Classification of Iris Data

By the end of this section you will

- Know how to instantiate a scikit-learn classifier
- Know how to train a classifier by calling the `fit(...)` method
- Know how to predict new labels by calling the `predict(...)` method

In this example we will perform classification of the iris data with several different classifiers.

## Gaussian Naive Bayes Classifier

First we'll load the iris data as we did before:

In [16]:
from sklearn.datasets import load_iris
iris = load_iris()

In the iris dataset example, suppose we are assigned the task of guessing
the class of an individual flower given the measurements of its petals and
sepals. This is a *classification* task, so we take the measurements as the
features ``X`` and the species as the labels ``y``:

In [24]:
X = iris.data
y = iris.target

print(X.shape)
print(y.shape)

(150, 4)
(150,)
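
To see what the four columns of ``X`` and the integer values in ``y`` stand for, the dataset object can be inspected. This is a small sketch that relies on the ``feature_names`` and ``target_names`` attributes of the object returned by ``load_iris``:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# One name per column of iris.data.
print(iris.feature_names)

# One name per class id in iris.target (index 0, 1, 2).
print(iris.target_names)
```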


Once the data has this format, it is straightforward to train a classifier, for instance a Gaussian naive Bayes classifier:

In [12]:
from sklearn.naive_bayes import GaussianNB

``GaussianNB`` is an example of a scikit-learn classifier.  If you're curious about how it is used, you can use ``ipython``'s ``"?"`` magic function to see the documentation:

In [13]:
?GaussianNB

The first thing to do is to create an instance of the classifier.  This can be done simply by calling the class name, with any arguments that the object accepts:

In [14]:
clf = GaussianNB()

``clf`` is a statistical model with parameters that control the learning algorithm (these are sometimes called *hyperparameters*). Hyperparameters can be supplied by the user in the constructor of the model. We will explain later how to choose a good combination using either simple empirical rules or data-driven selection:

In [15]:
print(clf)

GaussianNB()
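
As a rough illustration of supplying hyperparameters, the sketch below passes one to a constructor and then reads it back. The choice of ``DecisionTreeClassifier`` and the value ``max_depth=3`` are assumptions made for this example, not recommendations:

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# With no arguments, every hyperparameter takes its default value.
default_nb = GaussianNB()
print(default_nb.get_params())

# Hyperparameters are passed as keyword arguments to the constructor.
shallow_tree = DecisionTreeClassifier(max_depth=3)
print(shallow_tree.get_params()["max_depth"])

# They can also be changed on an existing (not yet fitted) estimator.
shallow_tree.set_params(max_depth=5)
print(shallow_tree.get_params()["max_depth"])
```

``get_params`` and ``set_params`` are available on every scikit-learn estimator, which is what later makes automated hyperparameter selection possible.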


By default the model parameters are not initialized. They will be tuned automatically from the data by calling the ``fit`` method with the data ``X`` and labels ``y``:

In [18]:
clf = clf.fit(X, y)

We can now see some of the fit parameters within the classifier object.

**In scikit-learn, parameters defined by training have a trailing underscore.**

In [20]:
clf.class_prior_

array([ 0.33333333,  0.33333333,  0.33333333])

In [22]:
clf.var_  # this attribute was named sigma_ in scikit-learn releases before 1.0

array([[ 0.121764,  0.142276,  0.029504,  0.011264],
       [ 0.261104,  0.0965  ,  0.2164  ,  0.038324],
       [ 0.396256,  0.101924,  0.298496,  0.073924]])
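
To make the meaning of these fitted attributes more concrete, the following sketch recomputes two of them by hand with NumPy. It assumes the ``theta_`` attribute (per-class feature means), which is not shown above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X, y = iris.data, iris.target
clf = GaussianNB().fit(X, y)

# class_prior_ should match the relative frequency of each class in y.
class_frequencies = np.bincount(y) / len(y)
print(np.allclose(clf.class_prior_, class_frequencies))

# theta_ holds the mean of each feature within each class; compare the
# first row with the feature means of the class-0 samples.
class0_mean = X[y == 0].mean(axis=0)
print(np.allclose(clf.theta_[0], class0_mean))
```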

Once the model is trained, it can be used to predict the most likely outcome on unseen data. For instance, let us define a single new sample that resembles the first sample of the iris dataset:

In [23]:
X_new = [[ 5.0,  3.6,  1.3,  0.25]]

clf.predict(X_new)

array([0])
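
The predicted value is an integer class id. One possible way to turn it back into a species name is to index ``iris.target_names`` with the prediction, as in this small sketch:

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
clf = GaussianNB().fit(iris.data, iris.target)

X_new = [[5.0, 3.6, 1.3, 0.25]]
predicted_ids = clf.predict(X_new)

# target_names is a NumPy array, so it can be indexed by the id array.
predicted_names = iris.target_names[predicted_ids]
print(predicted_names)
```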

All classification tasks involve predicting an unknown category based on observed features.

Some examples of interesting classification tasks:

- **E-mail classification:** label email as spam, normal, priority mail
- **Language identification:** label documents as English, Spanish, German, etc.
- **News articles categorization:** label articles as business, technology, sports...
- **Sentiment analysis in customer feedback:** label feedback as negative, neutral, positive
- **Face verification in pictures:** label images as same / different person
- **Speaker verification in voice recordings:** label recording as same / different person
- **Astronomical Sources:** label object as star / quasar / galaxy

## Exercise: Using a Different Classifier

Now we'll take a few minutes and try out another learning model.  Because of ``scikit-learn``'s uniform interface, the syntax is identical to that of ``GaussianNB`` above.

There are many classifiers to choose from; you could try any of the methods discussed at <http://scikit-learn.org/stable/supervised_learning.html>.  Alternatively, you can explore what's available in ``scikit-learn`` using just the tab-completion feature.  For example, import the ``linear_model`` submodule:

In [None]:
from sklearn import linear_model

And use the tab completion to find what's available.  Type ``linear_model.`` and then the tab key to see an interactive list of the functions within this submodule.  The ones which begin with capital letters are the models which are available.
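
Outside an interactive session, roughly the same listing can be produced with Python's built-in ``dir``. This is only a sketch of the capital-letter convention described above; the resulting list may also contain a few capitalized names that are not classifiers:

```python
from sklearn import linear_model

# Keep only the attribute names that start with a capital letter.
model_names = [name for name in dir(linear_model) if name[0].isupper()]
print(model_names)
```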

Now select a new classifier and try out a classification of the iris data.

Some good choices are

- ``sklearn.naive_bayes.GaussianNB`` :
    Gaussian Naive Bayes model. This is an unsophisticated model which can be trained very quickly.
    It is often used to obtain baseline results before moving to a more sophisticated classifier.

- ``sklearn.svm.LinearSVC`` :
    Support Vector Machines without kernels based on liblinear

- ``sklearn.svm.SVC`` :
    Support Vector Machines with kernels based on libsvm

- ``sklearn.neighbors.KNeighborsClassifier`` :
    k-Nearest Neighbors classifier based on the ball tree data structure for low dimensional data and brute force search for high dimensional data

- ``sklearn.tree.DecisionTreeClassifier`` :
    A classifier based on a series of binary decisions.  This is another very fast classifier, which can be very powerful.

Choose one of the above, import it, and use the ``?`` feature to learn about it.

Now instantiate this model as we did with ``GaussianNB`` above.

Now use our data ``X`` and ``y`` to train the model, using the ``fit(...)`` method.

Now call the ``predict`` method, and find the classification of ``X_new``.
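
One possible way to work through these steps is sketched below. The three classifiers, their default settings, and ``random_state=0`` are choices made for this illustration; any model from the list above could be substituted without changing the loop.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
X_new = [[5.0, 3.6, 1.3, 0.25]]

# Every estimator exposes the same fit/predict methods, so the same
# loop body works for all of them.
candidates = [
    KNeighborsClassifier(),
    SVC(),
    DecisionTreeClassifier(random_state=0),
]

predictions = {}
for model in candidates:
    model.fit(X, y)
    predictions[type(model).__name__] = int(model.predict(X_new)[0])

print(predictions)
```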

## Probabilistic Prediction

Some models have additional prediction modes.  For example, if ``clf`` is a ``LogisticRegression`` classifier, then it is possible to make a probabilistic prediction for any point.  This can be done through the ``predict_proba`` method:

In [None]:
from sklearn.linear_model import LogisticRegression
clf2 = LogisticRegression()
clf2.fit(X, y)
print(clf2.predict_proba(X_new))

The result gives the probability (between zero and one) that the test point comes from any of the three classes.

This means that the model estimates that the sample in ``X_new`` has:

- 90% likelihood of belonging to the ‘setosa’ class (``target = 0``)
- 9% likelihood of belonging to the ‘versicolor’ class (``target = 1``)
- < 1% likelihood of belonging to the ‘virginica’ class (``target = 2``)

Of course, the predict method that outputs the label id of the most likely outcome is also available:

In [None]:
clf2.predict(X_new)
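
The relationship between the two methods can be checked with a short sketch: each row of ``predict_proba`` should sum to one, and the class with the largest probability should be the one returned by ``predict``. The ``max_iter=1000`` setting is an assumption added here only to give the solver room to converge; the exact probabilities depend on the scikit-learn version and settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target
X_new = [[5.0, 3.6, 1.3, 0.25]]

clf2 = LogisticRegression(max_iter=1000).fit(X, y)

# One row per sample, one column per class.
proba = clf2.predict_proba(X_new)
print(proba.sum(axis=1))

# The most probable column, mapped back to a class id via classes_.
most_likely = clf2.classes_[np.argmax(proba, axis=1)]
print(most_likely, clf2.predict(X_new))
```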