# Lab 6: Classification in Python
___

In this lab, we will classify iris flowers using the features in a toy dataset. This is a __multi-class__ problem. An $n$-class problem can be solved using two different approaches:
* __One-versus-rest (OVR)__ approach
* __Multinomial approach__. That is, we compute a probability for each class, and these probabilities sum to 1.
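
To make the difference between the two approaches concrete, here is a minimal sketch using NumPy only. The three raw scores are made-up numbers for a single observation, not values from any fitted model.

```python
import numpy as np

# Hypothetical raw scores for one observation and three classes (made-up numbers).
scores = np.array([2.0, 0.5, -1.0])

# One-versus-rest: each class gets its own "this class vs. the others" probability
# through the logistic function, so the three values need not sum to 1.
ovr_probs = 1.0 / (1.0 + np.exp(-scores))

# Multinomial: the softmax function turns all scores into a single distribution.
exp_scores = np.exp(scores - scores.max())  # subtract the max for numerical stability
multinomial_probs = exp_scores / exp_scores.sum()

print(ovr_probs, ovr_probs.sum())                  # sum is generally not 1
print(multinomial_probs, multinomial_probs.sum())  # sum is 1 up to rounding
```

Both approaches pick the class with the largest value; they differ in how the per-class numbers are computed and interpreted.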

First we need to load the iris data:

In [1]:
from sklearn.datasets import load_iris
iris = load_iris()

In the iris dataset, the task is to predict the species of an individual flower from the measurements of its petals and sepals. This is a classification task, so we have a feature matrix `X` and a vector of class labels `y`:

In [2]:
X, y = iris.data, iris.target
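
Before training, it can help to check what `X` and `y` contain. The sketch below only uses attributes of the object returned by `load_iris` plus two NumPy helpers.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (n_samples, n_features)
print(iris.feature_names)  # names of the four measurements
print(np.unique(y))        # integer class labels
print(np.bincount(y))      # number of flowers per class
```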

Once the data has this format it is trivial to train a classifier, for instance a __support vector machine (SVM) with a linear kernel__:

In [4]:
from sklearn.svm import LinearSVC

__LinearSVC__ is one of the classification algorithms in the package `scikit-learn`. To see its documentation in IPython or Jupyter, enter:

In [30]:
LinearSVC?

The first thing to do is to create an instance of the classifier. This can be done simply by calling the class name, with any arguments that the object accepts:

In [9]:
clf = LinearSVC(loss="squared_hinge")

`clf` is a statistical model with parameters that control the learning algorithm (these are sometimes called __hyperparameters__). Hyperparameters can be supplied by the user in the constructor of the model. We will explain later how to choose a good combination using either simple empirical rules or data-driven selection:

In [10]:
clf

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)
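
As a small sketch of how hyperparameters are set and inspected, the snippet below uses `get_params` and `set_params`, which scikit-learn estimators provide. The values of `C` are arbitrary illustrations, not recommendations.

```python
from sklearn.svm import LinearSVC

# C controls the regularization strength (smaller C means stronger regularization).
# The value 0.1 is an arbitrary illustration.
clf_small_c = LinearSVC(loss="squared_hinge", C=0.1)

# get_params() returns the hyperparameters as a dictionary.
params = clf_small_c.get_params()
print(params["C"], params["loss"])

# set_params() changes hyperparameters on an existing estimator.
clf_small_c.set_params(C=10.0)
print(clf_small_c.get_params()["C"])
```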

At this point the model parameters have not been fitted. They are estimated from the data by calling the `fit` method with the data $X$ and labels $y$:

In [11]:
clf.fit(X, y)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)

Now the model parameters have been computed based on the observations:

In [12]:
clf.coef_

array([[ 0.18423387,  0.45122682, -0.80794191, -0.45071612],
       [ 0.04371718, -0.8799931 ,  0.40560728, -0.93922544],
       [-0.85065319, -0.98664935,  1.38091276,  1.86537937]])

In [14]:
X.shape

(150, 4)

In [15]:
clf.intercept_

array([ 0.10955916,  1.68122062, -1.70975493])
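
The shapes of `coef_` and `intercept_` follow from the one-versus-rest setup: one row of four weights and one intercept per class. The sketch below reproduces the per-class scores by hand and checks them against `decision_function`. The `max_iter` and `random_state` values are assumptions added here for reproducibility and to reduce the chance of a convergence warning, so the fitted numbers may differ from the output above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

iris = load_iris()
X, y = iris.data, iris.target

clf = LinearSVC(loss="squared_hinge", max_iter=10000, random_state=0).fit(X, y)

print(clf.coef_.shape)       # (n_classes, n_features)
print(clf.intercept_.shape)  # (n_classes,)

# One linear score per class: X @ coef_.T + intercept_
manual_scores = X @ clf.coef_.T + clf.intercept_
print(np.allclose(manual_scores, clf.decision_function(X)))

# The predicted class is the one with the largest score.
manual_pred = clf.classes_[manual_scores.argmax(axis=1)]
print(np.array_equal(manual_pred, clf.predict(X)))
```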

Once the model is trained, it can be used to predict the most likely outcome on unseen data. For instance, let us define a single new sample that resembles the first sample of the iris dataset:

In [16]:
X_new = [[ 5.0,  3.6,  1.3,  0.25]]

clf.predict(X_new)

array([0])

We can compute the prediction accuracy by comparing the predictions with the observed labels. Note that this is the accuracy on the same data that was used for fitting:

In [81]:
(clf.predict(X)==y).mean()

0.96666666666666667

In [80]:
clf.score(X, y)

0.96666666666666667
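
The accuracy above is measured on the data the model was fitted on. A rough sketch of estimating accuracy on data the model has not seen is to hold out part of the samples with `train_test_split`. The 30% test fraction and the `random_state` values are arbitrary choices made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the flowers; stratify keeps the class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf_holdout = LinearSVC(loss="squared_hinge", max_iter=10000, random_state=0)
clf_holdout.fit(X_train, y_train)

train_acc = clf_holdout.score(X_train, y_train)
test_acc = clf_holdout.score(X_test, y_test)
print(train_acc, test_acc)
```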

In [19]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], 
      dtype='|S10')

Class 0 corresponds to `setosa`, so the new observation `X_new` is predicted to be `setosa`.
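
The lookup from the predicted integer label to the species name can also be done in code by indexing `iris.target_names` with the prediction. This is a minimal sketch; `max_iter` and `random_state` are assumptions added for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

iris = load_iris()
X, y = iris.data, iris.target
clf = LinearSVC(loss="squared_hinge", max_iter=10000, random_state=0).fit(X, y)

X_new = [[5.0, 3.6, 1.3, 0.25]]
pred_index = clf.predict(X_new)            # integer class label(s)
pred_name = iris.target_names[pred_index]  # NumPy indexing maps labels to names
print(pred_index, pred_name)
```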

## Using a different classifier

Now we'll try another learning model. Because of scikit-learn's uniform interface, the syntax is identical to that of LinearSVC above. There are many possible classifiers; you could try any of the methods discussed at http://scikit-learn.org/stable/supervised_learning.html. Alternatively, you can explore what's available in scikit-learn using the tab-completion feature, for example on the `linear_model` submodule.

Possible models include:
* `sklearn.linear_model.LogisticRegression`
* `sklearn.linear_model.SGDClassifier`

In [60]:
import sklearn.linear_model  # import the submodule explicitly so it is available

In [65]:
sklearn.linear_model.RidgeClassifier?

In [66]:
import scipy.stats     # submodules are imported explicitly
import scipy.optimize

In [68]:
scipy.stats.friedmanchisquare?

In [83]:
scipy.optimize.minimize?
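
To illustrate the uniform interface, the sketch below fits three classifiers from `sklearn.linear_model` with the same `fit` and `score` calls. The `max_iter` and `random_state` values are assumptions chosen for this example, and the scores are training accuracies only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression, RidgeClassifier, SGDClassifier

iris = load_iris()
X, y = iris.data, iris.target

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SGDClassifier": SGDClassifier(random_state=0),
    "RidgeClassifier": RidgeClassifier(),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)                   # same call for every estimator
    scores[name] = model.score(X, y)  # training accuracy
    print(name, scores[name])
```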

### Logistic regression

Linear regression produces estimates on a continuous numeric scale. For classification, a more suitable quantity is the probability of class membership. The following formula transforms a linear regression estimate into a probability, which better describes how well a class fits an observation:
$$
P(y=1) = \frac{e^r}{1+e^r}
$$
Here $r$ is the regression result (the sum of the variables weighted by the coefficients). A linear model that uses this formula (the logistic function, which is the inverse of the logit link function) to transform its results into probabilities is a logistic regression.
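
The formula can be tried out directly with NumPy. The helper name `logistic` and the sample values of $r$ are made up for this illustration.

```python
import numpy as np

def logistic(r):
    """Map a linear score r to a probability, P(y=1) = e^r / (1 + e^r)."""
    r = np.asarray(r, dtype=float)
    return np.exp(r) / (1.0 + np.exp(r))

r_values = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = logistic(r_values)
print(probs)  # increases with r, always strictly between 0 and 1
```

A score of $r = 0$ gives a probability of exactly 0.5, and scores of equal size but opposite sign give probabilities that add up to 1.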

In [31]:
from sklearn.linear_model import LogisticRegression

In [51]:
LogisticRegression?

From the help text, we can see that for a multi-class problem, logistic regression accepts either `multi_class="multinomial"` or `multi_class="ovr"` (one versus the rest). Note that the solver must be chosen accordingly, since not every solver supports both options.

Here we set `multi_class="multinomial"` (multinomial distribution) and `solver="lbfgs"` (limited-memory Broyden-Fletcher-Goldfarb-Shanno, a quasi-Newton optimization algorithm that approximates the BFGS algorithm using a limited amount of memory). For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Limited-memory_BFGS).

In [38]:
clf2 = LogisticRegression(multi_class="multinomial", solver="lbfgs")

In [39]:
clf2

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='multinomial',
          n_jobs=1, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)

In [40]:
clf2.fit(X, y)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='multinomial',
          n_jobs=1, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)

In [41]:
clf2.coef_

array([[-0.42337025,  0.9617125 , -2.51931371, -1.0861323 ],
       [ 0.53407096, -0.31787524, -0.20536783, -0.93955846],
       [-0.11070072, -0.64383727,  2.72468154,  2.02569076]])

In [42]:
clf2.intercept_

array([  9.88063247,   2.2191727 , -12.09980517])

With this fitted model, we can predict the class for a given new observation.

In [43]:
X_new = [[ 5.0,  3.6,  1.3,  0.25]]

clf2.predict(X_new)

array([0])
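
In the multinomial model, the class probabilities are the softmax of the per-class linear scores. The sketch below checks this against `predict_proba`. It assumes a recent scikit-learn version, where `solver="lbfgs"` fits the multinomial model by default for more than two classes, so `multi_class` is not passed; on older versions the default may differ. The `max_iter` value is also an assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Assumption: lbfgs with three classes fits the multinomial model by default.
clf_multi = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X, y)

scores = clf_multi.decision_function(X)  # one linear score per class

# Softmax: exponentiate the scores and normalize each row to sum to 1.
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
softmax_probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

print(np.allclose(softmax_probs, clf_multi.predict_proba(X)))
print(np.allclose(softmax_probs.sum(axis=1), 1.0))
```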

Alternatively, we can use the one-versus-rest approach:

In [55]:
clf2 = LogisticRegression(multi_class="ovr", solver="liblinear")

In [56]:
clf2.fit(X, y)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [57]:
clf2.coef_, clf2.intercept_

(array([[ 0.41498833,  1.46129739, -2.26214118, -1.0290951 ],
        [ 0.41663969, -1.60083319,  0.57765763, -1.38553843],
        [-1.70752515, -1.53426834,  2.47097168,  2.55538211]]),
 array([ 0.26560617,  1.08542374, -1.21471458]))

In [58]:
clf2.predict(X_new)

array([0])
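
The one-versus-rest idea can also be sketched with the generic wrapper `sklearn.multiclass.OneVsRestClassifier`, which fits one binary classifier per class. This is an alternative formulation rather than the cell above, and it may be useful if your installed scikit-learn version warns that the `multi_class` argument is deprecated (worth checking against your version's documentation). The fitted coefficients need not match the output above exactly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# One binary logistic regression per class ("this class" versus "all others").
ovr = OneVsRestClassifier(LogisticRegression(solver="liblinear")).fit(X, y)

print(len(ovr.estimators_))  # one fitted binary model per class

X_new = [[5.0, 3.6, 1.3, 0.25]]
ovr_pred = ovr.predict(X_new)
ovr_proba = ovr.predict_proba(X_new)  # per-class values, normalized to sum to 1
print(ovr_pred, ovr_proba)
```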

Unlike LinearSVC, logistic regression does not just output the predicted class (here, class 0); it also estimates the probability that the observation belongs to each of the three classes. For the observation used above, logistic regression estimates a probability of about 91 percent for class 0. This is very high but not a perfect score, so a small margin of uncertainty remains.

In [69]:
clf2.predict_proba(X_new)

array([[  9.07512928e-01,   9.24770379e-02,   1.00343962e-05]])

Similarly, the fitted logistic regression model can report its prediction accuracy:

In [82]:
clf2.score(X, y)

0.95999999999999996

In fact, accuracy is not always the best metric for assessing a model's predictions. For binary classification, we also have sensitivity, specificity, and the AUROC (area under the receiver operating characteristic curve). For multi-class problems ($k>2$), we have the F1 score, among others.

We do not have time to cover these here. If you are interested, refer to a machine learning textbook.
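
As a starting point for further exploration, the sketch below computes a confusion matrix and a macro-averaged F1 score with `sklearn.metrics`. The choice of macro averaging and the `max_iter` value are assumptions for this example, and the metrics are computed on the training data only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, f1_score

iris = load_iris()
X, y = iris.data, iris.target

clf_metrics = LogisticRegression(max_iter=1000).fit(X, y)
y_pred = clf_metrics.predict(X)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y, y_pred)

# Macro averaging: compute F1 per class, then take the unweighted mean.
macro_f1 = f1_score(y, y_pred, average="macro")

print(cm)
print(macro_f1)
print(classification_report(y, y_pred, target_names=iris.target_names))
```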