# Logistic Regression
Documentation: [linear_model.LogisticRegression()](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

## Recognizing handwritten digits
The digits data set description is [here](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits). There are 64 features for each data point to represent the 64 pixels in an 8 x 8 image. We will build a logistic regression model to identify whether the digit is greater than 4 or not.

In [10]:
digits = datasets.load_digits()
print(digits.data.shape)
print(digits.target.shape)

(1797, 64)
(1797,)


A scikit-learn transformer's `fit()` method only calculates the parameters (e.g. $\mu$ and $\sigma$ in the case of `StandardScaler`) and saves them as the object's internal state. Afterwards, you can call its `transform()` method to apply the transformation to a particular set of examples.

`fit_transform()` joins these two steps and is used for the initial fitting of parameters on the training set `X`, but it also returns a transformed `X`. Internally, it just calls `fit()` first and then `transform()` on the same data.
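As a minimal sketch of the equivalence described above, the following compares the two-step `fit()` / `transform()` route with `fit_transform()` on a small made-up array. The array `X_demo` and the variable names are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 4 samples, 2 features on very different scales
X_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

# Two-step route: fit() stores the per-feature mean and scale on the object,
# transform() then applies them
scaler = StandardScaler()
scaler.fit(X_demo)
X_two_step = scaler.transform(X_demo)

# One-step route on the same data
X_one_step = StandardScaler().fit_transform(X_demo)

same_result = np.allclose(X_two_step, X_one_step)
learned_mean = scaler.mean_    # per-feature mean learned by fit()
learned_scale = scaler.scale_  # per-feature standard deviation learned by fit()
print(same_result, learned_mean, learned_scale)
```

Keeping the fitted `scaler` object around is what allows the same $\mu$ and $\sigma$ to be reused later on other data via `scaler.transform(...)`.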

In [17]:
X, y = digits.data, digits.target
X = StandardScaler().fit_transform(X)

In [20]:
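# classify small digits (0-4) against large digits (5-9)
# the np.int alias was removed from recent NumPy releases; the built-in int is equivalent
y = (y > 4).astype(int)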

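`C` is the inverse of regularization strength and must be a positive float. Smaller values specify stronger regularization. scikit-learn's `LogisticRegression` applies an $L^2$ penalty by default; an $L^1$ penalty can be selected instead with a solver that supports it. The $L^1$ penalty is analogous to Lasso and tends to drive coefficients to exactly zero, while the $L^2$ penalty is analogous to Ridge and shrinks coefficients without zeroing them.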
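To make the role of `C` concrete, here is a small sketch that refits the default ($L^2$-penalized) model for several values of `C` and records the Euclidean norm of the coefficient vector. The data preparation mirrors the cells above; `max_iter=5000` and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Same setup as in the notebook: standardized pixels, binary target (digit > 4)
digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)
y = (digits.target > 4).astype(int)

coef_norms = []
for C in (100, 1, 0.01):
    # Default penalty is L2; max_iter is raised (an assumption) so the solver converges
    clf = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    coef_norms.append(np.linalg.norm(clf.coef_))
    print("C=%.2f  ||coef||_2 = %.3f" % (C, coef_norms[-1]))
```

Smaller `C` means a stronger penalty, so the coefficient norm should shrink as `C` decreases.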

In [30]:
# Set regularization parameter
for i, C in enumerate((100, 1, 0.01)):  # i = 0, 1, 2 and C = 100, 1, 0.01
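    # loosen the stopping tolerance (default 1e-4) for a shorter training time
    # liblinear supports both penalties; the default solver in recent
    # scikit-learn releases (lbfgs) does not support L1
    clf_l1_LR = LogisticRegression(C=C, penalty='l1', tol=0.01, solver='liblinear')
    clf_l2_LR = LogisticRegression(C=C, penalty='l2', tol=0.01, solver='liblinear')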
    clf_l1_LR.fit(X, y)
    clf_l2_LR.fit(X, y)
    
    # ravel creates flattened numpy arrays
    coef_l1_LR = clf_l1_LR.coef_.ravel()
    coef_l2_LR = clf_l2_LR.coef_.ravel()

    # coef_l1_LR contains zeros due to the
    # L1 sparsity inducing norm
    
    # percentage of coefficients which are zero
    sparsity_l1_LR = np.mean(coef_l1_LR == 0) * 100
    sparsity_l2_LR = np.mean(coef_l2_LR == 0) * 100
    
    # for a classifier, score() is the mean accuracy on the given data (not R^2)
    print("C=%.2f" % C)
    print("Sparsity with L1 penalty: %.2f%%" % sparsity_l1_LR)
    print("score with L1 penalty: %.4f" % clf_l1_LR.score(X, y))
    print("Sparsity with L2 penalty: %.2f%%" % sparsity_l2_LR)
    print("score with L2 penalty: %.4f\n" % clf_l2_LR.score(X, y))

C=100.00
Sparsity with L1 penalty: 4.69%
score with L1 penalty: 0.9098
Sparsity with L2 penalty: 4.69%
score with L2 penalty: 0.9098

C=1.00
Sparsity with L1 penalty: 9.38%
score with L1 penalty: 0.9104
Sparsity with L2 penalty: 4.69%
score with L2 penalty: 0.9093

C=0.01
Sparsity with L1 penalty: 85.94%
score with L1 penalty: 0.8625
Sparsity with L2 penalty: 4.69%
score with L2 penalty: 0.8915
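The `score` values printed above come from the classifier's `score()` method, which for scikit-learn classifiers is mean accuracy. A minimal check, assuming the same data preparation as earlier and an arbitrary choice of `C=1.0` and `max_iter=1000`, compares it with accuracy computed by hand and with `accuracy_score`:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Same setup as in the notebook: standardized pixels, binary target (digit > 4)
digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)
y = (digits.target > 4).astype(int)

clf = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)

score_value = clf.score(X, y)                        # built-in score
manual_accuracy = np.mean(clf.predict(X) == y)       # fraction of correct predictions
metric_accuracy = accuracy_score(y, clf.predict(X))  # same quantity via sklearn.metrics
print(score_value, manual_accuracy, metric_accuracy)
```

Note that these figures are measured on the same data the model was fitted on, so they describe training accuracy.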

