# Regression

We won't go over every model. In fact, I will stick to as few models as possible, and focus on how they are used and what they have in common.

We will start by importing some toy data.

In [1]:
import sklearn.datasets as datasets

X, y = datasets.load_boston(return_X_y=True)

In [3]:
print(y[0])

24.0


Next we will do the training. Models have two states:

1. Instantiated
2. Fit

When we instantiate the model, we specify its hyperparameters and nothing else. No data is involved yet.
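The two states can be sketched as follows. This is a minimal illustration, assuming synthetic data from `make_regression` as a stand-in (`load_boston` is no longer shipped with recent scikit-learn releases); the names `X_demo`, `y_demo`, and `model` are only for this example.

```python
from sklearn import linear_model
from sklearn.datasets import make_regression

# Synthetic stand-in data (an assumption; not the dataset used in this notebook).
X_demo, y_demo = make_regression(
    n_samples=100, n_features=5, noise=10.0, random_state=0
)

# State 1: instantiated. Only the hyperparameters exist at this point.
model = linear_model.ElasticNet(alpha=0.1, l1_ratio=0.9)
params = model.get_params()
print(params["alpha"], params["l1_ratio"])
fitted_before = hasattr(model, "coef_")  # no learned attributes yet

# State 2: fit. Learned attributes (with a trailing underscore) now exist.
model.fit(X_demo, y_demo)
fitted_after = hasattr(model, "coef_")
print(fitted_before, fitted_after)
```

Attributes that end in an underscore, such as `coef_` and `intercept_`, are the ones that only appear after `fit` has been called.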

In [4]:
from sklearn import linear_model

linear_model.ElasticNet?

In [5]:
m = linear_model.ElasticNet(alpha=.1, l1_ratio=.9)

The next step is fitting the model to the data, which learns the coefficients and the intercept.

In [6]:
m.fit(X, y)

ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.9,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [7]:
m.coef_

array([-0.09726289,  0.04968684, -0.03860629,  0.96761815, -0.        ,
        3.59541405, -0.0093816 , -1.16492427,  0.27719067, -0.01465229,
       -0.77597068,  0.01026115, -0.57628314])

In [8]:
m.intercept_

26.436572165133697

In [9]:
m.predict([X[0]])

array([ 30.77529215])
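For a linear model such as `ElasticNet`, the prediction is the dot product of the features with `coef_`, plus `intercept_`. A small sketch of that relationship, assuming synthetic stand-in data (the `_demo` names are hypothetical):

```python
import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_regression

# Synthetic stand-in data (an assumption; not the dataset used in this notebook).
X_demo, y_demo = make_regression(
    n_samples=100, n_features=5, noise=10.0, random_state=0
)
model = linear_model.ElasticNet(alpha=0.1, l1_ratio=0.9).fit(X_demo, y_demo)

# Prediction for the first row: once via the estimator, once by hand.
pred_api = model.predict(X_demo[:1])
pred_manual = X_demo[:1] @ model.coef_ + model.intercept_
print(pred_api, pred_manual)
```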

In [10]:
y[0]

24.0

In [11]:
m.score(X, y)

0.72671419328402376

In [12]:
m.score?
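For regressors, `score` returns the coefficient of determination, R². The sketch below recomputes it by hand to show what the number means; it assumes synthetic stand-in data, and the variable names are only illustrative.

```python
import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_regression

# Synthetic stand-in data (an assumption; not the dataset used in this notebook).
X_demo, y_demo = make_regression(
    n_samples=100, n_features=5, noise=10.0, random_state=0
)
model = linear_model.ElasticNet(alpha=0.1, l1_ratio=0.9).fit(X_demo, y_demo)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
pred = model.predict(X_demo)
ss_res = ((y_demo - pred) ** 2).sum()
ss_tot = ((y_demo - y_demo.mean()) ** 2).sum()
r2_manual = 1.0 - ss_res / ss_tot

r2_api = model.score(X_demo, y_demo)
print(r2_manual, r2_api)
```

Note that this scores the model on the same data it was trained on, as the cells above do, so it says little about performance on unseen data.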

## CV models

Some of these models come with a cross-validated (CV) variant, which picks the hyperparameters by cross-validation during fitting.

In [13]:
linear_model.ElasticNetCV?

In [14]:
m = linear_model.ElasticNetCV(
    l1_ratio=[.1, .5, .7, .9, .95, .99, 1], 
    n_alphas=20)

In [15]:
m.fit(X, y)

ElasticNetCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True,
       l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1], max_iter=1000,
       n_alphas=20, n_jobs=1, normalize=False, positive=False,
       precompute='auto', random_state=None, selection='cyclic',
       tol=0.0001, verbose=0)

In [16]:
m.alphas_

array([[  7.24820428e+03,   5.03889940e+03,   3.50300657e+03,
          2.43526493e+03,   1.69297864e+03,   1.17694655e+03,
          8.18204764e+02,   5.68810058e+02,   3.95432655e+02,
          2.74901933e+02,   1.91109843e+02,   1.32858186e+02,
          9.23620541e+01,   6.42094347e+01,   4.46379364e+01,
          3.10319718e+01,   2.15732033e+01,   1.49975355e+01,
          1.04261786e+01,   7.24820428e+00],
       [  1.44964086e+03,   1.00777988e+03,   7.00601313e+02,
          4.87052986e+02,   3.38595727e+02,   2.35389310e+02,
          1.63640953e+02,   1.13762012e+02,   7.90865309e+01,
          5.49803866e+01,   3.82219687e+01,   2.65716373e+01,
          1.84724108e+01,   1.28418869e+01,   8.92758728e+00,
          6.20639437e+00,   4.31464065e+00,   2.99950710e+00,
          2.08523573e+00,   1.44964086e+00],
       [  1.03545775e+03,   7.19842772e+02,   5.00429509e+02,
          3.47894990e+02,   2.41854091e+02,   1.68135222e+02,
          1.16886395e+02,   8.12585797e+01

In [17]:
m.mse_path_

array([[[  52.99352711,  154.44755359,  128.19461458],
        [  56.20888556,  154.44755359,  128.19461458],
        [  59.56775676,  140.16255422,  128.19461458],
        [  62.56212311,  130.01462871,  128.19461458],
        [  65.00930369,  122.90575445,  128.19461458],
        [  66.28730005,  117.98424417,   89.26949358],
        [  67.25358679,  114.65331176,   70.61239504],
        [  67.97056805,  112.38454687,   63.7980148 ],
        [  67.85053984,  110.83163868,   61.18157603],
        [  66.09102819,  107.9605525 ,   62.10261909],
        [  61.99689709,  103.17809863,   62.77477574],
        [  57.54356331,   98.64460928,   63.27678679],
        [  53.0199392 ,   94.4088015 ,   62.99800295],
        [  48.64224565,   90.29231151,   61.72902098],
        [  44.988768  ,   85.79807305,   60.63785356],
        [  40.75282972,   81.64986225,   60.67928842],
        [  36.39345971,   77.94892242,   61.02133599],
        [  32.3990042 ,   74.14840534,   61.93407876],
        [ 

In [18]:
m.alpha_

1.4496408567545191

In [19]:
m.l1_ratio_

0.5

In [20]:
m.predict([X[0]])

array([ 30.89195578])

In [21]:
m.score(X, y)

0.67126663940540476

# Classification

This one is quick, because it works very much the same way as the regression models above. To cut to the chase, I'll train a cross-validated logistic regression.

In [22]:
X, y = datasets.load_iris(return_X_y=True)

In [25]:
d = datasets.load_iris()

print(d.DESCR)

Iris Plants Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20  0.76     0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

This is a copy of UCI ML iris d

In [26]:
linear_model.LogisticRegressionCV?

One thing you might notice here is that we have the option of parallelization, through the `n_jobs` parameter!

In [27]:
m = linear_model.LogisticRegressionCV(Cs=10, n_jobs=2)

In [28]:
m.fit(X, y)

LogisticRegressionCV(Cs=10, class_weight=None, cv=None, dual=False,
           fit_intercept=True, intercept_scaling=1.0, max_iter=100,
           multi_class='ovr', n_jobs=2, penalty='l2', random_state=None,
           refit=True, scoring=None, solver='lbfgs', tol=0.0001, verbose=0)

In [29]:
m.coef_

array([[-0.28578072,  0.26274729, -1.01008572, -0.41231485],
       [-0.24956472, -2.77466308,  1.2883359 , -2.68062917],
       [-1.88077544, -2.43946122,  5.90967252,  7.34015856]])

In [30]:
m.predict([X[0]])

array([0])

In [31]:
y[0]

0

In [32]:
m.predict_proba([X[0]])

array([[  9.12028942e-01,   8.79710577e-02,   4.00346939e-14]])

In [33]:
m.predict_log_proba([X[0]])

array([[ -0.09208355,  -2.43074741, -30.84902997]])
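The three prediction methods are closely related: `predict_log_proba` is the log of `predict_proba`, each row of probabilities sums to one, and `predict` returns the class with the highest probability. A short sketch to check this on the iris data (the names `X_iris`, `y_iris`, and `clf` are only for this example, and `max_iter=1000` is an assumption made to help the solver converge):

```python
import numpy as np
from sklearn import datasets, linear_model

X_iris, y_iris = datasets.load_iris(return_X_y=True)
clf = linear_model.LogisticRegressionCV(Cs=10, max_iter=1000).fit(X_iris, y_iris)

proba = clf.predict_proba(X_iris)          # shape (n_samples, n_classes)
log_proba = clf.predict_log_proba(X_iris)  # same shape, in log space
labels = clf.predict(X_iris)

rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)
log_matches = np.allclose(np.log(proba), log_proba)
# Columns of proba are ordered like clf.classes_.
argmax_matches = np.array_equal(clf.classes_[proba.argmax(axis=1)], labels)
print(rows_sum_to_one, log_matches, argmax_matches)
```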

In [34]:
m.score(X, y)

0.96666666666666667

In [35]:
m.score?
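For classifiers, `score` defaults to mean accuracy, that is, the fraction of samples whose predicted label matches the true one. A minimal check, assuming the default `scoring=None` and hypothetical variable names:

```python
import numpy as np
from sklearn import datasets, linear_model

X_iris, y_iris = datasets.load_iris(return_X_y=True)
clf = linear_model.LogisticRegressionCV(Cs=10, max_iter=1000).fit(X_iris, y_iris)

# Accuracy by hand: share of correct predictions on the training data.
acc_manual = (clf.predict(X_iris) == y_iris).mean()
acc_api = clf.score(X_iris, y_iris)
print(acc_manual, acc_api)
```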