### OLS

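* Finds coefficients w = (w1, ..., wp) that minimize the residual sum of squares.
* Relies on the model terms (features) being independent; with collinear features the estimate becomes very sensitive to noise in the targets.
* Complexity: O(n*p^2) for a matrix of size (n, p), assuming n >= p.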

[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) |
[example](plot_ols.ipynb)

In [4]:
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])

reg.coef_

array([ 0.5,  0.5])
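In the cell above the two feature columns are identical, which is why the weight is split evenly between them. As a rough check of the "minimize residual sum of squares" idea, the sketch below compares `LinearRegression` with a plain NumPy least-squares solve on synthetic data. The data, the seed, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): 50 samples, 3 independent features.
rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + 0.1 * rng.normal(size=50)

# Least squares via NumPy: append a column of ones to model the intercept.
X1 = np.hstack([X, np.ones((X.shape[0], 1))])
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Same problem solved by scikit-learn.
reg = LinearRegression().fit(X, y)
rss = np.sum((y - reg.predict(X)) ** 2)  # residual sum of squares at the optimum

print("lstsq coef:", w[:3], "intercept:", w[3])
print("sklearn coef:", reg.coef_, "intercept:", reg.intercept_)
print("RSS:", rss)
```

Both solvers should agree up to floating-point error, since they minimize the same objective.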

### RIDGE

* Uses the `alpha` param to control the amount of shrinkage.
* As alpha grows, shrinkage grows and the coefficients become more robust to collinearity.
* Same complexity as OLS.

[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge) |
[ridge path vs alpha](plot_ridge_path.ipynb) | 
[OLS ridge variance](plot_ols_ridge_variance.ipynb)

In [5]:
from sklearn import linear_model
reg = linear_model.Ridge(alpha=.5)
reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])

reg.coef_, reg.intercept_ 

(array([ 0.34545455,  0.34545455]), 0.13636363636363641)
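The effect of `alpha` on shrinkage can be seen by fitting the same data with several values and watching the coefficient norm. This is a small sketch on synthetic data; the data and the chosen alpha grid are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumed for illustration).
rng = np.random.RandomState(0)
X = rng.normal(size=(40, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=40)

# Fit one Ridge model per alpha and record the L2 norm of its coefficients.
alphas = [0.01, 1.0, 100.0, 10000.0]
norms = [np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_) for a in alphas]

for a, n in zip(alphas, norms):
    print("alpha=%g  ||coef||=%.4f" % (a, n))
```

The norm should fall as alpha rises, which is the shrinkage described in the bullets above.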

### RIDGE WITH CV

* By default uses generalized cross-validation (GCV), an efficient form of leave-one-out CV.

[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html#sklearn.linear_model.RidgeCV) |
[multi-out face completion](plot_multioutput_face_completion.ipynb)

In [6]:
from sklearn import linear_model
reg = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0])
reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])       

reg.alpha_  

0.10000000000000001

### LASSO

* Linear model, estimates sparse coefficients.
* Useful in compressed sensing, also can be used for feature selection.
* Uses coordinate descent as the fitting algorithm.

[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso) |
[sparse signals: lasso v elasticnet](plot_lasso_and_elasticnet.ipynb) | [lasso compressive sensing](plot_tomography_l1_reconstruction.ipynb) | [lasso model selection](plot_lasso_model_selection.ipynb)

In [7]:
from sklearn import linear_model
reg = linear_model.Lasso(alpha = 0.1)
reg.fit([[0, 0], [1, 1]], [0, 1])

reg.predict([[1, 1]])

array([ 0.8])
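To illustrate the sparse-coefficient and feature-selection points, the sketch below fits a Lasso on synthetic data where only the first two of ten features drive the target. The data and the `alpha=0.5` setting are assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumed): only features 0 and 1 influence y.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.5).fit(X, y)

# Indices of features whose coefficient is not exactly zero.
selected = np.flatnonzero(lasso.coef_)
print("coef:", lasso.coef_)
print("selected features:", selected)
```

Coordinate descent sets irrelevant coefficients to exactly zero, so the non-zero indices can be read as the selected features.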

### LASSO (MULTITASK)

* sparse coefficients, estimated jointly for multiple regression problems (tasks)
* y = 2D (#samples, #tasks); selected features = same for all tasks
* Trained with mixed l1,l2 prior as regularizer.
* Uses [Frobenius norm](https://en.wikipedia.org/wiki/Matrix_norm) in objective function.

[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.MultiTaskLasso.html#sklearn.linear_model.MultiTaskLasso) |
[joint feature selection](plot_multi_task_lasso_support.ipynb)
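A minimal sketch of the multi-task setup, assuming synthetic data with three tasks that share the same two informative features. The weight matrix `W` and the `alpha` value are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

# Synthetic data (assumed): 8 features, 3 tasks, only features 0 and 1 matter.
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 8))
W = np.zeros((8, 3))
W[0] = [2.0, -1.0, 3.0]
W[1] = [1.5, 2.5, -2.0]
Y = X @ W + 0.1 * rng.normal(size=(60, 3))  # shape (n_samples, n_tasks)

mtl = MultiTaskLasso(alpha=0.5).fit(X, Y)

# coef_ has shape (n_tasks, n_features); a feature is kept or dropped for all tasks at once.
support = mtl.coef_ != 0
print("coef_ shape:", mtl.coef_.shape)
print("support per task:\n", support.astype(int))
```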

### ELASTICNET

* Trained with both l1 and l2 priors as regularizer.
* Can learn a sparse model like Lasso while keeping the regularization properties of [Ridge](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge).
* `l1_ratio` = convex combination of l1 and l2
* Use case: linear regression with multiple correlated features.
* alpha, l1_ratio can be set by [cross validation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html#sklearn.linear_model.ElasticNetCV).

[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html#sklearn.linear_model.ElasticNet) | [sparse signals: elasticnet v lasso](plot_lasso_and_elasticnet.ipynb) | [coordinate descent path - lasso](plot_lasso_coordinate_descent_path.ipynb) | [multi-task elasticnet (API)](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.MultiTaskElasticNet.html#sklearn.linear_model.MultiTaskElasticNet)
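As a sketch of choosing `alpha` and `l1_ratio` by cross validation, the cell below runs `ElasticNetCV` on synthetic data with three strongly correlated features. The data, the candidate `l1_ratio` list, and `cv=5` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic data (assumed): three noisy copies of one signal plus two pure-noise features.
rng = np.random.RandomState(0)
base = rng.normal(size=(80, 1))
X = np.hstack([base + 0.05 * rng.normal(size=(80, 3)),
               rng.normal(size=(80, 2))])
y = 2.0 * base.ravel() + 0.1 * rng.normal(size=80)

# Cross-validate over a grid of alphas (chosen automatically) and the l1_ratio candidates.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, y)

print("best alpha:", model.alpha_)
print("best l1_ratio:", model.l1_ratio_)
print("coef:", model.coef_)
```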


### ELASTICNET (MULTITASK)

* sparse coefficients, estimated jointly for multiple regression problems (tasks)
* y = 2D (#samples, #tasks); selected features = same for all tasks
* Uses coordinate descent for fitting
* alpha, l1_ratio can be set by [cross validation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.MultiTaskElasticNetCV.html#sklearn.linear_model.MultiTaskElasticNetCV).

[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.MultiTaskElasticNet.html#sklearn.linear_model.MultiTaskElasticNet) |
[demo](plot_multi_task_lasso_support.ipynb)

### LEAST ANGLE REGRESSION (LARS)

* For high-D datasets (p>>n)
* Same complexity as OLS
* Full piecewise linear solution path (good for CVs)
* Can be especially sensitive to noise, since it iteratively refits the residuals.

[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lars.html#sklearn.linear_model.Lars)

In [8]:
from sklearn import linear_model
clf = linear_model.Lars(n_nonzero_coefs=1)
clf.fit([
        [-1, 1], [0, 0], [1, 1]], 
        [-1.1111, 0, -1.1111])
print(clf.coef_) 

[ 0.     -1.1111]


### LARS LASSO

* Lasso model / LARS algo
* returns solution curve for each l1 norm value
* stored in coef_path_ (#features, max_features+1). 1st col always zero.

[api](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoLars.html#sklearn.linear_model.LassoLars) | [find lasso path on lars algo, diabetes dataset](plot_lasso_lars.ipynb)

In [9]:
from sklearn import linear_model
reg = linear_model.LassoLars(alpha=.1)
reg.fit([[0, 0], [1, 1]], [0, 1])  
reg.coef_ 

array([ 0.71715729,  0.        ])
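The `coef_path_` attribute mentioned above can be inspected directly. This sketch uses the diabetes dataset bundled with scikit-learn; the `alpha=0.1` value is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoLars

X, y = load_diabetes(return_X_y=True)
reg = LassoLars(alpha=0.1).fit(X, y)

# One row per feature, one column per step along the regularization path.
path = reg.coef_path_
print("path shape:", path.shape)
print("first column:", path[:, 0])   # starting point of the path
print("final coef_: ", reg.coef_)
```

Each column after the first corresponds to a point where the set of active features changes.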

### OMP

* constrains the number of non-zero coefficients (the l0 pseudonorm)
* at each step, finds the "atom most correlated with the current residual"
* the "residual is recomputed using an orthogonal projection on the space of the previously chosen dictionary elements"

[OMP](https://en.wikipedia.org/wiki/Matching_pursuit) |
[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.OrthogonalMatchingPursuit.html#sklearn.linear_model.OrthogonalMatchingPursuit) |
[example](plot_omp.ipynb)
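The greedy loop described in the bullets can be sketched in a few lines of NumPy. This is a hand-rolled illustration, not scikit-learn's implementation; the function name `omp_sketch` and the synthetic data are assumptions.

```python
import numpy as np

def omp_sketch(X, y, n_nonzero):
    """Greedy OMP sketch: pick atoms one at a time, refit by least squares."""
    residual = y.copy()
    selected = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Atom (column) most correlated with the current residual.
        j = int(np.argmax(np.abs(X.T @ residual)))
        selected.append(j)
        # Orthogonal projection of y onto the span of the chosen atoms.
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef
    w = np.zeros(X.shape[1])
    w[selected] = coef
    return w

# Synthetic sparse problem (assumed): 3 active atoms out of 20.
rng = np.random.RandomState(0)
X = rng.normal(size=(50, 20))
w_true = np.zeros(20)
w_true[[2, 7, 11]] = [1.5, -2.0, 3.0]
y = X @ w_true + 0.01 * rng.normal(size=50)

w = omp_sketch(X, y, n_nonzero=3)
print("recovered support:", np.flatnonzero(w))
```

In practice `OrthogonalMatchingPursuit(n_nonzero_coefs=3)` from `sklearn.linear_model` would be the tool to use; the loop above only shows the idea.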

### BAYES REGRESSION, BAYES RIDGE REGRESSION

* introduces [uninformative priors](https://en.wikipedia.org/wiki/Non-informative_prior#Uninformative_priors) over the hyperparameters
* finds a probabilistic model; prior over w = spherical Gaussian
* priors over alpha, lambda = [gamma distributions](https://en.wikipedia.org/wiki/Gamma_distribution)

[api](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html#sklearn.linear_model.BayesianRidge) | [example](plot_bayesian_ridge.ipynb)

In [10]:
from sklearn import linear_model
X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
Y = [0., 1., 2., 3.]
reg = linear_model.BayesianRidge()
reg.fit(X, Y)

print(reg.predict([[1, 0.]]))
print(reg.coef_)

[ 0.50000013]
[ 0.49999993  0.49999993]
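Because the model is probabilistic, it also exposes the estimated precisions and a predictive standard deviation. The sketch below uses synthetic noisy data (an assumption) and the `return_std` option of `predict`.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic noisy data (assumed for illustration).
rng = np.random.RandomState(0)
X = rng.normal(size=(30, 2))
y = 1.0 * X[:, 0] + 2.0 * X[:, 1] + 0.2 * rng.normal(size=30)

reg = BayesianRidge().fit(X, y)

# alpha_: estimated precision of the noise; lambda_: estimated precision of the weights.
print("alpha_:", reg.alpha_, "lambda_:", reg.lambda_)

# Predictive mean and standard deviation for two new points.
mean, std = reg.predict([[1.0, 0.0], [0.0, 1.0]], return_std=True)
print("mean:", mean)
print("std: ", std)
```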


### AUTOMATIC RELEVANCE DETERMINATION (ARD)

* similar to BayesRidge, can lead to sparser weights
* drops the assumption of the Gaussian being spherical
* assumes instead an axis-parallel, elliptical Gaussian
* each coordinate of w has its own standard deviation

[example](plot_ard.ipynb)
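The "one deviation per coordinate" point shows up in the fitted attributes: `ARDRegression` learns one weight precision per feature, while `BayesianRidge` learns a single shared one. A small sketch on synthetic data (assumed):

```python
import numpy as np
from sklearn.linear_model import ARDRegression, BayesianRidge

# Synthetic data (assumed): only features 0 and 3 are relevant.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.1 * rng.normal(size=100)

ard = ARDRegression().fit(X, y)
br = BayesianRidge().fit(X, y)

# ARD: one precision per weight; BayesianRidge: one precision shared by all weights.
print("ARD lambda_ shape:", ard.lambda_.shape)
print("BayesianRidge lambda_:", br.lambda_)
print("ARD coef:", np.round(ard.coef_, 3))
```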

### LOGISTIC REGRESSION

* also known as: logit regression, max-entropy classifier, log-linear classifier
* models class probabilities with the [logistic function](https://en.wikipedia.org/wiki/Logistic_function)
* can fit binary, one-vs-rest or multinomial LR with optional l1 or l2 regularization
* small dataset or l1 penalty: 'liblinear'
* large dataset or multinomial loss: 'newton-cg'
* very large dataset: 'sag'

[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression) |
[L1 vs sparsity, digit images](plot_logistic_l1_l2_sparsity.ipynb) | [logistic path, iris dataset](plot_logistic_path.ipynb) | [decision surface, multinomial vs one-vs-rest, make_blobs()](plot_logistic_multinomial.ipynb)

[CTR prediction w/ LR](https://turi.com/learn/gallery/notebooks/click_through_rate_prediction_intro.html)
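A minimal sketch of the solver choice, assuming a binary subset of the iris dataset (setosa vs versicolor) and default regularization. Only the `solver` argument is varied.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Keep only classes 0 and 1 to get a binary problem (assumed setup).
X, y = load_iris(return_X_y=True)
binary = y < 2
X_bin, y_bin = X[binary], y[binary]

scores = {}
for solver in ("liblinear", "newton-cg"):
    clf = LogisticRegression(solver=solver).fit(X_bin, y_bin)
    scores[solver] = clf.score(X_bin, y_bin)  # training accuracy

# Class probabilities come from the logistic function; each row sums to 1.
proba = clf.predict_proba(X_bin[:3])
print(scores)
print(proba)
```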

### LOGISTIC REGRESSION w/ CV

[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html#sklearn.linear_model.LogisticRegressionCV)



### STOCHASTIC GRADIENT DESCENT

* Use case: #samples and/or #features = very large (10^5 or more is fine)
* supports multiple convex loss funcs & penalties
* SGDClassifier + loss="log" ==> logistic regression (spelled loss="log_loss" in recent scikit-learn releases)
* SGDClassifier + loss="hinge" ==> linear SVM

[notebook](stochastic-gradient-descent-SGD.ipynb) | [SGD classifier (API)](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier) | [SGD regressor (API)](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#sklearn.linear_model.SGDRegressor)

In [11]:
from sklearn.linear_model import SGDClassifier
X = [[0., 0.], [1., 1.]]
y = [0, 1]
clf = SGDClassifier(loss="hinge", penalty="l2")
clf.fit(X, y)

print("predict: ",clf.predict([[2., 2.]]))
print("coef:    ",clf.coef_)
print("intercpt:",clf.intercept_)
#print("decision function: ",clf.decision_function([[2.,2.]]))

predict:  [1]
coef:     [[ 9.91080278  9.91080278]]
intercpt: [-9.99002993]


[SGD: max margin separating hyperplane](plot_sgd_separating_hyperplane.ipynb) |
[SGD: multiclass, iris dataset](plot_sgd_iris.ipynb) |
[SGD: weighted samples](plot_sgd_weighted_samples.ipynb) |
[SGD: various online solvers](plot_sgd_comparison.ipynb) |
[SVM: unbalanced classes](plot_separating_hyperplane_unbalanced.ipynb)

[SGD: sparse data: text doc classification](document_classification_20newsgroups.ipynb)

### PERCEPTRON

[API](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html#sklearn.linear_model.Perceptron)

### PASSIVE-AGGRESSIVE ALGOS

[Classifier (API)](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveClassifier.html#sklearn.linear_model.PassiveAggressiveClassifier) | 
[Regressor (API)](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.PassiveAggressiveRegressor.html#sklearn.linear_model.PassiveAggressiveRegressor)

[Classifier example](plot_out_of_core_classification.ipynb)

### RANSAC (ROBUSTNESS TO OUTLIERS)

[example](plot_ransac.ipynb) | [linear estimator fit, sine function](plot_robust_fit.ipynb)

### THEIL-SEN ESTIMATOR

[example](plot_theilsen.ipynb)

### HUBER REGRESSOR

[HR vs Ridge, toy dataset](plot_huber_vs_ridge.ipynb)

### POLYNOMIAL REGRESSION

In [12]:
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
X = np.arange(6).reshape(3, 2)
X
poly = PolynomialFeatures(degree=2)
poly.fit_transform(X)


array([[  1.,   0.,   1.,   0.,   0.,   1.],
       [  1.,   2.,   3.,   4.,   6.,   9.],
       [  1.,   4.,   5.,  16.,  20.,  25.]])

In [13]:
# preprocessing streamlined with pipeline tools
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np

model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                  ('linear', LinearRegression(fit_intercept=False))])

# fit to data generated by an order-3 polynomial
x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
model = model.fit(x[:, np.newaxis], y)
model.named_steps['linear'].coef_

array([ 3., -2.,  1., -1.])

In [14]:
# just checking "interaction features"

from sklearn.linear_model import Perceptron
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = X[:, 0] ^ X[:, 1]
y

X = PolynomialFeatures(interaction_only=True).fit_transform(X).astype(int)
X

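# n_iter was replaced by max_iter in later scikit-learn releases; tol=None runs all 10 passes
clf = Perceptron(fit_intercept=False, max_iter=10, tol=None, shuffle=False).fit(X, y)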
clf.predict(X)

clf.score(X, y)


1.0