# **Linear Regression**

In [1]:
# DATA (placeholders: fill these with your own training and test sets before running the cells below)

X_train = []
y_train = []
X_test = []
y_test = []

### **How to build baseline regression model**

`DummyRegressor` creates a baseline for regression: it ignores the input features and predicts from a simple rule, so any useful model should beat its score.

In [None]:
from sklearn.dummy import DummyRegressor

In [None]:
dummy_regr = DummyRegressor(strategy = 'mean')
dummy_regr.fit(X_train, y_train)
dummy_regr.predict(X_test)
dummy_regr.score(X_test, y_test) #score returns R^2 or coeff of determination

* It makes predictions according to the chosen `strategy`.
* The strategy is based on a statistical property of the training set or on a user-specified value.

  `strategy = ['mean', 'median', 'quantile', 'constant']`
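
To make the strategies concrete, here is a minimal sketch that fits one `DummyRegressor` per strategy. The synthetic data, the variable names (`X_demo`, `baselines`, and so on), and the chosen `quantile` and `constant` values are assumptions made for illustration only; they are not part of the original notebook.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Synthetic data, purely illustrative (assumption, not from the notebook)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=100)

X_tr, X_te = X_demo[:80], X_demo[80:]
y_tr, y_te = y_demo[:80], y_demo[80:]

# 'quantile' needs the quantile argument, 'constant' needs the constant argument
baselines = {
    "mean": DummyRegressor(strategy="mean"),
    "median": DummyRegressor(strategy="median"),
    "quantile": DummyRegressor(strategy="quantile", quantile=0.9),
    "constant": DummyRegressor(strategy="constant", constant=1.5),
}

baseline_preds = {}
for name, model in baselines.items():
    model.fit(X_tr, y_tr)
    baseline_preds[name] = model.predict(X_te)
    # Every test sample receives the same predicted value
    print(name, baseline_preds[name][0], model.score(X_te, y_te))
```

Each baseline predicts a single value for every sample, so the printed scores give a floor that a trained regressor should exceed.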

### **How is Linear Regression model trained?**

**Step-1** : Instantiate `object` of a suitable linear regression estimator from one of the following two options:
  * Normal Equation (`LinearRegression()`)
  * Iterative Optimization (`SGDRegressor()`)


In [None]:
# NORMAL EQUATION

from sklearn.linear_model import LinearRegression
LR = LinearRegression()

In [None]:
# ITERATIVE OPTIMIZATION

from sklearn.linear_model import SGDRegressor
LR = SGDRegressor()

**Step-2** : Call `fit` method on linear regression object with training feature matrix and label vector as arguments

Both the _feature matrix_ and the _label vector_ come from the **TRAINING SET**

`LinearRegression` works for both single- and multi-output regression; `SGDRegressor` accepts only a single-output label vector

In [None]:
# Model training with feature matrix 'X_train' and label vector or matrix 'y_train'

LR.fit(X_train, y_train)
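
As a quick sanity check of the two steps, the sketch below fits `LinearRegression` on noise-free synthetic data and reads back the learned parameters through the `coef_` and `intercept_` attributes. The data, the true weights, and the names `lin_reg` and `multi_reg` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data with known weights (illustrative assumption)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 2))
true_w = np.array([3.0, -1.0])
y_demo = X_demo @ true_w + 2.0

# Step 1: instantiate, Step 2: fit on the training data
lin_reg = LinearRegression()
lin_reg.fit(X_demo, y_demo)
print(lin_reg.coef_, lin_reg.intercept_)

# Multi-output case: a label matrix with one column per target
Y_multi = np.column_stack([y_demo, -y_demo])
multi_reg = LinearRegression().fit(X_demo, Y_multi)
print(multi_reg.coef_.shape)  # (n_targets, n_features)
```

Because the data contain no noise, the fitted weights should match `true_w` and the intercept should match 2.0 up to floating-point error.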

#### **SGDRegressor Estimator**

* Implements stochastic gradient descent
* Suited to large training sets (> 10k samples)
* Provides greater control on optimization process through provision for hyperparameter settings.

  * **Loss Parameter:**
    * `loss = 'squared_error'`
    * `loss = 'huber'`

  * **Penalty parameter:**
    * `penalty = 'l1'`
    * `penalty = 'l2'`
    * `penalty = 'elasticnet'` _(it is a convex combination of l1 and l2 regularization)_

  * **Learning Rate parameter:**
    * `learning_rate = 'constant'`
    * `learning_rate = 'optimal'`
    * `learning_rate = 'invscaling'` _(default)_
    * `learning_rate = 'adaptive'`

  * **Early Stopping:** _(stops training when the validation score stops improving)_
    * `early_stopping = True`
    * `early_stopping = False` _(default)_




##### Random Seed

* The random seed is a starting point for generating a sequence of random numbers. Setting a random seed ensures that the same sequence of random numbers is generated every time you run your code.

* It's a good idea to use a **random seed** of your choice while instantiating SGDRegressor object. It helps us get reproducible results.

* Set `random_state` to seed of your choice.

In [None]:
from sklearn.linear_model import SGDRegressor
LR = SGDRegressor(random_state = 30)

##### How to perform feature scaling for SGDRegressor?

* SGD is sensitive to feature scaling, so it is highly recommended to scale input feature matrix.

In [None]:
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

sgd = Pipeline([
    ('feature_scaling', StandardScaler()),
    ('sgd_regressor', SGDRegressor())
])

sgd.fit(X_train, y_train)

* Feature scaling is not needed for word frequencies and indicator features because they already have an intrinsic scale.

* Features extracted using PCA should be scaled by some constant c such that the average L2 norm of the training data equals 1.
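
The PCA rule above can be sketched as follows: compute the average L2 norm of the transformed training rows and divide by it. The synthetic data and the names `X_pca`, `c`, and `X_pca_scaled` are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic training data (illustrative assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 10)) * 5.0

# Extract features with PCA
X_pca = PCA(n_components=3).fit_transform(X_demo)

# Choose c so that the average L2 norm of the training rows equals 1
avg_norm = np.linalg.norm(X_pca, axis=1).mean()
c = 1.0 / avg_norm
X_pca_scaled = c * X_pca

print(np.linalg.norm(X_pca_scaled, axis=1).mean())
```

The same constant `c`, computed on the training set, would then be applied to the PCA-transformed test set.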

##### How to shuffle training data after each epoch in SGDRegressor?

In [None]:
from sklearn.linear_model import SGDRegressor
LR = SGDRegressor(shuffle = True) # training data is reshuffled after each epoch (this is the default)

##### How to set learning rate in SGDRegressor?

* Four learning rate schedules are available:

  `learning_rate = ['constant', 'optimal', 'invscaling', 'adaptive']`

* Default setting:

  `learning_rate = 'invscaling', eta0 = 1e-2, power_t = 0.25`

* With `invscaling`, the learning rate decreases as the update counter $t$ grows:

   $\eta = \frac{\eta_0}{t^{\mathrm{power\_t}}}$

* You can make changes to these parameters to speed up or slow down the training process
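
To get a feel for how quickly the inverse-scaling schedule decays, the following sketch evaluates $\eta_0 / t^{\mathrm{power\_t}}$ with the default values at a few step counts. The chosen values of `t` are arbitrary and only for illustration.

```python
import numpy as np

eta0 = 1e-2      # default initial learning rate
power_t = 0.25   # default exponent for inverse scaling

# A few illustrative step counts (arbitrary choice)
t = np.array([1, 10, 100, 1000, 10000])
eta = eta0 / t ** power_t

for step, rate in zip(t, eta):
    print(f"t = {step:>5d}  eta = {rate:.5f}")
```

With the defaults, the learning rate falls from 0.01 at the first step to 0.001 after 10,000 steps; a larger `power_t` makes it decay faster.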

##### How to set constant learning rate?

In [None]:
from sklearn.linear_model import SGDRegressor
LR = SGDRegressor( learning_rate = 'constant', eta0 = 1e-2)

* Constant learning rate `eta0 = 1e-2` is used throughout the training

##### How to set adaptive learning rate?

In [None]:
from sklearn.linear_model import SGDRegressor
LR = SGDRegressor(learning_rate = 'adaptive', eta0 = 1e-2)

* The learning rate stays at its initial value `eta0` as long as the training loss keeps decreasing.

* Each time the stopping criterion is met (no improvement of at least `tol` for `n_iter_no_change` consecutive epochs), the learning rate is divided by 5 and the training loop continues.

* The algorithm stops when the learning rate drops below $10^{-6}$

##### How to set number of epochs in SGDRegressor?

* Set `max_iter` to the desired number of epochs. The default value is 1000.

In [None]:
from sklearn.linear_model import SGDRegressor
LR = SGDRegressor(max_iter = 100)

* Remember one epoch is one full pass over the training data.

* Empirically, SGD converges after observing approximately $10^6$ training samples. Thus a reasonable first guess for the number of epochs on a training set of n samples is
  
  `max_iter = np.ceil(10**6 / n)`
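
The heuristic can be applied as in this small sketch. The sample count `n_samples` is a hypothetical value, and `max_iter_guess` and `sgd_demo` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

n_samples = 20_000  # hypothetical training set size

# First guess for the number of epochs: about 10**6 samples seen in total
max_iter_guess = int(np.ceil(10**6 / n_samples))
print(max_iter_guess)  # 50

sgd_demo = SGDRegressor(max_iter=max_iter_guess)
```

Note that `10**6` is used rather than `10^6`, because `^` is the bitwise XOR operator in Python.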

##### How to set stopping criteria in SGDRegressor?

* Option-1:

  `tol`, `n_iter_no_change`, `max_iter`

  Here the SGDRegressor stops
  * when the training loss does not improve by at least `tol` (loss > best_loss - `tol`) for `n_iter_no_change` consecutive epochs
  * else after a maximum number of iterations `max_iter`

In [None]:
from sklearn.linear_model import SGDRegressor
LR = SGDRegressor(loss = 'squared_error', max_iter = 500, tol = 1e-3, n_iter_no_change = 5)

* Option-2:

  `early_stopping`, `validation_fraction`

  * Set aside a fraction `validation_fraction` of the training records as a validation set. The `score` method is used to obtain the validation score.
  
  * The SGDRegressor stops when
    
    * the validation score does not improve by at least `tol` for `n_iter_no_change` consecutive epochs.

    * else after a maximum number of iterations `max_iter`



In [None]:
from sklearn.linear_model import SGDRegressor
LR = SGDRegressor(loss = 'squared_error', early_stopping = True, max_iter = 500, tol = 1e-3, validation_fraction = 0.2, n_iter_no_change = 5)
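
To see both stopping options in action, the sketch below fits two regressors on the same synthetic data and inspects the fitted attribute `n_iter_`, which reports how many epochs actually ran. The data and the names `sgd_tol` and `sgd_es` are illustrative assumptions; the features are drawn on a standard scale, so no scaler is used here.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic, already standardized data (illustrative assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 5))
w_demo = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_demo = X_demo @ w_demo + rng.normal(scale=0.1, size=1000)

# Option-1: stop on training loss
sgd_tol = SGDRegressor(max_iter=500, tol=1e-3, n_iter_no_change=5,
                       random_state=0)

# Option-2: stop on the score of a held-out validation split
sgd_es = SGDRegressor(early_stopping=True, validation_fraction=0.2,
                      max_iter=500, tol=1e-3, n_iter_no_change=5,
                      random_state=0)

sgd_tol.fit(X_demo, y_demo)
sgd_es.fit(X_demo, y_demo)

# Number of epochs actually run before a stopping criterion was met
print(sgd_tol.n_iter_, sgd_es.n_iter_)
```

If `n_iter_` equals `max_iter`, neither criterion fired and training probably ended before convergence.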

##### How to use different loss functions in SGDRegressor?

* Set the `loss` parameter to one of the supported values:

  `squared_error` _(default)_, `huber`, `epsilon_insensitive`, `squared_epsilon_insensitive`

In [None]:
from sklearn.linear_model import SGDRegressor
LR = SGDRegressor(loss = 'squared_error')

##### How to use averaged SGD?

Averaged SGD sets the weight vector to the _average of the weights_ from previous updates.

* Option-1

  Averaging across all updates `average = True`

In [None]:
from sklearn.linear_model import SGDRegressor
LR = SGDRegressor( average = True )

* Option-2

  * Set `average` to int value

  * Averaging begins once the total number of samples seen reaches `average`

  * Setting `average = 10` starts averaging after seeing 10 samples

In [None]:
from sklearn.linear_model import SGDRegressor
LR = SGDRegressor( average = 10)

Averaged SGD works best with a larger number of features and a higher `eta0`
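
As a rough illustration, the sketch below fits a plain `SGDRegressor` and the two averaged variants on the same synthetic data and prints their learned weights for side-by-side inspection. The data and the names `sgd_plain`, `sgd_avg`, and `sgd_avg10` are assumptions for illustration; no claim is made here about which variant performs best.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic, already standardized data (illustrative assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 5))
w_demo = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_demo = X_demo @ w_demo + rng.normal(scale=0.1, size=1000)

sgd_plain = SGDRegressor(random_state=0)               # no averaging
sgd_avg = SGDRegressor(average=True, random_state=0)   # average over all updates
sgd_avg10 = SGDRegressor(average=10, random_state=0)   # start after 10 samples

for name, model in [("plain", sgd_plain),
                    ("average=True", sgd_avg),
                    ("average=10", sgd_avg10)]:
    model.fit(X_demo, y_demo)
    # With averaging enabled, coef_ holds the averaged weights
    print(name, np.round(model.coef_, 3))
```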