
# How to build a baseline regression model?
`DummyRegressor` helps create a **baseline** for regression.
```python
from sklearn.dummy import DummyRegressor
dummy_regr = DummyRegressor(strategy="mean")
```
Supported values for `strategy`:
- mean
- median
- quantile
- constant
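
As a minimal sketch of how the four strategies behave, the snippet below fits one `DummyRegressor` per strategy on a small made-up dataset. The data values, the dictionary names, and the choices `quantile=0.5` and `constant=0.0` are assumptions for illustration only.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Toy data (made up for illustration); DummyRegressor ignores the features.
X_train = np.arange(5).reshape(-1, 1)
y_train = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# One baseline per strategy; "quantile" and "constant" need an extra argument.
baselines = {
    "mean": DummyRegressor(strategy="mean"),
    "median": DummyRegressor(strategy="median"),
    "quantile": DummyRegressor(strategy="quantile", quantile=0.5),
    "constant": DummyRegressor(strategy="constant", constant=0.0),
}

# Every baseline predicts the same value for any input row.
predictions = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    predictions[name] = float(model.predict(X_train[:1])[0])
```

Any real model should beat these constant predictions; if it does not, the features are probably not helping.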

# How is a **Linear Regression** model trained?
1. Instantiate an **object** of a suitable **linear regression estimator** from one of the following two options:
  - Normal Equation
    ```python
    linear_regressor = LinearRegression()
    ```
  - Iterative optimization
    ```python
    linear_regressor = SGDRegressor()
    ```
2. Call the **fit** method on the **linear regression object** with the **training feature matrix** and **label vector** as arguments.
  ```python
  linear_regressor.fit(X_train, y_train)
  ```
  > `LinearRegression` works for both single- and multi-output regression; `SGDRegressor` expects a 1-D label vector (single output).
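
The two steps above can be sketched end to end on synthetic data. The data, `true_w`, and the `fitted` dictionary are assumptions made up for this example. The multi-output case is shown only for `LinearRegression`, since `SGDRegressor` takes a 1-D target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

# Synthetic data (illustrative): y = X @ true_w + 4 + small noise.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y_train = X_train @ true_w + 4.0 + rng.normal(scale=0.1, size=200)

# Step 1 and step 2 for both estimators; the fit call is identical.
fitted = {}
for linear_regressor in (LinearRegression(), SGDRegressor(random_state=0)):
    linear_regressor.fit(X_train, y_train)
    fitted[type(linear_regressor).__name__] = linear_regressor

# Multi-output regression: stack two targets column-wise (LinearRegression only).
Y_train = np.column_stack([y_train, 2.0 * y_train])
multi_output = LinearRegression().fit(X_train, Y_train)
```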


# SGDRegressor Estimator
- Use for large training sets (>10k samples)
- Provides **greater control over the optimization process** through **hyperparameter** settings:
  - loss
    - loss='squared_error'
    - loss='huber' (not heavily influenced by **outliers**)  
  - penalty=
    - l1
    - l2
    - elasticnet
  - learning_rate = 
    - constant
    - optimal
    - invscaling (default)
    - adaptive
  - early_stopping =
    - True
    - False

*It's a good idea to use a* **random seed** *of your choice while instantiating SGDRegressor object. It helps us get* **reproducible results.**  

Set `random_state` to a seed of your choice.  
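
A minimal sketch of combining these hyperparameters with a fixed seed follows. The synthetic data, the injected outliers, the helper `make_model`, and the particular hyperparameter values are assumptions chosen for illustration, not recommended settings.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic data with a few artificial outliers (illustrative only).
rng = np.random.default_rng(42)
X_train = rng.normal(size=(500, 4))
y_train = X_train @ np.array([2.0, -1.0, 0.0, 3.0]) + rng.normal(scale=0.2, size=500)
y_train[:5] += 50.0


def make_model():
    """Hypothetical helper: build an SGDRegressor with explicit settings."""
    return SGDRegressor(
        loss="huber",              # less sensitive to the outliers above
        penalty="elasticnet",
        learning_rate="adaptive",
        eta0=0.01,
        early_stopping=False,
        random_state=42,           # fixed seed for reproducibility
    )


# Two independent fits with the same seed give identical weights.
model_a = make_model().fit(X_train, y_train)
model_b = make_model().fit(X_train, y_train)
same_weights = np.array_equal(model_a.coef_, model_b.coef_)
```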

## How to perform feature scaling for SGDRegressor
SGD is **sensitive to feature scaling**, so it is **highly recommended to scale** the input feature matrix.  
```python
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

sgd = Pipeline([
    ('feature_scaling', StandardScaler()),
    ('sgd_regressor', SGDRegressor())])
sgd.fit(X_train, y_train)
```
> Note:
> - Feature scaling **is not needed** for **word frequencies** and **indicator features** as they have intrinsic scale.
> - Features extracted using PCA should be **scaled by some constant c** such that the average L2 norm of the training data equals one.

## How to shuffle training data after each epoch in SGDRegressor?
```python
linear_reg = SGDRegressor(shuffle=True)  # shuffle=True is also the default
```

## How to set **#epochs** in SGDRegressor?
Set **max_iter** to the desired **#epochs**. The default value is 1000.  
`linear_reg = SGDRegressor(max_iter=100)`

> Remember **one epoch** is **one full pass over the training data**.

Practical tip:  
> SGD converges after observing approximately $10^6$ training samples. Thus, a reasonable first guess for the number of iterations for a training set of $n$ samples is:  
`max_iter = np.ceil(10**6 / n)`
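
The rule of thumb can be applied as in the sketch below. The sample count `n_samples` is a hypothetical value; the result is cast to `int` before being passed to `max_iter`.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

n_samples = 20_000  # hypothetical training-set size

# Roughly 10^6 sample visits in total, spread over full passes (epochs).
max_iter = int(np.ceil(10**6 / n_samples))

sgd = SGDRegressor(max_iter=max_iter, random_state=0)
```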

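## How to set **stopping criteria** in SGDRegressor?
Option 1: tol, n_iter_no_change, max_iter
```python
from sklearn.linear_model import SGDRegressor

# Stop when the loss fails to improve by at least tol for
# n_iter_no_change consecutive epochs, or after max_iter epochs.
lin_reg = SGDRegressor(loss='squared_error',
                       max_iter=500,
                       tol=1e-3,
                       n_iter_no_change=5)
```
Option 2: early_stopping, validation_fraction (hold out a fraction of the training data and stop when the validation score stops improving)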

## How to use averaged SGD?
Averaged SGD sets the weight vector to the **average of the weights** from previous updates.  
Option 1: Averaging across all updates `average=True`  
Option 2: Set `average` to an int value; averaging then begins once that many samples have been seen.  
> Averaged SGD works **best** with a **large number of features** and a **higher eta0** (initial learning rate).
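
Both options can be sketched as follows. The synthetic data, the value `eta0=0.05`, and the threshold `average=10` are assumptions picked for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
y_train = X_train @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=500)

# Option 1: average the weights across all updates.
sgd_avg_all = SGDRegressor(average=True, eta0=0.05, random_state=0)
sgd_avg_all.fit(X_train, y_train)

# Option 2: start averaging only after 10 samples have been seen.
sgd_avg_late = SGDRegressor(average=10, eta0=0.05, random_state=0)
sgd_avg_late.fit(X_train, y_train)
```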

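## How do we initialize SGD with the weight vector of the previous run?
Set `warm_start = True`, so that the next call to `fit` starts from the previously learned weights instead of reinitializing them.  
By default `warm_start = False`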
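
A small sketch of the effect, under made-up data: each `fit` call is limited to two epochs (`max_iter=2`, `tol=None`, both chosen only to make the first run visibly unfinished), and the second call resumes from the weights left by the first.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 3))
y_train = X_train @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=300)

# Deliberately short runs: 2 epochs per fit call, no tolerance-based stopping.
sgd = SGDRegressor(warm_start=True, max_iter=2, tol=None, random_state=0)

sgd.fit(X_train, y_train)
mse_first = mean_squared_error(y_train, sgd.predict(X_train))

# Second call starts from the weights learned above, not from scratch.
sgd.fit(X_train, y_train)
mse_second = mean_squared_error(y_train, sgd.predict(X_train))
```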

## How to access the weights of a trained **Linear Regression** model?
$$
\hat{y} = w_0 + w_1x_1 + w_2x_2 + \dots + w_mx_m = \mathbf{w}^T\mathbf{x}, \quad \mathbf{x} = (1, x_1, \dots, x_m)^T
$$
> The **weights** $w_1, w_2, \dots, w_m$ are stored in the `coef_` attribute.  
`linear_regressor.coef_`  
> The **intercept** $w_0$ is stored in the `intercept_` attribute.  
`linear_regressor.intercept_`

Note:
  - These code snippets work for both **LinearRegression** and **SGDRegressor**, and more generally for the **linear regression estimators** in `sklearn.linear_model`.
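
To connect the formula with the two attributes, the sketch below fits `LinearRegression` on noise-free data generated from a known linear rule (the rule and the data are made up for this example) and then reproduces `predict` by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free toy data from y = 3 + 2*x1 - 1*x2 (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = 3.0 + 2.0 * X_train[:, 0] - 1.0 * X_train[:, 1]

linear_regressor = LinearRegression().fit(X_train, y_train)
weights = linear_regressor.coef_          # approximately [2, -1]
intercept = linear_regressor.intercept_   # approximately 3

# Reproduce predict() by hand: y_hat = w_0 + w_1*x_1 + w_2*x_2
manual_prediction = intercept + X_train @ weights
```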

## How to make predictions on new data with a **Linear Regression** model?
- Step 1: Arrange the data for prediction in a feature matrix of shape (#samples, #features) or in sparse matrix format.
- Step 2: Call the **predict** method on the **linear regression object** with the **feature matrix** as an argument.
  `linear_regressor.predict(X_test)`
  > The same code works for **all regression estimators.**
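
Step 1 matters most for a single sample, which has to be reshaped into a 2-D matrix with one row. The sketch below uses made-up numbers and also passes the same row as a SciPy CSR matrix to illustrate the sparse input path.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LinearRegression

# Tiny made-up training set.
X_train = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
y_train = np.array([1.0, 2.0, 3.0, 5.0])
linear_regressor = LinearRegression().fit(X_train, y_train)

# A single sample is 1-D; reshape it to (1, n_features) before predicting.
single_sample = np.array([3.0, 1.0])
X_new = single_sample.reshape(1, -1)

dense_pred = linear_regressor.predict(X_new)
sparse_pred = linear_regressor.predict(sparse.csr_matrix(X_new))
```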

# Model Evaluation
Steps:
1. **Split data** into training and test sets.
2. **Fit** the linear regression estimator on the training set.
3. **Calculate training error** (a.k.a. empirical error).
4. **Calculate test error** (an estimate of the generalization error).
5. **Compare** training and test errors.
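
The five steps can be sketched as below. The dataset comes from `make_regression` with arbitrary settings, and mean squared error is used as the error measure; both are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression problem (illustrative settings).
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# 1. Split the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2. Fit on the training set only.
model = LinearRegression().fit(X_train, y_train)

# 3. Training error and 4. test error.
train_error = mean_squared_error(y_train, model.predict(X_train))
test_error = mean_squared_error(y_test, model.predict(X_test))

# 5. Compare: a test error far above the training error suggests overfitting.
error_gap = test_error - train_error
```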

## How to evaluate a trained **Linear Regression** model?
Using the **score** method on the **linear regression object**:
```python
# Evaluation on the eval set with
# 1. feature matrix
# 2. label vector or matrix (single/multi-output)
linear_regressor.score(X_test, y_test)
```
The `score` method returns $R^2$, the coefficient of determination.

## Evaluation metrics
- `mean_absolute_error`
    - `train_error = mean_absolute_error(y_train, y_predicted)`
- `mean_squared_error`
- `r2_score` (also returns $R^2$)
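
The three metrics share the same call pattern, `metric(y_true, y_pred)`. The sketch below computes them on a made-up `make_regression` dataset and checks that `r2_score` agrees with the estimator's `score` method.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic data (illustrative settings).
X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)
model = LinearRegression().fit(X, y)
y_predicted = model.predict(X)

mae = mean_absolute_error(y, y_predicted)
mse = mean_squared_error(y, y_predicted)
r2 = r2_score(y, y_predicted)

# score() on a regressor computes the same R^2 from the features directly.
r2_from_score = model.score(X, y)
```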

### How to evaluate regression model on worst case error?
Use the `max_error` metric.  
- It can only be used for **single output regression**. It **does not support multi-output regression**.
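
A tiny worked example with made-up values: the largest absolute residual is the third one, $|7 - 9| = 2$.

```python
from sklearn.metrics import max_error

# Made-up true values and predictions.
y_true = [3.0, 2.0, 7.0, 1.0]
y_pred = [2.5, 2.0, 9.0, 1.0]

# Absolute residuals are 0.5, 0.0, 2.0, 0.0; max_error returns the largest.
worst_case = max_error(y_true, y_pred)
```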

### Scores and Errors
- Score is a metric for which higher value is better.
- Error is a metric for which lower value is better.  

Convert an error to a score by adding the **neg_** prefix.
- metrics.mean_absolute_error --> neg_mean_absolute_error
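
The negated name is what the `scoring` argument of `cross_val_score` expects. The sketch below, on a made-up dataset, shows that the returned scores are non-positive and that flipping the sign recovers the error.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (illustrative settings).
X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)

# Scores follow the "higher is better" convention, so the MAE is negated.
scores = cross_val_score(
    LinearRegression(), X, y, cv=5, scoring="neg_mean_absolute_error"
)

# Flip the sign to get the mean absolute error of each fold back.
mae_per_fold = -scores
```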

### Cross-validation performs **robust evaluation** of model performance
- by **repeated splitting** and
- providing **many training and test errors**

This enables us to **estimate variability in generalization performance** of the model.

Cross-validation iterators:
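- `KFold`
  - k: number of partitions (folds)
  - k-1 folds: training set
  - 1 fold: test set
- `RepeatedKFold`
  - repeats `KFold` several times with a different randomization in each repetition
- `LeaveOneOut`
  - k = n (number of training examples)
  - 1 example for the test set
- `ShuffleSplit`
  - Repeatedly generates random permutations of the data and splits each into training and test sets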
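
To make the differences concrete, the sketch below counts how many train/test splits each iterator produces on a made-up dataset of 10 samples. The parameter values (`n_splits`, `n_repeats`, `test_size`) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, RepeatedKFold, ShuffleSplit

# Ten made-up samples with two features each.
X = np.arange(20).reshape(10, 2)

iterators = {
    "KFold": KFold(n_splits=5),
    "RepeatedKFold": RepeatedKFold(n_splits=5, n_repeats=2, random_state=0),
    "LeaveOneOut": LeaveOneOut(),
    "ShuffleSplit": ShuffleSplit(n_splits=3, test_size=0.2, random_state=0),
}

# Number of train/test splits each iterator yields for this dataset.
n_splits = {name: cv.get_n_splits(X) for name, cv in iterators.items()}

# Size of the test set in every KFold split (10 samples / 5 folds).
kfold_test_sizes = [len(test_idx) for _, test_idx in iterators["KFold"].split(X)]
```

Any of these objects can be passed as the `cv` argument of `cross_val_score`.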

  
