### How to model polynomial regression 
* Apply a polynomial transformation to the feature matrix.
* Learn a linear regression model on the transformed feature matrix.


> Use a pipeline: apply the polynomial transform --> apply linear regression <BR>
> * For polynomial regression trained with SGD, we simply use `SGDRegressor` in place of `LinearRegression`.

In [None]:
# normal equation
from sklearn.linear_model import LinearRegression 
from sklearn.pipeline import Pipeline 
from sklearn.preprocessing import PolynomialFeatures 

poly_model = Pipeline([
    ("polynomial_feature", PolynomialFeatures(degree = 2)), 
    ("linear_regression", LinearRegression())
])

poly_model.fit(X_train, y_train)

In [None]:
# gradient descent 
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import Pipeline 
from sklearn.preprocessing import PolynomialFeatures 

poly_model = Pipeline([
    ("polynomial_feature", PolynomialFeatures(degree = 2)), 
    ("sgd_regression", SGDRegressor())
])

poly_model.fit(X_train, y_train)

If we want to keep the interaction features and drop the pure powers of a single feature (such as `x1^2`), we set `interaction_only = True` in `PolynomialFeatures`:

```
[x1, x2] --> [1, x1, x2, x1x2]
```


In [None]:
from sklearn.preprocessing import PolynomialFeatures 
poly_transform = PolynomialFeatures(degree = 2, interaction_only = True)


#### Ridge regularization :

<b> Option 1 : </b> 
1. Instantiate a `Ridge` estimator.
2. Set the parameter `alpha` (the regularization rate) to the desired value.



In [1]:
from sklearn.linear_model import Ridge 
ridge = Ridge(alpha = 1e-3)

<b> Option 2 : </b> 
1. Instantiate an `SGDRegressor`.
2. Set `alpha` to the regularization rate and `penalty = 'l2'` (the default penalty).


In [2]:
from sklearn.linear_model import SGDRegressor
# penalty='l2' gives ridge; alpha is the regularization rate (default 1e-4)
sgd_ridge = SGDRegressor(penalty = 'l2', alpha = 1e-3)


### Best ridge regularization parameter : 
<b> Option 1 : </b> search for the best regularization rate with the built-in cross-validation of the `RidgeCV` estimator.

<b> Option 2 : </b> use cross-validation (e.g. `GridSearchCV`) with `Ridge` or `SGDRegressor` to search for the best regularization rate.
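A minimal sketch of both options on hypothetical synthetic data. The candidate `alphas` list is an assumption chosen for illustration. `RidgeCV` with its default `cv=None` uses an efficient leave-one-out scheme, while `GridSearchCV` refits a plain `Ridge` on each of 5 folds:

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import GridSearchCV

# Hypothetical synthetic regression data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

alphas = [1e-3, 1e-2, 1e-1, 1.0, 10.0]  # assumed candidate regularization rates

# Option 1: RidgeCV selects alpha with its built-in (leave-one-out) cross-validation
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)
print("RidgeCV alpha:", ridge_cv.alpha_)

# Option 2: plain Ridge wrapped in a 5-fold cross-validated grid search over alpha
search = GridSearchCV(Ridge(), {"alpha": alphas}, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print("GridSearchCV alpha:", search.best_params_["alpha"])
```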


In [None]:
# ridge + polynomial 
from sklearn.linear_model import Ridge 
from sklearn.pipeline import Pipeline 
from sklearn.preprocessing import PolynomialFeatures 

poly_model = Pipeline([
    ("polynomial_feature", PolynomialFeatures(degree = 2)), 
    ("ridge", Ridge(alpha = 1e-3))
])

poly_model.fit(X_train, y_train)

#### Lasso regularization :

<b> Option 1 : </b> 
1. Instantiate a `Lasso` estimator.
2. Set the parameter `alpha` (the regularization rate) to the desired value.

<b> Option 2 : </b> 
1. Instantiate an `SGDRegressor`.
2. Set `alpha` to the regularization rate and `penalty = 'l1'`.

### Best lasso regularization parameter : 
<b> Option 1 : </b> search for the best regularization rate with the built-in cross-validation of the `LassoCV` estimator.

<b> Option 2 : </b> use cross-validation (e.g. `GridSearchCV`) with `Lasso` or `SGDRegressor` to search for the best regularization rate.
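A minimal sketch of Option 1 with `LassoCV`, assuming hypothetical data in which only the first two of five features influence the target. The explicit `alphas` list is an illustrative choice; `LassoCV` evaluates each candidate with 5-fold cross-validation:

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Hypothetical data: only the first 2 of 5 features actually influence y
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Option 1: LassoCV scores each candidate alpha with 5-fold cross-validation
alphas = [1e-3, 1e-2, 1e-1, 1.0]
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X, y)

print("best alpha:", lasso_cv.alpha_)
print("coefficients:", np.round(lasso_cv.coef_, 3))
```

With a sufficiently large `alpha`, the L1 penalty can drive the coefficients of the irrelevant features to exactly zero.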


In [None]:
# sgd + lasso 
from sklearn.linear_model import SGDRegressor
# penalty='l1' gives lasso; alpha is the regularization rate
sgd_lasso = SGDRegressor(penalty = 'l1', alpha = 1e-3)

In [None]:
# lasso + polynomial 
from sklearn.linear_model import Lasso 
from sklearn.pipeline import Pipeline 
from sklearn.preprocessing import PolynomialFeatures 

poly_model = Pipeline([
    ("polynomial_feature", PolynomialFeatures(degree = 2)), 
    ("lasso", Lasso(alpha = 1e-3))
])

poly_model.fit(X_train, y_train)



#### Elastic net regularization :
In the pipeline --> polynomial transform --> `SGDRegressor(penalty = 'elasticnet', l1_ratio = 0.3)` <br> 
The L2 share is then 1 - `l1_ratio` = 0.7.

* `l1_ratio = 1` --> pure L1 (lasso)
* `l1_ratio = 0` --> pure L2 (ridge)
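For reference, the scikit-learn SGD user guide describes the elastic-net objective roughly as below, written here with $r$ = `l1_ratio` and $\alpha$ = `alpha`. This is a paraphrase of that documentation, not a derivation from this notebook; it shows how $r$ blends the two norms, with $r = 1$ recovering the L1 (lasso) penalty and $r = 0$ the L2 (ridge) penalty:

$$
E(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} L\big(y_i,\ \mathbf{w}^\top \mathbf{x}_i + b\big) + \alpha\left[\, r\,\lVert \mathbf{w} \rVert_1 + \frac{1 - r}{2}\,\lVert \mathbf{w} \rVert_2^2 \right]
$$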

In [None]:
# elastic net + polynomial 
from sklearn.linear_model import SGDRegressor 
from sklearn.pipeline import Pipeline 
from sklearn.preprocessing import PolynomialFeatures 

poly_model = Pipeline([
    ("polynomial_feature", PolynomialFeatures(degree = 2)), 
    ("elasticnet", SGDRegressor(penalty= 'elasticnet', l1_ratio=0.3))
]) 

poly_model.fit(X_train, y_train)

### Hyperparameter Tuning

We select the hyperparameters with the best cross-validation (CV) score.

A hyperparameter search consists of:
* an estimator
* a parameter space
* a method for searching or sampling candidates
* a cross-validation scheme
* a score function


<b>GridSearchCV</b> exhaustively considers all parameter combinations for the specified values.

<b>RandomizedSearchCV</b> specifies a distribution for each parameter and samples candidate values from those distributions. 
The computational budget, set with the `n_iter` argument, can be chosen independently of the number of parameters and their possible values.



In [None]:
# grid search cv: parameter grid for an SVM estimator (e.g. sklearn.svm.SVC)
param_grid = [{
    'C' : [1, 10, 100, 1000], 
    'kernel' : ['linear']
}]

In [None]:
# random search cv: distributions for SGD parameters
from scipy import stats
from scipy.stats import loguniform

param_dist = {
    "average" : [True, False], 
    "l1_ratio" : stats.uniform(0, 1), 
    "alpha" : loguniform(1e-4, 1e0)
}
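A minimal sketch that maps the five components of a search onto `RandomizedSearchCV`, reusing the same distributions as `param_dist` but prefixed with a pipeline step name. The synthetic data, the `StandardScaler` step and `max_iter=2000` are assumptions added for illustration. Scaling is included because SGD is sensitive to feature scale.

```python
import numpy as np
from scipy.stats import loguniform, uniform
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.2, size=300)

# Estimator: scaling (an added assumption) + SGD with an elastic-net penalty
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("sgd", SGDRegressor(penalty="elasticnet", max_iter=2000, random_state=0)),
])

# Parameter space: keys use the '<step name>__<parameter>' syntax
param_dist = {
    "sgd__average": [True, False],
    "sgd__l1_ratio": uniform(0, 1),
    "sgd__alpha": loguniform(1e-4, 1e0),
}

# Sampling method + budget (n_iter), CV scheme (cv) and score function (scoring);
# add n_jobs=-1 to evaluate candidates in parallel
search = RandomizedSearchCV(
    pipe, param_dist, n_iter=20, cv=5,
    scoring="neg_mean_squared_error", random_state=0,
)
search.fit(X, y)
print(search.best_params_)
print("best CV MSE:", -search.best_score_)
```

Exactly 20 candidates are evaluated no matter how large the parameter space is. This is the budget control described above.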


#### Steps : 

1. Divide the data into training, validation and test sets.
2. For each combination of hyperparameter values, learn a model on the training set. 
> This step can be run in parallel by setting `n_jobs = -1`. <br>
> **NOTE :** Some parameter combinations may cause a fit to fail. Setting `error_score = 0` (or `np.nan`) assigns that score to the failing fold, so the search still completes.
3. Evaluate each model on the validation set and select the one with the best evaluation score.
4. Retrain the model with the best hyperparameter settings on the training and validation sets combined.
5. Evaluate the model's performance on the test set.
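The five steps can be sketched by hand with a single validation split, as below. The `Ridge` estimator, the 60/20/20 split and the synthetic data are illustrative assumptions. `GridSearchCV` automates steps 2 to 4 with k-fold cross-validation instead of one validation set, refitting on all of the data it is given when `refit=True`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ParameterGrid, train_test_split

# Hypothetical synthetic data, for illustration only
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.5, size=500)

# Step 1: split into training (60%), validation (20%) and test (20%) sets
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Steps 2-3: fit one model per combination on the training set, score on validation
grid = list(ParameterGrid({"alpha": [1e-3, 1e-2, 1e-1, 1.0, 10.0]}))
val_mse = []
for params in grid:
    model = Ridge(**params).fit(X_tr, y_tr)
    val_mse.append(mean_squared_error(y_val, model.predict(X_val)))
best_params = grid[int(np.argmin(val_mse))]

# Step 4: retrain with the best setting on training + validation combined
final_model = Ridge(**best_params).fit(X_rest, y_rest)

# Step 5: report performance on the untouched test set
test_mse = mean_squared_error(y_test, final_model.predict(X_test))
print(best_params, round(test_mse, 3))
```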


#### Model specific HPT 
* Some models can fit data for a range of values of a parameter almost as efficiently as fitting the estimator for a single value of that parameter.
* This can be leveraged to perform a more efficient cross-validation when selecting that parameter, e.g. with:
    * `linear_model.LassoCV`
    * `linear_model.RidgeCV`
    * `linear_model.ElasticNetCV`
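A minimal sketch with `ElasticNetCV` on hypothetical data. When `alphas` is left at its default, the estimator builds its own grid of alphas and fits them along a regularization path with warm starts. The candidate `l1_ratio` values are an illustrative choice and are cross-validated as a list.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Hypothetical data: only features 0 and 3 influence y
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 6))
y = X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.3, size=200)

# alpha is searched along an automatically generated path; l1_ratio over the given list
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, y)
print("alpha:", enet_cv.alpha_, "l1_ratio:", enet_cv.l1_ratio_)
```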

In [2]:
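from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline 
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import SGDRegressor

# search over the polynomial degree; keys use the '<step name>__<parameter>' syntax
param_grid = [
    {'poly__degree' : [2, 3, 4, 5, 6, 7, 8, 9]}
]

pipeline = Pipeline(steps = [
    ('poly', PolynomialFeatures()), 
    # scale the expanded features: SGD tends to diverge on unscaled high-degree terms
    ('scale', StandardScaler()),
    ('sgd', SGDRegressor())
])

grid_search = GridSearchCV(
    pipeline, param_grid, cv = 5, 
    scoring= 'neg_mean_squared_error', return_train_score= True
)

# X_train is assumed to be a 1-D array holding a single feature, hence the reshape
grid_search.fit(X_train.reshape(-1, 1), y_train)
print(grid_search.best_params_)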