# Polynomial Regression
1. Apply a **polynomial transformation** to the feature matrix.
2. Learn a **linear regression model** on the **transformed feature matrix**.
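
The two steps can be chained in a single pipeline. The sketch below is illustrative only: the synthetic quadratic data and the names `poly_reg` and `lin_reg` are assumptions, not part of the original notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free data (assumed for illustration): y = 1 + 2x + 3x^2
X = np.linspace(-1, 1, 20).reshape(-1, 1)
y = 1 + 2 * X[:, 0] + 3 * X[:, 0] ** 2

# Step 1: polynomial transformation; step 2: linear regression on the result.
# include_bias=False because LinearRegression fits its own intercept.
poly_reg = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_reg.fit(X, y)

# make_pipeline names each step after its lowercased class name.
lin_reg = poly_reg.named_steps["linearregression"]
print(lin_reg.intercept_, lin_reg.coef_)
```

Because the data is noise-free, the fitted intercept and coefficients should recover the generating values up to floating-point error.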

## How to use **only interaction features** for polynomial regression?
```python
from sklearn.preprocessing import PolynomialFeatures
poly_transform = PolynomialFeatures(degree=2, interaction_only=True)
```
Here:  
> [$x_1, x_2$] is transformed to [$1, x_1, x_2, x_1x_2$].  
> Note that [$x_1^2, x_2^2$] are excluded.

## Regularization
### 1. Ridge regularization
> a.k.a. L2 regularization
#### How to search for the **best regularization** parameter for **ridge**?
- Option #1:
  Search for the best regularization rate using the built-in cross-validation of the **RidgeCV** estimator.

- Option #2:
  Use cross-validation with **Ridge** or **SGDRegressor** to search for the best regularization rate.
  - Grid search
  - Randomized search
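
A minimal sketch of both options follows, assuming a synthetic dataset and a log-spaced candidate grid (`alphas`) chosen only for illustration. Option 2 is shown with a grid search; a randomized search follows the same pattern with `RandomizedSearchCV` and a distribution over `alpha`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import GridSearchCV

# Synthetic data and candidate alphas (assumptions for illustration).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
alphas = np.logspace(-3, 3, 13)

# Option 1: RidgeCV runs the cross-validation internally.
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print("RidgeCV alpha:", ridge_cv.alpha_)

# Option 2: a plain Ridge estimator wrapped in a generic grid search.
grid = GridSearchCV(Ridge(), param_grid={"alpha": alphas}, cv=5)
grid.fit(X, y)
print("GridSearchCV alpha:", grid.best_params_["alpha"])
```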

### 2. Lasso regularization
> a.k.a. L1 regularization
- The **best regularization** parameter can be found in the same way as for ridge.

### 3. Elasticnet regularization
> convex combination of L1 (lasso) and L2 (ridge)
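- If we set `l1_ratio = 0.3`,  
  the L1 penalty gets weight 0.3 and the L2 penalty gets the remaining weight `1 - l1_ratio = 0.7`. Only `l1_ratio` is passed to the estimator; the L2 weight is implied.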
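
As a small illustration, the sketch below fits `ElasticNet` with `l1_ratio=0.3` on synthetic data. The dataset, the value `alpha=0.1`, and the variable `l2_weight` are assumptions made for this example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

# Synthetic data (assumed for illustration).
X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)

# scikit-learn minimizes
#   1 / (2 * n_samples) * ||y - Xw||^2_2
#   + alpha * l1_ratio * ||w||_1
#   + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2
enet = ElasticNet(alpha=0.1, l1_ratio=0.3)
enet.fit(X, y)

# The L2 share is implied by l1_ratio; it is not a constructor argument.
l2_weight = 1 - enet.l1_ratio
print(l2_weight)
```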

## Hyperparameter tuning (HPT)
### How to recognize hyperparameters in any sklearn estimator?
- **Hyper-parameters** are parameters that are **not directly learnt** within estimators.
- In `sklearn`, they are passed as **arguments** to the **constructor** of the **estimator** classes.
For example,
  - `degree` in `PolynomialFeatures`
  - learning rate (`learning_rate` and `eta0`) in `SGDRegressor`
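
One way to list an estimator's hyperparameters is `get_params()`, which returns the constructor arguments as a dictionary. The specific values below are arbitrary and only for illustration.

```python
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=3)
sgd = SGDRegressor(learning_rate="constant", eta0=0.01)

# get_params() exposes every constructor argument, i.e. every hyperparameter.
poly_params = poly.get_params()
sgd_params = sgd.get_params()

print(poly_params["degree"])
print(sgd_params["learning_rate"], sgd_params["eta0"])
```

The keys of these dictionaries are the names to use when defining a parameter space for a search.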

### How to set these hyperparameters?
Select the hyperparameters that result in the **best cross-validation scores**.

A hyperparameter search consists of
- an estimator (regressor or classifier);
- a parameter space;
- a method for searching or sampling candidates;
- a cross-validation scheme; and
- a score function

#### Two generic HPT approaches:
|`GridSearchCV`|`RandomizedSearchCV`|
|--------------|--------------------|
|Exhaustively considers all parameter combinations for the specified values.|Samples a given number of candidates from a parameter space with a specified distribution.|
|Specifies exact parameter values in a grid.|Specifies distributions of parameter values; values are sampled from those distributions.|
||The computational budget can be chosen independently of the number of parameters and their possible values.|
||The budget is the number of sampled candidates (sampling iterations), specified with the `n_iter` argument.|
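
The difference in budget can be seen by counting the candidates each search evaluates. This is a sketch under assumed settings: the pipeline step names (`poly`, `ridge`), the parameter ranges, and the synthetic data are all illustrative.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_regression(n_samples=150, n_features=3, noise=5.0, random_state=0)
pipe = Pipeline([("poly", PolynomialFeatures()), ("ridge", Ridge())])

# Grid search: 3 degrees x 4 alphas = 12 candidates, all of them evaluated.
grid_search = GridSearchCV(
    pipe,
    param_grid={"poly__degree": [1, 2, 3], "ridge__alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid_search.fit(X, y)

# Randomized search: alpha is drawn from a distribution, and n_iter fixes
# the number of candidates regardless of the size of the space.
random_search = RandomizedSearchCV(
    pipe,
    param_distributions={
        "poly__degree": [1, 2, 3],
        "ridge__alpha": loguniform(1e-2, 1e1),
    },
    n_iter=5,
    cv=5,
    random_state=0,
)
random_search.fit(X, y)

n_grid = len(grid_search.cv_results_["params"])
n_random = len(random_search.cv_results_["params"])
print(n_grid, n_random)  # 12 5
```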

### Steps in HPT
1. Divide training data into training, validation and test sets.
2. For each combination of hyper-parameter values, learn a model with the training set.  
  *This step creates multiple models.*  
  
  Tips:
  - This step can be run **in parallel** by setting `n_jobs=-1`.
  - Some parameter combinations may cause fitting to fail on one or more folds of the data, which may cause the whole search to fail. Set `error_score=0` (or `np.nan`) to assign that score to the problematic fold and let the search complete.

3. **Evaluate the performance** of each model on the validation set and select the model with the best evaluation score.
4. Retrain the model with the best hyper-parameter settings on the training and validation sets combined.
5. Evaluate the model performance on the test set.  
  *Note that the test set was not used in hyper-parameter search and model training.*
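
The steps above can be sketched with `GridSearchCV`, with one difference worth noting: instead of a single fixed validation set, the cross-validation folds of the training data play the validation role, and `refit=True` retrains the best configuration on all of the training data. The dataset, the `alpha` grid, and the `r2` scoring choice are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=42)

# Step 1: hold out a test set; the CV folds below act as validation sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=5,
    scoring="r2",
    n_jobs=-1,           # steps 2-3 run in parallel across candidates and folds
    error_score=np.nan,  # a failing fit is scored NaN instead of aborting
    refit=True,          # step 4: retrain the best setting on all training data
)
search.fit(X_train, y_train)

# Step 5: evaluate once on the untouched test set.
test_r2 = search.score(X_test, y_test)
print(search.best_params_, test_r2)
```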

### What are some model-specific HPT options available for regression tasks?
- Some models can fit data for a **range of values of some parameter** almost **as efficiently as** fitting the estimator for a **single value** of the parameter.
- This feature can be leveraged to perform **more efficient cross-validation** for model selection of this parameter.
  - `linear_model.LassoCV`
  - `linear_model.RidgeCV`
  - `linear_model.ElasticNetCV`
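
A minimal sketch of the three estimators side by side, assuming synthetic data and illustrative candidate values; each one selects its regularization strength during `fit` and exposes it as `alpha_`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

models = {
    # RidgeCV takes an explicit list of candidate alphas.
    "RidgeCV": RidgeCV(alphas=np.logspace(-3, 3, 13)),
    # LassoCV and ElasticNetCV build their own alpha path by default.
    "LassoCV": LassoCV(cv=5, random_state=0),
    # ElasticNetCV can also search over several l1_ratio values.
    "ElasticNetCV": ElasticNetCV(l1_ratio=[0.3, 0.5, 0.9], cv=5, random_state=0),
}

best_alphas = {name: model.fit(X, y).alpha_ for name, model in models.items()}
for name, alpha in best_alphas.items():
    print(name, alpha)
print("best l1_ratio:", models["ElasticNetCV"].l1_ratio_)
```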