# About

# Recap
## Simple regression

$$
   Y = b_{0} + b_{1}X + \epsilon
$$
* $Y$: dependent variable
* $X$: independent variable
* $b_{0}$: intercept
* $b_{1}$: coefficient (slope of the line)
* $\epsilon$: error term
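
As a small illustration of the equation above, the sketch below estimates $b_{0}$ and $b_{1}$ with the closed-form least-squares formulas. The data is made up and noise-free (it lies exactly on $y = 1 + 2x$), so the variable names and values are assumptions for demonstration only.

```python
import numpy as np

# Made-up, noise-free data on the line y = 1 + 2x (illustrative assumption)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Closed-form least-squares estimates for one predictor:
# slope = covariance(x, y) / variance(x), intercept from the means
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print("intercept b0:", b0)
print("slope b1:", b1)
```

Because the data has no error term, the estimates recover the intercept and slope used to generate it.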

## Multiple regression
* Similar to simple regression, but with multiple independent variables instead of one. With two predictors:

$$
   Y = b_{0} + b_{1}X_{1} + b_{2}X_{2}+ \epsilon
$$
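
A minimal sketch of fitting this two-predictor model by ordinary least squares, assuming synthetic data generated without noise. The coefficient values (1, 2, -3) and the variable names are hypothetical choices for the example.

```python
import numpy as np

# Synthetic data (assumption): Y = 1 + 2*X1 - 3*X2, no error term
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept b0
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A @ coef ~= y
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef

print("b0, b1, b2:", b0, b1, b2)
```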

# More regressions!
## 1. Lasso Regression
* Multiple regression isn't always better than simple linear regression.
* Adding more predictors doesn't guarantee more accurate predictions.
* Sometimes a simpler model performs better than a complex one.
* But when there are many variables (say 20, or even 200), how many of these features do we keep? Which ones do we omit?

* Feature selection is important in machine learning to prevent overfitting, and the same holds for regression.

<b> lasso? </b>
* When there are too many features, lasso can eliminate some of them completely by setting their coefficients to exactly zero. The penalty that does this is known as "L1 regularization" in machine learning vocabulary.

### Why LASSO?
* LASSO stands for Least Absolute Shrinkage and Selection Operator.
    * It is a regression analysis method that performs both variable selection and regularization in order to improve the prediction accuracy and interpretability of the resulting model.
* Mathematically, it is represented as
    ![0_y4hQJ8QqAHIow1mu.png](attachment:0_y4hQJ8QqAHIow1mu.png)
* Lasso uses L1 regularization. Its assumptions are the same as those of least squares regression, except that normality is not assumed.
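
To make the "coefficients set to zero" behaviour concrete, here is a rough sketch of lasso fitted by cyclic coordinate descent with soft-thresholding. It assumes the objective $\frac{1}{2n}\lVert y - Xb\rVert^2 + \lambda\lVert b\rVert_1$ (the scaling in the figure above may differ), centred data with no intercept, and synthetic data where only the first two of five features matter. The function names `soft_threshold` and `lasso_cd` are hypothetical helpers, not a library API.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; anything within [-t, t] becomes exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-feature scale
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b

# Synthetic data (assumption): only features 0 and 1 influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# Centre X and y so that no intercept term is needed
X = X - X.mean(axis=0)
y = y - y.mean()

b = lasso_cd(X, y, lam=0.5)
print("lasso coefficients:", np.round(b, 3))
print("features kept:", np.flatnonzero(b))
```

The soft-threshold step is what produces exact zeros: any feature whose correlation with the current residual stays below `lam` is dropped from the model.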


## 2. Ridge Regression
* In LASSO regression, we simplify regression equations that have *too many features* and we do so by <b>completely eliminating some of them.</b>
* Ridge regression serves the same purpose, i.e. it simplifies models, but <b>instead of eliminating features entirely it reduces their effect.</b>
* In ridge regression, the feature coefficients are shrunk close to zero but not exactly to zero (a process known as L2 regularization).
* In ridge regression models, a hyperparameter called lambda ($\lambda$) controls the weight of the penalty term added to the loss function.
* Since no feature is completely eliminated, ridge regression is not used for feature selection.

### Why ridge?
* One of the biggest problems data analysts face when building regression models is data that suffers from multicollinearity.
    * Multicollinearity occurs when independent variables are highly correlated with each other.
* This is the case in which we apply ridge regression to our data:
![1_dCQiL9Cybg_Suo2tXK_Xag.jpeg](attachment:1_dCQiL9Cybg_Suo2tXK_Xag.jpeg)

* $\beta$ is the coefficient
* $\lambda$ is the shrinkage parameter
* Ridge regression addresses the multicollinearity problem through the shrinkage parameter $\lambda$.
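
The effect of the shrinkage parameter can be sketched with the closed-form ridge solution $\beta = (X^{T}X + \lambda I)^{-1}X^{T}y$. The example assumes no intercept and uses synthetic data with two nearly identical predictors to mimic multicollinearity. `ridge_fit` is a hypothetical helper name, and $\lambda = 1$ is an arbitrary choice.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (no intercept); lam = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic multicollinear data (assumption): x2 is almost a copy of x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=100)

b_ols = ridge_fit(X, y, lam=0.0)    # unpenalised fit
b_ridge = ridge_fit(X, y, lam=1.0)  # penalised fit

print("OLS coefficients:  ", b_ols)
print("ridge coefficients:", b_ridge)
```

With highly correlated predictors the unpenalised coefficients can swing far apart, while the ridge coefficients are pulled toward smaller values without any of them becoming exactly zero.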

## 3. Stepwise regression
* As the name suggests, in stepwise regression you start with the simplest model (e.g. one dependent and one independent variable) and evaluate its performance.
* Then you add another variable, evaluate the performance again, and compare the two models to find the better one. This process is repeated until the best-performing model is found.
    * In each step, a variable is considered for addition to or removal from the set of explanatory variables based on some prespecified criterion, for example which candidate has the lowest p-value.
* The Akaike Information Criterion (AIC) is commonly used to compare model performance in stepwise regression.

<b> p-value </b>
* Here, the p-value or probability value is the probability of obtaining test results at least as extreme as the results actually observed during the test, assuming that the null hypothesis is correct.
* The aim of stepwise regression is to maximize predictive power with the minimum number of predictor variables.
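
The procedure can be sketched as forward selection driven by AIC. This is one possible variant under stated assumptions: it only adds variables (no removal step), it uses the Gaussian form $\text{AIC} = n \ln(\text{RSS}/n) + 2k$ up to a constant, and it uses AIC rather than p-values as the criterion. The helper names `aic` and `forward_select` and the synthetic data are hypothetical.

```python
import numpy as np

def aic(design, y):
    """AIC of an OLS fit, up to an additive constant: n * ln(RSS / n) + 2k."""
    n, k = design.shape
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    return n * np.log(rss / n) + 2 * k

def forward_select(X, y):
    """Add one feature at a time while doing so lowers the AIC."""
    n, p = X.shape
    selected = []
    design = np.ones((n, 1))  # start from the intercept-only model
    best = aic(design, y)
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = {j: aic(np.column_stack([design, X[:, j]]), y) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no candidate improves the model
        selected.append(j_best)
        design = np.column_stack([design, X[:, j_best]])
        best = scores[j_best]
    return selected

# Synthetic data (assumption): only features 0 and 3 influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 4.0 * X[:, 0] - 2.0 * X[:, 3] + 0.5 * rng.normal(size=200)

selected = forward_select(X, y)
print("features in order of selection:", selected)
```

AIC can occasionally admit a weak, irrelevant feature, so the selected set should be read as a candidate model rather than a guaranteed minimal one.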


----

## 4. Polynomial Regression
* A regression equation is a polynomial regression equation if the power of the independent variable is greater than 1. Polynomial regression is used when a straight regression line cannot fit the data properly, i.e. the linear model underfits the data.
* In order to avoid such a situation, we need to increase the complexity of our model. To generate a higher-order equation, we can add powers of the original features as new features.
* A polynomial regression equation (here of degree 2 in a single feature $x$) is of the form:
$$
    y = \theta_{0} + \theta_{1}x + \theta_{2}x^{2}
$$
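
A short sketch of "adding powers of the original feature as new features": the squared feature is appended as an extra column and the model is then fitted with ordinary least squares. The quadratic data is synthetic and noise-free, and the coefficients (1, 0.5, 2) are assumptions chosen for the example.

```python
import numpy as np

# Synthetic, noise-free quadratic data (assumption)
x = np.linspace(-3, 3, 30)
y = 1.0 + 0.5 * x + 2.0 * x ** 2

# Degree-2 design matrix: columns are 1, x, x^2
A_quad = np.column_stack([np.ones_like(x), x, x ** 2])
theta, *_ = np.linalg.lstsq(A_quad, y, rcond=None)

# Straight-line fit on the same data, for comparison
A_lin = A_quad[:, :2]
theta_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)

rss_quad = np.sum((y - A_quad @ theta) ** 2)
rss_lin = np.sum((y - A_lin @ theta_lin) ** 2)

print("theta (degree 2):", theta)
print("RSS straight line:", rss_lin)
print("RSS degree 2:     ", rss_quad)
```

The model is still linear in the coefficients, which is why the same least-squares machinery applies. Only the feature columns change.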

----
## 5. ElasticNet Regression
* ElasticNet regression is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.
* Elastic-net is useful when there are multiple features which are correlated.
![1_RXPEN97r_qTV3DlisylNWA.png](attachment:1_RXPEN97r_qTV3DlisylNWA.png)
* ElasticNet regression encourages a grouping effect among highly correlated variables. In addition to choosing a $\lambda$ value, elastic net also lets us tune the $\alpha$ parameter, where $\alpha = 0$ corresponds to ridge and $\alpha = 1$ to lasso.
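
How $\alpha$ blends the two penalties can be sketched with a small function. It assumes the parameterisation $\lambda\left[\alpha\lVert\beta\rVert_1 + \tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2\right]$, which may be scaled differently from the formula in the figure above. `elastic_net_penalty` is a hypothetical helper and the coefficient vector is made up.

```python
import numpy as np

def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * L1 + (1 - alpha) * 0.5 * squared L2) for a coefficient vector."""
    l1 = np.sum(np.abs(beta))
    l2 = 0.5 * np.sum(beta ** 2)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

# Made-up coefficient vector (assumption)
beta = np.array([1.0, -2.0, 0.0])

lasso_like = elastic_net_penalty(beta, lam=0.1, alpha=1.0)  # pure L1 penalty
ridge_like = elastic_net_penalty(beta, lam=0.1, alpha=0.0)  # pure L2 penalty
mixed = elastic_net_penalty(beta, lam=0.1, alpha=0.5)       # equal blend

print("alpha = 1 (lasso):", lasso_like)
print("alpha = 0 (ridge):", ridge_like)
print("alpha = 0.5:      ", mixed)
```

Parameter names vary between libraries: in scikit-learn's `ElasticNet`, for instance, `alpha` is the overall penalty strength and `l1_ratio` is the mixing weight, so the roles of the symbols should be checked before tuning.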