# Regression Extras
- In this notebook, we are going to look at some extra topics that can help you build a more efficient model.
    1. What is statistical significance & p-value?
    2. Building best model
    3. Adjusted R squared factor

## What is statistical significance & p-value?

- Let's assume we live in a world where a coin has a head and a tail as its faces, so we have a 50-50 chance of getting each face. This assumption is our null hypothesis (H0).
- We toss the same coin 6 times. Let's see the results in the table below.

| Coin | Result | Probability of the results so far under H0 (fair coin) |
|------|--------|-------------------------------------------|
| <img src="../images/coin.png" alt="coin.png" width="30"> | 1st time you got tails 😀 | 0.5 (50%) |
| <img src="../images/coin.png" alt="coin.png" width="30"> | 2nd time you got tails 🙂 | 0.25 (25%) |
| <img src="../images/coin.png" alt="coin.png" width="30"> | 3rd time you got tails, again 😮 | 0.125 (12.5%) |
| <img src="../images/coin.png" alt="coin.png" width="30"> | 4th time.... tails 😑 | 0.0625 (6.25%) |
| <img src="../images/coin.png" alt="coin.png" width="30"> | 5th time, tails 🤔 | 0.03125 (3.125%) |
| <img src="../images/coin.png" alt="coin.png" width="30"> | 6th time, tails again 🧐 | 0.015625 (1.5625%) |

- Once you see the above table, you may think **"Is that coin fake, or does it have tails on both sides? 🧐"**
- Check at which toss you start to feel suspicious about the result. I feel suspicious at the 4th toss.
- You can also see that the probability of getting this run of tails halves with every toss.
- We humans get that suspicious feeling, but what about machines or algorithms?
- To answer this question we use a value called the **p-value**: the probability of getting a result at least as extreme as the one we observed, assuming H0 is true. We compare it with a threshold chosen in advance, called the **significance level**. For our case, we can fix the **significance level at 0.05 (5%)**.
- If our algorithm gets a p-value less than 0.05, the result is too unlikely under H0, so we reject H0. For the coin, that means we conclude it is not fair, and in the table this happens at the 5th toss.
- If the p-value is 0.05 or more, we do not have enough evidence to reject H0. A 5% significance level means we accept a 5% risk of rejecting H0 when it is actually true.
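
The coin table can be reproduced with a few lines of Python. This is a minimal sketch, assuming a fair coin under H0 and a threshold of 0.05; the names `prob_all_tails` and `SIGNIFICANCE_LEVEL` are made up for this illustration.

```python
SIGNIFICANCE_LEVEL = 0.05  # threshold we fix before looking at the data


def prob_all_tails(n_tosses, p_tails=0.5):
    """Probability of n_tosses tails in a row if H0 (fair coin) is true."""
    return p_tails ** n_tosses


for n in range(1, 7):
    p_value = prob_all_tails(n)
    verdict = "reject H0" if p_value < SIGNIFICANCE_LEVEL else "keep H0"
    print(f"{n} tails in a row: p-value = {p_value} -> {verdict}")

# First toss at which the run of tails becomes statistically significant
first_suspicious = next(
    n for n in range(1, 7) if prob_all_tails(n) < SIGNIFICANCE_LEVEL
)
```

With these assumptions the algorithm becomes "suspicious" one toss later than the human guess of the 4th toss, because 0.0625 is still above 0.05.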

<img src="https://blog.analytics-toolkit.com/wp-content/uploads/2017/09/2017-09-11-Statistical-Significance-P-Value-1.png" alt="p-value image" width="500">

## Building best model

- If you have one feature x(1) to predict the dependent variable y, you can use Simple Linear Regression.
- If you have many features x(n) to predict the dependent variable y, you can use Multiple Linear Regression or one of the other regressions we have learned.
- But how can we find the unwanted features that are completely useless for predicting y?
```
example:
Let's assume we are going to predict "Profit" (y),
which depends on
1. "R&D Spend" of the company.
2. "Administration Spend"  of the company.
3. "Marketing Spend" of the company.
4. "State" where the company is located.
```
- Can you guess which set of feature variables is best for predicting "Profit"?
- Let's find it out 😎.

### 5 methods of model building
1. All-in
2. Backward elimination (Stepwise Regression)
3. Forward selection (Stepwise Regression)
4. Bidirectional elimination (Stepwise Regression)
5. All Possible Models (Score Comparison)

#### 1. All-in
- Use it if you have prior knowledge about the dataset and you are sure that y depends on all the feature variables.
- Use it if someone gives you a fully prepared dataset and you have to use all the feature variables.
- We also do **All-in** as the first step of *Backward elimination*.

#### 2. Backward elimination
- **STEP 1**: Select a *statistical significance* level to **stay** in the model. ```example: SL_STAY = 0.05 (5%)```
- **STEP 2**: Perform *All-in*: fit the model with all possible feature variables.
- **STEP 3**: Find the p-value of each feature and take the feature with the highest p-value. If ```p > SL_STAY``` go to **STEP 4**, else **END**.
- **STEP 4**: Remove that feature.
- **STEP 5**: Refit the model with the remaining features and go back to **STEP 3**.
- **END**: 🥳 Your model is ready 🥳
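
The backward elimination steps can be sketched as a small loop. This is only a sketch of the control flow, not a full implementation: `p_values_for` is a hypothetical callback that you would write yourself to fit a model on the given features and return one p-value per feature. The `toy_p_values` stub and its numbers are made up for illustration and are not computed from any dataset.

```python
from typing import Callable, Dict, List


def backward_elimination(
    features: List[str],
    p_values_for: Callable[[List[str]], Dict[str, float]],
    sl_stay: float = 0.05,
) -> List[str]:
    """Remove the least significant feature until all p-values are <= sl_stay."""
    selected = list(features)  # STEP 2: start with the all-in model
    while selected:
        p_values = p_values_for(selected)  # STEP 3: fit and read the p-values
        worst = max(selected, key=lambda name: p_values[name])
        if p_values[worst] <= sl_stay:
            break  # END: every remaining feature is significant
        selected.remove(worst)  # STEP 4, then the loop refits (STEP 5)
    return selected


def toy_p_values(current: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a real model fit (made-up p-values)."""
    made_up = {"x1": 0.001, "x2": 0.60, "x3": 0.07, "x4": 0.95}
    return {name: made_up[name] for name in current}


kept = backward_elimination(["x1", "x2", "x3", "x4"], toy_p_values)
print(kept)
```

In a real notebook the p-values change after every refit, so the callback has to fit the model again each time it is called. The stub above returns fixed numbers only to keep the example short.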

#### 3. Forward selection
- **STEP 1**: Select a *statistical significance* level to **enter** the model. ```example: SL_ENTER = 0.05 (5%)```
- **STEP 2**: Fit a simple linear regression of y on every single feature x(n) and select the feature with the lowest p-value.
- **STEP 3**: Keep the selected features in the model and try adding each of the remaining features, one at a time.
- **STEP 4**: Find the p-value of each newly added feature and take the one with the lowest p-value. If ```p < SL_ENTER``` add it and go to **STEP 3**, else **END**.
- **END**: 🥳 Keep your previous model (the one without the last candidate), that's the model you are looking for 🥳
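
Forward selection can be sketched the same way. The callback `p_value_when_added` is hypothetical: it stands for "fit the model with the already selected features plus one candidate and return the candidate's p-value". The stub numbers are made up for illustration. As an assumption of this sketch, the SL_ENTER check is also applied to the very first feature.

```python
from typing import Callable, Dict, List


def forward_selection(
    features: List[str],
    p_value_when_added: Callable[[List[str], str], float],
    sl_enter: float = 0.05,
) -> List[str]:
    """Add the most significant remaining feature while its p-value < sl_enter."""
    selected: List[str] = []
    remaining = list(features)
    while remaining:
        # STEP 2 / STEP 3: try each remaining feature on top of the current model
        p_values = {
            name: p_value_when_added(selected, name) for name in remaining
        }
        best = min(remaining, key=lambda name: p_values[name])
        if p_values[best] >= sl_enter:
            break  # END: keep the previous model
        selected.append(best)  # STEP 4: the candidate enters the model
        remaining.remove(best)
    return selected


MADE_UP_P_VALUES: Dict[str, float] = {"x1": 0.001, "x2": 0.60, "x3": 0.03, "x4": 0.95}


def toy_p_value(selected: List[str], candidate: str) -> float:
    """Hypothetical stand-in for a real fit; ignores `selected` for brevity."""
    return MADE_UP_P_VALUES[candidate]


chosen = forward_selection(["x1", "x2", "x3", "x4"], toy_p_value)
print(chosen)
```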

#### 4. Bidirectional elimination
- **STEP 1**: Select *statistical significance* levels to **enter** and to **stay** in the model. ```example:  SL_ENTER = 0.05 (5%) & SL_STAY = 0.05 (5%)```
- **STEP 2**: Perform **Forward selection** to add feature variables to the model (SL_ENTER = 0.05).
- **STEP 3**: Perform all steps of **Backward elimination** on the selected set (SL_STAY = 0.05), then go back to **STEP 2**.
- **STEP 4**: Repeat **STEP 2 & 3** until no variable can enter or leave the model, then **END**.
- **END**: 🥳 Your model is ready 🥳

#### 5. All Possible Models
- **STEP 1**: Select one goodness-of-fit criterion ```example: R^2```
- **STEP 2**: Construct all possible models from the N features ```ie, N features give (2^N)-1 combinations in total```
- **STEP 3**: Find the best model among them by applying the criterion
- **END**: 🥳 Your model is ready 🥳

> Note : If you have 10 features, you need to fit 1023 models to pick the best one 😫.
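
To see where the (2^N)-1 count comes from, the feature subsets can be enumerated with the standard library. This is a small sketch that only lists the candidate feature sets and does not fit any model. The helper name `all_feature_subsets` is made up, and "State" is treated as a single feature here.

```python
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty subset of `features` as a tuple."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


features = ["R&D Spend", "Administration Spend", "Marketing Spend", "State"]
subsets = list(all_feature_subsets(features))
print(len(subsets))  # 15, which is 2**4 - 1

# The same count for 10 features, as in the note above
count_for_ten = sum(1 for _ in all_feature_subsets(range(10)))
print(count_for_ten)  # 1023, which is 2**10 - 1
```

Each tuple would then be used to fit one model and score it with the chosen criterion.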

## Adjusted R squared

- We have all learned about [R squared](https://render.githubusercontent.com/view/ipynb?color_mode=light&commit=926f6a3db7d36af0e4b5a5e1760a7ec2366cb4a1&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f73616e6a617973616e6a753631382f4d616368696e652d6c6561726e696e672f393236663661336462376433366166306534623561356531373630613765633233363663623461312f6e6f7465626f6f6b2f382532302e25323052656772657373696f6e2532304d6f64656c25323053656c656374696f6e2e6970796e62&nwo=sanjaysanju618%2FMachine-learning&path=notebook%2F8+.+Regression+Model+Selection.ipynb&repository_id=402282035&repository_type=Repository#R-square-(R%5E2)), which is a great measure that helps us evaluate model performance.

<img src="../images/r_squared_eqn.png" alt="r_squared_eqn.png" width="500">

- But there is one problem with it! Guess what? Answer this question: "What happens to the R squared value if you add a new feature to the model?"
- The answer is that your R squared value never decreases, and it almost always increases! Why? You may think the new variable is not important for the prediction, but that feature still has a very small impact on it, let's say about 0.0001% of dependence, and the fit uses it to reduce the error a little.
- Then how do we judge the performance of the new model 🤔?
- Here comes our hero, **Adjusted R squared** 😎. It penalizes every added feature, so it goes up only when the new feature improves the fit by more than the penalty costs.

<img src="../images/adj_r_squared_eqn.png" alt="adj_r_squared_eqn.png" width="500">
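
As a quick check of the idea, here is a minimal sketch using the standard form of the formula, `1 - (1 - R^2) * (n - 1) / (n - p - 1)`, where `n` is the number of samples and `p` is the number of features. The symbol names may differ from the ones in the image above, and the R squared values in the example are hypothetical numbers chosen only for illustration.

```python
def adjusted_r_squared(r_squared, n_samples, n_features):
    """Adjusted R squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1 - (1 - r_squared) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical numbers: adding a 4th feature lifts R squared only slightly
before = adjusted_r_squared(0.9500, n_samples=50, n_features=3)
after = adjusted_r_squared(0.9501, n_samples=50, n_features=4)
print(before, after)
print(after < before)  # True: the tiny gain does not pay for the extra feature
```

So even though R squared went up from 0.9500 to 0.9501, the adjusted value went down, which suggests the new feature is not worth keeping in this made-up case.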