Content
- AdaBoost
- Cascading

# **AdaBoost**

<img src="https://miro.medium.com/max/1400/1*DwvwMlOcT1T9hZwIJvMfng.png" >

AdaBoost stands for Adaptive Boosting.


#### What does AdaBoost do?

Let's try to understand it using a simple example with 2-D data.

Let us assume a toy dataset with negative and positive points.






<img src='https://drive.google.com/uc?id=1Og9gvnSUpx3uqaS12NUybZatrwLWhIqZ' >


First, we build a weak classifier, i.e. a single horizontal or vertical line that splits the data.

Here,
* $ϵ$ is the weighted error of the model, i.e. the total weight of the points it misclassifies

* $α$ measures how good the model is, and is used as the weight of the model in the final ensemble
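To make the link between $ϵ$ and $α$ concrete, here is a minimal sketch. It assumes the standard discrete AdaBoost formula $α = \frac{1}{2}\ln\frac{1-ϵ}{ϵ}$, which these notes do not spell out; the function name `model_weight` is only illustrative.

```python
import math

def model_weight(eps: float) -> float:
    """Weight (alpha) of a weak classifier with weighted error eps.

    Assumes the standard discrete AdaBoost formula
    alpha = 0.5 * ln((1 - eps) / eps), valid for 0 < eps < 1.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be strictly between 0 and 1")
    return 0.5 * math.log((1.0 - eps) / eps)

# A lower error gives a larger weight; eps = 0.5 (random guessing) gives weight 0.
for eps in (0.1, 0.3, 0.5):
    print(f"eps={eps:.1f} -> alpha={model_weight(eps):.3f}")
```

Under this formula, a model that is no better than a coin flip gets no say in the final vote.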


<img src='https://drive.google.com/uc?id=1e17PWvIvvhH0NeL_hAB45iFI8pBZAnPt' >



After building the first model, we give more weight to the misclassified points while building the second model.

Because the misclassified points now carry more weight,
- the second model will focus on getting those points right.


<img src='https://drive.google.com/uc?id=1EauFehFWBMZB7scQaBKgv9bcTYwv9N_f' >
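The reweighting step can be sketched as follows. This assumes labels in $\{-1, +1\}$ and the usual discrete AdaBoost update $w_i \leftarrow w_i \, e^{-α y_i h(x_i)}$ followed by normalisation. The helper `update_weights` and the toy numbers are hypothetical.

```python
import math

def update_weights(weights, y_true, y_pred, alpha):
    """Return new, normalised sample weights after one boosting round.

    Assumes labels and predictions are -1 or +1. Misclassified points
    are multiplied by exp(+alpha) and correct ones by exp(-alpha), so
    mistakes count for more when the next model is built.
    """
    raw = [w * math.exp(-alpha * yt * yp)
           for w, yt, yp in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return [r / total for r in raw]

# Toy round: 5 points with equal weights; only the last one is misclassified.
weights = [0.2] * 5
y_true = [+1, +1, -1, -1, +1]
y_pred = [+1, +1, -1, -1, -1]

eps = sum(w for w, yt, yp in zip(weights, y_true, y_pred) if yt != yp)
alpha = 0.5 * math.log((1 - eps) / eps)
new_weights = update_weights(weights, y_true, y_pred, alpha)
print([round(w, 3) for w in new_weights])
```

With this update, the single misclassified point ends up carrying half of the total weight, so the next model cannot ignore it.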



Again, after the second model,

a few points will still be misclassified, so these points are given more weight while building the next model.

#### What is it doing intuitively?
In every round we increase the weights of the points that the current model gets wrong, so each new model concentrates on the points that are hardest to classify.

So, to get the final model, we multiply each model's output by its weight $α$ and add the results.

The output is the sign of this weighted sum, i.e. +ve / -ve.

<img src='https://drive.google.com/uc?id=1PX8ikROK910A_ms432Wfm6CB0vt7yg9H' >
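Putting the steps together, here is a minimal from-scratch sketch of the whole procedure. It assumes the standard discrete AdaBoost formulas ($α = \frac{1}{2}\ln\frac{1-ϵ}{ϵ}$ and exponential reweighting) and labels in $\{-1, +1\}$. The toy data and all function names are made up for illustration, and only the standard library is used.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """Weak classifier: one axis-aligned split (a vertical or horizontal line in 2-D)."""
    return polarity if x[feature] > threshold else -polarity

def fit_stump(X, y, w):
    """Brute-force search for the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (+1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost_fit(X, y, n_rounds=3):
    """Return a list of (alpha, feature, threshold, polarity) tuples."""
    w = [1.0 / len(X)] * len(X)                  # start with equal weights
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, polarity = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this model
        ensemble.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified points, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, feature, threshold, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted sum of the weak classifiers' votes."""
    score = sum(alpha * stump_predict(x, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy 2-D data: the negatives sit in a band in the middle of the first axis.
X = [(1, 2), (2, 1), (3, 2), (4, 1), (5, 2), (6, 1)]
y = [+1, +1, -1, -1, +1, +1]

ensemble = adaboost_fit(X, y, n_rounds=3)
for alpha, feature, threshold, polarity in ensemble:
    print(f"alpha={alpha:.3f}  feature={feature}  threshold={threshold}  polarity={polarity:+d}")
print("predictions:", [adaboost_predict(ensemble, x) for x in X])
```

On this toy set no single split classifies all six points correctly, but the weighted vote of three splits does.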




### Drawbacks of AdaBoost

The major drawback of AdaBoost is that
- unlike GBDT, we do not have the flexibility to choose the loss function.
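For reference, in the usual view of AdaBoost as stagewise additive modelling, the loss it is tied to is the exponential loss $e^{-y f(x)}$, where $f(x)$ is the weighted sum of the weak models and $y \in \{-1, +1\}$. A minimal sketch (the function name is illustrative):

```python
import math

def exponential_loss(y, score):
    """Exponential loss exp(-y * score) for a label y in {-1, +1}.

    `score` is the real-valued ensemble output, i.e. the weighted sum of
    the weak models before taking the sign.
    """
    return math.exp(-y * score)

# A confidently correct score is penalised lightly, a confidently wrong one heavily.
for score in (2.0, 0.0, -2.0):
    print(f"y=+1, score={score:+.1f} -> loss={exponential_loss(+1, score):.3f}")
```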

### Code walkthrough

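```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier as DTC

# AdaBoost with a depth-5 decision tree as the base learner
model2 = AdaBoostClassifier(DTC(max_depth=5))

# Hyperparameter grid to search over
params = {
    "n_estimators": [50, 100, 150],
    "learning_rate": [0.1, 0.2, 0.3],
}

# 5-fold cross-validated grid search.
# X_train and Y_train are assumed to have been prepared earlier.
clf = GridSearchCV(model2, params, cv=5, scoring="accuracy")
clf.fit(X_train, Y_train)
```

```
GridSearchCV(cv=5,
             estimator=AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=5)),
             param_grid={'learning_rate': [0.1, 0.2, 0.3],
                         'n_estimators': [50, 100, 150]},
             scoring='accuracy')
```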

```python
res = clf.cv_results_

# Print the mean cross-validation accuracy and rank of every parameter combination
for i in range(len(res["params"])):
    print(f"Parameters: {res['params'][i]}\tMean_score: {res['mean_test_score'][i]}\tRank: {res['rank_test_score'][i]}")
```

```
Parameters: {'learning_rate': 0.1, 'n_estimators': 50}	Mean_score: 0.8614918585655154	Rank: 8
Parameters: {'learning_rate': 0.1, 'n_estimators': 100}	Mean_score: 0.8719566331830402	Rank: 6
Parameters: {'learning_rate': 0.1, 'n_estimators': 150}	Mean_score: 0.8739227127348593	Rank: 5
Parameters: {'learning_rate': 0.2, 'n_estimators': 50}	Mean_score: 0.8550859655759601	Rank: 9
Parameters: {'learning_rate': 0.2, 'n_estimators': 100}	Mean_score: 0.8808971375511561	Rank: 4
Parameters: {'learning_rate': 0.2, 'n_estimators': 150}	Mean_score: 0.889522615475674	Rank: 2
Parameters: {'learning_rate': 0.3, 'n_estimators': 50}	Mean_score: 0.86821454781015	Rank: 7
Parameters: {'learning_rate': 0.3, 'n_estimators': 100}	Mean_score: 0.8848932914290243	Rank: 3
Parameters: {'learning_rate': 0.3, 'n_estimators': 150}	Mean_score: 0.8917423613025894	Rank: 1
```

The best combination in this run is `learning_rate = 0.3` with `n_estimators = 150`, with a mean accuracy of about 0.892.


# **Cascading**

Let's assume we have to detect whether a transaction is fraudulent or not.

Let the dataset be $D_1$, which will be imbalanced, with

- $y = 1$ for a fraudulent transaction
- $y = 0$ for a non-fraudulent transaction

For a query point $x_q$, 
- we will pass this point through the first model $M_1$
- Model $M_1$ will return the probability of the query point being a fraud


Based on this probability, we split the data into 2 parts:
- if the probability of $y$ being 1 is extremely low, say $< 0.001$, then
    - we consider the point as not fraudulent; let this data be $D_1'$.
- otherwise, we are not yet sure about the point.
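The split made by a single model can be sketched as follows. The helper `split_by_threshold` and the toy scores are hypothetical; any callable that returns $P(y = 1 | x)$ can stand in for $M_1$.

```python
def split_by_threshold(points, fraud_prob, threshold=0.001):
    """Split points into (cleared, uncertain) using a model's fraud probability.

    `fraud_prob` is any callable returning P(y = 1 | x); here it stands in
    for model M_1. Points scoring below `threshold` are treated as not
    fraudulent, everything else stays uncertain.
    """
    cleared, uncertain = [], []
    for x in points:
        (cleared if fraud_prob(x) < threshold else uncertain).append(x)
    return cleared, uncertain

# Hypothetical stand-in for M_1: each "transaction" is simply its own score.
transactions = [0.0002, 0.4, 0.0009, 0.03, 0.95]
d1_prime, rest = split_by_threshold(transactions, fraud_prob=lambda x: x)
print("D_1' (not fraudulent):", d1_prime)
print("still uncertain:", rest)
```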

#### What happens to the rest of the data?

The rest of the points ($D_1 - D_1'$), i.e. the data with probability $\geq 0.001$ that we are not sure about,
- will be passed through the next model $M_2$
- Model $M_2$ will be stricter, i.e. it will penalize more.

Again, model $M_2$ will split the data into 2 parts:
- non-fraudulent (say, $P(y = 1 | x_q) < 0.001$)
- possibly fraudulent transactions ($P(y = 1 | x_q) \geq 0.001$)

We can again add another model after $M_2$, which will work on the same principle.





<img src='https://drive.google.com/uc?id=1TwfqSCDjjS3MsXaBadxNBIvBfF9LQ0JC' >

#### Did you notice the structure of the model?
We are cascading one model after another.

In the first model, we are just removing all the clearly genuine transactions
- in the second model, we are trying to find the possibly fraudulent points in the second dataset,

and we continue doing this, which is why it is called **cascading**.

Every model is trained on a different dataset ($D_n - D_n'$).

If we are still not sure even after all these models, a human at the end will verify the transaction.

<img src='https://drive.google.com/uc?id=16cjT66tLCnFRuGzGPVmw4KUOgz0k85Ot' >
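The whole cascade, including the final human check, can be sketched like this. It is an illustration under assumptions: every stage uses the same 0.001 threshold, and the precomputed scores `m1` and `m2` are made-up stand-ins for trained models $M_1$ and $M_2$.

```python
def run_cascade(points, stages, threshold=0.001):
    """Pass points through a list of models, one after another.

    Each stage is a callable returning P(y = 1 | x). A point leaves the
    cascade as "not fraudulent" as soon as one stage scores it below
    `threshold`; whatever survives every stage is left for human review.
    Returns (cleared_per_stage, needs_review).
    """
    remaining = list(points)
    cleared_per_stage = []
    for fraud_prob in stages:
        cleared, uncertain = [], []
        for x in remaining:
            (cleared if fraud_prob(x) < threshold else uncertain).append(x)
        cleared_per_stage.append(cleared)  # D_n'
        remaining = uncertain              # D_n - D_n', input to the next stage
    return cleared_per_stage, remaining

# Hypothetical transactions with made-up scores from two models.
transactions = [
    {"id": "a", "m1": 0.0002, "m2": 0.0001},
    {"id": "b", "m1": 0.2, "m2": 0.0005},
    {"id": "c", "m1": 0.6, "m2": 0.9},
]
stages = [lambda t: t["m1"], lambda t: t["m2"]]

cleared_per_stage, needs_review = run_cascade(transactions, stages)
for n, cleared in enumerate(cleared_per_stage, start=1):
    print(f"cleared by M_{n}:", [t["id"] for t in cleared])
print("sent to a human:", [t["id"] for t in needs_review])
```

Each stage only ever sees the points that the previous stages could not clear, which is why the datasets shrink as we move down the cascade.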

