<a href="https://colab.research.google.com/github/shaifali1102/Supervised-Learning/blob/main/OtherEnsembleTechniques_Practise.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# XGBoost
  - sklearn's GBDT is computationally expensive; hyperparameter tuning takes a long time (took >30 mins)
  - XGBoost is an optimized implementation of GBDT
      - reduces model training time
      - parallelization of split-feature selection
          - the Information Gain of each of the n candidate features used for splitting is computed in parallel
      - parallelization in building DTs
          - while building a DT, the left and right subtrees can be built in parallel, as there is no dependency between them
      - optimized thresholding for numerical features
          - in plain DTs, every value of a numerical feature is tested as a threshold to find the maximum IG
          - XGBoost uses histogram-based binning
              - it groups the continuous values into discrete, percentile-based bins
              - it selects the threshold from the bin boundaries instead of trying every single value (an approximation)
  - hyperparams
      - eta: learning rate, a shrinkage/regularization term
      - min_split_loss (alias gamma): minimum loss reduction (gain) required to split a node further
      - max_depth: depth of base DTs
      - subsample: row sampling rate
      - colsample_bytree, colsample_bylevel, colsample_bynode: column sampling rate for each tree, each level and each split respectively
      - reg_alpha: L1 regularization term on weights
      - reg_lambda: L2 regularization term on weights
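
The histogram-based threshold search described above can be illustrated with a simplified NumPy sketch. This is a toy under stated assumptions, not XGBoost's internal algorithm: it scores each candidate threshold with the XGBoost-style gain for squared-error loss (the `min_split_loss`/gamma penalty is omitted) and compares trying every distinct value against trying only percentile-based bin edges. The helpers `split_gain` and `best_threshold`, the 256-bin budget and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def split_gain(g, h, left, lam=1.0):
    """XGBoost-style gain of splitting a node into `left` / `~left` (gamma omitted)."""
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)
    return 0.5 * (score(g[left], h[left]) + score(g[~left], h[~left]) - score(g, h))


def best_threshold(x, g, h, candidates, lam=1.0):
    """Try each candidate threshold t (split rule: x <= t) and keep the highest gain."""
    best_t, best_gain = None, -np.inf
    for t in candidates:
        left = x <= t
        if left.all() or not left.any():
            continue  # skip splits that leave one child empty
        gain = split_gain(g, h, left, lam)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Hypothetical synthetic data: one numerical feature, the target flips at x = 0.3.
rng = np.random.default_rng(0)
x = rng.normal(size=2_000)
y = (x > 0.3).astype(float)
pred = np.full_like(y, y.mean())  # current model prediction (a constant)
g = pred - y                      # gradient of 0.5 * (pred - y)**2
h = np.ones_like(y)               # hessian of the same loss

# Exact search: every distinct value is a candidate threshold.
exact = np.unique(x)
t_exact, gain_exact = best_threshold(x, g, h, exact)

# Histogram search: only 255 percentile-based bin edges (256 bins).
hist = np.percentile(x, np.linspace(0, 100, 257)[1:-1])
t_hist, gain_hist = best_threshold(x, g, h, hist)

print(f"exact: {len(exact)} candidates, t={t_exact:.3f}, gain={gain_exact:.2f}")
print(f"hist:  {len(hist)} candidates, t={t_hist:.3f}, gain={gain_hist:.2f}")
```

Both searches land close to the true cut at 0.3, but the histogram search evaluates far fewer candidates. The number of histogram candidates stays fixed as the data grows, while the number of distinct values grows with n.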
  
# LightGBM
  - published by Microsoft Research in 2017
  - further optimized, which often makes it train even faster than XGBoost
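  - GOSS (Gradient-based One-Side Sampling)
      - data points with small gradients (pseudo-residuals) are already well fit, so LightGBM keeps all points with large gradients and randomly samples only a fraction of the small-gradient points
      - this smart sampling reduces the size of the training data used for each tree
      - one-side sampling
          - if we plot the error distribution, the large-error side is kept in full and only the small-error side is subsampled; the sampled small-error points are up-weighted to compensate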
  - EFB (Exclusive Feature Bundling)
      - features that are mutually exclusive (never or rarely non-zero at the same time) are bundled into one new feature
      - e.g., if Gender is one-hot encoded into Male and Female columns, exactly one of them is 1 in each row, so the two columns are always exclusive
      - this reduces dimensionality
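
GOSS can be sketched in a few lines of NumPy. This is a simplified illustration, not LightGBM's code: `goss_sample` is a hypothetical helper whose `top_rate` / `other_rate` arguments mirror the names of LightGBM's GOSS parameters. It keeps the `top_rate` fraction of rows with the largest |gradient|, randomly samples an `other_rate` fraction of the remaining rows, and up-weights those sampled small-gradient rows by (1 - top_rate) / other_rate so that gradient sums stay roughly unbiased.

```python
import numpy as np


def goss_sample(grad, top_rate=0.2, other_rate=0.1, seed=None):
    """Return (indices, weights) of the rows kept by one round of GOSS."""
    rng = np.random.default_rng(seed)
    n = len(grad)
    n_top, n_other = int(top_rate * n), int(other_rate * n)

    order = np.argsort(-np.abs(grad))  # row indices sorted by |gradient|, largest first
    top_idx = order[:n_top]            # large-gradient (large-error) rows: always kept
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)  # random small-gradient rows

    idx = np.concatenate([top_idx, other_idx])
    weights = np.ones(len(idx))
    weights[n_top:] = (1 - top_rate) / other_rate  # compensate for the dropped small-gradient rows
    return idx, weights


# Hypothetical pseudo-residuals for 1,000 training rows.
grad = np.random.default_rng(0).normal(size=1_000)
idx, weights = goss_sample(grad, seed=0)
print(len(idx), "of", len(grad), "rows kept")  # 300 of 1000 rows kept
```

The next tree would then be fit on only the 300 selected rows, using `weights` as sample weights.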

# Stacking
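  - stacking (stacked generalization) combines multiple diverse base models at level 0 through a meta model at level 1 to produce a more accurate prediction
  - instead of simple averaging, the meta model is trained to learn the best way to combine predictions from diverse models (LR, KNN, DTs, SVMs, etc.)
  - predictions from the base models are used as input features to the meta model; class probabilities can also be used as inputs
      - these inputs should be out-of-fold (cross-validated) predictions; otherwise the meta model learns from predictions on data the base models have already seen
  - the meta model's prediction is taken as the final prediction
  - mainly used in Kaggle competitions
  - high time complexity, since every base model must be trained (often once per CV fold) before the meta model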
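
scikit-learn's `StackingClassifier` implements this pattern. The following is a minimal sketch on the built-in Iris dataset, chosen only for illustration: KNN, a decision tree and an SVM form level 0, logistic regression is the level-1 meta model, `cv=5` makes the meta model train on out-of-fold predictions, and `stack_method="predict_proba"` feeds it class probabilities. The specific models and settings are assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=42)

# Level 0: diverse base models (distance-based models get feature scaling).
base_models = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("dt", DecisionTreeClassifier(max_depth=3, random_state=42)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=42))),
]

# Level 1: the meta model learns how to combine the base models' class probabilities.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1_000),
    cv=5,                         # out-of-fold predictions are used to train the meta model
    stack_method="predict_proba",
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
print(f"stacked test accuracy: {acc:.3f}")
```

With 3 base models and 3 classes, the meta model sees 9 input features (one probability per class per base model), which `stack.transform(X_te)` exposes.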

# Cascading
  - chaining of multiple models in stages
  - each subsequent model only processes the samples that the previous, simpler model couldn't confidently classify
      - if a sample passes a confidence threshold at an early stage, it exits the system; otherwise, it moves on to the next stage
  - early stages are usually fast and simple, rejecting the obvious non-targets, while later stages are more complex and handle the difficult cases
  - by eliminating most negative samples early, the system saves time and resources, making it ideal for real-time applications
  - complex, specialized models at later stages refine the results, reducing false positives
  - e.g., classifying a transaction as fraud or not, where the data is imbalanced:
      - y=0 (not fraud) is the majority class
      - y=1 (fraud) is the minority class
  - used in industries where the cost of misclassification is high
      - e.g., cancer detection, the financial domain, etc.
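
A two-stage cascade can be sketched with scikit-learn. Assumptions: synthetic imbalanced data from `make_classification`, a logistic regression as the fast first stage, a random forest as the slower second stage, and a 0.95 confidence threshold; `cascade_predict` is a hypothetical helper. For brevity both stages are trained on the full training set, whereas in practice the later stage is often trained on the harder samples that earlier stages pass on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (about 95% y=0, 5% y=1).
X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stage 1: fast, simple model. Stage 2: slower, more complex model.
stage1 = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
stage2 = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)


def cascade_predict(X_batch, threshold=0.95):
    """Return predictions and a mask of the samples that exited at stage 1."""
    proba1 = stage1.predict_proba(X_batch)
    pred = stage1.classes_[proba1.argmax(axis=1)]
    confident = proba1.max(axis=1) >= threshold  # these samples exit the cascade early
    if (~confident).any():
        # only the uncertain samples are passed on to the expensive second stage
        pred[~confident] = stage2.predict(X_batch[~confident])
    return pred, confident


pred, exited_early = cascade_predict(X_te)
print(f"{exited_early.mean():.0%} of test samples exited at stage 1")
print(f"cascade accuracy: {(pred == y_te).mean():.3f}")
```

Raising `threshold` sends more samples to the second stage (slower, potentially more accurate); lowering it lets more samples exit early.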



In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
import pickle

!gdown 171Yoe_GSapyrmOnD9oBzHWNOD_OnQs0F
!gdown 1hnIlTPW3AMeB69EbeaXCRIrpMVT1Vwmc
!gdown 1nZtB_RtxMg_MgoRczb8UWQX-AEK_l3qE
!gdown 1zLDUErwKdmF-RacOyHEuI_z_46LssQtP


with open('X_train.pickle', 'rb') as handle:
    X_train = pickle.load(handle)

with open('X_test.pickle', 'rb') as handle:
    X_test = pickle.load(handle)

with open('Y_train.pickle', 'rb') as handle:
    y_train = pickle.load(handle)

with open('Y_test.pickle', 'rb') as handle:
    y_test = pickle.load(handle)

Downloading...
From: https://drive.google.com/uc?id=171Yoe_GSapyrmOnD9oBzHWNOD_OnQs0F
To: /content/Y_test.pickle
100% 31.7k/31.7k [00:00<00:00, 56.9MB/s]
Downloading...
From: https://drive.google.com/uc?id=1hnIlTPW3AMeB69EbeaXCRIrpMVT1Vwmc
To: /content/X_test.pickle
100% 253k/253k [00:00<00:00, 66.3MB/s]
Downloading...
From: https://drive.google.com/uc?id=1nZtB_RtxMg_MgoRczb8UWQX-AEK_l3qE
To: /content/Y_train.pickle
100% 126k/126k [00:00<00:00, 73.3MB/s]
Downloading...
From: https://drive.google.com/uc?id=1zLDUErwKdmF-RacOyHEuI_z_46LssQtP
To: /content/X_train.pickle
100% 1.01M/1.01M [00:00<00:00, 124MB/s]


# XGBoost

In [None]:
from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import StratifiedKFold

params = {
    'n_estimators': [50,100,150,200],
    'max_depth': [3, 4, 5, 7],
    'learning_rate': [0.1,0.2,0.3],
    "subsample": [0.6,0.8,1.0],
    "colsample_bytree": [0.6,0.8,1.0]
}

In [None]:
xgb = XGBClassifier(objective="multi:softmax", verbosity=0)
# multi:softmax = multi-class classification, predicts the class label directly
# num_class is inferred from y_train by the sklearn wrapper; "num_Class" and "silent" are not valid here (XGBoost reports them as unused)
# verbosity=0 suppresses XGBoost's log messages during training (it replaces the deprecated `silent` flag)

In [None]:
rs = RandomizedSearchCV(xgb, params, n_iter=10, scoring='accuracy', cv=3, n_jobs=-1,verbose=2)

rs.fit(X_train, y_train)

Fitting 3 folds for each of 10 candidates, totalling 30 fits


Parameters: { "num_Class", "silent" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


In [None]:
res = rs.cv_results_

for i in range(len(res['params'])):
  print(f"Params: {res['params'][i]} Mean_Score: {res['mean_test_score'][i]} Rank: {res['rank_test_score'][i]}")


Params: {'subsample': 1.0, 'n_estimators': 150, 'max_depth': 4, 'learning_rate': 0.2, 'colsample_bytree': 1.0} Mean_Score: 0.9561770674784373 Rank: 7
Params: {'subsample': 0.8, 'n_estimators': 50, 'max_depth': 7, 'learning_rate': 0.2, 'colsample_bytree': 0.8} Mean_Score: 0.957572298325723 Rank: 5
Params: {'subsample': 0.8, 'n_estimators': 200, 'max_depth': 4, 'learning_rate': 0.3, 'colsample_bytree': 1.0} Mean_Score: 0.9645484525621512 Rank: 2
Params: {'subsample': 0.8, 'n_estimators': 100, 'max_depth': 4, 'learning_rate': 0.1, 'colsample_bytree': 1.0} Mean_Score: 0.9370877727042112 Rank: 9
Params: {'subsample': 1.0, 'n_estimators': 100, 'max_depth': 5, 'learning_rate': 0.2, 'colsample_bytree': 0.8} Mean_Score: 0.9587138508371386 Rank: 4
Params: {'subsample': 0.8, 'n_estimators': 150, 'max_depth': 7, 'learning_rate': 0.1, 'colsample_bytree': 0.8} Mean_Score: 0.9647387113140536 Rank: 1
Params: {'subsample': 1.0, 'n_estimators': 150, 'max_depth': 7, 'learning_rate': 0.2, 'colsample_bytre

In [None]:
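# rs.best_estimator_ is already refit on the full training set with the best params (refit=True by default).
# Don't wrap it in XGBClassifier(...): passing it positionally does not copy its hyperparameters
# (XGBoost reports them as unused, see the warning below).
best_xgb = rs.best_estimator_

print(best_xgb.score(X_train, y_train))
print(best_xgb.score(X_test, y_test))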

Parameters: { "objective__colsample_bytree", "objective__enable_categorical", "objective__learning_rate", "objective__max_depth", "objective__missing", "objective__n_estimators", "objective__num_Class", "objective__objective", "objective__silent", "objective__subsample" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


1.0
0.9771747400456505


# LightGBM

In [None]:
from lightgbm import LGBMClassifier

params = {
    'learning_rate': [0.1,0.3,0.5],
    'boosting_type': ['gbdt'],
    'objective': ['multiclass'],
    'max_depth': [5,6,7,8],
    'colsample_bytree': [0.5,0.7],
    'subsample': [0.5,0.7],
    'metric': ['multi_error']
}

lgbm = LGBMClassifier(num_classes=20)

rs = RandomizedSearchCV(lgbm, params, n_iter=10, n_jobs=-1, cv=3, verbose=1)

In [None]:
rs.fit(X_train, y_train)

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001238 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2040
[LightGBM] [Info] Number of data points in the train set: 15768, number of used features: 8
[LightGBM] [Info] Start training from score -2.982377
[LightGBM] [Info] Start training from score -2.993705
[LightGBM] [Info] Start training from score -3.016753
[LightGBM] [Info] Start training from score -2.986139
[LightGBM] [Info] Start training from score -3.010297
[LightGBM] [Info] Start training from score -3.014166
[LightGBM] [Info] Start training from score -3.037696
[LightGBM] [Info] Start training from score -3.011585
[LightGBM] [Info] Start training from score -2.972414
[LightGBM] [Info] Start training from score -2.988654
[LightGBM] [Info] Start training from score -3.021948
[LightGBM] [Info] Start training from score -2.962550
[Li

In [None]:
res = rs.cv_results_

for i in range(len(res['params'])):
  print(f"Params: {res['params'][i]} Mean_Score: {res['mean_test_score'][i]} Rank: {res['rank_test_score'][i]}")


Params: {'subsample': 0.7, 'objective': 'multiclass', 'metric': 'multi_error', 'max_depth': 5, 'learning_rate': 0.3, 'colsample_bytree': 0.5, 'boosting_type': 'gbdt'} Mean_Score: 0.9629629629629629 Rank: 3
Params: {'subsample': 0.7, 'objective': 'multiclass', 'metric': 'multi_error', 'max_depth': 6, 'learning_rate': 0.5, 'colsample_bytree': 0.7, 'boosting_type': 'gbdt'} Mean_Score: 0.1041349568746829 Rank: 10
Params: {'subsample': 0.5, 'objective': 'multiclass', 'metric': 'multi_error', 'max_depth': 6, 'learning_rate': 0.5, 'colsample_bytree': 0.5, 'boosting_type': 'gbdt'} Mean_Score: 0.1924150177574835 Rank: 8
Params: {'subsample': 0.5, 'objective': 'multiclass', 'metric': 'multi_error', 'max_depth': 5, 'learning_rate': 0.5, 'colsample_bytree': 0.5, 'boosting_type': 'gbdt'} Mean_Score: 0.1513191273465246 Rank: 9
Params: {'subsample': 0.7, 'objective': 'multiclass', 'metric': 'multi_error', 'max_depth': 6, 'learning_rate': 0.1, 'colsample_bytree': 0.5, 'boosting_type': 'gbdt'} Mean_Sco

In [None]:
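# rs.best_estimator_ is an LGBMClassifier already refit on the full training set with the best params
best_lgbm = rs.best_estimator_

# score the tuned LightGBM model itself (not the XGBoost model from the previous section)
print(best_lgbm.score(X_train, y_train))
print(best_lgbm.score(X_test, y_test))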

Parameters: { "objective__boosting_type", "objective__colsample_bytree", "objective__importance_type", "objective__learning_rate", "objective__max_depth", "objective__metric", "objective__min_child_samples", "objective__min_child_weight", "objective__min_split_gain", "objective__n_estimators", "objective__num_classes", "objective__num_leaves", "objective__objective", "objective__reg_alpha", "objective__reg_lambda", "objective__subsample", "objective__subsample_for_bin", "objective__subsample_freq" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


1.0
0.9771747400456505
