## Goal of the notebook

I decided to participate in this competition to improve my understanding of gradient boosting algorithms. I found the notebook by **Awwal Malhi** (https://www.kaggle.com/awwalmalhi/extreme-fine-tuning-lgbm-using-7-step-training) on extreme fine-tuning of LGBM very interesting (special thanks to him) and decided to take a deep dive into it. My notebook is an implementation of his strategy, with some explanations and a few improvements.


## Pretrained LGBM strategy:

This strategy enabled me to improve my score from **0.84198** to **0.84184** (RMSE, lower is better) with a single LGBM model. **Awwal Malhi** already gave some explanations of his strategy, but I will try to add mine.

### How does it work?

* train your best model
* decrease learning rate and train the model again
* decrease regularization params and retrain the model

### Explanations

This strategy is loosely based on **transfer learning**, a technique mostly used with neural networks. In transfer learning, we take a pretrained model and add a head to it. We usually freeze the lower layers (those of the pretrained model) and train the higher layers (those we add on top of it). The same idea applies here:

We create a normal LGBM model and fit it on our data. Once it starts overfitting, we stop the training. We will treat this part of the LGBM model as the pretrained model (by analogy with neural networks).
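
To make "stop once it starts overfitting" concrete, here is a minimal sketch of the early-stopping rule in plain Python. The function name `best_iteration` and the list of validation scores are made up for illustration; LightGBM applies this kind of rule internally when `early_stopping_rounds` is set, so this is only meant to show the logic, not its actual implementation.

```python
def best_iteration(val_scores, patience):
    """Return (best_index, stop_index) for a lower-is-better metric.

    Training stops at the first round where the validation score has not
    improved for `patience` rounds; the model is then rolled back to
    best_index. If that never happens, we stop at the last round.
    """
    best_idx = 0
    for i, score in enumerate(val_scores):
        if score < val_scores[best_idx]:
            best_idx = i  # new best validation score
        elif i - best_idx >= patience:
            return best_idx, i  # no improvement for `patience` rounds
    return best_idx, len(val_scores) - 1


# Hypothetical validation RMSE per boosting round
scores = [0.90, 0.86, 0.85, 0.845, 0.846, 0.847, 0.848, 0.85]
best, stopped = best_iteration(scores, patience=3)
print(f'best round: {best}, stopped at round: {stopped}')
```

The rounds kept up to the best index play the role of the pretrained model in the rest of this notebook.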

After that, to fight overfitting, we decrease the learning rate and resume fitting the pretrained model on our data. In other words, we add more weak learners to the pretrained model (these can be compared to the higher layers of a neural network). The analogy with neural networks holds here too: when training a neural network, it is good practice to decrease the learning rate during training.

Once reducing the learning rate no longer brings a significant improvement, we should increase the complexity of our weak learners. More complex weak learners may perform better, but they are also more likely to overfit. At inference time, we will have weak learners with high bias and low variance (those from the pretrained model) and some that are slightly overfit (low bias, high variance). This is why we reduce the learning rate before adding overfitted weak learners: a lower learning rate reduces the contribution of these overfitted trees to the final prediction.
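
The idea of continuing a boosted model with a smaller learning rate can be sketched with a toy gradient-boosting loop written in plain Python. This is not LightGBM: the weak learners are one-split stumps on a single synthetic feature, the error is measured on the training data only, and all names (`fit_stump`, `boost`, the learning rates 0.3 and 0.03) are assumptions chosen for illustration. It only shows that the second stage appends new, more heavily shrunken learners while the first-stage learners stay frozen.

```python
import math
import random


def fit_stump(xs, residuals):
    """Least-squares stump on one feature: (threshold, left_value, right_value)."""
    points = sorted(set(xs))
    best = None
    for lo, hi in zip(points, points[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1], best[2], best[3]


def predict(ensemble, base, x):
    """Base value plus the shrunken output of every stump in the ensemble."""
    total = base
    for lr, (thr, lv, rv) in ensemble:
        total += lr * (lv if x <= thr else rv)
    return total


def mse(ensemble, base, xs, ys):
    return sum((y - predict(ensemble, base, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def boost(ensemble, base, xs, ys, learning_rate, n_rounds):
    """Append n_rounds stumps fitted to the current residuals; existing stumps are untouched."""
    for _ in range(n_rounds):
        residuals = [y - predict(ensemble, base, x) for x, y in zip(xs, ys)]
        ensemble.append((learning_rate, fit_stump(xs, residuals)))
    return ensemble


# Synthetic 1-D regression data (illustrative only)
random.seed(0)
xs = [i / 20 for i in range(40)]
ys = [math.sin(3 * x) + random.gauss(0, 0.1) for x in xs]
base = sum(ys) / len(ys)

# Stage 1: the "pretrained" model
ensemble = boost([], base, xs, ys, learning_rate=0.3, n_rounds=30)
mse_stage1 = mse(ensemble, base, xs, ys)
frozen = list(ensemble)

# Stage 2: keep the old stumps and add new ones with a 10x smaller learning rate
boost(ensemble, base, xs, ys, learning_rate=0.03, n_rounds=30)
mse_stage2 = mse(ensemble, base, xs, ys)

print(f'training MSE after stage 1: {mse_stage1:.6f}')
print(f'training MSE after stage 2: {mse_stage2:.6f}')
```

Each stump's output is multiplied by the learning rate it was trained with, so the stage 2 stumps can only nudge the prediction. In the notebook, LightGBM's `init_model` argument plays the role of passing the existing `ensemble` back into `boost`.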

I tried many things to increase model complexity and decrease regularization params. I found that the best approach is to increase the number of leaves and decrease the minimum child samples.

**This explanation was simply my understanding of how this strategy works. If you think anything is wrong or you want to add something, feel free to add a comment, I will be glad to read it and find what others think of how it works.**

# Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import optuna
from lightgbm import LGBMRegressor

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split,KFold
from sklearn.preprocessing import LabelEncoder

In [2]:
train=pd.read_csv('../input/tabular-playground-series-feb-2021/train.csv')
test=pd.read_csv('../input/tabular-playground-series-feb-2021/test.csv')

# Preprocessing

In [3]:
cat_var=[f'cat{i}' for i in range(10)]
cont_var=[f'cont{i}' for i in range(14)]
columns=[ col for col in train.columns.tolist() if col not in ['id','target']]

for cat in cat_var:
    le = LabelEncoder()
    train[cat]=le.fit_transform(train[cat])
    test[cat]=le.transform(test[cat])

In [4]:
X=train[columns]
y=train.target

## Hyperparameter tuning with Optuna

I would like to thank **Hamza** (https://www.kaggle.com/hamzaghanmi/lgbm-hyperparameter-tuning-using-optuna), who created an amazing notebook on tuning hyperparameters with Optuna; I learnt a lot from it. Using Optuna and a 5-fold cross-validation strategy, the best result I could get was 0.84201 on the public LB. So for my experiments I decided to use the hyperparameters of **Bizen** (https://www.kaggle.com/hiro5299834/tps-feb-2021-with-single-lgbm-tuned), which gave him a slightly better score of 0.84198 on the public LB.

def objective(trial,train=train,target=y):
    
    X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)
    
    param={
        'num_leaves':trial.suggest_int('num_leaves',100,1000),
        'max_depth':trial.suggest_categorical('max_depth',[7,10,20,50]),
        'min_child_samples':trial.suggest_int('min_child_samples',100,500),
        'max_bin':trial.suggest_categorical('max_bin',[255,350,512,1024]),
        'learning_rate':trial.suggest_categorical('learning_rate',[0.006,0.008,0.01,0.014,0.017,0.02]),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.3,0.4,0.5,0.6,0.7,0.8,0.9, 1.0]),
        'subsample': trial.suggest_categorical('subsample', [0.4,0.5,0.6,0.7,0.8,1.0]),
        'metric': 'rmse', 
        'random_state': 48,
        'n_estimators': 30000
         
    }
    
    model=LGBMRegressor(**param)
    
    model.fit(X_train,y_train,eval_set=[(X_test,y_test)],early_stopping_rounds=100,verbose=False)
    
    predictions=model.predict(X_test)
    
    rmse=mean_squared_error(y_test,predictions,squared=False)
    
    return rmse

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=200)

study.best_params

In [5]:
# base lgbm models


lgb_params={'random_state': 2021,
          'metric': 'rmse',
          'n_estimators': 30000,
          'n_jobs': -1,
          'cat_feature': [x for x in range(len(cat_var))],
          'bagging_seed': 2021,
          'feature_fraction_seed': 2021,
          'learning_rate': 0.003899156646724397,
          'max_depth': 99,
          'num_leaves': 63,
          'reg_alpha': 9.562925363678952,
          'reg_lambda': 9.355810045480153,
          'colsample_bytree': 0.2256038826485174,
          'min_child_samples': 290,
          'subsample_freq': 1,
          'subsample': 0.8805303688019942,
          'max_bin': 882,
          'min_data_per_group': 127,
          'cat_smooth': 96,
          'cat_l2': 19
          }

# Pretrained lgbm model strategy

In [6]:
# reg_lambda is multiplied by this factor each time we retrain the model
f1= 0.6547870667136243
# reg_alpha is increased by about 2.67 each time we retrain the model
f2= 2.6711351556035487
# increase number of leaves by 20 each time we retrain the model
f3= 20
# decrease min child samples by 49 each time we retrain the model
f4= 49
# decrease cat_smooth by 2 each time we retrain the model
f5= 2
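
To see where these factors lead, here is a small sketch that replays the update rules of the retraining loop in the next cell without training anything. The helper name `schedule` and the upper-case constant names are made up for this illustration; the starting values are copied from `lgb_params` above.

```python
# Values copied from lgb_params and from the factors f1..f5 above
F1, F2, F3, F4, F5 = 0.6547870667136243, 2.6711351556035487, 20, 49, 2

base = {
    'reg_lambda': 9.355810045480153,
    'reg_alpha': 9.562925363678952,
    'num_leaves': 63,
    'min_child_samples': 290,
    'cat_smooth': 96,
}


def schedule(base_params, n_rounds=16):
    """Return the parameters used at each retraining round t = 1..n_rounds."""
    params = dict(base_params)
    rows = []
    for t in range(1, n_rounds + 1):
        if t > 2:  # the first two rounds only change the learning rate
            params['reg_lambda'] *= F1
            params['reg_alpha'] += F2
            params['num_leaves'] += F3
            params['min_child_samples'] -= F4
            params['cat_smooth'] -= F5
        # min_child_samples must stay at least 1
        params['min_child_samples'] = max(params['min_child_samples'], 1)
        # lower learning rate for the last rounds
        params['learning_rate'] = 0.001 if t > 11 else 0.003
        rows.append((t, dict(params)))
    return rows


rows = schedule(base)
for t, p in rows:
    print(t, p['learning_rate'], p['num_leaves'], p['min_child_samples'],
          p['cat_smooth'], round(p['reg_lambda'], 4), round(p['reg_alpha'], 4))
```

With these values, `min_child_samples` hits its floor of 1 at round 8 and stays there, while `num_leaves` keeps growing until the last round.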

In [7]:
%%time

kf=KFold(n_splits=5,random_state=48,shuffle=True)

# we will store our final predictions in preds
preds = np.zeros(test.shape[0])
# store the first-round RMSE of each fold
rmse=[]
i=0

# --------------------------------------------------------------------------------
# Phase 1: create the pretrained model
for idx_train,idx_test in kf.split(X,y):
    
    X_train,X_test=X.iloc[idx_train],X.iloc[idx_test]
    y_train,y_test=y.iloc[idx_train],y.iloc[idx_test]

    
    model=LGBMRegressor(**lgb_params)
    
    model.fit(X_train,y_train,eval_set=(X_test,y_test),early_stopping_rounds=300,verbose=False,eval_metric='rmse')
    
    predictions=model.predict(X_test,num_iteration=model.best_iteration_)
    
    rmse.append(mean_squared_error(y_test,predictions,squared=False))
    
    print('First Round:')
    
    print(f'RMSE {rmse[i]}')
    
    rmse_tuned=[]
    params = lgb_params.copy()
    
    # -----------------------------------------------------------------------------
    # Phase 2: iterations where we decrease the learning rate and regularization params    
    for t in range(1,17):
        
        
        if t >2:    
                    
            params['reg_lambda'] *=  f1
            params['reg_alpha'] += f2
            params['num_leaves'] += f3
            params['min_child_samples'] -= f4
            params['cat_smooth'] -= f5
        
            
        params['learning_rate']=0.003
        
        # min_child_samples must stay at least 1
        if params['min_child_samples']<1:
            params['min_child_samples']=1
        
        # we decrease the learning rate even more after 11 rounds of retraining
        if t>11:
            params['learning_rate']=0.001
              
        
        model=LGBMRegressor(**params).fit(X_train,y_train,eval_set=(X_test,y_test),eval_metric='rmse',early_stopping_rounds=200,verbose=False,init_model=model)
        
        predictions=model.predict(X_test, num_iteration= model.best_iteration_)
        
        rmse_tuned.append(mean_squared_error(y_test,predictions,squared=False))
        
        print(f'RMSE tuned {t}: {rmse_tuned[t-1]}')
        
    print(f'Improvement of {rmse[i]-rmse_tuned[t-1]}')
    
    # ---------------------------------------------------------------------------
    # Inference time: calculate predictions for test set
    
    preds+=model.predict(test[columns],num_iteration=model.best_iteration_)/kf.n_splits
        
    i+=1

First Round:
RMSE 0.8417286531304851
RMSE tuned 1: 0.8417286979346285
RMSE tuned 2: 0.8417287480497231
RMSE tuned 3: 0.841726938816642
RMSE tuned 4: 0.8417221230010475
RMSE tuned 5: 0.8417215805130301
RMSE tuned 6: 0.8417208759546769
RMSE tuned 7: 0.8417200664126364
RMSE tuned 8: 0.8417195340452981
RMSE tuned 9: 0.8417187093752144
RMSE tuned 10: 0.8417174869320329
RMSE tuned 11: 0.8417161499132937
RMSE tuned 12: 0.8417160046945139
RMSE tuned 13: 0.8417159629925569
RMSE tuned 14: 0.8417160035464603
RMSE tuned 15: 0.841716019394132
RMSE tuned 16: 0.8417160238842415
Improvement of 1.2629246243567316e-05
First Round:
RMSE 0.8451388043495696
RMSE tuned 1: 0.8451342007426411
RMSE tuned 2: 0.8451307966128719
RMSE tuned 3: 0.8451240854976896
RMSE tuned 4: 0.8451204440825001
RMSE tuned 5: 0.8451173649176033
RMSE tuned 6: 0.8451169403211635
RMSE tuned 7: 0.8451149064330081
RMSE tuned 8: 0.845114177059101
RMSE tuned 9: 0.8451130577896553
RMSE tuned 10: 0.8451121965684848
RMSE tuned 11: 0.84511045

In [8]:
# Create submission file
test['target']=preds
test=test[['id','target']]
test.to_csv('submission.csv',index=False)

### Thanks for reading

If you liked this work and found it useful, please upvote.