# **Lab: Model Interpretation**

We will analyse the lightgbm model that we trained with the best hyperparameters found.

The steps are:
1.   Load Data and Model
2.   lightgbm Feature Importance
3.   lightgbm Variable Importance by Permutation
4.   Partial Dependence Plot
5.   LIME
6.   Push changes


### 1. Load Data and Model

**[1.1]** Navigate to the folder `notebooks` and create a new Jupyter notebook called `3_lightgbm_interpret.ipynb`

**[1.2]** Import the pandas and numpy packages

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
import pandas as pd
import numpy as np

**[1.3]** Load the prepared dataset from `data/interim` into a dataframe called `df_cleaned`



In [None]:
# Placeholder for student's code (Python code)

In [None]:
#Solution:
df_cleaned = pd.read_csv('../data/interim/yellow_tripdata_2020-04_prepared.csv')

**[1.4]** Remove the target variable from `df_cleaned`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
#Solution:
_ = df_cleaned.pop('trip_duration')

**[1.5]** Import the `load_sets` function you created from `src/data/sets` and load the saved sets from `data/processed`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
from my_krml_149874.data.sets import load_sets

X_train, y_train, X_val, y_val, X_test, y_test = load_sets(path='../data/processed/')

**[1.6]** Import `lightgbm` as `lgb` and the function `load` from the `joblib` package

In [None]:
# Placeholder for student's code (Python code)

In [None]:
import lightgbm as lgb
from joblib import load

**[1.7]** Load the trained lightgbm model from `/models/` into a variable called `lightgbm_best`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
lightgbm_best = load('../models/lightgbm_best.joblib')

### 3. lightgbm Variable Importance by Permutation

**[3.1]** Import `permutation_importance` from `sklearn.inspection`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
from sklearn.inspection import permutation_importance

**[3.2]** Calculate variable importance by permutation on the training set

In [None]:
r = permutation_importance(
    lightgbm_best, X_train, y_train,
    n_repeats=30,
    random_state=8
)
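
To make the idea behind `permutation_importance` concrete, here is a minimal from-scratch sketch using only the Python standard library. It is not scikit-learn's implementation: `toy_model`, the toy data and the helper `permutation_importance_sketch` are all hypothetical, and the sketch measures the increase in mean squared error, whereas scikit-learn measures the drop in the estimator's score. The principle is the same: shuffle one column at a time and see how much the model's performance degrades.

```python
import random

def toy_model(row):
    # Hypothetical stand-in for a fitted model: it only uses feature 0.
    return 2.0 * row[0]

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance_sketch(predict, X, y, n_repeats=5, seed=8):
    """Return, per feature, the mean increase in MSE after shuffling that column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            shuffled = [row[j] for row in X]
            rng.shuffle(shuffled)
            X_shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, shuffled)]
            permuted = mean_squared_error(y, [predict(row) for row in X_shuffled])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy data: the target depends on feature 0; feature 1 is noise the model ignores.
data_rng = random.Random(0)
X_perm_toy = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y_perm_toy = [2.0 * row[0] for row in X_perm_toy]

importances = permutation_importance_sketch(toy_model, X_perm_toy, y_perm_toy)
for j in sorted(range(len(importances)), key=lambda k: importances[k], reverse=True):
    print(f"feature_{j}: {importances[j]:.5f}")
```

In this toy setup the ignored feature gets an importance of exactly zero, because shuffling it cannot change any prediction.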

**[3.3]** Sort the variable importance, iterate through the features and print their values

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
for i in r.importances_mean.argsort()[::-1]:
    print(f"{df_cleaned.columns[i]}: {r.importances_mean[i]:.5f}")

### 4. Partial Dependence Plot

**[4.1]** Import `PartialDependenceDisplay` from `sklearn.inspection`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
from sklearn.inspection import PartialDependenceDisplay

**[4.2]** Define a function called `plot_pdp` with the following logic:
- input parameters: trained model (`model`), features (`X`), name of the feature to be analysed (`feature_name`), names of all features (`feature_cols`), values of the target (`target_classes`)
- logic: find the index of the feature to be analysed and display the partial dependence plot for each value of the target class

In [None]:
def plot_pdp(model, X, feature_name, feature_cols, target_classes):
    """Plot the partial dependence of `feature_name` for each target class."""
    # Position of the feature in the column index (feature_cols is a pandas Index)
    feature_index = feature_cols.get_loc(feature_name)

    for target_class in target_classes:
        print(f"PDP for `{feature_name}` ({feature_index}) with target {target_class}")

        PartialDependenceDisplay.from_estimator(
            model,
            X,
            features=[feature_index],
            feature_names=list(feature_cols),  # use the argument, not the global df_cleaned
            target=target_class
        )
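
As a rough illustration of what a partial dependence curve represents, the sketch below computes one by hand with plain Python. The model `toy_predict`, the data and the helper `partial_dependence_sketch` are hypothetical and much simpler than what scikit-learn does internally. For a classifier such as the one in this lab, the averaged quantity would be the model output for the selected `target` class rather than a single regression value.

```python
def toy_predict(row):
    # Hypothetical model: linear in feature 0, quadratic in feature 1.
    return 3.0 * row[0] + row[1] ** 2

def partial_dependence_sketch(predict, X, feature_index, grid):
    """Average prediction over X with one feature forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in X:
            modified = list(row)
            modified[feature_index] = value  # overwrite the feature for every row
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))
    return curve

X_pdp_toy = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
pdp_grid = [0.0, 1.0, 2.0]
pd_curve = partial_dependence_sketch(toy_predict, X_pdp_toy, feature_index=0, grid=pdp_grid)

for value, avg in zip(pdp_grid, pd_curve):
    print(f"feature_0 = {value}: average prediction = {avg:.3f}")
```

Because the toy model is linear in feature 0, the curve rises by 3 for every unit increase of that feature, while the contribution of feature 1 is averaged out into a constant offset.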

**[4.3]** Create a list called `target_classes` containing the list of all values from the target variable

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
target_classes=[0, 1, 2, 3]
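
Instead of hard-coding the classes, the list could also be derived from the labels themselves. The sketch below assumes a small hypothetical list `y_sample` standing in for `y_train`; with the real data, the same expression applied to `y_train` should give the distinct labels, though the elements may then be NumPy integers rather than plain `int`.

```python
# Hypothetical labels standing in for y_train (the real values come from load_sets).
y_sample = [2, 0, 3, 1, 1, 0, 2, 3]

# Collect the distinct class labels in ascending order.
derived_classes = sorted(set(y_sample))
print(derived_classes)
```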

**[4.4]** Display the partial dependence plot for the `fare_amount` feature on the training set with the `plot_pdp` function

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
plot_pdp(model=lightgbm_best, X=X_train, feature_name='fare_amount', feature_cols=df_cleaned.columns, target_classes=target_classes)

### 5.   LIME

**[5.1]** Import `LimeTabularExplainer` from `lime.lime_tabular`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
from lime.lime_tabular import LimeTabularExplainer

**[5.2]** Create a `LimeTabularExplainer` with the training set and save it into a variable called `lime_explainer`

In [None]:
lime_explainer = LimeTabularExplainer(
    X_train,
    feature_names=df_cleaned.columns,
    class_names=target_classes,
    mode='classification',
    discretize_continuous=False
)

**[5.3]** Analyse the first observation from the testing set with `lime_explainer` for the most probable class (`top_labels=1`)

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
exp = lime_explainer.explain_instance(
    X_test[0],
    lightgbm_best.predict_proba,
    top_labels=1,
    num_features=20)
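
The general idea behind a LIME explanation can be sketched without the library: sample points around the instance, query the model, weight each sample by its proximity to the instance, and fit a weighted linear model whose coefficients act as the local explanation. The code below is a simplified illustration under stated assumptions, not the `lime` implementation: `black_box`, `local_surrogate`, the Gaussian sampling and the kernel settings are all hypothetical, and the library's own sampling, scaling, kernel and regressor differ in detail.

```python
import math
import random

def black_box(x):
    # Hypothetical model output, e.g. a score for one class.
    return 3.0 * x[0] - 2.0 * x[1] + 1.0

def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def local_surrogate(predict, instance, n_samples=500, scale=1.0, kernel_width=1.0, seed=8):
    """Fit a proximity-weighted linear model around `instance`."""
    rng = random.Random(seed)
    d = len(instance)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in instance]  # perturbed neighbour
        dist2 = sum((a - b) ** 2 for a, b in zip(z, instance))
        rows.append([1.0] + z)  # leading 1.0 is the intercept column
        targets.append(predict(z))
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # closer samples count more
    # Weighted normal equations: (X^T W X) beta = X^T W y
    A = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(d + 1)]
         for i in range(d + 1)]
    b = [sum(w * r[i] * t for r, t, w in zip(rows, targets, weights)) for i in range(d + 1)]
    return solve_linear_system(A, b)

intercept, *coefs = local_surrogate(black_box, instance=[0.5, -1.0])
print(f"intercept: {intercept:.3f}, coefficients: {[round(c, 3) for c in coefs]}")
```

The toy model here is exactly linear, so the surrogate recovers its coefficients. With a non-linear model such as a gradient-boosted ensemble, the coefficients would only approximate the model's behaviour near the chosen instance.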

**[5.4]** Display the results with `show_in_notebook`

In [None]:
# Placeholder for student's code (Python code)

In [None]:
# Solution
exp.show_in_notebook()

### 6.   Push changes

**[6.1]** Add your changes to the git staging area

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git add .

**[6.2]** Create the snapshot of your repository and add a description

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git commit -m "lightgbm interpretation"

**[6.3]** Push your snapshot to GitHub

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git push

**[6.4]** Go to GitHub and merge the branch after reviewing the code and resolving any conflicts




**[6.5]** Check out the master branch

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git checkout master

**[6.6]** Pull the latest updates


In [None]:
# Placeholder for student's code (command line)

In [None]:
git pull