*PART 4: Second and Third Steps*

***

# What Will Happen in This Part?

We will be creating 3 different models to predict `grade`, `sub_grade` and `int_rate`.

# Data and Module Import

In [25]:
import pandas as pd
import numpy as np
import warnings
from utilities import *
from sklearn.preprocessing import StandardScaler, MinMaxScaler 
from category_encoders import OneHotEncoder, OrdinalEncoder, PolynomialEncoder, HelmertEncoder
from sklearn.compose import ColumnTransformer
from imblearn.pipeline import Pipeline, make_pipeline
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, make_scorer, fbeta_score, f1_score
from sklearn.model_selection import cross_validate, GridSearchCV, StratifiedKFold, train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import precision_recall_curve, mean_absolute_error
from sklearn.utils import resample
import optuna
import pickle
from sklearn.utils import class_weight

warnings.filterwarnings('ignore')

df = pd.read_csv('./processed_data.csv', index_col=0)

df.head()

Unnamed: 0,loan_amnt,term,emp_title,emp_length,home_ownership,annual_inc,purpose,addr_state,dti,inq_last_6mths,...,pub_rec_bankruptcies,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit,disbursement_method,grade,sub_grade,loan_status,int_rate
0,3600.0,36 months,other,10+ years,MORTGAGE,55000.0,debt_consolidation,PA,5.91,1.0,...,0.0,178050.0,7746.0,2400.0,13734.0,Cash,C,C4,1,13.99
2,20000.0,60 months,driver,10+ years,MORTGAGE,63000.0,home_improvement,IL,10.78,0.0,...,0.0,218418.0,18696.0,6200.0,14877.0,Cash,B,B4,1,10.78
4,10400.0,60 months,other,3 years,MORTGAGE,104433.0,major_purchase,PA,25.37,3.0,...,0.0,439570.0,95768.0,20300.0,88097.0,Cash,F,F1,1,22.45
5,11950.0,36 months,other,4 years,RENT,34000.0,debt_consolidation,GA,10.2,0.0,...,0.0,16900.0,12798.0,9400.0,4000.0,Cash,C,C3,1,13.44
7,20000.0,36 months,driver,10+ years,MORTGAGE,85000.0,major_purchase,SC,17.61,0.0,...,0.0,193390.0,27937.0,14500.0,36144.0,Cash,B,B1,1,8.49


***

# 5. Predicting Grade

We will be using Random Forest Classifiers for this task, as it was the best model for classifying `loan_status` and the data hasn't changed.

## 5.1. Data Preparation

First, let's cut off our target feature and remove other target features.

In [2]:
X = df.drop(columns=['grade', 'sub_grade', 'int_rate'])
y = df['grade']

Sampling down the dataset for less resourcefully demanding model.

In [3]:
X, y = resample(X, y, n_samples=len(X) // 10, random_state=42)

Train, test split.

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)

Saving column names as either numerical or categorical features.

In [5]:
categorical_features = X.dtypes[df.dtypes != np.number].index
numerical_features = X.dtypes[df.dtypes == np.number].index

Calculating class weights for `grade`.

In [6]:
y_unique = np.unique(y_train.to_list())

class_weights = class_weight.compute_class_weight('balanced', classes=y_unique, y=y_train)

class_weights = {y_unique[i]: class_weights[i] for i in range(len(y_unique))}

### Model Tuning

In [7]:
encoder = OneHotEncoder()

column_transformer = ColumnTransformer([
    ('cat', encoder, categorical_features)
])

def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 10, 200, log=True)
    max_depth = trial.suggest_int('max_depth', 2, 32)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 10)

    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        class_weight=class_weights,
        random_state=42
    )

    pipe = make_pipeline(column_transformer, TruncatedSVD(), model)

    pipe.fit(X_train, y_train)

    y_pred = pipe.predict(X_test)
    
    accuracy = accuracy_score(y_pred, y_test)
    
    return accuracy

In [8]:
study = optuna.create_study(direction='maximize')

study.optimize(objective, n_trials=20, show_progress_bar=True)

[I 2023-08-31 19:25:19,883] A new study created in memory with name: no-name-bd9e5e31-4c13-4d13-9570-355c704d8119
Best trial: 0. Best value: 0.230004:   5%|▌         | 1/20 [00:01<00:33,  1.76s/it]

[I 2023-08-31 19:25:21,644] Trial 0 finished with value: 0.230004366958917 and parameters: {'n_estimators': 19, 'max_depth': 9, 'min_samples_split': 7, 'min_samples_leaf': 7}. Best is trial 0 with value: 0.230004366958917.


Best trial: 0. Best value: 0.230004:  10%|█         | 2/20 [00:04<00:39,  2.17s/it]

[I 2023-08-31 19:25:24,098] Trial 1 finished with value: 0.2186502737747321 and parameters: {'n_estimators': 37, 'max_depth': 8, 'min_samples_split': 7, 'min_samples_leaf': 2}. Best is trial 0 with value: 0.230004366958917.


Best trial: 2. Best value: 0.282072:  15%|█▌        | 3/20 [00:07<00:44,  2.62s/it]

[I 2023-08-31 19:25:27,264] Trial 2 finished with value: 0.28207195404615537 and parameters: {'n_estimators': 32, 'max_depth': 18, 'min_samples_split': 10, 'min_samples_leaf': 2}. Best is trial 2 with value: 0.28207195404615537.


Best trial: 3. Best value: 0.286405:  20%|██        | 4/20 [00:09<00:38,  2.38s/it]

[I 2023-08-31 19:25:29,275] Trial 3 finished with value: 0.2864053209714804 and parameters: {'n_estimators': 14, 'max_depth': 30, 'min_samples_split': 6, 'min_samples_leaf': 1}. Best is trial 3 with value: 0.2864053209714804.


Best trial: 4. Best value: 0.291746:  25%|██▌       | 5/20 [00:13<00:46,  3.10s/it]

[I 2023-08-31 19:25:33,637] Trial 4 finished with value: 0.29174644764688096 and parameters: {'n_estimators': 49, 'max_depth': 17, 'min_samples_split': 5, 'min_samples_leaf': 1}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  30%|███       | 6/20 [00:18<00:50,  3.62s/it]

[I 2023-08-31 19:25:38,284] Trial 5 finished with value: 0.24596056300178037 and parameters: {'n_estimators': 75, 'max_depth': 10, 'min_samples_split': 6, 'min_samples_leaf': 2}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  35%|███▌      | 7/20 [00:29<01:17,  5.97s/it]

[I 2023-08-31 19:25:49,080] Trial 6 finished with value: 0.28092982632940305 and parameters: {'n_estimators': 131, 'max_depth': 29, 'min_samples_split': 6, 'min_samples_leaf': 4}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  40%|████      | 8/20 [00:30<00:55,  4.64s/it]

[I 2023-08-31 19:25:50,876] Trial 7 finished with value: 0.26215190298632807 and parameters: {'n_estimators': 16, 'max_depth': 13, 'min_samples_split': 7, 'min_samples_leaf': 3}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  45%|████▌     | 9/20 [00:32<00:41,  3.81s/it]

[I 2023-08-31 19:25:52,858] Trial 8 finished with value: 0.22986999899224025 and parameters: {'n_estimators': 24, 'max_depth': 9, 'min_samples_split': 10, 'min_samples_leaf': 4}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  50%|█████     | 10/20 [00:34<00:30,  3.06s/it]

[I 2023-08-31 19:25:54,227] Trial 9 finished with value: 0.20343310154859082 and parameters: {'n_estimators': 15, 'max_depth': 6, 'min_samples_split': 4, 'min_samples_leaf': 8}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  55%|█████▌    | 11/20 [00:39<00:32,  3.61s/it]

[I 2023-08-31 19:25:59,106] Trial 10 finished with value: 0.26826564547011994 and parameters: {'n_estimators': 59, 'max_depth': 22, 'min_samples_split': 2, 'min_samples_leaf': 10}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  60%|██████    | 12/20 [00:40<00:24,  3.03s/it]

[I 2023-08-31 19:26:00,791] Trial 11 finished with value: 0.29036917598844436 and parameters: {'n_estimators': 10, 'max_depth': 31, 'min_samples_split': 4, 'min_samples_leaf': 1}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  65%|██████▌   | 13/20 [00:42<00:18,  2.59s/it]

[I 2023-08-31 19:26:02,376] Trial 12 finished with value: 0.27548792367899494 and parameters: {'n_estimators': 10, 'max_depth': 23, 'min_samples_split': 4, 'min_samples_leaf': 5}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  70%|███████   | 14/20 [00:46<00:18,  3.13s/it]

[I 2023-08-31 19:26:06,748] Trial 13 finished with value: 0.28986529611340656 and parameters: {'n_estimators': 49, 'max_depth': 17, 'min_samples_split': 4, 'min_samples_leaf': 1}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  75%|███████▌  | 15/20 [00:49<00:15,  3.06s/it]

[I 2023-08-31 19:26:09,634] Trial 14 finished with value: 0.2710201887869932 and parameters: {'n_estimators': 28, 'max_depth': 26, 'min_samples_split': 2, 'min_samples_leaf': 6}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  80%|████████  | 16/20 [00:50<00:09,  2.47s/it]

[I 2023-08-31 19:26:10,743] Trial 15 finished with value: 0.18915650508918674 and parameters: {'n_estimators': 10, 'max_depth': 3, 'min_samples_split': 3, 'min_samples_leaf': 1}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  85%|████████▌ | 17/20 [00:54<00:08,  2.85s/it]

[I 2023-08-31 19:26:14,493] Trial 16 finished with value: 0.28200477006281705 and parameters: {'n_estimators': 40, 'max_depth': 16, 'min_samples_split': 5, 'min_samples_leaf': 3}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  90%|█████████ | 18/20 [00:57<00:05,  2.74s/it]

[I 2023-08-31 19:26:16,963] Trial 17 finished with value: 0.2631596627364036 and parameters: {'n_estimators': 23, 'max_depth': 32, 'min_samples_split': 8, 'min_samples_leaf': 9}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746:  95%|█████████▌| 19/20 [01:02<00:03,  3.62s/it]

[I 2023-08-31 19:26:22,636] Trial 18 finished with value: 0.2774362591958077 and parameters: {'n_estimators': 67, 'max_depth': 21, 'min_samples_split': 5, 'min_samples_leaf': 5}. Best is trial 4 with value: 0.29174644764688096.


Best trial: 4. Best value: 0.291746: 100%|██████████| 20/20 [01:10<00:00,  3.53s/it]

[I 2023-08-31 19:26:30,465] Trial 19 finished with value: 0.28438980147132925 and parameters: {'n_estimators': 91, 'max_depth': 26, 'min_samples_split': 3, 'min_samples_leaf': 3}. Best is trial 4 with value: 0.29174644764688096.





Our best model for predicting grade only has an accuracy of 29%.

### Saving the Model

In [9]:
model = RandomForestClassifier(**study.best_params)

pickle.dump(model, open('./model/app/model_grade.pkl', 'wb'))

# 6. Predicting Sub Grade

## 6.1. Data Preparation

Cutting of the target feature.

In [10]:
X = df.drop(columns=['sub_grade', 'int_rate'])
y = df['sub_grade']

Sampling down the data.

In [11]:
X, y = resample(X, y, n_samples=len(X) // 10, random_state=42)

Train, test split.

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)

Saving feature names.

In [13]:
categorical_features = X.dtypes[df.dtypes != np.number].index
numerical_features = X.dtypes[df.dtypes == np.number].index

Calculating class weights.

In [14]:
y_unique = np.unique(y_train.to_list())

class_weights = class_weight.compute_class_weight('balanced', classes=y_unique, y=y_train)

class_weights = {y_unique[i]: class_weights[i] for i in range(len(y_unique))}

### Model Tuning

In [15]:
encoder = OneHotEncoder()

column_transformer = ColumnTransformer([
    ('cat', encoder, categorical_features)
])

def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 10, 200, log=True)
    max_depth = trial.suggest_int('max_depth', 2, 32)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 10)

    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        class_weight=class_weights,
        random_state=42
    )

    pipe = make_pipeline(column_transformer, TruncatedSVD(), model)

    pipe.fit(X_train, y_train)

    y_pred = pipe.predict(X_test)
    
    accuracy = accuracy_score(y_pred, y_test)
    
    return accuracy

In [16]:
study = optuna.create_study(direction='maximize')

study.optimize(objective, n_trials=20, show_progress_bar=True)

[I 2023-08-31 19:26:30,783] A new study created in memory with name: no-name-895a01b2-0e4c-4d98-94d5-f4b753bbde10
Best trial: 0. Best value: 0.0991972:   5%|▌         | 1/20 [00:03<00:57,  3.01s/it]

[I 2023-08-31 19:26:33,790] Trial 0 finished with value: 0.09919715139910645 and parameters: {'n_estimators': 34, 'max_depth': 8, 'min_samples_split': 6, 'min_samples_leaf': 8}. Best is trial 0 with value: 0.09919715139910645.


Best trial: 1. Best value: 0.129531:  10%|█         | 2/20 [00:16<02:40,  8.92s/it] 

[I 2023-08-31 19:26:46,855] Trial 1 finished with value: 0.12953071987638148 and parameters: {'n_estimators': 159, 'max_depth': 11, 'min_samples_split': 3, 'min_samples_leaf': 10}. Best is trial 1 with value: 0.12953071987638148.


Best trial: 2. Best value: 0.169236:  15%|█▌        | 3/20 [00:20<01:56,  6.86s/it]

[I 2023-08-31 19:26:51,251] Trial 2 finished with value: 0.1692364540293594 and parameters: {'n_estimators': 33, 'max_depth': 27, 'min_samples_split': 5, 'min_samples_leaf': 8}. Best is trial 2 with value: 0.1692364540293594.


Best trial: 2. Best value: 0.169236:  20%|██        | 4/20 [00:25<01:39,  6.21s/it]

[I 2023-08-31 19:26:56,476] Trial 3 finished with value: 0.16238368772884545 and parameters: {'n_estimators': 42, 'max_depth': 24, 'min_samples_split': 10, 'min_samples_leaf': 10}. Best is trial 2 with value: 0.1692364540293594.


Best trial: 2. Best value: 0.169236:  25%|██▌       | 5/20 [00:30<01:25,  5.67s/it]

[I 2023-08-31 19:27:01,185] Trial 4 finished with value: 0.10144781484094192 and parameters: {'n_estimators': 62, 'max_depth': 8, 'min_samples_split': 8, 'min_samples_leaf': 10}. Best is trial 2 with value: 0.1692364540293594.


Best trial: 2. Best value: 0.169236:  30%|███       | 6/20 [00:46<02:09,  9.26s/it]

[I 2023-08-31 19:27:17,427] Trial 5 finished with value: 0.15344821794484195 and parameters: {'n_estimators': 166, 'max_depth': 14, 'min_samples_split': 5, 'min_samples_leaf': 7}. Best is trial 2 with value: 0.1692364540293594.


Best trial: 2. Best value: 0.169236:  35%|███▌      | 7/20 [00:50<01:38,  7.56s/it]

[I 2023-08-31 19:27:21,485] Trial 6 finished with value: 0.11387685175854076 and parameters: {'n_estimators': 44, 'max_depth': 9, 'min_samples_split': 10, 'min_samples_leaf': 2}. Best is trial 2 with value: 0.1692364540293594.


Best trial: 2. Best value: 0.169236:  40%|████      | 8/20 [00:53<01:11,  5.98s/it]

[I 2023-08-31 19:27:24,077] Trial 7 finished with value: 0.15321307400315765 and parameters: {'n_estimators': 16, 'max_depth': 14, 'min_samples_split': 3, 'min_samples_leaf': 2}. Best is trial 2 with value: 0.1692364540293594.


Best trial: 2. Best value: 0.169236:  45%|████▌     | 9/20 [00:58<01:02,  5.69s/it]

[I 2023-08-31 19:27:29,114] Trial 8 finished with value: 0.1574456649534751 and parameters: {'n_estimators': 42, 'max_depth': 15, 'min_samples_split': 8, 'min_samples_leaf': 6}. Best is trial 2 with value: 0.1692364540293594.


Best trial: 2. Best value: 0.169236:  50%|█████     | 10/20 [01:02<00:51,  5.15s/it]

[I 2023-08-31 19:27:33,075] Trial 9 finished with value: 0.09966743928247505 and parameters: {'n_estimators': 50, 'max_depth': 8, 'min_samples_split': 10, 'min_samples_leaf': 4}. Best is trial 2 with value: 0.1692364540293594.


Best trial: 10. Best value: 0.175317:  55%|█████▌    | 11/20 [01:04<00:39,  4.35s/it]

[I 2023-08-31 19:27:35,607] Trial 10 finished with value: 0.17531660452148207 and parameters: {'n_estimators': 14, 'max_depth': 28, 'min_samples_split': 5, 'min_samples_leaf': 5}. Best is trial 10 with value: 0.17531660452148207.


Best trial: 11. Best value: 0.178273:  60%|██████    | 12/20 [01:06<00:29,  3.68s/it]

[I 2023-08-31 19:27:37,761] Trial 11 finished with value: 0.17827269978837046 and parameters: {'n_estimators': 10, 'max_depth': 32, 'min_samples_split': 5, 'min_samples_leaf': 4}. Best is trial 11 with value: 0.17827269978837046.


Best trial: 12. Best value: 0.180154:  65%|██████▌   | 13/20 [01:09<00:22,  3.25s/it]

[I 2023-08-31 19:27:40,032] Trial 12 finished with value: 0.18015385132184486 and parameters: {'n_estimators': 10, 'max_depth': 32, 'min_samples_split': 2, 'min_samples_leaf': 4}. Best is trial 12 with value: 0.18015385132184486.


Best trial: 12. Best value: 0.180154:  70%|███████   | 14/20 [01:11<00:17,  2.97s/it]

[I 2023-08-31 19:27:42,349] Trial 13 finished with value: 0.17773522792166346 and parameters: {'n_estimators': 11, 'max_depth': 32, 'min_samples_split': 2, 'min_samples_leaf': 4}. Best is trial 12 with value: 0.18015385132184486.


Best trial: 14. Best value: 0.195976:  75%|███████▌  | 15/20 [01:13<00:13,  2.74s/it]

[I 2023-08-31 19:27:44,545] Trial 14 finished with value: 0.1959756793980315 and parameters: {'n_estimators': 10, 'max_depth': 20, 'min_samples_split': 2, 'min_samples_leaf': 1}. Best is trial 14 with value: 0.1959756793980315.


Best trial: 14. Best value: 0.195976:  80%|████████  | 16/20 [01:15<00:09,  2.33s/it]

[I 2023-08-31 19:27:45,920] Trial 15 finished with value: 0.04954818771204945 and parameters: {'n_estimators': 18, 'max_depth': 2, 'min_samples_split': 2, 'min_samples_leaf': 1}. Best is trial 14 with value: 0.1959756793980315.


Best trial: 14. Best value: 0.195976:  85%|████████▌ | 17/20 [01:17<00:06,  2.27s/it]

[I 2023-08-31 19:27:48,056] Trial 16 finished with value: 0.18942524102254021 and parameters: {'n_estimators': 10, 'max_depth': 19, 'min_samples_split': 3, 'min_samples_leaf': 1}. Best is trial 14 with value: 0.1959756793980315.


Best trial: 17. Best value: 0.200645:  90%|█████████ | 18/20 [01:20<00:05,  2.66s/it]

[I 2023-08-31 19:27:51,632] Trial 17 finished with value: 0.20064496624004838 and parameters: {'n_estimators': 21, 'max_depth': 20, 'min_samples_split': 3, 'min_samples_leaf': 1}. Best is trial 17 with value: 0.20064496624004838.


Best trial: 17. Best value: 0.200645:  95%|█████████▌| 19/20 [01:24<00:02,  2.95s/it]

[I 2023-08-31 19:27:55,256] Trial 18 finished with value: 0.19029863280593906 and parameters: {'n_estimators': 22, 'max_depth': 20, 'min_samples_split': 4, 'min_samples_leaf': 2}. Best is trial 17 with value: 0.20064496624004838.


Best trial: 17. Best value: 0.200645: 100%|██████████| 20/20 [01:28<00:00,  4.41s/it]

[I 2023-08-31 19:27:58,948] Trial 19 finished with value: 0.1880479693641036 and parameters: {'n_estimators': 22, 'max_depth': 23, 'min_samples_split': 4, 'min_samples_leaf': 3}. Best is trial 17 with value: 0.20064496624004838.





Predicting sub grade only has an accuracy of 20%.

### Saving the model

In [None]:
model = RandomForestClassifier(**study.best_params)

pickle.dump(model, open('./model/app/model_grade.pkl', 'wb'))

# 7. Predicting Interest Rate

## 7.1. Data Preparation

Splitting data into X and y.

In [18]:
X = df.drop(columns=['int_rate'])
y = df['int_rate']

Sampling down.

In [19]:
X, y = resample(X, y, n_samples=len(X) // 10, random_state=42)

Train, test split.

In [20]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)

Saving feature names.

In [21]:
categorical_features = X.dtypes[df.dtypes != np.number].index
numerical_features = X.dtypes[df.dtypes == np.number].index

### Model Tuning

In [22]:
encoder = OneHotEncoder()

column_transformer = ColumnTransformer([
    ('cat', encoder, categorical_features)
])

def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 10, 200, log=True)
    max_depth = trial.suggest_int('max_depth', 2, 32)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 10)

    model = RandomForestRegressor(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        random_state=42
    )

    pipe = make_pipeline(column_transformer, TruncatedSVD(), model)

    pipe.fit(X_train, y_train)

    y_pred = pipe.predict(X_test)
    
    mae = mean_absolute_error(y_test, y_pred)

    return mae

In [23]:
study = optuna.create_study(direction='minimize')

study.optimize(objective, n_trials=20, show_progress_bar=True)

[I 2023-08-31 19:27:59,244] A new study created in memory with name: no-name-20d44a51-f27e-4f1a-a63c-9fdce6f4afc5
Best trial: 0. Best value: 2.06893:   5%|▌         | 1/20 [00:10<03:26, 10.88s/it]

[I 2023-08-31 19:28:10,122] Trial 0 finished with value: 2.0689343861707785 and parameters: {'n_estimators': 87, 'max_depth': 13, 'min_samples_split': 8, 'min_samples_leaf': 1}. Best is trial 0 with value: 2.0689343861707785.


Best trial: 1. Best value: 1.97179:  10%|█         | 2/20 [00:14<01:59,  6.64s/it]

[I 2023-08-31 19:28:13,802] Trial 1 finished with value: 1.9717850066323606 and parameters: {'n_estimators': 21, 'max_depth': 25, 'min_samples_split': 7, 'min_samples_leaf': 9}. Best is trial 1 with value: 1.9717850066323606.


Best trial: 1. Best value: 1.97179:  15%|█▌        | 3/20 [00:16<01:13,  4.32s/it]

[I 2023-08-31 19:28:15,346] Trial 2 finished with value: 3.192795636797584 and parameters: {'n_estimators': 15, 'max_depth': 3, 'min_samples_split': 7, 'min_samples_leaf': 9}. Best is trial 1 with value: 1.9717850066323606.


Best trial: 1. Best value: 1.97179:  20%|██        | 4/20 [00:18<00:55,  3.46s/it]

[I 2023-08-31 19:28:17,499] Trial 3 finished with value: 2.504391182288969 and parameters: {'n_estimators': 13, 'max_depth': 9, 'min_samples_split': 3, 'min_samples_leaf': 2}. Best is trial 1 with value: 1.9717850066323606.


Best trial: 1. Best value: 1.97179:  25%|██▌       | 5/20 [00:24<01:05,  4.39s/it]

[I 2023-08-31 19:28:23,542] Trial 4 finished with value: 1.9892944565326922 and parameters: {'n_estimators': 42, 'max_depth': 15, 'min_samples_split': 9, 'min_samples_leaf': 5}. Best is trial 1 with value: 1.9717850066323606.


Best trial: 1. Best value: 1.97179:  30%|███       | 6/20 [00:37<01:41,  7.24s/it]

[I 2023-08-31 19:28:36,312] Trial 5 finished with value: 2.8232774465210033 and parameters: {'n_estimators': 197, 'max_depth': 6, 'min_samples_split': 4, 'min_samples_leaf': 6}. Best is trial 1 with value: 1.9717850066323606.


Best trial: 6. Best value: 1.91439:  35%|███▌      | 7/20 [00:42<01:26,  6.65s/it]

[I 2023-08-31 19:28:41,757] Trial 6 finished with value: 1.9143941389484176 and parameters: {'n_estimators': 32, 'max_depth': 28, 'min_samples_split': 8, 'min_samples_leaf': 5}. Best is trial 6 with value: 1.9143941389484176.


Best trial: 7. Best value: 1.85663:  40%|████      | 8/20 [00:47<01:14,  6.20s/it]

[I 2023-08-31 19:28:46,979] Trial 7 finished with value: 1.8566340071957297 and parameters: {'n_estimators': 26, 'max_depth': 30, 'min_samples_split': 7, 'min_samples_leaf': 1}. Best is trial 7 with value: 1.8566340071957297.


Best trial: 7. Best value: 1.85663:  45%|████▌     | 9/20 [00:50<00:57,  5.23s/it]

[I 2023-08-31 19:28:50,082] Trial 8 finished with value: 1.8811778960932894 and parameters: {'n_estimators': 13, 'max_depth': 28, 'min_samples_split': 6, 'min_samples_leaf': 2}. Best is trial 7 with value: 1.8566340071957297.


Best trial: 7. Best value: 1.85663:  50%|█████     | 10/20 [00:53<00:45,  4.56s/it]

[I 2023-08-31 19:28:53,147] Trial 9 finished with value: 2.0211358824350443 and parameters: {'n_estimators': 16, 'max_depth': 15, 'min_samples_split': 3, 'min_samples_leaf': 8}. Best is trial 7 with value: 1.8566340071957297.


Best trial: 7. Best value: 1.85663:  55%|█████▌    | 11/20 [00:58<00:41,  4.64s/it]

[I 2023-08-31 19:28:57,949] Trial 10 finished with value: 1.8947721552414385 and parameters: {'n_estimators': 26, 'max_depth': 22, 'min_samples_split': 10, 'min_samples_leaf': 3}. Best is trial 7 with value: 1.8566340071957297.


Best trial: 7. Best value: 1.85663:  60%|██████    | 12/20 [01:01<00:32,  4.00s/it]

[I 2023-08-31 19:29:00,510] Trial 11 finished with value: 1.9059032066364738 and parameters: {'n_estimators': 10, 'max_depth': 32, 'min_samples_split': 5, 'min_samples_leaf': 3}. Best is trial 7 with value: 1.8566340071957297.


Best trial: 7. Best value: 1.85663:  65%|██████▌   | 13/20 [01:03<00:25,  3.60s/it]

[I 2023-08-31 19:29:03,190] Trial 12 finished with value: 1.8866002003912392 and parameters: {'n_estimators': 10, 'max_depth': 32, 'min_samples_split': 6, 'min_samples_leaf': 1}. Best is trial 7 with value: 1.8566340071957297.


Best trial: 7. Best value: 1.85663:  70%|███████   | 14/20 [01:07<00:22,  3.70s/it]

[I 2023-08-31 19:29:07,124] Trial 13 finished with value: 1.885886689258857 and parameters: {'n_estimators': 20, 'max_depth': 21, 'min_samples_split': 5, 'min_samples_leaf': 3}. Best is trial 7 with value: 1.8566340071957297.


Best trial: 14. Best value: 1.84973:  75%|███████▌  | 15/20 [01:14<00:22,  4.46s/it]

[I 2023-08-31 19:29:13,329] Trial 14 finished with value: 1.8497272884945966 and parameters: {'n_estimators': 32, 'max_depth': 27, 'min_samples_split': 6, 'min_samples_leaf': 1}. Best is trial 14 with value: 1.8497272884945966.


Best trial: 14. Best value: 1.84973:  80%|████████  | 16/20 [01:21<00:21,  5.35s/it]

[I 2023-08-31 19:29:20,751] Trial 15 finished with value: 1.939733537106875 and parameters: {'n_estimators': 49, 'max_depth': 21, 'min_samples_split': 2, 'min_samples_leaf': 7}. Best is trial 14 with value: 1.8497272884945966.


Best trial: 14. Best value: 1.84973:  85%|████████▌ | 17/20 [01:26<00:15,  5.25s/it]

[I 2023-08-31 19:29:25,772] Trial 16 finished with value: 1.898664005891477 and parameters: {'n_estimators': 28, 'max_depth': 27, 'min_samples_split': 7, 'min_samples_leaf': 4}. Best is trial 14 with value: 1.8497272884945966.


Best trial: 17. Best value: 1.84101:  90%|█████████ | 18/20 [01:34<00:11,  5.95s/it]

[I 2023-08-31 19:29:33,342] Trial 17 finished with value: 1.8410143697228312 and parameters: {'n_estimators': 41, 'max_depth': 24, 'min_samples_split': 5, 'min_samples_leaf': 1}. Best is trial 17 with value: 1.8410143697228312.


Best trial: 17. Best value: 1.84101:  95%|█████████▌| 19/20 [01:43<00:06,  6.87s/it]

[I 2023-08-31 19:29:42,356] Trial 18 finished with value: 1.8509658558346282 and parameters: {'n_estimators': 52, 'max_depth': 24, 'min_samples_split': 5, 'min_samples_leaf': 2}. Best is trial 17 with value: 1.8410143697228312.


Best trial: 17. Best value: 1.84101: 100%|██████████| 20/20 [01:53<00:00,  5.65s/it]

[I 2023-08-31 19:29:52,281] Trial 19 finished with value: 1.908366453143178 and parameters: {'n_estimators': 66, 'max_depth': 18, 'min_samples_split': 4, 'min_samples_leaf': 4}. Best is trial 17 with value: 1.8410143697228312.





Tuned model has Mean Absolute Error of 1.8410143697228312.

### Saving the Model

In [24]:
model = RandomForestRegressor(**study.best_params)

pickle.dump(model, open('./model/app/model_int_rate.pkl', 'wb'))

***

# End of Part 4

What was done?

We created and saved three models:
* Model predicting loan grade
* Model predicting loan sub grade
* Model predicting interest rate