## Install necessary package for Target Encoder

In [None]:
!pip install category_encoders

## 0. Load Data and Split

In [None]:
import pandas as pd

# Read the provided labeled training data
df3 = pd.read_csv("https://drive.google.com/uc?export=download&id=1wOhyCnvGeY4jplxI8lZ-bbYN3zLtickf")
df3.info()

from sklearn.model_selection import train_test_split

X = df3.drop('BadCredit', axis=1)
y = df3['BadCredit']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6000 entries, 0 to 5999
Data columns (total 17 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   UserID             6000 non-null   object
 1   Sex                6000 non-null   object
 2   PreviousDefault    6000 non-null   int64 
 3   FirstName          6000 non-null   object
 4   LastName           6000 non-null   object
 5   NumberPets         6000 non-null   int64 
 6   PreviousAccounts   6000 non-null   int64 
 7   ResidenceDuration  6000 non-null   int64 
 8   Street             6000 non-null   object
 9   LicensePlate       6000 non-null   object
 10  BadCredit          6000 non-null   int64 
 11  Amount             6000 non-null   int64 
 12  Married            6000 non-null   int64 
 13  Duration           6000 non-null   int64 
 14  City               6000 non-null   object
 15  Purpose            6000 non-null   object
 16  DateOfBirth        6000 non-null   object


In [None]:
X_train.head(5)

| | UserID | Sex | PreviousDefault | FirstName | LastName | NumberPets | PreviousAccounts | ResidenceDuration | Street | LicensePlate | Amount | Married | Duration | City | Purpose | DateOfBirth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3897 | 236-22-6766 | M | 0 | Jerry | Black | 2 | 0 | 2 | 0466 Brown Wall | 3-U8282 | 3329 | 0 | 12 | New Roberttown | Household | 1970-04-22 |
| 5628 | 766-20-5986 | F | 0 | Julia | Jones | 0 | 2 | 2 | 6095 Larson Causeway | LWO 912 | 2996 | 0 | 36 | Ericmouth | Household | 1964-06-19 |
| 1756 | 744-25-5747 | F | 0 | Abigail | Estrada | 2 | 0 | 3 | 293 Michael Divide | 715 OQT | 2470 | 0 | 24 | East Jill | NewCar | 1975-02-17 |
| 2346 | 463-78-3098 | F | 0 | Jessica | Jones | 2 | 1 | 2 | 02759 Williams Roads | 869 SYK | 3745 | 0 | 30 | Lake Debra | UsedCar | 1977-02-16 |
| 2996 | 414-44-6527 | M | 0 | William | Shaffer | 0 | 1 | 3 | 19797 Turner Rue | 48-A601 | 3549 | 0 | 36 | North Judithbury | Vacation | 1976-07-27 |


## 1. Baseline Model

Logistic regression is the algorithm used for this binary classification problem. Macro F1 score is chosen as the scoring metric because it reflects both precision and recall and, by averaging the per-class F1 scores with equal weight, does not mask poor performance on a class that has low support. It therefore gives a balanced overall reflection of model performance.
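
To illustrate why Macro F1 is harder to game than accuracy on imbalanced data, the following is a minimal sketch in plain Python. The labels are hypothetical (90 good-credit and 10 bad-credit applicants) and the helper names `f1_for_class` and `macro_f1` are illustrative only; they are not part of the notebook.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score when `positive` is treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical imbalanced labels: 90 good-credit (0) and 10 bad-credit (1)
y_true = [0] * 90 + [1] * 10
# A classifier that always predicts the majority class
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)

print(f"Accuracy: {accuracy:.4f}")
print(f"Macro F1: {score:.4f}")
```

In this toy case the accuracy looks high even though no bad-credit applicant is identified, while the Macro F1 score stays below 0.5 because the minority class contributes an F1 of zero.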

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler, FunctionTransformer 
from category_encoders.target_encoder import TargetEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

The `UserID` feature is hypothesized to not affect the credit risk of an applicant. It is likely a random character sequence that has no value to evaluate the candidacy of a loan applicant. Therefore, this feature will be dropped when building prediction models for this entire question.

As part of basic feature engineering, target encoding is used to encode categorical features.
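
The idea behind target encoding can be sketched in a few lines of plain Python. This is a simplified illustration on hypothetical toy data, not the implementation in `category_encoders`: the library's `TargetEncoder` also blends each category mean with the global mean through smoothing, which is omitted here. The function names are assumptions made for this example.

```python
from collections import defaultdict


def fit_target_means(categories, targets):
    """Learn the mean target per category, plus the global mean as a fallback."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    global_mean = sum(targets) / len(targets)
    return means, global_mean


def transform(categories, means, global_mean):
    """Replace each category with its learned mean; unseen categories get the global mean."""
    return [means.get(cat, global_mean) for cat in categories]


# Hypothetical toy data in the spirit of the Purpose and BadCredit columns
purpose = ["NewCar", "NewCar", "Vacation", "Household", "Vacation", "NewCar"]
bad_credit = [1, 0, 0, 0, 1, 1]

means, prior = fit_target_means(purpose, bad_credit)
encoded = transform(purpose + ["Education"], means, prior)
print(means)
print(encoded)
```

Because the encoding is learned from the target, it has to be fitted on training folds only, which is why the encoder is placed inside the pipeline that is cross-validated.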

In [None]:
# Make a list to store names of columns to drop
columns_to_drop = ['UserID']

# Get list of categorical features to apply target encoding to
categorical_features = list(set(X_train.columns).difference(set(list(\
                                            X_train._get_numeric_data().columns) + columns_to_drop)))

In [None]:
# Make a feature engineering transformer to drop UserID and apply TargetEncoder
fe_transformer = make_column_transformer(
                      ('drop', columns_to_drop),
                      (TargetEncoder(), categorical_features),
                      remainder='passthrough')

In [None]:
# Create pipeline with feature engineering transformer and logistic regression
pipe1 = make_pipeline(fe_transformer,
                    LogisticRegression(random_state=42))

In [None]:
# Cross-validate the model and print the results
cv_scores = cross_val_score(pipe1, X_train, y_train, scoring='f1_macro', cv=10)

# Calculate mean and standard deviation of scores
avg = cv_scores.mean()
stddev = cv_scores.std()

# Print results
print("Scores:", [round(score, 4) for score in cv_scores], '\n')
print(f"Mean: {avg:.4f}")
print(f"Std. Dev: {stddev:.4f}")
print(f"+/-2 std. dev. range within mean: ({avg - 2*stddev:.4f}, {avg + 2*stddev:.4f})")

Scores: [0.5174, 0.5213, 0.5169, 0.5011, 0.4856, 0.5123, 0.4659, 0.4772, 0.5084, 0.4659] 

Mean: 0.4972
Std. Dev: 0.0206
+/-2 std. dev. range within mean: (0.4560, 0.5384)


The mean Macro F1 score of the baseline model that is obtained via cross-validation is 0.4972.

## 2. Feature Engineering

### Feature Engineering #1:
Calculate age from `DateOfBirth` feature.

In [None]:
# Calculate age in years to 3 decimal places given a date of birth
# Age is relative to December 26, 2021

from datetime import date, datetime
def calculate_age(df):
  relative_date = date(2021, 12, 26)
  
  return df.apply(lambda x: [round((relative_date - datetime.strptime(dob, '%Y-%m-%d').date()).days / 365, 3)
                             for dob in x])

### Feature Engineering #2:
A feature is created that indicates whether a person is within the typical working age range of 25-60. It is likely that those under 25 and above 60 will not be working full-time. Thus, it is hypothesized that people within this working age range are more likely to be good loan applicants because they are more likely to have sufficient employment income to make repayments.

In [None]:
def is_working_age(df):
   
  def check_working_age(birthdate):
    relative_date = date(2021, 12, 26)
    age = round((relative_date - datetime.strptime(birthdate, '%Y-%m-%d').date()).days / 365, 3)
    return int(age >= 25 and age <= 60)
  
  return df.apply(lambda x: [check_working_age(dob) for dob in x])
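
The age arithmetic used by both helper functions can be checked on individual dates with a small standalone sketch. The function names below are hypothetical and only the standard library is used. The sketch also compares the `days / 365` approximation with the calendar age, since dividing by 365 ignores leap days and can place someone just under a birthday on the other side of the 25 or 60 boundary.

```python
from datetime import date, datetime

REFERENCE_DATE = date(2021, 12, 26)  # same reference date as the notebook


def approximate_age(birthdate):
    """Age in years computed as days / 365, rounded to 3 decimals."""
    dob = datetime.strptime(birthdate, "%Y-%m-%d").date()
    return round((REFERENCE_DATE - dob).days / 365, 3)


def calendar_age(birthdate):
    """Age in completed calendar years on the reference date."""
    dob = datetime.strptime(birthdate, "%Y-%m-%d").date()
    had_birthday = (REFERENCE_DATE.month, REFERENCE_DATE.day) >= (dob.month, dob.day)
    return REFERENCE_DATE.year - dob.year - (0 if had_birthday else 1)


def working_age_flag(birthdate):
    """1 if the approximate age is between 25 and 60 inclusive, else 0."""
    return int(25 <= approximate_age(birthdate) <= 60)


for dob in ["1970-04-22", "1996-12-31", "2000-01-01"]:
    print(dob, approximate_age(dob), calendar_age(dob), working_age_flag(dob))
```

For a birthdate of 1996-12-31 the approximation crosses 25 a few days before the actual 25th birthday. For this dataset the effect is likely negligible, but it is worth knowing when interpreting the boundary.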

### Feature Engineering #3:
As seen with the baseline model, target encoding is used to encode categorical features.

### Feature Engineering #4:
Standard scaling is applied to all features once they are in numerical form in an effort to improve model performance. Scaling matters when logistic regression is regularized because the penalty acts on coefficient magnitudes, which depend on the scale of each feature. Regularization strength is a logistic regression hyperparameter that can be tuned.
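
As a small illustration of what standard scaling does, the sketch below standardizes the `Amount` and `Duration` values from the five rows shown by `X_train.head(5)`. It uses only the standard library; the `standardize` helper is a hypothetical stand-in for `StandardScaler`, which likewise uses the population standard deviation.

```python
from statistics import mean, pstdev


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    # A constant column has sigma == 0; map it to zeros instead of dividing by zero
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Values taken from the five rows of X_train.head(5)
amount = [3329, 2996, 2470, 3745, 3549]
duration = [12, 36, 24, 30, 36]

amount_scaled = standardize(amount)
duration_scaled = standardize(duration)

print([round(v, 3) for v in amount_scaled])
print([round(v, 3) for v in duration_scaled])
```

After scaling, both columns have mean 0 and unit variance, so a regularization penalty treats a coefficient on `Amount` (in the thousands) and one on `Duration` (in the tens) on comparable terms.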

### Train and Cross-Validate Logistic Regression Model after Feature Engineering

In [None]:
# Make FunctionTransformers for custom feature engineering functions
calculate_age_transformer = FunctionTransformer(calculate_age)
working_age_transformer = FunctionTransformer(is_working_age)

In [None]:
# Make a feature engineering transformer to drop UserID and for the first 3 feature engineering steps:
# applying calculate_age, is_working_age, and TargetEncoder

fe_transformer = make_column_transformer(
                      ('drop', columns_to_drop),
                      (calculate_age_transformer, ['DateOfBirth']),
                      (working_age_transformer, ['DateOfBirth']),
                      (TargetEncoder(), categorical_features),
                      remainder='passthrough')

In [None]:
# Create pipeline with feature engineering transformer, standard scaler, and logistic regression
pipe2 = make_pipeline(fe_transformer,
                      StandardScaler(),
                      LogisticRegression(random_state=42))

In [None]:
# Cross-validate the model and print the results
cv_scores = cross_val_score(pipe2, X_train, y_train, scoring='f1_macro', cv=10)

# Calculate mean and standard deviation of scores
avg = cv_scores.mean()
stddev = cv_scores.std()

# Print results
print("Scores:", [round(score, 4) for score in cv_scores], '\n')
print(f"Mean: {avg:.4f}")
print(f"Std. Dev: {stddev:.4f}")
print(f"+/-2 std. dev. range within mean: ({avg - 2*stddev:.4f}, {avg + 2*stddev:.4f})")

Scores: [0.7323, 0.7138, 0.6862, 0.6338, 0.7266, 0.7014, 0.7101, 0.7376, 0.683, 0.7225] 

Mean: 0.7047
Std. Dev: 0.0293
+/-2 std. dev. range within mean: (0.6461, 0.7634)


After applying feature engineering, the mean Macro F1 score improved to 0.7047 from the baseline mean score of 0.4972. However, the standard deviation of the feature-engineered model (0.0293) is greater than that of the baseline model (0.0206), suggesting that the predictive ability of the feature-engineered model varies more between folds. Feature selection will now be applied in an attempt to improve model performance.

## 3. Feature Selection

### High-Cardinality Categorical Features - Street and LicensePlate

In [None]:
# Return the number of unique values of the categorical features of interest
X_train[categorical_features].nunique()

DateOfBirth     3570
LastName         907
FirstName        568
LicensePlate    4799
Purpose            8
Sex                2
Street          4800
City              20
dtype: int64

All instances have a unique value for `Street` and all but one have a unique value for `LicensePlate`. When included, these two features are hypothesized to make the classification model less performant. The model performance will be assessed after dropping these features.

In [None]:
# Get the updated list of categorical features to apply TargetEncoder to
updated_columns_to_drop = ['UserID', 'Street', 'LicensePlate']

updated_categorical_features = list(set(X_train.columns).difference(set(list(\
                                            X_train._get_numeric_data().columns) + updated_columns_to_drop)))

In [None]:
# Make an updated feature engineering transformer to drop unwanted categorical features and 
# for the first 3 feature engineering steps (applying calculate_age, is_working_age, and TargetEncoder)

updated_fe_transformer = make_column_transformer(
                      ('drop', updated_columns_to_drop),
                      (calculate_age_transformer, ['DateOfBirth']),
                      (working_age_transformer, ['DateOfBirth']),
                      (TargetEncoder(), updated_categorical_features),
                      remainder='passthrough')

In [None]:
# Create pipeline with updated feature engineering transformer, standard scaler, and logistic regression
pipe3 = make_pipeline(updated_fe_transformer,
                      StandardScaler(),
                      LogisticRegression(random_state=42))

In [None]:
# Cross-validate the model and print the results
cv_scores = cross_val_score(pipe3, X_train, y_train, scoring = 'f1_macro', cv = 10)

# Calculate mean and standard deviation of scores
avg = cv_scores.mean()
stddev = cv_scores.std()

# Print results
print("Scores:", [round(score, 4) for score in cv_scores], '\n')
print(f"Mean: {avg:.4f}")
print(f"Std. Dev: {stddev:.4f}")
print(f"+/-2 std. dev. range within mean: ({avg - 2*stddev:.4f}, {avg + 2*stddev:.4f})")

Scores: [0.7323, 0.7138, 0.6862, 0.6338, 0.7266, 0.7014, 0.7101, 0.7376, 0.683, 0.7225] 

Mean: 0.7047
Std. Dev: 0.0293
+/-2 std. dev. range within mean: (0.6461, 0.7634)


Dropping the `Street` and `LicensePlate` features from the model had no effect: identical Macro F1 scores were obtained. Other feature selection techniques will be attempted to find an optimal subset of features. `Street` and `LicensePlate` will be kept in the dataset in case either or both of these features are in the optimal subset.

### SelectKBest

The SelectKBest method keeps the k features with the highest univariate scores against the target (here, the ANOVA F-value) and removes the rest.
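
The ANOVA F-value used as the scoring function can be sketched in plain Python. This is a minimal illustration on hypothetical toy data, assuming the standard one-way ANOVA definition (between-class variance divided by within-class variance); the helper name `anova_f` is not from the notebook.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-value of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand_mean = mean(values)
    n, k = len(values), len(groups)
    # Variance of the class means around the grand mean
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()) / (k - 1)
    # Variance of the values around their own class mean
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    return between / within if within else float("inf")


# Hypothetical toy data: one feature separates the classes, the other does not
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "informative": [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2],
    "noise": [1.0, 2.0, 1.5, 1.2, 1.1, 1.9, 1.6, 1.3],
}

scores = {name: anova_f(values, labels) for name, values in features.items()}
top_feature = max(scores, key=scores.get)
print(scores)
print("Highest F-value:", top_feature)
```

Selecting the k best features then amounts to ranking all columns by this score and keeping the top k. Each feature is scored on its own, so interactions between features are not considered.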

In [None]:
from sklearn.feature_selection import SelectKBest, f_classif

In [None]:
# Make a pipeline with feature engineering transformer, standard scaler, KBest feature selector, and 
# logistic regression classifier
# KBest selector will select the 10 features with the best ANOVA F-value

pipe_kbest = make_pipeline(fe_transformer,
                      StandardScaler(),
                      SelectKBest(score_func=f_classif, k=10),
                      LogisticRegression(random_state=42))

In [None]:
# Cross-validate the model and print the results
cv_scores = cross_val_score(pipe_kbest, X_train, y_train, scoring='f1_macro', cv=10)

# Calculate mean and standard deviation of scores
avg = cv_scores.mean()
stddev = cv_scores.std()

# Print results
print("Scores:", [round(score, 4) for score in cv_scores], '\n')
print(f"Mean: {avg:.4f}")
print(f"Std. Dev: {stddev:.4f}")
print(f"+/-2 std. dev. range within mean: ({avg - 2*stddev:.4f}, {avg + 2*stddev:.4f})")

Scores: [0.7287, 0.7026, 0.7051, 0.641, 0.7207, 0.7138, 0.7127, 0.7464, 0.6845, 0.7285] 

Mean: 0.7084
Std. Dev: 0.0276
+/-2 std. dev. range within mean: (0.6532, 0.7636)


The SelectKBest method for feature selection gives a slightly higher mean Macro F1 score (0.7084) than the model in step 2 (0.7047).

### Recursive Feature Elimination

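Recursive Feature Elimination (RFE) will be used because, as a wrapper method, it selects features using the model itself: it repeatedly fits the estimator, ranks the features by the magnitude of their coefficients, and removes the weakest until the requested number of features remains.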



In [None]:
# Use RFE to identify the most relevant features
from sklearn.feature_selection import RFE

In [None]:
# Create pipeline with feature engineering transformer, standard scaler, RFE with 10 features to select, 
# and prediction model

pipe_rfe10 = make_pipeline(fe_transformer,
                      StandardScaler(),
                      RFE(estimator = LogisticRegression(random_state=42), n_features_to_select=10),
                      LogisticRegression(random_state=42))

In [None]:
# Cross-validate the model and print the results
cv_scores = cross_val_score(pipe_rfe10, X_train, y_train, scoring='f1_macro', cv=10)

# Calculate mean and standard deviation of scores
avg = cv_scores.mean()
stddev = cv_scores.std()

# Print results
print("Scores:", [round(score, 4) for score in cv_scores], '\n')
print(f"Mean: {avg:.4f}")
print(f"Std. Dev: {stddev:.4f}")
print(f"+/-2 std. dev. range within mean: ({avg - 2*stddev:.4f}, {avg + 2*stddev:.4f})")

Scores: [0.7232, 0.7173, 0.7014, 0.641, 0.7207, 0.7163, 0.7101, 0.7438, 0.6845, 0.7225] 

Mean: 0.7081
Std. Dev: 0.0267
+/-2 std. dev. range within mean: (0.6547, 0.7615)


Selecting 10 features results in an increase of 0.0034 in mean Macro F1 score and a standard deviation decrease of 0.0026 when compared to the model from step 2 that was trained with 17 features. Out of curiosity, the model will be retrained by selecting 5 features for RFE. 

In [None]:
# Create pipeline with feature engineering transformer, standard scaler, RFE with 5 features to select,
# and prediction model

pipe_rfe5 = make_pipeline(fe_transformer,
                      StandardScaler(),
                      RFE(estimator = LogisticRegression(random_state=42), n_features_to_select=5),
                      LogisticRegression(random_state=42))

In [None]:
# Cross-validate the model and print the results
cv_scores = cross_val_score(pipe_rfe5, X_train, y_train, scoring='f1_macro', cv=10)

# Calculate mean and standard deviation of scores
avg = cv_scores.mean()
stddev = cv_scores.std()

# Print results
print("Scores:", [round(score, 4) for score in cv_scores], '\n')
print(f"Mean: {avg:.4f}")
print(f"Std. Dev: {stddev:.4f}")
print(f"+/-2 std. dev. range within mean: ({avg - 2*stddev:.4f}, {avg + 2*stddev:.4f})")

Scores: [0.7215, 0.7098, 0.6821, 0.6381, 0.6854, 0.7014, 0.7163, 0.7207, 0.675, 0.7039] 

Mean: 0.6954
Std. Dev: 0.0246
+/-2 std. dev. range within mean: (0.6462, 0.7446)


Selecting 10 features via either the SelectKBest or the RFE method yields a better mean model performance than selecting 5 features via RFE. The model that selected 5 features via RFE was less performant than the model from step 2.

Models that selected 10 features using the SelectKBest and RFE methods yielded similar mean Macro F1 scores (0.7084 and 0.7081). However, the lower bound of the +/-2 standard deviation range of the model that used the RFE method (0.6547) is higher than that of the model that used the SelectKBest method (0.6532). If the lower bound is taken as a conservative estimate of real-world performance, it is hypothesized that using RFE will yield better results in production, although the difference is small relative to the variation between folds.
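
The comparison of the two ranges can be reproduced from the fold scores printed above. The sketch below uses only the standard library and the rounded scores as reported, so the last decimal may differ slightly from the notebook output; `summarize` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Return mean, std. dev., and the +/-2 std. dev. bounds of CV scores."""
    avg = mean(scores)
    sd = pstdev(scores)  # population std. dev., matching NumPy's default .std()
    return avg, sd, avg - 2 * sd, avg + 2 * sd


# Fold scores as printed for the SelectKBest (k=10) and RFE (10 features) pipelines
kbest_scores = [0.7287, 0.7026, 0.7051, 0.641, 0.7207, 0.7138, 0.7127, 0.7464, 0.6845, 0.7285]
rfe10_scores = [0.7232, 0.7173, 0.7014, 0.641, 0.7207, 0.7163, 0.7101, 0.7438, 0.6845, 0.7225]

kbest_avg, kbest_sd, kbest_low, kbest_high = summarize(kbest_scores)
rfe_avg, rfe_sd, rfe_low, rfe_high = summarize(rfe10_scores)

print(f"SelectKBest: mean={kbest_avg:.4f}, range=({kbest_low:.4f}, {kbest_high:.4f})")
print(f"RFE:         mean={rfe_avg:.4f}, range=({rfe_low:.4f}, {rfe_high:.4f})")
print("RFE has the higher lower bound:", rfe_low > kbest_low)
```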

Hyperparameter tuning will now be used to try to improve the model performance score.

## 4. Hyperparameter Tuning

### Using Grid Search

In [None]:
from sklearn.model_selection import GridSearchCV

Hyperparameters to tune for RFE:

* `n_features_to_select`: Number of features to select

Hyperparameters to tune for logistic regression:

* `penalty`: type of regularization used
* `C`: regularization strength where making the value smaller increases the strength
* `solver`: optimization algorithm used



In [None]:
# Create pipeline with feature engineering transformer, standard scaler, RFE, and prediction model
pipe4 = make_pipeline(fe_transformer,
                      StandardScaler(),
                      RFE(estimator = LogisticRegression(random_state=42)),
                      LogisticRegression(random_state=42))

Three sets of hyperparameter grids are made because solver algorithms used in logistic regression support different sets of penalties. To try different solver hyperparameters without error, a grid is made for:

* `'newton-cg'`, `'lbfgs'`, `'sag'`, and `'saga'` solvers with `'l2'` and `'none'` as the penalties
* `'saga'` solver with `'l1'` as the penalty
* `'liblinear'` solver with `'l1'` and `'l2'` as the penalties

In [None]:
# Set potential hyperparameter grid for solvers that support 'l2' and 'none' penalties
gs_params1 = {'rfe__n_features_to_select': [8, 9, 10, 11, 12],
          'logisticregression__penalty': ('l2', 'none'),
          'logisticregression__C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
          'logisticregression__solver': ('newton-cg', 'lbfgs', 'sag', 'saga')}

In [None]:
# Set potential hyperparameter grid for saga solver to evaluate how performant model is when penalty='l1'
gs_params2 = {'rfe__n_features_to_select': [8, 9, 10, 11, 12],
          'logisticregression__penalty': ['l1'],
          'logisticregression__C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
          'logisticregression__solver': ['saga']}

In [None]:
# Set potential hyperparameter grid for liblinear solver which supports 'l1' and 'l2' penalties
gs_params3 = {'rfe__n_features_to_select': [8, 9, 10, 11, 12],
          'logisticregression__penalty': ('l1', 'l2'),
          'logisticregression__C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
          'logisticregression__solver': ['liblinear']}
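
To get a feel for the cost of the search, the number of candidates in the three grids can be counted without scikit-learn. The sketch below copies the grid values from the cells above and multiplies the number of options per hyperparameter; the variable names are illustrative.

```python
from math import prod

n_features_options = [8, 9, 10, 11, 12]
c_options = [0.001, 0.01, 0.1, 1, 10, 100, 1000]

# Copies of the three grids defined in the notebook
grids = [
    {"rfe__n_features_to_select": n_features_options,
     "logisticregression__penalty": ("l2", "none"),
     "logisticregression__C": c_options,
     "logisticregression__solver": ("newton-cg", "lbfgs", "sag", "saga")},
    {"rfe__n_features_to_select": n_features_options,
     "logisticregression__penalty": ["l1"],
     "logisticregression__C": c_options,
     "logisticregression__solver": ["saga"]},
    {"rfe__n_features_to_select": n_features_options,
     "logisticregression__penalty": ("l1", "l2"),
     "logisticregression__C": c_options,
     "logisticregression__solver": ["liblinear"]},
]

# Each grid contributes the product of its option counts
grid_sizes = [prod(len(options) for options in grid.values()) for grid in grids]
total_candidates = sum(grid_sizes)
cv_folds = 10

print("Candidates per grid:", grid_sizes)
print("Total candidates:", total_candidates)
print("Model fits with 10-fold CV:", total_candidates * cv_folds)
```

With 10-fold cross-validation every candidate is fitted ten times, which is why `n_jobs=-1` is used to run the search in parallel.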

In [None]:
# Perform GridSearchCV
logit_gs = GridSearchCV(pipe4, param_grid=[gs_params1, gs_params2, gs_params3], 
                        scoring='f1_macro', cv=10, n_jobs=-1, return_train_score=True)

In [None]:
# Fit to training data
logit_gs.fit(X_train, y_train)

GridSearchCV(cv=10,
             estimator=Pipeline(steps=[('columntransformer',
                                        ColumnTransformer(remainder='passthrough',
                                                          transformers=[('drop',
                                                                         'drop',
                                                                         ['UserID']),
                                                                        ('functiontransformer-1',
                                                                         FunctionTransformer(func=<function calculate_age at 0x7f264de72440>),
                                                                         ['DateOfBirth']),
                                                                        ('functiontransformer-2',
                                                                         FunctionTransformer(func=<function is_working_age at 0x7f2645603b00>),
              

In [None]:
# Print the hyperparameters, score, standard deviation, and standard deviation range of the 
# best performing model from GridSearchCV

avg = logit_gs.best_score_
stddev = logit_gs.cv_results_['std_test_score'][logit_gs.best_index_]

print(f"Best Hyperparameters: {logit_gs.best_params_}\n")
print(f"Best Mean Score: {avg:.4f}")
print(f"Best Mean Std. Dev.: {stddev:.4f}")
print(f"+/-2 std. dev. range within mean: ({avg - 2*stddev:.4f}, {avg + 2*stddev:.4f})")

Best Hyperparameters: {'logisticregression__C': 0.001, 'logisticregression__penalty': 'none', 'logisticregression__solver': 'newton-cg', 'rfe__n_features_to_select': 11}

Best Mean Score: 0.7083
Best Mean Std. Dev.: 0.0278
+/-2 std. dev. range within mean: (0.6528, 0.7639)


In [None]:
# Function to display in a dataframe cross-validation results sorted by test score rank
# This will be used to display GridSearchCV and RandomizedSearchCV results

def show_cv_results(cv_results):
  df_results = pd.DataFrame(cv_results['params'])
  df_results['mean_train_score'] = cv_results['mean_train_score']
  df_results['std_train_score'] = cv_results['std_train_score']
  df_results['mean_test_score'] = cv_results['mean_test_score']
  df_results['std_test_score'] = cv_results['std_test_score']
  df_results['rank_test_score'] = cv_results['rank_test_score']

  df_results = df_results.sort_values(by='rank_test_score', ascending=True)
  return df_results

In [None]:
# Show test score rank sorted GridSearchCV results
best_gs_results = show_cv_results(logit_gs.cv_results_)
best_gs_results

| | logisticregression__C | logisticregression__penalty | logisticregression__solver | rfe__n_features_to_select | mean_train_score | std_train_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|
| 278 | 1000.000 | none | saga | 11 | 0.883648 | 0.003221 | 0.708325 | 0.027777 | 1 |
| 258 | 1000.000 | l2 | saga | 11 | 0.883648 | 0.003221 | 0.708325 | 0.027777 | 1 |
| 28 | 0.001 | none | lbfgs | 11 | 0.883648 | 0.003221 | 0.708325 | 0.027777 | 1 |
| 153 | 1.000 | none | sag | 11 | 0.883648 | 0.003221 | 0.708325 | 0.027777 | 1 |
| 263 | 1000.000 | none | newton-cg | 11 | 0.883648 | 0.003221 | 0.708325 | 0.027777 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 280 | 0.001 | l1 | saga | 8 | 0.453054 | 0.000034 | 0.453054 | 0.000305 | 376 |
| 317 | 0.001 | l1 | liblinear | 10 | 0.453054 | 0.000034 | 0.453054 | 0.000305 | 376 |
| 318 | 0.001 | l1 | liblinear | 11 | 0.453054 | 0.000034 | 0.453054 | 0.000305 | 376 |
| 282 | 0.001 | l1 | saga | 10 | 0.453054 | 0.000034 | 0.453054 | 0.000305 | 376 |


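There are multiple hyperparameter combinations that yield the best mean Macro F1 score of 0.7083, including the combination stored in the `best_params_` attribute. Some of these ties are expected: when `penalty='none'`, the value of `C` is ignored, so every value of `C` produces the same model. This best mean Macro F1 score is similar to the scores obtained in step 3.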

### Using Randomized Search

In [None]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform

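Randomized search is similar to grid search; however, instead of trying every value in a fixed list, it samples values from continuous distributions for the `n_features_to_select` and `C` hyperparameters. When `n_features_to_select` is a float between 0 and 1, RFE treats it as the fraction of features to select.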
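
Because the sampled `n_features_to_select` values are fractions, it helps to translate them into feature counts. The sketch below makes two assumptions that should be checked against the installed scikit-learn version: that the pipeline passes 17 features to RFE (the count stated in step 3), and that RFE converts a fraction by multiplying it by the number of features and truncating to an integer. The fractions are taken from the search results reported later in this section, and the helper name is hypothetical.

```python
def n_features_from_fraction(fraction, n_features=17):
    """Assumed conversion: fraction of features, truncated to an integer count."""
    return int(n_features * fraction)


# Fractions as they appear in the RandomizedSearchCV results of this notebook
sampled_fractions = [0.1408325471180527, 0.126967, 0.169084, 0.043799, 0.019101]

selected_counts = {f: n_features_from_fraction(f) for f in sampled_fractions}
for fraction, count in selected_counts.items():
    print(f"fraction={fraction:.6f} -> {count} feature(s)")
```

Under these assumptions, the best-ranked candidates keep only 2 features, while fractions below 1/17 (about 0.059) map to 0 features, which would be consistent with the missing scores at the bottom of the results table.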

In [None]:
# Set potential hyperparameter distributions for solvers that support 'l2' and 'none' penalties
rs_params1 = {'rfe__n_features_to_select': uniform(0.0, 1.0),
          'logisticregression__penalty': ('l2', 'none'),
          'logisticregression__C': uniform(0.0001, 10000),
          'logisticregression__solver': ('newton-cg', 'lbfgs', 'sag', 'saga')}

In [None]:
# Set potential hyperparameter grid for saga solver to evaluate how performant model is when penalty='l1'
rs_params2 = {'rfe__n_features_to_select': uniform(0.0, 1.0),
          'logisticregression__penalty': ['l1'],
          'logisticregression__C': uniform(0.0001, 10000),
          'logisticregression__solver': ['saga']}

In [None]:
# Set potential hyperparameter distributions for liblinear solver which supports 'l1' and 'l2' penalties
rs_params3 = {'rfe__n_features_to_select': uniform(0.0, 1.0),
          'logisticregression__penalty': ('l1', 'l2'),
          'logisticregression__C': uniform(0.0001, 10000),
          'logisticregression__solver': ['liblinear']}

In [None]:
# Combine the three hyperparameter distributions into a list
rs_params = [rs_params1, rs_params2, rs_params3]

In [None]:
# Perform RandomizedSearchCV 
logit_rs = RandomizedSearchCV(pipe4, param_distributions=rs_params, 
                              n_iter=1000, scoring='f1_macro', cv=10, n_jobs=-1, return_train_score=True)

In [None]:
# Fit to training data
logit_rs.fit(X_train, y_train)

RandomizedSearchCV(cv=10,
                   estimator=Pipeline(steps=[('columntransformer',
                                              ColumnTransformer(remainder='passthrough',
                                                                transformers=[('drop',
                                                                               'drop',
                                                                               ['UserID']),
                                                                              ('functiontransformer-1',
                                                                               FunctionTransformer(func=<function calculate_age at 0x7f264de72440>),
                                                                               ['DateOfBirth']),
                                                                              ('functiontransformer-2',
                                                                               FunctionTransformer

In [None]:
# Print the hyperparameters, score, standard deviation, and standard deviation range of the 
# best performing model from RandomizedSearchCV

avg = logit_rs.best_score_
stddev = logit_rs.cv_results_['std_test_score'][logit_rs.best_index_]

print(f"Best Hyperparameters: {logit_rs.best_params_}\n")
print(f"Best Mean Score: {avg:.4f}")
print(f"Best Mean Std. Dev.: {stddev:.4f}")
print(f"+/-2 std. dev. range within mean: ({avg - 2*stddev:.4f}, {avg + 2*stddev:.4f})")

Best Hyperparameters: {'logisticregression__C': 9465.876175501207, 'logisticregression__penalty': 'l2', 'logisticregression__solver': 'lbfgs', 'rfe__n_features_to_select': 0.1408325471180527}

Best Mean Score: 0.7821
Best Mean Std. Dev.: 0.0330
+/-2 std. dev. range within mean: (0.7162, 0.8481)


In [None]:
# Show test score rank sorted RandomizedSearchCV results
best_rs_results = show_cv_results(logit_rs.cv_results_)
best_rs_results

| | logisticregression__C | logisticregression__penalty | logisticregression__solver | rfe__n_features_to_select | mean_train_score | std_train_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|
| 696 | 135.764355 | l1 | liblinear | 0.126967 | 0.782894 | 0.003613 | 0.782148 | 0.032993 | 1 |
| 162 | 3953.554567 | none | saga | 0.118956 | 0.782894 | 0.003613 | 0.782148 | 0.032993 | 1 |
| 432 | 7913.923427 | none | sag | 0.151762 | 0.782952 | 0.003703 | 0.782148 | 0.032993 | 1 |
| 166 | 7074.127874 | l2 | liblinear | 0.169084 | 0.782894 | 0.003613 | 0.782148 | 0.032993 | 1 |
| 427 | 8664.842155 | l1 | liblinear | 0.144419 | 0.782894 | 0.003613 | 0.782148 | 0.032993 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 854 | 3725.784953 | l1 | saga | 0.019101 | NaN | NaN | NaN | NaN | 996 |
| 847 | 777.138983 | l1 | saga | 0.043799 | NaN | NaN | NaN | NaN | 997 |
| 339 | 7072.568959 | l2 | liblinear | 0.032493 | NaN | NaN | NaN | NaN | 998 |
| 885 | 8724.222369 | none | saga | 0.010226 | NaN | NaN | NaN | NaN | 999 |


As was the case with the GridSearchCV results, RandomizedSearchCV returns multiple hyperparameter combinations that yield the best mean Macro F1 score, including the combination stored in the `best_params_` attribute. The best hyperparameter combination from RandomizedSearchCV produced a better mean Macro F1 score (0.7821) than any of the previously trained models. Furthermore, the lower bound of the +/-2 standard deviation range of this model (0.7162) is greater than the best mean Macro F1 score (0.7084) from step 3. This suggests that hyperparameter tuning via RandomizedSearchCV was effective and that, in production, the model with the best hyperparameter combination should yield a Macro F1 score that is better than the previous best mean score of 0.7084.

Each step has improved the mean Macro F1 score of the logistic regression model (0.4972 -> 0.7047 -> 0.7084 -> 0.7821). However, it remains to be seen how the best performing model produced in this step will fare on unseen data. The best performing model from RandomizedSearchCV will now be evaluated on the test data.

## 5. Performance Estimation

### Evaluate Model on Test Data

In [None]:
from sklearn.metrics import confusion_matrix, f1_score

In [None]:
# Make predictions on test data using model with the best hyperparameter combination 
# obtained via RandomizedSearchCV in step 4
y_pred = logit_rs.predict(X_test)

In [None]:
# Print confusion matrix and performance metric
cm = confusion_matrix(y_test, y_pred)

print(cm, '\n')
print(f"Test Set Macro F1 Score: {f1_score(y_test, y_pred, average='macro'):.4f}")

[[956  35]
 [ 89 120]] 

Test Set Macro F1 Score: 0.7992


When evaluated on the test set, the model produced a greater Macro F1 score (0.7992) than the best mean cross-validation score (0.7821) from step 4. While greater, the test score is within two standard deviations of that mean (0.7162 to 0.8481), implying that the scores are similar. The similar Macro F1 scores on the training and test sets suggest that the cross-validated performance of this model is a valid reflection of how it would perform in production when predicting whether a loan applicant is a good or bad credit risk.
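
The reported test score can be verified by hand from the confusion matrix printed above. The sketch below assumes scikit-learn's convention that rows are true classes and columns are predicted classes; the helper name is illustrative.

```python
# Confusion matrix from the test set: rows = true class, columns = predicted class
cm = [[956, 35],
      [89, 120]]


def f1_from_confusion(cm, cls):
    """F1 score of class `cls` computed from a 2x2 confusion matrix."""
    tp = cm[cls][cls]
    fn = sum(cm[cls]) - tp                 # true `cls` predicted as the other class
    fp = sum(row[cls] for row in cm) - tp  # other class predicted as `cls`
    return 2 * tp / (2 * tp + fp + fn)


f1_good = f1_from_confusion(cm, 0)  # BadCredit = 0
f1_bad = f1_from_confusion(cm, 1)   # BadCredit = 1
macro_f1 = (f1_good + f1_bad) / 2

print(f"F1 (BadCredit=0): {f1_good:.4f}")
print(f"F1 (BadCredit=1): {f1_bad:.4f}")
print(f"Macro F1:         {macro_f1:.4f}")
```

The per-class breakdown shows that the model is considerably stronger on the majority class than on the bad-credit class, which the single macro score averages together.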