# Predicting Customer Churn

Beta Bank customers are leaving: little by little, chipping away every month. The bankers figured out itâ€™s cheaper to save the existing customers rather than to attract new ones.

Using data on clients' past behavior, I will build a model to predict whether a customer is likely to leave the bank soon. The goal of this project will be to obtain a model that has an F1 score of at least 0.59 on the test set. Additionally, the AUC-ROC metric will be calculated.

To build this model, I will first preprocess the data by filling missing values, standardizing numerical features, and encoding categorical features using OHE.

Then, I will split the data into training, test, and validation datasets. For this project, the target that we want to predict is the `Exited` column (turned into `has_left` during the project). I will train a Decision Tree model, Random Forest model, and a Logistic Regressor using various hyperparameters before and after taking clas imbalance into account. I will test the model with the highest F1 score on the test dataset.

## Initialization

To begin, I will import libraries needed throughout the project, load the dataset, and observe the data.

In [1]:
# import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.dummy import DummyClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
from sklearn.metrics import roc_auc_score
from sklearn.utils import shuffle
from sklearn.preprocessing import OrdinalEncoder

### Loading Datasets

In [2]:
# loading datasets 
try:
    df = pd.read_csv('/datasets/Churn.csv')
except:
    print("The data file could not be read.")

### Check Data

In [3]:
# checking basic info of the dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           9091 non-null   float64
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(3), int64(8), object(3)
memory usage: 1.1+ MB


Only the `Tenure` column has null values - I'll have to take a closer look to determine what to do with these null values. There's also a redundant `RowNumber` column.

In [4]:
# viewing data
df.head(10)

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2.0,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1.0,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8.0,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1.0,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2.0,125510.82,1,1,1,79084.1,0
5,6,15574012,Chu,645,Spain,Male,44,8.0,113755.78,2,1,0,149756.71,1
6,7,15592531,Bartlett,822,France,Male,50,7.0,0.0,2,1,1,10062.8,0
7,8,15656148,Obinna,376,Germany,Female,29,4.0,115046.74,4,1,0,119346.88,1
8,9,15792365,He,501,France,Male,44,4.0,142051.07,2,0,1,74940.5,0
9,10,15592389,H?,684,France,Male,27,2.0,134603.88,1,1,1,71725.73,0


The data has been loaded.

## Data Preprocessing

### Drop + Rename Columns
`RowNumber`, `Surname`, and `CustomerId` can all be dropped, as these are not features that can be used to predict if a customer leaves a bank. I will also make the column names standardized by making them snake case. Some column names will be renamed for consistency and to make them easier to understand. `Geography` will become `country` and `Exited` will become `has_left`.

In [5]:
# drop row number and surname columns
df = df.drop(['RowNumber', 'Surname', 'CustomerId'], axis=1)

In [6]:
# make columns lowercase
df.columns = df.columns.str.lower()

In [7]:
# check current column names
df.columns

Index(['creditscore', 'geography', 'gender', 'age', 'tenure', 'balance',
       'numofproducts', 'hascrcard', 'isactivemember', 'estimatedsalary',
       'exited'],
      dtype='object')

In [8]:
# rename columns
df.columns = ['credit_score', 'country', 'gender', 'age', 'tenure', 'account_balance', 'num_of_products', 'has_cr_card', 'is_active_member', 'salary', 'has_left']

In [9]:
# view data
df.head()

Unnamed: 0,credit_score,country,gender,age,tenure,account_balance,num_of_products,has_cr_card,is_active_member,salary,has_left
0,619,France,Female,42,2.0,0.0,1,1,1,101348.88,1
1,608,Spain,Female,41,1.0,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8.0,159660.8,3,1,0,113931.57,1
3,699,France,Female,39,1.0,0.0,2,0,0,93826.63,0
4,850,Spain,Female,43,2.0,125510.82,1,1,1,79084.1,0


### Missing Values in Tenure

About 10% of our data has NaN values for this column, and the other columns will be valuable for training our model. As such, we cannot drop these rows and will need to come up with a sufficient way to fill in the data.

In [10]:
# viewing unique values in tenure column
df['tenure'].unique()

array([ 2.,  1.,  8.,  7.,  4.,  6.,  3., 10.,  5.,  9.,  0., nan])

The tenure column has integer values, excluding the NaN. After filling in the missing values, I will convert this column to be an integer type.

In [11]:
# view rows with NaN values
df[df['tenure'].isna()].head(10)

Unnamed: 0,credit_score,country,gender,age,tenure,account_balance,num_of_products,has_cr_card,is_active_member,salary,has_left
30,591,Spain,Female,39,,0.0,3,1,0,140469.38,1
48,550,Germany,Male,38,,103391.38,1,0,1,90878.13,0
51,585,Germany,Male,36,,146050.97,2,0,0,86424.57,0
53,655,Germany,Male,41,,125561.97,1,0,0,164040.94,1
60,742,Germany,Male,35,,136857.0,1,0,0,84509.57,0
82,543,France,Female,36,,0.0,2,0,0,26019.59,0
85,652,Spain,Female,75,,0.0,2,1,1,114675.75,0
94,730,Spain,Male,42,,0.0,2,0,1,85982.47,0
99,413,France,Male,34,,0.0,2,0,0,6534.18,0
111,538,Germany,Male,39,,108055.1,2,1,0,27231.26,0


There does not seem to be a pattern for missing tenures, so I will need to fill these values. Since tenure is a period of time, it likely relies on age. As such, I will fill these values with the median based on ages.

In [12]:
# look at the median values for tenure based on age
median_tenures = df[df['tenure'].notnull()].groupby('age').median()['tenure']

# round so values are integers
median_tenures = median_tenures.round()

# view tenures
median_tenures

age
18     4.0
19     5.0
20     4.0
21     4.0
22     6.0
      ... 
83     6.0
84     8.0
85    10.0
88    10.0
92     1.0
Name: tenure, Length: 70, dtype: float64

In [13]:
# function to replace nan values
def replace_missing_tenure(age):
    return median_tenures[age]

# checking function works -> output should be 6.0
print(replace_missing_tenure(22))

6.0


In [14]:
# replace all nan values
df['tenure'] = df['tenure'].fillna(df['age'].apply(replace_missing_tenure))

In [15]:
# check values
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   credit_score      10000 non-null  int64  
 1   country           10000 non-null  object 
 2   gender            10000 non-null  object 
 3   age               10000 non-null  int64  
 4   tenure            10000 non-null  float64
 5   account_balance   10000 non-null  float64
 6   num_of_products   10000 non-null  int64  
 7   has_cr_card       10000 non-null  int64  
 8   is_active_member  10000 non-null  int64  
 9   salary            10000 non-null  float64
 10  has_left          10000 non-null  int64  
dtypes: float64(3), int64(6), object(2)
memory usage: 859.5+ KB


`tenure` no longer has NaN values. Next, I will convert the `tenure` column to integers.

In [16]:
# convert data type to integer
df['tenure'] = df['tenure'].astype('int')

# show data
df.head()

Unnamed: 0,credit_score,country,gender,age,tenure,account_balance,num_of_products,has_cr_card,is_active_member,salary,has_left
0,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


The missing values in `tenure` have been filled with the median for their age group, and the data type has been casted from float to integer. There are no more NaN values in the dataset.

## Feature Preparation

### OHE for Categorical Features
Logistic regression only works with numerical, as opposed to categorical, features. One-Hot Encoding can be used to convert our categorical features to numerical features - separate columns will be created for each value in the categorical category we want. These separate columns will be binary columns, having a value 1 if that feature is true and having a value of 0 if that feature is false.

The new features are called **dummy columns**, and we can remove one column (done via `drop_first` below) since its value can easily be inferred based on the values of the remaining columns.

Our categorical columns are `country` and `gender`. 

In [17]:
# view unique values of country
df['country'].unique()

array(['France', 'Spain', 'Germany'], dtype=object)

In [18]:
# view unique values of gender
df['gender'].unique()

array(['Female', 'Male'], dtype=object)

`country` has three values, and `gender` has two. There will be two dummy columns created for `country` and one for `gender`. First, I will convert these column values to lowercase, then I will create the dummy columns.

In [19]:
# convert columns to lowercase
df['country'] = df['country'].str.lower()
df['gender'] = df['gender'].str.lower()

In [20]:
# create dummy columns and drop first instance
df = pd.get_dummies(df, drop_first=True)

In [21]:
# show data with dummy variables
df.head()

Unnamed: 0,credit_score,age,tenure,account_balance,num_of_products,has_cr_card,is_active_member,salary,has_left,country_germany,country_spain,gender_male
0,619,42,2,0.0,1,1,1,101348.88,1,0,0,0
1,608,41,1,83807.86,1,0,1,112542.58,0,0,1,0
2,502,42,8,159660.8,3,1,0,113931.57,1,0,0,0
3,699,39,1,0.0,2,0,0,93826.63,0,0,0,0
4,850,43,2,125510.82,1,1,1,79084.1,0,0,1,0


The original columns for `gender` and `country` were removed. The binary columns `country_germany`, `country_spain`, and `gender_male` were created.

### Splitting the Data

Since the test dataset doesn't yet exist, we need to split the source data into three datasets: The training dataset used to train the model, the validation dataset to evaluate the trained model, and the test dataset to give an unbiased final evaluation of the model.

It is good practice to do a 3:1:1 split since the test dataset doesn't exist - the training dataset will be made up of 60% of the source data whereas the validation and test datasets will both be 20% of the source data.

In [22]:
# defining our target and features
target = df['has_left']
features = df.drop(['has_left'], axis=1)

In [23]:
# reserving 20% of the data for the test dataset
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.20, stratify = target, random_state=12345
)

# splitting the 80% of the source dataset into the training dataset (60%) and validation dataset (20%)  
features_train, features_valid, target_train, target_valid = train_test_split(
    features_train, target_train, test_size=0.25, stratify = target_train, random_state=12345
)

In [24]:
# confirming sizes of datasets
original_data_size = len(df)
print(f'Training dataset size: {len(features_train) / original_data_size:.2%}')
print(f'Validation dataset size: {len(features_valid) / original_data_size:.2%}')
print(f'Testing dataset size: {len(features_test) / original_data_size:.2%}')

Training dataset size: 60.00%
Validation dataset size: 20.00%
Testing dataset size: 20.00%


### Standardize Numerical Features

Next, numeric features must be standardized. By standardizing to relative terms, it prevents features with wider ranges from having a larger weight while our model is being trained.

In [25]:
# defining our numeric columns
numeric = ['credit_score', 'age', 'tenure', 'account_balance', 'salary', 'num_of_products']

In [26]:
pd.options.mode.chained_assignment = None

# scaling our numeric columns
scaler = StandardScaler()
scaler.fit(features_train[numeric])

# scaling training, validation, and testing sets
features_train[numeric] = scaler.transform(features_train[numeric])
features_valid[numeric] = scaler.transform(features_valid[numeric])
features_test[numeric] = scaler.transform(features_test[numeric])

# show data
features_train.head()

Unnamed: 0,credit_score,age,tenure,account_balance,num_of_products,has_cr_card,is_active_member,salary,country_germany,country_spain,gender_male
5536,-0.143332,0.577533,-0.000424,-1.220573,0.797767,1,1,1.029613,0,1,1
8530,1.632702,-0.564119,-1.09143,0.435807,-0.916018,1,0,0.237986,0,0,0
1762,1.116413,-0.468981,-1.455098,1.245822,-0.916018,1,1,-0.686104,0,0,0
9090,1.643028,0.006707,-0.000424,-1.220573,-0.916018,1,0,-0.391097,0,0,0
8777,-0.484083,-1.420358,-1.455098,1.421989,0.797767,1,0,-1.361559,0,1,1


The data has been standardized.

### Splitting the Data

Since the test dataset doesn't yet exist, we need to split the source data into three datasets: The training dataset used to train the model, the validation dataset to evaluate the trained model, and the test dataset to give an unbiased final evaluation of the model.

It is good practice to do a 3:1:1 split since the test dataset doesn't exist - the training dataset will be made up of 60% of the source data whereas the validation and test datasets will both be 20% of the source data.

In [27]:
# defining our target and features
target = df['has_left']
features = df.drop(['has_left'], axis=1)

In [28]:
# reserving 20% of the data for the test dataset
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.20, random_state=12345
)

# splitting the 80% of the source dataset into the training dataset (60%) and validation dataset (20%)  
features_train, features_valid, target_train, target_valid = train_test_split(
    features_train, target_train, test_size=0.25, random_state=12345
)

In [29]:
# confirming sizes of datasets
original_data_size = len(df)
print(f'Training dataset size: {len(features_train) / original_data_size:.2%}')
print(f'Validation dataset size: {len(features_valid) / original_data_size:.2%}')
print(f'Testing dataset size: {len(features_test) / original_data_size:.2%}')

Training dataset size: 60.00%
Validation dataset size: 20.00%
Testing dataset size: 20.00%


## Models with Class Imbalance
The first model we create will not take class imbalances into account. Since the value we want to predict is a binary value of 0 or 1, we can use classification.

In [30]:
df['has_left'].value_counts(normalize=True)

0    0.7963
1    0.2037
Name: has_left, dtype: float64

In our source data, 79.63% of our values for `has_left` are 0, whereas only 20.37% have the value 1. This is a class imbalance, as the number of 0s are significantly higher than 1. While this should be addressed for the final model, I will first test different combinations of hyperparameters for three models: Decision Tree, Random Forest, and Logistic Regression.

### Decision Tree Classifier

In [31]:
# function to run the decision tree classifier based on datasets and is the classes are weighted
def decision_tree_classifier(data_features_train, data_target_train, class_weight=None):
    
    # initializing values for our best depth and accuracy
    best_depth = 0
    best_f1 = 0
    final_roc_auc = 0

    # loop over depths from 1 to 9
    for depth in range(1, 10):
        model = DecisionTreeClassifier(random_state=12345, max_depth=depth, class_weight=class_weight)
        model.fit(data_features_train, data_target_train)
        predictions_valid = model.predict(features_valid)

        f1 = f1_score(target_valid, predictions_valid)

        if f1 > best_f1:
            best_f1 = f1
            best_depth = depth
            proba_valid = model.predict_proba(features_valid)
            proba_one_valid = proba_valid[:, 1]
            final_roc_auc = roc_auc_score(target_valid, proba_one_valid)

    # print the best depth
    print(f'Best Depth: {best_depth} with F1 {best_f1:.2%} and AUC-ROC {final_roc_auc:.2%}')

In [32]:
# call our decision tree classifier
decision_tree_classifier(features_train, target_train)

Best Depth: 7 with F1 55.84% and AUC-ROC 82.31%


### Random Forest Classifier

In [33]:
def random_forest_classifier(data_features_train, data_target_train, class_weight=None):
    
    # initializing values for our best depth and accuracy
    best_f1 = 0
    final_roc_auc = 0
    best_est = 0
    best_depth = 0

    # looping through max depths
    for depth in range(1, 11):

        # looping through number of estimators
        for est in range(10, 121, 10): 
            model = RandomForestClassifier(random_state=12345, n_estimators=est,
                                           max_depth=depth, class_weight=class_weight)
            model.fit(data_features_train, data_target_train) 

            predictions_valid = model.predict(features_valid)
            f1 = f1_score(target_valid, predictions_valid)

            if f1 > best_f1:
                best_f1 = f1
                proba_valid = model.predict_proba(features_valid)
                proba_one_valid = proba_valid[:, 1]
                final_roc_auc = roc_auc_score(target_valid, proba_one_valid)
                best_est = est
                best_depth = depth

    # printing results
    print(f'Best Depth: {best_depth}')
    print(f'Best Number of Estimators: {best_est}')
    print(f'Best F1 Score: {best_f1:.2%}')
    print(f'Final AUC-ROC: {final_roc_auc:.2%}')

In [34]:
random_forest_classifier(features_train, target_train)

Best Depth: 10
Best Number of Estimators: 30
Best F1 Score: 55.17%
Final AUC-ROC: 84.75%


For our data, the best depth and number of estimators of our Random Forest Model was 10 and 30. The F1 score is slightly worse than for our Decision Tree Model, with a score of 55.17%.

### Logistic Regression

In [35]:
# Running model using logistic regression
def logistic_regression(data_features_train, data_target_train, class_weight=None):
    model = LogisticRegression(random_state=12345, solver='liblinear', class_weight=class_weight)
    model.fit(data_features_train, data_target_train)
    predictions_valid = model.predict(features_valid)

    proba_valid = model.predict_proba(features_valid)
    proba_one_valid = proba_valid[:, 1]

    f1 = f1_score(target_valid, predictions_valid)
    roc_auc = roc_auc_score(target_valid, proba_one_valid)

    # Printing the accuracy
    print(f'F1: {f1:.2%}')
    print(f'AUC-ROC: {roc_auc:.2%}')

In [36]:
logistic_regression(features_train, target_train)

F1: 10.29%
AUC-ROC: 68.14%


Logistic regression is the worst of the three models. It has no hyperparameters, and it has an F1 score of 10.29% and an AUC-ROC of 68.14%.

## Models without Class Imbalance
Next, I will train models that take class imbalances into account. I will apply two different methods, upsampling and class weight adjustment, for the three models.

### Upsampling

In [37]:
# upsample method to make a rare class less rare
def upsample(features, target, repeat):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    # upsamples the target and features by repeating our positive features
    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    # shuffle target and features
    features_upsampled, target_upsampled = shuffle(
        features_upsampled, target_upsampled, random_state=12345
    )

    return features_upsampled, target_upsampled

In [38]:
# perform upsampling on our features and targets - I tried a few numbers and 4 was the best
features_upsampled, target_upsampled = upsample(
    features_train, target_train, 4
)

In [39]:
# view old value counts
target_train.value_counts()

0    4781
1    1219
Name: has_left, dtype: int64

In [40]:
# view new value counts
target_upsampled.value_counts()

1    4876
0    4781
Name: has_left, dtype: int64

After performing upsampling, `has_left` has similar amounts of 0s and 1s - we've made the `1` class more common.

#### Decision Tree

In [41]:
# running the decision tree classifier with the upsampled data
decision_tree_classifier(features_upsampled, target_upsampled)

Best Depth: 6 with F1 55.87% and AUC-ROC 80.91%


Without taking class imbalances into account previously, our best depth for decision trees was 7, with a F1 score of 55.84% and an AUC-ROC of 82.31%

With taking class imbalances into account, our new best depth is 6. The F1 score has increased to 55.87%, and the AUC-ROC has decreased to 80.91%.

#### Random Forest

In [42]:
# running the random forest classifier with the upsampled data
random_forest_classifier(features_upsampled, target_upsampled)

Best Depth: 10
Best Number of Estimators: 110
Best F1 Score: 60.34%
Final AUC-ROC: 85.37%


The best depth and number of estimators changed to 10 and 110. However, now our F1 Score is 60.34% (previously 55.17% from the original data), which is a large improvement. The AUC-ROC has increased very slightly to 85.37% (from 84.75%).

#### Logistic Regression

In [43]:
# running the logistic regression with the upsampled data
logistic_regression(features_upsampled, target_upsampled)

F1: 43.04%
AUC-ROC: 71.69%


Logistic regression is still performing the worst of the three models, but the F1 score has drastically increased to 43.04% (from 10.29%), and the AUC-ROC has increased to 71.69%.

After upsampling, there have been improvements in the Logistic Regression and Random Forest models. The AUC-ROC also increased for each model, meaning that our models are beginning more efficient at differentiating between classes.

### Class Weight Adjustment
The next method I will use to deal with class imbalances is using the built in `class_weight` attribute when building the model. For each of the methods I have created for the three models, I created an optional parameter, class_weight. When calling these models, I can set that third class_weight parameter to "balanced" to assign weights to our classes.

In [44]:
# running the decision tree classifier with class weight adjustment
decision_tree_classifier(features_train, target_train, 'balanced')

Best Depth: 6 with F1 55.87% and AUC-ROC 80.91%


The F1 and AUC-ROC are about the same as when upsampling was used.

In [45]:
# running the random forest classifier with balanced classes
random_forest_classifier(features_train, target_train, 'balanced')

Best Depth: 9
Best Number of Estimators: 90
Best F1 Score: 59.32%
Final AUC-ROC: 85.26%


For our Random Forest model, the F1 score has decreased to 59.32% (60.34% during upsampling), and the AUC-ROC has decreased slightly to 85.26% (85.37% during upsampling).

In [46]:
# running logistic regression with balanced classes
logistic_regression(features_train, target_train, 'balanced')

F1: 43.31%
AUC-ROC: 72.00%


The F1 and AUC-ROC are about the same as when upsampling was used. 

### Conclusion

We applied upsampling and class weight adjustment to tackle the class imbalances in our data. While both methods increased our F1 scores from the original models that did not do anything about class imbalances, ultimately the Random Forest Classifier with upsampling had the highest F1 score. This model had a depth of 10 and 110 estimators.

Next, I will do final testing with the test set.

## Quality Testing
Since the Random Forest Classifier had the best accuracy, I will perform a quality test on it. To do this, I will retrain the model using 80% of the data (combining the training and validation datasets) and do a final evaluation on the validation dataset.

In [47]:
# combining train and validation datasets
features_combined = pd.concat([features_valid, features_train])
target_combined = pd.concat([target_valid, target_train])

In [48]:
# setting hyperparameter variables
best_est = 110
best_depth = 10

In [49]:
# original value counts
target_combined.value_counts()

0    6390
1    1610
Name: has_left, dtype: int64

In [50]:
# perform upsampling on our features and targets
features_upsampled, target_upsampled = upsample(
    features_combined, target_combined, 4
)

In [51]:
# new value counts
target_upsampled.value_counts()

1    6440
0    6390
Name: has_left, dtype: int64

In [52]:
# creating model and training
model = RandomForestClassifier(random_state=12345, n_estimators = best_est, max_depth = best_depth)
model.fit(features_upsampled, target_upsampled);

In [53]:
# testing model with test set
predictions_test = model.predict(features_test)

# get f1 score
f1 = f1_score(target_test, predictions_test)

# get AUC-ROC score
proba_valid = model.predict_proba(features_test)
proba_one_valid = proba_valid[:, 1]
roc_auc = roc_auc_score(target_valid, proba_one_valid)

# print accuracy
print(f'F1 Score: {f1:.2%}')
print(f'AUC-ROC: {roc_auc:.2%}')

F1 Score: 63.75%
AUC-ROC: 51.24%


Our final F1 score is 63.75% and our final AUC-ROC is 51.24%.