## Predicting customer behavior

Data on clients’ past behavior and contract terminations at a bank will be used to predict whether a customer will leave the bank soon. The goal is to build a model with the highest possible F1 score.

### Step 1. Downloading and preparing the data

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.utils import shuffle
from sklearn.metrics import roc_auc_score

In [None]:
df = pd.read_csv('/datasets/Churn.csv')

In [None]:
df.head()

| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2.0 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1.0 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8.0 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1.0 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2.0 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
RowNumber          10000 non-null int64
CustomerId         10000 non-null int64
Surname            10000 non-null object
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender             10000 non-null object
Age                10000 non-null int64
Tenure             9091 non-null float64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(3), int64(8), object(3)
memory usage: 1.1+ MB


All data columns except for `Tenure` appear to have the correct datatypes. The `Tenure` column in this dataset is counted in whole years, so it is better to convert this column into an `int` datatype. `Tenure` also has 909 null values which should be inspected further.

`EstimatedSalary` could also be changed from a `float` datatype to an `int` datatype, since annual salaries are normally thought of as whole numbers, but it will be left unchanged to preserve accuracy.

`Geography` and `Gender` are two non-numeric features that should be converted. `Gender` can be converted into 1/0 (binary) values. `Geography` likely has more than 2 different values, so the column should be converted into dummy variables.

In [None]:
df.describe()

| | RowNumber | CustomerId | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 9091.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 5000.5 | 15690940.0 | 650.5288 | 38.9218 | 4.99769 | 76485.889288 | 1.5302 | 0.7055 | 0.5151 | 100090.239881 | 0.2037 |
| std | 2886.89568 | 71936.19 | 96.653299 | 10.487806 | 2.894723 | 62397.405202 | 0.581654 | 0.45584 | 0.499797 | 57510.492818 | 0.402769 |
| min | 1.0 | 15565700.0 | 350.0 | 18.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 11.58 | 0.0 |
| 25% | 2500.75 | 15628530.0 | 584.0 | 32.0 | 2.0 | 0.0 | 1.0 | 0.0 | 0.0 | 51002.11 | 0.0 |
| 50% | 5000.5 | 15690740.0 | 652.0 | 37.0 | 5.0 | 97198.54 | 1.0 | 1.0 | 1.0 | 100193.915 | 0.0 |
| 75% | 7500.25 | 15753230.0 | 718.0 | 44.0 | 7.0 | 127644.24 | 2.0 | 1.0 | 1.0 | 149388.2475 | 0.0 |
| max | 10000.0 | 15815690.0 | 850.0 | 92.0 | 10.0 | 250898.09 | 4.0 | 1.0 | 1.0 | 199992.48 | 1.0 |


#### Inspecting the contents of each column

Each column will be reviewed to check the values it contains and to make sure there are no unusual values.

In [None]:
df['RowNumber'].value_counts()

2047    1
5424    1
1338    1
7481    1
5432    1
       ..
2716    1
8857    1
4759    1
6806    1
2049    1
Name: RowNumber, Length: 10000, dtype: int64

The `RowNumber` column indicates the index number of each customer, with the index number starting at 1.

In [None]:
df['CustomerId'].value_counts()

15812607    1
15741078    1
15635776    1
15740223    1
15738174    1
           ..
15743714    1
15639265    1
15641312    1
15684319    1
15695872    1
Name: CustomerId, Length: 10000, dtype: int64

The `CustomerId` column indicates the unique customer identifier.

In [None]:
df['Surname'].value_counts()

Smith        32
Scott        29
Martin       29
Walker       28
Brown        26
             ..
Hodge         1
Peyser        1
Darling       1
Sheffield     1
Steere        1
Name: Surname, Length: 2932, dtype: int64

The `Surname` column indicates the customer's surname.

In [None]:
df['CreditScore'].value_counts()

850    233
678     63
655     54
705     53
667     53
      ... 
419      1
417      1
373      1
365      1
401      1
Name: CreditScore, Length: 460, dtype: int64

The `CreditScore` column indicates the customer's credit score.

In [None]:
df['Geography'].value_counts()

France     5014
Germany    2509
Spain      2477
Name: Geography, dtype: int64

The `Geography` column indicates the customer's country of residence.

In [None]:
df['Gender'].value_counts()

Male      5457
Female    4543
Name: Gender, dtype: int64

The `Gender` column indicates the customer's gender.

In [None]:
df['Age'].value_counts()

37    478
38    477
35    474
36    456
34    447
     ... 
92      2
88      1
82      1
85      1
83      1
Name: Age, Length: 70, dtype: int64

The `Age` column indicates the customer's age.

In [None]:
df['Tenure'].value_counts()

1.0     952
2.0     950
8.0     933
3.0     928
5.0     927
7.0     925
4.0     885
9.0     882
6.0     881
10.0    446
0.0     382
Name: Tenure, dtype: int64

The `Tenure` column indicates the period of maturation for a customer’s fixed deposit in years.

In [None]:
df['Balance'].value_counts()

0.00         3617
105473.74       2
130170.82       2
113063.83       1
80242.37        1
             ... 
183555.24       1
137648.41       1
112689.95       1
115465.28       1
74681.90        1
Name: Balance, Length: 6382, dtype: int64

The `Balance` column indicates the account balance on the customer's account.

In [None]:
df['NumOfProducts'].value_counts()

1    5084
2    4590
3     266
4      60
Name: NumOfProducts, dtype: int64

The `NumOfProducts` column indicates the number of banking products used by the customer.

In [None]:
df['HasCrCard'].value_counts()

1    7055
0    2945
Name: HasCrCard, dtype: int64

The `HasCrCard` column indicates if the customer has a credit card.

In [None]:
df['IsActiveMember'].value_counts()

1    5151
0    4849
Name: IsActiveMember, dtype: int64

The `IsActiveMember` column indicates whether the customer is an active member.

In [None]:
df['EstimatedSalary'].value_counts()

24924.92     2
109029.72    1
182025.95    1
82820.85     1
30314.04     1
            ..
158302.59    1
171037.63    1
43036.60     1
55034.02     1
104181.78    1
Name: EstimatedSalary, Length: 9999, dtype: int64

The `EstimatedSalary` column indicates the customer's estimated salary.

In [None]:
df['Exited'].value_counts()

0    7963
1    2037
Name: Exited, dtype: int64

The `Exited` column indicates if the customer has left the bank.

#### Dealing with missing values

In [None]:
df.isna().sum()

RowNumber            0
CustomerId           0
Surname              0
CreditScore          0
Geography            0
Gender               0
Age                  0
Tenure             909
Balance              0
NumOfProducts        0
HasCrCard            0
IsActiveMember       0
EstimatedSalary      0
Exited               0
dtype: int64

In [None]:
df.loc[df['Tenure'].isna()]

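| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 31 | 15589475 | Azikiwe | 591 | Spain | Female | 39 | NaN | 0.00 | 3 | 1 | 0 | 140469.38 | 1 |
| 48 | 49 | 15766205 | Yin | 550 | Germany | Male | 38 | NaN | 103391.38 | 1 | 0 | 1 | 90878.13 | 0 |
| 51 | 52 | 15768193 | Trevisani | 585 | Germany | Male | 36 | NaN | 146050.97 | 2 | 0 | 0 | 86424.57 | 0 |
| 53 | 54 | 15702298 | Parkhill | 655 | Germany | Male | 41 | NaN | 125561.97 | 1 | 0 | 0 | 164040.94 | 1 |
| 60 | 61 | 15651280 | Hunter | 742 | Germany | Male | 35 | NaN | 136857.00 | 1 | 0 | 0 | 84509.57 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9944 | 9945 | 15703923 | Cameron | 744 | Germany | Male | 41 | NaN | 190409.34 | 2 | 1 | 1 | 138361.48 | 0 |
| 9956 | 9957 | 15707861 | Nucci | 520 | France | Female | 46 | NaN | 85216.61 | 1 | 1 | 0 | 117369.52 | 1 |
| 9964 | 9965 | 15642785 | Douglas | 479 | France | Male | 34 | NaN | 117593.48 | 2 | 0 | 0 | 113308.29 | 0 |
| 9985 | 9986 | 15586914 | Nepean | 659 | France | Male | 36 | NaN | 123841.49 | 2 | 1 | 0 | 96833.00 | 0 |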


A null `Tenure` value may indicate that the customer has not made any fixed deposits.

Null values will be replaced with a -1 to denote this case, since a tenure with a value of 0 could indicate a fixed deposit with a period of maturation that is less than 1 year.

In [None]:
df['Tenure'] = df['Tenure'].fillna(-1)  # assign back instead of in-place fill on a column selection

#### Converting the data to the necessary types

In [None]:
df['Tenure'] = df['Tenure'].astype(int)
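
The fill-then-cast order used above matters, because `NaN` has no integer representation. The following is a minimal sketch on a toy series (the values are made up for illustration). The `tenure_missing` flag is an assumption that is not part of the original notebook; it is one possible way to keep track of which rows carried the sentinel.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `Tenure` column (illustrative values only).
tenure = pd.Series([2.0, np.nan, 8.0, 0.0, np.nan], name="Tenure")

# Hypothetical helper column: remembers which rows were originally null,
# so the -1 sentinel is not mistaken for an ordinary tenure value later.
tenure_missing = tenure.isna().astype(int)

# Fill first, then cast. Casting a float column that still holds NaN
# to int would raise an error.
tenure_filled = tenure.fillna(-1).astype(int)

print(tenure_filled.tolist())
print(tenure_missing.tolist())
```

Tree-based models can split on `Tenure < 0` and so isolate the sentinel on their own, which is one reason a sentinel is a reasonable choice for the models used later in this notebook.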

#### Checking for duplicate rows

In [None]:
df['Surname'] = df['Surname'].str.lower()

In [None]:
df.duplicated().sum()

0

The letter casing of the values in the `Surname` column has been changed to lowercase so that any duplicated names that differ only in casing will match.

No duplicate customer entries were found.
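
One caveat worth noting: `RowNumber` and `CustomerId` are unique for every row in this dataset (each has 10000 distinct values), so a full-row comparison cannot flag anything. The sketch below uses a toy frame with made-up values to show how casing hides a duplicate and how a `subset` check, which is an alternative and not what the notebook does, behaves.

```python
import pandas as pd

# Toy data (hypothetical values): rows 0 and 1 describe the same customer,
# but the surname is cased differently.
toy = pd.DataFrame({
    "CustomerId": [101, 101, 102],
    "Surname": ["Smith", "SMITH", "Hill"],
})

before = toy.duplicated().sum()           # casing hides the duplicate
toy["Surname"] = toy["Surname"].str.lower()
after = toy.duplicated().sum()            # now the full rows match

# Alternative: compare only the column that identifies a customer.
by_id = toy.duplicated(subset=["CustomerId"]).sum()

print(before, after, by_id)
```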

#### Converting non-numeric features into numbers

In [None]:
df['Gender'] = df['Gender'].map({'Male': 1, 'Female': 0})

Here, the `Gender` column is made binary: `Male` is mapped to `1` and `Female` is mapped to `0`.

In [None]:
df = df.join(pd.get_dummies(df['Geography'], prefix='Geo', drop_first=True))

The `Geography` column has categorical values. These values are transformed into numerical features using the One-Hot Encoding method.

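`pd.get_dummies()` is called with `drop_first=True` to avoid the dummy variable trap: with three countries, one of the three indicator columns is redundant because it is fully determined by the other two.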
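
A small sketch of the effect of `drop_first`, using a toy `Geography` series with made-up rows. It shows that the full set of indicator columns always sums to 1 per row, which is the redundancy being removed.

```python
import pandas as pd

# Toy stand-in for the `Geography` column.
geo = pd.Series(["France", "Germany", "Spain", "France"], name="Geography")

full = pd.get_dummies(geo, prefix="Geo")
reduced = pd.get_dummies(geo, prefix="Geo", drop_first=True)

print(list(full.columns))     # three indicator columns
print(list(reduced.columns))  # the first category (France) is dropped

# Each row of the full encoding sums to exactly 1, so any one column
# can be reconstructed from the other two.
print(full.sum(axis=1).tolist())
```

After dropping the first category, a row with zeros in both remaining columns stands for France.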

In [None]:
df.head()

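| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited | Geo_Germany | Geo_Spain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | hargrave | 619 | France | 0 | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 | 0 | 0 |
| 1 | 2 | 15647311 | hill | 608 | Spain | 0 | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 | 0 | 1 |
| 2 | 3 | 15619304 | onio | 502 | France | 0 | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 | 0 | 0 |
| 3 | 4 | 15701354 | boni | 699 | France | 0 | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 | 0 | 0 |
| 4 | 5 | 15737888 | mitchell | 850 | Spain | 0 | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 | 0 | 1 |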


#### Setting Target and Features

In [None]:
target = df['Exited']
features = df.drop(['Exited', 'Geography', 'RowNumber', 'CustomerId', 'Surname'], axis=1)

The target feature for this project is `Exited`, since we are trying to determine if a customer will leave the bank.

The `Geography` column is dropped from the features variable because values contained in this column were converted into numerical values using the One-Hot Encoding method.

The `RowNumber`, `CustomerId`, and `Surname` columns are not included in the features variable because they only identify a customer and are not expected to influence whether the customer will leave the bank.

### Conclusion

`/datasets/Churn.csv` was opened and examined for general information.

There are 14 columns in the file.

The columns in this dataset are described as follows:

    RowNumber — row index in the data
    CustomerId — unique customer identifier
    Surname — surname
    CreditScore — credit score
    Geography — country of residence
    Gender — gender
    Age — age
    Tenure — period of maturation for a customer’s fixed deposit (years)
    Balance — account balance
    NumOfProducts — number of banking products used by the customer
    HasCrCard — customer has a credit card
    IsActiveMember — customer’s activeness
    EstimatedSalary — estimated salary
    Exited — customer has left

The datatype for the `Tenure` column in this dataset was converted from float to int, since Tenure is counted in whole years. There were also null values found within this column. The null values were converted into `-1` since they may indicate a situation where the customer has never made a fixed deposit.

The datatype for `EstimatedSalary` is kept as float type to preserve accuracy, even though annual salaries are normally thought of as whole numbers.

Next, the values in each column were reviewed to check for anything unusual.

The names in the `Surname` column were all lowercased to match any potential duplicates that may have originally had different casing.

Duplicate rows were checked using `.duplicated()`.

Non-numeric values in the `Gender` and `Geography` columns were converted into numbers.

Lastly, appropriate contents for the target and features variables were set.

### Step 2. Examining the balance of classes and training the model without taking into account the imbalance

In [None]:
df['Exited'].value_counts()

0    7963
1    2037
Name: Exited, dtype: int64

In [None]:
df['Exited'].mean()

0.2037

20.37% of the rows have `Exited` equal to `1`, meaning that the customer has left the bank.

The remaining 79.63% have the value `0`, meaning that they are still current customers.
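
To put later F1 scores in context, it can help to work out what trivial constant models would score under this class balance. The sketch below is plain arithmetic on the 20.37% share reported above. It assumes the validation set has roughly the same share, which is not guaranteed by an unstratified split.

```python
positive_share = 0.2037  # share of rows with Exited == 1, from the cell above

# Constant model that flags every customer as leaving:
precision = positive_share  # only the true leavers are correct hits
recall = 1.0                # every leaver is caught
baseline_f1 = 2 * precision * recall / (precision + recall)

# Constant model that predicts "stays" for everyone: high accuracy,
# but it never predicts the positive class, so its F1 is 0 by convention.
majority_accuracy = 1 - positive_share

print("F1 when always predicting 1:", round(baseline_f1, 3))
print("Accuracy when always predicting 0:", round(majority_accuracy, 4))
```

This is why accuracy alone would be misleading here: a model that never predicts churn is right about four times out of five while being useless for the task.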

In [None]:
features_train, features_valid, target_train, target_valid = train_test_split(features, target, test_size=0.4, random_state = 12345)
features_valid, features_test, target_valid, target_test = train_test_split(features_valid, target_valid, test_size=0.5, shuffle = False)

In [None]:
print("--- Train Sizes (Rows, Columns) ---")
print("target_train:", target_train.shape)
print("features_train:", features_train.shape)
print("")
print("--- Valid Sizes (Rows, Columns) ---")
print("target_valid:", target_valid.shape)
print("features_valid:", features_valid.shape)
print("")
print("--- Test Sizes (Rows, Columns) ---")
print("target_test:", target_test.shape)
print("features_test:", features_test.shape)

--- Train Sizes (Rows, Columns) ---
target_train: (6000,)
features_train: (6000, 11)

--- Valid Sizes (Rows, Columns) ---
target_valid: (2000,)
features_valid: (2000, 11)

--- Test Sizes (Rows, Columns) ---
target_test: (2000,)
features_test: (2000, 11)


`train_test_split()`, imported from `sklearn.model_selection`, splits a dataset into two parts.

The source data is split twice with `train_test_split()` to obtain a 3:1:1 ratio: the first call holds out 40% of the rows, and the second call divides that 40% in half. The result is a training dataset (60%), a validation dataset (20%), and a test dataset (20%).
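
The notebook's split is not stratified, so the share of `Exited == 1` can drift slightly between the three parts. As a hedged alternative, and not the approach used here, the `stratify` argument of `train_test_split()` keeps the class share nearly identical in each part. The sketch below uses synthetic labels with a similar positive rate rather than the bank data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10,000 labels with roughly 20% positives.
rng = np.random.default_rng(12345)
y = (rng.random(10_000) < 0.2037).astype(int)
X = np.arange(10_000).reshape(-1, 1)

# First split: 60% train, 40% held out.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=12345, stratify=y)

# Second split: the held-out 40% is halved into validation and test.
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=12345, stratify=y_rest)

for name, part in [("train", y_train), ("valid", y_valid), ("test", y_test)]:
    print(name, len(part), round(float(part.mean()), 4))
```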

In [None]:
dt_model = DecisionTreeClassifier(random_state=99)
rf_model = RandomForestClassifier(random_state=99)

dt_model.fit(features_train, target_train)
rf_model.fit(features_train, target_train)

dt_predictions_valid = dt_model.predict(features_valid)
rf_predictions_valid = rf_model.predict(features_valid)

print("--- F1 Scores ---")
print("Decision Tree:", f1_score(target_valid, dt_predictions_valid))
print("Random Forest:", f1_score(target_valid, rf_predictions_valid))

--- F1 Scores ---
Decision Tree: 0.49514563106796117
Random Forest: 0.5454545454545454




In [None]:
dt_probabilities_valid = dt_model.predict_proba(features_valid)
rf_probabilities_valid = rf_model.predict_proba(features_valid)

dt_probabilities_one_valid = dt_probabilities_valid[:, 1]
rf_probabilities_one_valid = rf_probabilities_valid[:, 1]

print("--- AUC-ROC Scores ---")
print("Decision Tree:", roc_auc_score(target_valid, dt_probabilities_one_valid))
print("Random Forest:", roc_auc_score(target_valid, rf_probabilities_one_valid))

--- AUC-ROC Scores ---
Decision Tree: 0.6775281350542155
Random Forest: 0.8256199835931579


### Conclusion

The class distribution of the target column, `Exited`, was examined. It was revealed that 20.37% of the set has the value 1, meaning that the customer has left the bank. The remaining 79.63% of the set has the value 0, meaning that they are still current customers.

The source data was split twice using `train_test_split()` into a 3:1:1 ratio: a training dataset (60%), a validation dataset (20%), and a test dataset (20%).

A Decision Tree model and a Random Forest model were trained using default hyperparameters without taking into account the class imbalance, and an F1 score and an AUC-ROC score were obtained from each.

The Random Forest model had a better F1 score and AUC-ROC score than the Decision Tree model in this step.

For the Decision Tree model, the F1 score obtained was 0.495 and the AUC-ROC score was 0.678.

In the Random Forest model, the F1 score was 0.545 and the AUC-ROC score was 0.826.
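
The two metrics measure different things: F1 is computed from the hard 0/1 predictions, while AUC-ROC is computed from the predicted probabilities and does not depend on a decision threshold. As a reminder of how F1 is built, here is a minimal sketch; the confusion-matrix counts are hypothetical and are not taken from this notebook's models.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)   # share of predicted leavers who really left
    recall = tp / (tp + fn)      # share of real leavers who were caught
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, chosen only for illustration.
example_f1 = f1_from_counts(tp=200, fp=100, fn=200)
print(round(example_f1, 4))
```

Because true negatives do not appear in the formula, F1 is not inflated by the large class of customers who stay.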

### Step 3. Improving the quality of the model

#### 1. Upsampling method

In [None]:
def upsample(features, target, repeat):
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)
    
    features_upsampled, target_upsampled = shuffle(
        features_upsampled, target_upsampled, random_state=12345)
    
    return features_upsampled, target_upsampled

In [None]:
features_upsampled, target_upsampled = upsample(features_train, target_train, 4)

In [None]:
target_upsampled.value_counts()

0    4804
1    4784
Name: Exited, dtype: int64
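
The choice of `repeat = 4` can be checked with simple arithmetic on the counts printed above. This sketch only reuses those two numbers; the variable names are illustrative.

```python
n_zeros = 4804       # class 0 rows in the training set (unchanged by upsampling)
n_ones = 4784 // 4   # class 1 rows before they were repeated 4 times

ratio = n_zeros / n_ones   # how many times rarer the positive class is
repeat = round(ratio)      # nearest whole number of copies

print(n_ones, round(ratio, 2), repeat)
```

Note that only the training data is upsampled. The validation and test sets keep their original class balance so that the scores reflect the real distribution.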

In [None]:
for num in range(5, 16):
    dt_balanced_model = DecisionTreeClassifier(random_state=99, max_depth=num)
    rf_balanced_model = RandomForestClassifier(random_state=99, n_estimators=num, max_depth=11)

    dt_balanced_model.fit(features_upsampled, target_upsampled)
    rf_balanced_model.fit(features_upsampled, target_upsampled)

    dt_b_predictions_valid = dt_balanced_model.predict(features_valid)
    rf_b_predictions_valid = rf_balanced_model.predict(features_valid)
    
    dt_b_probabilities_valid = dt_balanced_model.predict_proba(features_valid)
    rf_b_probabilities_valid = rf_balanced_model.predict_proba(features_valid)

    dt_b_probabilities_one_valid = dt_b_probabilities_valid[:, 1]
    rf_b_probabilities_one_valid = rf_b_probabilities_valid[:, 1]

    print("")
    print("--- F1 Scores (", num, ") ---")
    print("Decision Tree:", f1_score(target_valid, dt_b_predictions_valid))
    print("Random Forest:", f1_score(target_valid, rf_b_predictions_valid))
    
    print("--- AUC-ROC Scores ---")
    print("Decision Tree:", roc_auc_score(target_valid, dt_b_probabilities_one_valid))
    print("Random Forest:", roc_auc_score(target_valid, rf_b_probabilities_one_valid))


--- F1 Scores ( 5 ) ---
Decision Tree: 0.6043165467625898
Random Forest: 0.5829042224510814
--- AUC-ROC Scores ---
Decision Tree: 0.8403898634897145
Random Forest: 0.8229721098573557

--- F1 Scores ( 6 ) ---
Decision Tree: 0.5805860805860806
Random Forest: 0.5972660357518401
--- AUC-ROC Scores ---
Decision Tree: 0.8170354235928006
Random Forest: 0.8283891964964991

--- F1 Scores ( 7 ) ---
Decision Tree: 0.5766990291262135
Random Forest: 0.6078639744952178
--- AUC-ROC Scores ---
Decision Tree: 0.809137955933783
Random Forest: 0.8328802345195787

--- F1 Scores ( 8 ) ---
Decision Tree: 0.559694364851958
Random Forest: 0.6047008547008547
--- AUC-ROC Scores ---
Decision Tree: 0.7784413797826615
Random Forest: 0.8388854066946466

--- F1 Scores ( 9 ) ---
Decision Tree: 0.5678537054860443
Random Forest: 0.6173913043478261
--- AUC-ROC Scores ---
Decision Tree: 0.7737694496263796
Random Forest: 0.841899531169278

--- F1 Scores ( 10 ) ---
Decision Tree: 0.5333333333333334
Random Forest: 0.620689

For the upsampling method, the Random Forest model with `n_estimators = 12` and `max_depth = 11` resulted in an F1 score of 0.6309 and an AUC-ROC score of 0.8442.
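
Reading the best configuration off a long printout is error-prone. One possible way to track it programmatically is sketched below. It runs on synthetic data from `make_classification`, not on the bank dataset, so its scores say nothing about the results reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 20% positives); an assumption for the demo.
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.8, 0.2],
                           random_state=12345)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=12345, stratify=y)

results = []  # (validation F1, n_estimators) pairs
for n_estimators in range(5, 16):
    model = RandomForestClassifier(random_state=99, n_estimators=n_estimators,
                                   max_depth=11)
    model.fit(X_train, y_train)
    score = f1_score(y_valid, model.predict(X_valid))
    results.append((score, n_estimators))

# max() compares the F1 score first; ties go to the larger n_estimators.
best_score, best_n = max(results)
print("best n_estimators:", best_n, "F1:", round(best_score, 4))
```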

#### 2. `class_weight = 'balanced'` method

In [None]:
for num in range(5, 16):
    dt_balanced_model = DecisionTreeClassifier(random_state=99, class_weight='balanced', max_depth=num)
    rf_balanced_model = RandomForestClassifier(random_state=99, class_weight='balanced', n_estimators=num, max_depth = 11)

    dt_balanced_model.fit(features_train, target_train)
    rf_balanced_model.fit(features_train, target_train)

    dt_b_predictions_valid = dt_balanced_model.predict(features_valid)
    rf_b_predictions_valid = rf_balanced_model.predict(features_valid)
    
    dt_b_probabilities_valid = dt_balanced_model.predict_proba(features_valid)
    rf_b_probabilities_valid = rf_balanced_model.predict_proba(features_valid)

    dt_b_probabilities_one_valid = dt_b_probabilities_valid[:, 1]
    rf_b_probabilities_one_valid = rf_b_probabilities_valid[:, 1]

    print("")
    print("--- F1 Scores (", num, ") ---")
    print("Decision Tree:", f1_score(target_valid, dt_b_predictions_valid))
    print("Random Forest:", f1_score(target_valid, rf_b_predictions_valid))
    
    print("--- AUC-ROC Scores ---")
    print("Decision Tree:", roc_auc_score(target_valid, dt_b_probabilities_one_valid))
    print("Random Forest:", roc_auc_score(target_valid, rf_b_probabilities_one_valid))


--- F1 Scores ( 5 ) ---
Decision Tree: 0.6043165467625898
Random Forest: 0.5549999999999999
--- AUC-ROC Scores ---
Decision Tree: 0.8403898634897145
Random Forest: 0.815832453686403

--- F1 Scores ( 6 ) ---
Decision Tree: 0.5779816513761468
Random Forest: 0.57
--- AUC-ROC Scores ---
Decision Tree: 0.8123917810952088
Random Forest: 0.8235311633225195

--- F1 Scores ( 7 ) ---
Decision Tree: 0.5778210116731517
Random Forest: 0.6079613992762364
--- AUC-ROC Scores ---
Decision Tree: 0.8084970171408323
Random Forest: 0.8298378223862575

--- F1 Scores ( 8 ) ---
Decision Tree: 0.5605338417540515
Random Forest: 0.6058394160583942
--- AUC-ROC Scores ---
Decision Tree: 0.7792877763071504
Random Forest: 0.834454666049301

--- F1 Scores ( 9 ) ---
Decision Tree: 0.5717035611164581
Random Forest: 0.6060606060606061
--- AUC-ROC Scores ---
Decision Tree: 0.778318551790981
Random Forest: 0.8379072492336279

--- F1 Scores ( 10 ) ---
Decision Tree: 0.5343511450381679
Random Forest: 0.5995145631067961
---

For the `class_weight='balanced'` method, the Random Forest model with `n_estimators = 15` and `max_depth = 11` resulted in an F1 score of 0.6097 and an AUC-ROC score of 0.8490.
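
For reference, scikit-learn documents the `'balanced'` heuristic as `n_samples / (n_classes * np.bincount(y))`. The sketch below applies that formula by hand to the training-set class counts from this notebook (4804 zeros, and 4784 / 4 = 1196 ones).

```python
n_zeros, n_ones = 4804, 1196   # training-set class counts
n_samples = n_zeros + n_ones
n_classes = 2

# Documented 'balanced' formula: n_samples / (n_classes * class_count)
weight_zero = n_samples / (n_classes * n_zeros)
weight_one = n_samples / (n_classes * n_ones)

print(round(weight_zero, 4), round(weight_one, 4))
print("relative weight of class 1:", round(weight_one / weight_zero, 2))
```

Each positive row ends up weighted about 4 times as heavily as a negative row, which is close to what upsampling with `repeat = 4` does. That similarity may explain why the two approaches produce comparable scores.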

### Conclusion

In order to improve model quality, class imbalances were addressed. Two approaches were used:
1. Upsampling so that the ratio of both classes in the target feature is more balanced, near 1:1
2. Specifying the `class_weight='balanced'` argument to the Decision Tree and the Random Forest models
    
The Random Forest model using the upsampling method with `n_estimators = 12` and `max_depth = 11` yielded the best F1 score of 0.6309, which is a 0.0855 improvement over the Random Forest model from the previous step. The corresponding AUC-ROC score is 0.8442, meaning that on the validation set there is an 84.42% chance that the model assigns a higher churn probability to a randomly chosen customer who left than to a randomly chosen customer who stayed.
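
The ranking interpretation of AUC-ROC can be verified directly by counting pairs. The following is a minimal sketch in plain Python; the labels and scores are made up, and `pairwise_auc` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
def pairwise_auc(labels, scores):
    """Share of (positive, negative) pairs in which the positive example
    receives the higher score. Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 customers who left (1) and 3 who stayed (0).
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

auc = pairwise_auc(labels, scores)
print(round(auc, 4))  # 6 of the 9 pairs are ranked correctly
```

The quadratic pair count is fine for a toy example; for real data, `roc_auc_score` computes the same quantity far more efficiently.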

### Step 4. Performing the final testing

In [None]:
ftv = pd.concat([features_train, features_valid])

In [None]:
ttv = pd.concat([target_train, target_valid])

In [None]:
features_up, target_up = upsample(ftv, ttv, 4)

In [None]:
print(target_up.value_counts())

1    6492
0    6377
Name: Exited, dtype: int64


In [None]:
rf_final_model = RandomForestClassifier(random_state=99, n_estimators=12, max_depth=11)
rf_final_model.fit(features_up, target_up)
predictions_test = rf_final_model.predict(features_test)
print("Random Forest F1 Score:", f1_score(target_test, predictions_test))

Random Forest F1 Score: 0.6035634743875279


In [None]:
rf_f_probabilities_test = rf_final_model.predict_proba(features_test)

rf_f_probabilities_one_test = rf_f_probabilities_test[:, 1]

print("Random Forest AUC-ROC Score:", roc_auc_score(target_test, rf_f_probabilities_one_test))

Random Forest AUC-ROC Score: 0.8350291804497079


### Conclusion

For the final test, the Random Forest model was used with the same hyperparameters and balancing procedure as in the previous step.

The final model was trained on the combined training and validation sets so that it could learn from more data.

The final F1 score on the test set was 0.6036, and the final AUC-ROC score was 0.8350, meaning that there is an 83.50% chance that the model assigns a higher churn probability to a randomly chosen customer who left than to a randomly chosen customer who stayed.