# ANN Bank Customer Churn Prediction

Dataset: https://www.kaggle.com/datasets/radheshyamkollipara/bank-customer-churn

- RowNumber — corresponds to the record (row) number and has no effect on the output.
- CustomerId — contains random values and has no effect on customer leaving the bank.
- Surname — the surname of a customer has no impact on their decision to leave the bank.
- CreditScore — can have an effect on customer churn, since a customer with a higher credit score is less likely to leave the bank.
- Geography — a customer’s location can affect their decision to leave the bank.
- Gender — it’s interesting to explore whether gender plays a role in a customer leaving the bank.
- Age — this is certainly relevant, since older customers are less likely to leave their bank than younger ones.
- Tenure — refers to the number of years that the customer has been a client of the bank. Normally, long-standing clients are more loyal and less likely to leave a bank.
- Balance — also a very good indicator of customer churn, as people with a higher balance in their accounts are less likely to leave the bank compared to those with lower balances.
- NumOfProducts — refers to the number of products that a customer has purchased through the bank.
- HasCrCard — denotes whether or not a customer has a credit card. This column is also relevant, since people with a credit card are less likely to leave the bank.
- IsActiveMember — active customers are less likely to leave the bank.
- EstimatedSalary — as with balance, people with lower salaries are more likely to leave the bank compared to those with higher salaries.
- Exited — whether or not the customer left the bank.
- Complain — whether or not the customer has filed a complaint.
- Satisfaction Score — score provided by the customer for their complaint resolution.
- Card Type — type of card held by the customer.
- Point Earned — the points earned by the customer for using their credit card.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Dataset

In [None]:
df0 = pd.read_csv('./Customer-Churn-Records.csv')
df = df0.copy()

In [None]:
df.info()

---

## Data Preprocessing

In [None]:
df

In [None]:
df.columns = df.columns.str.lower()

In [None]:
df.columns.tolist()

In [None]:
cols_to_rename = {
    'rownumber':'row_num',
    'customerid':'customer_id',
    'creditscore':'credit_score',
    'geography':'country',
    'numofproducts':'num_of_products',
    'hascrcard':'has_credit_card',
    'isactivemember':'is_active',
    'estimatedsalary':'estimated_salary',
    'complain':'complained',
    'satisfaction score':'satisfaction_score',
    'card type':'card_type',
    'point earned':'points_earned'
}

In [None]:
df = df.rename(columns=cols_to_rename)

In [None]:
df.columns.tolist()

In [None]:
# Move the target column to the end of the DataFrame
df['exited'] = df.pop('exited')

In [None]:
df

In [None]:
df.isna().sum()

# No Missing Values

In [None]:
df.duplicated().sum()

# No Duplicates

---

## EDA

In [None]:
df.columns.tolist()

---

# exited - Target Variable

In [None]:
df['exited'].value_counts()

In [None]:
# Map numeric labels to readable names for EDA

label_mapping = {
    0: 'Retained', 
    1: 'Exited'    
}
df['exited'] = df['exited'].replace(label_mapping)

df.head()

In [None]:
fig, ax = plt.subplots(figsize=(10, 5))

sns.countplot(
    data=df,
    x="exited",
    hue='exited',
    ax=ax
)

ax.set_title("Retained vs. Exited Customers", fontsize=18)

for container in ax.containers:
    ax.bar_label(container)

In [None]:
fig, ax = plt.subplots(figsize=(6, 6))

ax.pie(x=df.exited.value_counts().values,
       labels=df.exited.value_counts().index,
       autopct='%.1f%%',
       explode=(0, 0.1),
       colors=['lightskyblue', 'gold'],
       textprops={'fontsize': 12},
       shadow=True
       )

plt.title("Percentage of Churned Customers", fontdict = {'fontsize': 14})

plt.show()

---

# row_num 

`row_num` plays no role in the prediction and therefore needs to be removed.

In [None]:
df = df.drop('row_num',axis=1)

In [None]:
df.columns

# customer_id

`customer_id` plays no role in the prediction and therefore needs to be removed.

In [None]:
df = df.drop('customer_id',axis=1)

In [None]:
df.columns

# surname

`surname` plays no role in the prediction and therefore needs to be removed.

In [None]:
surname_count_top50 = df['surname'].value_counts().head(50).sort_values(ascending=True)

In [None]:
plt.figure(figsize=(6,20))
surname_count_top50.plot(kind='barh')

plt.title('Top 50 Most Frequent Customer Surnames')
plt.xlabel('Count')
plt.ylabel('Surnames')

plt.xticks(rotation=45,ha='right')
plt.grid(axis='x',linestyle='--',alpha=0.7)
plt.tight_layout()

plt.show()

In [None]:
df = df.drop('surname',axis=1)

In [None]:
df.columns

# credit_score

In [None]:
df['credit_score'].describe()

In [None]:
plt.figure(figsize=(15, 5))

sns.histplot(data=df,
             x="credit_score",
             bins=85,
             kde=True,
             hue="exited")

plt.title("Credit Score Distribution by Exited", fontsize=20, color="darkblue")
plt.xlabel("Credit Score", fontsize=18)
plt.ylabel("Frequency", fontsize=18)
plt.xticks(fontsize=14, color='red')

plt.show();

In [None]:
plt.figure(figsize=(15,6))

sns.boxplot(
    data=df,
    x="exited", # Use the categorical variable 'exited' here
    y="credit_score", # Use the numerical variable 'credit_score' here
    showmeans=True,
    meanprops={"marker":"o",
                "markerfacecolor":"white",
                "markeredgecolor":"red",
                "markersize":"10"},
    palette='Set1',
    hue='exited'
)

plt.title("Credit Score Distribution by Exited", fontsize=20, color="darkblue");

In [None]:
df.groupby("exited").credit_score.describe()

`credit_score` seems to play little role in the prediction.


# country

In [None]:
df['country'].value_counts()

In [None]:
df.groupby('country').exited.value_counts()

In [None]:
plt.figure(figsize=(15,5))

ax = sns.countplot(data=df,x='country',hue='exited')

plt.title('The Distribution of country by exited')

for container in ax.containers:
    ax.bar_label(container)

In [None]:
ctry = df.groupby('country').exited.value_counts(normalize=True)

In [None]:
ctry

In [None]:
plt.figure(figsize=(18,6))

index = 1

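# Each country contributes two consecutive rows (one per exited label) to `ctry`
for i in [0,2,4]:
    plt.subplot(1,3,index)
    ctry.iloc[i:i+2].plot.pie(
        subplots=True,
        autopct='%.2f%%',
        textprops={
            'fontsize':12
        },
        # Pie charts take `colors` (plural)
        colors=['red','blue']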
    )
    plt.title(ctry.index[i][0], fontdict={'fontsize':14})

    plt.ylabel('') 

    index+=1


`Germany` has a noticeably higher share of exited customers than the other countries.
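The pie charts show proportions within each country; the same comparison can be condensed into one churn rate per country. A minimal sketch, assuming the target still holds the string labels `'Retained'`/`'Exited'` at this point (as it does after the EDA mapping); the helper name `churn_rate_by` and the toy rows are hypothetical:

```python
import pandas as pd


def churn_rate_by(frame: pd.DataFrame, col: str, target: str = "exited") -> pd.Series:
    """Share of customers labelled 'Exited' within each category of `col`."""
    is_exited = frame[target].eq("Exited")
    return is_exited.groupby(frame[col]).mean().sort_values(ascending=False)


# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "country": ["France", "France", "Germany", "Germany", "Spain", "Spain"],
    "exited": ["Retained", "Exited", "Exited", "Exited", "Retained", "Retained"],
})
rates = churn_rate_by(toy, "country")
print(rates)
```

On the notebook's data, `churn_rate_by(df, 'country')` would return one rate per country, which is easier to compare than three separate pies.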

# gender

In [None]:
df['gender'].value_counts()

In [None]:
df.groupby('exited').gender.value_counts()

In [None]:
plt.figure(figsize=(15,5))

ax = sns.countplot(data=df,x='gender',hue='exited',palette='Set1')

plt.title('The Distribution of gender by exited')

for container in ax.containers:
    ax.bar_label(container)

In [None]:
gender = df.groupby('exited').gender.value_counts(normalize=True, sort=False)

In [None]:
gender

In [None]:
plt.figure(figsize=(18,6))

index = 1

for i in [0,2]:
    plt.subplot(1,2,index)
    gender.iloc[i:i+2].plot.pie(
        subplots=True,
        autopct="%.2f%%",
        textprops={
            'fontsize':12
        },
        colors=['red','blue']
    )
    plt.title(gender.index[i][0], fontdict={'fontsize':14})
    plt.ylabel('') 

    index+=1


`Female` customers make up a larger share of the exited group than of the retained group, so they appear more likely to churn.
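The gender pies show how each `exited` group splits by gender. The complementary view is the churn rate *within* each gender, which `pd.crosstab` with `normalize='index'` gives directly (each row sums to 1). A minimal sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"],
    "exited": ["Exited", "Exited", "Retained", "Retained",
               "Exited", "Retained", "Retained", "Retained"],
})

# Rows: gender; columns: exited label; values: share of that gender's customers
rates = pd.crosstab(toy["gender"], toy["exited"], normalize="index")
print(rates)
```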

# age

In [None]:
df['age'].describe()

In [None]:
df.groupby("exited").age.describe()

In [None]:
plt.figure(figsize=(15, 5))

sns.histplot(data=df,
             x="age",
             bins=20,
             kde=True,
             hue="exited")

plt.title("Age Distribution by Exited", fontsize=20, color="darkblue")
plt.xlabel("Age", fontsize=18)
plt.ylabel("Frequency", fontsize=18)
plt.xticks(fontsize=14, color='red')

plt.show();

In [None]:
sns.boxplot(data=df,
            x="exited",
            y="age",
            showmeans=True,
            palette='Set1',
            hue='exited',
            meanprops={"marker":"o",
                       "markerfacecolor":"white",
                       "markeredgecolor":"red",
                       "markersize":"10"})

plt.title("Age Distribution by Exited", fontsize=20, color="darkblue");

Exited customers are, `on average, 45 years old`.
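Averages hide where along the age range churn concentrates. One way to see it is to bin `age` with `pd.cut` and compute the churn rate per band. A sketch on hypothetical toy rows; the band edges are an arbitrary assumption:

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "age": [22, 28, 35, 41, 47, 52, 58, 66],
    "exited": ["Retained", "Retained", "Retained", "Exited",
               "Exited", "Exited", "Retained", "Retained"],
})

# Assumed band edges; intervals are right-closed, e.g. (30, 40] includes 40
bands = pd.cut(toy["age"], bins=[18, 30, 40, 50, 60, 100])
churn_by_band = toy["exited"].eq("Exited").groupby(bands, observed=False).mean()
print(churn_by_band)
```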

# tenure

In [None]:
df['tenure'].describe()

In [None]:
df['tenure'].value_counts()

In [None]:
df.groupby('exited').tenure.describe()

In [None]:
plt.figure(figsize=(12, 5))

ax = sns.countplot(data=df, x="tenure", hue='exited', palette='Set1')

plt.title("Tenure by Exited", fontsize=16, color="darkblue")

ax.legend(
    title='Exited Status', 
    # bbox_to_anchor accepts (x, y) or (x, y, width, height) in axes coordinates.
    # (1.05, 1) puts the anchor 5% of the axes width past the right edge, at the top edge.
    bbox_to_anchor=(1.05, 1), 
    # 'upper left' places the upper-left corner of the legend box 
    # at the bbox_to_anchor coordinates.
    loc='upper left' 
)

for container in ax.containers:
    ax.bar_label(container);

In [None]:
sns.boxplot(data=df,
            x="exited",
            y="tenure",
            palette='Set1',
            hue='exited',
            showmeans=True,
            meanprops={"marker":"o",
                       "markerfacecolor":"white",
                       "markeredgecolor":"black",
                       "markersize":"10"})


plt.title("Tenure by Exited", fontsize=18, color='darkblue')

plt.show();

`Tenure` doesn't seem to play a role in customer churn. The average tenure is about 5 years.

# balance

In [None]:
df['balance'].describe()

In [None]:
df.groupby('exited').balance.describe()

In [None]:
plt.figure(figsize=(15, 5))

sns.histplot(data=df,
             x="balance",
             bins=25,
             kde=True,
             hue="exited")

plt.title("Balance Distribution by Exited", fontsize=20, color="darkblue")
plt.xlabel("Balance", fontsize=18)
plt.ylabel("Frequency", fontsize=18)
plt.xticks(fontsize=14, color='red')

plt.show();

In [None]:
sns.boxplot(data=df,
            x="exited",
            y="balance",
            palette='Set1',
            hue='exited',
            showmeans=True,
            meanprops={"marker":"o",
                       "markerfacecolor":"white",
                       "markeredgecolor":"black",
                       "markersize":"10"})


plt.title("Balance by Exited", fontsize=18, color='darkblue')

plt.show();

Contrary to the expectation in the column description, exited customers hold higher balances on average: `91,109.48 EUR` versus `72,742.75 EUR` for retained customers.

# num_of_products

In [None]:
df['num_of_products'].value_counts()

In [None]:
df.groupby('exited').num_of_products.value_counts()

In [None]:
plt.figure(figsize=(12, 5))

ax = sns.countplot(data=df, x="num_of_products", hue='exited', palette='Set1')

plt.title("Number of Products by Exited", fontsize=16, color="darkblue")

for container in ax.containers:
    ax.bar_label(container);

In [None]:
num_of_prods = df.groupby('exited').num_of_products.value_counts(normalize=True, sort=False).reset_index(name="percentage")

In [None]:
num_of_prods

In [None]:
nop_pct = num_of_prods

fig, ax = plt.subplots(figsize=(12, 5))

ax = sns.barplot(data=nop_pct,
                 x="num_of_products",
                 y="percentage",
                 hue="exited",
                 order=nop_pct.groupby("num_of_products").percentage.sum().sort_values(ascending=False).index)

plt.title("The Distribution of Products Bought by Exited", fontsize=18, color="darkblue")
plt.xticks(rotation=0)

for container in ax.containers:
    ax.bar_label(container, fmt="%.2f");

Most exited customers had only `1 product`, while most retained customers have `2 products`.

# has_credit_card

In [None]:
df['has_credit_card'].value_counts()

In [None]:
# sort_index() fixes the row order to (Exited, 0), (Exited, 1), (Retained, 0), (Retained, 1)
has_cr_card = df.groupby('exited').has_credit_card.value_counts(normalize=True).sort_index()

In [None]:
has_cr_card

In [None]:
# Labels for the sorted has_credit_card values [0, 1]
pie_labels = ['No', 'Yes']

plt.figure(figsize=(18,6))

index = 1

for i in [0,2]:
    plt.subplot(1,2,index)
    has_cr_card.iloc[i:i+2].plot.pie(
        subplots=True,
        autopct="%.2f%%",
        labels=pie_labels,
        textprops={
            'fontsize':12
        },
        colors=['red','blue']
    )
    plt.title(has_cr_card.index[i][0], fontdict={'fontsize':14})
    
    plt.ylabel('') 

    index+=1

plt.show()

In [None]:
plt.figure(figsize=(12, 5))

ax = sns.countplot(data=df, x="has_credit_card", hue='exited', palette='Set1')

ax.set_xticks([0, 1])
ax.set_xticklabels(['No Credit Card', 'Has Credit Card'])

plt.title("Credit Card Ownership by Exited", fontsize=16, color="darkblue")


for container in ax.containers:
    ax.bar_label(container);

`Credit card ownership` doesn't seem to play a role in predicting customer churn in this case.

# is_active

In [None]:
df['is_active'].value_counts()

In [None]:
# sort_index() fixes the row order to (Exited, 0), (Exited, 1), (Retained, 0), (Retained, 1)
isactive = df.groupby('exited').is_active.value_counts(normalize=True, sort=False).sort_index()

In [None]:
isactive

In [None]:
# Labels for the sorted is_active values [0, 1]
pie_labels = ['Inactive', 'Active']

plt.figure(figsize=(18,6))

index = 1

for i in [0,2]:
    plt.subplot(1,2,index)
    isactive.iloc[i:i+2].plot.pie(
        subplots=True,
        autopct="%.2f%%",
        labels=pie_labels,
        textprops={
            'fontsize':12
        },
        colors=['red','blue']
    )
    plt.title(isactive.index[i][0], fontdict={'fontsize':14})
    
    plt.ylabel('') 

    index+=1

plt.show()

Most exited customers are `inactive`.

# estimated_salary

In [None]:
df['estimated_salary'].describe()

In [None]:
plt.figure(figsize=(15, 5))

sns.histplot(data=df,
             x="estimated_salary",
             bins=200,
             kde=True,
             hue="exited")

plt.title("Estimated Salary Distribution by Exited", fontsize=20, color="darkblue")
plt.xlabel("Estimated Salary", fontsize=18)
plt.ylabel("Frequency", fontsize=18)
plt.xticks(fontsize=14, color='red')

plt.show();

In [None]:
df.groupby('exited').estimated_salary.describe()

In [None]:
sns.boxplot(data=df,
            x="exited",
            y="estimated_salary",
            palette='Set1',
            hue='exited',
            showmeans=True,
            meanprops={"marker":"o",
                       "markerfacecolor":"white",
                       "markeredgecolor":"black",
                       "markersize":"10"})


plt.title("Estimated Salary by Exited", fontsize=18, color='darkblue')

plt.show();

`Estimated Salary` doesn't seem to play a role in predicting customer churn in this case.

# complained

In [None]:
df['complained'].value_counts()

In [None]:
# sort_index() fixes the row order to (Exited, 0), (Exited, 1), (Retained, 0), (Retained, 1)
cmplnd = df.groupby('exited').complained.value_counts(normalize=True, sort=False).sort_index()

In [None]:
cmplnd

In [None]:
# Labels for the sorted complained values [0, 1]
pie_labels = ["Didn't Complain", 'Complained']

plt.figure(figsize=(18,6))

index = 1

for i in [0,2]:
    plt.subplot(1,2,index)
    cmplnd.iloc[i:i+2].plot.pie(
        subplots=True,
        autopct="%.2f%%",
        labels=pie_labels,
        textprops={
            'fontsize':12
        },
        colors=['red','blue']
    )
    plt.title(cmplnd.index[i][0], fontdict={'fontsize':14})
    
    plt.ylabel('') 

    index+=1

plt.show()

Exited customers are very likely to have `complained`.
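One way to quantify how strong this relationship is: treat `complained` alone as a classifier and measure how often it matches the target. If the agreement is close to 100%, the column may be recorded as part of the exit process rather than before it, i.e. it may leak the target. A minimal sketch on hypothetical toy rows; on the notebook's data the same expressions would use `df`:

```python
import pandas as pd

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    "complained": [1, 1, 1, 0, 0, 0, 0, 1],
    "exited": ["Exited", "Exited", "Exited", "Retained",
               "Retained", "Retained", "Exited", "Exited"],
})

# Counts of each (complained, exited) combination
print(pd.crosstab(toy["complained"], toy["exited"]))

# Share of rows where "complained == 1" coincides with "exited == 'Exited'"
agreement = (toy["complained"].eq(1) == toy["exited"].eq("Exited")).mean()
print(f"agreement: {agreement:.3f}")
```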

# satisfaction_score

In [None]:
df['satisfaction_score'].describe()

In [None]:
df.groupby('exited')['satisfaction_score'].value_counts(normalize=True, sort=False)

In [None]:
plt.figure(figsize=(12, 5))

ax = sns.countplot(data=df, x="satisfaction_score", hue='exited', palette='Set1')

plt.title("Satisfaction Score by Exited", fontsize=16, color="darkblue")

ax.legend(
    title='Exited Status', 
    # bbox_to_anchor accepts (x, y) or (x, y, width, height) in axes coordinates.
    # (1.05, 1) puts the anchor 5% of the axes width past the right edge, at the top edge.
    bbox_to_anchor=(1.05, 1), 
    # 'upper left' places the upper-left corner of the legend box 
    # at the bbox_to_anchor coordinates.
    loc='upper left' 
)

for container in ax.containers:
    ax.bar_label(container);

In [None]:
sns.boxplot(data=df,
            x="exited",
            y="satisfaction_score",
            palette='Set1',
            hue='exited',
            showmeans=True,
            meanprops={"marker":"o",
                       "markerfacecolor":"white",
                       "markeredgecolor":"black",
                       "markersize":"10"})


plt.title("Satisfaction Score by Exited", fontsize=18, color='darkblue')

plt.show();

`Satisfaction Score` doesn't seem to play a role in predicting customer churn in this case.

# card_type

In [None]:
df['card_type'].value_counts()

In [None]:
df.groupby('exited').card_type.value_counts(normalize=True, sort=False)

In [None]:
plt.figure(figsize=(12, 5))

ax = sns.countplot(data=df, x="card_type", hue='exited', palette='Set1')

plt.title("Card Type by Exited", fontsize=16, color="darkblue")

ax.legend(
    title='Exited Status', 
    # bbox_to_anchor accepts (x, y) or (x, y, width, height) in axes coordinates.
    # (1.05, 1) puts the anchor 5% of the axes width past the right edge, at the top edge.
    bbox_to_anchor=(1.05, 1), 
    # 'upper left' places the upper-left corner of the legend box 
    # at the bbox_to_anchor coordinates.
    loc='upper left' 
)

for container in ax.containers:
    ax.bar_label(container);

`Card type` doesn't seem to play a role in predicting customer churn in this case.

# points_earned

In [None]:
df['points_earned'].describe()

In [None]:
plt.figure(figsize=(15, 5))

sns.histplot(data=df,
             x="points_earned",
             bins=100,
             kde=True,
             hue="exited")

plt.title("Points Distribution by Exited", fontsize=20, color="darkblue")
plt.xlabel("Points Earned", fontsize=18)
plt.ylabel("Frequency", fontsize=18)
plt.xticks(fontsize=14, color='red')

plt.show();

In [None]:
sns.boxplot(data=df,
            x="exited",
            y="points_earned",
            palette='Set1',
            hue='exited',
            showmeans=True,
            meanprops={"marker":"o",
                       "markerfacecolor":"white",
                       "markeredgecolor":"black",
                       "markersize":"10"})


plt.title("Points by Exited", fontsize=18, color='darkblue')

plt.show();

`Points earned` doesn't seem to play a role in predicting customer churn in this case.

---

In [None]:
df.head()

In [None]:
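# Map the readable labels back to 0/1 after EDA

label_mapping = {
    'Retained':0, 
    'Exited':1    
}
# map() yields an integer column directly; replace() from str to int relies on downcasting deprecated in pandas 2.2
df['exited'] = df['exited'].map(label_mapping)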

df.head()

---

## Remove Irrelevant Columns

In [None]:
df.columns.tolist()

In [None]:
cols_to_drop = [
    'credit_score',
    'tenure',
    'has_credit_card',
    'estimated_salary',
    'satisfaction_score',
    'card_type',
    'points_earned'
]

In [None]:
df = df.drop(columns=cols_to_drop)

In [None]:
df.head()

---

## Data Encoding

In [None]:
df.info()

In [None]:
df.head()

## Handling Categorical Data

### LabelEncoder - gender

In [None]:
from sklearn.preprocessing import LabelEncoder

In [None]:
lbl_encoder = LabelEncoder()

In [None]:
df['gender'] = lbl_encoder.fit_transform(df['gender'])

In [None]:
df
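`LabelEncoder` assigns integer codes in sorted (alphabetical) order of the labels it sees, so the fitted mapping can be read back from `classes_`. A small sketch on toy labels:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["Male", "Female", "Female", "Male"])

# classes_ is sorted, and a label's code is its position in classes_
mapping = {str(label): i for i, label in enumerate(encoder.classes_)}
print(mapping)                                      # {'Female': 0, 'Male': 1}
print(codes.tolist())                               # [1, 0, 0, 1]
print(encoder.inverse_transform([0, 1]).tolist())   # ['Female', 'Male']
```

scikit-learn documents `LabelEncoder` for target labels; `OrdinalEncoder` is its counterpart for input features. For a binary column like `gender` both produce the same 0/1 codes.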

### OneHotEncoder - country

In [None]:
from sklearn.preprocessing import OneHotEncoder

In [None]:
oh_encoder = OneHotEncoder(sparse_output=False)

In [None]:
country_encoded = oh_encoder.fit_transform(df[['country']])

In [None]:
country_encoded

In [None]:
oh_encoder.get_feature_names_out(['country'])

In [None]:
country_encoded_df = pd.DataFrame(country_encoded, columns=oh_encoder.get_feature_names_out(['country']))

In [None]:
country_encoded_df

In [None]:
country_encoded_df = country_encoded_df.astype(int)

In [None]:
country_encoded_df

In [None]:
df = pd.concat([df,country_encoded_df],axis=1)

In [None]:
df.head()

In [None]:
df = df.drop('country',axis=1)

In [None]:
df

---

### Correlation Check

In [None]:
correlations = df.corr()['exited'].sort_values(ascending=False)
print(correlations)

---

## Declare Dependent Variable & Independent Variables

In [None]:
X = df.drop(columns='exited')

In [None]:
y = df['exited']

In [None]:
X.head()

In [None]:
y.head()

---

# Train Test Split

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)

In [None]:
X_train.shape, y_train.shape

In [None]:
X_test.shape, y_test.shape

---

# Scale Data

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler = StandardScaler()

In [None]:
X_train = scaler.fit_transform(X_train)

In [None]:
X_test = scaler.transform(X_test)

In [None]:
X_train

In [None]:
X_test
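`StandardScaler` learns each column's mean and standard deviation from the training split only, then applies z = (x - mean) / std to both splits. This is why the code calls `fit_transform` on `X_train` but only `transform` on `X_test`. It uses the population standard deviation (`ddof=0`). A small check on toy numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test_toy = np.array([[5.0, 50.0]])

scaler_toy = StandardScaler().fit(X_train_toy)

# Statistics come from the training data only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # NumPy's default ddof=0 (population std)

manual = (X_test_toy - mean) / std
print(np.allclose(scaler_toy.transform(X_test_toy), manual))  # True
print(np.allclose(scaler_toy.scale_, std))                     # True
```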

---

# ANN

In [None]:
# Currently using tf-nightly because TensorFlow 2.20.0 does not yet fully support the RTX 5070 - 20251125

In [None]:
# CUDA: 12.5
# CUDNN: 9.3

In [None]:
# Fall back to CPU - 20251125
# CUDA_VISIBLE_DEVICES must be set before TensorFlow/Keras is imported
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

In [None]:
# import tensorflow.keras as tf
from keras.models import Sequential
from keras.layers import Dense, Input

### ANN Modelling

In [None]:
# Total Number of Inputs
X_train.shape[1]

In [None]:
model = Sequential([
    Input(shape=(X_train.shape[1],)),

    # HL1: Connected to Input Layer
    Dense(64, activation='relu'),

    # HL2
    Dense(32, activation='relu'),

    # Output layer: a single sigmoid unit giving the churn probability
    Dense(1, activation='sigmoid')

])

In [None]:
model.summary()
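Each `Dense` layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases, which is what `model.summary()` reports per layer. A quick arithmetic sketch; the input width of 9 is an assumption based on the features left after the drops and encoding above (`gender`, `age`, `balance`, `num_of_products`, `is_active`, `complained` and three `country_*` columns):

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


n_features = 9  # assumed value of X_train.shape[1]
layer_sizes = [n_features, 64, 32, 1]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [640, 2080, 33]
print(sum(per_layer))  # 2753
```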

In [None]:
from keras.optimizers import Adam
from keras.losses import BinaryCrossentropy

In [None]:
adam = Adam(learning_rate=0.01)

In [None]:
bicrossentropy = BinaryCrossentropy()
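`BinaryCrossentropy` with the default `from_logits=False` (matching the sigmoid output layer) computes the mean over samples of -[y * log(p) + (1 - y) * log(1 - p)]. A NumPy sketch of that formula; Keras also clips p away from 0 and 1 for numerical stability, approximated here with an assumed `eps`:

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy for probabilities (not logits)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))


loss = binary_crossentropy([1, 0], [0.9, 0.1])
print(round(loss, 4))  # 0.1054
```

Both samples are predicted with 90% confidence in the correct class, so each contributes -log(0.9) and the mean equals that value.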

In [None]:
model.compile(
    optimizer=adam, 
    loss=bicrossentropy, 
    metrics=['accuracy']
)

### TensorBoard

In [None]:
from keras.callbacks import EarlyStopping, TensorBoard

In [None]:
import datetime

log_dir = 'logs/fit/'+datetime.datetime.now().strftime('%Y%m%d-%H%M%S')

In [None]:
tensorboard_cb = TensorBoard(log_dir=log_dir, histogram_freq=1, write_graph=True)
earlystopping_cb = EarlyStopping(monitor='val_loss', patience=5,restore_best_weights=True)
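With `monitor='val_loss'`, `patience=5` and `restore_best_weights=True`, training stops once `val_loss` has failed to improve for 5 consecutive epochs, and the weights from the best epoch are restored. A plain-Python simulation of that rule on a made-up loss curve (it ignores `min_delta`, which defaults to 0):

```python
def early_stopping_epochs(val_losses, patience=5):
    """Return (best_epoch, stop_epoch), 0-based, for a 'min' monitor."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Made-up validation losses: best at epoch 3, then 5 epochs without improvement
losses = [0.50, 0.42, 0.38, 0.36, 0.37, 0.36, 0.39, 0.40, 0.41, 0.35]
print(early_stopping_epochs(losses))  # (3, 8)
```

The late improvement at epoch 9 is never seen, which is the trade-off a small `patience` makes.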

In [None]:
history = model.fit(
    X_train,
    y_train,
    validation_data=(X_test,y_test),
    epochs=100,
    callbacks=[tensorboard_cb,earlystopping_cb]
)
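The fit above passes `(X_test, y_test)` as `validation_data`, so `EarlyStopping` picks the restored weights using the same rows that are later used for evaluation, which can make the test metrics optimistic. A sketch of one alternative, carving a validation set out of the training data and stratifying on the target because churners are the minority class. The toy arrays and the 0.2/0.25 fractions are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X_toy = rng.normal(size=(100, 3))
y_toy = np.array([1] * 20 + [0] * 80)  # 20% positive class

# First hold out the test set, then split the remainder into train/validation
X_tmp, X_te, y_tmp, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=42)

print(len(y_tr), len(y_val), len(y_te))  # 60 20 20
print(y_tr.mean(), y_val.mean(), y_te.mean())
```

In the notebook this would mean fitting `scaler` on the training part only and passing `validation_data=(X_val, y_val)` to `model.fit`; `X_val` and `y_val` are hypothetical names here.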

---

# Exporting Components

In [None]:
import pickle

# Context managers ensure each file is flushed and closed
for obj, path in [(oh_encoder, 'oh_encoder.pkl'), (lbl_encoder, 'lbl_encoder.pkl'), (scaler, 'scaler.pkl')]:
    with open(path, 'wb') as f:
        pickle.dump(obj, f)
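A quick way to confirm that exported preprocessors load back intact is a save/load round trip that compares outputs. A sketch with a toy `StandardScaler` and a temporary file (the file name is hypothetical; the notebook's own files are `oh_encoder.pkl`, `lbl_encoder.pkl` and `scaler.pkl`). The saved `.keras` model can similarly be reloaded with `keras.models.load_model('ann_model.keras')`:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler

X_toy = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
original = StandardScaler().fit(X_toy)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scaler_roundtrip.pkl")  # hypothetical file name
    # Context managers close the files even if dump/load raises
    with open(path, "wb") as f:
        pickle.dump(original, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

same_output = np.allclose(original.transform(X_toy), restored.transform(X_toy))
print(same_output)  # True
```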

# Export `.keras` file --- ANN Model

In [None]:
model.save('ann_model.keras')

---

## Tensorboard

In [None]:
%load_ext tensorboard

In [None]:
# The tensorboard ext has to be called twice to start working - 20251126
%tensorboard --logdir logs/fit
%tensorboard --logdir logs/fit

---

# Model Prediction: ANN

In [None]:
df.columns.tolist()

In [None]:
df.head()

In [None]:
data_input = {
 'gender':'Female',
 'age':45,
 'balance':200000,
 'num_of_products':4,
 'is_active':0,
 'complained':1,
 'country':'Germany'
}

In [None]:
country_encoded_df

In [None]:
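# Pass a DataFrame so the column name matches what the encoder was fitted on
country_input_encoded = oh_encoder.transform(pd.DataFrame({'country': [data_input['country']]}))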

In [None]:
country_input_encoded_df = pd.DataFrame(country_input_encoded, columns=oh_encoder.get_feature_names_out(['country']))

In [None]:
country_input_encoded_df

In [None]:
input_df = pd.DataFrame([data_input])
input_df

In [None]:
# LabelEncode - gender input
input_df['gender'] = lbl_encoder.transform(input_df['gender'])

In [None]:
input_df

In [None]:
input_df = pd.concat([input_df.drop('country', axis=1), country_input_encoded_df], axis=1)

In [None]:
input_df

### Scaling Input

In [None]:
input_scaled = scaler.transform(input_df)

In [None]:
input_scaled

### Prediction

In [None]:
pred = model.predict(input_scaled)

In [None]:
pred_proba = pred[0][0]

In [None]:
pred_proba

In [None]:
if pred_proba > 0.5:
    print('The customer is likely to churn.')
else:
    print('The customer is NOT likely to churn.')

---

## Evaluation

In [None]:
# 1. Predict on the WHOLE test set
y_pred_probs = model.predict(X_test) 

# 2. Convert all probabilities to 0 or 1
y_pred = (y_pred_probs > 0.5).astype("int32")

# 3. Run the evaluation
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)
print(f"Accuracy Score: {accuracy_score(y_test, y_pred):.4f}")

# Classification Report:
print("\nClassification Report:\n", classification_report(y_test, y_pred))
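Because retained customers outnumber exited ones, accuracy alone can look good even when many churners are missed, so precision and recall for the churn class are worth reading from the report. For binary labels, scikit-learn's `confusion_matrix` is laid out as `[[TN, FP], [FN, TP]]`. A sketch deriving the churn-class metrics from such a matrix with made-up counts:

```python
import numpy as np


def churn_metrics(cm):
    """Precision, recall and F1 for the positive (churn) class from a 2x2 matrix."""
    (tn, fp), (fn, tp) = np.asarray(cm)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return float(precision), float(recall), float(f1)


cm_example = [[1500, 100], [150, 250]]  # made-up counts, not model output
print(churn_metrics(cm_example))
```

Here 250 of 400 actual churners are caught (recall 0.625), and 250 of 350 churn predictions are correct (precision about 0.714).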

In [None]:
# Confirm the target column is not among the model features
print("Features used in model:", X.columns.tolist())

In [None]:
correlations = df.corr()['exited'].sort_values(ascending=False)
print(correlations)