# 📊 NZ Bank – Exploratory Data Analysis (EDA)

In this notebook, we explore customer data from a fictional NZ bank to understand:
- Demographic and financial trends
- Churn patterns across age, gender, and region
- Correlations between features

This step will help guide future machine learning models and dashboards.

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set theme for all plots
sns.set_theme(style='whitegrid', palette='Set2')


In [None]:
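# Upload the CSV (expected: cleaned_nz_banking_data.csv) when running in Google Colab
import io
from google.colab import files

uploaded = files.upload()

# Use the name the file was stored under; a re-uploaded file may be renamed
file_name = next(iter(uploaded))
df = pd.read_csv(io.BytesIO(uploaded[file_name]))
df.head()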


In [None]:
df.info()
df.describe()
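
`df.info()` and `df.describe()` give a first overview, but missing values are easier to spot in a single table. The sketch below is one possible way to do that; `summarize_quality` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd


def summarize_quality(frame: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing count, and missing share (%) for each column."""
    summary = pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "missing_pct": (frame.isna().mean() * 100).round(2),
    })
    # Columns with the most missing values come first
    return summary.sort_values("missing", ascending=False)
```

Calling `summarize_quality(df)` in a cell would list the columns that may need attention before plotting.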


In [None]:
plt.figure(figsize=(8, 5))
sns.histplot(df['Age'], bins=20, kde=True)
plt.title('Customer Age Distribution')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()


In [None]:
plt.figure(figsize=(8, 5))
sns.histplot(df['AccountBalance'], bins=30, kde=True, color='orange')
plt.title('Account Balance Distribution')
plt.xlabel('Balance (NZD)')
plt.ylabel('Count')
plt.show()


In [None]:
plt.figure(figsize=(6, 4))
sns.countplot(data=df, x='Gender', hue='Churn')
plt.title('Churn Count by Gender')
plt.xlabel('Gender')
plt.ylabel('Number of Customers')
plt.legend(title='Churn', labels=['No', 'Yes'])
plt.show()


In [None]:
plt.figure(figsize=(10, 5))
sns.countplot(data=df, y='Region', hue='Churn', order=df['Region'].value_counts().index)
plt.title('Churn by Region')
plt.xlabel('Count')
plt.ylabel('Region')
plt.legend(title='Churn', labels=['No', 'Yes'])
plt.show()
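
The count plots above show raw counts, so a large region or gender group can look like it churns more simply because it has more customers. A churn rate per group makes the comparison fairer. This is a minimal sketch that assumes `Churn` is coded 0 (stayed) and 1 (churned), as the plot labels suggest; `churn_rate_by` is a hypothetical helper name.

```python
import pandas as pd


def churn_rate_by(frame: pd.DataFrame, column: str, target: str = "Churn") -> pd.DataFrame:
    """Customer count and churn rate (%) per category, highest rate first.

    Assumes `target` is coded 0 (stayed) / 1 (churned), so its mean is the churn rate.
    """
    grouped = frame.groupby(column)[target].agg(customers="count", churn_rate="mean")
    grouped["churn_rate"] = (grouped["churn_rate"] * 100).round(1)
    return grouped.sort_values("churn_rate", ascending=False)
```

For example, `churn_rate_by(df, 'Region')` or `churn_rate_by(df, 'Gender')` would return one row per group. Groups with few customers can show extreme rates, so it helps to read the `customers` column alongside the rate.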


In [None]:
plt.figure(figsize=(8, 5))
sns.boxplot(data=df, x='Churn', y='AccountBalance')
plt.title('Account Balance by Churn')
plt.xlabel('Churn')
plt.ylabel('Balance (NZD)')
plt.xticks([0, 1], ['No', 'Yes'])
plt.show()


In [None]:
plt.figure(figsize=(10, 6))
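# Correlate numeric columns only; df.corr() raises an error on text columns in pandas 2.x
sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)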
plt.title('Correlation Between Numeric Features')
plt.show()
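
The heatmap shows every pairwise correlation, but for churn modelling the column of interest is usually the one for `Churn`. The sketch below ranks the numeric features by the strength of their correlation with churn. It assumes `Churn` is stored as a numeric 0/1 column; `correlations_with` is a hypothetical helper name.

```python
import pandas as pd


def correlations_with(frame: pd.DataFrame, target: str = "Churn") -> pd.Series:
    """Pearson correlation of each numeric column with `target`, strongest first."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Order by absolute value so strong negative correlations are not pushed to the bottom
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```

Running `correlations_with(df)` would return a signed series, so the direction of each relationship stays visible. Pearson correlation only captures linear relationships, which is worth keeping in mind for a binary target.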


# ✅ Summary

From our EDA, we observed:
- Most customers are aged between 30 and 60
- Churn is higher in some regions and gender groups than in others
- Credit score is correlated with churn, but the correlation is not strong
- Account balances vary widely across customers
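
Observations like the age range above are read off a histogram, so it can be worth confirming them with a number. A minimal sketch, with `share_in_range` as a hypothetical helper name:

```python
import pandas as pd


def share_in_range(values: pd.Series, low: float, high: float) -> float:
    """Percentage of non-missing values that fall within [low, high], bounds included."""
    valid = values.dropna()
    return round(valid.between(low, high).mean() * 100, 1)
```

For instance, `share_in_range(df['Age'], 30, 60)` would give the percentage of customers in that age band.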

These insights will help us create more accurate churn prediction models next.