
# Project: Predicting Client Subscription to Term Deposits

## Objective
The goal of this project is to build a predictive model that determines whether a bank client will subscribe to a term deposit, based on client demographics, past campaign interactions, and economic indicators.

## Dataset Overview
The dataset contains information about bank customers, including demographics, past interactions, and economic indicators.

## Instructions
1. **Data Exploration**: Load the dataset and perform exploratory data analysis (EDA).
2. **Data Preprocessing**: Handle missing values, encode categorical variables, and scale numerical features.
3. **Model Training**: Train a classification model to predict whether a client subscribes to a term deposit.
4. **Evaluation**: Assess the model’s performance using appropriate metrics.
5. **Interpretation**: Provide insights based on the model results.


In [1]:

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


In [2]:

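# Load the dataset
# (Replace 'bankmarketing.csv' with the path to your copy of the dataset;
# if the file is semicolon-delimited, pass sep=';' to read_csv)
df = pd.read_csv('bankmarketing.csv')

# Display basic information about the dataset
# (only the last expression in a cell is shown automatically, so use display())
df.info()
display(df.head(5))
display(df.tail())
df.max(numeric_only=True)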

FileNotFoundError: [Errno 2] No such file or directory: 'bankmarketing.csv'


## Exploratory Data Analysis (EDA)
- Check for missing values
- Visualize the distribution of key features
- Analyze correlations
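
The missing-value check in this list can miss placeholder strings. In the public UCI Bank Marketing data, which the column names in this notebook resemble, missing categorical values are coded as the string `unknown`, so `isnull()` reports zero for them. The sketch below counts both kinds and also checks the class balance of the target. It runs on a small made-up frame (`toy` and its values are hypothetical), not on the real file.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real data; all values are invented.
toy = pd.DataFrame({
    "age": [30, 45, 52, 28, 39, 61],
    "job": ["admin.", "unknown", "technician", "services", "unknown", "retired"],
    "y": ["no", "no", "yes", "no", "no", "yes"],
})

# True nulls (NaN/None) per column.
null_counts = toy.isnull().sum()

# 'unknown' placeholders per categorical column (assumed placeholder label).
cat_cols = ["job"]
unknown_counts = toy[cat_cols].eq("unknown").sum()

# Share of each target class; a strong skew affects which metrics are useful.
class_share = toy["y"].value_counts(normalize=True)

print(null_counts)
print(unknown_counts)
print(class_share)
```

If the real data does use such a placeholder, it can either be kept as its own category or converted to `NaN` and imputed; which is better depends on the feature.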


In [None]:

# Check for missing (null) values in each column
print("Missing Values in Each Column:")
print(df.isnull().sum())

# Visualize distributions
num_cols=df.select_dtypes(include=['int64','float64']).columns

df[num_cols].hist(figsize=(12,8),bins=20,edgecolor='black')
plt.suptitle("Distribution of Numerical Variables",fontsize=16)
plt.show()


# Correlation heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(df[num_cols].corr(), annot=True, cmap='coolwarm', linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()



## Data Preprocessing
- Handle missing values (if any)
- Encode categorical variables
- Scale numerical features
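
One way to carry out the encoding and scaling steps listed here is a scikit-learn `ColumnTransformer` that one-hot encodes categorical columns and standardizes numerical ones, fitted on the training split only. This is a minimal sketch under assumptions, not this notebook's own method: the frame `toy` and its values are invented, and only the column names `age`, `marital`, and `y` are borrowed from the dataset.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; values are made up for illustration.
toy = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 60, 44],
    "marital": ["single", "married", "married", "divorced",
                "single", "single", "married", "divorced"],
    "y": [0, 0, 1, 0, 1, 0, 1, 0],
})
X_toy = toy.drop(columns=["y"])
y_toy = toy["y"]

# Split first so that the test rows never influence the fitted statistics.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0, stratify=y_toy)

# Scale the numeric column and one-hot encode the categorical one.
# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["marital"]),
    ],
    sparse_threshold=0,
)

X_tr_enc = preprocess.fit_transform(X_tr)  # learn categories, mean, std on train
X_te_enc = preprocess.transform(X_te)      # reuse them on test

print(X_tr_enc.shape, X_te_enc.shape)
```

One-hot encoding avoids the arbitrary ordering that integer codes impose on categories, at the cost of more columns. `handle_unknown="ignore"` keeps `transform` from failing if the test split contains a category that was absent from training.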


In [None]:

# Encode categorical variables as integer codes.
# A separate encoder is kept per column so the codes can be mapped back later.
categorical_cols = ['job', 'marital', 'education', 'default', 'housing', 'loan',
                    'contact', 'month', 'poutcome', 'day_of_week', 'y']
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Define features and target variable
X = df.drop(columns=['y'])  # Features
y = df['y']  # Target variable

# Split into train and test sets, preserving the class ratio in both
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize numerical features. The scaler is fitted on the training split
# only, so test-set statistics do not leak into training.
numerical_cols = ['age', 'duration', 'campaign', 'pdays', 'previous', 'emp.var.rate',
                  'cons.price.idx', 'cons.conf.idx', 'euribor3m', 'nr.employed']
scaler = StandardScaler()
X_train, X_test = X_train.copy(), X_test.copy()
X_train[numerical_cols] = scaler.fit_transform(X_train[numerical_cols])
X_test[numerical_cols] = scaler.transform(X_test[numerical_cols])

# Display the shapes of the splits
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Inspect the encoded dataset
df.info()
df.head(5)






## Model Training
Train a classification model to predict term deposit subscription.
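
A single train/test split gives one performance estimate, which can be noisy. The sketch below shows one possible complement: stratified k-fold cross-validation of a random forest with `class_weight="balanced"`, scored by ROC AUC. It uses synthetic data from `make_classification` with an assumed 90/10 class ratio, so the scores say nothing about the bank dataset; the names `X_syn`, `y_syn`, and `clf` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced binary problem (assumed 90% negatives, 10% positives).
X_syn, y_syn = make_classification(
    n_samples=400, n_features=8, weights=[0.9, 0.1], random_state=42)

# class_weight="balanced" reweights classes inversely to their frequency.
clf = RandomForestClassifier(
    n_estimators=50, class_weight="balanced", random_state=42)

# Stratified folds keep the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X_syn, y_syn, cv=cv, scoring="roc_auc")

print("ROC AUC per fold:", scores)
print("Mean ROC AUC:", scores.mean())
```

When the positive class is rare, accuracy can look high even for a model that rarely predicts it, which is why a ranking metric such as ROC AUC, or per-class precision and recall, is often reported alongside it.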


In [None]:
# Train a Random Forest Classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Predictions
y_pred = rf_model.predict(X_test)

# Model evaluation
accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

# Confusion matrix visualization
conf_matrix = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6, 5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=['No', 'Yes'], yticklabels=['No', 'Yes'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

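# Display accuracy and the per-class report
# (print() renders the report's line breaks; a bare tuple would show them as \n)
print(f"Accuracy: {accuracy:.4f}")
print(classification_rep)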
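
## Interpretation
Step 5 of the instructions asks for insights based on the model results. One possible starting point is the `feature_importances_` attribute of a fitted random forest. The sketch below is only an illustration on synthetic data: the feature names are hypothetical placeholders, and the ranking it prints has no bearing on the bank dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with hypothetical feature names, used only for illustration.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=42)
feature_names = [f"feature_{i}" for i in range(5)]

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_syn, y_syn)

# Impurity-based importances sum to 1; sort them from most to least important.
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances)
```

Impurity-based importances can favor features with many distinct values, so it may be worth cross-checking them with `sklearn.inspection.permutation_importance` on the held-out split before drawing conclusions.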
