# Decision Tree Classifier – Bank Marketing Dataset

## Objective
Build a Decision Tree classifier that predicts whether a customer will purchase a product or service (the `y` column of the dataset) from their demographic and behavioral data.

## 1. Import Required Libraries

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## 2. Load the Dataset

In [2]:
df = pd.read_csv("bank.csv", sep=';')
df.head()

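|   | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
|---|-----|-----|---------|-----------|---------|---------|---------|------|---------|-----|-------|----------|----------|-------|----------|----------|---|
| 0 | 30 | unemployed | married | primary | no | 1787 | no | no | cellular | 19 | oct | 79 | 1 | -1 | 0 | unknown | no |
| 1 | 33 | services | married | secondary | no | 4789 | yes | yes | cellular | 11 | may | 220 | 1 | 339 | 4 | failure | no |
| 2 | 35 | management | single | tertiary | no | 1350 | yes | no | cellular | 16 | apr | 185 | 1 | 330 | 1 | failure | no |
| 3 | 30 | management | married | tertiary | no | 1476 | yes | yes | unknown | 3 | jun | 199 | 4 | -1 | 0 | unknown | no |
| 4 | 59 | blue-collar | married | secondary | no | 0 | yes | no | unknown | 5 | may | 226 | 1 | -1 | 0 | unknown | no |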


## 3. Dataset Overview

In [3]:
df.shape

(4521, 17)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4521 entries, 0 to 4520
Data columns (total 17 columns):
age          4521 non-null int64
job          4521 non-null object
marital      4521 non-null object
education    4521 non-null object
default      4521 non-null object
balance      4521 non-null int64
housing      4521 non-null object
loan         4521 non-null object
contact      4521 non-null object
day          4521 non-null int64
month        4521 non-null object
duration     4521 non-null int64
campaign     4521 non-null int64
pdays        4521 non-null int64
previous     4521 non-null int64
poutcome     4521 non-null object
y            4521 non-null object
dtypes: int64(7), object(10)
memory usage: 600.5+ KB


In [5]:
df['y'].value_counts()

no     4000
yes     521
Name: y, dtype: int64
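
The counts above show a strong class imbalance, so it can help to note what accuracy a trivial model would reach before training anything. The following is a minimal sketch that rebuilds the label column from the counts shown above; the `labels` Series and the `baseline_accuracy` name are illustrative and not part of the original notebook.

```python
import pandas as pd

# Rebuild the target column from the counts reported above
# (an illustrative stand-in for df['y']).
labels = pd.Series(["no"] * 4000 + ["yes"] * 521, name="y")

# Share of each class; the largest share equals the accuracy obtained
# by always predicting the majority class.
class_share = labels.value_counts(normalize=True)
baseline_accuracy = class_share.max()

print(class_share.round(3))
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

Any accuracy reported later is easier to interpret when compared against this baseline.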

## 4. Data Preprocessing

In [6]:
# Check for missing values
df.isnull().sum()

age          0
job          0
marital      0
education    0
default      0
balance      0
housing      0
loan         0
contact      0
day          0
month        0
duration     0
campaign     0
pdays        0
previous     0
poutcome     0
y            0
dtype: int64
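
`isnull()` reports no missing values, but the `df.head()` output shows the string `unknown` in the `contact` and `poutcome` columns, which `isnull()` does not count. A minimal sketch of counting such placeholders, using only the five rows shown earlier (the `sample` frame is an illustrative stand-in for `df`, and treating `unknown` as a missing-value marker is an assumption):

```python
import pandas as pd

# Three categorical columns from the first five rows of df.head().
sample = pd.DataFrame({
    "job": ["unemployed", "services", "management", "management", "blue-collar"],
    "contact": ["cellular", "cellular", "cellular", "unknown", "unknown"],
    "poutcome": ["unknown", "failure", "failure", "unknown", "unknown"],
})

# Count cells equal to the string "unknown" in each column.
unknown_counts = (sample == "unknown").sum()
print(unknown_counts)
```

On the full dataset, the same comparison could be applied to the object-typed columns only, for example via `df.select_dtypes(include="object")`.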

## 5. Encoding Categorical Variables

In [7]:
# One-hot encode the categorical columns. drop_first=True drops the first
# level of each column, so the target `y` becomes a single `y_yes` column.
df_encoded = pd.get_dummies(df, drop_first=True)

df_encoded.head()

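|   | age | balance | day | duration | campaign | pdays | previous | job_blue-collar | job_entrepreneur | job_housemaid | ... | month_jun | month_mar | month_may | month_nov | month_oct | month_sep | poutcome_other | poutcome_success | poutcome_unknown | y_yes |
|---|-----|---------|-----|----------|----------|-------|----------|-----------------|------------------|---------------|-----|-----------|-----------|-----------|-----------|-----------|-----------|----------------|------------------|------------------|-------|
| 0 | 30 | 1787 | 19 | 79 | 1 | -1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 33 | 4789 | 11 | 220 | 1 | 339 | 4 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 35 | 1350 | 16 | 185 | 1 | 330 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 30 | 1476 | 3 | 199 | 4 | -1 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 59 | 0 | 5 | 226 | 1 | -1 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |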
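
To make the effect of `drop_first=True` concrete, here is a small sketch on a three-row toy frame (the `toy` data is made up for illustration and only mirrors a few column names from the dataset):

```python
import pandas as pd

# Toy frame with one numeric column, one categorical feature, and the target.
toy = pd.DataFrame({
    "age": [30, 33, 35],
    "marital": ["married", "married", "single"],
    "y": ["no", "no", "yes"],
})

# Numeric columns pass through unchanged. Each categorical column is expanded
# into indicator columns, and the first (alphabetical) level is dropped.
encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())
print(encoded)
```

The dropped levels (`marital_married`, `y_no`) are implied when the remaining indicators are all zero. Recent pandas versions return boolean indicator columns by default; passing `dtype=int` to `get_dummies` gives 0/1 integers like the output shown above.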


## 6. Feature Selection and Train-Test Split

In [8]:
# Separate features and target variable
X = df_encoded.drop('y_yes', axis=1)
y = df_encoded['y_yes']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
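
Because the classes are imbalanced, one option worth considering is a stratified split, which keeps the class proportions the same in the training and test sets. The sketch below uses synthetic stand-ins (`X_demo`, `y_demo`) with the same class counts as the dataset. It is an alternative to the split above, not the one used for the results in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y with the dataset's class counts (4000 / 521).
rng = np.random.default_rng(42)
y_demo = np.array([0] * 4000 + [1] * 521)
X_demo = rng.normal(size=(y_demo.size, 3))

# stratify=y_demo preserves the class ratio in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print(f"Positive share, full data: {y_demo.mean():.3f}")
print(f"Positive share, train:     {y_tr.mean():.3f}")
print(f"Positive share, test:      {y_te.mean():.3f}")
```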

## 7. Decision Tree Model Building

In [9]:
# Initialize the Decision Tree classifier
dt_model = DecisionTreeClassifier(random_state=42)

# Train the model
dt_model.fit(X_train, y_train)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='best')
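
The parameters printed above (`max_depth=None`, `min_samples_leaf=1`) let the tree grow until every training sample is classified correctly, which tends to overfit. The following sketch compares an unrestricted tree with a depth-limited one on synthetic data generated by `make_classification`. The data, the `max_depth=4` value, and the variable names are assumptions chosen for illustration, not settings from the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced data used only for illustration (not the bank data).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, n_informative=4,
    weights=[0.88, 0.12], flip_y=0.05, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Same classifier twice: default settings versus a capped depth.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
shallow_tree = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_tr, y_tr)

for name, model in [("unrestricted", full_tree), ("max_depth=4", shallow_tree)]:
    print(
        f"{name:>12}: depth={model.get_depth()}, "
        f"train acc={model.score(X_tr, y_tr):.3f}, "
        f"test acc={model.score(X_te, y_te):.3f}"
    )
```

A large gap between training and test accuracy for the unrestricted tree would suggest overfitting. Whether a depth limit helps on the bank data would need to be checked there, for example with cross-validation.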

## 8. Model Evaluation

In [10]:
# Make predictions
y_pred = dt_model.predict(X_test)

# Model accuracy
accuracy = accuracy_score(y_test, y_pred)
accuracy

0.8695652173913043

In [11]:
# Confusion matrix
confusion_matrix(y_test, y_pred)

array([[1113,   92],
       [  85,   67]], dtype=int64)
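
The headline metrics can be recomputed by hand from the confusion matrix above, which also makes the comparison with a majority-class baseline explicit. This is a small sketch in plain Python; the variable names are illustrative, and it relies on scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
# Confusion matrix reported above: rows = true class, columns = predicted class.
tn, fp = 1113, 92   # true class 0 ("no")
fn, tp = 85, 67     # true class 1 ("yes")

total = tn + fp + fn + tp

accuracy = (tn + tp) / total           # share of all predictions that are correct
precision_yes = tp / (tp + fp)         # share of predicted "yes" that are correct
recall_yes = tp / (tp + fn)            # share of actual "yes" that are found
majority_baseline = (tn + fp) / total  # accuracy of always predicting "no"

print(f"accuracy          = {accuracy:.4f}")
print(f"precision (yes)   = {precision_yes:.4f}")
print(f"recall (yes)      = {recall_yes:.4f}")
print(f"majority baseline = {majority_baseline:.4f}")
```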

In [12]:
# Classification report
print(classification_report(y_test, y_pred))

             precision    recall  f1-score   support

          0       0.93      0.92      0.93      1205
          1       0.42      0.44      0.43       152

avg / total       0.87      0.87      0.87      1357



## 9. Conclusion

A Decision Tree classifier was successfully built to predict whether a customer would purchase a product or service based on demographic and behavioral data.  

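The model achieved an accuracy of approximately **87%** on the test set. That figure is slightly below the roughly 89% that always predicting "no" would reach on the same test set (1205 of its 1357 customers did not purchase), so accuracy alone overstates the model's usefulness. For the "yes" class, precision was 0.42 and recall was 0.44: the tree found fewer than half of the customers who purchased, and more than half of its "yes" predictions were wrong.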

This task provided hands-on experience with the complete machine learning workflow, including data preprocessing, model training, and evaluation.