# Bank classification

In this analysis I predict whether a client will subscribe to a term deposit. I use a dataset from the UCI repository that includes a range of client attributes, such as age, job type, marital status, education level, and financial details. By applying machine learning techniques, I aim to uncover patterns that can help improve marketing strategies and client engagement. The ultimate goal of the project is to understand client behavior and preferences well enough to better meet their needs. In this case, the Random Forest model gave the best accuracy, about 89% on the test set.

## Importing the libraries

In [142]:
import pandas as pd
import numpy as np

## Importing the dataset

In [143]:
dataset = pd.read_csv('bank.csv', sep=';')
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

## Encoding categorical variables

### Encoding the Independent Variable

In [144]:
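from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# One-hot encode the categorical columns, selected by position; in bank.csv these are
# job, marital, education, default, housing, loan, contact, month and poutcome.
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1, 2, 3, 4, 6, 7, 8, 10, 15])], remainder='passthrough')
# The remaining (numeric) columns are passed through unchanged after the encoded ones.
X = np.array(ct.fit_transform(X))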
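
Hard-coded column positions break silently if the file layout changes. As a minimal sketch of an alternative, the categorical columns can be picked by dtype instead. The three-row frame `X_demo` and the names `categorical_cols` and `ct_demo` are hypothetical stand-ins, not part of the notebook; `sparse_threshold=0` is a `ColumnTransformer` option that forces a dense array, so no `np.array` conversion is needed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the feature frame; the real X comes from bank.csv.
X_demo = pd.DataFrame({
    "age": [30, 33, 35],
    "job": ["services", "management", "services"],
    "balance": [1787, 4789, 1350],
    "housing": ["no", "yes", "yes"],
})

# Treat every non-numeric column as categorical.
categorical_cols = [
    col for col in X_demo.columns
    if not pd.api.types.is_numeric_dtype(X_demo[col])
]

ct_demo = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), categorical_cols)],
    remainder="passthrough",  # numeric columns are appended after the encoded ones
    sparse_threshold=0,       # always return a dense NumPy array
)
X_encoded = ct_demo.fit_transform(X_demo)

print(categorical_cols)
print(X_encoded.shape)
```

The encoded columns come first (one per category, in sorted order within each feature), followed by the untouched numeric columns.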

### Encoding the Dependent Variable

In [145]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
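
`LabelEncoder` assigns integer codes in sorted label order, which matters later when reading the confusion matrix. The sketch below uses a hypothetical list `y_demo` in place of the real target column and assumes the labels are the strings `no` and `yes`.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for the last column of bank.csv.
y_demo = ["no", "yes", "no", "no"]

le_demo = LabelEncoder()
y_encoded = le_demo.fit_transform(y_demo)

# classes_ is sorted, so the position of each label is its integer code.
mapping = {str(label): code for code, label in enumerate(le_demo.classes_)}

print(mapping)             # {'no': 0, 'yes': 1}
print(y_encoded.tolist())  # [0, 1, 0, 0]
```

Under that assumption, class 1 is the "subscribed" outcome and class 0 is "did not subscribe".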

## Splitting the dataset into the Training set and Test set

In [146]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
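
The classes here are imbalanced (138 of the 1131 test rows in the confusion matrix further down are positive, roughly 12%), so a purely random split can shift the class ratio between the two sets. One option, not used in this notebook, is the `stratify` argument of `train_test_split`. A minimal sketch on hypothetical data with a similar ratio:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 88 negatives and 12 positives.
y_demo = np.array([0] * 88 + [1] * 12)
X_demo = np.arange(100).reshape(-1, 1)

# stratify=y_demo keeps the class proportions equal in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

print("train positives:", int(y_tr.sum()), "of", len(y_tr))
print("test positives: ", int(y_te.sum()), "of", len(y_te))
```

Both subsets end up with 12% positives, so the test metrics are measured on the same class mix the model was trained on.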

## Feature Scaling

In [147]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Fit the scaler on the training set only, then reuse its statistics on the test set
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

## Training the Random Forest Model

In [148]:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

## Predicting the Test set results

In [149]:
from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[982  11]
 [113  25]]


0.8903625110521662
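
Accuracy alone hides how the model treats the minority class. The arithmetic below recomputes accuracy from the confusion matrix printed above and adds precision, recall, and the accuracy of always predicting class 0. It relies on scikit-learn's convention that rows are true classes and columns are predicted classes, and assumes class 1 is the "subscribed" label.

```python
import numpy as np

# Confusion matrix copied from the output above.
cm = np.array([[982, 11],
               [113, 25]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)        # share of predicted positives that are correct
recall = tp / (tp + fn)           # share of actual positives that are found
baseline = (tn + fp) / cm.sum()   # accuracy of always predicting class 0

print(f"accuracy  {accuracy:.4f}")   # 0.8904
print(f"precision {precision:.4f}")  # 0.6944
print(f"recall    {recall:.4f}")     # 0.1812
print(f"baseline  {baseline:.4f}")   # 0.8780
```

The model finds only 25 of the 138 actual positives, and its accuracy is about one percentage point above the majority-class baseline, so recall is worth tracking alongside accuracy for this problem.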

## Applying k-Fold cross validation

In [150]:
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print("Accuracy: {:.2f} %".format(accuracies.mean()*100))
print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

Accuracy: 89.50 %
Standard Deviation: 0.78 %
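
The cross-validation above runs on `X_train` after it was scaled with statistics from the whole training set, so each validation fold has already influenced the scaler. A possible refinement, sketched here on synthetic data rather than on bank.csv, is to wrap the scaler and the classifier in a `Pipeline` so scaling is refit inside every fold, and to score recall alongside accuracy. The data from `make_classification` and the names `pipeline` and `scores` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the encoded training data (about 12% positives).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, weights=[0.88, 0.12], random_state=0
)

# The scaler is refit on the training part of each fold, never on the held-out part.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("forest", RandomForestClassifier(
        n_estimators=10, criterion="entropy", random_state=0)),
])

scores = cross_validate(
    pipeline, X_demo, y_demo, cv=5, scoring=["accuracy", "recall"]
)

print("Accuracy: {:.2f} %".format(scores["test_accuracy"].mean() * 100))
print("Recall:   {:.2f} %".format(scores["test_recall"].mean() * 100))
```

For a random forest the effect of scaling leakage is small, since tree splits do not depend on feature scale, but the pipeline form stays correct if the classifier is later swapped for a scale-sensitive model.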
