Anomaly Detection on BankMarketing Dataset

This is a solution notebook for an assignment question given in a graduate Data Mining course. Each code block is accompanied by relevant analysis where required.
Dataset link: https://github.com/GuansongPang/ADRepository-Anomaly-detection-datasets/blob/main/numerical%20data/DevNet%20datasets/bank-additional-full_normalised.csv
Samples with Class label 1 are treated as anomalous.
Broadly, the following steps have been performed in this solution notebook:

Fitting KNN on the dataset as the baseline model.
Reducing dimensionality with PCA and retraining the model on the reduced dimensions.

Compared the accuracy of the baseline model with that of the models obtained after retaining 60%, 70%, 80%, 90%, and 99% of the variance.

Clustering with DBSCAN to identify anomalies, then retraining the model after removing them.

Compared the accuracy of the baseline model with that of the model trained after anomaly removal.

Using a classification model (Decision Tree) to identify anomalies on the test set, then retraining the model after removing them.

Compared the accuracy of the baseline model with that of the model trained after anomaly removal.

The assumptions above and the flow of work follow the questions asked in the assignment.
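
The first three steps can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the notebook's actual code: the data is a synthetic stand-in generated with make_classification rather than the bank marketing CSV, and the helper knn_accuracy, the choice of k = 5, the 70/30 split, and the percentile-based DBSCAN eps are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

# Synthetic stand-in for the dataset (assumption): about 10% of rows get label 1.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def knn_accuracy(X_tr, y_tr, X_te, y_te, k=5):
    """Hypothetical helper: fit KNN on the training split, score on the test split."""
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))


# Step 1: baseline KNN on all features.
results = {"baseline": knn_accuracy(X_train, y_train, X_test, y_test)}

# Step 2: PCA at each retained-variance level. A float n_components in (0, 1)
# keeps the fewest components whose cumulative explained variance reaches it.
for variance in (0.60, 0.70, 0.80, 0.90, 0.99):
    pca = PCA(n_components=variance).fit(X_train)
    results[f"pca_{int(variance * 100)}"] = knn_accuracy(
        pca.transform(X_train), y_train, pca.transform(X_test), y_test)

# Step 3: DBSCAN labels noise points as -1; drop them and retrain.
# eps is set to the 90th percentile of each point's distance to its 5th
# nearest neighbour (self included), a heuristic assumed for this sketch.
min_samples = 5
distances, _ = NearestNeighbors(n_neighbors=min_samples).fit(X_train).kneighbors(X_train)
eps = np.percentile(distances[:, -1], 90)
noise = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_train) == -1
results["dbscan_cleaned"] = knn_accuracy(
    X_train[~noise], y_train[~noise], X_test, y_test)

for name, accuracy in results.items():
    print(f"{name}: {accuracy:.3f}")
```

Because anomalies are the minority class, accuracy alone can look high even when few anomalies are caught, so it may be worth reporting precision and recall for class 1 alongside it.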
