Perform anomaly detection on Bank Marketing dataset

havelhakimi/BankMarketing

Anomaly Detection on BankMarketing Dataset

This is a solution notebook for an assignment question from a graduate Data Mining course. Each code block is accompanied by analysis where relevant.
Dataset link: https://github.com/GuansongPang/ADRepository-Anomaly-detection-datasets/blob/main/numerical%20data/DevNet%20datasets/bank-additional-full_normalised.csv
Samples with Class label 1 are treated as anomalous.
Broadly, the following steps have been performed in this solution notebook:

  • Applied several statistical measures and presented the results as plots:
    • Count plots and class-wise categorical plots for categorical attributes
    • Histogram plots for continuous attributes
    • Pie chart depicting the class distribution
    • Correlation Analysis
  • Used KNN as the baseline model and fitted it on the dataset.
  • Applied dimensionality reduction using PCA and retrained the model on the reduced dimensions.
    • Compared the accuracy of the baseline model with the models obtained after retaining 60%, 70%, 80%, 90%, and 99% of the variance.
  • Applied clustering using DBSCAN to remove anomalies and retrained the model after their removal.
    • Compared the accuracy of the baseline model with the model trained after anomaly removal.
  • Used a classification model (Decision Tree) to identify anomalies on the test set, then retrained the model after removing them.
    • Compared the accuracy of the baseline model with the model trained after anomaly removal.
The assumptions above and the flow of work follow the questions asked in the assignment.
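
For readers who want a feel for the workflow without opening the notebook, the following is a minimal sketch of the KNN baseline, the PCA comparison, and the DBSCAN-based removal step. It is not the notebook's code. It assumes scikit-learn is installed, uses a synthetic dataset from `make_classification` as a stand-in for the bank marketing CSV, and picks `k=5`, `min_samples=5`, and a k-distance percentile for `eps` purely for illustration. The helper `knn_accuracy` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

# Assumption: synthetic stand-in data, roughly 10% of samples in class 1
# (the class treated as anomalous in this README).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def knn_accuracy(X_tr, y_tr, X_te, y_te, k=5):
    """Fit a KNN classifier and return its accuracy on the held-out set."""
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    return model.score(X_te, y_te)


# Baseline: KNN on the full feature set.
results = {"baseline": knn_accuracy(X_train, y_train, X_test, y_test)}

# PCA: keep just enough components to retain each share of the variance.
for pct in (60, 70, 80, 90, 99):
    pca = PCA(n_components=pct / 100, svd_solver="full").fit(X_train)
    results[f"pca_{pct}"] = knn_accuracy(
        pca.transform(X_train), y_train, pca.transform(X_test), y_test)

# DBSCAN: points labelled -1 are noise; drop them from the training set.
# Assumption: eps is set from the 90th percentile of the distance to the
# min_samples-th nearest point (self included), a common k-distance heuristic.
min_samples = 5
distances, _ = NearestNeighbors(n_neighbors=min_samples).fit(X_train).kneighbors(X_train)
eps = np.percentile(distances[:, -1], 90)
labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_train)
keep = labels != -1

results["dbscan_cleaned"] = knn_accuracy(
    X_train[keep], y_train[keep], X_test, y_test)

for name, accuracy in results.items():
    print(f"{name}: {accuracy:.3f}")
```

The PCA transform is fitted on the training split only and then applied to the test split, so the test data does not influence the projection. Accuracy alone can look high on an imbalanced dataset like this one, so the printed numbers are best read as a relative comparison between variants.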