This project was a requirement of my postgraduate module, Big Data Management. Its main aim is to apply classification and clustering techniques to the dataset. The classification model chosen is Naive Bayes, and the clustering model is K-means. In addition, a graph analysis is performed in Neo4j to identify relationships within the data.
The project follows this process flow: Exploratory Data Analysis, Data Cleaning, Data Visualization, Data Manipulation, Model Training, and Model Evaluation. It is written in Python using Google Colab.
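For readers unfamiliar with the two models, here is a minimal, standard-library-only sketch of Gaussian Naive Bayes and K-means on toy data. It is not the code from this project. The function names (`fit_gaussian_nb`, `predict_gaussian_nb`, `kmeans`), the Gaussian variant of Naive Bayes, and the toy points are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate per-class prior, feature means, and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def kmeans(X, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(X, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for row in X:
            nearest = min(range(k), key=lambda i: math.dist(row, centroids[i]))
            clusters[nearest].append(row)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [[sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
                     for i, c in enumerate(clusters)]
    labels = [min(range(k), key=lambda i: math.dist(row, centroids[i])) for row in X]
    return centroids, labels


# Toy data: two well-separated groups of 2-D points (not the project's dataset).
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
y = ["a", "a", "a", "b", "b", "b"]

# Model training and evaluation for the classifier (evaluated on the training rows).
model = fit_gaussian_nb(X, y)
predictions = [predict_gaussian_nb(model, row) for row in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print("Naive Bayes training accuracy:", accuracy)

# Clustering ignores the labels and groups the points by distance alone.
centroids, cluster_labels = kmeans(X, k=2)
print("K-means cluster labels:", cluster_labels)
```

In practice a library such as scikit-learn would normally replace these hand-written functions. The sketch only shows what the training and evaluation steps compute.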
The dataset is too large to upload to GitHub, so it is hosted on Google Drive: https://drive.google.com/drive/folders/1kRlMDVR94O431XGDBnqnDNTEJtUf8ICH?usp=sharing
The code can be viewed in the "Code.ipynb" file.
The graph analysis code can be viewed in the "Graph Analysis.txt" file.
If you use any part of the code, please reference it. Thanks.