This project was coursework for the Big Data module of my Data Science MSc at City University.
The project used large volumes of text data from the Project Gutenberg online repository. The task was to compare the performance of three classifiers: Naive Bayes, Decision Trees and Logistic Regression.
Training and testing data sets were created and the documents were processed using the TF/IDF methodology. Each classifier was then trained, and statistics on performance and accuracy were collected for various hyperparameters.
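As a rough illustration of the TF/IDF step, here is a minimal plain-Python sketch. It is not the coursework code: the tooling used in the project is not stated here, and the function names (`tokenize`, `tf_idf`), the whitespace tokenizer, the unsmoothed `log(N / df)` weighting and the three toy documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, keep alphabetic words.
    return [word for word in text.lower().split() if word.isalpha()]


def tf_idf(documents):
    # Returns one {term: weight} dict per document.
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency (count / document length) times
        # inverse document frequency (log of N / df), without smoothing.
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy documents, invented for the example.
docs = ["the whale swam", "the ship sailed", "the whale sank the ship"]
vectors = tf_idf(docs)
```

With this weighting, a term that occurs in every document (here "the") gets a weight of zero, while terms confined to fewer documents score higher. Vectors like these would then be the input features for each of the three classifiers.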
I scored 92% for this coursework.