
Naive Bayes/Decision Tree/Logistic Regression in Apache Spark and Python


suzannefox/ApacheSpark


ApacheSpark

Big Data project with Apache Spark

This project was coursework for the Big Data module of my Data Science MSc at City University.

The project used large volumes of text data from the Project Gutenberg online repository. The task was to compare the performance of three classifiers: Naive Bayes, Decision Trees and Logistic Regression.

Training and test sets were created and the documents were converted to features using the TF/IDF methodology. Each classifier was then trained, and performance and accuracy statistics were collected for a range of hyperparameter settings.
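The TF/IDF step can be illustrated with a minimal, pure-Python sketch. This is not the project's code and does not use Spark; the function name `tf_idf`, the lowercase whitespace tokenizer and the sample sentences are assumptions made for illustration. It uses raw term counts for TF and the smoothed IDF, log((N + 1) / (df + 1)), which is the formula Spark MLlib documents for its `IDF` transformer.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (illustrative sketch only).

    TF is the raw count of a term in a document. IDF uses the smoothed form
    log((N + 1) / (df + 1)), where N is the number of documents and df is the
    number of documents containing the term.
    """
    # Assumed tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}
    # Weight = term count in this document * inverse document frequency.
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


# Hypothetical sample documents standing in for Project Gutenberg texts.
docs = [
    "the whale swam past the ship",
    "the ship sailed at dawn",
    "a whale is a large mammal",
]
for weights in tf_idf(docs):
    # Show the three highest-weighted terms for each document.
    print(sorted(weights.items(), key=lambda kv: -kv[1])[:3])
```

In Spark itself this step is typically done with `HashingTF` (or `CountVectorizer`) followed by `IDF` from `pyspark.ml.feature`. Under the smoothed formula, a term that occurs in every document gets an IDF of zero, so it contributes nothing to the classifiers.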

I scored 92% for this coursework.
