Reddit Comments Classification

Abstract

The goal of this project was to build a classifier for Reddit comments drawn from 20 subreddits, i.e. 20 classes. We tested several models from the sklearn library, including Logistic Regression, Multinomial Naive Bayes, and Support Vector Machines. A Bernoulli Naive Bayes model was also implemented from scratch, but it did not offer the best accuracy. A key finding was the importance of task-specific text pre-processing combined with TF-IDF vectorization. The training data consisted of 70,000 Reddit comments with their IDs and corresponding subreddits; the test data consisted of 30,000 Reddit comments with no subreddit labels. This project was also part of a Kaggle competition, in which our team's best accuracy was 58.55%, achieved using a Multilayer Perceptron model with a single hidden layer.
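To illustrate what a from-scratch Bernoulli Naive Bayes classifier can look like, here is a minimal sketch using only the Python standard library. It is not the implementation from this repository: the class name `BernoulliNaiveBayes`, the smoothing value `alpha=1.0`, the whitespace tokenization, and the toy comments and labels are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class BernoulliNaiveBayes:
    """Hypothetical sketch: Bernoulli NB over binary word-presence features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def fit(self, docs, labels):
        # docs: list of token lists; labels: list of class names
        self.classes = sorted(set(labels))
        self.vocab = sorted({tok for doc in docs for tok in doc})
        class_counts = defaultdict(int)
        word_counts = {c: defaultdict(int) for c in self.classes}
        for doc, label in zip(docs, labels):
            class_counts[label] += 1
            for tok in set(doc):  # presence only, not frequency
                word_counts[label][tok] += 1
        n_docs = len(docs)
        self.log_prior = {c: math.log(class_counts[c] / n_docs) for c in self.classes}
        # Smoothed estimate of P(word present | class)
        self.theta = {
            c: {
                w: (word_counts[c][w] + self.alpha) / (class_counts[c] + 2 * self.alpha)
                for w in self.vocab
            }
            for c in self.classes
        }
        return self

    def predict_one(self, doc):
        present = set(doc)  # out-of-vocabulary tokens are simply ignored
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            # Bernoulli likelihood: absent words contribute log(1 - p) as well
            for w in self.vocab:
                p = self.theta[c][w]
                score += math.log(p) if w in present else math.log(1.0 - p)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


# Toy data standing in for Reddit comments and subreddit labels (made up)
train_docs = [
    "the team won the game".split(),
    "great goal in the final match".split(),
    "new gpu benchmarks released".split(),
    "my laptop cpu is overheating".split(),
]
train_labels = ["sports", "sports", "hardware", "hardware"]

model = BernoulliNaiveBayes().fit(train_docs, train_labels)
predictions = model.predict([
    "the match was a great game".split(),
    "which gpu for my laptop".split(),
])
print(predictions)
```

Unlike Multinomial Naive Bayes, this variant only looks at whether a word occurs in a comment, and it penalizes a class for vocabulary words that are absent. On 20 classes with a large vocabulary, a vectorized implementation would likely be needed for reasonable speed.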

Kaggle Competition:

https://www.kaggle.com/c/reddit-comment-classification-comp-551/

