What makes a useful Yelp review? Can we predict if a review will be useful based on its text content?
The goals of this project are to:
- predict the usefulness of Yelp reviews as a classification problem using machine learning models
- use topic modeling/decomposition to improve the accuracy of those models
- evaluate the effectiveness of the models by assessing the validity of the models' predictions
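
The first two goals can be illustrated with a minimal scikit-learn sketch. It is not the project's actual code: the toy reviews, the labels, the `build_pipeline` helper, and the choice of `TruncatedSVD` as the decomposition step are all assumptions made for illustration. The idea is to turn review text into TF-IDF features, reduce them to a handful of latent topics, and fit a classifier on the topic space.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data: review text and a binary "useful" label.
reviews = [
    "The pasta was fresh and the waiter explained every dish in detail.",
    "Parking is behind the building; arrive before 6pm to avoid the line.",
    "Detailed menu, fair prices, and the staff handled our allergy request well.",
    "Meh.",
    "Bad place, do not go.",
    "Loved it!!!",
]
useful = [1, 1, 1, 0, 0, 0]


def build_pipeline(n_topics=2, seed=0):
    """Hypothetical helper: TF-IDF -> latent topics (LSA) -> logistic regression."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        # TruncatedSVD on TF-IDF features is latent semantic analysis,
        # used here as one possible decomposition step.
        ("topics", TruncatedSVD(n_components=n_topics, random_state=seed)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


model = build_pipeline()
model.fit(reviews, useful)

predictions = model.predict(reviews)          # one 0/1 label per review
probabilities = model.predict_proba(reviews)  # one row of class probabilities per review
```

On real data the model would be evaluated on held-out reviews rather than the training set, and the number of topics would be tuned rather than fixed at two.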
An in-depth discussion of this project can be found in the technical report.
All statistical analysis was run on a t2.2xlarge AWS EC2 instance.
- NLP: spaCy, textacy, scikit-learn
- Modeling: scikit-learn (Logistic Regression, Random Forest Classifier)
- Data Management: numpy, pandas, PostgreSQL