In Capstone-II, we use the IMDB dataset, which contains 50K movie reviews (Download).
- First, we will preprocess the reviews by removing stopwords, HTML tags, and similar noise.
- Then, we will engineer features by converting each review into term-frequency (TF) and TF-IDF representations and by computing sentiment scores.
- Next, we will visualize the data using LDA and word clouds.
- Last, we will build a few models:
  - LSTM
  - Naive Bayes Classifier
  - Support Vector Machine
  - Random Forest
  - Logistic Regression
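The preprocessing step can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: it assumes regex-based HTML stripping is adequate for IMDB reviews (which mostly contain `<br />` tags), and `STOPWORDS` is a hypothetical, deliberately tiny list standing in for a full stopword list such as NLTK's.

```python
import html
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real run would use a complete English stopword list.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "of", "to", "i"}

TAG_RE = re.compile(r"<[^>]+>")         # matches HTML tags such as <br />
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # anything that is not a lowercase letter or whitespace


def preprocess(review: str) -> list[str]:
    """Strip HTML, lowercase, drop punctuation/digits, and remove stopwords."""
    text = html.unescape(review)        # turn entities like &amp; into characters
    text = TAG_RE.sub(" ", text)        # replace tags with spaces so words don't merge
    text = NON_ALPHA_RE.sub(" ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    sample = "This movie was <br /><br />GREAT &amp; the acting was superb!"
    print(preprocess(sample))  # ['movie', 'great', 'acting', 'superb']
```

Replacing tags with a space rather than an empty string keeps words on either side of a `<br />` from being glued together.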
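To make the TF and TF-IDF features concrete, here is a small pure-Python sketch operating on already-tokenized reviews. It assumes the textbook definitions TF = count / document length and IDF = log(N / df); libraries differ here (for example, scikit-learn's `TfidfVectorizer` uses a smoothed IDF by default), so this is an illustration of the idea rather than the exact weighting the project uses.

```python
import math
from collections import Counter


def term_frequencies(tokens: list[str]) -> dict[str, float]:
    """Relative term frequency: count of each term divided by document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequencies(docs: list[list[str]]) -> dict[str, float]:
    """Unsmoothed IDF: log(N / df), where df is the number of docs containing the term."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF weight of every term in every (already tokenized) document."""
    idf = inverse_document_frequencies(docs)
    return [
        {term: tf * idf[term] for term, tf in term_frequencies(doc).items()}
        for doc in docs
    ]


if __name__ == "__main__":
    docs = [
        ["great", "movie", "great", "acting"],
        ["boring", "movie"],
    ]
    for weights in tf_idf(docs):
        print(weights)
```

Because "movie" appears in every document, its IDF is log(2/2) = 0 and its TF-IDF weight vanishes, while terms unique to one review ("great", "boring") keep a positive weight. This down-weighting of ubiquitous words is the main reason to prefer TF-IDF over raw TF.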
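Of the listed models, the Naive Bayes classifier is simple enough to sketch from scratch. The version below is a multinomial Naive Bayes with add-one (Laplace) smoothing over bag-of-words tokens; it is an illustrative assumption about the variant used, and in practice a library implementation would likely be used instead. The class name `MultinomialNaiveBayes` and the `"pos"`/`"neg"` labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Hypothetical bag-of-words Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNaiveBayes":
        self.classes = sorted(set(labels))
        n_docs = len(docs)
        class_counts = Counter(labels)
        # log P(class): fraction of training documents carrying each label
        self.log_prior = {c: math.log(class_counts[c] / n_docs) for c in self.classes}
        # word_counts[c][w]: how often word w occurs in documents of class c
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = {w for doc in docs for w in doc}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word: str, c: str) -> float:
        # Laplace smoothing keeps an unseen (word, class) pair from zeroing the product
        numerator = self.word_counts[c][word] + 1
        denominator = self.total_words[c] + len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, doc: list[str]) -> str:
        scores = {}
        for c in self.classes:
            # Words never seen in training carry no evidence, so they are skipped
            scores[c] = self.log_prior[c] + sum(
                self._log_likelihood(w, c) for w in doc if w in self.vocab
            )
        return max(scores, key=scores.get)


if __name__ == "__main__":
    train_docs = [
        ["great", "acting", "superb"],
        ["loved", "great", "story"],
        ["boring", "plot", "awful"],
        ["awful", "acting", "waste"],
    ]
    train_labels = ["pos", "pos", "neg", "neg"]
    model = MultinomialNaiveBayes().fit(train_docs, train_labels)
    print(model.predict(["great", "story"]))   # pos
    print(model.predict(["boring", "waste"]))  # neg
```

Scores are summed in log space so that multiplying many small word probabilities over a long review does not underflow to zero.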