What is Sentiment Analysis? Put a few paragraphs here
In this notebook we use the IMDB Review dataset compiled by Stanford (add a link here). The dataset contains 50,000 reviews, split evenly into 25,000 for training and 25,000 for testing. The task is binary classification: each review is labeled either 'positive' or 'negative'.
The IMDB dataset has been downloaded from here and unzipped into the 'data' directory.
Note: Keras has a built-in function to access this dataset, but we want to perform the preprocessing manually.
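Since the preprocessing is done by hand, the reviews first have to be read from disk. The following is a minimal sketch, assuming the archive unpacks into one sub-directory per class (`pos` and `neg`) with one `.txt` file per review. The function name `load_reviews`, the `LABELS` mapping, and the `data/aclImdb/train` path are illustrative assumptions, not something this notebook has defined yet.

```python
from pathlib import Path

# Assumed mapping from class sub-directory name to integer label.
LABELS = {"neg": 0, "pos": 1}


def load_reviews(split_dir):
    """Read every review under split_dir/neg and split_dir/pos.

    Returns (texts, labels) as two parallel lists.
    """
    texts, labels = [], []
    for name, label in LABELS.items():
        # sorted() keeps the file order deterministic across runs.
        for path in sorted(Path(split_dir, name).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels


if __name__ == "__main__":
    # Hypothetical location: adjust to wherever the archive was unzipped.
    train_dir = Path("data") / "aclImdb" / "train"
    if train_dir.is_dir():
        train_texts, train_labels = load_reviews(train_dir)
        print(len(train_texts), "training reviews loaded")
```

Calling the same function on the test directory would give the held-out half of the data without any change to the code.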
This needs a proper write-up
Steps:
- get the data
- clean the data
- examine the data
- make a basic model
- parameter sweep
- more complicated models
Status:
- Word Embeddings - In Progress
- VADER (Valence Aware Dictionary and sEntiment Reasoner) - To Do
- Tokenization - Done
- Summarizer - To Do
- IMDB - Done
- Amazon Reviews - To Do
- New York Times Articles - To Do
- Interactive Terminal - To Do
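The "clean the data" step and the tokenization item above could look roughly like the sketch below. It is one possible approach rather than the notebook's actual implementation: it assumes the raw reviews contain HTML line-break tags such as `<br />`, and it uses only the standard library. The names `clean_review` and `tokenize` are hypothetical.

```python
import re
import string

# Matches any HTML-style tag, e.g. the "<br />" breaks found in raw reviews.
TAG_RE = re.compile(r"<[^>]+>")

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_review(text):
    """Strip tags, lowercase, drop punctuation, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = text.lower()
    # Note: this also splits contractions, so "don't" becomes "don t".
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())


def tokenize(text):
    """Split a cleaned review into whitespace-separated tokens."""
    return clean_review(text).split()


sample = "A <br /><br />GREAT movie, truly!"
print(tokenize(sample))
```

Replacing punctuation with spaces instead of deleting it keeps words separated when a review omits the space after a comma or period, at the cost of breaking contractions apart. Whether that trade-off suits the later models is worth checking when examining the data.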