
mmicatka/sentiment_analysis


Sentiment Analysis

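What is Sentiment Analysis?

Sentiment analysis is the task of automatically determining the opinion or emotional tone expressed in a piece of text. In its simplest form it is a classification problem: given a movie review, for example, decide whether the reviewer's opinion is positive or negative.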

Overview

In this notebook we use the IMDB movie review dataset (the Large Movie Review Dataset) compiled by researchers at Stanford: https://ai.stanford.edu/~amaas/data/sentiment/. It contains 50,000 labeled reviews, half used for training and the other half for testing. This is a binary classification problem: each review is labeled either 'positive' or 'negative'.

The IMDB dataset has been downloaded from the Stanford website and unzipped into the 'data' directory.

Note: Keras has a built-in loader for this dataset (keras.datasets.imdb), but here we perform the preprocessing manually.
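A minimal sketch of the manual loading step, assuming the archive was extracted to data/aclImdb with its standard layout (train/ and test/ folders, each containing pos/ and neg/ subfolders with one review per .txt file). The function name load_imdb_split and the 1/0 label encoding are illustrative choices, not taken from this project's code.

```python
from pathlib import Path


def load_imdb_split(root, split="train"):
    """Read one split of the unzipped IMDB archive into (texts, labels).

    Assumes the standard aclImdb layout: <root>/<split>/pos/*.txt and
    <root>/<split>/neg/*.txt, one review per file.
    Labels: 1 = positive, 0 = negative.
    """
    texts, labels = [], []
    for label_name, label in (("pos", 1), ("neg", 0)):
        # sorted() keeps the file order deterministic across platforms
        for path in sorted((Path(root) / split / label_name).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

For example, `train_texts, train_labels = load_imdb_split("data/aclImdb", "train")` should return 25,000 reviews when pointed at the full training split (the unlabeled train/unsup folder is not read).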

Layout of this Project

The work is organized as a sequence of steps, from obtaining the data to building more complex models.

Steps:

  1. Get the data
  2. Clean the data
  3. Examine the data
  4. Build a basic model
  5. Run a parameter sweep
  6. Try more complex models
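Steps 2 and 3 might look something like the following sketch. The specific cleaning rules (dropping HTML tags such as the `<br />` line breaks that appear in IMDB reviews, lowercasing, and replacing punctuation other than apostrophes with spaces) and the function names are assumptions for illustration; the project's actual cleaning may differ.

```python
import re
import statistics

# IMDB reviews contain HTML line breaks such as "<br /><br />".
_HTML_TAG = re.compile(r"<[^>]+>")
# Anything that is not a lowercase letter, digit, apostrophe, or whitespace.
_NON_WORD = re.compile(r"[^a-z0-9'\s]")


def clean_review(text):
    """Lowercase, strip HTML tags and punctuation, collapse whitespace."""
    text = _HTML_TAG.sub(" ", text).lower()
    text = _NON_WORD.sub(" ", text)
    return " ".join(text.split())


def length_summary(texts):
    """Word-count statistics for a list of (cleaned) reviews."""
    lengths = [len(t.split()) for t in texts]
    return {
        "count": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }
```

Looking at the length distribution (for example the median and maximum word counts) is one way to choose the sequence length used later when padding reviews for a model.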

Code Structure

Roadmap

Sentiment Analysis Techniques

  1. Word Embeddings - In Progress
  2. VADER (Valence Aware Dictionary and sEntiment Reasoner) - To Do

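To illustrate the word-embedding approach, here is a sketch of the forward pass of one common simple architecture: look up a vector for each word id, average the vectors over the review (ignoring padding id 0), and apply a logistic output unit. This is roughly what a Keras model built from an Embedding layer with mask_zero=True, a GlobalAveragePooling1D layer and a Dense(1, activation="sigmoid") layer computes. It is written in plain NumPy with hypothetical parameter names so the arithmetic is visible; in practice the embedding matrix and output weights are learned during training rather than supplied by hand.

```python
import numpy as np


def embedding_classifier_forward(token_ids, embeddings, weights, bias):
    """Forward pass of an embedding -> mean pooling -> logistic output model.

    token_ids:  (batch, seq_len) int array; id 0 is treated as padding.
    embeddings: (vocab_size, dim) matrix, one row per word id.
    weights:    (dim,) output weights; bias: scalar.
    Returns a (batch,) array of probabilities that each review is positive.
    """
    vectors = embeddings[token_ids]                        # (batch, seq_len, dim)
    mask = (token_ids != 0)[..., None].astype(vectors.dtype)
    # Average only over real tokens; guard against all-padding rows.
    counts = np.maximum(mask.sum(axis=1), 1.0)             # (batch, 1)
    pooled = (vectors * mask).sum(axis=1) / counts         # (batch, dim)
    logits = pooled @ weights + bias                       # (batch,)
    return 1.0 / (1.0 + np.exp(-logits))                   # sigmoid
```

Averaging rather than summing keeps long and short reviews on a comparable scale, which is one reason this pooling choice is a common baseline.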
Preprocessing Techniques

  1. Tokenization - Done
  2. Summarizer - To Do

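Tokenization for an embedding-based model typically means splitting cleaned text into words, keeping the most frequent words as a vocabulary, and converting each review to a fixed-length list of integer ids. The sketch below shows that idea; the function names, the reserved ids (0 for padding, 1 for out-of-vocabulary words) and the default sizes are assumptions chosen for illustration, not values taken from this project.

```python
from collections import Counter


def build_vocab(texts, max_words=10000):
    """Map the most frequent words to ids 2, 3, ...; 0 = padding, 1 = out-of-vocabulary."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common orders by frequency; ties keep first-seen order
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def texts_to_padded(texts, vocab, maxlen=200):
    """Convert texts to fixed-length id lists, truncating or right-padding with 0."""
    sequences = []
    for text in texts:
        ids = [vocab.get(word, 1) for word in text.split()][:maxlen]
        sequences.append(ids + [0] * (maxlen - len(ids)))
    return sequences
```

Keras's pad_sequences pads at the start of each sequence by default (padding="pre"); this sketch pads at the end instead. Either works as long as the padding id is one the model treats as masked.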
Datasets

  1. IMDB - Done
  2. Amazon Reviews - To Do
  3. New York Times Articles - To Do

Other

  1. Interactive Terminal - To Do
