🎬 machine learning on movie reviews with word embeddings, bag of words model & recurrent nets



A Word2Vec implementation in Python using TensorFlow, applied to learning word vectors on an IMDB movie-review dataset.

Investigates several models: a bag-of-words model, a neural network that learns from clusters of word vectors, and an RNN whose inputs are word vectors.


make: downloads the necessary data, runs all of the models, and writes their accuracy reports. The preprocessed data is written to persistent storage, so later runs skip the download and preprocessing steps.
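The caching behavior described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the helper name `load_or_build`, the idea of passing a build function, and the use of pickle as the storage format are all assumptions.

```python
import os
import pickle


def load_or_build(cache_path, build_fn):
    """Return cached data if cache_path exists; otherwise build it and cache it.

    Hypothetical helper: build_fn stands in for whatever download and
    preprocessing step produces the data on the first run.
    """
    if os.path.exists(cache_path):
        # A previous run already wrote the preprocessed data; reuse it.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # First run: do the expensive work once, then persist the result.
    data = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data
```

With a helper like this, the first call for a given path runs the preprocessing and every later call reads the stored file instead.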

make targets:

  • bag-of-words: runs the bag of words model and writes an accuracy report to file bag-of-words/bag-of-words-acc.txt

  • word-vectors: runs a TensorFlow model to learn word vectors from the review data, and writes a t-SNE visualization of the word vectors to file tf-implementation/TSNE.png

  • cluster-model: runs a feature engineering algorithm that generates features based on the clustering of a review's word vectors (i.e., how many of the review's word vectors fall into each cluster), followed by a TensorFlow neural network model whose inputs are these cluster features. An accuracy report is written to file tf-implementation/cluster-acc.txt.

  • rnn-model: runs a TensorFlow RNN model whose inputs are the learned word vectors produced by the word-vectors target. An accuracy report is written to file tf-implementation/rnn-acc.txt.
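To make the cluster-model features more concrete, here is a small sketch of counting how many of a review's words fall into each word-vector cluster. It assumes the clustering has already been done and is available as a word-to-cluster mapping; the function name `cluster_features`, the toy mapping, and the example review are hypothetical, and the repository's own clustering method and feature scaling may differ.

```python
from collections import Counter


def cluster_features(review_tokens, word_to_cluster, num_clusters):
    """Count how many words of a review fall into each word-vector cluster.

    Words without a cluster assignment (e.g. out of vocabulary) are skipped.
    """
    counts = Counter(
        word_to_cluster[w] for w in review_tokens if w in word_to_cluster
    )
    # One fixed-length feature vector per review, indexed by cluster id.
    return [counts.get(c, 0) for c in range(num_clusters)]


# Hypothetical cluster assignments, e.g. from clustering learned word vectors.
word_to_cluster = {"great": 0, "wonderful": 0, "awful": 1, "boring": 1, "plot": 2}
review = "a great plot but awful boring acting".split()
features = cluster_features(review, word_to_cluster, num_clusters=3)
print(features)  # [1, 2, 1]
```

Each review becomes a vector of the same length regardless of how many words it contains, which is what lets a feed-forward network take it as input.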