Movie Review Sentiment Analysis Benchmark
IPython notebooks that test Graphify's feature extraction and selection algorithm by using the selected features to train a logistic regression classifier. The classifier is benchmarked against Stanford's Large Movie Review Dataset and the Cornell Movie Review Dataset.
The interactive notebooks are in the main folder.
- ~90% accuracy on the Cornell Movie Review dataset.
- ~80% accuracy on the Stanford Large Movie Review dataset.
- Features are extracted and learned using Java and Neo4j, and evaluated by building a logistic regression classifier on a weighted tf-idf feature vector.
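The evaluation step (a logistic regression classifier on tf-idf weighted feature vectors, scored by accuracy) can be illustrated with a small, dependency-free sketch. It does not reproduce the Java/Neo4j feature extraction: as a loudly stated assumption, plain lowercase word tokens stand in for Graphify's learned features, and the helper names (`fit_idf`, `tfidf_vector`, `train`, ...) and the toy reviews are hypothetical, invented for illustration and not taken from either dataset. In practice, scikit-learn's `TfidfVectorizer` and `LogisticRegression` provide equivalent building blocks.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical stand-in for Graphify's features: lowercase whitespace tokens.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed idf per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

def tfidf_vector(doc, idf):
    """Sparse tf-idf vector (term -> weight), L2-normalised; unseen terms are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(x, weights, bias):
    return bias + sum(weights.get(t, 0.0) * v for t, v in x.items())

def train(vectors, labels, lr=0.5, epochs=200, l2=0.01):
    """Batch gradient descent on the mean log loss plus an L2 penalty (l2 / 2) * ||w||^2."""
    weights, bias, n = {}, 0.0, len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, y in zip(vectors, labels):
            err = sigmoid(score(x, weights, bias)) - y  # derivative of log loss w.r.t. the score
            grad_b += err
            for t, v in x.items():
                grad_w[t] = grad_w.get(t, 0.0) + err * v
        for t in set(weights) | set(grad_w):
            w = weights.get(t, 0.0)
            weights[t] = w - lr * (grad_w.get(t, 0.0) / n + l2 * w)
        bias -= lr * grad_b / n
    return weights, bias

def predict(x, weights, bias):
    # score >= 0 is equivalent to sigmoid(score) >= 0.5
    return 1 if score(x, weights, bias) >= 0 else 0

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

# Invented toy reviews (1 = positive, 0 = negative); not from either benchmark dataset.
train_docs = [
    "a wonderful moving film with great acting",
    "great story and wonderful performances",
    "an enjoyable and moving experience",
    "a dull boring film with terrible acting",
    "terrible story and boring performances",
    "an awful and dull experience",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_docs = ["wonderful acting and a great story", "boring and terrible film"]
test_labels = [1, 0]

idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, idf) for d in train_docs]
weights, bias = train(train_vecs, train_labels)
test_preds = [predict(tfidf_vector(d, idf), weights, bias) for d in test_docs]
print("held-out accuracy:", accuracy(test_labels, test_preds))
```

L2-normalising each vector keeps long and short reviews on the same scale, so the classifier weighs which terms appear rather than how long the review is. The reported benchmark figures come from the notebooks, not from this sketch.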
Viewing the notebooks online
The content of the notebooks can be viewed online through nbviewer.ipython.org.
To use the notebooks interactively, you need to install Python, IPython (for the notebooks) and the required libraries: scikit-learn, matplotlib and numpy.
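A quick way to confirm the libraries are available before opening the notebooks is to ask Python's import system for each module. This sketch assumes the usual import names (`IPython`, `sklearn` for scikit-learn, `matplotlib`, `numpy`); the helper `missing_packages` is hypothetical and not part of this repository.

```python
import importlib.util

# Assumed import names of the required libraries (scikit-learn is imported as "sklearn").
REQUIRED = ["IPython", "sklearn", "matplotlib", "numpy"]

def missing_packages(names=REQUIRED):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```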
You can install everything at once using a complete scientific Python distribution. Two good ones are the Enthought Python Distribution (EPD, free for academic use) and Python(x,y) (free for everyone).
Alternatively, install only the required packages with your package manager. For example, on Ubuntu or Debian, use:
apt-get install python ipython python-matplotlib python-numpy python-sklearn
Make sure you have IPython 0.11 or later installed. You can update it using the program
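The IPython 0.11 minimum can also be checked from Python. This is a minimal sketch: `version_tuple` and `ipython_ok` are hypothetical helpers, and the parser only handles dotted numeric versions (optionally with a suffix such as `rc1` on a component), not every version format.

```python
import re

MIN_IPYTHON = (0, 11)

def version_tuple(version):
    """Turn '0.13.2' or '8.12.0rc1' into a tuple of ints, stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def ipython_ok(version, minimum=MIN_IPYTHON):
    # Tuple comparison orders versions component by component.
    return version_tuple(version) >= minimum

if __name__ == "__main__":
    try:
        import IPython
    except ImportError:
        print("IPython is not installed.")
    else:
        status = "OK" if ipython_ok(IPython.__version__) else "too old, please update"
        print("IPython", IPython.__version__, status)
```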
More tips on installing scikit-learn can be found on the scikit-learn website.
This repository was modeled on tutorial_ml_gkbionics.