Doc2Vec Text Classification

A text classification script that uses gensim's Doc2Vec to learn paragraph embeddings and scikit-learn's Logistic Regression to classify them.

Dataset

25,000 IMDB movie reviews, specially selected for sentiment analysis. The sentiment of each review is binary: 1 for positive, 0 for negative.
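Reading reviews with binary labels like these could look roughly like the sketch below, which uses only the standard-library `csv` module. The helper `load_reviews` and the column names `review` and `sentiment` are hypothetical; check the actual header of dataset.csv before relying on them.

```python
import csv
import io


def load_reviews(fileobj, text_col="review", label_col="sentiment"):
    """Read (text, label) pairs from a CSV file object.

    text_col and label_col are hypothetical column names, not confirmed
    against dataset.csv.
    """
    texts, labels = [], []
    for row in csv.DictReader(fileobj):
        label = int(row[label_col])
        # The dataset is described as binary: 1 = positive, 0 = negative.
        if label not in (0, 1):
            raise ValueError(f"expected a binary label, got {label!r}")
        texts.append(row[text_col])
        labels.append(label)
    return texts, labels


# Usage with an in-memory sample; for the real file, open it with
# open("dataset.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO('review,sentiment\n"Loved it, truly",1\nNot worth watching,0\n')
texts, labels = load_reviews(sample)
print(texts, labels)
```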

This source dataset was collected in association with the following publication:

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). "Learning Word Vectors for Sentiment Analysis." The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).

Usage

  • Install the required Python packages

    sudo pip install -r requirements.txt

  • Run the script

    python model.py

References