Doc2Vec Text Classification

Text classification model which uses gensim Doc2Vec for generating paragraph embeddings and scikit-learn Logistic Regression for classification.

Dataset

25,000 IMDB movie reviews, specially selected for sentiment analysis. The sentiment of reviews is binary (1 for postive, 0 for negative).

This source dataset was collected in association with the following publication:

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). "Learning Word Vectors for Sentiment Analysis." The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).

Usage

Install the required tools

pip install -r requirements.txt
Run the script

python text_classifier.py

References

Kaggle – Bag of Words Meets Bags of Popcorn (https://www.kaggle.com/c/word2vec-nlp-tutorial)
Gensim – Deep learning with paragraph2vec (https://radimrehurek.com/gensim/models/doc2vec.html)
Quoc Le and Tomas Mikolov. Distributed Representations of Sentences and Documents (https://arxiv.org/pdf/1405.4053v2.pdf)

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
classifiers		classifiers
data		data
models		models
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
text_classifier.py		text_classifier.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Doc2Vec Text Classification

Dataset

Usage

References

About

Releases

Packages

Contributors 5

Languages

License

ibrahimsharaf/doc2vec

Folders and files

Latest commit

History

Repository files navigation

Doc2Vec Text Classification

Dataset

Usage

References

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages