Skip to content

code for iclr workshop paper "Learning Document Embedding by Predicting N-grams for Sentiment Classification of Long Movie Reviews"

Notifications You must be signed in to change notification settings

libofang/DV-ngram

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 

Repository files navigation

Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews

This code implements the DV-ngram model proposed in iclr 2016 workshop paper:

[Bofang Li, Tao Liu, Xiaoyong Du, Deyuan Zhang and Zhe Zhao - Learning Document Embedding by Predicting N-grams for Sentiment Classification of Long Movie Reviews] (http://arxiv.org/abs/1512.08183)

##Code

The original version of this code was written in C, built upon the code written by Grégoire Mesnil et al. Due to the strong interest in this work, we have rewritten the entire algorithm in Java for easier and more scalable use.

This code has been tested on Windows, but it should work on Linux, OSX or any other operation system with only a little changes about calling LibLinear.

Preprocessed data for IMDB dataset can be downloaded here: unigram, bigram, trigram

All parameters your may need to change are in src/NN/DV.java. Your can just run it with default setting to get around 92.14% accuracy on IMDB dataset.

Note that for DV-trigram, dictionary size is large, you may need around 20G RAM to run our code in this setting.

##Acknowledgements

We thank Grégoire Mesnil et al. for their implementation of Paragraph Vector. Both their code and iclr 2015 workshop paper have influenced us a lot.

About

code for iclr workshop paper "Learning Document Embedding by Predicting N-grams for Sentiment Classification of Long Movie Reviews"

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published