This is a repository for the NAACL 2021 paper:
A Disentangled Adversarial Neural Topic Model for Separating Opinions from Plots in User Reviews
This repository provides:
- a PyTorch implementation of the DIATOM core architecture;
- an extract of the annotated sentences from the MOBO dataset.
The MOvie and BOok reviews (MOBO) dataset is a collection of movie and book reviews, each paired with the plot of the corresponding movie or book. The reviews come from three publicly available datasets: the Stanford IMDB movie reviews dataset [1], the GoodReads dataset [2] and the Amazon reviews dataset [3].
With the help of 15 annotators, we further labeled more than 18,000 review sentences (~6,000 per corpus), marking each sentence's polarity (Positive, Negative), whether it describes the corresponding movie/book plot (Plot), or none of the above (None).
In the `mobo_dataset` folder, we share an excerpt of the annotated sentences for each corpus.
Further details on the data annotation process and inter-annotator agreement are available in the paper.
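A minimal sketch for inspecting one of the excerpt files. It assumes each file is a JSON array of objects and that the annotation is stored under a field called `label`; both are assumptions, since the file schema is not documented here, and the helper names `load_excerpt` and `label_distribution` are hypothetical, not part of the repository.

```python
import json
from collections import Counter
from typing import Iterable, Mapping

# Label set described above: sentence polarity, plot description, or neither.
LABELS = ("Positive", "Negative", "Plot", "None")


def load_excerpt(path: str) -> list:
    """Load one *_annotated_sentences_excerpt.json file.

    ASSUMPTION: the file holds a JSON array of objects; check the real layout first.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def label_distribution(records: Iterable[Mapping], label_key: str = "label") -> dict:
    """Return the fraction of sentences carrying each of the four labels.

    `label_key` is a HYPOTHETICAL field name. Labels outside LABELS still count
    towards the total but are not reported.
    """
    counts = Counter(r[label_key] for r in records)
    total = sum(counts.values())
    return {lab: counts[lab] / total if total else 0.0 for lab in LABELS}


if __name__ == "__main__":
    toy = [  # made-up sentences, for illustration only
        {"sentence": "A stunning, heartfelt film.", "label": "Positive"},
        {"sentence": "The pacing drags badly.", "label": "Negative"},
        {"sentence": "A detective hunts a killer in Paris.", "label": "Plot"},
        {"sentence": "I watched it last Sunday.", "label": "None"},
    ]
    print(label_distribution(toy))
```

With a real file, `label_distribution(load_excerpt(path), label_key=...)` would be called with whatever field name the JSON actually uses.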
[1] Learning word vectors for sentiment analysis. Maas et al., ACL 2011.
[2] Fine-grained spoiler detection from large-scale review corpora. Wan et al., ACL 2019.
[3] Image-based recommendations on styles and substitutes. McAuley et al., SIGIR 2015.
[4] MPST: A corpus of movie plot synopses with tags. Kar et al., LREC 2018.
| Statistics | IMDB | GoodReads | Amazon |
|---|---:|---:|---:|
| Number of plots | 1,131 | 150 | 100 |
| Number of reviews | 25,836 | 83,852 | 32,375 |
| Fraction of positive reviews | 0.46 | 0.33 | 0.32 |
| Fraction of negative reviews | 0.54 | 0.50 | 0.46 |
| Fraction of neutral reviews | 0.00 | 0.17 | 0.22 |
| Training set (reviews) | 20,317 | 65,816 | 25,883 |
| Development set (reviews) | 2,965 | 9,007 | 3,275 |
| Test set (reviews) | 2,554 | 9,029 | 3,217 |
| Number of annotated sentences | 6,000 | 6,000 | 6,000 |
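The polarity rows of the statistics table are fractions that sum to 1 for each corpus, and the training, development and test sets add up to the number of reviews. The sketch below only reuses the figures from the table; it checks both properties and reports each split's share of its corpus. The function name `split_shares` is illustrative.

```python
# Figures copied from the statistics table above.
STATS = {
    "IMDB": {"reviews": 25_836, "splits": (20_317, 2_965, 2_554), "polarity": (0.46, 0.54, 0.00)},
    "GoodReads": {"reviews": 83_852, "splits": (65_816, 9_007, 9_029), "polarity": (0.33, 0.50, 0.17)},
    "Amazon": {"reviews": 32_375, "splits": (25_883, 3_275, 3_217), "polarity": (0.32, 0.46, 0.22)},
}


def split_shares(stats: dict) -> dict:
    """Validate the table and return (train, dev, test) shares per corpus."""
    shares = {}
    for corpus, row in stats.items():
        train, dev, test = row["splits"]
        # The three splits must partition the reviews of the corpus.
        if train + dev + test != row["reviews"]:
            raise ValueError(f"{corpus}: splits do not sum to the number of reviews")
        # Positive + negative + neutral fractions must sum to 1.
        if abs(sum(row["polarity"]) - 1.0) > 1e-9:
            raise ValueError(f"{corpus}: polarity fractions do not sum to 1")
        shares[corpus] = tuple(round(n / row["reviews"], 3) for n in row["splits"])
    return shares


if __name__ == "__main__":
    for corpus, (tr, dv, te) in split_shares(STATS).items():
        print(f"{corpus}: train {tr:.1%}, dev {dv:.1%}, test {te:.1%}")
```

All three corpora use roughly an 80/10/10 split of the reviews.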
Requirements:
- Python 3.x
- PyTorch >= 1.6.0
- Gensim
- SentenceBERT (`sentence-transformers`)
- spaCy
- tqdm
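A hedged sketch for checking an environment against the requirements list. It assumes the PyPI distribution names `torch`, `gensim`, `sentence-transformers`, `spacy` and `tqdm`, and needs Python 3.8+ for `importlib.metadata`; the helper names are hypothetical and the version parsing is deliberately simple.

```python
from importlib import metadata
from itertools import takewhile

# ASSUMPTION: PyPI distribution names for the requirements listed above.
# None means "any installed version is accepted".
REQUIREMENTS = {
    "torch": (1, 6, 0),  # PyTorch >= 1.6.0
    "gensim": None,
    "sentence-transformers": None,  # SentenceBERT
    "spacy": None,
    "tqdm": None,
}


def parse_version(version: str) -> tuple:
    """Keep leading numeric components: '1.13.1+cu117' -> (1, 13, 1)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:  # stop at suffixes such as 'dev2020'
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed: tuple, minimum: tuple) -> bool:
    """Compare after zero-padding, so (1, 6) counts as (1, 6, 0)."""
    width = max(len(installed), len(minimum))

    def pad(v: tuple) -> tuple:
        return v + (0,) * (width - len(v))

    return pad(installed) >= pad(minimum)


def check_requirements(requirements: dict = REQUIREMENTS) -> list:
    """Return readable problems; an empty list means every check passed."""
    problems = []
    for dist, minimum in requirements.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist}: not installed")
            continue
        if minimum is not None and not at_least(parse_version(installed), minimum):
            wanted = ".".join(map(str, minimum))
            problems.append(f"{dist}: found {installed}, need >= {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```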
Current repository structure:

./
- `diatom`: Core architecture of the DIATOM model
- `mobo_dataset`: Extract of the annotated sentences from the MOBO dataset

./diatom/
- `main_adv_vae.py`: Main file for training and test procedures
- `adversarial_vae_model.py`: DIATOM architecture and functions
- `vae_avitm_paper.py`: Basic VAE components adopted in DIATOM
- `sentiment_classifier.py`: Basic classifier used in the adversarial mechanism
- `topic_class.py`: Auxiliary topic class
./mobo_dataset/
- `Amazon_annotated_sentences_excerpt.json`
- `GoodReads_annotated_sentences_excerpt.json`
- `IMDB_annotated_sentences_excerpt.json`
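The file name `vae_avitm_paper.py` suggests that the basic VAE components follow AVITM (Srivastava and Sutton, ICLR 2017), which replaces the Dirichlet prior over topic proportions with a logistic-normal obtained by a Laplace approximation in the softmax basis. The following pure-Python sketch illustrates that technique only; it is an assumption about what the file contains, has not been checked against the repository code, and all function names are hypothetical. For $K$ topics with Dirichlet parameters $\alpha$, the approximate prior is $\mu_k = \log\alpha_k - \frac{1}{K}\sum_i \log\alpha_i$ and $\sigma_k^2 = \frac{1}{\alpha_k}\left(1 - \frac{2}{K}\right) + \frac{1}{K^2}\sum_i \frac{1}{\alpha_i}$.

```python
import math
import random
from typing import List, Sequence, Tuple


def laplace_dirichlet_prior(alpha: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Logistic-normal (mean, variance) approximating Dirichlet(alpha) in the softmax basis."""
    k = len(alpha)
    mean_log_alpha = sum(math.log(a) for a in alpha) / k
    sum_inv_alpha = sum(1.0 / a for a in alpha)
    mu = [math.log(a) - mean_log_alpha for a in alpha]
    var = [(1.0 / a) * (1.0 - 2.0 / k) + sum_inv_alpha / k**2 for a in alpha]
    return mu, var


def gaussian_kl(mu_q, var_q, mu_p, var_p) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    return 0.5 * sum(
        vq / vp + (mq - mp) ** 2 / vp - 1.0 + math.log(vp / vq)
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def sample_theta(mu, var, rng: random.Random) -> List[float]:
    """Reparameterised draw z = mu + sigma * eps, mapped to topic proportions."""
    z = [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]
    return softmax(z)


if __name__ == "__main__":
    prior_mu, prior_var = laplace_dirichlet_prior([1.0] * 50)  # K = 50, alpha = 1
    print(prior_var[0])  # close to 1 - 1/K = 0.98
    theta = sample_theta(prior_mu, prior_var, random.Random(0))
    print(len(theta), round(sum(theta), 6))
```

In a VAE topic model, the encoder would output `mu_q` and `var_q` per document, and `gaussian_kl` against the approximate prior would be the KL term of the ELBO.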