
A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss

This repository contains the source code for our SIGIR 2020 paper "A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss".

Important Notice: We found a bug in our evaluation function for the balanced accuracy score. We have updated the code and the results in our paper (Table 3) on arXiv. Please make sure that you are using the latest version of our code. Nevertheless, our model still achieves higher balanced accuracy scores than the baselines after fixing this bug. We apologize for the inconvenience caused.

Some of our source code is adapted from https://github.com/ChenRocks/fast_abs_rl.

If you use this code, please cite our paper:

@inproceedings{DBLP:conf/sigir/ChanCK20,
  author    = {Hou Pong Chan and
               Wang Chen and
               Irwin King},
  title     = {A Unified Dual-view Model for Review Summarization and Sentiment Classification
               with Inconsistency Loss},
  booktitle = {Proceedings of {SIGIR} 2020, Virtual
               Event, China, July 25-30, 2020},
  pages     = {1191--1200},
  year      = {2020},
  url       = {https://arxiv.org/abs/2006.01592},
  doi       = {10.1145/3397271.3401039},
  biburl    = {https://dblp.org/rec/conf/sigir/ChanCK20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Model Architecture

Dependencies

  • Python 3.6+
  • PyTorch 1.1
  • NLTK
  • pyrouge

Please refer to requirements.txt for the full list of dependencies.

Datasets

We use the Sports and Outdoors, Toys and Games, Home and Kitchen, and Movies and TV datasets from the 5-core subsets of the Amazon review corpus collected by Prof. Julian McAuley.

The following are the URLs of our preprocessed versions of these datasets:

  • https://www.dropbox.com/s/2b5ueuj6bqpfyvh/min_4_reviews_Sports_and_Outdoors_5.tar.gz?dl=0
  • https://www.dropbox.com/s/bf1im78iz0cvpta/min_4_Toys_and_Games_5.tar.gz?dl=0
  • https://www.dropbox.com/s/52wyjfz1xz4oeb4/min_4_Home_and_Kitchen_5.tar.gz?dl=0
  • https://www.dropbox.com/s/a17hub5m1osdukw/min_4_Movies_and_TV_5.tar.gz?dl=0
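As an optional convenience, the snippet below is a minimal Python sketch for fetching and unpacking one of these archives (changing ?dl=0 to ?dl=1 makes Dropbox return the file directly). The name of the extracted directory may differ from the -data paths used in the commands below, so please inspect it after unpacking.

import tarfile
import urllib.request

# Sketch only: download and unpack the preprocessed Sports and Outdoors archive.
url = "https://www.dropbox.com/s/2b5ueuj6bqpfyvh/min_4_reviews_Sports_and_Outdoors_5.tar.gz?dl=1"
archive = "min_4_reviews_Sports_and_Outdoors_5.tar.gz"

urllib.request.urlretrieve(url, archive)      # save the archive locally
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall("datasets/")               # unpack under datasets/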

If you use these datasets, please cite one or both of the following papers:

Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering
R. He, J. McAuley
WWW, 2016

Image-based recommendations on styles and substitutes
J. McAuley, C. Targett, J. Shi, A. van den Hengel
SIGIR, 2015

Training

  • Compute and export the class distribution on the training set, which will be used by the training script: python3 compute_rating_stat.py -data_dir datasets/processed_reviews_Sports_and_Outdoors_5 -split train

  • Script for training a word2vec embedding on the training set: python3 train_word2vec.py -data datasets/processed_reviews_Sports_and_Outdoors_5 -path word_embeddings/sport_and_outdoors -dim 128

  • Script for training our dual-view model with the inconsistency loss (a minimal sketch of the joint objective follows the command below):

python3 train_ml.py \
-data=datasets/processed_reviews_Sports_and_Outdoors_5 \
-exp_path=exp/%s.%s \
-exp=train_movie_dual_view_inc_seed_250 \
-epochs=50 \
-checkpoint_interval=1000 \
-copy_attention \
-batch_size=32 \
-seed=250 \
-w2v=word_embeddings/sport_and_outdoors \
-v_size=50000 \
-word_vec_size=128 \
-encoder_size=256 \
-decoder_size=512 \
-enc_layers=2 \
-residual \
-model_type=multi_view_multi_task_basic \
-dropout=0.0 \
-dec_classify_input_type=dec_state \
-classifier_type=word_multi_hop_attn \
-dec_classifier_type=word_multi_hop_attn \
-gen_loss_weight=0.8 \
-class_loss_weight=0.1 \
-inconsistency_loss_type=KL_div \
-inconsistency_loss_weight=0.1 \
-early_stop_loss=joint \
-batch_workers 0
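
For reference, the snippet below is a minimal, illustrative PyTorch sketch of how a weighted joint objective with a KL-divergence inconsistency term between the two sentiment classifiers could be assembled, mirroring the -gen_loss_weight, -class_loss_weight, -inconsistency_loss_type=KL_div, and -inconsistency_loss_weight options above. All tensor and argument names here are hypothetical; the actual implementation lives in train_ml.py.

import torch
import torch.nn.functional as F

def joint_loss(gen_loss, src_view_logits, summary_view_logits, rating,
               gen_w=0.8, class_w=0.1, inc_w=0.1):
    """Sketch of a weighted joint objective (not the repository code).

    gen_loss: negative log-likelihood of the summary decoder (scalar tensor).
    src_view_logits / summary_view_logits: [batch, num_classes] scores from the
        source-view and summary-view sentiment classifiers (hypothetical names).
    rating: [batch] gold sentiment labels.
    """
    # Cross-entropy classification loss for both views.
    class_loss = F.cross_entropy(src_view_logits, rating) + \
                 F.cross_entropy(summary_view_logits, rating)
    # Inconsistency loss: KL divergence between the two predicted distributions,
    # encouraging the summary view to agree with the source view.
    inc_loss = F.kl_div(F.log_softmax(summary_view_logits, dim=-1),
                        F.softmax(src_view_logits, dim=-1),
                        reduction="batchmean")
    return gen_w * gen_loss + class_w * class_loss + inc_w * inc_loss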

Predict

  • Download pyrouge and save it to path/to/pyrouge: git clone https://github.com/andersjo/pyrouge.git

  • Export the ROUGE environment variable: export ROUGE=[path/to/pyrouge/tools/ROUGE-1.5.5]

  • Make the evaluation references for a dataset (this only needs to be done once per dataset): python make_eval_reference.py -data datasets/processed_reviews_Sports_and_Outdoors_5 -split all

  • Run prediction, specifying the path to the best checkpoint (lowest validation loss) via the -pretrained_model argument:

python predict.py \
-data datasets/processed_reviews_Sports_and_Outdoors_5 \
-pred_path pred/%s.%s \
-exp predict_dual_view_inc_seed_250 \
-pretrained_model saved_model/train_movie_dual_view_inc_seed_250.ml.copy.bi-directional.20191212-154843/ckpt/train_movie_dual_view_inc_seed_250.ml.copy.bi-directional-epoch-2-total_batch-75000-joint-2.640 \
-seed 9527 \
-batch_size 16 \
-replace_unk \
-src_max_len -1
  • Run the evaluation script to compute ROUGE scores, macro F1, and balanced accuracy (a sketch of the classification metrics follows below). The reported macro F1 and balanced accuracy come from our source-view sentiment classifier. python evaluate_prediction.py -rouge -decode_dir pred/predict_dual_view_inc_seed_250.20190901-160022 -data datasets/processed_reviews_Sports_and_Outdoors_5
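
The snippet below is a small illustrative sketch of the two classification metrics using scikit-learn (an assumption made for illustration; evaluate_prediction.py contains the repository's own evaluation code).

from sklearn.metrics import f1_score, balanced_accuracy_score

y_true = [5, 4, 1, 3, 5]   # gold ratings (toy example)
y_pred = [5, 4, 2, 3, 5]   # ratings predicted by the source-view classifier

macro_f1 = f1_score(y_true, y_pred, average="macro")
bal_acc = balanced_accuracy_score(y_true, y_pred)   # mean of per-class recalls
print(f"macro F1: {macro_f1:.4f}, balanced accuracy: {bal_acc:.4f}")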
