If you use our work, please cite our paper "Prediction for the Newsroom: Which Articles Will Get the Most Comments?" as follows:
@inproceedings{ambroselli2018prediction,
author = {Ambroselli, Carl and Risch, Julian and Krestel, Ralf and Loos, Andreas},
booktitle = {Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
pages = {193--199},
title = {Prediction for the Newsroom: Which Articles Will Get the Most Comments?},
year = {2018}
}
- Clone the repository:
git clone https://github.com/CarlAmbroselli/Popularity_Predictor
- Change into the repository directory:
cd Popularity_Predictor
- (Optional) Check out the paper/napoles branch:
git checkout paper/napoles
- Install requirements (optionally inside a virtualenv):
pip install -r requirements.txt
- Add a comments.csv file to both the data/datasets/YNACC-Evaluation/train/ and data/datasets/YNACC-Evaluation/test/ folders, with the following format:
sdid,commentindex,headline,url,guid,commentid,timestamp,thumbs-up,thumbs-down,text,parentid,constructiveclass,sd_agreement,sd_type,sentiment,tone,commentagreement,topic,intendedaudience,persuasiveness,y_persuasive,y_audience,y_agreement_with_commenter,y_informative,y_mean,y_controversial,y_disagreement_with_commenter,y_off_topic_with_article,y_sentiment,y_sentiment_neutral,y_sentiment_positive,y_sentiment_negative,y_sentiment_mixed
- Add a data/datasets/YNACC/users.csv file with the following format:
index,comment_count,threads_participated_in,threads_initiated,tu_received,td_received,commenting_rate
- Save the word2vec model to:
model/word2vec/GoogleNews-vectors-negative300.bin
- Download the English spaCy model (en):
python -m spacy download en
- Run from the repository root:
python .
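Before running, it can help to confirm that the data files and the word2vec model are where the setup steps expect them. The following is a minimal sketch of such a pre-flight check, not part of the repository: the names check_setup, EXPECTED_HEADERS and WORD2VEC_PATH are hypothetical, and requiring the CSV header to match the listed columns exactly (same names, same order) is an assumption about how strictly the code reads these files.

```python
import csv
from pathlib import Path

# Column lists copied from the setup steps above.
COMMENTS_COLUMNS = (
    "sdid,commentindex,headline,url,guid,commentid,timestamp,thumbs-up,"
    "thumbs-down,text,parentid,constructiveclass,sd_agreement,sd_type,"
    "sentiment,tone,commentagreement,topic,intendedaudience,persuasiveness,"
    "y_persuasive,y_audience,y_agreement_with_commenter,y_informative,"
    "y_mean,y_controversial,y_disagreement_with_commenter,"
    "y_off_topic_with_article,y_sentiment,y_sentiment_neutral,"
    "y_sentiment_positive,y_sentiment_negative,y_sentiment_mixed"
).split(",")
USERS_COLUMNS = (
    "index,comment_count,threads_participated_in,threads_initiated,"
    "tu_received,td_received,commenting_rate"
).split(",")

# Hypothetical mapping of required CSV files to their expected headers.
EXPECTED_HEADERS = {
    "data/datasets/YNACC-Evaluation/train/comments.csv": COMMENTS_COLUMNS,
    "data/datasets/YNACC-Evaluation/test/comments.csv": COMMENTS_COLUMNS,
    "data/datasets/YNACC/users.csv": USERS_COLUMNS,
}
WORD2VEC_PATH = "model/word2vec/GoogleNews-vectors-negative300.bin"


def check_setup(root="."):
    """Return a list of problems found under root; an empty list means OK."""
    root = Path(root)
    problems = []
    for rel_path, expected in EXPECTED_HEADERS.items():
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing file: {rel_path}")
            continue
        # Read only the first row and compare it to the expected header.
        with path.open(newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        if header != expected:
            problems.append(f"unexpected header in {rel_path}")
    # Only the presence of the model file is checked, not its contents.
    if not (root / WORD2VEC_PATH).is_file():
        problems.append(f"missing file: {WORD2VEC_PATH}")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print(problem)
```

Run from the repository root, the script prints one line per missing or mismatched file and prints nothing when everything is in place.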