ICWSM2020

Citation

If you use our work, please cite our paper "Top Comment or Flop Comment? Predicting and Explaining User Engagement in Online News Discussions" as follows:

@inproceedings{risch2020top,
  title = {Top Comment or Flop Comment? Predicting and Explaining User Engagement in Online News Discussions},
  author = {Risch, Julian and Krestel, Ralf},
  booktitle = {Proceedings of the International Conference on Web and Social Media (ICWSM)},
  pages = {579--589},
  year = {2020}
}

Acknowledgments

We would like to thank Johannes Filter, Cornelius Hagmeister and Thomas Kellermeier for their contribution to this project during the seminar Text Mining in Practice.

Implementation

The Python notebook train_and_evaluate_model.ipynb contains the source code for our experiments. In order to run the experiments, the following additional files are required:

guardian-300.bin: fastText word embeddings that we trained on 4.4 billion tokens from TheGuardian.com comments. They can be downloaded from here (5.5 GB): Link
amazon-300.bin: fastText word embeddings that we trained on 7.6 billion tokens from Amazon.com reviews. They can be downloaded from here (4.8 GB): Link
comments_top_and_bottom_upvotes_10percent.csv: the top/flop 10% of comments in the politics section, i.e. those with the largest/smallest relative number of upvotes received
comments_top_and_bottom_replies_10percent.csv: the top/flop 10% of comments in the politics section, i.e. those with the largest/smallest relative number of replies received
reviews_books_top_10percent.csv: the top 10% of book reviews on Amazon.com, i.e. those with the largest relative number of helpfulness upvotes received. The Amazon reviews dataset can be downloaded from here: Link
reviews_books_bottom_10percent.csv: the flop 10% of book reviews on Amazon.com, i.e. those with the smallest relative number of helpfulness upvotes received

Dataset

This repository contains a Python script, create_dataset.py, and four files, comment_ids_*, that list comment IDs. The script takes these comment IDs as input and downloads the corresponding comments via the Guardian's Web API.

An API key is required to access the API. You can register for a key by filling out this short form: https://bonobo.capi.gutools.co.uk/register/developer

If your API key has a daily limit on the number of calls, the script stops when that limit is reached. When restarted, it continues from the point where it stopped.
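The stop-and-resume behaviour described above can be sketched as follows. This is a minimal illustration, not the code in create_dataset.py: the function `download_comments`, the exception `RateLimitReached`, the injected `fetch_comment` callable, and the JSON-lines output format are all hypothetical names and choices made for this example. The actual API request is deliberately left to the caller.

```python
import json
from pathlib import Path


class RateLimitReached(Exception):
    """Hypothetical signal raised by fetch_comment when the daily quota is used up."""


def download_comments(comment_ids, fetch_comment, out_path):
    """Fetch each comment once and append it to out_path as one JSON object per line.

    IDs already stored in out_path are skipped, so a restarted run
    continues where the previous run stopped.
    Returns True if all IDs were processed, False if the quota ran out.
    """
    out_path = Path(out_path)
    done = set()
    if out_path.exists():
        with out_path.open(encoding="utf-8") as f:
            done = {json.loads(line)["id"] for line in f if line.strip()}

    with out_path.open("a", encoding="utf-8") as f:
        for comment_id in comment_ids:
            if comment_id in done:
                continue  # already downloaded in an earlier run
            try:
                comment = fetch_comment(comment_id)
            except RateLimitReached:
                return False  # stop here; rerun later to resume
            f.write(json.dumps({"id": comment_id, "comment": comment}) + "\n")
            f.flush()  # keep progress on disk even if the process is killed
            done.add(comment_id)
    return True
```

Keeping the progress in the output file itself means no separate checkpoint file is needed: whatever has been written is, by definition, what can be skipped on the next run.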

The general dataset comprises four files (please see the paper for details):

comment_ids_replies_top.csv: 3111 IDs, the top 10% of comments in the politics section, i.e. those with the largest relative number of replies received
comment_ids_replies_flop.csv: 3111 IDs, the flop 10% of comments in the politics section, i.e. those with the smallest relative number of replies received
comment_ids_upvotes_top.csv: 11081 IDs, the top 10% of comments in the politics section, i.e. those with the largest relative number of upvotes received
comment_ids_upvotes_flop.csv: 11081 IDs, the flop 10% of comments in the politics section, i.e. those with the smallest relative number of upvotes received
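To illustrate how a top file and a flop file could be combined into one binary-labeled list, here is a small sketch. It assumes each file holds one comment ID per line without a header row, which should be checked against the actual files. The helper names `read_ids` and `labeled_ids` are hypothetical and do not appear in the repository.

```python
from pathlib import Path


def read_ids(path):
    """Return the non-empty lines of a file as a list of ID strings."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def labeled_ids(top_path, flop_path):
    """Pair every ID with a binary label: 1 for top comments, 0 for flop comments."""
    top = [(comment_id, 1) for comment_id in read_ids(top_path)]
    flop = [(comment_id, 0) for comment_id in read_ids(flop_path)]
    return top + flop
```

For example, `labeled_ids("comment_ids_replies_top.csv", "comment_ids_replies_flop.csv")` would give the ID/label pairs for the replies task under the stated assumption about the file layout.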

We annotated a subset of the dataset. The subset includes only true positives (top comments correctly classified as such). The comment IDs and labels are split into four files:

true_positives_comment_ids_replies_top.csv: 335 IDs
true_positives_comment_ids_upvotes_top.csv: 1128 IDs
true_positives_labels_replies_top.csv: 335 labels
true_positives_labels_upvotes_top.csv: 1128 labels
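Since the IDs and labels live in separate files with matching counts, one plausible way to use them is to pair them line by line. The sketch below assumes that line i of a labels file belongs to line i of the corresponding IDs file and that neither file has a header row. Both points are assumptions to verify against the files. The function names `read_lines` and `pair_ids_with_labels` are hypothetical.

```python
from collections import Counter
from pathlib import Path


def read_lines(path):
    """Return all lines of a file, stripped, keeping blank lines to preserve alignment."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines()]


def pair_ids_with_labels(ids_path, labels_path):
    """Pair the i-th comment ID with the i-th label; fail loudly if the counts differ."""
    ids = read_lines(ids_path)
    labels = read_lines(labels_path)
    if len(ids) != len(labels):
        raise ValueError(f"{len(ids)} IDs but {len(labels)} labels")
    return list(zip(ids, labels))


def label_counts(pairs):
    """Count how often each label occurs in a list of (id, label) pairs."""
    return Counter(label for _, label in pairs)
```

The length check is a cheap guard: if the two files ever fall out of step, pairing silently would attach labels to the wrong comments.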

Labels (please see the paper for details):

Question asking for an Explanation
Question asking for an Opinion
Question asking for a Fact
Information in form of a Correction
Information in form of a Personal Story
Information in form of a Fact
Consent referring to an Article
Dissent referring to an Article
Consent referring to a Comment
Dissent referring to a Comment
Suggestion
Speculation about the Future
Speculation about Reasons
Joke/Humor

Video Presentation

A short video about the paper is available on YouTube.
