This repository contains the dataset, format checker, scorer, and baselines for Task 1 of the CLEF2020-CheckThat! Lab. The task consists of ranking a stream of tweets according to their check-worthiness.
FCPD corpus for the CLEF-2020 LAB on "Automatic Identification and Verification of Claims"
Version 5.0: Jun 8th, 2020 (Data, Baseline, Test data Release)
This file contains the basic information regarding the CLEF2020-CheckThat! Task 1 on check-worthiness of tweets, provided for the CLEF2020-CheckThat! Lab on "Automatic Identification and Verification of Claims". The current version (5.0, Jun 8th, 2020) corresponds to the release of the official test results and gold labels; the test set is released with this version as well.
You can find the results in this spreadsheet, https://tinyurl.com/y9sjooxo.
- v1.0 [2020/03/22]: Training/dev data. The training data for Task 1 contains 488 annotated tweets and the dev data contains 150 annotated tweets.
- v2.0 [2020/05/17]: Training/dev data with updated labels. The training dataset has been extended to 672 annotated tweets, whereas the dev dataset still contains 150 tweets.
- v3.0 [2020/05/26]: Input test data released.
- v4.0 [2020/06/03]: Fixed the formatting of the tweet JSON objects in the test data.
- v5.0 [2020/06/08]: Official test results and gold labels released.
We provide the following files:
- Main folder: data
  - Subfolder: v1
    - training.tsv: contains the training tweets with check-worthiness labels from the first (outdated) version of the data
    - dev.tsv: contains the dev tweets with check-worthiness labels from the first (outdated) version of the data
  - Subfolder: v2
    - training_v2.tsv: contains the training tweets with check-worthiness labels from the second (and latest) version of the data
    - dev_v2.tsv: contains the dev tweets with check-worthiness labels from the second (and latest) version of the data
    - training_v2.json: contains the Twitter objects for the tweets included in the v2 training dataset
    - dev_v2.json: contains the Twitter objects for the tweets included in the dev dataset
    - Note: not all tweets included in the training and dev sets have a corresponding Twitter object.
- README.md: this file
- Main folder: test-input
  - test-input.zip: file containing the tweets used for testing, together with the gold labels
The datasets are TAB separated text files. The text encoding is UTF-8. A row of the file has the following format:
topic_id tweet_id tweet_url tweet_text claim check_worthiness
Where:
- topic_id: unique ID for the topic the tweet is about
- tweet_id: Tweet ID of the given tweet, as assigned by Twitter
- tweet_url: URL to the given tweet
- tweet_text: content of the tweet
- claim: 1 if the tweet is a claim; 0 otherwise
- check_worthiness: 1 if the tweet is worth fact checking; 0 otherwise
Example:
covid-19 1235648554338791427 https://twitter.com/A6Asap/status/1235648554338791427 COVID-19 health advice ⚠️ https://t.co/XsSAo52Smu 0 0
covid-19 1235287380292235264 https://twitter.com/ItsCeliaAu/status/1235287380292235264 There's not a single confirmed case of an Asian infected in NYC. Stop discriminating cause the virus definitely doesn't. #racist #coronavirus https://t.co/Wt1NPOuQdy 1 0
covid-19 1236020820947931136 https://twitter.com/ddale8/status/1236020820947931136 Epidemiologist Marc Lipsitch, director of Harvard's Center for Communicable Disease Dynamics: “In the US it is the opposite of contained.' https://t.co/IPAPagz4Vs 1 1
...
Note that the gold labels for the task are the ones in the check_worthiness column.
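For reference, here is a minimal sketch of loading such a TSV file in Python. It assumes the file has a header row with the field names listed above and uses the v2 training file path as an example; the official scripts in this repository do their own parsing.

```python
import csv

# Hypothetical path; adjust to the actual location of the v2 training file.
TRAIN_PATH = "data/v2/training_v2.tsv"

tweets = []
with open(TRAIN_PATH, encoding="utf-8") as f:
    reader = csv.DictReader(f, delimiter="\t")  # assumes a header row
    for row in reader:
        tweets.append({
            "topic_id": row["topic_id"],
            "tweet_id": row["tweet_id"],
            "text": row["tweet_text"],
            "claim": int(row["claim"]),
            "check_worthiness": int(row["check_worthiness"]),
        })

print(f"Loaded {len(tweets)} tweets; "
      f"{sum(t['check_worthiness'] for t in tweets)} are labeled check-worthy.")
```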
For this task, the expected results file is a list of tweets with the estimated score for check-worthiness. Each row contains four tab-separated fields:
topic_id tweet_id score run_id
Where:
- topic_id: unique ID for the topic the tweet is about, as given in the test dataset file
- tweet_id: Tweet ID of the given tweet, as assigned by Twitter and given in the test dataset file
- score: score assigned by the participant's model, reflecting how worthy of fact-checking the tweet is
- run_id: string identifier used by participants
Example:
covid-19 1235648554338791427 0.39 Model_1
covid-19 1235287380292235264 0.61 Model_1
covid-19 1236020820947931136 0.76 Model_1
...
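As an illustration, here is a minimal sketch of writing a run file in this format; the scores, run ID, and output file name below are hypothetical:

```python
# Minimal sketch of writing a run file in the expected four-column format.
predictions = [
    ("covid-19", "1235648554338791427", 0.39),
    ("covid-19", "1235287380292235264", 0.61),
    ("covid-19", "1236020820947931136", 0.76),
]

run_id = "Model_1"
with open("predictions.tsv", "w", encoding="utf-8") as f:
    for topic_id, tweet_id, score in predictions:
        f.write(f"{topic_id}\t{tweet_id}\t{score}\t{run_id}\n")
```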
The annotation of the tweets was done according to the following guidelines.
We define a factual claim as a claim that can be verified using factual, verifiable information such as statistics, specific examples, or personal testimony. For each tweet, we label whether it is a claim or not based on that definition. Some positive examples include stating a definition or mentioning a quantity in the present or the past; some negative examples include personal opinions and preferences.
If a tweet contains a factual claim, we determine whether it is worth fact-checking by answering the following three questions:
- To what extent does the tweet appear to contain false information?
  False information is news, stories, or hoaxes created to deliberately misinform or deceive readers. To answer this question, it is often important to open the link in the tweet: if the tweet links to an article from a reputable information source (e.g., Reuters, Associated Press, France Press, Aljazeera English, BBC), then the answer could be "Contains no false info".
- Will the tweet's claim have an effect on or be of interest to the general public?
  In general, topics such as healthcare, political news and findings, and current events are of higher interest to the general public. If a tweet is of higher interest to the public, then fact-checking it is more important.
- To what extent does the tweet look weaponized, i.e., does it have the potential to harm society or specific person(s), company(s), or product(s)?
  This can be measured as the extent to which the tweet aims to, and has the capacity to, negatively affect society as a whole or specific person(s), company(s), or product(s), or to spread rumours about them. If a tweet is harmful, then it is worth checking its validity.
The checker for the subtask is located in the format_checker module of the project. The format checker verifies that your generated results file complies with the expected format. To launch it run:
python3 format_checker/main.py --pred_file_path=<path_to_your_results_file>
Note that the checker cannot verify whether the predictions file you submit contains all lines/tweets, because it does not have access to the corresponding gold file.
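For example, assuming your predictions are stored in a file named predictions.tsv (a hypothetical name), the checker would be invoked as:
python3 format_checker/main.py --pred_file_path=predictions.tsv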
The script used is adapted from the one for the CLEF2019 Check That! Lab Task 5 (check-worthiness for political debates).
Launch the scorer for the task as follows:
python3 scorer/main.py --gold_file_path="<path_gold_file_1, path_to_gold_file_k>" --pred_file_path="<predictions_file_1, predictions_file_k>"
Both --gold_file_path and --pred_file_path take a single string that contains a comma-separated list of file paths. The lists may be of arbitrary positive length (so even a single file path is OK), but their lengths must match.
<path_to_gold_file_n> is the path to the file containing the gold annotations for topic/batch n, and <predictions_file_n> is the path to the corresponding file with the participants' predictions for topic/batch n, which must follow the format described in the 'Results File Format' section.
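For example, with two hypothetical gold/prediction file pairs (the file names below are placeholders), the scorer would be invoked as:
python3 scorer/main.py --gold_file_path="gold_batch_1.tsv,gold_batch_2.tsv" --pred_file_path="pred_batch_1.tsv,pred_batch_2.tsv"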
The scorer invokes the format checker for the task to verify that the output is properly formatted. It also checks whether the provided predictions file contains all lines/tweets from the gold one.
The script used is adapted from the one for the CLEF2019 Check That! Lab Task 5 (check-worthiness for political debates).
The official evaluation measure is Mean Average Precision (MAP), but we also report precision at rank k (P@5, P@10, ..., P@30).
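For intuition only, here is a minimal sketch of how average precision and P@k can be computed for a single topic from gold labels and model scores, using the usual definitions; this is an illustration, not the official scorer:

```python
def average_precision(gold_labels, scores):
    """Average precision over a full ranking: gold_labels[i] is 0/1, scores[i] is the model score."""
    ranked = sorted(zip(scores, gold_labels), key=lambda x: x[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)  # hits == number of positives when all tweets are ranked

def precision_at_k(gold_labels, scores, k):
    """Fraction of check-worthy tweets among the top-k ranked by score (assumes at least k tweets)."""
    ranked = sorted(zip(scores, gold_labels), key=lambda x: x[0], reverse=True)[:k]
    return sum(label for _, label in ranked) / k

# MAP is the mean of average_precision over all topics/batches.
```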
The baselines module contains a random baseline and a simple n-gram baseline for the task. To launch the baseline script, you need to install the package dependencies listed in requirements.txt using the following:
pip3 install -r requirements.txt
To launch the baseline script run the following:
python3 baselines/baselines.py
Both baselines are trained on the training tweets from training.tsv, and their performance is evaluated on the dev tweets from dev.tsv.
The performance of both baselines will be displayed:
Random Baseline AVGP: 0.34661954358047853
Ngram Baseline AVGP: 0.6926897425211712
The scripts used are adapted from the ones for the CLEF2019 Check That! Lab Task 5 (check-worthiness for political debates).
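For orientation, here is a minimal sketch of a simple n-gram-style baseline in the same spirit. The official baselines/baselines.py may use different features and a different classifier; this version assumes scikit-learn is installed and the v1 file locations shown above.

```python
import csv

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.pipeline import make_pipeline

def read_tsv(path):
    # Assumes a header row with the field names from the 'Data Format' description.
    with open(path, encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    texts = [r["tweet_text"] for r in rows]
    labels = [int(r["check_worthiness"]) for r in rows]
    return texts, labels

# Hypothetical paths; adjust to where the files live in your checkout.
train_texts, train_labels = read_tsv("data/v1/training.tsv")
dev_texts, dev_labels = read_tsv("data/v1/dev.tsv")

# Word n-grams (unigrams + bigrams) fed into a logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Rank dev tweets by the predicted probability of being check-worthy.
dev_scores = model.predict_proba(dev_texts)[:, 1]
print("Dev AVGP:", average_precision_score(dev_labels, dev_scores))
```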
These datasets are free for general research use.
Task 1 Organizers:
- Alex Nikolov, Sofia University
- Firoj Alam, Qatar Computing Research Institute, HBKU
- Shaden Shaar, Qatar Computing Research Institute, HBKU
- Giovanni Da San Martino, Qatar Computing Research Institute, HBKU
- Preslav Nakov, Qatar Computing Research Institute, HBKU
Task website: https://sites.google.com/view/clef2020-checkthat/tasks/tasks-1-5-check-worthiness?authuser=0
Contact: clef-factcheck@googlegroups.com
You can find the overview papers on the CLEF2020-CheckThat! Lab, "Overview of CheckThat! 2020 --- Automatic Identification and Verification of Claims in Social Media" and "CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media" (see citations below).
You can find the CLEF2020-CheckThat! Task 1 details in the paper "Overview of the CLEF-2020 CheckThat! Lab on Automatic Identification and Verification of Claims in Social Media: English tasks" (see citation below).
@InProceedings{clef-checkthat:2020,
author = "Barr\'{o}n-Cede{\~n}o, Alberto and
Elsayed, Tamer and
Nakov, Preslav and
{Da San Martino}, Giovanni and
Hasanain, Maram and
Suwaileh, Reem and
Haouari, Fatima and
Babulkov, Nikolay and
Hamdan, Bayan and
Nikolov, Alex and
Shaar, Shaden and
Ali, {Zien Sheikh}",
title = "{Overview of CheckThat! 2020} --- Automatic Identification and
Verification of Claims in Social Media",
year = {2020},
booktitle = "Proceedings of the 11th International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction",
series = {CLEF~'2020},
address = {Thessaloniki, Greece},
nopages="--",
}
@InProceedings{clef-checkthat-en:2020,
author = "Shaar, Shaden and
Nikolov, Alex and
Babulkov, Nikolay and
Alam, Firoj and
Barr\'{o}n-Cede{\~n}o, Alberto and
Elsayed, Tamer and
Hasanain, Maram and
Suwaileh, Reem and
Haouari, Fatima and
{Da San Martino}, Giovanni and
Nakov, Preslav",
title = "Overview of {CheckThat!} 2020 {E}nglish: Automatic Identification and Verification of Claims in Social Media",
booktitle = "Working Notes of CLEF 2020---Conference and Labs of the Evaluation Forum",
series = {CLEF~'2020},
address = {Thessaloniki, Greece},
year = {2020}
}
@InProceedings{CheckThat:ECIR2020,
author = {Alberto Barr{\'{o}}n{-}Cede{\~{n}}o and
Tamer Elsayed and
Preslav Nakov and
Giovanni Da San Martino and
Maram Hasanain and
Reem Suwaileh and
Fatima Haouari},
title = {CheckThat! at {CLEF} 2020: Enabling the Automatic Identification and Verification of Claims in Social Media},
booktitle = {Proceedings of the 42nd European Conference on Information Retrieval},
series = {ECIR~'2020},
pages = {499--507},
address = {Lisbon, Portugal},
month = {April},
year = {2020},
}