
CLEF2020-CheckThat! Task 2: Verified Claim Retrieval

This repository contains the data set, format checker, scorer and baselines for the CLEF2020-CheckThat! task 2.
Given an input claim and a set of already verified claims, the task consists of ranking the verified claims so that those that verify the input claim, or a sub-claim in it, are ranked on top.
The goal of the task is to build a tool that supports journalists and fact-checkers in determining whether a claim has already been fact-checked. This task is part of the CLEF2020-CheckThat! lab. For more information about deadlines, updates and other related tasks, visit the site of the lab.

FCPD corpus for the CLEF-2020 LAB on "Automatic Identification and Verification of Claims"
Jun 8th, 2020 (Test Data Released)

This file contains the basic information regarding the CLEF2020-CheckThat! Task 2 data set provided for the CLEF2020-CheckThat! Lab on "Automatic Identification and Verification of Claims".

The current version of the data is the release of the test data.

All changes and updates on these data sets and tools are reported in Section 1 of this document.


Evaluation Results

You can find the results in this spreadsheet, https://tinyurl.com/y9sjooxo.

List of Versions

  • v1.0 [2020/03/20] - Version 1 of the training data: 626 Tweets and 518 already verified claims.
  • v2.0 [2020/03/29] - Version 2 of the training data: 1,003 Tweets and 784 already verified claims.
  • v3.0 [2020/05/11] - Version 3 of the training data: 1,003 Tweets and 10,373 already verified claims. Fixed some labels, in addition to extending the dataset.
  • Test [2020/05/26] - Release of test input: 200 Tweets to be matched against the 10,373 already verified claims released with version v.3.0 of the data.
  • v4.0 [2020/06/08] - Release of the gold labels for the test tweets.

Contents of the Repository

We provide the following files:

Data Format

The format used in the task is inspired by the Text REtrieval Conference (TREC) campaigns for information retrieval (a description of the TREC format can be found here).

The data sets are tab-separated (TSV) files. The text encoding for all files is UTF-8.

The data is separated into train and dev splits. They may be used as-is, or they can be combined and used with cross-validation. It is entirely up to the participants how the given train and dev data are managed.

Already Verified Claims

All the verified claims used for both training and test are found in the file data/verified_claims.docs.tsv.

The file has the following format:

vclaim_id vclaim title

where

  • vclaim_id: unique ID of the verified claim
  • vclaim: text of the verified claim
  • title: title of the document fact checking the verified claim

Example:

vclaim_id vclaim title
2 "A ""law to separate families"" was enacted prior to April 2018, and the federal government is powerless not to enforce it." Was the ‘Law to Separate Families’ Passed in 1997 or ‘by Democrats’?
222 Former U.S. Vice President Joe Biden owns the largest mansion in his state. Does Joe Biden Own the Largest Mansion in His State?
503 "U.S. Sen. Bernie Sanders compared Baltimore to a ""third world country.""" Did U.S. Sen. Bernie Sanders Say Baltimore Was Like a ‘Third World Country’?
...

Note: Not all verified claims in the file have a corresponding tweet.
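
For illustration, here is a minimal Python sketch for loading this file, assuming a header row as in the example above and the file path used by the baseline command further below; adjust both if your local copy differs.

import csv

# Minimal sketch: load the verified claims into a dict keyed by vclaim_id.
# The path and the (vclaim_id, vclaim, title) column order follow this README.
def load_vclaims(path="data/verified_claims.docs.tsv"):
    vclaims = {}
    with open(path, encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the header row, if present
        for vclaim_id, vclaim, title in reader:
            vclaims[vclaim_id] = {"vclaim": vclaim, "title": title}
    return vclaims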

Queries file

A tab-separated file with the input tweets. Each row of the file has the following format:

tweet_id tweet_content

where:

  • tweet_id: unique ID for a given tweet
  • tweet_content: text of the tweet

Example:

tweet_id tweet_content
8 im screaming. google featured a hoax article that claims Minecraft is being shut down in 2020 pic.twitter.com/ECRqyfc8mI — Makena Kelly (@kellymakena) January 2, 2020
335 BREAKING: Footage in Honduras giving cash 2 women & children 2 join the caravan & storm the US border @ election time. Soros? US-backed NGOs? Time to investigate the source! pic.twitter.com/5pEByiGkkN — Rep. Matt Gaetz (@RepMattGaetz) October 17, 2018
622 y’all really joked around so much that tide put their tide pods in plastic boxes…smh pic.twitter.com/Z44efALcX5 — ㅤnavid (@NavidHasan_) January 13, 2018
...

Note: tweet_id does not correspond to the tweet's ID on the Twitter platform.
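
A similar sketch can read the queries file into a dict mapping tweet_id to tweet text (again assuming a header row as in the example above; the path is the dev queries file used in the baseline command below).

import csv

# Minimal sketch: load the tweets file into {tweet_id: tweet_content}.
def load_tweets(path="data/dev/tweets.queries.tsv"):
    tweets = {}
    with open(path, encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the header row, if present
        for row in reader:
            tweets[row[0]] = row[1]
    return tweets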

Qrels file

A tab-separated file containing all pairs of a tweet and a verified claim such that the verified claim (vclaim_id) proves the tweet (tweet_id). Each row has the following format:

tweet_id 0 vclaim_id relevance

where:

  • tweet_id: unique ID for a given tweet. Tweet details found in the queries file.
  • 0: literally 0 (this column is needed to comply with the TREC format).
  • vclaim_id: unique ID for a given verified claim. Details on the verified claim are in the file data/verified_claims.docs.tsv
  • relevance: 1 if the verified claim whose id is vclaim_id proves the tweet with id tweet_id; 0 otherwise.

Note: In the qrels file only pairs with relevance = 1 are reported. Relevance = 0 is assumed for all pairs not appearing in the qrels file.

Example:

tweet_id 0 vclaim_id relevance
422 0 92 1
538 0 454 1
221 0 12 1
137 0 504 1
...
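
For illustration, a minimal sketch that reads a qrels file in this format into a mapping from each tweet_id to its set of relevant vclaim_ids; keeping only relevance = 1 rows also makes it harmless if a header row is present (the path is the dev qrels file used with the scorer below).

import csv
from collections import defaultdict

# Minimal sketch: read a qrels file into {tweet_id: set of relevant vclaim_ids}.
# Only relevance = 1 pairs appear in the file; anything missing is non-relevant.
def load_qrels(path="data/dev/tweet-vclaim-pairs.qrels"):
    relevant = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            tweet_id, _zero, vclaim_id, relevance = row
            if relevance == "1":
                relevant[tweet_id].add(vclaim_id)
    return relevant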

Results File

Each row of the results file corresponds to a pair of a tweet and a verified claim and indicates how highly your model ranks that verified claim for the input tweet. Each row has the following format:

tweet_id Q0 vclaim_id rank score tag

where

  • tweet_id: the ID of the tweet, as given in the queries file
  • Q0: not a meaningful column (it is needed to comply with the TREC format)
  • vclaim_id: the ID of the verified claim, as found in the verified claims file (data/verified_claims.docs.tsv)
  • rank: the rank of the pair based on the scores of all possible pairs for a given tweet_id (not taken into account when calculating metrics; always equal to 1)
  • score: the score given by your model for the pair tweet_id and vclaim_id
  • tag: a string identifier of the team

For example:

tweet_id Q0 vclaim_id rank score tag
359 Q0 303 1 1.1086285 elastic
476 Q0 292 1 4.680018 elastic
35 Q0 373 1 5.631936 elastic
474 Q0 352 1 0.8830346 elastic
174 Q0 408 1 0.98045605 elastic
...

Your results file MUST contain each (tweet_id, vclaim_id) pair at most once. You can skip pairs if you deem them not relevant.
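
For illustration, a minimal sketch that writes such a run file from a hypothetical per-tweet scores structure produced by your model; the tag value and the shape of scores are placeholders, not something defined by the task.

# Minimal sketch: write predictions in the six-column run format described above.
# scores maps each tweet_id to a list of (vclaim_id, score) pairs from your model.
def write_run(scores, path, tag="my-team"):
    with open(path, "w", encoding="utf-8") as f:
        for tweet_id, pairs in scores.items():
            ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
            for rank, (vclaim_id, score) in enumerate(ranked, start=1):
                # The rank column is ignored by the scorer; only the score matters.
                f.write(f"{tweet_id}\tQ0\t{vclaim_id}\t{rank}\t{score}\t{tag}\n")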

Example Ranking

The following is an example ranking of verified claims for a given tweet.

Let's take a random tweet from the data set:

tweet_id: 251
tweet_content: A big scandal at @ABC News. They got caught using really gruesome FAKE footage of the Turks bombing in Syria. A real disgrace. Tomorrow they will ask softball questions to Sleepy Joe Biden’s son, Hunter, like why did Ukraine & China pay you millions when you knew nothing? Payoff? — Donald J. Trump (@realDonaldTrump) October 15, 2019

Using the content of the tweet, your model should give the highest score to the verified claim that matches the tweet in the qrels file. In this case the verified claim is:

vclaim_id: 115
vclaim: ABC News mistakenly aired a video from a Kentucky gun range during its coverage of Turkey's attack on northern Syria in October 2019.

Example of the top 5 ranked verified claims from the baseline model in this repository:

vclaim score
ABC News mistakenly aired a video from a Kentucky gun range during its coverage of Turkey's attack on northern Syria in October 2019. 21.218384
In a speech to U.S. military personnel, President Trump said if soldiers were real patriots, they wouldn't take a pay raise. 19.962847
Former President Barack Obama tweeted: "Ask Ukraine if they found my birth certificate." 19.414398
Mark Twain said, "Do not fear the enemy, for your enemy can only take your life. It is far better that you fear the media, for they will steal your HONOR." 16.810490
Dolly Parton wrote "Jolene" and "I Will Always Love You" in one day. 16.005116

Format checkers

The format checker verifies that the generated results file from your model complies with the expected format. To launch it run:

python3 lib/format_checker.py --model-prediction <path_to_your_results_file>

Note: The checker can't verify whether the prediction file you submit contains all lines/claims, because it does not have access to the corresponding gold file.

Note: The Python files in this repo require Python 3.6 or later.

Evaluation metrics and Scorers

The official metric for the task is Mean Average Precision (MAP), more specifically MAP@5. The scorer also reports R-Precision, Average Precision, Reciprocal Rank, Precision@k, and the means of these over all verified claims.

You can use these repositories as a reference for the evaluation: https://github.com/joaopalotti/trectools and https://github.com/usnistgov/trec_eval.
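
To make the official metric concrete, the following is a small illustrative sketch of MAP@5 computed from a qrels mapping and per-tweet rankings. It is not the official scorer (use evaluate.py below), and its normalization of average precision may differ from the scorer's in edge cases.

# Illustrative sketch of MAP@5.
# qrels: {tweet_id: set of relevant vclaim_ids}
# runs:  {tweet_id: list of vclaim_ids sorted by descending model score}
def average_precision_at_k(relevant, ranked, k=5):
    hits, precision_sum = 0, 0.0
    for i, vclaim_id in enumerate(ranked[:k], start=1):
        if vclaim_id in relevant:
            hits += 1
            precision_sum += hits / i
    # Normalize by the number of relevant claims, capped at k.
    return precision_sum / min(len(relevant), k) if relevant else 0.0

def map_at_k(qrels, runs, k=5):
    return sum(average_precision_at_k(rel, runs.get(tid, []), k)
               for tid, rel in qrels.items()) / len(qrels)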

Before using the scorers or running the baseline, make sure you have all python packages in requirements.txt installed.

If you have pipenv installed, one way to do it is by using the following command:

pipenv install -r requirements.txt --skip-lock
pipenv shell

The script evaluate.py evaluates a submission. Example:

python3 evaluate.py -s <results-file> -g data/dev/tweet-vclaim-pairs.qrels

The results file contains the predictions of the model.

Note: The metric reciprocal_rank in the output of the evaluation script corresponds to Mean reciprocal rank.

Baseline

To use the Elasticsearch baseline, you need to have a locally running Elasticsearch instance. You can follow this article for Elasticsearch installation. You can then run Elasticsearch using the following command:

/path/to/elasticsearch

Alternatively, if you have Docker installed, you can run Elasticsearch using this command:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.6.1

Once you have Elasticsearch running, you can run the baseline script using the following:

python3 elastic_search_baseline.py --vclaims data/verified_claims.docs.tsv --tweets data/dev/tweets.queries.tsv --predict-file <results-file>
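
The core retrieval idea behind such a baseline can be sketched with the Elasticsearch Python client as below. The index name, field names, and query here are illustrative assumptions and need not match what elastic_search_baseline.py actually does.

from elasticsearch import Elasticsearch

# Minimal sketch: index the verified claims, then rank them for each tweet
# with a BM25 "match" query against the vclaim text.
es = Elasticsearch()  # assumes a local instance on the default port 9200

def index_vclaims(vclaims):
    # vclaims: {vclaim_id: {"vclaim": ..., "title": ...}}, as loaded earlier
    for vclaim_id, doc in vclaims.items():
        es.index(index="vclaims", id=vclaim_id, body=doc)

def rank_vclaims(tweet_text, size=5):
    response = es.search(
        index="vclaims",
        body={"query": {"match": {"vclaim": tweet_text}}},
        size=size,
    )
    return [(hit["_id"], hit["_score"]) for hit in response["hits"]["hits"]]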

Licensing

These data sets are free for general research use.

Credits

Task Organizers:

  • Nikolay Babulkov, Sofia University

  • Shaden Shaar, Qatar Computing Research Institute, HBKU

  • Giovanni Da San Martino, Qatar Computing Research Institute, HBKU

  • Preslav Nakov, Qatar Computing Research Institute, HBKU

Task website: https://sites.google.com/view/clef2020-checkthat/tasks/task-2-claim-retrieval

Contact: clef-factcheck@googlegroups.com

Citation

You can find the overview papers on the CLEF2020-CheckThat! Lab, "Overview of CheckThat! 2020 --- Automatic Identification and Verification of Claims in Social Media" (see citation below) at this link, and "CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media" (see citation below) at this link.

You can find the CLEF2020-CheckThat! Task 2 details published in the paper "Overview of the CLEF-2020 CheckThat! Lab on Automatic Identification and Verification of Claims in Social Media: English tasks" (see citation below).

Further work on the task using the dataset released in the CLEF2020-CheckThat! Task 2 was published in "That is a Known Lie: Detecting Previously Fact-Checked Claims". You can find the paper in this link, https://arxiv.org/pdf/2005.06058.pdf.

@InProceedings{clef-checkthat:2020,
 author = "Barr\'{o}n-Cede{\~n}o, Alberto and
    Elsayed, Tamer and
    Nakov, Preslav and
    {Da San Martino}, Giovanni and
    Hasanain, Maram and   
    Suwaileh, Reem and
    Haouari, Fatima and
    Babulkov, Nikolay and
    Hamdan, Bayan and
    Nikolov, Alex and   
    Shaar, Shaden and
    Ali, {Zien Sheikh}",
 title  = "{Overview of CheckThat! 2020} --- Automatic Identification and
Verification of Claims in Social Media",
 year = {2020},
 booktitle = "Proceedings of the 11th International Conference of the CLEF Association: Experimental IR Meets Multilinguality, Multimodality, and Interaction",
 series = {CLEF~'2020},
 address = {Thessaloniki, Greece},
 nopages="--",
}

@InProceedings{clef-checkthat-en:2020,
 author = "Shaar, Shaden and
    Nikolov, Alex and
    Babulkov, Nikolay and
    Alam, Firoj and  
    Barr\'{o}n-Cede{\~n}o, Alberto and
    Elsayed, Tamer and
    Hasanain, Maram and    
    Suwaileh, Reem and
    Haouari, Fatima and
    {Da San Martino}, Giovanni and
    Nakov, Preslav",
 title = "Overview of {CheckThat!} 2020 {E}nglish: Automatic Identification and Verification of Claims in Social Media",
  booktitle = "Working Notes of CLEF 2020---Conference and Labs of the Evaluation Forum",
  series = {CLEF~'2020},
  address = {Thessaloniki, Greece},
  year = {2020}
}

@InProceedings{CheckThat:ECIR2020,
  author    = {Alberto Barr{\'{o}}n{-}Cede{\~{n}}o and
               Tamer Elsayed and
               Preslav Nakov and
               Giovanni Da San Martino and
               Maram Hasanain and
               Reem Suwaileh and
               Fatima Haouari},
  title     = {CheckThat! at {CLEF} 2020: Enabling the Automatic Identification and Verification of Claims in Social Media},
    booktitle = {Proceedings of the 42nd European Conference on Information Retrieval},
    series = {ECIR~'20},
    pages = {499--507},
    address   = {Lisbon, Portugal},
    month     = {April},
    year      = {2020},
}

@inproceedings{shaar-etal-2020-known,
    title = "That is a Known Lie: Detecting Previously Fact-Checked Claims",
    author = "Shaar, Shaden  and
      Babulkov, Nikolay  and
      Da San Martino, Giovanni  and
      Nakov, Preslav",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    series = {ACL~'20},
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.332",
    pages = "3607--3618",
}    
