
Mitigating Cultural Differences for Identifying Multi-lingual Checkworthy Claims


This repository contains the source code of our submission to the CLEF2021-CheckThat! Task 1. We propose language identification as an auxiliary task to mitigate cultural bias across languages.
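As a rough illustration of the idea, the sketch below pairs a check-worthiness classifier with an auxiliary language-identification head on top of a shared encoder. It is a minimal PyTorch sketch with placeholder components and a hypothetical loss weight alpha, not the exact architecture, encoder, or training setup used in the paper.

```python
# Minimal multi-task sketch (illustration only, not the paper's exact model):
# a shared encoder feeds a check-worthiness head and an auxiliary language-ID head.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, vocab_size=30000, hidden=256, num_languages=5):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, hidden)         # placeholder shared encoder
        self.checkworthy_head = nn.Linear(hidden, 2)               # main task: check-worthy or not
        self.language_head = nn.Linear(hidden, num_languages)      # auxiliary task: language ID

    def forward(self, token_ids, offsets):
        shared = self.encoder(token_ids, offsets)
        return self.checkworthy_head(shared), self.language_head(shared)

model = MultiTaskModel()
loss_fn = nn.CrossEntropyLoss()
alpha = 0.5  # hypothetical weight for the auxiliary loss

# One illustrative training step on dummy data (three "tweets" of four tokens each).
token_ids = torch.randint(0, 30000, (12,))
offsets = torch.tensor([0, 4, 8])
cw_labels = torch.tensor([1, 0, 1])
lang_labels = torch.tensor([0, 2, 4])
cw_logits, lang_logits = model(token_ids, offsets)
loss = loss_fn(cw_logits, cw_labels) + alpha * loss_fn(lang_logits, lang_labels)
loss.backward()
```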

If you use our work, please cite it with the following BibTeX entry:

@inproceedings{SchlichtEtAl:CLEF-2021,
title = {UPV at CheckThat! 2021: Mitigating Cultural Differences for Identifying Multilingual Check-worthy Claims},
author = {Ipek Baris Schlicht and Angel Felipe Magnossão de Paula and Paolo Rosso},
pages = {465--475},
url = {http://ceur-ws.org/Vol-2936/#paper-36},
crossref = {CLEF-2021},
}

Task 1: Check-Worthiness Estimation

This repository contains the dataset, format checker, scorer, and baselines for the CLEF2021-CheckThat! Task 1. The task consists of ranking a stream of tweets according to their check-worthiness.

FCPD corpus for the CLEF-2021 LAB on "Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News"

This file contains the basic information regarding the CLEF2021-CheckThat! Task 1 on check-worthiness of tweets, provided for the CLEF2021-CheckThat! Lab on "Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News". The current version listed below corresponds to the release of the training and dev data sets.

Evaluation Results

TBA

List of Versions

  • subtask-1a--bulgarian-v1.0 [2021/02/23] - Training/Dev data for subtask-1a--Bulgarian released. Containing 2594 tweets for train and 372 tweets for dev.
  • subtask-1b--english-v1.0 [2021/02/07] - Training/Dev data for subtask-1b--English released. Containing 50 debates/speeches for train and 10 debates/speeches for dev.
  • subtask-1a--english-v1.0 [2021/02/07] - Training/Dev data for subtask-1a--English released. Containing 822 tweets for train and 140 tweets for dev.
  • subtask-1a--spanish-v1.0 [2021/02/04] - Training/Dev data for subtask-1a--Spanish released. Containing 2496 tweets for train and 1248 tweets for dev.
  • subtask-1a--turkish-v1.0 [2021/03/04] - Training/Dev data for subtask-1a--Turkish released. Containing 1899 tweets for train and 388 tweets for dev.

Contents of the Task 1 Directory

We provide the following files:

  • Main folder: data

    • Subfolder: subtask-1A--english

      • data.zip
        Contains the released train and dev data: the tweets used, the tweet JSON objects, and the assigned labels.
    • Subfolder: subtask-1A--bulgarian

      • data.zip
        Contains the released train and dev data: the tweets used, the tweet JSON objects, and the assigned labels.
    • Subfolder: subtask-1A--spanish

      • data.zip
        Contains the released train and dev data: the tweets used, the tweet JSON objects, and the assigned labels.
    • Subfolder: subtask-1A--turkish

      • data.zip
        Contains the released train and dev data: the tweets used and the assigned labels.

      NOTE: The data for the Spanish language is released in a separate directory. Kindly find the link here.

    • Subfolder: subtask-1B--english

      • data.zip
        Contains the released train and dev data: the debates/speeches used and the assigned labels.
  • Main folder: baselines
    Contains the scripts provided for the baseline models of the tasks.

  • Main folder: format_checker
    Contains the scripts provided to check the format of the submission files.

  • Main folder: scorer
    Contains the scripts provided to score the output of a model when gold labels are available (i.e., on the dev set).

  • README.md
    This file!

Input Data Format

Subtask 1A: Check-Worthiness of Tweets

All languages (Arabic, Bulgarian, English, and Spanish) in subtask-1A have the same data format, which includes train and dev files.

For both the train and dev datasets we provide two files each: a JSON file containing the tweet objects retrieved from the Twitter API, and a TSV file containing the tweets and their annotations.

The datasets are TAB separated text files. The text encoding is UTF-8. A row of the file has the following format:

topic_id tweet_id tweet_url tweet_text claim check_worthiness

Where:

  • topic_id: unique ID for the topic the tweet is about
  • tweet_id: the ID of the tweet, as assigned by Twitter
  • tweet_url: URL to the given tweet
  • tweet_text: content of the tweet
  • claim: 1 if the tweet is a claim; 0 otherwise
  • check_worthiness: 1 if the tweet is worth fact checking; 0 otherwise

Example:

covid-19 1235648554338791427 https://twitter.com/A6Asap/status/1235648554338791427 COVID-19 health advice⚠️ https://t.co/XsSAo52Smu 0 0
covid-19 1235287380292235264 https://twitter.com/ItsCeliaAu/status/1235287380292235264 There's not a single confirmed case of an Asian infected in NYC. Stop discriminating cause the virus definitely doesn't. #racist #coronavirus https://t.co/Wt1NPOuQdy 1 0
covid-19 1236020820947931136 https://twitter.com/ddale8/status/1236020820947931136 Epidemiologist Marc Lipsitch, director of Harvard's Center for Communicable Disease Dynamics: “In the US it is the opposite of contained.' https://t.co/IPAPagz4Vs 1 1
...

Note that the gold labels for the task are the ones in the check_worthiness column.
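The snippet below is a minimal sketch for loading such a TSV file with pandas; the file name is hypothetical, and whether a header row is present may differ per release.

```python
# Minimal sketch for loading a Subtask 1A TSV file (hypothetical file name).
import pandas as pd

columns = ["topic_id", "tweet_id", "tweet_url", "tweet_text", "claim", "check_worthiness"]
df = pd.read_csv("subtask_1a_train_english.tsv", sep="\t", encoding="utf-8",
                 dtype={"tweet_id": str})
# If the file ships without a header row, pass names=columns and header=None instead.
print(df["check_worthiness"].value_counts())
```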

Subtask 1B: Check-Worthiness of Debates/Speeches

This task is only given in English. The input files are TAB-separated text files with four fields:

line_number speaker text label

Where:

  • line_number: the line number (starting from 1)
  • speaker: the person speaking (a candidate, the moderator, or "SYSTEM"; the latter is used for the audience reaction)
  • text: a sentence that the speaker said
  • label: 1 if this sentence is to be fact-checked, and 0 otherwise

The text encoding is UTF-8.

Example:

...
65 TRUMP So we're losing our good jobs, so many of them. 0
66 TRUMP When you look at what's happening in Mexico, a friend of mine who builds plants said it's the eighth wonder of the world. 0
67 TRUMP They're building some of the biggest plants anywhere in the world, some of the most sophisticated, some of the best plants. 0
68 TRUMP With the United States, as he said, not so much. 0
69 TRUMP So Ford is leaving. 1
70 TRUMP You see that, their small car division leaving. 1
71 TRUMP Thousands of jobs leaving Michigan, leaving Ohio. 1
72 TRUMP They're all leaving. 0
...
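A minimal loading sketch for these files, assuming pandas and a hypothetical file name; quoting is disabled so that raw quotation marks in the transcripts are preserved.

```python
# Minimal sketch for loading a Subtask 1B debate/speech file (hypothetical file name).
import csv
import pandas as pd

columns = ["line_number", "speaker", "text", "label"]
df = pd.read_csv("debate_train_01.tsv", sep="\t", names=columns, header=None,
                 quoting=csv.QUOTE_NONE, encoding="utf-8")
# If the file has a header row, drop names= and header= and keep pandas defaults.
print(df.loc[df["label"] == 1, "text"].head())
```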

Output Data Format

Subtask 1A: Check-Worthiness of Tweets

All languages (Arabic, Bulgarian, English, Spanish and Turkish) in subtask-1A have the same data format, which includes submission files.

For this task, the expected results file is a list of tweets with the estimated score for check-worthiness. Each row contains four TAB separated fields:

topic_id tweet_id score run_id

Where:

  • topic_id: unique ID for the topic the tweet is about, as given in the test dataset file
  • tweet_id: the ID of the tweet, as given in the test dataset file
  • score: the score given by the participant's model, indicating whether the tweet is worth fact-checking
  • run_id: a string identifier used by participants.

Example:

covid-19 1235648554338791427 0.39 Model_1
covid-19 1235287380292235264 0.61 Model_1
covid-19 1236020820947931136 0.76 Model_1
...
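A minimal sketch for writing a Subtask 1A submission file in this format (the scores and the run identifier are illustrative):

```python
# Minimal sketch for writing a Subtask 1A submission file (illustrative values).
predictions = [
    ("covid-19", "1235648554338791427", 0.39),
    ("covid-19", "1235287380292235264", 0.61),
    ("covid-19", "1236020820947931136", 0.76),
]
with open("subtask_1a_predictions.tsv", "w", encoding="utf-8") as out:
    for topic_id, tweet_id, score in predictions:
        out.write(f"{topic_id}\t{tweet_id}\t{score:.4f}\tModel_1\n")
```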

Subtask 1B: Check-Worthiness of Debates/Speeches

For this subtask, the expected results file is a list of claims with the estimated score for check-worthiness. Each row contains two tab-separated fields:

line_number score

Where line_number is the number of the claim in the debate and score is a number, indicating the priority of the claim for fact-checking. For example:

1 0.9056
2 0.6862
3 0.7665
4 0.9046
5 0.2598
6 0.6357
7 0.9049
8 0.8721
9 0.5729
10 0.1693
11 0.4115
...

Your result file MUST contain scores for all lines of the input file. Otherwise the scorer will return an error and no score will be computed.
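A minimal sketch for writing a Subtask 1B submission file; a score must be written for every line of the input debate (the scores below are illustrative):

```python
# Minimal sketch for writing a Subtask 1B submission file (illustrative scores).
scores = {1: 0.9056, 2: 0.6862, 3: 0.7665, 4: 0.9046, 5: 0.2598}
with open("subtask_1b_predictions.tsv", "w", encoding="utf-8") as out:
    for line_number in sorted(scores):   # every input line must receive a score
        out.write(f"{line_number}\t{scores[line_number]:.4f}\n")
```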

Format Checkers

Subtask 1A: Check-Worthiness of Tweets

The checker for the subtask is located in the format_checker module of the project. To launch it, you need to install the package dependencies listed in requirements.txt:

pip3 install -r requirements.txt

The format checker verifies that your generated results file complies with the expected format. To launch it, run:

python3 format_checker/main.py --subtask=1a --pred-files-path=<path_to_result_file_1 path_to_result_file_2 ... path_to_result_file_n>

or

python3 format_checker/subtask_1a.py --pred-files-path=<path_to_result_file_1 path_to_result_file_2 ... path_to_result_file_n>

--pred-files-path takes a single string that contains a space-separated list of file paths. The list may be of arbitrary positive length, so even a single file path is OK.

<path_to_result_file_n> is the path to the corresponding file with the participant's predictions, which must follow the format described in the Output Data Format section.

Note that the checker cannot verify whether the prediction files you submit contain all lines/claims, because it does not have access to the corresponding gold file.

Subtask 1B: Check-Worthiness of Debates/Speeches

The checker for the subtask is located in the format_checker module of the project. To launch it, you need to install the package dependencies listed in requirements.txt:

pip3 install -r requirements.txt

The format checker verifies that your generated results file complies with the expected format. To launch it, run:

python3 format_checker/main.py --subtask=1b --pred-files-path=<path_to_result_file_1 path_to_result_file_2 ... path_to_result_file_n>

or

python3 format_checker/subtask_1b.py --pred-files-path=<path_to_result_file_1 path_to_result_file_2 ... path_to_result_file_n>

--pred-files-path takes a single string that contains a space-separated list of file paths. The list may be of arbitrary positive length, so even a single file path is OK.

<path_to_result_file_n> is the path to the corresponding file with the participant's predictions for debate n, which must follow the format described in the Output Data Format section.

Note that the checker cannot verify whether the prediction files you submit contain all lines/claims, because it does not have access to the corresponding gold file.

Scorers

Subtask 1A: Check-Worthiness of Tweets

The scorer for the subtask is located in the scorer module of the project. To launch it, you need to install the package dependencies listed in requirements.txt:

pip3 install -r requirements.txt

Launch the scorer for the subtask as follows:

python3 scorer/subtask_1a.py --gold-file-path=<path_gold_file> --pred-file-path=<predictions_file>

The scorer invokes the format checker for the subtask to verify that the output is properly formatted. It also checks whether the provided predictions file contains all lines/tweets from the gold one.

Subtask 1B: Check-Worthiness of Debates/Speeches

The scorer for the subtask is located in the scorer module of the project. To launch it, you need to install the package dependencies listed in requirements.txt:

pip3 install -r requirements.txt

Launch the scorer for the subtask as follows:

python3 scorer/subtask_1b.py --gold-files-path=<path_gold_file_1 path_gold_file_2 ... path_gold_file_n> --pred-files-path=<prediction_file_1 prediction_file_2 ... prediction_file_n>

Both --gold-files-path and --pred-files-path take a single string that contains a space-separated list of file paths. The lists may be of arbitrary positive length (so even a single file path is OK), but their lengths must match.

<path_gold_file_n> is the path to the file containing the gold annotations for debate n and <predictions_file_n> is the path to the corresponding file with participants' predictions for debate n, which must follow the format, described in the Output Data Format section.

The scorer invokes the format checker for the task to verify that the output is properly formatted. It also checks whether the provided prediction files contain all lines/claims from the gold ones.

Evaluation Metrics

Both subtasks (Subtask 1A: Check-Worthiness of Tweets and Subtask 1B: Check-Worthiness of Debates/Speeches) use the same evaluation metrics for all languages. The subtasks are evaluated as ranking tasks. We use Mean Average Precision (MAP) as the official evaluation measure and also report reciprocal rank and P@k for k ∈ {1, 3, 5, 10, 20, 30}.
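For reference, the snippet below sketches average precision for a single ranked list, which is the per-topic quantity that MAP averages; the official scorer in the scorer module is authoritative and may differ in details such as tie handling.

```python
# Sketch of average precision (AP) for one ranked list, as averaged in MAP.
def average_precision(gold_labels, scores):
    # Rank items by descending model score, then collect precision at each relevant hit.
    ranked = [g for _, g in sorted(zip(scores, gold_labels), key=lambda p: p[0], reverse=True)]
    hits, precisions = 0, []
    for rank, label in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

# Example: two check-worthy items ranked 1st and 3rd -> AP = (1/1 + 2/3) / 2 ≈ 0.83
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```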

Baselines

Subtask 1A: Check-Worthiness of Tweets

The baselines module contains a random baseline and a simple n-gram baseline for the task. To launch the baseline script, you need to install the package dependencies listed in requirements.txt:

pip3 install -r requirements.txt

To launch the baseline script run the following:

python3 baselines/subtask_1a.py --train-file-path=<path_to_your_training_data> --test-file-path=<path_to_your_test_data_to_be_evaluated> --lang=<language_of_the_subtask_1a>

Both baselines are trained on the training tweets and evaluated on the dev tweets. The MAP scores of both baselines are as follows:

Model English Arabic Spanish Bulgarian
Random Baseline 0.4795 0.0806 0.2045
Ngram Baseline 0.5916 0.4122 0.4729
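For intuition, the following is a hedged sketch of a simple n-gram baseline (TF-IDF features plus logistic regression); the official implementation in baselines/subtask_1a.py may use different features or a different classifier.

```python
# Hedged sketch of an n-gram baseline: TF-IDF word n-grams + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["COVID-19 health advice", "In the US it is the opposite of contained."]
train_labels = [0, 1]  # check_worthiness column (illustrative values)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # unigrams and bigrams
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Rank dev tweets by the predicted probability of being check-worthy.
dev_scores = model.predict_proba(["Stop discriminating cause the virus doesn't."])[:, 1]
print(dev_scores)
```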

Subtask 1B: Check-Worthiness of Debates/Speeches

The baselines module contains a random baseline and a simple n-gram baseline for the task. To launch the baseline script, you need to install the package dependencies listed in requirements.txt:

pip3 install -r requirements.txt

To launch the baseline script run the following:

python3 baselines/subtask_1b.py --train-files-path=<path_to_train_file_1 path_to_train_file_2 ... path_to_train_file_n> --test-files-path=<path_to_test_file_1 path_to_test_file_2 ... path_to_test_file_n>

Both baselines are trained on the training debates/speeches and evaluated on the dev debates/speeches. The MAP scores of both baselines are as follows:

Model English
Random Baseline 0.0352
Ngram Baseline 0.0707

Credits

Task 1 Organizers: TBA

Task website: https://sites.google.com/view/clef2021-checkthat/tasks/task-1-check-worthiness-estimation

Contact: clef-factcheck@googlegroups.com
