
Code for "Multi-Task Training with Hyperpartisan and Semantic Relation for Multi-Lingual News Article Similarity" for SemEval-2022 Task 8


rdev12/Multilingual-News-Article-Similarity


Multilingual News Article Similarity


Introduction

This repository contains the code for Team Innovators' submission to SemEval-2022 Task 8, described in the paper "Multi-Task Training with Hyperpartisan and Semantic Relation for Multi-Lingual News Article Similarity". The shared task focuses on measuring the similarity of multilingual news articles irrespective of writing style, political spin, tone, or any other more subjective "design decision" imposed by a medium or outlet. We propose a pipeline that first applies TextRank to filter out irrelevant information and then uses a multi-task approach in which several subtasks share the same encoder during training, thereby facilitating knowledge transfer.
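TextRank ranks sentences by running PageRank over a graph whose edges are sentence-to-sentence similarities, then keeps the highest-ranked sentences. The sketch below is a minimal, dependency-free illustration of that idea, assuming naive punctuation-based sentence splitting and the word-overlap similarity from the original TextRank formulation. The function names and the `top_k` cutoff are hypothetical; the repository's own filtering step may use a different tokenizer, similarity measure, or number of retained sentences.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_similarity(a, b):
    # TextRank-style similarity: shared words normalised by the log sentence lengths.
    wa, wb = re.findall(r"\w+", a.lower()), re.findall(r"\w+", b.lower())
    denom = math.log(len(wa)) + math.log(len(wb)) if wa and wb else 0.0
    if denom <= 0:
        return 0.0
    return len(set(wa) & set(wb)) / denom


def pagerank(weights, damping=0.85, iterations=50):
    # Weighted PageRank by power iteration over a dense adjacency matrix.
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sums[j] * scores[j]
                            for j in range(n) if out_sums[j] > 0)
            for i in range(n)
        ]
    return scores


def textrank_filter(text, top_k=3):
    # Keep the top_k highest-ranked sentences, preserving their original order.
    sentences = split_sentences(text)
    if len(sentences) <= top_k:
        return " ".join(sentences)
    n = len(sentences)
    weights = [
        [0.0 if i == j else overlap_similarity(sentences[i], sentences[j]) for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:top_k]))
```

Sentences that share no vocabulary with the rest of the article (for example, an unrelated aside) receive only the baseline PageRank mass and are dropped first, which is the kind of "irrelevant information" this step is meant to remove before encoding.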

Data

The model is trained on the multiple subtasks outlined below, and the results are evaluated on the SemEval dataset found here.

SemEval Dataset

The SemEval dataset is a CSV file in which each row corresponds to a pair of articles. For each article, its url_lang, link, and id are given. Each pair is also annotated with similarity scores for Geography, Entities, Time, Narrative, Style, Tone, and Overall; the final evaluation uses only the Overall similarity. The content of the news articles is extracted using the script given here.
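Because only the Overall score is evaluated, scoring a set of predictions reduces to a Pearson correlation against that column. The sketch below reads pair rows with Python's `csv` module and computes Pearson's r by hand. The column names `pair_id`, `url1_lang`, `url2_lang`, and `Overall` are assumptions about the CSV header, and the `predictions` mapping (pair id to predicted score) is hypothetical.

```python
import csv
import io
import math


def load_overall_scores(csv_file):
    # Map each pair id to its gold Overall similarity (column names are assumed).
    reader = csv.DictReader(csv_file)
    return {row["pair_id"]: float(row["Overall"]) for row in reader}


def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def evaluate(gold, predictions):
    # Align predictions with gold scores by pair id before correlating.
    ids = sorted(gold)
    return pearson([predictions[i] for i in ids], [gold[i] for i in ids])


# Toy rows for illustration only.
sample = io.StringIO(
    "pair_id,url1_lang,url2_lang,Overall\n"
    "p1,en,en,1.0\n"
    "p2,en,de,2.5\n"
    "p3,es,es,4.0\n"
)
gold = load_overall_scores(sample)
print(round(evaluate(gold, {"p1": 1.2, "p2": 2.4, "p3": 3.9}), 3))  # prints 0.998
```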

Subtask Dataset

| Subtask | Description | Dataset |
| --- | --- | --- |
| Semantic Textual Similarity | Determine how semantically similar two pieces of text are. | STS benchmark |
| Hyperpartisan detection | Given a news article, decide whether it follows a hyperpartisan argumentation, i.e., whether it exhibits blind, prejudiced, or unreasoning allegiance to one party, faction, cause, or person. | Hyperpartisan News Detection |
| Stance detection | Estimate the relative perspective (or stance) of two pieces of text with respect to a topic, claim, or issue. | Fake News Challenge (FNC-1) |
| Fake news inference detection | Detect fake news using natural language inference, i.e., classify a piece of text into categories such as "pants-on-fire", "false", "barely true", "half-true", "mostly true", and "true". | Fake News Inference Dataset |
| Paraphrase detection | Determine whether a sentence is a paraphrase of the original text. | Microsoft Research Paraphrase Corpus |
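Since every subtask shares one encoder, each needs its own output head sized to its label space. The sketch below records one plausible head configuration derived from the subtask list: STS scores are regressed as a single value, hyperpartisan and paraphrase detection are binary, FNC-1 uses its four stance labels, and the fake-news task uses the six truthfulness labels listed above. Treating the main SemEval Overall score as its own regression head, the `TaskHead` structure, the label spellings, and the loss names are all assumptions for illustration, not the repository's code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaskHead:
    name: str
    dataset: str
    labels: Tuple[str, ...] = ()  # empty tuple means a single-output regression head

    @property
    def num_outputs(self) -> int:
        return len(self.labels) if self.labels else 1

    @property
    def loss(self) -> str:
        # Regression heads use mean squared error, classification heads cross-entropy.
        return "cross_entropy" if self.labels else "mse"


# Hypothetical head layout; only the datasets and label sets come from the table.
TASK_HEADS = (
    TaskHead("news_similarity", "SemEval 2022 Task 8 (Overall score)"),
    TaskHead("sts", "STS benchmark"),
    TaskHead("hyperpartisan", "Hyperpartisan News Detection",
             ("not_hyperpartisan", "hyperpartisan")),
    TaskHead("stance", "Fake News Challenge (FNC-1)",
             ("agree", "disagree", "discuss", "unrelated")),
    TaskHead("fake_news_nli", "Fake News Inference Dataset",
             ("pants-on-fire", "false", "barely-true", "half-true", "mostly-true", "true")),
    TaskHead("paraphrase", "Microsoft Research Paraphrase Corpus",
             ("not_equivalent", "equivalent")),
)

for head in TASK_HEADS:
    print(f"{head.name:16} outputs={head.num_outputs} loss={head.loss}")
```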

Preprocessed versions of the above datasets are available in the dataset folder, and some are loaded directly from the Hugging Face GLUE datasets, so there is no need to download the datasets separately.

Models

The models can be run locally by cloning this repository, or on Google Colab using the links below. The reported Pearson scores are computed on the validation set during training.

| Model | Pearson Score (validation) | Link |
| --- | --- | --- |
| Main Model: Multi-task Training | 0.835 | Open In Colab |
| Experiment 1: Multi-Objective Weighted Loss Training | 0.811 | Open In Colab |
| Experiment 2: Multi-task Training with Multilingual Text Rank | 0.737 | Open In Colab |
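One way to picture Experiment 1's multi-objective weighted loss is as a single objective $L = \sum_t w_t L_t$ over the per-task losses, paired with a schedule that decides which task's batch the shared encoder sees at each step. The sketch below shows both pieces with the standard library. The weight values, the size-proportional sampling strategy, and the function names are illustrative assumptions; the notebooks may weight or interleave tasks differently.

```python
import random
from typing import Dict, List


def weighted_multitask_loss(task_losses: Dict[str, float],
                            weights: Dict[str, float]) -> float:
    # L = sum_t w_t * L_t; tasks without an explicit weight default to 1.0.
    return sum(weights.get(task, 1.0) * loss for task, loss in task_losses.items())


def proportional_task_schedule(dataset_sizes: Dict[str, int],
                               num_steps: int, seed: int = 0) -> List[str]:
    # Pick the task that supplies each step's batch with probability proportional
    # to its dataset size, so larger datasets contribute more batches.
    rng = random.Random(seed)
    tasks = list(dataset_sizes)
    return rng.choices(tasks, weights=[dataset_sizes[t] for t in tasks], k=num_steps)


# Toy numbers for illustration only (not real losses, weights, or dataset sizes).
losses = {"news_similarity": 0.40, "sts": 0.30, "hyperpartisan": 0.60}
weights = {"news_similarity": 1.0, "sts": 0.5, "hyperpartisan": 0.25}
print(round(weighted_multitask_loss(losses, weights), 2))  # prints 0.7

schedule = proportional_task_schedule(
    {"news_similarity": 3, "sts": 2, "hyperpartisan": 1}, num_steps=6)
print(schedule)
```

Setting every weight to 1.0 recovers a plain summed multi-task loss, which makes the weighted variant easy to compare against the unweighted setup.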

Setup

  1. Clone this repository and upload it to Google Drive.
  2. Open the relevant notebook in the training module folder and enable GPU access.
  3. Connect the notebook to your Google Drive. You can see the tutorial here.
  4. Install the dependencies listed in the initial cells, then run the remaining cells.
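The Drive connection and GPU check from the steps above can be scripted in a notebook's first cell. The sketch below mounts Google Drive with `google.colab.drive.mount` when running on Colab and falls back to a local clone otherwise. It assumes the repository was uploaded to the root of "My Drive" under its GitHub name (adjust the path to wherever you placed it), and `gpu_visible` is a hypothetical heuristic that only checks whether the `nvidia-smi` tool is on `PATH`.

```python
import os
import shutil

REPO_NAME = "Multilingual-News-Article-Similarity"


def get_repo_root(repo_name: str = REPO_NAME) -> str:
    try:
        from google.colab import drive  # only importable inside Google Colab
    except ImportError:
        # Running locally: assume the repository was cloned into the working directory.
        return os.path.abspath(repo_name)
    drive.mount("/content/drive")
    # Assumed upload location: the root of "My Drive".
    return os.path.join("/content/drive/MyDrive", repo_name)


def gpu_visible() -> bool:
    # Heuristic: NVIDIA GPU runtimes ship the nvidia-smi tool on PATH.
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print("Repository root:", get_repo_root())
    print("GPU visible:", gpu_visible())
```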

Contributors

  1. Nidhir Bhavsar* (Navrachana University, Gujarat, India): nidbhavsar989@gmail.com
  2. Rishikesh Devanathan* (Indian Institute of Technology Patna, India): rishi.devanathan@gmail.com
  3. Aakash Bhatnagar* (Navrachana University, Gujarat, India): akashbharat.bhatnagar@gmail.com
  4. Tirthankar Ghosal (UFAL, MFF, Charles University, Czech Republic): tghosal@acm.org
  5. Muskaan Singh (IDIAP Research Institute, Switzerland)

* denotes equal contribution
