levysouza/News-Table-Matching

Task Overview

Readers of digital news are often overwhelmed by the deluge of online information. One way to address this problem is to outline a news story by highlighting its most relevant facts. For example, recent studies summarize news articles by generating representative headlines. In this repository, we focus on news augmentation and argue that news understanding can also be enhanced by surfacing contextual data relevant to the article, such as structured web tables. Specifically, our goal is to match news articles with web tables for news augmentation. To that end, we introduce a novel BERT-based attention model that computes the degree of matching between an article and a table. Through an extensive experimental evaluation over Wikipedia tables, we compare the performance of our model with standard IR techniques, document/sentence encoders, and neural IR models on this task. We also present the first news-table corpus in the literature. By crawling Wikipedia pages, we collected 275,352 news articles and 298,792 web tables. In addition, our ground truth contains 93,818 matching pairs created by distant supervision strategies.
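
To make the matching task concrete, the sketch below ranks candidate tables for one article using a plain TF-IDF cosine score, in the spirit of the standard IR techniques mentioned above. It is not the BERT-based attention model from this repository. The table layout (a dict with `caption`, `headers`, and `rows`) and all function names are assumptions made for illustration, not the format of the released dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def flatten_table(table):
    """Join caption, headers, and cell values into one string.

    The dict layout used here is hypothetical, chosen for the example.
    """
    cells = [cell for row in table["rows"] for cell in row]
    return " ".join([table["caption"], *table["headers"], *cells])


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) with smoothed IDF."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_tables(article, tables):
    """Return (table_index, score) pairs sorted by descending score."""
    docs = [article] + [flatten_table(t) for t in tables]
    vectors = tfidf_vectors(docs)
    query, candidates = vectors[0], vectors[1:]
    scores = [(i, cosine(query, vec)) for i, vec in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

A lexical baseline of this kind only rewards exact term overlap between the article and the table text, which is the limitation that encoder-based and neural matching models aim to overcome.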

Dataset

Link to dataset: https://drive.google.com/drive/folders/1UBjHqpTvem9bxHDKnWvMAoBuNQ_N7Ad7?usp=sharing
