
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. We hope it can serve as a useful research benchmark for high-precision conditional text generation.


google-research-datasets/ToTTo


ToTTo Dataset

ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description.

During the dataset creation process, tables from English Wikipedia are matched with (noisy) descriptions. Each table cell mentioned in the description is highlighted and the descriptions are iteratively cleaned and corrected to faithfully reflect the content of the highlighted cells.

We hope this dataset can serve as a useful research benchmark for high-precision conditional text generation.

You can find more details, analyses, and baseline results in our paper. You can cite it as follows:

@inproceedings{parikh2020totto,
  title={{ToTTo}: A Controlled Table-To-Text Generation Dataset},
  author={Parikh, Ankur P and Wang, Xuezhi and Gehrmann, Sebastian and Faruqui, Manaal and Dhingra, Bhuwan and Yang, Diyi and Das, Dipanjan},
  booktitle={Proceedings of EMNLP},
  year={2020}
 }

Getting Started

Download the ToTTo data

The ToTTo dataset is released under the Creative Commons Share-Alike 3.0 license.

To download the data from the command line:

 wget https://storage.googleapis.com/totto-public/totto_data.zip
 unzip totto_data.zip

(Alternatively, copy the URL above into your browser's address bar.)

Inside the totto_data directory you should see three files: totto_train_data.jsonl, totto_dev_data.jsonl, and unlabeled_totto_test_data.jsonl for the training, development, and unlabeled test sets respectively.
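
As a minimal sketch of reading these files, assuming the archive was unzipped into the current directory and that every non-empty line holds one JSON object: the helper names parse_jsonl_lines and read_jsonl are illustrative and not part of any official ToTTo tooling.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator


def parse_jsonl_lines(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per non-empty line; blank lines are skipped."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_jsonl(path: Path) -> Iterator[Dict]:
    """Stream examples from a .jsonl file without loading it all at once."""
    with path.open(encoding="utf-8") as f:
        yield from parse_jsonl_lines(f)


if __name__ == "__main__":
    # Assumed location: totto_data/ next to this script.
    dev_path = Path("totto_data") / "totto_dev_data.jsonl"
    if dev_path.exists():
        first_example = next(read_jsonl(dev_path))
        print(sorted(first_example.keys()))
```

Streaming line by line keeps memory use low, which can matter for the training file.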

Download the evaluation scripts

Evaluation scripts and some exploratory processing scripts are available in our supplementary repository, which also includes a separate README file with instructions on how to run the evaluation.

Dataset Description

The ToTTo dataset consists of three .jsonl files, where each line is a JSON dictionary with the following format:

{
  "table_page_title": "'Weird Al' Yankovic",
  "table_webpage_url": "https://en.wikipedia.org/wiki/%22Weird_Al%22_Yankovic",
  "table_section_title": "Television",
  "table_section_text": "",
  "table": "[Described below]",
  "highlighted_cells": [[22, 2], [22, 3], [22, 0], [22, 1], [23, 3], [23, 1], [23, 0]],
  "example_id": 12345678912345678912,
  "sentence_annotations": [{"original_sentence": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Mr. Peanutbutter's brother, Captain Peanutbutter, and was hired to voice the lead role in the 2016 Disney XD series Milo Murphy's Law.",
                  "sentence_after_deletion": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Captain Peanutbutter, and was hired to the lead role in the 2016 series Milo Murphy's Law.",
                  "sentence_after_ambiguity": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Captain Peanutbutter, and was hired for the lead role in the 2016 series Milo Murphy's Law.",
                  "final_sentence": "In 2016, Al appeared in 2 episodes of BoJack Horseman as Captain Peanutbutter and was hired for the lead role in the 2016 series Milo Murphy's Law."}]
}

The table field is a List[List[Dict]]. The outer list represents the rows, and each inner list holds the cells (columns) of one row. Each Dict has the fields column_span: int, is_header: bool, row_span: int, and value: str. The first two rows for the example above look as follows:

[
  [
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Year"},
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Title"},
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Role"},
    {    "column_span": 1,
         "is_header": true,
         "row_span": 1,
         "value": "Notes"}
  ],
  [
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "1997"},
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "Eek! The Cat"},
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "Himself"},
    {    "column_span": 1,
         "is_header": false,
         "row_span": 1,
         "value": "Episode: 'The FugEektive'"}
  ], ...
]

- The table metadata consists of the table_page_title, table_section_title, and table_section_text strings, which give the model more context about the table.

- The highlighted_cells field is a List[[row_index, column_index]] where each [row_index, column_index] pair indicates that table[row_index][column_index] is highlighted.

- The example_id is simply a unique id for this example.

- The sentence_annotations field consists of the original sentence and the sequence of revisions applied to it in order to produce the final_sentence. See our paper for more details.
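
The indexing convention for highlighted_cells can be sketched as follows. The helper name get_highlighted_values is illustrative, and the toy example is a cut-down version of the rows shown above with a made-up highlighted_cells list.

```python
from typing import Dict, List


def get_highlighted_values(example: Dict) -> List[str]:
    """Return the value of each highlighted cell, in the order listed."""
    table = example["table"]
    return [table[row][col]["value"] for row, col in example["highlighted_cells"]]


def cell(value: str, is_header: bool) -> Dict:
    """Build a cell dict with the four documented fields (spans fixed to 1)."""
    return {"column_span": 1, "is_header": is_header, "row_span": 1, "value": value}


# Toy example: header row plus the first data row; highlights are hypothetical.
toy_example = {
    "table": [
        [cell("Year", True), cell("Title", True), cell("Role", True), cell("Notes", True)],
        [cell("1997", False), cell("Eek! The Cat", False), cell("Himself", False),
         cell("Episode: 'The FugEektive'", False)],
    ],
    "highlighted_cells": [[1, 0], [1, 1]],
}

print(get_highlighted_values(toy_example))
```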

To help understand the dataset, you can find a sample of the train and dev sets in the sample/ folder of our supplementary repository. It additionally provides the create_table_to_text_html.py script that visualizes examples, the output of which you can also find in the sample/ folder.

Official Task

The official task described in our paper is to generate the final_sentence, given the table, the highlighted cells, and the table metadata (table_page_title, table_section_title, and table_section_text) as input.
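
One way to turn these inputs into a single source string for a sequence-to-sequence model is sketched below. The linearization format (the "page_title:", "section_title:", "section_text:" and "cell:" prefixes joined by " | ") and the function name build_source_text are assumptions made for illustration; this is not the preprocessing used in the paper or in the official scripts.

```python
from typing import Dict, List


def build_source_text(example: Dict) -> str:
    """Flatten metadata and highlighted cell values into one string.

    The prefix and separator scheme here is hypothetical.
    """
    table = example["table"]
    parts: List[str] = [
        "page_title: " + example["table_page_title"],
        "section_title: " + example["table_section_title"],
    ]
    # table_section_text can be an empty string, so only add it when present.
    if example["table_section_text"]:
        parts.append("section_text: " + example["table_section_text"])
    for row, col in example["highlighted_cells"]:
        parts.append("cell: " + table[row][col]["value"])
    return " | ".join(parts)


# Toy example with only the value field filled in; highlights are made up.
toy_input = {
    "table_page_title": "'Weird Al' Yankovic",
    "table_section_title": "Television",
    "table_section_text": "",
    "table": [
        [{"value": "Year"}, {"value": "Title"}],
        [{"value": "1997"}, {"value": "Eek! The Cat"}],
    ],
    "highlighted_cells": [[1, 0], [1, 1]],
}

print(build_source_text(toy_input))
```

The training target for such a source string would be the final_sentence of the example's annotation.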

Dev and Test Set

The dev and test sets have between two and three references for each example, which are added to the list under the sentence_annotations key. The test set annotations are private and thus not included in the released data.
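
A minimal sketch of collecting the references of a dev example, assuming each entry of sentence_annotations carries a final_sentence as in the format shown earlier. The function name get_references and the toy sentences are illustrative.

```python
from typing import Dict, List


def get_references(example: Dict) -> List[str]:
    """Return the final_sentence of every annotation attached to an example."""
    return [ann["final_sentence"] for ann in example["sentence_annotations"]]


# Toy dev-style example with two made-up references.
toy_dev_example = {
    "sentence_annotations": [
        {"final_sentence": "Reference one."},
        {"final_sentence": "Reference two."},
    ]
}

print(get_references(toy_dev_example))
```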

If you want us to evaluate your model on the development or the private test set, please submit your files here. You can find more submission information below. By emailing us or by submitting prediction files, you consent to being contacted by Google about your submission, this dataset or any related competitions.

We provide two splits within the dev and test sets: one uses previously seen combinations of table headers (the overlap subset) and one uses unseen combinations (the non-overlap subset). The sets are marked using the overlap_subset: bool flag that is added to the JSON representation. By filtering the evaluation to examples with the flag set to false, i.e. those with unseen header combinations, you will be able to test the generalization ability of your model.
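
As a sketch, already-loaded dev examples could be partitioned on this flag as shown below, where examples with overlap_subset set to true are the ones whose header combinations were previously seen. The function name split_by_overlap and the toy dictionaries are illustrative, and the code assumes every example carries the flag.

```python
from typing import Dict, List, Tuple


def split_by_overlap(examples: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition examples into (overlap, non_overlap) using overlap_subset."""
    overlap = [ex for ex in examples if ex["overlap_subset"]]
    non_overlap = [ex for ex in examples if not ex["overlap_subset"]]
    return overlap, non_overlap


# Toy examples with made-up ids; real examples carry the full set of fields.
toy_examples = [
    {"example_id": 1, "overlap_subset": True},
    {"example_id": 2, "overlap_subset": False},
    {"example_id": 3, "overlap_subset": True},
]

overlap, non_overlap = split_by_overlap(toy_examples)
print(len(overlap), len(non_overlap))
```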


Leaderboard

We are maintaining a leaderboard with official results on our test set.

The leaderboard indicates whether or not a model was trained on any auxiliary Wikipedia data. This is because our tables and (unrevised) test targets are from Wikipedia and thus we would like to study the effect of using additional Wikipedia data to train models.

We ask you to not incorporate any part of the ToTTo development set into the training data, and only use it for validation/hyperparameter tuning as development sets are typically used.

In addition to BLEU and PARENT, we also report a learnt metric BLEURT. The checkpoint used was BLEURT-base-128 which can be found here. To handle multiple references, we take the average of the scores as suggested by Sellam et al. 2020.
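
The multi-reference averaging can be illustrated with the sketch below. It takes already-computed per-reference scores and does not call BLEURT itself. Aggregating to a corpus score by taking the mean of the per-example averages is an assumption made for illustration, as are the function names and the toy numbers.

```python
from statistics import mean
from typing import List


def average_over_references(scores: List[float]) -> float:
    """Average the scores of one prediction against each of its references."""
    return mean(scores)


def corpus_average(per_example_scores: List[List[float]]) -> float:
    """Mean of the per-example averages (an assumed aggregation scheme)."""
    return mean(average_over_references(s) for s in per_example_scores)


# Made-up scores: two examples with two and three references respectively.
toy_scores = [[0.2, 0.4], [0.1, 0.3, 0.5]]
print(corpus_average(toy_scores))
```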

| Model | Link | Uses Wiki | Overall BLEU | Overall PARENT | Overall BLEURT | Overlap Subset BLEU | Overlap Subset PARENT | Overlap Subset BLEURT | Non-Overlap Subset BLEU | Non-Overlap Subset PARENT | Non-Overlap Subset BLEURT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Diffusion-TT (MIPT, Sber) | in preparation | yes | 47.1 | 56.9 | 0.191 | 55.4 | 61.9 | 0.333 | 38.5 | 51.9 | 0.048 |
| LATTICE | [Wang et al. 2022] | yes | 48.4 | 58.1 | 0.222 | 56.1 | 62.4 | 0.345 | 40.4 | 53.9 | 0.099 |
| SKY | in preparation | yes | 49.9 | 59.8 | 0.212 | 57.8 | 64.0 | 0.334 | 42.0 | 55.7 | 0.091 |
| CoNT | [An et al., 2022] | yes | 49.1 | 58.9 | 0.238 | 56.7 | 63.2 | 0.355 | 41.3 | 54.6 | 0.121 |
| Supervised+NLPO | [Ramamurthy et al. 2022] | yes | 47.4 | 59.6 | 0.192 | 55.0 | 64.3 | 0.315 | 39.2 | 55.0 | 0.068 |
| Anonymous 3 | in preparation | yes | 49.3 | 58.8 | 0.235 | 57.1 | 63.4 | 0.358 | 41.5 | 54.1 | 0.112 |
| ProEdit | Paper in preparation | yes | 48.6 | 59.18 | 0.202 | 55.9 | 63.3 | 0.325 | 41.3 | 55.1 | 0.078 |
| Anonymous 2 | Paper in preparation | yes | 49.4 | 59.0 | 0.253 | 57.0 | 62.9 | 0.370 | 41.7 | 55.1 | 0.136 |
| PlanGen (University of Cambridge, Apple) | [Su et al. 2021] | yes | 49.2 | 58.7 | 0.249 | 56.9 | 62.8 | 0.371 | 41.5 | 54.6 | 0.126 |
| T5-based (Google) | [Kale, 2020] | yes | 49.5 | 58.4 | 0.230 | 57.5 | 62.6 | 0.351 | 41.4 | 54.2 | 0.1079 |
| BERT-to-BERT (Wiki+Books) | [Rothe et al., 2019] | yes | 44.0 | 52.6 | 0.121 | 52.7 | 58.4 | 0.259 | 35.1 | 46.8 | -0.017 |
| BERT-to-BERT (Books) | [Rothe et al., 2019] | no | 43.9 | 52.6 | 0.104 | 52.7 | 58.4 | 0.255 | 34.8 | 46.7 | -0.046 |
| Pointer Generator | [See et al., 2017] | no | 41.6 | 51.6 | 0.076 | 50.6 | 58.0 | 0.244 | 32.2 | 45.2 | -0.0922 |
| Content Planner | [Puduppully et al., 2019] | no | 19.2 | 29.2 | -0.576 | 24.5 | 32.5 | -0.491 | 13.9 | 25.8 | -0.662 |

Leaderboard Submission

If you want to submit dev and test outputs, please format your predictions as a single .txt file with line-separated predictions. The predictions should be in the same order as the examples in the corresponding .jsonl file (totto_dev_data.jsonl or unlabeled_totto_test_data.jsonl). You can upload your prediction files here and email us at totto@google.com to tell us you have submitted. By emailing us or by submitting prediction files, you consent to being contacted by Google about your submission, this dataset or any related competitions.
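
A minimal sketch of producing such a file from a list of predictions that is already in dataset order. Collapsing internal whitespace so that each prediction stays on exactly one line is a precaution added here, not a stated requirement, and the function names and the output filename are illustrative.

```python
from pathlib import Path
from typing import List


def format_predictions(predictions: List[str]) -> str:
    """Return one prediction per line, in the given order.

    Whitespace inside a prediction (including newlines) is collapsed to
    single spaces so that line N always corresponds to example N.
    """
    lines = [" ".join(p.split()) for p in predictions]
    return "\n".join(lines) + "\n"


def write_predictions(predictions: List[str], path: Path) -> None:
    """Write the formatted predictions to a UTF-8 text file."""
    path.write_text(format_predictions(predictions), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical output name; predictions would come from your model.
    write_predictions(["First prediction.", "Second\nprediction."], Path("dev_predictions.txt"))
```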
