This repository contains the code for a pipeline-based system for logical table-to-text generation.
- Python >= 3.9

Install the dependencies:

```bash
pip install -r requirements.txt
```
Training and inference were conducted on an A100 GPU.
Datasets are mostly taken from the `tabgenie` package, which provides data-to-text datasets in a unified format.
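For illustration, a dataset can presumably be loaded through `tabgenie` along these lines (a hedged sketch: the `load_dataset` helper, its signature, and the dataset identifier are assumptions, so consult the `tabgenie` documentation for the exact API):

```python
# Assumption: tabgenie exposes a load_dataset helper taking a dataset id;
# the exact API and identifier may differ.
from tabgenie import load_dataset

dataset = load_dataset("logicnlg")
```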
`.json` files with the outputs of intermediate steps and the final results are in the `model_outputs` folder.
Pre-trained models are available via the Hugging Face Hub:
- Content selection: `kategaranina/lt2t_content_selection`
- LF-to-text generation: `kategaranina/lt2t_lf_to_text`
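For quick experimentation, the checkpoints can be loaded with the standard `transformers` API (a minimal sketch, assuming the hub repositories follow the usual seq2seq layout; the training scripts below fine-tune `t5-base`):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumption: the checkpoints are standard seq2seq (T5-style) models,
# since the training scripts in this repository fine-tune t5-base.
model_name = "kategaranina/lt2t_content_selection"  # or kategaranina/lt2t_lf_to_text
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
```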
Preprocessing and generation configurations are stored in the `inference_config` folder of this repository and are used during inference.
To run inference on LogicNLG, download the original TabFact dataset from its repository and put it in the `data/LogicNLG` folder.
To run inference and evaluation on the LogicNLG test set, using the HF models and the predefined preprocessing and generation configurations:

```bash
python end_to_end_inference.py \
    --dataset logicnlg \
    --part test \
    --predictions-dir predictions_logicnlg_test \
    --batch-size 64 \
    --selection all
```
For further parametrization, refer to `end_to_end_inference.py`.
To run training, a Neptune account and credentials (`NEPTUNE_ORG` and `NEPTUNE_API_TOKEN`) are required.
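For example, the credentials can be provided before a training script is launched (a sketch with placeholder values; that the scripts read them from the environment is an assumption based on the variable names):

```python
import os

# Placeholder values: substitute your own Neptune organization and API token.
# Assumption: the training scripts read these from the environment
# (equivalently, export them in your shell).
os.environ["NEPTUNE_ORG"] = "<your-neptune-org>"
os.environ["NEPTUNE_API_TOKEN"] = "<your-neptune-api-token>"
```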
By default, checkpoints are saved to `TMP_DIR`, if it is set, or to the repository directory otherwise. The final model is saved to the repository directory, with predictions and metric values inside the model folder.
Training the steps of our system requires preprocessed tables. Run preprocessing with the following command:

```bash
python parse_dataset.py --dataset logic2text
```

The `.pkl` file with the preprocessed data will be saved to the `data` folder.
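The preprocessed tables can then be inspected directly (a sketch; the exact file name produced by `parse_dataset.py` is an assumption):

```python
import pickle

# Hypothetical path: parse_dataset.py saves a .pkl file into the data/
# folder; the actual file name may differ.
with open("data/logic2text.pkl", "rb") as f:
    preprocessed = pickle.load(f)

print(type(preprocessed))
```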
Baseline:

```bash
python train_baseline.py \
    --do-train \
    --do-predict \
    --dataset=logicnlg \
    --linearize-style=nl \
    --references=tabfact \
    --epochs=20 \
    --base-model=t5-base \
    --batch-size=16 \
    --num-beams=3
```
Content selection:

```bash
python train_content_selection.py \
    --do-train \
    --do-predict \
    --sources=main \
    --include-stats \
    --include-num-stats \
    --include-value \
    --epochs=20 \
    --base-model=t5-base \
    --batch-size=8 \
    --num-beams=1 \
    --do-sample \
    --top-k=50 \
    --n-generated=5
```
LF-to-text:

```bash
python train_lf_to_text.py \
    --do-train \
    --do-predict \
    --epochs=30 \
    --base-model=t5-base \
    --learning-rate=2e-5 \
    --batch-size=8 \
    --num-beams=3 \
    --n-generated=1
```
Please open an issue if you have any questions, requests, or comments.