
Codebase for the COLING paper "A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing".


sanxing-chen/linking-tale


A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing

This repo implements the dynamic gating mechanism described in our COLING 2020 paper on top of a graph neural network-based Text-to-SQL parser. The implementation is built on top of this repository.

Installation

  1. Install PyTorch 1.5.0, choosing the build that matches your CUDA version.

  2. Install the remaining required packages:

    pip install -r requirements.txt

  3. Install the NLTK punkt tokenizer data:

    python -c "import nltk; nltk.download('punkt')"

  4. Download the dataset from the official Spider dataset website.

  5. Edit the config file train_configs/defaults.jsonnet so that it points to the dataset location:

    local dataset_path = "dataset/";

  6. Before preprocessing the dataset, modify two lines in the AllenNLP library, replacing self._tokenizer with _tokenizer. This change greatly reduces the size of the cached data and the memory usage. Also, set the number of processes in dataset_readers/spider.py to suit your machine.
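Because the first preprocessing run is slow, it can help to confirm beforehand that dataset_path points at an unpacked Spider release. The sketch below assumes the standard Spider layout (train_spider.json, dev.json, tables.json and a database/ directory); the helper name check_spider_dir is hypothetical and not part of this repository.

```python
from pathlib import Path

# Entries assumed to exist in an unpacked Spider release (not read from this repo's code).
REQUIRED_FILES = ("train_spider.json", "dev.json", "tables.json")
REQUIRED_DIRS = ("database",)


def check_spider_dir(dataset_path):
    """Return a sorted list of the expected Spider entries missing under dataset_path."""
    root = Path(dataset_path)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return sorted(missing)
```

For example, check_spider_dir("dataset/") returns an empty list when every expected entry is present, and otherwise names what is missing before any time is spent on preprocessing.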

Training and Inference

Run the following command to train a new model. Add --gated to enable the dynamic gating mechanism, or omit it to train without gating.

python run.py [--gated]
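For intuition about what a gate between two linkings can look like, here is a minimal sketch that mixes two distributions over schema items with a scalar gate. It assumes a sigmoid gate computed linearly from a decoder state and a convex combination of a schema-linking distribution and a structural-linking distribution; the function names, the linear gate parameters, and the plain-Python arithmetic are illustrative assumptions, not this repository's implementation (see the paper for the actual formulation).

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_linking(state, gate_weights, gate_bias, schema_scores, structural_scores):
    """Mix two distributions over schema items with a scalar gate (hypothetical sketch).

    state: decoder hidden state (list of floats)
    gate_weights, gate_bias: hypothetical parameters of a linear gate
    schema_scores: question-to-schema linking scores, one per schema item
    structural_scores: structure-based linking scores, one per schema item
    """
    g = sigmoid(sum(w * h for w, h in zip(gate_weights, state)) + gate_bias)
    p_schema = softmax(schema_scores)
    p_struct = softmax(structural_scores)
    # Convex combination: g near 1 trusts schema linking, g near 0 trusts structural linking.
    mixed = [g * a + (1.0 - g) * b for a, b in zip(p_schema, p_struct)]
    return g, mixed
```

Because both inputs are probability distributions and the gate lies in (0, 1), the mixed output is also a distribution over the same schema items.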

Loading the dataset for the first time may take a few hours, because the model first reads values from the SQL tables and computes their similarity features with the relevant questions. The results are then cached for subsequent runs.
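The compute-once, reuse-later behaviour described above follows a common pattern: run the expensive preprocessing on the first call, serialise the result, and load it on later calls. The sketch below shows that pattern with pickle; the helper name load_or_build, the cache path, and the choice of pickle are assumptions for illustration and may not match how this repository or AllenNLP actually stores its cache.

```python
import os
import pickle


def load_or_build(cache_path, build_fn):
    """Return cached data from cache_path, or build it with build_fn and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build_fn()  # the slow step, e.g. reading table values and computing features
    cache_dir = os.path.dirname(cache_path)
    if cache_dir:
        os.makedirs(cache_dir, exist_ok=True)
    # Write to a temporary file first so an interrupted run leaves no partial cache.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(data, f)
    os.replace(tmp_path, cache_path)
    return data
```

A consequence of this design is that a stale cache must be deleted by hand when the preprocessing logic changes, since the loader never checks whether the cached data is still current.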

Run the following command to generate model predictions.

python run.py <path> --mode eval

The predictions can then be evaluated with the official Spider evaluation scripts.
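As a sketch of that last step, the helper below builds the command line for the official Spider evaluation.py, which takes --gold, --pred, --etype (all, exec or match), --db and --table. The script location, the file names in the example, and the assumption that run.py writes one predicted SQL query per line are illustrative only; the helper names are hypothetical.

```python
import subprocess
import sys


def spider_eval_command(gold, pred, db_dir, tables, etype="match", script="evaluation.py"):
    """Build the argument list for the official Spider evaluation script."""
    if etype not in ("all", "exec", "match"):
        raise ValueError(f"unknown etype: {etype}")
    return [
        sys.executable, script,
        "--gold", gold,
        "--pred", pred,
        "--etype", etype,
        "--db", db_dir,
        "--table", tables,
    ]


def run_spider_eval(*args, **kwargs):
    """Run the evaluation script and return the report it prints."""
    result = subprocess.run(
        spider_eval_command(*args, **kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, run_spider_eval("dataset/dev_gold.sql", "predictions.sql", "dataset/database", "dataset/tables.json") would score a hypothetical predictions.sql file against the dev set with exact-set matching.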

Ablation studies and comparisons with alternative approaches can be run with the following command:

python run.py <path> --mode train --gated --ablation <study_name>
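To run several ablations in a row, a small driver can build one command per study. This sketch uses only the arguments shown in the command above (<path>, --mode train, --gated, --ablation); the study names in the usage note are placeholders, since the valid values are defined in the code rather than in this README, and run_studies is a hypothetical helper.

```python
import subprocess
import sys


def ablation_command(path, study_name):
    """Build the run.py invocation for one ablation study."""
    return [sys.executable, "run.py", path, "--mode", "train", "--gated", "--ablation", study_name]


def run_studies(path, study_names, dry_run=True):
    """Print (and, unless dry_run is True, run sequentially) one training job per study."""
    commands = [ablation_command(path, name) for name in study_names]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_studies("<path>", ["study_a", "study_b"]) with the default dry_run=True only prints the commands, which is a cheap way to check them before launching long training runs.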

Debugging

See the AllenNLP documentation for debugging tips, and use run.py as the entry point when debugging.

BibTeX

@inproceedings{chen2020tale,
    title={A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing},
    author={Sanxing Chen and Aidan San and Xiaodong Liu and Yangfeng Ji},
    booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
    year={2020}
}
