Seq2Seq Baselines

This documents the sequence-to-sequence baselines on two datasets (Spider, ViText2SQL), each with three methods (basic, attention, attention + copy), using masking and fine-tuned GloVe embeddings.

Dependencies

This code is based on Google seq2seq v0.1 and the code by Catherine Finegan-Dollak.

  • Python 2.7.14 :: Anaconda custom (64-bit)
  • tensorflow-gpu: 1.5.0
  • numpy: 1.13.3
  • matplotlib: 2.1.0
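
If you need to recreate the environment, one possible setup (a sketch, assuming a Python 2.7 environment with pip; the original setup used Anaconda, so conda packages work as well) is:

pip install tensorflow-gpu==1.5.0 numpy==1.13.3 matplotlib==2.1.0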

Folder Structure

  • bin/ contains the entry points to the model, including train.py and infer.py, as well as other tooling scripts
  • experimental_configs/ contains experimental configurations in yaml format
  • data/ contains the pre_process, glove, and datasets folders; datasets holds the two datasets
  • seq2seq/ contains the main model code
  • config_builder.py creates a new model directory and writes the configurations into bash files

1. Configuration

1.1 Prepare Data

Download glove.6B.100d.txt into data/glove/.
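
If you do not have the embeddings yet, one way to fetch them (assuming wget and unzip are available) is:

wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip glove.6B.100d.txt -d data/glove/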

Put the original data in the following folders:

data/datasets/data             
data/datasets/data_radn_split

Then generate the processed data into the folders

data/datasets/data_processed
data/datasets/data_radn_split_processed

by changing the infiles and prefix variables in data/pre_process/utils.py and then running data/pre_process/generate_vocab.py.
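
For example, after editing infiles and prefix in utils.py, a typical run might look like the following (a sketch only; check generate_vocab.py for any command-line arguments it expects):

python data/pre_process/generate_vocab.py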

1.2 Configuration yaml

We run 6 experiments. The configuration yaml files are in the experimental_configs folder:

  • attn_copying_tune_data_radn_split.yaml # attention + copy, data_radn_split
  • attn_tune_data.yaml # attention, data
  • attn_copying_tune_data.yaml # attention + copy, data
  • basic_tune_data_radn_split.yaml # basic, data_radn_split
  • attn_tune_data_radn_split.yaml # attention, data_radn_split
  • basic_tune_data.yaml # basic, data

In the configuration yaml files, set the data directories and the embedding file:

  • data_directories: data/datasets/data_processed/
  • embedding.file: data/glove/glove.6B.100d.txt

1.3 Build model folder

Use config_builder.py to generate a model folder with the configuration and bash files from a config in experimental_configs/:

python config_builder.py [configuration_yaml_file] 
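
For example, to build the folder for the attention model on the data split (assuming the yaml is passed as a path):

python config_builder.py experimental_configs/attn_tune_data.yaml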

This produces 6 model folders:

  • InputAttentionCopyingSeq2Seq_tune_model_data/
  • InputAttentionCopyingSeq2Seq_tune_model_data_radn_split/
  • BasicSeq2Seq_tune_model_data/
  • BasicSeq2Seq_tune_model_data_radn_split/
  • AttentionSeq2Seq_tune_model_data/
  • AttentionSeq2Seq_tune_model_data_radn_split/

2. Run Experiments

For training:

./[model_folder]/experiment.sh

For testing:

./[model_folder]/experiment_infer.sh

with the following outputs:

  • train: [model_folder]/output_train.txt
  • dev: [model_folder]/output.txt
  • test: [model_folder]/output_test.txt
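
For example, to train and then run inference for the attention model on the data split:

./AttentionSeq2Seq_tune_model_data/experiment.sh
./AttentionSeq2Seq_tune_model_data/experiment_infer.sh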

3. Evaluation

Check output_[data_split].txt, keep only the lines relevant for comparison, and then run:

python evaluation.py --gold [gold file] --pred [predicted file] --etype [evaluation type] --db [database dir] --table [table file]

arguments:
  [gold file]        gold.sql file where each line is `a gold SQL \t db_id`
  [predicted file]   predicted sql file where each line is a predicted SQL
  [evaluation type]  "match" for exact set matching score, "exec" for execution score, and "all" for both
  [database dir]     directory which contains sub-directories where each SQLite3 database is stored
  [table file]       table.json file which includes foreign key info of each database
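
A sample invocation might look like this, where pred_dev.txt stands for the cleaned prediction file from step 2 and the gold, database, and table paths are placeholders that depend on where your copy of the dataset lives:

python evaluation.py --gold dev_gold.sql --pred pred_dev.txt --etype all --db data/datasets/database/ --table data/datasets/tables.json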

Contact

Dongxu Wang, Rui Zhang
