
Online Semantic Parsing

License: MIT | Paper

Online semantic parsing for latency reduction in task-oriented dialogue.

This repository sets up the online semantic parsing (OSP) task, formulates the Final Latency Reduction (FLR) evaluation metric, proposes a graph-based program prediction model, and develops general frameworks to parse and execute utterances in an online fashion, exploring the balance between latency reduction and program accuracy, as described in the paper Online Semantic Parsing for Latency Reduction in Task-Oriented Dialogue (ACL 2022). If you use any source code or data included in this repository in your work, please cite the following paper:

@inproceedings{zhou-etal-2022-online,
    title = "Online Semantic Parsing for Latency Reduction in Task-Oriented Dialogue",
    author = "Zhou, Jiawei  and
      Eisner, Jason  and
      Newman, Michael  and
      Platanios, Emmanouil Antonios  and
      Thomson, Sam",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.110",
    doi = "10.18653/v1/2022.acl-long.110",
    pages = "1554--1576",
    abstract = "Standard conversational semantic parsing maps a complete user utterance into an executable program, after which the program is executed to respond to the user. This could be slow when the program contains expensive function calls. We investigate the opportunity to reduce latency by predicting and executing function calls while the user is still speaking. We introduce the task of online semantic parsing for this purpose, with a formal latency reduction metric inspired by simultaneous machine translation. We propose a general framework with first a learned prefix-to-program prediction module, and then a simple yet effective thresholding heuristic for subprogram selection for early execution. Experiments on the SMCalFlow and TreeDST datasets show our approach achieves large latency reduction with good parsing quality, with a 30{\%}{--}65{\%} latency reduction depending on function execution time and allowed cost.",
}

Getting Started

Install

Assuming Conda is your environment management tool, run:

bash install_with_conda.sh

This will set up a local conda environment named ./cenv in the current directory, with all necessary packages installed. Activate it with conda activate ./cenv (or source activate ./cenv) when you need to run the code interactively.
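
A quick sanity check of the installation (a minimal sketch; it assumes the script above installed PyTorch and Fairseq, which the models below require):

# activate the environment and confirm the core dependencies import
conda activate ./cenv
python -c "import torch, fairseq; print(torch.__version__, fairseq.__version__)"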

Code Structure

  • configs*/: configuration files for data processing, model architectures, and training setups
  • run/: scripts to build offline full-to-graph models for program/graph generation
  • run_prefix/: scripts to run experiments with the prefix-to-graph model for online semantic parsing
  • run_utter_dlm/: scripts to run experiments with the LM-complete pipeline: denoising language model + full-to-graph
  • src/: core code for data structures, parsing state machine, and the action-pointer Transformer model architectures

The core program/graph generation models extend the Action-Pointer Transformer (APT), a transition-based parser; our PyTorch implementation is structured as an extension module to the Fairseq library.

Data Download

The default root directories are:

  • for all raw/processed data -> ../DATA (accessed through $DATADIR)
  • for all saved models/results -> ../SAVE (accessed through $SAVEDIR).

These are set (or can be changed) in set_default_dirs.sh.
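
For reference, a minimal sketch of what that script plausibly contains (the variable names match those above; the actual file may differ):

# hypothetical contents of set_default_dirs.sh; edit the paths to relocate data/models
export DATADIR=../DATA    # root for all raw/processed data
export SAVEDIR=../SAVE    # root for all saved models/results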

Download the SMCalFlow2.0 data and TreeDST data from here, and decompress them into $DATADIR/smcalflow2.0/ and $DATADIR/treedst/, respectively.
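
For example (the archive file names below are placeholders; use whatever the download provides):

# create the target directories and decompress each archive into place
mkdir -p $DATADIR/smcalflow2.0 $DATADIR/treedst
tar -xzvf smcalflow2.0.tar.gz -C $DATADIR/smcalflow2.0   # placeholder archive name
tar -xzvf treedst.tar.gz -C $DATADIR/treedst             # placeholder archive name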

For quick testing on a small sample dataset, run bash scripts/create_sample_dataset.sh, which creates a sample dataset of 100 training examples and 50 validation examples from the SMCalFlow2.0 data.

Running Models & Evaluation

All commands take one of the following forms:

# run interactively, output logs in the console
bash run*/xxx.sh configs*/config_*/config_xxx.sh [seed, default=42]

or

# send the whole process to the background
bash run*/jbash_xxx.sh configs*/config_*/config_xxx.sh [seed] [gpu_id]

where all the data/model/optimization/decoding setups are passed through a config_xxx.sh configuration file, and all detailed logs are saved to files.

Offline Parsing: Train a Program/Graph Prediction Model


Run the following script:

# GPU device must be specified if you are working on a multi-GPU machine
CUDA_VISIBLE_DEVICES=0 bash run/run_model_gap.sh configs/config_gap_data_order-top_src-ct1-npwa_tgt-cp-str_roberta-large-top24_ep50/config_model_gap_ptr-lay6-h1_lr0.0005-bsz4096x4-wm4k-dp0.2.sh

This runs data processing (use run/run_data.sh to process the data alone), model training, and decoding for evaluation.

If you want to send the whole process to the background and specify the GPU device, run

# random seed 42, and GPU ID 1 for example
bash run/jbash_run_model_gap.sh configs/config_gap_data_order-top_src-ct1-npwa_tgt-cp-str_roberta-large-top24_ep50/config_model_gap_ptr-lay6-h1_lr0.0005-bsz4096x4-wm4k-dp0.2.sh 42 1

The configuration files in configs/ default to SMCalFlow2.0; the configurations for TreeDST are in configs_treedst/.

Note: If you are working without an Internet connection and run into the following Fairseq error,

requests.exceptions.ProxyError: HTTPSConnectionPool(host='dl.fbaipublicfiles.com', port=443): Max retries exceeded with url: /fairseq/gpt2_bpe/encoder.json

but you have previously put the necessary files in the default cache directory, check the fix here.
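
If another machine does have a connection, one workaround (a sketch; the exact cache location varies by Fairseq/PyTorch version, so follow the linked fix for your setup) is to fetch the files Fairseq is requesting and copy them to the offline machine's cache yourself:

# download the GPT-2 BPE files from the URLs in the error message
wget https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
wget https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
# then place them in the cache directory the offline machine expects
# (commonly under ~/.cache/torch; version-dependent)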

Trained Checkpoints

We provide trained offline parsing models for the SMCalFlow2.0 and TreeDST data with top-down generation order, evaluated with beam-1 decoding:

Dataset       | Valid Exact Match | Model
SMCalFlow2.0  | 80.7              | Download
TreeDST       | 90.8              | Download

Online Parsing: Train a Prefix-to-Graph Model


Run the following script:

# GPU device must be specified if you are working on a multi-GPU machine
CUDA_VISIBLE_DEVICES=0 bash run_prefix/run_model_gap_prefix.sh configs/config_gap_data_prefix-last-8ps_order-top_src-ct1-npwa_tgt-cp-str_roberta-large-top24/config_model_gap_ptr-lay6-h1_lr0.0001-bsz4096x4-wm4k-dp0.2.sh

This creates training data consisting of utterance prefixes at the last 8 relative lengths (>= 30% of the full utterance), with the full programs (unseen entities masked) as targets (use run_prefix/run_data_xxx.sh for prefix data processing alone); it then trains the model and evaluates on all prefix lengths using the validation set.
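
For intuition, the relative-length scheme works roughly as follows (a toy illustration only, not the repository's actual preprocessing code):

# toy example: prefixes at relative lengths 30%..100% in 10% steps (8 lengths)
utt="when is my next meeting with the design team"
words=($utt); n=${#words[@]}
for pct in 30 40 50 60 70 80 90 100; do
    k=$(( (n * pct + 99) / 100 ))    # ceiling, so a prefix is never empty
    echo "${pct}%: ${words[*]:0:k}"
done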

If you want to send the whole process to the background, use run_prefix/jbash_run_model_gap_prefix.sh as with the offline models. For the TreeDST dataset, use the configurations in configs_treedst/.

Note: if you are running the online parser directly from scratch without first running the offline parser, you need to process the offline data first, by running the following script with the corresponding data configuration file:

# taking the SMCalFlow data for top-down generation as an example
CUDA_VISIBLE_DEVICES=0 bash run/run_data.sh configs/config_data/config_data_order-top_src-ct1-npwa_tgt-cp-str_roberta-large-top24.sh

Trained Checkpoints

We provide Prefix-to-Graph models trained with the above configuration:

Dataset       | Online Parsing Model
SMCalFlow2.0  | Download
TreeDST       | Download

Online Parsing: Utterance LM-Complete Model from BART


Run the following script:

# GPU device must be specified if you are working on a multi-GPU machine
CUDA_VISIBLE_DEVICES=0 bash run_utter_dlm/run_model_utter-dlm.sh configs/config_utter-dlm_src-ct1-npwa/config_dlm_bart-large_data_utter_abs-prefix-all_lr0.00003-scheduler-is-bsz2048x4-wm500-dp0.1-cn0.1_ep1.sh

For the TreeDST dataset, use the configurations in configs_treedst/.
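
Schematically, this pipeline first completes the partial utterance with the denoising LM and then hands the completion to the offline full-to-graph parser (the example text below is invented):

# observed prefix      : "when is my next meeting"
# BART completion      : "when is my next meeting with the sales team ?"
# full-to-graph parser : maps the completed utterance to a program,
#                        whose subprograms can then be selected for early execution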

Evaluation: Final Latency Reduction (FLR)


Taking the Prefix-to-Graph model as an example, suppose we want the following execution simulation settings:

  • constant utterance token speaking time
  • time measured by the number of source tokens
  • iterate over different constant execution times
  • for each execution time, iterate over different subgraph selection thresholds to get the FLR-cost tradeoff curve

Run the following command (assuming the prefix-to-graph model decoding results are in place):

bash run_prefix/eval_valid-prefix-allt_latency.sh
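
For intuition, FLR compares when the final result would be ready with online execution versus the offline baseline. A toy calculation under a constant execution-time model (all numbers invented):

# utterance ends at t=10 (time in source tokens); each function call takes 5 units
utter_end=10; exec_time=5
offline_done=$(( utter_end + exec_time ))   # offline: execute only after the full utterance -> 15
launch=6                                    # online: the final call was launched early, at t=6
online_done=$(( launch + exec_time ))       # -> 11
(( online_done < utter_end )) && online_done=$utter_end   # cannot respond before the user finishes
echo "FLR = $(( offline_done - online_done ))"            # 15 - 11 = 4 time units saved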

For other utterance-speaking and execution-timing models, run similarly:

  • bash run_prefix/eval_valid-prefix-allt_latency_wr-char-linear.sh: time measured in milliseconds, with a word-speaking model that is linear in character length
  • bash run_prefix/eval_valid-prefix-allt_latency_wr-real-voice.sh: time measured in milliseconds, with a word-speaking model fit to real-voice ASR data

To run FLR evaluation with different models and/or on different datasets, see run_prefix/jbash_eval_latency_many*.sh.

To run FLR evaluation with the LM-Complete + Full-to-Graph model, run the similarly named scripts in run_utter_dlm/.
