nickvosk/sigir2020-query-resolution

Query Resolution as Term Classification (QuReTeC)

This repository contains resources for the SIGIR 2020 paper:

Query Resolution for Conversational Search with Limited Supervision [pdf]

by N. Voskarides, D. Li, P. Ren, E. Kanoulas and M. de Rijke.

Code

Download the Data into the current folder.

Download the Models and unzip them into the current folder.

Set up environment

conda create -n quretec python=3.5
source activate quretec
pip install -r requirements.txt

Train model

In this example we train QuReTeC using QuAC gold resolutions.

BASE_DIR=./models/  
DATA_DIR=./data/quac_canard/token_classification/

TRAIN_ON=train_gold_supervision
DEV_ON=dev_gold_supervision

MODEL_ID=XXX   # provide an ID for the model to be trained here.

python -m run_ner --task_name ner --bert_model bert-large-uncased --max_seq_length 300 --train_batch_size 4 --hidden_dropout_prob 0.4 --train_on $TRAIN_ON --dev_on $DEV_ON --do_train --data_dir $DATA_DIR --base_dir $BASE_DIR --model_id $MODEL_ID

Generate output using trained model

In this example we use a trained model to generate output and perform intrinsic evaluation on the TREC CAsT 2019 test data.

BASE_DIR=./models/
# model trained on QuAC gold resolutions (as in paper)
MODEL_ID=191790_50  

DATA_DIR=./data/trec_cast_2019/token_classification/
DEV_ON=test_oracle_rewrite
 
python -m run_ner --task_name ner --do_eval --do_lower_case --data_dir $DATA_DIR --base_dir $BASE_DIR --dev_on $DEV_ON --model_id $MODEL_ID --no_cuda

...

[Token eval] P=76.6, R=80.3, F1=78.4

The above command generates the file: ./models/191790_50/eval_results_test_oracle_rewrite_epoch0.json
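As a quick sanity check on the token-level scores printed above, the sketch below recomputes F1 from the reported precision and recall. It assumes F1 is the standard harmonic mean of the two; the function name `f1_score` is illustrative and is not taken from `run_ner`.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given on the same scale)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the [Token eval] line above.
print(round(f1_score(76.6, 80.3), 1))  # 78.4
```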

In order to generate the query file for retrieval:

MODEL_OUTPUT_FILE=./models/191790_50/eval_results_test_oracle_rewrite_epoch0.json
RAW_QUERY_FILE=./data/trec_cast_2019/cast2019_test_annotated.tsv
OUTPUT_FILE=query_file_quretec.txt
python -m generate_query_files_for_trained_model --model_output_file $MODEL_OUTPUT_FILE --raw_query_file $RAW_QUERY_FILE --dataset_name cast --output_file $OUTPUT_FILE

The above script assumes the same set of qids in model_output_file and raw_query_file.
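Since the script relies on both files covering the same qids, it can help to check this before running it. The sketch below compares two collections of qids and reports what is missing on each side. Extracting the qids from the two files is left out, because their exact layout is not described here; the helper name `qid_mismatch` and the sample qids are hypothetical.

```python
from typing import Iterable, List, Tuple


def qid_mismatch(model_qids: Iterable[str],
                 raw_qids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (qids only in the model output, qids only in the raw query file)."""
    model_set, raw_set = set(model_qids), set(raw_qids)
    return sorted(model_set - raw_set), sorted(raw_set - model_set)


# Illustrative qids only; real ones come from the two input files.
only_in_model, only_in_raw = qid_mismatch(["31_1", "31_2"], ["31_2", "31_3"])
if only_in_model or only_in_raw:
    print("qid mismatch:", only_in_model, only_in_raw)
```

Both lists being empty means the two files cover the same set of qids.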

Data

You can find the preprocessed data and the output of QuReTeC and the baselines here.

Cite

@inproceedings{voskarides-2020-query,
Author = {Voskarides, Nikos and Li, Dan and Ren, Pengjie and Kanoulas, Evangelos and de Rijke, Maarten},
Booktitle = {SIGIR 2020: 43rd international ACM SIGIR conference on Research and Development in Information Retrieval},
Month = {July},
Publisher = {ACM},
Title = {Query Resolution for Conversational Search with Limited Supervision},
Year = {2020}}

Questions

If you have any questions, please contact Nikos Voskarides.
