What Do Questions Exactly Ask? MFAE: Duplicate Question Identification with Multi-Fusion Asking Emphasis

Paper accepted at SDM 2020

Description

This repository contains the source code for the paper "What Do Questions Exactly Ask? MFAE: Duplicate Question Identification with Multi-Fusion Asking Emphasis", accepted at the SIAM International Conference on Data Mining (SDM20). Please cite our paper if you use this code! 😍 The paper can be downloaded here.

@inproceedings{zhang2020questions,
  title={What Do Questions Exactly Ask? MFAE: Duplicate Question Identification with Multi-Fusion Asking Emphasis},
  author={Zhang, Rong and Zhou, Qifei and Wu, Bo and Li, Weiping and Mo, Tong},
  booktitle={Proceedings of the 2020 SIAM International Conference on Data Mining},
  pages={226--234},
  year={2020},
  organization={SIAM}
}

Model Overview

MFAE encodes each question pair with BERT or ELMo contextual embeddings and applies multi-fusion asking emphasis to capture what the questions are actually asking; see the paper for the full architecture diagram.

Requirements

Python 3

pip install -r requirements.txt

Datasets

The code supports four datasets for duplicate sentence identification.

Duplicate Question Identification Datasets (DQI)

  • Quora Question Pairs
  • CQADupStack

Natural Language Inference Datasets (NLI)

  • SNLI
  • MultiNLI
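
For reference, the original Quora release ships as a tab-separated file (quora_duplicate_questions.tsv) with columns id, qid1, qid2, question1, question2, and is_duplicate. A minimal loading sketch (the file path is an assumption; point it at wherever you downloaded the data):

import pandas as pd

# Path is an assumption; adjust it to your download location.
df = pd.read_csv('quora_duplicate_questions.tsv', sep='\t')
print(df[['question1', 'question2', 'is_duplicate']].head())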

Data Preprocessing

After the datasets have been downloaded, you can preprocess the data.

Preprocess the data for BERT

cd scripts/preprocessing
python process_quora_bert.py
python preprocess_cqadup_bert.py
python preprocess_snli_bert.py
python process_mnli_bert.py
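
These scripts convert the raw sentence pairs into BERT-ready inputs. As a rough illustration of BERT-style WordPiece tokenization (this sketch uses the Hugging Face transformers package, which is not necessarily what the repository's scripts use):

from transformers import BertTokenizer

# Lowercasing tokenizer matching BERT-Base uncased.
tok = BertTokenizer.from_pretrained('bert-base-uncased')
print(tok.tokenize('What do questions exactly ask?'))
# ['what', 'do', 'questions', 'exactly', 'ask', '?']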

Preprocess the data for ELMo

cd scripts/preprocessing
python process_quora.py
python preprocess_snli.py
python preprocess_mnli.py

Training

bert-as-service

If you want to train models with BERT word embeddings, first set up bert-as-service; a minimal usage sketch follows, and then you can run the training scripts below.
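
A minimal sketch of querying the bert-as-service server from Python (the model path and worker count below are assumptions; adjust them to your setup):

# In a separate shell, start the server first, e.g.:
#   bert-serving-start -model_dir /path/to/uncased_L-12_H-768_A-12 -num_worker=2
from bert_serving.client import BertClient

bc = BertClient()  # connects to localhost:5555/5556 by default
vecs = bc.encode(['How do I learn Python?',
                  'What is the best way to learn Python?'])
print(vecs.shape)  # (2, 768) for BERT-Base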

Train all models

sh -x run.sh

Train with BERT

python bert_quora.py >> log/quora/quora_bert.log
python bert_cqadup.py >> log/cqadup/cqadup_bert.log
python bert_snli.py >> log/snli/snli_bert.log
python bert_mnli.py >> log/mnli/mnli_bert.log
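
Note that these commands append to files under log/; create the log/quora, log/cqadup, log/snli, and log/mnli directories before training if they do not already exist.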

Train with ELMo

python train_quora_elmo.py >> log/quora/quora_elmo.log
python train_snli_elmo.py >> log/snli/snli_elmo.log
python train_mnli_elmo.py >> log/mnli/mnli_elmo.log
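
If you want to inspect ELMo embeddings directly, here is a generic sketch using allennlp's ElmoEmbedder (this assumes allennlp 0.x; the repository's own ELMo loading may differ):

from allennlp.commands.elmo import ElmoEmbedder

# Downloads the default pretrained ELMo weights on first use.
elmo = ElmoEmbedder()
vectors = elmo.embed_sentence(['How', 'do', 'I', 'learn', 'Python', '?'])
print(vectors.shape)  # (3, 6, 1024): three ELMo layers, six tokens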

Testing

After the models have been trained, you can evaluate them as follows.

Test the models with the BERT backbone

python test_bert_quora.py
python test_bert_cqadup.py
python test_bert_snli.py
python test_bert_mnli.py

Test the models with the ELMo backbone

python test_elmo_quora.py
python test_elmo_snli.py
python test_elmo_mnli.py
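
Each test script reports the model's performance on the corresponding test set. If you want to recompute metrics from saved predictions, a minimal sketch (the file names are assumptions; substitute whatever your run produces):

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true = np.loadtxt('labels.txt', dtype=int)       # gold labels, one per line
y_pred = np.loadtxt('predictions.txt', dtype=int)  # model predictions
print('accuracy:', accuracy_score(y_true, y_pred))
# Macro-averaged F1 works for both the binary DQI and 3-class NLI tasks.
print('macro F1:', f1_score(y_true, y_pred, average='macro'))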

Report Issues

Please let us know if you encounter any problems.

The contact email is rzhangpku@pku.edu.cn.