Ruijie Wang, Zhiruo Zhang, Luca Rossetto, Florian Ruosch, and Abraham Bernstein
🏆 Winner of the DBLP-QuAD KGQA Task in the Scholarly QALD Challenge at the 22nd International Semantic Web Conference (ISWC 2023).
💥 Based on NLQxform, we developed NLQxform-UI, an easy-to-use, web-based interactive QA system over the DBLP Knowledge Graph, which is also open-sourced.
❗ Please note that SSL certificate verification is disabled in `do_query.py` and `generator_main.py` because of the `SSL: CERTIFICATE_VERIFY_FAILED` error we recently encountered when querying the SPARQL endpoint. This is only a temporary workaround. Please do not send sensitive information with this code, and do not adapt it to query other servers.
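The README does not say exactly how the two scripts switch verification off, so the following is only a sketch using Python's standard `ssl` module; the helper `make_ssl_context` is hypothetical and not part of this repository. It contrasts the secure default context with an unverified one, and shows the `cafile` option, which can resolve `CERTIFICATE_VERIFY_FAILED` without disabling checks when the cause is a missing CA certificate.

```python
import ssl
from typing import Optional


def make_ssl_context(verify: bool = True, cafile: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical helper: build an SSL context for HTTPS requests to a SPARQL endpoint.

    verify=True keeps Python's secure defaults (certificate and hostname checks).
    cafile optionally points at an extra CA bundle to trust.
    verify=False reproduces the insecure, temporary behaviour described above.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    if not verify:
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Such a context can be passed to the standard library as `urllib.request.urlopen(url, context=make_ssl_context())`, where `url` is a placeholder for the endpoint address; reverting to `verify=True` is the safe default once the certificate issue is fixed.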
Please set up a Python environment with PyTorch, Pandas, Transformers, SacreBLEU, Rouge-score, SPARQLWrapper, Colorama, NumPy, and BeautifulSoup4 installed.
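Before running anything, it can help to confirm that each listed package is importable. The sketch below is a hypothetical helper, not part of the repository. The import names are the standard ones for these packages and differ from the pip names in some cases (`rouge-score` imports as `rouge_score`, `beautifulsoup4` as `bs4`). No version pins are given here, so it only checks presence, not versions.

```python
import importlib.util

# Display name -> import name for the packages listed above.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "Pandas": "pandas",
    "Transformers": "transformers",
    "SacreBLEU": "sacrebleu",
    "Rouge-score": "rouge_score",
    "SPARQLWrapper": "SPARQLWrapper",
    "Colorama": "colorama",
    "NumPy": "numpy",
    "BeautifulSoup4": "bs4",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the display names of packages whose module cannot be found (nothing is imported)."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```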
Please download the data and models from OSF, unzip `data.zip` and `logs.zip`, and move the extracted contents to the root directory of the repository.
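If you prefer to do this step from Python, a minimal sketch with the standard `zipfile` module follows. The helper `extract_archives` is hypothetical; it assumes both archives have already been downloaded into the repository root and that each unpacks into its own top-level folder (`data/` and `logs/`, the folders the commands below read from). Adjust the target if the archives are laid out differently.

```python
import zipfile
from pathlib import Path


def extract_archives(root=".", archives=("data.zip", "logs.zip")):
    """Hypothetical helper: unzip each downloaded archive directly into the repository root.

    Assumes each archive contains a top-level folder (data/ or logs/), so that
    ./data and ./logs end up next to the scripts after extraction.
    """
    root = Path(root)
    extracted = []
    for name in archives:
        archive = root / name
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)  # archives come from the project's OSF page
        extracted.append(archive)
    return extracted
```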
You can use the following command to load our pre-trained model and ask questions interactively.
```bash
# add `--verbose` to check all intermediate results, including entity linking results and generated final queries
python -u generator_main.py --resume_prefix v2 --device 0
```
A snapshot of our system answering the question “Please enumerate other papers published by the authors of ‘BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding’” is shown below:
The following commands can be used to train the model from scratch and evaluate it on the test set of the DBLP-QuAD Challenge.
```bash
# Preprocess the data
python -u preprocess_datasets.py
# Fine-tune the model - results are saved in `./logs/[save_prefix]`
python -u finetune.py --save_prefix v1 --target processed_query_converted --max_epochs 30 --gpus 0,1,2,3 --save_best --batch_size 10 --learning_rate 5e-6
# Inference - results are saved in `./logs/[resume_prefix]/inference_heldout.json`
python -u inference.py --input data/DBLP-QuAD/dblp.heldout.500.questionsonly.json --resume_prefix v1 --device 0 --batch_size 12 --use_convert
# Postprocess - results are saved in `./logs/[resume_prefix]/postprocess_heldout.json`
python -u postprocess.py --resume_prefix v1
# Run the queries - results are saved in `./logs/[resume_prefix]/answer.txt`
python -u do_query.py --resume_prefix v1
```
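Each step reads the outputs of the previous one, so it is convenient to run them in order and stop at the first failure. Below is a minimal Python sketch that reproduces the commands above with the same flags; the helpers `build_pipeline` and `run_pipeline` are hypothetical and not part of the repository. It uses `sys.executable` in place of `python` so the current interpreter is used, and it assumes it is run from the repository root.

```python
import subprocess
import sys


def build_pipeline(prefix="v1", gpus="0,1,2,3", device="0"):
    """Return the five train-and-evaluate commands, in order, with the flags shown above.

    `prefix` is used as --save_prefix for training and --resume_prefix for every later step.
    """
    py = [sys.executable, "-u"]
    return [
        py + ["preprocess_datasets.py"],
        py + ["finetune.py", "--save_prefix", prefix,
              "--target", "processed_query_converted", "--max_epochs", "30",
              "--gpus", gpus, "--save_best", "--batch_size", "10",
              "--learning_rate", "5e-6"],
        py + ["inference.py",
              "--input", "data/DBLP-QuAD/dblp.heldout.500.questionsonly.json",
              "--resume_prefix", prefix, "--device", device,
              "--batch_size", "12", "--use_convert"],
        py + ["postprocess.py", "--resume_prefix", prefix],
        py + ["do_query.py", "--resume_prefix", prefix],
    ]


def run_pipeline(prefix="v1", **kwargs):
    """Run each step; check=True raises CalledProcessError at the first non-zero exit."""
    for cmd in build_pipeline(prefix, **kwargs):
        print("Running:", " ".join(cmd), flush=True)
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("v1")` mirrors the commands above, while `run_pipeline("v3", gpus="0", device="0")` would keep a separate run under `./logs/v3/` on a single GPU.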
```bibtex
@inproceedings{DBLP:conf/semweb/0003ZRRB23,
  author    = {Ruijie Wang and
               Zhiruo Zhang and
               Luca Rossetto and
               Florian Ruosch and
               Abraham Bernstein},
  editor    = {Debayan Banerjee and
               Ricardo Usbeck and
               Nandana Mihindukulasooriya and
               Gunjan Singh and
               Raghava Mutharaju and
               Pavan Kapanipathi},
  title     = {NLQxform: {A} Language Model-based Question to {SPARQL} Transformer},
  booktitle = {Joint Proceedings of Scholarly {QALD} 2023 and SemREC 2023 co-located
               with 22nd International Semantic Web Conference {ISWC} 2023, Athens,
               Greece, November 6-10, 2023},
  series    = {{CEUR} Workshop Proceedings},
  volume    = {3592},
  publisher = {CEUR-WS.org},
  year      = {2023},
  url       = {https://ceur-ws.org/Vol-3592/paper2.pdf},
  timestamp = {Tue, 02 Jan 2024 17:44:44 +0100},
  biburl    = {https://dblp.org/rec/conf/semweb/0003ZRRB23.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```