
This repository includes the code of our paper NLQxform: A Language Model-based Question to SPARQL Transformer.


Ruijie Wang, Zhiruo Zhang, Luca Rossetto, Florian Ruosch, and Abraham Bernstein

🏆 Winner of the DBLP-QuAD KGQA Task of the Scholarly QALD Challenge at the 22nd International Semantic Web Conference (ISWC 2023).

💥 Based on NLQxform, we developed NLQxform-UI, an easy-to-use web-based interactive QA system over the DBLP Knowledge Graph, which is also open-sourced.

❗ Please note that SSL certificate verification is disabled in do_query.py and generator_main.py due to an SSL: CERTIFICATE_VERIFY_FAILED error that we recently encountered when querying the SPARQL endpoint. This is only a temporary workaround. Please do not send sensitive information through it or adapt the code to query other servers.
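For reference, a workaround of this kind typically looks like the following. This is a minimal sketch of the common Python pattern, not the exact code in do_query.py or generator_main.py:

import ssl

# Replace the default HTTPS context with one that skips certificate
# verification (insecure; acceptable only as a temporary workaround).
ssl._create_default_https_context = ssl._create_unverified_context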


Environment Setup

Please set up a Python environment with PyTorch, Pandas, Transformers, SacreBLEU, rouge-score, SPARQLWrapper, Colorama, NumPy, and Beautiful Soup 4 installed.
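If you use pip, the following command installs all of these dependencies. Versions are not pinned here; this is one possible setup, not necessarily the exact environment we used:

pip install torch pandas transformers sacrebleu rouge-score SPARQLWrapper colorama numpy beautifulsoup4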


Data and Models

Please download the data and models from OSF. (Unzip data.zip and logs.zip and move them to the root directory of this repository.)
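For example, assuming the archives unpack into data/ and logs/ directories:

# Unzip the downloaded archives in the repository root
unzip data.zip
unzip logs.zip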


Inference using our Pre-trained Models

You can use the following command to load our pre-trained model and ask questions interactively.

# add `--verbose` to check all intermediate results, including entity linking results and generated final queries
python -u generator_main.py --resume_prefix v2 --device 0

A snapshot of our system answering the question "please enumerate other papers published by the authors of 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding'" is shown below:

[Figure: overall results of the interactive session]


Training from Scratch

The following commands can be used to train the model from scratch and evaluate it on the test set of the DBLP-QuAD Challenge.

# Preprocess the data
python -u preprocess_datasets.py

# Fine-tune the model - results are saved in `./logs/[save_prefix]`
python -u finetune.py --save_prefix v1 --target processed_query_converted --max_epochs 30 --gpus 0,1,2,3 --save_best --batch_size 10 --learning_rate 5e-6

# Inference - results are saved in `./logs/[resume_prefix]/inference_heldout.json`
python -u inference.py --input data/DBLP-QuAD/dblp.heldout.500.questionsonly.json --resume_prefix v1 --device 0 --batch_size 12 --use_convert

# Postprocess - results are saved in `./logs/[resume_prefix]/postprocess_heldout.json`
python -u postprocess.py --resume_prefix v1

# Query the SPARQL endpoint - results are saved in `./logs/[resume_prefix]/answer.txt`
python -u do_query.py --resume_prefix v1
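For reference, the querying step relies on SPARQLWrapper. Below is a minimal sketch of how a generated query can be sent to a DBLP SPARQL endpoint; the endpoint URL and the query itself are illustrative assumptions, not the exact ones used in do_query.py:

# Minimal sketch: send a SPARQL query and print the resulting bindings.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://sparql.dblp.org/sparql")  # assumed endpoint
sparql.setQuery("""
    PREFIX dblp: <https://dblp.org/rdf/schema#>
    SELECT ?title WHERE { ?paper dblp:title ?title . } LIMIT 5
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["title"]["value"])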

Citation

@inproceedings{DBLP:conf/semweb/0003ZRRB23,
  author       = {Ruijie Wang and
                  Zhiruo Zhang and
                  Luca Rossetto and
                  Florian Ruosch and
                  Abraham Bernstein},
  editor       = {Debayan Banerjee and
                  Ricardo Usbeck and
                  Nandana Mihindukulasooriya and
                  Gunjan Singh and
                  Raghava Mutharaju and
                  Pavan Kapanipathi},
  title        = {NLQxform: {A} Language Model-based Question to {SPARQL} Transformer},
  booktitle    = {Joint Proceedings of Scholarly {QALD} 2023 and SemREC 2023 co-located
                  with 22nd International Semantic Web Conference {ISWC} 2023, Athens,
                  Greece, November 6-10, 2023},
  series       = {{CEUR} Workshop Proceedings},
  volume       = {3592},
  publisher    = {CEUR-WS.org},
  year         = {2023},
  url          = {https://ceur-ws.org/Vol-3592/paper2.pdf},
  timestamp    = {Tue, 02 Jan 2024 17:44:44 +0100},
  biburl       = {https://dblp.org/rec/conf/semweb/0003ZRRB23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
