The exponential growth of knowledge graphs necessitates effective and efficient methods for their exploration and understanding. A Frequently Asked Questions (FAQ) list is a set of questions and answers on a specific topic, intended to help people understand that subject. In this paper, we present ULYSSES, the first system for automatically constructing a FAQ for large knowledge graphs. Our method consists of three key steps. First, we select the most frequent questions by exploiting available query logs. Next, we answer the selected queries using the original graph. Finally, we construct textual descriptions of both the queries and the corresponding answers by exploiting state-of-the-art transformer models, i.e., ChatGPT and Gemini. We evaluate the results of each model using a human-constructed FAQ, contributing a unique dataset to the domain and showing the benefits of our approach.
Online demo available at: ULYSSES DEMO
You can use the Python script in this folder to query the SPARQL endpoint with the most frequent SPARQL queries from the log provided in the data folder.
The script parse_tsv_file.py uses the SPARQL query log (from the data folder) to query the endpoint and collect the answers/results; a rough sketch of this step is given below.
It will create:
- A folder containing JSON files with the responses returned when querying the DBpedia endpoint with the most frequent queries (response_folder)
- An Excel file containing the saved DataFrames and the counts from executing the queries against the DBpedia endpoint (db.xlsx); it holds the most frequent queries, which are used for FAQ creation

Both outputs are given as input to the ULYSSES LLM query script.
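The snippet below is a minimal sketch of this parsing/query step, not the actual parse_tsv_file.py. It assumes the log is a TSV file with a `query` column, that the public DBpedia endpoint is used, and that the output names (`response_folder`, `db.xlsx`) follow the descriptions above; adjust these assumptions to your setup.

```python
# Minimal sketch of the parsing/query step (assumed log format and file names).
import json
import os
from collections import Counter

import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON

LOG_FILE = "data/query_log.tsv"   # assumed location of the SPARQL query log
OUT_DIR = "response_folder"
TOP_K = 100                       # number of most frequent queries to keep

# Count how often each query string appears in the log.
log = pd.read_csv(LOG_FILE, sep="\t")
counts = Counter(log["query"].dropna())
most_frequent = counts.most_common(TOP_K)

# Run each frequent query against the DBpedia endpoint and save the JSON reply.
endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setReturnFormat(JSON)
os.makedirs(OUT_DIR, exist_ok=True)

rows = []
for i, (query, count) in enumerate(most_frequent):
    endpoint.setQuery(query)
    try:
        result = endpoint.query().convert()
    except Exception as exc:      # skip queries the endpoint rejects
        print(f"Query {i} failed: {exc}")
        continue
    with open(os.path.join(OUT_DIR, f"response_{i}.json"), "w") as f:
        json.dump(result, f, indent=2)
    rows.append({"id": i, "query": query, "count": count})

# Save the frequent queries and their counts for the LLM step.
pd.DataFrame(rows).to_excel("db.xlsx", index=False)
```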
- main.py: reads the SPARQL queries in Results.xlsx and their responses in response_folder (the JSON-style replies to the SPARQL queries from the endpoint, produced by the parser) and, via the OpenAI API, translates the queries/responses into plain-English question/answer pairs (see the sketch at the end of this section)
- Response_folder: the folder containing the endpoint's responses to the queries in JSON format, created by the parser
- Results.xlsx: the actual SPARQL queries (the most frequently used), created by the parser
Inputs:
- You have to use your own OpenAI secret API key in the appropriate spots in the code.
- The same code, with minor changes, was used to leverage Google's Gemini in the procedure.
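For reference, the snippet below sketches how the LLM translation step can be implemented with the OpenAI Python client. The model name, the prompt wording, and the assumed `id`/`query` columns are illustrative choices, not necessarily the ones used in main.py.

```python
# Sketch of the query/answer verbalisation step (assumed model, prompt and columns).
import json

import pandas as pd
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # use your own secret key

queries = pd.read_excel("Results.xlsx")

faq = []
for _, row in queries.iterrows():
    # Load the saved endpoint response for this query.
    with open(f"response_folder/response_{row['id']}.json") as f:
        response = json.load(f)

    prompt = (
        "Rewrite the following SPARQL query as a plain-English question and "
        "summarise its JSON results as a plain-English answer.\n\n"
        f"SPARQL query:\n{row['query']}\n\n"
        f"JSON results:\n{json.dumps(response)[:4000]}"  # truncate long results
    )
    completion = client.chat.completions.create(
        model="gpt-4o",               # assumed model; adjust as needed
        messages=[{"role": "user", "content": prompt}],
    )
    faq.append(completion.choices[0].message.content)

print("\n\n".join(faq))
```

For Gemini, the same loop can call Google's client instead of OpenAI's; the rest of the pipeline stays unchanged.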