Skip to content

giannisvassiliou/KGFaq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

97 Commits
 
 
 
 
 
 
 
 

Repository files navigation

ULYSSES: FreqUentLY ASked QueStions for KnowlEdge GraphS

The exponential growth of knowledge graphs necessitates effective and efficient methods for their exploration and understanding. Frequently Asked Questions (FAQ) is a list of questions and answers related to a specific topic intended to help people understand a particular subject. In this paper, we present ULYSSES, the first system for automatically constructing a FAQ for large Knowledge Graphs. Our method consists of three key steps. First, we select the most frequent questions by exploiting available query logs. In the sequel, we answer the selected queries using the original graph and finally, we construct textual descriptions of both the queries and the corresponding answers exploring state-of-the-art transformer models, i.e., ChatGTP and Gemini. We evaluate the results of each model using a human-constructed FAQ, contributing a unique dataset to the domain and showing the benefits of our approach

ON LINE DEMO AT ULYSSES DEMO

ULYSSES Parser


You can use the Python script in this folder, to query the SPARQL ENDPOINT, with the most frequent SPARQL queries from the log provided in the data folder.


The script parse_tsv_file.py will USE THE SPARQL QUERY LOG (FROM DATA FOLDER) to query the Endpoint, and collect the answers/results .

It will create:

  • A Folder containing json files with their responses, returned from querying DBpedia Endpoint with the most frequent queries (response_folder)
  • An Excel file created by the script containing the saved DataFrames and the counts from the execution of the queries in the DBpedia endpoint(db.xlsx) - (contains the most frequent queries , which will be used for FAQ creation)

BOTH to be given as input to the ULYSSES LLM query script



ULYSSES LLM Query

You can use the Python script in this folder, to send to ChatGPT the most frequent queries, along with their output as collected from the previous (parser) script, to create the plain English question/answer pairs.

  • main.py : reads the SPARQL queries in Results.xlsx and their responses in response_folder (contains json style replies of the SPARQL queries from Endpoint-done by Parser) and via the OpenAI API, translates the queries/responses to plain English questions/answers pairs

  • Inputs:

    • Response_folder: The folder containing the ENDPOINT's responses to the queries in json format, created by parser
    • Results.xlsx: The actual SPARQL queries (most frequent used), created by parser


  • You have to use YOUR OWN OpenAI's ChatGPT secret key, in the appropriate spots in the code .
  • The same code with minor changes, was used for leveraging Google's GEMINI in the procedure .

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages