## Example use-case

In this notebook, we showcase a simple question-answering task. We will use the [SPHN ontology](https://www.biomedit.ch/rdf/sphn-ontology/sphn), along with a small mock dataset containing artificial medical data.

In [1]:

SPARQL_TEMPLATE = """
Generate a SPARQL query to answer the input question. A sample of the knowledge graph schema is provided to help construct the query.
After you generate the sparql, you should display it.
When generating sparql:
* never enclose the sparql in back-quotes.
* always include the prefix declarations.
* prefer using OPTIONAL when selecting multiple variables.
* Allow case-insensitive matching of strings.

Use the following format:

Question: the input question for which you must generate a SPARQL query
Information: the schema information in RDF format. This will help you generate the sparql query with the correct format.

Question: {question_str}
Information:
{context_str}
Answer:
"""

ANSWER_TEMPLATE = """
The following describes a user question, the associated SPARQL query and the result of executing the query.
Based on this information, write an answer in simple terms that describes the results.
When appropriate, use markdown formatting to format the results into a table or bullet points.

Question:
{question_str}
Query:
{query_str}
Result:
{result_str}
Answer:
"""

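Both templates rely on named placeholders (`{question_str}`, `{context_str}`, `{query_str}`, `{result_str}`). How `aikg` fills them internally is not shown in this notebook, so the snippet below is only a minimal sketch that uses plain `str.format` on a shortened stand-in template to illustrate the substitution. The example question and context strings are made up for illustration.

```python
# Shortened stand-in for SPARQL_TEMPLATE that keeps only its placeholders.
template = """Question: {question_str}
Information:
{context_str}
Answer:
"""

# Illustrative values; in the notebook the context comes from the embedded ontology.
prompt = template.format(
    question_str="How many healthcare encounters were recorded per year?",
    context_str="<schema triples retrieved from the vector store>",
)
print(prompt)
```

One thing to keep in mind with this style of template: any literal curly brace in the template text itself would have to be doubled (`{{` and `}}`) so that it is not read as a placeholder. Braces inside the substituted values are left untouched.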
We set up a configuration similar to the one in the nl_sparql notebook, but with one SPARQL configuration for the ontology and one for the instance data, each stored in a separate file.

In [2]:
from aikg.config import ChatConfig, ChromaConfig, SparqlConfig

chroma_config = ChromaConfig(
    host="local",
    port=8000,
    collection_name="test",
    embedding_model="all-MiniLM-L6-v2",
    persist_directory="/tmp/chroma-test/",
)
ontology_config = SparqlConfig(
    endpoint="../sphn/sphn_ontology_2023_2.ttl",
)
kg_config = SparqlConfig(
    endpoint="../sphn/sphn_mock_data_2023_2.ttl",
)

chat_config = ChatConfig(
    model_id="lmsys/vicuna-7b-v1.3",
    max_new_tokens=48,
    max_input_size=2048,
    num_output=256,
    max_chunk_overlap=20,
    answer_template=ANSWER_TEMPLATE,
    sparql_template=SPARQL_TEMPLATE,
)
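
Note that both `SparqlConfig` objects point `endpoint` at a local Turtle file rather than at a remote SPARQL service. How `aikg` tells the two cases apart is not shown here. The sketch below is one possible way to make that distinction with the standard library; `is_remote_endpoint` is a hypothetical helper and the `https://example.org/sparql` URL is a placeholder, neither is part of `aikg`.

```python
from urllib.parse import urlparse


def is_remote_endpoint(endpoint: str) -> bool:
    """Return True for http(s) URLs, False for anything we treat as a local path."""
    return urlparse(endpoint).scheme in ("http", "https")


examples = [
    "../sphn/sphn_ontology_2023_2.ttl",
    "../sphn/sphn_mock_data_2023_2.ttl",
    "https://example.org/sparql",  # hypothetical remote endpoint
]
for endpoint in examples:
    kind = "remote" if is_remote_endpoint(endpoint) else "local file"
    print(f"{endpoint}: {kind}")
```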


In [3]:
import os
os.environ["OPENAI_API_KEY"] = "sk-...a"

In [4]:

from aikg.utils.llm import setup_llm_chain
from aikg.utils.rdf import setup_kg


# Use OpenAI API
from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003")

# For now, both chains share the same model to spare memory
answer_chain = setup_llm_chain(llm, chat_config.answer_template)
sparql_chain = setup_llm_chain(llm, chat_config.sparql_template)
kg = setup_kg(**kg_config.dict())

# Embed ontology
from aikg.flows.chroma_build import chroma_build_flow
chroma_build_flow(chroma_config, ontology_config)

Failed to convert Literal lexical form to value. Datatype=http://www.w3.org/2001/XMLSchema#gYear, Converter=<function parse_date at 0x7f44061e09d0>
Traceback (most recent call last):
  File "/home/cmatthey/.cache/pypoetry/virtualenvs/aikg-ULDgE_fB-py3.10/lib/python3.10/site-packages/rdflib/term.py", line 2084, in _castLexicalToPython
    return conv_func(lexical)  # type: ignore[arg-type]
  File "/home/cmatthey/.cache/pypoetry/virtualenvs/aikg-ULDgE_fB-py3.10/lib/python3.10/site-packages/isodate/isodates.py", line 203, in parse_date
    raise ISO8601Error('Unrecognised ISO 8601 date format: %r' % datestring)
isodate.isoerror.ISO8601Error: Unrecognised ISO 8601 date format: '-1508+14:00'
Failed to convert Literal lexical form to value. Datatype=http://www.w3.org/2001/XMLSchema#gYear, Converter=<function parse_date at 0x7f44061e09d0>
Traceback (most recent call last):
  File "/home/cmatthey/.cache/pypoetry/virtualenvs/aikg-ULDgE_fB-py3.10/lib/python3.10/site-packages/rdflib/term.py", lin
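
The messages above report that the converter could not parse the `xsd:gYear` literal `'-1508+14:00'`. The notebook carries on past them, which suggests they are warnings rather than fatal errors. The value itself appears to be a legal `xsd:gYear` lexical form (a negative year with a `+14:00` timezone offset), so the problem seems to lie with the date parser rather than with the data. As a rough check, the sketch below tests strings against the `gYear` lexical pattern as we understand it from XML Schema 1.1; `is_valid_gyear` is a hypothetical helper, not part of rdflib or `aikg`.

```python
import re

# Lexical form of xsd:gYear as we read it in XML Schema 1.1:
# optional minus sign, at least four year digits, and an optional
# timezone that is either "Z" or an offset up to +/-14:00.
GYEAR_RE = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"
    r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
)


def is_valid_gyear(lexical: str) -> bool:
    """Return True if the whole string matches the gYear lexical pattern."""
    return GYEAR_RE.fullmatch(lexical) is not None


for value in ["-1508+14:00", "2009", "09", "2009-01"]:
    print(value, is_valid_gyear(value))
```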

In [5]:

from aikg.utils.chroma import setup_client, setup_collection
client = setup_client(
    chroma_config.host,
    chroma_config.port,
    chroma_config.persist_directory,
)
collection = setup_collection(
    client,
    chroma_config.collection_name,
    chroma_config.embedding_model,
)




In [45]:
QUESTION = "Please give me the number of healthcare encounters recorded per year."

In [46]:
from aikg.utils.chat import generate_sparql
query = generate_sparql(QUESTION, collection, sparql_chain)
print(query)


PREFIX ns1: <http://www.w3.org/2004/02/skos/core#>
PREFIX ns2: <https://biomedit.ch/rdf/sphn-ontology/sphn#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT (COUNT(*) AS ?encounters) (YEAR(?startDateTime) AS ?year)
WHERE {
  ?encounter a ns2:HealthcareEncounter ;
    ns2:hasStartDateTime ?startDateTime .
}
GROUP BY ?year
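
Here the model followed the prompt: the query has no back-quotes and includes prefix declarations. Since a language model may not always comply, it can be worth checking the output before executing it. The following is a minimal sketch of such a check; `clean_sparql` and `has_prefixes` are hypothetical helpers written for this example and are not part of `aikg`.

```python
import re

FENCE = "`" * 3  # markdown code fence, built programmatically


def clean_sparql(text: str) -> str:
    """Strip a surrounding markdown code fence, if any, from model output."""
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the whole first line (fence plus optional language tag).
        text = text.split("\n", 1)[1] if "\n" in text else ""
    if text.rstrip().endswith(FENCE):
        text = text.rstrip()[: -len(FENCE)]
    return text.strip()


def has_prefixes(query: str) -> bool:
    """Check that at least one PREFIX declaration is present."""
    pattern = r"^\s*PREFIX\s+\S*:\s*<"
    return re.search(pattern, query, re.IGNORECASE | re.MULTILINE) is not None


raw = (
    FENCE + "sparql\n"
    "PREFIX ex: <http://example.org/>\n"
    "SELECT * WHERE { ?s ?p ?o }\n" + FENCE
)
cleaned = clean_sparql(raw)
print(cleaned)
print(has_prefixes(cleaned))
```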


In [47]:
from aikg.utils.rdf import query_kg
results = query_kg(kg, query)
print(results)

[(rdflib.term.Literal('2', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')), rdflib.term.Literal('2009', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))]
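
The raw result is a list of tuples of rdflib `Literal` objects, which is hard to read. The answer template asks for markdown tables where appropriate, and the same formatting can be done by hand. The sketch below assumes the literals have already been converted to plain Python values (for instance with rdflib's `Literal.toPython()`), and the column names are taken from the variables in the query above. `to_markdown_table` is a hypothetical helper, not part of `aikg`.

```python
def to_markdown_table(headers, rows):
    """Render a header list and a list of row tuples as a markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(value) for value in row) + " |")
    return "\n".join(lines)


# Values copied from the result above, assumed converted to plain ints.
rows = [(2, 2009)]
table = to_markdown_table(["encounters", "year"], rows)
print(table)
```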


In [48]:
from aikg.utils.chat import generate_answer
print(generate_answer(QUESTION, query, results, answer_chain))

In 2009 there were 2 healthcare encounters recorded.
