# Test LLMs

This notebook tests candidate LLMs for use in the Coptic Metadata Viewer as the engine for generating SPARQL queries and interpreting the responses.

The candidate LLMs were selected according to size (<20GB), availability on Ollama, and time of most recent update. They are:

- gpt-oss (OpenAI)
- deepseek-r1:8b (Deepseek)
- gemma3:27b (Google)
- mistral-small3.2 (Mistral)
- qwen3:8b (Alibaba)
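
Before running the tests, it can help to confirm that every candidate has actually been pulled into the local Ollama instance. The following is a minimal sketch, assuming Ollama is serving its REST API at the default local address and that the `/api/tags` endpoint returns a `models` list with a `name` field per entry. The helper names (`normalize`, `installed_models`, `missing_models`) are hypothetical and not part of this notebook.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama is running locally on its default port.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def normalize(name: str) -> str:
    """Treat a model name without a tag as ':latest'."""
    return name if ":" in name else f"{name}:latest"


def installed_models(url: str = OLLAMA_TAGS_URL) -> list[str]:
    """Fetch the names of locally installed models (needs a running server)."""
    with urlopen(url, timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def missing_models(wanted: list[str], installed: list[str]) -> list[str]:
    """Return the wanted models that do not appear in the installed list."""
    have = {normalize(n) for n in installed}
    return [w for w in wanted if normalize(w) not in have]
```

With a server running, `missing_models(["gpt-oss", "qwen3:8b"], installed_models())` would list whichever candidates still need an `ollama pull`.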

## Import the necessary libraries

In [None]:
from langchain_community.graphs import OntotextGraphDBGraph
import os
from langchain.chains import OntotextGraphDBQAChain
from langchain_ollama.llms import OllamaLLM
from langchain.prompts import PromptTemplate
from time import time

## Initialize the graph

In [3]:
graph = OntotextGraphDBGraph(
    query_endpoint="http://localhost:7200/repositories/coptic-metadata-viewer",
    local_file="/Users/sjhuskey/Python/coptic_metadata_viewer/data/coptic-metadata-viewer.ttl",
)
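
The endpoint and the ontology file path above are specific to one machine. A small sketch of one way to make them overridable follows. The environment variable names `COPTIC_GRAPHDB_ENDPOINT` and `COPTIC_ONTOLOGY_FILE` and the helper `graph_settings` are hypothetical, and the defaults simply mirror the values in the cell above.

```python
import os

# Defaults copied from the cell above.
DEFAULT_ENDPOINT = "http://localhost:7200/repositories/coptic-metadata-viewer"
DEFAULT_TTL = "/Users/sjhuskey/Python/coptic_metadata_viewer/data/coptic-metadata-viewer.ttl"


def graph_settings(env=os.environ) -> dict:
    """Build keyword arguments for the graph, preferring environment overrides."""
    return {
        # Hypothetical variable names; choose whatever suits the deployment.
        "query_endpoint": env.get("COPTIC_GRAPHDB_ENDPOINT", DEFAULT_ENDPOINT),
        "local_file": env.get("COPTIC_ONTOLOGY_FILE", DEFAULT_TTL),
    }
```

The result could then be passed as `OntotextGraphDBGraph(**graph_settings())`, which leaves the notebook's behavior unchanged when neither variable is set.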

## Create the templates

In [4]:
GRAPHDB_SPARQL_GENERATION_TEMPLATE = """
  You are an expert in SPARQL queries and RDF graph structures. Your task is to generate a SPARQL query based on the provided schema and the user's question.
  Use only the node types and properties provided in the schema.
  Do not use any node types or properties not explicitly listed.
  Include all necessary PREFIX declarations.
  Return only the SPARQL query.
  Use a (shorthand for rdf:type) to declare the class of a subject.
  Do not wrap the query in backticks.
  Do not use triple backticks or any markdown formatting.
  Do not include any text except the SPARQL query generated.
  Always return a human-readable label instead of a URI when possible.
  Do not use multiple WHERE clauses.
  
  Your RDF graph has these prefixes, classes, and properties:

    @prefix coptic: <http://www.semanticweb.org/sjhuskey/ontologies/2025/7/coptic-metadata-viewer/> .
    @prefix dcmitype: <http://purl.org/dc/dcmitype/> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix frbr: <http://purl.org/vocab/frbr/core#> .
    @prefix lawd: <http://lawd.info/ontology/> .
    @prefix ns1: <http://lexinfo.net/ontology/2.0/lexinfo#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix schema1: <http://schema.org/> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix time: <http://www.w3.org/2006/time#> .
    
    - coptic:Colophon:
      - dcterms:description
      - dcterms:identifier
      - dcterms:isPartOf
      - dcterms:references
      - dcterms:type
      - ns1:translation
      - rdf:type
      - time:hasBeginning
      - time:hasEnd

    - coptic:Title:
      - dcterms:description
      - dcterms:identifier
      - dcterms:isPartOf
      - dcterms:references
      - dcterms:type
      - rdf:type

    - dcmitype:Collection:
      - dcterms:hasPart
      - dcterms:identifier
      - dcterms:spatial
      - rdf:type

    - dcterms:Agent:
      - dcterms:creator
      - dcterms:description
      - dcterms:identifier
      - foaf:name
      - owl:sameAs
      - rdf:type
      - schema1:title

    - dcterms:PhysicalResource:
      - dcterms:bibliographicCitation
      - dcterms:description
      - dcterms:hasPart
      - dcterms:identifier
      - dcterms:isPartOf
      - dcterms:medium
      - rdf:type
      - time:hasBeginning
      - time:hasEnd

    - foaf:Person:
      - dcterms:identifier
      - dcterms:isReferencedBy
      - foaf:name
      - ns1:transliteration
      - rdf:type
      - rdfs:label
      - schema1:birthPlace
      - schema1:gender
      - schema1:roleName
      - time:hasBeginning
      - time:hasEnd

    - frbr:Work:
      - dcterms:creator
      - dcterms:description
      - dcterms:identifier
      - dcterms:isPartOf
      - dcterms:isReferencedBy
      - dcterms:temporal
      - dcterms:title
      - rdf:type
      - rdfs:label

    - lawd:Place:
      - lawd:primaryForm
      - rdf:type
      - rdfs:label
      - skos:exactMatch
    
    Schema: {schema}
  
    The question delimited by triple backticks is:
  ```
  {prompt}
  ```
  """
GRAPHDB_SPARQL_GENERATION_PROMPT = PromptTemplate(
      input_variables=["schema", "prompt"],
      template=GRAPHDB_SPARQL_GENERATION_TEMPLATE,
  )
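
To make the role of the `{schema}` and `{prompt}` placeholders concrete, here is a small sketch that fills a shortened stand-in template with plain `str.format`. This is an illustration only: it assumes that the prompt template's default formatting behaves like `str.format` for simple placeholders, and the names `TEMPLATE` and `render` are hypothetical.

```python
# Shortened stand-in for the generation template; the real one is much longer.
TEMPLATE = (
    "Return only the SPARQL query.\n"
    "Schema: {schema}\n"
    "Question: {prompt}\n"
)


def render(template: str, **values: str) -> str:
    """Substitute named placeholders, as the prompt template is assumed to do."""
    return template.format(**values)


rendered = render(
    TEMPLATE,
    schema="frbr:Work: dcterms:title, dcterms:creator",
    prompt="Who wrote 'Sermo asceticus'?",
)
print(rendered)

# Literal braces must be doubled, which matters if a SPARQL example
# is ever added to the template text itself.
example = "SELECT ?s WHERE {{ ?s a frbr:Work }}".format()
print(example)
```

The second part is the practical point: the template above contains no literal braces apart from its two placeholders, so it is safe as written, but an embedded example query would need `{{` and `}}`.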

In [5]:
GRAPHDB_QA_TEMPLATE = """Task: Generate a natural language response from the results of a SPARQL query.
  You are an assistant that creates well-written and human understandable answers.
  The information part contains the information provided, which you can use to construct an answer.
  The information provided is authoritative, you must never doubt it or try to use your internal knowledge to correct it.
  Make your response sound like the information is coming from an AI assistant, but don't add any information.
  Don't use internal knowledge to answer the question, just say you don't know if no information is available.
  Information:
  {context}
  Question: {prompt}
  Helpful Answer:"""
GRAPHDB_QA_PROMPT = PromptTemplate(
      input_variables=["context", "prompt"], template=GRAPHDB_QA_TEMPLATE
  )

In [6]:
GRAPHDB_SPARQL_FIX_TEMPLATE = """
  The following SPARQL query delimited by triple backticks
  ```
  {generated_sparql}
  ```
  is not valid.
  The error delimited by triple backticks is
  ```
  {error_message}
  ```
  - Do NOT include any Markdown formatting (e.g. no ```sparql or backticks).
  - Do NOT output explanations, only the corrected query.
  - Always start with any necessary PREFIX declarations.
  - Use `a` instead of `rdf:type` to state class membership.
  - Ensure that classes like `frbr:Work` are used as objects, not predicates.
  - Fix common mistakes like using classes as predicates, missing semicolons, or malformed FILTER clauses.

  Only output a valid, working SPARQL query.
  ```
  {schema}
  ```
  """
  
GRAPHDB_SPARQL_FIX_PROMPT = PromptTemplate(
      input_variables=["error_message", "generated_sparql", "schema"],
      template=GRAPHDB_SPARQL_FIX_TEMPLATE,
  )
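
Both the generation and the fix prompts tell the model not to wrap the query in Markdown, but smaller models do not always comply. As a defensive measure, a helper along the following lines could remove a surrounding fence before a query is executed. This is a standalone sketch: the name `strip_code_fences` is hypothetical, and how (or whether) it can be hooked into the chain is not covered here.

```python
import re

# Built programmatically so that this example contains no literal fence.
TICKS = "`" * 3

# An opening fence with an optional language tag, the body, and a closing fence.
_FENCED = re.compile(
    rf"^\s*{TICKS}[A-Za-z]*[ \t]*\n(.*?)\n?[ \t]*{TICKS}\s*$",
    re.DOTALL,
)


def strip_code_fences(text: str) -> str:
    """Return the query without a surrounding Markdown fence, if there is one."""
    match = _FENCED.match(text)
    return (match.group(1) if match else text).strip()


fenced = f"{TICKS}sparql\nSELECT ?s WHERE {{ ?s ?p ?o }}\n{TICKS}"
print(strip_code_fences(fenced))
```

Text that is not fenced passes through unchanged apart from whitespace trimming, so the helper is safe to apply to every generated query.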

## Test the LLMs

Here I test each LLM on a series of five questions:

1. Who wrote "Sermo asceticus"?
2. "Sermo asceticus" is part of which manuscript?
3. Manuscript 575 is part of which collection?
4. Which manuscripts contain a work with the title "Bible: Epistulae Pauli"?
5. What is the description of manuscript 130?

I record the responses and the elapsed time for each.

In [8]:
# The LLMs to test
LLMS = ["gpt-oss:latest", "mistral-small3.2:latest", "deepseek-r1:8b", "gemma3:27b", "qwen3:8b"]

# The questions to ask the LLMs
questions = [
    "Who wrote 'Sermo asceticus'?",
    "'Sermo asceticus' is part of which manuscript?",
    "Manuscript 575 is part of which collection?",
    "Which manuscripts contain a work with the title 'Bible: Epistulae Pauli'?",
    "What is the description of manuscript 130?"
]

# Function to create a chain for a specific LLM
def make_chain(llm):
    return OntotextGraphDBQAChain.from_llm(
        llm=OllamaLLM(model=llm, temperature=0),
        graph=graph,
        sparql_generation_prompt=GRAPHDB_SPARQL_GENERATION_PROMPT,
        qa_prompt=GRAPHDB_QA_PROMPT,
        sparql_fix_prompt=GRAPHDB_SPARQL_FIX_PROMPT,
        max_fix_retries=3,
        verbose=True,
        allow_dangerous_requests=True,
    )

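# Function to test the chain
def test_chain(chain):
    responses = []
    for question in questions:
        start = time()
        try:
            response = chain.invoke(question)
        except Exception as e:
            # Record the failure instead of silently dropping the question.
            # Use the same dict shape as a successful response so that
            # the later json_normalize step still works.
            response = {"query": question, "result": f"ERROR: {e}"}
        responses.append((question, response, time() - start))
    return responses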

# Initialize a list for the LLMs' responses
llm_responses = []

# Loop over the LLMs and gather their responses and times
for llm in LLMS:
    chain = make_chain(llm)
    responses = test_chain(chain)
    print(f"Responses for {llm}:\n")
    for question, response, duration in responses:
        print(f"Question: {question}\nResponse: {response}\nDuration: {duration:.2f} seconds\n")
        llm_responses.append((llm, question, response, duration))



> Entering new OntotextGraphDBQAChain chain...
Generated SPARQL:
PREFIX frbr: <http://purl.org/vocab/frbr/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX coptic: <http://www.semanticweb.org/sjhuskey/ontologies/2025/7/coptic-metadata-viewer/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?authorName WHERE {
  ?work a frbr:Work ;
        dcterms:title "Sermo asceticus" ;
        dcterms:creator ?author .
  ?author foaf:name ?authorName .
}

> Finished chain.


> Entering new OntotextGraphDBQAChain chain...
Generated SPARQL:
PREFIX coptic: <http://www.semanticweb.org/sjhuskey/ontologies/2025/7/coptic-metadata-viewer/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?manuscript ?identifier
WHERE {
  {
    ?text rdf:type coptic:Title .
    ?text dcterms:description "Sermo asceticus" .
    ?text dcterms:isPartOf ?manuscript .
  } UNION {
    ?text rdf:type copti

In [9]:
llm_responses

[('gpt-oss:latest',
  "Who wrote 'Sermo asceticus'?",
  {'query': "Who wrote 'Sermo asceticus'?",
   'result': 'The *Sermo asceticus* was written by **Stephen of Thebes**.'},
  51.08218288421631),
 ('gpt-oss:latest',
  "'Sermo asceticus' is part of which manuscript?",
  {'query': "'Sermo asceticus' is part of which manuscript?",
   'result': 'I’m sorry, but I don’t have that information.'},
  129.05509281158447),
 ('gpt-oss:latest',
  'Manuscript 575 is part of which collection?',
  {'query': 'Manuscript 575 is part of which collection?',
   'result': 'I’m sorry, but I don’t have that information.'},
  170.63097500801086),
 ('gpt-oss:latest',
  "Which manuscripts contain a work with the title 'Bible: Epistulae Pauli'?",
  {'query': "Which manuscripts contain a work with the title 'Bible: Epistulae Pauli'?",
   'result': 'I’m sorry, but I don’t have that information.'},
  110.65448808670044),
 ('gpt-oss:latest',
  'What is the description of manuscript 130?',
  {'query': 'What is the de

In [37]:
# Turn llm_responses into a pandas DataFrame
import pandas as pd
df = pd.DataFrame(llm_responses, columns=["LLM", "Question", "Response", "Duration"])


In [38]:
# Split the dicts in the Response column into two new columns:
# the "query" key goes to Query and the "result" key goes to Response Value
df[["Query","Response Value"]] = pd.json_normalize(df['Response'])

In [40]:
df = df[["LLM", "Question", "Response Value", "Duration"]]  # Reorder columns
df = df.rename(columns={"Response Value": "Response","Duration": "Duration (Seconds)"})
df

,LLM,Question,Response,Duration (Seconds)
0,gpt-oss:latest,Who wrote 'Sermo asceticus'?,The *Sermo asceticus* was written by **Stephen...,51.082183
1,gpt-oss:latest,'Sermo asceticus' is part of which manuscript?,"I’m sorry, but I don’t have that information.",129.055093
2,gpt-oss:latest,Manuscript 575 is part of which collection?,"I’m sorry, but I don’t have that information.",170.630975
3,gpt-oss:latest,Which manuscripts contain a work with the titl...,"I’m sorry, but I don’t have that information.",110.654488
4,gpt-oss:latest,What is the description of manuscript 130?,"I’m sorry, but I don’t have that information.",81.75291
5,mistral-small3.2:latest,Who wrote 'Sermo asceticus'?,"Based on the information available, the author...",94.450615
6,mistral-small3.2:latest,'Sermo asceticus' is part of which manuscript?,I don't have the information available to answ...,46.471404
7,mistral-small3.2:latest,Manuscript 575 is part of which collection?,I don't have information about Manuscript 575....,46.053564
8,mistral-small3.2:latest,Which manuscripts contain a work with the titl...,I don't have the information available to answ...,44.642782
9,mistral-small3.2:latest,What is the description of manuscript 130?,The manuscript 130 has an interesting provenan...,66.516913
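
A per-model summary makes the table above easier to compare. The sketch below uses only the standard library and the ten rows shown, with response text truncated exactly as displayed. The non-answer check is a crude heuristic based on assumed marker phrases, and it says nothing about whether the answers that were given are correct. The names `is_non_answer` and `summarize` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (LLM, response text as displayed above, duration in seconds)
rows = [
    ("gpt-oss:latest", "The *Sermo asceticus* was written by **Stephen...", 51.082183),
    ("gpt-oss:latest", "I’m sorry, but I don’t have that information.", 129.055093),
    ("gpt-oss:latest", "I’m sorry, but I don’t have that information.", 170.630975),
    ("gpt-oss:latest", "I’m sorry, but I don’t have that information.", 110.654488),
    ("gpt-oss:latest", "I’m sorry, but I don’t have that information.", 81.75291),
    ("mistral-small3.2:latest", "Based on the information available, the author...", 94.450615),
    ("mistral-small3.2:latest", "I don't have the information available to answ...", 46.471404),
    ("mistral-small3.2:latest", "I don't have information about Manuscript 575....", 46.053564),
    ("mistral-small3.2:latest", "I don't have the information available to answ...", 44.642782),
    ("mistral-small3.2:latest", "The manuscript 130 has an interesting provenan...", 66.516913),
]

# Heuristic assumption: these phrases signal that the model gave no answer.
NO_ANSWER_MARKERS = ("don't have", "don't know")


def is_non_answer(text: str) -> bool:
    """Detect a refusal-style reply, treating curly and straight apostrophes alike."""
    normalized = text.lower().replace("’", "'")
    return any(marker in normalized for marker in NO_ANSWER_MARKERS)


def summarize(rows):
    """Return mean duration and answered/asked counts for each LLM."""
    by_llm = defaultdict(list)
    for llm, response, seconds in rows:
        by_llm[llm].append((response, seconds))
    return {
        llm: {
            "mean_seconds": round(mean(s for _, s in items), 2),
            "answered": sum(not is_non_answer(r) for r, _ in items),
            "asked": len(items),
        }
        for llm, items in by_llm.items()
    }


for llm, stats in summarize(rows).items():
    print(llm, stats)
```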


In [45]:
grouped = df.groupby("Question")[["LLM","Response","Duration (Seconds)"]].agg(list)
grouped

Question,LLM,Response,Duration (Seconds)
'Sermo asceticus' is part of which manuscript?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[129.05509281158447, 46.47140383720398]"
Manuscript 575 is part of which collection?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[170.63097500801086, 46.05356407165527]"
What is the description of manuscript 130?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[81.75290989875793, 66.5169129371643]"
Which manuscripts contain a work with the title 'Bible: Epistulae Pauli'?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[110.65448808670044, 44.64278221130371]"
Who wrote 'Sermo asceticus'?,"[gpt-oss:latest, mistral-small3.2:latest]",[The *Sermo asceticus* was written by **Stephe...,"[51.08218288421631, 94.45061492919922]"


In [46]:
grouped.columns

Index(['LLM', 'Response', 'Duration (Seconds)'], dtype='object')

In [47]:
grouped[["GPT Response","Mistral Response"]] = pd.DataFrame(grouped["Response"].tolist(), index=grouped.index)
grouped

Question,LLM,Response,Duration (Seconds),GPT Response,Mistral Response
'Sermo asceticus' is part of which manuscript?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[129.05509281158447, 46.47140383720398]","I’m sorry, but I don’t have that information.",I don't have the information available to answ...
Manuscript 575 is part of which collection?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[170.63097500801086, 46.05356407165527]","I’m sorry, but I don’t have that information.",I don't have information about Manuscript 575....
What is the description of manuscript 130?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[81.75290989875793, 66.5169129371643]","I’m sorry, but I don’t have that information.",The manuscript 130 has an interesting provenan...
Which manuscripts contain a work with the title 'Bible: Epistulae Pauli'?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[110.65448808670044, 44.64278221130371]","I’m sorry, but I don’t have that information.",I don't have the information available to answ...
Who wrote 'Sermo asceticus'?,"[gpt-oss:latest, mistral-small3.2:latest]",[The *Sermo asceticus* was written by **Stephe...,"[51.08218288421631, 94.45061492919922]",The *Sermo asceticus* was written by **Stephen...,"Based on the information available, the author..."


In [48]:
grouped[["GPT Duration (Seconds)","Mistral Duration (Seconds)"]] = pd.DataFrame(grouped["Duration (Seconds)"].tolist(), index=grouped.index)
grouped

Question,LLM,Response,Duration (Seconds),GPT Response,Mistral Response,GPT Duration (Seconds),Mistral Duration (Seconds)
'Sermo asceticus' is part of which manuscript?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[129.05509281158447, 46.47140383720398]","I’m sorry, but I don’t have that information.",I don't have the information available to answ...,129.055093,46.471404
Manuscript 575 is part of which collection?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[170.63097500801086, 46.05356407165527]","I’m sorry, but I don’t have that information.",I don't have information about Manuscript 575....,170.630975,46.053564
What is the description of manuscript 130?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[81.75290989875793, 66.5169129371643]","I’m sorry, but I don’t have that information.",The manuscript 130 has an interesting provenan...,81.75291,66.516913
Which manuscripts contain a work with the title 'Bible: Epistulae Pauli'?,"[gpt-oss:latest, mistral-small3.2:latest]","[I’m sorry, but I don’t have that information....","[110.65448808670044, 44.64278221130371]","I’m sorry, but I don’t have that information.",I don't have the information available to answ...,110.654488,44.642782
Who wrote 'Sermo asceticus'?,"[gpt-oss:latest, mistral-small3.2:latest]",[The *Sermo asceticus* was written by **Stephe...,"[51.08218288421631, 94.45061492919922]",The *Sermo asceticus* was written by **Stephen...,"Based on the information available, the author...",51.082183,94.450615


In [49]:
grouped = grouped.drop(columns=["LLM","Response","Duration (Seconds)"])
grouped

Question,GPT Response,Mistral Response,GPT Duration (Seconds),Mistral Duration (Seconds)
'Sermo asceticus' is part of which manuscript?,"I’m sorry, but I don’t have that information.",I don't have the information available to answ...,129.055093,46.471404
Manuscript 575 is part of which collection?,"I’m sorry, but I don’t have that information.",I don't have information about Manuscript 575....,170.630975,46.053564
What is the description of manuscript 130?,"I’m sorry, but I don’t have that information.",The manuscript 130 has an interesting provenan...,81.75291,66.516913
Which manuscripts contain a work with the title 'Bible: Epistulae Pauli'?,"I’m sorry, but I don’t have that information.",I don't have the information available to answ...,110.654488,44.642782
Who wrote 'Sermo asceticus'?,The *Sermo asceticus* was written by **Stephen...,"Based on the information available, the author...",51.082183,94.450615
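
The wide table above was built by hand for two models, with the column names `GPT` and `Mistral` hard-coded. As an alternative sketch, `DataFrame.pivot` can produce the same shape for any number of models. The sample data below is hypothetical (two made-up models and questions) and only mirrors the column layout of `df`.

```python
import pandas as pd

# Hypothetical sample in the same long format as df
long_df = pd.DataFrame(
    {
        "LLM": ["model-a", "model-b", "model-a", "model-b"],
        "Question": ["Q1", "Q1", "Q2", "Q2"],
        "Response": ["a1", "b1", "a2", "b2"],
        "Duration (Seconds)": [1.0, 2.0, 3.0, 4.0],
    }
)

# One row per question, one column per (field, LLM) pair
wide = long_df.pivot(
    index="Question",
    columns="LLM",
    values=["Response", "Duration (Seconds)"],
)

# Flatten the two-level column index into names like "model-a Response"
wide.columns = [f"{llm} {field}" for field, llm in wide.columns]
print(wide)
```

Because text and numbers are pivoted together, the duration columns may come back with `object` dtype. Applying `pd.to_numeric` to them afterwards restores a numeric type if further arithmetic is needed.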
