# GraphRDFQAChain

This notebook shows how to use LLMs to provide a natural language interface to an RDF graph database you can query with the SPARQL query language.

You may start with an external RDF store, a public SPARQL HTTP endpoint, or a serialized RDF file. If none of these is provided, an in-memory RDF graph will be created for you. In theory, you may use any RDF store that supports the [W3C SPARQL 1.1 standard](https://www.w3.org/TR/sparql11-query/). An (incomplete) list of RDF store products can be found on this W3C wiki page: https://www.w3.org/wiki/LargeTripleStores


In [33]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import GraphRDFQAChain
from langchain.graphs import RDFGraph

In [34]:
graph = RDFGraph()
# graph = RDFGraph('path/to/graph.ttl')
# graph = RDFGraph('http://localhost:9999/blazegraph/namespace/kb/sparql') # Blazegraph

## Seeding the database

Assuming your database is empty, you can populate it with a SPARQL UPDATE statement. The following `INSERT DATA` statement is idempotent: the database contents will be the same whether you run it once or multiple times.

In [35]:
graph.update("""
PREFIX : <http://example.org/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

INSERT DATA {
  schema:Person a rdfs:Class ;
      rdfs:label "Person" .
      
  schema:Movie a rdfs:Class ;
      rdfs:label "Movie" .

  :TomCruise a schema:Person ;
      schema:name "Tom Cruise" .
  :ValKilmer a schema:Person ;
      schema:name "Val Kilmer" .
  :KellyMcGillis a schema:Person ;
      schema:name "Kelly McGillis" .
  :JenniferConnelly a schema:Person ;
      schema:name "Jennifer Connelly" .

  :TopGun a schema:Movie ;
      schema:name "Top Gun" ;
      schema:datePublished "1986-05-16"^^xsd:date ;
      schema:actor :TomCruise, :ValKilmer, :KellyMcGillis .
  :TopGunMaverick a schema:Movie ;
      schema:name "Top Gun: Maverick" ;
      schema:datePublished "2022-05-27"^^xsd:date ;
      schema:actor :TomCruise, :ValKilmer, :JenniferConnelly .
}
""")

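The idempotency of the seeding statement can be pictured with a toy model in which an RDF graph is a plain set of (subject, predicate, object) triples. This is only a sketch: `Triple`, `insert_data`, and `graph_model` are hypothetical names for illustration and are not part of `RDFGraph`.

```python
# Toy model: an RDF graph behaves like a set of (subject, predicate, object) triples.
# All names here are illustrative and not part of the RDFGraph API.
Triple = tuple[str, str, str]


def insert_data(graph_model: set[Triple], triples: list[Triple]) -> None:
    """Add triples to the set; triples that are already present change nothing."""
    graph_model.update(triples)


seed = [
    (":TopGun", "rdf:type", "schema:Movie"),
    (":TopGun", "schema:name", "Top Gun"),
    (":TopGun", "schema:actor", ":TomCruise"),
]

graph_model: set[Triple] = set()
insert_data(graph_model, seed)
size_after_first_run = len(graph_model)

insert_data(graph_model, seed)  # run the same insert a second time
size_after_second_run = len(graph_model)

print(size_after_first_run, size_after_second_run)
```

Because a set holds no duplicates, the second run leaves the toy graph unchanged, which mirrors why repeating `INSERT DATA` with the same ground triples does not alter the store.
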
## Refresh graph schema information
If the schema of the database changes, you can refresh the schema information needed to generate SPARQL statements.

In [36]:
graph.refresh_schema()

In [37]:
print(graph.get_schema)

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://schema.org/Movie> a rdfs:Class .

<http://schema.org/Person> a rdfs:Class .

rdfs:Class a rdfs:Class .

<http://schema.org/actor> a rdf:Property ;
    rdfs:domain <http://schema.org/Movie> ;
    rdfs:range <http://schema.org/Person> .

<http://schema.org/datePublished> a rdf:Property ;
    rdfs:domain <http://schema.org/Movie> ;
    rdfs:range xsd:date .

<http://schema.org/name> a rdf:Property ;
    rdfs:domain <http://schema.org/Movie>,
        <http://schema.org/Person> ;
    rdfs:range xsd:string .

rdf:type a rdf:Property ;
    rdfs:domain <http://schema.org/Movie>,
        <http://schema.org/Person> ;
    rdfs:range rdfs:Class .

rdfs:label a rdf:Property ;
    rdfs:domain rdfs:Class ;
    rdfs:range xsd:string .




## Querying the graph

We can now use the graph RDF QA chain to ask questions of the graph.

In [38]:
chain = GraphRDFQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True
)

In [39]:
chain.run("Who played in Top Gun?")



> Entering new GraphRDFQAChain chain...
Generated RDF SPARQL:
SELECT ?actorName
WHERE {
  ?movie rdf:type <http://schema.org/Movie> .
  ?movie <http://schema.org/name> "Top Gun" .
  ?movie <http://schema.org/actor> ?actor .
  ?actor <http://schema.org/name> ?actorName .
}
Full Context:
[(rdflib.term.Literal('Kelly McGillis'),), (rdflib.term.Literal('Tom Cruise'),), (rdflib.term.Literal('Val Kilmer'),)]

> Finished chain.


'Kelly McGillis, Tom Cruise, and Val Kilmer played in Top Gun.'

In [40]:
chain.run("Who played in Top Gun and Top Gun: Maverick?")



> Entering new GraphRDFQAChain chain...
Generated RDF SPARQL:
SELECT ?actorName WHERE {
  ?movie1 rdf:type <http://schema.org/Movie> ;
          <http://schema.org/name> "Top Gun" .
  ?movie2 rdf:type <http://schema.org/Movie> ;
          <http://schema.org/name> "Top Gun: Maverick" .
  ?movie1 <http://schema.org/actor> ?actor .
  ?movie2 <http://schema.org/actor> ?actor .
  ?actor <http://schema.org/name> ?actorName .
}
Full Context:
[(rdflib.term.Literal('Tom Cruise'),), (rdflib.term.Literal('Val Kilmer'),)]

> Finished chain.


'Tom Cruise and Val Kilmer played in both Top Gun and Top Gun: Maverick.'

In [41]:
chain.run("Who played in Top Gun but not Top Gun: Maverick?")



> Entering new GraphRDFQAChain chain...
Generated RDF SPARQL:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <http://schema.org/>
SELECT ?actorName
WHERE {
  ?topGun schema:name "Top Gun" .
  ?topGun schema:actor ?actor .
  ?actor schema:name ?actorName .
  FILTER NOT EXISTS {
    ?topGunMaverick schema:name "Top Gun: Maverick" .
    ?topGunMaverick schema:actor ?actor .
  }
}
Full Context:
[(rdflib.term.Literal('Kelly McGillis'),)]

> Finished chain.


'Kelly McGillis played in Top Gun but not Top Gun: Maverick.'
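
The last two questions map onto ordinary set operations, which may help when checking whether a generated query is plausible. The sketch below uses plain Python sets filled with the cast lists from the seed data; it illustrates the logic only and is not how the chain or the SPARQL engine evaluates queries.

```python
# Cast lists copied from the seed data inserted earlier in this notebook.
top_gun = {"Tom Cruise", "Val Kilmer", "Kelly McGillis"}
maverick = {"Tom Cruise", "Val Kilmer", "Jennifer Connelly"}

# Reusing ?actor in both movie patterns keeps only actors common to both casts.
in_both = top_gun & maverick

# FILTER NOT EXISTS drops every Top Gun actor who also appears in Maverick.
only_top_gun = top_gun - maverick

print(sorted(in_both))
print(sorted(only_top_gun))
```

The two sets agree with the `Full Context` rows returned for the second and third questions above.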

## Limit the number of results
You can limit the number of results from the RDF QA chain using the `top_k` parameter.
The default is 10.

In [42]:
chain = GraphRDFQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True, top_k=2
)

In [43]:
chain.run("Who played in Top Gun?")



> Entering new GraphRDFQAChain chain...
Generated RDF SPARQL:
SELECT ?actorName
WHERE {
  ?movie rdf:type <http://schema.org/Movie> .
  ?movie <http://schema.org/name> "Top Gun" .
  ?movie <http://schema.org/actor> ?actor .
  ?actor <http://schema.org/name> ?actorName .
}
Full Context:
[(rdflib.term.Literal('Kelly McGillis'),), (rdflib.term.Literal('Tom Cruise'),)]

> Finished chain.


'Kelly McGillis and Tom Cruise played in Top Gun.'
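
The effect of `top_k` seen above is consistent with the chain keeping only the first `top_k` rows of the query result before composing the answer. The sketch below assumes exactly that; `limit_rows` is a hypothetical helper, and the actual truncation logic inside `GraphRDFQAChain` may differ.

```python
def limit_rows(rows: list[tuple[str, ...]], top_k: int = 10) -> list[tuple[str, ...]]:
    """Hypothetical helper: keep at most top_k rows of a query result."""
    return rows[:top_k]


# Plain strings stand in for the rdflib Literal values shown in the output.
rows = [("Kelly McGillis",), ("Tom Cruise",), ("Val Kilmer",)]

limited = limit_rows(rows, top_k=2)
unlimited = limit_rows(rows)  # default of 10 leaves this short result untouched

print(limited)
print(unlimited)
```

Note that the answer text is then built from the truncated rows only, so a small `top_k` can yield an incomplete answer, as with the missing third actor above.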

## Return intermediate results
You can return intermediate steps from the RDF QA chain using the `return_intermediate_steps` parameter.

In [44]:
chain = GraphRDFQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True, return_intermediate_steps=True
)

In [45]:
result = chain("Who played in Top Gun?")
print(f"Intermediate steps: {result['intermediate_steps']}")
print(f"Final answer: {result['result']}")



> Entering new GraphRDFQAChain chain...
Generated RDF SPARQL:
SELECT ?actorName
WHERE {
  ?movie rdf:type <http://schema.org/Movie> .
  ?movie <http://schema.org/name> "Top Gun" .
  ?movie <http://schema.org/actor> ?actor .
  ?actor <http://schema.org/name> ?actorName .
}
Full Context:
[(rdflib.term.Literal('Kelly McGillis'),), (rdflib.term.Literal('Tom Cruise'),), (rdflib.term.Literal('Val Kilmer'),)]

> Finished chain.
Intermediate steps: [{'query': 'SELECT ?actorName\nWHERE {\n  ?movie rdf:type <http://schema.org/Movie> .\n  ?movie <http://schema.org/name> "Top Gun" .\n  ?movie <http://schema.org/actor> ?actor .\n  ?actor <http://schema.org/name> ?actorName .\n}'}, {'context': [(rdflib.term.Literal('Kelly McGillis'),), (rdflib.term.Literal('Tom Cruise'),), (rdflib.term.Literal('Val Kilmer'),)]}]
Final answer: Kelly McGillis, Tom Cruise, and Val Kilmer played in Top Gun.
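
Judging from the printed output, `intermediate_steps` is a list of single-key dictionaries, one holding the generated query and one holding the retrieved context. A minimal sketch of pulling them apart follows; the list is a hand-written stand-in copied from the output above, with plain strings in place of `rdflib` literals, and the variable names are illustrative.

```python
# Stand-in for result["intermediate_steps"], copied from the printed output.
intermediate_steps = [
    {
        "query": (
            "SELECT ?actorName\n"
            "WHERE {\n"
            "  ?movie rdf:type <http://schema.org/Movie> .\n"
            '  ?movie <http://schema.org/name> "Top Gun" .\n'
            "  ?movie <http://schema.org/actor> ?actor .\n"
            "  ?actor <http://schema.org/name> ?actorName .\n"
            "}"
        )
    },
    {"context": [("Kelly McGillis",), ("Tom Cruise",), ("Val Kilmer",)]},
]

# Merge the single-key dicts so each step can be looked up by name.
steps = {key: value for step in intermediate_steps for key, value in step.items()}

sparql_query = steps["query"]
context_rows = steps["context"]

print(sparql_query)
print(len(context_rows))
```

Looking steps up by key rather than by list position keeps the code working even if the order of the steps were to change.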


## Return direct results
You can return direct results from the RDF QA chain using the `return_direct` parameter.

In [46]:
chain = GraphRDFQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True, return_direct=True
)

In [47]:
results = chain.run("Who played in Top Gun?")
results



> Entering new GraphRDFQAChain chain...
Generated RDF SPARQL:
SELECT ?actorName
WHERE {
  ?movie rdf:type <http://schema.org/Movie> .
  ?movie <http://schema.org/name> "Top Gun" .
  ?movie <http://schema.org/actor> ?actor .
  ?actor <http://schema.org/name> ?actorName .
}

> Finished chain.


[(rdflib.term.Literal('Kelly McGillis'),),
 (rdflib.term.Literal('Tom Cruise'),),
 (rdflib.term.Literal('Val Kilmer'),)]

In [48]:
import json
json.dumps(results)

'[["Kelly McGillis"], ["Tom Cruise"], ["Val Kilmer"]]'
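
`json.dumps` accepts these rows because `rdflib.term.Literal` is a subclass of `str`, and tuples are serialized as JSON arrays. The sketch below reproduces that behaviour without `rdflib`; `FakeLiteral` is a hypothetical stand-in class, and it also shows one way to flatten the single-column rows into a plain list of names.

```python
import json


class FakeLiteral(str):
    """Hypothetical stand-in for rdflib.term.Literal, which also subclasses str."""


# Same shape as the direct results above: one single-element tuple per row.
results = [
    (FakeLiteral("Kelly McGillis"),),
    (FakeLiteral("Tom Cruise"),),
    (FakeLiteral("Val Kilmer"),),
]

# Tuples become JSON arrays and str subclasses become JSON strings.
encoded = json.dumps(results)

# Flatten the single-column rows into plain Python strings.
names = [str(row[0]) for row in results]

print(encoded)
print(names)
```

Flattening is only appropriate when the query selects a single variable; with several selected variables each row carries one value per variable.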