# Create and query a knowledge graph with an LLM (CosmosDB version)

![title](./cosmosgremlin.png)

In the notebook [Create knowledge graph from PDF](./knowledgegraph.ipynb) we created a pickle file of graph documents, which we now load into CosmosDB.

In [None]:
%pip install -r requirements.txt

# Load environment variables and connect to the CosmosDB database

- You need a CosmosDB Gremlin graph database in Azure.
- The database is expected to be named 'rag' and the graph 'kg'.
- The graph is also assumed to use 'type' as its partition key (/type).
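Before connecting, it can help to check that the `.env` file provides the variables this notebook reads (`GREMLIN_URI`, `GREMLIN_PASSWORD`, `OPENAI_DEPLOYMENT_NAME`) and to see how the CosmosDB Gremlin username is derived: it is the resource path `/dbs/<database>/colls/<graph>`, which gives `/dbs/rag/colls/kg` for the names above. The minimal sketch below uses only the standard library; `cosmos_gremlin_username`, `missing_env_vars` and `REQUIRED_ENV_VARS` are hypothetical names and are not part of `GremlinGraph`.

```python
import os

# Hypothetical helpers for illustration; they are not part of GremlinGraph.
# Variables read by this notebook via os.getenv (after load_dotenv()).
REQUIRED_ENV_VARS = ("GREMLIN_URI", "GREMLIN_PASSWORD", "OPENAI_DEPLOYMENT_NAME")


def cosmos_gremlin_username(database: str = "rag", graph: str = "kg") -> str:
    """Build the CosmosDB Gremlin username, i.e. the resource path of the graph."""
    if not database or not graph or "/" in database or "/" in graph:
        raise ValueError("database and graph names must be non-empty and contain no '/'")
    return f"/dbs/{database}/colls/{graph}"


def missing_env_vars(env=os.environ, required=REQUIRED_ENV_VARS) -> list:
    """Return the required variables that are unset or empty in the given mapping."""
    return [name for name in required if not env.get(name)]


print(cosmos_gremlin_username())  # /dbs/rag/colls/kg
```

Calling `missing_env_vars()` after `load_dotenv()` should return an empty list when all three variables are set; the mapping parameter also makes the check easy to run against a plain dict.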

In [None]:
import os
from dotenv import load_dotenv
import nest_asyncio

from GremlinGraph import GremlinGraph


load_dotenv()

nest_asyncio.apply()

# Note: this graph provider is my own implementation (I plan to contribute it to LangChain).
# This is a gentle warning that it has not yet received much feedback on how well it works.
# If you run into problems with it, please share your experience by creating a ticket.

graph = GremlinGraph(
    url=os.getenv("GREMLIN_URI"),
    username="/dbs/rag/colls/kg", # CosmosDB Gremlin database named 'rag' with graph named 'kg'
    password=os.getenv("GREMLIN_PASSWORD"),
)


In [None]:
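import pickle

# Load the graph documents created in the knowledge graph notebook
with open('./data/graph_docs.pkl', 'rb') as f:
    graph_docs = pickle.load(f)

# Write the nodes and relationships to CosmosDB, then refresh the cached graph schema
graph.add_graph_documents(graph_docs)
graph.refresh_schema()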
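CosmosDB requires every vertex in a partitioned graph to carry the partition key property, which is why the setup expects `/type`. The sketch below only builds query strings (it does not connect to anything) to show what inserting one node and one relationship can look like with `type` as the partition key. It is an illustration under that assumption, not how `GremlinGraph.add_graph_documents` is actually implemented; `gremlin_literal`, `add_vertex_query` and `add_edge_query` are hypothetical names.

```python
from typing import Dict, Optional, Union

# Illustrative sketch only: GremlinGraph.add_graph_documents may build its queries differently.
PARTITION_KEY = "type"  # matches the /type partition key assumed for the 'kg' graph

Primitive = Union[str, int, float, bool]


def gremlin_literal(value: Primitive) -> str:
    """Render a primitive value as a Gremlin literal; CosmosDB properties must be primitives."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Escape backslashes and single quotes so the value stays inside its quotes
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"unsupported property type: {type(value).__name__}")


def add_vertex_query(node_id: str, node_type: str,
                     properties: Optional[Dict[str, Primitive]] = None) -> str:
    """Build a g.addV() query that also sets the partition key property on the vertex."""
    parts = [
        f"g.addV({gremlin_literal(node_type)})",
        f".property('id', {gremlin_literal(node_id)})",
        f".property('{PARTITION_KEY}', {gremlin_literal(node_type)})",
    ]
    for key, value in (properties or {}).items():
        parts.append(f".property({gremlin_literal(key)}, {gremlin_literal(value)})")
    return "".join(parts)


def add_edge_query(source_id: str, target_id: str, rel_type: str) -> str:
    """Build a query that adds an edge labelled rel_type from the source to the target vertex."""
    return (f"g.V({gremlin_literal(source_id)})"
            f".addE({gremlin_literal(rel_type)})"
            f".to(g.V({gremlin_literal(target_id)}))")


print(add_vertex_query("n1", "Entity", {"name": "example"}))
# g.addV('Entity').property('id', 'n1').property('type', 'Entity').property('name', 'example')
```

Using the node type both as the vertex label and as the partition key value keeps vertices of the same type together, which is the layout the `/type` partition key implies.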

In [None]:

from langchain_openai import AzureChatOpenAI
from gremlinqa import GremlinQAChain
from langchain_core.prompts import PromptTemplate

# CosmosDB does not support all of Apache TinkerPop Gremlin, so we provide a custom prompt template

COSMOS_TEMPLATE = """Task: Generate a Gremlin statement to query a CosmosDB Gremlin graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.

Schema:
{schema}
Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that ask for anything other than constructing a Gremlin statement.
Do not include any text except the generated Gremlin statement.

The CosmosDB Gremlin dialect does not support the following Gremlin features:
- The match() step isn't currently available. This step provides declarative querying capabilities.
- Objects as properties on vertices or edges aren't supported. Properties can only be primitive types or arrays.
- Sorting by array properties order().by(<array property>) isn't supported. Sorting is supported only by primitive types.
- Non-primitive JSON types aren't supported. Use string, number, or true/false types. null values aren't supported.

The question is:
{question}"""

c_llm = AzureChatOpenAI(
    model=os.getenv("OPENAI_DEPLOYMENT_NAME"), 
    temperature=0, 
    max_tokens=1500,
    verbose=True)

# Note: this QA chain is my own implementation (I plan to contribute it to LangChain).
# This is a gentle warning that it has not yet received much feedback on how well it works.
# If you run into problems with it, please share your experience by creating a ticket.

gremlin_chain = GremlinQAChain.from_llm(
    graph=graph,
    llm=c_llm,
    verbose=True,
    top_k=500,
    gremlin_prompt=PromptTemplate(input_variables=["schema", "question"], template=COSMOS_TEMPLATE),
)
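The custom template only asks the LLM to avoid features CosmosDB lacks; the model may still wrap its answer in a Markdown fence or emit an unsupported step such as `match()`. A minimal sketch of a pre-flight check, assuming you want to validate a generated statement yourself before submitting it (for example with `graph.client.submit(...)`, as in the last cell of this notebook). The function names are hypothetical, `GremlinQAChain` is not known to do this itself, and the regexes are simple heuristics rather than a Gremlin parser.

```python
import re

# Hypothetical pre-flight check; the regexes are heuristics, not a Gremlin parser.
_FENCE = re.compile(r"^```[A-Za-z]*\s*|\s*```$")
UNSUPPORTED_STEPS = {"match()": re.compile(r"\.match\s*\(")}


def clean_generated_query(text: str) -> str:
    """Trim whitespace and an optional Markdown code fence around an LLM-generated query."""
    return _FENCE.sub("", text.strip()).strip()


def unsupported_steps(query: str) -> list:
    """Return the names of CosmosDB-unsupported steps that appear in the query."""
    return [name for name, pattern in UNSUPPORTED_STEPS.items() if pattern.search(query)]


def check_generated_query(text: str) -> str:
    """Return the cleaned query, or raise ValueError if it is empty or uses an unsupported step."""
    query = clean_generated_query(text)
    if not query:
        raise ValueError("the LLM returned an empty query")
    found = unsupported_steps(query)
    if found:
        raise ValueError(f"query uses steps CosmosDB does not support: {', '.join(found)}")
    return query


print(check_generated_query("```gremlin\ng.V().hasLabel('Entity').count()\n```"))
# g.V().hasLabel('Entity').count()
```

Only `match()` is checked because it is the one unsupported item in the template that is a query step; the other restrictions concern property value types, which are better enforced when the data is written.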


In [None]:
from IPython.display import Markdown, display
response = gremlin_chain.invoke("Make table of different breeding groups and their characteristics")
display(Markdown(response["result"]))

In [None]:
# Clear the CosmosDB graph (dropping all vertices also removes their edges)
graph.client.submit("g.V().drop()").all().result()
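On a large graph a single `g.V().drop()` request may be throttled or time out in CosmosDB. A minimal sketch of clearing the graph in batches instead, assuming `submit` is a function that runs a Gremlin string and returns the result list, such as `lambda q: graph.client.submit(q).all().result()` built from the call above. `clear_graph_in_batches` and its parameters are hypothetical.

```python
from typing import Callable, List

# Hypothetical helper; `submit` runs a Gremlin string and returns the result list,
# e.g. lambda q: graph.client.submit(q).all().result()


def clear_graph_in_batches(submit: Callable[[str], List], batch_size: int = 1000,
                           max_batches: int = 10_000) -> int:
    """Drop vertices (and with them their edges) in batches; return the number of drop requests."""
    batches_sent = 0
    # g.V().count() returns a single-element result list, e.g. [2500]
    while submit("g.V().count()")[0] > 0:
        if batches_sent >= max_batches:
            raise RuntimeError(f"graph still not empty after {max_batches} drop requests")
        submit(f"g.V().limit({batch_size}).drop()")
        batches_sent += 1
    return batches_sent
```

Usage in this notebook would be `clear_graph_in_batches(lambda q: graph.client.submit(q).all().result())`. Counting before each batch costs an extra request, but it makes the loop stop as soon as the graph is empty and keeps `max_batches` as a safety limit.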