This code is inspired by several tutorials from LlamaIndex and LangChain. To run it end to end, you need a NebulaGraph instance installed and running on your machine, where the code stores and queries a knowledge graph.

In [None]:
# First we install everything we need
!python3 -m pip install llama-index-llms-openai llama-index-llms-langchain llama-index-llms-fireworks llama-index-graph-stores-nebula langchain flask bs4 langchain-community llama-index requests langchain-openai
# Second we import everything
from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader
from llama_index.core import StorageContext
from llama_index.graph_stores.nebula import NebulaGraphStore
from llama_index.core import Settings
from langchain.agents import initialize_agent, AgentType
from langchain.tools import StructuredTool, Tool
from langchain.chains import NebulaGraphQAChain
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import PromptTemplate
import os
from langchain_community.graphs import NebulaGraph
from langchain_openai import ChatOpenAI
import requests
from flask import Flask, request
from bs4 import BeautifulSoup

First we download the Wikipedia page on search engine optimization and store it in the data directory. You can skip this step and put any files you want in the data directory to change the agent's area of expertise. The code creates the data directory if it does not exist; if you skip this step, make sure to create the directory before adding documents to it.

In [None]:
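import requests
import os
from bs4 import BeautifulSoup

# Getting the Wikipedia page (Wikimedia asks clients to send a descriptive User-Agent;
# replace this placeholder value with your own project name and contact)
headers = {"User-Agent": "sej-graph-rag-tutorial/0.1"}
res = requests.get("https://en.wikipedia.org/wiki/Search_engine_optimization",
                   headers=headers, timeout=30)
if res.status_code == requests.codes.OK:
    soup = BeautifulSoup(res.text, 'html.parser')
    # Create the data directory if it does not exist yet
    os.makedirs('./data', exist_ok=True)
    with open("./data/wikipedia_seo.txt", "w", encoding="utf-8") as f:
        f.write(soup.body.get_text())
else:
    print(f"Download failed with HTTP status {res.status_code}")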
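soup.body.get_text() keeps every whitespace-only line produced by the page layout (navigation, sidebars, tables), so the text file ends up with many empty lines that are then chunked and sent to the LLM. A minimal optional cleanup, assuming one non-empty line per block of text is what you want, could look like the sketch below; normalize_text is a hypothetical helper, not part of the original code.

```python
import re


def normalize_text(raw: str) -> str:
    """Strip each line, collapse runs of spaces/tabs, and drop empty lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    # Keep only lines that still contain text after stripping
    return "\n".join(line for line in lines if line)
```

To use it, write f.write(normalize_text(soup.body.get_text(separator="\n"))) instead of f.write(soup.body.get_text()); the separator keeps text from adjacent tags on separate lines instead of gluing it together.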


Before loading documents into the knowledge graph, you must have a NebulaGraph instance running. Visit https://docs.nebula-graph.io/3.6.0/ to see the different installation options (Docker or on premise). We strongly recommend avoiding Docker: although it is much simpler to set up, in our experience it is much slower, which can affect the results of the code. Once the database is running, create a space called sej_graph_rag (or any other name) with the tag entity (name string) and the edge type relationship (relationship string). The commands to create the space, the tag, and the edge type are the following:

CREATE SPACE sej_graph_rag(vid_type=FIXED_STRING(256));
USE sej_graph_rag;
CREATE TAG entity(name string);
CREATE EDGE relationship(relationship string);

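(If you run into the error 'not enough host' while executing these commands, first register the storage host with the following command, where storaged0 is the storage service name used by the Docker Compose setup:
ADD HOSTS "storaged0":9779
)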

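Once this is done, you can execute the rest of the code. NebulaGraph applies schema changes asynchronously, so wait about 20 seconds (two heartbeat cycles with the default settings) after creating the space, tag, and edge type before inserting data.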
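If you prefer to create the schema from Python instead of the Nebula console, the sketch below runs the same statements with the nebula3-python client (assumed to be installed; the Nebula integrations used in this notebook rely on it). schema_statements and apply_schema are hypothetical helpers. The IF NOT EXISTS variants make the script safe to re-run, and the pauses give NebulaGraph time to apply each schema change, which happens asynchronously.

```python
import time


def schema_statements(space: str = "sej_graph_rag") -> list[str]:
    """Return the nGQL statements that create the space, tag and edge type used here."""
    return [
        f"CREATE SPACE IF NOT EXISTS {space}(vid_type=FIXED_STRING(256));",
        f"USE {space};",
        "CREATE TAG IF NOT EXISTS entity(name string);",
        "CREATE EDGE IF NOT EXISTS relationship(relationship string);",
    ]


def apply_schema(space: str = "sej_graph_rag", host: str = "127.0.0.1", port: int = 9669,
                 user: str = "root", password: str = "nebula", wait_s: float = 20.0) -> None:
    """Run schema_statements() against a NebulaGraph server (assumes nebula3-python)."""
    # Imported here so schema_statements() can be used without the client installed
    from nebula3.Config import Config
    from nebula3.gclient.net import ConnectionPool

    pool = ConnectionPool()
    if not pool.init([(host, port)], Config()):
        raise RuntimeError(f"Cannot connect to NebulaGraph at {host}:{port}")
    try:
        with pool.session_context(user, password) as session:
            for stmt in schema_statements(space):
                result = session.execute(stmt)
                if not result.is_succeeded():
                    raise RuntimeError(f"{stmt!r} failed: {result.error_msg()}")
                # Schema changes are applied asynchronously, so pause after each CREATE
                if stmt.startswith("CREATE"):
                    time.sleep(wait_s)
    finally:
        pool.close()
```

With the default values above, calling apply_schema() once is equivalent to typing the commands in the console.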

In [None]:
os.environ["NEBULA_USER"] = "root" # Put your nebula username
os.environ["NEBULA_PASSWORD"] = "nebula"  # Put your nebula password
os.environ["NEBULA_ADDRESS"] = "localhost:9669"  # Change if you didn't use the defaults

space_name = "sej_graph_rag" # Change it if you used another name for the space
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)

storage_context = StorageContext.from_defaults(graph_store=graph_store)

# Set up the OpenAI API key
os.environ["OPENAI_API_KEY"] = "" # Fill with your OpenAI API key
# Create the LLM using the GPT-4 model
llm = ChatOpenAI(temperature=0, model="gpt-4")
Settings.llm = llm
Settings.chunk_size = 512

# Read the documents in the ./data directory
documents = SimpleDirectoryReader("./data").load_data()

# This could take some time depending on the number of documents
index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    include_embeddings=True
)

Your knowledge graph is now populated; you can check it by querying it from the Nebula console. The next step is creating the tools for our chatbot. The first one uses the knowledge graph as a database to answer our questions, through a LangChain NebulaGraphQAChain. We slightly modify the chain's prompts: ngql_prompt reflects the schema used in the KG, and qa_prompt tells the LLM what to do with the result of its query to the NebulaGraph store.

In [None]:
graph = NebulaGraph(
    space=space_name,  # Same space as the one used to build the index
    username=os.environ["NEBULA_USER"],  # Your Nebula username
    password=os.environ["NEBULA_PASSWORD"],  # Your Nebula password
    address="localhost",  # Change if necessary
    port=9669,  # Change if necessary
    session_pool_size=30,
)
graph.refresh_schema()

ngql_prompt = PromptTemplate(input_variables=['question', 'schema'],
                             template="""Task:Generate NebulaGraph Cypher statement to query a graph database.

                             Instructions:

                             First, generate cypher then convert it to NebulaGraph Cypher dialect(rather than standard):
                             1. it requires explicit label specification only when referring to node properties: v.`entity`.name
                             2. note explicit label specification is not needed for edge properties, so it's e.name instead of e.`relationship`.name
                             3. it uses double equals sign for comparison: `==` rather than `=`
                             For instance:
                             ```diff
                             < MATCH (p:entity)-[e:relationship]->(m:entity) WHERE p.name = 'The Godfather II'
                             < RETURN p.name, e.year, m.name;
                             ---
                             > MATCH (p:`entity`)-[e:relationship]->(m:`entity`) WHERE lower(p.`entity`.`name`) == lower('The Godfather II')
                             > RETURN p.`entity`.`name`, e.relationship, m.`entity`.`name`;
                             ```

                             Use only the provided relationship types and properties in the schema.
                             Do not use any other relationship types or properties that are not provided.
                             Schema:
                             {schema}
                             Note: Do not include any explanations or apologies in your responses.
                             Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
                             Do not include any text except the generated Cypher statement.

                             The question is:
                             {question}""")
qa_prompt = PromptTemplate(input_variables=['context', 'question'],
                           template="""You are an assistant that helps to form nice and human understandable answers.
                           The information part contains the provided information that you must use to construct an answer.
                           The provided information is authoritative, you must never doubt it or try to use your internal knowledge to correct it.
                           Make the answer sound as a response to the question. Do not mention that you based the result on the given information.
                           Here is an example:

                           Question: Who is Toto?
                           Context:['p.entity.name': ['Toto', 'Toto', 'Toto'], 'e.relationship': ['Is', 'Has won', 'Knows'], 'm.entity.name': ['Pro surfer', 'The Olympics', 'Kelly Slater']]
                           Helpful Answer: Toto is a pro surfer that has won the olympics. He also knows Kelly Slater.

                           Follow this example when generating answers. Use as many meaningful triplets as possible, with a maximum of 7.
                           If the provided information is empty, say that you don't know the answer.
                           Information:
                           {context}

                           Question: {question}
                           Helpful Answer:""")


nebula_chain = NebulaGraphQAChain.from_llm(
    ChatOpenAI(temperature=0),
    qa_prompt=qa_prompt,
    ngql_prompt=ngql_prompt,
    graph=graph, verbose=True
)
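To make the two prompts more concrete, the sketch below shows in plain Python the kind of query ngql_prompt asks the LLM to write, and how a result in the column format used in qa_prompt's example maps to triplets. match_entity_query and to_triplets are hypothetical helpers for illustration only; the chain does not call them, and the backslash escaping assumes NebulaGraph string literals accept backslash escapes (check this against your NebulaGraph version).

```python
def match_entity_query(name: str, limit: int = 7) -> str:
    """Build a NebulaGraph-dialect query for the outgoing relationships of one entity."""
    # Escape backslashes first, then single quotes, so the name stays one string literal
    escaped = name.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "MATCH (p:`entity`)-[e:relationship]->(m:`entity`) "
        f"WHERE lower(p.`entity`.`name`) == lower('{escaped}') "
        "RETURN p.`entity`.`name`, e.relationship, m.`entity`.`name` "
        f"LIMIT {int(limit)};"
    )


def to_triplets(result: dict[str, list], limit: int = 7) -> list[tuple]:
    """Zip a {column: values} result into (subject, relation, object) triplets."""
    columns = list(result.values())
    if len(columns) != 3:
        raise ValueError("expected exactly three columns: subject, relation, object")
    # Row i of the result is (columns[0][i], columns[1][i], columns[2][i])
    return list(zip(*columns))[:limit]
```

The limit of 7 mirrors the maximum number of triplets qa_prompt asks the LLM to use.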

We now add two functions to interact with the Babbar API: one gets the metrics of a host and the other gets its backlinks, sorted by decreasing Babbar Authority Score.

In [None]:
BABBAR_URL = "https://www.babbar.tech/api/host/overview/main?api_token="
BABBAR_LINKS = "https://www.babbar.tech/api/host/backlinks/url?api_token="
BABBAR_KEY = "" # Fill with your Babbar API key


def get_babbar_metrics(url: str) -> dict:
    """Sends a POST request to the Babbar API to get the metrics of the host given as url"""
    res = requests.post(BABBAR_URL + BABBAR_KEY,
                        json={'host': url}, timeout=30)
    if res.status_code == requests.codes.OK:
        return res.json()
    else:
        return {}


def get_host_backlinks(url: str) -> dict:
    """Sends a POST request to the Babbar API to get the 10 backlinks of the host
    with the highest Babbar Authority Score
    : Arguments :
    : url : (str) : The host for the request"""

    res = requests.post(BABBAR_LINKS + BABBAR_KEY,
                        json={'host': url, 'limit': 10, 'type': 'babbarAuthorityScore'},
                        timeout=30)
    if res.status_code == requests.codes.OK:
        return res.json()
    else:
        return {}
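A slightly more defensive variant of the two functions above is sketched below. It assumes the Babbar endpoints accept api_token as a regular query parameter (which is what the string concatenation produces), passes the token via params rather than building the URL by hand, always sets a timeout, and turns network errors and non-JSON bodies into an empty dict so the agent simply sees no data. call_babbar is a hypothetical helper; it takes the HTTP function as an argument so it can be exercised offline with a fake.

```python
from typing import Any, Callable


def call_babbar(post: Callable[..., Any], url: str, api_key: str,
                payload: dict, timeout: float = 30.0) -> dict:
    """POST payload to a Babbar endpoint; return the parsed JSON, or {} on any failure.

    post is expected to behave like requests.post."""
    try:
        res = post(url, params={"api_token": api_key}, json=payload, timeout=timeout)
    except Exception:  # connection errors, timeouts, DNS failures...
        return {}
    if res.status_code != 200:
        return {}
    try:
        return res.json()
    except ValueError:  # body was not valid JSON
        return {}
```

For example, call_babbar(requests.post, "https://www.babbar.tech/api/host/overview/main", BABBAR_KEY, {"host": url}) sends the same request as get_babbar_metrics(url).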

Now we can create the "tools" our agent will use to answer our questions and initialize the agent. We give it a conversation memory, which makes multi-turn interactions easier. You will notice that the StructuredTool.from_function calls don't pass a description: LangChain uses each function's docstring as the tool description, so the docstrings should clearly state what each tool does.

In [None]:
tools = [
    StructuredTool.from_function(get_babbar_metrics),
    StructuredTool.from_function(get_host_backlinks),
    Tool.from_function(
        func=nebula_chain.run,
        name="Global Knowledge Base",
        description="Always use this tool first to get information about anything. If it cannot answer try something else."
    ),
]
am = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
agent = initialize_agent(tools, ChatOpenAI(temperature=0.0, model_name="gpt-4"),
                         agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
                         verbose=True, memory=am)

Now that we have an agent ready to answer questions, we build a Flask interface with a single route.

In [None]:
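app = Flask(__name__)
# SECRET_KEY signs Flask session cookies; this app does not use sessions, but change it if you add them
app.config['SECRET_KEY'] = 'A big secret'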


def get_answer(question: str):
    """
    : Arguments :
    : question : The question we want an answer to"""

    result = agent.run(f"Using a tool first, answer the following question: {question}\nIf the tool doesn't know the answer try something else.\nPlease answer in the same language the question was asked in.")
    return {"question": question, "answer": result}

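
@app.route("/answer", methods=["POST"])
def answer_query():
    # silent=True returns None instead of raising when the body is not valid JSON
    data = request.get_json(silent=True) or {}
    query = data.get("query")
    if not query:
        return {"error": "Send a JSON body with a 'query' key."}, 400
    return get_answer(query)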

app.run(port=5000)

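You can now send POST requests to http://localhost:5000/answer. Send your question as a JSON body with the key "query", for example {"query": "What is a backlink?"}. The response is a JSON object with the keys "question" and "answer".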
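For example, a minimal Python client using only the standard library could look like the sketch below. build_answer_request and ask are hypothetical helpers, and the long default timeout is an assumption: a GPT-4 agent that calls several tools can take a while to answer.

```python
import json
import urllib.request

ANSWER_URL = "http://localhost:5000/answer"


def build_answer_request(question: str, url: str = ANSWER_URL) -> urllib.request.Request:
    """Build the POST request expected by the /answer route: JSON body with a "query" key."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def ask(question: str, url: str = ANSWER_URL, timeout: float = 300.0) -> dict:
    """Send the question and return the decoded JSON response ({"question", "answer"})."""
    with urllib.request.urlopen(build_answer_request(question, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the Flask app running, ask("What is a backlink?")["answer"] returns the agent's answer.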