# ArangoDB

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/arangodb/interactive_tutorials/blob/master/notebooks/Langchain.ipynb)

>[ArangoDB](https://github.com/arangodb/arangodb) is a scalable graph database system to drive value from
>connected data, faster. Native graphs, an integrated search engine, and JSON support, via
>a single query language. `ArangoDB` runs on-prem or in the cloud.

This notebook shows how to use LLMs to provide a natural language interface to an [ArangoDB](https://github.com/arangodb/arangodb#readme) database.

## Setting up

You can get a local `ArangoDB` instance running via the [ArangoDB Docker image](https://hub.docker.com/_/arangodb):  

```
docker run -p 8529:8529 -e ARANGO_ROOT_PASSWORD= arangodb/arangodb
```

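Alternatively, you can use the [ArangoDB Cloud Connector package](https://github.com/arangodb/adb-cloud-connector#readme) to get a temporary cloud instance running, which is what this notebook does. First, set your OpenAI API key and install the required packages: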

In [1]:
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

In [None]:
%%capture
!pip install python-arango # The ArangoDB Python Driver
!pip install adb-cloud-connector # The ArangoDB Cloud Instance provisioner
!pip install cityhash # Hashing library
!pip install langchain-community
!pip install langchain-openai
!pip install langchain-experimental
!pip install langchain

In [3]:
# Instantiate ArangoDB Database
import json

from adb_cloud_connector import get_temp_credentials
from arango import ArangoClient

con = get_temp_credentials()

db = ArangoClient(hosts=con["url"]).db(
    con["dbName"], con["username"], con["password"], verify=True
)

print(json.dumps(con, indent=2))

Log: requesting new credentials...
Succcess: new credentials acquired
{
  "dbName": "TUTup9uhyabp5ney7pmxmmikl",
  "username": "TUT97khupg72t8o5f05gj6nb",
  "password": "TUTc1f3w36oj8mqkdpf5ofuor",
  "hostname": "tutorials.arangodb.cloud",
  "port": 8529,
  "url": "https://tutorials.arangodb.cloud:8529"
}


In [4]:
# Instantiate the ArangoDB-LangChain Graph
from langchain_community.graphs import ArangoGraph

graph = ArangoGraph(db)

## Populating the database

We will use the ArangoDB Python driver (`python-arango`) to import the [GameOfThrones](https://github.com/arangodb/example-datasets/tree/master/GameOfThrones) data into our database.
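In ArangoDB, an edge document links two vertices through its `_from` and `_to` attributes, each holding a document handle of the form `<collection>/<_key>`. The `edges` list in the cell below follows that convention, with the child in `_from` and the parent in `_to`. As a minimal sketch, the hypothetical helper `make_child_of_edges` (not part of `python-arango`) builds such edges from (child, parent) key pairs and checks that both endpoints exist among the documents to be imported, which helps catch typos in keys:

```python
def make_child_of_edges(documents, pairs, collection="Characters"):
    """Build ChildOf edge documents from (child_key, parent_key) pairs.

    Hypothetical helper: edges point from the child to the parent,
    mirroring the `edges` list used in this notebook.
    """
    known_keys = {doc["_key"] for doc in documents}
    edges = []
    for child, parent in pairs:
        # Refuse to build edges whose endpoints are not among the documents
        for key in (child, parent):
            if key not in known_keys:
                raise KeyError(f"unknown vertex key: {key}")
        edges.append({"_from": f"{collection}/{child}", "_to": f"{collection}/{parent}"})
    return edges


demo_documents = [
    {"_key": "NedStark"},
    {"_key": "CatelynStark"},
    {"_key": "AryaStark"},
    {"_key": "BranStark"},
]
demo_pairs = [
    ("AryaStark", "NedStark"),
    ("BranStark", "NedStark"),
    ("AryaStark", "CatelynStark"),
    ("BranStark", "CatelynStark"),
]
demo_edges = make_child_of_edges(demo_documents, demo_pairs)
print(demo_edges[0])  # {'_from': 'Characters/AryaStark', '_to': 'Characters/NedStark'}
```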

In [5]:
if db.has_graph("GameOfThrones"):
    db.delete_graph("GameOfThrones", drop_collections=True)

db.create_graph(
    "GameOfThrones",
    edge_definitions=[
        {
            "edge_collection": "ChildOf",
            "from_vertex_collections": ["Characters"],
            "to_vertex_collections": ["Characters"],
        },
    ],
)

documents = [
    {
        "_key": "NedStark",
        "name": "Ned",
        "surname": "Stark",
        "alive": True,
        "age": 41,
        "gender": "male",
    },
    {
        "_key": "CatelynStark",
        "name": "Catelyn",
        "surname": "Stark",
        "alive": False,
        "age": 40,
        "gender": "female",
    },
    {
        "_key": "AryaStark",
        "name": "Arya",
        "surname": "Stark",
        "alive": True,
        "age": 11,
        "gender": "female",
    },
    {
        "_key": "BranStark",
        "name": "Bran",
        "surname": "Stark",
        "alive": True,
        "age": 10,
        "gender": "male",
    },
]

edges = [
    {"_to": "Characters/NedStark", "_from": "Characters/AryaStark"},
    {"_to": "Characters/NedStark", "_from": "Characters/BranStark"},
    {"_to": "Characters/CatelynStark", "_from": "Characters/AryaStark"},
    {"_to": "Characters/CatelynStark", "_from": "Characters/BranStark"},
]

db.collection("Characters").import_bulk(documents)
db.collection("ChildOf").import_bulk(edges)

{'error': False,
 'created': 4,
 'errors': 0,
 'empty': 0,
 'updated': 0,
 'ignored': 0,
 'details': []}

## Getting and setting the ArangoDB schema

An initial ArangoDB schema is generated when the `ArangoGraph` object is instantiated. The cells below use the schema's getter and setter to view and update it:

In [6]:
# The schema should be empty here,
# since `graph` was initialized prior to ArangoDB Data ingestion (see above).

import json

print(json.dumps(graph.schema, indent=4))

{
    "graph_schema": [],
    "collection_schema": []
}


In [7]:
graph.set_schema(graph.generate_schema())

In [8]:
# We can now view the generated schema

import json

print(json.dumps(graph.schema, indent=4))

{
    "graph_schema": [
        {
            "graph_name": "GameOfThrones",
            "edge_definitions": [
                {
                    "edge_collection": "ChildOf",
                    "from_vertex_collections": [
                        "Characters"
                    ],
                    "to_vertex_collections": [
                        "Characters"
                    ]
                }
            ]
        }
    ],
    "collection_schema": [
        {
            "name": "Characters",
            "type": "document",
            "size": 4,
            "properties": [
                {
                    "name": "_key",
                    "type": "str"
                },
                {
                    "name": "_id",
                    "type": "str"
                },
                {
                    "name": "_rev",
                    "type": "str"
                },
                {
                    "name": "name",
                    "type": "str"
                },
                ...
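The schema printed above lists each collection's properties as `name`/`type` pairs such as `{"name": "_key", "type": "str"}`. One plausible way to derive such pairs is to sample a document and record the Python type name of each attribute. The sketch below is a hypothetical helper, `infer_properties`, that illustrates the idea; it is not the actual implementation of `ArangoGraph.generate_schema`.

```python
def infer_properties(sample_doc):
    """Return [{"name": ..., "type": ...}] pairs for one sample document.

    Hypothetical helper: uses Python type names (str, int, bool, ...),
    matching the style of the schema output shown in this notebook.
    """
    return [{"name": key, "type": type(value).__name__} for key, value in sample_doc.items()]


sample = {
    "_key": "NedStark",
    "name": "Ned",
    "surname": "Stark",
    "alive": True,
    "age": 41,
    "gender": "male",
}
for prop in infer_properties(sample):
    print(prop)
```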

## Querying the ArangoDB database

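We can now use `ArangoGraphQAChain` to ask natural-language questions about our data. For each question, the chain has the LLM generate an AQL query, runs that query against the database, and then has the LLM phrase an answer from the query result.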

In [9]:
from langchain_community.chains.graph_qa.arangodb import ArangoGraphQAChain
from langchain_openai import ChatOpenAI

chain = ArangoGraphQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True, allow_dangerous_requests=True
)

In [10]:
chain.invoke("Is Ned Stark alive?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters
FOR character IN Characters
FILTER character.name == 'Ned' && character.surname == 'Stark'
RETURN character.alive

AQL Result:
[True]

> Finished chain.


{'query': 'Is Ned Stark alive?',
 'result': 'Based on the information retrieved from the database, Ned Stark is indeed alive.'}
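The generated AQL is a plain filter-and-project: iterate over `Characters`, keep the documents whose `name` and `surname` match, and return the `alive` attribute of each. That is why the AQL result is a list (`[True]`) rather than a single value. A rough Python equivalent over an in-memory copy of part of the data, for illustration only (it does not execute AQL):

```python
characters = [
    {"_key": "NedStark", "name": "Ned", "surname": "Stark", "alive": True, "age": 41},
    {"_key": "CatelynStark", "name": "Catelyn", "surname": "Stark", "alive": False, "age": 40},
    {"_key": "AryaStark", "name": "Arya", "surname": "Stark", "alive": True, "age": 11},
]

# FOR character IN Characters
#   FILTER character.name == 'Ned' && character.surname == 'Stark'
#   RETURN character.alive
ned_alive = [c["alive"] for c in characters if c["name"] == "Ned" and c["surname"] == "Stark"]
print(ned_alive)  # [True]
```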

In [11]:
chain.invoke("How old is Arya Stark?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters
FOR character IN Characters
FILTER character.name == 'Arya' && character.surname == 'Stark'
RETURN character.age

AQL Result:
[11]

> Finished chain.


{'query': 'How old is Arya Stark?',
 'result': 'Summary:\nArya Stark is 11 years old according to the information stored in the ArangoDB database.'}

In [12]:
chain.invoke("Are Arya Stark and Ned Stark related?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters, ChildOf
FOR v, e IN 1..1 OUTBOUND 'Characters/AryaStark' ChildOf
    FILTER e._to == 'Characters/NedStark'
    RETURN { related: true }

AQL Result:
[{'related': True}]

> Finished chain.


{'query': 'Are Arya Stark and Ned Stark related?',
 'result': 'Based on the information retrieved from the database, Arya Stark and Ned Stark are indeed related.'}

In [13]:
chain.invoke("Does Arya Stark have a dead parent?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
WITH Characters, ChildOf
FOR v, e IN 1..1 OUTBOUND 'Characters/AryaStark' ChildOf
    FILTER v.alive == false
    RETURN v

AQL Result:
[{'_key': 'CatelynStark', '_id': 'Characters/CatelynStark', '_rev': '_jRnIBeG--_', 'name': 'Catelyn', 'surname': 'Stark', 'alive': False, 'age': 40, 'gender': 'female'}]

> Finished chain.


{'query': 'Does Arya Stark have a dead parent?',
 'result': "Based on the information retrieved from the database, Arya Stark does have a deceased parent. Catelyn Stark, Arya's mother, is no longer alive according to the database records."}
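Because each `ChildOf` edge points from a child (`_from`) to a parent (`_to`), an `OUTBOUND` traversal of depth `1..1` starting at `Characters/AryaStark` visits exactly Arya's parents, and `FILTER v.alive == false` then keeps only the deceased ones. The sketch below reproduces that logic over plain Python structures; `outbound_neighbors` is a hypothetical helper, not an ArangoDB API.

```python
vertices = {
    "Characters/NedStark": {"name": "Ned", "alive": True},
    "Characters/CatelynStark": {"name": "Catelyn", "alive": False},
    "Characters/AryaStark": {"name": "Arya", "alive": True},
    "Characters/BranStark": {"name": "Bran", "alive": True},
}
child_of = [
    {"_from": "Characters/AryaStark", "_to": "Characters/NedStark"},
    {"_from": "Characters/BranStark", "_to": "Characters/NedStark"},
    {"_from": "Characters/AryaStark", "_to": "Characters/CatelynStark"},
    {"_from": "Characters/BranStark", "_to": "Characters/CatelynStark"},
]


def outbound_neighbors(start, edges):
    """Yield (vertex_id, edge) pairs one hop away along outgoing edges (depth 1..1)."""
    for edge in edges:
        if edge["_from"] == start:
            yield edge["_to"], edge


# FOR v, e IN 1..1 OUTBOUND 'Characters/AryaStark' ChildOf
#   FILTER v.alive == false
#   RETURN v
dead_parents = [
    vertices[v_id]["name"]
    for v_id, _edge in outbound_neighbors("Characters/AryaStark", child_of)
    if vertices[v_id]["alive"] is False
]
print(dead_parents)  # ['Catelyn']
```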

### Chain modifiers

You can change the following `ArangoGraphQAChain` attributes to modify the behaviour of your chain:


In [14]:
# Specify the maximum number of AQL Query Results to return
chain.top_k = 10

# Specify whether or not to return the AQL Query in the output dictionary
chain.return_aql_query = True

# Specify whether or not to return the AQL JSON Result in the output dictionary
chain.return_aql_result = True

# Specify the maximum number of AQL generation attempts that should be made
chain.max_aql_generation_attempts = 5

# Specify a set of AQL query examples, which are passed to
# the AQL generation prompt template to promote few-shot learning.
# Defaults to an empty string.
chain.aql_examples = """
# Is Ned Stark alive?
RETURN DOCUMENT('Characters/NedStark').alive

# Is Arya Stark the child of Ned Stark?
FOR e IN ChildOf
    FILTER e._from == "Characters/AryaStark" AND e._to == "Characters/NedStark"
    RETURN e
"""
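The `aql_examples` string above is simply alternating comment lines (the question) and AQL queries. If you prefer to maintain examples as data, a small hypothetical helper such as `format_aql_examples` can render them in the same layout; the sketch only reproduces the layout of the cell above and makes no further assumptions about what the prompt template expects.

```python
def format_aql_examples(examples):
    """Render (question, aql) pairs as '# question' followed by the query.

    Hypothetical helper that mirrors the layout of the aql_examples string above.
    """
    blocks = [f"# {question}\n{aql.strip()}" for question, aql in examples]
    return "\n" + "\n\n".join(blocks) + "\n"


examples = [
    ("Is Ned Stark alive?", "RETURN DOCUMENT('Characters/NedStark').alive"),
    (
        "Is Arya Stark the child of Ned Stark?",
        "FOR e IN ChildOf\n"
        '    FILTER e._from == "Characters/AryaStark" AND e._to == "Characters/NedStark"\n'
        "    RETURN e",
    ),
]
aql_examples_text = format_aql_examples(examples)
print(aql_examples_text)
```

The result could then be assigned with `chain.aql_examples = aql_examples_text`.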

In [15]:
chain.invoke("Is Ned Stark alive?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
RETURN DOCUMENT('Characters/NedStark').alive

AQL Result:
[True]

> Finished chain.


{'query': 'Is Ned Stark alive?',
 'result': 'Based on the information retrieved from the database, Ned Stark is indeed alive.',
 'aql_query': "\nRETURN DOCUMENT('Characters/NedStark').alive\n",
 'aql_result': [True]}

In [16]:
chain.invoke("Is Bran Stark the child of Ned Stark?")



> Entering new ArangoGraphQAChain chain...
AQL Query (1):
FOR e IN ChildOf
    FILTER e._from == "Characters/BranStark" AND e._to == "Characters/NedStark"
    RETURN e

AQL Result:
[{'_key': '266276529491', '_id': 'ChildOf/266276529491', '_from': 'Characters/BranStark', '_to': 'Characters/NedStark', '_rev': '_jRnIBia--_'}]

> Finished chain.


{'query': 'Is Bran Stark the child of Ned Stark?',
 'result': 'Based on the information retrieved from the database, Bran Stark is indeed the child of Ned Stark. The relationship between them has been confirmed through the database query, showing that Bran Stark is linked to Ned Stark as his parent.',
 'aql_query': '\nFOR e IN ChildOf\n    FILTER e._from == "Characters/BranStark" AND e._to == "Characters/NedStark"\n    RETURN e\n',
 'aql_result': [{'_key': '266276529491',
   '_id': 'ChildOf/266276529491',
   '_from': 'Characters/BranStark',
   '_to': 'Characters/NedStark',
   '_rev': '_jRnIBia--_'}]}
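The outputs above show the effect of the modifiers: with `return_aql_query` and `return_aql_result` enabled, the returned dictionary gains `aql_query` and `aql_result` keys. The control flow these settings imply can be sketched as: generate a query, retry on failure up to a maximum number of attempts, cap the rows at `top_k`, then attach the optional keys. This is a simplified, hypothetical model with stubbed `generate_aql`, `execute_aql`, and `summarize` callables, not the source of `ArangoGraphQAChain`, which is more involved (for example, it can ask the LLM to repair a query that failed).

```python
def run_qa(question, generate_aql, execute_aql, summarize,
           top_k=10, max_attempts=3, return_query=False, return_result=False):
    """Simplified model of a text-to-AQL QA loop (hypothetical, for illustration)."""
    last_error = None
    for _ in range(max_attempts):
        # The generator sees the previous error, so it can try a different query
        query = generate_aql(question, last_error)
        try:
            rows = execute_aql(query)
            break
        except ValueError as err:  # stand-in for a database query error
            last_error = str(err)
    else:
        raise RuntimeError(f"no valid AQL after {max_attempts} attempts: {last_error}")

    rows = rows[:top_k]  # cap the number of rows passed on to the LLM
    output = {"query": question, "result": summarize(question, rows)}
    if return_query:
        output["aql_query"] = query
    if return_result:
        output["aql_result"] = rows
    return output


def fake_generate(question, error):
    # First attempt returns a broken query; the retry returns a valid one
    return "RETURN DOCUMENT('Characters/NedStark').alive" if error else "RETRUN oops"


def fake_execute(query):
    if not query.startswith("RETURN"):
        raise ValueError("syntax error")
    return [True]


def fake_summarize(question, rows):
    return f"Answer based on {rows}"


out = run_qa("Is Ned Stark alive?", fake_generate, fake_execute, fake_summarize,
             return_query=True, return_result=True)
print(out)
```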

## Text to Graph in ArangoDB + Similarity Search


We can combine `LLMGraphTransformer`, `ArangoGraph`, and `ArangoVector` to insert text data into ArangoDB as a Graph, and then perform similarity search on the graph's Chunk, Node, and Edge embeddings.
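Conceptually, `LLMGraphTransformer` turns each text chunk into a graph document made of nodes and typed relationships, and `add_graph_documents` stores the chunk, the entities, and the relationships in separate collections. Judging by the collection names and text fields used later in this notebook (`MyGraph_SOURCE` with `text`, `MyGraph_ENTITY` with `name`, `MyGraph_LINKS_TO` with `type`), the layout resembles the sketch below. The triples, the `to_collections` helper, and the truncated SHA-1 digest used as a document key are assumptions for illustration; the actual key scheme and stored fields are determined by `ArangoGraph`.

```python
import hashlib


def key_for(text):
    """Deterministic document key for a piece of text (assumed scheme, for illustration)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:16]


def to_collections(chunk_text, triples, graph_name="MyGraph"):
    """Split (source, relation, target) triples into chunk, entity, and edge documents."""
    source = {"_key": key_for(chunk_text), "text": chunk_text}
    entities = {}
    edges = []
    for src, rel, dst in triples:
        # Each entity is stored once, however many triples mention it
        for name in (src, dst):
            entities.setdefault(name, {"_key": key_for(name), "name": name})
        edges.append({
            "_from": f"{graph_name}_ENTITY/{key_for(src)}",
            "_to": f"{graph_name}_ENTITY/{key_for(dst)}",
            "type": rel,
        })
    return {
        f"{graph_name}_SOURCE": [source],
        f"{graph_name}_ENTITY": list(entities.values()),
        f"{graph_name}_LINKS_TO": edges,
    }


demo_triples = [
    ("Marie Curie", "NATIONALITY", "Polish"),
    ("Marie Curie", "MARRIED_TO", "Pierre Curie"),
]
demo_collections = to_collections(
    "Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist.",
    demo_triples,
)
for name, docs in demo_collections.items():
    print(name, len(docs))
```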

In [17]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(temperature=0)

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    dimensions=256,
)

In [18]:
from langchain_experimental.graph_transformers import LLMGraphTransformer

llm_transformer = LLMGraphTransformer(llm=llm)



In [21]:
from langchain_core.documents import Document

text_1 = """
Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""

text_2 = """
Alan Turing, born in 1912, was a British mathematician, logician, and cryptanalyst who played a crucial role in breaking the German Enigma code during World War II.
He is widely considered the father of theoretical computer science and artificial intelligence.
His work laid the foundation for modern computing, and the Turing Test remains a benchmark for AI intelligence.
Despite his contributions, he was persecuted for his homosexuality and posthumously received a royal pardon in 2013.
"""

text_3 = """
Ada Lovelace, born in 1815, was an English mathematician and writer, best known for her work on Charles Babbage’s early mechanical general-purpose computer, the Analytical Engine.
She is often regarded as the first computer programmer for her pioneering notes on algorithms that could be processed by a machine.
Her insights extended beyond simple calculations, envisioning computers as capable of handling complex tasks like music composition.
Her legacy continues to inspire women in STEM, and Ada Lovelace Day celebrates her contributions to computing.
"""

for chunk in [text_1, text_2, text_3]:
    document = Document(page_content=chunk)

    graph_doc = llm_transformer.process_response(document)

    graph.add_graph_documents(
        graph_documents=[graph_doc],
        include_source=True,
        graph_name="MyGraph",
        embeddings=embeddings,
        embedding_field="embedding",
        embed_source=True,
        embed_nodes=True,
        embed_relationships=True,
    )

In [None]:
# Option 1: Similarity Search on Chunks

from langchain_community.vectorstores.arangodb_vector import ArangoVector

vector_db = ArangoVector(
    embedding=embeddings,
    embedding_dimension=256,
    database=db,
    collection_name="MyGraph_SOURCE",  # Similarity on chunks (i.e. source documents)
    embedding_field="embedding",
    text_field="text",
)

result = vector_db.similarity_search_with_relevance_scores(
    query="Who is Ada Lovelace?",
    k=1,
    use_approx=False,  # Approximate Nearest Neighbor only supported in >= 3.12.4
)[0]

document, score = result

print(document)
print(score)

page_content='
Ada Lovelace, born in 1815, was an English mathematician and writer, best known for her work on Charles Babbage’s early mechanical general-purpose computer, the Analytical Engine.
She is often regarded as the first computer programmer for her pioneering notes on algorithms that could be processed by a machine.
Her insights extended beyond simple calculations, envisioning computers as capable of handling complex tasks like music composition.
Her legacy continues to inspire women in STEM, and Ada Lovelace Day celebrates her contributions to computing.
'
0.8389450394321321
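With `use_approx=False`, no approximate index is used, so the search amounts to comparing the query embedding against every stored embedding and keeping the `k` best matches. A brute-force version of that idea with cosine similarity, using tiny made-up 3-dimensional vectors instead of the 256-dimensional OpenAI embeddings, is sketched below. Cosine similarity is an assumption here; the relevance scores reported by `ArangoVector` may be computed or normalized differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query_vec, items, k=1):
    """Score every (text, vector) pair against the query and keep the k best."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up toy embeddings, for illustration only
toy_items = [
    ("Marie Curie chunk", [0.9, 0.1, 0.0]),
    ("Alan Turing chunk", [0.1, 0.9, 0.1]),
    ("Ada Lovelace chunk", [0.1, 0.3, 0.9]),
]
top_match = exact_top_k([0.0, 0.2, 1.0], toy_items, k=1)
print(top_match)
```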


In [43]:
# Option 2: Similarity Search on Nodes

from langchain_community.vectorstores.arangodb_vector import ArangoVector

vector_db = ArangoVector(
    embedding=embeddings,
    embedding_dimension=256,
    database=db,
    collection_name="MyGraph_ENTITY",
    embedding_field="embedding",
    text_field="name",
)

result = vector_db.similarity_search_with_relevance_scores(
    query="mathematician",
    k=5,
    use_approx=False,  # Approximate Nearest Neighbor only supported in >= 3.12.4
)

for r in result:
    print(r[0].id, r[0].page_content, r[1])

12157092465195988495 Mathematician 0.9368022244680257
11002426697490865248 Physicist 0.5846517081010337
1204676489986466130 Alan Turing 0.47424045930427766
8721508686425501130 Chemist 0.45148191153882994
4777130905068023973 Logician 0.4398502690955969


In [45]:
# Option 3: Similarity Search on Relationships

from langchain_community.vectorstores.arangodb_vector import ArangoVector

vector_db = ArangoVector(
    embedding=embeddings,
    embedding_dimension=256,
    database=db,
    collection_name="MyGraph_LINKS_TO",
    embedding_field="embedding",
    text_field="type",
)

result = vector_db.similarity_search_with_relevance_scores(
    query="What is the nationality of Marie Curie?",
    k=5,
    use_approx=False,  # Approximate Nearest Neighbor only supported in >= 3.12.4
)

for r in result:
    edge = db.document(f"MyGraph_LINKS_TO/{r[0].id}")
    src_node = db.document(edge["_from"])
    dst_node = db.document(edge["_to"])

    print(f"{src_node['name']} {edge['type']} {dst_node['name']}", r[1])

Marie Curie NATIONALITY French 0.850415409765681
Marie Curie NATIONALITY Polish 0.8139104682558209
Marie Curie BORN_IN 1867 0.7649211608959867
Marie Curie MARRIED_TO Pierre Curie 0.7058500527972313
Marie Curie PROFESSION Chemist 0.698266397213545


**Note**: Using Approximate Nearest Neighbors (ANN) for similarity search is recommended for large datasets, and is supported in ArangoDB as of version 3.12.4. Read more in this [ArangoDB blog post on vector search](https://arangodb.com/2024/11/vector-search-in-arangodb-practical-insights-and-hands-on-examples/).
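To see why approximate search scales better, consider one common ANN strategy, an inverted-file (IVF) index: vectors are grouped into clusters ahead of time, and a query is compared only against the members of the closest cluster(s) rather than against every vector. The toy sketch below hand-picks two centroids and probes a single cluster (`n_probe=1`). It illustrates the trade-off (fewer comparisons, at the risk of missing a true neighbor in an unprobed cluster) and is not a description of how ArangoDB's vector index is implemented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_ivf(items, centroids):
    """Assign each (text, vector) item to the list of its most similar centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for text, vec in items:
        best = max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
        lists[best].append((text, vec))
    return lists


def ivf_search(query, centroids, lists, k=1, n_probe=1):
    """Compare the query only against items in the n_probe closest clusters."""
    probe = sorted(range(len(centroids)), key=lambda i: cosine(query, centroids[i]), reverse=True)[:n_probe]
    candidates = [item for i in probe for item in lists[i]]
    scored = sorted(((t, cosine(query, v)) for t, v in candidates), key=lambda p: p[1], reverse=True)
    return scored[:k], len(candidates)


toy_items = [
    ("Mathematician", [0.9, 0.1]),
    ("Logician", [0.8, 0.3]),
    ("Physicist", [0.2, 0.9]),
    ("Chemist", [0.1, 1.0]),
]
# Hand-picked for the example; in practice centroids are learned, e.g. with k-means
centroids = [[1.0, 0.0], [0.0, 1.0]]
ivf_lists = build_ivf(toy_items, centroids)
ann_top, compared = ivf_search([0.95, 0.2], centroids, ivf_lists, k=1)
print(ann_top, f"compared against {compared} of {len(toy_items)} vectors")
```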