<a href="https://colab.research.google.com/github/neo4j-contrib/ms-graphrag-neo4j/blob/main/examples/neo4j_weaviate_combined.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

>[Naive RAG vs GraphRAG with Neo4J & Weaviate](#scrollTo=n3QFDMrgAkCo)

>>[Install Dependencies](#scrollTo=n3QFDMrgAkCo)

>>[Write Documents to Weaviate Cloud](#scrollTo=nqwuGr0Xhgtm)

>>[Classic RAG with OpenAI](#scrollTo=-uAAWPQXBUdX)

>>[Graph RAG](#scrollTo=zzBnUF4bBYKG)

>>>[Build a Graph with Neo4J](#scrollTo=zzBnUF4bBYKG)

>>>[Extract Relevant Entities](#scrollTo=FVzpKJViBiJT)

>>>[Summarize Nodes and Communities](#scrollTo=j1wAsUfIBrGc)

>>>[Write the Entities to Weaviate](#scrollTo=n105cc-_B9bN)



# Naive RAG vs GraphRAG with Neo4J & Weaviate

In this recipe, we will walk through two ways of doing RAG:

1. Classic RAG, where we run a simple vector search and then generate an answer from the retrieved context
2. Graph RAG, where we combine vector search with a graph representation of our dataset, including community and node summaries

For this example, we will use a generated dataset called "Financial Contracts", which lists (fake) contracts signed between individuals and companies.
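
As a rough illustration of the "simple vector search" step in option 1, here is a minimal sketch using plain Python and cosine similarity. The toy vectors and document names are made up for illustration; in the rest of this recipe, Weaviate computes the embeddings and runs the search for us.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical 3-dimensional "embeddings" for three documents
toy_docs = {
    "contract_a": [1.0, 0.0, 0.0],
    "contract_b": [0.9, 0.1, 0.0],
    "contract_c": [0.0, 0.0, 1.0],
}

nearest = top_k([1.0, 0.0, 0.1], toy_docs, k=2)
print(nearest)
```

The retrieved documents are then passed to the language model as context, which is what the classic RAG function later in this recipe does.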

## Install Dependencies

In [None]:
!pip install --quiet --upgrade git+https://github.com/neo4j-contrib/ms-graphrag-neo4j.git datasets weaviate-client neo4j-graphrag

## Write Documents to Weaviate Cloud

To get started, you can use a free Weaviate Sandbox.

1. Create a cluster
2. Take note of the cluster URL and API key
3. Go to 'Embeddings' and turn it on.

In [None]:
import os
from getpass import getpass

if "WEAVIATE_API_KEY" not in os.environ:
  os.environ["WEAVIATE_API_KEY"] = getpass("Weaviate API Key")
if "WEAVIATE_URL" not in os.environ:
  os.environ["WEAVIATE_URL"] = getpass("Weaviate URL")

In [None]:
import weaviate
from weaviate.auth import Auth

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ.get("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(os.environ.get("WEAVIATE_API_KEY")),
)

In [None]:
from weaviate.classes.config import Configure

#client.collections.delete("Financial_contracts")
client.collections.create(
    "Financial_contracts",
    description="A dataset of financial contracts between individuals and/or companies, as well as information on the type of contract and who has authored them.",
    vectorizer_config=Configure.Vectorizer.text2vec_weaviate(),
)

In [None]:
from datasets import load_dataset

financial_dataset = load_dataset("weaviate/agents", "query-agent-financial-contracts", split="train", streaming=True)

In [None]:
financial_collection = client.collections.get("Financial_contracts")

with financial_collection.batch.dynamic() as batch:
    for item in financial_dataset:
        batch.add_object(properties=item["properties"])
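
After a batch import it can be worth checking whether any objects were rejected. The sketch below assumes the collection exposes `batch.failed_objects` once the `with` block has finished, and that each entry has a `message` attribute, as in the v4 Python client. The helper name `summarize_batch_failures` is our own.

```python
def summarize_batch_failures(collection, max_shown=3):
    """Count failed batch objects and collect a few error messages.

    Assumes `collection.batch.failed_objects` is a list of error objects
    that each carry a `message` attribute.
    """
    failed = list(collection.batch.failed_objects)
    return {
        "failed": len(failed),
        "sample": [str(err.message) for err in failed[:max_shown]],
    }

# In the notebook, after the import loop:
# print(summarize_batch_failures(financial_collection))
```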

## Classic RAG with OpenAI

In [None]:
from openai import AsyncOpenAI

openai_client = AsyncOpenAI()

async def achat(messages, model="gpt-4o", temperature=0, config=None):
    # Use None instead of a mutable default; treat it as "no extra options"
    response = await openai_client.chat.completions.create(
        model=model,
        temperature=temperature,
        messages=messages,
        **(config or {}),
    )
    return response.choices[0].message.content

In [None]:
async def classic_rag(input: str) -> str:
    # Retrieve the 3 nearest contracts and pass their properties as context
    results = financial_collection.query.near_text(query=input, limit=3)
    context = [str(obj.properties) for obj in results.objects]
    messages = [
        {
            "role": "user",
            "content": "Based on the given context: {context} \n\n Answer the following question: {question}".format(context=context, question=input),
        },
    ]
    output = await achat(messages, model="gpt-4o")
    return output

In [None]:
response = await classic_rag("What do you know about Weaviate")
print(response)

## Graph RAG

### Build a Graph with Neo4J


In [None]:
import os
from getpass import getpass

from ms_graphrag_neo4j import MsGraphRAG
from neo4j import GraphDatabase
import pandas as pd

# Use a Neo4j Sandbox "Blank Project": https://sandbox.neo4j.com/
# Replace the URI and password below with the values from your own sandbox.

os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key:")
os.environ["NEO4J_URI"] = "bolt://3.218.248.139:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "outfit-streams-oxygens"

In [None]:
driver = GraphDatabase.driver(
    os.environ["NEO4J_URI"],
    auth=(os.environ["NEO4J_USERNAME"], os.environ["NEO4J_PASSWORD"]),
    #notifications_min_severity="OFF",
)
ms_graph = MsGraphRAG(driver=driver, model="gpt-4o", max_workers=10)

In [None]:
import pandas as pd

# Login using e.g. `huggingface-cli login` to access this dataset
df = pd.read_parquet("hf://datasets/weaviate/agents/query-agent/financial-contracts/0001.parquet")
df.head()

In [None]:
texts = [el['contract_text'] for el in df['properties']]
texts[:2]

### Extract Relevant Entities

Next, we will start extracting relevant entities and relations between these entities that we might be interested in.

In [None]:
allowed_entities = ["Person", "Organization", "Location"]

await ms_graph.extract_nodes_and_rels(texts, allowed_entities)

### Summarize Nodes and Communities

In [None]:
await ms_graph.summarize_nodes_and_rels()

In [None]:
await ms_graph.summarize_communities()
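
To check that the summarization steps populated the graph, a quick count can help. This is a minimal sketch: it assumes `ms_graph.query` accepts a Cypher string and returns a list of dict-like rows, which matches how the results are used in the next cells. The helper name `entity_summary_coverage` is hypothetical.

```python
ENTITY_COVERAGE_QUERY = """
MATCH (e:__Entity__)
RETURN count(e) AS total, count(e.summary) AS summarized
"""

def entity_summary_coverage(run_query):
    """Return how many entities exist and how many have a summary.

    `run_query` is any callable that takes a Cypher string and returns
    a list of dict-like rows (assumed here to be `ms_graph.query`).
    """
    rows = run_query(ENTITY_COVERAGE_QUERY)
    row = rows[0]
    return {"total": row["total"], "summarized": row["summarized"]}

# In the notebook:
# print(entity_summary_coverage(ms_graph.query))
```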

In [None]:
entities = ms_graph.query("""
MATCH (e:__Entity__)
RETURN e.name AS entity_id, e.summary AS entity_summary
""")

In [None]:
entities[:2]
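
Entities without a summary add little to vector search, so it may be worth dropping them before writing to Weaviate. The sketch below assumes each row is a dict with the `entity_id` and `entity_summary` keys from the query above. The sample rows are made up for illustration.

```python
def drop_unsummarized(rows):
    """Keep only rows whose entity_summary is a non-empty value."""
    return [row for row in rows if row.get("entity_summary")]

# Made-up rows in the same shape as the query result
sample_rows = [
    {"entity_id": "Acme Ltd", "entity_summary": "A fictional company used as a placeholder."},
    {"entity_id": "Unknown Corp", "entity_summary": None},
    {"entity_id": "Empty Inc", "entity_summary": ""},
]

kept = drop_unsummarized(sample_rows)
print([row["entity_id"] for row in kept])

# In the notebook:
# entities = drop_unsummarized(entities)
```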

### Write the Entities to Weaviate

In [None]:
from weaviate.classes.config import Configure

# client.collections.delete("Entities")
client.collections.create(
    "Entities",
    description="A dataset of entities appearing in the financial contracts between individuals and/or companies, as well as information on the type of contract and who has authored them.",
    vectorizer_config=Configure.Vectorizer.text2vec_weaviate(),
)

In [None]:
from datasets import IterableDataset

# Define a simple generator
def list_generator(data):
    for item in data:
        yield item

# Create the IterableDataset
entities_dataset = IterableDataset.from_generator(list_generator, gen_kwargs={"data": entities})

In [None]:
entities_collection = client.collections.get("Entities")

with entities_collection.batch.dynamic() as batch:
    for item in entities_dataset:
        batch.add_object(properties=item)

In [None]:
from neo4j_graphrag.retrievers import WeaviateNeo4jRetriever

retriever = WeaviateNeo4jRetriever(
    driver=driver,
    client=client,
    collection="Entities",
    id_property_external="entity_id",
    id_property_neo4j="name",
    retrieval_query="""
    WITH collect(node) as nodes
WITH collect {
    UNWIND nodes as n
    MATCH (n)<-[:MENTIONS]->(c:__Chunk__)
    WITH c, count(distinct n) as freq
    RETURN c.text AS chunkText
    ORDER BY freq DESC
    LIMIT 3
} AS text_mapping,
collect {
    UNWIND nodes as n
    MATCH (n)-[:IN_COMMUNITY*]->(c:__Community__)
    WHERE c.summary IS NOT NULL
    WITH c, c.rating as rank
    RETURN c.summary
    ORDER BY rank DESC
    LIMIT 3
} AS report_mapping,
collect {
    UNWIND nodes as n
    MATCH (n)-[r:SUMMARIZED_RELATIONSHIP]-(m)
    WHERE m IN nodes
    RETURN r.summary AS descriptionText
    LIMIT 3
} as insideRels,
collect {
    UNWIND nodes as n
    RETURN n.summary AS descriptionText
} as entities
RETURN {Chunks: text_mapping, Reports: report_mapping,
       Relationships: insideRels,
       Entities: entities} AS output
    """
)
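
To inspect what the retriever returns before feeding it to the model, the result can be unpacked item by item. This sketch assumes `retriever.search` returns an object with an `items` list whose elements expose a `content` attribute. The helper name `result_contents` is our own.

```python
def result_contents(result):
    """Collect the content of every item in a retriever result as strings.

    Assumes `result.items` is a list of objects with a `content` attribute.
    """
    return [str(item.content) for item in result.items]

# In the notebook:
# result = retriever.search(query_text="What do you know about Weaviate", top_k=3)
# for content in result_contents(result):
#     print(content)
```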

In [None]:
async def hybrid_graph_embedding_rag(input: str) -> str:
    context = [str(el[1]) for el in retriever.search(query_text=input, top_k=3)]
    messages = [
        {
            "role": "user",
            "content": "Based on the given context: {context} \n\n Answer the following question: {question}".format(context=context, question=input),
        },
    ]
    output = await achat(messages, model="gpt-4o")
    return output

In [None]:
response = await hybrid_graph_embedding_rag(input="What do you know about Weaviate")
print(response)
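
To compare the two approaches on the same question, both pipelines can be run concurrently. This is a minimal sketch: `compare_answers` is a hypothetical helper that accepts any mapping of names to async functions taking a single question string.

```python
import asyncio

async def compare_answers(question, pipelines):
    """Run every pipeline on the same question and return answers by name."""
    names = list(pipelines)
    answers = await asyncio.gather(
        *(pipelines[name](question) for name in names)
    )
    return dict(zip(names, answers))

# In the notebook:
# results = await compare_answers(
#     "What do you know about Weaviate",
#     {"classic": classic_rag, "graph": hybrid_graph_embedding_rag},
# )
# for name, answer in results.items():
#     print(f"--- {name} ---\n{answer}\n")
```

Running the same question through both functions makes it easier to see where the graph context changes the answer.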