# GraphRAG tutorial based on the LLM Graph Transformer

Notebook developed from the links below:
- [Graph RAG](https://python.langchain.com/docs/integrations/retrievers/graph_rag/)
- [Project Page](https://datastax.github.io/graph-rag/#example-langchain-retriever-combining-vector-and-graph-traversal)

## Imports and loading environment variables

In [1]:
from src.settings import BASE_DIR, SQLITE_DB_NAME, MONGO_URI, MONGO_DB_NAME, MONGO_COLLECTION
from src.utils.dataviz import plot_graph_documents

## Graph RAG

In [2]:
# Download the test documents

from graph_rag_example_helpers.datasets.animals import fetch_documents

animals = fetch_documents()

In [3]:
animals

[Document(id='aardvark', metadata={'type': 'mammal', 'number_of_legs': 4, 'keywords': ['burrowing', 'nocturnal', 'ants', 'savanna'], 'habitat': 'savanna', 'tags': [{'a': 5, 'b': 7}, {'a': 8, 'b': 10}]}, page_content='the aardvark is a nocturnal mammal known for its burrowing habits and long snout used to sniff out ants.'),
 Document(id='albatross', metadata={'type': 'bird', 'number_of_legs': 2, 'keywords': ['seabird', 'wingspan', 'ocean'], 'habitat': 'marine', 'tags': [{'a': 5, 'b': 8}, {'a': 8, 'b': 10}]}, page_content='the albatross is a large seabird with the longest wingspan of any bird, allowing it to glide effortlessly over oceans.'),
 Document(id='alligator', metadata={'type': 'reptile', 'number_of_legs': 4, 'keywords': ['reptile', 'jaws', 'wetlands'], 'diet': 'carnivorous', 'nested': {'a': 5}}, page_content='alligators are large reptiles with powerful jaws and are commonly found in freshwater wetlands.'),
 Document(id='alpaca', metadata={'type': 'mammal', 'number_of_legs': 4, '

In [4]:
animals_sample = animals[0:3]

In [25]:
animals_sample

[Document(id='aardvark', metadata={'type': 'mammal', 'number_of_legs': 4, 'keywords': ['burrowing', 'nocturnal', 'ants', 'savanna'], 'habitat': 'savanna', 'tags': [{'a': 5, 'b': 7}, {'a': 8, 'b': 10}]}, page_content='the aardvark is a nocturnal mammal known for its burrowing habits and long snout used to sniff out ants.'),
 Document(id='albatross', metadata={'type': 'bird', 'number_of_legs': 2, 'keywords': ['seabird', 'wingspan', 'ocean'], 'habitat': 'marine', 'tags': [{'a': 5, 'b': 8}, {'a': 8, 'b': 10}]}, page_content='the albatross is a large seabird with the longest wingspan of any bird, allowing it to glide effortlessly over oceans.'),
 Document(id='alligator', metadata={'type': 'reptile', 'number_of_legs': 4, 'keywords': ['reptile', 'jaws', 'wetlands'], 'diet': 'carnivorous', 'nested': {'a': 5}}, page_content='alligators are large reptiles with powerful jaws and are commonly found in freshwater wetlands.')]
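A metadata edge such as `("habitat", "habitat")`, used by the retriever later in this notebook, can only connect documents that carry that field with the same value. The plain-Python sketch below is not part of the `graph_retriever` API; `SAMPLE_METADATA` and `shared_values` are hypothetical names, and only the scalar metadata fields from the sample output above are used. It groups the three sample documents by a field to show which ones such an edge could link.

```python
from collections import defaultdict

# Scalar metadata of the three sample documents, copied from the output above
# (list-valued fields such as "keywords" and "tags" are left out for simplicity).
SAMPLE_METADATA = {
    "aardvark": {"type": "mammal", "number_of_legs": 4, "habitat": "savanna"},
    "albatross": {"type": "bird", "number_of_legs": 2, "habitat": "marine"},
    "alligator": {"type": "reptile", "number_of_legs": 4, "diet": "carnivorous"},
}


def shared_values(metadata_by_id, field):
    """Group document ids by their value for `field`, keeping only values shared by 2+ docs."""
    groups = defaultdict(list)
    for doc_id, meta in metadata_by_id.items():
        if field in meta:  # a document without the field cannot take part in this edge
            groups[meta[field]].append(doc_id)
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


for field in ("habitat", "origin", "number_of_legs"):
    print(field, shared_values(SAMPLE_METADATA, field))
```

In this small sample the two habitats differ and `alligator` has no `habitat` key, so a habitat edge links nothing; only `number_of_legs` groups `aardvark` with `alligator`.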

In [12]:
# Select embeddings model:

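import getpass
import os

# The OpenAI key is not needed by the local embedding model below,
# but the chat model used later in the chain (gpt-4o-mini) requires it.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

# Alternative: OpenAI embeddings (paid API)
# from langchain_openai import OpenAIEmbeddings
# embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

# HuggingFaceEmbeddings now lives in the langchain-huggingface package;
# the old `langchain.embeddings` import path is deprecated.
from langchain_huggingface import HuggingFaceEmbeddings

# Use the free 'all-MiniLM-L6-v2' model from SentenceTransformers (runs locally)
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")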



In [26]:
# Then create a vector store and load the test documents:

from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore.from_documents(
    documents=animals,
    embedding=embedding_model,
)
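`InMemoryVectorStore` ranks stored documents by cosine similarity between the query embedding and each document embedding; with `start_k=1` in the retriever below, only the single best match seeds the graph traversal. The sketch below reproduces that ranking in plain Python. The names `cosine_similarity`, `top_k`, and `toy_store` and the 3-dimensional vectors are hypothetical (the real `all-MiniLM-L6-v2` embeddings have 384 dimensions).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, store, k):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings, for illustration only
toy_store = {
    "capybara": [0.9, 0.1, 0.0],
    "duck": [0.6, 0.4, 0.2],
    "aardvark": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], toy_store, k=1))
```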

In [24]:
vector_store

<langchain_core.vectorstores.in_memory.InMemoryVectorStore at 0x73fe3129a170>

In [27]:
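# Graph retriever

from graph_retriever.strategies import Eager
from langchain_graph_retriever import GraphRetriever

traversal_retriever = GraphRetriever(
    store=vector_store,
    # Each (source_field, target_field) pair links documents whose metadata
    # values match, e.g. animals that share the same "habitat".
    edges=[("habitat", "habitat"), ("origin", "origin")],
    # start_k: documents fetched by similarity search to seed the traversal;
    # k: total documents returned; max_depth: maximum number of edge hops.
    strategy=Eager(k=5, start_k=1, max_depth=2),
)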
results = traversal_retriever.invoke("what animals could be found near a capybara?")

for doc in results:
    print(f"{doc.id}: {doc.page_content}")
    print(doc.metadata)

capybara: capybaras are the largest rodents in the world and are highly social animals.
{'_depth': 0, '_similarity_score': np.float64(0.6954972122544383), 'type': 'mammal', 'number_of_legs': 4, 'keywords': ['rodent', 'social', 'largest'], 'habitat': 'wetlands'}
newt: newts are small amphibians known for their ability to regenerate limbs and tails.
{'_depth': 1, '_similarity_score': np.float64(0.28079772105700496), 'type': 'amphibian', 'number_of_legs': 4, 'keywords': ['regeneration', 'amphibian', 'small'], 'habitat': 'wetlands'}
crocodile: crocodiles are large reptiles with powerful jaws and a long lifespan, often living over 70 years.
{'_depth': 1, '_similarity_score': np.float64(0.23464065053115268), 'type': 'reptile', 'number_of_legs': 4, 'keywords': ['reptile', 'jaws', 'long lifespan'], 'habitat': 'wetlands'}
duck: ducks are waterfowl birds known for their webbed feet and quacking sounds.
{'_depth': 1, '_similarity_score': np.float64(0.2195624621175971), 'type': 'bird', 'number_of_
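The `_depth` values above show the traversal at work: `capybara` is the single similarity-search seed (depth 0), and the wetland animals are reached through one `habitat` edge (depth 1). The sketch below is a simplified, plain-Python approximation of that breadth-first behaviour, not the library's implementation: the real `Eager` strategy fetches neighbours from the vector store with metadata filters and has further parameters. `DOCS`, `neighbors`, and `eager_traversal` are hypothetical names; the four wetland scores are rounded from the output above, while the `aardvark` entry is invented to show an unlinked node.

```python
# Toy documents: id -> (metadata, similarity to the query).
DOCS = {
    "capybara": ({"habitat": "wetlands"}, 0.70),
    "newt": ({"habitat": "wetlands"}, 0.28),
    "crocodile": ({"habitat": "wetlands"}, 0.23),
    "duck": ({"habitat": "wetlands"}, 0.22),
    "aardvark": ({"habitat": "savanna"}, 0.10),  # hypothetical, not linked to the others
}


def score(doc_id):
    return DOCS[doc_id][1]


def neighbors(doc_id, edges):
    """Ids of other documents whose target field matches this document's source field."""
    meta = DOCS[doc_id][0]
    linked = []
    for source_field, target_field in edges:
        if source_field not in meta:
            continue  # no value for this field, so no outgoing edge
        for other_id, (other_meta, _) in DOCS.items():
            if other_id != doc_id and other_meta.get(target_field) == meta[source_field]:
                linked.append(other_id)
    return linked


def eager_traversal(edges, k=5, start_k=1, max_depth=2):
    """Seed with the start_k most similar docs, then follow edges one hop per round,
    selecting every newly discovered doc until k docs are selected or max_depth
    is exceeded. Returns {doc_id: depth} in selection order."""
    frontier = sorted(DOCS, key=score, reverse=True)[:start_k]
    selected = {}
    depth = 0
    while frontier and len(selected) < k and depth <= max_depth:
        discovered = []
        for doc_id in sorted(frontier, key=score, reverse=True):
            if doc_id in selected or len(selected) >= k:
                continue
            selected[doc_id] = depth
            discovered.extend(neighbors(doc_id, edges))
        frontier = [d for d in discovered if d not in selected]
        depth += 1
    return selected


edges = [("habitat", "habitat"), ("origin", "origin")]
print(eager_traversal(edges, k=5, start_k=1, max_depth=2))
```

With these toy documents the sketch selects `capybara` at depth 0 and `newt`, `crocodile`, and `duck` at depth 1, mirroring the `_depth` values above; `aardvark` is never reached because no edge leads to it.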

In [28]:
# Applying the retrieved knowledge inside a chain

from langchain.chat_models import init_chat_model
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

model="gpt-4o-mini"
llm = init_chat_model(model, model_provider="openai")


prompt = ChatPromptTemplate.from_template(
"""Answer the question based only on the context provided.

Context: {context}

Question: {question}"""
)

def format_docs(docs):
    return "\n\n".join(f"text: {doc.page_content} metadata: {doc.metadata}" for doc in docs)

chain = (
    {"context": traversal_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
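The dict at the head of the chain sends the same question to both keys: `context` passes it through the retriever and `format_docs`, while `question` passes it through unchanged, and the prompt then fills its template. The sketch below mimics those first steps in plain Python. `stub_retriever` and `build_prompt` are hypothetical stand-ins, and the one stub document reuses text from the retriever output shown earlier.

```python
from types import SimpleNamespace

TEMPLATE = """Answer the question based only on the context provided.

Context: {context}

Question: {question}"""


def format_docs(docs):
    # Same helper as in the cell above
    return "\n\n".join(f"text: {doc.page_content} metadata: {doc.metadata}" for doc in docs)


def stub_retriever(question):
    # Hypothetical stand-in for traversal_retriever: ignores the question
    # and always returns one fixed document.
    return [
        SimpleNamespace(
            page_content="newts are small amphibians known for their ability to regenerate limbs and tails.",
            metadata={"habitat": "wetlands"},
        )
    ]


def build_prompt(question, retriever=stub_retriever):
    # Step 1: like the dict in the chain, feed the same input to both keys.
    inputs = {"context": format_docs(retriever(question)), "question": question}
    # Step 2: fill the template (ChatPromptTemplate would wrap this text in a chat message).
    return TEMPLATE.format(**inputs)


print(build_prompt("what animals could be found near a capybara?"))
```

In the real chain, `llm` then sends this prompt to the model and `StrOutputParser` returns the reply as a plain string.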

In [29]:
chain.invoke("what animals could be found near a capybara?")

'Animals that could be found near a capybara, based on the context provided, include newts, crocodiles, ducks, and frogs, as they all inhabit wetlands.'