# 🌍 GraphRAG Core: Climate Intelligence Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nunezmatias/grafoRag/blob/main/examples/Tutorial_GraphRAG.ipynb)

Welcome to the **GraphRAG Core** tutorial. This notebook demonstrates a next-generation retrieval system designed for scientific discovery. Unlike traditional search engines that return isolated documents, this system understands the *structure* of knowledge.

By combining vector search with a causal knowledge graph, we can answer complex questions about climate adaptation, identifying not just *what* is happening, but *why* it matters and what ripple effects it might trigger.


## 1. Setup & Installation
We will install the `graphrag_core` library directly from the repository. This package includes the retrieval engine and automatically handles the download of the Climate Knowledge Graph (~300MB).


In [None]:
!pip install git+https://github.com/nunezmatias/grafoRag.git
!pip install -q -U google-genai

import os
from graphrag_core import GraphRAGEngine
print('✅ Libraries Installed & Loaded')


## 2. Initialize the Engine
Initializing the engine is simple. If the climate data is not found locally, it will be automatically downloaded from the cloud storage. This ensures you have the latest version of the knowledge graph.


In [None]:
engine = GraphRAGEngine()
# Watch the output below for the download progress bar


## 3. Run a Deep Research Query
We will now perform a complex search. The engine allows you to tune the depth of the investigation:

- **`top_k`** establishes the **Breadth** of the investigation. It scans the entire library to find the core concepts related to your query.
- **`context_k`** dictates the **Depth**. Instead of one paper, the engine reads multiple documents per topic to find consensus and nuance.
- **`hops`** activates the **Causal Reasoning** layer. It follows the connections in the graph to find cascading risks.

A configuration of `hops=2` allows us to see second-order effects, essential for systemic analysis.
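To make the effect of `hops` concrete, here is a minimal sketch of hop-limited graph expansion. It is not the engine's actual implementation: the function name `expand_hops`, the edge-tuple layout, and the toy edges are all assumptions made up for illustration and are not taken from the Climate Knowledge Graph.

```python
from collections import deque

def expand_hops(edges, seeds, hops):
    """Collect edges reachable within `hops` steps of the seed nodes (toy BFS)."""
    # Build an adjacency list: source node -> [(relation, target node), ...]
    adjacency = {}
    for src, rel, dst in edges:
        adjacency.setdefault(src, []).append((rel, dst))

    visited = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # hop budget used up for this branch
        for rel, dst in adjacency.get(node, []):
            found.append((node, rel, dst))
            if dst not in visited:
                visited.add(dst)
                frontier.append((dst, depth + 1))
    return found

# Hypothetical edges, for illustration only
toy_edges = [
    ("extreme heat", "increases", "energy demand"),
    ("energy demand", "strains", "power grid"),
    ("power grid", "disrupts", "water pumping"),
]
one_hop = expand_hops(toy_edges, ["extreme heat"], hops=1)
two_hop = expand_hops(toy_edges, ["extreme heat"], hops=2)
print(len(one_hop), len(two_hop))
```

With `hops=1` only the direct effect of the seed node is returned; `hops=2` also surfaces the second-order link, which is the behavior the parameter description above refers to.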


In [None]:
# Define your research question
query = "cascading risks of extreme heat and urban floods"

# Execute the Search
results = engine.search(
    query=query, 
    top_k=3,        # Breadth
    context_k=4,    # Depth
    hops=2          # Causality
)

print("--- Research Stats ---")
print(f"Primary Sources: {results['stats']['primary']}")
print(f"Context Expansion: {results['stats']['context']}")
print(f"Causal Links:      {results['stats']['graph']}")


## 4. Inspect the Retrieved Intelligence
It is important to verify the quality of the retrieved data. This "White Box" approach builds trust by showing you the exact evidence found before the AI summarizes it.


In [None]:
# 1. Check the Top Paper
if results['papers']:
    p = results['papers'][0]
    title = p['title']
    content = p['content'][:200]
    print(f'Top Paper: {title}')
    print(f'Snippet: {content}...')

# 2. Check Discovered Causal Chains
if results['graph_links']:
    print('\nSample Causal Chains Discovered:')
    for link in results['graph_links'][:5]:
        n1 = link['node1']
        n2 = link['node2']
        rel = link['relation']
        print(f'   {n1} --[{rel}]--> {n2}')
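Beyond spot-checking individual links, a quick tally of relation types can show whether the retrieved subgraph is dominated by one kind of connection. The sketch below assumes only the `node1` / `node2` / `relation` keys used in the cell above; the helper name `relation_counts` and the sample links are hypothetical.

```python
from collections import Counter

def relation_counts(graph_links):
    """Count how often each relation label appears in a list of link dicts."""
    return Counter(link["relation"] for link in graph_links)

# Hypothetical links shaped like results['graph_links'], for illustration only
sample_links = [
    {"node1": "heatwave", "node2": "power outage", "relation": "causes"},
    {"node1": "urban flood", "node2": "road closure", "relation": "causes"},
    {"node1": "green roof", "node2": "runoff", "relation": "reduces"},
]
counts = relation_counts(sample_links)
for relation, n in counts.most_common():
    print(f"{relation}: {n}")
```

In the notebook, the same helper could be called as `relation_counts(results['graph_links'])`.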


## 5. Construct the Expert Prompt
We use the engine's built-in expert template to package this structured data into a rigorous prompt for the LLM. This template forces the model to triangulate evidence and cite specific sources.


In [None]:
# Using the default expert template designed for this Climate Graph
prompt = engine.format_prompt(results, query)

print("Here is your optimized prompt (COPY THIS):")
print("--------------------------------------------------")
print(prompt)
print("--------------------------------------------------")


## 6. Generate Answer with Gemini Flash ⚡
Finally, we send the generated prompt to Google's Gemini model to synthesize the final report.

**Prerequisite:** Add your API Key to Colab Secrets (Key icon on the left) with the name `GOOGLE_API_KEY`.


In [None]:
from google import genai
from google.colab import userdata
from IPython.display import Markdown, display

try:
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    client = genai.Client(api_key=GOOGLE_API_KEY)
    print('✅ Gemini Client Configured')
except Exception as e:
    # Leave client unset so the generation step below is skipped cleanly
    client = None
    print(f'⚠️ Error: could not load the API Key from Colab Secrets: {e}')

if client is not None:
    print('⏳ Generating expert response with Gemini Flash...')
    try:
        response = client.models.generate_content(
            model='gemini-flash-latest',
            contents=prompt
        )
        display(Markdown('### 🤖 Response:'))
        display(Markdown(response.text))
    except Exception as e:
        print(f'❌ Error: {e}')
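The cell above depends on `google.colab`, which is only importable inside Colab. If you run the notebook elsewhere, one possible fallback is to read the key from an environment variable. This is a sketch under that assumption; the helper name `load_api_key` is hypothetical and not part of `graphrag_core` or `google-genai`.

```python
import os

def load_api_key(name="GOOGLE_API_KEY"):
    """Return the key from Colab Secrets when available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get(name)
    except Exception:
        # Not in Colab, or the secret is missing: fall back to an environment variable
        value = os.environ.get(name)
        if not value:
            raise RuntimeError(
                f"{name} not found in Colab Secrets or environment variables"
            )
        return value
```

The returned value can then be passed to `genai.Client(api_key=...)` exactly as in the cell above.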


## 7. Advanced: Build Your Own Prompt Template
Do you want full control? Here is how you can access the raw variables `papers` and `graph_links` to modify the prompt structure entirely before sending it to the LLM.


In [None]:
my_role = "You are a Data Journalist writing for a general audience."
my_instruction = "Summarize the risks in 3 bullet points. Be concise."

# 1. Flatten the Papers data into a string
papers_text = ""
for p in results['papers']:
    t = p['title']
    c = p['content'][:200]
    papers_text += f'- {t}: {c}...\n'

# 2. Flatten the Graph data into a string
graph_text = ""
for link in results['graph_links']:
    n1 = link['node1']
    n2 = link['node2']
    graph_text += f'- {n1} causes {n2}\n'

# 3. Build the F-String (Edit this!)
custom_prompt = (
    f"ROLE: {my_role}\n"
    f"QUESTION: {query}\n\n"
    "DATA:\n"
    f"{papers_text}\n"
    f"{graph_text}\n\n"
    f"DO: {my_instruction}"
)
print("Custom prompt created successfully.")
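If you plan to try several roles or instructions, the same steps can be wrapped in a small function. The sketch below is one possible shape, not the engine's built-in template: `build_prompt` and its parameters are hypothetical names, and it keeps the relation label on each link instead of hard-coding "causes". The demo data is made up for illustration.

```python
def build_prompt(role, question, papers, links, instruction, snippet_chars=200):
    """Assemble a plain-text prompt from paper dicts and graph-link dicts."""
    paper_lines = [
        f"- {p['title']}: {p['content'][:snippet_chars]}..." for p in papers
    ]
    link_lines = [
        f"- {link['node1']} --[{link['relation']}]--> {link['node2']}"
        for link in links
    ]
    sections = [
        f"ROLE: {role}",
        f"QUESTION: {question}",
        "",
        "DATA:",
        *paper_lines,
        *link_lines,
        "",
        f"DO: {instruction}",
    ]
    return "\n".join(sections)

# Hypothetical inputs shaped like results['papers'] and results['graph_links']
demo_prompt = build_prompt(
    role="You are a Data Journalist writing for a general audience.",
    question="cascading risks of extreme heat and urban floods",
    papers=[{"title": "Example paper", "content": "Example abstract text."}],
    links=[{"node1": "heat", "node2": "energy demand", "relation": "increases"}],
    instruction="Summarize the risks in 3 bullet points. Be concise.",
)
print(demo_prompt)
```

In the notebook you would pass `results['papers']` and `results['graph_links']` in place of the demo lists.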


## 8. Bonus: Swap the Brain 🧠 (Test a New Dataset)
You can swap the underlying "Brain" instantly by providing a new Vector DB and Graph JSON.

In this example, we will download a small **Test Brain** about Solar Energy from the repository and initialize a new engine with it.


In [None]:
# 1. Download the Test Brain (Solar Energy) from GitHub
!wget -q https://github.com/nunezmatias/grafoRag/raw/main/examples/test_brain.zip
!unzip -o -q test_brain.zip

# 2. Initialize a NEW Engine with this data
solar_engine = GraphRAGEngine(
    vector_db_path='./test_brain/test_db',
    graph_json_path='./test_brain/test_skeleton.json'
)

# 3. Run a Query on the new domain
solar_query = 'How does solar energy affect the grid stability?'
solar_results = solar_engine.search(solar_query, top_k=2)

print(f'Query: {solar_query}')
p_count = len(solar_results['papers'])
l_count = len(solar_results['graph_links'])
print(f'Found {p_count} papers and {l_count} links.')

if solar_results['graph_links']:
    link = solar_results['graph_links'][0]
    n1, n2, rel = link['node1'], link['node2'], link['relation']
    print(f'Link Found: {n1} --[{rel}]--> {n2}')
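When pointing the engine at your own files, it can help to confirm that both paths exist before constructing it, since a failed download or unzip would otherwise only show up as an error at load time. The sketch below assumes the vector DB is a directory and the graph is a single JSON file, as the example paths above suggest; the helper name `missing_brain_paths` is hypothetical.

```python
from pathlib import Path

def missing_brain_paths(vector_db_path, graph_json_path):
    """Return the paths that are absent; an empty list means both were found."""
    missing = []
    if not Path(vector_db_path).is_dir():    # assumption: the vector DB is a folder
        missing.append(vector_db_path)
    if not Path(graph_json_path).is_file():  # assumption: the graph is one JSON file
        missing.append(graph_json_path)
    return missing

# Deliberately nonexistent paths, to show what a failed check looks like
demo_missing = missing_brain_paths(
    "./no_such_brain_dir_for_demo", "./no_such_skeleton_for_demo.json"
)
print(demo_missing)
```

An empty list from `missing_brain_paths('./test_brain/test_db', './test_brain/test_skeleton.json')` would indicate the Test Brain was unpacked where the cell above expects it.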


## 9. Advanced: Load your own Brain from Google Drive ☁️
If you have your Knowledge Graph and Vector DB packaged in a `.zip` and hosted on Google Drive, you can load it directly by passing its **File ID**.

**Note:** Upload your own `test_brain.zip` to Drive first, then replace the ID below with that file's File ID.
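The File ID is the long token embedded in a Drive share link. As a convenience, the sketch below pulls it out of the two common link shapes (`/file/d/<ID>/...` and `?id=<ID>`). The helper name `extract_drive_file_id` is hypothetical, and the IDs in the example are placeholders rather than real files.

```python
import re

# Link shapes handled: .../file/d/<ID>/view... and ...?id=<ID>
_DRIVE_ID_PATTERNS = (
    re.compile(r"/file/d/([A-Za-z0-9_-]+)"),
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),
)

def extract_drive_file_id(url_or_id):
    """Return the File ID from a Drive share link, or the input if it is already an ID."""
    for pattern in _DRIVE_ID_PATTERNS:
        match = pattern.search(url_or_id)
        if match:
            return match.group(1)
    return url_or_id.strip()

# Placeholder link, for illustration only
example_id = extract_drive_file_id(
    "https://drive.google.com/file/d/PLACEHOLDER_id-123/view?usp=sharing"
)
print(example_id)
```

The result can be assigned to `MY_CUSTOM_GDRIVE_ID` in the next cell.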


In [None]:
# Replace the ID below with your own File ID from Google Drive
MY_CUSTOM_GDRIVE_ID = "1iKcEzECN9LTMi3bIq4ocRfFJgvb1dLus"

try:
    # Initialize from Drive
    drive_engine = GraphRAGEngine(gdrive_id=MY_CUSTOM_GDRIVE_ID)
    print('✅ Custom Brain Loaded from Google Drive')
    
    # Run the SAME query as above to verify consistency
    solar_query = 'How does solar energy affect the grid stability?'
    drive_results = drive_engine.search(solar_query, top_k=2)

    print(f'Query: {solar_query}')
    p_count = len(drive_results['papers'])
    l_count = len(drive_results['graph_links'])
    print(f'Found {p_count} papers and {l_count} links.')

    if drive_results['graph_links']:
        link = drive_results['graph_links'][0]
        n1, n2, rel = link['node1'], link['node2'], link['relation']
        print(f'Link Found: {n1} --[{rel}]--> {n2}')

except Exception as e:
    print(f'❌ Error: {e}')
