# 🌍 GraphRAG Core: Climate Intelligence Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nunezmatias/grafoRag/blob/main/examples/Tutorial_GraphRAG.ipynb)

Welcome to the **GraphRAG Core** tutorial. This notebook demonstrates a next-generation retrieval system designed for scientific discovery.

By combining vector search with a causal knowledge graph, we can answer complex questions about climate adaptation, identifying not just *what* is happening, but *why* it matters and what ripple effects it might trigger.

## 1. Installation
Install the library directly from GitHub. This sets up the engine and downloads its required dependencies. The second command installs the `google-genai` client used in Section 6.

In [None]:
!pip install git+https://github.com/nunezmatias/grafoRag.git
!pip install -q -U google-genai

import os
from graphrag_core import GraphRAGEngine
print('✅ Libraries Installed & Loaded')

## 2. Initialize the Engine
Initializing the engine takes a single call. If the climate data is not found locally, it is downloaded automatically from Google Drive.

In [None]:
engine = GraphRAGEngine()
# Output should say: "Attempting to download from Google Drive..." followed by "System Ready"
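
The "use local data, otherwise download" behavior described above follows a common caching pattern. The sketch below illustrates only that pattern. `ensure_data`, `fake_download`, and the `demo_brain` directory name are hypothetical and are not part of `graphrag_core`, whose actual download logic is not shown in this tutorial.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_data(data_dir: Path, download: Callable[[Path], None]) -> bool:
    """Return True if a download was triggered, False if local data was reused."""
    # Treat a non-empty directory as "data already present".
    if data_dir.is_dir() and any(data_dir.iterdir()):
        return False
    data_dir.mkdir(parents=True, exist_ok=True)
    download(data_dir)
    return True


def fake_download(target: Path) -> None:
    """Stand-in for a real download: writes a placeholder file."""
    (target / "placeholder.txt").write_text("demo data")


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "demo_brain"
    first_run = ensure_data(data_dir, fake_download)   # nothing cached yet
    second_run = ensure_data(data_dir, fake_download)  # reuses the cached files

print(first_run, second_run)
```

The second call skips the download because the directory already holds files, which is why re-running the initialization cell is usually much faster than the first run.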

## 3. Run a Deep Research Query
We will now perform a complex search. The engine allows you to tune the depth of the investigation:

- **`top_k`** sets the thematic breadth.
- **`context_k`** controls the depth of evidence per topic.
- **`hops`** defines the causal reasoning, i.e. how many hops to follow in the graph.
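
To build intuition for `hops`, here is a minimal sketch of hop-limited expansion over a toy causal graph. It assumes that each hop means following one more edge outward from the seed topics. The graph data and the `expand` function are hypothetical, and the engine's real traversal is not shown in this tutorial and may differ.

```python
from collections import deque

# Toy causal graph: each node maps to (relation, neighbor) pairs. Hypothetical data.
TOY_GRAPH = {
    "extreme heat": [("increases", "energy demand"), ("worsens", "heat stress")],
    "energy demand": [("strains", "power grid")],
    "power grid": [("disrupts", "pumping stations")],
    "pumping stations": [("aggravates", "urban floods")],
}


def expand(seeds, hops):
    """Collect (source, relation, target) links reachable within `hops` steps."""
    links = []
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue  # do not walk past the hop limit
        for relation, neighbor in TOY_GRAPH.get(node, []):
            links.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return links


one_hop = expand(["extreme heat"], hops=1)
two_hops = expand(["extreme heat"], hops=2)
print(len(one_hop), len(two_hops))
```

In this toy graph, raising `hops` from 1 to 2 adds the `energy demand -> power grid` link. Under the stated assumption, larger values surface longer causal chains at the cost of more, and possibly less relevant, links.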

In [None]:
# Define your research question
query = "cascading risks of extreme heat and urban floods"

# Execute the Search
results = engine.search(
    query=query, 
    top_k=2, 
    context_k=4, 
    hops=2
)

print("--- Research Stats ---")
print(f"Primary Sources: {results['stats']['primary']}")
print(f"Context Expansion: {results['stats']['context']}")
print(f"Causal Links:      {results['stats']['graph']}")

## 4. Inspect the Intelligence
Verify the quality of the retrieved data before generating the final answer.

In [None]:
# 1. Check the Top Paper
if results['papers']:
    p = results['papers'][0]
    print(f"📄 Top Paper: {p['title']}")
    print(f"   Snippet: {p['content'][:200]}...")

# 2. Check Discovered Causal Chains
if results['graph_links']:
    print("")
    print("🔗 Sample Causal Chains:")
    for link in results['graph_links'][:5]:
        print(f"   {link['node1']} --[{link['relation']}]--> {link['node2']}")
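
Beyond eyeballing the first few entries, a quick aggregate check can help. The sketch below runs on mock data shaped like the `results` dictionary used in the cell above (`papers` with `title` and `content`, `graph_links` with `node1`, `relation`, `node2`). The helper names `relation_counts` and `papers_missing_content` are hypothetical, and the mock values are invented for illustration.

```python
from collections import Counter

# Mock data shaped like the `results` dictionary used in this tutorial.
mock_results = {
    "papers": [{"title": "Example paper", "content": "Example abstract text."}],
    "graph_links": [
        {"node1": "heat wave", "relation": "causes", "node2": "heat stress"},
        {"node1": "heavy rain", "relation": "causes", "node2": "urban flood"},
        {"node1": "green roofs", "relation": "mitigates", "node2": "urban flood"},
    ],
}


def relation_counts(results):
    """Count how often each relation type appears among the causal links."""
    return Counter(link["relation"] for link in results.get("graph_links", []))


def papers_missing_content(results):
    """Return titles of papers whose content field is empty or absent."""
    return [
        paper.get("title", "<untitled>")
        for paper in results.get("papers", [])
        if not paper.get("content")
    ]


counts = relation_counts(mock_results)
missing = papers_missing_content(mock_results)
print(counts.most_common())
print(missing)
```

Passing the real `results` object instead of `mock_results` should work as long as it has the same keys as in the cell above.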

## 5. Construct the Expert Prompt
We package the data into a rigorous prompt for the LLM using the engine's built-in template.

In [None]:
prompt = engine.format_prompt(results, query)

print("Here is your optimized prompt (COPY THIS):")
print("--------------------------------------------------")
print(prompt)
print("--------------------------------------------------")

## 6. Generate Answer with Gemini Flash ⚡
Send the prompt to Google's Gemini Flash model for final synthesis.

In [None]:
from google import genai
from google.colab import userdata
from IPython.display import Markdown, display

# Start with no client so a missing key does not surface later as a NameError.
client = None
try:
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    client = genai.Client(api_key=GOOGLE_API_KEY)
    print("✅ Gemini Client Configured")
except Exception as e:
    print(f"⚠️ Error: API Key not found ({e}). Please add 'GOOGLE_API_KEY' to Colab Secrets.")

# Only call the model if the client was created successfully.
if client is not None:
    print("⏳ Generating expert response with Gemini Flash...")
    try:
        response = client.models.generate_content(
            model='gemini-flash-latest',
            contents=prompt
        )
        display(Markdown("### 🤖 Response:"))
        display(Markdown(response.text))
    except Exception as e:
        print(f"❌ Generation Error: {e}")

## 7. Advanced: Custom Prompt Template
Override the role and instructions for a quick change, or pass a full template to control the prompt layout.

In [None]:
# OPTION 1: Quick Customization (Role & Instructions only)
# Override default persona without changing the structure.
prompt_editor = engine.format_prompt(
    results, 
    query,
    role="You are a Scientific Editor for Nature.",
    instructions="Summarize key findings in 1 paragraph for a general audience."
)
print("--- Quick Edit Prompt ---")
print(prompt_editor[:300] + "...\n")

# OPTION 2: Full Layout Control (Custom Template)
# Use markers {role}, {query}, {papers_block}, {graph_block}, {instructions} to redesign the prompt.
my_template = """# INVESTIGATION REPORT
Target: {query}
Researcher: {role}

EVIDENCE FOUND:
{papers_block}
{graph_block}

ACTION REQUIRED:
{instructions}
"""

prompt_custom = engine.format_prompt(
    results, 
    query,
    role="Senior Analyst",
    template=my_template,
    instructions="List top 3 risks identified."
)
print("--- Fully Custom Prompt ---")
print(prompt_custom[:300] + "...")
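
To see how a template with these placeholders could be filled, the sketch below uses Python's built-in `str.format`. That `format_prompt` works this way internally is an assumption, and `fill_template`, `demo_template`, and the sample block contents are hypothetical.

```python
def fill_template(template, **fields):
    """Fill a prompt template with str.format, failing clearly on missing fields."""
    try:
        return template.format(**fields)
    except KeyError as missing:
        raise ValueError(f"Template uses an unknown placeholder: {missing}") from None


demo_template = (
    "Target: {query}\n"
    "Researcher: {role}\n\n"
    "{papers_block}\n"
    "{graph_block}\n\n"
    "{instructions}"
)

demo_prompt = fill_template(
    demo_template,
    query="cascading risks of extreme heat and urban floods",
    role="Senior Analyst",
    papers_block="[1] Example paper (placeholder text)",
    graph_block="heat wave --[causes]--> heat stress",
    instructions="List top 3 risks identified.",
)
print(demo_prompt)
```

With `str.format`, literal braces in a template must be doubled (`{{` and `}}`). If the engine uses a similar mechanism, a template containing raw JSON examples would need the same escaping.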

## 8. Bonus: Swap the Brain 🧠
Download and load a new domain dataset.

In [None]:
# 1. Download and Extract the New Brain
!wget -q https://github.com/nunezmatias/grafoRag/raw/main/examples/test_brain.zip
!unzip -o -q test_brain.zip

# 2. Initialize the Engine with the New Data
local_engine = GraphRAGEngine(
    vector_db_path='./test_brain/test_db', 
    graph_json_path='./test_brain/test_skeleton.json'
)

# 3. Validation Search: "Effects of heat waves on health"
# This tests if the brain contains relevant knowledge on the topic.
query_test = "Effects of heat waves on health"
results_local = local_engine.search(query_test, top_k=2)

# 4. Show Stats
print(f"--- Local Brain Stats for '{query_test}' ---")
print(f"Primary Sources: {results_local['stats']['primary']}")
print(f"Propagated Links:  {results_local['stats']['graph']}")
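
Before pointing the engine at a new brain, it can help to confirm that the unzipped files are where you expect them. The sketch below assumes the vector database is a directory and the graph skeleton is a JSON file, which matches the paths used above but is not confirmed by the library. `check_brain` is a hypothetical helper, not part of `graphrag_core`.

```python
import json
import tempfile
from pathlib import Path


def check_brain(vector_db_path, graph_json_path):
    """Return a list of problems; an empty list means the paths look usable."""
    problems = []
    if not Path(vector_db_path).is_dir():
        problems.append(f"vector DB directory not found: {vector_db_path}")
    graph_file = Path(graph_json_path)
    if not graph_file.is_file():
        problems.append(f"graph file not found: {graph_json_path}")
    else:
        try:
            json.loads(graph_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            problems.append(f"graph file is not valid JSON: {err}")
    return problems


# Demo on a throwaway directory that mimics the layout of the unzipped archive.
with tempfile.TemporaryDirectory() as tmp:
    db_dir = Path(tmp) / "test_db"
    db_dir.mkdir()
    graph_path = Path(tmp) / "test_skeleton.json"
    graph_path.write_text("{}", encoding="utf-8")
    ok_report = check_brain(db_dir, graph_path)
    bad_report = check_brain(Path(tmp) / "missing_db", graph_path)

print(ok_report)
print(bad_report)
```

In the notebook, the equivalent call would be `check_brain('./test_brain/test_db', './test_brain/test_skeleton.json')`, run after the `unzip` step.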

## 9. Advanced: Load from Google Drive ☁️
Share custom knowledge bases via Drive File IDs.

In [None]:
# 1. Load Brain from Google Drive (Cloud)
# Only works if you have permission or the ID is public.
MY_CUSTOM_GDRIVE_ID = "1iKcEzECN9LTMi3bIq4ocRfFJgvb1dLus"

try:
    print(f"☁️ Downloading Brain ID: {MY_CUSTOM_GDRIVE_ID}...")
    cloud_engine = GraphRAGEngine(gdrive_id=MY_CUSTOM_GDRIVE_ID)
    print("✅ Custom Brain Loaded Successfully")
    
    # 2. Validation Search on the Same Topic
    query_cloud = "Effects of heat waves on health"
    results_cloud = cloud_engine.search(query_cloud, top_k=2)

    # 3. Compare Stats
    print(f"\n--- Cloud Brain Stats for '{query_cloud}' ---")
    print(f"Primary Sources: {results_cloud['stats']['primary']}")
    print(f"Propagated Links:  {results_cloud['stats']['graph']}")

except Exception as e:
    print(f"❌ Error loading from Drive: {e}")
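
Since the local and cloud brains are queried with the same question, their stats can be put side by side. The sketch below is a hypothetical helper, `compare_stats`, run on mock numbers. It assumes the `stats` values are integers, which the tutorial does not state explicitly.

```python
def compare_stats(label_a, results_a, label_b, results_b, keys=("primary", "graph")):
    """Build one row per stat: (key, value in A, value in B, difference B - A)."""
    rows = []
    for key in keys:
        a = results_a["stats"].get(key, 0)
        b = results_b["stats"].get(key, 0)
        rows.append((key, a, b, b - a))
    # Print a small aligned table for quick visual comparison.
    print(f"{'stat':<10}{label_a:>8}{label_b:>8}{'diff':>8}")
    for key, a, b, diff in rows:
        print(f"{key:<10}{a:>8}{b:>8}{diff:>+8}")
    return rows


# Mock numbers for illustration only; real values come from engine.search().
mock_local = {"stats": {"primary": 2, "graph": 5}}
mock_cloud = {"stats": {"primary": 2, "graph": 9}}
rows = compare_stats("local", mock_local, "cloud", mock_cloud)
```

In the notebook, `results_local` and `results_cloud` could be passed in place of the mock dictionaries, provided both cells ran without errors.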