# RAG vs. GraphRAG

## Overview

This notebook compares standard vector-based RAG with graph-based GraphRAG side by side, using the global intelligence and security domain as the test case.

### The Challenge: Navigating Fragmentation
In intelligence work, facts are scattered: one report might mention a person, another a location, and a third a specific project. Vector search often fails to bridge these "semantic gaps" when the related keywords are not directly co-located in the same chunk.

We will demonstrate how GraphRAG builds a "Chain of Evidence" across reports, linking facts that vector retrieval returns only as isolated fragments.
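
To make the idea concrete, here is a minimal, self-contained sketch in plain Python (it does not use the Semantica API) of how linking entities across reports can yield a chain of evidence. The report IDs and entity names are hypothetical placeholders, and co-occurrence in the same report is assumed to be the only kind of link.

```python
from collections import deque

# Hypothetical toy reports; each lists the entities it mentions.
reports = {
    "report_a": {"Person X", "Location Y"},
    "report_b": {"Location Y", "Project Z"},
    "report_c": {"Project Z", "Vendor W"},
}

def build_cooccurrence_graph(reports):
    """Link two entities whenever they appear in the same report."""
    graph = {}
    for report_id, entities in reports.items():
        for a in entities:
            for b in entities:
                if a != b:
                    graph.setdefault(a, {})[b] = report_id
    return graph

def chain_of_evidence(graph, start, goal):
    """Breadth-first search; returns a list of (source, target, report) hops."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbor, report_id in sorted(graph.get(node, {}).items()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, neighbor, report_id)]))
    return None  # no connecting path exists

graph = build_cooccurrence_graph(reports)
chain = chain_of_evidence(graph, "Person X", "Project Z")
for source, target, report_id in chain:
    print(f"{source} -> {target} (via {report_id})")
```

No single report mentions both `Person X` and `Project Z`; the link only appears once the shared `Location Y` connects the first two reports.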

### Key Semantica Modules Utilized

| Pipeline Stage | Modules Used |
| :--- | :--- |
| **Intelligence Gathering** | `semantica.ingest`, `semantica.normalize` |
| **Vector Pipeline** | `semantica.split`, `semantica.vector_store` |
| **Graph Pipeline** | `semantica.kg`, `semantica.deduplication`, `semantica.conflicts` |
| **Inference/Reasoning** | `semantica.reasoning`, `semantica.pipeline` |
| **Interface** | `semantica.context`, `semantica.visualization` |

In [None]:
# Setup
!pip install -qU semantica networkx matplotlib plotly pandas faiss-cpu

## 1. Domain Acquisition: Real-World Intelligence Feeds

We ingest from real-world feeds to build our knowledge base. We'll look for connections across global security news and official reports.

In [None]:
from semantica.ingest import WebIngestor, FeedIngestor
from semantica.normalize import TextNormalizer

normalizer = TextNormalizer()
all_content = []

print("Gathering Intelligence Data...")

# 1. Global News Feeds
feeds = [
    "http://feeds.bbci.co.uk/news/world/rss.xml",
    "https://www.reutersagency.com/feed/" 
]
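feed_ingestor = FeedIngestor()
for f in feeds:
    # Keep only the first 5 entries of each feed.
    docs = feed_ingestor.ingest(f)[:5]
    # Fall back to str() for items that do not expose a .content attribute.
    all_content.extend([d.content if hasattr(d, 'content') else str(d) for d in docs])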

# 2. Official Intelligence/Security Overviews
web_urls = [
    "https://www.cia.gov/the-world-factbook/",
    "https://www.un.org/en/observances/security-council-day"
]
web_ingestor = WebIngestor()
for url in web_urls:
    docs = web_ingestor.ingest(url, method="url")
    all_content.extend([d.content if hasattr(d, 'content') else str(d) for d in docs])

# Drop very short items (100 characters or fewer), then normalize the rest.
clean_docs = [normalizer.normalize(text) for text in all_content if len(text) > 100]

print(f"\nIntelligence Knowledge Hub Populated with {len(clean_docs)} reports.")

## 2. Standard Vector RAG Pipeline

The baseline approach: each chunk is embedded and retrieved independently, ranked by its semantic overlap with the query.
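
As a rough illustration of what similarity-based retrieval does, the sketch below ranks chunks by cosine similarity to a query. It is a toy under stated assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the example chunks are invented.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, chunks, k=2):
    """Score every chunk independently and return the k best (score, chunk) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Invented example chunks.
chunks = [
    "border security talks resume in the region",
    "new trade agreement signed by ministers",
    "security council debates border dispute",
]
results = top_k("border security", chunks, k=2)
for score, chunk in results:
    print(f"{score:.3f}  {chunk}")
```

Each chunk is scored in isolation, so nothing in the ranking records that two retrieved chunks refer to the same actor or event.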

In [None]:
from semantica.core import Semantica, ConfigManager
from semantica.split import TextSplitter
from semantica.vector_store import VectorStore

v_core = Semantica(config=ConfigManager().load_from_dict({
    "embedding": {"provider": "openai", "model": "text-embedding-3-small"},
    "vector_store": {"provider": "faiss", "dimension": 1536}
}))

splitter = TextSplitter(method="recursive", chunk_size=800, chunk_overlap=100)
chunks = []
for doc in clean_docs:
    chunks.extend(splitter.split(doc))

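# Only the first 20 chunks are embedded in this demo.
sample_chunks = [str(c) for c in chunks[:20]]

# The dimension must match the embedding model (1536 for text-embedding-3-small).
vs = VectorStore(backend="faiss", dimension=1536)
embeddings = v_core.embedding_generator.generate_embeddings(sample_chunks)
vs.store_vectors(vectors=embeddings, metadata=[{"text": c} for c in sample_chunks])

print(f"Vector RAG ready with {len(sample_chunks)} encoded fragments.")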

## 3. High-Fidelity GraphRAG Pipeline

The graph pipeline uses entity resolution and relationship synthesis to bridge facts across reports.
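
Entity resolution can be pictured as mapping near-duplicate mentions onto one canonical name. The sketch below is a standard-library stand-in and makes no claim about how Semantica's `DuplicateDetector` works internally; `resolve_entities` and the mention list are hypothetical, and the 0.85 default only mirrors the threshold value used in this notebook.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def resolve_entities(mentions, threshold=0.85):
    """Map each mention to the first-seen spelling it is similar enough to."""
    canonical = []  # first-seen spelling of each resolved entity
    mapping = {}
    for mention in mentions:
        match = next(
            (c for c in canonical if similarity(mention, c) >= threshold),
            None,
        )
        if match is None:
            canonical.append(mention)
            match = mention
        mapping[mention] = match
    return mapping

# Invented mentions, as they might appear in different reports.
mentions = ["United Nations", "united nations", "United Nations.", "European Union"]
mapping = resolve_entities(mentions)
for mention, entity in mapping.items():
    print(f"{mention!r} -> {entity!r}")
```

Once the variants collapse into one node, edges extracted from different reports attach to the same entity, which is what lets a graph bridge those reports.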

In [None]:
from semantica.kg import GraphBuilder
from semantica.deduplication import DuplicateDetector
from semantica.conflicts import ConflictDetector

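gb = GraphBuilder(merge_entities=True)
# Only the first 10 reports are used to build the graph.
kg = gb.build(sources=[{"text": text} for text in clean_docs[:10]])

# NOTE: the detector is instantiated here but not applied to the graph in this
# notebook; ConflictDetector is likewise imported but unused.
detector = DuplicateDetector(similarity_threshold=0.85)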

print(f"GraphRAG Synthesis Complete: {kg.number_of_nodes()} Entity Nodes mapped.")

## 4. The Intelligence Test: Multi-Source Linkage

Intelligence query: "Identify interconnected security risks across the UN and major regions."

Vector RAG will likely return fragments about specific countries but fail to group them. GraphRAG will traverse nodes of type `Region` to find shared `CHALLENGE` edges.
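
The grouping step can be sketched with plain Python over typed edges. The triples below are hypothetical and the `(source, relation, target)` layout is an assumption for illustration, not the structure Semantica produces.

```python
from collections import defaultdict

# Hypothetical typed triples: (source node, relation, target node).
edges = [
    ("Region A", "CHALLENGE", "cyber attacks"),
    ("Region B", "CHALLENGE", "cyber attacks"),
    ("Region B", "CHALLENGE", "border disputes"),
    ("Region C", "CHALLENGE", "border disputes"),
    ("Region C", "CHALLENGE", "food insecurity"),
    ("Region A", "LOCATED_NEAR", "Region B"),
]

def shared_challenges(edges):
    """Return challenges linked to more than one region via CHALLENGE edges."""
    regions_by_challenge = defaultdict(set)
    for source, relation, target in edges:
        if relation == "CHALLENGE":
            regions_by_challenge[target].add(source)
    return {
        challenge: sorted(regions)
        for challenge, regions in regions_by_challenge.items()
        if len(regions) > 1
    }

shared = shared_challenges(edges)
for challenge, regions in shared.items():
    print(f"{challenge}: {', '.join(regions)}")
```

Challenges attached to a single region are filtered out, so the result contains only the risks that connect regions to one another.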

In [None]:
from semantica.reasoning import GraphReasoner

query = "Identify interconnected security risks across the UN and major regions."
print(f"Investigative Query: {query}\n")

print("--- Standard Vector Recall ---")
q_vec = v_core.embedding_generator.generate_embeddings(query)
v_res = vs.search_vectors(q_vec, k=3)
for r in v_res:
    print(f"Recall: {r['metadata']['text'][:150]}...")

print("\n--- Graph Intelligence Reasoning ---")
reasoner = GraphReasoner(graph=kg)
g_res = reasoner.reason(query, depth=2)
# str() guards against a non-string result before truncating for display.
print(f"Final Combined Intelligence: {str(g_res)[:400]}...")

## 5. Visualizing the Semantic Network

We visualize how the Semantica engine has mapped the relationships between global actors and current events.

In [None]:
from semantica.visualization import KGVisualizer
import matplotlib.pyplot as plt

KGVisualizer().visualize_network(
    kg, 
    layout="spring", 
    output="static",
    title="Intelligence Connectivity Map"
)
plt.show()