[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/advanced_rag/01_GraphRAG_Complete.ipynb)

# 🧪 GraphRAG: Skincare Intelligence System
## 📖 Overview

This notebook builds and queries a **detailed Knowledge Graph** for the skincare and dermatology domain, then answers questions over it with graph-enhanced retrieval.

### 🏗️ Pipeline Architecture
1. **Phase 0: Foundation**: Environment setup and "Ground Truth" seeding.
2. **Phase 1: Multi-Source Ingestion**: Aggregating knowledge from Expert RSS Feeds and clinical guides.
3. **Phase 2: Semantic Extraction**: Deep extraction using `semantica.semantic_extract` via **Llama 3.1 8B**.
4. **Phase 3: Refinement**: Automatic entity deduplication and merging.
5. **Phase 4: Multi-Hop Question Answering**: Interactive user queries using **Graph-Enhanced Retrieval**.

---

In [None]:
# Install dependencies
!pip install -qU semantica networkx matplotlib plotly pandas faiss-cpu beautifulsoup4 groq sentence-transformers

## 🛠️ Phase 0: Environment & Foundation
We configure **Groq** as the LLM provider for extraction and inference, and use **Sentence-Transformers** for local embeddings so that embedding generation needs no external API.

In [None]:
import getpass
import json
import os

import pandas as pd
from semantica.core import Semantica, ConfigManager
from semantica.seed import SeedDataManager
from semantica.vector_store import VectorStore

# 1. Groq Configuration: prompt for the key only if it is not already set
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("Enter your Groq API Key: ")

config_dict = {
    "project_name": "Skincare_Intelligence",
    "embedding": {"provider": "sentence_transformers", "model": "all-MiniLM-L6-v2"}, 
    "extraction": {
        "provider": "groq", 
        "model": "llama-3.1-8b-instant", 
        "temperature": 0.0
    },
    "inference": {
        "provider": "groq",
        "model": "llama-3.1-70b-versatile"
    },
    "vector_store": {"provider": "faiss", "dimension": 384},
    "knowledge_graph": {"backend": "networkx", "merge_entities": True}
}

config = ConfigManager().load_from_dict(config_dict)
core = Semantica(config=config)
vs = VectorStore(backend="faiss", dimension=384)

# 2. Seeding Ground Truth
foundation_data = {
    "entities": [
        {"id": "hyaluronic_acid", "name": "Hyaluronic Acid", "type": "Ingredient", "properties": {"role": "Humectant"}},
        {"id": "retinol", "name": "Retinol", "type": "Ingredient", "properties": {"role": "Anti-aging active"}},
        {"id": "niacinamide", "name": "Niacinamide", "type": "Ingredient", "properties": {"role": "Barrier repair"}}
    ],
    "relationships": [
        {"source": "hyaluronic_acid", "target": "niacinamide", "type": "COMPLEMENTS", "properties": {"benefit": "Hydration + Barrier"}}
    ]
}

# Persist the seed data so it can be registered as a JSON source
with open("skincare_base.json", "w") as f:
    json.dump(foundation_data, f)
seed_manager = SeedDataManager()
seed_manager.register_source("core_ontology", "json", "skincare_base.json")
foundation_graph = seed_manager.create_foundation_graph()

print(f"✅ Phase 0 Complete. Seeded {len(foundation_data['entities'])} primary nodes.")
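
Seed data is only useful as "ground truth" if it is internally consistent. The following is a minimal sketch, independent of Semantica, of a check that every relationship points at a declared entity ID. The helper name `validate_seed` and the `PAIRS_WITH` relationship type are hypothetical and used for illustration only.

```python
from typing import Dict, List


def validate_seed(data: Dict[str, list]) -> List[str]:
    """Return a list of human-readable problems found in a seed dictionary."""
    problems = []
    seen_ids = set()

    # Every entity needs a unique, non-empty ID
    for entity in data.get("entities", []):
        entity_id = entity.get("id")
        if not entity_id:
            problems.append(f"entity without id: {entity!r}")
        elif entity_id in seen_ids:
            problems.append(f"duplicate entity id: {entity_id}")
        else:
            seen_ids.add(entity_id)

    # Every relationship endpoint must refer to a declared entity
    for rel in data.get("relationships", []):
        for endpoint in ("source", "target"):
            if rel.get(endpoint) not in seen_ids:
                problems.append(
                    f"{rel.get('type', '?')} relationship has unknown {endpoint}: {rel.get(endpoint)!r}"
                )
    return problems


sample_seed = {
    "entities": [
        {"id": "hyaluronic_acid", "name": "Hyaluronic Acid", "type": "Ingredient"},
        {"id": "niacinamide", "name": "Niacinamide", "type": "Ingredient"},
    ],
    "relationships": [
        {"source": "hyaluronic_acid", "target": "niacinamide", "type": "COMPLEMENTS"},
        # Hypothetical relationship that is dangling on purpose: "retinol" is not declared above
        {"source": "retinol", "target": "niacinamide", "type": "PAIRS_WITH"},
    ],
}

seed_problems = validate_seed(sample_seed)
for problem in seed_problems:
    print(problem)
```

Running a check like this on `foundation_data` before writing `skincare_base.json` would surface typos in IDs early, before they turn into disconnected nodes in the graph.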

## 📥 Phase 1: Ingestion & Processing
We pull real-world knowledge from expert feeds and clinical guides, then split them into semantic chunks.

In [None]:
from semantica.ingest import ingest_feed, ingest_file
from semantica.split import EntityAwareChunker
from semantica.normalize import TextNormalizer

sources = []

# 1. RSS Ingestion
feed_urls = [
    "https://makeupandbeautyblog.com/feed",
    "https://www.westlakedermatology.com/feed",
    "https://www.contourderm.com/feed",
    "https://www.michelegreenmd.com/feed",
    "https://www.skincarephysicians.net/blog/feed/",
    "https://www.beautifulwithbrains.com/blog/feed/",
    "https://www.drbaileyskincare.com/blogs/blog.atom"
]

for url in feed_urls:
    try:
        print(f"Ingesting from: {url}")
        feed_data = ingest_feed(url, method="rss")
        sources.extend([item.content or item.description for item in feed_data.items[:3]])
    except Exception as e:
        print(f"Failed to ingest {url}: {e}")

# 2. Local Expert Guide
expert_content = """
RETINOL CLINICAL GUIDE
Mechanism: Binds to retinoic acid receptors to increase cellular turnover.
Precautions: Should not be used with high-concentration AHA/BHA exfoliants.
Synergy: Highly effective when paired with Niacinamide to offset potential erythema.
"""
with open("expert_guide.txt", "w") as f:
    f.write(expert_content)
sources.append(expert_content)

# 3. Chunking & Normalization
normalizer = TextNormalizer()
chunker = EntityAwareChunker(chunk_size=1000, chunk_overlap=200)
all_chunks = []
for text in sources:
    # Skip feed items that had neither content nor description
    if not text:
        continue
    norm_text = normalizer.normalize(text)
    all_chunks.extend(chunker.chunk(norm_text))

print(f"✅ Phase 1 Complete. Generated {len(all_chunks)} semantic chunks.")
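
To make the `chunk_size=1000` and `chunk_overlap=200` settings concrete, here is a simplified sketch of plain character-based sliding-window chunking. This is not how `EntityAwareChunker` works internally (it is presumably aware of entity boundaries). The function `sliding_window_chunks` is a hypothetical stand-in that only illustrates how size and overlap interact.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size windows where consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


demo_text = "x" * 2500
demo_chunks = sliding_window_chunks(demo_text)
print([len(c) for c in demo_chunks])  # [1000, 1000, 900]
```

With these settings each window advances by 800 characters, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.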

## 🧠 Phase 2: Semantic Extraction
Using **Groq (Llama 3.1 8B)** to extract entities and relationships.

In [None]:
from semantica.semantic_extract import NERExtractor, RelationExtractor

ner = NERExtractor(method="llm", provider="groq", model="llama-3.1-8b-instant")
rel_ext = RelationExtractor(method="llm", provider="groq", model="llama-3.1-8b-instant")

combined_results = {"entities": [], "relationships": []}

print("Extracting intelligence from chunks...")
# Process only the first five chunks (raise or remove the slice to cover the full corpus)
for chunk in all_chunks[:5]:
    txt = str(chunk.text)

    # Named entities, with a snake_case ID derived from the surface text
    entities = ner.extract(txt)
    combined_results["entities"].extend(
        {"name": e.text, "type": e.label, "id": e.text.lower().replace(" ", "_")}
        for e in entities
    )

    # Relations between the entities found in this chunk
    relations = rel_ext.extract(txt, entities=entities)
    combined_results["relationships"].extend(
        {"source": r.subject, "target": r.object, "type": r.predicate}
        for r in relations
    )

print(f"✅ Phase 2 Complete. Extracted {len(combined_results['entities'])} entities.")
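
Because the same ingredient can be extracted from several chunks, and with varying punctuation or spacing, the raw entity list usually contains repeats. Below is a small sketch, not part of Semantica, of a stricter ID normalization plus exact-ID deduplication. The helpers `slugify` and `dedupe_entities` are hypothetical names introduced here.

```python
import re
from typing import Dict, List


def slugify(name: str) -> str:
    """Lowercase a name and collapse runs of non-alphanumerics into single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def dedupe_entities(entities: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first entity seen for each slug, preserving input order."""
    unique: Dict[str, Dict[str, str]] = {}
    for entity in entities:
        entity_id = slugify(entity["name"])
        if entity_id and entity_id not in unique:
            unique[entity_id] = {**entity, "id": entity_id}
    return list(unique.values())


raw_entities = [
    {"name": "Retinol", "type": "Ingredient"},
    {"name": "retinol ", "type": "Ingredient"},  # same ingredient, trailing space
    {"name": "AHA/BHA exfoliants", "type": "Ingredient"},
]
clean_entities = dedupe_entities(raw_entities)
print([e["id"] for e in clean_entities])  # ['retinol', 'aha_bha_exfoliants']
```

Exact-ID deduplication only catches trivial variants. Near-duplicates such as misspellings are left for the resolution step in Phase 3.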

## ✨ Phase 3: Graph Refinement
Merging fragments and building the final Knowledge Graph.

In [None]:
from semantica.kg import GraphBuilder, EntityResolver

# 1. Build the graph from the extracted entities and relationships
gb = GraphBuilder(merge_entities=True)
kg = gb.build([combined_results])

# 2. Deduplicate
# Use EntityResolver for comprehensive deduplication and merging
resolver = EntityResolver(similarity_threshold=0.85)
kg_final = {**kg, 'entities': resolver.resolve_entities(kg['entities'])}

# 3. Populate Vector Store for retrieval
texts = [str(c.text) for c in all_chunks]
embeddings = core.embedding_generator.generate_embeddings(texts)
vs.store_vectors(vectors=embeddings, metadata=[{"text": t} for t in texts])

print(f"✅ Phase 3 Complete. Graph contains {len(kg_final['entities'])} resolved entities.")
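
To build intuition for what a `similarity_threshold` of 0.85 means, here is a rough sketch of threshold-based merging using the standard library's `difflib.SequenceMatcher`. This is an assumption-laden illustration, not the algorithm `EntityResolver` uses. The names `name_similarity` and `merge_similar` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_similar(entities: List[Dict], threshold: float = 0.85) -> List[Dict]:
    """Greedily fold each entity into the first kept entity whose name is similar enough."""
    merged: List[Dict] = []
    for entity in entities:
        for kept in merged:
            if name_similarity(entity["name"], kept["name"]) >= threshold:
                # Record the variant spelling instead of creating a new node
                kept.setdefault("aliases", []).append(entity["name"])
                break
        else:
            merged.append(dict(entity))
    return merged


candidates = [
    {"name": "Niacinamide"},
    {"name": "niacinamide"},
    {"name": "Niacinamid"},  # misspelling
    {"name": "Retinol"},
]
resolved = merge_similar(candidates)
print([(e["name"], e.get("aliases", [])) for e in resolved])
```

A caveat worth keeping in mind when tuning the threshold: pure string similarity can conflate distinct ingredients with near-identical names. With this metric, "Retinol" and "Retinal" score just above 0.85, so a domain with many such pairs may need a higher threshold or an embedding-based comparison.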

## 💬 Phase 4: Interactive GraphRAG Question Answering
Enter a skincare query to see the **multi-hop retrieval** and **final answer**.

In [None]:
from semantica.context import ContextRetriever
from semantica.reasoning import InferenceEngine

# 1. Initialize Retriever & Engine
retriever = ContextRetriever(
    vector_store=vs, 
    knowledge_graph=kg_final, 
    use_graph_expansion=True,
    max_expansion_hops=2
)

engine = InferenceEngine(provider="groq", model="llama-3.1-70b-versatile")

# 2. Interactive query (an empty input falls back to the default question)
user_query = input("Enter your skincare query: ") or "What ingredients synergize with Retinol to prevent irritation?"

print(f"\n🔍 Processing Multi-Hop Query: {user_query}")

# Retrieve multi-hop context
context_results = retriever.retrieve(user_query, max_results=2)

if context_results:
    # Show the reasoning path (Connected Entities)
    if context_results[0].related_entities:
        print("\n--- 🧠 MULTI-HOP CONTEXT DISCOVERED ---")
        for ent in context_results[0].related_entities[:5]:
            print(f"- {ent['content']} ({ent['type']}) via {ent['relationship']}")

    # Generate Final Answer
    context_text = " ".join([r.content for r in context_results])
    prompt = f"Based on the following context, answer the user query accurately.\nContext: {context_text}\nQuery: {user_query}"
    final_answer = engine.generate(prompt)

    print("\n--- ✨ FINAL GRAPHRAG ANSWER ---")
    print(final_answer)
else:
    print("\n❌ No relevant context found in the knowledge graph.")
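
The `max_expansion_hops=2` setting can be pictured as a breadth-first walk outward from the entities that match the query. The sketch below shows that idea on a plain list of relationship dictionaries. It is an illustration under assumptions, not `ContextRetriever`'s implementation. The function `expand_hops`, the relationship types, and the `glycerin` node are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def expand_hops(relationships: List[Dict[str, str]], seeds: Iterable[str], max_hops: int = 2) -> Dict[str, int]:
    """Return {entity_id: hop_distance} for every entity within max_hops of any seed."""
    # Treat relationships as undirected for the purpose of context expansion
    neighbors: Dict[str, Set[str]] = {}
    for rel in relationships:
        neighbors.setdefault(rel["source"], set()).add(rel["target"])
        neighbors.setdefault(rel["target"], set()).add(rel["source"])

    distances = {seed: 0 for seed in seeds}
    queue = deque(distances)
    while queue:
        current = queue.popleft()
        if distances[current] >= max_hops:
            continue  # do not expand beyond the hop budget
        for nxt in sorted(neighbors.get(current, ())):
            if nxt not in distances:
                distances[nxt] = distances[current] + 1
                queue.append(nxt)
    return distances


demo_relationships = [
    {"source": "retinol", "target": "niacinamide", "type": "SYNERGIZES_WITH"},
    {"source": "hyaluronic_acid", "target": "niacinamide", "type": "COMPLEMENTS"},
    {"source": "glycerin", "target": "hyaluronic_acid", "type": "SIMILAR_TO"},  # hypothetical edge
]
hop_distances = expand_hops(demo_relationships, seeds=["retinol"], max_hops=2)
print(hop_distances)  # {'retinol': 0, 'niacinamide': 1, 'hyaluronic_acid': 2}
```

Here `glycerin` sits three hops from `retinol`, so it is left out of the context. Raising the hop limit pulls in more of the graph at the cost of a longer, noisier prompt.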

## 📊 Visualizing the Intelligence
A final look at the relationships we've built.

In [None]:
from semantica.visualization import KGVisualizer
import matplotlib.pyplot as plt

viz = KGVisualizer()
viz.visualize_network(kg_final, layout="spring", title="Skincare Ingredient Intelligence Graph")
plt.show()