# 17. Practical Knowledge Graph with Advanced Features

In this tutorial, we work through a practical Knowledge Graph construction project.

## Project Overview

**Goal**: Analyze AI research papers to build a network of relationships among researchers, technologies, and institutions.

**Technologies used**:
1. **Knowledge Graph** - entity/relation extraction and graph construction
2. **Distributed Features** - caching, rate limiting, event streaming
3. **Batch Processing** - parallel document processing
4. **Model Router** - cost-optimized model selection
5. **Real-time Streaming** - real-time progress monitoring

## Table of Contents

1. **Setup & Configuration**
2. **Data Preparation**
3. **Building the Knowledge Graph**
4. **Querying the Graph**
5. **Graph-based RAG**
6. **Performance Analysis**
7. **Visualization & Export**

## 1. Setup & Configuration

In [None]:
import asyncio
import time
from typing import List, Dict, Any

from beanllm import Client
from beanllm.facade.advanced.knowledge_graph_facade import KnowledgeGraph
from beanllm.infrastructure.distributed import (
    update_pipeline_config,
    get_pipeline_config,
)
from beanllm.infrastructure.routing import (
    ModelRouter,
    RoutingStrategy,
    RequestCharacteristics,
    create_default_router,
)

print("✅ Imports successful")

In [None]:
# Configure distributed features
print("Configuring distributed features...")

# Enable all distributed features for Knowledge Graph pipeline
update_pipeline_config(
    pipeline_type="knowledge_graph",
    enable_cache=True,
    enable_rate_limiting=True,
    enable_event_streaming=True,
    kg_cache_ttl=7200,  # 2 hours
)

# Verify configuration
config = get_pipeline_config("knowledge_graph")
print("\n📋 Knowledge Graph Pipeline Config:")
print(f"   Cache: {config.enable_cache}")
print(f"   Rate Limiting: {config.enable_rate_limiting}")
print(f"   Event Streaming: {config.enable_event_streaming}")
print(f"   Cache TTL: {config.kg_cache_ttl}s")
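To make the `kg_cache_ttl=7200` setting concrete, here is a minimal sketch of a time-to-live cache. It is not beanllm's cache implementation: the `TTLCache` class and its injectable `clock` argument are hypothetical and exist only to show how an entry stops being served once its TTL has elapsed.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds` (illustrative only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        # Store the value together with the moment it stops being valid.
        self._store[key] = (self._clock() + self.ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        return value


# Simulated clock so the example runs instantly.
now = [0.0]
cache = TTLCache(ttl_seconds=7200, clock=lambda: now[0])
cache.set("Who developed AlexNet?", "cached answer")

hit = cache.get("Who developed AlexNet?")   # still inside the 2-hour window
now[0] = 7200.0                             # jump ahead by exactly the TTL
miss = cache.get("Who developed AlexNet?")  # the entry has expired
print(hit, miss)
```

A longer TTL trades freshness for fewer recomputations, which suits a graph that is rebuilt rarely.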

In [None]:
# Initialize Model Router for cost-optimized model selection
print("\nInitializing Model Router...")

model_router = create_default_router(strategy=RoutingStrategy.COMPLEXITY_BASED)
print(f"✅ Model Router initialized with {len(model_router.models)} models")
print(f"   Strategy: {model_router.strategy.value}")

In [None]:
# Initialize KnowledgeGraph
client = Client(provider="openai", api_key="your-api-key")
kg = KnowledgeGraph(client=client)

print("\n✅ Setup complete!")
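The cell above passes a placeholder string as `api_key`. One common alternative is to read the key from an environment variable so that it never ends up in the notebook. The sketch below assumes the conventional `OPENAI_API_KEY` variable name; `load_api_key` is a hypothetical helper, not part of beanllm.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error (hypothetical helper)."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable before running this notebook.")
    return api_key


# Demo only: a throwaway variable stands in for the real key.
os.environ.setdefault("DEMO_API_KEY", "sk-demo")
demo_key = load_api_key("DEMO_API_KEY")
```

With the real variable exported in your shell, the client could then be created as `Client(provider="openai", api_key=load_api_key())`.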

## 2. Data Preparation

We prepare short, abstract-style passages about AI research to use as input documents.

In [None]:
# Abstract-style summaries of AI research milestones (stand-ins for full paper abstracts)
research_papers = [
    # Deep Learning
    """Deep Learning has revolutionized artificial intelligence. 
    Convolutional Neural Networks (CNNs) were pioneered by Yann LeCun at Bell Labs in 1989.
    AlexNet, developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto,
    won the ImageNet competition in 2012, marking a breakthrough in computer vision.
    ResNet, introduced by Kaiming He and colleagues at Microsoft Research in 2015,
    introduced residual connections enabling training of very deep networks.""",
    
    # Transformers & NLP
    """The Transformer architecture revolutionized Natural Language Processing.
    Attention Is All You Need was published by researchers at Google Brain in 2017,
    including Ashish Vaswani and Noam Shazeer.
    BERT, introduced by Jacob Devlin and colleagues at Google in 2018,
    demonstrated the power of bidirectional pre-training.
    GPT-3, released by OpenAI in 2020 with 175 billion parameters,
    showed emergent capabilities through scale.
    The model was developed by a team led by Sam Altman and Ilya Sutskever.""",
    
    # Reinforcement Learning
    """Reinforcement Learning enables agents to learn from interaction with environments.
    Q-Learning was developed by Chris Watkins in his PhD thesis at Cambridge University in 1989.
    Deep Q-Network (DQN), introduced by DeepMind in 2013,
    combined deep learning with reinforcement learning.
    AlphaGo, developed by Demis Hassabis and David Silver at DeepMind,
    defeated world champion Lee Sedol in 2016 using deep reinforcement learning.
    AlphaZero extended this to master chess, shogi, and Go through self-play.""",
    
    # GANs
    """Generative Adversarial Networks (GANs) were proposed by Ian Goodfellow
    while at the University of Montreal in 2014.
    StyleGAN, developed by researchers at NVIDIA including Tero Karras,
    produces photorealistic face generation.
    GANs have applications in image synthesis, style transfer, and data augmentation.
    Yann LeCun called GANs 'the most interesting idea in machine learning in the last 10 years'.""",
    
    # Transfer Learning
    """Transfer Learning allows models to leverage knowledge from related tasks.
    ImageNet pretraining, popularized after AlexNet's success in 2012,
    became the standard approach for computer vision.
    Fine-tuning techniques, studied extensively at Stanford University by Andrew Ng and colleagues,
    have improved model efficiency significantly.
    BERT and GPT demonstrated that transfer learning is equally powerful in NLP.""",
    
    # Recent Developments
    """Large Language Models have emerged as a dominant paradigm.
    ChatGPT, launched by OpenAI in November 2022, reached 100 million users in 2 months.
    Claude, developed by Anthropic (founded by Dario Amodei and Daniela Amodei),
    focuses on AI safety and alignment.
    LLaMA, released by Meta AI in 2023, provides open-source alternatives.
    Gemini, announced by Google DeepMind, integrates multimodal capabilities.""",
    
    # Computer Vision
    """Computer Vision has progressed from simple edge detection to complex scene understanding.
    YOLO (You Only Look Once), created by Joseph Redmon at the University of Washington,
    enabled real-time object detection.
    Vision Transformers (ViT), introduced by researchers at Google Research in 2020,
    applied transformer architecture to images.
    CLIP, developed by OpenAI, learns visual concepts from natural language supervision.""",
    
    # AI Safety
    """AI Safety and Alignment have become critical research areas.
    Anthropic, founded by former OpenAI researchers including Dario Amodei,
    focuses on building safer AI systems.
    Constitutional AI, proposed by Anthropic researchers,
    uses AI feedback to train helpful, harmless, and honest assistants.
    Stuart Russell at UC Berkeley advocates for provably beneficial AI.
    Eliezer Yudkowsky at MIRI studies AI alignment theory.""",
]

print(f"📚 Prepared {len(research_papers)} research paper abstracts")
print(f"📊 Total characters: {sum(len(p) for p in research_papers):,}")
print(f"📈 Avg length: {sum(len(p) for p in research_papers) / len(research_papers):.0f} chars")

## 3. Building the Knowledge Graph

### 3.1 Model Selection

In [None]:
# Use Model Router to select optimal model
avg_paper_length = sum(len(p) for p in research_papers) / len(research_papers)

request_characteristics = RequestCharacteristics(
    prompt_length=int(avg_paper_length),
    complexity_score=0.7,  # Entity extraction is moderately complex
    context_window_needed=8000,
    requires_json_mode=True,  # Structured output
)

routing_decision = model_router.route(request_characteristics)

print("🧭 Model Selection:")
print(f"   Selected: {routing_decision.selected_model.provider}:{routing_decision.selected_model.model_id}")
print(f"   Reason: {routing_decision.reason}")
print(f"   Estimated cost: ${routing_decision.estimated_cost:.6f} per paper")
print(f"   Total estimated cost: ${routing_decision.estimated_cost * len(research_papers):.4f}")
print(f"   Confidence: {routing_decision.confidence_score:.3f}")
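The routing call above can be pictured as "filter the models that satisfy the request, then take the cheapest". The sketch below illustrates that idea only. The `ModelSpec` fields, the model names, and the prices are all hypothetical, and the actual `ModelRouter` may weigh these factors differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelSpec:
    name: str                  # hypothetical model name
    capability: float          # 0..1, how complex a task the model handles well
    context_window: int        # maximum tokens the model accepts
    cost_per_1k_tokens: float  # hypothetical price in USD
    supports_json: bool


def pick_model(models: List[ModelSpec], complexity: float,
               context_needed: int, needs_json: bool) -> ModelSpec:
    """Return the cheapest model that meets every requirement of the request."""
    eligible = [
        m for m in models
        if m.capability >= complexity
        and m.context_window >= context_needed
        and (m.supports_json or not needs_json)
    ]
    if not eligible:
        raise ValueError("no model satisfies the request")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


CATALOG = [
    ModelSpec("small", 0.40, 16_000, 0.0005, True),
    ModelSpec("medium", 0.75, 32_000, 0.0030, True),
    ModelSpec("large", 0.95, 128_000, 0.0300, True),
]

# Same request shape as in the cell above: complexity 0.7, 8000-token context, JSON output.
choice = pick_model(CATALOG, complexity=0.7, context_needed=8000, needs_json=True)
print(choice.name)
```

With this toy catalog, the request skips the cheapest model because its capability falls below 0.7, and it skips the most capable one because a cheaper eligible model exists.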

### 3.2 Graph Construction with Batch Processing

In [None]:
# Build Knowledge Graph with automatic batch processing (>= 5 documents)
print("\n🔨 Building Knowledge Graph...")
print(f"   Processing {len(research_papers)} papers")
print(f"   Batch processing: {'ENABLED' if len(research_papers) >= 5 else 'DISABLED'}")
print(f"   Max concurrent: 10 workers\n")

start_time = time.time()

build_response = await kg.build_graph(
    documents=research_papers,
    graph_id="ai_research_network",
    entity_types=["person", "organization", "technology", "location"],
    relation_types=["developed", "works_at", "founded", "introduced", "collaborated_with"],
)

elapsed = time.time() - start_time

print("\n" + "=" * 60)
print("Knowledge Graph Built Successfully!")
print("=" * 60)
print(f"⏱️  Total time: {elapsed:.2f}s")
print(f"📊 Avg time per document: {elapsed / len(research_papers):.2f}s")
print(f"📈 Throughput: {len(research_papers) / elapsed:.2f} docs/sec")
print(f"\n🔢 Graph Statistics:")
print(f"   Nodes (entities): {build_response.num_nodes}")
print(f"   Edges (relations): {build_response.num_edges}")
print(f"   Density: {build_response.density:.4f}")
print(f"   Connected components: {build_response.num_connected_components}")
print(f"   Graph ID: {build_response.graph_id}")
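The "10 workers" figure printed above refers to bounded concurrency. A minimal sketch of that pattern with `asyncio.Semaphore` follows. It does not reproduce how `build_graph` schedules work internally: `process_in_batches` and `fake_extract` are hypothetical stand-ins for the real extraction calls.

```python
import asyncio
from typing import Awaitable, Callable, List


async def process_in_batches(documents: List[str],
                             worker: Callable[[str], Awaitable[int]],
                             max_concurrent: int = 10) -> List[int]:
    """Run `worker` on every document with at most `max_concurrent` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(doc: str) -> int:
        async with semaphore:  # waits here while `max_concurrent` workers are busy
            return await worker(doc)

    # gather() returns results in the same order as the input documents.
    return await asyncio.gather(*(guarded(doc) for doc in documents))


async def fake_extract(doc: str) -> int:
    """Stand-in for an LLM extraction call: sleeps briefly, then counts words."""
    await asyncio.sleep(0.01)
    return len(doc.split())


docs = ["AlexNet won ImageNet", "GANs were proposed in 2014", "BERT"]
word_counts = asyncio.run(process_in_batches(docs, fake_extract, max_concurrent=2))
print(word_counts)  # [3, 5, 1]
```

Inside a notebook, where an event loop is already running, `await process_in_batches(...)` replaces the `asyncio.run(...)` call.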

## 4. Querying the Graph

### 4.1 Find All Researchers

In [None]:
# Query: Find all person entities
print("\n🔍 Query 1: Find All Researchers")
print("=" * 60)

researchers_response = await kg.query_graph(
    graph_id="ai_research_network",
    query_type="find_entities_by_type",
    params={"entity_type": "person"},
)

print(f"Found {len(researchers_response.results)} researchers:\n")
for i, researcher in enumerate(researchers_response.results[:10], 1):
    name = researcher.get("name", "Unknown")
    confidence = researcher.get("confidence", 1.0)
    print(f"   {i}. {name} (confidence: {confidence:.2f})")

if len(researchers_response.results) > 10:
    print(f"   ... and {len(researchers_response.results) - 10} more")

### 4.2 Find Organizations

In [None]:
# Query: Find all organization entities
print("\n🔍 Query 2: Find All Organizations")
print("=" * 60)

orgs_response = await kg.query_graph(
    graph_id="ai_research_network",
    query_type="find_entities_by_type",
    params={"entity_type": "organization"},
)

print(f"Found {len(orgs_response.results)} organizations:\n")
for i, org in enumerate(orgs_response.results[:10], 1):
    name = org.get("name", "Unknown")
    print(f"   {i}. {name}")

### 4.3 Find Technologies

In [None]:
# Query: Find all technology entities
print("\n🔍 Query 3: Find All Technologies")
print("=" * 60)

tech_response = await kg.query_graph(
    graph_id="ai_research_network",
    query_type="find_entities_by_type",
    params={"entity_type": "technology"},
)

print(f"Found {len(tech_response.results)} technologies:\n")
for i, tech in enumerate(tech_response.results[:15], 1):
    name = tech.get("name", "Unknown")
    print(f"   {i}. {name}")

### 4.4 Find Related Entities

In [None]:
# Query: Find entities related to a specific researcher
# (The entity ID is looked up by name below, so no manual ID is needed)
print("\n🔍 Query 4: Find Related Entities")
print("=" * 60)
print("Looking for entities related to 'Ian Goodfellow'...\n")

# First, find Ian Goodfellow
goodfellow_response = await kg.query_graph(
    graph_id="ai_research_network",
    query_type="find_entities_by_name",
    params={"name": "Ian Goodfellow", "fuzzy": True},
)

if goodfellow_response.results:
    goodfellow_id = goodfellow_response.results[0].get("id")
    
    # Find related entities
    related_response = await kg.query_graph(
        graph_id="ai_research_network",
        query_type="find_related_entities",
        params={
            "entity_id": goodfellow_id,
            "max_hops": 2,
        },
    )
    
    print(f"Found {len(related_response.results)} related entities:\n")
    for i, entity in enumerate(related_response.results[:10], 1):
        name = entity.get("name", "Unknown")
        entity_type = entity.get("type", "unknown")
        print(f"   {i}. {name} ({entity_type})")
else:
    print("Ian Goodfellow not found in graph")
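The `max_hops` parameter bounds how far the search travels from the starting entity. One plausible reading of it is a breadth-first search that stops expanding at the given depth, sketched below on a hand-written toy graph. The edges and the `related_within` helper are illustrative assumptions, not output from the library.

```python
from collections import deque
from typing import Dict, Set


def related_within(adjacency: Dict[str, Set[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Return {entity: hop_distance} for every entity reachable within `max_hops` of `start`."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start entity is not "related to itself"
    return distances


# Toy undirected graph; these edges are made up for the example.
edges = [
    ("Ian Goodfellow", "GANs"),
    ("Ian Goodfellow", "University of Montreal"),
    ("GANs", "StyleGAN"),
    ("StyleGAN", "NVIDIA"),
]
adjacency: Dict[str, Set[str]] = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

related = related_within(adjacency, "Ian Goodfellow", max_hops=2)
print(related)
```

In this toy graph, NVIDIA sits three hops away and is therefore excluded when `max_hops=2`.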

## 5. Graph-based RAG

Question answering that uses the graph as its retrieval source.

In [None]:
# Example queries for Graph RAG
queries = [
    "Who developed AlexNet?",
    "What technologies were introduced by Google?",
    "Which researchers work at OpenAI?",
    "Who founded Anthropic?",
    "What are the main applications of GANs?",
]

print("\n🤖 Graph-based RAG Queries")
print("=" * 60)

for i, query in enumerate(queries, 1):
    print(f"\n❓ Query {i}: {query}")
    print("-" * 60)
    
    # Execute Graph RAG
    rag_response = await kg.graph_rag(
        query=query,
        graph_id="ai_research_network",
    )
    
    print(f"\n📊 Results: {rag_response.num_results} matches found")
    
    # Show top 3 results
    for j, result in enumerate(rag_response.hybrid_results[:3], 1):
        entity_name = result.get("entity_name", "Unknown")
        score = result.get("score", 0.0)
        print(f"   {j}. {entity_name} (score: {score:.3f})")
    
    # Add spacing
    if i < len(queries):
        print()
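As a rough intuition for entity-centric retrieval, the sketch below finds the entities mentioned in a question and returns the triples that touch them as context. It is a deliberately simple stand-in based on substring matching. The `retrieve_context` helper and the triples are hypothetical, and `graph_rag` may combine this kind of lookup with other signals.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def retrieve_context(query: str, triples: List[Triple]) -> List[Triple]:
    """Return the triples whose subject or object is mentioned in the query."""
    query_lower = query.lower()
    entities = {t[0] for t in triples} | {t[2] for t in triples}
    # Case-insensitive substring match; a real system would use entity linking.
    mentioned = {e for e in entities if e.lower() in query_lower}
    return [t for t in triples if t[0] in mentioned or t[2] in mentioned]


# Hand-written triples in the style of the relation types used above.
triples = [
    ("Alex Krizhevsky", "developed", "AlexNet"),
    ("Geoffrey Hinton", "developed", "AlexNet"),
    ("Dario Amodei", "founded", "Anthropic"),
]

context = retrieve_context("Who developed AlexNet?", triples)
for subject, relation, obj in context:
    print(f"{subject} --{relation}--> {obj}")
```

The retrieved triples would then be passed to the language model as grounding context for the answer.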

## 6. Performance Analysis

### 6.1 Caching Benefits

In [None]:
# Test caching performance
print("\n⚡ Testing Cache Performance")
print("=" * 60)

test_query = "Who developed AlexNet?"

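# perf_counter has finer resolution than time.time, which matters for fast cached calls

# First call (expected cache miss)
start = time.perf_counter()
response1 = await kg.graph_rag(query=test_query, graph_id="ai_research_network")
time1 = time.perf_counter() - start

# Second call (expected cache hit)
start = time.perf_counter()
response2 = await kg.graph_rag(query=test_query, graph_id="ai_research_network")
time2 = time.perf_counter() - start

# Third call (expected cache hit)
start = time.perf_counter()
response3 = await kg.graph_rag(query=test_query, graph_id="ai_research_network")
time3 = time.perf_counter() - start

# Average the two cached calls; guard against a zero denominator on coarse clocks
avg_cached = (time2 + time3) / 2
speedup = time1 / avg_cached if avg_cached > 0 else float("inf")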

print(f"\nQuery: '{test_query}'")
print(f"\n📊 Results:")
print(f"   1st call (cache miss): {time1:.3f}s")
print(f"   2nd call (cache hit):  {time2:.3f}s")
print(f"   3rd call (cache hit):  {time3:.3f}s")
print(f"\n🚀 Speedup: {speedup:.1f}x")
print(f"💰 Time saved: {(time1 - time2):.3f}s per cached query")
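Three timed calls give a noisy estimate. A slightly more robust approach is to repeat the call several times and compare the first (cold) duration against the median of the rest. The sketch below uses a fake cached coroutine in place of `kg.graph_rag`. Both `time_call` and `fake_graph_rag` are hypothetical helpers.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, Dict, List


async def time_call(func: Callable[[str], Awaitable[str]], arg: str, repeats: int = 5) -> List[float]:
    """Call `func(arg)` `repeats` times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        await func(arg)
        durations.append(time.perf_counter() - start)
    return durations


_answers: Dict[str, str] = {}


async def fake_graph_rag(query: str) -> str:
    """Stand-in for a cached call: slow on the first request, fast afterwards."""
    if query not in _answers:
        await asyncio.sleep(0.05)  # simulate the expensive uncached path
        _answers[query] = f"answer to {query}"
    return _answers[query]


durations = asyncio.run(time_call(fake_graph_rag, "Who developed AlexNet?"))
cold = durations[0]
warm = statistics.median(durations[1:])  # the median is less sensitive to outliers
measured_speedup = cold / max(warm, 1e-9)  # avoid dividing by zero
print(f"cold={cold:.4f}s warm={warm:.6f}s speedup={measured_speedup:.0f}x")
```

Inside a notebook, `await time_call(...)` replaces the `asyncio.run(...)` call.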

### 6.2 Overall Statistics

In [None]:
# Summary statistics
print("\n" + "=" * 60)
print("PROJECT SUMMARY")
print("=" * 60)

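# Keep the total consistent with the breakdown printed below
entity_type_queries = 3   # Sections 4.1-4.3
relation_queries = 1      # Section 4.4 (the name lookup that precedes it is not counted)
cache_test_queries = 3    # Section 6.1
total_queries = entity_type_queries + relation_queries + len(queries) + cache_test_queries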

print(f"\n📚 Input Data:")
print(f"   Documents processed: {len(research_papers)}")
print(f"   Total characters: {sum(len(p) for p in research_papers):,}")
print(f"   Processing time: {elapsed:.2f}s")
print(f"   Throughput: {len(research_papers) / elapsed:.2f} docs/sec")

print(f"\n🔢 Graph:")
print(f"   Nodes (entities): {build_response.num_nodes}")
print(f"   Edges (relations): {build_response.num_edges}")
print(f"   Density: {build_response.density:.4f}")

print(f"\n🔍 Queries Executed: {total_queries}")
print(f"   Entity type queries: 3")
print(f"   Relation queries: 1")
print(f"   Graph RAG queries: {len(queries)}")
print(f"   Cache test queries: 3")

print(f"\n⚡ Performance:")
print(f"   Batch processing: ENABLED (10 workers)")
print(f"   Caching: ENABLED (2hr TTL)")
print(f"   Rate limiting: ENABLED")
print(f"   Cache speedup: {speedup:.1f}x")

print(f"\n💰 Cost Optimization:")
print(f"   Model: {routing_decision.selected_model.model_id}")
print(f"   Strategy: {model_router.strategy.value}")
print(f"   Estimated cost: ${routing_decision.estimated_cost * len(research_papers):.4f}")
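For readers wondering what the node, edge, density, and connected-component figures mean, the sketch below computes them in plain Python for a small undirected graph. It assumes an undirected simple graph, for which density is 2m / (n(n-1)). Whether the library treats the graph as directed is not stated here, and the `graph_stats` helper and sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple, Union


def graph_stats(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> Dict[str, Union[int, float]]:
    """Counts, density, and connected components of an undirected simple graph."""
    node_list: List[str] = list(dict.fromkeys(nodes))            # de-duplicate, keep order
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}     # ignore self-loops and duplicates
    n, m = len(node_list), len(edge_set)
    density = 0.0 if n < 2 else 2 * m / (n * (n - 1))

    # Union-find to count connected components.
    parent = {node: node for node in node_list}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in edge_set:
        a, b = tuple(edge)
        parent[find(a)] = find(b)

    components = len({find(node) for node in node_list})
    return {
        "num_nodes": n,
        "num_edges": m,
        "density": density,
        "num_connected_components": components,
    }


sample_nodes = ["Ian Goodfellow", "GANs", "Yann LeCun", "CNNs", "Bell Labs"]
sample_edges = [
    ("Ian Goodfellow", "GANs"),
    ("Yann LeCun", "CNNs"),
    ("Yann LeCun", "Bell Labs"),
]
stats = graph_stats(sample_nodes, sample_edges)
print(stats)
```

A low density with several components, as in this sample, usually means the extracted entities cluster by document rather than forming one connected network.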

## 7. Visualization & Export

### 7.1 Graph Visualization

In [None]:
# Visualize the graph (ASCII representation)
print("\n📊 Graph Visualization")
print("=" * 60)

visualization = await kg.visualize_graph(graph_id="ai_research_network")
print(visualization)

### 7.2 Export to Neo4j (Optional)

In [None]:
# If you have Neo4j running, you can export the graph
# Uncomment to use:

# kg.set_neo4j_adapter(
#     uri="bolt://localhost:7687",
#     user="neo4j",
#     password="your-password"
# )
#
# # Re-build with persist_to_neo4j=True
# build_response = await kg.build_graph(
#     documents=research_papers,
#     graph_id="ai_research_network",
#     persist_to_neo4j=True,
# )
#
# print("✅ Graph exported to Neo4j")

print("💡 Neo4j export is optional. Uncomment the code above if you have Neo4j running.")
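If no Neo4j instance is available, a lightweight alternative is to serialize entities and relations to JSON in a node-link layout. The sketch below assumes each entity is a dictionary with `id`, `name`, and `type` keys, mirroring the fields read from query results earlier. The `to_node_link` helper is hypothetical and not a beanllm API.

```python
import json
from typing import Dict, List


def to_node_link(nodes: List[Dict[str, str]], edges: List[Dict[str, str]]) -> str:
    """Serialize a graph as a JSON document with `nodes` and `links` arrays."""
    return json.dumps({"nodes": nodes, "links": edges}, indent=2, ensure_ascii=False)


# Hand-written sample data; real entries would come from the graph queries above.
sample_entities = [
    {"id": "p1", "name": "Ian Goodfellow", "type": "person"},
    {"id": "t1", "name": "GANs", "type": "technology"},
]
sample_relations = [
    {"source": "p1", "target": "t1", "relation": "introduced"},
]

exported = to_node_link(sample_entities, sample_relations)
restored = json.loads(exported)  # round-trip to confirm the document is valid JSON
print(exported)
```

The resulting string can be written to a file and loaded by most graph visualization tools that accept node-link JSON.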

## Conclusion

This tutorial covered the following.

### ✅ Implemented Features

1. **Knowledge Graph construction**
   - Entity/relation extraction from 8 AI research papers
   - NetworkX-based graph construction
   - Support for a variety of queries

2. **Distributed architecture**
   - **Batch Processing**: documents processed with 10 parallel workers
   - **Caching**: near-instant responses for repeated queries (~10x speedup)
   - **Rate Limiting**: lower API cost and improved stability
   - **Event Streaming**: real-time progress monitoring

3. **Model Router**
   - Optimal model selection via complexity-based routing
   - Balance between cost and quality
   - Automatic fallback support

4. **Graph-based RAG**
   - Entity-centric retrieval
   - Path reasoning
   - Hybrid search

### 📊 Performance Improvements

- **Processing speed**: ~2 docs/sec (batch processing)
- **Cache effect**: ~10x speedup for repeated queries
- **Cost optimization**: complexity-based model selection
- **Scalability**: large workloads can be handled with 10 parallel workers

These figures are indicative; actual values depend on the model, network latency, and hardware.

### 🎯 Practical Applications

This pattern applies to real-world scenarios such as:

- **Academic research analysis**: paper networks, citation relationships, research trends
- **Business intelligence**: corporate relationships, investment networks, M&A analysis
- **Medical knowledge graphs**: disease-symptom-treatment relationships, drug interactions
- **Legal document analysis**: case-law relationships, links between statutory provisions

### 🚀 Next Steps

- Process more documents (100+)
- Integrate Neo4j for persistent storage
- Build a real-time streaming UI
- Add custom routing rules
- Improve graph visualization