# Knowledge Graph Engine v2 - Notebook Tutorial

This notebook demonstrates how to use KG Engine v2 with Ollama (local LLM) and Neo4j in a development environment.

## Prerequisites
Make sure you have started the services:
```bash
docker-compose -f docker-compose.notebook.yml up -d
```
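
Before running the cells below, it can help to confirm that the containers are actually accepting connections. The following is a minimal sketch using only the standard library. `port_is_open` is a hypothetical helper, and the ports (7687 for Neo4j Bolt, 11434 for Ollama) are assumed defaults taken from the output later in this notebook; adjust them if your compose file maps different ports.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Assumed local defaults; not read from docker-compose.notebook.yml.
for name, port in {"Neo4j (Bolt)": 7687, "Ollama": 11434}.items():
    status = "up" if port_is_open("localhost", port) else "not reachable"
    print(f"{name} on localhost:{port}: {status}")
```

An open port only shows that something is listening; the connection checks in the cells below remain the real test.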

## Setup and Imports

In [16]:
import os
import sys
from pathlib import Path
import logging
# Add src to Python path
src_path = Path('../src').resolve()
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

# Load environment variables
from dotenv import load_dotenv
load_dotenv('../.env.notebook')

logging.getLogger().setLevel(logging.WARNING)
# Disable tokenizer parallelism warning
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(f"Python path: {sys.path[0]}")
print(f"Neo4j URI: {os.getenv('NEO4J_URI')}")
print(f"LLM Provider: {os.getenv('LLM_PROVIDER', 'auto-detect')}")
print(f"Ollama Model: {os.getenv('OLLAMA_MODEL', 'not set')}")
print(f"Ollama Base URL: {os.getenv('OLLAMA_BASE_URL', 'not set')}")

Python path: /Users/dasein/.local/share/uv/python/cpython-3.12.9-macos-aarch64-none/lib/python312.zip
Neo4j URI: neo4j+s://346d12e8.databases.neo4j.io
LLM Provider: litellm
Ollama Model: phi3:3.8b
Ollama Base URL: http://localhost:11434/v1


In [17]:
# Import KG Engine components
from kg_engine import (
    KnowledgeGraphEngineV2, 
    InputItem, 
    Neo4jConfig,
    SearchType,
    LLMClientFactory,
    __version__
)

print(f"✅ KG Engine v{__version__} loaded successfully")

✅ KG Engine v2.2.0 loaded successfully


## Initialize the Engine

In [18]:
# Test Neo4j connection
config = Neo4jConfig()
if config.verify_connectivity():
    print("✅ Neo4j connection successful")
else:
    print("❌ Neo4j connection failed - check docker-compose services")
    raise ConnectionError("Neo4j not available")

✅ Neo4j connection successful


In [19]:
# Test Ollama connection
import requests

ollama_base = os.getenv('OLLAMA_BASE_URL', 'http://localhost:11434/v1')
ollama_model = os.getenv('OLLAMA_MODEL', 'phi3:mini')

try:
    # Try to connect to the Ollama API; the timeout keeps the cell from hanging if the service is down
    response = requests.get(f"{ollama_base.replace('/v1', '')}/api/tags", timeout=5)
    if response.status_code == 200:
        models = [model['name'] for model in response.json().get('models', [])]
        print("✅ Ollama connection successful")
        print(f"Available models: {', '.join(models) if models else 'No models found'}")
        
        if ollama_model in models:
            print(f"✅ Selected model '{ollama_model}' is available")
        else:
            print(f"⚠️  Selected model '{ollama_model}' not found. Available: {models}")
            print(f"   Run: ollama pull {ollama_model}")
    else:
        print(f"❌ Ollama connection failed - status {response.status_code}")
except Exception as e:
    print(f"❌ Cannot connect to Ollama: {e}")
    print("   Make sure Ollama is running: docker-compose -f docker-compose.notebook.yml up -d")

✅ Ollama connection successful
Available models: phi3:3.8b, deepseek-coder:6.7b-base-q4_0
✅ Selected model 'phi3:3.8b' is available


In [20]:
# Initialize KG Engine with LLMClientFactory
try:
    # Create LLM configuration from environment
    llm_config = LLMClientFactory.create_from_env()
    
    # Initialize the engine
    engine = KnowledgeGraphEngineV2(
        llm_config=llm_config,
        neo4j_config=config
    )
    
    print(f"🚀 KG Engine initialized with {llm_config.provider} provider!")
    print(f"   Model: {llm_config.get_model_name()}")
    
    # For Ollama, show the base URL
    if hasattr(llm_config, 'base_url'):
        print(f"   Base URL: {llm_config.base_url}")
        
except Exception as e:
    print(f"❌ Failed to initialize engine: {e}")
    print("\nTroubleshooting:")
    print("1. Check if Ollama is running: curl http://localhost:11434/api/tags")
    print("2. Verify environment variables in .env.notebook")
    print("3. Ensure OLLAMA_MODEL is set correctly")
    raise

🤖 LLM Interface initialized: gpt-4o via litellm
🚀 Knowledge Graph Engine v2
   - LLM interface: gpt-4o via litellm
🚀 KG Engine initialized with litellm provider!
   Model: gpt-4o
   Base URL: https://litellm.chatcyber.ai


### LLM Provider Configuration

The `LLMClientFactory` automatically detects the provider based on environment variables:

1. **Ollama** (current): Set `LLM_PROVIDER=ollama` or have `OLLAMA_MODEL` set
2. **OpenAI**: Set `LLM_PROVIDER=openai` and `OPENAI_API_KEY`
3. **LiteLLM**: Set `LLM_PROVIDER=litellm` and `LITELLM_BEARER_TOKEN`

To switch providers, update the `.env.notebook` file and restart the kernel.
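
The detection rules listed above could be approximated as follows. This is a minimal sketch, not the actual code of `LLMClientFactory`: `guess_provider` is a hypothetical helper, and the fallback order when `LLM_PROVIDER` is unset (beyond the `OLLAMA_MODEL` rule stated above) is an assumption.

```python
import os
from typing import Mapping, Optional


def guess_provider(env: Mapping[str, str]) -> Optional[str]:
    """Guess the LLM provider from environment-style key/value pairs."""
    # An explicit LLM_PROVIDER setting always wins.
    explicit = env.get("LLM_PROVIDER", "").strip().lower()
    if explicit:
        return explicit
    # Stated in the list above: having OLLAMA_MODEL set implies Ollama.
    if env.get("OLLAMA_MODEL"):
        return "ollama"
    # Assumption: fall back to whichever credential is present.
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("LITELLM_BEARER_TOKEN"):
        return "litellm"
    return None


print(f"Guessed provider: {guess_provider(os.environ)}")
```

Printing the guess before creating the engine makes it easier to notice when `.env.notebook` selects a different provider than the one you expected.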

## Basic Usage - Adding Knowledge

In [21]:
# Create some sample knowledge
sample_data = [
    "Alice works as a senior software engineer at Google",
    "Bob is a data scientist at Microsoft", 
    "Alice lives in Mountain View, California",
    "Bob lives in Seattle, Washington",
    "Charlie is Alice's friend from Stanford University",
    "Google's headquarters is in Mountain View",
    "Alice graduated from Stanford in 2018",
    "Bob enjoys hiking and photography"
]

# Convert to InputItem objects
input_items = [InputItem(description=text) for text in sample_data]

print(f"Created {len(input_items)} input items")
for i, item in enumerate(input_items, 1):
    print(f"{i}. {item.description}")

Created 8 input items
1. Alice works as a senior software engineer at Google
2. Bob is a data scientist at Microsoft
3. Alice lives in Mountain View, California
4. Bob lives in Seattle, Washington
5. Charlie is Alice's friend from Stanford University
6. Google's headquarters is in Mountain View
7. Alice graduated from Stanford in 2018
8. Bob enjoys hiking and photography


In [22]:
# Process the input items
print("🔄 Processing input items...")
results = engine.process_input(input_items)

print("\n📊 Processing Results:")
print(f"   Items processed: {results['processed_items']}")
print(f"   New edges created: {results['new_edges']}")
print(f"   Updated edges: {results['updated_edges']}")
print(f"   Duplicates ignored: {results['duplicates_ignored']}")
print(f"   Processing time: {results['processing_time_ms']:.1f}ms")

if results['errors']:
    print(f"\n⚠️ Errors: {results['errors']}")

🔄 Processing input items...

📊 Processing Results:
   Items processed: 8
   New edges created: 1
   Updated edges: 0
   Duplicates ignored: 11
   Processing time: 23038.2ms


## Searching the Knowledge Graph

In [23]:
# Define some test queries
test_queries = [
    "Who works at Google?",
    "Where does Alice live?",
    "What companies are mentioned?",
    "Who are Alice's friends?",
    "Where did Alice study?",
    "What does Bob enjoy doing?"
]

print("🔍 Testing search queries:")
print("=" * 40)

🔍 Testing search queries:


In [24]:
# Test each query
for i, query in enumerate(test_queries, 1):
    print(f"\n{i}. Query: '{query}'")
    
    try:
        response = engine.search(query, search_type=SearchType.BOTH)
        
        print(f"   Results found: {len(response.results)}")
        print(f"   Answer: {response.answer}")
        print(f"   Query time: {response.query_time_ms:.1f}ms")
        
        # Show top results
        if response.results:
            print("   Top results:")
            for j, result in enumerate(response.results[:3], 1):
                if result.triplet and result.triplet.edge:
                    edge = result.triplet.edge
                    print(f"     {j}. {edge.subject} {edge.relationship} {edge.object} (score: {result.score:.2f})")
                
    except Exception as e:
        print(f"   ❌ Error: {e}")


1. Query: 'Who works at Google?'
Search type: SearchType.BOTH Who works at Google?
Parsing query DIRECT with LLM intuition...
Parsed query: ParsedQuery(entities=['Google'], relationships=['WORKS_AT', 'EMPLOYED_BY'], search_type=<SearchType.DIRECT: 'direct'>, query_intent='list', temporal_context=None)
Standardized relationships: ['SPEAKS', 'LEADS']
Searching with entities: ['Google'] and relationships: ['SPEAKS', 'LEADS']
Parsing query SEMANTIC...
   Results found: 10
   Answer: Alice currently works at Google. Bob previously worked at Google on search algorithms.
   Query time: 4239.3ms
   Top results:
     1. Alice VISITED Google (score: 0.82)
     2. Bob Smith CONTINUED Google (score: 0.80)
     3. TechCorp IS_ONE_OF ['Google', 'Microsoft', 'Amazon'] (score: 0.78)

2. Query: 'Where does Alice live?'
Search type: SearchType.BOTH Where does Alice live?
Parsing query DIRECT with LLM intuition...
Parsed query: ParsedQuery(entities=['Alice'], relationships=['LIVES_IN', 'RESIDES_IN'], se

[#C14E]  _: <CONNECTION> error: Failed to write data to connection ResolvedIPv4Address(('34.124.169.171', 7687)) (ResolvedIPv4Address(('34.124.169.171', 7687))): SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')
Unable to retrieve routing information
Error in vector similarity search: Unable to retrieve routing information


   Results found: 0
   Answer: None
   Query time: 4161124.1ms

4. Query: 'Who are Alice's friends?'
Search type: SearchType.BOTH Who are Alice's friends?
Parsing query DIRECT with LLM intuition...
Parsed query: ParsedQuery(entities=['Alice'], relationships=['HAS_FRIEND'], search_type=<SearchType.DIRECT: 'direct'>, query_intent='list', temporal_context=None)
Standardized relationships: ['HAS_ROLE']
Searching with entities: ['Alice'] and relationships: ['HAS_ROLE']


[#C0D1]  _: <CONNECTION> error: Failed to write data to connection IPv4Address(('si-346d12e8-1580.production-orch-0703.neo4j.io', 7687)) (ResolvedIPv4Address(('34.124.169.171', 7687))): SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')


Error in direct graph search: Failed to write data to connection IPv4Address(('si-346d12e8-1580.production-orch-0703.neo4j.io', 7687)) (ResolvedIPv4Address(('34.124.169.171', 7687)))
Parsing query SEMANTIC...
   Results found: 10
   Answer: Alice's friend is Charlie.
   Query time: 9109.3ms
   Top results:
     1. Charlie IS_ONE_OF Alice (score: 0.90)
     2. Bob Smith LEADS Alice to start TechCorp. (score: 0.82)
     3. Alice VISITED Stanford University (score: 0.82)

5. Query: 'Where did Alice study?'
Search type: SearchType.BOTH Where did Alice study?
Parsing query DIRECT with LLM intuition...
Parsed query: ParsedQuery(entities=['Alice'], relationships=['STUDIED_AT', 'ATTENDED'], search_type=<SearchType.DIRECT: 'direct'>, query_intent='search', temporal_context=None)
Standardized relationships: ['TEACHES', 'TEACHES']
Searching with entities: ['Alice'] and relationships: ['TEACHES', 'TEACHES']
Parsing query SEMANTIC...
   Results found: 10
   Answer: Alice studied at Stanford Univers
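
The output above includes transient Neo4j connection errors (`SSLEOFError`), and one query came back with zero results after a very long wait. One way to soften such failures is to retry the call with exponential backoff. The sketch below is generic: `with_retries` is a hypothetical helper and not part of `kg_engine`. Because the log suggests the engine reports some connection errors and continues instead of raising, the helper also accepts an `is_ok` predicate to decide whether a returned value should be retried.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    is_ok: Callable[[T], bool] = lambda result: True,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call fn() until it returns a result accepted by is_ok, or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    result = None
    for attempt in range(1, attempts + 1):
        try:
            result = fn()
            last_error = None
            if is_ok(result):
                return result
        except Exception as exc:
            last_error = exc
        if attempt < attempts:
            # Wait base_delay, then 2x, 4x, ... between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))
    if last_error is not None:
        raise last_error
    # Every attempt returned a value that is_ok rejected; hand back the last one.
    return result


# Usage with the objects defined earlier in this notebook (left commented out
# so the sketch stays self-contained):
# response = with_retries(
#     lambda: engine.search(query, search_type=SearchType.BOTH),
#     is_ok=lambda r: bool(r.results),
# )
```

Note that a predicate such as `bool(r.results)` also retries queries that legitimately have no matches, so keep `attempts` small.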

## Advanced Features

In [10]:
# Get all relationships for a specific entity
entity_name = "Alice"
print(f"🔗 All relationships for '{entity_name}':")

relations = engine.get_node_relations(entity_name, max_depth=1)

for i, relation in enumerate(relations, 1):
    if relation.triplet and relation.triplet.edge:
        edge = relation.triplet.edge
        print(f"{i}. {edge.subject} {edge.relationship} {edge.object}")
        print(f"   Summary: {edge.metadata.summary}")
        print(f"   Confidence: {edge.metadata.confidence}")
        print()

🔗 All relationships for 'Alice':
1. Alice SPEAKS senior software engineer
   Summary: Alice works as a senior software engineer
   Confidence: 0.95

2. Alice LIVES_IN Mountain View, California
   Summary: Alice lives in Mountain View, California
   Confidence: 0.95

3. Alice BECAME Meta
   Summary: Alice now works at Meta as a senior engineer
   Confidence: 0.95

4. Alice VISITED Google
   Summary: Alice works at Google
   Confidence: 0.95

5. Alice TEACHES Stanford
   Summary: Alice graduated from Stanford in 2018
   Confidence: 0.95

6. Alice IS_ONE_OF Charlie
   Summary: Charlie is Alice's friend
   Confidence: 0.9

7. Alice TEACHES Stanford University
   Summary: Alice attended Stanford University
   Confidence: 0.85



In [11]:
# Get system statistics
print("📊 System Statistics:")
stats = engine.get_stats()

print(f"Graph Stats: {stats.get('graph_stats', {})}")
print(f"Vector Stats: {stats.get('vector_stats', {})}")
print(f"Total Entities: {stats.get('entities', 0)}")
print(f"Relationship Types: {len(stats.get('relationships', []))}")

📊 System Statistics:
Graph Stats: {'total_entities': 207, 'total_edges': 201, 'active_edges': 201, 'obsolete_edges': 0, 'relationship_types': 45, 'relationships': ['RELATES_TO', 'BORN_IN', 'SPEAKS', 'HAS_ROLE', 'SPECIALIZES_IN', 'HAS_CHILDREN', 'LIVES_IN', 'PRACTICES', 'INTERESTED_IN', 'LIKES', 'LEADS', 'RULED', 'JOINED', 'CONTINUED', 'BECAME', 'CAPTURED', 'ORDERED', 'PROHIBITED', 'SUBJECTED_TO', 'INVADED']}
Vector Stats: {'total_triplets': 201, 'active_triplets': 201, 'obsolete_triplets': 0, 'embedder_model': 'unknown'}
Total Entities: 208
Relationship Types: 45


## Conflict Resolution Demo

In [12]:
# Add conflicting information to demonstrate conflict resolution
conflict_data = [
    "Alice no longer works at Google",  # Negation
    "Alice now works at Meta as a senior engineer",  # New job
    "Bob moved to Portland, Oregon in 2024"  # Location change
]

conflict_items = [InputItem(description=text) for text in conflict_data]

print("⚔️ Testing conflict resolution:")
for item in conflict_items:
    print(f"   - {item.description}")

print("\n🔄 Processing conflicting information...")
conflict_results = engine.process_input(conflict_items)

print(f"New edges: {conflict_results['new_edges']}")
print(f"Updated edges: {conflict_results['updated_edges']}")
print(f"Obsoleted edges: {conflict_results['obsoleted_edges']}")

⚔️ Testing conflict resolution:
   - Alice no longer works at Google
   - Alice now works at Meta as a senior engineer
   - Bob moved to Portland, Oregon in 2024

🔄 Processing conflicting information...
New edges: 0
Updated edges: 0
Obsoleted edges: 0
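
To make the counters above more concrete, here is a toy illustration of what obsoleting an edge can mean. It is a sketch under stated assumptions and not the engine's implementation: the `Edge` class, the `apply_fact` function, and the idea of a fixed set of single-valued relationships are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Edge:
    subject: str
    relationship: str
    object: str
    obsolete: bool = False
    obsoleted_at: Optional[datetime] = None


def apply_fact(
    edges: List[Edge],
    new: Edge,
    single_valued: FrozenSet[str] = frozenset({"WORKS_AT", "LIVES_IN"}),
) -> Dict[str, int]:
    """Add `new` to `edges`, obsoleting conflicting active edges (toy logic)."""
    counts = {"new_edges": 0, "obsoleted_edges": 0, "duplicates_ignored": 0}
    # Active edges with the same subject and relationship are the candidates.
    active = [
        e for e in edges
        if not e.obsolete
        and e.subject == new.subject
        and e.relationship == new.relationship
    ]
    if any(e.object == new.object for e in active):
        counts["duplicates_ignored"] = 1
        return counts
    if new.relationship in single_valued:
        # Assumption: a single-valued relationship keeps only its newest edge.
        for e in active:
            e.obsolete = True
            e.obsoleted_at = datetime.now(timezone.utc)
            counts["obsoleted_edges"] += 1
    edges.append(new)
    counts["new_edges"] = 1
    return counts


graph = [Edge("Alice", "WORKS_AT", "Google")]
print(apply_fact(graph, Edge("Alice", "WORKS_AT", "Meta")))
```

In this toy model the old edge is kept and timestamped instead of deleted, which is what allows a graph to report active and obsolete edges separately.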


In [13]:
# Test the updated information
test_queries_after_conflict = [
    "Where does Alice work now?",
    "Where does Bob live?",
    "Who works at Google?"
]

print("🔍 Testing queries after conflict resolution:")
for query in test_queries_after_conflict:
    response = engine.search(query)
    print(f"\nQ: {query}")
    print(f"A: {response.answer}")

🔍 Testing queries after conflict resolution:
Search type: SearchType.BOTH Where does Alice work now?
Parsing query DIRECT with LLM intuition...
Parsed query: ParsedQuery(entities=['Alice'], relationships=['WORKS_AT', 'EMPLOYED_BY'], search_type=<SearchType.DIRECT: 'direct'>, query_intent='search', temporal_context=None)
Standardized relationships: ['SPEAKS', 'LEADS']
Searching with entities: ['Alice'] and relationships: ['SPEAKS', 'LEADS']
Parsing query SEMANTIC...

Q: Where does Alice work now?
A: Alice now works at Meta as a senior engineer.
Search type: SearchType.BOTH Where does Bob live?
Parsing query DIRECT with LLM intuition...
Parsed query: ParsedQuery(entities=['Bob'], relationships=['LIVES_IN', 'RESIDES_IN'], search_type=<SearchType.DIRECT: 'direct'>, query_intent='search', temporal_context=None)
Standardized relationships: ['LIVES_IN', 'LIVES_IN']
Searching with entities: ['Bob'] and relationships: ['LIVES_IN', 'LIVES_IN']
Parsing query SEMANTIC...

Q: Where does Bob live?
A

## Performance Testing

In [14]:
import time

# Test search performance
search_query = "Who works at tech companies?"
num_trials = 3

print(f"⏱️ Performance test: '{search_query}' ({num_trials} trials)")

times = []
for i in range(num_trials):
    start_time = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    response = engine.search(search_query)
    end_time = time.perf_counter()
    
    query_time = (end_time - start_time) * 1000
    times.append(query_time)
    
    print(f"Trial {i+1}: {query_time:.1f}ms - {len(response.results)} results")

avg_time = sum(times) / len(times)
print(f"\nAverage query time: {avg_time:.1f}ms")
print(f"Provider: {llm_config.provider}")
print(f"Model: {llm_config.get_model_name()}")

⏱️ Performance test: 'Who works at tech companies?' (3 trials)
Search type: SearchType.BOTH Who works at tech companies?
Parsing query DIRECT with LLM intuition...
Parsed query: ParsedQuery(entities=['tech companies'], relationships=['WORKS_AT', 'EMPLOYED_BY'], search_type=<SearchType.DIRECT: 'direct'>, query_intent='list', temporal_context=None)
Standardized relationships: ['SPEAKS', 'LEADS']
Searching with entities: ['tech companies'] and relationships: ['SPEAKS', 'LEADS']
Parsing query SEMANTIC...
Trial 1: 4432.7ms - 10 results
Search type: SearchType.BOTH Who works at tech companies?
Parsing query DIRECT with LLM intuition...
Parsed query: ParsedQuery(entities=['tech companies'], relationships=['WORKS_AT', 'EMPLOYED_BY'], search_type=<SearchType.DIRECT: 'direct'>, query_intent='list', temporal_context=None)
Standardized relationships: ['SPEAKS', 'LEADS']
Searching with entities: ['tech companies'] and relationships: ['SPEAKS', 'LEADS']
Parsing query SEMANTIC...
Trial 2: 4509.3ms - 
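
With only a few trials, a single slow call can skew the average. A small timing helper that also reports the median and range may give a clearer picture. This is a sketch: `time_calls` is a hypothetical helper, not something provided by `kg_engine`.

```python
import statistics
import time
from typing import Callable, Dict


def time_calls(fn: Callable[[], object], trials: int = 3) -> Dict[str, float]:
    """Run fn() `trials` times and summarize wall-clock durations in milliseconds."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000)
    return {
        "mean_ms": statistics.mean(durations),
        "median_ms": statistics.median(durations),
        "min_ms": min(durations),
        "max_ms": max(durations),
    }


# Usage with the engine from this notebook (left commented out so the sketch
# stays self-contained):
# print(time_calls(lambda: engine.search("Who works at tech companies?"), trials=5))
```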

## Clean Up (Optional)

In [None]:
# Uncomment to clear all data
# print("🧹 Clearing all data...")
# engine.clear_all_data()
# print("✅ All data cleared")

## Summary

This notebook demonstrated:
1. ✅ Setting up KG Engine v2 with Ollama and Neo4j
2. ✅ Processing natural language input
3. ✅ Searching the knowledge graph with natural language queries
4. ✅ Exploring entity relationships
5. ✅ Conflict resolution with temporal tracking
6. ✅ Performance testing

The system is now ready for your knowledge management tasks! 

**Model Performance**: The `phi3:mini` model provides a good balance of speed and accuracy for notebook environments. For production use, consider larger models or OpenAI GPT models.