# Knowledge Graph RAG System Demo

This notebook demonstrates how to use the knowledge graph RAG system to build a knowledge graph from text documents and query it.

In [None]:
import os
import sys
import numpy as np
import pandas as pd
from dotenv import load_dotenv
import matplotlib.pyplot as plt
from pyvis.network import Network

# Add project root to path
sys.path.append('..')

# Load environment variables
load_dotenv()

# Import project modules
from src.data.ingestion import DocumentProcessor, KnowledgeGraphBuilder
from src.graph.database import Neo4jDatabase
from src.utils.embeddings import EmbeddingGenerator
from src.rag.query_engine import GraphRAGQueryEngine

## 1. Create Sample Documents

First, let's create a few sample documents from which to build our knowledge graph.

In [None]:
# Create a data directory if it doesn't exist
os.makedirs('../data/sample', exist_ok=True)

# Sample documents about AI and machine learning
doc1 = """
Artificial Intelligence (AI) is revolutionizing industries across the globe. 
Machine learning, a subset of AI, enables computers to learn from data and improve over time. 
Deep learning, a specialized form of machine learning, uses neural networks with many layers.
Companies like Google, Microsoft, and OpenAI are leading the development of advanced AI systems.
Google's DeepMind created AlphaGo, which defeated the world champion in the game of Go.
"""

doc2 = """
Natural Language Processing (NLP) is a field of AI focused on enabling computers to understand human language.
Large Language Models (LLMs) like GPT-4 developed by OpenAI have shown remarkable abilities in text generation.
These models are trained on vast amounts of text data from the internet and books.
Microsoft has integrated GPT models into their Bing search engine and other products.
NLP applications include machine translation, sentiment analysis, and question answering systems.
"""

doc3 = """
Knowledge graphs represent information as a network of entities and their relationships.
Google uses a knowledge graph to enhance its search results with structured information.
Neo4j is a popular graph database used to store and query knowledge graphs.
Graph neural networks (GNNs) can learn from graph-structured data for tasks like recommendation systems.
Retrieval-Augmented Generation (RAG) combines retrieved information, such as facts from a knowledge graph, with language models for more accurate responses.
"""

# Write documents to files
with open('../data/sample/ai_overview.txt', 'w') as f:
    f.write(doc1)
    
with open('../data/sample/nlp.txt', 'w') as f:
    f.write(doc2)
    
with open('../data/sample/knowledge_graphs.txt', 'w') as f:
    f.write(doc3)
    
print("Sample documents created in ../data/sample/")

## 2. Process Documents and Build Knowledge Graph

Now let's process these documents to extract entities and relationships.

In [None]:
# Initialize document processor
processor = DocumentProcessor()

# Process documents
results = processor.process_directory('../data/sample')

# Build knowledge graph
kg_builder = KnowledgeGraphBuilder()
for result in results:
    kg_builder.add_processed_data(result)

# Get entities and relationships
entities_df = kg_builder.get_unique_entities()
relationships_df = kg_builder.get_relationships_df()

print(f"Extracted {len(entities_df)} unique entities and {len(relationships_df)} relationships")

# Display sample entities
print("\nSample entities:")
display(entities_df.head())

# Display sample relationships
print("\nSample relationships:")
display(relationships_df.head())
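
How `get_unique_entities()` collapses repeated mentions is not shown in this notebook. As a rough sketch, assuming entities are keyed by their lowercased text (an assumption, though it matches how the visualization cell looks up edge endpoints), deduplication could work as follows. `dedupe_entities` and the `ORG`/`PRODUCT` labels are hypothetical and not part of the project.

```python
def dedupe_entities(mentions):
    """Collapse (text, label) mentions into unique entities.

    Hypothetical helper: entities are keyed by lowercased text, and the
    first spelling and label seen for a key are kept.
    """
    unique = {}
    for text, label in mentions:
        clean = text.strip()
        entity_id = clean.lower()
        if entity_id not in unique:
            unique[entity_id] = {"entity_id": entity_id, "text": clean, "label": label}
    return list(unique.values())


mentions = [
    ("OpenAI", "ORG"),
    ("Microsoft", "ORG"),
    ("openai", "ORG"),      # same entity, different casing
    ("GPT-4", "PRODUCT"),
]
entities = dedupe_entities(mentions)
print(len(entities))  # 3
```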

## 3. Generate Embeddings for Entities

Let's generate vector embeddings for our entities to enable semantic search.

In [None]:
# Initialize embedding generator
embedding_generator = EmbeddingGenerator()

# Generate embeddings
entities_with_embeddings = embedding_generator.generate_entity_embeddings(entities_df)

# Display sample with embeddings
print(f"Generated embeddings with {len(entities_with_embeddings.iloc[0]['embedding'])} dimensions")
display(entities_with_embeddings.head())
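
To make "semantic search" concrete: once every entity has a vector, a query vector can be compared against each of them, typically by cosine similarity, and the closest entities returned. The sketch below uses toy 3-dimensional vectors and plain Python; `cosine_similarity` and `top_k` are illustrative helpers, not the project's `EmbeddingGenerator` API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec, entity_vecs, k=2):
    """Return the k (name, score) pairs most similar to query_vec."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in entity_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for real embeddings.
entity_vecs = {
    "openai": [1.0, 0.0, 0.0],
    "microsoft": [0.8, 0.6, 0.0],
    "neo4j": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], entity_vecs, k=2)
for name, score in results:
    print(f"{name}: {score:.4f}")
```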

## 4. Visualize the Knowledge Graph

Let's create a simple interactive visualization of our knowledge graph using PyVis, coloring each node by its entity type.

In [None]:
# Create a network graph
net = Network(height="750px", width="100%", notebook=True, directed=True)

# Add nodes
entity_types = entities_df['label'].unique()
colors = plt.cm.rainbow(np.linspace(0, 1, len(entity_types)))
color_map = {entity_type: f'#{int(r*255):02x}{int(g*255):02x}{int(b*255):02x}' 
             for entity_type, (r, g, b, _) in zip(entity_types, colors)}

for _, row in entities_df.iterrows():
    net.add_node(row['entity_id'], label=row['text'], title=f"Type: {row['label']}", 
                 color=color_map.get(row['label'], '#CCCCCC'))

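# Add edges. Node IDs come from entity_id, so this lookup assumes entity_id is
# the lowercased entity text; edges with a missing endpoint are skipped.
node_ids = set(net.node_ids)
for _, row in relationships_df.iterrows():
    source, target = row['source'].lower(), row['target'].lower()
    if source in node_ids and target in node_ids:
        net.add_edge(source, target, title=row['relation'])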

# Display the graph
net.show_buttons(filter_=['physics'])
net.show('knowledge_graph.html')
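
The edge loop above adds an edge only when both endpoints match an existing node ID, so unmatched relationships disappear from the picture without any message. A small check along these lines can list what would be dropped before drawing. It is a sketch on plain tuples rather than the DataFrames, and `find_dangling_edges` is a hypothetical helper.

```python
def find_dangling_edges(node_ids, edges):
    """Return edges whose lowercased source or target is not a known node ID."""
    known = set(node_ids)
    return [
        (source, target, relation)
        for source, target, relation in edges
        if source.lower() not in known or target.lower() not in known
    ]


node_ids = ["openai", "microsoft", "gpt-4"]
edges = [
    ("OpenAI", "GPT-4", "developed"),
    ("Microsoft", "Bing", "integrated"),  # "bing" is not a node
]
dangling = find_dangling_edges(node_ids, edges)
print(dangling)  # [('Microsoft', 'Bing', 'integrated')]
```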

## 5. Export to Neo4j

Now let's export our knowledge graph to Neo4j. Make sure Neo4j is running and your connection details are in the `.env` file. Note that the import cell clears existing data in the database before loading the graph.

In [None]:
# Export entities and relationships to CSV
os.makedirs('../data/output', exist_ok=True)
entities_path = '../data/output/entities.csv'
relationships_path = '../data/output/relationships.csv'
embeddings_path = '../data/output/entities_with_embeddings.csv'

entities_df.to_csv(entities_path, index=False)
relationships_df.to_csv(relationships_path, index=False)
entities_with_embeddings.to_csv(embeddings_path, index=False)

print(f"Exported data to {entities_path}, {relationships_path}, and {embeddings_path}")
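
One thing to keep in mind when writing embeddings to CSV: a list-valued column is stored as text, so it comes back as a string when the file is read again. The sketch below shows the round trip with the standard `csv` module, assuming the embeddings are plain Python lists of floats; the rows are made-up sample data.

```python
import csv
import io
import json

rows = [
    {"entity_id": "openai", "embedding": [0.1, 0.2, 0.3]},
    {"entity_id": "neo4j", "embedding": [0.0, 0.5, 1.0]},
]

# Write: each list is stored as its string form, e.g. "[0.1, 0.2, 0.3]".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["entity_id", "embedding"])
writer.writeheader()
for row in rows:
    writer.writerow({"entity_id": row["entity_id"], "embedding": str(row["embedding"])})

# Read: every field comes back as a string, so parse the embedding explicitly.
buffer.seek(0)
loaded = []
for row in csv.DictReader(buffer):
    loaded.append({"entity_id": row["entity_id"], "embedding": json.loads(row["embedding"])})

print(loaded[0]["embedding"])  # [0.1, 0.2, 0.3]
```

With pandas, the same parsing could be applied after `pd.read_csv` via `df["embedding"].apply(json.loads)`. If the column holds NumPy arrays instead of lists, their string form is not valid JSON and a different storage format may be a better fit.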

In [None]:
# Initialize Neo4j database
db = None  # defined before try so the finally block can check it safely
try:
    db = Neo4jDatabase()
    db.connect()
    print("Connected to Neo4j successfully")

    # Create constraints
    db.create_constraints()

    # Clear existing data before importing
    db.clear_database()

    # Import data
    db.import_knowledge_graph(entities_path, relationships_path)
    print("Knowledge graph imported to Neo4j successfully")
except Exception as e:
    # Covers connection failures as well as constraint or import errors
    print(f"Error exporting to Neo4j: {e}")
    print("Make sure Neo4j is running and your connection details are in the .env file")
finally:
    if db is not None and db.driver:
        db.close()
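
`import_knowledge_graph` is a project method whose internals are not shown here. One common way to load such data idempotently into Neo4j is Cypher `MERGE` with query parameters. The sketch below only builds the statements and their parameter dictionaries, so it runs without a database. The `Entity` label, the `RELATED_TO` relationship type, and `build_statements` are assumptions made for illustration.

```python
ENTITY_QUERY = (
    "MERGE (e:Entity {entity_id: $entity_id}) "
    "SET e.text = $text, e.label = $label"
)
RELATIONSHIP_QUERY = (
    "MATCH (a:Entity {entity_id: $source}) "
    "MATCH (b:Entity {entity_id: $target}) "
    "MERGE (a)-[r:RELATED_TO {relation: $relation}]->(b)"
)


def build_statements(entities, relationships):
    """Pair each Cypher query with the parameters for one row."""
    statements = [(ENTITY_QUERY, dict(entity)) for entity in entities]
    for rel in relationships:
        params = {
            "source": rel["source"].lower(),  # assumes IDs are lowercased text
            "target": rel["target"].lower(),
            "relation": rel["relation"],
        }
        statements.append((RELATIONSHIP_QUERY, params))
    return statements


entities = [
    {"entity_id": "openai", "text": "OpenAI", "label": "ORG"},
    {"entity_id": "gpt-4", "text": "GPT-4", "label": "PRODUCT"},
]
relationships = [{"source": "OpenAI", "target": "GPT-4", "relation": "developed"}]
statements = build_statements(entities, relationships)
for query, params in statements:
    print(query, params)
```

With the official `neo4j` Python driver, each pair could then be passed to `session.run(query, params)`. Because `MERGE` matches before creating, re-running the import would not duplicate nodes or relationships.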

## 6. Query the Knowledge Graph

Now let's use our Graph RAG query engine to answer questions using the knowledge graph.

In [None]:
# Initialize Neo4j database and query engine
db = None  # defined before try so the finally block can check it safely
try:
    db = Neo4jDatabase()
    db.connect()

    # Initialize query engine
    query_engine = GraphRAGQueryEngine(db)

    # Sample queries
    queries = [
        "What is the relationship between OpenAI and Microsoft?",
        "How are knowledge graphs used in search engines?",
        "What are the applications of NLP?"
    ]

    for query in queries:
        print(f"\nQuery: {query}")
        try:
            result = query_engine.query(query)
            print("\nAnswer:")
            print(result["answer"])

            print("\nRelevant entities:")
            for entity in result["context"]["relevant_entities"]:
                print(f"- {entity['text']} ({entity['label']}): {entity['similarity']:.4f}")
        except Exception as e:
            # A failed query should not stop the remaining ones
            print(f"Error processing query: {e}")
except Exception as e:
    print(f"Error setting up Neo4j or the query engine: {e}")
finally:
    if db is not None and db.driver:
        db.close()
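
How `GraphRAGQueryEngine.query` turns graph results into an answer is not shown in this notebook. A typical Graph RAG step is to take the most relevant entities, collect the relationships that touch them, and format both as context for the language model. The following is a minimal sketch of that formatting step only. `build_context` is a hypothetical helper and the sample data is made up.

```python
def build_context(question, relevant_entities, relationships):
    """Format entities and their relationships as plain-text prompt context."""
    names = {entity["text"].lower() for entity in relevant_entities}
    lines = ["Entities:"]
    for entity in relevant_entities:
        lines.append(f"- {entity['text']} ({entity['label']})")
    lines.append("Relationships:")
    for rel in relationships:
        # Keep only relationships that touch at least one relevant entity.
        if rel["source"].lower() in names or rel["target"].lower() in names:
            lines.append(f"- {rel['source']} -[{rel['relation']}]-> {rel['target']}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


relevant_entities = [
    {"text": "OpenAI", "label": "ORG"},
    {"text": "Microsoft", "label": "ORG"},
]
relationships = [
    {"source": "OpenAI", "relation": "developed", "target": "GPT-4"},
    {"source": "Microsoft", "relation": "integrated", "target": "GPT"},
    {"source": "Neo4j", "relation": "stores", "target": "knowledge graphs"},
]
context = build_context(
    "What is the relationship between OpenAI and Microsoft?",
    relevant_entities,
    relationships,
)
print(context)
```

The resulting string would be placed in the prompt ahead of the question, so the model answers from the retrieved facts rather than from memory alone.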

## 7. Conclusion

In this notebook, we've demonstrated how to:

1. Process text documents to extract entities and relationships
2. Build a knowledge graph from the extracted information
3. Generate embeddings for semantic search
4. Visualize the knowledge graph
5. Import the knowledge graph into Neo4j
6. Query the knowledge graph using a RAG approach

This approach grounds a language model's answers in the structured entities and relationships of a knowledge graph, which can make responses more accurate and better contextualized.