# LightRAG with Memgraph Integration Demo

This notebook demonstrates how to use LightRAG with Memgraph as the graph storage backend. LightRAG is a simple and fast retrieval-augmented generation framework that combines knowledge graph retrieval with large language models.

## What you'll learn:
- How to set up LightRAG with Memgraph
- How to insert documents and create knowledge graphs
- How to perform different types of queries (local, global, hybrid, mix)

## Prerequisites:
- Memgraph running (Docker: `docker run -p 7687:7687 memgraph/memgraph:latest`)
- OpenAI API key
- Required Python packages
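
Before running the rest of the notebook, it can help to confirm that something is actually listening on the Bolt port. The following is a minimal sketch using only the standard library; the helper name `memgraph_reachable` is made up for illustration, and the check only verifies that the TCP port accepts connections, not that the server speaks Bolt or accepts your credentials.

```python
import os
import socket
from urllib.parse import urlparse


def memgraph_reachable(uri: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host/port in `uri` succeeds."""
    parsed = urlparse(uri)
    host = parsed.hostname or "localhost"
    port = parsed.port or 7687  # port published by the docker command above
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False


# Falls back to the URI used later in this notebook if the variable is unset
uri = os.getenv("MEMGRAPH_URI", "bolt://localhost:7687")
print(f"Memgraph reachable at {uri}: {memgraph_reachable(uri)}")
```

If this prints `False`, check that the container is running and that port 7687 is published.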

## Installation and Setup

First, let's install the required packages. LightRAG with Memgraph support needs to be installed from source.

In [None]:
# Clone LightRAG repository and install from source (required for Memgraph support)
import os

# Clone the repository (only if it doesn't exist)
if not os.path.exists('LightRAG'):
    print("Cloning LightRAG repository...")
    !git clone https://github.com/HKUDS/LightRAG.git
else:
    print("LightRAG directory already exists, skipping clone...")

# Install LightRAG from the cloned directory
print("Installing LightRAG in editable mode...")
%pip install -e ./LightRAG

print("Installation step finished. Check the pip output above for errors.")
print("⚠️  Note: You may need to restart the kernel if imports don't work immediately.")

## Environment Configuration

Set up your environment variables for the Memgraph connection and the OpenAI API.

In [None]:
import os
from dotenv import load_dotenv
import getpass

# Load environment variables
load_dotenv()

# Set up OpenAI API key
if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

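# Configure Memgraph connection.
# setdefault keeps any values already loaded from .env or the shell
# instead of overwriting them with these demo defaults.
os.environ.setdefault("MEMGRAPH_URI", "bolt://localhost:7687")
os.environ.setdefault("MEMGRAPH_USERNAME", "")  # Default is empty
os.environ.setdefault("MEMGRAPH_PASSWORD", "")  # Default is empty
os.environ.setdefault("MEMGRAPH_DATABASE", "memgraph")  # Default database name
os.environ.setdefault("MEMGRAPH_WORKSPACE", "lightrag_demo")  # Workspace for data isolation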

print("Environment configured successfully!")

## Import Required Libraries

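**Important**: If you just ran the installation cell above, please restart the kernel first, then re-run the Environment Configuration cell, because a restart clears the environment variables set in the notebook.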

Import all necessary libraries for LightRAG with Memgraph integration.

In [None]:
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status

print("Libraries imported successfully!")

## Initialize LightRAG with Memgraph

Create and configure a LightRAG instance using Memgraph as the graph storage backend.

In [None]:
import os

# Working directory for LightRAG storage
WORKING_DIR = "./lightrag_memgraph_storage"
os.makedirs(WORKING_DIR, exist_ok=True)  # no error if the directory already exists

async def initialize_rag():
    """Initialize LightRAG with Memgraph as graph storage."""
    rag = LightRAG(
        working_dir=WORKING_DIR,
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete,
        graph_storage="MemgraphStorage",  # Use Memgraph as graph storage
    )
    
    await rag.initialize_storages()  # Initialize storage backends
    await initialize_pipeline_status()  # Initialize processing pipeline
    
    return rag

# Initialize RAG instance
rag = await initialize_rag()
print("LightRAG initialized with Memgraph successfully!")

## Sample Data: AI and Machine Learning Text

Let's create some sample text about artificial intelligence and machine learning to demonstrate the knowledge graph creation capabilities.

In [None]:
# Sample text about AI and Machine Learning
ai_ml_text = """
Artificial Intelligence (AI) is a broad field of computer science that aims to create intelligent machines 
capable of performing tasks that typically require human intelligence. Machine Learning (ML) is a subset of 
AI that focuses on the development of algorithms and statistical models that enable computers to improve 
their performance on a specific task through experience.

Deep Learning is a specialized branch of Machine Learning that uses neural networks with multiple layers 
to model and understand complex patterns in data. Convolutional Neural Networks (CNNs) are particularly 
effective for image recognition tasks, while Recurrent Neural Networks (RNNs) excel at processing 
sequential data like natural language.

Natural Language Processing (NLP) is another important subfield of AI that deals with the interaction 
between computers and human language. Large Language Models (LLMs) like GPT and BERT have revolutionized 
NLP by demonstrating remarkable capabilities in text generation, translation, and understanding.

Knowledge Graphs are structured representations of information that capture entities and their relationships. 
They are widely used in AI systems to store and reason about complex domain knowledge. Graph databases 
like Neo4j and Memgraph provide efficient storage and querying capabilities for knowledge graphs.

Retrieval-Augmented Generation (RAG) is a technique that combines the power of large language models 
with external knowledge retrieval systems. RAG systems can access up-to-date information from knowledge 
bases to generate more accurate and contextually relevant responses.
"""

print("Sample text prepared for knowledge graph creation.")
print(f"Text length: {len(ai_ml_text)} characters")

## Insert Documents and Create Knowledge Graph

Now let's insert the sample text into LightRAG. This will automatically extract entities and relationships to create a knowledge graph stored in Memgraph.
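
To make "entities and relationships" concrete, here is a toy sketch of the kind of structure a knowledge graph holds. The triples are hand-written paraphrases of the sample text, not real extraction output (which depends on the LLM and will differ), and `build_adjacency` is a hypothetical helper, not part of LightRAG.

```python
from collections import defaultdict

# Hand-written (source, relation, target) triples paraphrasing the sample text
triples = [
    ("Machine Learning", "subset of", "Artificial Intelligence"),
    ("Deep Learning", "branch of", "Machine Learning"),
    ("Deep Learning", "uses", "Neural Networks"),
    ("Natural Language Processing", "subfield of", "Artificial Intelligence"),
]


def build_adjacency(triples):
    """Map each entity to the (relation, neighbor) pairs it participates in."""
    adjacency = defaultdict(list)
    for source, relation, target in triples:
        adjacency[source].append((relation, target))
        adjacency[target].append((relation, source))  # treat edges as undirected
    return adjacency


graph = build_adjacency(triples)

# One-hop neighborhood of a single entity
for relation, neighbor in graph["Machine Learning"]:
    print(f"Machine Learning -[{relation}]- {neighbor}")
```

In the actual pipeline, nodes and edges like these are written to Memgraph rather than kept in a Python dictionary.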

In [None]:
# Insert the text into LightRAG
print("Inserting text and creating knowledge graph...")

try:
    await rag.ainsert(ai_ml_text)
    print("✅ Knowledge graph created successfully!")
    print("The extracted entities and relationships are now stored in Memgraph.")
except Exception as e:
    print(f"❌ Error during insertion: {e}")
    print("Please check your Memgraph connection and OpenAI API key.")
    import traceback
    traceback.print_exc()

## Query the Knowledge Graph

Let's demonstrate different types of queries that LightRAG supports:

- **Local mode**: Focuses on specific entities and their immediate relationships
- **Global mode**: Draws on broader knowledge patterns across the entire knowledge graph
- **Hybrid mode**: Combines local and global retrieval methods
- **Mix mode**: Integrates knowledge graph and vector retrieval
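
The four modes listed above can also be compared side by side on a single question. The following is a minimal sketch with a hypothetical helper, `compare_modes`, that accepts any async callable taking `(question, mode)`. In this notebook that callable would wrap `rag.aquery(question, param=QueryParam(mode=mode))`; a stand-in is used here so the sketch runs without Memgraph or an API key.

```python
import asyncio
from typing import Awaitable, Callable

MODES = ("local", "global", "hybrid", "mix")  # the four modes listed above


async def compare_modes(
    ask: Callable[[str, str], Awaitable[str]],
    question: str,
    modes: tuple[str, ...] = MODES,
) -> dict[str, str]:
    """Run `question` once per mode and collect answers (or error text) by mode."""
    answers: dict[str, str] = {}
    for mode in modes:
        try:
            answers[mode] = await ask(question, mode)
        except Exception as exc:  # keep going so one failing mode doesn't hide the rest
            answers[mode] = f"error: {exc}"
    return answers


# Stand-in for the real query call (assumption: replace with a wrapper around rag.aquery)
async def fake_ask(question: str, mode: str) -> str:
    return f"[{mode}] answer to: {question}"


# In a notebook cell, use `results = await compare_modes(...)` instead,
# because asyncio.run() cannot be called from a running event loop.
results = asyncio.run(compare_modes(fake_ask, "What is Deep Learning?"))
for mode, answer in results.items():
    print(f"{mode:>6}: {answer}")
```

Errors are stored per mode rather than raised, so a failure in one mode does not prevent comparing the others.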

### Local Query
Local queries focus on specific entities and their immediate relationships.

In [None]:
# Local query - focuses on specific entities and their immediate context
local_query = "What is Deep Learning and how does it relate to Machine Learning?"

print(f"🔍 Local Query: {local_query}")
print("=" * 50)

try:
    local_response = await rag.aquery(
        local_query,
        param=QueryParam(mode="local")
    )
    print(local_response)
except Exception as e:
    print(f"❌ Error during local query: {e}")

print("\n" + "=" * 50)

### Global Query
Global queries utilize broader knowledge patterns across the entire knowledge graph.

In [None]:
# Global query - uses broader knowledge patterns
global_query = "What are the main components and technologies in the AI ecosystem?"

print(f"🌐 Global Query: {global_query}")
print("=" * 50)

try:
    global_response = await rag.aquery(
        global_query,
        param=QueryParam(mode="global")
    )
    print(global_response)
except Exception as e:
    print(f"❌ Error during global query: {e}")

print("\n" + "=" * 50)

### Hybrid Query
Hybrid queries combine both local and global retrieval methods for comprehensive answers.

In [None]:
# Hybrid query - combines local and global methods
hybrid_query = "How do Knowledge Graphs and RAG systems work together in AI applications?"

print(f"🔄 Hybrid Query: {hybrid_query}")
print("=" * 50)

try:
    hybrid_response = await rag.aquery(
        hybrid_query,
        param=QueryParam(mode="hybrid")
    )
    print(hybrid_response)
except Exception as e:
    print(f"❌ Error during hybrid query: {e}")

print("\n" + "=" * 50)

### Mix Query
Mix queries integrate knowledge graph retrieval with vector retrieval, so answers can draw on both graph structure and semantic similarity.

In [None]:
# Mix query - integrates knowledge graph and vector retrieval
mix_query = "Compare CNNs and RNNs in terms of their applications and architectures."

print(f"🔀 Mix Query: {mix_query}")
print("=" * 50)

try:
    mix_response = await rag.aquery(
        mix_query,
        param=QueryParam(mode="mix")
    )
    print(mix_response)
except Exception as e:
    print(f"❌ Error during mix query: {e}")

print("\n" + "=" * 50)

## Adding More Documents

Let's add another document to see how the knowledge graph grows and evolves.

In [None]:
# Additional text about graph databases and vector databases
graph_db_text = """
Graph databases are specialized database management systems designed to store and query data 
represented as graphs. Unlike traditional relational databases that use tables, graph databases 
use nodes, edges, and properties to represent and store data. Memgraph is a high-performance 
in-memory graph database that supports the Cypher query language.

Vector databases are optimized for storing and querying high-dimensional vectors, which are 
commonly used in machine learning applications for similarity search and recommendation systems. 
Popular vector databases include Pinecone, Weaviate, and Chroma.

Graph Neural Networks (GNNs) are a class of deep learning models designed to work with 
graph-structured data. GNNs can learn representations of nodes and edges in graphs, making 
them useful for tasks like node classification, link prediction, and graph classification.

The combination of graph databases and vector databases in RAG systems enables both 
structured reasoning through knowledge graphs and semantic similarity search through 
vector embeddings. This hybrid approach provides more comprehensive and accurate 
information retrieval capabilities.
"""

print("📝 Adding additional document about graph and vector databases...")

try:
    await rag.ainsert(graph_db_text)
    print("✅ Additional document inserted successfully!")
    print("The knowledge graph has been updated with new entities and relationships.")
except Exception as e:
    print(f"❌ Error during insertion: {e}")

## Query the Enhanced Knowledge Graph

Now let's query the enhanced knowledge graph that includes information about both AI/ML and database technologies.

In [None]:
# Query the enhanced knowledge graph
enhanced_query = "How do graph databases like Memgraph support AI and machine learning applications?"

print(f"🚀 Enhanced Knowledge Graph Query: {enhanced_query}")
print("=" * 60)

try:
    enhanced_response = await rag.aquery(
        enhanced_query,
        # top_k sets how many top-ranked items are retrieved for the query
        param=QueryParam(mode="hybrid", top_k=30)
    )
    print(enhanced_response)
except Exception as e:
    print(f"❌ Error during enhanced query: {e}")

print("\n" + "=" * 60)

## Cleanup and Finalization

Finally, let's properly close the LightRAG instance and clean up resources.
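
If a query raises partway through the notebook, the cleanup cell may never run. One way to guard against that is to wrap the instance in an async context manager. This is a minimal sketch: `finalized` is a hypothetical helper, and `FakeRag` is a stand-in exposing only the `finalize_storages()` method used in this notebook.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def finalized(rag):
    """Yield `rag` and always call its finalize_storages() on exit."""
    try:
        yield rag
    finally:
        await rag.finalize_storages()


class FakeRag:
    """Hypothetical stand-in exposing only the method this sketch needs."""

    def __init__(self):
        self.closed = False

    async def finalize_storages(self):
        self.closed = True


async def demo():
    fake = FakeRag()
    try:
        async with finalized(fake):
            raise RuntimeError("query failed")  # simulate a failing query
    except RuntimeError:
        pass
    return fake.closed  # True: cleanup ran despite the error


# In a notebook cell, use `await demo()` instead of asyncio.run(...)
print(asyncio.run(demo()))
```

With the real instance, the pattern would be `async with finalized(rag): ...` around the insert and query calls.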

In [None]:
# Cleanup and finalization
try:
    await rag.finalize_storages()
    print("✅ LightRAG instance properly finalized.")
except Exception as e:
    print(f"❌ Error during finalization: {e}")

print("\n🎉 Demo completed!")
print("\n📝 Summary of what we accomplished:")
print("   ✓ Set up LightRAG with Memgraph as graph storage")
print("   ✓ Created knowledge graphs from text documents")
print("   ✓ Performed various types of queries (local, global, hybrid, mix)")
print("   ✓ Added multiple documents to expand the knowledge graph")
print("\n🔗 Next Steps:")
print("   • Explore the knowledge graph in Memgraph Lab")
print("   • Try adding your own documents")
print("   • Experiment with different query modes and parameters")
print("   • Build a complete RAG application using this setup")

## Additional Resources and Documentation

For more information about LightRAG and Memgraph integration:

### LightRAG Resources:
- [LightRAG GitHub Repository](https://github.com/HKUDS/LightRAG)

### Memgraph Resources:
- [Memgraph Documentation](https://memgraph.com/docs)
- [Memgraph AI Ecosystem](https://memgraph.com/docs/ai-ecosystem)
- [Memgraph's LightRAG integration](https://memgraph.com/docs/ai-ecosystem/integrations#lightrag)

### Key Features Demonstrated:
1. **Graph Storage**: Using Memgraph as the backend for storing knowledge graphs
2. **Entity Extraction**: Automatic extraction of entities and relationships from text
3. **Multiple Query Modes**: Local, global, hybrid, and mix query capabilities
4. **Workspace Isolation**: Keeping different projects' data separate
5. **Incremental Updates**: Adding new documents to an existing knowledge graph

### Performance Benefits:
- **In-memory Processing**: Memgraph's in-memory architecture for fast graph operations
- **Cypher Query Language**: Standard graph query language for complex graph traversals
- **Scalability**: Handle large knowledge graphs efficiently
- **ACID Compliance**: Reliable data consistency and integrity