# Modular LangChain GraphRAG Pipeline

This notebook walks through a complete knowledge graph construction and Q&A pipeline built from modular agents:
- **StructuredDataAgent**: Builds the domain graph from CSV files
- **UnstructuredDataAgent**: Extracts entities from markdown reviews into the subject graph
- **EntityResolutionAgent**: Connects the subject graph to the domain graph
- **LangChainRAGAgent**: Implements multiple retrieval strategies
- **SupplyChainQASystem**: Orchestrates the complete pipeline

## Architecture Overview

```
CSV Files → StructuredDataAgent → Domain Graph
                                        ↓
                              EntityResolutionAgent → Connected Graph → LangChainRAGAgent → Q&A
                                        ↑
Markdown Files → UnstructuredDataAgent → Subject Graph
```
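The diagram implies a dependency order: the domain graph and the subject graph are built from independent inputs, and entity resolution can only run once both exist. A minimal sketch of that ordering with `asyncio`, assuming stub coroutines (`build_domain_graph`, `build_subject_graph`, and `resolve_entities` are hypothetical names, not this project's API):

```python
import asyncio

# Hypothetical stand-ins for the agents; each records that it ran.
async def build_domain_graph(log):
    log.append("domain")        # StructuredDataAgent: CSV -> domain graph
    return "domain_graph"

async def build_subject_graph(log):
    log.append("subject")       # UnstructuredDataAgent: markdown -> subject graph
    return "subject_graph"

async def resolve_entities(domain, subject, log):
    log.append("resolution")    # EntityResolutionAgent: needs both graphs
    return f"{subject}->{domain}"

async def run_pipeline():
    log = []
    # The two branches of the diagram do not depend on each other,
    # so they can be awaited concurrently; gather returns results in argument order.
    domain, subject = await asyncio.gather(
        build_domain_graph(log), build_subject_graph(log)
    )
    connected = await resolve_entities(domain, subject, log)
    log.append("rag")           # the RAG agent is initialized last, on the connected graph
    return connected, log

# In a script; inside Jupyter use `await run_pipeline()` instead,
# because asyncio.run() cannot be called from a running event loop.
connected, log = asyncio.run(run_pipeline())
print(connected)
print(log)
```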

## 1. Setup and Environment

```python
# Import required libraries
import os
import asyncio
import warnings
warnings.filterwarnings('ignore')

# Load environment variables
from dotenv import load_dotenv
load_dotenv()

# Import the main orchestrator
from supply_chain_qa_system import SupplyChainQASystem, create_and_build_system

# Verify environment
print("✅ Environment Check:")
print(f"  Neo4j URI: {os.getenv('NEO4J_URI', 'Not set')}")
print(f"  Neo4j Username: {os.getenv('NEO4J_USERNAME', 'Not set')}")
print(f"  OpenAI API Key: {'Set' if os.getenv('OPENAI_API_KEY') else 'Not set'}")
print(f"  Neo4j Import Dir: {os.getenv('NEO4J_IMPORT_DIR', 'Not set')}")
```

```
✅ Environment Check:
  Neo4j URI: bolt://localhost:7687
  Neo4j Username: neo4j
  OpenAI API Key: Set
  Neo4j Import Dir: /var/lib/neo4j/import
```


## 2. Initialize the Supply Chain Q&A System

```python
# Create the system instance
system = SupplyChainQASystem()

print("✅ System initialized with agents:")
print(f"  - {system.structured_agent.name}: {system.structured_agent.description}")
print(f"  - {system.unstructured_agent.name}: {system.unstructured_agent.description}")
print(f"  - {system.resolution_agent.name}: {system.resolution_agent.description}")
print(f"  - {system.rag_agent.name}: {system.rag_agent.description}")

print("\n📁 Files to process:")
print(f"  CSV files: {len(system.csv_files)}")
print(f"  Markdown files: {len(system.markdown_files)}")
```

```
ImportError: cannot import name 'omit' from 'openai._types' (/opt/homebrew/Caskroom/miniconda/base/envs/kg-workshop/lib/python3.12/site-packages/openai/_types.py)
```

## 3. Build the Complete Knowledge Graph

This section runs the complete pipeline:
1. Reset the graph (optional)
2. Build domain graph from CSVs
3. Build subject graph from markdown reviews
4. Perform entity resolution
5. Initialize RAG system

```python
# Build the complete knowledge graph
# Top-level `await` works here because Jupyter/IPython already runs an event loop.
# Note: set limit_markdown_files=None to process all files (takes longer)
results = await system.build_complete_graph(
    reset=True,              # Reset the graph first
    limit_markdown_files=3,  # Process only 3 markdown files for the demo
)

print("\n✅ Knowledge Graph Built Successfully!")
```

## 4. Test the Q&A System

Now let's test the system with various types of queries.

### 4.1 Simple Queries

```python
# Test a simple product query
question = "What products are available and their prices?"
answer = system.ask_question(question)
print(f"Q: {question}")
print(f"\nA: {answer}")
```

```python
# Test a supplier query
question = "List all suppliers and where they are located"
answer = system.ask_question(question)
print(f"Q: {question}")
print(f"\nA: {answer}")
```

### 4.2 Supply Chain Tracing Queries

```python
# Test supply chain tracing
question = "Which suppliers provide parts for the Uppsala Sofa?"
answer = system.ask_question(question)
print(f"Q: {question}")
print(f"\nA: {answer}")
```

```python
# Test root cause analysis
question = "Trace any quality issues in furniture back to their suppliers"
answer = system.ask_question(question, use_workflow=True)  # Use the workflow for this multi-hop query
print(f"Q: {question}")
print(f"\nA: {answer}")
```
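Tracing an issue back to a supplier is a multi-hop walk over the domain graph. Using the relationship directions from the visualization query in section 10 (`Product -[:Contains]-> Assembly`, `Part -[:Is_Part_Of]-> Assembly`, `Part -[:Supplied_By]-> Supplier`), a minimal in-memory sketch of the walk looks like this; the products, assemblies, parts, and suppliers are invented sample data, and `trace_suppliers` is a hypothetical helper rather than the agent's method:

```python
from collections import defaultdict

# Hypothetical sample data; edge directions follow the section 10 Cypher query.
contains = {"Uppsala Sofa": ["Frame Assembly", "Cushion Assembly"]}  # Product -> Assemblies
is_part_of = [                                                      # (Part, Assembly) pairs
    ("Oak Leg", "Frame Assembly"),
    ("Foam Block", "Cushion Assembly"),
]
supplied_by = {"Oak Leg": ["Nordic Timber"], "Foam Block": ["SoftFill AB"]}  # Part -> Suppliers

def trace_suppliers(product):
    """Return {supplier: [parts]} reachable from a product."""
    # Is_Part_Of points Part -> Assembly, so invert it to walk Assembly -> Parts.
    parts_by_assembly = defaultdict(list)
    for part, assembly in is_part_of:
        parts_by_assembly[assembly].append(part)

    result = defaultdict(list)
    for assembly in contains.get(product, []):
        for part in parts_by_assembly[assembly]:
            for supplier in supplied_by.get(part, []):
                result[supplier].append(part)
    return dict(result)

print(trace_suppliers("Uppsala Sofa"))
# {'Nordic Timber': ['Oak Leg'], 'SoftFill AB': ['Foam Block']}
```

In Cypher, the same walk is a single pattern: `MATCH (p:Product {product_name: $name})-[:Contains]->(:Assembly)<-[:Is_Part_Of]-(part:Part)-[:Supplied_By]->(s:Supplier) RETURN s, collect(part)`.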

### 4.3 Review-based Queries

```python
# Test review extraction
question = "What quality issues are mentioned in product reviews?"
answer = system.ask_question(question)
print(f"Q: {question}")
print(f"\nA: {answer}")
```

```python
# Test feature extraction
question = "What features do customers appreciate in the furniture products?"
answer = system.ask_question(question)
print(f"Q: {question}")
print(f"\nA: {answer}")
```

## 5. Explore Individual Agents

Let's explore what each agent can do individually.

### 5.1 Structured Data Agent

```python
# Get domain graph statistics
from structured_data_agent import StructuredDataAgent

structured_agent = StructuredDataAgent()
stats = structured_agent.get_graph_statistics()

print("📊 Domain Graph Statistics:")
print("\nNodes:")
for label, count in stats['nodes'].items():
    print(f"  {label}: {count}")

print("\nRelationships:")
for rel_type, count in stats['relationships'].items():
    print(f"  {rel_type}: {count}")
```
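Statistics of this shape can be computed with two aggregate Cypher queries. A sketch of those queries and of folding their result rows into the `{'nodes': ..., 'relationships': ...}` dictionary printed above; `rows_to_counts` is a hypothetical helper, the sample rows are invented, and the agent's actual implementation may differ:

```python
# Count nodes per label (a node with several labels is counted once per label)
NODE_COUNTS = """
MATCH (n)
UNWIND labels(n) AS label
RETURN label AS key, count(*) AS count
"""

# Count relationships per type
REL_COUNTS = """
MATCH ()-[r]->()
RETURN type(r) AS key, count(*) AS count
"""

def rows_to_counts(rows):
    """Fold [{'key': ..., 'count': ...}, ...] rows into a dict sorted by key."""
    return {row["key"]: row["count"] for row in sorted(rows, key=lambda r: r["key"])}

# Hypothetical rows, shaped like the records the queries above return
node_rows = [{"key": "Supplier", "count": 12}, {"key": "Product", "count": 8}]
rel_rows = [{"key": "Supplied_By", "count": 40}]

# Named example_stats so it does not overwrite the notebook's `stats`
example_stats = {
    "nodes": rows_to_counts(node_rows),
    "relationships": rows_to_counts(rel_rows),
}
print(example_stats)
```

Records returned by the official `neo4j` Python driver support `record["key"]` lookups, so they can be passed to the same helper.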

### 5.2 Unstructured Data Agent

```python
# Get subject graph statistics
from unstructured_data_agent import UnstructuredDataAgent

unstructured_agent = UnstructuredDataAgent()
stats = unstructured_agent.get_graph_statistics()

print("📄 Subject Graph Statistics:")
print(f"\nDocuments: {stats['document_count']}")
print(f"Chunks: {stats['chunk_count']}")

print("\nEntities by type:")
for entity_type, count in stats['entities_by_type'].items():
    print(f"  {entity_type}: {count}")
```
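The separate document and chunk counts reflect that each review is split into chunks before entity extraction. The chunking strategy, size, and overlap used by `UnstructuredDataAgent` are not shown in this notebook; purely as an illustration, a fixed-size character window with overlap looks like this (`chunk_text`, `size`, and `overlap` are hypothetical):

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into windows of `size` chars; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

review = "The sofa frame cracked after two weeks. " * 10
chunks = chunk_text(review, size=120, overlap=20)
print(len(chunks), [len(c) for c in chunks])
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of some duplicated text in the index.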

### 5.3 Entity Resolution Agent

```python
# Get resolution statistics
from entity_resolution_agent import EntityResolutionAgent

resolution_agent = EntityResolutionAgent()
stats = resolution_agent.get_resolution_statistics()

print("🔗 Entity Resolution Statistics:")
print(f"\nTotal correspondences: {stats['total_correspondences']}")

print("\nResolution by type:")
for entity_type, type_stats in stats['resolution_by_type'].items():
    print(f"  {entity_type}:")
    print(f"    Count: {type_stats['count']}")
    print(f"    Avg similarity: {type_stats['avg_similarity']}")

if stats.get('unresolved_by_type'):
    print("\nUnresolved entities:")
    for entity_type, count in stats['unresolved_by_type'].items():
        print(f"  {entity_type}: {count}")
```
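Entity resolution links each extracted `__Entity__` node to a domain node through `CORRESPONDS_TO`. The agent's similarity metric is not shown here; as a sketch, assuming plain string similarity from Python's `difflib` and an arbitrary 0.8 threshold, the matching and the per-type statistics printed above could be computed like this (`resolve`, `similarity`, and the sample names are hypothetical):

```python
from collections import defaultdict
from difflib import SequenceMatcher

def similarity(a, b):
    # Case-insensitive string similarity in [0, 1]
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def resolve(extracted, domain, threshold=0.8):
    """extracted/domain: {entity_type: [names]} -> (correspondences, stats)."""
    correspondences = []
    unresolved = defaultdict(int)
    for etype, names in extracted.items():
        candidates = domain.get(etype, [])
        for name in names:
            # Only compare against domain nodes of the same type
            best = max(candidates, key=lambda c: similarity(name, c), default=None)
            score = similarity(name, best) if best is not None else 0.0
            if score >= threshold:
                correspondences.append((etype, name, best, score))
            else:
                unresolved[etype] += 1

    scores_by_type = defaultdict(list)
    for etype, _, _, score in correspondences:
        scores_by_type[etype].append(score)
    stats = {
        "total_correspondences": len(correspondences),
        "resolution_by_type": {
            t: {"count": len(s), "avg_similarity": round(sum(s) / len(s), 3)}
            for t, s in scores_by_type.items()
        },
        "unresolved_by_type": dict(unresolved),
    }
    return correspondences, stats

corr, res_stats = resolve(
    {"Product": ["uppsala sofa", "Garden Chair"]},
    {"Product": ["Uppsala Sofa", "Stockholm Table"]},
)
print(corr)
print(res_stats)
```

Swapping `similarity` for an embedding-based or domain-specific metric is the kind of change the "custom similarity metrics" next step refers to.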

## 6. Advanced Features

### 6.1 LangGraph Workflow for Complex Queries

```python
# Create and test the LangGraph workflow
workflow = system.rag_agent.create_langgraph_workflow()

# Test with a complex multi-hop query
complex_query = "Find all quality issues in products and trace them back to the responsible suppliers through the supply chain"

print(f"Complex Query: {complex_query}\n")
print("Processing with LangGraph workflow...\n")

result = workflow.invoke({"question": complex_query})

print(f"Query Type Detected: {result['query_type']}")
print(f"\nAnswer:\n{result['answer']}")
```
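The workflow returns a `query_type` alongside the answer, which means it classifies the question before choosing a retrieval strategy. How the real workflow classifies questions is not shown (it may well use the LLM); a keyword-based router is a minimal, self-contained stand-in. The route labels `aggregation`, `supply_chain_trace`, and `semantic` are hypothetical. In LangGraph, a function with this shape (state in, route name out) is what `StateGraph.add_conditional_edges` expects as its routing function.

```python
import re

# Hypothetical routing rules, checked in order; the first match wins.
# Aggregation is checked first so "How many suppliers..." is counted, not traced.
ROUTES = [
    ("aggregation", re.compile(r"\b(how many|count|average|total)\b", re.IGNORECASE)),
    ("supply_chain_trace", re.compile(r"\b(trace|supplier|suppliers|supply chain)\b", re.IGNORECASE)),
]

def classify_query(state):
    """Return the route name for state['question']; default to semantic search."""
    question = state["question"]
    for label, pattern in ROUTES:
        if pattern.search(question):
            return label
    return "semantic"

for q in [
    "How many suppliers are there in each country?",
    "Trace any quality issues in furniture back to their suppliers",
    "What quality issues are mentioned in product reviews?",
]:
    print(classify_query({"question": q}), "<-", q)
```

Keyword rules are cheap and predictable but brittle; an LLM classifier handles paraphrases better at the cost of latency and an extra model call.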

### 6.2 Direct RAG Agent Methods

```python
# Test hybrid search directly
query = "quality issues"
docs = system.rag_agent.hybrid_search(query, k=2)

print(f"Hybrid Search for: '{query}'\n")
for i, doc in enumerate(docs, 1):
    print(f"Result {i}:")
    print(f"  Content: {doc.page_content[:200]}...")
    if doc.metadata:
        print(f"  Metadata: {doc.metadata}")
    print()
```
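`hybrid_search` combines more than one retriever. How it merges their results is not shown here; one common technique is reciprocal rank fusion (RRF), which scores each document as the sum of 1/(k + rank) over every ranked list it appears in, with k commonly set to 60. A sketch, assuming each retriever returns a ranked list of document IDs (the IDs below are invented):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of doc IDs; a higher fused score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by doc ID for a deterministic order
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = ["review_3", "review_1", "review_7"]   # e.g. from embedding similarity
keyword_hits = ["review_1", "review_9", "review_3"]  # e.g. from full-text search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['review_1', 'review_3', 'review_9', 'review_7']
```

RRF only uses ranks, so it sidesteps the problem that vector similarities and full-text scores live on different scales.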

```python
# Test direct Cypher query generation
question = "How many suppliers are there in each country?"
cypher_result = system.rag_agent.cypher_query(question)

print(f"Cypher Query for: '{question}'\n")
print(f"Result: {cypher_result}")
```
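Executing LLM-generated Cypher carries the risk that the model emits a write query. Whether `cypher_query` guards against this is not shown; a conservative pre-execution check (a hypothetical helper, not part of the agent) can reject queries containing write or schema clauses. Read-only database credentials remain the more robust safeguard; a keyword filter like this one is only a second line of defense.

```python
import re

# Clauses that modify data or schema. CALL is rejected outright because
# procedures can write; loosen this if read-only procedures are needed.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV|CALL)\b",
    re.IGNORECASE,
)
# Single- or double-quoted string literals, allowing backslash escapes
STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'" + r'|"(?:[^"\\]|\\.)*"')

def is_read_only(cypher):
    """Return False if any write-like keyword appears outside string literals."""
    # Blank out literals so a product name like 'Set Table' is not flagged
    without_strings = STRING_LITERAL.sub("''", cypher)
    return WRITE_CLAUSES.search(without_strings) is None

print(is_read_only("MATCH (s:Supplier) RETURN s.country AS country, count(*) AS n"))
print(is_read_only("MATCH (n) DETACH DELETE n"))
```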

```python
# Test supply chain tracing
trace_result = system.rag_agent.trace_issue_to_supplier(
    product_name="Sofa",
    issue_keyword="quality"
)

print("Supply Chain Trace for Sofa with quality issues:\n")
print(trace_result)
```

## 7. Interactive Q&A Session

Run an interactive session where you can ask questions directly.

```python
# Run interactive Q&A (uncomment to use)
# system.interactive_qa()
```

## 8. System Test Suite

Run the system's test suite with `test_system()`.

```python
# Run the system test suite
system.test_system()
```

## 9. Quick Setup Function

For quick demonstrations, use the convenience function that builds everything in one call.

```python
# Quick setup - builds complete system with defaults
# quick_system = await create_and_build_system(
#     reset=True,
#     limit_markdown_files=3
# )

# # Now you can immediately ask questions
# answer = quick_system.ask_question("What suppliers are in Sweden?")
# print(answer)
```

## 10. Visualization Helpers

Generate Cypher queries for visualization in Neo4j Browser.

```python
# Generate visualization query for a product's supply chain
product_name = "Uppsala Sofa"

viz_query = f"""
// Visualization for {product_name} supply chain
MATCH path = (p:Product {{product_name: '{product_name}'}})-[:Contains]->(a:Assembly)
OPTIONAL MATCH parts_path = (part:Part)-[:Is_Part_Of]->(a)
OPTIONAL MATCH supplier_path = (part)-[:Supplied_By]->(s:Supplier)
OPTIONAL MATCH entity_path = (e:`__Entity__`:Product)-[:CORRESPONDS_TO]->(p)
OPTIONAL MATCH issue_path = (e)-[:HAS_ISSUE]->(issue:Issue)

RETURN path, parts_path, supplier_path, entity_path, issue_path
LIMIT 50
"""

print(f"To visualize {product_name} in Neo4j Browser, run:")
print("\n" + viz_query)
```
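Interpolating `product_name` with an f-string breaks the query for any name containing a single quote, and it is unsafe for untrusted input. Cypher parameters avoid both problems: in Neo4j Browser the value can be set with `:param product_name => 'Uppsala Sofa'` before running the query, and from Python the query text and a parameter dict are passed to the driver separately. A sketch (`build_viz_query` is a hypothetical helper):

```python
def build_viz_query(product_name):
    """Return (cypher, params) for visualizing one product's supply chain."""
    cypher = """
MATCH path = (p:Product {product_name: $product_name})-[:Contains]->(a:Assembly)
OPTIONAL MATCH parts_path = (part:Part)-[:Is_Part_Of]->(a)
OPTIONAL MATCH supplier_path = (part)-[:Supplied_By]->(s:Supplier)
OPTIONAL MATCH entity_path = (e:`__Entity__`:Product)-[:CORRESPONDS_TO]->(p)
OPTIONAL MATCH issue_path = (e)-[:HAS_ISSUE]->(issue:Issue)
RETURN path, parts_path, supplier_path, entity_path, issue_path
LIMIT 50
"""
    # The value travels separately from the query text, so quotes cannot break it
    return cypher, {"product_name": product_name}

cypher, params = build_viz_query("Mike's Chair")  # hypothetical name containing a quote
print(params)
# With the official neo4j driver, for example:
#   with driver.session() as session:
#       result = session.run(cypher, params)
```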

```python
# Generate query to see all entity correspondences
correspondence_query = """
// Show entity resolution connections
MATCH (e:`__Entity__`)-[r:CORRESPONDS_TO]->(d)
RETURN e, r, d
LIMIT 25
"""

print("To visualize entity resolutions in Neo4j Browser:")
print("\n" + correspondence_query)
```

## Summary

This notebook demonstrated:

1. **Modular Architecture**: Clean separation of concerns with specialized agents
2. **Complete Pipeline**: From raw data to working Q&A system
3. **Multiple Retrieval Strategies**: Vector search, Cypher queries, and graph traversal
4. **Supply Chain Analysis**: Tracing issues through the complete supply chain
5. **Entity Resolution**: Connecting extracted entities to domain nodes
6. **LangGraph Workflow**: Intelligent query routing for complex questions

### Key Advantages of Modular Design:

- **Maintainability**: Each agent can be updated independently
- **Testability**: Individual components can be tested in isolation
- **Extensibility**: New agents can be added without modifying the existing agents
- **Reusability**: Agents can be reused in different pipelines
- **Clarity**: Clear responsibility boundaries between components

### Next Steps:

1. Process all markdown files (set `limit_markdown_files=None`)
2. Add more entity types and relationships
3. Implement custom similarity metrics for entity resolution
4. Add evaluation metrics for Q&A quality
5. Deploy as API service
6. Create a web interface for the Q&A system