# Knowledge Graph Creation and Querying with Neo4j and LLMs
![title](neo4jdogs.png)

This notebook demonstrates how to use Neo4j to store, manage, and query a knowledge graph created in the previous notebook [Create knowledge graph from PDF](./knowledgegraph.ipynb).

## Overview
- Load graph data from the pickle file created in the previous step
- Store the data in a Neo4j database
- Create a graph query chain using LangChain and Azure OpenAI
- Query the graph using natural language

## Environment Setup and Neo4j Connection

Before running this notebook, ensure you have started the Neo4j database using Docker Compose:
```bash
docker-compose up -d
```

The following cell loads environment variables and establishes a connection to the Neo4j database.

In [None]:
import os
from langchain_neo4j import Neo4jGraph
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Set up connection to Neo4j database
try:
    graph = Neo4jGraph(
        url=os.getenv("NEO4J_URL"),
        username=os.getenv("NEO4J_USERNAME"),
        password=os.getenv("NEO4J_PASSWORD")
    )
    print("Successfully connected to Neo4j database")
except Exception as e:
    print(f"Error connecting to Neo4j: {e}")
    print("Make sure the Neo4j container is running and environment variables are set correctly")
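
If the connection fails, a quick first check is whether the three variables read by the cell above are actually set. The following is a minimal sketch; `missing_env_vars` is a hypothetical helper introduced here for illustration, not part of LangChain or Neo4j.

```python
import os

# Names taken from the connection cell above; adjust if your .env uses different keys.
REQUIRED_VARS = ("NEO4J_URL", "NEO4J_USERNAME", "NEO4J_PASSWORD")

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All Neo4j connection variables are set")
```

Run it after `load_dotenv()` so that values from the `.env` file are visible in `os.environ`.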

## Loading and Processing Graph Data

In this section, we load the previously generated graph data from a pickle file and prepare it for import into Neo4j.
Before importing, we copy each node's `id` and `type` into its `name` and `category` properties so that nodes are easier to read in Neo4j's interface.

In [None]:
import pickle
from pathlib import Path

# Define file path and check if it exists
graph_file = Path('./data/graph_docs.pkl')
if not graph_file.exists():
    raise FileNotFoundError(f"Graph data file not found at {graph_file}. Run the previous notebook first.")

# Load the graph documents from pickle file
try:
    with open(graph_file, 'rb') as f:
        graph_docs = pickle.load(f)
    print(f"Successfully loaded {len(graph_docs)} graph documents")

    # Enhance nodes with display properties for better visualization
    for doc in graph_docs:
        for node in doc.nodes:
            node.properties['name'] = node.id
            node.properties['category'] = node.type
                
    # Add the enhanced graph documents to Neo4j
    graph.add_graph_documents(graph_docs, baseEntityLabel=True)
    print("Graph data successfully imported to Neo4j")
except Exception as e:
    print(f"Error loading or importing graph data: {e}")
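
To see what the enhancement loop does without loading the pickle, here is a small sketch on stand-in objects. `StubNode` and `StubGraphDoc` are hypothetical classes that model only the attributes the loop touches (`id`, `type`, `properties`, `nodes`), and the sample node is made-up illustrative data.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the node and graph-document objects loaded from the pickle.
@dataclass
class StubNode:
    id: str
    type: str
    properties: dict = field(default_factory=dict)

@dataclass
class StubGraphDoc:
    nodes: list

def add_display_properties(docs):
    """Copy each node's id and type into its properties, as the import cell does."""
    for doc in docs:
        for node in doc.nodes:
            node.properties["name"] = node.id
            node.properties["category"] = node.type
    return docs

docs = add_display_properties(
    [StubGraphDoc(nodes=[StubNode(id="Border Collie", type="Breed")])]
)
print(docs[0].nodes[0].properties)
# → {'name': 'Border Collie', 'category': 'Breed'}
```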

## Neo4j Visualization

You should now be able to see the graph in the Neo4j Browser at http://localhost:7474/.

The visualization provides an interactive way to explore the knowledge graph structure, relationships, and properties.
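
A quick sanity check from the notebook is to count nodes per `category`, the property set during import. This is a sketch that assumes `graph` is the `Neo4jGraph` instance created earlier; `count_nodes_by_category` is a hypothetical helper name.

```python
def count_nodes_by_category(graph):
    """Return one row per category with its node count, largest first."""
    cypher = (
        "MATCH (n) "
        "RETURN n.category AS category, count(*) AS count "
        "ORDER BY count DESC"
    )
    # Neo4jGraph.query returns a list of dictionaries, one per result row.
    return graph.query(cypher)
```

Calling `count_nodes_by_category(graph)` after the import should list the node types extracted in the previous notebook. The same Cypher statement can also be pasted into the Neo4j Browser.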

## Configure Natural Language Querying with LangChain and Azure OpenAI

This section sets up a query chain that allows us to ask questions in natural language about the knowledge graph.
The chain uses Azure OpenAI to translate natural language into Cypher queries and format the results.

In [None]:
from langchain_neo4j import GraphCypherQAChain
from langchain_openai import AzureChatOpenAI

# Refresh schema to ensure the latest graph structure is available to the LLM
graph.refresh_schema()

# Initialize Azure OpenAI client with proper error handling
try:
    c_llm = AzureChatOpenAI(
        timeout=3 * 60,  # 3-minute timeout (the value is in seconds)
        api_version="2025-02-01-preview",
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        azure_deployment=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        verbose=True,
        reasoning_effort="medium",
    )
    
    # Create the Cypher query chain
    cypher_chain = GraphCypherQAChain.from_llm(
        graph=graph,
        cypher_llm=c_llm,  # LLM for generating Cypher queries
        qa_llm=c_llm,      # LLM for formatting results
        top_k=500,         # Maximum number of results to return
        validate_cypher=True,  # Validate Cypher queries before execution
        verbose=True,          # Show intermediate steps
        allow_dangerous_requests=True  # Acknowledge that LLM-generated Cypher runs with this user's database permissions
    )
    print("Query chain successfully created")
except Exception as e:
    print(f"Error initializing Azure OpenAI or query chain: {e}")
    print("Check that all environment variables are properly set in your .env file")
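
Because `allow_dangerous_requests=True` lets model-generated Cypher run against the database, it can help to check queries for write clauses before trusting them. The sketch below is a keyword heuristic only, not a parser: `looks_read_only` is a hypothetical helper, the clause list is not exhaustive, and a keyword inside a string literal would trigger a false positive. Connecting with a database user that has read-only permissions is the more robust safeguard.

```python
import re

# Heuristic list of Cypher clauses that modify data or schema; not exhaustive.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b", re.IGNORECASE
)

def looks_read_only(cypher: str) -> bool:
    """Return True when no write clause appears in the query text."""
    return WRITE_CLAUSES.search(cypher) is None

print(looks_read_only("MATCH (b:Breed) RETURN b.name LIMIT 5"))  # → True
print(looks_read_only("MATCH (n) DETACH DELETE n"))              # → False
```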

## Querying the Knowledge Graph with Natural Language

Now we can ask questions about our knowledge graph using natural language. The system will:
1. Convert natural language to Cypher queries
2. Execute the queries against the Neo4j database
3. Format the results in a user-friendly way

Let's try some example queries below.

In [None]:
from IPython.display import Markdown, display

def query_graph(question):
    """Query the knowledge graph with natural language and display formatted results"""
    try:
        print(f"Question: {question}")
        response = cypher_chain.invoke({"query": question})  # "query" is the chain's input key
        display(Markdown(response["result"]))
    except Exception as e:
        print(f"Error querying graph: {e}")

# Example 1: Overview of breed groups
query_graph("Make a table of the different breed groups and their characteristics")

In [None]:
# Example 2: Specific question about herding group breeds
query_graph("What breeds are in the Herding Group and what characteristics do they have?")

In [None]:
# Example 3: Compare characteristics across breed groups
query_graph("Compare the traits of Working Group dogs versus Toy Group dogs")
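
With `verbose=True` the chain prints the generated Cypher, but it can also be captured programmatically. The sketch below assumes the chain was created with `return_intermediate_steps=True` and that the response then carries an `intermediate_steps` list whose entries include a dictionary with a `"query"` key; verify that shape against your installed `langchain_neo4j` version. `generated_cypher` is a hypothetical helper, and `sample` is illustrative data, not real model output.

```python
def generated_cypher(response):
    """Return the Cypher text from a chain response, or None if no steps were recorded."""
    for step in response.get("intermediate_steps", []):
        if "query" in step:
            return step["query"]
    return None

# Illustrative response in the assumed shape (node label and values are made up).
sample = {
    "result": "...",
    "intermediate_steps": [
        {"query": "MATCH (g:Group) RETURN g.name"},
        {"context": [{"g.name": "Herding Group"}]},
    ],
}
print(generated_cypher(sample))
# → MATCH (g:Group) RETURN g.name
```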

## Cleanup (Optional)

The following cell clears all data from the Neo4j database. Use it if you want to reset your database or load fresh data.
**Warning**: This operation cannot be undone. The statements are commented out by default; uncomment them only if you do not need to preserve your data.

In [None]:
# Database cleanup - uncomment to execute
# graph.query("MATCH (n) DETACH DELETE n")
# print("Database cleared successfully")
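
A slightly safer variant reports how many nodes would be removed and deletes only when asked explicitly. This is a sketch assuming `graph` is the `Neo4jGraph` instance from earlier; `clear_database` and its `confirm` flag are hypothetical names introduced here.

```python
def clear_database(graph, confirm=False):
    """Delete all nodes and relationships only when confirm=True.

    Returns the node count observed before any deletion.
    """
    count = graph.query("MATCH (n) RETURN count(n) AS total")[0]["total"]
    if not confirm:
        print(f"{count} nodes would be deleted; pass confirm=True to proceed")
        return count
    graph.query("MATCH (n) DETACH DELETE n")
    print(f"Deleted {count} nodes")
    return count
```

Calling `clear_database(graph)` is a dry run; `clear_database(graph, confirm=True)` performs the deletion.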

## Conclusion

In this notebook, we've demonstrated how to:

1. Load pre-processed graph data into Neo4j
2. Configure a natural language interface using LangChain and Azure OpenAI
3. Query the graph using natural language questions
4. Visualize and explore the results

This approach combines knowledge graphs with LLMs to provide a flexible way to explore complex relationships in structured data through natural language questions.

### Next Steps

- Experiment with more complex queries
- Add additional data sources to enrich the knowledge graph
- Implement more advanced filtering and aggregation capabilities
- Integrate with downstream applications or dashboards