<a href="https://colab.research.google.com/github/genaiconference/Agentic_RAG_Workshop/blob/main/09_build_kg_from_unstructured_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neo4j GraphRAG Python Package: Accelerating GenAI With Knowledge Graphs

## Intro to Neo4j GraphRAG

The **GraphRAG Python package** from **Neo4j** provides end-to-end workflows that take you from unstructured data to knowledge graph creation, knowledge graph retrieval, and full GraphRAG pipelines in one place. Whether you’re using Python to build knowledge assistants, search APIs, chatbots, or report generators, this package makes it easy to incorporate knowledge graphs to improve your retrieval-augmented generation (RAG) relevance, accuracy, and explainability.

## Goal of this Notebook

In this notebook, you will learn how to go from **zero to GraphRAG** using the GraphRAG Python package for Neo4j.

It is an **end-to-end worked example** that:
- starts with unstructured documents (here, a PDF: the Leave Policy Document),
- progresses through knowledge graph construction,
- moves on to knowledge graph retriever design, and
- finishes with complete GraphRAG pipelines.

# GraphRAG: Adding Knowledge to GenAI

By combining knowledge graphs with RAG, GraphRAG helps address common problems with large language models (LLMs), such as hallucinations.

## GraphRAG Overview

- **Problem with LLMs:** They hallucinate and often lack reliable, domain-specific context.

- **Traditional RAG:** Retrieves only fragments of unstructured text → limited context.

- **GraphRAG Advantages**  
  - Leverages **knowledge graphs** for structured + semi-structured data  
  - Provides richer context → reduces hallucinations  
  - Enables more reliable answers  
  - Acts as a **trusted agent** in complex workflows  

- **GraphRAG Python Package Features**  
  - Easy creation and querying of knowledge graphs  
  - Retrieval via:  
    - Graph traversals  
    - Text2Cypher query generation  
    - Vector search  
    - Full-text search  
  - Tooling for **full RAG pipelines**  
  - Seamless integration into GenAI applications & workflows
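
To make the retrieval ideas above concrete, here is a minimal, library-free sketch of how vector search and graph traversal can be combined. It is not the GraphRAG package's implementation: the chunk texts, the 2-dimensional "embeddings", the entity names, and the relationship types are all made-up toy values chosen for illustration.

```python
import math

# Toy chunks with hand-made 2-d "embeddings" (hypothetical values, not real model output).
CHUNKS = {
    "c1": {"text": "Employees accrue annual leave monthly.", "vec": [1.0, 0.0]},
    "c2": {"text": "Sick leave requires a medical certificate.", "vec": [0.0, 1.0]},
}

# Toy graph: entities mentioned by each chunk, and relationships between entities.
MENTIONS = {"c1": ["Annual Leave"], "c2": ["Sick Leave"]}
RELATIONS = {
    "Annual Leave": [("APPROVED_BY", "Manager")],
    "Sick Leave": [("REQUIRES", "Medical Certificate")],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, top_k=1):
    """Vector search for the closest chunks, then expand each hit via the graph."""
    ranked = sorted(
        CHUNKS,
        key=lambda cid: cosine(query_vec, CHUNKS[cid]["vec"]),
        reverse=True,
    )
    results = []
    for cid in ranked[:top_k]:
        # Follow chunk -> entity -> related entity to collect structured facts.
        facts = [
            f"{entity} -[{rel}]-> {target}"
            for entity in MENTIONS.get(cid, [])
            for rel, target in RELATIONS.get(entity, [])
        ]
        results.append({"chunk": CHUNKS[cid]["text"], "facts": facts})
    return results


# A query vector pointing mostly toward the "sick leave" direction.
context = retrieve([0.1, 0.9])
print(context)
```

The point of the sketch is the shape of the result: the LLM receives not only the matching text fragment but also the structured facts connected to it, which is the extra context that plain text-only RAG lacks.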


## Setup & Installations

In [None]:
!git clone https://github.com/genaiconference/Agentic_RAG_Workshop.git

In [None]:
!pip install -r /content/Agentic_RAG_Workshop/requirements.txt

## Pre-Requisites

**1. Create a Neo4j Database:** To work through this RAG example, you need a database for storing and retrieving data. There are many options; a quick one is to start a free Neo4j graph database using [Neo4j AuraDB](https://neo4j.com/product/auradb/?ref=neo4j-home-hero). You can use **AuraDB Free** or start an **AuraDB Professional (Pro) free trial** for higher ingestion and retrieval performance, since Pro instances have more RAM.

**2. Get Credentials:** You will need the Neo4j URI, username, and password from when you created the database. If you created your database on AuraDB, they are in the credentials file you downloaded.

## 0. Create a Neo4j Database

Use **AuraDB Free**

- Start a free Neo4j Graph Database using [Neo4j AuraDB](https://neo4j.com/product/auradb/?ref=neo4j-home-hero).

- Once you create an instance, you can download and save the credentials to use in the following code.

In [None]:
# Load Environment Variables
import os
os.chdir("/content/Agentic_RAG_Workshop/")

from dotenv import load_dotenv
load_dotenv()

# Load neo4j credentials (and openai api key in background).
NEO4J_URI = os.getenv('NEO4J_URI')
NEO4J_USERNAME = os.getenv('NEO4J_USERNAME')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')
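
Because `os.getenv` silently returns `None` for a missing variable, a misconfigured `.env` file often only surfaces later as a confusing connection error. The following is a small optional check, not part of the workshop repository. The helper name `missing_env_vars` is hypothetical, and `OPENAI_API_KEY` is an assumption about how the OpenAI key is named (it is the variable the OpenAI client reads by default).

```python
import os


def missing_env_vars(names, env=None):
    """Return the names from `names` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# The NEO4J_* names come from the cell above; OPENAI_API_KEY is an assumption.
REQUIRED = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY"]

missing = missing_env_vars(REQUIRED)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Run it after `load_dotenv()`. If it reports missing names, check that the `.env` file is in the working directory and defines them.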

## 1. Knowledge Graph Building

Let's transform the Leave Policy Document into a knowledge graph and store it in our Neo4j database.

The `SimpleKGPipeline` class allows you to automatically build a knowledge graph with a few key inputs, including

 - a driver to connect to Neo4j,
 - an LLM for entity extraction, and
 - an embedding model to create vectors on text chunks for similarity search.

### Neo4j Driver

The Neo4j driver allows you to connect and perform read and write transactions with the database.

In [None]:
import neo4j

# Neo4j graph driver
neo4j_driver = neo4j.GraphDatabase.driver(NEO4J_URI,
                                         auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
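
Creating a driver object does not by itself guarantee that the database is reachable, so it can help to test the connection before building the graph. The sketch below wraps the driver's `verify_connectivity()` method. The helper name `check_connection` is hypothetical, and the broad `except` is a deliberate simplification for a notebook setting.

```python
def check_connection(driver):
    """Return True if the driver can reach the database, False otherwise."""
    try:
        # verify_connectivity() raises if the driver cannot establish a connection.
        driver.verify_connectivity()
        return True
    except Exception as exc:  # broad on purpose: any failure means "not connected"
        print(f"Could not connect to Neo4j: {exc}")
        return False


# In this notebook you would call: check_connection(neo4j_driver)
```

If this returns `False`, the usual suspects are a wrong URI, wrong credentials, or an AuraDB instance that is paused or still starting.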

### LLM & Embedding Model

In this case, we will use OpenAI's **GPT-4o-mini** for convenience; it is a fast, low-cost model.

Likewise, we will use OpenAI's **text-embedding-3-small** as the embedding model.

In [None]:
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings


# Define the LLM using the neo4j_graphrag wrapper
llm = OpenAILLM(
    model_name="gpt-4o-mini",
    model_params={
        "response_format": {"type": "json_object"},  # use json_object formatting for best results
        "temperature": 0,  # turn temperature down for more deterministic results
    },
)

# Create text embedder
embedder = OpenAIEmbeddings(
    model="text-embedding-3-small"
)
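
Before embedding a whole document, it can be worth checking the length of the vectors the embedder returns, because a vector index in Neo4j is created with a fixed dimension and has to match. The sketch below uses the embedder's `embed_query` method. The helper name `embedding_dimension` and the sample text are hypothetical.

```python
def embedding_dimension(embedder, text="annual leave policy"):
    """Embed a sample string and return the length of the resulting vector."""
    vector = embedder.embed_query(text)
    return len(vector)


# In this notebook you would call: embedding_dimension(embedder)
# text-embedding-3-small returns 1536-dimensional vectors by default.
```

Note that calling this with the real embedder sends one request to the OpenAI API.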