# Neo4j Property Graph Index

<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/property_graph/property_graph_neo4j.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


Neo4j is a production-grade graph database, capable of storing a property graph, performing vector search, filtering, and more.

The easiest way to get started is with a cloud-hosted instance using [Neo4j Aura](https://neo4j.com/cloud/platform/aura-graph-database/).

For this notebook, we will instead cover how to run the database locally with Docker.

If you already have an existing graph, skip to the "Loading from an existing Graph" section at the end of this notebook.

```python
%pip install llama-index llama-index-graph-stores-neo4j
```


## Docker Setup

To launch Neo4j locally, first make sure Docker is installed. Then launch the database with the following command:

```bash
docker run \
    -p 7474:7474 -p 7687:7687 \
    -v $PWD/data:/data -v $PWD/plugins:/plugins \
    --name neo4j-apoc \
    -e NEO4J_apoc_export_file_enabled=true \
    -e NEO4J_apoc_import_file_enabled=true \
    -e NEO4J_apoc_import_file_use__neo4j__config=true \
    -e NEO4JLABS_PLUGINS=\[\"apoc\"\] \
    neo4j:latest
```
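The container can take several seconds before it accepts connections. If you drive the setup from Python, one option is to poll the Bolt port before connecting. `wait_for_port` below is a hypothetical, standard-library-only sketch (not part of LlamaIndex or Neo4j); it assumes the `7687:7687` port mapping from the command above and only checks that something is listening, not that Neo4j has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Hypothetical helper: a successful connection only means a process is
    listening on the port, not that the database is ready or that the
    credentials are valid.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused)
            # when nothing accepts the connection
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 7687, timeout=60)` returns `True` once the container is listening on the Bolt port, or `False` after a minute.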

You can then open the Neo4j Browser at [http://localhost:7474/](http://localhost:7474/). You will be asked to sign in; use the default username `neo4j` and password `neo4j`.

After you log in for the first time, you will be prompted to change the password.

After this, you are ready to create your first property graph!

## Env Setup

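Only a little environment setup is needed: an OpenAI API key (used for both the LLM and the embedding model), the example data, and `nest_asyncio`, which lets LlamaIndex's async code run inside the notebook's already-running event loop.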

```python
import os

# Set your OpenAI API key here (left blank in this example)
os.environ["OPENAI_API_KEY"] = ""
```
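Hard-coding secrets in a notebook makes them easy to leak. A minimal alternative, assuming you export the hypothetical variables `NEO4J_URI`, `NEO4J_USERNAME` and `NEO4J_PASSWORD` (alongside `OPENAI_API_KEY`) before starting Jupyter, is to read them in one place and fail early if a secret is missing. The helper `load_settings` and its defaults (the Bolt URL published by the Docker command above and the default `neo4j` user) are assumptions for this sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Neo4jSettings:
    url: str
    username: str
    password: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Neo4jSettings:
    """Read connection settings from environment variables (hypothetical names).

    Raises RuntimeError if a required secret is missing or empty, so the
    notebook stops before any LLM or database call is made.
    """
    env = os.environ if env is None else env
    for key in ("OPENAI_API_KEY", "NEO4J_PASSWORD"):
        if not env.get(key):
            raise RuntimeError(f"environment variable {key} is not set")
    return Neo4jSettings(
        # defaults assume the local Docker setup described above
        url=env.get("NEO4J_URI", "bolt://localhost:7687"),
        username=env.get("NEO4J_USERNAME", "neo4j"),
        password=env["NEO4J_PASSWORD"],
    )
```

The result can then be passed on as `Neo4jPropertyGraphStore(username=settings.username, password=settings.password, url=settings.url)`, keeping the password out of the notebook itself.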

```python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
```

```python
import nest_asyncio

# Allow nested event loops, since Jupyter already runs one
nest_asyncio.apply()
```

```python
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
```

## Index Construction

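```python
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

# Note: used to be `Neo4jPGStore`
# Connects to the local Docker instance through the Bolt port (7687).
# Replace the placeholder with the password you set on first login.
graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="<your-password>",
    url="bolt://localhost:7687",
)
```
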
```python
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

# Build the graph: an LLM extracts schema-guided triplets from each chunk,
# and the resulting nodes and relations are written to Neo4j.
index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=OpenAI(model="gpt-3.5-turbo", temperature=0.0)
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)
```
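`SchemaLLMPathExtractor` asks the LLM for (entity, relation, entity) triplets and checks them against a schema of allowed entity types and relations. The plain-Python sketch below is a simplified, hypothetical illustration of that filtering idea only; it is not the library's implementation, and the `SCHEMA` contents, the type names and the `filter_triplets` helper are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical schema: relations allowed between a (subject type, object type) pair
SCHEMA: Dict[Tuple[str, str], Set[str]] = {
    ("PERSON", "ORGANIZATION"): {"WORKED_ON", "PART_OF"},
    ("ORGANIZATION", "LOCATION"): {"LOCATED_IN"},
    ("ORGANIZATION", "TECHNOLOGY"): {"HAS", "USED_FOR"},
}

# An extracted triplet: (subject, subject type, relation, object, object type)
Triplet = Tuple[str, str, str, str, str]


def filter_triplets(triplets: List[Triplet], strict: bool = True) -> List[Triplet]:
    """Keep triplets whose relation is allowed for their type pair.

    With strict=False this sketch simply keeps every triplet, i.e. the
    schema would only guide the LLM instead of being enforced.
    """
    if not strict:
        return list(triplets)
    kept = []
    for subj, subj_type, rel, obj, obj_type in triplets:
        if rel in SCHEMA.get((subj_type, obj_type), set()):
            kept.append((subj, subj_type, rel, obj, obj_type))
    return kept


raw = [
    ("Paul Graham", "PERSON", "WORKED_ON", "Viaweb", "ORGANIZATION"),
    ("Viaweb", "ORGANIZATION", "LOCATED_IN", "California", "LOCATION"),
    # WORKED_ON is not allowed between ORGANIZATION and LOCATION, so it is dropped
    ("Viaweb", "ORGANIZATION", "WORKED_ON", "California", "LOCATION"),
]
print(filter_triplets(raw))
```

Enforcing a schema this way trades recall for a cleaner, more predictable graph: relations the LLM invents outside the schema never reach the database.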


Now that the graph is created, we can explore it in the UI by visiting [http://localhost:7474/](http://localhost:7474/).

The easiest way to see the entire graph is to run a Cypher query such as `MATCH n=() RETURN n` in the query editor at the top of the page.

To delete the entire graph, a useful query is `MATCH n=() DETACH DELETE n`.
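The same kind of queries can also be run from Python instead of the browser. `Neo4jPropertyGraphStore` provides a `structured_query(query, param_map=None)` method for raw Cypher; the sketch below assumes it returns a list of dictionaries, one per record, keyed by the `RETURN` aliases. `count_nodes` and `clear_graph` are hypothetical helpers. Because they only call `structured_query`, they also work with any stand-in object that has that method, which makes them easy to try without a database.

```python
from typing import Any, Dict, Optional, Protocol


class SupportsCypher(Protocol):
    """Anything with a structured_query method, e.g. a Neo4jPropertyGraphStore."""

    def structured_query(
        self, query: str, param_map: Optional[Dict[str, Any]] = None
    ) -> Any: ...


def count_nodes(store: SupportsCypher) -> int:
    # Assumes records come back as a list of dicts keyed by the RETURN alias
    rows = store.structured_query("MATCH (n) RETURN count(n) AS node_count")
    return int(rows[0]["node_count"])


def clear_graph(store: SupportsCypher) -> None:
    # Irreversible: deletes every node and, via DETACH, every relationship
    store.structured_query("MATCH (n) DETACH DELETE n")
```

In the notebook this would be called as `count_nodes(graph_store)`, which is a quick way to confirm that extraction actually wrote something before opening the browser.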

## Querying and Retrieval

```python
retriever = index.as_retriever(
    include_text=False,  # include source text in returned nodes, default True
)

nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

for node in nodes:
    print(node.text)
```

```text
Interleaf -> HAS -> Lisp
Paul Graham -> WORKED_ON -> Interleaf
Viaweb -> PART_OF -> Yahoo
Viaweb -> HAS -> Julian
Viaweb -> IS_A -> software as a service
Viaweb -> IS_A -> application service provider
Viaweb -> IS_A -> web app
Viaweb -> LOCATED_IN -> California
Viaweb -> USED_FOR -> software development
Viaweb -> USED_FOR -> retail
Viaweb -> USED_FOR -> building stores
Paul Graham -> WORKED_ON -> Viaweb
Paul Graham -> WORKED_ON -> Web
Paul Graham -> IS_A -> web app
web app -> USED_FOR -> building online stores
```
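With `include_text=False`, each retrieved node's text is a single `subject -> RELATION -> object` line, as in the output above. To post-process these results, for example to group facts by relation, a small parser is enough. `parse_triplet` and `group_by_relation` are hypothetical helpers, and they assume entity names never contain the ` -> ` separator.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_triplet(line: str) -> Tuple[str, str, str]:
    """Split 'subject -> RELATION -> object' into its three parts."""
    parts = [p.strip() for p in line.split(" -> ")]
    if len(parts) != 3:
        raise ValueError(f"not a triplet: {line!r}")
    return parts[0], parts[1], parts[2]


def group_by_relation(lines: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each relation to the (subject, object) pairs that use it."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        subj, rel, obj = parse_triplet(line)
        grouped[rel].append((subj, obj))
    return dict(grouped)


# In the notebook, the lines would come from [node.text for node in nodes]
sample = [
    "Paul Graham -> WORKED_ON -> Interleaf",
    "Viaweb -> IS_A -> web app",
    "Paul Graham -> WORKED_ON -> Viaweb",
]
print(group_by_relation(sample)["WORKED_ON"])
# [('Paul Graham', 'Interleaf'), ('Paul Graham', 'Viaweb')]
```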


```python
query_engine = index.as_query_engine(include_text=True)

response = query_engine.query("What happened at Interleaf and Viaweb?")

print(str(response))
```

Paul Graham worked at Interleaf where he encountered a scripting language inspired by Emacs, which was a dialect of Lisp. He mentioned that he learned valuable lessons at Interleaf, such as the importance of having technology companies run by product people rather than sales people. Later, Paul Graham worked on Viaweb, a company that specialized in building online stores using a web app. Viaweb allowed users to control the software through a browser, eliminating the need for client software. Paul Graham and his team received seed funding for Viaweb and made significant progress in developing the software, leading to the creation of a new company named Viaweb.


## Loading from an existing Graph

If you already have a graph (whether or not it was created with LlamaIndex), you can connect to it and use it directly.

**NOTE:** If your graph was created outside of LlamaIndex, the most useful retrievers will be [text to cypher](../../module_guides/indexing/lpg_index_guide.md#texttocypherretriever) or [cypher templates](../../module_guides/indexing/lpg_index_guide.md#cyphertemplateretriever). Other retrievers rely on properties that LlamaIndex inserts.
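With a Cypher template, the query structure is fixed in advance and the LLM only supplies parameter values, which are sent to Neo4j separately from the query text instead of being pasted into it. The plain-Python sketch below illustrates that split; the template (including the `Person` label and `WORKED_ON` relation), the `names` parameter and the `build_template_query` helper are all assumptions for this example. The actual retriever API is described in the linked guide.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical template for an externally built graph: only $names varies.
TEMPLATE = (
    "MATCH (p:Person)-[:WORKED_ON]->(o) "
    "WHERE p.name IN $names "
    "RETURN p.name AS person, o.name AS project"
)


def build_template_query(names: List[str]) -> Tuple[str, Dict[str, Any]]:
    """Pair the fixed template with parameters, e.g. names extracted by an LLM.

    The values are never interpolated into the Cypher string, so an unusual
    entity name cannot change the structure of the query.
    """
    cleaned = [n.strip() for n in names if n and n.strip()]
    return TEMPLATE, {"names": cleaned}


query, params = build_template_query(["Paul Graham ", ""])
# With a live store, this pair could be executed as:
#   graph_store.structured_query(query, param_map=params)
```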

```python
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Point this at your existing Neo4j instance; never commit real credentials.
graph_store = Neo4jPropertyGraphStore(
    username="neo4j",
    password="<your-password>",
    url="bolt://localhost:7687",
)

# Wrap the existing graph in an index without re-extracting anything
index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
)
```

From here, we can still insert more documents!

```python
from llama_index.core import Document

document = Document(text="LlamaIndex is great!")

index.insert(document)
```

```python
nodes = index.as_retriever(include_text=False).retrieve("LlamaIndex")

print(nodes[0].text)
```

For full details on constructing, retrieving from, and querying a property graph, see the [full docs page](../../module_guides/indexing/lpg_index_guide.md).