# Property Graph Index

<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/property_graph/property_graph_basic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


In this notebook, we demonstrate some basic usage of the `PropertyGraphIndex` in LlamaIndex.

The property graph index takes unstructured documents, extracts a property graph from them, and provides various methods to query that graph.
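To make the term "property graph" concrete, here is a minimal pure-Python sketch: labeled nodes and directed relations, each carrying a dictionary of properties. The names `Entity`, `Relation`, and `as_triples` are hypothetical and chosen for illustration only; they are not LlamaIndex's own types.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A graph node: a name, a label (its type), and free-form properties."""

    name: str
    label: str = "entity"
    properties: dict = field(default_factory=dict)


@dataclass
class Relation:
    """A directed, labeled edge between two entities, with its own properties."""

    source: Entity
    label: str
    target: Entity
    properties: dict = field(default_factory=dict)


def as_triples(relations):
    """Render relations in the 'subject -> relation -> object' form."""
    return [f"{r.source.name} -> {r.label} -> {r.target.name}" for r in relations]


paul = Entity("Paul graham", label="person")
viaweb = Entity("Viaweb", label="company")
relations = [Relation(paul, "Started", viaweb, {"source_chunk": 0})]

print(as_triples(relations))
```

The properties are what distinguish a property graph from a plain list of triples: both nodes and edges can carry metadata, such as the chunk a fact was extracted from.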

In [1]:
%pip install llama-index


## Setup

In [3]:
import os

os.environ["OPENAI_API_KEY"] = "sk-..."
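Hard-coding a key in a notebook makes it easy to leak. A minimal sketch of one alternative, assuming you are happy to be prompted when the variable is missing: the helper name `ensure_api_key` is hypothetical, and its `prompt` argument defaults to the standard library's `getpass.getpass`.

```python
import getpass
import os


def ensure_api_key(name="OPENAI_API_KEY", prompt=getpass.getpass):
    """Return the key from the environment, prompting once if it is not set."""
    key = os.environ.get(name)
    if not key:
        key = prompt(f"Enter {name}: ")
        os.environ[name] = key  # make it visible to libraries that read the env
    return key
```

Calling `ensure_api_key()` once at the top of the notebook keeps the key out of the saved file.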

In [2]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'


In [4]:
import nest_asyncio

nest_asyncio.apply()

In [5]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

## Construction

In [6]:
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

index = PropertyGraphIndex.from_documents(
    documents,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    show_progress=True,
)

Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 48.69it/s]
Extracting paths from text: 100%|██████████| 12/12 [00:09<00:00,  1.21it/s]
Extracting implicit paths: 100%|██████████| 12/12 [00:00<00:00, 7986.62it/s]
Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.71s/it]
Generating embeddings: 100%|██████████| 3/3 [00:01<00:00,  1.64it/s]


Let's recap what just happened:
- `PropertyGraphIndex.from_documents()` - we loaded documents into an index
- `Parsing nodes` - the index parsed the documents into nodes
- `Extracting paths from text` - the nodes were passed to an LLM, and the LLM was prompted to generate knowledge graph triples (i.e. paths)
- `Extracting implicit paths` - each `node.relationships` property was used to infer implicit paths
- `Generating embeddings` - embeddings were generated for each text node and graph node (hence this happens twice)
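The "implicit paths" step is the least obvious one in the list above, so here is a toy sketch of the idea in plain Python: paths that come from how chunks relate to each other rather than from an LLM. The helpers `split_into_chunks` and `implicit_paths` are hypothetical, and the `NEXT`/`PREVIOUS` labels are assumed for illustration; which relationships exist in practice depends on what the node parser sets on each node.

```python
def split_into_chunks(text, chunk_size=4):
    """Stand-in for node parsing: split text into chunks of `chunk_size` words."""
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def implicit_paths(chunks):
    """Derive paths from chunk order alone, with no LLM call."""
    paths = []
    for i in range(len(chunks) - 1):
        paths.append((f"chunk-{i}", "NEXT", f"chunk-{i + 1}"))
        paths.append((f"chunk-{i + 1}", "PREVIOUS", f"chunk-{i}"))
    return paths


chunks = split_into_chunks("Paul Graham started Viaweb after working at Interleaf")
for path in implicit_paths(chunks):
    print(" -> ".join(path))
```

Because no model call is involved, this step is nearly instantaneous, which matches the progress output above.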

Let's explore what we created! For debugging purposes, the default `SimplePropertyGraphStore` includes a helper to save a `networkx` representation of the graph to an `html` file.

In [7]:
!pip install pyvis


In [8]:
index.property_graph_store.save_networkx_graph(name="./kg.html")

Opening the html in a browser, we can see our graph!

If you zoom in, each "dense" node with many connections is actually the source chunk, with extracted entities and relations branching off from there.

![example graph](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/property_graph/kg_screenshot.png?raw=1)

## Customizing Low-Level Construction

If we want, we can do the same ingestion using the low-level API by passing `kg_extractors` explicitly.

In [None]:
from llama_index.core.indices.property_graph import (
    ImplicitPathExtractor,
    SimpleLLMPathExtractor,
)

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    kg_extractors=[
        ImplicitPathExtractor(),
        SimpleLLMPathExtractor(
            llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
            num_workers=4,
            max_paths_per_chunk=10,
        ),
    ],
    show_progress=True,
)

For a full guide on all extractors, see the [detailed usage page](../../module_guides/indexing/lpg_index_guide.md#construction).

## Querying

Querying a property graph index typically consists of using one or more sub-retrievers and combining results.

Graph retrieval can be thought of as two steps:
- selecting node(s)
- traversing from those nodes

By default, two types of retrieval are used in unison:
- synonym/keyword expansion - use the LLM to generate synonyms and keywords from the query
- vector retrieval - use embeddings to find nodes in your graph

Once nodes are found, you can either:
- return the paths adjacent to the selected nodes (i.e. triples)
- return the paths + the original source text of the chunk (if available)
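The select-then-traverse idea described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how the library's retrievers are implemented: `select_nodes` does exact keyword matching only (a vector retriever would be a second selector feeding the same traversal), and both function names are hypothetical. The triples are taken from the sample output in this notebook.

```python
triples = [
    ("Interleaf", "Made", "Software"),
    ("Paul graham", "Started", "Viaweb"),
    ("Viaweb", "Had", "Code editor"),
    ("Code editor", "Was", "In viaweb"),
]


def select_nodes(keywords, triples):
    """Step 1: pick graph nodes whose name matches a query keyword."""
    names = {s for s, _, _ in triples} | {o for _, _, o in triples}
    wanted = {k.lower() for k in keywords}
    return {n for n in names if n.lower() in wanted}


def adjacent_paths(selected, triples):
    """Step 2: return every path that touches a selected node (one hop)."""
    return [t for t in triples if t[0] in selected or t[2] in selected]


selected = select_nodes(["viaweb"], triples)
for subj, rel, obj in adjacent_paths(selected, triples):
    print(f"{subj} -> {rel} -> {obj}")
```

Returning the source text as well would amount to looking up the chunk each path was extracted from and attaching it to the result.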

In [None]:
retriever = index.as_retriever(
    include_text=False,  # return only triples; True (the default) also includes source text
)

nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

for node in nodes:
    print(node.text)

Interleaf -> Was -> On the way down
Viaweb -> Had -> Code editor
Interleaf -> Built -> Impressive technology
Interleaf -> Added -> Scripting language
Interleaf -> Made -> Scripting language
Viaweb -> Suggested -> Take to hospital
Interleaf -> Had done -> Something bold
Viaweb -> Called -> After
Interleaf -> Made -> Dialect of lisp
Interleaf -> Got crushed by -> Moore's law
Dan giffin -> Worked for -> Viaweb
Interleaf -> Had -> Smart people
Interleaf -> Had -> Few years to live
Interleaf -> Made -> Software
Interleaf -> Made -> Software for creating documents
Paul graham -> Started -> Viaweb
Scripting language -> Was -> Dialect of lisp
Scripting language -> Is -> Dialect of lisp
Software -> Will be affected by -> Rapid change
Code editor -> Was -> In viaweb
Software -> Worked via -> Web
Programs -> Typed on -> Punch cards
Computers -> Skipped -> Step
Idea -> Was clear from -> Experience
Apartment -> Wasn't -> Rent-controlled


In [None]:
query_engine = index.as_query_engine(
    include_text=True,
)

response = query_engine.query("What happened at Interleaf and Viaweb?")

print(str(response))

Interleaf had smart people and built impressive technology, including adding a scripting language that was a dialect of Lisp. However, despite their efforts, they were eventually impacted by Moore's Law and faced challenges. Viaweb, on the other hand, was started by Paul Graham and had a code editor where users could define their own page styles using Lisp expressions. Viaweb also suggested taking someone to the hospital and called something "After."


For full details on customizing retrieval and querying, see [the docs page](../../module_guides/indexing/lpg_index_guide.md#retrieval-and-querying).

## Storage

By default, storage happens using our simple in-memory abstractions - `SimpleVectorStore` for embeddings and `SimplePropertyGraphStore` for the property graph.

We can save and load these to/from disk.

In [None]:
index.storage_context.persist(persist_dir="./storage")

from llama_index.core import StorageContext, load_index_from_storage

index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./storage")
)
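Since extraction calls an LLM for every chunk, it can be worth reusing a persisted index instead of rebuilding it on each run. A minimal sketch of that pattern, assuming the persist directory is empty or missing until the first build: the helper `load_or_build` and its `build`/`load` callables are hypothetical names, not part of LlamaIndex.

```python
from pathlib import Path


def load_or_build(persist_dir, build, load):
    """Reuse a persisted index when one exists, otherwise build a new one.

    `build` takes no arguments and returns a fresh index; `load` takes the
    directory path and returns the stored index. Both are supplied by the caller.
    """
    path = Path(persist_dir)
    if path.is_dir() and any(path.iterdir()):
        return load(str(path))
    return build()
```

In this notebook, `build` could wrap `PropertyGraphIndex.from_documents(...)` followed by `index.storage_context.persist(...)`, and `load` could wrap `load_index_from_storage(StorageContext.from_defaults(persist_dir=...))`.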

### Vector Stores

While some graph databases support vectors (like Neo4j), you can still specify a vector store to use on top of your graph, either for cases where vectors are not supported or where you want to override the built-in support.

Below we will combine `ChromaVectorStore` with the default `SimplePropertyGraphStore`.

In [None]:
%pip install llama-index-vector-stores-chroma

In [None]:
from llama_index.core.graph_stores import SimplePropertyGraphStore
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

client = chromadb.PersistentClient("./chroma_db")
collection = client.get_or_create_collection("my_graph_vector_db")

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    graph_store=SimplePropertyGraphStore(),
    vector_store=ChromaVectorStore(chroma_collection=collection),
    show_progress=True,
)

index.storage_context.persist(persist_dir="./storage")

Then to load:

In [None]:
index = PropertyGraphIndex.from_existing(
    SimplePropertyGraphStore.from_persist_dir("./storage"),
    vector_store=ChromaVectorStore(chroma_collection=collection),
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
)

This looks slightly different from loading purely through the storage context, but the syntax is more concise now that we are mixing storage backends.