# NebulaGraph Property Graph Index

NebulaGraph is an open-source, distributed graph database built for very large graphs with millisecond-level latency.

If you already have a graph, skip to the last section of this notebook.

In [None]:
%pip install llama-index llama-index-graph-stores-nebula jupyter-nebulagraph

## Docker Setup

To launch NebulaGraph locally, first make sure Docker is installed. Then start the database with the following commands:

```bash
mkdir nebula-docker-compose
cd nebula-docker-compose
curl -O https://raw.githubusercontent.com/vesoft-inc/nebula-docker-compose/master/docker-compose-lite.yaml
docker compose -f docker-compose-lite.yaml up 
```

After this, you are ready to create your first property graph!
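
Before connecting, it can help to confirm that the query service is actually accepting connections. The sketch below is a plain TCP check using only the standard library; the helper name `is_port_open` is hypothetical, and the host and port are assumed to be `127.0.0.1` and `9669`, the values used in the `%ngql` connection cell in this notebook.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Assumed address and port: the same ones passed to %ngql in this notebook.
    print("NebulaGraph reachable:", is_port_open("127.0.0.1", 9669))
```

A successful TCP connection only shows that something is listening on the port; it does not verify credentials or that the storage services have finished starting.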

In [1]:
%load_ext ngql

In [2]:
%ngql --address 127.0.0.1 --port 9669 --user root --password nebula
%ngql CREATE SPACE IF NOT EXISTS llamaindex_nebula_property_graph(vid_type=FIXED_STRING(256));

Connection Pool Created


In [3]:
%ngql USE llamaindex_nebula_property_graph;
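
The space above is created with `vid_type=FIXED_STRING(256)`, so vertex IDs are limited to 256 bytes rather than 256 characters. The following sketch shows how one might check a candidate ID against that limit before inserting it. The helper `fits_vid` is hypothetical and not part of any library, and it assumes IDs are encoded as UTF-8.

```python
VID_MAX_BYTES = 256  # matches FIXED_STRING(256) in the CREATE SPACE statement


def fits_vid(vertex_id: str, max_bytes: int = VID_MAX_BYTES) -> bool:
    """Return True if the UTF-8 encoding of vertex_id fits in max_bytes."""
    return len(vertex_id.encode("utf-8")) <= max_bytes


print(fits_vid("Interleaf"))  # True: 9 ASCII characters are 9 bytes
print(fits_vid("图" * 100))    # False: 100 characters, but 300 bytes in UTF-8
```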

## Env Setup

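A few pieces of setup are needed before we start: an OpenAI API key, the sample essay, and `nest_asyncio` so that async code can run inside the notebook.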

In [4]:
import os

os.environ["OPENAI_API_KEY"] = "sk-proj-..."
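
To keep the key out of a shared notebook, one option is to read it from the environment and prompt only when it is missing. This is a minimal sketch using the standard library; `ensure_env_var` is a hypothetical helper name.

```python
import os
from getpass import getpass


def ensure_env_var(name: str) -> str:
    """Return the value of the environment variable, prompting only if unset."""
    value = os.environ.get(name)
    if not value:
        # getpass hides the input so the key is not echoed into the notebook.
        value = getpass(f"{name}: ")
        os.environ[name] = value
    return value
```

Calling `ensure_env_var("OPENAI_API_KEY")` in a cell would then replace the hard-coded assignment.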

In [5]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-05-28 16:25:48--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2024-05-28 16:25:49 (614 KB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



In [11]:
import nest_asyncio

nest_asyncio.apply()
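
`nest_asyncio` is useful in notebooks because Jupyter already runs an event loop, and standard `asyncio` refuses to start a second one inside it. The sketch below reproduces that failure with the standard library alone. It illustrates the general problem and is not a trace of what LlamaIndex does internally.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    """Try to start a nested event loop from inside a running one."""
    coro = inner()
    try:
        asyncio.run(coro)  # fails: a loop is already running
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

With `nest_asyncio.apply()` in effect, the running loop is patched so that this kind of nested call is permitted.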

In [4]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

## Index Construction

First, prepare the property graph store, pointing it at the space created above:

In [5]:
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore

graph_store = NebulaPropertyGraphStore(space="llamaindex_nebula_property_graph")

Next, prepare the embedding model, using either Hugging Face or OpenAI:

In [None]:
%pip install llama-index-embeddings-huggingface

In [6]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

emb_model = HuggingFaceEmbedding(model_name="intfloat/multilingual-e5-large")
# Alternatively, use an OpenAI embedding model:
# from llama_index.embeddings.openai import OpenAIEmbedding
# emb_model = OpenAIEmbedding(model_name="text-embedding-3-small")

Then create the vector store:

In [7]:
from llama_index.core.vector_stores.simple import SimpleVectorStore

vec_store = SimpleVectorStore()
# To reuse embeddings persisted by an earlier run instead:
# vec_store = SimpleVectorStore.from_persist_path("./vec_store.json")

Finally, build the index!

In [12]:
from llama_index.core.indices.property_graph import PropertyGraphIndex
from llama_index.core.storage.storage_context import StorageContext
from llama_index.llms.openai import OpenAI

index = PropertyGraphIndex.from_documents(
    documents,
    llm=OpenAI(model="gpt-4o", temperature=0.3),
    embed_model=emb_model,
    storage_context=StorageContext.from_defaults(
        property_graph_store=graph_store,
        vector_store=vec_store,
    ),
    show_progress=True,
)

index.storage_context.vector_store.persist("./vec_store.json")

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Extracting paths from text: 100%|█████████████████████████| 22/22 [02:15<00:00,  6.16s/it]
Extracting implicit paths: 100%|████████████████████████| 22/22 [00:00<00:00, 7176.44it/s]
Generating embeddings: 100%|████████████████████████████████| 3/3 [00:01<00:00,  2.49it/s]
Generating embeddings: 100%|██████████████████████████████| 52/52 [00:07<00:00,  6.87it/s]


Now that the graph is created, we can explore it with [jupyter-nebulagraph](https://github.com/wey-gu/jupyter_nebulagraph):

In [14]:
%ng_draw_schema

<class 'pyvis.network.Network'> |N|=2 |E|=0

In [24]:
%ngql MATCH (v:Entity) RETURN v LIMIT 5;

[ERROR]:
 Query Failed:
 SemanticError: `Entity__': Unknown tag


In [20]:
%ng_draw

<class 'pyvis.network.Network'> |N|=0 |E|=0

## Querying and Retrieval

In [None]:
retriever = index.as_retriever(
    include_text=False,  # return only graph paths; the default (True) also includes source text
)

nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

for node in nodes:
    print(node.text)

Interleaf -> Got crushed by -> Moore's law
Interleaf -> Made -> Scripting language
Interleaf -> Had -> Smart people
Interleaf -> Inspired by -> Emacs
Interleaf -> Had -> Few years to live
Interleaf -> Made -> Software
Interleaf -> Had done -> Something bold
Interleaf -> Added -> Scripting language
Interleaf -> Built -> Impressive technology
Interleaf -> Was -> Company
Viaweb -> Was -> Profitable
Viaweb -> Was -> Growing rapidly
Viaweb -> Suggested -> Hospital
Idea -> Was clear from -> Experience
Idea -> Would have to be embodied as -> Company
Painting department -> Seemed to be -> Rigorous
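
With `include_text=False`, each returned node carries one path rendered as `subject -> relation -> object`, as in the output above. If the paths are needed as structured data, they can be split and grouped by subject. This is a minimal sketch with hypothetical helper names; it assumes that no entity or relation name itself contains the ` -> ` separator.

```python
from collections import defaultdict


def parse_triplet(line: str) -> tuple:
    """Split 'subject -> relation -> object' into its three parts."""
    subject, relation, obj = (part.strip() for part in line.split(" -> "))
    return subject, relation, obj


def group_by_subject(lines) -> dict:
    """Map each subject to the list of (relation, object) pairs it appears in."""
    grouped = defaultdict(list)
    for line in lines:
        subject, relation, obj = parse_triplet(line)
        grouped[subject].append((relation, obj))
    return dict(grouped)


# A few lines copied from the retriever output above.
sample = [
    "Interleaf -> Got crushed by -> Moore's law",
    "Interleaf -> Was -> Company",
    "Viaweb -> Was -> Profitable",
]
print(group_by_subject(sample))
```

In the notebook, `group_by_subject(node.text for node in nodes)` would apply the same grouping to the retrieved nodes.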


In [None]:
query_engine = index.as_query_engine(include_text=True)

response = query_engine.query("What happened at Interleaf and Viaweb?")

print(str(response))

Interleaf had smart people and built impressive technology but got crushed by Moore's Law. Viaweb was profitable and growing rapidly.


## Loading from an existing Graph

If you have an existing graph (created with LlamaIndex or otherwise), you can connect to it and use it!

**NOTE:** If your graph was created outside of LlamaIndex, the most useful retrievers will be [text to cypher](../../module_guides/indexing/lpg_index_guide.md#texttocypherretriever) or [cypher templates](../../module_guides/indexing/lpg_index_guide.md#cyphertemplateretriever). Other retrievers rely on properties that LlamaIndex inserts.

In [None]:
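from llama_index.core import PropertyGraphIndex
from llama_index.core.vector_stores.simple import SimpleVectorStore
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.graph_stores.nebula import NebulaPropertyGraphStore
from llama_index.llms.openai import OpenAI

# Connect to the NebulaGraph space that already holds the graph.
graph_store = NebulaPropertyGraphStore(space="llamaindex_nebula_property_graph")

# Reload the embeddings persisted during index construction.
vec_store = SimpleVectorStore.from_persist_path("./vec_store.json")

index = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store,
    vector_store=vec_store,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.3),
    # Use the same embedding model the persisted vectors were created with.
    embed_model=HuggingFaceEmbedding(model_name="intfloat/multilingual-e5-large"),
)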

From here, we can still insert more documents!

In [None]:
from llama_index.core import Document

document = Document(text="LlamaIndex is great!")

index.insert(document)

In [None]:
nodes = index.as_retriever(include_text=False).retrieve("LlamaIndex")

print(nodes[0].text)

Llamaindex -> Is -> Great


For full details on constructing, retrieving from, and querying a property graph, see the [full docs page](../../module_guides/indexing/lpg_index_guide.md).