# Memgraph Property Graph Index

In this example, we're using Memgraph's integration with
[LlamaIndex](https://www.llamaindex.ai/) to build a **Property Graph Index**
from a Paul Graham essay and use it to retrieve structured insights.  

- We start by **downloading** the essay and preparing the text for processing.  
- Next, we **connect to Memgraph**, a graph database, to store and manage our
  structured data.  
- We then **create a Property Graph Index**, transforming the unstructured text
  into a structured graph using OpenAI’s embedding and language models.  
- Finally, we **query the graph** using both a retriever and a query engine to
  extract meaningful relationships from the text.  

This notebook demonstrates how to turn raw text into a **queryable knowledge
graph**, making it easier to analyze and retrieve insights from documents.

## Prerequisites

1. **Run Memgraph**
Before starting Memgraph, make sure [Docker](https://www.docker.com/) is
running in the background. The quickest way to try out Memgraph Platform
(Memgraph database + MAGE library + Memgraph Lab) for the first time is to run
the following command:

For Linux/macOS:
`curl https://install.memgraph.com | sh`

For Windows:
`iwr https://windows.memgraph.com | iex`

From here, you can open Memgraph's visual tool, [Memgraph
Lab](https://memgraph.com/docs/data-visualization), at
`http://localhost:3000/`, or use the [desktop version](https://memgraph.com/download)
of the app.

2. **Install necessary dependencies**

In [None]:
%pip install llama-index llama-index-graph-stores-memgraph python-dotenv

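3. **Create a vector index** in Memgraph on the `__Entity__` label and
   `embedding` property. LlamaIndex generates the embeddings and uses Memgraph's
   [vector search](https://memgraph.com/docs/querying/vector-search) to retrieve
   entities by similarity. The `dimension` value of 1536 matches the output size
   of the `text-embedding-ada-002` model used later in this example.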

`CREATE VECTOR INDEX entity ON :__Entity__(embedding) WITH CONFIG {"dimension": 1536, "capacity": 1000};`
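
If you switch to a different embedding model, the index dimension has to change with it. As a minimal sketch, the statement above can be generated from the model name so the two stay in sync. The helper name `vector_index_query` and the lookup table are illustrative and not part of Memgraph or LlamaIndex.

```python
# Output sizes of some OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def vector_index_query(model: str, capacity: int = 1000) -> str:
    """Build the CREATE VECTOR INDEX statement for the given embedding model."""
    dimension = EMBEDDING_DIMENSIONS[model]
    return (
        "CREATE VECTOR INDEX entity ON :__Entity__(embedding) "
        f'WITH CONFIG {{"dimension": {dimension}, "capacity": {capacity}}};'
    )


print(vector_index_query("text-embedding-ada-002"))
```

You can paste the printed statement into Memgraph Lab's query editor.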


## Create the script

First, let's create a `.env` file that contains your OpenAI API key:

`OPENAI_API_KEY=sk-proj-...`

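We then load our `.env` file so that the OpenAI API key is available as an
environment variable. The models themselves (`gpt-4` for extraction and
`text-embedding-ada-002` for embeddings) are set later, when we create the index.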

In [None]:
from dotenv import load_dotenv
load_dotenv()
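
A missing key usually only surfaces later, as an authentication error in the middle of indexing. The following sketch checks for it up front. The helper `require_env` is a hypothetical name introduced here, not part of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Call this after load_dotenv(), for example:
# api_key = require_env("OPENAI_API_KEY")
```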

Next, create the data directory and download the Paul Graham essay we'll be
using as the input data for this example.

In [None]:
import urllib.request
import os

os.makedirs("data/paul_graham/", exist_ok=True)

url = "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
output_path = "data/paul_graham/paul_graham_essay.txt"
urllib.request.urlretrieve(url, output_path)
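
When you rerun the notebook, the essay is downloaded again each time. A possible variation, sketched below, skips the download if the file is already present. `download_if_missing` is a hypothetical helper, and its `fetch` parameter exists only so the function can be exercised without network access.

```python
import os
import urllib.request


def download_if_missing(url: str, path: str, fetch=urllib.request.urlretrieve) -> bool:
    """Download url to path unless a non-empty file already exists there.

    Returns True if a download was performed, False if it was skipped.
    """
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return False
    # Create the parent directory if the path has one.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    fetch(url, path)
    return True


# Example, reusing the variables defined above:
# download_if_missing(url, output_path)
```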

### Load the dataset

Using LlamaIndex's `SimpleDirectoryReader`, we're loading the textual data from
our defined data directory. This prepares the document for further processing,
such as indexing.

In [None]:
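import nest_asyncio
from llama_index.core import SimpleDirectoryReader

# Allow nested event loops so LlamaIndex's async calls work inside a notebook.
nest_asyncio.apply()

# Read the essay and write it back as UTF-8. This fails early if the
# downloaded file cannot be decoded.
with open(output_path, "r", encoding="utf-8") as file:
    content = file.read()

with open(output_path, "w", encoding="utf-8") as file:
    file.write(content)

# Load every file in the data directory as a LlamaIndex document.
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()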

### Connect to Memgraph

To establish a connection with Memgraph, set up the `MemgraphPropertyGraphStore`
class by providing your database credentials. You need to specify the username,
password, and connection URL (e.g., `bolt://localhost:7687`).  

Once initialized, this `graph_store` object will allow you to interact with
Memgraph and store or retrieve graph-based data efficiently.

In [None]:
from llama_index.graph_stores.memgraph import MemgraphPropertyGraphStore

username = ""  # Memgraph username (empty by default)
password = ""  # Memgraph password (empty by default)
url = "bolt://localhost:7687"  # Bolt URL of a local Memgraph instance; change if yours differs

graph_store = MemgraphPropertyGraphStore(
    username=username,
    password=password,
    url=url,
)
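
Since the OpenAI key already lives in `.env`, you may prefer to keep the Memgraph credentials there too instead of hardcoding them. This is a minimal sketch under that assumption. The variable names `MEMGRAPH_USERNAME`, `MEMGRAPH_PASSWORD`, and `MEMGRAPH_URL` are made up for this example and are not read by Memgraph or LlamaIndex themselves.

```python
import os


def memgraph_settings(env=os.environ) -> dict:
    """Collect connection settings, falling back to local defaults."""
    return {
        "username": env.get("MEMGRAPH_USERNAME", ""),
        "password": env.get("MEMGRAPH_PASSWORD", ""),
        "url": env.get("MEMGRAPH_URL", "bolt://localhost:7687"),
    }


settings = memgraph_settings()
# The keys match the keyword arguments used above:
# graph_store = MemgraphPropertyGraphStore(**settings)
```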

### Create a Property Graph Index  

Next, we build a **Property Graph Index** using the documents we previously
loaded. This index will help structure and store our data efficiently in
Memgraph.  

- We use `OpenAIEmbedding` to generate vector embeddings for the text.  
- We configure `SchemaLLMPathExtractor`, which utilizes an OpenAI model
  (`gpt-4`) to extract structured knowledge from the documents.  
- The index is stored in Memgraph using the `graph_store` connection.  

By running this, we transform unstructured text into a structured property
graph, making it easier to query and analyze relationships within the data.


In [None]:
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=OpenAI(model="gpt-4", temperature=0.0)
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)
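
To make the idea of a property graph more concrete, here is a simplified, standalone illustration of the kind of structure the extractor produces: entities connected by typed relationships. The triplets are hand-written for illustration, the actual extraction output will differ, and this is not how LlamaIndex represents the graph internally.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triplets; real extractor output will differ.
triplets = [
    ("Paul Graham", "WORKED_AT", "Interleaf"),
    ("Paul Graham", "FOUNDED", "Viaweb"),
    ("Yahoo", "ACQUIRED", "Viaweb"),
]


def build_adjacency(triplets):
    """Group outgoing (relation, object) pairs by subject node."""
    graph = defaultdict(list)
    for subject, relation, obj in triplets:
        graph[subject].append((relation, obj))
    return dict(graph)


graph = build_adjacency(triplets)
for relation, obj in graph["Paul Graham"]:
    # Print each edge in Cypher-like notation.
    print(f"(Paul Graham)-[:{relation}]->({obj})")
```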

Now that the graph is created, we can explore it in the UI by visiting `http://localhost:3000/`.

The easiest way to visualize the entire graph is by running a Cypher command similar to this:

`MATCH p=()-[]-() RETURN p;`

This query matches every relationship together with the two nodes it connects and returns them as paths. Nodes without any relationships are not included in the result.

To visualize the schema of the graph, visit the Graph schema tab and generate the new schema based on the newly created graph.

To delete the entire graph (all nodes and their relationships), use:

`MATCH (n) DETACH DELETE n;`

### Querying & retrieval 

Now that we have structured our data into a property graph, we can retrieve
relevant information using two different approaches:  

1. **Retriever-based Search:**  
   - We convert the index into a retriever (`as_retriever`), which allows us to
     fetch relevant nodes related to a query.  
   - In this example, we ask *"What happened at Interleaf and Viaweb?"* and
     print the retrieved nodes.  

2. **Query Engine:**  
   - We convert the index into a query engine (`as_query_engine`), which
     retrieves the relevant graph context and passes it to the LLM to
     synthesize a natural-language answer.  
   - The response is therefore a complete answer based on the extracted
     relationships, rather than a list of nodes.  

This step allows us to interact with our graph and extract meaningful insights
from the indexed data.


In [None]:
# include_text=False returns the matched graph triplets without the source text chunks
retriever = index.as_retriever(include_text=False)

# Example query: "What happened at Interleaf and Viaweb?"
nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

# Output results
print("Query Results:")
for node in nodes:
    print(node.text)

# Alternatively, use a query engine; include_text=True also passes the source text to the LLM
query_engine = index.as_query_engine(include_text=True)

# Perform a query and print the detailed response
response = query_engine.query("What happened at Interleaf and Viaweb?")
print("\nDetailed Query Response:")
print(str(response))
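
The retriever loop above prints only the text of each node. As a sketch, the results can also be ordered by relevance and printed with their scores. This assumes each retrieved object exposes `.text` and an optional `.score`, as LlamaIndex's `NodeWithScore` does. The helper `format_results` is a hypothetical name.

```python
def format_results(nodes, limit=5):
    """Return one line per retrieved node: score (if any) followed by its text."""
    # Nodes without a score sort last.
    ranked = sorted(nodes, key=lambda n: n.score or 0.0, reverse=True)
    lines = []
    for node in ranked[:limit]:
        score = f"{node.score:.3f}" if node.score is not None else "n/a"
        lines.append(f"[{score}] {node.text}")
    return lines


# Example, reusing the `nodes` list retrieved above:
# print("\n".join(format_results(nodes)))
```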