In [13]:
%pip install --upgrade --quiet  langchain langchain-community langchain-openai neo4j

Note: you may need to restart the kernel to use updated packages.





# Running Cypher Queries in Neo4j

This section shows how to write and execute Cypher queries against a Neo4j database from Python. Cypher is Neo4j's declarative graph query language: a query describes the pattern of nodes and relationships to match, and the database decides how to find it.

In [2]:
from neo4j import GraphDatabase


uri = "bolt://localhost:7687"
user = "neo4j"
password = "new_password"  
db_name = "STACDBs"

# Connect to the Neo4j instance
driver = GraphDatabase.driver(uri, auth=(user, password))

def create_database(driver, db_name):
    with driver.session() as session:
        try:
            result = session.run("SHOW DATABASES")
            databases = [record["name"] for record in result]
            
            if db_name not in databases:
                session.run(f"CREATE DATABASE {db_name}")
                print(f"Database {db_name} created successfully.")
            else:
                print(f"Database {db_name} already exists.")
        except Exception as e:
            print(f"Error while creating database: {e}")

# Execute Cypher script
def run_cypher_script(driver, db_name, script_path):
    with driver.session(database=db_name) as session:
        try:
            with open(script_path, 'r') as file:
                cypher_query = file.read()  
                session.run(cypher_query)  
            print("Cypher script executed successfully.")
        except Exception as e:
            print(f"Error executing Cypher script: {e}")


create_database(driver, db_name) 
run_cypher_script(driver, db_name, "STAC.cypher") 

Error while creating database: {code: Neo.ClientError.Database.ExistingDatabaseFound} {message: Failed to create the specified database 'STACDBs': Database name or alias already exists.}
Cypher script executed successfully.
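
The error recorded above is consistent with Neo4j normalizing database names to lowercase: the `SHOW DATABASES` listing later in this notebook reports the name as `stacdbs`, so the case-sensitive test `"STACDBs" not in databases` succeeds and `CREATE DATABASE` is attempted for a name that already exists. A minimal sketch of a case-insensitive check follows; `database_exists` is a hypothetical helper introduced here for illustration, not part of the `neo4j` driver.

```python
def database_exists(existing_names, db_name):
    """Return True if db_name matches an existing database name, ignoring case."""
    wanted = db_name.lower()
    return any(name.lower() == wanted for name in existing_names)


# Names as reported by SHOW DATABASES in this notebook
names = ["neo4j", "stacdbs", "system"]
print(database_exists(names, "STACDBs"))  # True
print(database_exists(names, "other"))    # False
```

Inside `create_database`, a check like this could stand in for `db_name not in databases`. Another option is to let the server handle it with `CREATE DATABASE ... IF NOT EXISTS`.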


## Step 1: Connect to the Neo4j Database

To begin working with Neo4j, we need to connect to the Neo4j instance. Make sure that Neo4j is running and that you have the correct credentials.

Here’s an example of how to connect to the database using Python's `neo4j` driver:

In [3]:
try:
    driver = GraphDatabase.driver(uri, auth=(user, password))

    with driver.session() as session:
        result = session.run("SHOW DATABASES")
        print("Databases under the Neo4j instance:")
        for record in result:
            print(f"Database: {record['name']} - Current Status: {record['currentStatus']} - Status Message: {record['statusMessage']}")

except Exception as e:
    print(f"Error connecting to Neo4j: {e}")
finally:
    driver.close()


Databases under the Neo4j instance:
Database: neo4j - Current Status: online - Status Message: 
Database: stacdbs - Current Status: online - Status Message: 
Database: system - Current Status: online - Status Message: 
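
The rows returned by `SHOW DATABASES` can also be checked programmatically, for example to confirm that a database is online before querying it. The sketch below assumes each row can be indexed by column name, as the driver's records are in the cell above; `online_databases` is a hypothetical helper, and plain dictionaries stand in for the records so the example runs without a server.

```python
def online_databases(records):
    """Return the names of databases whose currentStatus is 'online'."""
    return [r["name"] for r in records if r["currentStatus"] == "online"]


# Stand-in rows mirroring the output above
rows = [
    {"name": "neo4j", "currentStatus": "online", "statusMessage": ""},
    {"name": "stacdbs", "currentStatus": "online", "statusMessage": ""},
    {"name": "system", "currentStatus": "online", "statusMessage": ""},
]
print(online_databases(rows))  # ['neo4j', 'stacdbs', 'system']
```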


## Deleting Specific Databases in Neo4j

In [None]:
databases_to_delete = ['stacdbs']

try:
    driver = GraphDatabase.driver(uri, auth=(user, password))

    # Administration commands such as DROP DATABASE run against the system database
    with driver.session(database="system") as session:
        # Use a separate loop variable so the global db_name used by other cells is not overwritten
        for name in databases_to_delete:
            # Never drop 'system' (the system database) or 'neo4j' (the default database)
            if name.lower() not in ['neo4j', 'system']:
                session.run(f"DROP DATABASE {name} IF EXISTS")
                print(f"Database {name} has been deleted.")
            else:
                print(f"Cannot delete protected database: {name}")

except Exception as e:
    print(f"Error while deleting databases: {e}")
finally:
    driver.close()

Database stacdbs has been deleted.
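
The cell above interpolates the database name directly into the `DROP DATABASE` statement, so a malformed name would end up in the Cypher text unchecked. One way to guard against that is to validate the name before building the statement. The sketch below is an assumption-laden illustration: `drop_statement` is a hypothetical helper, and the naming rule encoded in the regular expression (3 to 63 characters, an ASCII letter first, then letters, digits, dots, or dashes) should be checked against the documentation for your Neo4j version.

```python
import re

# Assumed naming rule; verify against the Neo4j documentation for your version
_DB_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]{2,62}")
PROTECTED = {"neo4j", "system"}


def drop_statement(db_name):
    """Build a DROP DATABASE statement for a validated, lowercased name."""
    name = db_name.lower()
    if name in PROTECTED:
        raise ValueError(f"Refusing to drop protected database: {name}")
    if not _DB_NAME.fullmatch(name):
        raise ValueError(f"Invalid database name: {db_name!r}")
    # Backticks quote the name so that dots and dashes are accepted
    return f"DROP DATABASE `{name}` IF EXISTS"


print(drop_statement("STACDBs"))  # DROP DATABASE `stacdbs` IF EXISTS
```

The returned string could then be passed to `session.run(...)` in place of the inline f-string.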


## Retrieve and Check the Data

In [4]:


try:
    driver = GraphDatabase.driver(uri, auth=(user, password))
    
    with driver.session(database=db_name) as session:
        result = session.run("MATCH (n) RETURN n")
        
        print(f"Data from the database {db_name}:")
        for record in result:
            print(record["n"])  

except Exception as e:
    print(f"Error connecting to Neo4j: {e}")
finally:
    driver.close()

Data from the database STACDBs:
<Node element_id='4:2da65ef2-b6cc-412d-80ce-8775f1c5036c:0' labels=frozenset({'BASIC_INFORMATIONS'}) properties={'composite_type': 'P1M', 'temporal_extent_start': '2020-01-08T16:22:00Z', 'spatial_extent': [5.7635, 46.7937, 15.1528, 55.9456], 'description': 'Sentinel-2 L3A WASP Products of Germany processed by DLR using the WASP processor', 'instrument': 'MSI', 'title': 'Sentinel-2 L3A Monthly WASP Products', 'sensor_type': 'OPTICAL', 'platform_serial_identifier': 'A/B', 'platform': 'Sentinel-2', 'license': 'CC-BY-4.0', 'name': 'S2_L3A_WASP', 'orbit_type': 'LEO', 'doi': '10.15489/4hcq6dgkj648'}>
<Node element_id='4:2da65ef2-b6cc-412d-80ce-8775f1c5036c:1' labels=frozenset({'BASIC_INFORMATIONS'}) properties={'license': 'CC-BY-4.0', 'name': 'S2_L2A_MAJA', 'temporal_extent_start': '2020-01-08T16:22:00Z', 'spatial_extent': [5.7635, 46.7937, 15.1528, 55.9456], 'description': 'Sentinel-2 L2A Products of Germany processed by DLR using the MAJA processor', 'orbit_
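
`MATCH (n) RETURN n` returns every node, which is useful as a sanity check but not for answering a specific question. A more targeted, parameterized query might look like the sketch below. The labels, properties, and relationship types are taken from the schema that the chain prints later in this notebook. `tile_links_query` is a hypothetical helper, and the assumption that cities are stored under names such as `"Frankfurt"` has not been verified against the data.

```python
def tile_links_query(city, dataset):
    """Return a Cypher query and parameters for the tiles of a city within a dataset."""
    query = (
        "MATCH (c:CITY {name: $city})-[:LOCATED_IN]->(t:TILES)"
        "-[:HAS_INFORMATION]->(b:BASIC_INFORMATIONS {name: $dataset}) "
        "RETURN t.name AS tile, t.downloadLink AS link"
    )
    return query, {"city": city, "dataset": dataset}


query, params = tile_links_query("Frankfurt", "S2_L2A_MAJA")
print(query)
print(params)
# With an open session, the pair can be executed as: session.run(query, params)
```

Passing values as parameters rather than formatting them into the query text keeps quoting issues out of the Cypher string.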

In [7]:
%pip install langchain transformers torch neo4j langchain_community[graphs]

Collecting transformers
  Using cached transformers-4.46.3-py3-none-any.whl.metadata (44 kB)
Collecting torch
  Using cached torch-2.5.1-cp312-cp312-win_amd64.whl.metadata (28 kB)
Collecting filelock (from transformers)
  Using cached filelock-3.16.1-py3-none-any.whl.metadata (2.9 kB)
Collecting huggingface-hub<1.0,>=0.23.2 (from transformers)
  Using cached huggingface_hub-0.26.3-py3-none-any.whl.metadata (13 kB)
Collecting tokenizers<0.21,>=0.20 (from transformers)
  Using cached tokenizers-0.20.3-cp312-none-win_amd64.whl.metadata (6.9 kB)
Collecting safetensors>=0.4.1 (from transformers)
  Using cached safetensors-0.4.5-cp312-none-win_amd64.whl.metadata (3.9 kB)
Collecting networkx (from torch)
  Using cached networkx-3.4.2-py3-none-any.whl.metadata (6.3 kB)
Collecting jinja2 (from torch)
  Using cached jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting fsspec (from torch)
  Using cached fsspec-2024.10.0-py3-none-any.whl.metadata (11 kB)
Collecting setuptools (from torch)
  




In [None]:
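# Import from langchain_community directly; the langchain.* paths for these classes are deprecated
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph
from langchain_community.llms import HuggingFacePipeline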
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

graph = Neo4jGraph(
    url=uri,
    username=user,
    password=password,
    database=db_name
)

model_name = "distilgpt2"  
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

hf_pipeline = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer,
    max_length=600
)

llm = HuggingFacePipeline(pipeline=hf_pipeline)
chain = GraphCypherQAChain.from_llm(
    graph=graph,
    llm=llm,
    verbose=True,
    allow_dangerous_requests=True
)

In [24]:
question = "Is there a specific download link available for accessing the satellite data related to Frankfurt within the S2_L2A_MAJA dataset?"

response = chain.invoke({"query": question})

print("Response:", response)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.




> Entering new GraphCypherQAChain chain...
Generated Cypher:
Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
Node properties are the following:
BASIC_INFORMATIONS {composite_type: STRING, temporal_extent_start: STRING, spatial_extent: LIST, description: STRING, instrument: STRING, title: STRING, sensor_type: STRING, platform_serial_identifier: STRING, platform: STRING, license: STRING, name: STRING, orbit_type: STRING, doi: STRING},PROVIDER {name: STRING},CITY {name: STRING},TILES {name: STRING, country: STRING, downloadLink: STRING, dataset_name: STRING}
Relationship properties are the following:

The relationships are the following:
(:PROVIDER)-[:PROVIDES]->(:BASIC_INFORMATIONS),(:CITY)-[:LOCATED_IN]->(:TILES),(:TILES)-[:HAS_INFORMATION]->(:BASIC_INFORMATIONS)
Note: Do not include any

CypherSyntaxError: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input 'Task': expected 'FOREACH', 'ALTER', 'ORDER BY', 'CALL', 'USING PERIODIC COMMIT', 'CREATE', 'LOAD CSV', 'START DATABASE', 'STOP DATABASE', 'DEALLOCATE', 'DELETE', 'DENY', 'DETACH', 'DROP', 'DRYRUN', 'FINISH', 'GRANT', 'INSERT', 'LIMIT', 'MATCH', 'MERGE', 'NODETACH', 'OFFSET', 'OPTIONAL', 'REALLOCATE', 'REMOVE', 'RENAME', 'RETURN', 'REVOKE', 'ENABLE SERVER', 'SET', 'SHOW', 'SKIP', 'TERMINATE', 'UNWIND', 'USE' or 'WITH' (line 1, column 1 (offset: 0))
"Task:Generate Cypher statement to query a graph database."
 ^}
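
The trace shows why the query failed: the text sent to Neo4j begins with the prompt itself (`Task:Generate Cypher statement...`) rather than with a Cypher clause. The `transformers` text-generation pipeline returns the prompt followed by the continuation by default, so passing `return_full_text=False` when constructing the pipeline is one likely remedy. Even then, a small general-purpose model such as `distilgpt2` may not produce valid Cypher. The sketch below shows a simple guard that could be applied to generated text before it is executed. Both helpers are hypothetical, and the keyword tuple is a read-oriented subset of the clauses listed in the error message.

```python
import re

# Subset of the clause keywords listed in the Neo4j syntax error above
CYPHER_STARTS = ("MATCH", "OPTIONAL", "CALL", "WITH", "UNWIND", "RETURN", "SHOW", "USE")


def strip_prompt_echo(prompt, generated):
    """Remove a leading copy of the prompt from the generated text, if present."""
    if generated.startswith(prompt):
        return generated[len(prompt):].strip()
    return generated.strip()


def looks_like_cypher(text):
    """Return True if the first word of the text is one of the expected clause keywords."""
    match = re.match(r"[A-Za-z]+", text.lstrip())
    return bool(match) and match.group(0).upper() in CYPHER_STARTS


prompt = "Task:Generate Cypher statement to query a graph database.\n"
echoed = prompt + "MATCH (t:TILES) RETURN t.downloadLink"
print(looks_like_cypher(echoed))                             # False
print(looks_like_cypher(strip_prompt_echo(prompt, echoed)))  # True
```

A check like this only catches the prompt-echo failure seen here. It does not validate that the remaining text is a correct or safe query.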