# Temporal Node Embedding Property Approach

### Import the required libraries
First, install and import the libraries needed to compute the temporal node embeddings.

- neo4j: The Neo4j Python driver, used to connect to the Neo4j database.
- graphdatascience: The Python client for the Neo4j Graph Data Science (GDS) library, used here for the in-memory graph projections and for the FastRP embedding algorithm.

In [None]:
%pip install neo4j
%pip install graphdatascience

In [None]:
from neo4j import GraphDatabase
import graphdatascience

### Configure Driver and Client

Next, configure the driver and the GDS client for the connection to the Neo4j database. The driver executes Cypher queries; the client runs the Graph Data Science library algorithms.

- endpoint: Bolt URL of the Neo4j database
- username: Username
- password: Password
- database: Database into which the trips were imported
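As an alternative to hard-coding these four values in the notebook, they could be read from environment variables. The following is a minimal sketch; the variable names `NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD` and `NEO4J_DATABASE` and the helper `load_connection_settings` are assumptions made for illustration, not part of the original notebook.

```python
import os


def load_connection_settings(env=None):
    """Collect the connection settings from environment variables.

    The variable names are hypothetical; adapt them to your setup.
    A missing NEO4J_PASSWORD raises a KeyError instead of silently
    falling back to a default.
    """
    env = os.environ if env is None else env
    return {
        "endpoint": env.get("NEO4J_URI", "neo4j://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env["NEO4J_PASSWORD"],
        "database": env.get("NEO4J_DATABASE", "neo4j"),
    }


# Usage (requires NEO4J_PASSWORD to be set in the environment):
# settings = load_connection_settings()
# endpoint, username = settings["endpoint"], settings["username"]
```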

In [None]:
endpoint = "neo4j://localhost:7687"
username = "neo4j"
password = "#Bachelorarbeit"
database = "neo4j"

gds = graphdatascience.GraphDataScience(endpoint=endpoint, auth=(username, password))
gds.set_database(database)

# Keep the driver itself; sessions are short-lived and opened per query.
db_driver = GraphDatabase.driver(endpoint, auth=(username, password))

### Function for running Cypher Queries

We introduce a small helper for running Cypher queries. It takes a query string, runs it in a fresh session, and returns the result records as a list of dictionaries.

In [None]:
def run_query(query):
    # Open a new session per call so the helper can be reused across cells.
    with db_driver.session(database=database) as session:
        result = session.run(query)
        return [record.data() for record in result]


## Temporal Node Embedding with FastRP

In this section we create two in-memory projections of the graph and apply the FastRP algorithm to each, producing a start embedding and an end embedding. Averaging the two yields the final interval embedding.
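The notebook does not show the cell that creates `propertiesStartGraph`. The sketch below assumes it mirrors the `propertiesEndGraph` projection that appears later, with the `start*` properties in place of the `end*` ones. The helper `build_projection_query` is hypothetical and only assembles the Cypher string; the property keys keep their prefix so that they match the `featureProperties` passed to FastRP.

```python
def build_projection_query(graph_name: str, prefix: str) -> str:
    """Return a Cypher projection query for one end of the trip interval.

    `prefix` is "start" or "end" and selects the time properties.
    Hypothetical helper, not part of the original notebook.
    """
    fields = ["Year", "Month", "Day", "Hour", "Minute",
              "Weekday", "Season", "IsWeekend"]

    def property_map(variable: str) -> str:
        # Build e.g. "startMonth: source.startMonth" for every field.
        entries = ",\n      ".join(
            f"{prefix}{field}: {variable}.{prefix}{field}" for field in fields
        )
        return f"{variable} {{\n      {entries}\n    }}"

    return f"""
MATCH (source)-[r:HAS_START|HAS_END]->(target)
WHERE source:Trip AND target:Station
WITH gds.graph.project(
  '{graph_name}',
  source,
  target,
  {{
    sourceNodeProperties: {property_map('source')},
    targetNodeProperties: {property_map('target')}
  }},
  {{undirectedRelationshipTypes: ['*']}}
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
"""


start_projection_query = build_projection_query("propertiesStartGraph", "start")
print(start_projection_query)
```

Passing `start_projection_query` to the notebook's `run_query` helper would then create the projection, under the assumption stated above.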


Next, we fetch the in-memory graph by name to confirm that it was created, and store it in the variable G.

In [None]:
G = gds.graph.get("propertiesStartGraph")
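To make the check more explicit, the graph object's size can be inspected before running FastRP. This is a small sketch; `summarize_graph` is a hypothetical helper that relies on the `name()`, `node_count()` and `relationship_count()` methods of the GDS client's graph object.

```python
def summarize_graph(graph) -> dict:
    """Return name and size of a projected graph, failing on an empty one."""
    summary = {
        "name": graph.name(),
        "nodes": graph.node_count(),
        "relationships": graph.relationship_count(),
    }
    if summary["nodes"] == 0:
        # An empty projection usually means the MATCH pattern found nothing.
        raise ValueError(f"Projection {summary['name']!r} contains no nodes")
    return summary


# Usage in the notebook, once G has been fetched:
# print(summarize_graph(G))
```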

In [None]:
gds.fastRP.write.estimate(
    G,
    writeProperty="propertyStartEmbedding",
    randomSeed = 42,
    embeddingDimension=128,
    nodeSelfInfluence = 2.0,
    propertyRatio = 0.5,
    featureProperties = ['startMonth','startDay', 'startHour', 'startMinute', 'startWeekday', 'startSeason', 'startIsWeekend'],
    iterationWeights = [1.0]
)
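The estimate is only useful if its result is compared against the memory that is actually available. The sketch below assumes the returned result exposes a `bytesMax` field, as GDS memory estimations generally do; `estimate_fits` is a hypothetical helper and the 8 GiB budget is an example value.

```python
def estimate_fits(estimate, available_bytes: int) -> bool:
    """Check whether the upper memory bound of an estimate fits a budget.

    `estimate` can be any mapping-like object with a "bytesMax" entry,
    for example the result returned by the estimate call.
    """
    return int(estimate["bytesMax"]) <= available_bytes


# Usage in the notebook (assumed budget of 8 GiB):
# estimate = gds.fastRP.write.estimate(G, ...)
# print(estimate_fits(estimate, 8 * 1024**3))
```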


In [None]:
# This call takes about 12 minutes.
gds.fastRP.write(
    G,
    writeProperty="propertyStartEmbedding",
    randomSeed = 42,
    embeddingDimension=128,
    nodeSelfInfluence = 2.0,
    propertyRatio = 0.5,
    featureProperties = ['startMonth','startDay', 'startHour', 'startMinute', 'startWeekday', 'startSeason', 'startIsWeekend'],
    iterationWeights= [1.0]
)

In [None]:
G.drop()

In [None]:
# This query takes about 2 minutes.

projection_query  = """
MATCH (source)-[r:HAS_START|HAS_END]->(target)
WHERE source:Trip AND target:Station
WITH gds.graph.project(
  'propertiesEndGraph',
  source,
  target,
  {
    // Property keys must match the featureProperties passed to FastRP.
    sourceNodeProperties: source {
      endYear: source.endYear,
      endMonth: source.endMonth,
      endDay: source.endDay,
      endHour: source.endHour,
      endMinute: source.endMinute,
      endWeekday: source.endWeekday,
      endSeason: source.endSeason,
      endIsWeekend: source.endIsWeekend
    },
    targetNodeProperties: target {
      endYear: target.endYear,
      endMonth: target.endMonth,
      endDay: target.endDay,
      endHour: target.endHour,
      endMinute: target.endMinute,
      endWeekday: target.endWeekday,
      endSeason: target.endSeason,
      endIsWeekend: target.endIsWeekend
    }},
  {undirectedRelationshipTypes: ['*']}
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
"""
run_query(projection_query)

Again, we fetch the in-memory graph by name to confirm that it was created, and store it in the variable G.

In [None]:
G = gds.graph.get("propertiesEndGraph")

In [None]:
gds.fastRP.write.estimate(
    G,
    writeProperty="propertyEndEmbedding",
    randomSeed = 42,
    embeddingDimension=128,
    nodeSelfInfluence = 2.0,
    propertyRatio = 0.5,
    featureProperties = ['endMonth','endDay', 'endHour', 'endMinute', 'endWeekday', 'endSeason', 'endIsWeekend'],
    iterationWeights = [1.0]
)


In [None]:
# This call takes about 12 minutes.
gds.fastRP.write(
    G,
    writeProperty="propertyEndEmbedding",
    randomSeed = 42,
    embeddingDimension=128,
    nodeSelfInfluence = 2.0,
    propertyRatio = 0.5,
    featureProperties= ['endMonth','endDay', 'endHour', 'endMinute', 'endWeekday', 'endSeason', 'endIsWeekend'],
    iterationWeights= [1.0]
)

In [None]:
# TODO: add a query here to merge the embeddings. Two indexes, plus a third one for querying?

In [None]:
average_query = """
CALL apoc.periodic.iterate(
  "MATCH (t:Trip) WHERE t.propertyStartEmbedding IS NOT NULL AND t.propertyEndEmbedding IS NOT NULL RETURN t",
  "WITH t, apoc.coll.zip(t.propertyStartEmbedding, t.propertyEndEmbedding) AS zipped
   SET t.propertyIntervalEmbedding = [pair IN zipped | (pair[0] + pair[1]) / 2.0]",
  {batchSize:10000, parallel:true}
)"""

# Reuse the helper so the query runs against the configured database.
run_query(average_query)
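The Cypher expression above pairs the two vectors element by element and takes the mean of each pair. The same computation in plain Python, as a small illustration with made-up three-dimensional vectors (the real embeddings have 128 dimensions):

```python
def average_embeddings(start, end):
    """Element-wise mean of two equally long embedding vectors."""
    if len(start) != len(end):
        raise ValueError("Embeddings must have the same dimension")
    return [(s + e) / 2.0 for s, e in zip(start, end)]


# Illustrative values only, not taken from the dataset.
print(average_embeddings([1.0, 0.0, -2.0], [3.0, 1.0, 2.0]))  # [2.0, 0.5, 0.0]
```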

In [None]:
def create_vector_index(index_name, label, property_name, vector_dimension, similarity="cosine"):
    query = f"""
    CREATE VECTOR INDEX {index_name} IF NOT EXISTS
    FOR (n:{label})
    ON (n.{property_name})
    OPTIONS {{
    indexConfig: {{
        `vector.dimensions`: {vector_dimension},
        `vector.similarity_function`: '{similarity}'
        }}
    }}
    """
    run_query(query)

# The dimension must match the embeddingDimension used for FastRP (128).
create_vector_index('propertyIndex', 'Trip', 'propertyIntervalEmbedding', 128)
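Once the index exists, it can be queried for the trips whose interval embedding is closest to a given vector. The following is a hedged sketch using Neo4j's `db.index.vector.queryNodes` procedure; the helper `build_similarity_query`, the default `k` and the returned columns are assumptions chosen for illustration.

```python
def build_similarity_query(index_name: str, embedding: list, k: int = 10):
    """Return a Cypher query and its parameters for a nearest-neighbour lookup.

    Hypothetical helper: it only builds the query, it does not run it.
    """
    query = """
    CALL db.index.vector.queryNodes($index_name, $k, $embedding)
    YIELD node, score
    RETURN elementId(node) AS trip, score
    ORDER BY score DESC
    """
    params = {"index_name": index_name, "k": k, "embedding": embedding}
    return query, params


# Usage in the notebook (some_embedding is a 128-dimensional list of floats):
# query, params = build_similarity_query("propertyIndex", some_embedding, k=5)
# with db_driver.session(database=database) as session:
#     neighbours = [record.data() for record in session.run(query, params)]
```

The `run_query` helper defined earlier takes no parameters, which is why the usage comment opens its own session and passes the parameters directly.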

In [None]:
gds.close()
db_driver.close()