# Node Embeddings with Properties

In this step, we extract temporal features from the timestamps stored in the `validFrom` and `validTo` properties of each `Trip` node. These include basic components like year, month, day, hour, and minute, as well as derived attributes such as weekday, weekend and season. The resulting values are stored as new node properties and will later be used as input features for the FastRP algorithm.





### Import the required libraries
First, we install and import the libraries needed for the temporal node embedding.

- neo4j: The Neo4j Python driver, used to connect to the Neo4j database and run Cypher queries.
- graphdatascience: The Python client for the Neo4j Graph Data Science (GDS) library, used here for the in-memory graph projection and the FastRP embedding.
- holidays: Provides public holiday calendars, used later to flag trips that start on a New York State holiday.


In [None]:
%pip install neo4j
%pip install graphdatascience
%pip install holidays

In [None]:
from neo4j import GraphDatabase
import graphdatascience
import holidays
from datetime import datetime

### Configure Driver and Client

We configure the driver and the client for the connection to the Neo4j database. The driver executes Cypher queries, and the client runs the Graph Data Science library algorithms.

- endpoint: connection URI of the Neo4j database (here using the `neo4j://` scheme)
- username: database user
- password: password of that user
- database: database into which the trips were imported

In [40]:
endpoint = "neo4j://localhost:7687"
username = "neo4j"
password = "#Bachelorarbeit"
database = "neo4j"

gds = graphdatascience.GraphDataScience(endpoint=endpoint, auth=(username, password))
gds.set_database(database)

db_driver = GraphDatabase.driver(endpoint, auth=(username, password))

### Adding Derived Temporal Node Properties from Timestamps
The Cypher query below extracts temporal features from the `validFrom` and `validTo` properties of every `Trip` node and stores them as new node properties, which later serve as input features for FastRP. Seasons are encoded as 1 (December to February), 2 (March to May), 3 (June to August), and 4 (September to November). The weekday properties hold the ISO day of the week (1 = Monday, 7 = Sunday), and the weekend flag is 1 for Saturday and Sunday and 0 otherwise.

To support future interval-based analyses and to avoid reprocessing timestamps, both start and end features (e.g., `startDay`, `endDay`) are stored, even though only the start timestamp is used for the embedding in this notebook. If the embedding is based on a single point in time, only the corresponding features are needed.


In [None]:
query = """
MATCH (t:Trip)
SET t.startYear = t.validFrom.year,
    t.startMonth = t.validFrom.month,
    t.startDay = t.validFrom.day,
    t.startHour = t.validFrom.hour,
    t.startMinute = t.validFrom.minute,
    t.startSeason = CASE
        WHEN t.validFrom.month IN [12, 1, 2] THEN 1
        WHEN t.validFrom.month IN [3, 4, 5] THEN 2
        WHEN t.validFrom.month IN [6, 7, 8] THEN 3
        WHEN t.validFrom.month IN [9, 10, 11] THEN 4
        ELSE 0
    END,
    t.startIsWeekend = CASE
        WHEN t.validFrom.dayOfWeek IN [6, 7] THEN 1
        WHEN t.validFrom.dayOfWeek IN [1, 2, 3, 4, 5] THEN 0
        ELSE 0
    END,
    t.startWeekday = t.validFrom.dayOfWeek,
    t.endYear = t.validTo.year,
    t.endMonth = t.validTo.month,
    t.endDay = t.validTo.day,
    t.endHour = t.validTo.hour,
    t.endMinute = t.validTo.minute,
    t.endSeason = CASE
        WHEN t.validTo.month IN [12, 1, 2] THEN 1
        WHEN t.validTo.month IN [3, 4, 5] THEN 2
        WHEN t.validTo.month IN [6, 7, 8] THEN 3
        WHEN t.validTo.month IN [9, 10, 11] THEN 4
        ELSE 0
    END,
    t.endIsWeekend = CASE
        WHEN t.validTo.dayOfWeek IN [6, 7] THEN 1
        WHEN t.validTo.dayOfWeek IN [1, 2, 3, 4, 5] THEN 0
        ELSE 0
    END,
    t.endWeekday = t.validTo.dayOfWeek

"""
with db_driver.session(database=database) as session:
    session.run(query)
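
For spot-checking individual nodes, the feature derivation in the Cypher query above can be mirrored in plain Python. The following is a minimal sketch, not part of the original pipeline: the helper names `season_code` and `temporal_features` are hypothetical, and it assumes that Python's `isoweekday()` follows the same ISO numbering (Monday = 1, Sunday = 7) as Cypher's `dayOfWeek`.

```python
from datetime import datetime


def season_code(month: int) -> int:
    """Same coding as the Cypher CASE: 1=winter, 2=spring, 3=summer, 4=autumn, 0=unknown."""
    if month in (12, 1, 2):
        return 1
    if month in (3, 4, 5):
        return 2
    if month in (6, 7, 8):
        return 3
    if month in (9, 10, 11):
        return 4
    return 0


def temporal_features(ts: datetime, prefix: str = "start") -> dict:
    """Derive the per-timestamp features under the given property prefix."""
    weekday = ts.isoweekday()  # ISO numbering: Monday=1 ... Sunday=7
    return {
        f"{prefix}Year": ts.year,
        f"{prefix}Month": ts.month,
        f"{prefix}Day": ts.day,
        f"{prefix}Hour": ts.hour,
        f"{prefix}Minute": ts.minute,
        f"{prefix}Season": season_code(ts.month),
        f"{prefix}IsWeekend": 1 if weekday in (6, 7) else 0,
        f"{prefix}Weekday": weekday,
    }


# Example timestamps chosen for illustration only
start_features = temporal_features(datetime(2017, 7, 4, 9, 30), prefix="start")
end_features = temporal_features(datetime(2017, 12, 31, 23, 59), prefix="end")
print(start_features)
print(end_features)
```

Comparing such a dictionary with the properties stored on a `Trip` node with the same `validFrom` value is a quick way to confirm that the Cypher update behaved as intended.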

### Annotating Trips with Holiday Information

In the context of bike sharing in New York City, contextual factors such as public holidays can significantly influence user behavior. To capture this, each `Trip` node is annotated with an `isHoliday` property based on the `validFrom` timestamp and the official US holidays for New York State.

This additional temporal feature allows for more nuanced analyses and can help improve downstream tasks like demand prediction. For performance reasons, the annotations are written in batches via `apoc.periodic.iterate` from the APOC library.


In [None]:
# Takes about 23 minutes
ny_holidays = holidays.country_holidays('US', subdiv='NY')

def check_holidays(date_obj):
    if isinstance(date_obj, datetime):
        return int(date_obj.date() in ny_holidays)
    return 0

def annotate_holidays_batchwise(driver, batch_size=500):
    with driver.session(database=database) as session:
        result = session.run("""
            MATCH (t:Trip)
            WHERE t.validFrom IS NOT NULL
            RETURN id(t) AS node_id, t.validFrom AS startTime
        """)

        batch = []
        batches_sent = 0

        for record in result:
            try:
                start_time = record["startTime"]
                is_holiday = check_holidays(start_time.to_native())
                batch.append({
                    "node_id": record["node_id"],
                    "isHoliday": is_holiday
                })
            except Exception as e:
                print(f"Skipping node {record['node_id']}: {e}")

            if len(batch) >= batch_size:
                _send_isHoliday_batch(driver, batch)
                batches_sent += batch_size
                print(f"{batches_sent} holiday annotations written")
                batch = []

        if batch:
            _send_isHoliday_batch(driver, batch)

def _send_isHoliday_batch(driver, batch):
    query = """
    CALL apoc.periodic.iterate(
      'UNWIND $batch AS row RETURN row',
      '
      MATCH (t:Trip) WHERE id(t) = row.node_id
      SET t.isHoliday = row.isHoliday
      ',
      {batchSize: 100, parallel: true, params: {batch: $batch}}
    )
    """
    with driver.session(database=database) as session:
        session.run(query, batch=batch)

annotate_holidays_batchwise(db_driver)
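
The batching pattern used above (collect rows, send a full batch, then flush the remainder) can be isolated into a small generator. This is only a sketch of the idea: `chunked` is a hypothetical helper that does not appear in the notebook, and the rows are dummy data.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items; the last one may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Dummy rows shaped like the holiday annotations sent to the database
rows = [{"node_id": i, "isHoliday": 0} for i in range(1200)]
sizes = [len(batch) for batch in chunked(rows, 500)]
print(sizes)  # prints [500, 500, 200]
```

With such a helper, the write loop reduces to one call per yielded batch, and the final partial batch no longer needs separate handling.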


### Assigning Temporal Properties to Station Nodes
To generate node embeddings with a property-aware algorithm such as FastRP, every node in the projected graph needs a value for each property listed in `featureProperties`. Only then can the algorithm place different node types in the same feature space.

In this bike-sharing graph, `Trip` nodes inherently contain temporal information such as start and end timestamps. `Station` nodes, however, carry no temporal attributes by default. To embed both node types into the same feature space, we assign analogous temporal properties to `Station` nodes.

The following Python code achieves this by generating random but reproducible temporal features for each `Station` node. The node's internal Neo4j ID (returned by `id(s)` and passed on as `station_id`) seeds the random generator, so the same values are assigned every time the script runs against the same database.

In [43]:
import random

def generate_properties_for_station(station_id):
    rnd = random.Random(f"{station_id}_start")  # use a consistent seed per station

    return {
        "startYear": 2017,
        "startMonth": rnd.randint(1, 12),
        "startDay": rnd.randint(1, 28),  # 28 is a valid day in every month, including February
        "startHour": rnd.randint(0, 23),
        "startMinute": rnd.randint(0, 59),
        "startWeekday": rnd.randint(1, 7),  # ISO weekday, same range as dayOfWeek on Trip nodes
        "startIsHoliday": rnd.randint(0, 1),
        "startSeason": rnd.randint(1, 4),  # same 1-4 coding as on Trip nodes
        "startIsWeekend": rnd.randint(0, 1),
    }

def generate_end_properties_for_station(station_id):
    rnd = random.Random(f"{station_id}_end")

    return {
        "endYear": 2017,
        "endMonth": rnd.randint(1, 12),
        "endDay": rnd.randint(1, 28),  # 28 is a valid day in every month, including February
        "endHour": rnd.randint(0, 23),
        "endMinute": rnd.randint(0, 59),
        "endWeekday": rnd.randint(1, 7),  # ISO weekday, same range as dayOfWeek on Trip nodes
        "endIsHoliday": rnd.randint(0, 1),
        "endSeason": rnd.randint(1, 4),  # same 1-4 coding as on Trip nodes
        "endIsWeekend": rnd.randint(0, 1),
    }

def write_station_properties_batchwise(driver, batch_size=500):
    with driver.session(database=database) as session:
        result = session.run("MATCH (s:Station) RETURN id(s) AS station_id")

        batch = []
        count = 0
        for record in result:
            sid = record["station_id"]

            try:
                props = generate_properties_for_station(sid)
                props.update(generate_end_properties_for_station(sid))
                props["station_id"] = sid

                batch.append(props)

            except Exception as e:
                print(f"Skipping station {sid}: {e}")

            if len(batch) >= batch_size:
                _send_station_property_batch(driver, batch)
                count += len(batch)
                print(f"{count} stations processed.")
                batch = []

        if batch:
            _send_station_property_batch(driver, batch)

def _send_station_property_batch(driver, batch):
    query = """
    CALL apoc.periodic.iterate(
      'UNWIND $batch AS row RETURN row',
      '
      MATCH (s:Station) WHERE id(s) = row.station_id
      SET s.startYear = row.startYear,
          s.startMonth = row.startMonth,
          s.startDay = row.startDay,
          s.startHour = row.startHour,
          s.startMinute = row.startMinute,
          s.startWeekday = row.startWeekday,
          s.startIsHoliday = row.startIsHoliday,
          s.startSeason = row.startSeason,
          s.startIsWeekend = row.startIsWeekend,
          s.endYear = row.endYear,
          s.endMonth = row.endMonth,
          s.endDay = row.endDay,
          s.endHour = row.endHour,
          s.endMinute = row.endMinute,
          s.endWeekday = row.endWeekday,
          s.endIsHoliday = row.endIsHoliday,
          s.endSeason = row.endSeason,
          s.endIsWeekend = row.endIsWeekend
      ',
      {batchSize: 100, parallel: true, params: {batch: $batch}}
    )
    """
    with driver.session(database=database) as session:
        session.run(query, batch=batch)

write_station_properties_batchwise(db_driver)


500 stations processed.
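
The reproducibility of the seeded generator can be checked in isolation, without a database. The sketch below uses a hypothetical helper, `sample_station_features`, that draws only three of the properties. It relies on the fact that Python's `random.Random` derives its state deterministically from a string seed, so the result does not change between runs.

```python
import random


def sample_station_features(station_id: int) -> dict:
    """Draw a few placeholder features from a generator seeded with the station ID."""
    rnd = random.Random(f"{station_id}_start")  # same seed pattern as in the cell above
    return {
        "startMonth": rnd.randint(1, 12),
        "startDay": rnd.randint(1, 28),
        "startHour": rnd.randint(0, 23),
    }


# Two independent passes over the same IDs must produce identical values
first_pass = [sample_station_features(station_id) for station_id in range(5)]
second_pass = [sample_station_features(station_id) for station_id in range(5)]
print(first_pass == second_pass)  # prints True
```

Note that the values are only stable as long as the node IDs are; if the database is rebuilt and stations receive different internal IDs, the assigned features change as well.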


## Temporal Node Embedding with FastRP

In this section, we create an in-memory projection of the graph and apply the FastRP algorithm to embed its nodes.


### Create the In-Memory Graph Projection
First, we create the in-memory graph projection, which FastRP requires as input. A projection can carry numerical node properties from the stored graph; here, the start-time features of `Trip` and `Station` nodes are projected under common names such as `month` and `hour`.

In [None]:
# The query takes about 2 minutes

projection_query = """
MATCH (source)-[r:HAS_START|HAS_END]->(target)
WHERE source:Trip AND target:Station
WITH gds.graph.project(
  'propertiesGraph',
  source,
  target,
  {
    sourceNodeProperties: source {
      year: source.startYear,
      month: source.startMonth,
      day: source.startDay,
      hour: source.startHour,
      minute: source.startMinute,
      weekday: source.startWeekday,
      season: source.startSeason,
      isWeekend: source.startIsWeekend,
      isHoliday: source.isHoliday
    },
    targetNodeProperties: target {
      year: target.startYear,
      month: target.startMonth,
      day: target.startDay,
      hour: target.startHour,
      minute: target.startMinute,
      weekday: target.startWeekday,
      season: target.startSeason,
      isWeekend: target.startIsWeekend,
      isHoliday: target.startIsHoliday
    }},
  {undirectedRelationshipTypes: ['*']}
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
"""
with db_driver.session(database=database) as session:
    session.run(projection_query)

Next, we retrieve the projected graph by name, which also confirms that the projection exists, and store the resulting graph object in the variable `G`.

In [None]:
G = gds.graph.get("propertiesGraph")

### FastRP Algorithm Estimation
Before running FastRP, we estimate its memory requirements. If the estimate fits within the available resources, we can run the algorithm on the projected graph.

In [None]:
gds.fastRP.write.estimate(
    G,
    writeProperty="propertyEmbedding",
    randomSeed=42,
    embeddingDimension=128,
    nodeSelfInfluence=0.4,  # same configuration as the actual run below
    propertyRatio=0.5,
    featureProperties=['month', 'day', 'hour', 'minute', 'weekday', 'season', 'isWeekend', 'isHoliday'],
    iterationWeights=[1.0]
)

### FastRP Algorithm Execution
Now we run the FastRP algorithm to embed the nodes of the projected graph. The resulting embedding is written to the node property `propertyEmbedding`.

In [None]:
# Takes about 40 minutes
gds.fastRP.write(
    G,
    writeProperty="propertyEmbedding",
    randomSeed=42,
    embeddingDimension=128,
    nodeSelfInfluence=0.4,
    propertyRatio=0.5,
    featureProperties=['month', 'day', 'hour', 'minute', 'weekday', 'season', 'isWeekend', 'isHoliday'],
    iterationWeights=[1.0]
)
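
To make the role of `propertyRatio` more concrete: as we understand the GDS documentation, it is the share of the embedding dimension that is reserved for components derived from `featureProperties`, while the remaining components are initialized randomly. The sketch below only illustrates this arithmetic. The helper `split_embedding_dimensions` is hypothetical, and truncating to an integer is an assumption about rounding, not a statement about the library's internals.

```python
from typing import Tuple


def split_embedding_dimensions(embedding_dimension: int, property_ratio: float) -> Tuple[int, int]:
    """Return (property_dimensions, random_dimensions) for a given ratio.

    Assumption: the property part is embedding_dimension * property_ratio,
    truncated to an integer.
    """
    if not 0.0 <= property_ratio <= 1.0:
        raise ValueError("property_ratio must be between 0.0 and 1.0")
    property_dimensions = int(embedding_dimension * property_ratio)
    return property_dimensions, embedding_dimension - property_dimensions


# Configuration used in this notebook: 128 dimensions, propertyRatio 0.5
property_dims, random_dims = split_embedding_dimensions(128, 0.5)
print(property_dims, random_dims)  # prints 64 64
```

With the configuration used here, half of the 128 components are therefore driven by the temporal properties. Since `iterationWeights` contains a single entry, only one propagation step over direct neighbors contributes to the final embedding.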

### Dropping Graph and Closing Connection
Once the embeddings are written, we drop the in-memory graph, create a vector index on the `propertyEmbedding` property of `Trip` nodes, and close the connections to the database.

In [None]:
G.drop()

In [None]:
def create_vector_index(index_name, label, property_name, vector_dimension, similarity="cosine"):
    query = f"""
    CREATE VECTOR INDEX {index_name}
    FOR (n:{label})
    ON (n.{property_name})
    OPTIONS {{
    indexConfig: {{
        `vector.dimensions`: {vector_dimension},
        `vector.similarity_function`: '{similarity}'
        }}
    }}
    """
    with db_driver.session(database=database) as session:
        session.run(query)

create_vector_index('propertyIndex', 'Trip', 'propertyEmbedding', 128)
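
The index above is configured with the `cosine` similarity function. As a reminder of what this measure compares, here is a minimal pure-Python sketch of raw cosine similarity between two embedding vectors. The helper `cosine_similarity` is hypothetical, and the score returned by the database index may be rescaled, which this sketch does not model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration; real embeddings have 128 components
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Because cosine similarity ignores vector length, two trips are considered similar when their embeddings point in the same direction, regardless of magnitude.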

In [None]:
gds.close()
db_driver.close()