# Examples

The following examples show how to use the utility Python functions in `utils.py`.

In [2]:
import json
from utils import *

import warnings

# Suppress FutureWarning on pandas
warnings.filterwarnings("ignore", category=FutureWarning)

# Load ferries data
with open("ferries.json", "r") as file:
    ferries_data = json.load(file)

## `fetch_vessel_data`

Fetches historical vessel data from PONTOS-HUB through the REST API.
The function requires a time range because a vessel's complete history cannot be retrieved in a single request: the data volume is too large, and the REST API returns at most 1 million rows per request.

The example below fetches the data of the ferry "Fragancia" as it is stored in the hub for a time range of 10 minutes. 

    NOTE: `fetch_vessel_data` assumes the time is in Coordinated Universal Time (UTC) and not Central European Time (CET) or similar. The timestamps in the returned vessel data are also in UTC.
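
If the times you have are in local time, they need to be converted to UTC before the call. The following is a minimal sketch, not part of `utils.py`: the helper name `cet_to_utc_string` is hypothetical, and it assumes a fixed CET offset of UTC+1, so it does not handle daylight saving time (the standard library's `zoneinfo.ZoneInfo` would be the place to look for that).

```python
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Assumption: plain CET (UTC+1), no daylight saving time handling.
CET = timezone(timedelta(hours=1), "CET")


def cet_to_utc_string(local_time: str) -> str:
    """Convert a CET wall-clock string to a UTC string in the same format."""
    local = datetime.strptime(local_time, TIME_FORMAT).replace(tzinfo=CET)
    return local.astimezone(timezone.utc).strftime(TIME_FORMAT)


start_time_utc = cet_to_utc_string("2024-11-01 13:00:00")
print(start_time_utc)  # -> 2024-11-01 12:00:00
```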

In [5]:
# PONTOS vessel id for ferry "Fragancia"
vessel_id = "mmsi_265558290"

# Time range for fetching data (10 minutes)
start_time = "2024-11-01 12:00:00"
end_time = "2024-11-01 12:10:00"

# Fetch vessel data
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time)

# Print the number of data points, the unique parameters, and the first data point
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")
unique_parameters = set(data_point['parameter_id'] for data_point in vessel_data)


# Formatting the unique measurement output
unique_parameters_str = "\n\t".join(unique_parameters)
print(f"The unique measurements available are:\n\t{unique_parameters_str}")

print("The first data point is:")
print(vessel_data[0])

Vessel data contains 7200 data points within the 10 minutes interval.
The unique measurements available are:
	enginemain_fuelcons_lph_3
	enginemain_fuelcons_lph_2
	enginemain_speed_rpm_2
	positioningsystem_longitude_deg_1
	enginemain_speed_rpm_4
	positioningsystem_sog_kn_1
	enginemain_fuelcons_lph_1
	positioningsystem_latitude_deg_1
	positioningsystem_cog_deg_1
	enginemain_speed_rpm_1
	enginemain_fuelcons_lph_4
	enginemain_speed_rpm_3
The first data point is:
{'time': '2024-11-01T12:00:00+00:00', 'parameter_id': 'enginemain_fuelcons_lph_1', 'value': 4.7999997}
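
The output above shows 7200 rows for 10 minutes, i.e. about 12 rows per second for this vessel, so roughly 23 hours of raw data would reach the limit of 1 million rows. One way to cover a longer period is to split it into smaller windows and fetch each window separately. The sketch below is only an illustration: `split_time_range` is a hypothetical helper that is not in `utils.py`, and the 12-hour window is an assumed value chosen to stay well under the limit at this data rate.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_time_range(start_time, end_time, window=timedelta(hours=12)):
    """Split [start_time, end_time] into consecutive windows of at most `window`."""
    start = datetime.strptime(start_time, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    windows = []
    while start < end:
        stop = min(start + window, end)
        windows.append((start.strftime(TIME_FORMAT), stop.strftime(TIME_FORMAT)))
        start = stop
    return windows


windows = split_time_range("2024-11-01 00:00:00", "2024-11-02 06:00:00")
for window_start, window_end in windows:
    print(window_start, "->", window_end)

# Each window could then be passed to fetch_vessel_data, for example:
# for window_start, window_end in windows:
#     vessel_data.extend(fetch_vessel_data(vessel_id, window_start, window_end))
```

Consecutive windows share a boundary timestamp, so depending on how the hub treats the end of the range, a data point at the boundary may appear twice and need deduplication.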


To facilitate handling of the data, `fetch_vessel_data` can use the PONTOS-HUB resources that return data averaged within different "time buckets". The `time_bucket` argument controls this behavior, as the examples below demonstrate.

In [6]:
# Fetch vessel data, averaged within a 5 seconds time bucket
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time, time_bucket="5 seconds")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")

# Fetch vessel data, averaged within a 30 seconds time bucket
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time, time_bucket="30 seconds")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")

# Fetch vessel data, averaged within a 1 minute time bucket
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time, time_bucket="1 minute")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")

# Fetch vessel data, averaged within a 5 minutes time bucket
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time, time_bucket="5 minutes")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")

# Fetch vessel data, averaged within a 10 minutes time bucket
vessel_data = fetch_vessel_data(vessel_id, start_time, end_time, time_bucket="10 minutes")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")


Vessel data contains 1440 data points within the 10 minutes interval.
Vessel data contains 240 data points within the 10 minutes interval.
Vessel data contains 120 data points within the 10 minutes interval.
Vessel data contains 24 data points within the 10 minutes interval.
Vessel data contains 12 data points within the 10 minutes interval.


As expected, the number of data points decreases as the size of the time bucket increases. 
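
The counts above are consistent with one averaged value per parameter per bucket: the 10-minute range holds 600 seconds and the vessel reports 12 parameters. The small check below reproduces the numbers under that assumption; `expected_rows` is a hypothetical helper written for this illustration, and it assumes the range is an exact multiple of the bucket size.

```python
def expected_rows(range_seconds: int, bucket_seconds: int, n_parameters: int) -> int:
    """Rows expected when each parameter yields one averaged value per bucket."""
    n_buckets = range_seconds // bucket_seconds
    return n_buckets * n_parameters


bucket_sizes = {
    "5 seconds": 5,
    "30 seconds": 30,
    "1 minute": 60,
    "5 minutes": 300,
    "10 minutes": 600,
}

# 600 seconds in the range, 12 parameters for this vessel
counts = {label: expected_rows(600, seconds, 12) for label, seconds in bucket_sizes.items()}
for label, count in counts.items():
    print(f"{label}: {count}")
```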

The `fetch_vessel_data` function can also request specific measurements. The `parameter_ids` argument takes a list of strings that are matched against the tags of the available measurements. The examples below demonstrate this.

In [8]:
# Fetch vessel data, with only measurement tags that include the words 'latitude' and 'longitude', averaged within a 10 minutes time bucket
vessel_data = fetch_vessel_data(
    vessel_id, 
    start_time, 
    end_time, 
    time_bucket="10 minutes",
    parameter_ids=["latitude", "longitude"]
)
print("With parameter_ids=['latitude', 'longitude'] ...")
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")
measurements_str = "\n\t".join(data_point['parameter_id'] for data_point in vessel_data)
print(f"The measurements available are:\n\t{measurements_str}\n")

# Fetch vessel data, only the measurements with the tag 'enginemain', averaged within a 10 minutes time bucket
vessel_data = fetch_vessel_data(
    vessel_id, 
    start_time, 
    end_time, 
    time_bucket="10 minutes",
    parameter_ids=["enginemain"]
)
print("With parameter_ids=['enginemain'] ...")
measurements_str = "\n\t".join(data_point['parameter_id'] for data_point in vessel_data)
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")
print(f"The measurements available are:\n\t{measurements_str}\n")

# Fetch vessel data, only the measurements with the tag 'enginemain_fuelcons_lph_1', averaged within a 10 minutes time bucket
vessel_data = fetch_vessel_data(
    vessel_id, 
    start_time, 
    end_time, 
    time_bucket="10 minutes",
    parameter_ids=["enginemain_fuelcons_lph_1"]
)
print("With parameter_ids=['enginemain_fuelcons_lph_1'] ...")
measurements_str = "\n\t".join(data_point['parameter_id'] for data_point in vessel_data)
print(f"Vessel data contains {len(vessel_data)} data points within the 10 minutes interval.")
print(f"The measurements available are:\n\t{measurements_str}\n")

With parameter_ids=['latitude', 'longitude'] ...
Vessel data contains 2 data points within the 10 minutes interval.
The measurements available are:
	positioningsystem_latitude_deg_1
	positioningsystem_longitude_deg_1

With parameter_ids=['enginemain'] ...
Vessel data contains 8 data points within the 10 minutes interval.
The measurements available are:
	enginemain_fuelcons_lph_1
	enginemain_fuelcons_lph_2
	enginemain_fuelcons_lph_3
	enginemain_fuelcons_lph_4
	enginemain_speed_rpm_1
	enginemain_speed_rpm_2
	enginemain_speed_rpm_3
	enginemain_speed_rpm_4

With parameter_ids=['enginemain_fuelcons_lph_1'] ...
Vessel data contains 1 data points within the 10 minutes interval.
The measurements available are:
	enginemain_fuelcons_lph_1
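
The results above are consistent with substring matching: a measurement is returned when its tag contains at least one of the given strings. The sketch below mimics that behavior locally on the tags listed earlier. It is an assumption about the matching rule, not the actual implementation, which may run in the hub and differ in details such as case sensitivity; `match_parameters` is a hypothetical name.

```python
AVAILABLE_TAGS = [
    "enginemain_fuelcons_lph_1",
    "enginemain_fuelcons_lph_2",
    "enginemain_fuelcons_lph_3",
    "enginemain_fuelcons_lph_4",
    "enginemain_speed_rpm_1",
    "enginemain_speed_rpm_2",
    "enginemain_speed_rpm_3",
    "enginemain_speed_rpm_4",
    "positioningsystem_cog_deg_1",
    "positioningsystem_latitude_deg_1",
    "positioningsystem_longitude_deg_1",
    "positioningsystem_sog_kn_1",
]


def match_parameters(available_tags, parameter_ids):
    """Return the tags that contain at least one of the requested strings."""
    return sorted(
        tag for tag in available_tags
        if any(pattern in tag for pattern in parameter_ids)
    )


position_tags = match_parameters(AVAILABLE_TAGS, ["latitude", "longitude"])
engine_tags = match_parameters(AVAILABLE_TAGS, ["enginemain"])
print(position_tags)
print(len(engine_tags))  # -> 8
```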



## `transform_vessel_data_to_dataframe`

Transforms vessel data into a Pandas DataFrame where each row corresponds to a timestamp. If the data is not averaged, the timestamps have a resolution of 1 second and the first measurement within each second is used in the DataFrame. If the data is averaged, the timestamp resolution is the size of the time bucket, and the `avg_` prefixes are removed.

The examples below demonstrate.

In [9]:
# Fetch vessel data without averaging
vessel_data = fetch_vessel_data(
    vessel_id, 
    start_time, 
    end_time, 
    parameter_ids=["latitude", "longitude", "sog", "fuelcons"]
)
df = transform_vessel_data_to_dataframe(vessel_data)
print(f"The data contains {len(df)} rows.")
df.head(5)

The data contains 600 rows.


| parameter_id | time | enginemain_fuelcons_lph_1 | enginemain_fuelcons_lph_2 | enginemain_fuelcons_lph_3 | enginemain_fuelcons_lph_4 | positioningsystem_latitude_deg_1 | positioningsystem_longitude_deg_1 | positioningsystem_sog_kn_1 |
|---|---|---|---|---|---|---|---|---|
| 0 | 2024-11-01 12:00:00+00:00 | 4.8 | 4.7 | 4.7 | 4.8 | 59.3953 | 18.441317 | 0.0 |
| 1 | 2024-11-01 12:00:01+00:00 | 4.7 | 4.5 | 4.7 | 4.8 | 59.3953 | 18.441317 | 0.0 |
| 2 | 2024-11-01 12:00:02+00:00 | 4.6 | 4.4 | 4.7 | 4.8 | 59.3953 | 18.441317 | 0.0 |
| 3 | 2024-11-01 12:00:03+00:00 | 4.6 | 4.4 | 4.7 | 4.8 | 59.3953 | 18.441317 | 0.0 |
| 4 | 2024-11-01 12:00:04+00:00 | 4.5 | 4.3 | 4.7 | 4.8 | 59.395306 | 18.441317 | 0.0 |


In [10]:
# Fetch vessel data, averaged within a 30 seconds time bucket
vessel_data = fetch_vessel_data(
    vessel_id, 
    start_time,
    end_time,
    time_bucket="30 seconds",
    parameter_ids=["latitude","longitude","sog","fuelcons"]
)
df = transform_vessel_data_to_dataframe(vessel_data)

df.head(5)

| parameter_id | time | enginemain_fuelcons_lph_1 | enginemain_fuelcons_lph_2 | enginemain_fuelcons_lph_3 | enginemain_fuelcons_lph_4 | positioningsystem_latitude_deg_1 | positioningsystem_longitude_deg_1 | positioningsystem_sog_kn_1 |
|---|---|---|---|---|---|---|---|---|
| 0 | 2024-11-01 12:00:14.500000+00:00 | 4.38 | 4.18 | 4.703334 | 4.803333 | 59.395307 | 18.441321 | 0.0 |
| 1 | 2024-11-01 12:00:44.500000+00:00 | 4.363333 | 4.153333 | 4.62 | 4.78 | 59.395315 | 18.441326 | 0.0 |
| 2 | 2024-11-01 12:01:14.500000+00:00 | 4.4 | 4.2 | 4.543334 | 4.763333 | 59.395319 | 18.441328 | 0.0 |
| 3 | 2024-11-01 12:01:44.500000+00:00 | 4.4 | 4.2 | 4.596667 | 4.786666 | 59.395315 | 18.441328 | 0.0 |
| 4 | 2024-11-01 12:02:14.500000+00:00 | 4.4 | 4.2 | 4.6 | 4.786666 | 59.395301 | 18.44133 | 0.0 |
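
For intuition, the reshaping for non-averaged data (one row per second, first measurement within each second) can be sketched with plain pandas. This is an illustration under stated assumptions, not the code in `utils.py`: `long_to_wide` is a hypothetical name, and the sample records and their values are made up for the example.

```python
import pandas as pd

# Made-up sample records in the same shape as the fetched data points
records = [
    {"time": "2024-11-01T12:00:00.100+00:00", "parameter_id": "positioningsystem_sog_kn_1", "value": 0.0},
    {"time": "2024-11-01T12:00:00.600+00:00", "parameter_id": "positioningsystem_sog_kn_1", "value": 0.1},
    {"time": "2024-11-01T12:00:00.200+00:00", "parameter_id": "positioningsystem_latitude_deg_1", "value": 59.3953},
    {"time": "2024-11-01T12:00:01.100+00:00", "parameter_id": "positioningsystem_sog_kn_1", "value": 0.2},
    {"time": "2024-11-01T12:00:01.300+00:00", "parameter_id": "positioningsystem_latitude_deg_1", "value": 59.3954},
]


def long_to_wide(records):
    """Pivot long-format records into one row per second, one column per parameter."""
    frame = pd.DataFrame(records)
    frame["time"] = pd.to_datetime(frame["time"], utc=True)
    # Sort chronologically so that "first" means the earliest measurement
    frame = frame.sort_values("time", kind="stable")
    # Truncate timestamps to whole seconds
    frame["time"] = frame["time"].dt.floor("s")
    wide = frame.pivot_table(
        index="time", columns="parameter_id", values="value", aggfunc="first"
    )
    return wide.reset_index()


wide = long_to_wide(records)
print(wide)
```

After the pivot, the column axis is named `parameter_id`, which is why that label appears in the header of the tables above.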


## `plot_paths`

Plots a series of paths on a map using the pydeck library. A 'path' is a list of (latitude, longitude) pairs, each given as a tuple or a list. The example below demonstrates.

In [11]:
# Fetch vessel data, averaged within a 30 seconds time bucket
vessel_data = fetch_vessel_data(
    vessel_id, 
    "2024-11-01 12:00:00",
    "2024-11-01 12:30:00",
    time_bucket="30 seconds",
    parameter_ids=["positioningsystem_latitude_deg_1","positioningsystem_longitude_deg_1"]
)
df = transform_vessel_data_to_dataframe(vessel_data)

# Create a path
path = df[["positioningsystem_latitude_deg_1", "positioningsystem_longitude_deg_1"]].values.tolist()

# Plot the path (path is wrapped in a list because plot_paths expects a list of paths)
plot_paths([path])

## `get_trips_from_vessel_data`

Processes vessel data to extract 'trips' between 'stops'. A 'stop' is defined as a period of time where the vessel's speed is below a certain threshold. The function removes all the measurements that correspond to speeds below the speed threshold and then splits the remaining data into trips wherever the gap between consecutive timestamps exceeds the time threshold. The defaults are a speed threshold of 1 knot and a time threshold of 1 minute. A trip is a dictionary whose key-value pairs hold the trip's path, timestamps, and corresponding measurements.

The example below demonstrates.

    NOTE: `get_trips_from_vessel_data` assumes that the timestamps in the provided vessel data are in UTC and transforms them to CET. See the function's documentation for the default argument `time_zone`.

In [13]:
# Fetch vessel data, averaged within a 30 seconds time bucket
vessel_data = fetch_vessel_data(
    vessel_id, 
    "2024-11-01 12:00:00",
    "2024-11-01 12:30:00",
    time_bucket="30 seconds",
    parameter_ids=["positioningsystem_latitude_deg_1", "positioningsystem_longitude_deg_1", "sog", "fuelcons"]
)

trips = get_trips_from_vessel_data(vessel_data)
print(f"There are {len(trips)} trips in the data.")

# Constructing the keys output
trip_keys_str = "\n\t".join(trips[0].keys())
print(f"The keys of the first trip are:\n\t{trip_keys_str}")

There are 4 trips in the data.
The keys of the first trip are:
	path
	enginemain_fuelcons_lph_1
	enginemain_fuelcons_lph_2
	enginemain_fuelcons_lph_3
	enginemain_fuelcons_lph_4
	positioningsystem_sog_kn_1
	time


In [14]:
# Make a list of paths
paths = [trip["path"] for trip in trips]

# Plot the paths
plot_paths(paths)
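
The splitting logic described in this section (drop slow samples, then cut wherever the time gap is too large) can be sketched in a few lines of plain Python. This is a simplified illustration, not the implementation in `utils.py`: `split_into_trips` is a hypothetical name, the speeds are made up, and whether the real function treats values exactly at a threshold as inside or outside is an assumption here.

```python
from datetime import datetime, timedelta


def split_into_trips(samples, speed_threshold_kn=1.0, max_gap=timedelta(minutes=1)):
    """Group time-ordered samples into trips separated by stops."""
    # Step 1: remove samples recorded while the vessel was (nearly) standing still
    moving = [s for s in samples if s["sog"] >= speed_threshold_kn]

    # Step 2: start a new trip whenever the gap to the previous sample exceeds max_gap
    trips = []
    for sample in moving:
        if trips and sample["time"] - trips[-1][-1]["time"] <= max_gap:
            trips[-1].append(sample)
        else:
            trips.append([sample])
    return trips


# Made-up speeds sampled every 30 seconds: moving, stopped, moving again
t0 = datetime(2024, 11, 1, 12, 0, 0)
speeds = [5.0, 6.0, 0.2, 0.0, 0.1, 4.0, 5.5]
samples = [
    {"time": t0 + timedelta(seconds=30 * i), "sog": sog}
    for i, sog in enumerate(speeds)
]

example_trips = split_into_trips(samples)
print([len(trip) for trip in example_trips])  # -> [2, 2]
```

Removing the three slow samples leaves a 2-minute gap, which exceeds the 1-minute threshold, so the data splits into two trips.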

## `cluster_paths`

Clusters paths based on their Fréchet distance and direction similarity. In other words, it tries to group the paths based on how similar they are and on the overall direction in which the vessel is travelling. The example below demonstrates.

In [15]:
clusters = cluster_paths(paths)
colors = get_cluster_colors(clusters)
plot_paths(paths, colors=colors)
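
To give a feel for the two similarity measures, here is a small sketch of the discrete Fréchet distance (a common approximation of the Fréchet distance that only considers the path vertices) and of a direction similarity computed as the cosine between the start-to-end vectors of two paths. Both are assumptions made for illustration: `cluster_paths` may compute these quantities differently, the function names are hypothetical, and treating coordinates as planar points ignores the curvature of the Earth.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two paths given as lists of (x, y) points."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[-1][-1]


def direction_similarity(p, q):
    """Cosine of the angle between the start-to-end vectors of two paths."""
    ux, uy = p[-1][0] - p[0][0], p[-1][1] - p[0][1]
    vx, vy = q[-1][0] - q[0][0], q[-1][1] - q[0][1]
    norm = math.hypot(ux, uy) * math.hypot(vx, vy)
    return (ux * vx + uy * vy) / norm if norm else 0.0


path_a = [(0, 0), (1, 0), (2, 0)]
path_b = [(0, 1), (1, 1), (2, 1)]   # parallel to path_a, same direction
path_c = list(reversed(path_b))     # same shape, opposite direction

print(discrete_frechet(path_a, path_b))      # -> 1.0
print(direction_similarity(path_a, path_b))  # -> 1.0
print(direction_similarity(path_a, path_c))  # -> -1.0
```

Combining the two measures lets outbound and return legs of the same route fall into different clusters even though their shapes are nearly identical.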


## Other

Additional functions are available in `utils.py`; see the file for more information.