### Porto Taxi Dataset

The dataset covers one year of taxi trajectory data from 442 vehicles operating in Porto, Portugal, between July 1, 2013, and June 30, 2014. Each completed trip falls into one of three categories: (A) taxi central-based, dispatched through the taxi central; (B) stand-based, started at a taxi stand; or (C) non-taxi central-based, picked up on a random street. Each record includes metadata such as trip origin type, taxi and call identifiers, and day type, together with a GPS trajectory encoded as a sequence of geographic coordinates.
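
To make the record structure concrete, here is a minimal sketch of how a raw trajectory could be decoded. It assumes the format of the original Kaggle release (a JSON string of `[longitude, latitude]` pairs sampled every 15 seconds); the `srai` loader exposes trajectories as geometries instead, and the names `CALL_TYPES`, `parse_polyline`, and `trip_duration_s` are hypothetical helpers, not part of the library.

```python
import json
from typing import List, Tuple

# Trip origin codes, as described above.
CALL_TYPES = {
    "A": "dispatched through the taxi central",
    "B": "started at a taxi stand",
    "C": "picked up on a random street",
}

# Assumption: fixed 15-second sampling interval, as in the original Kaggle release.
SAMPLE_INTERVAL_S = 15


def parse_polyline(raw: str) -> List[Tuple[float, float]]:
    """Decode a JSON-encoded list of [lon, lat] pairs into tuples."""
    return [(float(lon), float(lat)) for lon, lat in json.loads(raw)]


def trip_duration_s(points: List[Tuple[float, float]]) -> int:
    """Estimate trip duration from the number of GPS samples."""
    # N samples span N - 1 intervals; an empty trajectory lasts 0 seconds.
    return max(len(points) - 1, 0) * SAMPLE_INTERVAL_S


raw = "[[-8.618643, 41.141412], [-8.618499, 41.141376], [-8.620326, 41.14251]]"
points = parse_polyline(raw)
print(CALL_TYPES["B"], "|", trip_duration_s(points), "seconds")
```

Under this assumption, a trajectory with three samples corresponds to a 30-second trip, which is the kind of target the travel time estimation task predicts.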

In [None]:
import folium
import h3
from IPython.display import display
from srai.datasets import PortoTaxiDataset

In [None]:
porto_taxi = PortoTaxiDataset()

In [None]:
type(porto_taxi.train_gdf), type(porto_taxi.test_gdf)

Load the data with the `.load()` method. The default configuration is Travel Time Estimation (TTE).

In [None]:
ds = porto_taxi.load()
ds.keys()

In [None]:
type(porto_taxi.train_gdf), type(porto_taxi.test_gdf)

In [None]:
ds["train"].head()

In [None]:
ds["test"].head()

Get H3 trajectories with target values

In [None]:
porto_taxi.resolution

In [None]:
train_h3, _, test_h3 = porto_taxi.get_h3_with_labels()

In [None]:
train_h3

In [None]:
test_h3

Get Human Mobility Prediction (HMP) data

In [None]:
ds = porto_taxi.load(version="HMP")

In [None]:
ds["train"].head()

Create your own train/test split based on trajectory duration (version `TTE`) or trajectory length (version `HMP`).

Loading version `all` without passing a resolution returns trajectories as linestring geometries.

In [None]:
ds = porto_taxi.load(version="all")
ds.keys()

In [None]:
ds["train"].head()

The `resolution` parameter is required to generate H3 sequences from the linestring geometry.
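
The sketch below illustrates why a resolution is needed: each GPS point has to be mapped to an H3 cell at some fixed resolution before a cell sequence can be formed. It is an illustration only, not the conversion `srai` performs internally. `collapse_repeats` and `points_to_h3_sequence` are hypothetical helpers, and the `to_cell` parameter is an assumption added so the logic can be tried without `h3` installed. By default the code uses `h3.latlng_to_cell` from the h3-py v4 API.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def collapse_repeats(cells: Iterable[str]) -> List[str]:
    """Drop consecutive duplicates so each cell visit appears once."""
    sequence: List[str] = []
    for cell in cells:
        if not sequence or sequence[-1] != cell:
            sequence.append(cell)
    return sequence


def points_to_h3_sequence(
    points: Iterable[Tuple[float, float]],
    resolution: int,
    to_cell: Optional[Callable[[float, float, int], str]] = None,
) -> List[str]:
    """Map (lon, lat) points to an ordered sequence of H3 cell ids."""
    if to_cell is None:
        # Imported lazily so the helper above works without h3 installed.
        import h3

        to_cell = h3.latlng_to_cell  # h3-py v4: (lat, lng, res) -> cell id

    # Linestring coordinates are (lon, lat); H3 expects (lat, lon).
    cells = [to_cell(lat, lon, resolution) for lon, lat in points]
    return collapse_repeats(cells)


# Example call (requires h3):
# points_to_h3_sequence([(-8.618643, 41.141412), (-8.620326, 41.14251)], 9)
```

A higher resolution means smaller cells and therefore longer sequences for the same trip, so the choice of resolution directly shapes the prediction task.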

In [None]:
ds = porto_taxi.load(version="all", resolution=9)
ds.keys()

In [None]:
ds["train"].head()

In [None]:
train, test = porto_taxi.train_test_split(
    target_column="trip_id", task="HMP", test_size=0.2, n_bins=7
)

In [None]:
len(train), len(test)
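
As a rough illustration of what a binned split could look like, the sketch below stratifies trajectories by length: it sorts them into `n_bins` equal-frequency bins and samples the test share from each bin, so that short and long trajectories are represented in both sets. This is one plausible reading of the `test_size` and `n_bins` arguments, not the library's implementation, and `binned_split` is a hypothetical name.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def binned_split(
    lengths: Sequence[float],
    test_size: float = 0.2,
    n_bins: int = 7,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), stratified by length bins."""
    # Rank trajectories by length so each bin holds roughly the same count.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    bins: Dict[int, List[int]] = defaultdict(list)
    for rank, idx in enumerate(order):
        bins[rank * n_bins // len(order)].append(idx)

    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for members in bins.values():
        # Sample the test share independently inside every bin.
        rng.shuffle(members)
        n_test = round(len(members) * test_size)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Toy input: 70 trajectories with lengths 1..70.
toy_lengths = list(range(1, 71))
train_idx, test_idx = binned_split(toy_lengths, test_size=0.2, n_bins=7)
print(len(train_idx), len(test_idx))
```

With 70 toy trajectories and seven bins, each bin holds ten trajectories and contributes two of them to the test set.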

In [None]:
def visualize_h3_trajectories(
    h3_sequences, map_center=(41.14075, -8.61029), zoom_start=12
):
    """
    Visualize H3 sequences on a Folium map.

    Args:
        h3_sequences (List[List[str]]): A list of H3 sequences (trajectories).
        map_center (Tuple[float, float]): Center of the map (lat, lon).
        zoom_start (int): Initial zoom level.

    Returns:
        folium.Map: The map with one polygon drawn per H3 cell.
    """
    m = folium.Map(location=map_center, zoom_start=zoom_start, tiles="cartodbpositron")

    colors = ["red", "blue", "green", "purple", "orange", "darkred", "lightblue"]

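    for i, sequence in enumerate(h3_sequences):
        # Cycle through the palette; colors repeat after len(colors) trajectories.
        color = colors[i % len(colors)]

        for h3_id in sequence:
            # cell_to_boundary returns (lat, lng) vertices, the order Folium expects.
            boundary = h3.cell_to_boundary(h3_id)
            folium.Polygon(
                locations=boundary, color=color, weight=2, fill=True, fill_opacity=0.3
            ).add_to(m)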

    return m


h3_sequences = train["h3_sequence"].tolist()
map_ = visualize_h3_trajectories(h3_sequences[0:10])  # visualize first 10 for speed
display(map_)