### Geolife Dataset
This GPS trajectory dataset was collected in the Geolife project (Microsoft Research Asia) by 182 users over a period of more than five years (from April 2007 to August 2012). Each GPS trajectory in this dataset is represented by a sequence of time-stamped points, each of which contains latitude, longitude, and altitude. The original dataset contains 17,784 trajectories and ~25M points, with a total distance of 1,292,951 kilometers and a total duration of 50,176 hours. These trajectories were recorded by different GPS loggers and GPS phones and have a variety of sampling rates. 91.5 percent of the trajectories are logged in a dense representation, e.g., every 1 to 5 seconds or every 5 to 10 meters per point. The original dataset was filtered and preprocessed; further details are available in the benchmark publication.

Attributes:
* latitude: Latitude in decimal degrees.
* longitude: Longitude in decimal degrees.
* altitude: Altitude in feet (-777 if not valid).
* time: Date and time as a string.
* trajectory_id: ID of the trajectory that the point belongs to.
* user_id: ID of the user who recorded the point.
* crs: WGS 84 (EPSG:4326)
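The attribute list implies a few cleaning steps before working with raw points: masking the -777 sentinel in `altitude`, converting feet to meters, and parsing the `time` strings into timestamps, which also yields per-trajectory durations. A minimal sketch with pandas, assuming a point-level DataFrame whose columns match the attribute names above; `clean_points` is a hypothetical helper, not part of srai, and the sample rows are made up.

```python
import pandas as pd

FEET_TO_METERS = 0.3048
INVALID_ALTITUDE_FT = -777


def clean_points(points: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mask invalid altitudes, convert to meters, parse timestamps."""
    out = points.copy()
    # -777 marks a missing altitude; turn it into NaN before converting units
    valid = out["altitude"] != INVALID_ALTITUDE_FT
    out["altitude_m"] = out["altitude"].where(valid) * FEET_TO_METERS
    # `time` is stored as a string; parse it so durations can be computed
    out["time"] = pd.to_datetime(out["time"])
    return out


# Made-up sample points for a single trajectory
points = pd.DataFrame({
    "latitude": [39.98899, 39.98910, 39.98925],
    "longitude": [116.32702, 116.32715, 116.32730],
    "altitude": [492.0, -777.0, 500.0],
    "time": ["2008-10-23 02:53:04", "2008-10-23 02:53:10", "2008-10-23 02:53:15"],
    "trajectory_id": [1, 1, 1],
    "user_id": [0, 0, 0],
})
clean = clean_points(points)

# Duration of each trajectory in seconds: last timestamp minus first timestamp
by_traj = clean.groupby("trajectory_id")["time"]
durations = (by_traj.max() - by_traj.min()).dt.total_seconds()
```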

In [None]:
import folium
import h3
from IPython.display import display
from srai.datasets import GeolifeDataset

In [None]:
geolife = GeolifeDataset()

In [None]:
type(geolife.train_gdf), type(geolife.test_gdf)

Load the data with the `.load()` method. Without arguments, it uses the default configuration (Human Mobility Classification).

In [None]:
ds = geolife.load()
ds.keys()

In [None]:
type(ds["train"]), type(ds["test"])

In [None]:
ds["train"].head()

You can also create your own train/test split, based on trajectory duration (version TTE) or trajectory length (version HMC).

Loading version `all` without passing `resolution` returns trajectories as LineString geometries.
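Since these LineStrings are in WGS 84 degrees, shapely's planar `.length` would return a length in degrees, not kilometers. A minimal sketch for a trajectory length in kilometers, using the haversine formula on a spherical Earth (an approximation; a geodesic library or a metric projection would be more precise). `haversine_km` and `linestring_length_km` are hypothetical helpers; with a shapely geometry you would pass `list(geom.coords)`, which yields (lon, lat) pairs.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def linestring_length_km(coords):
    """Sum distances between consecutive vertices; any extra z value is ignored."""
    coords = list(coords)
    return sum(
        haversine_km(lon1, lat1, lon2, lat2)
        for (lon1, lat1, *_), (lon2, lat2, *_) in zip(coords, coords[1:])
    )


# One degree of latitude along a meridian is roughly 111.19 km on this sphere
one_degree = linestring_length_km([(0.0, 0.0), (0.0, 1.0)])
```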

In [None]:
ds = geolife.load(version="all")
ds.keys()

In [None]:
ds["train"].head()

The `resolution` parameter is required to convert the LineString geometries into H3 cell sequences.

In [None]:
ds = geolife.load(version="all", resolution=10)
ds.keys()

In [None]:
ds["train"].head()

In [None]:
train, test = geolife.train_test_split(
    target_column="trajectory_id", task="TTE", test_size=0.2, n_bins=3
)
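The `n_bins` argument suggests that the split groups trajectories into bins of duration and samples from each bin, so that train and test share a similar duration distribution. A minimal sketch of that general idea (quantile binning plus per-bin sampling) with plain pandas; this is not srai's implementation of `train_test_split`, and `stratified_split_by_duration` and the `duration_s` column are hypothetical.

```python
import numpy as np
import pandas as pd


def stratified_split_by_duration(trajs, duration_col, test_size=0.2, n_bins=3, seed=0):
    """Hypothetical helper: split trajectories so both sides share a duration distribution."""
    # Assign each trajectory to a quantile bin of its duration (roughly equal-sized bins)
    bins = pd.qcut(trajs[duration_col], q=n_bins, labels=False, duplicates="drop")
    # Draw the same fraction from every bin for the test set
    test = trajs.groupby(bins).sample(frac=test_size, random_state=seed)
    # Everything not sampled goes to the training set
    train = trajs.drop(index=test.index)
    return train, test


# Made-up example: 30 trajectories lasting 0, 60, ..., 1740 seconds
trajs = pd.DataFrame({"trajectory_id": range(30), "duration_s": np.arange(30) * 60.0})
train_df, test_df = stratified_split_by_duration(trajs, "duration_s", test_size=0.2, n_bins=3)
```

Each row here is a whole trajectory; with point-level data you would compute one duration per `trajectory_id` first, so that points of one trajectory never end up on both sides of the split.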

In [None]:
len(train), len(test)

In [None]:
geolife.resolution

In [None]:
geolife.test_gdf.head()

In [None]:
def visualize_h3_trajectories(
    h3_sequences, map_center=(39.98899, 116.32702), zoom_start=12
):
    """
    Visualize H3 sequences on a Folium map.

    Args:
        h3_sequences (List[List[str]]): A list of H3 sequences (trajectories).
        map_center (Tuple[float, float]): Center of the map (lat, lon).
        zoom_start (int): Initial zoom level.

    Returns:
        folium.Map: Map with one semi-transparent polygon per H3 cell.
    """
    m = folium.Map(location=map_center, zoom_start=zoom_start, tiles="cartodbpositron")

    colors = ["red", "blue", "green", "purple", "orange", "darkred", "lightblue"]

    for i, sequence in enumerate(h3_sequences):
        color = colors[i % len(colors)]

        for h3_id in sequence:
            # cell_to_boundary returns (lat, lng) vertex pairs, the order folium expects
            boundary = h3.cell_to_boundary(h3_id)
            folium.Polygon(
                locations=boundary, color=color, weight=2, fill=True, fill_opacity=0.3
            ).add_to(m)

    return m


h3_sequences = train["h3_sequence"].tolist()
# Visualize only the first 10 trajectories to keep rendering fast
map_ = visualize_h3_trajectories(h3_sequences[:10])
display(map_)