# Access GrandTour Data using HuggingFace 🤗
© 2025 ETH Zurich
 
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leggedrobotics/grand_tour_dataset/blob/main/examples/%5B0%5D_Accessing_GrandTour_Data.ipynb)


## Overview
> GrandTour data is available in two formats, hosted on two platforms:

<table>
  <tr>
    <th style="padding:10px;text-align:left;">Format</th>
    <th style="padding:10px;text-align:left;"> </th>
    <th style="padding:10px;text-align:left;">Hosted&nbsp;on</th>
    <th style="padding:10px;text-align:left;"> </th>
  </tr>

  <tr>
    <td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/main/assets/ros-logo.png"  height="30" alt="ROS logo"></td>
    <td style="padding-left:15px;"><a href="https://wiki.ros.org/rosbag">ROS&nbsp;Bags</a></td>
    <td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/main/assets/rsl-logo.png"  height="30" alt="RSL logo"></td>
    <td style="padding-left:15px;">Kleinkram</td>
  </tr>


  <tr>
    <td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/main/assets/zarr-logo.png" height="40" alt="Zarr logo"></td>
    <td style="padding-left:15px;"><a href="https://zarr.dev/">ZARR</a></td>
    <td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/main/assets/hf-logo.png"  height="30" alt="Hugging Face logo"></td>
    <td style="padding-left:15px;">HuggingFace</td>
  </tr>
</table>

> This notebook explains how to download the Zarr/PNG-converted dataset hosted on HuggingFace.
>
> 
> 💡 Please refer to `examples_hugging_face/explore.ipynb` to learn how to use the data.
 
## Downloading
> We provide the entire dataset on HuggingFace in `.zarr`, `.png`, and `.yaml` format.
> 
> To avoid checking in more than 1M individual files on the HuggingFace Hub, we created one tarball (`.tar`) per topic and mission.

> HuggingFace has an easy-to-use Python download API called `huggingface_hub`.
> It is possible to download directly from the [GrandTour HuggingFace repo UI](https://huggingface.co/leggedrobotics), but we strongly recommend using `huggingface_hub`, because it handles file caching, resumes interrupted downloads, and re-fetches only files that have been updated.

> First, install `huggingface_hub`. You also need a HuggingFace account, which you can create for free at [huggingface.co](https://huggingface.co/).

In [1]:
! pip install -q huggingface_hub # Should be already installed when following the README.md and uv installation!


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


> Then, log in using the CLI. This stores an authentication token on your machine and allows you to use the API to download data.

In [2]:
# If your notebook isn't able to take input from the command line, run this in a local terminal instead
# huggingface-cli login


    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible):

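> If the interactive prompt is not usable (for example in a notebook without terminal input), one alternative is to pass the token programmatically. The snippet below is a minimal sketch, not part of the original notebook: the helper names `resolve_token` and `login_from_env` are hypothetical, and it assumes you have exported your token in the `HF_TOKEN` environment variable, which `huggingface_hub` also reads on its own.

```python
import os


def resolve_token(env=None):
    """Return a non-empty token from HF_TOKEN, or None if it is unset or blank."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    return token or None


def login_from_env():
    """Log in with the token from HF_TOKEN; raise if no token is available."""
    # Imported lazily so resolve_token() works even without huggingface_hub installed.
    from huggingface_hub import login

    token = resolve_token()
    if token is None:
        raise RuntimeError("Set the HF_TOKEN environment variable first.")
    login(token=token)
```

> Call `login_from_env()` once per environment. Avoid hard-coding the token in the notebook itself, since notebooks are often shared or committed.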
> Now you can download a mission of your choice. The next tutorial - _[1] Exploring GrandTour Data_ - uses 2024-10-01-11-29-55; set `mission` in the cell below to that ID if you plan to follow it.

In [18]:
from huggingface_hub import snapshot_download

# Specify the mission you want to download.
mission = "2024-11-02-17-18-32"

# Option 1: download the full dataset
# allow_patterns = ["*"]

# Option 2: download all data from a single mission (default)
allow_patterns = [f"{mission}/*"]

# Option 3: download a specific topic, plus the mission's .yaml files
# topic = "alphasense_front_center"
# allow_patterns = [f"{mission}/*{topic}*", f"{mission}/*.yaml"]

# If the download is interrupted, simply re-run this block: huggingface_hub resumes
# the download without re-downloading files that are already complete.
hugging_face_data_cache_path = snapshot_download(
    repo_id="leggedrobotics/grand_tour_dataset",
    allow_patterns=allow_patterns,
    repo_type="dataset",
)

Fetching 135 files:   0%|          | 0/135 [00:00<?, ?it/s]


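> To get a feel for what a given `allow_patterns` list selects before starting a large download, you can preview it offline. The sketch below assumes that `huggingface_hub` applies `fnmatch`-style glob matching to repository paths, in which `*` also matches `/`. The example paths and the helper `preview_selection` are hypothetical and only illustrate the matching behaviour.

```python
from fnmatch import fnmatch


def preview_selection(repo_paths, allow_patterns):
    """Return the paths that match at least one glob pattern."""
    return [p for p in repo_paths if any(fnmatch(p, pat) for pat in allow_patterns)]


mission = "2024-11-02-17-18-32"
topic = "alphasense_front_center"

# Hypothetical repository paths, for illustration only.
example_paths = [
    f"{mission}/data/{topic}.tar",
    f"{mission}/data/velodyne.tar",
    f"{mission}/metadata/{topic}.yaml",
    f"2024-10-01-11-29-55/data/{topic}.tar",
]

# Everything that belongs to one mission
whole_mission = preview_selection(example_paths, [f"{mission}/*"])

# A single topic of that mission, plus the mission's .yaml files
single_topic = preview_selection(
    example_paths, [f"{mission}/*{topic}*", f"{mission}/*.yaml"]
)
```

> Here `whole_mission` contains the three paths of the chosen mission, while `single_topic` drops `velodyne.tar` and keeps only the topic's tarball and the `.yaml` file.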
> The downloaded data is packaged as `.tar` files, which must be extracted before use. We recommend extracting to a destination of your choice outside the HuggingFace cache directory:

In [19]:
from pathlib import Path

# Define the destination directory
dataset_folder = Path("~/grand_tour_dataset").expanduser()
dataset_folder.mkdir(parents=True, exist_ok=True)

# Print for confirmation
print(f"Data will be extracted to: {dataset_folder}")

Data will be extracted to: /home/jonfrey/grand_tour_dataset
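> To double-check that the chosen destination really lies outside the cache, you can compare the two paths. This is a minimal sketch under the assumption that the cache location follows the documented `HF_HUB_CACHE`, `HF_HOME` and `XDG_CACHE_HOME` environment variables, falling back to `~/.cache/huggingface/hub`. Both helper names are hypothetical.

```python
import os
from pathlib import Path


def default_hub_cache(env=None):
    """Best-effort guess of the hub cache directory from environment variables."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        hf_home = Path(env["HF_HOME"]).expanduser()
    else:
        cache_root = Path(env.get("XDG_CACHE_HOME") or "~/.cache").expanduser()
        hf_home = cache_root / "huggingface"
    return hf_home / "hub"


def is_inside(path, parent):
    """Return True if `path` equals `parent` or lies somewhere below it."""
    path = Path(path).expanduser().resolve()
    parent = Path(parent).expanduser().resolve()
    return path == parent or parent in path.parents
```

> For example, `is_inside(dataset_folder, default_hub_cache())` should return `False` for the destination defined above.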


> Define a `.tar` extractor helper function and extract the files:

In [21]:
import re
import shutil
import tarfile
from pathlib import Path


def move_dataset(cache, dataset_folder, allow_patterns=("*",)):
    """Extract matching .tar files from `cache` into `dataset_folder` and copy all other matching files."""
    cache = Path(cache)
    dataset_folder = Path(dataset_folder)

    def convert_glob_patterns_to_regex(glob_patterns):
        regex_parts = []
        for pat in glob_patterns:
            # Escape all regex special characters (this also escapes * and ?)
            pat = re.escape(pat)
            # Convert the escaped glob wildcards to their regex equivalents
            pat = pat.replace(r"\*", ".*").replace(r"\?", ".")
            # Allow any leading directories and anchor the match at the end of the path
            regex_parts.append(f".*{pat}$")

        # Join with | so that a path matching any pattern is selected
        return re.compile("|".join(regex_parts))

    pattern = convert_glob_patterns_to_regex(allow_patterns)
    # as_posix() keeps "/" as the separator, so the patterns also match on Windows
    files = [f for f in cache.rglob("*") if pattern.match(f.as_posix())]

    # Extract every tarball next to its mirrored location in dataset_folder
    tar_files = [f for f in files if f.suffix == ".tar"]
    for source_path in tar_files:
        dest_path = dataset_folder / source_path.relative_to(cache)
        dest_path.parent.mkdir(parents=True, exist_ok=True)

        try:
            with tarfile.open(source_path, "r") as tar:
                tar.extractall(path=dest_path.parent)
        except tarfile.ReadError as e:
            print(f"Error opening or extracting tar file '{source_path}': {e}")
        except Exception as e:
            print(f"An unexpected error occurred while processing {source_path}: {e}")

    # Copy all remaining files (e.g. .yaml) unchanged
    other_files = [f for f in files if f.suffix != ".tar" and f.is_file()]
    for source_path in other_files:
        dest_path = dataset_folder / source_path.relative_to(cache)
        dest_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_path, dest_path)

    print(f"Moved data from {cache} to {dataset_folder} !")

print(dataset_folder)
move_dataset(hugging_face_data_cache_path, dataset_folder, allow_patterns=allow_patterns)

/home/jonfrey/grand_tour_dataset
Moved data from /home/jonfrey/.cache/huggingface/hub/datasets--leggedrobotics--grand_tour_dataset/snapshots/a6c80c525d6690a1060204a9cd2bc4abaf7eae78 to /home/jonfrey/grand_tour_dataset !
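> `tarfile.extractall` writes every archive member to the path stored in the archive, so a malformed tarball could place files outside the destination folder. If you want an extra safety check before extracting, the following sketch lists suspicious members. It is not part of the original helper, the function name `find_unsafe_members` is hypothetical, and it only inspects member names (not symlink targets). Recent Python versions additionally offer extraction filters on `extractall`.

```python
import tarfile
from pathlib import Path


def find_unsafe_members(tar_path, dest):
    """Return the names of archive members that would be written outside `dest`."""
    dest = Path(dest).resolve()
    unsafe = []
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            # Resolve the final location, collapsing ".." and absolute names
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                unsafe.append(member.name)
    return unsafe
```

> An empty list means every member name stays below `dest`. You could call this on each tarball and skip extraction whenever the list is non-empty.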


> You should now be able to load the dataset in `.zarr` format and inspect the contents:

In [22]:
import zarr.storage

store = zarr.storage.LocalStore(dataset_folder / mission / "data")
root = zarr.group(store=store)

print([k for k in root.keys()])

['adis_imu', 'dlio_map_odometry', 'zed2i_depth_confidence_image', 'zed2i_right_images', 'zed2i_depth_image', 'zed2i_left_images', 'alphasense_right', 'depth_camera_rear_lower', 'stim320_gyroscope_temperature', 'dlio_hesai_points_undistorted', 'ap20_imu', 'hesai_undistorted', 'dlio_tf', 'livox_imu', 'gnss_raw_cpt7_ie_tc', 'anymal_imu', 'anymal_state_battery', 'hesai', 'stim320_accelerometer_temperature', 'cpt7_ie_rt_tf', 'depth_camera_front_lower', 'velodyne', 'prism_position', 'velodyne_undist', 'cpt7_ie_rt_odometry', 'depth_camera_left', 'alphasense_front_center', 'anymal_state_actuator', 'cpt7_ie_tc_tf', 'depth_camera_rear_upper', 'livox_points_undistorted', 'anymal_command_twist', 'cpt7_imu', 'depth_camera_front_upper', 'anymal_state_state_estimator', 'gnss_raw_cpt7_ie_rt', 'alphasense_front_left', 'livox_points', 'zed2i_vio_map', 'alphasense_left', 'stim320_imu', 'alphasense_front_right', 'cpt7_ie_tc_odometry', 'hdr_front', 'alphasense_imu', 'navsatfix_cpt7_ie_tc', 'hdr_left', 'dep