# Access GrandTour Data using HuggingFace 🤗
© 2025 ETH Zurich
 
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leggedrobotics/grand_tour_dataset/blob/main/examples/%5B0%5D_Accessing_GrandTour_Data.ipynb)


## Overview
> GrandTour data is available in two formats, hosted on two platforms:

<table>
  <tr>
    <th style="padding:10px;text-align:left;">Format</th>
    <th style="padding:10px;text-align:left;"> </th>
    <th style="padding:10px;text-align:left;">Hosted&nbsp;on</th>
    <th style="padding:10px;text-align:left;"> </th>
  </tr>

  <tr>
    <td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/main/assets/ros-logo.png"  height="30" alt="ROS logo"></td>
    <td style="padding-left:15px;"><a href="https://wiki.ros.org/rosbag">ROS&nbsp;Bags</a></td>
    <td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/main/assets/rsl-logo.png"  height="30" alt="RSL logo"></td>
    <td style="padding-left:15px;">Kleinkram</td>
  </tr>


  <tr>
    <td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/main/assets/zarr-logo.png" height="40" alt="Zarr logo"></td>
    <td style="padding-left:15px;"><a href="https://zarr.dev/">ZARR</a></td>
    <td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/main/assets/hf-logo.png"  height="30" alt="Hugging Face logo"></td>
    <td style="padding-left:15px;">HuggingFace</td>
  </tr>
</table>

> This notebook explains how to download the zarr/png converted dataset hosted on HuggingFace.
>
> 💡 See `examples_hugging_face/explore.ipynb` for how to use the data.
 
## Downloading
> We provide the entire dataset on HuggingFace in `.zarr`, `.png`, and `.yaml` format.
> 
> To avoid checking in more than 1M individual files on the HuggingFace Hub, we created one `.tar` archive per topic for each mission.

> HuggingFace has an easy-to-use Python download API called `huggingface_hub`.
> It is possible to download directly from the [GrandTour HuggingFace repo UI](https://huggingface.co/leggedrobotics), but we strongly recommend using `huggingface_hub`, as it handles file caching, resumes interrupted downloads, and fetches only files that have been updated.

> First, install `huggingface_hub`. You will also need a HuggingFace account to log in; you can create one for free at [huggingface.co](https://huggingface.co/).

In [3]:
! pip install -q huggingface_hub

> Then, log in using the CLI. This stores an authentication token on your machine and allows you to use the API to download data.

In [None]:
# If your notebook isn't able to take input from the command line, run this in a local terminal instead
! huggingface-cli login
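
> If an interactive prompt is not available, logging in from Python is one alternative. The sketch below assumes your access token is stored in the `HF_TOKEN` environment variable; `read_token` and `login_from_env` are hypothetical helper names introduced here for illustration, while `login` is the function provided by `huggingface_hub`.

```python
import os


def read_token(env=None, var_name="HF_TOKEN"):
    """Return the token stored in `var_name`, or None if it is unset or empty."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    return token or None


def login_from_env():
    """Log in with the token from the environment; return True on success."""
    token = read_token()
    if token is None:
        print("HF_TOKEN is not set; run `huggingface-cli login` in a terminal instead.")
        return False
    # Imported lazily so read_token() works even without the package installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

> Call `login_from_env()` once per session before downloading.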

> Now you can download a mission of your choice. The next tutorial, _[1] Exploring GrandTour Data_, uses `2024-10-01-11-29-55`, so we will download it here in anticipation.

In [None]:
from huggingface_hub import snapshot_download

# Specify the mission you want to download.
mission = "2024-10-01-11-29-55"

# If the download is interrupted, simply re-run this block: huggingface_hub resumes
# and skips files that have already been downloaded.
hugging_face_data_cache_path = snapshot_download(
    repo_id="leggedrobotics/grand-tour-dataset-testing",
    repo_type="dataset",
    allow_patterns=[f"{mission}/**"],
)
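
> The same `allow_patterns` mechanism can select several missions, or only part of a mission. The following is a minimal sketch: `build_allow_patterns` and `download_missions` are hypothetical helpers, and the subfolder names (`data`, `images`, `metadata`) are assumed to match the layout of each mission folder in the repository.

```python
def build_allow_patterns(missions, subfolders=None):
    """Build glob patterns for `snapshot_download(allow_patterns=...)`.

    missions:   mission folder names, e.g. ["2024-10-01-11-29-55"]
    subfolders: optional subfolder names inside each mission;
                None keeps everything under the mission folder.
    """
    if subfolders is None:
        return [f"{m}/**" for m in missions]
    return [f"{m}/{s}/**" for m in missions for s in subfolders]


def download_missions(missions, subfolders=None,
                      repo_id="leggedrobotics/grand-tour-dataset-testing"):
    """Download the selected parts of the given missions; return the cache path."""
    # Imported lazily so the pattern helper can be used on its own.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        allow_patterns=build_allow_patterns(missions, subfolders),
    )
```

> For example, `download_missions([mission], subfolders=["metadata"])` would fetch only the metadata of one mission, assuming that subfolder exists.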

> The downloaded data is packed into `.tar` archives, which must be extracted before use. We recommend extracting to a destination of your choice outside the HuggingFace cache directory:

In [None]:
import os
from pathlib import Path

# Get user home directory in a platform-independent way
home_dir = str(Path.home())

# Define the destination directory
destination_directory = os.path.join(home_dir, "grand_tour_data", mission)

# Create the directory and all parent directories if they don't exist
os.makedirs(destination_directory, exist_ok=True)

# Print for confirmation
print(f"Data will be extracted to: {destination_directory}")

> Define a `.tar` extractor helper function:

In [9]:
import os
import shutil
import tarfile

def recreate_structure_and_extract(cache_dir, output_dir):
    # Ensure output_dir exists
    os.makedirs(output_dir, exist_ok=True)
    
    # Loop over subdirectories in cache_dir
    for subdir in os.listdir(cache_dir):
        subdir_path = os.path.join(cache_dir, subdir)
        target_subdir = os.path.join(output_dir, subdir)
        
        if os.path.isdir(subdir_path):
            # Create the corresponding subdirectory in output_dir
            os.makedirs(target_subdir, exist_ok=True)
            
            # If it's the metadata folder, just copy all files (like YAMLs)
            if subdir == "metadata":
                for filename in os.listdir(subdir_path):
                    src_file = os.path.join(subdir_path, filename)
                    dst_file = os.path.join(target_subdir, filename)
                    shutil.copy2(src_file, dst_file)
            else:
                # For folders like data and images, process tar files
                for filename in os.listdir(subdir_path):
                    src_file = os.path.join(subdir_path, filename)
                    if filename.endswith(".tar"):
                        # Extract the tar file into the target_subdir
                        with tarfile.open(src_file) as tar:
                            tar.extractall(path=target_subdir)
                    else:
                        # If there are non-tar files that need copying, handle them here
                        shutil.copy2(src_file, target_subdir)
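
> Before extracting archives from any remote source, it can be worth checking that no member would be written outside the target directory. The sketch below is an optional pre-check, not part of the original helper: `unsafe_members` is a hypothetical name, and it only inspects member names (it does not follow symlinks stored inside the archive).

```python
import os
import tarfile


def unsafe_members(tar_path, target_dir):
    """Return names of archive members that would resolve outside target_dir."""
    target_root = os.path.realpath(target_dir)
    flagged = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            # Resolve where this member would land if extracted into target_dir.
            resolved = os.path.realpath(os.path.join(target_root, member.name))
            if os.path.commonpath([target_root, resolved]) != target_root:
                flagged.append(member.name)
    return flagged
```

> An empty list means every member name stays inside `target_dir`; a non-empty list is a reason to inspect the archive before calling `extractall`.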


> And extract the files:

In [10]:
cache_dir = os.path.join(hugging_face_data_cache_path, mission)
recreate_structure_and_extract(cache_dir, destination_directory)
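
> A quick sanity check after extraction is to count the files under each top-level subfolder of the destination. This is a minimal sketch; `summarize_extraction` is a hypothetical helper, and the subfolders you should expect to see (`data`, `images`, `metadata`) are assumed from the layout described in this notebook.

```python
import os
from collections import Counter


def summarize_extraction(root_dir):
    """Count extracted files per top-level subfolder of root_dir."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        if not filenames:
            continue
        rel = os.path.relpath(dirpath, root_dir)
        # Files directly in root_dir are grouped under ".".
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return dict(counts)
```

> Running `print(summarize_extraction(destination_directory))` gives a per-folder file count; a missing or empty subfolder suggests the download or extraction did not complete.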

> You should now be able to load the dataset in `.zarr` format and inspect the contents:

In [11]:
import zarr
import zarr.storage

# The /data folder contains the actual data files, while the /metadata folder contains static data like TFs and calibration. Images are stored in /images.
store = zarr.storage.LocalStore(destination_directory + "/data")
root = zarr.group(store=store)
mission_root = zarr.open_group(destination_directory + "/data", mode='r')

# Take a look at the available data
print(mission_root.tree())




