# [0] Accessing GrandTour Data
© 2025 ETH Zurich

 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leggedrobotics/grand_tour_dataset/blob/main/examples/%5B0%5D_Accessing_GrandTour_Data.ipynb)


## Overview
GrandTour data is available in three formats, hosted on two platforms:

<table>
  <tr>
    <th style="padding: 10px; text-align: left;">Format</th>
    <th style="padding: 10px; text-align: left;"></th>
    <th style="padding: 10px; text-align: left;"></th>
    <th style="padding: 10px; text-align: left;">Hosted on</th>
    <th style="padding: 10px; text-align: left;"></th>
  </tr>
  <tr>
    <td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/refs/heads/main/assets/ros-logo.png?token=GHSAT0AAAAAACX6Q2VDL4MPT2URST4PMQL4Z5PB4YQ" height="30"></td>
    <td style="padding-left: 15px;"><a href="https://wiki.ros.org/rosbag">ROS Bags</a></td><td></td><td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/refs/heads/main/assets/rsl-logo.png?token=GHSAT0AAAAAACX6Q2VD7M25RXD6ETUTSYWOZ5PB43A" height="30"></td><td style="padding-left: 15px;">Kleinkram</td>
  </tr>
  <tr>
    <td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/refs/heads/main/assets/mcap-logo.png?token=GHSAT0AAAAAACX6Q2VC6MPFMAMDDC7QNCV4Z5PB4WA" height="40"></td>
    <td style="padding-left: 15px;"><a href="https://mcap.dev/">MCAP</a></td><td></td><td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/refs/heads/main/assets/rsl-logo.png?token=GHSAT0AAAAAACX6Q2VD7M25RXD6ETUTSYWOZ5PB43A" height="30"></td><td style="padding-left: 15px;">Kleinkram</td>
  </tr>
  <tr>
    <td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/refs/heads/main/assets/zarr-logo.png?token=GHSAT0AAAAAACX6Q2VCG22SOZRJTL42FVF6Z5PB45A" height="40"></td>
    <td style="padding-left: 15px;"><a href="https://zarr.dev/">ZARR</a></td><td></td><td><img src="https://raw.githubusercontent.com/leggedrobotics/grand_tour_dataset/refs/heads/main/assets/hf-logo.png?token=GHSAT0AAAAAACX6Q2VDYNRPBRVVCMMVM76AZ5PB4HA" height="30"></td><td style="padding-left: 15px;">HuggingFace</td>
  </tr>
</table>

### Structure

Data is stored by **Mission**, which represents a single continuous deployment of the robot. All **Missions** share the same data fields, except where a field could not be collected (e.g. GNSS data indoors).

## Downloading from Kleinkram
[Kleinkram](https://datasets.leggedrobotics.com/) is the in-house data storage platform of the ETH Zurich Robotic Systems Lab. A user account is needed to access the API.

The Kleinkram CLI requires Python 3.8 or later; data can also be downloaded via the Kleinkram UI. We recommend using a virtual environment when running locally, for example:

```
virtualenv .venv -ppython3.8
source .venv/bin/activate
```

The CLI can be pip installed:

In [2]:
!pip install -q kleinkram

This adds `klein` to your `PATH`, and you are ready to download GrandTour ROS bag and MCAP data! Use `klein --help` to see additional options not covered here.

First, log in with:




In [5]:
!klein login

Please open the following URL manually to authenticate: https://api.datasets.leggedrobotics.com/auth/google?state=cli-no-redirect
Enter the authentication token provided after logging in:
Authentication Token: 
Refresh Token: 
Authentication complete. Tokens saved to /root/.kleinkram.json.


Then download a single file or an entire mission of your choice. Here we download only one `.bag` file to keep the notebook lightweight, but you can choose a mission from the [TODO!GrandTour webpage](https://TODO) or [Kleinkram's UI](https://datasets.leggedrobotics.com/).

⚡*Tip:* The Kleinkram UI provides the CLI command needed to download a specific mission (or specific file):

In [44]:
# Download a single .bag with position data
!klein download --dest=. 825939ea-3034-4b70-b3bd-d5ee82ebb430 # The ID of the individual file.

# To download an entire mission, use the format:
# klein download --dest=. --mission={MISSION_UUID}

# To download only files matching a pattern, use the format:
# klein download --dest=. --mission={MISSION_UUID} '*.bag'
# klein download --dest=. --mission={MISSION_UUID} '*imu*'
# etc...

                                        downloading files...
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ project   ┃ mission             ┃ name                       ┃ id                        ┃ state ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ GrandTour │ 2024-11-03-07-52-45 │ 2024-11-03-07-52-45_cpt7_… │ 825939ea-3034-4b70-b3bd-… │ OK    │
└───────────┴─────────────────────┴────────────────────────────┴───────────────────────────┴───────┘
Downloading 2024-11-03-07-52-45_cpt7_raw_imu.bag: 100% 4.12k/4.12k [00:00<00:00, 7.19MB/s]


Confirm that the bag was downloaded:

In [49]:
!ls

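The `!` prefix only works inside IPython/Jupyter. Outside a notebook, the same check can be done in plain Python. This is a minimal sketch, assuming the file was downloaded to the current working directory; the helper name `list_bags` is illustrative and not part of any GrandTour or Kleinkram tooling:

```python
from pathlib import Path


def list_bags(directory="."):
    """Return the names of all .bag files in `directory`, sorted alphabetically."""
    return sorted(p.name for p in Path(directory).glob("*.bag"))


# Print every ROS bag found in the current working directory.
for name in list_bags():
    print(name)
```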



#### 💡 Notes on Downloading via `klein`
* The `--project` option is not required. Kleinkram is a general-purpose data repository for the Robotic Systems Lab, so `--project` can be used to select the **GrandTour** project, but mission UUIDs are unique, which makes it unnecessary.

* The pattern argument filters the mission's files so that you download only the ROS bag/MCAP files you want, e.g.:

 `klein download --mission {MISSION_UUID} --dest=. "*imu*"`
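To get a feel for what a pattern selects before starting a download, the matching can be tried locally. The sketch below assumes that `klein` treats patterns as shell-style wildcards, which this notebook does not confirm. It uses Python's standard `fnmatch` module as a stand-in. Only the first file name appears in this notebook; the other two are hypothetical.

```python
from fnmatch import fnmatch

# Only the first name is taken from the download output in this notebook;
# the remaining names are hypothetical and for illustration only.
file_names = [
    "2024-11-03-07-52-45_cpt7_raw_imu.bag",
    "2024-11-03-07-52-45_cpt7_raw_imu.mcap",
    "2024-11-03-07-52-45_hypothetical_camera.bag",
]


def select(pattern, names):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatch(name, pattern)]


print(select("*imu*", file_names))  # both IMU files, regardless of extension
print(select("*.bag", file_names))  # only the .bag files
```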




## Downloading from HuggingFace

We provide the entire dataset on HuggingFace in `.zarr` format, free from the overhead of the ROS ecosystem. HuggingFace has an easy-to-use Python download API called `huggingface_hub`. It is possible to download directly from the GrandTour HuggingFace repo UI, but we strongly recommend using `huggingface_hub`, as it handles file caching, resumes interrupted downloads, and fetches only the files that have been updated.

First, install `huggingface_hub`:

In [None]:
! pip install huggingface_hub




Then log in using the CLI (you will need a HuggingFace account first). This stores authentication tokens on your machine and allows you to use the API to download data.

In [None]:
! huggingface-cli login

Now you can download a mission of your choice. The next tutorial, _[1] Exploring GrandTour Data_, uses 2024-10-01-11-29-55, so we will download it here in anticipation.

In [None]:
mission = "2024-10-01-11-29-55"

In [None]:
from huggingface_hub import snapshot_download

# If the download is interrupted, simply re-run this block: huggingface_hub resumes without re-downloading files that are already complete.
hugging_face_data_cache_path = snapshot_download(repo_id="leggedrobotics/grand-tour-dataset-testing", allow_patterns=[f"{mission}/**"], repo_type="dataset")
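To see which repository paths a pattern such as `f"{mission}/**"` selects, the filtering can be reproduced locally. This sketch assumes that `allow_patterns` follows `fnmatch`-style wildcard rules. The repository paths are hypothetical and only mirror the `data`/`metadata` layout used in this notebook.

```python
from fnmatch import fnmatch

mission = "2024-10-01-11-29-55"
allow_patterns = [f"{mission}/**"]

# Hypothetical repository paths, for illustration only.
repo_paths = [
    "2024-10-01-11-29-55/data/adis_imu.tar",
    "2024-10-01-11-29-55/metadata/example.yaml",
    "2024-11-03-07-52-45/data/adis_imu.tar",
    "README.md",
]

# Keep a path if it matches at least one of the allowed patterns.
selected = [
    path
    for path in repo_paths
    if any(fnmatch(path, pattern) for pattern in allow_patterns)
]
print(selected)  # only the paths under the chosen mission folder
```

Only the paths under the chosen mission folder are kept, so other missions in the repository are not downloaded.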

The downloaded data is packaged in `.tar` files and must be extracted before it can be used. We recommend extracting to a destination of your choice outside the HuggingFace cache directory:

In [None]:
# Example path from the authors' machine: replace it with a directory of your choice.
destination_directory = f"/home/kappi/grand_tour_data/{mission}"

Define a `.tar` extractor helper function:

In [None]:
import os
import shutil
import tarfile

def recreate_structure_and_extract(cache_dir, output_dir):
    # Ensure output_dir exists
    os.makedirs(output_dir, exist_ok=True)
    
    # Loop over subdirectories in cache_dir
    for subdir in os.listdir(cache_dir):
        subdir_path = os.path.join(cache_dir, subdir)
        target_subdir = os.path.join(output_dir, subdir)
        
        if os.path.isdir(subdir_path):
            # Create the corresponding subdirectory in output_dir
            os.makedirs(target_subdir, exist_ok=True)
            
            # If it's the metadata folder, just copy all files (like YAMLs)
            if subdir == "metadata":
                for filename in os.listdir(subdir_path):
                    src_file = os.path.join(subdir_path, filename)
                    dst_file = os.path.join(target_subdir, filename)
                    shutil.copy2(src_file, dst_file)
            else:
                # For folders like data and images, process tar files
                for filename in os.listdir(subdir_path):
                    src_file = os.path.join(subdir_path, filename)
                    if filename.endswith(".tar"):
                        # Extract the tar file into the target_subdir
                        with tarfile.open(src_file) as tar:
                            tar.extractall(path=target_subdir)
                    else:
                        # If there are non-tar files that need copying, handle them here
                        shutil.copy2(src_file, target_subdir)


And extract the files:

In [None]:
cache_dir = hugging_face_data_cache_path + f"/{mission}/"
recreate_structure_and_extract(cache_dir, destination_directory)
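A quick sanity check after extraction is to count the entries in each top-level folder of the destination. This is a minimal sketch; the helper name `summarize_extraction` is illustrative, and the folders you see depend on what the mission actually contains:

```python
from pathlib import Path


def summarize_extraction(output_dir):
    """Map each top-level subfolder of `output_dir` to its number of entries."""
    root = Path(output_dir)
    return {
        sub.name: sum(1 for _ in sub.iterdir())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }
```

For example, `print(summarize_extraction(destination_directory))` lists the extracted folders together with the number of entries in each. An empty folder suggests that the corresponding `.tar` files were not downloaded or extracted.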

You should now be able to load the dataset in `.zarr` format and inspect the contents:

In [None]:
import zarr
import zarr.storage

# The /data folder contains the actual data files, while the /metadata folder contains static data like TFs and calibration. Images are stored in /images.
store = zarr.storage.LocalStore(destination_directory + "/data")
root = zarr.group(store=store)
mission_root = zarr.open_group(destination_directory + "/data", mode='r')

# Take a look at the available data
print(mission_root.tree())

/
├── adis_imu
│   ├── ang_vel (69333, 3) float64
│   ├── ang_vel_cov (69333, 3, 3) float64
│   ├── lin_acc (69333, 3) float64
│   ├── lin_acc_cov (69333, 3, 3) float64
│   ├── orien (69333, 4) float64
│   ├── orien_cov (69333, 3, 3) float64
│   ├── sequence_id (69333,) uint64
│   └── timestamp (69333,) uint64
├── alphasense_cam1
│   ├── sequence_id (3588,) uint64
│   └── timestamp (3588,) uint64
├── alphasense_cam2
│   ├── sequence_id (3587,) uint64
│   └── timestamp (3587,) uint64
├── alphasense_cam3
│   ├── sequence_id (3588,) uint64
│   └── timestamp (3588,) uint64
├── alphasense_cam4
│   ├── sequence_id (3588,) uint64
│   └── timestamp (3588,) uint64
├── alphasense_cam5
│   ├── sequence_id (3588,) uint64
│   └── timestamp (3588,) uint64
├── ap20_imu
│   ├── ang_vel (68029, 3) float64
│   ├── ang_vel_cov (68029, 3, 3) float64
│   ├── lin_acc (68029, 3) float64
│   ├── lin_acc_cov (68029, 3, 3) float64
│   ├── orien (68029, 4) float64
│   ├── orien_cov (68029, 3, 3) float64
│   ├── 