# Getting Started with Keypoint Estimation: Loading Keypoint Data

## Who this is for
This tutorial is designed for:
- ML engineers working with keypoint detection/pose estimation
- Those new to FiftyOne or with basic familiarity
- Practitioners looking to organize and visualize keypoint datasets
- Anyone building systems for keypoint estimation

## Assumed Knowledge
You should be familiar with:
- Basic computer vision concepts (keypoints, pose estimation)
- Common dataset formats (COCO)
- Python programming fundamentals
- Basic FiftyOne concepts

## Time to complete
- 20-30 minutes to work through the examples
- Additional time if downloading the full datasets

## Required packages
We recommend using a virtual environment with FiftyOne already installed. If you haven't installed FiftyOne yet, follow the [installation guide](https://beta-docs.voxel51.com/getting_started/basic/install/).

Additional required packages:
```bash
pip install pycocotools opencv-python numpy
```

## Content Overview

This tutorial has three sections:

- Loading Keypoint Data in Common Format: Working with standardized COCO-format keypoint data

- Connecting Keypoint Edges: Creating skeleton structures to visualize keypoint relationships

- Loading Keypoints with Custom Format: Converting custom keypoint formats into FiftyOne's structure

### Loading Keypoint Data in Common Format

Start by downloading the TAMPAR dataset from [Zenodo](https://zenodo.org/records/10057090). Alternatively, you can download it programmatically, for example by running the following command in your terminal (note that this is a 6GB download):

```bash
wget -O tampar.zip "https://zenodo.org/records/10057090/files/tampar.zip?download=1"
```


This dataset is in [COCO format](https://beta-docs.voxel51.com/fiftyone_concepts/dataset_creation/datasets/#cocodetectiondataset), so we can [load it into FiftyOne automatically](https://beta-docs.voxel51.com/fiftyone_concepts/dataset_creation/#common-formats). For this we need `pycocotools` installed:

In [None]:
!pip install pycocotools

The code below loads the TAMPAR dataset using FiftyOne's [data loaders for common formats](https://beta-docs.voxel51.com/fiftyone_concepts/dataset_creation/datasets/#supported-import-formats). Here's what's happening:


- **[`fo.Dataset.from_dir()`](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#from_dir)**: This method imports the TAMPAR dataset in COCO format directly from disk. It accepts several important parameters:
  - `dataset_type=fo.types.COCODetectionDataset`: Specifies we're loading data in [COCO object detection format](https://beta-docs.voxel51.com/api/fiftyone.types.dataset_types.COCODetectionDataset.html)
  - `data_path` and `labels_path`: Point to the directory containing the images and to the annotation JSON file, respectively
  - `include_id=True`: Preserves the original COCO IDs
  - `name="TAMPAR"`: Gives our dataset a [descriptive name](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#name) in FiftyOne
  - `persistent=True`: [Persists](https://beta-docs.voxel51.com/fiftyone_concepts/using_datasets/#dataset-persistence) the dataset to disk for future sessions

- **[`compute_metadata()`](https://beta-docs.voxel51.com/api/fiftyone.core.collections.SampleCollection.html#compute_metadata)**: After loading, this method analyzes all images to extract and store metadata like dimensions, file sizes, and other properties that enhance the dataset's functionality within FiftyOne.


**Note:** If your annotations are in VOC format, you can use the [`VOCDetectionDataset`](https://beta-docs.voxel51.com/fiftyone_concepts/export_datasets/#vocdetectiondataset) type to import your dataset by passing [`fo.types.VOCDetectionDataset`](https://beta-docs.voxel51.com/api/fiftyone.types.html#fiftyone.types.VOCDetectionDataset) into the `dataset_type` argument.
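
Before running the import, it can help to confirm that the annotation file really contains keypoint annotations. The following is a minimal sketch, not part of FiftyOne: `summarize_coco_keypoints` is a hypothetical helper, and the small `example` dictionary is hand-written to stand in for a real COCO file.

```python
from collections import Counter


def summarize_coco_keypoints(coco):
    """Summarize the keypoint annotations in a COCO-style dictionary."""
    category_names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    per_category = Counter()
    malformed = 0
    for annotation in coco.get("annotations", []):
        flat = annotation.get("keypoints")
        if not flat:
            continue  # this annotation carries no keypoints
        if len(flat) % 3 != 0:
            malformed += 1  # COCO stores (x, y, visibility) triplets
            continue
        name = category_names.get(annotation["category_id"], "unknown")
        per_category[name] += 1
    return {
        "num_images": len(coco.get("images", [])),
        "keypoint_annotations": dict(per_category),
        "malformed": malformed,
    }


# Hand-written stand-in for a real annotation file (illustrative values only)
example = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "normal box"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1,
         "keypoints": [320, 240, 2, 160, 120, 2]},
        {"id": 11, "image_id": 1, "category_id": 1, "keypoints": [1, 2]},
    ],
}

summary = summarize_coco_keypoints(example)
print(summary)
```

To run the same check on the real annotations, load `tampar/tampar_test.json` with `json.load()` and pass the resulting dictionary to the helper.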


In [1]:
import fiftyone as fo

tampar_dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path="tampar",
    labels_path="tampar/tampar_test.json",
    include_id=True,
    name="TAMPAR",
    persistent=True,
)

tampar_dataset.compute_metadata()

 100% |█████████████████| 485/485 [26.2s elapsed, 0s remaining, 20.4 samples/s]      


The `keypoints` field is automatically parsed as [Keypoint labels](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoint.html) in the Dataset. 

[Keypoints](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoints.html) in FiftyOne represent **collections of coordinate points that mark specific locations in an image**. These can be used for localizing important features like facial landmarks, human pose joints, or in our case, the corners of parcel boxes. Each [Keypoint label](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoint.html) contains:

* A list of points as (x, y) coordinates normalized to the [0, 1] range
* Optional confidence scores for each point
* A semantic label describing what these points represent
* Optional attributes for storing additional metadata

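Each annotation in this dataset has 8 keypoints, which COCO stores as 24 values (one x, y, visibility triplet per point). All of them have visibility `2`, which in COCO means "labeled and visible".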
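
To make the relationship between COCO's flat triplets and FiftyOne's normalized points concrete, here is a rough sketch of the conversion. It illustrates the idea only and is not FiftyOne's importer code. `split_coco_keypoints` is a hypothetical helper, and the pixel values are made up.

```python
def split_coco_keypoints(flat, width, height):
    """Split a flat COCO keypoint list into normalized points and visibility flags."""
    if len(flat) % 3 != 0:
        raise ValueError("expected (x, y, visibility) triplets")
    points, visibility = [], []
    for start in range(0, len(flat), 3):
        x, y, v = flat[start:start + 3]
        # Divide pixel coordinates by the image size to land in [0, 1]
        points.append([x / width, y / height])
        visibility.append(v)
    return points, visibility


# Two made-up keypoints on a 640x480 image
points, visibility = split_coco_keypoints(
    [320, 240, 2, 160, 120, 2], width=640, height=480
)
print(points)      # [[0.5, 0.5], [0.25, 0.25]]
print(visibility)  # [2, 2]
```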

Notice that the field we parsed (named `keypoints`) is a FiftyOne [Keypoints](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoints.html) label, and the entire collection of points that make up the box is parsed as a single [Keypoint label](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoint.html). This is useful when the points are semantically related and should be handled together.

In [2]:
tampar_dataset.first()['keypoints']

<Keypoints: {
    'keypoints': [
        <Keypoint: {
            'id': '67d452a8a7449f0b9f1841c9',
            'attributes': {},
            'tags': [],
            'label': 'normal box',
            'points': [
                [0.5589583333333333, 0.6946957671957673],
                [0.14292162698412697, 0.4266931216931217],
                [0.09013640873015873, 0.21361772486772487],
                [0.5546006944444445, 0.9186904761904762],
                [0.898640873015873, 0.31390873015873016],
                [0.4302876984126984, 0.27409391534391536],
                [0.43548611111111113, 0.06962632275132276],
                [0.8419171626984128, 0.5316666666666666],
            ],
            'confidence': None,
            'index': None,
            'visible': [2, 2, 2, 2, 2, 2, 2, 2],
            'supercategory': 'box',
            'iscrowd': 0,
            'num_keypoints': 8,
            'occluded': False,
        }>,
    ],
}>
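
Because the points are stored in normalized form, you need the image dimensions to get back to pixel coordinates. The sketch below assumes you read the width and height from `sample.metadata.width` and `sample.metadata.height`, which `compute_metadata()` populates. `to_pixels` is a hypothetical helper, and the numbers are illustrative rather than taken from TAMPAR.

```python
def to_pixels(points, width, height):
    """Map normalized [x, y] points back to integer pixel coordinates."""
    return [(round(x * width), round(y * height)) for x, y in points]


# Illustrative normalized points on a 640x480 image
normalized = [[0.5, 0.25], [0.125, 0.75]]
pixels = to_pixels(normalized, width=640, height=480)
print(pixels)  # [(320, 120), (80, 360)]
```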

Now that the Dataset is loaded, we can [launch the app](https://beta-docs.voxel51.com/getting_started/basic/application_tour/) and inspect it.

```python
fo.launch_app(tampar_dataset)
```

<img src="assets/tampar.webp" width="70%">

# Connecting Keypoint Edges in FiftyOne

When combined with a [`KeypointSkeleton`](https://beta-docs.voxel51.com/api/fiftyone.core.odm.dataset.KeypointSkeleton.html), these individual points form a connected structure (like our box wireframe) that visually represents the spatial relationships between points.

In our parcel detection dataset, we use keypoints to mark the 8 corners of each box, allowing us to reconstruct the 3D geometry of parcels from 2D images.

## The Skeleton Structure

All [Dataset](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset) instances have [`skeletons`](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#skeletons) and [`default_skeleton`](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#default_skeleton) properties that you can use to store keypoint skeletons for Keypoint field(s) of a dataset.

To visualize these boxes correctly in FiftyOne, we define a skeleton that connects these keypoints with edges. Our [`KeypointSkeleton`](https://beta-docs.voxel51.com/api/fiftyone.core.odm.dataset.KeypointSkeleton.html) defines:

1. **Labels**: Semantic names for each keypoint (e.g., "front_upper_right", "back_lower_left")
2. **Edges**: Connections between pairs of keypoints that form the wireframe of the 3D box

## Edge Connections

The edges are organized to represent the physical structure of the keypoints:

- **Front Face**: Connects the four front corners in a clockwise or counter-clockwise order

- **Top Face**: Connects the four top corners (front-upper-left → back-upper-left → back-upper-right → front-upper-right)

- **Right Side**: Connects the front and back edges on the right side of the box

- **Additional Edges**: Completes the remaining connections to form a full box

## Implementation

In FiftyOne, we implement this by creating a [`KeypointSkeleton`](https://beta-docs.voxel51.com/api/fiftyone.core.odm.dataset.KeypointSkeleton.html) object:


In [3]:
import fiftyone as fo

labels = [
    "front_upper_right",    # 0
    "front_lower_left",     # 1
    "front_upper_left",     # 2
    "front_lower_right",    # 3
    "back_upper_right",     # 4
    "back_lower_left",      # 5
    "back_upper_left",      # 6
    "back_lower_right"      # 7
]

# Complete set of 12 edges to create a full 3D box
edges = [
    # Front face - complete square
    [2, 0],  # front upper-left to front upper-right
    [0, 3],  # front upper-right to front lower-right
    [3, 1],  # front lower-right to front lower-left
    [1, 2],  # front lower-left to front upper-left

    # Top face (its front edge is already listed above)
    [2, 6],  # front upper-left to back upper-left
    [6, 4],  # back upper-left to back upper-right
    [4, 0],  # back upper-right to front upper-right

    # Right side face (its front and top edges are already listed above)
    [4, 7],  # back upper-right to back lower-right
    [7, 3],  # back lower-right to front lower-right

    # Remaining edges through the back lower-left corner
    [1, 5],  # front lower-left to back lower-left
    [5, 6],  # back lower-left to back upper-left
    [5, 7],  # back lower-left to back lower-right
]


tampar_dataset.default_skeleton = fo.KeypointSkeleton(
    labels=labels,
    edges=edges
)

tampar_dataset.save()



This skeleton definition ensures that when our 3D box keypoints are visualized in FiftyOne, the lines connecting them accurately represent the box's structure, making it easy to interpret the detection results at a glance.
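
Hand-written label and edge lists are easy to get subtly wrong, so a quick sanity check before assigning the skeleton can save debugging time in the App. The sketch below is a hypothetical helper, not a FiftyOne API. It assumes every edge is a pair of indices, as in this tutorial, and uses a deliberately broken toy skeleton.

```python
def check_skeleton(labels, edges):
    """Return a list of problems found in a skeleton definition."""
    problems = []
    duplicates = sorted({name for name in labels if labels.count(name) > 1})
    if duplicates:
        problems.append(f"duplicate labels: {duplicates}")
    seen = set()
    for a, b in edges:
        if not (0 <= a < len(labels) and 0 <= b < len(labels)):
            problems.append(f"edge [{a}, {b}] is out of range")
        key = frozenset((a, b))  # treat [a, b] and [b, a] as the same edge
        if key in seen:
            problems.append(f"edge [{a}, {b}] is repeated")
        seen.add(key)
    return problems


# Toy skeleton with three deliberate mistakes
toy_labels = ["a", "b", "c", "c"]
toy_edges = [[0, 1], [1, 0], [2, 4]]
for problem in check_skeleton(toy_labels, toy_edges):
    print(problem)
```

An empty result means the check found no duplicate labels, no repeated edges, and no out-of-range indices. It does not verify that the edges form the shape you intended.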


```python
fo.launch_app(tampar_dataset)
```

<img src="assets/tampar-skeletons.webp" width="70%">

## Loading Keypoints with Custom Format

We'll use the [_Hand Keypoint Detection in Single Images using Multiview Bootstrapping_](http://domedb.perception.cs.cmu.edu/handdb.html) dataset. You can download it from http://domedb.perception.cs.cmu.edu/panopticDB/hands/hand_labels.zip

The specifics of your own dataset may differ, but the core pattern for parsing custom keypoints remains the same:

### 1. Create an empty Dataset

The [`fo.Dataset()`](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html) constructor creates a new FiftyOne dataset named "hand_keypoints". The `overwrite=True` parameter ensures any existing dataset with the same name is replaced.

### 2. Create Samples to populate the Dataset with
The [`fo.Sample()`](https://beta-docs.voxel51.com/api/fiftyone.core.sample.Sample.html) class is the fundamental unit in FiftyOne. Each sample represents one data point (in this case, an image) along with its metadata and annotations. Here we create samples pointing to image files on disk.

### 3. Parse Keypoints for each Sample

The [`fo.Keypoint()`](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoint.html) class represents a single labeled point or a collection of points in an image. In this code, we're using it to represent all joints of a hand with:
- `label`: Identifies this as a "left hand" or "right hand"
- `points`: The normalized (0-1) coordinates of each joint
- `num_keypoints`: The total number of joints in the hand

### 4. Add the parsed Keypoint to a Keypoints Field
The [`fo.Keypoints()`](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoints.html) class is a collection that can contain multiple `Keypoint` objects. In our case, we're creating a collection with just one keypoint (which itself contains all the joint points).

### 5. Adding Fields to Samples
FiftyOne samples are dynamic - we can add custom fields to them. Here we add a field named either "left_hand" or "right_hand" containing the keypoints collection.

### 6. Add Samples to Dataset
Rather than adding samples one at a time, which would be inefficient, we collect all samples in a list and add them to the dataset in a single batch operation using [`add_samples()`](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#add_samples).


The FiftyOne Keypoints structure has an important hierarchical design that's reflected in our code:

- A [`Keypoints`](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoints.html) object is a collection container that holds multiple `Keypoint` objects.

- Each [`Keypoint`](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoint.html) object represents a semantically meaningful group of points (like a human pose or hand), not a single point.

In our implementation, we store all joints of a hand/body as a single [`Keypoint`](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoint.html) object with multiple coordinate points rather than creating separate [`Keypoint`](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoint.html)  objects for each individual joint. 

This approach is ideal when:

1. The points collectively represent a unified structure (like a hand or full body pose)
2. The relationships between points matter (which we visualize using a skeleton)
3. All points share a common label (e.g., "left hand" or "body")

For different use cases where points aren't semantically related (such as general point tracking or scattered interest points), you would create individual [`Keypoint`](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoint.html)  objects for each independent point instead.

In [None]:
import os
import json
import glob
import numpy as np
import cv2
import fiftyone as fo

# Input data path for labeled hand images
dataset_path = 'hand_labels/manual_test/'

# Get all annotation JSON files
annotation_files = glob.glob(os.path.join(dataset_path, "*.json"))

# Create a FiftyOne dataset
hand_dataset = fo.Dataset(
    "hand_keypoints",
    overwrite=True,
    persistent=True
    )

# Create a list to collect all samples
samples_to_add = []

# Process each annotation file
for annotation_file in annotation_files:
    # Get corresponding image file
    image_file = annotation_file.replace(".json", ".jpg")
    
    # Load the annotation data
    with open(annotation_file, 'r') as file_handle:
        annotation_data = json.load(file_handle)
    
    # Create a sample for this image
    sample = fo.Sample(filepath=image_file)

    # Get image dimensions for normalization
    image = cv2.imread(image_file)
    if image is None:
        # cv2.imread returns None for missing or unreadable images
        continue
    image_height, image_width = image.shape[:2]
    
    # Process hand keypoints if present
    if 'hand_pts' in annotation_data:
        # Extract hand joint coordinates and hand type
        joint_coordinates = np.array(annotation_data['hand_pts'])
        is_left_hand = annotation_data['is_left']
        
        # Create a list of all normalized keypoints for the hand
        normalized_joint_points = []
        
        for joint_index in range(joint_coordinates.shape[0]):
            # Normalize coordinates to [0, 1] range
            normalized_x = joint_coordinates[joint_index, 0] / image_width
            normalized_y = joint_coordinates[joint_index, 1] / image_height
            
            normalized_joint_points.append([normalized_x, normalized_y])
        
        # Determine field name and label based on hand type
        hand_field_name = "left_hand" if is_left_hand else "right_hand"
        hand_label = "left hand" if is_left_hand else "right hand"
        
        # Create a keypoint object containing all joint points
        hand_keypoint = fo.Keypoint(
            label=hand_label,
            points=normalized_joint_points,
            num_keypoints=len(normalized_joint_points),
        )
        
        # Create keypoints collection
        hand_keypoints_collection = fo.Keypoints(keypoints=[hand_keypoint])
        
        # Add keypoints to the sample
        sample[hand_field_name] = hand_keypoints_collection
    
    # Process body keypoints if present
    if 'mpii_body_pts' in annotation_data:
        # Extract body joint coordinates
        body_coordinates = np.array(annotation_data['mpii_body_pts'])
        
        # Create a list of all normalized keypoints for the body
        normalized_body_points = []
        
        for joint_index in range(body_coordinates.shape[0]):
            # Normalize coordinates to [0, 1] range
            normalized_x = body_coordinates[joint_index, 0] / image_width
            normalized_y = body_coordinates[joint_index, 1] / image_height
            
            normalized_body_points.append([normalized_x, normalized_y])
        
        # Create a keypoint object containing all body joint points
        body_keypoint = fo.Keypoint(
            label="body",
            points=normalized_body_points,
            num_keypoints=len(normalized_body_points),
        )
        
        # Create keypoints collection 
        body_keypoints_collection = fo.Keypoints(keypoints=[body_keypoint])
        
        # Add keypoints to the sample
        sample["body"] = body_keypoints_collection
    
    # Add sample to our list instead of directly to dataset
    samples_to_add.append(sample)

# Add all samples to the dataset in a single batch operation
hand_dataset.add_samples(samples_to_add)

# Save the dataset
hand_dataset.save()

Recapping the code above, we:

1. Located all JSON annotation files in the specified directory
2. Created a new FiftyOne dataset
3. For each annotation file:
   - Loaded the corresponding image and JSON data
   - Extracted the hand and body joint coordinates
   - Normalized the coordinates to the [0, 1] range
   - Created a FiftyOne Keypoint object with all the joint points
   - Created a Keypoints collection containing this keypoint
   - Added the collections to the sample in the "left_hand" or "right_hand" field and in the "body" field
   - Added the sample to a list
4. Added all samples to the dataset in one efficient batch operation
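
The hand and body branches repeat the same normalization loop, so one possible refactor is to pull it into a small helper. This is a sketch under assumptions: `normalize_points` is a hypothetical name, and the trailing third value in each row (a visibility flag) is an assumption about the annotation layout rather than something the code above relies on.

```python
def normalize_points(raw_points, image_width, image_height):
    """Convert pixel rows [x, y, ...] to normalized [x, y] pairs."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return [
        # Only the first two entries of each row are used
        [float(row[0]) / image_width, float(row[1]) / image_height]
        for row in raw_points
    ]


# Made-up rows: x, y, and a trailing flag that the helper ignores
raw = [[960.0, 540.0, 1], [480.0, 270.0, 1]]
normalized = normalize_points(raw, image_width=1920, image_height=1080)
print(normalized)  # [[0.5, 0.5], [0.25, 0.25]]
```

Returning plain Python floats keeps the result independent of NumPy types. The same helper would serve both the `hand_pts` and `mpii_body_pts` arrays.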

Let's inspect the keypoints in the `body` field:

In [6]:
hand_dataset.first()['body']

<Keypoints: {
    'keypoints': [
        <Keypoint: {
            'id': '67d45301a7449f0b9f18495d',
            'attributes': {},
            'tags': [],
            'label': 'body',
            'points': [
                [0.46510416666666665, 0.7435185185185185],
                [0.45729166666666665, 0.35555555555555557],
                [0.4548357963562012, 0.2400996172869647],
                [0.45037253697713214, 0.03027074248702438],
                [0.5109375, 0.8972222222222223],
                [0.3125, 0.7851851851851852],
                [0.4125, 0.7472222222222222],
                [0.5177083333333333, 0.7388888888888889],
                [0.6838541666666667, 0.7472222222222222],
                [0.46927083333333336, 0.8796296296296297],
                [0.42239583333333336, 0.3398148148148148],
                [0.4078125, 0.5583333333333333],
                [0.365625, 0.3425925925925926],
                [0.5484375, 0.3685185185185185],
                [0.553125, 0.621296

As seen before, we can define the connectivity between keypoints. In this example, we set the [`skeletons`](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#skeletons) property to define the semantic labels and point connectivity of the hands, while leaving the body without a skeleton.

In [2]:
# Define the keypoint labels for hand joints
hand_joint_labels = [
    "WRIST", "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_MCP", "INDEX_PIP", "INDEX_DIP", "INDEX_TIP",
    "MIDDLE_MCP", "MIDDLE_PIP", "MIDDLE_DIP", "MIDDLE_TIP",
    "RING_MCP", "RING_PIP", "RING_DIP", "RING_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP"
]

# Define the hand skeleton connections
hand_joint_connections = [
    # Original finger connections
    [0, 1], [1, 2], [2, 3], [3, 4],       # Thumb
    [0, 5], [5, 6], [6, 7], [7, 8],       # Index finger
    [0, 9], [9, 10], [10, 11], [11, 12],  # Middle finger
    [0, 13], [13, 14], [14, 15], [15, 16], # Ring finger
    [0, 17], [17, 18], [18, 19], [19, 20], # Pinky finger
    
    # Knuckle connections (MCP joints)
    [2, 5],    # Thumb MCP to Index MCP
    [5, 9],    # Index MCP to Middle MCP
    [9, 13],   # Middle MCP to Ring MCP
    [13, 17]   # Ring MCP to Pinky MCP
]


# Create the hand skeleton definition
hand_skeleton = fo.KeypointSkeleton(
    labels=hand_joint_labels,
    edges=hand_joint_connections
)

hand_dataset.skeletons = {
    "right_hand": hand_skeleton,
    "left_hand": hand_skeleton,
}

# Save the dataset
hand_dataset.save()
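
The finger chains in the skeleton above follow a regular pattern, so they can also be generated instead of typed by hand. The following sketch uses a hypothetical helper, `build_hand_skeleton`, that reproduces the 21 labels and the 20 wrist-to-fingertip edges. The four knuckle connections are left out and would still need to be appended manually.

```python
FINGERS = ["THUMB", "INDEX", "MIDDLE", "RING", "PINKY"]


def build_hand_skeleton():
    """Generate hand joint labels and the wrist-to-fingertip edge chains."""
    labels = ["WRIST"]
    edges = []
    for finger in FINGERS:
        # The thumb uses different joint names than the other fingers
        if finger == "THUMB":
            joints = ["CMC", "MCP", "IP", "TIP"]
        else:
            joints = ["MCP", "PIP", "DIP", "TIP"]
        previous = 0  # every finger chain starts at the wrist (index 0)
        for joint in joints:
            labels.append(f"{finger}_{joint}")
            current = len(labels) - 1
            edges.append([previous, current])
            previous = current
    return labels, edges


generated_labels, generated_edges = build_hand_skeleton()
print(len(generated_labels), len(generated_edges))  # 21 20
print(generated_edges[:4])  # [[0, 1], [1, 2], [2, 3], [3, 4]]
```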

We can now inspect the parsed Dataset in the FiftyOne App:

```python
fo.launch_app(hand_dataset)
```

<img src="assets/hands-dataset.webp" width="70%">

## Summary

In this tutorial, you learned how to work with keypoint data in FiftyOne through three main approaches:

1. **Loading Standard Format Data**
   - Used [FiftyOne's built-in COCO format support](https://beta-docs.voxel51.com/fiftyone_concepts/dataset_creation/) to load the TAMPAR dataset
   - Learned how keypoints are automatically parsed into FiftyOne's data structure
   - Explored the relationship between [Keypoints](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoints.html#keypoints) collections and individual [Keypoint](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoint.html) labels

2. **Creating Keypoint Skeletons**
   - Defined semantic relationships between keypoints using [KeypointSkeleton](https://beta-docs.voxel51.com/api/fiftyone.core.odm.dataset.KeypointSkeleton.html)
   - Created edge connections to visualize 3D structures from 2D keypoints
   - Learned how to customize skeleton visualization for different use cases

3. **Working with Custom Formats**
   - Implemented a complete workflow for loading custom keypoint data
   - Normalized coordinate systems for consistent representation
   - Organized related keypoints into meaningful structures
   - Handled multiple keypoint types (hands and body) in the same dataset

### Key Takeaways

- FiftyOne provides flexible tools for working with keypoint data, whether in standard formats or custom annotations

- The hierarchical Keypoints → Keypoint structure helps organize related points meaningfully

- Skeleton definitions turn abstract point collections into interpretable visualizations

- Batch operations (like [`add_samples()`](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#add_samples)) help maintain efficiency when working with large datasets

### What's Next?

You're now equipped to:
- Load and visualize your own keypoint datasets
- Create custom skeleton definitions for your specific use cases
- Convert various keypoint formats into FiftyOne's structure
- Build more complex applications using FiftyOne's keypoint capabilities


### Next steps

Use the resources below to continue your journey with FiftyOne and computer vision:

* Check out these pose estimation datasets on the Hugging Face Hub:
  * [DensePose-COCO](https://huggingface.co/datasets/Voxel51/DensePose-COCO)
  * [MPII_Human_Pose_Dataset](https://huggingface.co/datasets/Voxel51/MPII_Human_Pose_Dataset)

* Read more about [Creating Pose Skeletons from Scratch](https://voxel51.com/blog/creating-pose-skeletons-from-scratch-fiftyone-tips-and-tricks-sep-15-2023/)

* Learn about working with [Detections](https://beta-docs.voxel51.com/how_do_i/recipes/adding_detections/)

* Learn about working with [Segmentations](https://beta-docs.voxel51.com/fiftyone_concepts/using_datasets/#semantic-segmentation)

* Join the [Discord community](https://community.voxel51.com/)

* Follow us on [LinkedIn](https://www.linkedin.com/company/voxel51/)