# Getting Started with Keypoint Estimation: Loading Keypoint Data



### Loading Keypoint Data in Common Format

Start by downloading the TAMPAR dataset from [Zenodo](https://zenodo.org/records/10057090). You can also download it programmatically, for example by running the following command in your terminal (note: this is a 6 GB download):

```bash
# Quote the URL so the shell does not interpret "?", and save it as tampar.zip
wget -O tampar.zip "https://zenodo.org/records/10057090/files/tampar.zip?download=1"
```


This dataset is in [COCO format](https://beta-docs.voxel51.com/fiftyone_concepts/dataset_creation/datasets/#cocodetectiondataset), so we can [automatically load it into FiftyOne](https://beta-docs.voxel51.com/fiftyone_concepts/dataset_creation/#common-formats). For this we need to install `pycocotools`:

In [None]:
!pip install pycocotools

This code demonstrates how to load the TAMPAR dataset using FiftyOne's [data loaders for common formats](https://beta-docs.voxel51.com/fiftyone_concepts/dataset_creation/datasets/#supported-import-formats). Here's what's happening:


- **[`fo.Dataset.from_dir()`](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#from_dir)**: This method imports the TAMPAR dataset in COCO format directly from disk. It accepts several important parameters:
  - `dataset_type=fo.types.COCODetectionDataset`: Specifies we're loading data in [COCO object detection format](https://beta-docs.voxel51.com/api/fiftyone.types.dataset_types.COCODetectionDataset.html)
  - `data_path` and `labels_path`: Point to the directory containing the images and to the annotation JSON file, respectively
  - `include_id=True`: Preserves the original COCO IDs
  - `name="TAMPAR"`: Gives our dataset a [descriptive name](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#name) in FiftyOne
  - `persistent=True`: [Persists](https://beta-docs.voxel51.com/fiftyone_concepts/using_datasets/#dataset-persistence) the dataset to disk for future sessions

- **[`compute_metadata()`](https://beta-docs.voxel51.com/api/fiftyone.core.collections.SampleCollection.html#compute_metadata)**: After loading, this method analyzes all images to extract and store metadata like dimensions, file sizes, and other properties that enhance the dataset's functionality within FiftyOne.


**Note:** If your annotations are in VOC format, you can use the [`VOCDetectionDataset`](https://beta-docs.voxel51.com/fiftyone_concepts/export_datasets/#vocdetectiondataset) type to import your dataset by passing [`fo.types.VOCDetectionDataset`](https://beta-docs.voxel51.com/api/fiftyone.types.html#fiftyone.types.VOCDetectionDataset) into the `dataset_type` argument.


In [None]:
import fiftyone as fo

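tampar_dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path="tampar",
    labels_path="tampar/tampar_test.json",
    # The COCO importer loads only detections by default, so request
    # keypoints explicitly; they are stored in the "keypoints" field
    label_types=["detections", "keypoints"],
    include_id=True,
    name="TAMPAR",
    persistent=True,
)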

tampar_dataset.compute_metadata()

The keypoints are automatically parsed as [Keypoint labels](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoint.html) in the Dataset. 

A Keypoint in FiftyOne represents a collection of coordinate points that mark specific locations in an image. These can be used for localizing important features like facial landmarks, human pose joints, or in our case, the corners of parcel boxes. Each [Keypoint label](https://beta-docs.voxel51.com/api/fiftyone.core.labels.Keypoint.html) contains:

* A set of points as (x,y) coordinates normalized to [0,1] range
* Optional confidence scores for each point
* A semantic label describing what these points represent
* Optional attributes for storing additional metadata

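Each parcel annotation in this dataset has 8 keypoints. COCO stores every keypoint as an `(x, y, visibility)` triplet, which gives 24 values in total, and all of them have `visibility=2` (which means "visible").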
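
To make the triplet layout concrete, here is a minimal sketch of how a flat COCO keypoint list maps to normalized points and visibility flags. The helper name `split_coco_keypoints` and the sample values are made up for illustration; this is not FiftyOne's importer code, just the idea behind it.

```python
def split_coco_keypoints(flat_keypoints, image_width, image_height):
    """Split a flat COCO keypoint list into normalized points and visibility flags.

    Hypothetical helper for illustration only.
    """
    if len(flat_keypoints) % 3 != 0:
        raise ValueError("COCO keypoints must come in (x, y, visibility) triplets")

    points = []
    visibility = []
    for i in range(0, len(flat_keypoints), 3):
        x, y, v = flat_keypoints[i:i + 3]
        # Pixel coordinates become fractions of the image size
        points.append((x / image_width, y / image_height))
        visibility.append(v)
    return points, visibility


# Made-up example: two corners in a 200x100 pixel image
flat = [50, 25, 2, 150, 75, 2]
points, visibility = split_coco_keypoints(flat, image_width=200, image_height=100)
print(points)      # [(0.25, 0.25), (0.75, 0.75)]
print(visibility)  # [2, 2]
```

A list of 24 values therefore yields 8 points, one per box corner.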

In [None]:
tampar_dataset.first()['keypoints']

Now that the Dataset is loaded, we can [launch the app](https://beta-docs.voxel51.com/getting_started/basic/application_tour/) and inspect it.

```python
fo.launch_app(tampar_dataset)
```

<img src="assets/tampar.gif">

# Connecting Keypoint Edges in FiftyOne

When combined with a [`KeypointSkeleton`](https://beta-docs.voxel51.com/api/fiftyone.core.odm.dataset.KeypointSkeleton.html), these individual points form a connected structure (like our box wireframe) that visually represents the spatial relationships between points.

In our parcel detection dataset, we use keypoints to mark the 8 corners of each box, allowing us to reconstruct the 3D geometry of parcels from 2D images.

## The Skeleton Structure

All [Dataset](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset) instances have [`skeletons`](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#skeletons) and [`default_skeleton`](https://beta-docs.voxel51.com/api/fiftyone.core.dataset.Dataset.html#default_skeleton) properties that you can use to store keypoint skeletons for Keypoint field(s) of a dataset.

To visualize these boxes correctly in FiftyOne, we define a skeleton that connects these keypoints with edges. Our [`KeypointSkeleton`](https://beta-docs.voxel51.com/api/fiftyone.core.odm.dataset.KeypointSkeleton.html) defines:

1. **Labels**: Semantic names for each keypoint (e.g., "front_upper_right", "back_lower_left")
2. **Edges**: Connections between pairs of keypoints that form the wireframe of the 3D box

## Edge Connections

The edges are organized to represent the physical structure of the box:

- **Front Face**: Connects the four front corners in a clockwise or counter-clockwise order

- **Top Face**: Connects the four upper corners (front-upper-left → back-upper-left → back-upper-right → front-upper-right)

- **Right Side**: Connects the front and back edges on the right side of the box

- **Additional Edges**: Completes the remaining connections to form a full box
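
A quick sanity check for any box skeleton: a cuboid wireframe has exactly 12 distinct edges, and every corner touches exactly 3 of them. The sketch below checks an edge list against those two properties. `check_box_edges` is a hypothetical helper, and the index layout in the example is an assumption chosen for illustration.

```python
from collections import Counter


def check_box_edges(edges, num_corners=8):
    """Return the set of unique undirected edges and the degree of each corner."""
    unique_edges = set()
    for a, b in edges:
        if not (0 <= a < num_corners and 0 <= b < num_corners):
            raise ValueError(f"edge [{a}, {b}] references a missing keypoint")
        if a == b:
            raise ValueError(f"edge [{a}, {b}] connects a keypoint to itself")
        # frozenset treats [0, 4] and [4, 0] as the same edge
        unique_edges.add(frozenset((a, b)))

    degree = Counter()
    for edge in unique_edges:
        for corner in edge:
            degree[corner] += 1
    return unique_edges, degree


# Assumed layout: 0-3 are the front corners, 4-7 are the back corners
example_edges = [
    [2, 0], [0, 3], [3, 1], [1, 2],  # front face
    [6, 4], [4, 7], [7, 5], [5, 6],  # back face
    [2, 6], [0, 4], [3, 7], [1, 5],  # front-to-back connections
    [4, 0],                          # duplicate of [0, 4], counted once
]

unique_edges, degree = check_box_edges(example_edges)
print(len(unique_edges))                       # 12
print(all(degree[c] == 3 for c in range(8)))   # True
```

Duplicate edges are harmless for drawing, but a count other than 12 or a corner with fewer than 3 connections points to a missing or mistyped edge.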

## Implementation

In FiftyOne, we implement this by creating a [`KeypointSkeleton`](https://beta-docs.voxel51.com/api/fiftyone.core.odm.dataset.KeypointSkeleton.html) object:


In [60]:
import fiftyone as fo

labels = [
    "front_upper_right",    # 0
    "front_lower_left",     # 1
    "front_upper_left",     # 2
    "front_lower_right",    # 3
    "back_upper_right",     # 4
    "back_lower_left",      # 5
    "back_upper_left",      # 6
    "back_lower_right"      # 7
]

# Complete set of edges to create a full 3D box (12 unique edges)
edges = [
    # Front face - complete square
    [2, 0],  # front upper-left to front upper-right
    [0, 3],  # front upper-right to front lower-right
    [3, 1],  # front lower-right to front lower-left
    [1, 2],  # front lower-left to front upper-left

    # Top face (its front edge is already listed above)
    [2, 6],  # front upper-left to back upper-left
    [6, 4],  # back upper-left to back upper-right
    [4, 0],  # back upper-right to front upper-right

    # Right side face (remaining edges)
    [4, 7],  # back upper-right to back lower-right
    [7, 3],  # back lower-right to front lower-right

    # Remaining edges needed for a complete box
    [1, 5],  # front lower-left to back lower-left
    [5, 6],  # back lower-left to back upper-left
    [5, 7],  # back lower-left to back lower-right
]


tampar_dataset.default_skeleton = fo.KeypointSkeleton(
    labels=labels,
    edges=edges
)

tampar_dataset.save()



This skeleton definition ensures that when our 3D box keypoints are visualized in FiftyOne, the lines connecting them accurately represent the box's structure, making it easy to interpret the detection results at a glance.


```python
fo.launch_app(tampar_dataset)
```

<img src="assets/tampar-skeletons.gif">

## Custom Format

This section uses the hand keypoint data from the [CMU hand database](http://domedb.perception.cs.cmu.edu/handdb.html).

The code below demonstrates how to parse a keypoint dataset with a custom annotation format. The specifics of your dataset may differ, but the core steps remain the same:

- You will need to use the `Keypoint` label type
- You will need to normalize the points to the [0,1] range

You can download the dataset from here: http://domedb.perception.cs.cmu.edu/panopticDB/hands/hand_labels.zip
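
The normalization step can be sketched in isolation before touching the real files. The record below is made-up data shaped like the `hand_pts` and `is_left` fields read in this section, and `normalize_points` is a hypothetical helper, so treat this as an illustration under those assumptions rather than the dataset's documented schema.

```python
import json


def normalize_points(pixel_points, image_width, image_height):
    """Convert (x, y, ...) pixel coordinates to (x, y) pairs relative to image size."""
    normalized = []
    for point in pixel_points:
        x, y = point[0], point[1]  # ignore any extra per-point values
        normalized.append((x / image_width, y / image_height))
    return normalized


# Made-up annotation record, not taken from the real dataset
record = json.loads('{"hand_pts": [[320, 240, 1], [160, 120, 1]], "is_left": 0}')

points = normalize_points(record["hand_pts"], image_width=640, image_height=480)
print(points)  # [(0.5, 0.5), (0.25, 0.25)]

# Values outside [0, 1] would suggest a wrong image size or a point outside the frame
in_range = all(0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 for x, y in points)
print(in_range)  # True
```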

In [None]:
import os
import json
import glob
import numpy as np
import cv2
import fiftyone as fo

# Input data path for labeled hand images
dataset_path = 'hand_labels/manual_test/'

# Get all annotation JSON files
annotation_files = glob.glob(os.path.join(dataset_path, "*.json"))

# Create a FiftyOne dataset with overwrite option to ensure clean slate
hand_dataset = fo.Dataset("hand_keypoints", overwrite=True)

# Create a list to collect all samples
samples_to_add = []

# Process each annotation file
for annotation_file in annotation_files:
    # Get corresponding image file
    image_file = annotation_file.replace(".json", ".jpg")
    
    # Load the annotation data
    with open(annotation_file, 'r') as file_handle:
        annotation_data = json.load(file_handle)
    
    # Create a sample for this image
    sample = fo.Sample(filepath=image_file)

    # Extract hand joint coordinates and hand type
    joint_coordinates = np.array(annotation_data['hand_pts'])
    is_left_hand = annotation_data['is_left']
    
    # Get image dimensions for normalization
    image = cv2.imread(image_file)
    if image is None:
        # cv2.imread returns None when the image is missing or unreadable
        continue
    image_height, image_width = image.shape[:2]
    
    # Create a list of all normalized keypoints for the hand
    normalized_joint_points = []
    
    for joint_index in range(joint_coordinates.shape[0]):
        # Normalize coordinates to [0, 1] range
        normalized_x = joint_coordinates[joint_index, 0] / image_width
        normalized_y = joint_coordinates[joint_index, 1] / image_height
        
        normalized_joint_points.append([normalized_x, normalized_y])
    
    # Determine field name and label based on hand type
    hand_field_name = "left_hand" if is_left_hand else "right_hand"
    hand_label = "left hand" if is_left_hand else "right hand"
    
    # Create a keypoint object containing all joint points
    hand_keypoint = fo.Keypoint(
        label=hand_label,
        points=normalized_joint_points,
        num_keypoints=len(normalized_joint_points),
    )
    
    # Wrap the keypoint in a Keypoints collection (the skeleton is set on the dataset later)
    hand_keypoints_collection = fo.Keypoints(keypoints=[hand_keypoint])
    
    # Add keypoints to the sample
    sample[hand_field_name] = hand_keypoints_collection
    
    # Add sample to our list instead of directly to dataset
    samples_to_add.append(sample)

# Add all samples to the dataset in a single batch operation
hand_dataset.add_samples(samples_to_add)

In [93]:
# Define the keypoint labels for each joint in the hand
hand_joint_labels = [
    "WRIST", "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_MCP", "INDEX_PIP", "INDEX_DIP", "INDEX_TIP",
    "MIDDLE_MCP", "MIDDLE_PIP", "MIDDLE_DIP", "MIDDLE_TIP",
    "RING_MCP", "RING_PIP", "RING_DIP", "RING_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP"
]

# Define the connections between joints to visualize the hand structure
hand_joint_connections = [
    [0, 1], [1, 2], [2, 3], [3, 4],       # Thumb
    [0, 5], [5, 6], [6, 7], [7, 8],       # Index finger
    [0, 9], [9, 10], [10, 11], [11, 12],  # Middle finger
    [0, 13], [13, 14], [14, 15], [15, 16], # Ring finger
    [0, 17], [17, 18], [18, 19], [19, 20]  # Pinky finger
]

# Create the hand skeleton definition for visualization
hand_skeleton = fo.KeypointSkeleton(
    labels=hand_joint_labels,
    edges=hand_joint_connections
)

hand_dataset.default_skeleton = hand_skeleton

# Save the dataset
hand_dataset.save()
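
The connection list above follows a regular pattern: the wrist links to the base of each finger, and each finger is a chain of four joints. As a sketch, the same list can be generated instead of typed by hand, assuming index 0 is the wrist and each finger occupies four consecutive indices as in the label list. `build_hand_edges` is a hypothetical helper, not part of FiftyOne.

```python
def build_hand_edges(num_fingers=5, joints_per_finger=4):
    """Build wrist-rooted finger chains; index 0 is assumed to be the wrist."""
    edges = []
    for finger in range(num_fingers):
        first_joint = 1 + finger * joints_per_finger
        edges.append([0, first_joint])  # wrist to the base of this finger
        for joint in range(first_joint, first_joint + joints_per_finger - 1):
            edges.append([joint, joint + 1])  # next joint along the finger
    return edges


generated_edges = build_hand_edges()
print(len(generated_edges))  # 20
print(generated_edges[:4])   # [[0, 1], [1, 2], [2, 3], [3, 4]]
```

Generating the edges this way avoids off-by-one typos when a skeleton has many joints.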


In [None]:
fo.launch_app(hand_dataset)



### Next steps

* Check out these pose estimation datasets on the Hugging Face Hub:
  * [DensePose-COCO](https://huggingface.co/datasets/Voxel51/DensePose-COCO)
  * [MPII_Human_Pose_Dataset](https://huggingface.co/datasets/Voxel51/MPII_Human_Pose_Dataset)

* Learn about working with [Detections](https://beta-docs.voxel51.com/how_do_i/recipes/adding_detections/)

* Learn about working with [Segmentations](https://beta-docs.voxel51.com/fiftyone_concepts/using_datasets/#semantic-segmentation)

* Join the [Discord community](https://community.voxel51.com/)

* Follow us on [LinkedIn](https://www.linkedin.com/company/voxel51/)