APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking

This is the official repository of [NeurIPS'22] APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking.

Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, Dacheng Tao

Introduction | APT-36K | Demo | Statement

Introduction

Animal pose estimation and tracking (APT) is a fundamental task for detecting and tracking animal keypoints across a sequence of video frames. Previous animal-related datasets focus either on animal tracking or on single-frame animal pose estimation, but never on both. To fill this gap, we take the first step and propose APT-36K, i.e., the first large-scale benchmark for animal pose estimation and tracking. Specifically, APT-36K consists of 2,400 video clips collected and filtered from 30 animal species, with 15 frames per video, resulting in 36,000 frames in total. After manual annotation and a careful double-check, high-quality keypoint and tracking annotations are provided for all animal instances. Based on APT-36K, we benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization for unseen animals, and (3) animal pose estimation with animal tracking. Based on the experimental results, we gain some empirical insights and show that APT-36K provides a valuable animal pose estimation and tracking benchmark, offering new challenges and opportunities for future research. The annotation files and corresponding images of our dataset can be downloaded at https://1drv.ms/u/s!AimBgYV7JjTlgcZ9zLyl5KnM3dKMgg?e=uaaLz5. The individual annotation files can be downloaded at https://1drv.ms/u/s!AimBgYV7JjTlgTuYdjjtYON3sxEZ?e=5deTDn.

APT-36K

The goal of APT-36K is to provide a large-scale benchmark for animal pose estimation and tracking in real-world scenarios, which has rarely been explored in prior art. To this end, we resort to real-world video websites, i.e., YouTube, and carefully collect and filter 2,400 video clips covering 30 different animal species from different scenes, e.g., zoo, forest, and desert. We then manually set the frame sampling rate for each video to ensure there is noticeable movement and posture difference for each animal in the sub-sampled video clips. Specifically, each clip contains 15 frames after the sampling process. The whole data collection, cleaning, annotation, and checking process took about 2,000 person-hours. A total of 36,000 images are labeled, following the COCO labeling format. There are typically 17 keypoints labeled for each animal instance, including two eyes, one nose, one neck, one tail, two shoulders, two elbows, two knees, two hips, and four paws.
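In the COCO labeling format mentioned above, each instance's keypoints are stored as a flat list of (x, y, visibility) triplets. As a minimal sketch of how such an annotation can be parsed, the snippet below splits the flat list into named triplets; note that the keypoint names and their ordering here are illustrative assumptions, not the dataset's official order, so check the downloaded annotation files for the actual `categories` entry.

```python
# Hypothetical helper for COCO-style keypoint annotations.
# The 17 names and their order below are assumptions for illustration only;
# the real order is defined in the dataset's "categories" field.
KEYPOINT_NAMES = [
    "left_eye", "right_eye", "nose", "neck", "tail",
    "left_shoulder", "left_elbow", "left_front_paw",
    "right_shoulder", "right_elbow", "right_front_paw",
    "left_hip", "left_knee", "left_back_paw",
    "right_hip", "right_knee", "right_back_paw",
]

def parse_keypoints(annotation):
    """Split a flat COCO keypoint list [x1, y1, v1, x2, y2, v2, ...]
    into a dict of named (x, y, visibility) triplets.
    Visibility follows the COCO convention: 0 = not labeled,
    1 = labeled but occluded, 2 = labeled and visible."""
    kps = annotation["keypoints"]
    assert len(kps) == 3 * len(KEYPOINT_NAMES)
    return {
        name: (kps[3 * i], kps[3 * i + 1], kps[3 * i + 2])
        for i, name in enumerate(KEYPOINT_NAMES)
    }

# Toy annotation: only the nose is labeled and visible (v = 2).
toy = {"keypoints": [0, 0, 0] * 2 + [120, 85, 2] + [0, 0, 0] * 14}
print(parse_keypoints(toy)["nose"])  # (120, 85, 2)
```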

We also calculate the distributions of keypoint motion, IoU between tracked bounding boxes in adjacent frames, and the aspect ratio of the annotated bounding boxes in our APT-36K dataset. As shown in (a), the motion distribution and average motion distance vary considerably across keypoints, e.g., the average motion distance of paws is over 50 pixels, which is much larger than that of eyes or necks (about 35 pixels). Moreover, the motion magnitudes of shoulders, knees, and hips lie between those of eyes and paws, which is in line with the movement characteristics of four-legged animals. Besides, most instances have small IoU scores between their tracked bounding boxes in adjacent frames, implying that large motion is very common in APT-36K, as demonstrated in (b). It can also be observed from (c) that the aspect ratio of the bounding boxes varies widely, from less than 0.4 to more than 3.1. This is because APT-36K contains diverse animals performing different actions, e.g., running rabbits and climbing monkeys. These results illustrate the diversity of APT-36K.
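The adjacent-frame IoU statistic above can be computed with a standard intersection-over-union on COCO-style `[x, y, width, height]` boxes. A minimal sketch:

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given in COCO [x, y, width, height] format."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah  # convert to corner coordinates
    bx2, by2 = bx1 + bw, by1 + bh
    # Overlap of the two intervals on each axis (clamped at zero).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# A box shifted horizontally by half its width against itself: IoU = 1/3.
print(box_iou([0, 0, 100, 100], [50, 0, 100, 100]))  # 0.333...
```

A low IoU between the same instance's boxes in consecutive frames indicates a large displacement, which is how the statistic in (b) reflects motion magnitude.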

Demo

Here we show some examples from the APT-36K dataset. The motion trajectories of the keypoints on the animal's body over 15 consecutive frames are shown in the third row of images.

APTv2

APTv2 is an extension of APT-36K, increasing the number of animal instances from 53,006 to 84,611. We split APTv2 into easy and hard subsets based on the number of instances present in each frame.

If you are interested, see: APTv2

Statement

If you are interested in our work, please consider citing the following:

@article{yang2022apt,
  title={{APT-36K}: A large-scale benchmark for animal pose estimation and tracking},
  author={Yang, Yuxiang and Yang, Junjie and Xu, Yufei and Zhang, Jing and Lan, Long and Tao, Dacheng},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={17301--17313},
  year={2022}
}

This project is under the MIT license.

Relevant Projects

[1] AP-10K: A Benchmark for Animal Pose Estimation in the Wild, NeurIPS, 2021 | Paper | Github
     Hang Yu, Yufei Xu, Jing Zhang, Dacheng Tao
