We present Okutama-Action, a new video dataset for concurrent human action detection from an aerial view. It consists of 43 minute-long, fully-annotated sequences with 12 action classes. Okutama-Action features many challenges missing in current datasets, including dynamic transitions of actions, significant changes in scale and aspect ratio, abrupt camera movement, and multi-labeled actors. As a result, our dataset is more challenging than existing ones, and will help push the field forward to enable real-world applications.
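Because actors can be multi-labeled, an annotation entry must allow several concurrent action labels per bounding box. As a minimal sketch only, here is how one might parse such entries; the field layout and the helper `parse_annotation_line` are assumptions for illustration, not the dataset's actual label format:

```python
def parse_annotation_line(line):
    """Parse one hypothetical annotation line of the form:
    track_id xmin ymin xmax ymax frame action1,action2,...
    (illustrative layout only; the real format may differ)."""
    parts = line.split()
    track_id, xmin, ymin, xmax, ymax, frame = (int(p) for p in parts[:6])
    # A single actor may carry multiple concurrent action labels.
    actions = parts[6].split(",") if len(parts) > 6 else []
    return {
        "track_id": track_id,
        "box": (xmin, ymin, xmax, ymax),
        "frame": frame,
        "actions": actions,
    }

example = "3 120 80 180 240 57 Walking,Carrying"
print(parse_annotation_line(example))
```

The key design point is that the action field is a list, so evaluation code can score each of an actor's simultaneous actions independently.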
The training set (with labels) of Okutama-Action is available at the following link. The test set is available at the following link. The test-set labels will not be disclosed, but test scores can soon be obtained by uploading results to CodaLab.