
METU-ALET: A Dataset for Tool Detection in the Wild

ALET is an abbreviation for Automated Labeling of Equipment and Tools. “Alet” is also the Turkish word for “tool”.

METU-ALET is an image dataset for detecting tools in the wild. It covers tools that belong to categories such as farming, gardening, office, stonemasonry, vehicle, woodworking and workshop. The images in the dataset contain a total of 15612 bounding boxes across 53 different tool categories.

METU-ALET Dataset at a Glance

Most of the scenes in the dataset are not generated or constructed; they are snapshots of existing environments, with or without humans using the tools. The remaining scenes are generated and are therefore considered synthetic data.

The images in the METU-ALET dataset fall into three categories:

1) Downloaded and Crawled Data

These are images downloaded and crawled from the Internet. The websites used to gather them are Creativecommons, Wikicommons, Flickr, Pexels, Unsplash, Shopify, Pixabay, Everystock and Imfree. Licensing was taken into account while crawling and downloading, so we only took royalty-free images from these websites. Images of this type contain 9262 bounding boxes.

2) Photographed Data

These are images that we photographed ourselves, consisting mostly of office and workshop scenes. Images of this type contain 820 bounding boxes.

3) Synthetic Data

Synthetic images are the ones that we generated ourselves. To construct them, we first photographed several background images that belong to different contexts. Then, for each class of tools, we downloaded royalty-free transparent images, applied a set of random transformations to them, and finally placed the transformed tool images on top of the background images. The synthetic scenes in the METU-ALET dataset contain 5530 bounding boxes.
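The placement step can be illustrated with a minimal sketch. It assumes the random transformations are a scale and a rotation followed by a random position; the README does not say which transformations were actually used, so the function name `place_tool`, the parameter ranges and the rectangle-based box are all hypothetical. A real pipeline would more likely derive a tight box from the alpha channel of the pasted image.

```python
import math
import random


def place_tool(bg_w, bg_h, tool_w, tool_h, rng):
    """Sample a random scale, rotation and position for a tool cut-out.

    Returns the axis-aligned bounding box (x_min, y_min, x_max, y_max)
    of the transformed tool rectangle inside the background.
    All ranges below are illustrative assumptions, not the authors' values.
    """
    scale = rng.uniform(0.5, 1.5)               # assumed scale range
    angle = math.radians(rng.uniform(0, 360))   # assumed rotation range

    w, h = tool_w * scale, tool_h * scale
    # Width and height of the axis-aligned box enclosing the rotated rectangle.
    rot_w = abs(w * math.cos(angle)) + abs(h * math.sin(angle))
    rot_h = abs(w * math.sin(angle)) + abs(h * math.cos(angle))

    # Shrink the tool if it would not fit in the background at all.
    fit = min(1.0, bg_w / rot_w, bg_h / rot_h)
    rot_w, rot_h = rot_w * fit, rot_h * fit

    # Pick a top-left corner so that the whole box stays inside the image.
    x_min = rng.uniform(0.0, max(0.0, bg_w - rot_w))
    y_min = rng.uniform(0.0, max(0.0, bg_h - rot_h))
    return (x_min, y_min, x_min + rot_w, y_min + rot_h)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
boxes = [place_tool(640, 480, 200, 100, rng) for _ in range(3)]
```

Because the box is computed at placement time, synthetic scenes produced this way come with their annotations for free.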

Why Do We Need the METU-ALET Dataset?

With the recent advances in robotics, we have come to a point where humans and robots will perform tasks collaboratively. With this dataset, we aim to address object detection tasks in which robots detect tools that they can grab or carry. Because the definition of a tool is very broad, the dataset only considers tools that can be manipulated by robots.

The Aim of the METU-ALET Dataset

The datasets used in robotics cover only a limited number of categories and instances, and they mainly focus on detecting tool affordances, so they are not suitable for training a deep object detector. With METU-ALET, we introduce a dataset of real-life scenes in which the tools are as unorganized as possible and appear either in their natural habitat or while humans are using them.

The scenes that we consider also introduce several challenges for the object detection task, such as the small scale of the tools, their articulated nature, occlusion and inter-class invariance.

Annotated Samples

Tool Detection with Deep Networks

For tool detection in the wild, we trained several state-of-the-art deep object detectors, namely Faster R-CNN, SSD, YOLO and RetinaNet. The following table summarizes the mAP results for each of these detectors.
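For readers unfamiliar with mAP: the metric depends on matching predicted boxes to ground-truth boxes by intersection over union (IoU). The sketch below shows only that building block. It assumes boxes are given as (x_min, y_min, x_max, y_max) tuples; the README does not state the box format or the IoU threshold used in the evaluation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping region, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs have no meaningful overlap.
    return inter / union if union > 0 else 0.0
```

A prediction is typically counted as correct when its IoU with a ground-truth box of the same class exceeds a chosen threshold. Small tools make this harder, since a shift of a few pixels changes the IoU substantially.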

“Wearing Helmet?”: A Critical and Practical Usage of the METU-ALET Dataset

“Wearing Helmet?” is a use case for METU-ALET through which we can judge the safety of environments where robots and humans work collaboratively. The dataset contains 1037 instances of safety helmets, which are used to train a tool detector that is combined with a human detector & pose estimator. With the help of these detectors, robots can determine whether a human in the scene is wearing a safety helmet, so that safety issues in critical scenes can be addressed accordingly.
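One way to combine the two detectors is sketched below. The rule is hypothetical and is not the authors' method: it assumes a head region is available as a box (for example, derived from pose keypoints) and treats a person as wearing a helmet when a sufficient fraction of some detected helmet box overlaps that region. The names `overlap_fraction` and `wears_helmet` and the threshold `min_fraction` are illustrative.

```python
def overlap_fraction(inner, outer):
    """Fraction of the area of `inner` that lies inside `outer`.

    Both boxes are (x_min, y_min, x_max, y_max) tuples.
    """
    w = min(inner[2], outer[2]) - max(inner[0], outer[0])
    h = min(inner[3], outer[3]) - max(inner[1], outer[1])
    if w <= 0 or h <= 0:
        return 0.0
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return (w * h) / area if area > 0 else 0.0


def wears_helmet(head_box, helmet_boxes, min_fraction=0.5):
    """Return True if any detected helmet mostly overlaps the head region.

    `min_fraction` is an assumed threshold, chosen only for illustration.
    """
    return any(
        overlap_fraction(helmet, head_box) >= min_fraction
        for helmet in helmet_boxes
    )


head = (100, 50, 160, 120)             # hypothetical head region
on_head = (105, 40, 155, 80)           # helmet detected over the head
on_table = (300, 400, 350, 440)        # helmet lying elsewhere in the scene
```

With these example boxes, `wears_helmet(head, [on_head])` is true and `wears_helmet(head, [on_table])` is false, which separates a helmet that is worn from one that is merely present in the scene.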


If you use the METU-ALET dataset or the related resources shared here, please cite the following work:

    author = {Kurnaz, F. C. and Hocaoglu, B. and Yilmaz, K. M. and Sulo, I. and Kalkan, S.},
    title = {ALET (Automated Labeling of Equipment and Tools): A Dataset, a Baseline and a Usecase for Tool Detection in the Wild},
    journal = {arxiv preprint},
    year = {2019}


For questions or comments please contact Sinan Kalkan at skalkan [@] or visit
