Simulated Environment with Unity for Robust Object Detection
This research project was my Master's Thesis @ Budapest University of Technology and Economics. 👨🏻🎓 The project was awarded with scholarship by the Artificial Intelligence National Laboratory (MILAB) of Hungary
The main focus of deep learning research is the development of self-supervised learning methods, where the agent completes different detection tasks by understanding his environment, and not by relying on labeled training examples. A typical usage of self-supervised learning methods is to substitute obscured details from images or to estimate the impact of actions taken by robots. The main goal of the following thesis is to create an algorithm that is able to recognize different objects and their relevant visual properties in a simulated environment and to estimate the effect of different actions on them. In this thesis, I will not only present such a robust object detection algorithm, but also create a simulated industrial environment capable of generating the data needed for training quickly and easily.
The environment consists of two main part:
The Object detection algorithm can be found here: Robust Object Detection in Simulated Industrial Environment
Thia repository contains a simulated industrial environment that can be used to generate images and metadata for the deep learning algorithm. The simulation should also be modifiable and capable of generating randomized scenarios sufficient to create diverse train dataset. The simulated environment is made with Unity Engine.
The simulation contains different scenarios; therefore, the simulated industrial building will have three different parts:
- A conveyor belt with different small objects on it like barrels or tools.
- A workshop for larger items, such as cars and industrial platforms.
- And a floor that functions as an office.
Different robotic arms stand at the two sides of the conveyor belt and at the workshop. In the initial setting, there are 67 different objects. First, the 3D models were downloaded from the Asset Store or other third-party providers. Then, they are imported into the project and saved with the required features as Prefabs.
Screenshots from the simulation are saved periodically after every few frames in PNG format with the marked 2D bounding boxes around the objects. Moreover, metadata is also saved, containing the ground truth information for the Deep Learning algorithm, including bounding box position, class label, and 3D information about the objects’ position in the simulation space.
YOLO traditionally expects the annotation file to be in the following format in a .txt named the same as the image it belongs to:
- is the index of the object class (from 0 to #classes-1)
- and are the center of the rectangles relative to the to the width and height of the image (0.0 to 1.0]
- and are the width and height of the bounding box relative to the width and height of the image