Skip to content

Convolutional Neural Network generalization for visual control of drones, through network interpretability and advanced image augmentation techniques. Person masking for background replacement to perform data augmentation during training.

Notifications You must be signed in to change notification settings

mferri17/thesis-master

Repository files navigation

Master Thesis | Interpretation of Neural Networks and Advanced Image Augmentation for Visual Control of Drones in Human Proximity

  • Master of Science in Computer Science at Università della Svizzera Italiana (USI)
  • Double Degree with University of Milano-Bicocca, Academic Year 2019/2020

Collaboration with the Robotics team at IDSIA for the application of Deep Learning Interpretability and Advanced Image Augmentation techniques to improve an existing Convolutional Neural Network (CNN) for Visual Control of Drones.

Deliverables

Abstract

We consider the task of predicting the pose of a person moving in front of a drone, using the input images coming from an on-board camera. We aim to improve a machine learning model designed for this intent [Mantegazza et al., 2019]. The approach relies on supervised learning to perform a regression on the user’s pose through a Residual Neural Network. The training data is collected in a dedicated drone arena, using a Motion Capture system to acquire the ground truth. The prototype achieves good performance inside the arena but cannot fulfill its duty in unknown environments.

First, we understand the main issues of the learned task through network interpretation. Applying Grad-CAM [Selvaraju et al., 2019], we observe that the model not only focus on the user who is actually facing the drone’s camera. Instead, various portions of the input images are considered when the model makes its predictions. We assume that the neural network has undesirably learned some details about the drone arena in which the dataset has been collected.

As a solution, we develop an advanced data augmentation technique designed for enhancing the generalization capabilities of the model. The goal is to break the relationship between the model’s learning and the training room. Our approach consists on modifying the original dataset through images’ background replacement, allowing us to simulate data collected in many different environments. The implementation is done by using Mask R-CNN [He et al., 2018] to compute the user’s mask from each sample in the original training set. Then, we use the computed masks for replacing the background of the corresponding images and retraining the neural network. We run experiments that show that our proposal is successful both from quantitative and qualitative viewpoints. The new model, trained on the augmented dataset, produces satisfactory results in a large variety of real-world scenarios.

About

Convolutional Neural Network generalization for visual control of drones, through network interpretability and advanced image augmentation techniques. Person masking for background replacement to perform data augmentation during training.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published