Master's Thesis | Interpretation of Neural Networks and Advanced Image Augmentation for Visual Control of Drones in Human Proximity
- Master of Science in Computer Science at Università della Svizzera Italiana (USI)
- Double Degree with University of Milano-Bicocca, Academic Year 2019/2020
A collaboration with the Robotics team at IDSIA, applying Deep Learning interpretability and advanced image augmentation techniques to improve an existing Convolutional Neural Network (CNN) for the visual control of drones.
We consider the task of predicting the pose of a person moving in front of a drone, using the input images from an on-board camera. We aim to improve a machine learning model designed for this purpose [Mantegazza et al., 2019]. The approach relies on supervised learning, regressing the user's pose with a Residual Neural Network. The training data is collected in a dedicated drone arena, using a Motion Capture system to acquire the ground truth. The prototype achieves good performance inside the arena but fails to generalize to unknown environments.
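As a minimal sketch of this kind of model (the actual architecture follows [Mantegazza et al., 2019]; the ResNet-18 backbone, input resolution, and four-component pose output used here are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torchvision.models as models

class PoseRegressor(nn.Module):
    """ResNet backbone with a regression head mapping a camera frame
    to the user's pose (assumed here to be x, y, z, and yaw)."""

    def __init__(self, num_outputs: int = 4):
        super().__init__()
        backbone = models.resnet18(weights=None)  # trained from scratch
        # Swap the 1000-way classification layer for a regression head.
        backbone.fc = nn.Linear(backbone.fc.in_features, num_outputs)
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)

model = PoseRegressor()
frame = torch.randn(1, 3, 224, 224)  # one RGB frame from the on-board camera
pose = model(frame)                  # tensor of shape (1, 4)
```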
First, we diagnose the main issues of the learned task through network interpretation. Applying Grad-CAM [Selvaraju et al., 2019], we observe that the model does not focus only on the user facing the drone's camera: various other portions of the input images contribute to its predictions. We hypothesize that the neural network has undesirably learned details of the drone arena in which the dataset was collected.
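A minimal Grad-CAM sketch, implemented with forward/backward hooks following [Selvaraju et al., 2019]; the choice of target layer and of which pose component to explain are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, output_index=0):
    """Minimal Grad-CAM for a regression output: weight the target
    layer's activations by the spatial mean of their gradients."""
    activations, gradients = [], []
    fwd = target_layer.register_forward_hook(
        lambda mod, inp, out: activations.append(out))
    bwd = target_layer.register_full_backward_hook(
        lambda mod, gin, gout: gradients.append(gout[0]))

    output = model(image)               # forward pass records activations
    model.zero_grad()
    output[0, output_index].backward()  # gradient of one pose component
    fwd.remove()
    bwd.remove()

    acts, grads = activations[0], gradients[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)  # GAP over H and W
    cam = F.relu((weights * acts).sum(dim=1))       # weighted channel sum
    return cam / (cam.max() + 1e-8)                 # (1, H', W') in [0, 1]

# e.g. heatmap = grad_cam(model, frame, model.backbone.layer4)
```

Upsampling the resulting map to the input resolution and overlaying it on the frame reveals which regions drive each prediction.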
As a solution, we develop an advanced data augmentation technique designed to enhance the generalization capabilities of the model. The goal is to break the link between the model's learning and the training room. Our approach consists in modifying the original dataset through background replacement, which lets us simulate data collected in many different environments. Concretely, we use Mask R-CNN [He et al., 2018] to compute the user's mask for each sample in the original training set; we then use the computed masks to replace the background of the corresponding images and retrain the neural network. Experiments show that our proposal is successful from both quantitative and qualitative viewpoints: the new model, trained on the augmented dataset, produces satisfactory results in a large variety of real-world scenarios.
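A minimal sketch of the background-replacement step, assuming torchvision's pre-trained Mask R-CNN, COCO label 1 for "person", and float images in [0, 1]; the score threshold and compositing details are illustrative, not the exact thesis pipeline:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

segmenter = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()  # COCO weights

@torch.no_grad()
def replace_background(image, background, score_thresh=0.7):
    """Segment the person in `image` (float C x H x W in [0, 1]) and
    composite them onto `background` of the same shape."""
    pred = segmenter([image])[0]
    person = (pred["labels"] == 1) & (pred["scores"] > score_thresh)
    if not person.any():
        return image  # no confident detection: keep the original sample
    # Union of person masks, thresholded to a binary matte of shape (1, H, W).
    mask = (pred["masks"][person].sum(dim=0) > 0.5).float()
    return mask * image + (1 - mask) * background
```

Running `replace_background` over each training sample with backgrounds drawn from a varied image collection yields an augmented dataset that simulates many different environments.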