
Investigating Transfer Learning in Modern Deep Reinforcement Learning Architectures


jmdudek/DRL-for-SuperMarioBros


About this Repository

This repository contains the code and paper for our (Gerrit Bartels, Thorsten Krause, and Jacob Dudek) project in Deep Reinforcement Learning. We investigated the benefits of transfer learning for Soft Q-Networks and Double Deep Q-Networks, as well as an alleged relationship between overfitting and negative transfer. We propose an experimental setup, provide a ready-to-use implementation, and identify major challenges that future research can build upon.

This repository is currently under construction; please check back later for the finished project.


The Environment

As a test bed we used the popular NES game "Super Mario Bros.". The game consists of 32 levels in which the player steers Mario through an obstacle course, choosing from 256 distinct actions. We relied on a ready-to-use implementation that can be found here.
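
For orientation, here is a minimal setup sketch. It assumes the linked implementation is the widely used gym-super-mario-bros package (an assumption on our part; the link above is authoritative). Conveniently, that package's RIGHT_ONLY action set corresponds to the five actions described under Preprocessing below:

```python
import gym_super_mario_bros
from gym_super_mario_bros.actions import RIGHT_ONLY
from nes_py.wrappers import JoypadSpace

# The raw NES controller allows 256 button combinations; JoypadSpace
# restricts the agent to a small discrete action set (see Preprocessing).
env = gym_super_mario_bros.make("SuperMarioBros-1-1-v0")
env = JoypadSpace(env, RIGHT_ONLY)

state = env.reset()
state, reward, done, info = env.step(env.action_space.sample())
```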

For transfer learning we chose level 1-1 (left) as the source and level 1-2 (right) as the target domain. Below are exemplary scenes from both levels.

Preprocessing

The agents received the game state as a normalized, rescaled 84x84 grayscale image and drew from a restricted action space of five actions: (1) idle, (2) move right, (3) jump right, (4) move right and throw a fireball, (5) jump right and throw a fireball. As consecutive frames are highly correlated, we accelerated training by repeating each action over four frames and passing the corresponding states as a stacked 4x84x84 image.
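
A minimal sketch of such a pipeline built from standard gym wrappers (the repository's actual wrapper code may differ in details, e.g. where normalization happens):

```python
import gym
from gym.wrappers import FrameStack, GrayScaleObservation, ResizeObservation


class SkipFrame(gym.Wrapper):
    """Repeat each action for `skip` consecutive frames, summing rewards."""

    def __init__(self, env, skip=4):
        super().__init__(env)
        self._skip = skip

    def step(self, action):
        total_reward, done = 0.0, False
        for _ in range(self._skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info


def preprocess(env):
    env = SkipFrame(env, skip=4)                     # act every 4th frame
    env = GrayScaleObservation(env, keep_dim=False)  # RGB -> grayscale
    env = ResizeObservation(env, shape=84)           # rescale to 84x84
    env = FrameStack(env, num_stack=4)               # stack into 4x84x84
    return env  # pixel normalization to [0, 1] can then happen in the network
```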


Model Architecture

The following figure visualizes our CNN backbone architecture employed in both our DDQN and SoftQN.
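
Since the figure defines the actual architecture, the following is only an illustrative sketch: a classic Nature-DQN-style backbone for the 4x84x84 inputs described above, with layer sizes that are our assumption:

```python
import torch.nn as nn


class CNNBackbone(nn.Module):
    """Illustrative DQN-style backbone; the figure above is authoritative."""

    def __init__(self, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4),   # 4x84x84 -> 32x20x20
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),  # -> 64x9x9
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),  # -> 64x7x7
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512),
            nn.ReLU(),
            nn.Linear(512, n_actions),  # one Q-value per action
        )

    def forward(self, x):
        return self.net(x)
```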


Performance on Level 1-1

DDQN:

eval_video_DDQN_1-1.mp4

SoftQN:

eval_video_SOFTQ_1-1.mp4


Performance on Level 1-2

Note that the label "untrained" refers to models that were trained from scratch on level 1-2 without any knowledge transfer.
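
As a hedged illustration (the checkpoint path and model class are hypothetical, not the repository's actual artifacts), the difference between the two settings comes down to the agent's initial weights on level 1-2:

```python
import torch

model = CNNBackbone(n_actions=5)  # backbone sketch from above

# "untrained": keep the random initialization and learn level 1-2 from scratch.
# "transfer": start from the weights learned on level 1-1, e.g.:
model.load_state_dict(torch.load("ddqn_level_1-1.pt"))
# ...then continue training on level 1-2.
```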

DDQN:

eval_video_DDQN_1-2_untrained.mp4
eval_video_DDQN_1-2_transfer_all_wr35.mp4

SoftQN:

eval_video_SOFTQ_1-2_untrained.mp4
eval_video_SOFTQ_1-2_transfer_all_wr50.mp4


Video Presentation

Click here to watch a video presentation of our project, given by Thorsten Krause.