
Navigation

Project 1 of the Deep Reinforcement Learning Nanodegree

The model used to generate the demo gif is final.pth (a Dueling Double DQN), trained for 700 episodes using main.py.

Overview

The environment for this project is the Banana environment from Unity, provided in the setup folder. This repository contains an implementation of the original DQN algorithm (learning from a compact state vector rather than directly from pixels) and two variants, Double Q-Learning and Dueling DQN.

For details on the implementation and a comparison between the models, see the report. Alternatively, you can find some pre-trained models under models/ and the source code in main.py and code/.
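
As a rough sketch of the ideas compared in the report (not the exact classes defined in code/), here is a minimal dueling network head together with the Double DQN target; layer sizes and names are assumptions.

import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    # Hypothetical layer sizes; the actual architecture lives in code/.
    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_size, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                # state-value stream V(s)
        self.advantage = nn.Linear(hidden, action_size)  # advantage stream A(s, a)
    def forward(self, state):
        x = self.feature(state)
        v, a = self.value(x), self.advantage(x)
        return v + a - a.mean(dim=1, keepdim=True)       # mean-subtraction keeps Q identifiable

def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    # Double Q-Learning: the online network picks the next action,
    # the target network evaluates it.
    best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    next_q = target_net(next_states).gather(1, best_actions)
    return rewards + gamma * next_q * (1 - dones)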

Environment

The agent is placed in a 3D room filled with yellow and blue bananas. The goal is to pick up as many yellow bananas as possible while avoiding the blue ones.

The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction.

At each timestep, the agent can take one of four actions:

  • 0, move forward
  • 1, move backward
  • 2, turn left
  • 3, turn right

The reward function gives +1 and -1 for picking up yellow and blue bananas, respectively. If no banana is picked up, the reward is zero.

The task is episodic and is considered solved when the agent gets an average score of +13 over 100 consecutive episodes.
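
For reference, a single episode of interaction looks roughly like this, assuming the unityagents package installed from setup/ (the Banana file name and path are an assumption and depend on your OS build):

from unityagents import UnityEnvironment
import numpy as np

env = UnityEnvironment(file_name="setup/Banana.app")  # path is an assumption
brain_name = env.brain_names[0]
env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]   # 37-dimensional state vector
score, done = 0, False
while not done:
    action = np.random.randint(4)             # random policy, just to show the loop
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    score += env_info.rewards[0]              # +1 yellow, -1 blue, 0 otherwise
    done = env_info.local_done[0]
env.close()
print('Episode score:', score)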

Getting started

Note that this has been tested on macOS only.

Requirements

You'll need conda to prepare the environment and execute the code.

Other resources are already available in this repository under setup/, so you can simply clone it.

git clone https://github.com/francescotorregrossa/deep-reinforcement-learning-nanodegree.git
cd deep-reinforcement-learning-nanodegree/p1-navigation

Optionally, you can install jupyter if you want to work on the report notebook.

Create a conda environment

This will create an environment named p1_navigation and install the required libraries.

conda create --name p1_navigation python=3.6
conda activate p1_navigation
unzip setup.zip
pip install ./setup

Watch a pre-trained agent

You can use main.py to watch an agent play the game. The provided model final.pth is a Dueling Double DQN with a uniform replay buffer.

python main.py

If you want to try another configuration, you can use one of the files under models/, but note that you might also need to change the corresponding line in main.py.
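
Under the hood, swapping checkpoints amounts to loading a different state dict. A hedged sketch, reusing the hypothetical DuelingQNetwork from the Overview (the real class is defined in code/):

import numpy as np
import torch

net = DuelingQNetwork(state_size=37, action_size=4)  # hypothetical class, see Overview sketch
net.load_state_dict(torch.load("final.pth", map_location="cpu"))
net.eval()
state = np.zeros(37, dtype=np.float32)  # placeholder; use env_info.vector_observations[0]
with torch.no_grad():
    q_values = net(torch.from_numpy(state).unsqueeze(0))
action = int(q_values.argmax(dim=1).item())          # greedy action for the watch loop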

Train an agent from scratch

You can also use main.py to train a new agent. Again, if you want to change the configuration, you have to update the same line in main.py. You'll find other classes and functions in the code/ folder. The report also contains useful functions for plotting results with matplotlib.

python main.py -t

Note that this script will overwrite final.pth.
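
If you want to monitor progress yourself, the solved criterion from the Environment section reduces to a 100-episode moving average; a minimal sketch, where train_episode is a hypothetical stand-in for running one training episode:

from collections import deque
import numpy as np

scores_window = deque(maxlen=100)      # scores of the last 100 episodes
for episode in range(1, 701):
    score = train_episode()            # hypothetical: runs one episode, returns its score
    scores_window.append(score)
    if len(scores_window) == 100 and np.mean(scores_window) >= 13.0:
        print(f"Solved in {episode} episodes (average {np.mean(scores_window):.2f})")
        break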

Open the report with jupyter

python -m ipykernel install --user --name p1_navigation --display-name "p1_navigation"
jupyter notebook

Make sure to set the kernel to p1_navigation after you open the report.

Uninstall

conda deactivate
conda remove --name p1_navigation --all
jupyter kernelspec uninstall p1_navigation