minhna1112/banana-dqn-brain

Solution to Project 1: Navigation in the Udacity Reinforcement Learning course

Project 1: Navigation

Introduction

For this project, you will train an agent to navigate (and collect bananas!) in a large, square world.

[Animation: trained agent collecting bananas]

A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of your agent is to collect as many yellow bananas as possible while avoiding blue bananas.

The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:

  • 0 - move forward.
  • 1 - move backward.
  • 2 - turn left.
  • 3 - turn right.

The task is episodic, and in order to solve the environment, your agent must get an average score of +13 over 100 consecutive episodes.
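
For orientation, here is a minimal interaction loop with the environment, assuming the unityagents package used in the course and a local Banana build (the file name is an assumption and depends on which build you downloaded):

```python
import numpy as np
from unityagents import UnityEnvironment

# File name is an assumption; point it at your downloaded Banana build.
env = UnityEnvironment(file_name="Banana_Linux_NoVis/Banana.x86_64")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=True)[brain_name]
state = env_info.vector_observations[0]  # 37-dimensional state vector

score = 0
while True:
    action = np.random.randint(4)        # random policy, for illustration only
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    score += env_info.rewards[0]
    if env_info.local_done[0]:           # episode finished
        break

env.close()
print("Episode score:", score)
```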

Quick start

  1. Run python3 -m pip install .
  2. Re-run all cells in Navigation_solution.ipynb

Solution

  1. During environment setup, I slightly modified the Python dependencies:
  • PyTorch was upgraded to v2.0.0 with CUDA 11.6, to make use of my RTX 3070 Ti laptop GPU.
  • grpcio was upgraded to v1.53.0.
  2. During training with the original environment, my Ubuntu machine kept crashing, so I switched to the headless version.

  3. Policy used:

  • To replace the original uniform-random policy in the notebook, I used a DQN agent, located in the agent folder, which combines two Q-networks to learn a policy over 2000 episodes.
  • The DQN agent was adapted from the one used for Lunar Lander, with the state being a 37-dimensional vector and the network output being a vector of 4 Q-values, one per action. A sketch of such a network follows.
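
For reference, here is a minimal sketch of what such a Q-network can look like in PyTorch. The layer sizes (two hidden layers of 64 units, as in the classic Lunar Lander solution) are assumptions, not necessarily the exact architecture in the agent folder:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a 37-dimensional state to Q-values for the 4 discrete actions."""

    def __init__(self, state_size=37, action_size=4, hidden_units=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, action_size),  # one Q-value per action
        )

    def forward(self, state):
        return self.net(state)

# Example: Q-values for a random state
print(QNetwork()(torch.rand(1, 37)))
```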
  4. Training the DQN agent:
  • Running the two Q-networks (a local and a target network) at the same time, GPU memory consumption was 921 MB; the update step that couples them is sketched after this item.
  • Here is the plot of the score achieved after each episode: [plot: score per episode]
  • My agent reached an average score of 15.38 after 2000 episodes.
  • The checkpoint from the last episode is saved in checkpoint_last.pth.
  • The checkpoint from the first episode with a 100-episode mean score >= 13.0 is saved in checkpoint_best.pth.
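
To illustrate how the two networks interact, below is a minimal sketch of the standard DQN update, following the structure of the Udacity Lunar Lander agent that this one was adapted from. The hyper-parameter values (GAMMA, TAU) and tensor shapes are assumptions, not necessarily those used in the agent folder:

```python
import torch
import torch.nn.functional as F

GAMMA = 0.99  # assumed discount factor
TAU = 1e-3    # assumed soft-update rate

def learn(qnetwork_local, qnetwork_target, optimizer, experiences):
    """One DQN update from a batch of (s, a, r, s', done) tensors.

    states: (B, 37) float, actions: (B, 1) long,
    rewards/dones: (B, 1) float, next_states: (B, 37) float.
    """
    states, actions, rewards, next_states, dones = experiences

    # Target network provides the bootstrapped value of the next state.
    q_targets_next = qnetwork_target(next_states).detach().max(1)[0].unsqueeze(1)
    q_targets = rewards + GAMMA * q_targets_next * (1 - dones)

    # Local network is the one being optimized.
    q_expected = qnetwork_local(states).gather(1, actions)
    loss = F.mse_loss(q_expected, q_targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Soft-update: the target network slowly tracks the local weights.
    for target_p, local_p in zip(qnetwork_target.parameters(),
                                 qnetwork_local.parameters()):
        target_p.data.copy_(TAU * local_p.data + (1.0 - TAU) * target_p.data)
```

Assuming the checkpoints store plain state dicts, a saved network can then be restored with qnetwork_local.load_state_dict(torch.load("checkpoint_best.pth")).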
  5. Future plan:
  • Integrate wandb into the training pipeline for better visualization across different hyper-parameters (epsilon, hidden units in the Q-networks); a possible shape for this is sketched after this list.
  • Dockerize the project for easier setup and packaging.
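
As a rough idea of what that wandb integration could look like (the project name, config keys, and scores list below are hypothetical, not existing code in this repository):

```python
import wandb

# Hypothetical values, for illustration only.
wandb.init(
    project="banana-dqn-brain",
    config={"eps_start": 1.0, "eps_decay": 0.995, "hidden_units": 64},
)

scores = [0.0, 1.0, 2.0]  # placeholder; use the scores list returned by training
for episode, score in enumerate(scores):
    wandb.log({"episode": episode, "score": score})

wandb.finish()
```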
