drl_project_final

Training Progress in Maze environment:

0 training episodes: Agent acts randomly, has no notion of goal states and collides with obstacles multiple times.

50 training episodes: Agent learns to avoid obstacles, but doesn't yet know that reaching the goal state is more rewarding.

100 training episodes: Agent learns to reach goal state quickly, but collides with obstacles on the way.

200 training episodes: Agent learns to trade off collisions against the time taken to reach the goal state. The current policy seems to be close to optimal human behaviour.
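
The trade-off described above (fewer collisions versus a shorter time to the goal) can be illustrated with a small reward sketch. This is only an assumption about how such a maze reward might be shaped: the constants `GOAL_REWARD`, `COLLISION_PENALTY`, and `STEP_PENALTY` and both function names are hypothetical and are not taken from this repository.

```python
# Hypothetical reward constants; the values used in this project are not
# stated in the README, so these are illustrative assumptions only.
GOAL_REWARD = 10.0        # bonus for reaching the goal state
COLLISION_PENALTY = -1.0  # cost of hitting an obstacle
STEP_PENALTY = -0.1       # small cost per time step, to encourage speed


def step_reward(reached_goal: bool, collided: bool) -> float:
    """Reward for one environment step under the assumed shaping."""
    reward = STEP_PENALTY  # every step costs a little time
    if collided:
        reward += COLLISION_PENALTY
    if reached_goal:
        reward += GOAL_REWARD
    return reward


def episode_return(steps: int, collisions: int, reached_goal: bool) -> float:
    """Undiscounted return of an episode summarised by its counts."""
    total = steps * STEP_PENALTY + collisions * COLLISION_PENALTY
    if reached_goal:
        total += GOAL_REWARD
    return total


if __name__ == "__main__":
    # A short route that clips two obstacles vs. a longer collision-free route.
    fast_but_clumsy = episode_return(steps=10, collisions=2, reached_goal=True)
    slow_but_clean = episode_return(steps=16, collisions=0, reached_goal=True)
    print("fast but clumsy:", fast_but_clumsy)
    print("slow but clean: ", slow_but_clean)
```

With these assumed constants the longer collision-free route scores higher, but a sufficiently long detour would not. That balance is what the agent appears to have learned by 200 episodes.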

Repository contents:

QD-learning
dqn_dxyang
drecurrent_qnet
sample_effient_net
space_invaders_gaurav
.DS_Store
README.md

sha2nkt/QD_learning
