sha2nkt/QD_learning

drl_project_final

Training Progress in Maze environment:

0 training episodes: The agent acts randomly, has no notion of goal states, and collides with obstacles multiple times.

50 training episodes: The agent learns to avoid obstacles, but has not yet learned that reaching the goal state is more rewarding.

100 training episodes: The agent learns to reach the goal state quickly, but still collides with obstacles on the way.

200 training episodes: The agent learns to trade off collisions against the time taken to reach the goal state. The current policy appears to be close to optimal human behaviour.
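
The trade-off between collisions and time to reach the goal can be illustrated with a small reward sketch. This README does not state the reward function used in the project, so everything below is an assumption for illustration only: the names `RewardConfig`, `step_reward` and `episode_return` are hypothetical, and the numeric values are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class RewardConfig:
    """Hypothetical reward weights; not taken from the project."""
    goal_reward: float = 10.0        # bonus for reaching the goal state
    collision_penalty: float = -1.0  # cost of hitting an obstacle
    step_penalty: float = -0.1       # cost of every time step (encourages speed)


def step_reward(reached_goal: bool, collided: bool,
                cfg: RewardConfig = RewardConfig()) -> float:
    """Reward for one time step: time cost, plus collision cost, plus goal bonus."""
    reward = cfg.step_penalty
    if collided:
        reward += cfg.collision_penalty
    if reached_goal:
        reward += cfg.goal_reward
    return reward


def episode_return(events: Iterable[Tuple[bool, bool]],
                   cfg: RewardConfig = RewardConfig()) -> float:
    """Undiscounted sum of step rewards over (reached_goal, collided) pairs."""
    return sum(step_reward(goal, hit, cfg) for goal, hit in events)


# A short path that clips one obstacle: 4 steps, 1 collision, goal on the last step.
fast_but_colliding = [(False, False), (False, True), (False, False), (True, False)]

# A longer detour that avoids all obstacles: 8 steps, goal on the last step.
slow_but_clean = [(False, False)] * 7 + [(True, False)]

print("fast, one collision:", round(episode_return(fast_but_colliding), 2))
print("slow, no collision: ", round(episode_return(slow_but_clean), 2))
```

Under these assumed weights the collision-free detour scores slightly higher than the shorter colliding path; raising the step penalty or lowering the collision penalty would reverse that ordering, which is the kind of balance the agent has to learn.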
