An AI agent learning to play Pong using policy descent, actor-critic, and other approaches.
A PDF report of this project is available here.
The methods `load_model` and `save_model` are implemented in the agent interface.

The method `reset` needs to have the same signature as the agent's `__init__` to avoid code duplication: call `reset()` from `__init__`, and write inside `reset` what you would have written in your `__init__`.
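A minimal sketch of that pattern (the class and attribute names below are hypothetical, not the repo's actual ones):

```python
class PongAgent:
    """Hypothetical agent illustrating the reset-from-__init__ pattern."""

    def __init__(self, env, player_id):
        # __init__ only delegates: all state setup lives in reset(),
        # so re-initialising between episodes reuses the same code path.
        self.reset(env, player_id)

    def reset(self, env, player_id):
        # Same signature as __init__ by design.
        self.env = env
        self.player_id = player_id
        self.episode_rewards = []
```

Resetting between episodes then clears exactly the state that construction set up, with no second copy of the initialisation code to keep in sync.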
The method `get_action` needs to be implemented in each agent child.
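For example, a trivial child could pick a random action (the class name and action encoding here are illustrative assumptions, not taken from the repo):

```python
import random

class RandomAgent:
    """Hypothetical agent child: only get_action is specific to it."""

    ACTIONS = (0, 1, 2)  # stay, up, down (assumed encoding)

    def get_action(self, frame):
        # Ignore the observation and act at random; a learned agent
        # would instead run the frame through its policy network here.
        return random.choice(self.ACTIONS)
```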
Like this (choose the options to your taste):

```shell
PYTHONPATH=. python demotivational_policy_descent/tests/run_policy_gradient.py --combine --preprocess --red --cuda
```
- `agents/` contains the agents
- `envs/` contains the Pong environment
- `tests/` contains tests to run the agents (that's what you run)
- `utils/io` egocentric I/O library
- `save_models/` contains saved models; if it doesn't, you're not working enough
- Up -> `self.env.MOVE_UP`, 1
- Stay -> `self.env.STAY`, 0
- Down -> `self.env.MOVE_DOWN`, 2
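A sampled action index can be mapped back to those constants by name rather than by relying on the numeric values; a sketch (the `SimpleEnv` stub is hypothetical, in the repo these constants live on the real Pong environment):

```python
class SimpleEnv:
    """Stub carrying the same constants and values listed above."""
    STAY = 0
    MOVE_UP = 1
    MOVE_DOWN = 2

def index_to_action(env, index):
    # Map a policy output index (0, 1, 2) to the env's action constant.
    return (env.STAY, env.MOVE_UP, env.MOVE_DOWN)[index]
```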
The function `env.reset()` returns a frame of the game as a couple of mirrored frames, so that each player sees the frame from its own side.
The function `env.step(...)` returns the couple of frames, a couple of rewards, `done`, and `info`.
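An interaction loop under those conventions might look like the following. The `StubPong` environment below is a hypothetical stand-in (it only exists to make the sketch runnable), and passing both players' actions as a pair to `step` is an assumption about the real env's signature:

```python
import random

class StubPong:
    """Stub with the interface described above: mirrored frames, paired rewards."""
    STAY, MOVE_UP, MOVE_DOWN = 0, 1, 2

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        frame = [[0.0]]          # placeholder "frame"
        return frame, frame      # couple of mirrored frames, one per player

    def step(self, actions):
        self.t += 1
        done = self.t >= 5
        frame = [[float(self.t)]]
        rewards = (10.0, -10.0) if done else (0.0, 0.0)
        return (frame, frame), rewards, done, {}

env = StubPong()
ob1, ob2 = env.reset()
done = False
while not done:
    # Each player picks an action from its own (mirrored) observation.
    a1 = random.choice((env.STAY, env.MOVE_UP, env.MOVE_DOWN))
    a2 = random.choice((env.STAY, env.MOVE_UP, env.MOVE_DOWN))
    (ob1, ob2), (r1, r2), done, info = env.step((a1, a2))
```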
- Karpathy Pong with Policy Gradient: https://karpathy.github.io/2016/05/31/rl/