Solving the Pong game with reinforcement learning (RL): Policy Gradient with the OpenAI Gym framework and TensorFlow

Policy Gradient Implementation Using Gym and TensorFlow

A policy gradient network is implemented for the popular Atari game Pong. "Policy gradients method involves running a policy for a while, seeing what actions lead to high rewards, increasing their probability through backpropagating gradients".
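Written as a formula, this is the standard REINFORCE estimator, given here in its general textbook form rather than taken from this repository's code. The update raises the log-probability of each sampled action a_t in state s_t in proportion to the discounted return G_t that followed it. Here θ are the policy parameters, r_k the rewards, γ the discount factor and α the learning rate:

```latex
\nabla_\theta J(\theta) \approx \sum_{t} G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t),
\qquad
G_t = \sum_{k \ge t} \gamma^{\,k-t} \, r_k,
\qquad
\theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta)
```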


Large-scale problems have too many states and/or actions to store in memory, so a lookup table cannot be used; a function approximator is needed instead. In this problem, a neural network is used as the function approximator.
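The repository itself uses a TensorFlow network. As a framework-free illustration only, the NumPy sketch below shows what such a function approximator computes: a two-layer network that maps a flattened game frame to the probability of moving up. The layer sizes (6400 inputs from an 80x80 frame, 200 hidden units) follow Karpathy's blog post and are assumptions about this repo; `init_policy` and `policy_forward` are hypothetical names.

```python
import numpy as np

# Assumed sizes (taken from Karpathy's blog post, not necessarily this repo):
# an 80x80 preprocessed frame flattened to 6400 inputs, and 200 hidden units.
INPUT_DIM = 80 * 80
HIDDEN_DIM = 200


def init_policy(rng):
    """Hypothetical helper: scaled random init for a two-layer policy network."""
    return {
        "W1": rng.standard_normal((HIDDEN_DIM, INPUT_DIM)) / np.sqrt(INPUT_DIM),
        "W2": rng.standard_normal(HIDDEN_DIM) / np.sqrt(HIDDEN_DIM),
    }


def policy_forward(model, x):
    """Return P(action = UP | x) and the hidden activations (kept for backprop)."""
    h = np.maximum(model["W1"] @ x, 0.0)   # hidden layer with ReLU
    logit = model["W2"] @ h                # single output unit
    p_up = 1.0 / (1.0 + np.exp(-logit))    # sigmoid -> probability of UP
    return p_up, h


rng = np.random.default_rng(0)
model = init_policy(rng)
x = np.zeros(INPUT_DIM)        # an all-zero (empty) frame
p, h = policy_forward(model, x)
print(p)  # prints 0.5: a zero input gives logit 0, so UP and DOWN are equally likely
```

A single sigmoid output is enough because Pong is reduced here to a binary choice (up or down); the probability of DOWN is simply 1 - p_up.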


This implementation is based on Andrej Karpathy's blog post "Deep Reinforcement Learning: Pong from Pixels". Following Andrej's solution, the policy gradient neural network does the following:

  • take in images from the game and "preprocess" them (remove color, background, etc.).
  • use the TensorFlow neural network to compute the probability of moving up or down.
  • sample from that probability distribution and tell the agent to move up or down.
  • when a round is over, record whether the agent won or lost it (a reward of +1 or -1).
  • when the episode has finished, pass the results through the backpropagation algorithm to compute the gradient for the weights.
  • after a batch of episodes has finished, sum up the gradients and move the weights in the direction of the gradient.
  • repeat this process until the weights are well tuned.
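The steps above can be sketched in NumPy as follows. This is a hedged, framework-free illustration rather than the repository's TensorFlow code: the crop offsets, the background color values 144 and 109, the action ids 2 (up) and 3 (down), and γ = 0.99 come from Karpathy's post and are assumptions here, and every function name is hypothetical. It assumes a two-layer ReLU/sigmoid policy and uses plain gradient ascent where Karpathy uses RMSProp.

```python
import numpy as np

GAMMA = 0.99  # assumed discount factor (Karpathy's value)


def preprocess(frame):
    """Assumed Karpathy-style preprocessing: 210x160x3 uint8 frame -> 6400-dim 0/1 vector."""
    img = frame[35:195]              # crop away the score bar: 160x160 playing field
    img = img[::2, ::2, 0].copy()    # downsample by 2, keep one channel: 80x80 (copy keeps input intact)
    img[img == 144] = 0              # erase background color 1
    img[img == 109] = 0              # erase background color 2
    img[img != 0] = 1                # paddles and ball become 1
    return img.astype(np.float64).ravel()


def sample_action(p_up, rng):
    """Sample UP with probability p_up; ids 2 (up) and 3 (down) follow Karpathy's code."""
    return 2 if rng.random() < p_up else 3


def discount_rewards(rewards, gamma=GAMMA):
    """Discounted return for each step, restarting the sum whenever a point is scored."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0            # a round ended here: do not leak reward across rounds
        running = running * gamma + rewards[t]
        out[t] = running
    return out


def episode_gradient(model, xs, hs, dlogps, rewards):
    """Policy gradient for one episode of a 2-layer ReLU/sigmoid policy.

    xs: (T, 6400) inputs, hs: (T, 200) hidden activations,
    dlogps: (T,) values of y - p_up (y = 1 if UP was sampled), i.e. d log pi / d logit,
    rewards: (T,) per-step rewards (+1, -1 or 0).
    """
    adv = discount_rewards(rewards)
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)  # standardize returns to reduce variance
    dlogits = dlogps * adv                          # weight each step by its return
    dW2 = hs.T @ dlogits                            # (200,)
    dh = np.outer(dlogits, model["W2"])             # (T, 200)
    dh[hs <= 0] = 0                                 # backprop through the ReLU
    dW1 = dh.T @ xs                                 # (200, 6400)
    return {"W1": dW1, "W2": dW2}


def apply_batch(model, grads, lr=1e-3):
    """Sum per-episode gradients and take one gradient-ascent step (plain SGD, not RMSProp)."""
    for k in model:
        model[k] += lr * sum(g[k] for g in grads)
```

In a training loop one would preprocess each observation (Karpathy feeds the network the difference of two consecutive preprocessed frames so that it can see motion), record x, h and y - p_up at every step, call `episode_gradient` when an episode ends, and call `apply_batch` once every few episodes.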

PongGame Experiment Results

After a period of time, the scores start to improve.



After running for 2 days, the agent has learned to play and is starting to beat the opponent. The last checkpoint, saved after these 2 days of training, is committed in the checkpoint folder. When you run the code in your environment, the checkpoint is loaded if the checkpoint folder exists.




References:

  • Policy Gradients Method
  • Policy Gradients from David Silver
  • Pong Game, OpenAI Gym
  • OpenAI Gym
