Reinforcement Learning Basketball
Robot Arm learning basketball using NFQ and Q-Learning
Methods
Q-Learning (Q Table)
Q-learning is a fairly rudimentary method, but it produced quite successful results. It keeps a table of values covering all possible states and applies the Q-learning update equation to explore and discover an optimal policy.
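A minimal sketch of the standard tabular Q-learning update with epsilon-greedy exploration. This is not the project's code: the hyperparameter values, the `(state, action)` dictionary layout, and the function names `choose_action` and `q_update` are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical hyperparameters, not taken from the project.
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.2  # exploration probability


def choose_action(q_table, state, actions, epsilon=EPSILON, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=ALPHA, gamma=GAMMA, terminal=False):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # No bootstrapping from the next state once the episode has ended.
    best_next = 0.0 if terminal else max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


# Unseen (state, action) pairs default to a Q-value of 0.0.
q_table = defaultdict(float)
```

In a training loop, `choose_action` would pick each move and `q_update` would be called once per observed transition.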
NFQ (Neural Fitted Q-Iteration)
NFQ uses a neural network to learn the Q values.
First, a batch of experience data is generated using a random policy. Then a 2-layer neural network is built in PyTorch and paired with an Rprop optimizer. Training was done against the target defined in the NFQ algorithm.
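The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the state and action dimensions, hidden size, discount factor, synthetic random transitions, and the names `make_q_net` and `nfq_iteration` are all hypothetical, and the target is written in the common reward-maximising form, which may differ from the one the project used.

```python
import torch
import torch.nn as nn

GAMMA = 0.9  # hypothetical discount factor


def make_q_net(state_dim, n_actions, hidden=32):
    """Network with two linear layers: state in, one Q-value per action out."""
    return nn.Sequential(
        nn.Linear(state_dim, hidden),
        nn.Tanh(),
        nn.Linear(hidden, n_actions),
    )


def nfq_iteration(q_net, optimizer, batch, gamma=GAMMA, epochs=20):
    """One fitted-Q iteration: fix the targets, then regress Q(s, a) onto them."""
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        # Target: r + gamma * max_a' Q(s', a'), without bootstrapping at terminal states.
        next_q = q_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        # Select the Q-value of the action actually taken in each transition.
        q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = loss_fn(q_sa, targets)
        loss.backward()
        optimizer.step()
    return loss.item()  # loss computed in the final epoch


# Hypothetical stand-in for random-policy data: 64 transitions, 4-dim states, 3 actions.
torch.manual_seed(0)
n, state_dim, n_actions = 64, 4, 3
batch = (
    torch.randn(n, state_dim),           # states
    torch.randint(0, n_actions, (n,)),   # actions chosen at random
    torch.rand(n),                       # rewards
    torch.randn(n, state_dim),           # next states
    (torch.rand(n) < 0.1).float(),       # terminal flags
)
q_net = make_q_net(state_dim, n_actions)
optimizer = torch.optim.Rprop(q_net.parameters())
for _ in range(5):
    final_loss = nfq_iteration(q_net, optimizer, batch)
```

Rprop is a full-batch method, so each optimizer step here uses the whole dataset rather than mini-batches.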
Conclusions and Future Work
There were a lot of issues, and some still remain. The first and most easily fixed is to switch to a dynamic alpha and exploration value (epsilon) in the Q-table variant. Much more work can be done on the NFQ side to create a better-structured neural net, as well as to tune the various other parameters.
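One possible way to make alpha and epsilon dynamic is an exponential decay with a lower bound. The sketch below is only an illustration: the schedule shape, the starting values, floors, and decay rates, and the names `decayed`, `epsilon_at`, and `alpha_at` are all assumptions, not something the project has implemented.

```python
def decayed(initial, floor, decay_rate, episode):
    """Exponentially decay a value per episode, never dropping below `floor`."""
    return max(floor, initial * (decay_rate ** episode))


def epsilon_at(episode):
    """Hypothetical exploration schedule: start fully random, settle at 5%."""
    return decayed(1.0, 0.05, 0.99, episode)


def alpha_at(episode):
    """Hypothetical alpha schedule: large early updates, small later ones."""
    return decayed(0.5, 0.01, 0.995, episode)
```

The values returned by `epsilon_at(episode)` and `alpha_at(episode)` would replace the fixed constants at the start of each training episode.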