In this project, we implement Q-learning as an extension to PA3. As in the MDP assignment, the goal is to find an optimal policy for each cell of a given grid map. This time, however, the transition probabilities are not known in advance and must be learned from experience. This is a common situation in practice: a robot may have no idea how accurate its movements are, yet Q-learning still converges to an optimized final policy even when some actions produce unintended movements.
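The core idea can be sketched as a tabular Q-learning loop on a toy grid. This is an illustrative sketch only: the grid size, reward values, slip probability, and hyperparameters below are assumptions, not the settings from the project's `configuration.json`.

```python
import random

# Toy 3x3 grid; all constants here are illustrative assumptions.
ROWS, COLS = 3, 3
GOAL = (2, 2)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(state, action, rng):
    # With small probability the move "slips" to a random direction,
    # modeling the unknown motion noise the agent must learn around.
    if rng.random() < 0.1:
        action = rng.choice(ACTIONS)
    r = min(max(state[0] + action[0], 0), ROWS - 1)
    c = min(max(state[1] + action[1], 0), COLS - 1)
    next_state = (r, c)
    reward = 10.0 if next_state == GOAL else -1.0
    return next_state, reward

def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state = (0, 0)
        while state != GOAL:
            # Epsilon-greedy action selection.
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
            nxt, reward = step(state, ACTIONS[a], rng)
            # Q-learning update:
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[state][a] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][a])
            state = nxt
    return Q

Q = train()
# Extract the greedy policy at the start state; it should head toward
# the goal (down or right), since up/left only bump into the wall.
best = max(range(len(ACTIONS)), key=lambda i: Q[(0, 0)][i])
```

Note that no transition model is ever consulted by the learner: the slip noise lives entirely inside `step`, and the agent recovers a good policy purely from sampled rewards.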
```
python learning.py
```

By default, the algorithm runs with `configuration.json`. To use another configuration, rename the desired configuration file to `configuration.json`.
*For the sake of clarity and simplicity, we did not use ROS for simulation.*