This project involves creating a grid world environment and applying value iteration to find the optimal policy. Below is the value iteration pseudocode that was implemented and tested (Sutton & Barto, Reinforcement Learning: An Introduction, 2018, p. 83).
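The sketch below shows one way the value iteration loop from Sutton & Barto could be written in Python. The environment interface (`num_states`, `states()`, `actions(s)`, `transitions(s, a)`, `is_terminal(s)`) is assumed for illustration and is not the project's actual API.

```python
import numpy as np

def value_iteration(env, gamma=1.0, theta=1e-6):
    """Sweep all states until the largest value change falls below theta (Sutton & Barto, p. 83)."""
    V = np.zeros(env.num_states)
    while True:
        delta = 0.0
        for s in env.states():
            if env.is_terminal(s):
                continue
            v_old = V[s]
            # Bellman optimality backup: value of the best action in state s
            V[s] = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in env.transitions(s, a))
                for a in env.actions(s)
            )
            delta = max(delta, abs(v_old - V[s]))
        if delta < theta:
            break

    # Extract the greedy policy from the converged value function
    policy = {}
    for s in env.states():
        if env.is_terminal(s):
            continue
        policy[s] = max(
            env.actions(s),
            key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in env.transitions(s, a)),
        )
    return V, policy
```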
The state space of the grid world was represented using a flat array of length 25 (a 5x5 grid), with the index system shown below.
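As a small illustration of this flat indexing, the helpers below map between (row, col) coordinates and the array index, assuming row-major ordering; the project's actual index layout is the one pictured below.

```python
N_ROWS, N_COLS = 5, 5  # 5x5 grid flattened into an array of length 25

def to_index(row, col):
    """Map a (row, col) cell to its flat array index, assuming row-major order."""
    return row * N_COLS + col

def to_coords(index):
    """Inverse mapping from a flat index back to (row, col)."""
    return divmod(index, N_COLS)
```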
In the grid world it is possible to have two different items, fire and water. The reward for collecting fire was set to -10 and the reward for collecting water to +10; both fire and water are terminal states. In addition, the agent receives a reward of -1 for each step it takes.
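A minimal sketch of this reward scheme is given below. The particular cells used for fire and water here are placeholders, not the project's actual item locations, and it is assumed the item reward replaces (rather than adds to) the step cost on the terminal transition.

```python
STEP_REWARD = -1    # cost of every move
FIRE_REWARD = -10   # collecting fire (terminal)
WATER_REWARD = 10   # collecting water (terminal)

# Illustrative placement only; the real fire/water cells come from the project's grid.
fire_states = {12}
water_states = {24}
terminal_states = fire_states | water_states

def reward(next_state):
    """Reward received on entering next_state: item reward if terminal, else the step cost."""
    if next_state in fire_states:
        return FIRE_REWARD
    if next_state in water_states:
        return WATER_REWARD
    return STEP_REWARD
```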
It was first decided to implement a deterministic environment, as it is easier to check whether the solution is correct. The two images below show the Values and Policy derived for this environment, with orange representing fire and blue representing water.
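The deterministic dynamics can be sketched as below, reusing the `to_index`/`to_coords` helpers above. How off-grid moves are handled is not stated in the report, so this sketch assumes the agent simply stays in place when it would leave the grid.

```python
ACTIONS = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}

def deterministic_step(state, action):
    """Move one cell in the chosen direction; moves off the grid leave the state unchanged."""
    row, col = to_coords(state)
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < N_ROWS and 0 <= new_col < N_COLS:
        return to_index(new_row, new_col)
    return state
```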
For the stochastic environment there is a probability of 0.7 that the agent moves as intended; with the remaining probability of 0.3, a move is selected at random. Below are the Values and Policy derived for this stochastic environment.
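One way to express these stochastic dynamics as a transition distribution, in the `(probability, next_state, reward)` form used by the value iteration sketch above, is shown below. It assumes the 0.3 random move is drawn uniformly over all four actions (so the random draw may coincide with the intended action); if the project instead randomises only over the other three moves, the probabilities would need adjusting.

```python
INTENDED_PROB = 0.7
RANDOM_PROB = 0.3

def stochastic_transitions(state, action):
    """Return (probability, next_state, reward) triples for the stochastic grid world."""
    probs = {}
    for a in ACTIONS:
        p = RANDOM_PROB / len(ACTIONS)  # chance this action is picked by the random draw
        if a == action:
            p += INTENDED_PROB          # plus the chance the intended action is executed
        s2 = deterministic_step(state, a)
        probs[s2] = probs.get(s2, 0.0) + p
    return [(p, s2, reward(s2)) for s2, p in probs.items()]
```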