Policy iteration does not work #16

GoogleCodeExporter · 2015-12-15T10:48:12Z

What steps will reproduce the problem?
1. Attempt to use the policy iteration algorithm

What is the expected output? What do you see instead?

Policy iteration should iterate several times before converging to a
solution. Instead, it converges after exactly one iteration.

What version of the product are you using? On what operating system?

The version posted on http://aima.cs.berkeley.edu/python/mdp.html, using
Python 2.6

Please provide any additional information below.

I've attached a fixed version of the file. The only line that changes
is line 139:

U[s] = R(s) + gamma * sum([p * U[s1] for (p, s1) in T(s, pi[s])])
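For context, here is a minimal sketch of how the corrected update fits into policy iteration. The issue does not quote the original line 139, so only the fixed form is shown. `TinyMDP`, its states, rewards, and the `expected_utility` helper are hypothetical and exist only for illustration; they are not taken from the attached mdp.py. The sketch assumes the convention that a terminal state has the single action `None` and a zero-probability transition back to itself.

```python
class TinyMDP:
    """Hypothetical three-state chain a -> b -> goal, for illustration only."""

    def __init__(self, gamma=0.9):
        self.states = ['a', 'b', 'goal']
        self.gamma = gamma
        self.rewards = {'a': -1.0, 'b': -1.0, 'goal': 10.0}

    def R(self, s):
        return self.rewards[s]

    def actions(self, s):
        # Assumed convention: the terminal state only offers the action None.
        return [None] if s == 'goal' else ['stay', 'go']

    def T(self, s, a):
        # Returns a list of (probability, next_state) pairs.
        if a is None:
            return [(0.0, s)]
        if a == 'stay':
            return [(1.0, s)]
        return [(1.0, {'a': 'b', 'b': 'goal'}[s])]


def expected_utility(a, s, U, mdp):
    # Expected utility of taking action a in state s under utilities U.
    return sum(p * U[s1] for (p, s1) in mdp.T(s, a))


def policy_evaluation(pi, U, mdp, k=20):
    # Approximate evaluation: k in-place sweeps of the fixed-policy update.
    R, T, gamma = mdp.R, mdp.T, mdp.gamma
    for _ in range(k):
        for s in mdp.states:
            # The line from the report: utilities are read from the
            # successor state s1.
            U[s] = R(s) + gamma * sum([p * U[s1] for (p, s1) in T(s, pi[s])])
    return U


def policy_iteration(mdp):
    U = dict((s, 0.0) for s in mdp.states)
    # Start from the first listed action in every state.
    pi = dict((s, mdp.actions(s)[0]) for s in mdp.states)
    iterations = 0
    while True:
        iterations += 1
        U = policy_evaluation(pi, U, mdp)
        unchanged = True
        for s in mdp.states:
            best = max(mdp.actions(s),
                       key=lambda a: expected_utility(a, s, U, mdp))
            if best != pi[s]:
                pi[s] = best
                unchanged = False
        if unchanged:
            return pi, U, iterations


pi, U, iterations = policy_iteration(TinyMDP())
print(pi, iterations)
```

On this toy problem the initial all-'stay' policy is improved over more than one pass before it stops changing, which is the behaviour the report expects from a working implementation.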

Original issue reported on code.google.com by srbur...@gmail.com on 29 Apr 2010 at 5:54

Attachments:

mdp.py


GoogleCodeExporter · 2015-12-15T10:48:12Z

[deleted comment]

GoogleCodeExporter · 2015-12-15T10:48:12Z

Fixed in r30.

Original comment by wit...@gmail.com on 15 Sep 2011 at 4:19

GoogleCodeExporter · 2015-12-15T10:48:12Z

Original comment by wit...@gmail.com on 15 Sep 2011 at 4:20

Changed state: Fixed

GoogleCodeExporter added Priority-Medium auto-migrated Type-Defect labels Dec 15, 2015

GoogleCodeExporter closed this as completed Dec 15, 2015
