Policy iteration does not work #16

Closed
GoogleCodeExporter opened this issue Dec 15, 2015 · 3 comments
@GoogleCodeExporter

What steps will reproduce the problem?
1. Attempt to use the policy iteration algorithm

What is the expected output? What do you see instead?

Policy iteration should iterate several times before converging to a
solution. Instead, it converges after exactly one iteration.

What version of the product are you using? On what operating system?

The version posted on http://aima.cs.berkeley.edu/python/mdp.html, using
Python 2.6

Please provide any additional information below.

I've attached a fixed version of the file. The only line that changes
is 139:

U[s] = R(s) + gamma * sum([p * U[s1] for (p, s1) in T(s, pi[s])])
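For context, here is a minimal, self-contained sketch of policy iteration built around the corrected update. It is not the code from mdp.py: the three-state chain MDP, the function signatures, the deterministic initial policy, and the fixed number of evaluation sweeps `k` are all assumptions made for illustration.

```python
# Hypothetical toy MDP: states 0, 1, 2 on a line; only state 2 gives reward.
STATES = [0, 1, 2]
GAMMA = 0.9


def actions(s):
    return ["left", "right"]


def R(s):
    return 1.0 if s == 2 else 0.0


def T(s, a):
    """Deterministic transitions, returned as a list of (probability, next_state)."""
    s1 = max(s - 1, 0) if a == "left" else min(s + 1, 2)
    return [(1.0, s1)]


def policy_evaluation(pi, U, k=20):
    """Approximate the utility of each state under the fixed policy pi."""
    for _ in range(k):
        for s in STATES:
            # The update from the report: weight each successor's utility
            # U[s1] by its transition probability p.
            U[s] = R(s) + GAMMA * sum([p * U[s1] for (p, s1) in T(s, pi[s])])
    return U


def expected_utility(a, s, U):
    return sum([p * U[s1] for (p, s1) in T(s, a)])


def policy_iteration():
    U = dict((s, 0.0) for s in STATES)
    # Start from a deliberately poor policy so improvement is visible.
    pi = dict((s, "left") for s in STATES)
    iterations = 0
    while True:
        iterations += 1
        U = policy_evaluation(pi, U)
        unchanged = True
        for s in STATES:
            best = max(actions(s), key=lambda a: expected_utility(a, s, U))
            if best != pi[s]:
                pi[s] = best
                unchanged = False
        if unchanged:
            return pi, iterations


pi, iterations = policy_iteration()
print(pi, iterations)
```

On this toy problem the loop needs more than one pass before the policy stops changing, which is the behavior the report expects from a working implementation.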




Original issue reported on code.google.com by srbur...@gmail.com on 29 Apr 2010 at 5:54

@GoogleCodeExporter

[deleted comment]

@GoogleCodeExporter

Fixed in r30.

Original comment by wit...@gmail.com on 15 Sep 2011 at 4:19

@GoogleCodeExporter

Original comment by wit...@gmail.com on 15 Sep 2011 at 4:20

  • Changed state: Fixed
