gambler's problem #39
The book mentioned that
I just wonder how you managed to make the policy figure exactly the same as the one in the book. It would be much appreciated if you could briefly explain how you select an action when there is a tie.
Under Figure 4.3 it states: "In particular, for capital of 50 it bets it all on one flip, but for capital of 51 it does not." But this is misleading, since it is perfectly fine, i.e. optimal, to bet 49 while in state 51. And:
You are right. I overlooked this remark and focused on the caption of the figure and the remark in your code. Your code doesn't reproduce the policy plotted in the book, but you still obtain a different optimal policy.
Regarding implementation: the arg max when establishing the greedy policy is often not unique. That means that if the maximum occurs for multiple actions, all of these actions can be assigned a non-zero probability in a greedy policy. In my implementation, the class
Below are the greedy actions for each state. (I hope that it's correct...)
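To make the tie handling concrete, here is a minimal sketch of how all tied greedy actions could be collected instead of a single arg max. It is not the code from either repository: the names `value_iteration` and `greedy_actions`, the win probability of 0.4, the stopping threshold and the tie tolerance are all assumptions chosen for illustration.

```python
import numpy as np

GOAL = 100       # target capital from the book's example
P_HEADS = 0.4    # assumed probability of winning a flip


def value_iteration(goal=GOAL, p_heads=P_HEADS, theta=1e-12):
    """In-place value iteration; stakes of 0 are excluded (assumption)."""
    v = np.zeros(goal + 1)
    v[goal] = 1.0  # reaching the goal is worth 1, ruin is worth 0
    while True:
        delta = 0.0
        for s in range(1, goal):
            stakes = np.arange(1, min(s, goal - s) + 1)
            returns = p_heads * v[s + stakes] + (1 - p_heads) * v[s - stakes]
            best = returns.max()
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < theta:
            return v


def greedy_actions(v, s, goal=GOAL, p_heads=P_HEADS, tol=1e-9):
    """Return every stake whose expected return ties with the maximum."""
    stakes = np.arange(1, min(s, goal - s) + 1)
    returns = p_heads * v[s + stakes] + (1 - p_heads) * v[s - stakes]
    tied = np.isclose(returns, returns.max(), rtol=0.0, atol=tol)
    return stakes[tied].tolist()
```

Calling `greedy_actions(value_iteration(), 51)` lists every stake that is optimal at capital 51 within the tolerance, rather than only the first one that `np.argmax` would pick. The tolerance has to be larger than the residual error of the value function, otherwise genuine ties are missed.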
Oh, got it: you plot all the actions if there is a tie rather than randomly selecting one. It also makes the figure look better. Thanks for your reply!
I thank you for your awesome code and repository!
There is a mistake in the book:
for the gambler's problem in Chapter 4, there is no unique optimal policy.
The plot of optimal actions omits other, equally valuable actions.
For example
There are no floating point problems in your implementation.
The correct solution (I hope) is displayed in the images on the page
https://github.com/idsc-frazzoli/subare
Suggestion: prohibit the gambler from betting an amount of 0 as long as his/her cash is strictly between 0 and 100. This will speed up the convergence.
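As a rough illustration of this suggestion, the sketch below builds the set of stakes with and without the zero bet. The helpers `legal_stakes` and `stake_return` and the win probability of 0.4 are hypothetical, not taken from the repository. A zero stake leaves the capital unchanged, so its expected return is just the current estimate of the state's value, and after convergence it always ties with the best action.

```python
def legal_stakes(capital, goal=100, allow_zero=False):
    """Stakes available at `capital` (hypothetical helper for illustration)."""
    if capital <= 0 or capital >= goal:
        return []  # terminal states: nothing left to bet
    lowest = 0 if allow_zero else 1
    return list(range(lowest, min(capital, goal - capital) + 1))


def stake_return(v, capital, stake, p_heads=0.4):
    """Expected return of betting `stake` under value estimates `v`."""
    return p_heads * v[capital + stake] + (1 - p_heads) * v[capital - stake]
```

With `allow_zero=False`, the greedy policy can never select the bet that leaves the gambler in the same state forever.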