np.argmax may lead to unexpected behavior #51
Comments
I just found this problem as well.
max_actions = np.argwhere(values == np.amax(values))
action = np.random.choice(max_actions.flatten())
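For readers who want to try the two lines above in isolation, here is a minimal sketch that wraps the same idea in a helper. The name `argmax_random_tie`, the optional `rng` parameter, and the sample `values` array are assumptions made for illustration and are not taken from the repo.

```python
import numpy as np

def argmax_random_tie(values, rng=None):
    """Return an index of the maximum of `values`, chosen uniformly among ties.

    Hypothetical helper for illustration; not the repo's implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    # flatnonzero returns a 1-D array of every index that holds the maximum
    max_actions = np.flatnonzero(values == values.max())
    return int(rng.choice(max_actions))

values = np.array([1.0, 3.0, 3.0, 2.0])   # indices 1 and 2 tie for the maximum
first = int(np.argmax(values))             # np.argmax always reports the first maximal index
picks = {argmax_random_tie(values) for _ in range(200)}  # indices seen over repeated draws
```

Here `first` is always 1, while `picks` only ever contains the tied indices 1 and 2. Using `np.flatnonzero` instead of `np.argwhere(...).flatten()` is a stylistic choice; both yield the same indices for a 1-D array.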
@zwfcrazy It looks cool! Did you find any bugs in the repo, or have I fixed them all?
@ShangtongZhang Not yet. I am still reading the book... slowly... I am glad to see that Prof. Sutton has finished the draft! BTW, perhaps we could try to improve the efficiency of the simulations. Exercise 2.9 of chapter 2 requires running the parameter study for 200k steps, which takes days to complete...
@ShangtongZhang Hi, I just found that you didn't fix the argmax problem in chapter 2.
@zwfcrazy Did it lead to some bugs? I didn't mean to replace all the
@ShangtongZhang The simulation results seem to show no difference, but the simple bandit algorithm requires breaking ties randomly (see p. 24 of the complete draft). I think always taking the first index may slow down exploration at the beginning, since we assume the initial estimates of Q(a) are the same for all actions. This won't be critical later on, as ties rarely happen.
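To make the point about identical initial estimates concrete, here is a small sketch that counts which arm is chosen on the very first step over many independent runs. The arm count `k`, the number of `runs`, and the seed are hypothetical values chosen for illustration; this is not the chapter 2 simulation code.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10                  # number of arms (hypothetical)
runs = 2000             # number of independent runs (hypothetical)
q_init = np.zeros(k)    # identical initial estimates, so every arm ties

first_det = np.zeros(k, dtype=int)    # first-step choices with np.argmax
first_rand = np.zeros(k, dtype=int)   # first-step choices with random tie-breaking
for _ in range(runs):
    first_det[np.argmax(q_init)] += 1
    ties = np.flatnonzero(q_init == q_init.max())
    first_rand[rng.choice(ties)] += 1
```

With `np.argmax`, every run starts on arm 0, so `first_det` puts all of its mass on the first entry. With random tie-breaking, `first_rand` is spread across all arms.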
One issue with `np.argmax` is that it always returns the first index, even if there are several maximal values. Instead, we should choose randomly among the maximal indices. At the very beginning I implemented my own `argmax` in `utils` like this. However, `utils` was later removed due to many complaints about `import error`, and at that time I switched back to `np.argmax`. Now it turns out to be problematic, especially for some tabular examples. It may lead to an infinite loop.

I fixed it where it's used for the behavior policy, as I think that's the only case where `np.argmax` will lead to problems. However, I'm still leaving this issue open as a hint in case you find some strange bug.
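As one possible illustration of how a deterministic `np.argmax` behavior policy could loop forever in a tabular setting, consider the sketch below. Everything in it is an assumption made for illustration: a five-state corridor, action values that all tie at zero and are never updated, and a step cap (`MAX_STEPS`) so the demo terminates. It is not one of the repo's examples.

```python
import numpy as np

N_STATES = 5            # states 0..4; state 4 is the goal (hypothetical corridor)
LEFT, RIGHT = 0, 1
MAX_STEPS = 1000        # safety cap so the sketch always terminates

def run_episode(select_action, q):
    """Follow the given greedy rule from state 0.

    Returns the number of steps needed to reach the goal,
    or None if the step cap is hit first.
    """
    state = 0
    for step in range(1, MAX_STEPS + 1):
        action = select_action(q[state])
        # LEFT at state 0 bumps into the wall and stays put
        state = max(state - 1, 0) if action == LEFT else state + 1
        if state == N_STATES - 1:
            return step
    return None

rng = np.random.default_rng(0)
q = np.zeros((N_STATES, 2))   # all estimates tie, and nothing in this sketch updates them

def random_tie(values):
    # choose uniformly among the indices that hold the maximum
    return int(rng.choice(np.flatnonzero(values == values.max())))

stuck = run_episode(lambda v: int(np.argmax(v)), q)   # always LEFT, never leaves state 0
done = run_episode(random_tie, q)                     # random walk that reaches the goal
```

In this sketch `stuck` is `None` because the agent keeps choosing `LEFT` at state 0, whereas the tie-breaking policy reaches the goal. In a real learning loop the value updates may break the tie on their own, so whether a loop actually occurs depends on the rewards and the update rule.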