Question #12
Comments
Hi, thinking about it the following way may help. If q_target is positive, every update adds some value to the Q-table entry and nothing is ever subtracted, so the values in the Q-table will keep growing and eventually explode.
Do you mean large positive values or just positive values? If the values are between 0 and 1, then it may not explode. In this very example, q_table.ix[S, A] += ALPHA * (q_target) works well and converges faster. It would be interesting to understand when you could end up with a blow-up. Thanks for responding to the question, though.
It will show the right behaviour in this example, but it will never converge. Actually, regardless of the value's sign, any value will give you the right behaviour but an unconverged result. If you keep running the script your way, you will find that the values in your Q-table eventually exceed the range it can hold, and it may show NaN at the end.
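To make the difference concrete, here is a minimal sketch on a hypothetical toy problem (not the script from this repo): a single state that loops back to itself with a constant reward. The names `run`, `REWARD`, and `subtract_predict`, and the values chosen for `ALPHA` and `GAMMA`, are assumptions for illustration only. With the subtraction, the update can be rewritten as Q = (1 - ALPHA) * Q + ALPHA * q_target, which has a fixed point where Q equals q_target. Without it, each step only adds to Q, so there is nothing to stop the growth.

```python
ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.9    # discount factor (assumed value)
REWARD = 1.0   # constant reward of the toy self-loop (assumed value)


def run(steps, subtract_predict):
    """Repeat the tabular update on a single state that transitions to itself."""
    q = 0.0
    for _ in range(steps):
        q_predict = q
        q_target = REWARD + GAMMA * q  # bootstrapped target from the same state
        if subtract_predict:
            # Update from the thread: move q toward q_target by the TD error.
            q += ALPHA * (q_target - q_predict)
        else:
            # Proposed variant: add the raw target, nothing pulls q back.
            q += ALPHA * q_target
    return q


q_td = run(2000, subtract_predict=True)    # settles near REWARD / (1 - GAMMA) = 10
q_raw = run(2000, subtract_predict=False)  # keeps growing geometrically

print("with q_predict:   ", q_td)
print("without q_predict:", q_raw)
```

In this sketch the greedy action would be the same either way, which matches the observation that the behaviour looks right; only the version with the subtraction settles on a stable value.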
Thanks for the sample code. I have a question about this line:
q_table.ix[S, A] += ALPHA * (q_target - q_predict) # update
Why subtract q_predict from q_target? The q_target should be good enough. I am confused about the use of q_predict in the above formula.
Should something like this not suffice?
q_table.ix[S, A] += ALPHA * (q_target)