Question #12

Closed
shivajid opened this issue Sep 16, 2017 · 3 comments

Comments

@shivajid

Thanks for the sample code. I have a question about this line:
q_table.ix[S, A] += ALPHA * (q_target - q_predict) # update

Why subtract q_predict from q_target? It seems q_target alone should be good enough, and I am confused about the role of q_predict in this formula.
Shouldn't something like this suffice?
q_table.ix[S, A] += ALPHA * (q_target)

@MorvanZhou
Owner

Hi, thinking about it this way may help: if q_target is positive, every update adds some value to the Q-table entry, with nothing to pull it back. So the values in the Q-table keep growing and will eventually explode.
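
To make the difference concrete, here is a minimal sketch that applies both update rules to a single Q-value. It assumes a constant q_target of 1.0 and a learning rate of 0.1; `TARGET`, `STEPS`, `q_td`, and `q_sum` are hypothetical names used only for this illustration, not part of the sample code.

```python
ALPHA = 0.1    # learning rate (name taken from the sample code, value assumed)
TARGET = 1.0   # hypothetical constant q_target, for illustration only
STEPS = 1000   # hypothetical number of updates

q_td = 0.0     # updated with ALPHA * (q_target - q_predict)
q_sum = 0.0    # updated with ALPHA * q_target only

for _ in range(STEPS):
    q_td += ALPHA * (TARGET - q_td)  # the error shrinks as q_td nears TARGET
    q_sum += ALPHA * TARGET          # the same amount is added on every step

print(q_td)   # settles near TARGET
print(q_sum)  # roughly ALPHA * TARGET * STEPS, and still growing
```

With the error term, the update size shrinks to zero once the estimate matches the target. Without it, the estimate grows in proportion to the number of updates, even though the target here is no larger than 1.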

@shivajid
Author

Do you mean large positive values, or just any positive values? If the values are between 0 and 1, they may not explode.
But I later saw that your code conforms to the usual form Qdash(S,A) - Q(S,A).

In this very example, q_table.ix[S, A] += ALPHA * (q_target) works well and converges faster. It would be interesting to understand when you could end up with a blow-up.

Thanks for responding to the question though.

@MorvanZhou
Owner

It will show the right behaviour in this example, but it will never converge. Actually, whatever the sign of the values, you get the right behaviour but a Q-table that has not converged.
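
As a rough sketch of why the sign does not matter, assume a constant target $T$ and a learning rate $0 < \alpha < 1$ (a simplification, since the real q_target changes as the table is updated). The two rules then have these closed forms:

```latex
\begin{align*}
\text{with the error term:}\quad
  Q_{n+1} &= Q_n + \alpha\,(T - Q_n)
  \;\Longrightarrow\;
  Q_n - T = (1-\alpha)^n\,(Q_0 - T) \to 0, \\
\text{without it:}\quad
  Q_{n+1} &= Q_n + \alpha\,T
  \;\Longrightarrow\;
  Q_n = Q_0 + n\,\alpha\,T .
\end{align*}
```

The first sequence settles at $T$. The second is unbounded for any $T \neq 0$, positive or negative, so it has no value to converge to.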

If you keep running the script your way, you will find that the values in the Q-table eventually exceed what the table can hold, and it may show NaN at the end.
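
A small sketch of this effect, under stated assumptions: a hypothetical one-state task with two actions ("left" gives reward 0, "right" gives reward 1, both return to the same state), with `GAMMA = 0.9` and `ALPHA = 0.1`. The `run` function and the `REWARD` table are made up for illustration and are not taken from the sample code.

```python
import math

ALPHA, GAMMA = 0.1, 0.9                 # names from the sample code, values assumed
REWARD = {"left": 0.0, "right": 1.0}    # hypothetical one-state task

def run(sweeps, use_td_error):
    """Update both actions `sweeps` times and return the final Q-values."""
    q = {"left": 0.0, "right": 0.0}
    for _ in range(sweeps):
        best = max(q.values())          # bootstrap from the current best estimate
        for action, r in REWARD.items():
            q_target = r + GAMMA * best
            if use_td_error:
                q[action] += ALPHA * (q_target - q[action])
            else:
                q[action] += ALPHA * q_target
    return q

td = run(10_000, use_td_error=True)         # settles near right=10, left=9
acc_short = run(100, use_td_error=False)    # still ranks "right" above "left"
acc_long = run(10_000, use_td_error=False)  # values overflow to inf

print(td)
print(acc_short["right"] > acc_short["left"])
print(math.isinf(acc_long["right"]))
```

In this sketch the accumulating rule keeps the correct action ranking for a while, which is why the behaviour looks right, but because the target bootstraps from the table itself the values grow geometrically and end up as `inf`. Once infinities enter later arithmetic (for example `inf - inf`), NaN can appear.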
