Question #12

shivajid · 2017-09-16T23:58:09Z

Thanks for the sample code - I had a question on this
q_table.ix[S, A] += ALPHA * (q_target - q_predict) # update

Why subtract q_predict from q_target? The q_target should be good enough on its own. I am confused about the use of q_predict in the above formula.
Should something like this not suffice?
q_table.ix[S, A] += ALPHA * (q_target)

MorvanZhou · 2017-09-17T01:10:05Z

Hi, thinking about it the following way may help. If q_target is positive, every update adds a positive amount to the Q-table entry, no matter how large that entry already is. So the values in the Q-table keep growing and will eventually explode.
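To make the difference concrete, here is a minimal sketch that applies both update rules to a single Q-value. It assumes a constant target (`Q_TARGET = 1.0`), which is a simplification for illustration only; in the actual script q_target is recomputed at every step. The helper name `run_constant_target` is hypothetical and not part of the tutorial code.

```python
ALPHA = 0.1      # learning rate, same name as in the tutorial script
Q_TARGET = 1.0   # assumed constant target, for illustration only


def run_constant_target(n_steps, use_td_error):
    """Apply n_steps updates to one Q-value that starts at 0."""
    q = 0.0
    for _ in range(n_steps):
        if use_td_error:
            # Original rule: move a fraction ALPHA of the way toward the target.
            q += ALPHA * (Q_TARGET - q)
        else:
            # Proposed rule: add ALPHA * target on every step, regardless of q.
            q += ALPHA * Q_TARGET
    return q


for n in (10, 100, 1000):
    print(n, run_constant_target(n, True), run_constant_target(n, False))
```

With the subtraction, the step shrinks as the value approaches the target, so the value settles at the target. Without it, the value grows by the same amount on every update and never settles.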

shivajid · 2017-09-17T23:56:02Z

Do you mean large positive values or just positive values? If the values are between 0 and 1, it may not explode.
But later I saw that you conformed to Qdash(S,A) - Q(S,A).

In this very example, q_table.ix[S, A] += ALPHA * (q_target) works well and converges faster. It would be interesting to understand when you could end up with a blow-up.

Thanks for responding to the question though.

MorvanZhou · 2017-09-18T01:15:54Z

It will show the right behaviour in this example, but it will never converge. Actually, no matter the sign of the value, any value will give you the right behaviour but an unconverged result.

If you keep running the script your way, you will find that the values in your Q-table eventually exceed what the table can hold, and it may show NaN at the end.
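The overflow can be reproduced with a small sketch in which the target depends on the Q-value itself, as it does in Q-learning (q_target = R + GAMMA * max Q(S')). The setup below is an assumption made for illustration: a single state that loops back to itself with a reward of 1 on every step, `GAMMA = 0.9`, and a hypothetical helper `run_self_loop`. It is not the tutorial's environment.

```python
import math

ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor (assumed value)
REWARD = 1.0   # assumed reward received on every step


def run_self_loop(n_steps, use_td_error):
    """Update the single Q-value of a state that transitions to itself."""
    q = 0.0
    for _ in range(n_steps):
        q_target = REWARD + GAMMA * q   # bootstrapped target
        if use_td_error:
            q += ALPHA * (q_target - q)  # original rule
        else:
            q += ALPHA * q_target        # rule without q_predict
    return q


print(run_self_loop(10_000, True))    # settles near REWARD / (1 - GAMMA)
print(run_self_loop(10_000, False))   # overflows the float range
print(math.isinf(run_self_loop(10_000, False)))
```

In this sketch the original rule settles near REWARD / (1 - GAMMA), while the rule without q_predict multiplies the value by roughly (1 + ALPHA * GAMMA) on every step until it overflows to infinity. Once a table contains infinities, later arithmetic on them (for example inf - inf) can produce NaN.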

MorvanZhou closed this as completed Nov 2, 2017
