
Ch2, lines 48, 62, & 77: don't seem to match book calc #9

Closed
atki4564 opened this issue Nov 25, 2016 · 7 comments
atki4564 commented Nov 25, 2016

Ch2, line 62:

Issue #1:
If, at line 48, you set self.actionCount to an array of 10 zeros,
then, at line 62, np.asarray(self.actionCount) + 1 adds +1 to EVERY action count, not the SPECIFIC action count; this seems to be an error because...; note: asarray doesn't copy, it returns a reference.
Then, at line 77, self.actionCount[action] += 1.

Issue #2:
Also, at line 62, self.qEst + self.UCBParam * np.sqrt(np.log(self.time + 1) / (np.asarray(self.actionCount) + 1)) adds the bonus term to EVERY action estimate (qEst), not just the SPECIFIC action estimate.

The book does both of these calculations for each SPECIFIC action, not EVERY action; see page 37, Upper-Confidence-Bound Action Selection.
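For reference, the vectorized UCB selection being discussed can be sketched as follows. This is a minimal reconstruction using invented values, with names (q_est, action_count, ucb_param, time) mirroring the qEst/actionCount/UCBParam quoted above; it is not the repository's exact code:

```python
import numpy as np

# Minimal sketch of vectorized UCB action selection (formula 2.8 in the book).
q_est = np.array([0.5, 0.2, 0.9, 0.1])  # current value estimates, one per arm
action_count = [1, 3, 2, 0]             # times each arm has been pulled (a Python list)
time = 6                                # total steps so far
ucb_param = 2.0

# Computing the bonus for EVERY arm at once is intentional: argmax needs all of
# them to compare. The +1 in the denominator only guards against division by
# zero; it does not modify action_count itself.
ucb_est = q_est + ucb_param * np.sqrt(
    np.log(time + 1) / (np.asarray(action_count) + 1))

best_action = int(np.argmax(ucb_est))  # the never-tried arm 3 gets the largest bonus
```

Note that action_count is left unchanged by this computation, which is the crux of the disagreement below.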

@ShangtongZhang
Owner

Issue #1: note that np.asarray(self.actionCount) + 1 doesn't actually change self.actionCount. I add +1 to all action counts only to avoid division by zero; self.actionCount itself stays unchanged.
Issue #2: In my view, we first get the original estimate for every action, then we adjust every action's estimate per formula 2.8 in the book, and finally choose the action with the maximum adjusted estimate.

@atki4564
Author

atki4564 commented Nov 25, 2016

On Issue #1: https://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html says 'No copy is performed if the input is already an ndarray'. I tested this on https://www.pythonanywhere.com/try-ipython/, and it actually is changing the array; therefore line 77 would be a duplicate calculation. I'll print just before line 77 to see who's right.

@ShangtongZhang
Owner

But self.actionCount is a list (line 40), not an ndarray.

@atki4564
Author

My bad, a Python newbie mistake (sorry).

@ShangtongZhang
Owner

No problem. And even if it were an ndarray, it wouldn't be changed, because np.asarray(self.actionCount) + 1 simply returns a new ndarray without touching the original. Try

import numpy as np

a = np.zeros(4)
b = np.asarray(a) + 1  # b is a new array of ones; a is still all zeros

and see the values of a and b.

@atki4564
Author

On Issue #2:
Then, at line 90,

else:
    # update estimation with constant step size
    self.qEst[action] += 0.1 * (reward - self.qEst[action])

is a duplicate, because you already 'repaired every action estimation per formula' at line 62?

@ShangtongZhang
Copy link
Owner

No. This is the normal update of the action estimate. The estimates need to be repaired whenever we want to choose an action, but that repair isn't and shouldn't be lasting; it should be forgotten once the action has been chosen.
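The distinction can be sketched as follows (a minimal illustration with invented values, names mirroring the thread's qEst/actionCount/UCBParam; not the repository's code): the UCB "repair" lives only in a temporary array used for selection, while the constant-step-size update afterwards moves only the chosen action's persistent estimate.

```python
import numpy as np

q_est = np.array([0.4, 0.6])  # persistent estimates
action_count = [2, 1]
time = 3
ucb_param = 2.0
step_size = 0.1

# Selection: the exploration bonus is computed into a temporary array
# (ucb_est) and then forgotten; q_est itself is not modified here.
ucb_est = q_est + ucb_param * np.sqrt(
    np.log(time + 1) / (np.asarray(action_count) + 1))
action = int(np.argmax(ucb_est))

# Update: only the chosen action's persistent estimate moves toward the reward.
reward = 1.0
q_est[action] += step_size * (reward - q_est[action])
action_count[action] += 1
```

So line 62 and line 90 are not duplicates: one is a transient selection rule, the other a lasting learning update.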
