Ch2, lines 48, 62, & 77: don't seem to match book calc #9
Comments
On Issue #1: note that np.asarray(self.actionCount) + 1 doesn't actually change self.actionCount. I add 1 to every action count to avoid division by zero, but self.actionCount itself stays unchanged.
On Issue #1: https://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html says 'No copy is performed if the input is already an ndarray'. I tested this on https://www.pythonanywhere.com/try-ipython/, and it does appear to change the array, which would make line 77 a duplicate calculation. I'll run a print just before line 77 to see who's right.
But self.actionCount is a list (line 40), not an ndarray.
My bad, Python newbie mistake (sorry).
No problem. And even if it were an ndarray, it wouldn't be changed, because np.asarray(self.actionCount) + 1 simply returns a new ndarray without modifying the original one.
Try it with an ndarray a and b = np.asarray(a) + 1, and see the values of a and b.
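A minimal sketch of that check, assuming nothing beyond NumPy itself. The names action_count, bumped, a and b are illustrative and are not taken from the repository's code:

```python
import numpy as np

# Case 1: a plain Python list, as self.actionCount is described in the thread.
action_count = [0, 0, 0]
bumped = np.asarray(action_count) + 1  # builds a new ndarray from the list
print(action_count)  # the list is untouched
print(bumped)

# Case 2: an input that is already an ndarray.
a = np.zeros(3)
same = np.asarray(a)   # no copy: `same` is the very same object as `a`
b = np.asarray(a) + 1  # but `+ 1` still allocates a fresh array for the result
print(same is a)
print(a)
print(b)
```

Only an in-place operation such as a += 1 would modify the original array; the binary + always produces a new one.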
On Issue #2:
No. This is the normal way to adjust the action estimates for selection. The estimates need to be adjusted whenever we want to choose an action, but that adjustment isn't, and shouldn't be, lasting: it should be forgotten once the action has been chosen.
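To make the "temporary adjustment" point concrete, here is a rough sketch of UCB selection written as a standalone function. The function name ucb_select and its parameter names are hypothetical; only the expression itself mirrors the one quoted from line 62:

```python
import numpy as np

def ucb_select(q_est, action_count, time, ucb_param):
    """Pick the action with the highest UCB score without modifying the inputs."""
    # Temporary counts: +1 on every action only to avoid division by zero.
    counts = np.asarray(action_count) + 1
    # Temporary scores: a new array, so q_est itself is left as it was.
    ucb_est = np.asarray(q_est) + ucb_param * np.sqrt(np.log(time + 1) / counts)
    return int(np.argmax(ucb_est))

q_est = np.zeros(3)        # stored estimates
action_count = [0, 5, 5]   # stored counts (a plain list)

action = ucb_select(q_est, action_count, time=10, ucb_param=2.0)

# The only lasting change: the count of the action actually taken.
action_count[action] += 1
print(action, action_count, q_est)
```

The bonus has to be computed for every action because the argmax compares all of them; only the chosen action's count is updated afterwards.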
Ch2, line 62:
Issue #1:
If, on line 48, self.actionCount is set to an array of 10 zeros,
then, on line 62, np.asarray(self.actionCount) + 1 adds 1 to the count of EVERY action, not just the SPECIFIC action; this seems to be an error because...; note: asarray doesn't copy, it returns a reference.
Then, on line 77, self.actionCount[action] += 1;
Issue #2:
Also, on line 62, the expression self.qEst + self.UCBParam * np.sqrt(np.log(self.time + 1) / (np.asarray(self.actionCount) + 1)) adds the UCB term to EVERY action estimate (qEst), not just the SPECIFIC action estimate.
The book does both of these calculations for each SPECIFIC action, not EVERY action; see page 37, Upper-Confidence-Bound Action Selection.
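For reference, the UCB selection rule from that section of Sutton and Barto, in the book's notation. The mapping to the code's names (qEst for Q_t, UCBParam for c, actionCount for N_t, time for t) is a reading of the quoted expression, not something the thread states:

```latex
A_t \doteq \operatorname*{arg\,max}_{a} \left[ Q_t(a) + c \sqrt{\frac{\ln t}{N_t(a)}} \right]
```

The arg max ranges over all actions, so the bracketed score is evaluated once per action a before a single action is chosen. The quoted code uses time + 1 and count + 1 in place of t and N_t(a).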