
Ch2, lines 48, 62, & 77: don't seem to match book calc #9

Closed
atki4564 opened this issue Nov 25, 2016 · 7 comments
atki4564 commented Nov 25, 2016

Ch2, line 62:

Issue #1:
If, at line 48, you set self.actionCount to an array of 10 zeros,
then, at line 62, np.asarray(self.actionCount) + 1 adds +1 to EVERY action count, not the SPECIFIC action count; this seems to be an error because...; note: asarray doesn't copy, it returns a reference.
Then, at line 77, self.actionCount[action] += 1.

Issue #2:
Also, at line 62, self.qEst + self.UCBParam * np.sqrt(np.log(self.time + 1) / (np.asarray(self.actionCount) + 1)) adds the bonus term to EVERY action estimate (qEst), not just the SPECIFIC action estimate.

The book does both of these calculations for each SPECIFIC action, not EVERY action; see page 37, Upper-Confidence-Bound Action Selection.
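For reference, the vectorized UCB selection being discussed can be sketched as follows. This is a minimal reconstruction using invented values, with names (q_est, action_count, ucb_param, time) mirroring the qEst/actionCount/UCBParam quoted above; it is not the repository's exact code:

```python
import numpy as np

# Minimal sketch of vectorized UCB action selection (formula 2.8 in the book).
q_est = np.array([0.5, 0.2, 0.9, 0.1])  # current value estimates, one per arm
action_count = [1, 3, 2, 0]             # times each arm has been pulled (a Python list)
time = 6                                # total steps so far
ucb_param = 2.0

# Computing the bonus for EVERY arm at once is intentional: argmax needs all of
# them to compare. The +1 in the denominator only guards against division by
# zero; it does not modify action_count itself.
ucb_est = q_est + ucb_param * np.sqrt(
    np.log(time + 1) / (np.asarray(action_count) + 1))

best_action = int(np.argmax(ucb_est))  # the never-tried arm 3 gets the largest bonus
```

Note that action_count is left unchanged by this computation, which is the crux of the disagreement below.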

@ShangtongZhang
Owner

Issue #1: note that np.asarray(self.actionCount) + 1 doesn't actually change self.actionCount. I add +1 to all action counts only to avoid division by zero; self.actionCount itself stays unchanged.
Issue #2: In my view, we first get the original estimate for every action, then we adjust every action's estimate per formula 2.8 in the book, and finally choose the action with the maximum adjusted estimate.

@atki4564
Author

atki4564 commented Nov 25, 2016

On Issue #1: https://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html says 'No copy is performed if the input is already an ndarray'. I tested this on https://www.pythonanywhere.com/try-ipython/, and it actually is changing the array; therefore line 77 would be a duplicate calculation. I'll print just before line 77 to see who's right.

@ShangtongZhang
Owner

But self.actionCount is a list (line 40), not an ndarray.

@atki4564
Author

My bad, a Python newbie mistake (sorry).

@ShangtongZhang
Owner

No problem. And even if it were an ndarray, it wouldn't be changed, because np.asarray(self.actionCount) + 1 simply returns a new ndarray without touching the original. Try

import numpy as np

a = np.zeros(4)
b = np.asarray(a) + 1  # b is a new array of ones; a is still all zeros

and see the values of a and b.

@atki4564
Author

On Issue #2:
Then, at line 90,

else:
    # update estimation with constant step size
    self.qEst[action] += 0.1 * (reward - self.qEst[action])

is a duplicate, because you already 'repaired every action estimation per formula' at line 62?

@ShangtongZhang
Copy link
Owner

No. This is the normal update of the action estimate. The estimates need to be repaired whenever we want to choose an action, but that repair isn't and shouldn't be lasting; it should be forgotten once the action has been chosen.
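The distinction can be sketched as follows (a minimal illustration with invented values, names mirroring the thread's qEst/actionCount/UCBParam; not the repository's code): the UCB "repair" lives only in a temporary array used for selection, while the constant-step-size update afterwards moves only the chosen action's persistent estimate.

```python
import numpy as np

q_est = np.array([0.4, 0.6])  # persistent estimates
action_count = [2, 1]
time = 3
ucb_param = 2.0
step_size = 0.1

# Selection: the exploration bonus is computed into a temporary array
# (ucb_est) and then forgotten; q_est itself is not modified here.
ucb_est = q_est + ucb_param * np.sqrt(
    np.log(time + 1) / (np.asarray(action_count) + 1))
action = int(np.argmax(ucb_est))

# Update: only the chosen action's persistent estimate moves toward the reward.
reward = 1.0
q_est[action] += step_size * (reward - q_est[action])
action_count[action] += 1
```

So line 62 and line 90 are not duplicates: one is a transient selection rule, the other a lasting learning update.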
