Accuracy improves after modifying the function calc_cum_discounted_reward as follows #28

Closed
suenpun opened this issue Sep 18, 2019 · 2 comments

Comments


suenpun commented Sep 18, 2019

The relevant code is:

running_add = self.gamma * running_add + cum_disc_reward[:, t]
cum_disc_reward[:, t] = running_add

I changed the code to what I believe is the usual way of calculating returns (only the time index t changes), and the results improved.
The change is from:

for t in reversed(range(self.path_length)):
    running_add = self.gamma * running_add + cum_disc_reward[:, t]
    cum_disc_reward[:, t] = running_add

to:

for t in reversed(range(1, self.path_length)):
    running_add = self.gamma * running_add + cum_disc_reward[:, t]
    cum_disc_reward[:, t-1] = running_add

Thank you! I really appreciate your hard work, which is why I studied the code line by line.


ty4b112 commented Sep 25, 2019

As I understand it, if the reward is 1 at the final step and there are 3 steps in a trail, the modified code gives [gamma+1, 1, 1]. Normally it should be [gamma^2, gamma, 1], which is what the original code produces.
In option.py, gamma is 1, so I find it very strange that this kind of modification improves performance. Is there anything wrong with my understanding?
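
To make the comparison concrete, here is a minimal sketch of both loops. It assumes a single trajectory stored as a plain Python list instead of the batched array `cum_disc_reward[:, t]` used in the repository, and the function names `original_returns` and `shifted_returns` are made up for illustration only.

```python
def original_returns(rewards, gamma):
    """Backward pass as in the original loop: G_t = r_t + gamma * G_{t+1}."""
    out = list(rewards)  # copy so the caller's list is not modified
    running_add = 0.0
    for t in reversed(range(len(out))):
        running_add = gamma * running_add + out[t]
        out[t] = running_add
    return out


def shifted_returns(rewards, gamma):
    """Variant proposed in this issue: the result is written to index t - 1."""
    out = list(rewards)
    running_add = 0.0
    for t in reversed(range(1, len(out))):
        running_add = gamma * running_add + out[t]
        # This overwrites the reward at t - 1, which is read again
        # on the next iteration, so the value is counted twice.
        out[t - 1] = running_add
    return out


gamma = 0.5
rewards = [0.0, 0.0, 1.0]  # reward of 1 only at the final step
print(original_returns(rewards, gamma))  # [0.25, 0.5, 1.0]  i.e. [gamma^2, gamma, 1]
print(shifted_returns(rewards, gamma))   # [1.5, 1.0, 1.0]   i.e. [gamma+1, 1, 1]
```

With gamma = 1, the same sketch gives [1, 1, 1] for the original loop and [2, 1, 1] for the shifted one, so the shifted version only changes the value assigned to the first step.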


suenpun commented Oct 9, 2019

As I understand it, if the reward is 1 at the final step and there are 3 steps in a trail, the modified code gives [gamma+1, 1, 1]. Normally it should be [gamma^2, gamma, 1], which is what the original code produces.
In option.py, gamma is 1, so I find it very strange that this kind of modification improves performance. Is there anything wrong with my understanding?

Sorry for responding so late, and thank you for your patient reply. You are totally right. After checking my code, I found that some parameters had been changed by accident, and that is what made the results better. Sorry for my carelessness. Thank you again.

suenpun closed this as completed Oct 9, 2019