As I understand it, if the reward is 1 for the final step and there are 3 steps in a trial, the returns will be [gamma+1, 1, 1]. Normally they should be [gamma^2, gamma, 1], as in the initial code.
In option.py, gamma is 1, so I find it strange that this kind of modification yields performance gains. Is there anything wrong with my understanding?
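For reference, here is a minimal sketch (my own NumPy version, not the MINERVA code itself) of the standard backward recursion G_t = r_t + gamma * G_{t+1}. With a single terminal reward of 1 over 3 steps it produces exactly the [gamma^2, gamma, 1] pattern described above:

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} with a backward pass."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Single terminal reward of 1 over 3 steps:
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))
# [0.81 0.9  1.  ]  i.e. [gamma^2, gamma, 1]
```

Note that with gamma = 1 (as in option.py) this collapses to [1, 1, 1], so the discounting itself cannot explain a performance difference.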
Sorry for responding so late, and thanks for your patient response. You are totally right. After checking my code, I found that some parameters had accidentally been changed, which made the returns better. Sorry for my carelessness. Thank you again.
the code is at
MINERVA/code/model/trainer.py
Lines 182 to 183 in d2f44ad
I modified the code to calculate returns in the standard way (just changing the time index t), and it becomes better.
that is, from:
to:
Thank you! I really appreciate your hard work, so I studied it line by line.