As I understand it, if the reward is 1 for the final step and there are 3 steps in a trial, the returns will be [gamma+1, 1, 1]. Normally they should be [gamma^2, gamma, 1], as in the initial code.
In option.py, gamma is 1, so I find it strange that this kind of modification yields performance gains. Is there anything wrong with my understanding?
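For reference, here is a minimal sketch (my own NumPy version, not the MINERVA code itself) of the standard backward recursion G_t = r_t + gamma * G_{t+1}. With a single terminal reward of 1 over 3 steps it produces exactly the [gamma^2, gamma, 1] pattern described above:

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} with a backward pass."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Single terminal reward of 1 over 3 steps:
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))
# [0.81 0.9  1.  ]  i.e. [gamma^2, gamma, 1]
```

Note that with gamma = 1 (as in option.py) this collapses to [1, 1, 1], so the discounting itself cannot explain a performance difference.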
Sorry for responding so late, and thanks for your patient response. You are totally right. After checking my code, I found that some parameters had accidentally been changed, which made the returns better. Sorry for my carelessness. Thank you again.
the code is at
MINERVA/code/model/trainer.py
Lines 182 to 183 in d2f44ad
I modified the code to calculate returns in the standard way (just changing the time index t), and it becomes better.
that is, from:
to:
Thank you! I really appreciate your hard work, so I studied it line by line.