I believe there is a bug in the A3C implementation, in the file "ProcessAgent.py" on line 107. The sub-episode return should be bootstrapped from the value of the next state, not the previous one.
I suggest replacing:
prediction, value = self.predict(self.env.current_state)
...
if done or time_count == Config.TIME_MAX:
    terminal_reward = 0 if done else value
with:
prediction, value = self.predict(self.env.current_state)
...
if done or time_count == Config.TIME_MAX:
    terminal_reward = 0
    if not done:
        (_, terminal_reward) = self.predict(self.env.current_state)
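To make the point concrete, here is a minimal standalone sketch of how a sub-episode return is usually bootstrapped. It is not the repository's code: `n_step_returns`, `bootstrap`, and `next_state_value` are hypothetical names used only for illustration. The seed of the backward recursion is 0 when the episode ended, and otherwise the critic's estimate for the state reached after the last action.

```python
def bootstrap(done, next_state_value):
    """Seed for the return recursion (hypothetical helper).

    A finished episode has no future reward, so the seed is 0.
    Otherwise the seed is the value estimate of the state that
    follows the last action of the sub-episode.
    """
    return 0.0 if done else next_state_value


def n_step_returns(rewards, bootstrap_value, gamma):
    """Compute R_t = r_t + gamma * R_{t+1}, working backwards.

    `rewards` holds r_0 .. r_{T-1}; `bootstrap_value` plays the
    role of R_T.
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Truncated sub-episode: bootstrap from the next state's value.
truncated = n_step_returns([1.0, 0.0, 2.0], bootstrap(False, 4.0), 0.5)

# Finished episode: nothing to bootstrap from.
finished = n_step_returns([1.0, 0.0, 2.0], bootstrap(True, 4.0), 0.5)
```

If the seed were instead the value of the state before the last action, every return in the truncated case would be computed against an estimate that is one step stale.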
The last line indicates that the experience is backward-looking. I assume that is why setting terminal_reward to value is consistent when done is False?
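One way the existing code could still be consistent is sketched below. The assumption, which should be checked against the repository, is that the accumulation step leaves the most recent experience out of the batch and carries it over to the next one. In that case the value predicted before the last action is the value of the state that follows every experience actually emitted. `accumulate_all_but_last` is a hypothetical name, not a function from the project.

```python
def accumulate_all_but_last(rewards, last_state_value, gamma):
    """Returns for every step except the final one (hypothetical helper).

    `last_state_value` is the critic's estimate for the state in which
    the final action was chosen. For the steps before it, that state is
    the "next state", so it is a valid bootstrap seed. The final step's
    own reward is not used here; that step is assumed to be carried
    into the next batch.
    """
    running = last_state_value
    returns = []
    for r in reversed(rewards[:-1]):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Three steps collected, only the first two are emitted.
emitted = accumulate_all_but_last([1.0, 1.0, 5.0], 10.0, 0.5)
```

Under this assumption, the proposed change would bootstrap from a state one step later than the last emitted experience, so which version is right depends on whether the final experience is included in the batch.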