I tried a CartPole example using the starter code with the provided FIFO on-policy replay buffer and PPO. There is an error regarding the cumulative reward.
With FIFOOnPolicyReplayBuffer and PPO we get:
Traceback (most recent call last):
File "C:\Users\ericz\Documents\Github\RLTrading Model\RLTrading\Modelling\Tests\Pearl\Basic.py", line 25, in <module>
trainer.train(num_iterations=1000)
File "C:\Users\ericz\Documents\Github\RLTrading Model\RLTrading\Modelling\Training\Pearl\Trainer.py", line 46, in train
info = online_learning(
File "C:\Users\ericz\Documents\Github\RLTrading Model\RLTrading-Data\RLTrading\Pearl\pearl\utils\functional_utils\train_and_eval\online_learning.py", line 107, in online_learning
episode_info, episode_total_steps = run_episode(
File "C:\Users\ericz\Documents\Github\RLTrading Model\RLTrading-Data\RLTrading\Pearl\pearl\utils\functional_utils\train_and_eval\online_learning.py", line 275, in run_episode
agent.learn()
File "C:\Users\ericz\Documents\Github\RLTrading Model\RLTrading-Data\RLTrading\Pearl\pearl\pearl_agent.py", line 206, in learn
report = self.policy_learner.learn(self.replay_buffer)
File "C:\Users\ericz\Documents\Github\RLTrading Model\RLTrading-Data\RLTrading\Pearl\pearl\policy_learners\sequential_decision_making\ppo.py", line 154, in learn
result = super().learn(replay_buffer)
File "C:\Users\ericz\Documents\Github\RLTrading Model\RLTrading-Data\RLTrading\Pearl\pearl\policy_learners\policy_learner.py", line 171, in learn
single_report = self.learn_batch(batch)
File "C:\Users\ericz\Documents\Github\RLTrading Model\RLTrading-Data\RLTrading\Pearl\pearl\policy_learners\sequential_decision_making\actor_critic_base.py", line 231, in learn_batch
self._critic_learn_batch(batch) # update critic
File "C:\Users\ericz\Documents\Github\RLTrading Model\RLTrading-Data\RLTrading\Pearl\pearl\policy_learners\sequential_decision_making\ppo.py", line 145, in _critic_learn_batch
assert batch.cum_reward is not None
AssertionError
Data in FIFOOnPolicyReplayBuffer can be consumed before the end of each episode is reached, but that is not the case with OnPolicyEpisodicReplayBuffer, which waits until an episode terminates.
FIFOOnPolicyReplayBuffer stores two actions in each transition tuple (s, a, r, s', a'), while OnPolicyEpisodicReplayBuffer stores one action in each transition tuple (s, a, r, s').
FIFOOnPolicyReplayBuffer and OnPolicyEpisodicReplayBuffer could be merged. This would reduce confusion and repetition, and it is already in our plan.
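To illustrate why the assertion `batch.cum_reward is not None` fails, here is a simplified sketch of the two buffer behaviors described above. These classes are hypothetical stand-ins written for this issue, not Pearl's actual implementations: the episodic buffer fills in a cumulative reward-to-go for every transition once the episode terminates, while the FIFO buffer serves SARSA-style tuples immediately and never computes one.

```python
from collections import deque

class FifoOnPolicySketch:
    """Sketch: stores (s, a, r, s', a') tuples that become available as soon as
    the next action a' is known, before the episode ends. cum_reward is never
    filled in, which is what trips PPO's assertion."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)
        self._pending = None  # last (s, a, r, s') waiting for its next action a'

    def push(self, s, a, r, s_next, terminated):
        if self._pending is not None:
            ps, pa, pr, ps_next = self._pending
            # Complete the previous tuple with the current action as a'.
            self.buffer.append({"state": ps, "action": pa, "reward": pr,
                                "next_state": ps_next, "next_action": a,
                                "cum_reward": None})  # never computed
        self._pending = None if terminated else (s, a, r, s_next)

class OnPolicyEpisodicSketch:
    """Sketch: holds (s, a, r, s') transitions until the episode terminates,
    then computes the cumulative (reward-to-go) return for each transition."""
    def __init__(self):
        self.buffer = []
        self._episode = []

    def push(self, s, a, r, s_next, terminated):
        self._episode.append({"state": s, "action": a, "reward": r,
                              "next_state": s_next, "cum_reward": None})
        if terminated:
            g = 0.0
            for t in reversed(self._episode):  # backward pass over the episode
                g += t["reward"]
                t["cum_reward"] = g
            self.buffer.extend(self._episode)
            self._episode = []
```

For a three-step episode with rewards 1, 1, 1, the episodic sketch yields cum_reward values 3, 2, 1, whereas every tuple the FIFO sketch stores keeps cum_reward as None — matching the failing `assert batch.cum_reward is not None` in `ppo.py`.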
Here is the full example code: