Sampling entire trajectories for a single env from RolloutStorage #4

offline-rl-neurips · 2020-10-13T06:11:47Z

Hi, if I need to sample entire trajectories for a single env from the RolloutStorage, what would be an easy way to do so?
For example, would it be easier to adapt the recurrent_generator? Any hints would be really appreciated (I haven't dealt much with PPO, so this may be a really stupid question)!

Context: I want to randomly sample 2 trajectories (possibly from different envs) and add an auxiliary loss which depends on these two trajectories.


agarwl · 2020-10-13T19:08:37Z

Would this work?

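import numpy as np

class TrajStorage(object):
    def __init__(self, rollouts):
        trajs = []
        num_processes = rollouts.obs.shape[1]
        for env_index in range(num_processes):
            # masks[t] == 0 marks obs[t] as the first observation of a new episode.
            env_masks = rollouts.masks[:, env_index, 0]
            env_obs = rollouts.obs[:, env_index]
            env_actions = rollouts.actions[:, env_index]

            # np.where returns a tuple of index arrays, so take its first element.
            # Assumes the masks can be converted to a NumPy array (e.g. a CPU tensor).
            indices = np.where(env_masks == 0)[0]
            prev_index = 0
            for index in indices:
                if index > prev_index:  # skip the empty slice when a reset falls on step 0
                    obs = env_obs[prev_index:index]
                    actions = env_actions[prev_index:index]
                    trajs.append((obs, actions))
                prev_index = index

        self.trajs = trajs
        self.num_trajs = len(trajs)

    def sample_trajs(self):
        # Sample two trajectory indices uniformly, with replacement.
        idx1, idx2 = np.random.randint(0, self.num_trajs, 2)
        return self.trajs[idx1], self.trajs[idx2]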

rraileanu · 2020-10-15T19:02:31Z

Hi! Yes, the above looks right to me!

The only issue is that you might end up with some partial trajectories, since the first observation in rollouts might be a continuation of a trajectory collected during the previous update (so it's not necessarily the initial observation in an environment). But I think this can be fixed by dropping the first trajectory you add, i.e. start from prev_index = indices[0] rather than prev_index = 0.
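
For illustration, here is a minimal sketch of that fix on plain NumPy arrays for a single env. The helper name `split_complete_trajectories` and the toy data are hypothetical, not part of the repo. It assumes that `masks[t] == 0` marks `obs[t]` as the first observation after a reset, and that `obs`, `actions` and `masks` are indexed by the same time steps. Pairing consecutive reset indices drops the partial segment before the first reset, and also the unfinished segment after the last one.

```python
import numpy as np

def split_complete_trajectories(obs, actions, masks):
    """Split one env's rollout into complete episodes (hypothetical helper).

    Assumes masks[t] == 0 marks obs[t] as the first observation after a reset.
    """
    starts = np.where(np.asarray(masks) == 0)[0]
    trajs = []
    # Each pair of consecutive reset indices bounds one full episode:
    # steps [begin, end) start at a reset and stop right before the next one.
    for begin, end in zip(starts[:-1], starts[1:]):
        trajs.append((obs[begin:end], actions[begin:end]))
    return trajs

# Toy rollout: resets happen at steps 2, 5 and 7.
masks = np.array([1, 1, 0, 1, 1, 0, 1, 0, 1])
obs = np.arange(9)
actions = np.arange(9) * 10

trajs = split_complete_trajectories(obs, actions, masks)
print([t[0].tolist() for t in trajs])  # [[2, 3, 4], [5, 6]]
```

Steps 0-1 (carried over from the previous update) and steps 7-8 (still running when the rollout ends) are left out, so only whole episodes get sampled.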

offline-rl-neurips closed this as completed Oct 15, 2020
