[Feature Request] thrl_envs.EnvBase.rollout save to different device #3177

@busFred

Motivation

For environments with high-dimensional observations (e.g. camera feeds from autonomous vehicles) and/or long rollouts (i.e. a large max_steps) running on GPU, using thrl_envs.EnvBase.rollout to collect trajectories of the policy interacting with the environment can cause VRAM usage to keep growing until it runs out. From my rough understanding, every intermediate tensordict returned by thrl_envs.EnvBase.step is kept on the GPU. Suppose I am currently at $t=99$: the thrl_envs.EnvBase.step method only depends on the tensordict returned at $t=98$, whereas the tensordicts returned before $t=98$ sit in VRAM without participating in any computation.
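To give a rough sense of scale, here is a back-of-the-envelope estimate of how much memory the stored observations alone would take. The observation shape, dtype size, and step count are made-up illustrative numbers, not measurements from a real environment, and `rollout_obs_bytes` is a hypothetical helper.

```python
def rollout_obs_bytes(obs_shape, max_steps, bytes_per_element=4):
    """Bytes needed to keep one observation per step for a whole rollout."""
    elements = 1
    for dim in obs_shape:
        elements *= dim
    return elements * bytes_per_element * max_steps


# Assumed example: one 3x480x640 float32 camera frame per step, 10,000 steps.
total = rollout_obs_bytes((3, 480, 640), max_steps=10_000)
gib = total / 2**30
print(f"{total} bytes, about {gib:.1f} GiB")
```

Under these assumptions the observations alone need roughly 34 GiB, which is more than many GPUs have, even though only the most recent step is needed to compute the next one.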

Solution

When collecting rollout trajectories with thrl_envs.EnvBase.rollout, would it be possible to move the intermediate tensordicts returned by thrl_envs.EnvBase.step to CPU, so that precious GPU VRAM is freed for computations that actually require the GPU? For example, thrl_envs.EnvBase.rollout could accept a trajectory_save_device argument, so that the returned trajectory is on CPU when trajectory_save_device="cpu".

Alternatives

It is possible, but a pain, to replicate the behavior of thrl_envs.EnvBase.rollout with a manual for loop that moves each step's output to CPU before saving it.
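A minimal sketch of that manual loop is below. It is not the library's implementation: `FakeTD`, `CountingEnv`, and `rollout_to_device` are hypothetical stand-ins so the example runs without a GPU, and the device is just a string label. The only thing the loop assumes about each step's output is a `.to(device)` method that returns a copy on the target device.

```python
from dataclasses import dataclass


@dataclass
class FakeTD:
    """Hypothetical stand-in for a tensordict; only `.to(device)` matters here."""
    data: dict
    device: str

    def to(self, device):
        # Return a copy that lives on the requested (pretend) device.
        return FakeTD(dict(self.data), device)


class CountingEnv:
    """Toy environment whose outputs live on a pretend 'cuda' device."""

    def reset(self):
        return FakeTD({"obs": 0, "done": False}, "cuda")

    def step(self, td):
        obs = td.data["obs"] + 1
        return FakeTD({"obs": obs, "done": obs >= 5}, "cuda")


def rollout_to_device(env, max_steps, policy=lambda td: td, save_device="cpu"):
    """Step the env, keeping only the latest step on the env's own device."""
    trajectory = []
    td = env.reset()
    for _ in range(max_steps):
        td = env.step(policy(td))              # computed on the env's device
        trajectory.append(td.to(save_device))  # stored copy goes elsewhere
        if td.data["done"]:
            break
    return trajectory


traj = rollout_to_device(CountingEnv(), max_steps=100)
print(len(traj), {t.device for t in traj})
```

With real tensordicts, the per-step entries would presumably still need to be stacked along the time dimension at the end, which is part of what makes the manual version tedious compared with a single argument on rollout.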

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)

Labels: enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions