Description
Motivation
For environments with high-dimensional observations (e.g. the camera feed of an autonomous vehicle) and/or long rollouts (i.e. a large max_steps) running on GPU, using thrl_envs.EnvBase.rollout to collect trajectories of the policy interacting with the environment can cause VRAM usage to grow over time. From my rough understanding, all of the intermediate tensordicts returned by thrl_envs.EnvBase.step are kept on the GPU for the duration of the rollout. However, at any given step, the thrl_envs.EnvBase.step method only depends on the tensordict returned by the previous step, so the tensordicts returned prior to that do not need to remain on the GPU.
Solution
When collecting rollout trajectories with thrl_envs.EnvBase.rollout, would it be possible to move the intermediate tensordicts returned by thrl_envs.EnvBase.step to CPU, so that precious GPU VRAM is freed for other arithmetic computations that actually require the GPU? Concretely, perhaps add an argument trajectory_save_device to thrl_envs.EnvBase.rollout so that the returned trajectory lives on CPU when trajectory_save_device="cpu".
Alternatives
It is possible, but a pain, to replicate the behavior of thrl_envs.EnvBase.rollout with a for loop that moves and stores each step's output on CPU.
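A minimal sketch of that for-loop workaround is below. Note this is not the real torchrl API: the `TensorDict` and `Env` classes here are toy stand-ins (assumptions for illustration only) that just record which device their data lives on, so the sketch focuses purely on the offloading pattern, i.e. keeping only the most recent tensordict on the environment's device and moving everything already consumed to `save_device`.

```python
class TensorDict:
    """Toy stand-in for a tensordict; only tracks a device tag."""

    def __init__(self, data, device):
        self.data = dict(data)
        self.device = device

    def to(self, device):
        # A real implementation would move the underlying tensors;
        # here we just return a copy tagged with the new device.
        return TensorDict(self.data, device)


class Env:
    """Toy environment standing in for thrl_envs.EnvBase (assumption)."""

    def __init__(self, episode_len, device="cuda"):
        self.episode_len = episode_len
        self.device = device

    def reset(self):
        return TensorDict({"obs": 0, "done": False}, self.device)

    def step(self, td):
        # step() only depends on the single most recent tensordict.
        obs = td.data["obs"] + 1
        done = obs >= self.episode_len
        return TensorDict({"obs": obs, "done": done}, self.device)


def rollout_offloaded(env, max_steps, save_device="cpu"):
    """Collect a trajectory, offloading each consumed step to save_device."""
    trajectory = []
    td = env.reset()
    for _ in range(max_steps):
        next_td = env.step(td)
        # td is no longer needed on the env device: offload it.
        trajectory.append(td.to(save_device))
        td = next_td
        if td.data["done"]:
            break
    trajectory.append(td.to(save_device))
    return trajectory


traj = rollout_offloaded(Env(episode_len=5), max_steps=10)
```

At any point during the loop, at most two tensordicts (`td` and `next_td`) occupy the environment's device, regardless of trajectory length, which is the memory behavior the proposed trajectory_save_device argument would provide out of the box.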
Additional context
Add any other context or screenshots about the feature request here.
Checklist
- [x] I have checked that there is no similar issue in the repo (required)