Motivation
Feedback from teams adopting TorchRL is that the library works well when things go right, but debugging is hard: when a bug surfaces, navigating the layers of code to find the relevant component takes significant effort. Specific pain points reported recently:
Understanding what happens inside a collector rollout (what the _carrier / "shuttle" is for, when and why device casts happen, what each helper in the inner loop does).
Tracing a bug in recurrent (LSTM) hidden-state handling, which requires reading across LSTMModule, InitTracker, and loss-side is_init masking.
The underlying issue is not missing API reference — that part is comprehensive. What is missing is the architecture/internals layer between API ref and user-facing tutorials, plus a glossary of TorchRL-specific terms. Users only need to read the code when something breaks, and right now that path is steep.
This issue tracks a focused ~3-day doc push to address it.
Audit findings
Collector internals
_carrier (formerly shuttle) is the central abstraction of the rollout loop, yet it is not defined anywhere in the docs. The references at torchrl/collectors/_single.py:1638, 1664, 1689, 1691 and 1723 mean nothing to a reader who does not already know what it is.
The main rollout loop at _single.py:1597-1800 has only low-level inline comments — no high-level outline of the per-timestep flow.
Recent improvements (hook mechanism in b0990828c, optional trajectory IDs in d3862872e) have not yet been surfaced in docs.
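To make the carrier idea concrete for a future internals page, here is a toy, framework-free sketch of a per-timestep loop. Plain dicts stand in for TensorDicts, and rollout, reset, step and policy are hypothetical names; this is not the code in _single.py, only an illustration of a carrier that is mutated in place and a set of output slots allocated once per batch.

```python
from typing import Callable, Dict, List, Tuple

# Plain dicts stand in for TensorDicts in this toy model.
Step = Dict[str, object]


def rollout(
    reset: Callable[[], float],
    step: Callable[[float], Tuple[float, float, bool]],
    policy: Callable[[float], float],
    frames_per_batch: int,
) -> List[Step]:
    """Toy per-timestep loop around a single carrier mutated in place."""
    # Output slots are allocated once per call and overwritten in place.
    out: List[Step] = [{} for _ in range(frames_per_batch)]
    # The carrier holds only the state that crosses from step t to t + 1.
    carrier: Step = {"observation": reset(), "is_init": True}
    for t in range(frames_per_batch):
        # 1. The policy reads the carrier and writes its action into it.
        carrier["action"] = policy(carrier["observation"])
        # 2. The env consumes the action.
        next_obs, reward, done = step(carrier["action"])
        # 3. The transition is copied into this timestep's output slot.
        slot = out[t]
        slot.clear()
        slot.update(carrier)
        slot.update(next_observation=next_obs, reward=reward, done=done)
        # 4. The carrier advances in place: reset on done, else carry next_obs.
        if done:
            carrier["observation"] = reset()
            carrier["is_init"] = True
        else:
            carrier["observation"] = next_obs
            carrier["is_init"] = False
    return out
```

The point such a page would make is that only the carrier crosses timesteps; everything written into the output batch is a copy, so the loop never has to allocate a new container per step.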
Recurrent state lifecycle
Debugging a hidden-state bug requires traversing three subsystems with no unifying doc:
torchrl/modules/tensordict_module/rnn.py (LSTMModule.forward L791, _lstm L896, set_recurrent_mode L1829, global recurrent_mode_state_manager L1821).
torchrl/envs/transforms/_env.py:1364 (InitTracker).
torchrl/objectives/value/advantages.py:533 (is_init masking in GAE / advantage computation).
The shape transformations at rnn.py:913-919 (the [..., 0, :, :] indexing and transposes) have no explanatory comments. _lstm has no docstring.
The "final hidden values should not be trusted" caveat at rnn.py:445 is easy to miss.
Existing tests at test/test_tensordictmodules.py:1059 and :2007 cover unit-level paths only — no integration test exercises the full policy -> collector -> buffer -> loss path with a multi-trajectory batch and mid-batch done.
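As background for the advantage-side part of this lifecycle, a minimal sketch of Generalized Advantage Estimation over a flat batch that concatenates several trajectories. It shows the general principle of masking at trajectory boundaries (terminated blocks bootstrapping, done blocks the backward recursion); it does not reproduce how advantages.py:533 uses is_init, and the gae function here is a hypothetical helper, not TorchRL's API.

```python
from typing import List, Sequence


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    terminated: Sequence[bool],
    done: Sequence[bool],
    gamma: float = 0.99,
    lmbda: float = 0.95,
) -> List[float]:
    """GAE over a flat batch holding several trajectories back to back.

    terminated[t]: the episode truly ended at t, so V(s_{t+1}) is not bootstrapped.
    done[t]: step t is the last of its trajectory (terminated or truncated), so
             the backward recursion must not carry advantage across it.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_terminated = 0.0 if terminated[t] else 1.0
        not_done = 0.0 if done[t] else 1.0
        # One-step TD error; bootstrap only if the episode did not terminate.
        delta = rewards[t] + gamma * not_terminated * next_values[t] - values[t]
        # Reset the accumulator at trajectory boundaries.
        running = delta + gamma * lmbda * not_done * running
        advantages[t] = running
    return advantages
```

Dropping the done mask would let advantage leak from one trajectory into the previous one, which is the same class of bug as a hidden state that is not reset.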
Navigation
docs/source/index.rst orients newcomers toward "getting started" and API reference, neither of which helps a debugger. There is no "when something breaks, start here" entry point.
No glossary exists for TorchRL-specific terms (shuttle/carrier, _AcceptedKeys, set_keys, in_keys/out_keys, recurrent mode, TensorDictPrimer, is_init).
data_layout.rst is a model document (precise and well diagrammed), but almost nothing a new contributor would naturally read first links to it.
Action plan (~3 focused days)
Day 1 — Collector internals
New page docs/source/reference/collectors_internals.rst containing: a diagram of the per-timestep rollout flow, an explanation of _carrier (why it exists, deviceless semantics, allocation amortization), the three sync points (_sync_policy, _sync_env, _sync_storage), and the conditions under which the _cast_to_*_device and _shuttle_has_no_device flags are set.
Expand the SyncDataCollector.rollout docstring at _single.py:1597 with a 5-bullet high-level outline of the per-timestep flow.
New page docs/source/reference/glossary.rst with short entries for shuttle/carrier, in_keys/out_keys, _AcceptedKeys, set_keys, recurrent mode, TensorDictPrimer, is_init. Cross-link from the relevant API pages.
Document the hook mechanism and the optional-trajectory-ID flag in collectors_basics.rst.
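For the device section of collectors_internals.rst, a toy model of how the device arguments could be explained. The fill-in rule (a generic device supplying any unspecified env_device, policy_device or storing_device) is believed to mirror the SyncDataCollector docstring but should be checked against the current source. The flag conditions, and the names DevicePlan, plan_devices and carrier_has_no_device, are hypothetical stand-ins for _cast_to_*_device and _shuttle_has_no_device, not their actual definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DevicePlan:
    env_device: Optional[str]
    policy_device: Optional[str]
    storing_device: Optional[str]
    cast_to_policy_device: bool
    cast_to_env_device: bool
    carrier_has_no_device: bool


def plan_devices(
    device: Optional[str] = None,
    *,
    env_device: Optional[str] = None,
    policy_device: Optional[str] = None,
    storing_device: Optional[str] = None,
) -> DevicePlan:
    """Toy model of collector device resolution (hypothetical rules)."""
    # The generic `device` only fills the slots the user left unspecified.
    env_device = device if env_device is None else env_device
    policy_device = device if policy_device is None else policy_device
    storing_device = device if storing_device is None else storing_device
    # A cast is only meaningful towards a known device that differs from
    # the device on the other side of the env/policy boundary.
    cast_to_policy = policy_device is not None and policy_device != env_device
    cast_to_env = env_device is not None and env_device != policy_device
    # If env and policy disagree, the carrier holds tensors from both sides,
    # so it cannot claim a single device.
    carrier_has_no_device = env_device != policy_device
    return DevicePlan(
        env_device,
        policy_device,
        storing_device,
        cast_to_policy,
        cast_to_env,
        carrier_has_no_device,
    )
```

Writing the rules down this explicitly, whatever the real conditions turn out to be, is what lets a reader predict when a host-device copy will happen without stepping through the rollout loop.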
Day 2 — Recurrent state lifecycle
New page docs/source/reference/recurrent_state_lifecycle.rst (or a new section in tutorials/sphinx-tutorials/dqn_with_rnn.rst) tracing one hidden-state vector through policy -> carrier -> buffer -> loss, with a diagram of the call graph across the three subsystems and explicit semantics for when state is zeroed vs. carried.
Add a docstring to _lstm (rnn.py:896), with inline comments explaining each reshape/transpose.
Expand LSTMModule.forward docstring (rnn.py:791) to cover is_init semantics and the hidden-state-trust caveat.
Document set_recurrent_mode and the global recurrent_mode_state_manager (thread safety, default state, lifecycle).
Add an integration test in test/test_tensordictmodules.py covering policy -> SyncDataCollector -> replay buffer -> loss with a multi-trajectory batch containing a mid-batch done.
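For the proposed integration test, a framework-free sketch of the invariant it should pin down: in a flat batch that concatenates trajectories with a mid-batch done, the carried state must be zeroed at every is_init step, so the second trajectory's states match running it alone. A scalar recurrence stands in for the LSTM, and run_recurrent and is_init_from_done are hypothetical helpers. Deriving is_init from the previous step's done is an assumption about how the flags line up in collected data; the real test should take is_init from InitTracker instead.

```python
from typing import List, Sequence


def is_init_from_done(done: Sequence[bool]) -> List[bool]:
    """A step starts a trajectory if it is the first step of the batch or the
    previous step was done."""
    return [t == 0 or bool(done[t - 1]) for t in range(len(done))]


def run_recurrent(xs: Sequence[float], is_init: Sequence[bool], decay: float = 0.5) -> List[float]:
    """Toy scalar recurrence h_t = decay * h_{t-1} + x_t over a flat batch.
    The carried state is zeroed at every is_init step, before it is used."""
    h = 0.0
    hs: List[float] = []
    for x, init in zip(xs, is_init):
        if init:
            h = 0.0  # new trajectory: drop the state of the previous one
        h = decay * h + x
        hs.append(h)
    return hs


def test_hidden_state_resets_mid_batch() -> None:
    xs = [1.0] * 5
    done = [False, False, True, False, False]  # trajectory boundary after t=2
    is_init = is_init_from_done(done)
    hs = run_recurrent(xs, is_init)
    # The second trajectory must behave exactly as if it were run on its own.
    assert hs[3:] == run_recurrent(xs[3:], [True, False])
```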
Day 3 — Navigation
Add a "Debugging and Internals" section to docs/source/index.rst linking to the new internals/lifecycle pages, the glossary, data_layout.rst, and knowledge_base/DEBUGGING_RL.md.
Symptom -> file map (10-15 entries) on a new debugging page. Examples: "Hidden state stays identical across trajectories -> check InitTracker and is_init key wiring." "Collector yields empty tensors -> check max_frames_per_traj."
Cross-links from collectors.rst and modules_actors.rst into the new pages.
Possible follow-ups (out of scope for this issue)
Small structural refactor of _single.py: extract device-casting blocks into named methods (e.g. _prepare_for_policy, _prepare_for_env) to flatten the rollout loop. The current 36 private helper methods are mostly fine, but some could be inlined.
Refactor _lstm shape handling to use named intermediates instead of [..., 0, :, :] indexing.
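For the _lstm follow-up, a shape-level sketch of what named intermediates could look like. It tracks shape tuples rather than tensors so it runs without torch. The stored layout [batch, steps, num_layers, hidden_size] and the final re-insertion of the steps axis are assumptions to check against rnn.py:913-919; the [num_layers, batch, hidden_size] layout for h_0/h_n is the one documented for torch.nn.LSTM. lstm_hidden_shapes is a hypothetical helper.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def lstm_hidden_shapes(batch: int, steps: int, num_layers: int, hidden_size: int) -> Dict[str, Shape]:
    """Shape-level trace of the hidden state through a recurrent-mode LSTM call,
    with one named intermediate per reshape. Layout assumptions are hypothetical."""
    # Assumed storage layout: one hidden state per step, [B, T, L, H].
    stored: Shape = (batch, steps, num_layers, hidden_size)
    # Equivalent of h[..., 0, :, :]: keep only the state entering the first
    # step, which drops the T axis -> [B, L, H].
    first_step: Shape = stored[:-3] + stored[-2:]
    # Equivalent of .transpose(-3, -2): torch.nn.LSTM wants h_0 as [L, B, H].
    lstm_h0: Shape = first_step[:-3] + (first_step[-2], first_step[-3]) + first_step[-1:]
    # nn.LSTM returns h_n in the same [L, B, H] layout; swap back to [B, L, H]
    # so the batch dimension leads, as TensorDict batch dimensions must.
    h_n: Shape = lstm_h0
    per_sample: Shape = (h_n[1], h_n[0], h_n[2])
    # Re-insert the T axis (an expand in tensor code) to store it per step.
    restored: Shape = per_sample[:1] + (steps,) + per_sample[1:]
    return {
        "stored": stored,
        "first_step": first_step,
        "lstm_h0": lstm_h0,
        "per_sample": per_sample,
        "restored": restored,
    }
```

Each name in the returned dict corresponds to one line of tensor code, which is the readability gain the refactor is after.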
Performance documentation (separate workstream)
Adjacent to the navigability work above, TorchRL has no consolidated performance section. Users currently discover throughput-relevant knobs by reading source or asking. Proposed structure for a new docs/source/reference/performance/ tree, in two layers:
Recipes (cookbook): end-to-end pages such as "PPO on Isaac at max throughput", "SAC on MuJoCo (mjx vs CPU)", "DQN on Atari", "GRPO/LLM training" — each tied to a runnable script in benchmarks/ so the numbers stay verifiable.
Knobs reference (per-component): each performance-relevant kwarg documented with trade-off, default, and when to flip it. Linked from the recipes.
Performance docs decay faster than any other doc — every page should carry a version stamp and link the benchmark script that produced its numbers, otherwise the section becomes misleading within two releases.
Proposed pages
Environments at scale: Isaac Lab (GPU-resident rollouts, native autoreset path from 3f0ea1a85, avoiding host-device transfers); MuJoCo (mjx vs classic CPU, batch-size crossover); Gym/Gymnasium (AsyncVectorEnv vs ParallelEnv vs SerialEnv with measured numbers); custom-env pitfalls (.item(), batched specs).
Device placement (its own page, since this is the most misunderstood performance concept in TorchRL): when device matters versus storing_device, policy_device and env.device, what gets copied where, and what triggers a sync. Doubles as a debugging aid.
Collector configurations: SyncDataCollector vs MultiSync vs MultiaSync vs AsyncBatchedCollector decision table with measured throughput; the knobs (frames_per_batch, init_random_frames, max_frames_per_traj, update_at_each_batch, cudagraph_policy, compile_policy, trust_policy, no_cuda_sync, optional trajectory IDs); weight-sync schemes (recent redesign) with throughput per topology; the inference-server pattern.
torch.compile + cudagraphs: compile modes with TorchRL-specific gotchas (TensorDictModule recompilation triggers, data-dependent shapes, set_interaction_type boundaries); cudagraph wrappers — where they help, where they break; compiled losses; recompilation diagnostics (TORCH_LOGS=recompiles).
RNN/recurrent at scale: recurrent mode vs sequential mode throughput trade-off; padded vs packed sequences for variable-length trajectories; make_tensordict_primer() impact; pitfalls that look like perf bugs but are correctness bugs (is_init not wired, InitTracker missing).
Value/policy nets: shared trunks (share_parameters) and when sharing hurts; ProbabilisticActor overheads (lazy distribution construction, default_interaction_type, return_log_prob); functional=True vs stateful module paths in losses; target-net update strategies; mixed precision (bf16) — what's safe to lower.
Replay buffers: storage choice (LazyTensorStorage vs LazyMemmapStorage vs CompressedStorage) decision table by size and access pattern; sampler overhead (prioritized vs uniform, slice samplers); async writers / prefetching / pinned memory; buffer-on-GPU vs buffer-on-CPU+pinned crossover.
Losses / advantages: vec_advantage and the vectorized GAE path; where the per-iteration cost lives in PPO/A2C/SAC/IMPALA with annotated profiler traces; compiled-loss gotchas.
Profiling and diagnostics (highest leverage of the bunch): TORCHRL_PROFILING=1 coverage (recent commit 0cf0faa6d); torchrl.timeit usage; torch.profiler + NVTX integration; a bottleneck-triage flowchart (env-step-dominant vs policy-dominant vs advantage/loss-dominant) pointing readers at the right knob page.
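The bottleneck-triage flowchart could start from something as small as the following sketch: accumulate wall-clock time per phase, then point at the knob page for the phase that dominates. It uses only the standard library rather than torchrl.timeit or TORCHRL_PROFILING; timed, triage, KNOB_PAGES and the 50% threshold are hypothetical choices, and the page names simply reuse the proposed page titles above.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

# Hypothetical mapping from dominant phase to the proposed knob pages.
KNOB_PAGES = {
    "env_step": "Environments at scale / Collector configurations",
    "policy": "torch.compile + cudagraphs / Value/policy nets",
    "advantage_loss": "Losses / advantages",
}


@contextmanager
def timed(store: Dict[str, float], phase: str) -> Iterator[None]:
    """Accumulate wall-clock seconds spent in `phase` into `store`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[phase] = store.get(phase, 0.0) + time.perf_counter() - start


def triage(timings: Dict[str, float], threshold: float = 0.5) -> str:
    """Point at the knob page for the phase that dominates total time."""
    total = sum(timings.values())
    if total <= 0:
        raise ValueError("timings must contain a positive total duration")
    phase, seconds = max(timings.items(), key=lambda kv: kv[1])
    if seconds / total < threshold:
        # No clear winner: a coarse breakdown is not enough, profile deeper.
        return "Profiling and diagnostics"
    return KNOB_PAGES.get(phase, "Profiling and diagnostics")
```

Usage would wrap each phase of a training iteration, e.g. `with timed(store, "env_step"): ...`, and call triage(store) after a few iterations.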
v1 scope
Resist covering everything. Three or four high-quality recipes (Isaac PPO, MuJoCo SAC, Atari DQN, an LLM/GRPO setup) plus the device-placement page and the profiling page would move the needle more than fifteen thin pages. Add knob references incrementally as recipes reveal which knobs matter.
Contributions welcome
Any of these items is a self-contained doc PR — happy to coordinate or review. Comments on additional pain points, especially from teams that have adopted TorchRL recently, are very welcome on this issue.