[Doc] Improve navigability for new contributors #3746

## Motivation

Feedback from teams adopting TorchRL is that the library works well when things go right, but debugging is hard: when a bug surfaces, navigating the layers of code to find the relevant component takes significant effort. Specific pain points reported recently:

- Understanding what happens inside a collector rollout (what the `_carrier` / "shuttle" is for, when and why device casts happen, what each helper in the inner loop does).
- Tracing a bug in recurrent (LSTM) hidden-state handling, which requires reading across `LSTMModule`, `InitTracker`, and loss-side `is_init` masking.

The underlying issue is not missing API reference — that part is comprehensive. What is missing is the architecture/internals layer between API ref and user-facing tutorials, plus a glossary of TorchRL-specific terms. Users only need to read the code when something breaks, and right now that path is steep.

This issue tracks a focused ~3-day doc push to address it.

## Audit findings

### Collector internals
- `_carrier` (formerly `shuttle`) is the central abstraction of the rollout loop and is **defined nowhere in docs**, so a reader who hits its references at `torchrl/collectors/_single.py:1638, 1664, 1689, 1691, 1723` has no definition to look them up against.
- The main rollout loop at `_single.py:1597-1800` has only low-level inline comments — no high-level outline of the per-timestep flow.
- Recent improvements (hook mechanism in `b0990828c`, optional trajectory IDs in `d3862872e`) have not yet been surfaced in docs.
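To show the level of outline the rollout docs are missing, here is a toy, framework-free model of the per-timestep flow. It is an illustrative sketch only. `ToyEnv`, `toy_policy`, `toy_rollout`, and the plain-dict carrier are hypothetical stand-ins, not TorchRL code, and the real loop also handles device casts, hooks, trajectory IDs, and policy/weight syncs, all of which are left out here.

```python
# Toy model of a collector rollout loop. This is NOT TorchRL's implementation:
# plain dicts stand in for TensorDicts, and device casts, hooks, trajectory IDs
# and policy/weight syncs are deliberately left out.


class ToyEnv:
    """Counts steps and reports done after `episode_len` steps."""

    def __init__(self, episode_len):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return {"observation": 0}

    def step(self, action):
        self.t += 1
        return {
            "observation": self.t,
            "reward": float(action),
            "done": self.t >= self.episode_len,
        }


def toy_policy(carrier):
    # The policy reads from and writes into the carrier in place.
    carrier["action"] = carrier["observation"] % 2
    return carrier


def toy_rollout(env, policy, frames_per_batch):
    storage = []            # stands in for the collector's output buffer
    carrier = env.reset()   # one container reused for every timestep
    for _ in range(frames_per_batch):
        policy(carrier)                           # 1. policy writes "action"
        nxt = env.step(carrier["action"])         # 2. env consumes the action
        storage.append({**carrier, "next": nxt})  # 3. record a copy of the transition
        # 4. refill the same carrier object: a fresh reset on done,
        #    otherwise the "next" observation becomes the current one
        carrier.clear()
        carrier.update(env.reset() if nxt["done"] else {"observation": nxt["observation"]})
    return storage
```

With `toy_rollout(ToyEnv(3), toy_policy, 5)`, the third transition is flagged done and the fourth starts again from observation 0. Reusing one carrier object mirrors the allocation-amortization motivation mentioned in the Day 1 plan, but the real `_carrier` semantics are what the new page should pin down.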

### Recurrent state lifecycle
- Debugging a hidden-state bug requires traversing three subsystems with no unifying doc:
  - `torchrl/modules/tensordict_module/rnn.py` (`LSTMModule.forward` L791, `_lstm` L896, `set_recurrent_mode` L1829, global `recurrent_mode_state_manager` L1821).
  - `torchrl/envs/transforms/_env.py:1364` (`InitTracker`).
  - `torchrl/objectives/value/advantages.py:533` (`is_init` masking in GAE / advantage computation).
- The shape transformations at `rnn.py:913-919` (the `[..., 0, :, :]` indexing and transposes) have no explanatory comments. `_lstm` has no docstring.
- The "final hidden values should not be trusted" caveat at `rnn.py:445` is easy to miss.
- Existing tests at `test/test_tensordictmodules.py:1059` and `:2007` cover unit-level paths only — no integration test exercises the full policy -> collector -> buffer -> loss path with a multi-trajectory batch and mid-batch `done`.
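The central semantic the lifecycle doc has to make explicit is "zeroed vs. carried". Below is a minimal sketch, assuming the usual contract that `is_init` is `True` on the first step of a trajectory (the step right after a reset). The running sum is a hypothetical stand-in for an LSTM hidden state, and `toy_recurrent_scan` is not a TorchRL function.

```python
# Toy model of is_init-driven state reset: NOT TorchRL's LSTM code.
# The "hidden state" is a running sum; is_init=True marks the first step of
# a new trajectory, where any carried state must be zeroed.


def toy_recurrent_scan(inputs, is_init, initial_state=0.0):
    state = initial_state
    outputs = []
    for x, first in zip(inputs, is_init):
        if first:
            state = 0.0      # trajectory boundary: drop the carried state
        state = state + x    # "recurrent" update
        outputs.append(state)
    return outputs
```

With a mid-batch boundary, `toy_recurrent_scan([1.0] * 5, [True, False, False, True, False])` restarts at the fourth step. If `is_init` is never set, the same call keeps accumulating across the boundary, which is exactly the "hidden state leaks across trajectories" symptom that a policy -> collector -> buffer -> loss integration test with a mid-batch `done` should catch.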

### Navigation
- `docs/source/index.rst` orients newcomers toward "getting started" and API reference, neither of which helps a debugger. There is no "when something breaks, start here" entry point.
- No glossary exists for TorchRL-specific terms (shuttle/carrier, `_AcceptedKeys`, `set_keys`, `in_keys`/`out_keys`, recurrent mode, `TensorDictPrimer`, `is_init`).
- `data_layout.rst` is a model document — precise, well-diagrammed — but it is underlinked from anywhere a new contributor would naturally start.

## Action plan (~3 focused days)

### Day 1 — Collector internals
- [ ] New page `docs/source/reference/collectors_internals.rst` containing: a diagram of the per-timestep rollout flow, an explanation of `_carrier` (why it exists, deviceless semantics, allocation amortization), the three sync points (`_sync_policy`, `_sync_env`, `_sync_storage`), and the conditions under which the `_cast_to_*_device` and `_shuttle_has_no_device` flags are set.
- [ ] Expand the `SyncDataCollector.rollout` docstring at `_single.py:1597` with a 5-bullet high-level outline of the per-timestep flow.
- [ ] New page `docs/source/reference/glossary.rst` with short entries for shuttle/carrier, `in_keys`/`out_keys`, `_AcceptedKeys`, `set_keys`, recurrent mode, `TensorDictPrimer`, `is_init`. Cross-link from the relevant API pages.
- [ ] Document the hook mechanism and the optional-trajectory-ID flag in `collectors_basics.rst`.
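The `collectors_internals.rst` page could state the cast conditions as a small truth table. The sketch below shows one plausible shape for that table. It is hypothetical: the flag names only echo the `_cast_to_*_device` / `_shuttle_has_no_device` flags named above, and the conditions are assumptions to be checked against `_single.py`, not a description of what TorchRL does.

```python
from typing import Optional

# Hypothetical sketch: NOT TorchRL's logic. Flag names echo the ones in the
# audit; the conditions are illustrative assumptions. A device of None means
# "not pinned", so no cast is planned for a hop that touches it.


def _needs_cast(src: Optional[str], dst: Optional[str]) -> bool:
    return src is not None and dst is not None and src != dst


def plan_casts(
    env_device: Optional[str],
    policy_device: Optional[str],
    storing_device: Optional[str],
) -> dict:
    pinned = {d for d in (env_device, policy_device, storing_device) if d is not None}
    return {
        # env output -> policy input
        "cast_to_policy_device": _needs_cast(env_device, policy_device),
        # policy output (actions) -> env input
        "cast_to_env_device": _needs_cast(policy_device, env_device),
        # transitions -> storage
        "cast_to_storing_device": _needs_cast(env_device, storing_device),
        # if components disagree, the carrier cannot be tied to one device
        "carrier_has_no_device": len(pinned) > 1,
    }
```

For example, a CPU env with a `cuda:0` policy and CPU storage plans casts in both directions between env and policy but none into storage. When every component shares one device, nothing is cast.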

### Day 2 — Recurrent state lifecycle
- [ ] New page `docs/source/reference/recurrent_state_lifecycle.rst` (or a new section in `tutorials/sphinx-tutorials/dqn_with_rnn.rst`) tracing one hidden-state vector through policy -> carrier -> buffer -> loss, with a diagram of the call graph across the three subsystems and explicit semantics for when state is zeroed vs. carried.
- [ ] Add a docstring to `_lstm` in `rnn.py:896`, with inline comments explaining each reshape/transpose.
- [ ] Expand `LSTMModule.forward` docstring (`rnn.py:791`) to cover `is_init` semantics and the hidden-state-trust caveat.
- [ ] Document `set_recurrent_mode` and the global `recurrent_mode_state_manager` (thread safety, default state, lifecycle).
- [ ] Add an integration test in `test/test_tensordictmodules.py` covering policy -> `SyncDataCollector` -> replay buffer -> loss with a multi-trajectory batch containing a mid-batch `done`.
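The inline comments requested for `_lstm` mostly need to state a shape before and after each step. Here is a shape-only sketch of that bookkeeping. It assumes the stored hidden state is laid out as `[B, T, L, H]` (batch, time, layers, hidden size); that layout is an assumption to confirm in `rnn.py`. The target layout is documented: `torch.nn.LSTM` takes `h_0` / `c_0` as `[num_layers * num_directions, batch, hidden_size]`, and `batch_first` does not change that. Both helper names are hypothetical.

```python
# Shape-only sketch, operating on shape tuples rather than tensors.
# ASSUMPTION: the stored hidden state is [B, T, L, H].
# torch.nn.LSTM expects h_0 / c_0 as [L, B, H] (unaffected by batch_first).


def take_first_step(shape):
    # Equivalent of hidden[..., 0, :, :] on [B, T, L, H] -> [B, L, H]:
    # keep only the state at the first time step of the sequence.
    *lead, _time, layers, hidden = shape
    return (*lead, layers, hidden)


def swap_batch_and_layers(shape):
    # Equivalent of .transpose(-3, -2) on [B, L, H] -> [L, B, H].
    *lead, batch, layers, hidden = shape
    return (*lead, layers, batch, hidden)
```

Chained, `swap_batch_and_layers(take_first_step((8, 16, 2, 64)))` gives `(2, 8, 64)`, the layout `nn.LSTM` consumes. Writing each reshape's before/after shape like this is also what the named-intermediates follow-up would make explicit in code.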

### Day 3 — Navigation
- [ ] Add a "Debugging and Internals" section to `docs/source/index.rst` linking to the new internals/lifecycle pages, the glossary, `data_layout.rst`, and `knowledge_base/DEBUGGING_RL.md`.
- [ ] Symptom -> file map (10-15 entries) on a new debugging page. Examples: "Hidden state stays identical across trajectories -> check `InitTracker` and `is_init` key wiring." "Collector yields empty tensors -> check `max_frames_per_traj`."
- [ ] Cross-links from `collectors.rst` and `modules_actors.rst` into the new pages.

## Possible follow-ups (out of scope for this issue)

- Small structural refactor of `_single.py`: extract device-casting blocks into named methods (e.g. `_prepare_for_policy`, `_prepare_for_env`) to flatten the rollout loop. The existing 36 private helper methods are mostly fine, though a few could be inlined.
- Refactor `_lstm` shape handling to use named intermediates instead of `[..., 0, :, :]` indexing.

## Performance documentation (separate workstream)

Adjacent to the navigability work above, TorchRL has no consolidated performance section. Users currently discover throughput-relevant knobs by reading source or asking. Proposed structure for a new `docs/source/reference/performance/` tree, in two layers:

1. **Recipes** (cookbook): end-to-end pages such as "PPO on Isaac at max throughput", "SAC on MuJoCo (mjx vs CPU)", "DQN on Atari", "GRPO/LLM training" — each tied to a runnable script in `benchmarks/` so the numbers stay verifiable.
2. **Knobs reference** (per-component): each performance-relevant kwarg documented with trade-off, default, and when to flip it. Linked from the recipes.

Performance docs decay faster than any other doc — every page should carry a version stamp and link the benchmark script that produced its numbers, otherwise the section becomes misleading within two releases.
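One way to keep the stamp and the numbers from drifting apart is for each `benchmarks/` script to print both together. Here is a minimal sketch, assuming nothing about the existing scripts. `version_stamp` and `measure_throughput` are hypothetical helpers built only on the standard library (`importlib.metadata.version`, `platform.python_version`, `time.perf_counter`).

```python
import platform
import time
from importlib import metadata


def version_stamp(packages=("torch", "tensordict", "torchrl")):
    # Record exact package versions next to the numbers they produced.
    stamp = {"python": platform.python_version()}
    for name in packages:
        try:
            stamp[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            stamp[name] = "not installed"
    return stamp


def measure_throughput(step_fn, n_frames):
    # Frames per second for a callable that processes one frame per call.
    start = time.perf_counter()
    for _ in range(n_frames):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # In a real recipe, step_fn would wrap a collector or env step.
    print({"stamp": version_stamp(), "fps": measure_throughput(lambda: None, 10_000)})
```

A docs page can then quote the printed dict verbatim, so a reader can tell immediately which release the numbers belong to.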

### Proposed pages

- [ ] **Environments at scale**: Isaac Lab (GPU-resident rollouts, native autoreset path from `3f0ea1a85`, avoiding host-device transfers); MuJoCo (`mjx` vs classic CPU, batch-size crossover); Gym/Gymnasium (`AsyncVectorEnv` vs `ParallelEnv` vs `SerialEnv` with measured numbers); custom-env pitfalls (`.item()`, batched specs).
- [ ] **Device placement** (its own page, since this is the most misunderstood performance concept in TorchRL): when `device` matters vs `storing_device` vs `policy_device` vs `env.device`, what gets copied where, and what triggers a sync. Doubles as a debugging aid.
- [ ] **Collector configurations**: `SyncDataCollector` vs `MultiSync` vs `MultiaSync` vs `AsyncBatchedCollector` decision table with measured throughput; the knobs (`frames_per_batch`, `init_random_frames`, `max_frames_per_traj`, `update_at_each_batch`, `cudagraph_policy`, `compile_policy`, `trust_policy`, `no_cuda_sync`, optional trajectory IDs); weight-sync schemes (recent redesign) with throughput per topology; the inference-server pattern.
- [ ] **`torch.compile` + cudagraphs**: compile modes with TorchRL-specific gotchas (TensorDictModule recompilation triggers, data-dependent shapes, `set_interaction_type` boundaries); cudagraph wrappers — where they help, where they break; compiled losses; recompilation diagnostics (`TORCH_LOGS=recompiles`).
- [ ] **RNN/recurrent at scale**: recurrent mode vs sequential mode throughput trade-off; padded vs packed sequences for variable-length trajectories; `make_tensordict_primer()` impact; pitfalls that look like perf bugs but are correctness bugs (`is_init` not wired, `InitTracker` missing).
- [ ] **Value/policy nets**: shared trunks (`share_parameters`) and when sharing hurts; `ProbabilisticActor` overheads (lazy distribution construction, `default_interaction_type`, `return_log_prob`); `functional=True` vs stateful module paths in losses; target-net update strategies; mixed precision (bf16) — what's safe to lower.
- [ ] **Replay buffers**: storage choice (`LazyTensorStorage` vs `LazyMemmapStorage` vs `CompressedStorage`) decision table by size and access pattern; sampler overhead (prioritized vs uniform, slice samplers); async writers / prefetching / pinned memory; buffer-on-GPU vs buffer-on-CPU+pinned crossover.
- [ ] **Losses / advantages**: `vec_advantage` and the vectorized GAE path; where the per-iteration cost lives in PPO/A2C/SAC/IMPALA with annotated profiler traces; compiled-loss gotchas.
- [ ] **Distributed**: distributed collectors, Ray-based setups, multi-GPU policy + collector pinning, weight-broadcast strategies. (Decays fastest — keep lean and link runnable examples.)
- [ ] **Profiling and diagnostics** (highest leverage of the bunch): `TORCHRL_PROFILING=1` coverage (recent commit `0cf0faa6d`); `torchrl.timeit` usage; `torch.profiler` + NVTX integration; a bottleneck-triage flowchart (env-step-dominant vs policy-dominant vs advantage/loss-dominant) pointing readers at the right knob page.
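The first branch of the triage flowchart can be stated as a tiny rule: find the phase that dominates an iteration, then route to the page that covers its knobs. Here is a hypothetical sketch of that rule. The phase names, the 50% threshold, and the phase-to-page mapping are assumptions, and the timings could come from whichever profiler the page recommends.

```python
# Hypothetical triage helper: classify which phase dominates one training
# iteration and point at the proposed page that covers its knobs.
KNOB_PAGES = {
    "env_step": "Environments at scale / Collector configurations",
    "policy": "torch.compile + cudagraphs / Value/policy nets",
    "loss": "Losses / advantages",
}


def triage(phase_seconds, threshold=0.5):
    total = sum(phase_seconds.values())
    if total <= 0:
        raise ValueError("expected positive phase timings")
    phase, seconds = max(phase_seconds.items(), key=lambda kv: kv[1])
    share = seconds / total
    if share < threshold:
        # No single phase dominates: profile at finer granularity first.
        return "mixed", share, "Profiling and diagnostics"
    return phase, share, KNOB_PAGES.get(phase, "Profiling and diagnostics")
```

For instance, `triage({"env_step": 6.0, "policy": 2.0, "loss": 2.0})` routes to the environment/collector pages with a 60% share. Evenly split timings fall back to finer-grained profiling instead of a knob page.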

### v1 scope

Resist covering everything. Three or four high-quality recipes (Isaac PPO, MuJoCo SAC, Atari DQN, an LLM/GRPO setup) plus the device-placement page and the profiling page would move the needle more than fifteen thin pages. Add knob references incrementally as recipes reveal which knobs matter.

## Contributions welcome

Any of these items is a self-contained doc PR — happy to coordinate or review. Comments on additional pain points, especially from teams that have adopted TorchRL recently, are very welcome on this issue.

