Update DexSuite camera mini-batches#5668
Conversation
Set the camera-based DexSuite Kuka-Allegro RSL-RL examples to use 8 mini-batches per update for PPO training.
Greptile SummaryThis PR updates the
Confidence Score: 5/5Safe to merge — only a numeric hyperparameter is changed in two example configs with no effect on any shared base config or other tasks. The change is isolated to two camera-based example runner configs. num_steps_per_env=32 is evenly divisible by 8, so the rollout buffer splits cleanly. The base ALGO_CFG and the state-only default config are untouched, and the changelog entry matches the change. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Rollout Collection\nnum_steps_per_env=32 per env] --> B[Full Rollout Buffer\n32 × num_envs steps]
B --> C{Split into mini-batches}
C -->|Before PR\nnum_mini_batches=2| D[2 mini-batches\n16 steps/env each]
C -->|After PR\nnum_mini_batches=8| E[8 mini-batches\n4 steps/env each]
D --> F[PPO Update\nnum_learning_epochs=5]
E --> F
F --> G[Next Rollout]
style E fill:#90EE90
style D fill:#FFB6C1
Reviews (1): Last reviewed commit: "Update DexSuite camera mini-batches" | Re-trigger Greptile |
There was a problem hiding this comment.
Code Review: PR #5668 - Update DexSuite camera mini-batches
Summary
This is a well-motivated configuration fix that addresses a training regression introduced by PR #5024 (heterogeneous Newton backend support). The empirical evidence showing success rate recovery from 0.51% to 69.11% is compelling.
✅ Strengths
- Clear motivation with data: The PR provides concrete metrics demonstrating the issue and fix effectiveness
- Consistent changes: Both
single_cameraandduo_cameraconfigs are updated symmetrically - Proper changelog fragment: Follows the repository's changelog convention
📝 Observations
-
Mini-batch ratio change: The camera-based configs now use 8 mini-batches vs. the default
ALGO_CFGvalue of 4. This 2x multiplier for camera-based training makes sense given the higher memory/compute requirements of CNN policies, but it would be helpful to document why camera configs specifically need this. -
CI Status: The
rendering-correctnesscheck is failing, though this appears unrelated to the config changes in this PR. Worth confirming this is a known flaky test or pre-existing issue ondevelop. -
Divisibility consideration: With
num_steps_per_env = 32andnum_mini_batches = 8, the effective mini-batch size will be(num_envs * 32) / 8. This should work fine for typical environment counts, but ensure the total samples per update is divisible by 8 for the environments where these configs are used.
💭 Optional Suggestions (Non-blocking)
- Consider adding a brief inline comment in the config explaining why camera-based training uses a higher mini-batch count than the default (e.g.,
# Higher mini-batch count for better sample efficiency with CNN policies)
Verdict
The change is straightforward, well-justified with empirical data, and follows the existing code patterns. LGTM from a code perspective.
## Summary Since Heterogeneous support isaac-sim#5024 for newton, the base camera training for dexsuite became more difficult to get decent success rate within 2000 iterations, increasing minibatch size from 2 to 8 helps recover the sample efficiency ``` sha | description | SR | verdict ------------------+--------------------------------+--------+-------- 6621d49 | parent commit | 0.7122 | PASS 965136d | PR isaac-sim#5024 default | 0.0051 | FAIL 965136dc_minib8 | PR isaac-sim#5024 + num_mini_batches=8 | 0.6911 | PASS ``` - Updated DexSuite Kuka-Allegro single-camera and duo-camera RSL-RL PPO examples to use 8 mini-batches per update. - Added an isaaclab_tasks changelog fragment for the example config change.
Summary
Since Heterogeneous support #5024 for newton, the base camera training for dexsuite became more difficult to get decent success rate within 2000 iterations, increasing minibatch size from 2 to 8 helps recover the sample efficiency