[Feature] Add Evaluator class for sync/async evaluation#3594

Merged
vmoens merged 3 commits into main from async-evaluator on Apr 6, 2026

@vmoens
Collaborator

@vmoens vmoens commented Apr 5, 2026

Summary

  • Adds a new Evaluator class (torchrl/collectors/_evaluator.py) that provides a unified API for running evaluation rollouts during RL training, either synchronously (blocking) or asynchronously (fire-and-forget in a background thread)
  • Supports two pluggable backends: "thread" (default, uses daemon thread + env.rollout()) and "ray" (wraps existing RayEvalWorker)
  • Includes automatic metric logging, custom metrics via metrics_fn, video recording via VideoRecorder transforms, and user callbacks

Motivation

All sota-implementations currently do blocking synchronous evaluation inside the training loop. For expensive environments (robotics simulators, LLM generation), this wastes significant training time. The Evaluator decouples evaluation from training.

API

from torchrl.collectors import Evaluator

evaluator = Evaluator(make_eval_env, eval_policy, max_steps=1000, logger=logger)

# Blocking:
metrics = evaluator.evaluate(weights=train_policy, step=step)

# Non-blocking:
evaluator.trigger_eval(weights=train_policy, step=step)
result = evaluator.poll()   # None if still running
result = evaluator.wait()   # block until done

evaluator.shutdown()
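
To illustrate the non-blocking pattern described above, here is a minimal standard-library sketch of a "fire-and-forget in a daemon thread" evaluator. It is not the code in torchrl/collectors/_evaluator.py: the class name ToyAsyncEvaluator, the eval_fn argument, and the "skip if busy" behaviour are all assumptions made for illustration, and only the method names (evaluate, trigger_eval, poll, wait) mirror the API shown above.

```python
import threading
from typing import Any, Callable, Dict, Optional


class ToyAsyncEvaluator:
    """Hypothetical stand-in for the pattern; NOT torchrl's Evaluator."""

    def __init__(self, eval_fn: Callable[[Any], Dict[str, float]]):
        self._eval_fn = eval_fn  # assumed: maps weights -> metrics dict
        self._thread: Optional[threading.Thread] = None
        self._result: Optional[Dict[str, float]] = None
        self._lock = threading.Lock()

    def evaluate(self, weights: Any, step: int) -> Dict[str, float]:
        # Blocking path: run the evaluation in the caller's thread.
        metrics = dict(self._eval_fn(weights))
        metrics["step"] = step
        return metrics

    def trigger_eval(self, weights: Any, step: int) -> bool:
        # Non-blocking path: start a daemon thread and return immediately.
        # Assumption: a request made while one is still running is skipped.
        if self._thread is not None and self._thread.is_alive():
            return False

        def _run() -> None:
            metrics = self.evaluate(weights, step)
            with self._lock:
                self._result = metrics

        self._thread = threading.Thread(target=_run, daemon=True)
        self._thread.start()
        return True

    def poll(self) -> Optional[Dict[str, float]]:
        # Return the finished result once, or None if nothing is ready.
        with self._lock:
            result, self._result = self._result, None
        return result

    def wait(self, timeout: Optional[float] = None) -> Optional[Dict[str, float]]:
        # Block until the background evaluation finishes, then fetch it.
        if self._thread is not None:
            self._thread.join(timeout)
        return self.poll()


# An Event stands in for an expensive rollout so the demo is deterministic.
release = threading.Event()


def slow_eval(weights):
    release.wait()  # "simulator" is busy until released
    return {"reward": float(sum(weights))}


evaluator = ToyAsyncEvaluator(slow_eval)
started = evaluator.trigger_eval(weights=[1.0, 2.0], step=10)
pending = evaluator.poll()  # evaluation has not finished yet
release.set()               # let the rollout complete
final = evaluator.wait()    # block until the result is available
print(started, pending, final)
```

The training loop only pays for starting the thread and for an occasional poll(); the rollout itself overlaps with training, which is the decoupling the Motivation section argues for.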

Files changed

  • New: torchrl/collectors/_evaluator.py (Evaluator, _ThreadEvalBackend, _RayEvalBackend)
  • New: docs/source/reference/collectors_eval.rst — documentation page
  • New: test/test_evaluator.py — 20 unit tests
  • Modified: torchrl/collectors/__init__.py — exports Evaluator
  • Modified: docs/source/reference/collectors.rst — adds eval page to toctree

Test plan

  • All 20 tests pass locally (pytest test/test_evaluator.py -v — 20 passed)
  • CI passes
  • Ray backend tests (requires Ray, skipped in basic CI)

🤖 Generated with Claude Code

@pytorch-bot

pytorch-bot Bot commented Apr 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3594

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit d0e6e64 with merge base d4bb55e:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 5, 2026
@github-actions github-actions Bot added Feature New feature Documentation Improvements or additions to documentation Collectors labels Apr 5, 2026
@github-actions
Contributor

github-actions Bot commented Apr 5, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}9$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_tensor_to_bytestream_speed[pickle] 80.0962μs 79.3426μs 12.6036 KOps/s 12.3568 KOps/s $\color{#35bf28}+2.00\%$
test_tensor_to_bytestream_speed[torch.save] 0.1390ms 0.1384ms 7.2248 KOps/s 7.1226 KOps/s $\color{#35bf28}+1.44\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1104s 0.1099s 9.1028 Ops/s 9.1117 Ops/s $\color{#d91a1a}-0.10\%$
test_tensor_to_bytestream_speed[numpy] 2.6523μs 2.6443μs 378.1786 KOps/s 399.2702 KOps/s $\textbf{\color{#d91a1a}-5.28\%}$
test_tensor_to_bytestream_speed[safetensors] 38.4297μs 37.1627μs 26.9087 KOps/s 27.3886 KOps/s $\color{#d91a1a}-1.75\%$
test_simple 0.9086s 0.8075s 1.2385 Ops/s 1.2417 Ops/s $\color{#d91a1a}-0.26\%$
test_transformed 1.3768s 1.3724s 0.7287 Ops/s 0.7155 Ops/s $\color{#35bf28}+1.84\%$
test_serial 2.2932s 2.2914s 0.4364 Ops/s 0.4329 Ops/s $\color{#35bf28}+0.81\%$
test_parallel 1.7963s 1.7867s 0.5597 Ops/s 0.5466 Ops/s $\color{#35bf28}+2.40\%$
test_step_mdp_speed[True-True-True-True-True] 0.3753ms 41.5242μs 24.0823 KOps/s 24.3269 KOps/s $\color{#d91a1a}-1.01\%$
test_step_mdp_speed[True-True-True-True-False] 46.6120μs 23.0783μs 43.3308 KOps/s 43.1867 KOps/s $\color{#35bf28}+0.33\%$
test_step_mdp_speed[True-True-True-False-True] 51.0720μs 23.5854μs 42.3991 KOps/s 42.1952 KOps/s $\color{#35bf28}+0.48\%$
test_step_mdp_speed[True-True-True-False-False] 44.5810μs 12.7830μs 78.2291 KOps/s 78.4819 KOps/s $\color{#d91a1a}-0.32\%$
test_step_mdp_speed[True-True-False-True-True] 83.8330μs 43.9690μs 22.7433 KOps/s 22.5020 KOps/s $\color{#35bf28}+1.07\%$
test_step_mdp_speed[True-True-False-True-False] 80.4330μs 25.1448μs 39.7697 KOps/s 39.2552 KOps/s $\color{#35bf28}+1.31\%$
test_step_mdp_speed[True-True-False-False-True] 60.0920μs 26.3024μs 38.0193 KOps/s 37.4601 KOps/s $\color{#35bf28}+1.49\%$
test_step_mdp_speed[True-True-False-False-False] 59.8220μs 15.4476μs 64.7350 KOps/s 64.3641 KOps/s $\color{#35bf28}+0.58\%$
test_step_mdp_speed[True-False-True-True-True] 0.1245ms 45.4916μs 21.9821 KOps/s 21.4720 KOps/s $\color{#35bf28}+2.38\%$
test_step_mdp_speed[True-False-True-True-False] 56.1120μs 27.8416μs 35.9175 KOps/s 35.3731 KOps/s $\color{#35bf28}+1.54\%$
test_step_mdp_speed[True-False-True-False-True] 53.0620μs 25.9496μs 38.5363 KOps/s 37.7967 KOps/s $\color{#35bf28}+1.96\%$
test_step_mdp_speed[True-False-True-False-False] 41.6220μs 15.1239μs 66.1205 KOps/s 64.2287 KOps/s $\color{#35bf28}+2.95\%$
test_step_mdp_speed[True-False-False-True-True] 77.1240μs 48.5756μs 20.5865 KOps/s 20.1775 KOps/s $\color{#35bf28}+2.03\%$
test_step_mdp_speed[True-False-False-True-False] 55.9820μs 30.3238μs 32.9774 KOps/s 32.5562 KOps/s $\color{#35bf28}+1.29\%$
test_step_mdp_speed[True-False-False-False-True] 79.5730μs 28.1232μs 35.5578 KOps/s 34.1523 KOps/s $\color{#35bf28}+4.12\%$
test_step_mdp_speed[True-False-False-False-False] 42.3310μs 17.6089μs 56.7896 KOps/s 54.7946 KOps/s $\color{#35bf28}+3.64\%$
test_step_mdp_speed[False-True-True-True-True] 79.6130μs 46.4803μs 21.5145 KOps/s 21.0795 KOps/s $\color{#35bf28}+2.06\%$
test_step_mdp_speed[False-True-True-True-False] 57.8120μs 27.6902μs 36.1138 KOps/s 35.1739 KOps/s $\color{#35bf28}+2.67\%$
test_step_mdp_speed[False-True-True-False-True] 2.4656ms 30.0344μs 33.2951 KOps/s 33.2054 KOps/s $\color{#35bf28}+0.27\%$
test_step_mdp_speed[False-True-True-False-False] 48.1420μs 16.7832μs 59.5834 KOps/s 58.3654 KOps/s $\color{#35bf28}+2.09\%$
test_step_mdp_speed[False-True-False-True-True] 0.1006ms 49.3033μs 20.2826 KOps/s 20.0221 KOps/s $\color{#35bf28}+1.30\%$
test_step_mdp_speed[False-True-False-True-False] 55.7620μs 30.2929μs 33.0111 KOps/s 32.0895 KOps/s $\color{#35bf28}+2.87\%$
test_step_mdp_speed[False-True-False-False-True] 63.8530μs 31.9752μs 31.2742 KOps/s 30.7488 KOps/s $\color{#35bf28}+1.71\%$
test_step_mdp_speed[False-True-False-False-False] 43.3720μs 19.0165μs 52.5859 KOps/s 51.3398 KOps/s $\color{#35bf28}+2.43\%$
test_step_mdp_speed[False-False-True-True-True] 84.7630μs 51.4824μs 19.4241 KOps/s 19.2212 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[False-False-True-True-False] 62.7130μs 32.7752μs 30.5108 KOps/s 30.3464 KOps/s $\color{#35bf28}+0.54\%$
test_step_mdp_speed[False-False-True-False-True] 76.8430μs 32.0299μs 31.2208 KOps/s 30.8625 KOps/s $\color{#35bf28}+1.16\%$
test_step_mdp_speed[False-False-True-False-False] 44.6020μs 19.0747μs 52.4254 KOps/s 51.7542 KOps/s $\color{#35bf28}+1.30\%$
test_step_mdp_speed[False-False-False-True-True] 84.8240μs 53.2087μs 18.7939 KOps/s 18.5571 KOps/s $\color{#35bf28}+1.28\%$
test_step_mdp_speed[False-False-False-True-False] 61.2020μs 35.2605μs 28.3603 KOps/s 28.0050 KOps/s $\color{#35bf28}+1.27\%$
test_step_mdp_speed[False-False-False-False-True] 69.2630μs 33.7355μs 29.6423 KOps/s 29.5883 KOps/s $\color{#35bf28}+0.18\%$
test_step_mdp_speed[False-False-False-False-False] 47.9720μs 21.5183μs 46.4722 KOps/s 45.8295 KOps/s $\color{#35bf28}+1.40\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7253s 0.7161s 1.3965 Ops/s 1.3399 Ops/s $\color{#35bf28}+4.22\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7198s 0.6108s 1.6373 Ops/s 1.6352 Ops/s $\color{#35bf28}+0.13\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7550s 1.6550s 0.6042 Ops/s 0.6002 Ops/s $\color{#35bf28}+0.66\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5118s 1.4299s 0.6993 Ops/s 0.6997 Ops/s $\color{#d91a1a}-0.05\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9752s 1.8873s 0.5299 Ops/s 0.5260 Ops/s $\color{#35bf28}+0.73\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7691s 1.6793s 0.5955 Ops/s 0.5956 Ops/s $\color{#d91a1a}-0.02\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7160s 4.6156s 0.2167 Ops/s 0.2179 Ops/s $\color{#d91a1a}-0.57\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.6412s 4.4625s 0.2241 Ops/s 0.2272 Ops/s $\color{#d91a1a}-1.37\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9728s 1.8785s 0.5323 Ops/s 0.5351 Ops/s $\color{#d91a1a}-0.52\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7233s 1.6124s 0.6202 Ops/s 0.6274 Ops/s $\color{#d91a1a}-1.15\%$
test_values[generalized_advantage_estimate-True-True] 20.5628ms 20.1451ms 49.6398 Ops/s 50.4520 Ops/s $\color{#d91a1a}-1.61\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1501s 3.9066ms 255.9760 Ops/s 283.7865 Ops/s $\textbf{\color{#d91a1a}-9.80\%}$
test_values[td0_return_estimate-False-False] 0.1128ms 81.1196μs 12.3275 KOps/s 12.3886 KOps/s $\color{#d91a1a}-0.49\%$
test_values[td1_return_estimate-False-False] 48.1176ms 47.7229ms 20.9543 Ops/s 21.2698 Ops/s $\color{#d91a1a}-1.48\%$
test_values[vec_td1_return_estimate-False-False] 1.2763ms 1.0707ms 933.9734 Ops/s 947.4477 Ops/s $\color{#d91a1a}-1.42\%$
test_values[td_lambda_return_estimate-True-False] 78.5518ms 78.0979ms 12.8044 Ops/s 13.0896 Ops/s $\color{#d91a1a}-2.18\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2891ms 1.0691ms 935.3573 Ops/s 948.5962 Ops/s $\color{#d91a1a}-1.40\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 20.3554ms 20.1636ms 49.5944 Ops/s 50.6368 Ops/s $\color{#d91a1a}-2.06\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0179ms 0.7383ms 1.3545 KOps/s 1.3561 KOps/s $\color{#d91a1a}-0.12\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7165ms 0.6614ms 1.5119 KOps/s 1.5361 KOps/s $\color{#d91a1a}-1.57\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5226ms 1.4754ms 677.7625 Ops/s 682.8732 Ops/s $\color{#d91a1a}-0.75\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7232ms 0.6772ms 1.4767 KOps/s 1.5017 KOps/s $\color{#d91a1a}-1.66\%$
test_dqn_speed[False-None] 1.6639ms 1.5637ms 639.5257 Ops/s 642.3259 Ops/s $\color{#d91a1a}-0.44\%$
test_dqn_speed[False-backward] 2.2669ms 2.1949ms 455.5989 Ops/s 454.8796 Ops/s $\color{#35bf28}+0.16\%$
test_dqn_speed[True-None] 1.1516ms 0.6001ms 1.6663 KOps/s 1.6689 KOps/s $\color{#d91a1a}-0.16\%$
test_dqn_speed[True-backward] 1.1941ms 1.1437ms 874.3657 Ops/s 849.3294 Ops/s $\color{#35bf28}+2.95\%$
test_dqn_speed[reduce-overhead-None] 0.6968ms 0.6272ms 1.5943 KOps/s 1.5583 KOps/s $\color{#35bf28}+2.31\%$
test_ddpg_speed[False-None] 3.3667ms 2.9816ms 335.3913 Ops/s 329.2027 Ops/s $\color{#35bf28}+1.88\%$
test_ddpg_speed[False-backward] 4.8270ms 4.2263ms 236.6118 Ops/s 234.3690 Ops/s $\color{#35bf28}+0.96\%$
test_ddpg_speed[True-None] 1.4800ms 1.3713ms 729.2389 Ops/s 724.6778 Ops/s $\color{#35bf28}+0.63\%$
test_ddpg_speed[True-backward] 2.4459ms 2.3973ms 417.1433 Ops/s 413.0310 Ops/s $\color{#35bf28}+1.00\%$
test_ddpg_speed[reduce-overhead-None] 1.5072ms 1.3953ms 716.6789 Ops/s 715.8704 Ops/s $\color{#35bf28}+0.11\%$
test_sac_speed[False-None] 8.8492ms 8.3591ms 119.6304 Ops/s 119.7909 Ops/s $\color{#d91a1a}-0.13\%$
test_sac_speed[False-backward] 11.8078ms 11.2781ms 88.6670 Ops/s 89.2520 Ops/s $\color{#d91a1a}-0.66\%$
test_sac_speed[True-None] 2.0431ms 1.9421ms 514.9151 Ops/s 519.2625 Ops/s $\color{#d91a1a}-0.84\%$
test_sac_speed[True-backward] 3.6359ms 3.5594ms 280.9425 Ops/s 278.0323 Ops/s $\color{#35bf28}+1.05\%$
test_sac_speed[reduce-overhead-None] 16.6191ms 10.1380ms 98.6384 Ops/s 99.3433 Ops/s $\color{#d91a1a}-0.71\%$
test_redq_deprec_speed[False-None] 10.2642ms 9.3438ms 107.0227 Ops/s 106.3353 Ops/s $\color{#35bf28}+0.65\%$
test_redq_deprec_speed[False-backward] 13.1560ms 12.3552ms 80.9377 Ops/s 80.9996 Ops/s $\color{#d91a1a}-0.08\%$
test_redq_deprec_speed[True-None] 2.8711ms 2.6950ms 371.0536 Ops/s 368.3114 Ops/s $\color{#35bf28}+0.74\%$
test_redq_deprec_speed[True-backward] 4.6208ms 4.2123ms 237.4019 Ops/s 225.9621 Ops/s $\textbf{\color{#35bf28}+5.06\%}$
test_redq_deprec_speed[reduce-overhead-None] 14.5065ms 9.6198ms 103.9528 Ops/s 103.6625 Ops/s $\color{#35bf28}+0.28\%$
test_td3_speed[False-None] 8.3750ms 8.1757ms 122.3136 Ops/s 122.0013 Ops/s $\color{#35bf28}+0.26\%$
test_td3_speed[False-backward] 10.8127ms 10.4689ms 95.5212 Ops/s 93.2551 Ops/s $\color{#35bf28}+2.43\%$
test_td3_speed[True-None] 1.7304ms 1.7039ms 586.8913 Ops/s 588.2530 Ops/s $\color{#d91a1a}-0.23\%$
test_td3_speed[True-backward] 3.1964ms 3.0694ms 325.7972 Ops/s 305.5310 Ops/s $\textbf{\color{#35bf28}+6.63\%}$
test_td3_speed[reduce-overhead-None] 50.0525ms 25.7202ms 38.8799 Ops/s 39.1475 Ops/s $\color{#d91a1a}-0.68\%$
test_cql_speed[False-None] 17.6528ms 17.3587ms 57.6081 Ops/s 57.7314 Ops/s $\color{#d91a1a}-0.21\%$
test_cql_speed[False-backward] 22.9284ms 22.4243ms 44.5945 Ops/s 44.2186 Ops/s $\color{#35bf28}+0.85\%$
test_cql_speed[True-None] 3.7007ms 3.4957ms 286.0636 Ops/s 291.9595 Ops/s $\color{#d91a1a}-2.02\%$
test_cql_speed[True-backward] 6.0098ms 5.6060ms 178.3808 Ops/s 176.5661 Ops/s $\color{#35bf28}+1.03\%$
test_cql_speed[reduce-overhead-None] 18.9713ms 12.0617ms 82.9068 Ops/s 83.1286 Ops/s $\color{#d91a1a}-0.27\%$
test_a2c_speed[False-None] 3.3831ms 3.2903ms 303.9258 Ops/s 300.6332 Ops/s $\color{#35bf28}+1.10\%$
test_a2c_speed[False-backward] 7.3273ms 6.1495ms 162.6161 Ops/s 165.5934 Ops/s $\color{#d91a1a}-1.80\%$
test_a2c_speed[True-None] 1.5997ms 1.4831ms 674.2694 Ops/s 679.3302 Ops/s $\color{#d91a1a}-0.74\%$
test_a2c_speed[True-backward] 3.1827ms 3.1136ms 321.1757 Ops/s 315.0490 Ops/s $\color{#35bf28}+1.94\%$
test_a2c_speed[reduce-overhead-None] 1.2114ms 1.1048ms 905.1770 Ops/s 898.6840 Ops/s $\color{#35bf28}+0.72\%$
test_ppo_speed[False-None] 4.2527ms 3.9946ms 250.3388 Ops/s 249.7363 Ops/s $\color{#35bf28}+0.24\%$
test_ppo_speed[False-backward] 7.4779ms 7.0697ms 141.4479 Ops/s 140.5737 Ops/s $\color{#35bf28}+0.62\%$
test_ppo_speed[True-None] 1.6826ms 1.6045ms 623.2543 Ops/s 625.3761 Ops/s $\color{#d91a1a}-0.34\%$
test_ppo_speed[True-backward] 3.6709ms 3.3187ms 301.3233 Ops/s 288.5366 Ops/s $\color{#35bf28}+4.43\%$
test_ppo_speed[reduce-overhead-None] 1.2387ms 1.1589ms 862.8797 Ops/s 854.0123 Ops/s $\color{#35bf28}+1.04\%$
test_reinforce_speed[False-None] 2.7332ms 2.3557ms 424.5085 Ops/s 424.6103 Ops/s $\color{#d91a1a}-0.02\%$
test_reinforce_speed[False-backward] 3.5090ms 3.3443ms 299.0146 Ops/s 288.4732 Ops/s $\color{#35bf28}+3.65\%$
test_reinforce_speed[True-None] 1.5644ms 1.4564ms 686.6050 Ops/s 683.4534 Ops/s $\color{#35bf28}+0.46\%$
test_reinforce_speed[True-backward] 3.3057ms 3.1332ms 319.1578 Ops/s 305.4843 Ops/s $\color{#35bf28}+4.48\%$
test_reinforce_speed[reduce-overhead-None] 16.6775ms 9.0676ms 110.2830 Ops/s 113.6157 Ops/s $\color{#d91a1a}-2.93\%$
test_iql_speed[False-None] 10.1807ms 9.5716ms 104.4759 Ops/s 104.4069 Ops/s $\color{#35bf28}+0.07\%$
test_iql_speed[False-backward] 13.9072ms 13.1973ms 75.7733 Ops/s 74.5167 Ops/s $\color{#35bf28}+1.69\%$
test_iql_speed[True-None] 2.5328ms 2.3656ms 422.7330 Ops/s 428.6382 Ops/s $\color{#d91a1a}-1.38\%$
test_iql_speed[True-backward] 5.2769ms 4.9397ms 202.4420 Ops/s 196.0212 Ops/s $\color{#35bf28}+3.28\%$
test_iql_speed[reduce-overhead-None] 17.0378ms 10.0558ms 99.4452 Ops/s 100.4271 Ops/s $\color{#d91a1a}-0.98\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.4567ms 5.9774ms 167.2967 Ops/s 163.9776 Ops/s $\color{#35bf28}+2.02\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8523ms 0.3752ms 2.6655 KOps/s 2.6586 KOps/s $\color{#35bf28}+0.26\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7284ms 0.3596ms 2.7806 KOps/s 2.7575 KOps/s $\color{#35bf28}+0.84\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1098ms 5.8143ms 171.9891 Ops/s 170.2422 Ops/s $\color{#35bf28}+1.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9623ms 0.3202ms 3.1234 KOps/s 2.7236 KOps/s $\textbf{\color{#35bf28}+14.68\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7100ms 0.3086ms 3.2400 KOps/s 2.7543 KOps/s $\textbf{\color{#35bf28}+17.63\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4382ms 1.2720ms 786.1516 Ops/s 696.8319 Ops/s $\textbf{\color{#35bf28}+12.82\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5196ms 1.1806ms 847.0147 Ops/s 745.8400 Ops/s $\textbf{\color{#35bf28}+13.57\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.2229ms 6.0984ms 163.9771 Ops/s 166.3387 Ops/s $\color{#d91a1a}-1.42\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2632ms 0.4375ms 2.2855 KOps/s 2.0525 KOps/s $\textbf{\color{#35bf28}+11.35\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8472ms 0.4686ms 2.1340 KOps/s 2.3456 KOps/s $\textbf{\color{#d91a1a}-9.02\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0802ms 5.8232ms 171.7276 Ops/s 169.6339 Ops/s $\color{#35bf28}+1.23\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.1357ms 0.3600ms 2.7779 KOps/s 2.8280 KOps/s $\color{#d91a1a}-1.77\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6338ms 0.3161ms 3.1635 KOps/s 2.6996 KOps/s $\textbf{\color{#35bf28}+17.19\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9605ms 5.7003ms 175.4281 Ops/s 172.0249 Ops/s $\color{#35bf28}+1.98\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.4321ms 0.3967ms 2.5207 KOps/s 3.1316 KOps/s $\textbf{\color{#d91a1a}-19.51\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6238ms 0.3616ms 2.7651 KOps/s 3.2675 KOps/s $\textbf{\color{#d91a1a}-15.37\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1093ms 5.9054ms 169.3373 Ops/s 167.5518 Ops/s $\color{#35bf28}+1.07\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2289ms 0.5022ms 1.9913 KOps/s 2.2370 KOps/s $\textbf{\color{#d91a1a}-10.99\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6285ms 0.4672ms 2.1404 KOps/s 2.0557 KOps/s $\color{#35bf28}+4.12\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 1.0019s 25.0260ms 39.9584 Ops/s 34.7982 Ops/s $\textbf{\color{#35bf28}+14.83\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 11.6869ms 2.0962ms 477.0619 Ops/s 534.5499 Ops/s $\textbf{\color{#d91a1a}-10.75\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.2365ms 1.0138ms 986.4255 Ops/s 1.0300 KOps/s $\color{#d91a1a}-4.23\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 10.0821ms 5.1488ms 194.2200 Ops/s 192.7774 Ops/s $\color{#35bf28}+0.75\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 4.1092ms 1.8445ms 542.1397 Ops/s 509.9263 Ops/s $\textbf{\color{#35bf28}+6.32\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.1743ms 0.9784ms 1.0220 KOps/s 721.6839 Ops/s $\textbf{\color{#35bf28}+41.62\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.2171ms 5.2568ms 190.2291 Ops/s 187.3124 Ops/s $\color{#35bf28}+1.56\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 9.3717ms 2.1826ms 458.1724 Ops/s 467.7632 Ops/s $\color{#d91a1a}-2.05\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 10.9603ms 1.3442ms 743.9505 Ops/s 847.6312 Ops/s $\textbf{\color{#d91a1a}-12.23\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 43.0435ms 39.8728ms 25.0797 Ops/s 25.6298 Ops/s $\color{#d91a1a}-2.15\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.4239ms 18.2709ms 54.7320 Ops/s 29.0545 Ops/s $\textbf{\color{#35bf28}+88.38\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 43.8473ms 40.1354ms 24.9156 Ops/s 24.6663 Ops/s $\color{#35bf28}+1.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.6332ms 18.1770ms 55.0146 Ops/s 54.8136 Ops/s $\color{#35bf28}+0.37\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 43.1803ms 41.7309ms 23.9631 Ops/s 23.6687 Ops/s $\color{#35bf28}+1.24\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.8341ms 19.6175ms 50.9750 Ops/s 51.1694 Ops/s $\color{#d91a1a}-0.38\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8586ms 0.2213ms 4.5193 KOps/s 4.5430 KOps/s $\color{#d91a1a}-0.52\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.6594ms 1.4036ms 712.4554 Ops/s 682.1948 Ops/s $\color{#35bf28}+4.44\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7515ms 2.3348ms 428.3000 Ops/s 424.4547 Ops/s $\color{#35bf28}+0.91\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1784ms 2.9766ms 335.9583 Ops/s 330.7823 Ops/s $\color{#35bf28}+1.56\%$
test_storage_write_contiguous[50-img_shape0-small] 0.5651ms 0.1635ms 6.1166 KOps/s 6.1850 KOps/s $\color{#d91a1a}-1.11\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.4123ms 0.2379ms 4.2028 KOps/s 4.4283 KOps/s $\textbf{\color{#d91a1a}-5.09\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9118ms 1.7236ms 580.1670 Ops/s 543.6191 Ops/s $\textbf{\color{#35bf28}+6.72\%}$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.6697ms 1.3806ms 724.3215 Ops/s 711.4115 Ops/s $\color{#35bf28}+1.81\%$
test_collector_stack_then_write[50-img_shape0-small] 1.3510ms 1.1511ms 868.7418 Ops/s 859.0694 Ops/s $\color{#35bf28}+1.13\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.9382ms 3.6586ms 273.3257 Ops/s 268.7411 Ops/s $\color{#35bf28}+1.71\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.5958ms 6.0320ms 165.7835 Ops/s 166.3441 Ops/s $\color{#d91a1a}-0.34\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.6884ms 7.2311ms 138.2909 Ops/s 136.1460 Ops/s $\color{#35bf28}+1.58\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4367ms 0.2798ms 3.5743 KOps/s 3.6000 KOps/s $\color{#d91a1a}-0.71\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.8569ms 1.5630ms 639.7760 Ops/s 628.9244 Ops/s $\color{#35bf28}+1.73\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8903ms 2.5263ms 395.8292 Ops/s 400.1359 Ops/s $\color{#d91a1a}-1.08\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.5921ms 3.2749ms 305.3487 Ops/s 305.0754 Ops/s $\color{#35bf28}+0.09\%$
test_collector_without_rb[100-img_shape0-atari] 33.0578ms 32.4622ms 30.8051 Ops/s 30.0576 Ops/s $\color{#35bf28}+2.49\%$
test_collector_without_rb[200-img_shape1-large_batch] 64.6469ms 64.2287ms 15.5694 Ops/s 15.2756 Ops/s $\color{#35bf28}+1.92\%$
test_collector_with_rb[100-img_shape0-atari] 37.7719ms 37.1068ms 26.9492 Ops/s 26.5011 Ops/s $\color{#35bf28}+1.69\%$
test_collector_with_rb[200-img_shape1-large_batch] 73.0327ms 72.3682ms 13.8182 Ops/s 13.4366 Ops/s $\color{#35bf28}+2.84\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 56.4060ms 54.6888ms 18.2853 Ops/s 17.8688 Ops/s $\color{#35bf28}+2.33\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1093s 0.1088s 9.1948 Ops/s 9.0615 Ops/s $\color{#35bf28}+1.47\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 58.8556ms 56.8827ms 17.5800 Ops/s 17.6136 Ops/s $\color{#d91a1a}-0.19\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1132s 0.1126s 8.8772 Ops/s 8.8096 Ops/s $\color{#35bf28}+0.77\%$
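
For readers checking the numbers, the Change column is consistent with the relative difference between Ops and Ops on Repo HEAD. The sketch below recomputes it for three rows taken from the GPU table above. The 5% flagging threshold is an assumption inferred from which rows are bolded (the 13 "Improved" and 9 "Worsened" counts match the bolded rows in this table); it is not taken from the benchmark workflow's configuration, and the function and label names are illustrative.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


# Assumed threshold: bolded rows in the table all have |change| >= ~5%.
THRESHOLD = 5.0

# (name, Ops, Ops on Repo HEAD), converted to plain Ops/s where needed.
rows = [
    ("test_tensor_to_bytestream_speed[pickle]", 12603.6, 12356.8),
    ("test_tensor_to_bytestream_speed[numpy]", 378178.6, 399270.2),
    ("test_redq_deprec_speed[True-backward]", 237.4019, 225.9621),
]

report = []
for name, ops_pr, ops_head in rows:
    change = relative_change(ops_pr, ops_head)
    if change >= THRESHOLD:
        verdict = "improved"
    elif change <= -THRESHOLD:
        verdict = "worsened"
    else:
        verdict = "unflagged"
    report.append((name, round(change, 2), verdict))
    print(f"{name}: {change:+.2f}% ({verdict})")
```

Rows whose Ops value is only shown rounded to four digits after a unit change (for example 1.0220 KOps/s against 721.6839 Ops/s) may differ from the table in the last digit when recomputed this way.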

@github-actions
Contributor

github-actions Bot commented Apr 5, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}9$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_tensor_to_bytestream_speed[pickle] 79.7888μs 78.8796μs 12.6775 KOps/s 12.4313 KOps/s $\color{#35bf28}+1.98\%$
test_tensor_to_bytestream_speed[torch.save] 0.1382ms 0.1379ms 7.2491 KOps/s 6.9945 KOps/s $\color{#35bf28}+3.64\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1077s 0.1072s 9.3306 Ops/s 9.2099 Ops/s $\color{#35bf28}+1.31\%$
test_tensor_to_bytestream_speed[numpy] 2.7122μs 2.6975μs 370.7136 KOps/s 370.9298 KOps/s $\color{#d91a1a}-0.06\%$
test_tensor_to_bytestream_speed[safetensors] 38.0552μs 37.8966μs 26.3876 KOps/s 27.2923 KOps/s $\color{#d91a1a}-3.31\%$
test_simple 0.5623s 0.5479s 1.8250 Ops/s 1.7267 Ops/s $\textbf{\color{#35bf28}+5.69\%}$
test_transformed 1.2090s 1.1025s 0.9070 Ops/s 0.8823 Ops/s $\color{#35bf28}+2.81\%$
test_serial 1.6854s 1.6848s 0.5936 Ops/s 0.5782 Ops/s $\color{#35bf28}+2.66\%$
test_parallel 1.0254s 1.0244s 0.9762 Ops/s 0.9650 Ops/s $\color{#35bf28}+1.16\%$
test_step_mdp_speed[True-True-True-True-True] 0.3196ms 41.0031μs 24.3884 KOps/s 24.1880 KOps/s $\color{#35bf28}+0.83\%$
test_step_mdp_speed[True-True-True-True-False] 52.3930μs 22.8810μs 43.7044 KOps/s 43.6744 KOps/s $\color{#35bf28}+0.07\%$
test_step_mdp_speed[True-True-True-False-True] 58.2840μs 23.3976μs 42.7395 KOps/s 42.2387 KOps/s $\color{#35bf28}+1.19\%$
test_step_mdp_speed[True-True-True-False-False] 39.5320μs 12.6217μs 79.2283 KOps/s 79.3265 KOps/s $\color{#d91a1a}-0.12\%$
test_step_mdp_speed[True-True-False-True-True] 85.3650μs 43.8291μs 22.8159 KOps/s 22.6387 KOps/s $\color{#35bf28}+0.78\%$
test_step_mdp_speed[True-True-False-True-False] 52.3530μs 25.1398μs 39.7776 KOps/s 39.4184 KOps/s $\color{#35bf28}+0.91\%$
test_step_mdp_speed[True-True-False-False-True] 68.4940μs 25.5604μs 39.1230 KOps/s 38.3382 KOps/s $\color{#35bf28}+2.05\%$
test_step_mdp_speed[True-True-False-False-False] 37.8620μs 15.1997μs 65.7910 KOps/s 66.0222 KOps/s $\color{#d91a1a}-0.35\%$
test_step_mdp_speed[True-False-True-True-True] 81.4040μs 45.9770μs 21.7500 KOps/s 21.5109 KOps/s $\color{#35bf28}+1.11\%$
test_step_mdp_speed[True-False-True-True-False] 64.6440μs 28.0252μs 35.6822 KOps/s 36.1384 KOps/s $\color{#d91a1a}-1.26\%$
test_step_mdp_speed[True-False-True-False-True] 61.6730μs 26.0734μs 38.3532 KOps/s 38.6024 KOps/s $\color{#d91a1a}-0.65\%$
test_step_mdp_speed[True-False-True-False-False] 43.1330μs 15.1654μs 65.9395 KOps/s 65.5220 KOps/s $\color{#35bf28}+0.64\%$
test_step_mdp_speed[True-False-False-True-True] 81.4050μs 49.0367μs 20.3929 KOps/s 20.5170 KOps/s $\color{#d91a1a}-0.61\%$
test_step_mdp_speed[True-False-False-True-False] 70.6340μs 30.0135μs 33.3183 KOps/s 33.1063 KOps/s $\color{#35bf28}+0.64\%$
test_step_mdp_speed[True-False-False-False-True] 64.5640μs 28.9546μs 34.5369 KOps/s 35.5458 KOps/s $\color{#d91a1a}-2.84\%$
test_step_mdp_speed[True-False-False-False-False] 40.2520μs 17.8111μs 56.1447 KOps/s 56.3726 KOps/s $\color{#d91a1a}-0.40\%$
test_step_mdp_speed[False-True-True-True-True] 93.1950μs 46.8862μs 21.3283 KOps/s 21.4461 KOps/s $\color{#d91a1a}-0.55\%$
test_step_mdp_speed[False-True-True-True-False] 66.6340μs 27.9883μs 35.7293 KOps/s 35.9380 KOps/s $\color{#d91a1a}-0.58\%$
test_step_mdp_speed[False-True-True-False-True] 2.6890ms 30.0010μs 33.3322 KOps/s 33.7903 KOps/s $\color{#d91a1a}-1.36\%$
test_step_mdp_speed[False-True-True-False-False] 46.4130μs 16.8320μs 59.4106 KOps/s 59.0515 KOps/s $\color{#35bf28}+0.61\%$
test_step_mdp_speed[False-True-False-True-True] 87.6150μs 48.9661μs 20.4223 KOps/s 20.3409 KOps/s $\color{#35bf28}+0.40\%$
test_step_mdp_speed[False-True-False-True-False] 62.4240μs 30.1543μs 33.1628 KOps/s 33.0041 KOps/s $\color{#35bf28}+0.48\%$
test_step_mdp_speed[False-True-False-False-True] 65.4240μs 31.5390μs 31.7068 KOps/s 30.7735 KOps/s $\color{#35bf28}+3.03\%$
test_step_mdp_speed[False-True-False-False-False] 50.3030μs 19.1348μs 52.2608 KOps/s 51.2029 KOps/s $\color{#35bf28}+2.07\%$
test_step_mdp_speed[False-False-True-True-True] 86.4550μs 50.9669μs 19.6206 KOps/s 19.5537 KOps/s $\color{#35bf28}+0.34\%$
test_step_mdp_speed[False-False-True-True-False] 62.4430μs 32.7648μs 30.5205 KOps/s 30.3085 KOps/s $\color{#35bf28}+0.70\%$
test_step_mdp_speed[False-False-True-False-True] 66.1640μs 31.9058μs 31.3422 KOps/s 31.6035 KOps/s $\color{#d91a1a}-0.83\%$
test_step_mdp_speed[False-False-True-False-False] 56.9730μs 19.4982μs 51.2868 KOps/s 51.5307 KOps/s $\color{#d91a1a}-0.47\%$
test_step_mdp_speed[False-False-False-True-True] 84.7750μs 52.4006μs 19.0837 KOps/s 18.7636 KOps/s $\color{#35bf28}+1.71\%$
test_step_mdp_speed[False-False-False-True-False] 0.1055ms 35.1059μs 28.4852 KOps/s 28.2203 KOps/s $\color{#35bf28}+0.94\%$
test_step_mdp_speed[False-False-False-False-True] 63.8940μs 33.2053μs 30.1157 KOps/s 29.7388 KOps/s $\color{#35bf28}+1.27\%$
test_step_mdp_speed[False-False-False-False-False] 49.9030μs 21.5269μs 46.4535 KOps/s 46.0338 KOps/s $\color{#35bf28}+0.91\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8571s 0.7495s 1.3342 Ops/s 1.3573 Ops/s $\color{#d91a1a}-1.70\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7172s 0.6115s 1.6352 Ops/s 1.6624 Ops/s $\color{#d91a1a}-1.63\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7395s 1.6507s 0.6058 Ops/s 0.6160 Ops/s $\color{#d91a1a}-1.65\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5217s 1.4350s 0.6969 Ops/s 0.7103 Ops/s $\color{#d91a1a}-1.89\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9892s 1.9007s 0.5261 Ops/s 0.5319 Ops/s $\color{#d91a1a}-1.09\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7711s 1.6839s 0.5938 Ops/s 0.6031 Ops/s $\color{#d91a1a}-1.54\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6858s 4.5892s 0.2179 Ops/s 0.2190 Ops/s $\color{#d91a1a}-0.52\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4863s 4.3568s 0.2295 Ops/s 0.2313 Ops/s $\color{#d91a1a}-0.75\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9433s 1.8713s 0.5344 Ops/s 0.5385 Ops/s $\color{#d91a1a}-0.76\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6691s 1.5762s 0.6344 Ops/s 0.6257 Ops/s $\color{#35bf28}+1.40\%$
test_values[generalized_advantage_estimate-True-True] 10.3801ms 10.2088ms 97.9547 Ops/s 98.6312 Ops/s $\color{#d91a1a}-0.69\%$
test_values[vec_generalized_advantage_estimate-True-True] 18.4712ms 14.6962ms 68.0448 Ops/s 55.3306 Ops/s $\textbf{\color{#35bf28}+22.98\%}$
test_values[td0_return_estimate-False-False] 0.2326ms 0.1359ms 7.3592 KOps/s 7.8613 KOps/s $\textbf{\color{#d91a1a}-6.39\%}$
test_values[td1_return_estimate-False-False] 29.5252ms 27.9732ms 35.7484 Ops/s 35.8213 Ops/s $\color{#d91a1a}-0.20\%$
test_values[vec_td1_return_estimate-False-False] 18.6088ms 17.8655ms 55.9737 Ops/s 55.0096 Ops/s $\color{#35bf28}+1.75\%$
test_values[td_lambda_return_estimate-True-False] 42.5563ms 41.4499ms 24.1255 Ops/s 23.8773 Ops/s $\color{#35bf28}+1.04\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.8091ms 17.8335ms 56.0742 Ops/s 55.0798 Ops/s $\color{#35bf28}+1.81\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.1162ms 9.0033ms 111.0707 Ops/s 110.5762 Ops/s $\color{#35bf28}+0.45\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7143ms 1.5272ms 654.8133 Ops/s 640.2358 Ops/s $\color{#35bf28}+2.28\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6004ms 0.4203ms 2.3795 KOps/s 2.3703 KOps/s $\color{#35bf28}+0.39\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 30.6092ms 30.1171ms 33.2038 Ops/s 28.4301 Ops/s $\textbf{\color{#35bf28}+16.79\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.1034ms 1.7706ms 564.7788 Ops/s 558.1070 Ops/s $\color{#35bf28}+1.20\%$
test_dqn_speed[False-None] 1.5864ms 1.4309ms 698.8827 Ops/s 693.2533 Ops/s $\color{#35bf28}+0.81\%$
test_dqn_speed[False-backward] 2.0185ms 1.9658ms 508.6889 Ops/s 499.7213 Ops/s $\color{#35bf28}+1.79\%$
test_dqn_speed[True-None] 0.8133ms 0.5956ms 1.6791 KOps/s 1.6557 KOps/s $\color{#35bf28}+1.41\%$
test_dqn_speed[True-backward] 1.1312ms 1.0786ms 927.0970 Ops/s 806.9923 Ops/s $\textbf{\color{#35bf28}+14.88\%}$
test_dqn_speed[reduce-overhead-None] 0.9449ms 0.5700ms 1.7543 KOps/s 1.6997 KOps/s $\color{#35bf28}+3.21\%$
test_ddpg_speed[False-None] 3.2629ms 2.9284ms 341.4807 Ops/s 347.4772 Ops/s $\color{#d91a1a}-1.73\%$
test_ddpg_speed[False-backward] 4.6447ms 4.2064ms 237.7336 Ops/s 242.1624 Ops/s $\color{#d91a1a}-1.83\%$
test_ddpg_speed[True-None] 1.8836ms 1.4919ms 670.2909 Ops/s 664.0839 Ops/s $\color{#35bf28}+0.93\%$
test_ddpg_speed[True-backward] 2.6618ms 2.5540ms 391.5429 Ops/s 353.9592 Ops/s $\textbf{\color{#35bf28}+10.62\%}$
test_ddpg_speed[reduce-overhead-None] 1.7972ms 1.5052ms 664.3723 Ops/s 672.1804 Ops/s $\color{#d91a1a}-1.16\%$
test_sac_speed[False-None] 9.1808ms 8.3030ms 120.4390 Ops/s 123.2388 Ops/s $\color{#d91a1a}-2.27\%$
test_sac_speed[False-backward] 12.2116ms 11.7068ms 85.4207 Ops/s 87.4007 Ops/s $\color{#d91a1a}-2.27\%$
test_sac_speed[True-None] 2.6985ms 2.2850ms 437.6435 Ops/s 438.8035 Ops/s $\color{#d91a1a}-0.26\%$
test_sac_speed[True-backward] 4.3707ms 4.2483ms 235.3879 Ops/s 222.5766 Ops/s $\textbf{\color{#35bf28}+5.76\%}$
test_sac_speed[reduce-overhead-None] 2.4819ms 2.2614ms 442.2082 Ops/s 439.2579 Ops/s $\color{#35bf28}+0.67\%$
test_redq_speed[False-None] 14.1530ms 11.1608ms 89.5996 Ops/s 91.1868 Ops/s $\color{#d91a1a}-1.74\%$
test_redq_speed[False-backward] 19.6448ms 19.0741ms 52.4270 Ops/s 52.9921 Ops/s $\color{#d91a1a}-1.07\%$
test_redq_speed[True-None] 5.1999ms 4.8312ms 206.9900 Ops/s 198.8887 Ops/s $\color{#35bf28}+4.07\%$
test_redq_speed[reduce-overhead-None] 5.2230ms 4.8004ms 208.3174 Ops/s 201.0090 Ops/s $\color{#35bf28}+3.64\%$
test_redq_deprec_speed[False-None] 13.0648ms 11.8301ms 84.5299 Ops/s 84.8347 Ops/s $\color{#d91a1a}-0.36\%$
test_redq_deprec_speed[False-backward] 17.9826ms 17.0337ms 58.7071 Ops/s 58.6793 Ops/s $\color{#35bf28}+0.05\%$
test_redq_deprec_speed[True-None] 4.3076ms 3.9177ms 255.2535 Ops/s 250.1287 Ops/s $\color{#35bf28}+2.05\%$
test_redq_deprec_speed[True-backward] 8.3133ms 7.8673ms 127.1086 Ops/s 125.9204 Ops/s $\color{#35bf28}+0.94\%$
test_redq_deprec_speed[reduce-overhead-None] 4.0303ms 3.8375ms 260.5880 Ops/s 259.4220 Ops/s $\color{#35bf28}+0.45\%$
test_td3_speed[False-None] 8.3884ms 8.2386ms 121.3797 Ops/s 122.4818 Ops/s $\color{#d91a1a}-0.90\%$
test_td3_speed[False-backward] 11.5483ms 11.2469ms 88.9134 Ops/s 90.0474 Ops/s $\color{#d91a1a}-1.26\%$
test_td3_speed[True-None] 1.9404ms 1.9012ms 525.9949 Ops/s 525.9583 Ops/s $+0.01\%$
test_td3_speed[True-backward] 3.9137ms 3.7654ms 265.5734 Ops/s 257.9355 Ops/s $\color{#35bf28}+2.96\%$
test_td3_speed[reduce-overhead-None] 1.9562ms 1.9125ms 522.8626 Ops/s 523.0754 Ops/s $\color{#d91a1a}-0.04\%$
test_cql_speed[False-None] 28.3450ms 27.4218ms 36.4674 Ops/s 36.8155 Ops/s $\color{#d91a1a}-0.95\%$
test_cql_speed[False-backward] 38.7215ms 37.5679ms 26.6185 Ops/s 26.9028 Ops/s $\color{#d91a1a}-1.06\%$
test_cql_speed[True-None] 13.6599ms 13.1145ms 76.2518 Ops/s 78.0932 Ops/s $\color{#d91a1a}-2.36\%$
test_cql_speed[True-backward] 20.0936ms 19.3051ms 51.7998 Ops/s 55.5748 Ops/s $\textbf{\color{#d91a1a}-6.79\%}$
test_cql_speed[reduce-overhead-None] 13.1740ms 12.8454ms 77.8491 Ops/s 74.5435 Ops/s $\color{#35bf28}+4.43\%$
test_a2c_speed[False-None] 6.0675ms 5.5732ms 179.4286 Ops/s 174.8567 Ops/s $\color{#35bf28}+2.61\%$
test_a2c_speed[False-backward] 12.8250ms 12.2999ms 81.3017 Ops/s 79.8159 Ops/s $\color{#35bf28}+1.86\%$
test_a2c_speed[True-None] 4.0266ms 3.9115ms 255.6533 Ops/s 252.6995 Ops/s $\color{#35bf28}+1.17\%$
test_a2c_speed[True-backward] 9.5020ms 9.2398ms 108.2271 Ops/s 107.6206 Ops/s $\color{#35bf28}+0.56\%$
test_a2c_speed[reduce-overhead-None] 4.3485ms 3.9809ms 251.2008 Ops/s 248.7115 Ops/s $\color{#35bf28}+1.00\%$
test_ppo_speed[False-None] 6.4472ms 6.1654ms 162.1949 Ops/s 162.6683 Ops/s $\color{#d91a1a}-0.29\%$
test_ppo_speed[False-backward] 14.0577ms 13.5914ms 73.5757 Ops/s 74.7576 Ops/s $\color{#d91a1a}-1.58\%$
test_ppo_speed[True-None] 4.2121ms 3.9631ms 252.3281 Ops/s 252.5672 Ops/s $\color{#d91a1a}-0.09\%$
test_ppo_speed[True-backward] 9.5309ms 9.1698ms 109.0534 Ops/s 103.6706 Ops/s $\textbf{\color{#35bf28}+5.19\%}$
test_ppo_speed[reduce-overhead-None] 4.8292ms 3.9771ms 251.4369 Ops/s 253.6279 Ops/s $\color{#d91a1a}-0.86\%$
test_reinforce_speed[False-None] 5.2912ms 4.9304ms 202.8240 Ops/s 206.1623 Ops/s $\color{#d91a1a}-1.62\%$
test_reinforce_speed[False-backward] 8.3537ms 7.9878ms 125.1905 Ops/s 127.2201 Ops/s $\color{#d91a1a}-1.60\%$
test_reinforce_speed[True-None] 3.7407ms 3.2272ms 309.8650 Ops/s 316.1231 Ops/s $\color{#d91a1a}-1.98\%$
test_reinforce_speed[True-backward] 9.0781ms 8.4395ms 118.4898 Ops/s 108.4354 Ops/s $\textbf{\color{#35bf28}+9.27\%}$
test_reinforce_speed[reduce-overhead-None] 3.4317ms 3.1841ms 314.0618 Ops/s 307.7318 Ops/s $\color{#35bf28}+2.06\%$
test_iql_speed[False-None] 21.8773ms 21.0504ms 47.5050 Ops/s 46.6462 Ops/s $\color{#35bf28}+1.84\%$
test_iql_speed[False-backward] 33.3710ms 32.2193ms 31.0373 Ops/s 31.1539 Ops/s $\color{#d91a1a}-0.37\%$
test_iql_speed[True-None] 9.5855ms 9.0064ms 111.0320 Ops/s 108.8117 Ops/s $\color{#35bf28}+2.04\%$
test_iql_speed[True-backward] 18.2988ms 17.7553ms 56.3214 Ops/s 57.3582 Ops/s $\color{#d91a1a}-1.81\%$
test_iql_speed[reduce-overhead-None] 9.4582ms 9.1618ms 109.1486 Ops/s 109.8446 Ops/s $\color{#d91a1a}-0.63\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2947ms 6.1357ms 162.9800 Ops/s 162.3352 Ops/s $\color{#35bf28}+0.40\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.2697ms 0.3571ms 2.8001 KOps/s 2.7848 KOps/s $\color{#35bf28}+0.55\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5397ms 0.2897ms 3.4515 KOps/s 3.0500 KOps/s $\textbf{\color{#35bf28}+13.16\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1450ms 5.9021ms 169.4319 Ops/s 169.5920 Ops/s $\color{#d91a1a}-0.09\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9030s 0.8902ms 1.1233 KOps/s 3.3533 KOps/s $\textbf{\color{#d91a1a}-66.50\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5496ms 0.2819ms 3.5473 KOps/s 3.5638 KOps/s $\color{#d91a1a}-0.46\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5669ms 1.3448ms 743.6202 Ops/s 751.3671 Ops/s $\color{#d91a1a}-1.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5734ms 1.2576ms 795.1506 Ops/s 790.7876 Ops/s $\color{#35bf28}+0.55\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1478ms 6.0007ms 166.6482 Ops/s 165.3813 Ops/s $\color{#35bf28}+0.77\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9710ms 0.4575ms 2.1856 KOps/s 2.1201 KOps/s $\color{#35bf28}+3.09\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6819ms 0.4426ms 2.2594 KOps/s 2.2430 KOps/s $\color{#35bf28}+0.73\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0043ms 5.8815ms 170.0246 Ops/s 170.3428 Ops/s $\color{#d91a1a}-0.19\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0202ms 0.3038ms 3.2920 KOps/s 3.2477 KOps/s $\color{#35bf28}+1.36\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4910ms 0.2841ms 3.5200 KOps/s 3.5270 KOps/s $\color{#d91a1a}-0.20\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1188ms 5.8622ms 170.5830 Ops/s 170.5647 Ops/s $\color{#35bf28}+0.01\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0447ms 0.3894ms 2.5680 KOps/s 2.7976 KOps/s $\textbf{\color{#d91a1a}-8.21\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6143ms 0.3758ms 2.6612 KOps/s 3.2725 KOps/s $\textbf{\color{#d91a1a}-18.68\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1698ms 6.0044ms 166.5443 Ops/s 166.1701 Ops/s $\color{#35bf28}+0.23\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3386ms 0.5283ms 1.8930 KOps/s 2.1113 KOps/s $\textbf{\color{#d91a1a}-10.34\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8696ms 0.4657ms 2.1472 KOps/s 2.1441 KOps/s $\color{#35bf28}+0.15\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.5629ms 5.0912ms 196.4182 Ops/s 49.1246 Ops/s $\textbf{\color{#35bf28}+299.84\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 0.6620s 15.2251ms 65.6810 Ops/s 477.3847 Ops/s $\textbf{\color{#d91a1a}-86.24\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.2104ms 1.1162ms 895.9298 Ops/s 805.3299 Ops/s $\textbf{\color{#35bf28}+11.25\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 9.4813ms 5.1204ms 195.2989 Ops/s 189.4439 Ops/s $\color{#35bf28}+3.09\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.8616ms 1.8044ms 554.2157 Ops/s 489.6614 Ops/s $\textbf{\color{#35bf28}+13.18\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 12.0153ms 1.3039ms 766.9523 Ops/s 856.5401 Ops/s $\textbf{\color{#d91a1a}-10.46\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 8.3842ms 5.2613ms 190.0660 Ops/s 180.2095 Ops/s $\textbf{\color{#35bf28}+5.47\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.7806ms 2.0834ms 479.9773 Ops/s 499.4787 Ops/s $\color{#d91a1a}-3.90\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.2900ms 1.2597ms 793.8434 Ops/s 906.4688 Ops/s $\textbf{\color{#d91a1a}-12.42\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 43.9278ms 39.5098ms 25.3102 Ops/s 24.9602 Ops/s $\color{#35bf28}+1.40\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.9240ms 18.5902ms 53.7918 Ops/s 53.6081 Ops/s $\color{#35bf28}+0.34\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 44.6159ms 40.7893ms 24.5162 Ops/s 24.1939 Ops/s $\color{#35bf28}+1.33\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.1919ms 18.7967ms 53.2008 Ops/s 54.2914 Ops/s $\color{#d91a1a}-2.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 44.2357ms 42.5537ms 23.4997 Ops/s 23.2993 Ops/s $\color{#35bf28}+0.86\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 22.0009ms 20.6110ms 48.5178 Ops/s 48.2872 Ops/s $\color{#35bf28}+0.48\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8593ms 0.2186ms 4.5738 KOps/s 4.4022 KOps/s $\color{#35bf28}+3.90\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.6612ms 1.4638ms 683.1552 Ops/s 650.9520 Ops/s $\color{#35bf28}+4.95\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.8068ms 2.4802ms 403.1873 Ops/s 398.8060 Ops/s $\color{#35bf28}+1.10\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.3401ms 3.0795ms 324.7280 Ops/s 314.8853 Ops/s $\color{#35bf28}+3.13\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2591ms 0.1370ms 7.3011 KOps/s 7.3227 KOps/s $\color{#d91a1a}-0.29\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3907ms 0.1985ms 5.0388 KOps/s 5.1640 KOps/s $\color{#d91a1a}-2.42\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.1252ms 1.8988ms 526.6394 Ops/s 518.8622 Ops/s $\color{#35bf28}+1.50\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5863ms 1.3638ms 733.2519 Ops/s 697.9019 Ops/s $\textbf{\color{#35bf28}+5.07\%}$
test_collector_stack_then_write[50-img_shape0-small] 1.3077ms 1.1429ms 874.9712 Ops/s 880.0958 Ops/s $\color{#d91a1a}-0.58\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.7084ms 3.7486ms 266.7673 Ops/s 269.7421 Ops/s $\color{#d91a1a}-1.10\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.6719ms 6.0427ms 165.4880 Ops/s 163.3328 Ops/s $\color{#35bf28}+1.32\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.7947ms 7.2841ms 137.2845 Ops/s 126.7655 Ops/s $\textbf{\color{#35bf28}+8.30\%}$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4420ms 0.2844ms 3.5156 KOps/s 3.6405 KOps/s $\color{#d91a1a}-3.43\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7920ms 1.6309ms 613.1524 Ops/s 596.1265 Ops/s $\color{#35bf28}+2.86\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.9922ms 2.5362ms 394.2852 Ops/s 369.9692 Ops/s $\textbf{\color{#35bf28}+6.57\%}$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.4507ms 3.2955ms 303.4463 Ops/s 291.6487 Ops/s $\color{#35bf28}+4.05\%$
test_collector_without_rb[100-img_shape0-atari] 34.2958ms 33.8969ms 29.5012 Ops/s 29.0533 Ops/s $\color{#35bf28}+1.54\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.9949ms 66.4293ms 15.0536 Ops/s 14.8902 Ops/s $\color{#35bf28}+1.10\%$
test_collector_with_rb[100-img_shape0-atari] 39.0489ms 38.4275ms 26.0231 Ops/s 25.8878 Ops/s $\color{#35bf28}+0.52\%$
test_collector_with_rb[200-img_shape1-large_batch] 75.8446ms 75.3176ms 13.2771 Ops/s 13.2172 Ops/s $\color{#35bf28}+0.45\%$
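
The last column of the rows above appears to be the relative change between the two throughput columns. The sketch below reproduces it under that assumption: the first `Ops/s` value is taken as this PR's throughput and the second as the baseline. The helper names `to_ops_per_second` and `percent_change` are hypothetical and not part of the benchmark tooling.

```python
def to_ops_per_second(cell: str) -> float:
    """Parse a throughput cell such as '1.1233 KOps/s' into operations per second."""
    value, unit = cell.split()
    # Only the two units that occur in the rows above are handled.
    scale = {"Ops/s": 1.0, "KOps/s": 1e3}[unit]
    return float(value) * scale


def percent_change(ops_pr: str, ops_baseline: str) -> float:
    """Relative throughput change of the PR versus the baseline, in percent."""
    return (to_ops_per_second(ops_pr) / to_ops_per_second(ops_baseline) - 1.0) * 100.0


# test_values[vec_generalized_advantage_estimate-True-True]
print(f"{percent_change('68.0448 Ops/s', '55.3306 Ops/s'):+.2f}%")  # +22.98%

# test_rb_sample[...-LazyMemmapStorage-SamplerWithoutReplacement-10000]
print(f"{percent_change('1.1233 KOps/s', '3.3533 KOps/s'):+.2f}%")  # -66.50%
```

Both values match the reported change for those rows. The largest swings (for example the `LazyMemmapStorage` rows) coincide with a maximum time far above the mean, so they may reflect a few slow iterations rather than a systematic change.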

Adds a unified Evaluator with pluggable thread/Ray backends that
decouples evaluation from the training loop. Supports blocking
evaluate(), fire-and-forget trigger_eval()/poll()/wait(), automatic
logging, custom metrics, and video recording.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vmoens vmoens force-pushed the async-evaluator branch from a883a1c to 1d69b6d Compare April 5, 2026 10:04
vmoens and others added 2 commits April 6, 2026 08:49
…ti-GPU tests

- Move video dump and all logger writes to the caller thread (poll/wait/
  evaluate), so the background eval thread never touches the logger
- Expose step provenance in returned metrics as "<prefix>/step"
- Add shutdown-with-inflight-eval test to catch potential hangs
- Add multi-GPU tests (cuda:0 train, cuda:1 eval) for 2+ GPU CI
- Document backpressure/overlap semantics, device placement, and
  compilation in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
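
The first two bullets of the commit message above (logger writes on the caller thread, step provenance under `"<prefix>/step"`) can be illustrated with a minimal, standard-library-only sketch. This is not the code from `torchrl/collectors/_evaluator.py`: the class name `CallerThreadLoggingEvaluator`, the `rollout_fn` and `log_fn` parameters, and the "refuse a new trigger while one is pending" policy are all assumptions made for illustration.

```python
import threading
from typing import Callable, Dict, Optional


class CallerThreadLoggingEvaluator:
    """Hypothetical sketch: the worker thread only computes metrics;
    logging happens in whichever thread calls poll() or wait()."""

    def __init__(self, rollout_fn: Callable[[], Dict[str, float]],
                 log_fn: Callable[[Dict[str, float]], None],
                 prefix: str = "eval") -> None:
        self._rollout_fn = rollout_fn
        self._log_fn = log_fn
        self._prefix = prefix
        self._thread: Optional[threading.Thread] = None
        self._result: Optional[Dict[str, float]] = None
        self._lock = threading.Lock()

    def trigger_eval(self, step: int) -> bool:
        """Start a background evaluation; return False if one is still pending."""
        if self._thread is not None:
            return False  # assumed policy: no overlapping evaluations

        def _work() -> None:
            metrics = dict(self._rollout_fn())
            metrics[f"{self._prefix}/step"] = step  # step provenance
            with self._lock:
                self._result = metrics  # stored only; no logging here

        self._thread = threading.Thread(target=_work, daemon=True)
        self._thread.start()
        return True

    def poll(self) -> Optional[Dict[str, float]]:
        """Return metrics if the evaluation finished, else None."""
        if self._thread is None or self._thread.is_alive():
            return None
        return self._collect()

    def wait(self) -> Optional[Dict[str, float]]:
        """Block until the pending evaluation finishes and return its metrics."""
        if self._thread is None:
            return None
        return self._collect()

    def _collect(self) -> Optional[Dict[str, float]]:
        self._thread.join()
        with self._lock:
            metrics, self._result = self._result, None
        self._thread = None
        if metrics is not None:
            self._log_fn(metrics)  # runs in the caller's thread
        return metrics


logged = []
evaluator = CallerThreadLoggingEvaluator(
    rollout_fn=lambda: {"eval/reward": 1.0},
    log_fn=lambda m: logged.append((threading.get_ident(), m)),
)
evaluator.trigger_eval(step=10)
result = evaluator.wait()
```

Keeping the logger out of the worker thread means a logger that is not thread-safe is only ever touched from the training loop, at the cost of metrics being written when the caller next polls rather than the moment evaluation finishes.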
@vmoens vmoens merged commit f54a7c7 into main Apr 6, 2026
125 of 127 checks passed
@vmoens vmoens deleted the async-evaluator branch April 20, 2026 20:36
Labels

CLA Signed, Collectors, Documentation, Feature