Add time-based execution, megapixel scoring, and fast-jobs-first ordering to VideoTranscodeBench (#565)
Open
marziehlenjaniMeta wants to merge 1 commit into facebookresearch:v2-beta
Conversation
@marziehlenjaniMeta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98639355.
meta-codesync bot pushed a commit that referenced this pull request on Apr 7, 2026 (fbshipit-source-id: cad50ea6d9e123366a61f4990f27bdb9ebb57859).
Summary:
Motivation
The existing video_transcode_bench_svt_mini job uses --sample-rate 0.01
to reduce runtime, but this approach has fundamental problems on
high-core-count machines:
1. Core underutilization: With only ~1% of clips sampled, the total
number of encode jobs (clips x resolutions x CRF values) can be fewer
than available cores. On a 72-core machine, many cores sit idle for the
entire run — the benchmark measures a fraction of the machine's
capacity.
2. Score instability: The throughput-based score (GB/s) varies
significantly with sample-rate because different subsets of clips have
different total sizes and encoding characteristics. A 1% sample gives a
different score than a 10% sample, making cross-run comparisons
unreliable.
3. Unrepresentative workload: Sampling removes clips rather than
shortening the run, so the remaining workload may not reflect the full
distribution of resolutions and content types.
A time-based approach solves all three problems: use the full dataset
(sample-rate=1.0) so all cores stay saturated, and cap runtime with a
time limit instead. This ensures every machine — regardless of core
count — runs at full utilization for a consistent, predictable duration.
Additional challenges required further changes:
4. Score depends on which jobs complete: With a time limit, only a
subset of jobs finish. The GB/s throughput metric is biased by
resolution (1080p jobs contribute more bytes than 144p for similar
compute). A new megapixel-based metric (MPx/s) normalizes across
resolutions, making the score stable regardless of which jobs complete.
5. Slow drain phase: Jobs are ordered large-to-small by default. With a
short time limit, all slots fill with slow 1080p jobs, few complete
before the deadline, and in-flight jobs take 30-60s to drain. A new
--fast-jobs-first option reverses the order, maximizing completions and
reducing drain to 1-3s.
6. Inflated total_data_encoded: The previous calculation counted all
input clips regardless of whether they were actually encoded. Now
derived from joblog data — only successfully completed, deduplicated
input files are counted.
Summary
Adds a time-bounded execution mode to VideoTranscodeBench that caps
benchmark runtime via SIGTERM-based parallel job control, along with a
resolution-normalized megapixel scoring metric and an option to
prioritize fast jobs.
Key changes:
- timed_parallel_feeder.sh (new): Wraps GNU parallel with a time limit.
When max_time > 0, runs parallel in the background and sends SIGTERM
after the deadline, allowing in-flight jobs to finish gracefully while
preventing new jobs from spawning. Produces a joblog for post-hoc
analysis. When max_time = 0, behaves identically to the original flow.
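The timed-feeder behavior can be sketched as a generic wrapper. This is a minimal illustration, not the script itself: `timed_run` is a hypothetical name, and the wrapped command stands in for the real GNU parallel invocation with its command file and `--joblog`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the timed-feeder logic: run a job feeder in the
# background, SIGTERM it after a deadline, let in-flight work drain.
timed_run() {
    local max_time="$1"; shift
    if [ "$max_time" -eq 0 ]; then
        "$@"                        # max_time=0: original flow, no limit
        return
    fi
    "$@" &                          # run the feeder (e.g. GNU parallel) in the background
    local pid=$!
    # Watchdog: after the deadline, SIGTERM stops new jobs from spawning
    # while already-running jobs are allowed to finish.
    ( sleep "$max_time" && kill -TERM "$pid" 2>/dev/null ) &
    local watchdog=$!
    local rc=0
    wait "$pid" || rc=$?            # rc is 143 (128+SIGTERM) if the limit was hit
    kill "$watchdog" 2>/dev/null || true
    return "$rc"
}
```

With GNU parallel specifically, SIGTERM is what tells it to stop starting new jobs while waiting for running ones, which is the graceful-drain behavior described above.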
- Megapixel score mode (--score-mode megapixel): Computes throughput as
MPx/s by extracting resolution and frame count from input filenames in
the joblog. This normalizes across resolutions, making scores stable
regardless of which job subset completes in timed runs. Baseline derived
mathematically from existing throughput baseline: 86.12 MPx/s.
- Effective time from joblogs (megapixel mode only): When
score_mode=megapixel, uses sum(JobRuntime for successful jobs) /
num_parallel_slots instead of wall-clock time. The default throughput
mode continues to use wall-clock time, preserving backward compatibility
with existing baselines.
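The two scoring changes above reduce to one pass over the joblog. A sketch, assuming the standard GNU parallel joblog layout (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command) and a hypothetical clip-naming convention `<name>_<W>x<H>_<F>f.y4m`; `megapixel_score` is an illustrative name, and the real filename parsing may differ:

```shell
# MPx/s score from a joblog: megapixels of successfully encoded input
# divided by effective time (sum of JobRuntime over successful jobs,
# divided by the number of parallel slots).
megapixel_score() {
    local joblog="$1" slots="$2"
    awk -F'\t' '
        NR > 1 && $7 == 0 {                   # Exitval==0: successful jobs only
            runtime += $4                     # accumulate JobRuntime
            if (match($9, /_[0-9]+x[0-9]+_[0-9]+f/)) {
                spec = substr($9, RSTART + 1, RLENGTH - 1)
                split(spec, a, "[x_f]")       # a[1]=width, a[2]=height, a[3]=frames
                mpx += a[1] * a[2] * a[3] / 1e6
            }
        }
        END {
            eff = runtime / slots             # effective time, not wall clock
            if (eff > 0) printf "%.2f\n", mpx / eff
        }' slots="$slots" "$joblog"
}
```

For example, a successful 640x360, 300-frame job contributes 640*360*300/1e6 = 69.12 MPx regardless of how many bytes its encode produced, which is what removes the resolution bias of the GB/s metric.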
- total_data_encoded fix: Now derives encoded data size from unique
input files of successful jobs (via joblog), instead of counting all
clips regardless of completion. Deduplicates across CRF values with
sort -u.
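The corrected accounting can be sketched as a short pipeline. The `*.y4m` path extraction is a hypothetical stand-in for however the real Command column names its input clip; the dedup-then-sum shape is what the fix describes:

```shell
# total_data_encoded sketch: input files of successful jobs, deduplicated
# across CRF values with sort -u, then byte sizes summed with du.
total_data_encoded() {
    awk -F'\t' 'NR > 1 && $7 == 0 {           # successful jobs only
        if (match($9, /[^ ]+\.y4m/))          # pull the input path (assumed *.y4m)
            print substr($9, RSTART, RLENGTH)
    }' "$1" |
    sort -u |                                 # one entry per clip
    xargs -r -d '\n' du -cb 2>/dev/null |
    awk 'END { print $1 }'                    # last du line is the grand total
}
```

Because each clip is encoded once per CRF value, the `sort -u` step is what keeps a clip from being counted several times.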
- --fast-jobs-first: Reverses command file ordering (via tac) so
small/fast jobs run first. For short timed runs this maximizes job
completions and minimizes the drain phase (1-3s vs 30-60s).
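The reversal itself is small enough to sketch directly; `order_commands` is an illustrative name, and the assumption (stated above) is that the command file is generated large-to-small:

```shell
# Sketch of --fast-jobs-first: reverse the command file with tac so the
# small/fast jobs (assumed to be at the end) are dispatched first.
order_commands() {   # usage: order_commands <cmd_file> <fast_jobs_first: 0|1>
    if [ "$2" -eq 1 ]; then
        tac "$1"     # small/fast jobs first: more completions before the deadline
    else
        cat "$1"     # original large-to-small ordering
    fi
}
```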
- Two new job definitions in jobs.yml:
  - video_transcode_bench_svt_timed: 600s limit, megapixel scoring, fast-jobs-first
  - video_transcode_bench_svt_timed_mini: 15s limit, megapixel scoring, fast-jobs-first
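As a rough illustration, the two entries might look like the following in jobs.yml. The field names here are guesses for illustration only; just the values (600s/15s limits, megapixel scoring, fast-jobs-first) come from this summary.

```yaml
# Hypothetical sketch; actual jobs.yml schema may differ.
video_transcode_bench_svt_timed:
  args:
    max_time: 600          # seconds; 0 would mean no limit
    score_mode: megapixel  # MPx/s instead of GB/s
    fast_jobs_first: true
video_transcode_bench_svt_timed_mini:
  args:
    max_time: 15
    score_mode: megapixel
    fast_jobs_first: true
```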
- Backward compatible: Existing jobs are unchanged — max_time=0 means no
limit, throughput GB/s scoring with wall-clock time, original job
ordering.
Reviewed By: YifanYuan3
Differential Revision: D98639355