Add time-based execution, megapixel scoring, and fast-jobs-first ordering to VideoTranscodeBench (#565)
Open
marziehlenjaniMeta wants to merge 1 commit into facebookresearch:v2-beta
Conversation
@marziehlenjaniMeta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98639355.
meta-codesync bot pushed a commit that referenced this pull request on Apr 7, 2026 (fbshipit-source-id: cad50ea6d9e123366a61f4990f27bdb9ebb57859).
Summary:
Motivation
The existing video_transcode_bench_svt_mini job uses --sample-rate 0.01
to reduce runtime, but this approach has fundamental problems on
high-core-count machines:
1. Core underutilization: With only ~1% of clips sampled, the total
number of encode jobs (clips x resolutions x CRF values) can be fewer
than available cores. On a 72-core machine, many cores sit idle for the
entire run — the benchmark measures a fraction of the machine's
capacity.
2. Score instability: The throughput-based score (GB/s) varies
significantly with sample-rate because different subsets of clips have
different total sizes and encoding characteristics. A 1% sample gives a
different score than a 10% sample, making cross-run comparisons
unreliable.
3. Unrepresentative workload: Sampling removes clips rather than
shortening the run, so the remaining workload may not reflect the full
distribution of resolutions and content types.
A time-based approach solves all three problems: use the full dataset
(sample-rate=1.0) so all cores stay saturated, and cap runtime with a
time limit instead. This ensures every machine — regardless of core
count — runs at full utilization for a consistent, predictable duration.
Additional challenges required further changes:
4. Score depends on which jobs complete: With a time limit, only a
subset of jobs finish. The GB/s throughput metric is biased by
resolution (1080p jobs contribute more bytes than 144p for similar
compute). A new megapixel-based metric (MPx/s) normalizes across
resolutions, making the score stable regardless of which jobs complete.
5. Slow drain phase: Jobs are ordered large-to-small by default. With a
short time limit, all slots fill with slow 1080p jobs, few complete
before the deadline, and in-flight jobs take 30-60s to drain. A new
--fast-jobs-first option reverses the order, maximizing completions and
reducing drain to 1-3s.
6. Inflated total_data_encoded: The previous calculation counted all
input clips regardless of whether they were actually encoded. Now
derived from joblog data — only successfully completed, deduplicated
input files are counted.
Summary
Adds a time-bounded execution mode to VideoTranscodeBench that caps
benchmark runtime via SIGTERM-based parallel job control, along with a
resolution-normalized megapixel scoring metric and an option to
prioritize fast jobs.
Key changes:
- timed_parallel_feeder.sh (new): Wraps GNU parallel with a time limit.
When max_time > 0, runs parallel in the background and sends SIGTERM
after the deadline, allowing in-flight jobs to finish gracefully while
preventing new jobs from spawning. Produces a joblog for post-hoc
analysis. When max_time = 0, behaves identically to the original flow.
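The timed-feeder behavior can be sketched as a generic wrapper. This is a minimal illustration, not the script itself: `timed_run` is a hypothetical name, and the wrapped command stands in for the real GNU parallel invocation with its command file and `--joblog`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the timed-feeder logic: run a job feeder in the
# background, SIGTERM it after a deadline, let in-flight work drain.
timed_run() {
    local max_time="$1"; shift
    if [ "$max_time" -eq 0 ]; then
        "$@"                        # max_time=0: original flow, no limit
        return
    fi
    "$@" &                          # run the feeder (e.g. GNU parallel) in the background
    local pid=$!
    # Watchdog: after the deadline, SIGTERM stops new jobs from spawning
    # while already-running jobs are allowed to finish.
    ( sleep "$max_time" && kill -TERM "$pid" 2>/dev/null ) &
    local watchdog=$!
    local rc=0
    wait "$pid" || rc=$?            # rc is 143 (128+SIGTERM) if the limit was hit
    kill "$watchdog" 2>/dev/null || true
    return "$rc"
}
```

With GNU parallel specifically, SIGTERM is what tells it to stop starting new jobs while waiting for running ones, which is the graceful-drain behavior described above.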
- Megapixel score mode (--score-mode megapixel): Computes throughput as
MPx/s by extracting resolution and frame count from input filenames in
the joblog. This normalizes across resolutions, making scores stable
regardless of which job subset completes in timed runs. Baseline derived
mathematically from existing throughput baseline: 86.12 MPx/s.
- Effective time from joblogs (megapixel mode only): When
score_mode=megapixel, uses sum(JobRuntime for successful jobs) /
num_parallel_slots instead of wall-clock time. The default throughput
mode continues to use wall-clock time, preserving backward compatibility
with existing baselines.
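The two scoring changes above reduce to one pass over the joblog. A sketch, assuming the standard GNU parallel joblog layout (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command) and a hypothetical clip-naming convention `<name>_<W>x<H>_<F>f.y4m`; `megapixel_score` is an illustrative name, and the real filename parsing may differ:

```shell
# MPx/s score from a joblog: megapixels of successfully encoded input
# divided by effective time (sum of JobRuntime over successful jobs,
# divided by the number of parallel slots).
megapixel_score() {
    local joblog="$1" slots="$2"
    awk -F'\t' '
        NR > 1 && $7 == 0 {                   # Exitval==0: successful jobs only
            runtime += $4                     # accumulate JobRuntime
            if (match($9, /_[0-9]+x[0-9]+_[0-9]+f/)) {
                spec = substr($9, RSTART + 1, RLENGTH - 1)
                split(spec, a, "[x_f]")       # a[1]=width, a[2]=height, a[3]=frames
                mpx += a[1] * a[2] * a[3] / 1e6
            }
        }
        END {
            eff = runtime / slots             # effective time, not wall clock
            if (eff > 0) printf "%.2f\n", mpx / eff
        }' slots="$slots" "$joblog"
}
```

For example, a successful 640x360, 300-frame job contributes 640*360*300/1e6 = 69.12 MPx regardless of how many bytes its encode produced, which is what removes the resolution bias of the GB/s metric.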
- total_data_encoded fix: Now derives encoded data size from unique
input files of successful jobs (via joblog), instead of counting all
clips regardless of completion. Deduplicates across CRF values with
sort -u.
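The corrected accounting can be sketched as a short pipeline. The `*.y4m` path extraction is a hypothetical stand-in for however the real Command column names its input clip; the dedup-then-sum shape is what the fix describes:

```shell
# total_data_encoded sketch: input files of successful jobs, deduplicated
# across CRF values with sort -u, then byte sizes summed with du.
total_data_encoded() {
    awk -F'\t' 'NR > 1 && $7 == 0 {           # successful jobs only
        if (match($9, /[^ ]+\.y4m/))          # pull the input path (assumed *.y4m)
            print substr($9, RSTART, RLENGTH)
    }' "$1" |
    sort -u |                                 # one entry per clip
    xargs -r -d '\n' du -cb 2>/dev/null |
    awk 'END { print $1 }'                    # last du line is the grand total
}
```

Because each clip is encoded once per CRF value, the `sort -u` step is what keeps a clip from being counted several times.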
- --fast-jobs-first: Reverses command file ordering (via tac) so
small/fast jobs run first. For short timed runs this maximizes job
completions and minimizes the drain phase (1-3s vs 30-60s).
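The reversal itself is small enough to sketch directly; `order_commands` is an illustrative name, and the assumption (stated above) is that the command file is generated large-to-small:

```shell
# Sketch of --fast-jobs-first: reverse the command file with tac so the
# small/fast jobs (assumed to be at the end) are dispatched first.
order_commands() {   # usage: order_commands <cmd_file> <fast_jobs_first: 0|1>
    if [ "$2" -eq 1 ]; then
        tac "$1"     # small/fast jobs first: more completions before the deadline
    else
        cat "$1"     # original large-to-small ordering
    fi
}
```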
- Two new job definitions in jobs.yml:
  - video_transcode_bench_svt_timed: 600s limit, megapixel scoring, fast-jobs-first
  - video_transcode_bench_svt_timed_mini: 15s limit, megapixel scoring, fast-jobs-first
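As a rough illustration, the two entries might look like the following in jobs.yml. The field names here are guesses for illustration only; just the values (600s/15s limits, megapixel scoring, fast-jobs-first) come from this summary.

```yaml
# Hypothetical sketch; actual jobs.yml schema may differ.
video_transcode_bench_svt_timed:
  args:
    max_time: 600          # seconds; 0 would mean no limit
    score_mode: megapixel  # MPx/s instead of GB/s
    fast_jobs_first: true
video_transcode_bench_svt_timed_mini:
  args:
    max_time: 15
    score_mode: megapixel
    fast_jobs_first: true
```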
- Backward compatible: Existing jobs are unchanged — max_time=0 means no
limit, throughput GB/s scoring with wall-clock time, original job
ordering.
Reviewed By: YifanYuan3
Differential Revision: D98639355