-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminating bounds checks in the YUV->RGBA conversion #6
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a number of pathological cases that could trigger unsafety - they all have to do with mismatches between the chroma and luma input sizes. We don't have type-level assurances that the three arrays are compatibly sized, so we'll need some additional bounds checks.
I also would like to see some of the internal functions documented for their safety requirements, I've flagged those as well.
yuv/src/bt601.rs
Outdated
@@ -131,62 +180,60 @@ pub fn yuv420_to_rgba( | |||
let y_height = y.len() / y_width; | |||
let br_height = chroma_b.len() / br_width; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to hand this function a chroma_r
that is too short for it's given geometry. Since this is the entry point for the whole crate, it's unsound as-is. (For the record, this function's use of y
and chroma_b
appear to be sound; since we derive different bounds for it based on it's length.)
We should either have separate b_height
and r_height
calculations and variables, or return an error if chroma_r
is too short. I'd prefer the former over the latter but I'm not sure what the performance impact is in this case.
@@ -40,31 +38,31 @@ fn sample_chroma_for_luma( | |||
(luma_y as i32 - 1) / 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sample_chroma_for_luma
needs to be marked as unsafe
, as it has several invariants that must be fulfilled in order for code that uses it to remain sound:
- The
chroma_width
andchroma_height
parameters must be derived from the length ofchroma
in such a way that their product cannot exceed that length, otherwise clamping will fail to prohibit out-of-bounds reads - If
clamp
is off, thenluma_x
andluma_y
must already be bounds-checked to twice thechroma_width
andchroma_height
(respectively) in order to prohibit out-of-bounds reads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These invariants should also be documented in the function's doccomment
rgba[base + 2] = clamp(b); | ||
rgba[base + 3] = 255; | ||
#[inline] | ||
fn convert_and_write_pixel(yuv: (u8, u8, u8), rgba: &mut Vec<u8>, base: usize, luts: &LUTs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convert_and_write_pixel
should also be marked as unsafe
, as it has an invariant must be fulfilled for callers to remain sound:
base
must be less than the length ofrgba
minus four.
This invariant should also be doccommented.
Furthermore, rgba
itself is not being reallocated, so it should be passed as &mut [u8]
instead of &mut Vec<u8>
.
yuv/src/bt601.rs
Outdated
let b_sample = | ||
sample_chroma_for_luma(chroma_b, br_width, br_height, x_pos, y_pos, false) as f32; | ||
sample_chroma_for_luma(chroma_b, br_width, br_height, x_pos, y_pos, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pathologically small chroma_b
or chroma_r
inputs to this function are unsound, as the unclamped sampling will read outside their bounds. x_pos
and y_pos
are derived from the width and height of luma
, but there is nothing to ensure that chroma_b
or chroma_r
's geometry is compatible with luma
's. As a result, this code can read out of bounds.
yuv/src/bt601.rs
Outdated
} | ||
base += 8; // skipping the rightmost pixel, and the leftmost pixel in the next row | ||
} | ||
|
||
// doing the sides with clamping | ||
for y_pos in 0..y_height { | ||
for x_pos in [0, y_width - 1].iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pathologically small inputs (specifically, 1px wide videos) will cause a panic here. This should instead be a saturating_sub
, so that in this case, the conversion will merely harmlessly run twice.
As far as I can tell, the use of the unsafe functions below is still sound, since clamp
ing is on. We may still want to test this somehow.
This may impact performance; depending on how LLVM's optimizer is feeling about loop-invariant code motion today.
for x_pos in 0..y_width { | ||
for y_pos in [0, y_height - 1].iter() { | ||
let y_sample = y.get(x_pos + y_pos * y_width).copied().unwrap_or(0) as f32; | ||
for y_pos in [0, y_height - 1].iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also panic on pathologically small video heights.
Before addressing the comments, let me just add a note that the H.263 ITU Recommendation has this paragraph in it:
Adhering to the "decoded in the same manner as if the width or height had the next larger size that would be divisible by 16" part could simplify some loops in the IDCT and gather phases - but is not relevant for this PR. The last part of that sentence is relevant however: "and then the picture is cropped at the right and the |
My guess would be that the cropping is supposed to happen after the YUV 420 conversion, because that removes ambiguity about how to handle odd-sized luma pictures. That being said, while the H.263 specification prohibits the pathological inputs I mentioned, the PR doesn't. It's entirely possible this code winds up getting reused for other video formats that are less restrictive with custom picture sizes. We don't need to correctly decode all invalid picture sizes, but we do need to reject them before any |
Superseded by #9. |
These changes together yield an overall ∼21% reduction of the time(outdated)run_frame()
takes, on average.Measured on the self-hosted web build, running the first 100 frames of z0r loops 3664, 4449, and 7081; but discarding the first 20 frame times of each loop before averaging - to let the WASM JIT warm up, and the numbers stabilize a little.
The results:
The first and last rows are about the current state, the middle rows are some "milestone" commits of this PR. The X axis is milliseconds. Every measurement was done four times.
The benchmarks were done in Chromium 94, on a Ryzen 2700X (using
playwright
and thetemci
tool).The first two commits I'm fairly confident in being "good", the middle one is just "okay", and the last two "should be fine, I think".(outdated)More details about the changes in the added comments.