yuv: Drop the LUTs, do integer-only (fixed-point), SIMD-capable arithmetic instead #15
Somewhat surprisingly, this alone already speeds up the conversion function by about 10% on the web target.
But the more important thing is that it's SIMD-capable. And that target feature (SSE2) is enabled by default on desktop targets, AFAIK.
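For reference, here is a minimal scalar sketch of the LUT-free, integer-only idea. The coefficients below are BT.601 full-range values scaled by 2^8 and are illustrative assumptions; the actual PR may use different constants and intermediate precision:

```rust
/// Convert one YUV sample to RGB using only integer (fixed-point) math.
/// Coefficients are round(c * 256) for the BT.601 full-range matrix.
fn yuv_to_rgb(y: u8, u: u8, v: u8) -> (u8, u8, u8) {
    let y = y as i32;
    let u = u as i32 - 128; // center chroma around zero
    let v = v as i32 - 128;

    // Fixed-point multiply-accumulate, then shift back down with rounding.
    let r = (y * 256 + 359 * v + 128) >> 8;
    let g = (y * 256 - 88 * u - 183 * v + 128) >> 8;
    let b = (y * 256 + 454 * u + 128) >> 8;

    (clamp_u8(r), clamp_u8(g), clamp_u8(b))
}

/// Saturate an intermediate result into the 0..=255 range.
fn clamp_u8(x: i32) -> u8 {
    x.clamp(0, 255) as u8
}
```

Because every operation is a plain integer multiply, add, and shift applied identically to each pixel, the same code vectorizes naturally: the `i32` scalars become `i32x4` lanes and four pixels are converted at once.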
All the tests still pass (in fact I had to add a couple more), and the sample videos I tested still look completely fine to me.
Once ruffle-rs/ruffle#5834 is merged, it should speed it up (on web, in capable browsers) by an additional factor of 2x. Why only 2x, and not 4x, you ask? Well, I don't know. Maybe we should ask our friend, Amdahl. I'm not complaining though, it's still a nice uplift.
I've also experimented with 16-bit intermediate precision, but it's just barely not accurate enough for my taste, and it isn't any faster either. And using `i32x4` also allows the neat transpose trick at the end. Nor is processing a 2x2 group of pixels together any faster: even though the chroma samples can simply be splatted across all lanes, the additional shuffling in memory and the more complicated iteration most probably negate that.
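To illustrate what I mean by the transpose trick (modeled here with plain `[i32; 4]` arrays standing in for `i32x4` vectors, so this is a sketch, not the actual kernel): after the conversion, the four lanes of each vector hold the R, G, B and A values of four *different* pixels, and a 4x4 transpose regroups them into one RGBA quadruple per pixel, ready to be stored interleaved.

```rust
/// Transpose a 4x4 matrix of lanes: rows of per-channel values
/// (R, G, B, A across four pixels) become rows of per-pixel values
/// (R, G, B, A of one pixel each).
fn transpose4(m: [[i32; 4]; 4]) -> [[i32; 4]; 4] {
    let mut t = [[0i32; 4]; 4];
    for (i, row) in m.iter().enumerate() {
        for (j, &val) in row.iter().enumerate() {
            t[j][i] = val;
        }
    }
    t
}
```

With real `i32x4` vectors this is a handful of shuffle instructions rather than loops, but the data movement is the same.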
Additionally, if we were really serious about performance, we could also use `bytemuck::pod_align_to`, but since the H.263 decoder chops off the luma samples to odd widths in some cases, lining up the chroma samples with it would be kinda awkward. And the WASM VM might not even JIT these loads into the faster aligned versions of the native instructions anyway...
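For anyone unfamiliar with the aligned-middle idea: it splits a byte slice into an unaligned prefix, an aligned run of wider values, and an unaligned suffix, so the bulk of the work can use aligned loads. The sketch below uses std's `slice::align_to` (unsafe, but sound when reinterpreting bytes as plain integers); `bytemuck::pod_align_to` is the safe equivalent for `Pod` types. The summing here is just a stand-in workload, not anything from the PR:

```rust
/// Sum all bytes of a slice, reading the aligned middle as u32 words.
fn sum_bytes_aligned(bytes: &[u8]) -> u32 {
    // Safety: any bit pattern is a valid u32, so reinterpreting
    // aligned bytes as u32 words is sound.
    let (prefix, middle, suffix) = unsafe { bytes.align_to::<u32>() };
    let mut sum: u32 = prefix.iter().map(|&b| b as u32).sum();
    for &word in middle {
        // Process four bytes per aligned load.
        sum += word.to_ne_bytes().iter().map(|&b| b as u32).sum::<u32>();
    }
    sum + suffix.iter().map(|&b| b as u32).sum::<u32>()
}
```

The awkwardness mentioned above is that the prefix/middle/suffix split points of the luma and chroma planes won't generally coincide when the plane widths are odd, so the two iterations can't share one loop structure cleanly.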