yuv: Drop the LUTs, do integer-only (fixed-point), SIMD-capable arithmetic instead #15

Merged (1 commit) on Dec 29, 2021

@torokati44 (Member) commented Dec 17, 2021

This, somewhat surprisingly, already speeds up the conversion function by about 10% on the web target on its own.
But more importantly, it's SIMD-capable, and the required target feature (SSE2) is enabled by default on desktop targets, AFAIK.
All the tests still pass (in fact I had to add a couple more), and the sample videos I tested still look completely fine to me.
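For readers unfamiliar with the approach, here is a minimal scalar sketch of fixed-point YUV-to-RGB conversion with `i32` intermediates, replacing lookup tables with a few integer multiplies, adds, and shifts. It assumes BT.601 limited-range coefficients scaled by 2^16 with round-to-nearest; the coefficients, the function names, and the scalar (non-SIMD) form are illustrative assumptions, not the code from this PR, which works on `i32x4` lanes.

```rust
/// Fixed-point scale: coefficients are multiplied by 2^16 (an assumption for this sketch).
const SHIFT: i32 = 16;
/// Added before the shift so `>>` rounds to nearest instead of flooring.
const ROUND: i32 = 1 << (SHIFT - 1);

/// Clamps an intermediate i32 value to the 0..=255 range of an 8-bit channel.
fn clamp_u8(x: i32) -> u8 {
    x.clamp(0, 255) as u8
}

/// Hypothetical scalar YUV -> RGB conversion using integer-only arithmetic,
/// with BT.601 limited-range coefficients (assumed) scaled by 2^16.
fn yuv_to_rgb(y: u8, u: u8, v: u8) -> [u8; 3] {
    let c = (y as i32 - 16) * 76309; // 1.164383 * 2^16
    let d = u as i32 - 128;
    let e = v as i32 - 128;
    let r = (c + 104597 * e + ROUND) >> SHIFT; // 1.596027 * 2^16
    let g = (c - 25675 * d - 53279 * e + ROUND) >> SHIFT; // 0.391762, 0.812968 * 2^16
    let b = (c + 132201 * d + ROUND) >> SHIFT; // 2.017232 * 2^16
    [clamp_u8(r), clamp_u8(g), clamp_u8(b)]
}
```

With these constants, `yuv_to_rgb(16, 128, 128)` gives `[0, 0, 0]` and `yuv_to_rgb(235, 128, 128)` gives `[255, 255, 255]`. The worst-case intermediate stays around 3.5 × 10^7, well within `i32`, which is one reason 32-bit lanes are comfortable here.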

Once ruffle-rs/ruffle#5834 is merged, it should speed the conversion up (on web, in capable browsers) by roughly another 2x. Why only 2x, and not 4x, you ask? Well, I don't know. Maybe we should ask our friend, Amdahl. I'm not complaining though; it's still a nice uplift.
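Amdahl's law gives one way to read that number: if only a fraction `p` of the function's run time benefits from the 4-lane speedup, the overall speedup is `1 / ((1 - p) + p / 4)`. The sketch below, with hypothetical helper names, computes and inverts that formula. Under the simplifying assumption that the vectorized part speeds up by exactly 4x, an observed 2x implies that about two thirds of the time was spent in the vectorized part, with the rest going to work that doesn't scale with lane count.

```rust
/// Overall speedup predicted by Amdahl's law when a fraction `p` (0..=1) of the
/// run time is sped up by a factor `s` and the rest is unchanged.
fn amdahl_speedup(p: f64, s: f64) -> f64 {
    1.0 / ((1.0 - p) + p / s)
}

/// Inverse of `amdahl_speedup`: the fraction `p` implied by an observed overall
/// speedup, given the speedup `s` of the accelerated part.
fn implied_fraction(observed: f64, s: f64) -> f64 {
    // observed = 1 / (1 - p * (1 - 1/s))  =>  p = (1 - 1/observed) / (1 - 1/s)
    (1.0 - 1.0 / observed) / (1.0 - 1.0 / s)
}
```

For example, `implied_fraction(2.0, 4.0)` is 2/3, and `amdahl_speedup(2.0 / 3.0, 4.0)` gets back to 2.0.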

I've also experimented with 16-bit intermediate precision, but it's just barely too inaccurate for my taste, and it isn't any faster. Using i32x4 also allows the neat transpose trick at the end.
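The transpose trick isn't spelled out here. One plausible reading, assumed for this sketch, is that four i32x4 vectors each hold one channel for four consecutive pixels (all R values, all G values, and so on), and a 4x4 transpose turns them into one vector per pixel, i.e. interleaved RGBA. The sketch models each vector as a plain `[i32; 4]` array, and the function names are hypothetical; a real SIMD implementation would use lane shuffles instead of a loop.

```rust
/// Transposes a 4x4 matrix: row i, column j becomes row j, column i.
fn transpose4(m: [[i32; 4]; 4]) -> [[i32; 4]; 4] {
    let mut t = [[0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            t[j][i] = m[i][j];
        }
    }
    t
}

/// Turns four per-channel "vectors" (R, G, B and A values of 4 pixels) into
/// 16 bytes of interleaved RGBA, clamping each value to 0..=255.
fn planar_to_rgba(r: [i32; 4], g: [i32; 4], b: [i32; 4], a: [i32; 4]) -> [u8; 16] {
    let pixels = transpose4([r, g, b, a]); // pixels[p] = [r_p, g_p, b_p, a_p]
    let mut out = [0u8; 16];
    for (p, px) in pixels.iter().enumerate() {
        for (c, &value) in px.iter().enumerate() {
            out[p * 4 + c] = value.clamp(0, 255) as u8;
        }
    }
    out
}
```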

Processing a 2x2 group of pixels together isn't faster either: even though the chroma samples can then simply be splatted across all lanes, the additional shuffling in memory and the more complicated iteration most likely negate that.
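For context on the splatting remark: in 4:2:0 subsampled video (assumed here for the h.263 output), one chroma sample covers a 2x2 block of luma samples, so all four pixels of such a block share the same U and V values. Below is a small sketch of that index mapping, with hypothetical names, assuming the chroma plane width is half the luma width, rounded up.

```rust
/// Width of a 4:2:0 chroma plane for a given luma width (rounded up for odd widths).
fn chroma_width(luma_width: usize) -> usize {
    (luma_width + 1) / 2
}

/// Index into the chroma plane of the sample shared by luma pixel (x, y).
fn chroma_index(x: usize, y: usize, luma_width: usize) -> usize {
    (y / 2) * chroma_width(luma_width) + x / 2
}
```

In a 2x2 group, the single `chroma_index` result could be broadcast to every lane, but the four luma samples come from two different rows, which is one source of the extra shuffling mentioned above.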

Additionally, if we were really serious about performance, we could also use bytemuck::pod_align_to, but since the h.263 decoder crops the luma plane to an odd width in some cases, lining up the chroma samples with it would be kinda awkward. And the WASM VM might not even JIT these loads into the faster, aligned versions of the native instructions anyway...
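`bytemuck::pod_align_to` splits a slice into an unaligned head, an aligned middle reinterpreted as another `Pod` type, and a tail. A simpler, std-only alternative is sketched below with a hypothetical `map_row` helper. It walks each row in fixed groups of 4 samples (where a 4-lane SIMD kernel would go) and handles the leftover samples of a row whose width isn't a multiple of 4 (including odd widths) with scalar code. It makes no alignment guarantees, so loads in the vectorized part would be unaligned.

```rust
/// Applies `f` to every sample of a row: in groups of 4 where possible
/// (the spot a 4-lane SIMD kernel would take), then one by one for the tail.
fn map_row(row: &[u8], f: impl Fn(u8) -> u8) -> Vec<u8> {
    let mut out = Vec::with_capacity(row.len());
    let mut groups = row.chunks_exact(4);
    for group in &mut groups {
        // A SIMD version would load these 4 samples into one vector register.
        let lanes = [group[0], group[1], group[2], group[3]];
        out.extend(lanes.iter().map(|&s| f(s)));
    }
    // Scalar tail: 0 to 3 samples, e.g. when the row width is odd.
    out.extend(groups.remainder().iter().map(|&s| f(s)));
    out
}
```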

@torokati44 (Member, Author)

Let me once again link to some high(er(ish)) resolution video content to benchmark on:
https://z0r.de/4449
https://z0r.de/3664
https://z0r.de/3946
https://z0r.de/7311
https://z0r.de/5092
https://z0r.de/7214

@adrian17 merged commit b810e8c into ruffle-rs:master on Dec 29, 2021
@adrian17 (Contributor)

FYI, the crate references in the main repo will need to be updated after this.
