
Use multiplication by reciprocal to speed up divide #84

Draft: wants to merge 80 commits into main

mcroomp (Collaborator) commented May 29, 2024

The current approach doesn't work because of rounding issues.
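
To illustrate the kind of rounding problem a reciprocal multiply can run into, here is a minimal Rust sketch. It assumes a truncated 16-bit fixed-point reciprocal, which may not be exactly what this PR does; `naive_reciprocal_div` and `first_mismatch` are hypothetical helpers. Because `m = floor(2^16 / d)` is rounded down, `n * m` can land just below a multiple of `2^16`, so the quotient comes out one too small (for `d = 3`, `n = 3` it yields 0 instead of 1).

```rust
/// Hypothetical naive fixed-point division: m = floor(2^16 / d), q = (n * m) >> 16.
/// Rounding m down makes n * m fall just short of the next multiple of 2^16 for some n.
fn naive_reciprocal_div(n: u32, d: u32) -> u32 {
    let m = (1u64 << 16) / d as u64;
    ((n as u64 * m) >> 16) as u32
}

/// Hypothetical helper: first numerator in 0..limit where the naive result differs from n / d.
fn first_mismatch(d: u32, limit: u32) -> Option<u32> {
    (0..limit).find(|&n| naive_reciprocal_div(n, d) != n / d)
}
```

Powers of two come out exact (the reciprocal has no rounding error there), which points at the rounded-down multiplier rather than the multiply itself.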

Looking at using libdivide's branch-free variant instead. Its unsigned form can't handle a divisor of 1, but that's fine here since we never divide by 1 (because of the 1 << 13).
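
For reference, the branch-free unsigned scheme can be sketched in scalar Rust roughly as follows. It is modeled on libdivide's `u32_branchfree` algorithm but is not libdivide's API; `BranchfreeDivider` is a hypothetical name. Every division runs the same multiply-high, subtract, shift, add, shift sequence with no data-dependent branch, which is what makes it attractive for `i32x8` lanes. Power-of-two divisors use `magic = 0` and shift one less to compensate for the fixed `>> 1`, which is also why a divisor of 1 (shift of -1) cannot be encoded.

```rust
/// Hypothetical scalar sketch of branch-free unsigned division by a divisor
/// fixed at setup time, modeled on libdivide's `u32_branchfree` algorithm.
#[derive(Clone, Copy, Debug)]
struct BranchfreeDivider {
    magic: u32,
    shift: u32,
}

impl BranchfreeDivider {
    /// Precompute the magic multiplier and shift for divisor `d`.
    /// `d` must be at least 2: the branch-free form cannot encode d == 1.
    fn new(d: u32) -> Self {
        assert!(d >= 2, "branch-free divider requires d >= 2");
        let l = 31 - d.leading_zeros(); // floor(log2(d))
        if d.is_power_of_two() {
            // magic = 0 gives q = 0, so divide() computes (n >> 1) >> (l - 1) = n >> l.
            return Self { magic: 0, shift: l - 1 };
        }
        // floor(2^(33 + l) / d) lies in [2^32, 2^33); store it minus 2^32,
        // plus 1 (rounding the multiplier up), so it fits in a u32.
        let m = (1u128 << (33 + l)) / d as u128;
        let magic = (m - (1u128 << 32) + 1) as u32;
        Self { magic, shift: l }
    }

    /// floor(n / d) using one 32x32->64 multiply, a subtract, an add, and two shifts.
    fn divide(&self, n: u32) -> u32 {
        let q = ((self.magic as u64 * n as u64) >> 32) as u32; // high half of magic * n
        let t = ((n - q) >> 1) + q; // floor((n + q) / 2) without overflowing u32
        t >> self.shift
    }
}
```

The constructor does the expensive 128-bit division, so it would be built once per divisor and reused for every coefficient divided by that value.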

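```rust
// `cmp_lt` yields an all-ones lane (-1) where pred < 0 and 0 elsewhere,
// so `1 - mask` is 2 for negative lanes and 1 otherwise.
let orig_lt_zero = 1 - pred.cmp_lt(i32x8::ZERO);
// All-ones where best_prior_abs != 0 (despite the name, this is the nonzero mask).
let best_prior_is_zero: i32x8 = cast(!best_prior_abs.cmp_eq(u32x8::ZERO));

// 0 for a zero prior, otherwise 1 (non-negative pred) or 2 (negative pred).
let best_prior_sign = best_prior_is_zero & orig_lt_zero;
```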

Review comment (collaborator):

There may be a corner case here as well: the original implementation takes the sign of the prior as an i16, not the sign of the i32 prior.
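
A small Rust sketch of why the i16 vs. i32 distinction can matter, assuming the original code narrows the prior with an `as i16` cast (two's-complement truncation to the low 16 bits) before taking its sign. `sign_category`, `sign_from_i32`, and `sign_from_i16` are hypothetical helpers; the 0/1/2 encoding follows what the mask arithmetic in the snippet above produces (0 for zero, 1 for positive, 2 for negative).

```rust
/// Hypothetical sign category: 0 for zero, 1 for positive, 2 for negative.
fn sign_category(v: i32) -> u8 {
    match v.signum() {
        0 => 0,
        1 => 1,
        _ => 2,
    }
}

/// Sign category taken from the i32 prior directly.
fn sign_from_i32(prior: i32) -> u8 {
    sign_category(prior)
}

/// Sign category taken after narrowing the prior to i16 first.
/// `as i16` keeps only the low 16 bits, so out-of-range values can flip sign or become 0.
fn sign_from_i16(prior: i32) -> u8 {
    sign_category((prior as i16) as i32)
}
```

For example, 40000 narrows to -25536 and 65536 narrows to 0, so the two versions disagree; inside the i16 range they always agree. Whether priors can actually leave that range is not settled in this thread.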

Follow-up comment (collaborator):

A quick fix is in #85.

Melirius (Collaborator) commented May 29, 2024

Hmm, what about using libdivide only for 16-bit quantization tables (or, better, for tables with values larger than 8355)? Normal JPEGs would then use this fast method, and libdivide would only be a fallback for bad tables. The branch predictor should handle that fine. See https://github.com/Melirius/lepton_jpeg_rust/tree/recip-var
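
The suggestion can be sketched as a per-divisor choice made once, when the quantization table is loaded. This is a hypothetical Rust sketch: the reciprocal is a simple round-up `m = ceil(2^32 / d)`, swapped in for illustration rather than the PR's actual formula, and the fallback uses plain `/` as a stand-in for libdivide. The round-up reciprocal is exact whenever `n * (d - 1) < 2^32` for every numerator `n` that can occur, so `Divider::new` checks that bound against a caller-supplied `max_numerator`; the 8355 threshold is not derived here.

```rust
/// Hypothetical per-divisor strategy, chosen once when a quantization table is loaded.
#[derive(Clone, Copy, Debug)]
enum Divider {
    /// floor(n / d) == (n * m) >> 32 with m = ceil(2^32 / d);
    /// exact when n * (d - 1) < 2^32 for every numerator n passed to divide().
    Reciprocal { m: u64 },
    /// Stand-in for the libdivide fallback; plain division keeps the sketch short.
    Fallback { d: u32 },
}

impl Divider {
    /// `max_numerator` is the largest value divide() will ever be called with.
    fn new(d: u32, max_numerator: u32) -> Self {
        assert!(d > 0, "divisor must be nonzero");
        let exact = (max_numerator as u64) * (d as u64 - 1) < (1u64 << 32);
        if exact {
            let m = ((1u64 << 32) + d as u64 - 1) / d as u64; // ceil(2^32 / d)
            Divider::Reciprocal { m }
        } else {
            Divider::Fallback { d }
        }
    }

    /// Caller must guarantee n <= max_numerator given to new().
    fn divide(&self, n: u32) -> u32 {
        match *self {
            // m <= 2^32 and n < 2^32, so the product fits in a u64.
            Divider::Reciprocal { m } => ((n as u64 * m) >> 32) as u32,
            Divider::Fallback { d } => n / d,
        }
    }
}
```

A table would get one `Divider` per entry, so the branch in `divide` follows the same arm for every coefficient of a normal JPEG, which is what should keep the branch predictor happy.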

mcroomp marked this pull request as draft on June 13, 2024 05:10