Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What's the correct *wrapping* value for ilog{|2|10}? #100422

Closed
scottmcm opened this issue Aug 11, 2022 · 23 comments · Fixed by #102578
Closed

What's the correct *wrapping* value for ilog{|2|10}? #100422

scottmcm opened this issue Aug 11, 2022 · 23 comments · Fixed by #102578
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@scottmcm
Copy link
Member

Filing since it's in pFCP over at #70887 (comment), and I don't think we can reasonably change it after stabilization (especially since it's documented).

Currently, 0.ilog2() in release mode evaluates to 0. (And same for the other bases.)

It's not obvious to me that that's necessarily the best option. For example, the (BITS-1) - LeadingZeros implementation is slightly nicer on CPUs from the past decade or so https://rust.godbolt.org/z/3q6WME84Y (and I think would be just as nice on older ones if LLVM was a bit smarter), but gives -1as u32 in the wrapping case.

It also might be nice to have the wrapping case give something clearly different from any of the successful cases, rather than having 1.ilog2() and 0.ilog2() give the same thing. For example u32::MAX.next_power_of_two() returns 0 in release mode, which isn't a power of two (according to is_power_of_two), so it's distinguishable from all the non-wrapping cases.

This only affects ilog/ilog2/ilog10 on uNN. (The checked versions don't have an issue, nor do the ones on NonZeroUNN.)

@scottmcm scottmcm added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. C-bug Category: This is a bug. I-libs-api-nominated The issue / PR has been nominated for discussion during a libs-api team meeting. labels Aug 11, 2022
@scottmcm
Copy link
Member Author

scottmcm commented Aug 11, 2022

I couldn't find any previous discussion of what the exact wrapping value ought to be, just that there ought to be one, as opposed to a panic!.

There was this note: #80918 (comment)

In this case of logs, checking has to be done anyway

For which I think the ilog2 implementation I mention above is a counter-example, albeit only for the base-2 case.

Also, currently ilog2 in release mode is essentially .checked_ilog2().unwrap_or(0), so we can pick any value we want to return for those cases.


FWIW, the implementation in https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogLookup gives -1 as the result for ilog2(0).

@leonardo-m
Copy link

In Rust we try to do the right thing. Not even Python gives -1. Let's avoid in-band error values in Rust.

@scottmcm
Copy link
Member Author

@leonardo-m What's your proposal for what the "right" thing is, then? That 0.ilog2() panic in release too?

Arguably the most-correct value is −∞ (like (+0.0_f32).log2()), but that doesn't fit in any of our integer types.

So 0.saturating_ilog2() => 0 would make sense, but release mode for the other un-prefixed things is wrapping, not saturating.

@leonardo-m
Copy link

Python has some functions that return in-band values:

>>> "xy".find("y")
1
>>> "xy".find("z")
-1

From Rust I expect something better. In Rust if a function isn't naturally total and you can't restrict its input, then I expect it to return Option (or Result), or when that's isn't ergonomic enough, a panic (in release too). Returning -1 is dangerous as that Python find function, returning 0 is not math.

@cmpute
Copy link

cmpute commented Aug 11, 2022

I think returning -1 (wrapped if necessary) is a clear distinguish between normal and error result. Returning an Option is not ideal currently because it will makes the function practically unusable in const context (before const unwrap is stabilized)

If getting closest to the "correct" math answer, how about choosing i32::MIN as u32? However it seems weird and -1 still seems better to me

@leonardo-m
Copy link

I think returning -1 (wrapped if necessary) is a clear distinguish between normal and error result.

Perhaps in other languages. Rust is known for being better than that. I have chosen to use Rust because it's a bit better than Python (see above).

Returning an Option is not ideal currently because it will makes the function practically unusable in const context (before const unwrap is stabilized)

Let's return Option and and wait for const unwrap to be stabilized before making ilog const, then.

@scottmcm
Copy link
Member Author

Let's return Option and [...]

That's what 0.checked_ilog2() does already. Is your proposal, then, that the -> u32 version of ilog2 just shouldn't exist on uNN? How do you square that with precedent of other things -- like uNN::next_power_of_two or iNN::abs -- that have the checked_ version return Option, but the un-prefixed one wrap in release mode?

@programmerjake
Copy link
Member

what about treating it like division by zero and just panicking?

@lnicola
Copy link
Member

lnicola commented Aug 12, 2022

I'd be fine with -1. The point of these methods is to get fast code (otherwise many won't use them and will prefer to do UNN::BITS - x.leading_zeros() - 1 to keep the code branchless and avoid the ugly unwrap()) and a defined value outside of the usual input range. u32::MAX.next_power_of_two() (from above) is a good example.

If you want stricter guarantees than that, you can use NonZeroUNN.

@jdahlstrom
Copy link

A minor point: u32::MAX.next_power_of_two() being 0 is actually a reasonable choice - even though 0 is not a power of two, it is congruent to the correct power of two modulo 2^32 (which is 2^32 itself), so exactly what is expected in wrapping arithmetic. But unfortunately there's no "−∞ mod 2^32" that could be used for ilog2.

@cmpute
Copy link

cmpute commented Aug 12, 2022

Modular arithmetic is confusing in the context of logarithm, if you want to return the discrete logarithm of 0 in the release mode, then there are a lot of candidates, because (in case of u8) $$2^8 \equiv 2^{16} \equiv 2^{24} \equiv \ldots \equiv 2^{8k} \equiv 0 \;\mathrm{mod} \;256$$ therefore in the sense of modular arithmetic, $\log_2(0)$ could be any multiple of uNN:BITS. If you really want to achieve something close to $-\infty$, then I guess the best value is i32::MIN as u32 as I mentioned above. (Interestingly though, i32::MIN as u32 = $2^{31}$, which is a multiple of 8, 16, 32, 64, 128)

@scottmcm
Copy link
Member Author

what about treating it like division by zero and just panicking?

That's certainly a possibility. @m-ou-se suggested not doing that back in the original PR, though: #80918 (comment)

@tspiteri
Copy link
Contributor

There was this note: #80918 (comment)

In this case of logs, checking has to be done anyway

For which I think the ilog2 implementation I mention above is a counter-example, albeit only for the base-2 case.

Doesn't leading_zeros implicitly include a check for zero in the code generated? [godbolt]

@tspiteri
Copy link
Contributor

Also, since the ilog methods are implemented for signed integers as well, another issue would be the log of negative values. For example for floating-point, 0f32.log2() would return −∞, but (-1f32).log2() would return NaN. So if 0i32.ilog2 would return some value for −∞, would (-1i32).ilog2() panic anyway, or return some value?

@jdahlstrom
Copy link

jdahlstrom commented Aug 19, 2022

So if 0i32.ilog2 would return some value for −∞, would (-1i32).ilog2() panic anyway, or return some value?

I don't think it makes sense to do anything other than panic in the case of negative values, which in turn is probably an argument for panicking on zero as well. I guess one could argue that "wrapping" could be interpreted as wrapping the preimage to fit in the function's domain (rather than just wrapping the image to fit in the range) but IMO that would be entirely unintuitive and unhelpful as well.

After thinking about it a bit, it seems to me that ilog simply cannot have a meaningful wrapping variant, because wrapping semantics (as reasonably interpreted) cannot answer the question "what should the result be for values for which the original function is undefined?" There are no elements in ilog's domain for which the function underflows or overflows.

@scottmcm
Copy link
Member Author

scottmcm commented Aug 19, 2022

@tspiteri

Doesn't leading_zeros implicitly include a check for zero in the code generated? [godbolt]

That's only the case on old chips. As I linked in the OP (#100422 (comment)), that happens in the instruction generation very-last-phase of LLVM on old chips, and on newer chips it knows to emit a different instruction instead of the branching code: https://rust.godbolt.org/z/1KM5zqTMG.

The good news is that it looks like newer LLVM has gotten smarter here. It used to be that, because of how the intrinsic gets translated, it would emit obviously-redundant assembly on old chips in some code patterns, as you could see in Rust 1.16: https://rust.godbolt.org/z/rxPzGdxTY.

@joboet
Copy link
Contributor

joboet commented Aug 20, 2022

$\mathrm{ilog2}(x)$ is supposed to find the largest $k \in \mathbb{N}_0$ so that $2^k \leq x$. Since computers use wrapping arithmetic, the equation is better put as $2^k \mod{2^{bits}} \leq x$. The largest $k < 2^{32}$ satisfying this equation in the case $x = 0$ is $2^{32} - 1$ (aka u32::MAX).

So if we consider a wrapping ilog2 to be the inverse of 2.wrapping_pow, u32::MAX makes perfect sense (and is very efficient on modern CPUs, as others have pointed out).

I would also advocate for wrapping_ilog2 to be added as a method which always has this behaviour. ilog2 could then be treated like the other integer operations: panicking on overflow in debug mode, allowing people to catch these errors, and wrapping in release mode, making the operation faster. [Edit: it already behaves that way]

@scottmcm
Copy link
Member Author

ilog2 could then be treated like the other integer operations: panicking on overflow in debug mode, allowing people to catch these errors, and wrapping in release mode, making the operation faster.

Note that's already how ilog2 works today. It's just that it wraps to 0 in release right now -- this issue doesn't propose removing the panic in debug behaviour, just to figure out what the best value for release would be.

@scottmcm
Copy link
Member Author

Since, from the notes, it seems like the libs-api meeting today said "how about we panic instead of wrap", I'll try to answer my own question from above:

How do you square that with precedent of other things [...]

I would propose the following rule:

Conceptually, the result is first calculated in true ℤ (aka to infinite precision). Then for the non-prefixed versions

  • If that result is undefined (ilog(0), 1 / 0, etc), then it panics (regardless of compilation mode)
  • If that result is defined and fits in the result type, it's returned.
  • If that result is defined but not in-range for the result type, it panics if overflow checking is enabled, and wraps (mod 2N for uN) if overflow checking is disabled.

I think that's consistent with all the current precedent.

(#100422 (comment) also is good reading about the distinction I'm drawing here.)

@programmerjake
Copy link
Member

programmerjake commented Aug 23, 2022

  • If that result is undefined (ilog(0), 1 / 0, etc), then it panics (regardless of compilation mode)

I would redefine that to include cases where the result is defined but you can't wrap that to the result type, e.g. sqrt(-1) or infinity. that would cover log(0) = -infinity and log(-1) = pi * i

@scottmcm
Copy link
Member Author

I agree, but was considering that implied by "in ℤ", as that set includes neither −∞ nor ⅈ.

(Like how f32::log(-0.5) is NAN, not a complex number.)

@m-ou-se
Copy link
Member

m-ou-se commented Aug 30, 2022

what about treating it like division by zero and just panicking?

That's certainly a possibility. @m-ou-se suggested not doing that back in the original PR, though: #80918 (comment)

I'm not necessarily against panicking in that case. I just pointed out how it's a bit inconsistent with the other operations we have on integers. (Whether it makes sense for the other operations to have this difference in behavior between debug and release mode is an interesting discussion too, but I'm afraid that ship has sailed.)

The reason we don't always panic on overflow for + and - etc. is performance. It's simply too costly to do. (We always panic on division by zero because that's both mathematically undefined (even modulo 2BITS) and because on some platforms, the native division instruction will signal/abort/trap the program when the right hand side is zero.)

We applied the same reasoning to things like abs and next_power_of_two, although one might reasonably argue that for those it might be fine to always panic, since these are not as common as + and -. (You could then still use checked_*().unwrap_unchecked() if you really want to make sure there's no check, if you're sure it's unnecessary.)

I would propose the following rule: …

That seems reasonable.

It'd be nice if it'd still be easy to get the efficient 32 - some_u32.leading_zeros() calculation through some usage of ilog2 or checked_ilog2 though. It looks like llvm isn't smart enough today to turn some_u32.checked_ilog2().map_or(0, |n| n + 1) into simply a count-leading-zeros and a subtract instruction. :(

@m-ou-se m-ou-se removed the I-libs-api-nominated The issue / PR has been nominated for discussion during a libs-api team meeting. label Aug 30, 2022
@m-ou-se
Copy link
Member

m-ou-se commented Sep 13, 2022

We discussed this again in the libs-api meeting, and we concluded pretty much the same as @scottmcm's rule, meaning that these functions should panic on zero in all compilation modes, since there's not a modulo-2N-correct result.

@bors bors closed this as completed in d8091f8 Oct 12, 2022
thomcc pushed a commit to tcdi/postgrestd that referenced this issue Feb 10, 2023
Panic for invalid arguments of `{integer primitive}::ilog{,2,10}` in all modes

Decision made in rust-lang/rust#100422 (comment)

resolves rust-lang/rust#100422

tracking issue: rust-lang/rust#70887

r? `@m-ou-se`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants