
Add f128 int to float conversions #624

Open

tgross35 wants to merge 3 commits into master from f128-int-to-float

Conversation

tgross35
Contributor

No description provided.

@tgross35 force-pushed the f128-int-to-float branch 9 times, most recently from 4332449 to a062626 on May 29, 2024 08:29
@tgross35
Contributor Author

tgross35 commented May 29, 2024

This should be ready now

Cc @m-ou-se since you wrote the original int to float algorithms. The generic function's monomorphizations seem to get reasonably similar codegen, and running the benchmarks suggests that any changes are within the noise margin.

@tgross35 marked this pull request as ready for review on May 29, 2024 08:34
@m-ou-se
Member

m-ou-se commented May 29, 2024

Haven't looked at it in detail, but was it necessary to make it generic? For f128, all integers up to 64 bits fit into it losslessly without rounding, so those conversions should be nearly trivial, right?

Anyway, happy to review this in detail next week.
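
As a concrete illustration of the "nearly trivial" point above: f128's 112-bit significand holds any u64 exactly, so no rounding logic is needed at all. A minimal sketch in the same style as the snippet later in this thread (illustrative only, not code from this PR):

pub fn u64_to_f128_bits(i: u64) -> u128 {
    if i == 0 {
        return 0;
    }
    let n = i.leading_zeros();
    let m = (i as u128) << (49 + n); // Significant bits, with bit 113 still intact.
    let e = (16445 - n) as u128; // Exponent plus 16383, minus one.
    (e << 112) + m // Bit 113 of m carries the exponent back up by one.
}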

r? m-ou-se

@tgross35
Contributor Author

Haven't looked at it in detail, but was it necessary to make it generic?

Not strictly, but nine near-identical functions would be a lot. I think the constants are easier to follow than the magic numbers in any case.
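
To make the "constants vs. magic numbers" point concrete, the magic values are all of the form bias + highest-bit-position - 1. For instance (illustrative only, with a hypothetical constant name rather than the PR's actual trait items):

const F128_EXPONENT_BIAS: u32 = 16383;

fn main() {
    // The "magic" 16413 in the u32 -> f128 snippet later in this thread is
    // bias + (bit width - 1) - 1, the trailing -1 being the off-by-one trick
    // discussed further down in the review.
    assert_eq!(F128_EXPONENT_BIAS + (u32::BITS - 1) - 1, 16413);
}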

Anyway, happy to review this in detail next week.

Thanks!

@m-ou-se
Member

m-ou-se commented Jun 7, 2024

Having reviewed in detail now, I'm still not convinced we should make this function generic. It now has quite a few different cases to handle, adding complexity. And now it always starts with an "if 0" check, even though a few functions (e.g. u64_to_f32_bits) were branch free (after codegen) before.
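
For reference, the existing u64 -> f32 conversion has roughly the following shape (a reconstruction for illustration; details may differ from the actual source). The zero case is folded into the exponent select, which LLVM lowers to a conditional move rather than a branch:

pub fn u64_to_f32_bits(i: u64) -> u32 {
    let n = i.leading_zeros();
    let y = i.wrapping_shl(n);
    let a = (y >> 40) as u32; // Significant bits, with bit 24 still intact.
    let b = (y >> 8 | y & 0xFFFF) as u32; // Insignificant bits, only relevant for rounding.
    let m = a + ((b - (b >> 31 & !a)) >> 31); // Add one when we need to round up. Break ties to even.
    let e = if i == 0 { 0 } else { 189 - n }; // Exponent plus 127, minus one, except for zero.
    (e << 23) + m // + not |, so the mantissa can overflow into the exponent.
}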

@m-ou-se
Member

m-ou-se commented Jun 7, 2024

Doing each of them separately makes it possible to spot nice optimizations that only apply to specific cases.

For example, in the u32 to f128 case, the lower 64 bits will always be zero, so all the operations can be done on u64 rather than u128, which results in much shorter and faster assembly:

pub fn u32_to_f128_bits(i: u32) -> u128 {
    if i == 0 {
        return 0;
    }
    let n = i.leading_zeros();
    let m = (i as u64) << (17 + n); // High 64 significant bits, with bit 113 still intact.
    let e = 16413 - n as u64; // Exponent plus 16383, minus one.
    let h = (e << 48) + m; // High 64 bits of f128 representation.
    (h as u128) << 64 // Low bits are always 0.
}

@tgross35 force-pushed the f128-int-to-float branch 3 times, most recently from 4dc63a4 to b084ae4 on June 13, 2024 12:35
@tgross35
Contributor Author

Thanks for taking a look. I think I struck a happier medium: I moved the redundant parts to generic functions to make things less magic-number-y, but did not combine the algorithms. Also included your suggested u32 -> f128 conversion.

Extract some common routines to separate functions in order to
deduplicate code and remove some of the magic.
@tgross35
Contributor Author

It looks like the system NaN functions are converting to max values rather than zero for f128. Not sure why this would be, but I'll disable them and look into it.

@m-ou-se gentle nudge, could you take another look at this?

Member

@m-ou-se left a comment


I don't find it more readable when split up over multiple functions like this. For example, the exp function doesn't actually calculate the exponent; it calculates the exponent minus one, because of the trick with the mantissa containing an extra 1 bit that will unconditionally overflow into the exponent later. Separating tricks over multiple functions just makes it harder to follow and review.

That said, if you truly find this more maintainable, I'll be happy to approve this given that the code is well tested and benchmarked.
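
To spell out the off-by-one trick with a concrete value (a worked example, not code from the PR), trace the u32_to_f128_bits snippet above for i = 1:

fn main() {
    let i: u32 = 1;
    let n = i.leading_zeros(); // 31
    let m = (i as u64) << (17 + n); // 1 << 48: the intact bit lands in the lowest exponent-bit slot.
    let e = 16413 - n as u64; // 16382: the biased exponent (16383 for 2^0), minus one.
    let h = (e << 48) + m; // The intact bit carries into the exponent field...
    assert_eq!(h, 16383u64 << 48); // ...restoring the full bias: the high 64 bits of 1.0 as f128.
}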

Comment on lines +20 to +21
/// Calculate the exponent from the number of leading zeros.
fn exp<I: Int, F: Float<Int: CastFrom<u32>>>(n: u32) -> F::Int {
Member


It doesn't calculate the exponent. It calculates the exponent minus one.

Comment on lines +25 to +26
/// Shift the integer into the float's mantissa bits. Keep the lowest exponent bit intact.
fn m_base<I: Int, F: Float<Int: CastFrom<I>>>(i_m: I) -> F::Int {
Member


It keeps the highest bit intact.

m_base + adj
}

/// Combine a final float repr from an exponent and mantissa.
Member


It's an important detail that it doesn't just combine the exponent and mantissa fields, but instead is designed to take an off-by-one exponent and a mantissa with an extra 1 bit on the high side.
