math/big: rounding to denormal float32/64 still incorrect #14651

@griesemer

This is a follow-up to issue #14553. In the special case of a big.Float number that is smaller than the smallest denormal but should be rounded up to the smallest denormal, rounding up doesn't happen for values x with 0.5 * 2**-149 (0.1000p-149) < x < 0.75 * 2**-149 (0.1100p-149) for float32 (and analogously for float64).

Since the compiler is using this code, for these numbers we get the wrong bit patterns when converting/rounding at compile-time (constant evaluation):

package main

import (
    "fmt"
    "math"
)

const p149 = 1.0 / (1 << 149) // 1p-149

const (
    m0000 = 0x0 / 16.0 * p149 // = 0.0000p-149
    m1000 = 0x8 / 16.0 * p149 // = 0.1000p-149
    m1001 = 0x9 / 16.0 * p149 // = 0.1001p-149
    m1011 = 0xb / 16.0 * p149 // = 0.1011p-149
    m1100 = 0xc / 16.0 * p149 // = 0.1100p-149
)

func main() {
    print(float32(m0000), f32(m0000))
    print(float32(m1000), f32(m1000))
    print(float32(m1001), f32(m1001))
    print(float32(m1011), f32(m1011))
    print(float32(m1100), f32(m1100))
}

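// f32 converts via a float64 parameter, so the float32 conversion
// is not a compile-time constant conversion.
func f32(x float64) float32 {
    return float32(x)
}

// print shows the bit patterns of a (constant conversion) and
// b (conversion via f32) side by side.
func print(a, b float32) {
    fmt.Printf("%016x  %016x\n", math.Float32bits(a), math.Float32bits(b))
}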

produces

0000000000000000  0000000000000000
0000000000000000  0000000000000000
0000000000000000  0000000000000001
0000000000000000  0000000000000001
0000000000000001  0000000000000001

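(the left column, produced by compile-time constant conversion, is incorrect for m1001 and m1011; the right column, produced by conversion via f32, is correct).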
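For reference, the expected bit patterns can be derived by hand: for small k, the float32 bit pattern k encodes k * 2**-149, so a value n/16 * 2**-149 should round to n/16 rounded to the nearest integer, with ties going to even. The following is a minimal sketch of that reasoning, not code from math/big; the helper name expectedBits is hypothetical, and the float64-to-float32 conversion of a variable is used only as a cross-check.

```go
package main

import (
	"fmt"
	"math"
)

// expectedBits returns the float32 bit pattern that round-to-nearest-even
// should produce for the value n/16 * 2**-149, for small non-negative n.
// It rounds n/16 to the nearest integer, ties to even.
// (Hypothetical helper for illustration only.)
func expectedBits(n uint32) uint32 {
	q, r := n/16, n%16
	if r > 8 || (r == 8 && q%2 == 1) {
		q++ // above the halfway point, or a tie with an odd quotient
	}
	return q
}

func main() {
	for _, n := range []uint32{0x0, 0x8, 0x9, 0xb, 0xc} {
		// n is a variable, so this conversion is not constant-evaluated.
		x := float64(n) / 16 * math.Ldexp(1, -149)
		got := math.Float32bits(float32(x))
		fmt.Printf("n=%#x expected=%d converted=%d\n", n, expectedBits(n), got)
	}
}
```

The tie case n = 0x8 (exactly 0.5 * 2**-149) rounds to even, i.e. to 0, which is why only values strictly between 0.5 and 0.75 times 2**-149 are affected in the output above.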

The problem in this case seems to be with rounding per se, and not so much the Float32/64 conversions.
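To take the compiler out of the picture, the same values can be fed to math/big directly. The sketch below is an assumption about how one might check this, not the reproduction used in this issue: it builds n/16 * 2**-149 exactly as a big.Float and calls Float32, printing the resulting bits and the reported accuracy. The helper name toFloat32Bits is hypothetical. With a math/big that has this problem, the 0x9 and 0xb rows would presumably show bit pattern 0 rather than 1.

```go
package main

import (
	"fmt"
	"math"
	"math/big"
)

// toFloat32Bits converts n/16 * 2**-149 to float32 via big.Float and
// returns the resulting bit pattern together with the reported accuracy.
// (Hypothetical helper for illustration only.)
func toFloat32Bits(n int64) (uint32, big.Accuracy) {
	x := new(big.Float).SetInt64(n) // exact: n
	x.SetMantExp(x, -153)           // exact: n * 2**-153 == n/16 * 2**-149
	f, acc := x.Float32()
	return math.Float32bits(f), acc
}

func main() {
	for _, n := range []int64{0x0, 0x8, 0x9, 0xb, 0xc} {
		bits, acc := toFloat32Bits(n)
		fmt.Printf("n=%#x bits=%08x acc=%v\n", n, bits, acc)
	}
}
```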
