llvm.fma.bf16 intrinsic is expanded incorrectly #131531

@beetrees

Consider the following LLVM IR:

```llvm
define bfloat @do_fma(bfloat %a, bfloat %b, bfloat %c) {
    %res = call bfloat @llvm.fma.bf16(bfloat %a, bfloat %b, bfloat %c)
    ret bfloat %res
}
```

LLVM turns this into the equivalent of:

```llvm
define bfloat @do_fma(bfloat %a, bfloat %b, bfloat %c) {
    %a_f32 = fpext bfloat %a to float
    %b_f32 = fpext bfloat %b to float
    %c_f32 = fpext bfloat %c to float
    %res_f32 = call float @llvm.fma.f32(float %a_f32, float %b_f32, float %c_f32)
    %res = fptrunc float %res_f32 to bfloat
    ret bfloat %res
}
```

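This is a miscompilation, however: `float` does not have enough precision to perform a fused multiply-add for `bfloat` without double rounding changing the result. For instance, `do_fma(0x1.40p+127, 0x1.04p+0, 0x1.00p-133)` should return `0x1.46p+127`, but LLVM's lowering to a `float` FMA gives the incorrect result `0x1.44p+127`. The exact product `0x1.45p+127` lies exactly halfway between the neighboring `bfloat` values `0x1.44p+127` and `0x1.46p+127`, so the tiny addend alone decides the rounding direction. The `float` FMA absorbs that addend, and `fptrunc` then resolves the remaining tie to the even neighbor, `0x1.44p+127`.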
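
The arithmetic can be checked by evaluating the FMA exactly with Python's `fractions` module and modeling each rounding step explicitly. This is a minimal sketch, not LLVM's code: `round_to_precision` and `h` are hypothetical helpers written for this check, `bfloat` is modeled as an 8-bit significand with round-to-nearest-even, and overflow and subnormal results are ignored because these inputs produce neither.

```python
from fractions import Fraction


def round_to_precision(x: Fraction, p: int) -> Fraction:
    """Round a positive rational x to p significant bits, ties-to-even.

    Hypothetical helper: overflow and subnormal results are not handled;
    the inputs below avoid both.
    """
    # Locate the binade: pick e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)  # spacing of p-bit values in this binade
    q = x // ulp                      # whole ulps, truncated (an int)
    r = x - q * ulp                   # the part that rounding discards
    half = ulp / 2
    if r > half or (r == half and q % 2 == 1):
        q += 1                        # round up; a tie goes to the even neighbor
    return q * ulp


def h(s: str) -> Fraction:
    """Parse a hex float literal exactly (every value here fits in a double)."""
    return Fraction(float.fromhex(s))


BF16_BITS = 8   # 7 stored fraction bits + the implicit leading bit
F32_BITS = 24

a, b, c = h("0x1.40p+127"), h("0x1.04p+0"), h("0x1.00p-133")
exact = a * b + c  # computed with no rounding at all

# A true bfloat FMA: one rounding, straight from the exact value.
correct = round_to_precision(exact, BF16_BITS)
# The current lowering: llvm.fma.f32 rounds to float, then fptrunc rounds to bfloat.
via_f32 = round_to_precision(round_to_precision(exact, F32_BITS), BF16_BITS)

print("correct:", float(correct).hex())  # correct: 0x1.4600000000000p+127
print("via f32:", float(via_f32).hex())  # via f32: 0x1.4400000000000p+127
```

The only difference between the two results is the extra rounding to 24 bits in the middle, which is exactly the double rounding described above.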

Using `double` instead of `float` would not be a correct lowering either: it gives the same incorrect result for the example above. Using the reasoning from #128450 (comment), a 126 + 127 + 8 = 261-bit significand would be required for double rounding not to be a problem with this lowering. I suspect the best option here is to lower to a libcall instead.
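
The `double` case, and how much intermediate precision this particular input needs, can be checked the same way. Again this is only a sketch under the same assumptions (`round_to_precision`, `h` and `lowered` are hypothetical helpers, RNE throughout, no overflow or subnormal results), and scanning a single input illustrates the problem rather than proving a general bound.

```python
from fractions import Fraction


def round_to_precision(x: Fraction, p: int) -> Fraction:
    """Round a positive rational x to p significant bits, ties-to-even (hypothetical helper)."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1                        # now 2**e <= x < 2**(e + 1)
    ulp = Fraction(2) ** (e - p + 1)
    q = x // ulp
    r = x - q * ulp
    half = ulp / 2
    if r > half or (r == half and q % 2 == 1):
        q += 1                        # round up; ties go to the even neighbor
    return q * ulp


def h(s: str) -> Fraction:
    """Parse a hex float literal exactly."""
    return Fraction(float.fromhex(s))


BF16_BITS = 8
F64_BITS = 53

a, b, c = h("0x1.40p+127"), h("0x1.04p+0"), h("0x1.00p-133")
exact = a * b + c
correct = round_to_precision(exact, BF16_BITS)


def lowered(p: int) -> Fraction:
    """FMA computed with a p-bit significand, then truncated to bfloat."""
    return round_to_precision(round_to_precision(exact, p), BF16_BITS)


print("via f64:", float(lowered(F64_BITS)).hex())  # via f64: 0x1.4400000000000p+127

# Smallest intermediate precision wider than bfloat that gets this input right.
min_bits = next(p for p in range(BF16_BITS + 1, 400) if lowered(p) == correct)
print("bits needed:", min_bits)  # bits needed: 261
```

For these inputs, every intermediate precision from 9 up to 260 bits still yields the wrong `0x1.44p+127`; 261 bits is the first that keeps the tiny addend and produces `0x1.46p+127`, the same figure quoted above.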

Closely related to #98389/#128450
