
cmd/compile: implement more optimizations on loong64 #59120

@xen0n

This issue mainly tracks the implementation progress of various low-hanging-fruit optimizations for loong64.

There are many missed optimization opportunities on loong64. A quick survey of the SSA intrinsics turns up the following:

  • runtime.publicationBarrier
    • dmb st on arm64
    • dbar 0x1A on LA64 v1.10+ (gracefully downgrading to dbar 0 on LA64 v1.00): CL 577515
  • runtime.Bswap{32,64}
  • runtime/internal/sys.Prefetch{,Streamed}
    • preld on LA64 v1.00
  • runtime/internal/atomic.{And,Or}
  • math.{Trunc,Ceil,Floor,RoundToEven}: not possible with LA64 v1.00
    • LA64 v1.00 frint.[sd] is not orthogonal: no fixed rounding mode variants (unlike e.g. ftintr{m,p,z,ne}).
  • math.Round
    • frint.[sd] on LA64 v1.00 -- have to check if the rounding mode behavior is tolerable
  • math.Abs
  • math.Copysign
  • math.FMA
    • f{,n}m{add,sub}.[sd] on LA64 v1.00: CL 483355
  • math/bits.TrailingZeros{64,32} (ssa.OpCtz{64,32})
  • math/bits.Len{64,32,} (ssa.OpBitLen{64,32})
    • clz.[wd] on LA64 v1.00: CL 483356
    • initially showed a significant performance regression across the board; this was confirmed to be a micro-architecture quirk and is alleviated somewhat by various alignment tricks
  • math/bits.Reverse{64,32,8} (ssa.OpBitRev{64,32,8})

We may want to implement (and preferably benchmark) all of the above.

cc @golang/loong64

Labels: NeedsFix, Performance, arch-loong64, compiler/runtime
