cmd/link: add Cortex-A53 erratum 843419 workaround for arm64 #78577

@luizgre

Go version

Crash observed on go1.25.x linux/arm64. Verified that cmd/link on tip (master) still has no erratum workaround.

What operating system and processor architecture are you using?

Linux 6.6.23 aarch64, NXP i.MX8MPlus SoC (4x Cortex-A53 @ 1.6 GHz).
Owasys Owa5x embedded device.
Binary built with Go internal linker (default, no -linkmode=external).

What did you do?

An MQTT client using crypto/tls with TLS 1.3 and an ECDSA P-256 client certificate connects to AWS IoT Core via mutual TLS. The TLS configuration uses tls.LoadX509KeyPair with MinVersion: tls.VersionTLS13.
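For reference, the client configuration described above looks roughly like the following. This is a minimal sketch, not the application's actual code: the function name newMTLSConfig and the file names client.crt and client.key are placeholders, and the MQTT layer is omitted.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newMTLSConfig (hypothetical helper) builds a TLS 1.3-only client config
// that presents the given key pair as the client certificate.
func newMTLSConfig(certFile, keyFile string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("load client key pair: %w", err)
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		MinVersion:   tls.VersionTLS13,
	}, nil
}

func main() {
	// Placeholder paths; the real device uses its own provisioning.
	cfg, err := newMTLSConfig("client.crt", "client.key")
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("client certificates loaded: %d\n", len(cfg.Certificates))
}
```

With an ECDSA P-256 key in the key file, the TLS 1.3 handshake signs with that key when the server requests a client certificate, which is the path shown in the trace below.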

During the handshake, sendClientCertificate calls crypto/ecdsa.SignASN1, which enters the FIPS 140 ECDSA signing path. Inside newDRBG, an h.Write() call panics with a nil pointer dereference.

The crash is intermittent but recurring. It was observed multiple times across restarts of the same binary, always at exactly the same PC and fault address, with only heap addresses varying between crashes.

What did you expect to see?

Successful ECDSA signing, or an error. Not a panic.

What did you see instead?

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x80 pc=0xbfe048]

goroutine 31 [running]:
crypto/internal/fips140/hmac.(*HMAC).Write(...)
	/usr/local/go/src/crypto/internal/fips140/hmac/hmac.go:76
crypto/internal/fips140/ecdsa.newDRBG[...](0x100bd50, {0x4000526740, 0x20, 0x20}, {0x0, 0x0, 0x0}, {0x11c4200, 0x4005fc4ed0})
	/usr/local/go/src/crypto/internal/fips140/ecdsa/hmacdrbg.go:74 +0x188
crypto/internal/fips140/ecdsa.Sign[...](0x4000408a40, 0x100bd50, 0x4000408a80, {0x11c2420, 0x40002a4b50}, {0x4000526500, 0x20, 0x20})
	/usr/local/go/src/crypto/ecdsa/ecdsa.go:298 +0x1b0
crypto/ecdsa.signFIPS[...](0x4000408a40, 0x0, {0x11c2420?, 0x40002a4b50}, {0x4000526500, 0x20, 0x20})
	/usr/local/go/src/crypto/ecdsa/ecdsa.go:423 +0xa8
crypto/ecdsa.SignASN1({0x11c2420, 0x40002a4b50}, 0x4000568570, {0x4000526500, 0x20, 0x20})
	/usr/local/go/src/crypto/ecdsa/ecdsa.go:402 +0x230
crypto/ecdsa.(*PrivateKey).Sign(0x1c36060?, {0x11c2420?, 0x40002a4b50?}, ...)
	/usr/local/go/src/crypto/ecdsa/ecdsa.go:329 +0x44
crypto/tls.(*clientHandshakeStateTLS13).sendClientCertificate(0x4005fc52c0)
	/usr/local/go/src/crypto/tls/handshake_client_tls13.go:816 +0x41c
crypto/tls.(*clientHandshakeStateTLS13).handshake(0x4005fc52c0)
	/usr/local/go/src/crypto/tls/handshake_client_tls13.go:143 +0x6ac
crypto/tls.(*Conn).clientHandshake(0x4000664708, {0x11d5e80, 0x4005fb30e0})
	/usr/local/go/src/crypto/tls/handshake_client.go:367 +0x5e0
crypto/tls.(*Conn).handshakeContext(0x4000664708, {0x11d5ef0, 0x40004509a0})
	/usr/local/go/src/crypto/tls/conn.go:1575 +0x2d4

A second crash from a different process (pid 1625) shows identical PC and fault address, only heap pointers differ:

[signal SIGSEGV: segmentation violation code=0x1 addr=0x80 pc=0xbfe048]
goroutine 16 [running]:
crypto/internal/fips140/ecdsa.newDRBG[...](0x100bd50, {0x400062e140, 0x20, 0x20}, {0x0, 0x0, 0x0}, {0x11c4200, 0x4000996ed0})
	hmacdrbg.go:74 +0x188

Analysis

Crash site

The crash is in newDRBG (hmacdrbg.go:52-118), at line 74:

h := hmac.New(hash, K)   // line 71, returns non-nil *HMAC
h.Write(d.V)             // line 72, succeeds
h.Write([]byte{0x00})    // line 73, succeeds
h.Write(entropy)          // line 74, CRASH: h is corrupted

ARM64 instruction decode of the crashing binary

I decoded the raw ARM64 instructions at the crash site (newDRBG[go.shape.*uint8], base address 0xbfdec0):

0xbfdffc +0x13c: STR X0, [SP, #0x80]     ; store h to stack after hmac.New returns
0xbfe000 +0x140: LDR X5, [SP, #0x88]     ; load d
0xbfe004 +0x144: LDP X1, X2, [X5, #16]   ; load d.V (ptr, len)
0xbfe008 +0x148: LDR X3, [X5, #0x20]     ; load d.V cap
0xbfe00c +0x14c: LDP X5, X0, [X0, #64]   ; load h.inner (itab, data)
0xbfe010 +0x150: LDR X5, [X5, #0x38]     ; load Write from itab
0xbfe014 +0x154: BLR X5                   ; h.inner.Write(d.V), line 72 OK

0xbfe018 +0x158: ADRP X0, ...            ; setup for []byte{0x00}
0xbfe01c +0x15c: ADD X0, X0, #0x6a0
0xbfe020 +0x160: BL runtime.newobject     ; allocate []byte{0x00}
0xbfe024 +0x164: LDR X5, [SP, #0x80]     ; reload h from stack, VALID
0xbfe028 +0x168: LDP X5, X6, [X5, #64]   ; load h.inner, OK
0xbfe02c +0x16c: LDR X5, [X5, #0x38]     ; load Write from itab
  ...
0xbfe040 +0x180: BLR X5                   ; h.inner.Write([]byte{0x00}), line 73 OK

0xbfe044 +0x184: LDR X5, [SP, #0x80]     ; reload h from stack, CORRUPTED (0x40)
0xbfe048 +0x188: LDP X5, X0, [X5, #64]   ; CRASH: 0x40 + 64 = 0x80, SIGSEGV

The *HMAC pointer h lives at [SP+0x80]. At +0x164 (before the line 73 call), this slot holds a valid heap pointer. At +0x184 (after the line 73 call returns), the same slot contains 0x40, which is not a valid heap address: Go heap addresses on linux/arm64 start at 0x4000000000, and every heap pointer in the traces above has the form 0x40xxxxxxxx.
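The fault address can be checked from the instruction itself. The sketch below decodes the signed, scaled imm7 field of a 64-bit LDP (signed offset) and adds it to the base register value. The encoding 0xA94400A5 is an assumption: it is a hand-assembled LDP X5, X0, [X5, #64], not bytes read from the crashing binary, and ldp64EffectiveAddr is a hypothetical helper.

```go
package main

import "fmt"

// ldp64EffectiveAddr returns the address accessed by a 64-bit
// LDP (signed offset) instruction, given the value of its base register.
// imm7 lives in bits 21:15, is signed, and is scaled by 8 for 64-bit pairs.
func ldp64EffectiveAddr(insn uint32, base uint64) uint64 {
	imm7 := int64(insn>>15) & 0x7F
	if imm7&0x40 != 0 { // sign-extend the 7-bit field
		imm7 -= 0x80
	}
	return base + uint64(imm7*8)
}

func main() {
	const ldp = 0xA94400A5 // assumed encoding of LDP X5, X0, [X5, #64]
	// Base register holds the corrupted value 0x40 seen in the stack slot.
	fmt.Printf("fault address: %#x\n", ldp64EffectiveAddr(ldp, 0x40)) // 0x80
}
```

A base of 0x40 plus the 64-byte field offset gives 0x80, matching addr=0x80 in both crash reports.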

The corruption happened during the execution of h.inner.Write([]byte{0x00}) (the BLR at +0x180). The callee ((*sha512.Digest).Write) does not touch the caller's stack frame, so the corruption must come from something else operating on this goroutine's stack, most likely the garbage collector.

Root cause: Cortex-A53 Erratum 843419

The device uses Cortex-A53 cores, affected by ARM Erratum 843419:

Under certain conditions, an ADRP instruction at the last two word-aligned addresses of a 4KB page (offset 0xFF8 or 0xFFC), when followed within three instructions by a memory access that uses the result of the ADRP, may use an incorrect address.

Go's internal linker has no workaround for this erratum. I searched all of cmd/link/internal/ and found zero references to erratum 843419, 835769, or any ARM CPU errata. The external linkers (GNU ld, LLVM lld) both implement a --fix-cortex-a53-843419 option, and many aarch64 toolchains enable it by default through their build configuration or compiler driver.

I scanned the crashing binary (~29MB, GOARCH=arm64, internal linker) with a scanner faithful to the erratum specification from LLVM lld's AArch64ErrataFix.cpp:

  • 186,628 ADRP instructions total
  • 387 at page offsets 0xFF8 or 0xFFC
  • 2 full erratum 843419 pattern matches
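The first two counts come from a simple pass over the text section. The following is a minimal sketch of that pass, not the scanner actually used: countADRP is a hypothetical name, and it assumes the caller has already extracted the .text bytes and their load address (for example with debug/elf).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// isADRP reports whether insn is an ADRP instruction.
func isADRP(insn uint32) bool { return insn&0x9F000000 == 0x90000000 }

// countADRP walks text, which is loaded at the 4-byte-aligned address base,
// and returns the number of ADRP instructions and how many of them sit at
// page offset 0xFF8 or 0xFFC of a 4KB page.
func countADRP(text []byte, base uint64) (total, atRisk int) {
	for off := 0; off+4 <= len(text); off += 4 {
		insn := binary.LittleEndian.Uint32(text[off:])
		if !isADRP(insn) {
			continue
		}
		total++
		if po := (base + uint64(off)) & 0xFFF; po == 0xFF8 || po == 0xFFC {
			atRisk++
		}
	}
	return total, atRisk
}

func main() {
	// Synthetic input: two NOPs followed by three ADRP X27 instructions,
	// placed so that the ADRPs land at 0xFF8, 0xFFC and 0x1000.
	text := []byte{
		0x1F, 0x20, 0x03, 0xD5, // NOP
		0x1F, 0x20, 0x03, 0xD5, // NOP
		0x1B, 0x00, 0x00, 0x90, // ADRP X27, <page>
		0x1B, 0x00, 0x00, 0x90, // ADRP X27, <page>
		0x1B, 0x00, 0x00, 0x90, // ADRP X27, <page>
	}
	total, atRisk := countADRP(text, 0xFF0)
	fmt.Println(total, atRisk)
}
```

An ADRP at one of the two page offsets is only a candidate; the full pattern also depends on the instructions that follow it.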

The erratum requires a specific multi-instruction sequence (not just ADRP + immediate next instruction):

Match 1 (page offset 0xFFC, 4-instruction variant):

0x7faffc: ADRP X27, 0x1c78000       ; Pos1: ADRP at page offset 0xFFC
0x7fb000: LDR  X1, [X27, #1856]     ; Pos2: load using X27, does not write X27
0x7fb004: ADRP X27, 0x1ca9000       ; Pos3: non-branch instruction
0x7fb008: LDR  W2, [X27, #608]      ; Pos4: unsigned-imm load, base=X27 -- AFFECTED

Match 2 (page offset 0xFF8, 4-instruction variant):

0x81cff8: ADRP X27, 0x1cab000       ; Pos1: ADRP at page offset 0xFF8
0x81cffc: STR  X3, [X27, #1080]     ; Pos2: store using X27, does not write X27
0x81d000: ADRP X27, 0x1c80000       ; Pos3: non-branch instruction
0x81d004: STR  X2, [X27, #1888]     ; Pos4: unsigned-imm store, base=X27 -- AFFECTED

An older build of the same binary (v1.5.1, ~25MB) has 10 matches -- the count varies with binary layout.

If the erratum causes one of these to compute a wrong page address, the resulting load/store silently corrupts data. This corrupted data can propagate and crash at a completely different code location. The crash at pc=0xbfe048 is a corrupted pointer dereference: a valid *HMAC heap pointer on the stack (at [SP+0x80]) gets replaced with 0x40 during a function call, and the subsequent LDP X5, X0, [X5, #64] faults at 0x40+64=0x80.

This explains the observed behavior:

  • Same PC across all crashes (same binary layout, same code addresses)
  • Different heap addresses (corruption depends on code layout, not data layout)
  • ARM64-only (erratum 843419 is specific to Cortex-A53 cores)
  • Intermittent (depends on GC timing coinciding with the erratum-affected code path)
  • The crash was not observed after rebuilding with -linkmode=external (consistent with the system linker applying the workaround)

Evidence

Observation        | Detail
Fault address      | addr=0x80 = corrupted pointer 0x40 + struct field offset 64
PC consistency     | Same pc=0xbfe048 across crashes, heap addrs differ
Corruption window  | Stack slot valid before BLR, corrupted after BLR return
Processor          | Cortex-A53 (affected by erratum 843419)
Linker             | Go internal linker (no erratum workaround)
Binary scan        | 2 confirmed erratum 843419 pattern matches in crashing binary (10 in older build)
External linker    | Crash not observed after rebuilding with -linkmode=external

Proposed fix

The internal ARM64 linker (cmd/link/internal/arm64/) should implement a workaround for Cortex-A53 erratum 843419, as GNU ld and LLVM lld already do.

Detection

After address assignment (the address() pass in cmd/link/internal/ld/data.go), when processing R_ADDRARM64 relocations that produce ADRP instructions, check whether the ADRP lands at page offset 0xFF8 or 0xFFC. If so, check the following 2-3 instructions for the erratum sequence as specified in the ARM errata document and implemented in LLVM lld (AArch64ErrataFix.cpp):

  • 3-insn variant (ADRP at 0xFFC): Pos2 is a qualifying load/store that does not write to the ADRP destination register (neither as a load target nor through base-register writeback); Pos3 is an unsigned-immediate load/store using the ADRP destination register as base.
  • 4-insn variant (ADRP at 0xFF8 or 0xFFC): Pos2 as above; Pos3 is any non-branch instruction; Pos4 is an unsigned-immediate load/store using the ADRP destination register as base.

The ADRP opcode is identified by (insn & 0x9F000000) == 0x90000000.
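The detection rules above can be sketched as a small matcher. This is a simplified illustration under stated assumptions, not lld's implementation or a proposed patch. It only recognises Pos2 when it is itself an unsigned-immediate load/store (the form seen in both matches in this binary), whereas the full erratum covers more load/store forms. It also accepts either variant at both page offsets, which is an assumption based on lld's scanner and may over-report relative to the bullets above. The names match843419 and ldStUImmWrites are hypothetical, and the ADRP page immediates in the example are zeroed because they do not affect detection.

```go
package main

import "fmt"

func isADRP(insn uint32) bool { return insn&0x9F000000 == 0x90000000 }

// isLdStUImm matches the A64 "load/store register (unsigned immediate)" class.
func isLdStUImm(insn uint32) bool { return insn&0x3B000000 == 0x39000000 }

func isBranch(insn uint32) bool {
	return insn&0x7C000000 == 0x14000000 || // B, BL
		insn&0xFE000000 == 0x54000000 || // B.cond
		insn&0x7C000000 == 0x34000000 || // CBZ, CBNZ, TBZ, TBNZ
		insn&0xFE000000 == 0xD6000000 // BR, BLR, RET
}

// ldStUImmWrites reports whether an unsigned-immediate load/store writes
// general register r. PRFM is treated like a load here, which can only make
// the sketch miss a match, never invent one.
func ldStUImmWrites(insn, r uint32) bool {
	isLoad := insn&0x00C00000 != 0 // opc != 00
	isFP := insn&0x04000000 != 0   // V bit: SIMD/FP register transfer
	return isLoad && !isFP && insn&0x1F == r
}

// match843419 returns 3 or 4 (the matched variant length) if the four
// instructions starting at addr form the pattern, and 0 otherwise.
func match843419(addr uint64, insns [4]uint32) int {
	if po := addr & 0xFFF; po != 0xFF8 && po != 0xFFC {
		return 0
	}
	if !isADRP(insns[0]) {
		return 0
	}
	rd := insns[0] & 0x1F // ADRP destination register
	if !isLdStUImm(insns[1]) || ldStUImmWrites(insns[1], rd) {
		return 0
	}
	usesRd := func(insn uint32) bool {
		return isLdStUImm(insn) && (insn>>5)&0x1F == rd
	}
	if usesRd(insns[2]) {
		return 3
	}
	if !isBranch(insns[2]) && usesRd(insns[3]) {
		return 4
	}
	return 0
}

func main() {
	// Match 1 from the scan, hand-encoded with zeroed ADRP immediates:
	// ADRP X27; LDR X1,[X27,#1856]; ADRP X27; LDR W2,[X27,#608]
	m1 := [4]uint32{0x9000001B, 0xF943A361, 0x9000001B, 0xB9426362}
	fmt.Println(match843419(0x7faffc, m1))
}
```

Running the matcher on the same four instructions at an address that is not at page offset 0xFF8 or 0xFFC returns 0, which is why the match count changes from build to build as the layout shifts.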

Mitigation

Two approaches are used by other linkers:

NOP padding. Insert a NOP before the ADRP to shift it to a safe page offset. This requires adjusting the layout, similar to how the existing trampoline mechanism works. The trampoline() function in arm64/asm.go already inserts extra instructions during the address assignment loop.

Branch to veneer. Replace the triggering sequence with a branch to a stub in a separate section. The stub performs the ADRP + load/store at a safe address and branches back. This is what GNU ld does with --fix-cortex-a53-843419, and it avoids re-laying out the entire text section.

The trampoline infrastructure in arm64/asm.go (gentramp(), gentrampgot()) already generates ADRP+ADD+BR veneers for out-of-range branches. A similar mechanism could generate erratum-safe veneers for affected ADRP sequences.

Where to implement

The archreloc() function in cmd/link/internal/arm64/asm.go handles all ADRP relocations. At this point the final address of each ADRP instruction is known, so this is a natural place to detect the erratum pattern. Because the layout is already fixed when archreloc() runs, any veneer it redirects to would have to be reserved earlier, in the address assignment loop where trampoline() already inserts trampolines for out-of-range branches.

Workaround

Users on Cortex-A53 and other affected cores can build with the external linker:

CGO_ENABLED=1 go build -ldflags="-linkmode=external" ./...

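The system linker (GNU ld, LLVM lld) implements --fix-cortex-a53-843419. Many aarch64 toolchains enable it by default; where that is not the case, it can be requested explicitly by adding -extldflags=-Wl,--fix-cortex-a53-843419 to the -ldflags value.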

Affected platforms

Cortex-A53 is widely deployed:

  • Raspberry Pi 3/3B+ (quad Cortex-A53)
  • NXP i.MX8M family (i.MX8MPlus, i.MX8MMini, i.MX8MNano)
  • Rockchip RK3399 (A53 little cores)
  • Allwinner H5/H6

All Go programs built with the internal linker for GOARCH=arm64 and running on these platforms are potentially affected.
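To check whether a given Linux device has Cortex-A53 cores, the MIDR fields in /proc/cpuinfo can be inspected: Arm's implementer code is 0x41 and the Cortex-A53 part number is 0xd03. The sketch below is a rough aid, not a definitive test: hasCortexA53 is a hypothetical helper, it assumes the usual arm64 cpuinfo layout, and it does not look at the core revision, so it cannot tell whether a particular part is one of the affected revisions.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hasCortexA53 reports whether the given /proc/cpuinfo text lists at least
// one core with implementer 0x41 (Arm) and part 0xd03 (Cortex-A53).
func hasCortexA53(cpuinfo string) bool {
	armCore := false
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			armCore = false // a blank line ends the current processor block
			continue
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		switch key {
		case "CPU implementer":
			armCore = strings.EqualFold(val, "0x41")
		case "CPU part":
			if armCore && strings.EqualFold(val, "0xd03") {
				return true
			}
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	fmt.Println("Cortex-A53 present:", hasCortexA53(string(data)))
}
```

On big.LITTLE parts such as the RK3399 the function returns true as soon as any A53 core is listed, since a goroutine may be scheduled on any core.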
