Hard lockup under sustained full CPU load — sunxi-wdt triggers reboot, watchdog timeout error (Radxa Cubie A7S / Allwinner A733) #537

@petrzalskypenzista

Environment
Board: Radxa Cubie A7S
SoC: Allwinner A733
Kernel: 5.15.147-18-a733 #18 SMP PREEMPT
OS: Debian GNU/Linux 11 (bullseye), arm64
Cooling: Active (PWM fan), CPU temp ~50°C during lockup

Description
The board locks up hard after approximately 20 minutes of sustained full CPU load across all 8 cores (6× Cortex-A55 + 2× Cortex-A76). The system becomes completely unresponsive (no ping, no SSH) and either requires a power cycle or is recovered automatically by the hardware watchdog (sunxi-wdt).

Notably, the lockup occurs at normal operating temperature (~50–52°C), ruling out thermal shutdown as the cause. There is no OOM killer activity and no kernel panic recorded in the journal. The hardware watchdog silently resets the board.

A secondary issue is present: systemd reports a failure when attempting to configure the sunxi-wdt timeout, suggesting the watchdog driver does not correctly handle the requested timeout value.

Steps to reproduce

  1. Boot the board with adequate cooling (fan running, CPU temp ~35°C at idle)
  2. Run a sustained multi-core CPU stress workload, e.g.:
    sudo sbc-bench.sh -r
    or
    7zr b -mmt8
  3. Wait approximately 15–25 minutes with all 8 cores at 100% load
  4. Board becomes unresponsive (no ping, no SSH)
  5. No automatic reboot is triggered by the watchdog
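
To capture the last known state before the hang, the steps above could be run alongside a small logger. This is a minimal sketch, assuming the standard Linux sysfs thermal and cpufreq paths are present on this kernel (not verified on the vendor 5.15 BSP). The function names and the log file name are hypothetical.

```python
import glob
import os
import time


def read_int(path):
    """Return the integer stored in a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def sample(sysfs_root="/sys"):
    """Collect one snapshot of thermal zone temps (millidegrees C) and CPU freqs (kHz)."""
    temp_glob = os.path.join(sysfs_root, "class/thermal/thermal_zone*/temp")
    freq_glob = os.path.join(
        sysfs_root, "devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq")
    temps = {
        os.path.basename(os.path.dirname(p)): read_int(p)
        for p in sorted(glob.glob(temp_glob))
    }
    # The path ends in cpuN/cpufreq/scaling_cur_freq, so [-3] is "cpuN".
    freqs = {p.split(os.sep)[-3]: read_int(p) for p in sorted(glob.glob(freq_glob))}
    return {"temps_mC": temps, "freqs_kHz": freqs}


def log_forever(log_path="lockup-trace.log", interval_s=5.0):
    """Append a timestamped snapshot every interval_s seconds.

    Each line is flushed and fsync'ed so the last sample has the best
    chance of surviving a hard lockup or reset.
    """
    with open(log_path, "a") as log:
        while True:
            log.write(f"{time.time():.0f} {sample()}\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)
```

Starting log_forever() in a second session before the stress workload should leave a final sample taken within roughly interval_s of the hang.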

Expected behavior
Board should remain stable indefinitely under full CPU load at normal operating temperatures. If a watchdog is used, its timeout should be configurable without error.

Actual behavior
Board locks up hard after ~20 minutes. The hardware watchdog (sunxi-wdt) does not reset the system. No panic, no OOM, no thermal event: clean watchdog reset with no root cause logged.

Relevant log output (journalctl -b -1)
Apr 17 22:26:23 radxa-cubie-a7s systemd[1]: Using hardware watchdog 'sunxi-wdt', version 0, device /dev/watchdog
Apr 17 22:26:23 radxa-cubie-a7s systemd[1]: Failed to set timeout to 600s: Invalid argument
Apr 17 22:26:24 radxa-cubie-a7s systemd-shutdown[1]: Syncing filesystems and block devices.
Apr 17 22:26:24 radxa-cubie-a7s systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Apr 17 22:26:24 radxa-cubie-a7s systemd-journald[274]: Journal stopped

No kernel panic, no OOM killer, no thermal trip in logs before shutdown.

Searching dmesg for panic|lockup|hung|oom|killed returned no results.
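
The same search can be repeated against a saved log. This is a minimal sketch, assuming the previous boot's kernel messages were exported to a text file (for example with journalctl -b -1 -k). The helper name is hypothetical, and the pattern is the one quoted in this report, matched case-insensitively.

```python
import re

# Pattern quoted in the report; case-insensitive so capitalised variants
# such as "Kernel panic" also match.
SUSPECT = re.compile(r"panic|lockup|hung|oom|killed", re.IGNORECASE)


def suspicious_lines(lines):
    """Return the log lines that contain any of the suspect keywords."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]
```

Because the match is on substrings, unrelated words containing "oom" or "hung" can produce false positives, so any hit still needs to be read in context.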

CPU state at time of lockup
Frequency: 1404–1508 MHz (A55) / 1196–1404 MHz (A76) — normal operation
Temperature: ~51°C (cpub_idle_zone) — well below trip points
Load: 100% all 8 cores
Duration: ~20 minutes of sustained load

Secondary issue: sunxi-wdt timeout configuration failure
systemd[1]: Failed to set timeout to 600s: Invalid argument
The sunxi-wdt driver rejects the timeout value requested by systemd. This may indicate an incorrect maximum timeout value exposed by the driver, or a missing implementation of the timeout-set ioctl for this SoC variant.
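
One way to test the first hypothesis is to compare the requested 600 s against the limits the driver advertises. This is a minimal sketch, assuming the kernel is built with CONFIG_WATCHDOG_SYSFS so that /sys/class/watchdog/watchdog0 exposes min_timeout and max_timeout (not checked on the 5.15.147-18-a733 kernel). The function names are hypothetical. Reading sysfs avoids opening /dev/watchdog, which systemd already holds.

```python
import os


def read_limit(wdt_dir, name):
    """Read one integer attribute (seconds) from a watchdog sysfs directory."""
    try:
        with open(os.path.join(wdt_dir, name)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None  # attribute missing or not numeric


def timeout_in_range(requested_s, min_s, max_s):
    """True if requested_s lies within [min_s, max_s].

    A max_s of 0 or None is treated as "no upper limit advertised".
    """
    if min_s is not None and requested_s < min_s:
        return False
    if max_s and requested_s > max_s:
        return False
    return True


def check_watchdog(requested_s=600, wdt_dir="/sys/class/watchdog/watchdog0"):
    """Report whether requested_s fits the limits advertised in wdt_dir."""
    min_s = read_limit(wdt_dir, "min_timeout")
    max_s = read_limit(wdt_dir, "max_timeout")
    return {"min": min_s, "max": max_s,
            "ok": timeout_in_range(requested_s, min_s, max_s)}
```

If check_watchdog() reports a max below 600, the error would be consistent with a range check rather than a missing ioctl. In the mainline watchdog core an out-of-range value is rejected with EINVAL ("Invalid argument"), while a driver without timeout support yields EOPNOTSUPP. Whether the vendor driver follows the same convention is an assumption.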

Additional notes
— Power supply: 5V/3A, measured current draw ~900mA peak under full load (well within limits)
— A related issue has been filed separately: CPU thermal throttling not released after cooldown (frequency stuck at 416 MHz)
— Both issues observed on kernel 5.15.147-18-a733, no other kernels tested
