jni/ns16550a_bridge: simulate TX-empty rising edge on drain #2

Merged
SolAstrius merged 1 commit into staging from fix/uart-tx-edge Apr 21, 2026
@SolAstrius
Collaborator

Summary

  • The NS16550A JNI bridge's `CHARDEV_TX` flag means "ring has space for more guest writes", which is true almost continuously for the 64 KiB ring.
  • `ns16550a_notify` short-circuits on an unchanged flag word, so once the guest kernel has enabled `IER.THR` and handled the first TX-empty IRQ, the UART never re-raises the TX IRQ on subsequent writes to THR.
  • The kernel's IRQ-driven 8250 TX path then deadlocks in the tty drain at close time: `echo > /dev/ttyS1` hangs inside `close()`. The console port works fine because it uses polled TX and doesn't depend on the IRQ.

Fix

In `Java_..._ns16550a_1bridge_1poll`, after Java actually drains bytes out of the TX ring, pulse the UART's cached flags: notify once with `CHARDEV_TX` cleared, then notify again with the real flags. This emulates the real-hardware THRE low→high transition that occurs after each transmit, giving the 8250 driver the rising edge its IRQ handler expects.

No-op when zero bytes were drained, to avoid spurious IRQ re-fires every host tick.
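
The pulse can be illustrated with a small standalone model. This is only a sketch, not the bridge's actual code: `mock_uart`, `mock_notify` and `pulse_tx_after_drain` are hypothetical names, the flag values are placeholders, and `mock_notify` imitates only the short-circuit behaviour of `ns16550a_notify` described in the summary.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values for this sketch; the real ones come from the
 * emulator's chardev header. */
#define CHARDEV_RX 0x1u
#define CHARDEV_TX 0x2u

/* Hypothetical stand-in for the UART state the bridge notifies. */
typedef struct {
    uint32_t cached_flags;   /* last flag word seen by notify */
    unsigned tx_irqs_raised; /* counts TX-empty rising edges */
} mock_uart;

/* Simplified model of ns16550a_notify: an unchanged flag word is ignored,
 * and a 0 -> 1 transition of CHARDEV_TX counts as a TX-empty IRQ. */
static void mock_notify(mock_uart *uart, uint32_t flags)
{
    if (flags == uart->cached_flags)
        return; /* the short-circuit that hid the missing edge */
    if ((flags & CHARDEV_TX) && !(uart->cached_flags & CHARDEV_TX))
        uart->tx_irqs_raised++;
    uart->cached_flags = flags;
}

/* Assumed shape of the fix: after a drain, notify once with TX cleared and
 * once with the real flags. Does nothing when no bytes were drained. */
static void pulse_tx_after_drain(mock_uart *uart, uint32_t real_flags,
                                 size_t drained)
{
    if (drained == 0)
        return;
    mock_notify(uart, real_flags & ~CHARDEV_TX);
    mock_notify(uart, real_flags);
}
```

In this model, repeating `mock_notify(&uart, CHARDEV_TX)` never raises another edge, while `pulse_tx_after_drain(&uart, CHARDEV_TX, n)` with `n > 0` raises exactly one and leaves the cached flags at their real value.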

Repro (before this patch)

On a guest with two NS16550A bridges (console + RPC):

  • `echo hi > /dev/ttyS0` → completes instantly (console port, polled TX)
  • `echo hi > /dev/ttyS1` → hangs in `close()` (IRQ-driven TX, missing edge)
  • `setserial /dev/ttyS1 irq 0 ; echo hi > /dev/ttyS1` → completes instantly (forces polled TX, confirming the IRQ path is the problem)

The bridge's CHARDEV_TX flag means "ring has space for more guest
writes" — true continuously for the 64KB ring in practical use. That
kept ns16550a_notify short-circuiting on an unchanging flag word,
which meant the UART never re-raised its TX-empty IRQ after the
kernel enabled IER.THR and handled the first interrupt. Kernel's
IRQ-driven 8250 tx path then deadlocked: single-byte `echo > ttyS1`
hangs inside close()'s tty drain, while the console port (polled TX,
no IRQ dependency) works fine. Confirmed by `setserial ttyS1 irq 0`
making writes complete instantly.

Fix: whenever Java actually drains bytes out of the ring, issue a
short flag-pulse to ns16550a_notify — first with TX cleared, then
with TX restored — emulating the THRE low→high transition real
hardware exhibits after each transmit. No-op when zero bytes were
drained, to avoid spurious IRQ re-fires every tick.
@SolAstrius SolAstrius merged commit 7638bb7 into staging Apr 21, 2026
@SolAstrius SolAstrius deleted the fix/uart-tx-edge branch April 26, 2026 21:25
SolAstrius added a commit that referenced this pull request Apr 27, 2026
Linux's snd_hdac_bus_init_cmd_io (sound/hda/core/controller.c) does:

  writew(CORBRP, AZX_CORBRP_RST);           // bit 15 = 1
  for (timeout = 1000; timeout > 0; timeout--)
      if (readw(CORBRP) & AZX_CORBRP_RST)   // poll for bit 15 = 1
          break;
  // ↑ "CORB reset timeout #1" if poll times out

  writew(CORBRP, 0);
  for (timeout = 1000; timeout > 0; timeout--)
      if (readw(CORBRP) == 0)               // poll for bit 15 = 0
          break;
  // ↑ "CORB reset timeout #2" if poll times out

Our handler reset corb_rp=0 immediately on the bit-15 write but never
echoed bit 15 back on read, so Linux's first poll always timed out
(1000 µs of busy-looping, then the dev_err warning). The cascade from
there causes spurious response timeouts during codec discovery —
Linux falls back to polling mode, retries, and somewhere in that
chaos NID 2's widget caps response gets dropped or returns garbage.
Linux mis-classifies the topology: NID 2 disappears entirely from the
codec dump, and snd_hda_codec_generic configures the pin (NID 3) as
"mono_out=0x3" with no converter underneath. No PCM device is
created, no Master mixer control exists, speaker-test errors with
ENOENT.

Fix: store corb_rprst (bit 15) separately from corb_rp (bits 7:0),
echo on read. SW writes 1 → set rprst, reset corb_rp to 0; SW writes
0 → clear rprst, store new RP value. Both Linux poll loops now exit
on the first iteration.

This regression went unnoticed because all the standalone smoke
boots in this session ran rvvm_x86_64 without the -hda_test flag —
HDA wasn't actually attached to the PCI bus. The user discovered it
when running through the ScalarEvolution mod's full machine config
(which always attaches HDA) and seeing playback fail.

The CORBRP RST handshake has been broken since this device was first
written; before commit 5d10843 (table-driven SD dispatch) the code
also reset corb_rp=0 on bit-15 write without echoing the bit. It
worked for users in practice because Linux only logs the timeout as
a warning and continues; the real damage was the 1000 × 1 µs ≈ 1 ms
busy-wait during which subsequent codec verbs raced. Some guests were
lucky and got a clean topology; others (apparently anyone with the
new Beep widget bumping subnode count to 3) tipped over.
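
The CORBRP handshake from the "Fix:" paragraph of this commit message can be sketched as a pair of register handlers. This is a minimal model under stated assumptions, not the device's actual code: `mock_hda`, `corbrp_write` and `corbrp_read` are hypothetical names, and only the two fields named in the message (`corb_rp`, `corb_rprst`) are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define CORBRP_RST     0x8000u /* bit 15: read pointer reset */
#define CORBRP_RP_MASK 0x00FFu /* bits 7:0: read pointer */

/* Hypothetical slice of the HDA controller state. */
typedef struct {
    uint8_t corb_rp;    /* read pointer, bits 7:0 */
    bool    corb_rprst; /* reset bit, stored separately so reads can echo it */
} mock_hda;

/* Guest write to CORBRP: writing bit 15 latches the reset bit and zeroes
 * the pointer; writing it as 0 clears the bit and stores the new pointer. */
static void corbrp_write(mock_hda *hda, uint16_t val)
{
    if (val & CORBRP_RST) {
        hda->corb_rprst = true;
        hda->corb_rp = 0;
    } else {
        hda->corb_rprst = false;
        hda->corb_rp = (uint8_t)(val & CORBRP_RP_MASK);
    }
}

/* Guest read of CORBRP: echo the latched reset bit alongside the pointer. */
static uint16_t corbrp_read(const mock_hda *hda)
{
    return (uint16_t)((hda->corb_rprst ? CORBRP_RST : 0u) | hda->corb_rp);
}
```

With these handlers, a write of `CORBRP_RST` followed by a read returns bit 15 set, and a write of 0 followed by a read returns 0, so both poll loops quoted above would exit on their first iteration.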
SolAstrius added a commit that referenced this pull request May 1, 2026