jni/ns16550a_bridge: simulate TX-empty rising edge on drain #2
Merged
The bridge's CHARDEV_TX flag means "ring has space for more guest writes", which for the 64 KB ring is effectively always true in practice. As a result ns16550a_notify kept short-circuiting on an unchanged flag word, and the UART never re-raised its TX-empty IRQ after the kernel enabled IER.THR and handled the first interrupt.

The kernel's IRQ-driven 8250 TX path then deadlocked: a single-byte `echo > ttyS1` hangs inside close() while waiting for the tty to drain, whereas the console port (polled TX, no IRQ dependency) works fine. Running `setserial ttyS1 irq 0` makes writes complete instantly, which confirms the diagnosis.

Fix: whenever Java actually drains bytes out of the ring, pulse the flags through ns16550a_notify, first with TX cleared and then with TX restored. This emulates the THRE low→high transition real hardware exhibits after each transmit. The pulse is skipped when zero bytes were drained, to avoid spurious IRQ re-fires on every tick.
SolAstrius added a commit that referenced this pull request on Apr 27, 2026
Linux's snd_hdac_bus_init_cmd_io (sound/hda/core/controller.c) does:
writew(CORBRP, AZX_CORBRP_RST);              // bit 15 = 1
for (timeout = 1000; timeout > 0; timeout--) {
    if (readw(CORBRP) & AZX_CORBRP_RST)      // poll for bit 15 = 1
        break;
    udelay(1);
}
// ↑ "CORB reset timeout #1" if poll times out
writew(CORBRP, 0);
for (timeout = 1000; timeout > 0; timeout--) {
    if (readw(CORBRP) == 0)                  // poll for bit 15 = 0 (whole register reads 0)
        break;
    udelay(1);
}
// ↑ "CORB reset timeout #2" if poll times out
Our handler reset corb_rp=0 immediately on the bit-15 write but never
echoed bit 15 back on read, so Linux's first poll always timed out
(1000 µs of busy-waiting, then the dev_err message). From there the
failure cascades into spurious response timeouts during codec
discovery: Linux falls back to polling mode and retries, and somewhere
in that chaos the widget-caps response for NID 2 is dropped or comes
back as garbage. Linux then mis-classifies the topology: NID 2
disappears entirely from the codec dump, and snd_hda_codec_generic
configures the pin (NID 3) as "mono_out=0x3" with no converter
underneath. No PCM device is created, no Master mixer control exists,
and speaker-test fails with ENOENT.
Fix: store corb_rprst (bit 15) separately from corb_rp (bits 7:0) and
echo it on read. SW writes 1 → set rprst and reset corb_rp to 0; SW
writes 0 → clear rprst and store the new RP value. Both Linux poll
loops now exit on the first iteration.
This regression went unnoticed because all the standalone smoke
boots in this session ran rvvm_x86_64 without the -hda_test flag,
so HDA was never actually attached to the PCI bus. The user
discovered it when running through the ScalarEvolution mod's full
machine config (which always attaches HDA) and seeing playback fail.
The CORBRP RST handshake has been broken since this device was first
written; before commit 5d10843 (table-driven SD dispatch) the code
also reset corb_rp=0 on a bit-15 write without echoing the bit. It
appeared to work in practice because Linux only logs the timeout and
continues. The real damage was the roughly 1 ms stall (1000 polls ×
1 µs) during which subsequent codec verbs raced. Some guests were
lucky and got a clean topology; others (apparently anyone with the
new Beep widget bumping the subnode count to 3) tipped over.
SolAstrius added a commit that referenced this pull request on May 1, 2026
Summary
Fix
In `Java_..._ns16550a_1bridge_1poll`, after Java actually drains bytes out of the TX ring, pulse the UART's cached flags: notify once with `CHARDEV_TX` cleared, then notify again with the real flags. This emulates the real-hardware THRE low→high transition that occurs after each transmit, giving the 8250 driver the rising edge its IRQ handler expects.
No-op when zero bytes were drained, to avoid spurious IRQ re-fires every host tick.
Repro (before this patch)
On a guest with two NS16550A bridges (console + RPC):