Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ot] Do not drop data on PTY write if it is connected #23

Closed
wants to merge 1 commit into from

Conversation

pamaury
Copy link

@pamaury pamaury commented Aug 22, 2023

The current code in char-pty.c is a bit odd: is uses poll to determine when it can read from a device, but it does not poll for write. Instead, it uses a timer to regularly determine whether the PTY is connected. We can always determine the connection status of a PTY by polling the HUP flag (up for disconnected, down for connected). This commit refactors the code to get rid of the timer and simply poll with no delay on every write to see if it connected.

A better solution would be to use an I/O watch with a callback.

Context: we want to use a PTY to connect the UART to opentitantool. However with the current code, the connection status might take up to a second to be updated so if we want to avoid dropping data from early boot, we need to wait a whole second doing nothing on startup. This will prevent this delay.

The current code in char-pty.c is a bit odd: is uses poll to
determine when it can read from a device, but it does not poll
for write. Instead, it uses a timer to regularly determine whether
the PTY is connected. We can always determine the connection status
of a PTY by polling the HUP flag (up for disconnected, down for
connected). This commit refactors the code to get rid of the timer
and simply poll with no delay on every write to see if it connected.

A better solution would be to use an I/O watch with a callback.

Signed-off-by: Amaury Pouly <amaury.pouly@lowrisc.org>
Copy link

@jwnrt jwnrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding and fixing this @pamaury

I think I understand the change: remove the timer code and do the poll on writes as well as the existing reads.

Is this patch suitable for upstreaming to QEMU? I'm guessing you don't think so based on:

A better solution would be to use an I/O watch with a callback.

Comment on lines +59 to +61
/* If a PTY is connected, the HUP flag is cleared so
* we can poll the connection status this way. */
pfd.events = G_IO_HUP;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For opentitantool we have exclusive access to the PTY so this isn't an issue, however do you know what happens with HUP when there are two readers and only one of them disconnects? Can you even have two readers on a PTY?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried on my machine and I could not get a second terminal to connect (it said that the device is locked).

Comment on lines +69 to 72
static void pty_chr_update_read_handler(Chardev *chr)
{
char_pty_poll_connection_status(chr);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this function wrapper do?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pty_chr_update_read_handler is using by QEMU to update the reading side of the device. I am sure a QEMU export can fill the blanks, but the bottom line is that this function was already there and polled the FD. I just factored out the code to char_pty_poll_connection_status.

@rivos-eblot
Copy link

I think you need to discuss this with QEMU upstream. PTY and chardev support is a delicate piece of SW in QEMU, and not breaking stuff might be difficult. You also need to check it works on other Unix flavors (*BSD, macOS, ...) - just trying not opening a can of worms :-)

BTW can't you use sockets to connect the UART?

@jwnrt
Copy link

jwnrt commented Aug 22, 2023

BTW can't you use sockets to connect the UART?

Yes, it just means implementing in-band software flow control manually rather than relying on existing code for serial ports that connect to TTYs.

We've had code mostly working using sockets before, and may go back to using them if this doesn't work out

@rivos-eblot
Copy link

rivos-eblot commented Aug 22, 2023

BTW can't you use sockets to connect the UART?

Yes, it just means implementing flow control manually rather than relying on existing code for serial ports that connect to TTYs.

Do you need flow control at all in this case?

@jwnrt
Copy link

jwnrt commented Aug 22, 2023

Do you need flow control at all in this case?

I'm not sure, I was fine running the OpenTitan UART tests that I tried, but opentitantool has implemented flow control support for the other "transports" (verilator and FPGAs) so I was thinking there might be some tests that rely on it.

Maybe we can go ahead using sockets without flow control and see if anything breaks.

@pamaury
Copy link
Author

pamaury commented Aug 22, 2023

@jwnrt thinks it's simpler to reuse the PTY code that we have for real hardware since it's supported by QEMU.

I don't know if we can upstream that change, ideally yes but as you said it's probably quite tedious to check it on every single platform.

@rivos-eblot
Copy link

Note that sockets in QEMU are raw sockets so if you need to use out-of-bad data, you will need to provide your own chardev backend. It seems quite some work - not sure it is worth it.

@rivos-eblot
Copy link

rivos-eblot commented Aug 22, 2023

@jwnrt thinks it's simpler to reuse the PTY code that we have for real hardware since it's supported by QEMU.

I would tend to think it is much simpler to change OT tools than attempting to change anything upstream :-) (personal note: I would never touch this piece of code myself, too many risks of unexpected regressions)

@jwnrt
Copy link

jwnrt commented Aug 23, 2023

Thanks for the guidance @rivos-eblot, I'm going to try shuffle opentitantool around so we can reuse the software flow control code over a socket as well as the PTY

@pamaury
Copy link
Author

pamaury commented Aug 23, 2023

By some incredible luck, someome posted a very similar patch to the qemu devel mailing list a few days ago:
https://lore.kernel.org/qemu-devel/20230816210743.1319018-1-thuth@redhat.com/
I have sent a message saying that we are interested in seeing this change happen.

@jwnrt
Copy link

jwnrt commented Aug 24, 2023

I've updated my opentitantool PR to use files as the UART connection so it can share the verilator code.

If that patch lands in upstream QEMU, maybe we can cherry pick it in and switch to using the serial directly, but for now the file-based connection seems to work fine.

@rivos-eblot
Copy link

I've updated my opentitantool PR to use files as the UART connection so it can share the verilator code.

Is this PR lowRISC/opentitan#19471 ?

@jwnrt
Copy link

jwnrt commented Sep 6, 2023

Is this PR lowRISC/opentitan#19471 ?

Yes, that's right.

That PR is based on / waiting on lowRISC/opentitan#19420 which is for having Bazel pass the QEMU flags. We're just missing that flag to ignore the incorrect flashes.

loiclefort pushed a commit that referenced this pull request Dec 5, 2023
virtio_load() as a whole should run in coroutine context because it
reads from the migration stream and we don't want this to block.

However, it calls virtio_set_features_nocheck() and devices don't
expect their .set_features callback to run in a coroutine and therefore
call functions that may not be called in coroutine context. To fix this,
drop out of coroutine context for calling virtio_set_features_nocheck().

Without this fix, the following crash was reported:

  #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
  #1  0x00007efc738c05d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
  #2  0x00007efc73873d26 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
  #3  0x00007efc738477f3 in __GI_abort () at abort.c:79
  #4  0x00007efc7384771b in __assert_fail_base (fmt=0x7efc739dbcb8 "", assertion=assertion@entry=0x560aebfbf5cf "!qemu_in_coroutine()",
     file=file@entry=0x560aebfcd2d4 "../block/graph-lock.c", line=line@entry=275, function=function@entry=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:92
  #5  0x00007efc7386ccc6 in __assert_fail (assertion=0x560aebfbf5cf "!qemu_in_coroutine()", file=0x560aebfcd2d4 "../block/graph-lock.c", line=275,
     function=0x560aebfcd34d "void bdrv_graph_rdlock_main_loop(void)") at assert.c:101
  #6  0x0000560aebcd8dd6 in bdrv_register_buf ()
  #7  0x0000560aeb97ed97 in ram_block_added.llvm ()
  #8  0x0000560aebb8303f in ram_block_add.llvm ()
  #9  0x0000560aebb834fa in qemu_ram_alloc_internal.llvm ()
  #10 0x0000560aebb2ac98 in vfio_region_mmap ()
  #11 0x0000560aebb3ea0f in vfio_bars_register ()
  #12 0x0000560aebb3c628 in vfio_realize ()
  #13 0x0000560aeb90f0c2 in pci_qdev_realize ()
  #14 0x0000560aebc40305 in device_set_realized ()
  #15 0x0000560aebc48e07 in property_set_bool.llvm ()
  #16 0x0000560aebc46582 in object_property_set ()
  #17 0x0000560aebc4cd58 in object_property_set_qobject ()
  #18 0x0000560aebc46ba7 in object_property_set_bool ()
  #19 0x0000560aeb98b3ca in qdev_device_add_from_qdict ()
  #20 0x0000560aebb1fbaf in virtio_net_set_features ()
  #21 0x0000560aebb46b51 in virtio_set_features_nocheck ()
  #22 0x0000560aebb47107 in virtio_load ()
  #23 0x0000560aeb9ae7ce in vmstate_load_state ()
  #24 0x0000560aeb9d2ee9 in qemu_loadvm_state_main ()
  #25 0x0000560aeb9d45e1 in qemu_loadvm_state ()
  #26 0x0000560aeb9bc32c in process_incoming_migration_co.llvm ()
  #27 0x0000560aebeace56 in coroutine_trampoline.llvm ()

Cc: qemu-stable@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-832
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20230905145002.46391-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 92e2e6a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
@jwnrt
Copy link

jwnrt commented Dec 8, 2023

By some incredible luck, someome posted a very similar patch to the qemu devel mailing list a few days ago: https://lore.kernel.org/qemu-devel/20230816210743.1319018-1-thuth@redhat.com/ I have sent a message saying that we are interested in seeing this change happen.

That fix landed upstream in (I think) 8.2.0, so maybe we can close this PR and wait until the next rebase on QEMU: qemu@4f7689f

@jwnrt
Copy link

jwnrt commented Dec 8, 2023

Apologies, it actually landed pre-8.1.2 so we have the fix in our tree now. I'll close this PR.

@jwnrt jwnrt closed this Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants