Skip to content

Conversation

@mvds00
Copy link
Contributor

@mvds00 mvds00 commented Jun 22, 2023

For a bit of background see https://github.com/orgs/micropython/discussions/11807

MicroPython code may depend on the return value of sys.stdout.buffer.write() to reflect the number of bytes actually written. In most cases a write() succeeds, but in case it doesn't, data gets lost without any way to know it, if write() simply returns the number of bytes that should have been written.

One reason why a write may fail, is where USB is used and the receiving end doesn't read fast enough to clear out the receive buffer. In that case, write on the MicroPython side will timeout, and without this patch, part of the data is lost. This behavior was observed between a Pi Pico as client and a Linux host using the ACM driver.

This patch addresses this issue by at least moving the responsibility of supplying a return value from the core code in sys_stdio_mphal.c to the respective mp_hal_stdout_tx_strn() functions in each port. The next step is to make each port return the number of bytes actually written. For some ports, this is already implemented by this patch: where the write() system call is used, or tud_cdc_write() is used to write to USB.

A tricky problem is where mp_hal_stdout_tx_strn() has multiple outputs, e.g. USB combined with dupterm and/or hardware UART. In such cases, only successfully written bytes should probably be sent to dupterm, or else a second attempt to submit the same data will show on dupterm as duplicated data, and/or dupterm shows data that is not visible on USB.

sys.stdout.write() is a bit more difficult (but not impossible) to fix as it performs data modification (aka "cooked" output). It is kept as-is in this patch. The tradeoff made here is that developers that really care about the return value of write(), should probably use sys.stdout.buffer.write() anyway.

This patch may break some existing code, but the expectation is that it fixes existing code as well, given the fact that the return value oddness of sys.stdout.buffer.write() is not documented and probably not well known, so it is not likely that code depends on the prior (broken) behavior.

@github-actions
Copy link

github-actions bot commented Jun 22, 2023

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:   +24 +0.013% 
   unix x64:   +32 +0.004% standard
      stm32:   +72 +0.018% PYBV10
     mimxrt:   +72 +0.020% TEENSY40
        rp2:   +72 +0.022% RPI_PICO
       samd:   +80 +0.031% ADAFRUIT_ITSYBITSY_M4_EXPRESS

@codecov
Copy link

codecov bot commented Jun 22, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (5d28bb4) 98.40% compared to head (3bca93b) 98.40%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #11850   +/-   ##
=======================================
  Coverage   98.40%   98.40%           
=======================================
  Files         159      159           
  Lines       21088    21088           
=======================================
  Hits        20752    20752           
  Misses        336      336           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@robert-hh
Copy link
Contributor

robert-hh commented Jun 23, 2023

It seems wrong for me to restrict the amount of data sent though the dupterm channel to the number that was sent to USB. dupterm is e.g. used by webrepl, and that must not lock up if USB does or is simply not available.

@mvds00
Copy link
Contributor Author

mvds00 commented Jun 23, 2023

It seems wrong for me to restrict the amount of data sent though the dupterm channel to the number that was sent to USB. dupterm is e.g. used by webrepl, and that must not lock up if USB does or is simply not available.

The logic of writing the number of bytes sent to USB is only followed if USB is connected: if tud_cdc_connected() is false, ret=len and mp_hal_stdout_tx_strn() pretends write was successful and all data is sent to dupterm.

Is this approach acceptable, or should I make dupterm simply receive all bytes always?

@mvds00 mvds00 force-pushed the pullreq1 branch 2 times, most recently from 5aa3ec9 to 6984c93 Compare October 21, 2023 09:36
@mvds00
Copy link
Contributor Author

mvds00 commented Oct 21, 2023

I rebased the pull requests on the current master branch of micropython. Are there any issues that need to be resolved?

Copy link
Contributor

@projectgus projectgus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @thingsconnected,

Thanks for opening this and for reminding us about it. @dpgeorge, @jimmo and myself just had a bit of a discussion about it.

In general, it looks like a good change. It's always possible for a USB CDC device to stop sending data to the host (either intermittently if the host becomes overloaded and doesn't request new transfers, or permanently if the host OS allows the program to close the serial port without de-asserting RTS & DTR. Windows at least sometimes does this.) The current behaviour of silently dropping bytes isn't ideal, as you've found.

As pointed out by @robert-hh, truncating the writes to os.dupterm() can have weird failure modes as well - for example in the second case describe above, if the USB CDC stops accepting data permanently. In that case 0 bytes will also be written to os.dupterm() on each write, and if using print() then the caller will not see that. In the case of something like a WebREPL this means that the USB port state can permanently freeze the WebREPL.

It's fundamentally hard to come up with an API that covers the failure modes of writing to multiple different places in one function call.

We settled on this: mp_hal_stdout_tx_strn() should attempt to write len bytes to all of the possible destinations for that data, and return the minimum successful write length. So in the case of a single destination, the result is always accurate. In the case of multiple destinations, a "short write" result will indicate that at least one destination received a short write and it's up to the caller to decide what to do from there.

(It's unclear exactly where to document this, maybe add a description of sys.stdout.buffer in the sys docs and note it there??)

Would you be up for making changes along those lines?

@mvds00 mvds00 force-pushed the pullreq1 branch 6 times, most recently from 9a892a9 to 89dc33e Compare October 24, 2023 23:30
@mvds00
Copy link
Contributor Author

mvds00 commented Oct 24, 2023

I tried to implement the logic as requested, and agree that anyone interested in the return value of sys.stdout.buffer.write() would probably want the most conservative value, but it gets pretty ugly, especially when considering "no output available/enabled" scenarios, and when considering scenarios with 0..N dupterm outputs.

@mvds00
Copy link
Contributor Author

mvds00 commented Nov 2, 2023

@projectgus I just rebased the PR so that it merges again after the teensy port was deleted. Anything else that I can do to help?


void mp_os_dupterm_tx_strn(const char *str, size_t len) {
int mp_os_dupterm_tx_strn(const char *str, size_t len) {
// Returns the minimum successful write length, or -1 if no write is attempted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for simplicity it can return len if no write is attempted? Because technically it's true, that the whole write did succeed. That should simplify the logic for most callers of this function.

And then the return type can be size_t and match the type of len.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done to keep logic in line with the logic in ports/nrf/drivers/bluetooth/ble_uart.c and if (MP_STATE_PORT(board_stdio_uart) == NULL) in nrf/mphalport.c, where @projectgus and me agreed that "no connection == no bytes written" was logical, see #11850 (comment) above. If the dupterm outputs are all unconnected, it would not be logical to report that bytes were written.

(Possible separate issue: shouldn't mp_hal_stdout_tx_strn() in blue_uart.c call dupterm?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a place where the caller performs different logic based on 0 or -1 being returned?

It seems like in most places the caller's logic is if if (ret < 0) ret = 0;, in which case it seems like these functions can return 0 in both cases for "zero bytes written".

But I feel like I might be missing a nuance here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a nuance, and there are places where a return value of 0 is handled differently than -1. The logic is: if no write attempt had to be made in mp_os_dupterm_tx_strn(), it is basically a void call and it indicates nothing was done and hence nothing failed/succeeded, and the "bytes written" result of mp_os_dupterm_tx_strn() should not even be considered. However, if an attempt was made, it is of interest whether it wrote any bytes at all.

See e.g. mp_hal_stdout_tx_strn() in ports/stm32/mphalport.c:

MP_WEAK mp_uint_t mp_hal_stdout_tx_strn(const char *str, size_t len) {
    mp_uint_t ret = len;
    bool did_write = false;
    if (MP_STATE_PORT(pyb_stdio_uart) != NULL) {
        uart_tx_strn(MP_STATE_PORT(pyb_stdio_uart), str, len);
        did_write = true;
    }
    #if 0 && defined(USE_HOST_MODE) && MICROPY_HW_HAS_LCD
    lcd_print_strn(str, len);
    #endif
    int dupterm_res = mp_os_dupterm_tx_strn(str, len);
    if (dupterm_res >= 0) {
        did_write = true;
        ret = MIN((mp_uint_t)dupterm_res, ret);
    }
    return did_write?ret:0;
}

In case the pyb_stdio_uart write succeeds, we have ret == len. If then mp_os_dupterm_tx_strn() would have nothing to do (no outputs so nothing to attempt) and would return zero, then this function mp_hal_stdout_tx_strn() would return zero as well, indication something went wrong, while everything would actually be fine.

I think there is no shortcut here for mp_os_dupterm_tx_strn(), it basically has three main results: no write attempted, write attempted but (partly) failed, write succeeded completely. Any simplification there would require simplification in the higher level logic. One overall simplification could be to rework the entire thing and create a single list of outputs, i.e. make every output a "dupterm" output. But that would be a lot of work on multiple levels, with marginal benefits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me, thanks for the detailed explanation. I agree the approach in this PR is the best way to proceed.

Will leave this for @dpgeorge to pick up when he has time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no shortcut here for mp_os_dupterm_tx_strn(), it basically has three main results: no write attempted, write attempted but (partly) failed, write succeeded completely.

Yes, I agree, it needs these three distinct return scenarios.

@mvds00 mvds00 force-pushed the pullreq1 branch 2 times, most recently from f87b37f to a299bb1 Compare November 3, 2023 11:07
@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Nov 9, 2023
@mvds00
Copy link
Contributor Author

mvds00 commented Nov 15, 2023

Hi @projectgus, is there anything that I need to do still, to get this PR through?

Copy link
Contributor

@projectgus projectgus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thingsconnected Thanks for putting in all this extra work!

The only thing I can see here is the unresolved discussion above.


void mp_os_dupterm_tx_strn(const char *str, size_t len) {
int mp_os_dupterm_tx_strn(const char *str, size_t len) {
// Returns the minimum successful write length, or -1 if no write is attempted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a place where the caller performs different logic based on 0 or -1 being returned?

It seems like in most places the caller's logic is if if (ret < 0) ret = 0;, in which case it seems like these functions can return 0 in both cases for "zero bytes written".

But I feel like I might be missing a nuance here?

@mvds00
Copy link
Contributor Author

mvds00 commented Dec 20, 2023

I rebased the pull request so that it applies flawlessly (only ports/cc3200/hal/cc3200_hal.c and ports/nrf/mphalport.c needed some attention). The only failing check is due to unix port / coverage malfunctioning.

It would be great if the PR could be merged and sys.stdout.buffer.write() would start returning the number of bytes written, before yet another port is modified...

@dpgeorge dpgeorge added this to the release-1.22.0 milestone Dec 21, 2023
@dpgeorge
Copy link
Member

@thingsconnected I see this PR has 3 commits, but the third one is cleaning up the logic from the first commit.

I suggest the commits be rearranged as follows:

  • the second one comes first, ie "extmod/os_dupterm.c: Let mp_os_dupterm_tx_strn() return bytes written." is the first commit
  • the other two commits are squashed together and appear as the second commit

@mvds00
Copy link
Contributor Author

mvds00 commented Dec 21, 2023

@thingsconnected I see this PR has 3 commits, but the third one is cleaning up the logic from the first commit.

I suggest the commits be rearranged as follows:

* the second one comes first, ie "extmod/os_dupterm.c: Let mp_os_dupterm_tx_strn() return bytes written." is the first commit

* the other two commits are squashed together and appear as the second commit

Agree, done! Also rewrote the commit message. Code is kept equal.

In case of multiple outputs, the minimum successful write length is
returned.  In line with this, in case any output has a write error, zero is
returned.

In case of no outputs, -1 is returned.

The return value can be used to assess whether writes were attempted, and
if so, whether they succeeded.

Signed-off-by: Maarten van der Schrieck <maarten@thingsconnected.nl>
MicroPython code may rely on the return value of sys.stdout.buffer.write()
to reflect the number of bytes actually written. While in most scenarios a
write() operation is successful, there are cases where it fails, leading to
data loss. This problem arises because, currently, write() merely returns
the number of bytes it was supposed to write, without indication of
failure.

One scenario where write() might fail, is where USB is used and the
receiving end doesn't read quickly enough to empty the receive buffer. In
that case, write() on the MicroPython side can timeout, resulting in the
loss of data without any indication, a behavior observed notably in
communication between a Pi Pico as a client and a Linux host using the ACM
driver.

A complex issue arises with mp_hal_stdout_tx_strn() when it involves
multiple outputs, such as USB, dupterm and hardware UART. The challenge is
in handling cases where writing to one output is successful, but another
fails, either fully or partially. This patch implements the following
solution:

mp_hal_stdout_tx_strn() attempts to write len bytes to all of the possible
destinations for that data, and returns the minimum successful write
length.

The implementation of this is complicated by several factors:
- multiple outputs may be enabled or disabled at compiled time
- multiple outputs may be enabled or disabled at runtime
- mp_os_dupterm_tx_strn() is one such output, optionally containing
  multiple additional outputs
- each of these outputs may or may not be able to report success
- each of these outputs may or may not be able to report partial writes

As a result, there's no single strategy that fits all ports, necessitating
unique logic for each instance of mp_hal_stdout_tx_strn().

Note that addressing sys.stdout.write() is more complex due to its data
modification process ("cooked" output), and it remains unchanged in this
patch. Developers who are concerned about accurate return values from
write operations should use sys.stdout.buffer.write().

This patch might disrupt some existing code, but it's also expected to
resolve issues, considering that the peculiar return value behavior of
sys.stdout.buffer.write() is not well-documented and likely not widely
known. Therefore, it's improbable that much existing code relies on the
previous behavior.

Signed-off-by: Maarten van der Schrieck <maarten@thingsconnected.nl>
@dpgeorge dpgeorge merged commit 3bca93b into micropython:master Dec 21, 2023
@dpgeorge
Copy link
Member

Thanks for updating the commits. Rebased and merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

py-core Relates to py/ directory in source

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants