-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serial not concurrency safe? #1157
Comments
I think this is not a "Forever" concurrency issue. See the following program - https://makecode.microbit.org/_71UgTKfrw4jh The first 3 Forever loops increment or decrement the shared variable, and the last one prints it. The first "iteration" adds a net +1 to the variable (-1 + 1 +1 = 1), the second "iteration" adds a net +2 (2 - 2 +2 = 2), the third, +3 (3 + 3 - 3 = 3), and the fourth resets the variable to 0. On both the simulator and the device, I see the sequence 0 -> 1 -> 3 -> 6 -> 0, which means all 3 Forever loops execute all of their instructions in order (otherwise I would see a different sequence of numbers). This means that the problem is necessarily with USB reading, or with the serial module. |
Yes, it appears that the statements in the loops are indeed executed. https://makecode.microbit.org/_b6vi4diPYbqy and they are apparently correctly incremented, even though the corresponding output to serial does not always show up on the PC. Maybe there is some buffer overwrite problem somewhere? However, if i take the statements of the three serial forever loops and merge them into a single loop in the same order that they (presumably) are executed by the microbit when in the separate concurrent loops, the problem disappears. https://makecode.microbit.org/_DuHaeLhDbKxm So it seems to me that concurrency/context switch possibly plays at least some part in this even though it might not be at fault. --Tomas |
In the DAL, the serial transmit functions can be used by only one fibre at a time. So I think the first forever loop sleeps inside write line until the transmission completes, with a lock set on the transmit functions. When the second loop calls transmit, the lock causes write line to fail. Which loop gets to transmit at each stage is a matter of timing. |
Assuming that the serial package uses the default mode of SYNC_SLEEP when calling e.g. "Send" in the DAL, then a forever loop calling serial.WriteLine while it is busy will simply not have the statement executed. The DAL returns the error code MICROBIT_SERIAL_IN_USE instead of sending anything. The desired behaviour really should be that the fiber instead sleeps, the serial send operation is carried out when serial is no longer busy and then the fiber is woken and resumes execution. This would ensure correct execution of all calls to serial. I suppose that the most efficient implementation would be inside the DAL, but it could also be made in the mapping from javascript to DAL services. (I'm speculating a bit here, as I know very little aboout the internals). If this cannot be fixed, i think it is important to document the concurrency-unsafe behavior of serial. When serial is used for debugging, a text not printed in the terminal window will be taken as evidence that that part of the program for some reason was not executed. The coder will then (as I did) scratch her head, understand nothing and insert further debugging printouts in the code to shed light on what's going on. This may lead to even more confusion as possibly other debug printouts will now be omitted. --Tomas |
The serial writeString impl is at https://github.com/Microsoft/pxt-microbit/blob/96ede2fc749198e805716ed4e7f49a355d2406ba/libs/core/serial.cpp#L109 . It uses the default mode of the DAL serial send https://github.com/lancaster-university/microbit-dal/blob/2cff906f019944ae15d532157c1301c940f67862/inc/drivers/MicroBitSerial.h#L272 which happens to be SYNC_SLEEP https://github.com/lancaster-university/microbit-dal/blob/2cff906f019944ae15d532157c1301c940f67862/inc/core/MicroBitConfig.h#L364 ( @tomas-mbit could you take screenshots in English? It helps finding which block you are using ) |
@finneyj is visiting our offices next week. I'll chat about this with him. |
>@tomas-mbit could you take screenshots in English? It helps finding which block you are using I didn't know how to do this, but soon found out. Here they are. I suppose that if you use the program links that I provided, it will come up at your end in whatever language is in your settings? --Tomas |
SYNC_SLEEP is great for reading operations. SYNC_SPINWAIT would solve the concurrency issue for write operations. From a students perspective, it seems more important to deliver reliably vs freeing up the CPU while transferring. Of course, I'm assuming that usually only small amounts of data are transferred. Thus, a change of the default parameters for write operations might already help a lot. |
thanks for reporting. For the nrf51 on micro:bit I think this makes a lot of sense - it has no DMA support and the transmit buffer is very small (just a handful of bytes). We should also fix the bug though - I guess we missed a mutex lock in there! |
@finneyj Should I prepare a PR for the microbit-dal that sets the default parameter for write to SYNC_SPINWAIT? Or should I update serial.cpp to use SYNC_SPINWAIT? |
@finneyj @tomas-mbit SL2a SL2a SL2a |
Thanks @tomas-mbit yes, please pop that in as a PR for now, which keeps things nice and safe until we get chance to drop in a mutex and test it through. |
@finneyj Done :-) |
thanks! |
@philipphenkel @finneyj Thanks for fixing this. I hope that my initial attempts at testing and analysis were of some help. |
Right now we are waiting on @finneyj to finish review this change. After that, the changes should make it to /beta within a day. |
@tomas-mbit Without your report and the analysis, I would have never connected it to the issue I struggled with. Also, your test program made it pretty convenient to verify the potential fix. Super helpful! |
@finneyj Do you approve philipphenkel@b72ad9a for merge? |
@finneyj did you review philipphenkel@b72ad9a ? |
@pelikhan Apologies. Just read it over. As a stop gap, updating the send() functions to be synchronous seems fine. I'll look into the concurrency issue next time I release microbit-dal, then we can revert back to the more efficient implementation. IMHO the calling fiber should block until the resource is ready. I'm not sure we want to make the receive / receiveUntil busy waiting too though? That could lock out the the CPU for an arbitrary period of time. Do we know there's a similar concurrency issue on the receive side? |
This needs a thorough investigation before going live. I am leaning on adding an optional argument to the serial APIs call that allow to specify the mode. |
Actually, I just misread the code - my bad. The read calls are SYNC_SLEEP, not SYNC_SPIN_WAIT. That's the existing default behaviour, so for receive() calls we're just being more explicit about what we already do. send() calls should be OK with SPIN_WAIT as that is a much more bounded use case (the fiber can only hog the CPU until its buffer runs out...) |
I intended to make the code more explicit so that it is easier to understand. The semantic change should be with send only. |
This is on the DAL side and not makecode. Correct me if I am wrong. |
Yes - DAL side. |
The issue does still exist. I tested with |
Describe the bug
Mulitiple forever loops writing to serial produces erroneous output.
To Reproduce
Run the simple program at https://makecode.microbit.org/_WEga5JYiX1x7
Three forever loops run in parallell. The body of each loop repeats three times the sequence of pausing one second and writing to serial. Each time a different string is written - loop 1 writes SL1a, SL1b, SL1c, loop 2 writes SL2a, SL2b... and so on. Output on Teraterm is incorrectly SL1a, SL2b SL3c, SL1a, SL2b, SL3c, SL1a...
Picture of block program:
Producing this incorrect output:
Expected behavior
For each loop, all the text should be output, and in the correct order (though it could be interleaved with the output from the other loops). E.g. SL1a, SL2a, SL3a, SL1b, SL2b, SL3b.... and so on.
When a "gate" function is added to make sure that serial is entered only by one forever loop at a time, correct output is produced, e.g. this program: https://makecode.microbit.org/_hDo7Ea59THbY
Writing to the LED matrix instead of to serial also works corretly (without any "gate")
Picture of program with "gate":
Producing this correct output:
Could this be an error in restoring an execution pointer in a context switch? Or an overwrite problem in some serial buffer?
The problems occurred in a rather complex program that I was debugging. The more debugging output I put into the program, the more confused I got and it took a lot of time. Seems rather unfortunate that the tool used for debugging introduces bugs itself.
After suspecting that the strange behavious actually wasn't my fault, I was eventually able to reproduce the problem in the simple programs above.
--Tomas
Desktop :
The text was updated successfully, but these errors were encountered: