[TW#24580] UART ISR race condition in uart_rx_intr_handler_default causing stuck data in the UART FIFO #2204
Comments
"The ISR will be cleared so the data will be stuck in the FIFO until more data is received. This will delay communications." Actually, the interrupt status cannot be cleared before reading data out of the FIFO. We will update the documentation about this point. Even if there is one byte left in the FIFO, it will trigger a timeout interrupt if no more data is received. |
I'm experiencing the same issue, but for me the status reg |
Hello |
Hi @koobest we have set our rx_flow_ctrl_thresh to 110 and the rest is default from esp
I don't think fifo overflow is our problem, with my settings and flow control i expect that an overflow should never happen |
If tout interrupt occurs after rxfifo_cnt was read (during fifo readout) it will be cleared anyway together with fifo full interrupt and bytes will be stuck. |
Hi,
|
@negativekelvin, |
@koobest I don't think fifo overflow is our problem, with my settings and flow control i expect that an overflow should never happen. we have set our rx_flow_ctrl_thresh to 100 and the rest is default from esp. our rx buffer is 1024 and we aren't handling all UART_DATA events when it arrives meaning we get UART_BUFFER_FULL from time to time |
@koobest worst case with interrupt latency the new byte arrives just after rxfifo_cnt is read and timeout occurs just before flag is cleared so 10us, how long does it take to empty fifo at 80mhz? |
Hi @lollizze, can you provide more details to help us debug?
thanks!! |
Hi,@negativekelvin |
@koobest ok so it could be possible to mistakenly squelch timeout interrupt. All that needs to be done is to AND the interrupt clear mask with uart_intr_status. I don't see any downside. |
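A minimal sketch of the masking idea suggested above, written as a host-side model rather than driver code. The bit positions, `rx_clear_mask` and `apply_clear` are hypothetical stand-ins for illustration, not taken from the ESP-IDF headers; the model assumes a write-1-to-clear register.

```c
#include <stdint.h>

/* Illustrative bit positions only (assumption, not copied from ESP-IDF). */
#define RXFIFO_FULL_INT (1u << 0)
#define RXFIFO_TOUT_INT (1u << 8)

/* Hypothetical helper: clear only the RX bits that were set in the status
 * snapshot the ISR is actually servicing. */
static uint32_t rx_clear_mask(uint32_t intr_status_snapshot)
{
    return intr_status_snapshot & (RXFIFO_FULL_INT | RXFIFO_TOUT_INT);
}

/* Model of a write-1-to-clear interrupt register: bits set in clear_mask
 * are removed from the pending status, all other bits are left alone. */
static uint32_t apply_clear(uint32_t pending_status, uint32_t clear_mask)
{
    return pending_status & ~clear_mask;
}
```

In this model, if the ISR entered on FIFO-full only and a timeout becomes pending during readout, clearing with `rx_clear_mask(snapshot)` leaves the timeout bit pending, whereas clearing with both bits unconditionally drops it.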
|
Hi,@ lollizze |
@koobest How can the size of the ring buffer affect the rx fifo? The stuck byte is in the fifo before it is copied into the buffer. We can't have a buffer that is big enough to never get full. |
@ lollizze Can you help check the following register of uart when the problem occurs?
If ( (UART[uart_num]->status.rxfifo_cnt == 0) && ( UART[uart_num]->mem_rx_status.wr_addr != it means that some data is stuck in the fifo. This can help us locate the problem. Thanks! |
@koobest here is the status of the registers |
While stopped at a breakpoint in the UART ISR context to check the registers, rx rd_addr drifted to 426. Can that happen when breaking at that point? After resuming, the UART is completely out of sync. |
@lollizze
|
@koobest yes we are using that( |
Hi, @lollizze
|
We are handling |
I am not sure that this is caused by a race condition between the CPU and the ISR as described above. As far as I understand the CPU is not involved with the internal management of the fifo ( |
I added a pull request: espressif/arduino-esp32#1849 for #2388 This may also help here. Please check |
@qt1 your problem is probably the same as the one I have seen: rx_cnt is 0 but wr_addr and rd_addr are not the same |
Possible suggestion: https://gist.github.com/negativekelvin/f1c144aea3a6ba7e9280b4c93319fb7e/revisions |
@lollizze As far as I understand, the management of the FIFO is (or should be) done by the hardware or some microcode. The hardware API seems to be consistent: you only read from the FIFO and everything else happens automatically, so the CPU is not involved in the transaction and therefore the FIFO management should not be affected by the CPU or interrupts. I suspect that this behaviour is a bug in the hardware and should be fixed by Espressif (I didn't read the whole patch as it seems to be very long). |
This is what leads to |
Hi @negativekelvin, since we cannot reproduce your issue, we can't fix it. However, we can continue to work on this issue, but we need some help from you:
thanks !! |
Dear @koobest, the check uart_reg->status.rxfifo_cnt == 0 && uart_reg->mem_rx_status.wr_addr != uart_reg->mem_rx_status.rd_addr is a patch. I have a system where the problem occurs every 20-30 minutes. From time to time the patch does not help and there is some other mess in the buffer. The behaviour is statistical. My guess is that it is probably a race condition in the hardware that mixes UART channels. Good luck fixing it! |
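For reference, the check quoted in the comment above can be written as a small predicate. This is only a sketch on a plain struct: `rx_fifo_snapshot_t` and `rx_fifo_looks_stuck` are hypothetical names, and the fields are assumed to be copies of `status.rxfifo_cnt`, `mem_rx_status.wr_addr` and `mem_rx_status.rd_addr` taken at one point in time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the three register fields discussed in this
 * thread, copied out of the UART peripheral by the caller. */
typedef struct {
    uint32_t rxfifo_cnt; /* byte count reported by the status register */
    uint32_t wr_addr;    /* RX FIFO write pointer */
    uint32_t rd_addr;    /* RX FIFO read pointer */
} rx_fifo_snapshot_t;

/* True when the counter claims the FIFO is empty but the read and write
 * pointers disagree, which is the symptom reported in this issue. */
static bool rx_fifo_looks_stuck(const rx_fifo_snapshot_t *s)
{
    return s->rxfifo_cnt == 0 && s->wr_addr != s->rd_addr;
}
```

As noted above, this only detects the symptom; it does not explain why the counter and the pointers diverge.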
@qt1 |
Environment
Problem Description
There is a UART ISR race condition in uart_rx_intr_handler_default when using a high baud rate (500000).
Current ISR:
1 rx_fifo_len = uart_reg->status.rxfifo_cnt;
2 [... deleted code for clarity]
3 while (buf_idx < rx_fifo_len) {
4 p_uart->rx_data_buf[buf_idx++] = uart_reg->fifo.rw_byte;
5 }
6 [... deleted code for clarity]
7 uart_clear_intr_status(uart_num, UART_RXFIFO_TOUT_INT_CLR_M | UART_RXFIFO_FULL_INT_CLR_M);
It is possible to get characters stuck in the FIFO, as the ISR first reads the FIFO size (line 1 above) and pulls the data from the FIFO before clearing the interrupt (line 7 above). If data is received while reading from the FIFO (the time between lines 2 to 6 above), it will not be read. The ISR will be cleared so the data will be stuck in the FIFO until more data is received. This will delay communications.
I suggest that the interrupt should be cleared before reading uart_reg->status.rxfifo_cnt. This way, if more data is received while reading the FIFO, another interrupt will be generated once the ISR completes.
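The ordering argument can be illustrated with a toy host-side model. This is a sketch under a strong simplifying assumption: the RX interrupt flag is modelled as being set each time a byte arrives, which may not match the real hardware (a maintainer comment in this thread says the status cannot be cleared while data remains in the FIFO). All names here (`uart_model_t`, `byte_arrives`, the two `isr_*` functions) are hypothetical.

```c
#include <stdbool.h>

/* Toy model of the RX path: a byte counter and one interrupt flag. */
typedef struct {
    int  fifo_count; /* bytes waiting in the modelled RX FIFO */
    bool rx_pending; /* modelled RX interrupt flag */
} uart_model_t;

/* Assumption: every arriving byte sets the interrupt flag. */
static void byte_arrives(uart_model_t *u)
{
    u->fifo_count++;
    u->rx_pending = true;
}

/* Read n bytes; late_bytes more bytes arrive while the readout runs. */
static void drain(uart_model_t *u, int n, int late_bytes)
{
    u->fifo_count -= n;
    for (int i = 0; i < late_bytes; i++) {
        byte_arrives(u);
    }
}

/* Ordering shown in the listing above: sample count, read, then clear. */
static void isr_clear_after_read(uart_model_t *u, int late_bytes)
{
    int n = u->fifo_count;
    drain(u, n, late_bytes);
    u->rx_pending = false; /* also wipes the flag raised by late bytes */
}

/* Suggested ordering: clear first, then sample count and read. */
static void isr_clear_before_read(uart_model_t *u, int late_bytes)
{
    u->rx_pending = false;
    int n = u->fifo_count;
    drain(u, n, late_bytes); /* a late byte leaves the flag set */
}
```

In this model, a byte arriving during readout ends up in the FIFO with no pending flag under the first ordering, and with the flag still set under the second.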
Expected Behavior
Actual Behavior
Steps to reproduce
Very hard to reproduce:
esp32 --UART--> esp32
Baud rate set to 500000
UART config:
#define UART_EMPTY_THRESH_DEFAULT (10)
#define UART_FULL_THRESH_DEFAULT (127)
#define UART_TOUT_THRESH_DEFAULT (1)
Packet burst from esp32 1 to esp32 2