New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nrf5x: Fix EP OUT race conditions #1279
Conversation
When dcd_edpt_xfer() starts new transfer two separate problems were observed. For both problems stream of OUT packets was pouring from host. First problem was that total_len and actual_len were not atomic. In case where incoming OUT packets are less (63) than MPS (64), actual_len and total_len are set 63. Then transfer complete from USBD is called that will schedule next 64 bytes transfer. At that point incoming packet would start DMA if there is place in RAM, normally it does not happen since actual_len == total_len. If packets arrives and interrupt is raised after total_len is set (64) but actual_len is still 63 from previous transfer, interrupt code sees that there is place in ram (1 byte) and transfer this 1 byte to buffer that was already filled with previous packet. To remedy this USB interrupt is blocked during transfer setup. Second problem can happen when dcd_edpt_xfer setups xfer->total_len and actual_len correctly but then context switch happens before xfer->data_received is checked. If during this time two packets arrive one will be copied to RAM second will stay in endpoint with data_received set to 1. Then when xfer_edpt_xfer() checks data_receive flag it starts DMA again overwriting data. To remedy this, data_received is checked together with check if data was already transferred. If transfer was complete, there is no need to start DMA yet. In such case data_received will be handled in same place by next xfer_edpt_xfer() correctly.
It turns out that its not enough to handle races. To fix this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you very much for putting effort to fix this race condition. I have been some hard time troubleshooting this before as well and have done a few PR to improve the race. All changes make sense, changes look small but have a huge impact and is very difficult to trace down.
@@ -453,9 +453,11 @@ bool dcd_edpt_xfer (uint8_t rhport, uint8_t ep_addr, uint8_t * buffer, uint16_t | |||
|
|||
xfer_td_t* xfer = get_td(epnum, dir); | |||
|
|||
dcd_int_disable(rhport); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I think I did put the int disable/enable here before, but somehow got reverted.
@@ -476,7 +478,7 @@ bool dcd_edpt_xfer (uint8_t rhport, uint8_t ep_addr, uint8_t * buffer, uint16_t | |||
edpt_dma_start(&NRF_USBD->TASKS_EP0RCVOUT); | |||
}else | |||
{ | |||
if ( xfer->data_received ) | |||
if ( xfer->data_received && xfer->total_len > xfer->actual_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, right, I fell the data_received
is not enough as well.
Describe the PR
When
dcd_edpt_xfer()
starts new transfer two separate problems were observed.For both problems stream of OUT packets was pouring from host.
First problem was that total_len and actual_len were not atomic.
In case where incoming OUT packets are less (63) than MPS (64), actual_len and total_len
are set 63.
Then transfer complete from USBD is called that will schedule next 64 bytes transfer.
At that point incoming packet would start DMA if there is place in RAM, normally
it does not happen since
actual_len == total_len.
If packets arrives and interrupt is raised after
total_len
is set (64) butactual_len
is still 63 fromprevious transfer, interrupt code sees that there is place in ram (1 byte) and transfer this 1 byte
to buffer that was already filled with previous packet.
To remedy this USB interrupt is blocked during transfer setup.
Second problem can happen when
dcd_edpt_xfer
setupstotal_len
andactual_len
correctlybut then context switch happens (or interrupts) before
xfer->data_received
is checked.If during this time two packets arrive one will be copied to RAM second will stay in endpoint with
data_received
set to 1.Then when
xfer_edpt_xfer()
checks data_receive flag it starts DMA again overwriting data.To remedy this, data_received is checked together with check if data was already transferred.
If transfer was complete, there is no need to start DMA yet.
In such case
data_received
will be handled in same place by nextxfer_edpt_xfer()
correctly.Additional context
Both problems were discovered while stress testing BTH with massive amount of ACL data going
through OUT endpoint.
It should be possible to reach same result with other source of steady flow non-MPS packets.
In my mace it was triggers by sending constant packet stream of 64-64-64-63 packet sequence.