nrf5x: Fix DMA access race condition #1280

kasjer · 2022-01-14T09:06:10Z

Describe the PR
In multi-threaded mode, starting DMA from thread mode was
prone to a race condition resulting in an infinite loop.
It can happen on a single-core CPU with a strict priority-based
task scheduler, where a ready high-priority task never yields to
a ready low-priority task (Mynewt).

Sequence that failed (T1: low-priority task, T2: high-priority task):

- T1 called start_dma()
- T1 set _dcd.dma_running (DMA not started yet, context switch happens)
- T2 took the CPU and saw that _dcd.dma_running was set, so it waits for _dcd.dma_running to become 0
- T1 never gets the CPU again, so DMA is never started and T2 waits forever

An OSAL mutex resolves the problem of starting DMA from thread context.
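The failing sequence can be replayed with a small deterministic model. This is only a sketch, not the driver code: the scheduler loop, the `simulate` function, the state names, and the one-tick DMA duration are all hypothetical. The only things taken from the description are the `_dcd.dma_running` flag (modelled as `dma_running`) and the idea of guarding the start with a mutex. The model assumes that a task waiting on the flag busy-waits and stays ready, while a task waiting on the mutex sleeps.

```c
#include <stdbool.h>

/* Hypothetical model of the race; none of these names come from TinyUSB. */
enum { S_LOCK, S_WAIT, S_TRIGGER, S_DONE };

typedef struct { int state; bool active; bool blocked; } task_t;

/* Returns how many DMA transfers were triggered within max_ticks. */
int simulate(bool use_mutex, int max_ticks) {
  task_t task[2] = { { S_LOCK, true,  false },    /* [0] = T1, low priority  */
                     { S_LOCK, false, false } };  /* [1] = T2, high priority */
  bool dma_running = false;  /* models _dcd.dma_running */
  int mutex_owner = -1;      /* -1 = free; stands in for the OSAL mutex */
  int dma_ticks_left = 0;    /* > 0 while the modelled hardware is busy */
  int dma_started = 0;

  for (int tick = 0; tick < max_ticks; tick++) {
    /* Completion interrupt: runs regardless of task priorities. */
    if (dma_ticks_left > 0 && --dma_ticks_left == 0) dma_running = false;

    /* Strict priority: T2 runs whenever it is ready, otherwise T1. */
    int cur = -1;
    for (int i = 1; i >= 0 && cur < 0; i--) {
      if (task[i].active && !task[i].blocked && task[i].state != S_DONE) cur = i;
    }
    if (cur < 0) break;  /* nothing left to run */

    task_t *t = &task[cur];
    switch (t->state) {
      case S_LOCK:
        if (!use_mutex) { t->state = S_WAIT; }
        else if (mutex_owner < 0) { mutex_owner = cur; t->state = S_WAIT; }
        else { t->blocked = true; }  /* sleeps, so the CPU goes to the owner */
        break;
      case S_WAIT:
        if (!dma_running) {          /* otherwise busy-wait: task stays ready */
          dma_running = true;
          t->state = S_TRIGGER;
          if (cur == 0) task[1].active = true;  /* T2 preempts T1 right here */
        }
        break;
      case S_TRIGGER:
        dma_started++;
        dma_ticks_left = 1;          /* assumed: transfer completes next tick */
        if (use_mutex) {
          mutex_owner = -1;
          task[0].blocked = task[1].blocked = false;
        }
        t->state = S_DONE;
        break;
      default:
        break;
    }
  }
  return dma_started;
}
```

In this model, `simulate(false, 1000)` never triggers a transfer because T2 spins on the flag and T1 is never scheduled again, while `simulate(true, 1000)` lets both tasks start their transfer, since T2 sleeps on the mutex until T1 has actually started DMA.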

Additional context
Happened while stress testing the NimBLE stack on Mynewt with USB transport.

kasjer · 2022-01-19T08:10:24Z

An additional mutex lock was added during transfer setup to prevent a premature interrupt enable, which could happen if two tasks started two separate transfers.

When two tasks entered dcd_edpt_xfer(), it was possible for the first task to disable the interrupt in order to set up total_len and actual_len, and for a second task, working on another endpoint, to re-enable the interrupt between the writes to total_len and actual_len. This resulted in a race condition with the interrupt, hence a mutex is added on top of the interrupt being blocked.
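The interleaving described above can be replayed step by step. The following is a hedged sketch rather than the driver code: `interrupt_saw_partial_setup`, `demo_schedule`, and the four-step breakdown of transfer setup are hypothetical, and only the order "disable interrupt, write total_len, write actual_len, enable interrupt" is taken from the description. Each schedule entry lets task 0 or task 1 run its next step; a task that finds the mutex taken simply loses that slot.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; step numbers mirror the setup order described above. */
typedef struct { int step; bool half_written; } setup_task_t;

/* T1, T2, T1, T2, T2, T2, T1, T1, then spare slots so T2 can finish. */
static const int demo_schedule[] = { 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1 };
static const size_t demo_len = sizeof demo_schedule / sizeof demo_schedule[0];

/* Returns true if the interrupt was ever enabled while some endpoint had
 * total_len written but actual_len not yet written. */
bool interrupt_saw_partial_setup(const int *schedule, size_t n, bool use_mutex) {
  setup_task_t task[2] = { { 0, false }, { 0, false } };
  bool irq_enabled = true;
  int mutex_owner = -1;  /* -1 = free; stands in for the OSAL mutex */
  bool hazard = false;

  for (size_t i = 0; i < n; i++) {
    int id = schedule[i];
    setup_task_t *t = &task[id];
    switch (t->step) {
      case 0:  /* enter setup: take the mutex (if used), then mask the interrupt */
        if (use_mutex) {
          if (mutex_owner >= 0) continue;  /* blocked: this slot is wasted */
          mutex_owner = id;
        }
        irq_enabled = false;
        t->step = 1;
        break;
      case 1:  /* write total_len: transfer state is now inconsistent */
        t->half_written = true;
        t->step = 2;
        break;
      case 2:  /* write actual_len: transfer state is consistent again */
        t->half_written = false;
        t->step = 3;
        break;
      case 3:  /* leave setup: unmask the interrupt, release the mutex */
        irq_enabled = true;
        if (use_mutex) mutex_owner = -1;
        t->step = 4;
        break;
      default:
        break;  /* this task has already finished */
    }
    if (irq_enabled && (task[0].half_written || task[1].half_written)) {
      hazard = true;
    }
  }
  return hazard;
}
```

With `demo_schedule`, the unprotected run reports the hazard at the point where the second task re-enables the interrupt, and the mutex-protected run does not, because the second task cannot enter setup until the first has finished both writes.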

hathach

Thank you for the PR and sorry for the late response, this somehow fell off my radar. The current nrf5x implementation indeed has a race condition issue under a preemptive RTOS. I was hoping to use LDREX/STREX to implement a mutex, but couldn't get those to work. In the future, I think we could use a semaphore for resource management instead of a mutex, which makes a little more sense.

kasjer added the Port nRF label Jan 14, 2022

kasjer force-pushed the kasjer/nrf5x-dma-race branch 2 times, most recently from bad5610 to 9085b34 Compare January 18, 2022 13:09

kasjer force-pushed the kasjer/nrf5x-dma-race branch from 9085b34 to c16b56a Compare January 19, 2022 08:07

kasjer force-pushed the kasjer/nrf5x-dma-race branch from c16b56a to 36b6ed8 Compare January 19, 2022 08:48

hathach approved these changes Feb 22, 2022

hathach merged commit e04f15f into hathach:master Feb 22, 2022

kasjer deleted the kasjer/nrf5x-dma-race branch February 22, 2022 14:08
