Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stmhal: Add DMA support for sdcard #1625

Closed
wants to merge 3 commits into from

Conversation

dhylands
Copy link
Contributor

This also adds code to turn off the DMA blocks when they're idle which saves approx 8mA per DMA block.

This PR superceds PR #1608

- added some comments to explain the priority/sub-priority.
- adds an entry for SDIO (to be used in a later patch)
- increases DMA priority above USB so that DMA can be used
  for sdcard I/O when using USB Mass Storage.
@dpgeorge
Copy link
Member

Thanks @dhylands!

I think though that calling dma_tick_handler every ms is quite a bit of overhead, don't you think? Some options:

  • Call every 8ms or so by checking the low bits of uwTick (ie uwTick & 7 == 0).
  • Use the "system timer" TIM3. This timer is also used to flush the flash cache so would make sense to use for DMA (it's called every 10ms for USB CDC purposes). [Or maybe we can free up this timer and use SysTick for all system stuff, USB CDC and cache flushing? IRQ priorities would be an issue though.]
  • Only enable the dma_tick_handle call if DMA channels are on. This would still require a small check in SysTick_Handler.

@dpgeorge
Copy link
Member

@chuckbook says in another thread (moved here to keep conversation contained):

If you don't mind I'd ask you for 2-3 days extra time to allow me to check the impact of changes @dhylands proposed recently to solve the DMA power issue. He uses the SYSTICK to do timeout checks. We also have a RTC variant which uses SYSTICK during runtime. We have SYSTICK at highest priority and that might interfere with the rel. lengthy DMA check routines.

Yes, no problem, please make your tests! stmhal's systick is also at the highest prioriyt so it is important to keep that interrupt very lean (I made some suggestions above how to do that).

Turning on each DMA block increases the current consumption
by about 8 mA. This code adds an idle timer for each DMA
block and turns off the clocks when no streams are in use
for 128 msec. Having a small timeout allows for improved
performance when back-to-back transfers are being performed.

The 128 msec is basically a guess.
This started out using IgorLektorovEpam work in PR micropython#1389
and reworked it.
@dhylands
Copy link
Contributor Author

@dpgeorge I waffled back and forth when I first implemented it. I like the idea of only calling every Nth tick, so I did both that and check if its enabled.

I picked once every 16 msec (it's rather arbitrary) and made the timeout be 128 msec. I also check to see if the idle counter is enabled. Since I figured the DMA idle counter enabled would be less often than every 16 msec, I put that test first.

@dhylands
Copy link
Contributor Author

And as for @chuckbook's point, I believe that I took care of any interference by the dma_idle_handler and long DMA's.

In the dma_enable_clock routine, I grab a copy of the dma_enable_mask and mark the stream as enabled as an atomic operation. The tick_handler won't turn off the DMA clock (for a given controller) if any stream is marked as enabled.

@dhylands
Copy link
Contributor Author

Use the "system timer" TIM3. This timer is also used to flush the flash cache so would make sense to use for DMA (it's called every 10ms for USB CDC purposes). [Or maybe we can free up this timer and use SysTick for all system stuff, USB CDC and cache flushing? IRQ priorities would be an issue though.]

I thought about using TIM3, and that's certainly a viable option. I was concerned though that down the road, not all systems would have the need for the TIM3 functionality, which is why I opted for the SysTick functionality instead. (i.e. if USB & flash storage were both disabled then TIM3 could be freed up).

The work actually performed by the dma_idle_handler is quite minimal, which is why I thought using SysTick would be fine.

I think that running the USB CDC stuff off SysTick would require lowering the systick priority to be below flash/sdio/dma, or implementing the I/O as async.

@chuckbook
Copy link

I like the SYSTICK approach. I'm only concerned of stuffing too much into it. A simple time slicer in SysTickHandler with maybe 16 weak entries could do the job. Eventually the two DMA engines could be distributed as well.
This might be a weird looking proposal but let me give some motivation.
My current test case is a 32ch/24bit transient recorder running at ~20kHz with realtime signal processing utilizing 8 DMA channels. Out of curiosity I replaced the CLI with mpy and it works like a charm. Even USB performance is outstanding and with DMA driven sdcard (and a little bit of luck) we might get hours of local recording.
So far we do not loose a single SYSTICK in weeks of uninterrupted operation with occasional USB connection (and disconnection) to dump trace data with about 750 kB/s. Something we could not achieve by using one of the available RTOS.
It turns out that SYSTICK, running at prio 0, is the only performance pig in this setup and I would like to preserve it's current flat profile.

@dhylands
Copy link
Contributor Author

@chuckbook Just wanted to make that you get the latest version. I did some changes based on @dpgeorge 's feedback, so that it only calls the dma_idle_handler at most once every 16 msec, and only if there are dma idle countdowns enabled.


#define IRQ_PRI_OTG_FS 0x6
#define IRQ_SUBPRI_OTG_FS 0x0
#define IRQ_PRI_SDIO 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add a comment here that SDIO IRQ must be higher priority than the DMA that it uses (it is, but just need a comment).

@dbc
Copy link
Contributor

dbc commented Nov 19, 2015

In closing pr#1565 @dpgeorge commented:

With the priority mechanism that we use (PRIORITYGROUP_4) subpriorities have no meaning, so they should all be 0.

I didn't follow up because: 1) I was trying to keep 1565 a no-risk text-substitution-zero-functionality-change PR, and 2) I'm not sure what mechanisms are in place to validate the sub-priority change, even though it should be no-risk.

Since @dhylands is digging a little deeper into IRQs with this change, I wonder if it might make sense to set the sub-pri's to zero with this PR, since presumably the same validation effort should cover both.

@chuckbook
Copy link

I finally completed my tests regarding DMA / Systick handling. With @dhylands modifications we can now stream 32 24-bit channels at ~20kHz to SDCARD (fatfs). There is only one modification to @dhylands PR.
in Systick handler a simple dispatcher

((void(*)(void))(fdisp[uwTick & 0xf]))();

was added and dma_tick_handler() was split up into two functions handling each of the DMAx subsystems. With 16ms cycle time 14 additional slots would be available for other tasks while keeping default exec times low.

@dpgeorge
Copy link
Member

Ok, I'll merge this as-is and we can improve the dispatch as per @chuckbook later.

@dpgeorge
Copy link
Member

Merged in 9f5486c, b677f03 and 6edffd0.

Code size increase by about 1500 bytes. SD read speed is pretty much the same as before the patch, at around 3700 kb/s (where k=1024 bytes).

@dpgeorge dpgeorge closed this Nov 24, 2015
@dpgeorge
Copy link
Member

@dhylands @chuckbook I implemented some optimisations in 522d454, 9936aa3 and 22bd231.

@dpgeorge
Copy link
Member

@dbc I set subpri to 0 in 3cfb02f. Seems to work fine.

@dbc
Copy link
Contributor

dbc commented Nov 25, 2015

@dbc I set subpri to 0 in 3cfb02f. Seems to work fine.

Which is good news, in that it all works the way we believe is should :) It is good to be cautious when fiddling interrupts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants