Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DMA transfers not working on the STM32F7 #1677

Closed
ruffy91 opened this issue Dec 3, 2015 · 28 comments
Closed

DMA transfers not working on the STM32F7 #1677

ruffy91 opened this issue Dec 3, 2015 · 28 comments

Comments

@ruffy91
Copy link
Contributor

@ruffy91 ruffy91 commented Dec 3, 2015

The sd card on a STM32F7-DISCO board can no longer be mounted since DMA has been added to the sdcard.c driver (7319e96) .
The shown error is:

PYB: sync filesystems
PYB: soft reboot
[SD] could not mount SD card
MicroPython v1.5.1-71-g9cc1831 on 2015-12-03; F7DISC with STM32F746
Type "help()" for more information.
>>> 

sd.read() gives back a bytearray which contains corrupt data.
example (double and triple , %, q, o, h, y, _ etc.):

>>> buffer=pyb.SD.read(512)
>>> print(buffer)
bytearray(b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x01\x00\x00%\x01\x00\x00\x00\x01\x00\x00\\\x02\x00\x00\x00\x02\x00\x00]\x00\x00\x00\x00\x01\x00\x00b\x01\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x01\x00\x00n\x02\x00\x00\x00\x02\x00\x00o\x00\x00\x00\x00\x02\x00\x00q\x00\x00\x00\x00\x02\x00\x00s\x00\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x01\x00\x00e\x02\x00\x00\x00\x01\x00\x00h\x00\x00\x00\x00\x01\x00\x00g\x00\x00\x00\x00\x01\x00\x00m\x01\x00\x00\x00\x01\x00\x00n\x02\x00\x00\x00\x01\x00\x00o\x00\x00\x00\x00\x01\x00\x00q\x00\x00\x00\x00\x01\x00\x00s\x00\x00\x00\x00\x01\x00\x00y\x00\x00\x00\x00\x01\x00\x00\x89\x00\x00\x00\x00\x01\x00\x00\x88\x00\x00\x00\x00_\xff\xd0\xdc\xc6\xd5\x9e\xbc\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff')
@dpgeorge

This comment has been minimized.

Copy link
Member

@dpgeorge dpgeorge commented Dec 8, 2015

I don't have an F7 board so I can't work on this.

@dhylands

This comment has been minimized.

Copy link
Contributor

@dhylands dhylands commented Dec 8, 2015

I'll put this on my to-do list

@ruffy91

This comment has been minimized.

Copy link
Contributor Author

@ruffy91 ruffy91 commented Dec 16, 2015

The same problem applies to I2C reads/writes which use DMA. The result contains corrupted data, no matter how many bytes I read with I2C.mem_read(). Also doesn't matter if I use 100kBaud or 400kBaud.

@dpgeorge

This comment has been minimized.

Copy link
Member

@dpgeorge dpgeorge commented Dec 16, 2015

We probably need a compile time option to use/not use the DMA. Eg MICROPY_HW_USE_DMA.

@ruffy91

This comment has been minimized.

Copy link
Contributor Author

@ruffy91 ruffy91 commented Dec 16, 2015

I think DMA should be fixed on the F7. The problem can't be very big, as the F7 DMA peripheral is very similar to the F1 and F4 series. I'm trying to do it myself but until now I couldn't find the cause for the corrupted transfers.
i2c.c checks if IRQs are disabled and then uses the non-DMA method, but I don't know how to force the IRQs to disabled while doing I2C transfers.
I will change the code to always use the non-DMA method and report back tomorrow.
I already got the SD-Card to work again with the older sdcard.c which doesn't use DMA.

@dpgeorge

This comment has been minimized.

Copy link
Member

@dpgeorge dpgeorge commented Dec 16, 2015

but I don't know how to force the IRQs to disabled while doing I2C transfers.

In Python:

irq_state = pyb.disable_irq()
i2c.send(...)
pyb.enable_irq(irq_state)

In C, it's similar, but use the functions disable_irq and enable_irq.

@ruffy91

This comment has been minimized.

Copy link
Contributor Author

@ruffy91 ruffy91 commented Dec 17, 2015

Disabling I and D caches (in HAL_MspInit()) gets the DMA to work!
There seems to be additional work to be done so that the caches can be enabled.
And changing the clock configuration for the STM32F7DISC from (50MHz USB Clock)
#define MICROPY_HW_CLK_PLLN (400)
#define MICROPY_HW_CLK_PLLQ (8)
to (48MHz USB Clock)
#define MICROPY_HW_CLK_PLLN (336)
#define MICROPY_HW_CLK_PLLQ (7)
got USB MSC to work without problems with the sdcard

@dhylands

This comment has been minimized.

Copy link
Contributor

@dhylands dhylands commented Dec 17, 2015

Interesting. I know that with the I & D caches disabled, the non-DMA path for te sdcard doesn't work reliably (the code slows down enough that it allows the FIFO to empty which the sdcard doesn't like).

That's a really useful clue (about the caches). I guess we need to do cache flushing/invalidation just like on the A-series or something (I'm familiar with caching issues and DMA on the larger processors - and the M4 has DMA going through the cache - so there aren't any problems. Obviously the M7 is different).

Do you want to do a PR for the clock configuration? If not, I can do it.

@ruffy91

This comment has been minimized.

Copy link
Contributor Author

@ruffy91 ruffy91 commented Dec 18, 2015

I'll do a pull request for the clock configuration
For now i'll try to reenable the I-Cache and test again. I'll then do a pull request for the cache configuration (next week).
Second step is enabling the D-cache in write-through mode which should work as with disabled D-cache (I'm not sure if this is only true for TX or also for RX), but with a chance that the processor halts while waiting for the underlying write operation to finish. This should none the less be faster than a disabled D-cache.
In the future the DMA driver should be extended so that it supports the F7, which requires different things:

  • RX and TX DMA buffers must be aligned to the D-cache lines
  • Cache lines must be flushed + invalidated to the right times
  • DMA buffers must not be in the DTCM memory

I've found some very good resources on this here: http://www.nuttx.org/doku.php?id=wiki:howtos:port-drivers_stm32f7

@ruffy91 ruffy91 changed the title SD Card can not be mounted on STM32F7-DISCO DMA transfers not working on the STM32F7 Dec 18, 2015
ruffy91 added a commit to ruffy91/micropython that referenced this issue Dec 18, 2015
@dhylands

This comment has been minimized.

Copy link
Contributor

@dhylands dhylands commented Dec 18, 2015

Second step is enabling the D-cache in write-through mode which should work as with disabled D-cache (I'm not sure if this is only true for TX or also for RX), but with a chance that the processor halts while waiting for the underlying write operation to finish. This should none the less be faster than a disabled D-cache.

Yeah - that's just a fact of life when dealing with DMA. There are occasions where using uncached buffers can offer better performance/less cache thrashing, but I think that type of operation is only available when you have an MMU.

The data sheet for the F7 series specifically mentions that the DMA can use DTCM.

Generally the operations for performing cache coherency goes something like this:

For Memory to Peripheral direction, you need to flush the caches to memory and wait for the flush to complete before initiating the DMA.

For Peripheral to Memory direction, you need to invalidate the caches for the destination memory addresses and wait for the invalidate to complete before starting the DMA.

For bi-dierctional, you have to do both. Flush first, wait for flush to complete, invalidate second, wait for invalidate to finish, and then start DMA.

The buffers don't necessarily need to cache line aligned, but it definitely helps. The places that will cause problems if if you invalidate a cache line, and while you're performing your DMA, an interrupt handler comes along and touches something in the cache line but outside the buffer.

@Anton-2

This comment has been minimized.

Copy link
Contributor

@Anton-2 Anton-2 commented Dec 18, 2015

I'll need to dig further on the documentation, but I think that you can tag regions as cachable or not, using the mpu.
As configured before disabling I/D cache, with micropython on a STM32F7 DISCO, internal sram goes thru the D/I cache, but external SDRAM does not : I've got display artifacts on the LCD when the frame buffer is on internal SRAM, and not in external SDRAM.

@Anton-2

This comment has been minimized.

Copy link
Contributor

@Anton-2 Anton-2 commented Jan 24, 2016

I've got I2C with DMA working reliably on the F7 by doing SCB_CleanDCache_by_Addr before sending data and SCB_CleanInvalidateDCache_by_Addr before receiving it.

Clean is CMSIS way of saying flush, so it's basically what @dhylands said above. And I still don't understand why you need to Flush and Invalidate before starting the DMA when receiving data, but doing it after does not work...

I'm calculating all the cache lines spanned by the buffer, so the buffer does not need to be cache line aligned.

If nobody's working on this, I'm ready to prepare a PR for it, but need help on :

  • where to put those functions (or macros ?) so that it's available for all device drivers ?
  • what are the existing drivers using DMA (beside I2C and sd card) that need to be patched ?
  • testing it (i've got only very basic I2C devices)...
@Anton-2

This comment has been minimized.

Copy link
Contributor

@Anton-2 Anton-2 commented Jan 24, 2016

Found an old sdcard : it works perfectly.

@dhylands

This comment has been minimized.

Copy link
Contributor

@dhylands dhylands commented Jan 24, 2016

I think that having the buffer not be cache line aligned will eventually cause problems.

Consider the scenario where the buffer starts in the middle of a cacheline.

At the start, there could be cached data sitting in the non-buffer portion of the cacheline. Doing a clean and invalidate before the DMA starts will cause that cached data to be written to physical memory.

There is a small window where there is a race condition that can cause problems. If the CPU reads from the non-buffer portion of the cacheline before the DMA manages to fill the remainder of the physical memory corresponding to the the cache line, then that will cause the cache to contain bad data for the buffer portion of the cache line.

You can't invalidate after, because you'd be tossing cached CPU data in the non-buffer portion of the cacheline (that was written after the DMA started).
You can't clean after, because the cache line might contain stuff that doesn't belong in the buffer portion of the cache line.

So whether this little race is significant or not depends alot on what in the data thats in cache line before the buffer.

If the heap is cacheline aligned, then I think that you can get away with ignoring this race. If the heap is not cacheline aligned, then an ISR could fire at just the wrong time, the ISR could possibly be the thing that has the data in the non-buffer portion of the cacheline.

This is something that needs to be very carefully considered.

The SCB_Clean& SCB_Invalidate macros are only defined for the F7 (in the cmsis files). I'd be inclined to create our own macros which evaluate to the SCB_Xxx functions on the F7 and which are no-ops on the other fmailies.

A search of the tree shows that these drivers currently use DMA: I2C, SPI, DAC, SDCARD, and it looks like some of the USB stuff uses DMA too.

@Anton-2

This comment has been minimized.

Copy link
Contributor

@Anton-2 Anton-2 commented Jan 25, 2016

I agree that we should have buffers cache line aligned (and sized...). I was reading about the case where cleaning the non buffer part of the cache line may overwrite some data coming from other DMA channel...

Then it seems that we have two situations :

  • (a ) We can control buffer allocation :

    We just have to make sure that the buffers are 'DMA ready' (either aligned so we can Flush/Invalidate them, or coming from a non cached memory region)

  • (b) The user provides the buffer :

    It could be possible to handle this by copying the unaligned start (or/and end) of the buffer in a temporary location (just one cache line, aligned), and doing up to 3 DMA transfers. Or we could send/receive the unaligned parts without DMA. But is this worth the added complexity ?

    I would better let the user choose if he wants DMA, and fail if the buffer is not 'DMA ready' when DMA was requested.

@JF002

This comment has been minimized.

Copy link
Contributor

@JF002 JF002 commented Nov 13, 2016

I could reproduce this issue on my STM32F7 Disco board.
Here is my observations :

  • The SD card works with DMA when disabling the cache memories in HAL_MspInit();
  • The SD card works without DMA when cache memories are enabled;
  • Adding SCB_CleanInvalidateDCache() before starting the DMA for reading, and SCB_CleanDCache() before starting the DMA for writing makes the SD card work with DMA and caches.

I also read a lot of information about how to handle cache coherency. We could mark the memory zone as non-cached, or store it in the DTCM memory, for example.

Any idea on how to best handle this issue?

@dhylands

This comment has been minimized.

Copy link
Contributor

@dhylands dhylands commented Nov 13, 2016

I think that the right thing to do is to create macros which on the M7 will call Clean/Invalidate and other platforms will do nothing.

Then add those calls around the DMA. This really needs to be done for all DMA transfers, not just the sdcard.

Writing will need a slightly different pattern.

There are also functions SCB_CleanInvalidateDCache_by_addr which take an address and size so you don't have to do the whole cache.

@r4d10n

This comment has been minimized.

Copy link

@r4d10n r4d10n commented Nov 13, 2016

Thanks JF002 for bringing this up again... I was breaking my head debugging the filesystem code to figure out why SD card was not mounting on my STM32F7 Disco.

Adding SCB_CleanInvalidateDCache() in sdcard_read_blocks() and SCB_CleanDCache() in sdcard_write_blocks() got it working for me.

@JF002

This comment has been minimized.

Copy link
Contributor

@JF002 JF002 commented Nov 20, 2016

There are also functions SCB_CleanInvalidateDCache_by_addr which take an address and size so you don't have to do the whole cache.

I tried these "By_Addr" functions, but there's something wrong : transferts are corrupt when reading/writing with these calls. When it is used in the read function, µP can't mount the SD card, and when it is used in the write function, the data written is corrupt.

I've never used them before, is there anything wrong with the parameters?

sdcard_read_blocks() :
SCB_CleanInvalidateDCache_by_Addr((uint32_t*)dest, num_blocks * SDCARD_BLOCK_SIZE);

sdcard_write_blocks() :
SCB_CleanDCache_by_Addr((uint32_t*)src, num_blocks * SDCARD_BLOCK_SIZE);

Using SCB_CleanInvalidateDCache() and SCB_CleanDCache() works like a charm.

@ruffy91

This comment has been minimized.

Copy link
Contributor Author

@ruffy91 ruffy91 commented Nov 21, 2016

I've never used them before, is there anything wrong with the parameters?

The address and size have to be aligned to the cache line size (32 Bytes on the M7).

ST does only use the instructions which invalidate and clean the whole caches.
For reference:
AN4839 - Level 1 cache on STM32F7 Series
PM0253 - STM32F7 Series Cortex®-M7 processor programming manual Chapter 4.8 (Cache maintenance operations)

@JF002

This comment has been minimized.

Copy link
Contributor

@JF002 JF002 commented Nov 21, 2016

The address and size have to be aligned to the cache line size (32 Bytes on the M7).

Oh right! I've seen this... but I read bits instead of bytes. If I ensure that the address is 32 BYTES aligned, it works perfectly!

Read :
uint32_t align_dest = ((uint32_t)dest ) & ~(0x1F);
SCB_CleanInvalidateDCache_by_Addr((uint32_t*)align_dest, (num_blocks * SDCARD_BLOCK_SIZE) + ((uint32_t)dest - align_dest));

Write:
uint32_t align_src = ((uint32_t)src) & ~(0x1F);
SCB_CleanDCache_by_Addr((uint32_t*)align_src, (num_blocks * SDCARD_BLOCK_SIZE) + ((uint32_t)src - align_src));

@JF002

This comment has been minimized.

Copy link
Contributor

@JF002 JF002 commented Nov 21, 2016

I think that the right thing to do is to create macros which on the M7 will call Clean/Invalidate and other platforms will do nothing.

I think the best place to put these macros is in file "mpconfigboard.h".

Should I define it in all 'stmhal' boards as an empty macro, or should I check if the macro is defined before using it? What is the best practice in this case?

Thanks!

@dhylands

This comment has been minimized.

Copy link
Contributor

@dhylands dhylands commented Nov 21, 2016

Since it's more processor specific than board specific, I'd be inclined to put a #ifdef STM32F7 in either mpconfigport.h or mphalport.h, rather than putting them in the mpconfigboard.h files.

@dpgeorge

This comment has been minimized.

Copy link
Member

@dpgeorge dpgeorge commented Dec 2, 2016

Thanks guys for working on this issue! The SD card should now be working correctly with DMA on F7 MCUs.

There are still other peripherals that use DMA that probably still need fixing. Eg SPI transfers.

@JF002

This comment has been minimized.

Copy link
Contributor

@JF002 JF002 commented Dec 2, 2016

Yeah, I know for the other peripheral, but I've no devices to hook-up and test for now. And I'm a little reluctant to change de code without beeing able to test it.

@iabdalkader

This comment has been minimized.

Copy link
Contributor

@iabdalkader iabdalkader commented Dec 10, 2016

Just a heads up for anyone else working with a newer HAL, it seems SDMMC DMA is broken in V1.1.0 (cache enabled or not), I always get SD_RX_OVERRUN. Seems the only fix is an older DMA driver, used the one in MP hal.

BTW I'm using the DTCM for DMA buffers because that region is not cacheable, that's probably not going with MP looks like it's used for FS cache.

@iabdalkader

This comment has been minimized.

Copy link
Contributor

@iabdalkader iabdalkader commented Dec 20, 2016

There are still other peripherals that use DMA that probably still need fixing. Eg SPI transfers.

@JF002 @dpgeorge Why not clean the cache in the DMA driver ? I think polling mode is not used anywhere, so changing DMA_Start_IT should be enough:

if (hdma->Init.Direction == DMA_PERIPH_TO_MEMORY) {
     SCB_CleanInvalidateDCache();
} else { //DMA_MEMORY_TO_PERIPH || DMA_MEMORY_TO_MEMORY
    SCB_CleanDCache();
}

Update: I think for the DMA_MEMORY_TO_MEMORY case an invalidate is needed after the DMA transfer ? Also I think the code can be smarter and check if the src/dst is aligned to cache line and flush using the address and size.

@dpgeorge

This comment has been minimized.

Copy link
Member

@dpgeorge dpgeorge commented Mar 31, 2017

DMA transfers are now working on F7 MCUs for SD card (SDMMC), SPI and I2C; see
ff927cb, 7b1804c and 2460888.

SDMMC2 is also now supported on F76x MCUs, see 7876e54.

it seems SDMMC DMA is broken in V1.1.0

We are now using V1.1.2 and SDMMC DMA works.

Why not clean the cache in the DMA driver
...
I think for the DMA_MEMORY_TO_MEMORY case an invalidate is needed after the DMA transfer ? Also I think the code can be smarter and check if the src/dst is aligned to cache line and flush using the address and size.

I've spent a fair bit of time testing the recent set of changes that get DMA working on the F7, so am reluctant to do any further optimisation at this stage. Feel free to make patches and test them :)

@dpgeorge dpgeorge closed this Mar 31, 2017
nickzoic added a commit to nickzoic/micropython that referenced this issue Apr 6, 2019
I'm going to move this to common-module/ but for now at least there's a comment
explaining what's going on!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.