-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add alternative MMC driver and support scatter/gather transfers in dmaengine #652
Conversation
Hi, it is good to see somebody working on the DMAEngine. There is still a problem when used within the I2S subsystem. There is a memory leak. Find the problem description here: I created a very crude patch that solves the problem, but might have major side effects. Unfortunately I don't completely understand how ASoc/I2S and dmaengine interact. Therefore I'm not sure if this has to be fixed in the I2S subsystem or in the dmaengine driver. Could you have a look and see if there is a good way to fix this? Thank you |
scripts/checkpatch.pl reports several whitespace errors:
Don't remove Florian Meier as author. Your work is adding to what he has done, this is not a new driver. Add yourself as author instead. +#define OLDDMA //means we use the old and rusty dmaengine.c What's the new way of doing it? This feels quite hackish with all the #ifdef's. IMHO better to have two patches: one for bcm2708-dmaengine.c and one for bcm2835-dma.c Thanks for working on slave_sg support. |
commit 2ba3154 upstream. The PL061 driver had the irqdomain initialization in an unfortunate place: when used with device tree (and thus passing the base IRQ 0) the driver would work, as this registers an irqdomain and waits for mappings to be done dynamically as the devices request their IRQs, whereas when booting using platform data the irqdomain core would attempt to allocate IRQ descriptors dynamically (which works fine) but also to associate the irq_domain_associate_many() on all IRQs, which in turn will call the mapping function which at this point will try to set the type of the IRQ and then tries to acquire a non-initialized spinlock yielding a backtrace like this: CPU: 0 PID: 1 Comm: swapper Not tainted 3.13.0-rc1+ raspberrypi#652 Backtrace: [<c0016f0c>] (dump_backtrace) from [<c00172ac>] (show_stack+0x18/0x1c) r6:c798ace0 r5:00000000 r4:c78257e0 r3:00200140 [<c0017294>] (show_stack) from [<c0329ea0>] (dump_stack+0x20/0x28) [<c0329e80>] (dump_stack) from [<c004fa80>] (__lock_acquire+0x1c0/0x1b80) [<c004f8c0>] (__lock_acquire) from [<c0051970>] (lock_acquire+0x6c/0x80) r10:00000000 r9:c0455234 r8:00000060 r7:c047d798 r6:600000d3 r5:00000000 r4:c782c000 [<c0051904>] (lock_acquire) from [<c032e484>] (_raw_spin_lock_irqsave+0x60/0x74) r6:c01a1100 r5:800000d3 r4:c798acd0 [<c032e424>] (_raw_spin_lock_irqsave) from [<c01a1100>] (pl061_irq_type+0x28/0x) r6:00000000 r5:00000000 r4:c798acd0 [<c01a10d8>] (pl061_irq_type) from [<c0059ef4>] (__irq_set_trigger+0x70/0x104) r6:00000000 r5:c01a10d8 r4:c046da1c r3:c01a10d8 [<c0059e84>] (__irq_set_trigger) from [<c005b348>] (irq_set_irq_type+0x40/0x60) r10:c043240c r8:00000060 r7:00000000 r6:c046da1c r5:00000060 r4:00000000 [<c005b308>] (irq_set_irq_type) from [<c01a1208>] (pl061_irq_map+0x40/0x54) r6:c79693c0 r5:c798acd0 r4:00000060 [<c01a11c8>] (pl061_irq_map) from [<c005d27c>] (irq_domain_associate+0xc0/0x190) r5:00000060 r4:c046da1c [<c005d1bc>] (irq_domain_associate) from [<c005d604>] (irq_domain_associate_man) r8:00000008 r7:00000000 r6:c79693c0 r5:00000060 r4:00000000 [<c005d5d0>] (irq_domain_associate_many) from [<c005d864>] (irq_domain_add_simp) r8:c046578c r7:c035b72c r6:c79693c0 r5:00000060 r4:00000008 r3:00000008 [<c005d814>] (irq_domain_add_simple) from [<c01a1380>] (pl061_probe+0xc4/0x22c) r6:00000060 r5:c0464380 r4:c798acd0 [<c01a12bc>] (pl061_probe) from [<c01c0450>] (amba_probe+0x74/0xe0) r10:c043240c r9:c0455234 r8:00000000 r7:c047d7f8 r6:c047d744 r5:00000000 r4:c0464380 This moves the irqdomain initialization to a point where the spinlock and GPIO chip are both fully propulated, so the callbacks can be used without crashes. I had some problem reproducing the crash, as the devm_kzalloc():ed zeroed memory would seemingly mask the spinlock as something OK, but by poisoning the lock like this: u32 *dum; dum = (u32 *) &chip->lock; *dum = 0xaaaaaaaaU; I could reproduce, fix and test the patch. Reported-by: Russell King <linux@arm.linux.org.uk> Cc: Rob Herring <robherring2@gmail.com> Cc: Haojian Zhuang <haojian.zhuang@linaro.org> Cc: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thanks for pointing out these errors. #ifdefs have been cleaned up a bit and now I think it is comprehensible that they are only to support probing using device-tree information as well as the conventional method used in the current raspbian kernel (when ARCH_BCM2708 is enabled). This allows the same dmaengine driver to be run with the upstream kernel that uses device-tree (when ARCH_BCM2708 is disabled). |
If this work is to go upstream, the file to patch is: bcm2835-dma I will argue that it's better to get a patch for bcm2835-dma accepted upstream first, that's where the experts are. Sometimes even the experts themselves have to resend a patch before it's accepted. I really don't see any point in patching bcm2708-dmaengine.c |
A new upstreamable mmc driver is on the way that uses dmaengine's scatter/gather transfers. It will need to be tested perhaps by bringing it into a next branch. This dmaengine patch has no dependencies and it only expands functionality so it does not break anything. Working to get it upstreamed is a lengthy process that can run in parallel. |
Well, improvements to mmc driver are certainly desirable (I'm crossing my fingers it involves replacing the sleepy polling loops, and the low-latency-mode hack with something triggered from an mmc or timer interrupt). So, yes if this PR enables that, then I'm happier pulling it in now. Ideally it will be updated with any changes that energe from upstreaming process, but it doesn't have to wait for that. Need to decide where the "next/mmc" source lives. Could be this tree but with command line parameters to request new mmc behaviour. Could be a branch of this (e.g. rpi-3.12.y-mmc), or it could be latest (3.16.y) tree. |
@weiszg I agree with patching bcm2708-dmaengine, if it has to do with mmc driver testing. With regards to getting patches upstream, I recommend this video: Write and Submit your first Linux kernel Patch @popcornmix My kudos to the one that thought of putting the DT overlays in the eeprom. Really nice. |
@notro any comments? For testers who build their own kernels you can try the new mmc driver by adding this to cmdline.txt Compared to the old driver it should do less polling (which was often done with interrupts disabled) which may improve performance and avoid interference with other kernel drivers. This will be available in official firmware releases once the PR is merged (initially disabled by default, but will be enabled if testing is positive). It appears to work fine for me, but backing up the sdcard before testing is always recommended. |
scripts/checkpatch.pl reports:
Remember to run checkpatch before submitting patches upstream. It gives the whole review process a bad start if there are style issues detectable by checkpatch. drivers/dma/bcm2708-dmaengine.c Florian Meier can't be removed as MODULE_AUTHOR. He IS the principal author. /*
* Author(s), use "Name <email>" or just "Name", for multiple
* authors use multiple MODULE_AUTHOR() statements/lines.
*/ I don't see the point in changing 2708 -> 2835 all over. The driver is still called bcm2708-dmaengine.c. And it introduces so much noise in the patch which makes it hard to review. drivers/mmc/host/Kconfig This needs BCM2835 in it
drivers/mmc/host/bcm2835-mmc.c use dev_err()/dev_info() instead of pr_err()/pr_info() Even though the functions are static, they should be prefixed with bcm2835_mmc_ or something unique. sdhci-bcm2835.c uses bcm2835_sdhci_ for instance. The clock can be added in bcm2708.c for the non-DT case. There are some unecessary newlines, even double ones. I'm not sure about the exact newline policy, but it feels too much. |
I tested the driver on 3.16.1 ARCH_BCM2708 and CONFIG_BCM2708_DT since I'm currently working with that. Anyway here's the result:
|
The clock speed in the old driver case was passed through in the commandline parameter init_emmc_clock - I suppose that a corresponding clock node must be instantiated in the DT for the OF version to work properly... |
Yes, that's right, but I also need the DMA controller and MMC device tree nodes for this to work (to get the channels). from bcm2835-mmc.c + host->dma_chan_tx = of_dma_request_slave_channel(node, "tx");
+ host->dma_chan_rx = of_dma_request_slave_channel(node, "rx");
+#endif
+ clk = of_clk_get(node, 0); |
@popcornmix I have renamed MMC_PIO_DMA_BARRIER to MMC_BCM2835_PIO_DMA_BARRIER @notro I have fixed the errors you noticed. I prefer using 2835 because this IS (or will be) the patched bcm2835-dmaengine.c. In fact the driver was written based on the current bcm2835-dmaengine.c. Here are the DT settings I use the drivers with:
|
I gave up on DT for now and did a test with a regular build. Hardware: Model B with a SanDisk Extreme 45MB/s 16GB Version
Regular Kernel messages
Test
bcm2708.bcm2835_mmc=1 Kernel messages
Test
|
@notro I have decreased the number of wait cycles in the dma to 1 which resulted in an increase in raw read speed compared to the old driver when using a NOOBS card. Could you please test again with your card? |
ARCH_BCM2708, CONFIG_BCM2708_DT and bcm2835-mmc Added DMA and MMC DT nodes. From bcm2708.c I removed these lines:
Version
Kernel messages
Test
|
@weiszg
How do you test performance? |
Same setup as previously with this change: -#define SDHCI_BCM_DMA_WAITS 30 /* delays slowing DMA transfers: 0-31 */
+#define SDHCI_BCM_DMA_WAITS 1 /* delays slowing DMA transfers: 0-31 */ Test
|
sdhci-bcm2708 and dma.c: fix for LITE channels
I have updated the dmaengine driver (and the bit of the old sdhci that I changed) to use WAIT_RESP instead of DMA_WAITS. This will safely transfer data even with a busy memory bus. This change should not affect transfer speeds. @notro |
An aside for the suitably interested (and something @weiszg and I were discussing today): The current expansion to the DMAengine driver hardcodes these defaults: this is fine for the time being as the only clients of the BCM2835 dmaengines are the I2S and (just added) bcm2835-MMC driver, which are low-bandwidth peripherals operating at a maximum of a couple of dozen megabytes per second. Nobbling the DMA throughput by default for these clients has only minor effects on the MMC host speed as evidenced by winding the DMA wait cycle count up to 30. The DMA engines (in particular the full-fat ones) implement functionality for 2D strided transfers as well as the various AXI "magic" flags detailed in the peripherals datasheet. Another candidate for conversion to DMAengine client use (in fact the only other one currently using the BCM DMA channel request/allocate API) is the dma-accelerated framebuffer driver. This driver needs to specify 2D area moves therefore a straightforward conversion is not possible: as described in the comment for http://lxr.free-electrons.com/source/include/linux/dmaengine.h#L306 Ideally a future expansion to the BCM2835 dmaengine driver, and subsequent modification of the clients, will pack the necessary 2D stride information and any flags peculiar to the hardware peripheral (in effect the TI and STRIDE registers) in with the dma_slave_config struct. |
non-DT build this time with bcm2708.bcm2835_mmc=1
|
add alternative MMC driver and support scatter/gather transfers in dmaengine
The PL061 driver had the irqdomain initialization in an unfortunate place: when used with device tree (and thus passing the base IRQ 0) the driver would work, as this registers an irqdomain and waits for mappings to be done dynamically as the devices request their IRQs, whereas when booting using platform data the irqdomain core would attempt to allocate IRQ descriptors dynamically (which works fine) but also to associate the irq_domain_associate_many() on all IRQs, which in turn will call the mapping function which at this point will try to set the type of the IRQ and then tries to acquire a non-initialized spinlock yielding a backtrace like this: CPU: 0 PID: 1 Comm: swapper Not tainted 3.13.0-rc1+ raspberrypi#652 Backtrace: [<c0016f0c>] (dump_backtrace) from [<c00172ac>] (show_stack+0x18/0x1c) r6:c798ace0 r5:00000000 r4:c78257e0 r3:00200140 [<c0017294>] (show_stack) from [<c0329ea0>] (dump_stack+0x20/0x28) [<c0329e80>] (dump_stack) from [<c004fa80>] (__lock_acquire+0x1c0/0x1b80) [<c004f8c0>] (__lock_acquire) from [<c0051970>] (lock_acquire+0x6c/0x80) r10:00000000 r9:c0455234 r8:00000060 r7:c047d798 r6:600000d3 r5:00000000 r4:c782c000 [<c0051904>] (lock_acquire) from [<c032e484>] (_raw_spin_lock_irqsave+0x60/0x74) r6:c01a1100 r5:800000d3 r4:c798acd0 [<c032e424>] (_raw_spin_lock_irqsave) from [<c01a1100>] (pl061_irq_type+0x28/0x) r6:00000000 r5:00000000 r4:c798acd0 [<c01a10d8>] (pl061_irq_type) from [<c0059ef4>] (__irq_set_trigger+0x70/0x104) r6:00000000 r5:c01a10d8 r4:c046da1c r3:c01a10d8 [<c0059e84>] (__irq_set_trigger) from [<c005b348>] (irq_set_irq_type+0x40/0x60) r10:c043240c r8:00000060 r7:00000000 r6:c046da1c r5:00000060 r4:00000000 [<c005b308>] (irq_set_irq_type) from [<c01a1208>] (pl061_irq_map+0x40/0x54) r6:c79693c0 r5:c798acd0 r4:00000060 [<c01a11c8>] (pl061_irq_map) from [<c005d27c>] (irq_domain_associate+0xc0/0x190) r5:00000060 r4:c046da1c [<c005d1bc>] (irq_domain_associate) from [<c005d604>] (irq_domain_associate_man) r8:00000008 r7:00000000 r6:c79693c0 r5:00000060 r4:00000000 [<c005d5d0>] (irq_domain_associate_many) from [<c005d864>] (irq_domain_add_simp) r8:c046578c r7:c035b72c r6:c79693c0 r5:00000060 r4:00000008 r3:00000008 [<c005d814>] (irq_domain_add_simple) from [<c01a1380>] (pl061_probe+0xc4/0x22c) r6:00000060 r5:c0464380 r4:c798acd0 [<c01a12bc>] (pl061_probe) from [<c01c0450>] (amba_probe+0x74/0xe0) r10:c043240c r9:c0455234 r8:00000000 r7:c047d7f8 r6:c047d744 r5:00000000 r4:c0464380 This moves the irqdomain initialization to a point where the spinlock and GPIO chip are both fully propulated, so the callbacks can be used without crashes. I had some problem reproducing the crash, as the devm_kzalloc():ed zeroed memory would seemingly mask the spinlock as something OK, but by poisoning the lock like this: u32 *dum; dum = (u32 *) &chip->lock; *dum = 0xaaaaaaaaU; I could reproduce, fix and test the patch. Reported-by: Russell King <linux@arm.linux.org.uk> Cc: Rob Herring <robherring2@gmail.com> Cc: Haojian Zhuang <haojian.zhuang@linaro.org> Cc: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
ppc: apply fix for hard lockup
this also expands the upstream BCM2835 dmaengine driver and will theoretically replace it