add alternative MMC driver and support scatter/gather transfers in dmaengine #652

Merged (3 commits) on Aug 20, 2014

weiszg
Contributor

@weiszg weiszg commented Jul 29, 2014

this also expands the upstream BCM2835 dmaengine driver and will theoretically replace it

@hifiberry

Hi,

it is good to see somebody working on the DMAEngine. There is still a problem when used within the I2S subsystem. There is a memory leak. Find the problem description here:
http://www.hifiberry.com/forums/topic/dma-coherent-pool-is-to-small-crash

I created a very crude patch that solves the problem, but it might have major side effects. Unfortunately I don't completely understand how ASoC/I2S and dmaengine interact. Therefore I'm not sure if this has to be fixed in the I2S subsystem or in the dmaengine driver.
The patch can be found in the archive here:
http://www.hifiberry.com/forums/topic/dma-coherent-pool-is-to-small-crash/#post-10645

Could you have a look and see if there is a good way to fix this?

Thank you
Daniel

@notro
Contributor

notro commented Jul 30, 2014

scripts/checkpatch.pl reports several whitespace errors:

total: 46 errors, 10 warnings, 1031 lines checked

Don't remove Florian Meier as author. Your work adds to what he has done; this is not a new driver. Add yourself as an author instead.

+#define OLDDMA //means we use the old and rusty dmaengine.c

What's the new way of doing it?

This feels quite hackish with all the #ifdefs. IMHO it would be better to have two patches: one for bcm2708-dmaengine.c and one for bcm2835-dma.c

Thanks for working on slave_sg support.

koalo pushed a commit to koalo/linux that referenced this pull request Aug 4, 2014
commit 2ba3154 upstream.

The PL061 driver had the irqdomain initialization in an unfortunate
place: when used with device tree (and thus passing the base IRQ
0) the driver would work, as this registers an irqdomain and waits
for mappings to be done dynamically as the devices request their
IRQs, whereas when booting using platform data the irqdomain core
would attempt to allocate IRQ descriptors dynamically (which works
fine) but also to associate the irq_domain_associate_many() on all
IRQs, which in turn will call the mapping function which at this
point will try to set the type of the IRQ and then tries to acquire
a non-initialized spinlock yielding a backtrace like this:

CPU: 0 PID: 1 Comm: swapper Not tainted 3.13.0-rc1+ raspberrypi#652
Backtrace:
[<c0016f0c>] (dump_backtrace) from [<c00172ac>] (show_stack+0x18/0x1c)
 r6:c798ace0 r5:00000000 r4:c78257e0 r3:00200140
[<c0017294>] (show_stack) from [<c0329ea0>] (dump_stack+0x20/0x28)
[<c0329e80>] (dump_stack) from [<c004fa80>] (__lock_acquire+0x1c0/0x1b80)
[<c004f8c0>] (__lock_acquire) from [<c0051970>] (lock_acquire+0x6c/0x80)
 r10:00000000 r9:c0455234 r8:00000060 r7:c047d798 r6:600000d3 r5:00000000
 r4:c782c000
[<c0051904>] (lock_acquire) from [<c032e484>] (_raw_spin_lock_irqsave+0x60/0x74)
 r6:c01a1100 r5:800000d3 r4:c798acd0
[<c032e424>] (_raw_spin_lock_irqsave) from [<c01a1100>] (pl061_irq_type+0x28/0x)
 r6:00000000 r5:00000000 r4:c798acd0
[<c01a10d8>] (pl061_irq_type) from [<c0059ef4>] (__irq_set_trigger+0x70/0x104)
 r6:00000000 r5:c01a10d8 r4:c046da1c r3:c01a10d8
[<c0059e84>] (__irq_set_trigger) from [<c005b348>] (irq_set_irq_type+0x40/0x60)
 r10:c043240c r8:00000060 r7:00000000 r6:c046da1c r5:00000060 r4:00000000
[<c005b308>] (irq_set_irq_type) from [<c01a1208>] (pl061_irq_map+0x40/0x54)
 r6:c79693c0 r5:c798acd0 r4:00000060
[<c01a11c8>] (pl061_irq_map) from [<c005d27c>] (irq_domain_associate+0xc0/0x190)
 r5:00000060 r4:c046da1c
[<c005d1bc>] (irq_domain_associate) from [<c005d604>] (irq_domain_associate_man)
 r8:00000008 r7:00000000 r6:c79693c0 r5:00000060 r4:00000000
[<c005d5d0>] (irq_domain_associate_many) from [<c005d864>] (irq_domain_add_simp)
 r8:c046578c r7:c035b72c r6:c79693c0 r5:00000060 r4:00000008 r3:00000008
[<c005d814>] (irq_domain_add_simple) from [<c01a1380>] (pl061_probe+0xc4/0x22c)
 r6:00000060 r5:c0464380 r4:c798acd0
[<c01a12bc>] (pl061_probe) from [<c01c0450>] (amba_probe+0x74/0xe0)
 r10:c043240c r9:c0455234 r8:00000000 r7:c047d7f8 r6:c047d744 r5:00000000
 r4:c0464380

This moves the irqdomain initialization to a point where the spinlock
and GPIO chip are both fully populated, so the callbacks can be used
without crashes.

I had some problem reproducing the crash, as the devm_kzalloc():ed
zeroed memory would seemingly mask the spinlock as something OK,
but by poisoning the lock like this:

u32 *dum;
dum = (u32 *) &chip->lock;
*dum = 0xaaaaaaaaU;

I could reproduce, fix and test the patch.

Reported-by: Russell King <linux@arm.linux.org.uk>
Cc: Rob Herring <robherring2@gmail.com>
Cc: Haojian Zhuang <haojian.zhuang@linaro.org>
Cc: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@weiszg
Contributor Author

weiszg commented Aug 7, 2014

Thanks for pointing out these errors.

The #ifdefs have been cleaned up a bit, and I think it is now clear that they exist only to support probing via device-tree information as well as the conventional method used in the current Raspbian kernel (when ARCH_BCM2708 is enabled). This allows the same dmaengine driver to run with the upstream kernel, which uses device tree (when ARCH_BCM2708 is disabled).

@popcornmix
Collaborator

@weiszg
Can you squash and force push? Something like:

git rebase -i HEAD~3
(follow instructions and squash the second two commits)
git push -f

@notro does this look usable by upstream bcm2835 tree?

@notro
Contributor

notro commented Aug 7, 2014

does this look usable by upstream bcm2835 tree?

If this work is to go upstream, the file to patch is: bcm2835-dma
This driver entered the tree with 3.14

I will argue that it's better to get a patch for bcm2835-dma accepted upstream first; that's where the experts are. Sometimes even the experts themselves have to resend a patch before it's accepted.

I really don't see any point in patching bcm2708-dmaengine.c

@popcornmix
Collaborator

@weiszg
Are there other drivers planned that use the scatter/gather functionality?

As @notro suggests, applying the patch to bcm2835-dma and posting it to the linux-rpi-kernel list would be useful.

@weiszg
Contributor Author

weiszg commented Aug 8, 2014

A new upstreamable mmc driver is on the way that uses dmaengine's scatter/gather transfers. It will need to be tested, perhaps by bringing it into a next branch. This dmaengine patch has no dependencies and only expands functionality, so it does not break anything. Working to get it upstreamed is a lengthy process that can run in parallel.

@popcornmix
Collaborator

Well, improvements to mmc driver are certainly desirable (I'm crossing my fingers it involves replacing the sleepy polling loops, and the low-latency-mode hack with something triggered from an mmc or timer interrupt).

So yes, if this PR enables that, then I'm happier pulling it in now. Ideally it will be updated with any changes that emerge from the upstreaming process, but it doesn't have to wait for that.

Need to decide where the "next/mmc" source lives. Could be this tree but with command line parameters to request new mmc behaviour. Could be a branch of this (e.g. rpi-3.12.y-mmc), or it could be latest (3.16.y) tree.

@notro
Contributor

notro commented Aug 8, 2014

@weiszg
can the mmc driver live with the currently patched mmc subsystem?
or does it need a tree where the subsystem is not patched?
In other words, can it live within the current raspberrypi/linux branches?
(sdhci-bcm2835 for instance won't work with the patched subsystem)

I agree with patching bcm2708-dmaengine, if it has to do with mmc driver testing.

With regards to getting patches upstream, I recommend this video: Write and Submit your first Linux kernel Patch

@popcornmix
Can we have a rpi-update installable firmware branch for this mmc testing?
And can we enable Device Tree support for this also?
To keep the DT work moving we need testing to see if any ill side effects have come from enabling Device Tree support. And to get testing by the masses, it has to be a simple process (i.e. rpi-update).
When things have turned out well, we can look at preparing other drivers that have upstream alternatives.
The other benefit is that when the bootloader can load HAT eeprom DT overlays, there's a kernel readily available with DT support.

My kudos to the one that thought of putting the DT overlays in the eeprom. Really nice.

@weiszg weiszg changed the title from "expand functionality by supporting scatter/gather transfers" to "add alternative MMC driver and support scatter/gather transfers in dmaengine" on Aug 15, 2014
@popcornmix
Collaborator

@notro any comments?

For testers who build their own kernels you can try the new mmc driver by adding this to cmdline.txt
bcm2708.bcm2835_mmc=1

Compared to the old driver, it should do less polling (which was often done with interrupts disabled), which may improve performance and avoid interference with other kernel drivers.

This will be available in official firmware releases once the PR is merged (initially disabled by default, but will be enabled if testing is positive).

It appears to work fine for me, but backing up the sdcard before testing is always recommended.

@notro
Contributor

notro commented Aug 15, 2014

scripts/checkpatch.pl reports:

total: 28 errors, 48 warnings, 2903 lines checked

NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
      scripts/cleanfile

Remember to run checkpatch before submitting patches upstream. It gives the whole review process a bad start if there are style issues detectable by checkpatch.

drivers/dma/bcm2708-dmaengine.c

Florian Meier can't be removed as MODULE_AUTHOR. He IS the principal author.
Multiple MODULE_AUTHOR statements can be used.
from include/linux/module.h

/*
 * Author(s), use "Name <email>" or just "Name", for multiple
 * authors use multiple MODULE_AUTHOR() statements/lines.
 */

I don't see the point in changing 2708 -> 2835 all over. The driver is still called bcm2708-dmaengine.c. And it introduces so much noise in the patch which makes it hard to review.
bcm2835-dma would have to be patched with slave_sg support later anyway.

drivers/mmc/host/Kconfig

This needs BCM2835 in it

config MMC_PIO_DMA_BARRIER

drivers/mmc/host/bcm2835-mmc.c

use dev_err()/dev_info() instead of pr_err()/pr_info()

Even though the functions are static, they should be prefixed with bcm2835_mmc_ or something unique. sdhci-bcm2835.c uses bcm2835_sdhci_ for instance.
This becomes evident when browsing/searching the source code with LXR.

The clock can be added in bcm2708.c for the non-DT case.

There are some unnecessary newlines, even double ones. I'm not sure about the exact newline policy, but it feels like too many.

@notro
Contributor

notro commented Aug 16, 2014

I tested the driver on 3.16.1 ARCH_BCM2708 and CONFIG_BCM2708_DT since I'm currently working with that.
Don't know if this is meant to work, I guess I need a clock node and some other node in the Device Tree. If you provide them I can test.

Anyway here's the result:

drivers/mmc/host/bcm2835-mmc.c: In function ‘mmc_irq’:
drivers/mmc/host/bcm2835-mmc.c:955:6: warning: unused variable ‘cardint’ [-Wunused-variable]
  int cardint = 0;
      ^
drivers/mmc/host/bcm2835-mmc.c: In function ‘bcm2835_probe’:
drivers/mmc/host/bcm2835-mmc.c:1368:17: warning: unused variable ‘mask’ [-Wunused-variable]
  dma_cap_mask_t mask;
                 ^
[    0.000000] Linux version 3.16.1+ (pi@raspi2) (gcc version 4.8.3 20140106 (prerelease) (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2013.11) ) #1 PREEMPT Sat Aug 16 15:15:54 CEST 2014

[    2.207201] sdhci: Secure Digital Host Controller Interface driver
[    2.215343] sdhci: Copyright(c) Pierre Ossman
[    2.221998] of_dma_request_slave_channel: not enough information provided
[    2.230808] of_dma_request_slave_channel: not enough information provided
[    2.239704] mmc-bcm2835 mmc-bcm2835.0: get CLOCK failed
[    2.246875] mmc-bcm2835: probe of mmc-bcm2835.0 failed with error -2
[    2.255284] sdhci-pltfm: SDHCI platform and OF driver helper

@P33M
Contributor

P33M commented Aug 16, 2014

The clock speed in the old driver case was passed through in the commandline parameter init_emmc_clock - I suppose that a corresponding clock node must be instantiated in the DT for the OF version to work properly...

@notro
Contributor

notro commented Aug 16, 2014

Yes, that's right, but I also need the DMA controller and MMC device tree nodes for this to work (to get the channels).
I guess the driver has been tried with ARCH_BCM2835, so the same DT nodes would be needed in my case.

from bcm2835-mmc.c

+   host->dma_chan_tx = of_dma_request_slave_channel(node, "tx");
+   host->dma_chan_rx = of_dma_request_slave_channel(node, "rx");
+#endif
+   clk = of_clk_get(node, 0);

@weiszg
Contributor Author

weiszg commented Aug 18, 2014

@popcornmix I have renamed MMC_PIO_DMA_BARRIER to MMC_BCM2835_PIO_DMA_BARRIER

@notro I have fixed the errors you noticed. I prefer using 2835 because this IS (or will be) the patched bcm2835-dmaengine.c. In fact the driver was written based on the current bcm2835-dmaengine.c.

Here are the DT settings I use the drivers with:

mmc: mmc@7e300000 {
    compatible = "brcm,bcm2835-mmc";
    reg = <0x7e300000 0x100>;
    interrupts = <2 30>;
    clocks = <&clk_mmc>;
    dmas = <&dma 5>,
           <&dma 5>;
    dma-names = "tx", "rx";
    status = "disabled";
};
dma: dma@7e007000 {
    compatible = "brcm,bcm2835-dma";
    reg = <0x7e007000 0xf00>;
    interrupts = <1 16>,
             <1 17>,
             <1 18>,
             <1 19>,
             <1 20>,
             <1 21>,
             <1 22>,
             <1 23>,
             <1 24>,
             <1 25>,
             <1 26>,
             <1 27>,
             <1 28>;

    #dma-cells = <1>;
    brcm,dma-channel-mask = <0x7f35>;
};

and in clocks {
clk_mmc: clock@0 {
    compatible = "fixed-clock";
    reg = <0>;
    #clock-cells = <0>;
    clock-output-names = "mmc";
    clock-frequency = <250000000>;
};
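
As an aside on the `brcm,dma-channel-mask` value above, here is a minimal user-space C sketch that lists which channels a mask such as 0x7f35 marks as available. It assumes bit n of the mask stands for DMA channel n (least significant bit = channel 0). The helper name `dma_mask_channels` is hypothetical and is not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the driver): expand a channel bitmask into
 * a list of channel numbers, assuming bit n corresponds to DMA channel n.
 * Returns the number of channels written to out (at most max_out).
 */
size_t dma_mask_channels(uint32_t mask, unsigned int *out, size_t max_out)
{
    size_t count = 0;

    for (unsigned int ch = 0; ch < 32 && count < max_out; ch++) {
        if (mask & (UINT32_C(1) << ch))
            out[count++] = ch;
    }
    return count;
}
```

Under that assumption, 0x7f35 selects eleven channels: 0, 2, 4, 5 and 8 through 14.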

@notro
Contributor

notro commented Aug 19, 2014

I gave up on DT for now and did a test with a regular build.
I used a simple test with 5 write runs and then 5 read runs.

Hardware: Model B with a SanDisk Extreme 45MB/s 16GB

Version

$ cat /proc/version
Linux version 3.16.1+ (pi@raspi2) (gcc version 4.8.3 20140106 (prerelease) (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2013.11) ) #1 PREEMPT Tue Aug 19 13:09:16 CEST 2014

Regular

Kernel messages

[    2.001342] sdhci: Secure Digital Host Controller Interface driver
[    2.009603] sdhci: Copyright(c) Pierre Ossman
[    2.016244] mmc0: no vqmmc regulator found
[    2.022320] mmc0: no vmmc regulator found
[    2.065046] mmc0: SDHCI controller on BCM2708_Arasan [platform] using platform's DMA
[    2.075126] mmc0: BCM2708 SDHC host at 0x20300000 DMA 2 IRQ 77
[    2.083196] sdhci-pltfm: SDHCI platform and OF driver helper

[    2.246259] Waiting for root device /dev/mmcblk0p2...
[    2.385122] usb 1-1: new high-speed USB device number 2 using dwc_otg
[    2.393765] Indeed it is in host mode hprt0 = 00001101
[    2.401975] mmc0: could read SD Configuration register (SCR) at the 2th attempt
[    2.487133] mmc0: read SD Status register (SSR) after 4 attempts
[    2.512998] mmc0: new high speed SDHC card at address e624
[    2.521376] mmcblk0: mmc0:e624 SU16G 14.8 GiB
[    2.530913]  mmcblk0: p1 p2

Test

$ sync; time dd if=/dev/zero of=~/test.tmp bs=500K count=1024
524288000 bytes (524 MB) copied, 29.0657 s, 18.0 MB/s
real    0m29.083s
user    0m0.030s
sys     0m11.780s

524288000 bytes (524 MB) copied, 27.8121 s, 18.9 MB/s
real    0m28.314s
user    0m0.000s
sys     0m12.310s

524288000 bytes (524 MB) copied, 30.0053 s, 17.5 MB/s
real    0m30.508s
user    0m0.000s
sys     0m11.920s

524288000 bytes (524 MB) copied, 27.7773 s, 18.9 MB/s
real    0m28.282s
user    0m0.000s
sys     0m11.410s

524288000 bytes (524 MB) copied, 27.8997 s, 18.8 MB/s
real    0m28.401s
user    0m0.000s
sys     0m11.330s



$ dd if=~/test.tmp of=/dev/null bs=500K count=1024

524288000 bytes (524 MB) copied, 28.6384 s, 18.3 MB/s
524288000 bytes (524 MB) copied, 28.6455 s, 18.3 MB/s
524288000 bytes (524 MB) copied, 28.6756 s, 18.3 MB/s
524288000 bytes (524 MB) copied, 28.6732 s, 18.3 MB/s
524288000 bytes (524 MB) copied, 28.6738 s, 18.3 MB/s

bcm2708.bcm2835_mmc=1

Kernel messages

[    2.001297] sdhci: Secure Digital Host Controller Interface driver
[    2.009553] sdhci: Copyright(c) Pierre Ossman
[    2.016164] DMA channels allocated for the MMC driver
[    2.057054] Load BCM2835 MMC driver
[    2.064924] sdhci-pltfm: SDHCI platform and OF driver helper

[    2.197817] Waiting for root device /dev/mmcblk0p2...
[    2.208350] mmc0: host does not support reading read-only switch. assuming write-enable.
[    2.231565] mmc0: new high speed SDHC card at address e624
[    2.247560] mmcblk0: mmc0:e624 SU16G 14.8 GiB
[    2.269277]  mmcblk0: p1 p2

Test

$ sync; time dd if=/dev/zero of=~/test.tmp bs=500K count=1024

524288000 bytes (524 MB) copied, 31.4771 s, 16.7 MB/s
real    0m31.494s
user    0m0.020s
sys     0m12.810s

524288000 bytes (524 MB) copied, 32.549 s, 16.1 MB/s
real    0m33.054s
user    0m0.030s
sys     0m11.440s

524288000 bytes (524 MB) copied, 26.1999 s, 20.0 MB/s
real    0m26.706s
user    0m0.020s
sys     0m11.960s

524288000 bytes (524 MB) copied, 29.935 s, 17.5 MB/s
real    0m30.442s
user    0m0.020s
sys     0m11.460s

524288000 bytes (524 MB) copied, 29.4643 s, 17.8 MB/s
real    0m29.970s
user    0m0.020s
sys     0m11.190s


$ dd if=~/test.tmp of=/dev/null bs=500K count=1024

524288000 bytes (524 MB) copied, 30.9717 s, 16.9 MB/s
524288000 bytes (524 MB) copied, 30.9838 s, 16.9 MB/s
524288000 bytes (524 MB) copied, 30.9797 s, 16.9 MB/s
524288000 bytes (524 MB) copied, 30.9818 s, 16.9 MB/s
524288000 bytes (524 MB) copied, 30.9809 s, 16.9 MB/s
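
The MB/s figures that dd prints can be reproduced from the byte count and the elapsed time, which makes the runs easier to compare. The following is a minimal C sketch, assuming dd's convention of 1 MB = 1,000,000 bytes. The function names are hypothetical, and the sample inputs mentioned afterwards are taken from the runs above.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in MB/s, using 1 MB = 1,000,000 bytes as dd reports it. */
double throughput_mb_s(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / 1e6;
}

/* Arithmetic mean of n samples (n must be non-zero). */
double mean(const double *samples, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* Relative change of new_value versus old_value, in percent. */
double relative_change_pct(double old_value, double new_value)
{
    assert(old_value != 0.0);
    return (new_value - old_value) / old_value * 100.0;
}
```

For example, the first regular write run (524288000 bytes in 29.0657 s) comes out at about 18.0 MB/s, matching dd's own output. Comparing the read figures of 18.3 MB/s (regular) and 16.9 MB/s (bcm2708.bcm2835_mmc=1) gives a change of roughly -7.7%.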

@weiszg
Contributor Author

weiszg commented Aug 19, 2014

@notro I have decreased the number of wait cycles in the DMA to 1, which resulted in an increase in raw read speed compared to the old driver when using a NOOBS card. Could you please test again with your card?

@notro
Contributor

notro commented Aug 19, 2014

ARCH_BCM2708, CONFIG_BCM2708_DT and bcm2835-mmc

Added DMA and MMC DT nodes.

From bcm2708.c I removed these lines:

    bcm_register_device(&bcm2708_dmaman_device);
    if (!bcm2835_mmc) bcm_register_device(&bcm2708_emmc_device);
    if (bcm2835_mmc) bcm_register_device(&bcm2835_emmc_device);

Version

cat /proc/version
Linux version 3.16.1+ (pi@raspi2) (gcc version 4.8.3 20140106 (prerelease) (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2013.11) ) #2 PREEMPT Tue Aug 19 17:03:52 CEST 2014

Kernel messages

[    1.957752] sdhci: Secure Digital Host Controller Interface driver
[    1.964043] sdhci: Copyright(c) Pierre Ossman
[    1.968837] DMA channels allocated for the MMC driver
[    2.010953] Load BCM2835 MMC driver
[    2.015009] sdhci-pltfm: SDHCI platform and OF driver helper

[    2.112509] Waiting for root device /dev/mmcblk0p2...
[    2.126169] mmc0: host does not support reading read-only switch. assuming write-enable.
[    2.145544] mmc0: new high speed SDHC card at address e624
[    2.161571] mmcblk0: mmc0:e624 SU16G 14.8 GiB
[    2.168811]  mmcblk0: p1 p2

Test

sync; time dd if=/dev/zero of=~/test.tmp bs=500K count=1024
524288000 bytes (524 MB) copied, 33.5181 s, 15.6 MB/s
real    0m33.710s
user    0m0.030s
sys     0m16.650s

524288000 bytes (524 MB) copied, 30.4738 s, 17.2 MB/s
real    0m30.880s
user    0m0.040s
sys     0m16.180s

524288000 bytes (524 MB) copied, 30.9445 s, 16.9 MB/s
real    0m31.258s
user    0m0.000s
sys     0m11.670s

524288000 bytes (524 MB) copied, 29.4293 s, 17.8 MB/s
real    0m29.647s
user    0m0.000s
sys     0m11.140s

524288000 bytes (524 MB) copied, 29.0775 s, 18.0 MB/s
real    0m29.391s
user    0m0.000s
sys     0m11.210s


dd if=~/test.tmp of=/dev/null bs=500K count=1024
524288000 bytes (524 MB) copied, 28.5766 s, 18.3 MB/s
524288000 bytes (524 MB) copied, 28.5724 s, 18.3 MB/s
524288000 bytes (524 MB) copied, 28.5706 s, 18.4 MB/s
524288000 bytes (524 MB) copied, 28.5765 s, 18.3 MB/s
524288000 bytes (524 MB) copied, 28.5752 s, 18.3 MB/s

@notro
Contributor

notro commented Aug 19, 2014

@weiszg
I will try this change

#define SDHCI_BCM_DMA_WAITS 1  /* delays slowing DMA transfers: 0-31 */

How do you test performance?

@notro
Contributor

notro commented Aug 19, 2014

Same setup as previously with this change:

-#define SDHCI_BCM_DMA_WAITS 30  /* delays slowing DMA transfers: 0-31 */
+#define SDHCI_BCM_DMA_WAITS 1  /* delays slowing DMA transfers: 0-31 */

Test

$ sync; time dd if=/dev/zero of=~/test.tmp bs=500K count=1024
524288000 bytes (524 MB) copied, 33.061 s, 15.9 MB/s
real    0m33.378s
user    0m0.000s
sys     0m16.430s

524288000 bytes (524 MB) copied, 30.3007 s, 17.3 MB/s
real    0m30.633s
user    0m0.030s
sys     0m16.060s

524288000 bytes (524 MB) copied, 30.8766 s, 17.0 MB/s
real    0m31.110s
user    0m0.000s
sys     0m16.000s

524288000 bytes (524 MB) copied, 29.419 s, 17.8 MB/s
real    0m29.637s
user    0m0.020s
sys     0m16.040s

524288000 bytes (524 MB) copied, 29.6848 s, 17.7 MB/s
real    0m29.998s
user    0m0.030s
sys     0m16.010s


$ dd if=~/test.tmp of=/dev/null bs=500K count=1024
524288000 bytes (524 MB) copied, 28.5656 s, 18.4 MB/s
524288000 bytes (524 MB) copied, 28.5769 s, 18.3 MB/s
524288000 bytes (524 MB) copied, 28.5653 s, 18.4 MB/s
524288000 bytes (524 MB) copied, 28.5812 s, 18.3 MB/s
524288000 bytes (524 MB) copied, 28.5683 s, 18.4 MB/s

sdhci-bcm2708 and dma.c: fix for LITE channels
@weiszg
Contributor Author

weiszg commented Aug 19, 2014

I have updated the dmaengine driver (and the bit of the old sdhci that I changed) to use WAIT_RESP instead of DMA_WAITS. This will safely transfer data even with a busy memory bus. This change should not affect transfer speeds.

@notro
Could you test this new version, too?
It seems to me that read speed is unaffected while writes suffer a decrease of about 5%. This is logical, as the old driver busy-waits for the FIFO to empty after getting a DMA interrupt, i.e. the new driver has an additional interrupt latency when doing writes, which is preferable to hogging the CPU.
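
For readers unfamiliar with the two mechanisms, here is a minimal C sketch of how a fixed wait count and WAIT_RESP differ at the register level. It assumes the transfer information (TI) register layout given in the BCM2835 peripherals datasheet (WAIT_RESP at bit 3, WAITS in bits 25:21). The macro and function names are hypothetical and are not the ones used in the patch.

```c
#include <stdint.h>

/* Assumed TI register layout, per the BCM2835 peripherals datasheet. */
#define TI_WAIT_RESP    (UINT32_C(1) << 3)   /* wait for each write response */
#define TI_WAITS_SHIFT  21
#define TI_WAITS_MASK   (UINT32_C(0x1f) << TI_WAITS_SHIFT)  /* 0-31 cycles */

/* Insert a fixed number of dummy cycles (clamped to 0-31) after each access. */
uint32_t ti_set_waits(uint32_t ti, unsigned int cycles)
{
    if (cycles > 31)
        cycles = 31;
    return (ti & ~TI_WAITS_MASK) | ((uint32_t)cycles << TI_WAITS_SHIFT);
}

/* Drop the fixed delay and pace writes on the bus response instead. */
uint32_t ti_use_wait_resp(uint32_t ti)
{
    return (ti & ~TI_WAITS_MASK) | TI_WAIT_RESP;
}
```

A fixed wait count slows every access by the same amount regardless of bus load, whereas waiting for the write response only stalls when the bus is actually slow to acknowledge, which fits the observation that the change should not affect transfer speeds.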

@P33M
Contributor

P33M commented Aug 19, 2014

An aside for the suitably interested (and something @weiszg and I were discussing today):

The current expansion to the DMAengine driver hardcodes these defaults: this is fine for the time being as the only clients of the BCM2835 dmaengines are the I2S and (just added) bcm2835-MMC driver, which are low-bandwidth peripherals operating at a maximum of a couple of dozen megabytes per second. Nobbling the DMA throughput by default for these clients has only minor effects on the MMC host speed as evidenced by winding the DMA wait cycle count up to 30.

The DMA engines (in particular the full-fat ones) implement functionality for 2D strided transfers as well as the various AXI "magic" flags detailed in the peripherals datasheet.

Another candidate for conversion to DMAengine client use (in fact the only other one currently using the BCM DMA channel request/allocate API) is the dma-accelerated framebuffer driver.

This driver needs to specify 2D area moves, so a straightforward conversion is not possible. As described in the comment for struct dma_slave_config, the API can be extended through container_of() and friends:

http://lxr.free-electrons.com/source/include/linux/dmaengine.h#L306

Ideally a future expansion to the BCM2835 dmaengine driver, and subsequent modification of the clients, will pack the necessary 2D stride information and any flags peculiar to the hardware peripheral (in effect the TI and STRIDE registers) in with the dma_slave_config struct.
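
The container_of() approach can be illustrated outside the kernel. In this minimal user-space C sketch, `struct slave_config_stub` stands in for the kernel's `struct dma_slave_config`, and `struct bcm_dma_2d_config` with its `stride` and `ti_flags` fields is a hypothetical wrapper, not an existing API.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct dma_slave_config; only one field is modelled. */
struct slave_config_stub {
    uint32_t dst_addr;
};

/* Hypothetical wrapper carrying the extra 2D information. */
struct bcm_dma_2d_config {
    struct slave_config_stub cfg;  /* generic part passed through the API */
    uint32_t stride;               /* would map to the STRIDE register */
    uint32_t ti_flags;             /* would map to extra TI register bits */
};

/* What a driver could do on receiving the generic config pointer. */
uint32_t stride_from_generic(struct slave_config_stub *cfg)
{
    struct bcm_dma_2d_config *ext =
        container_of(cfg, struct bcm_dma_2d_config, cfg);

    return ext->stride;
}
```

A client would fill in the wrapper and pass only the embedded generic member through the existing call. This is only safe if the driver can be sure that the pointer it receives really is embedded in the wrapper.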

@notro
Contributor

notro commented Aug 19, 2014

non-DT build this time with bcm2708.bcm2835_mmc=1

[    2.080386] DMA channels allocated for the MMC driver
[    2.121444] Load BCM2835 MMC driver
$ sync; time dd if=/dev/zero of=~/test.tmp bs=500K count=1024
524288000 bytes (524 MB) copied, 30.0866 s, 17.4 MB/s
real    0m30.110s
user    0m0.020s
sys     0m12.150s

524288000 bytes (524 MB) copied, 27.768 s, 18.9 MB/s
real    0m28.264s
user    0m0.020s
sys     0m11.720s

524288000 bytes (524 MB) copied, 28.4557 s, 18.4 MB/s
real    0m28.950s
user    0m0.020s
sys     0m11.060s

524288000 bytes (524 MB) copied, 27.5649 s, 19.0 MB/s
real    0m28.062s
user    0m0.060s
sys     0m11.000s

524288000 bytes (524 MB) copied, 26.6236 s, 19.7 MB/s
real    0m27.121s
user    0m0.000s
sys     0m11.260s


$ dd if=~/test.tmp of=/dev/null bs=500K count=1024
524288000 bytes (524 MB) copied, 28.4198 s, 18.4 MB/s
524288000 bytes (524 MB) copied, 28.4442 s, 18.4 MB/s
524288000 bytes (524 MB) copied, 28.4451 s, 18.4 MB/s
524288000 bytes (524 MB) copied, 28.4335 s, 18.4 MB/s
524288000 bytes (524 MB) copied, 28.4427 s, 18.4 MB/s

popcornmix added a commit that referenced this pull request Aug 20, 2014
add alternative MMC driver and support scatter/gather transfers in dmaengine
@popcornmix popcornmix merged commit 495e54e into raspberrypi:rpi-3.12.y Aug 20, 2014
xobs pushed a commit to adafruit/adafruit-raspberrypi-linux that referenced this pull request Sep 1, 2014
pfpacket pushed a commit to pfpacket/linux-rpi-rust that referenced this pull request Apr 7, 2023