Irregular GPIO latency #497
I have medium fast data acquisition hardware / code that uses the falling edge of a GPIO input to signal an FIQ. The FIQ code preempts a PREEMPT_RT Linux kernel and stores ADC data from other GPIO pins in a buffer. This code works as expected on the Raspberry Pi 1 B using older firmware and slightly modified USB drivers for OSADL’s old 3.12.24-rt38 patch set from last year.
To push higher data rates and modernize the system, I recently redesigned the hardware to take advantage of the new 40 pin header and ported my FIQ code to the RPi 2's newer kernel. I immediately discovered irregular GPIO latency issues, which I initially attributed to multi-core bus contention. I tried a few different troubleshooting "fixes" (busy-waiting the other cores during an FIQ, disabling DMA, reviewing the newest USB FIQ code for incompatibilities, etc.), but they were all dead ends.
Running out of ideas with the RPi 2, I purchased an RPi 1 B+ since it was compatible with my working BCM2835 code and had the new 40 pin header. I found that the irregular latency issue was also present in recent firmware (including commit 1f5c0d7), so I then proceeded to work backwards in commits to find when the issue appeared.
After a fair amount of testing, I found that my code runs with zero GPIO latency issues for 9+ hours using the Oct 6, 2014 c786f85 commit but within minutes shows GPIO latency issues on the Oct 12, 2014 e4afeda commit. This suggests to me that the cause is the low-level firmware since I used the same 3.12.24-rt38 kernel to test each commit.
Attached below are screenshots of the observed behavior using the RPi 1 B+ and an oscilloscope with infinite persistence turned on. Infinite persistence overlays all signal traces captured during a given time window, which makes it useful for catching infrequent, irregular signal behavior.
The falling edge that the FIQ is responding to is shown on channel 1 (yellow) and the ADC data read strobe driven by FIQ code is shown on channel 3 (purple). Commits 1f5c0d7 and e4afeda show the undesirable, irregular GPIO latency behavior and commit c786f85 does not.
Can someone with access to the firmware source code please look into why this latency issue might be occurring and perhaps offer a fix? I would greatly appreciate it.
I would be happy to help test new firmware fixes. I can also provide the kernel I'm using, but it is of limited utility since reproducing the latency issue is somewhat involved (it requires an oscilloscope / logic analyzer and a method of generating a square wave). I unfortunately do not have a minimal example of my kernel code for others to reproduce this issue, but I can try to provide one if it'd be helpful. My code is tied into a lot of other logic, which makes it difficult to create a minimal example (a high level overview of the code is provided in the diagram below).
My cmdline.txt is:
dwc_otg.speed=1 dwc_otg.fiq_fix_enable=0 dwc_otg.fiq_split_enable=0 root=/dev/mmcblk0p2 rootfstype=ext4 rootwait elevator=noop
My config.txt file is:
I run my RPi without Xorg using the minimal raspbian-ua-netinst, but I did try increasing gpu_mem to make the config.txt more standard. It did not fix the issue.
9 hours with c786f85:
15 minutes with e4afeda:
10 minutes with 1f5c0d7:
There are a small number of commits in the firmware tree between Oct 6th and Oct 12th 2014. None of them look particularly obvious for causing this.
@popcornmix Thank you so much for the firmware files! I ran through the list a few times and observed the following behavior:
I need to look more into the nature of the kernel panic. The panic seems to occur within seconds to an hour. Sometimes I’ve just stopped the test to move on to another firmware file if no panic has occurred and the latency has looked normal for a while (>20 minutes).
The above was done on an RPi 1 B+ with bootcode.bin from c786f85. I used dd if=/dev/zero of=/dev/null and my userspace DAQ code to generate a high CPU load of about 2.
Was anything changed with respect to disable_pvt in the above firmware files?
@P33M Yes, the issue is infrequent -- it can take anywhere from a few seconds to a couple minutes before the first latency issue shows up.
It's a change to order of initialisation rather than any deliberate change in behaviour.
It's unfortunate that there's no clear explanation beyond a change in initialization order, but I suppose this is still progress: the GPIO issue is now tied to a GPIO-related firmware change.
Perhaps I should start working on minimal example code that reproduces the latency issue? It might help in further isolating what's going on. Also, here are the logs from vcdbg for 5ee9b56e and 32a704ea:
I thought they might be useful.
Finally, I spent some time this evening re-running firmware tests to re-confirm we've isolated the right firmware change. I definitely see the latency problem in firmwares 1f438843 to 5ee9b56e, but not in 32a704ea to a363c9d4. I also removed my DAQ circuitry and drove the GPIO pin directly (with source termination) on the off chance some issue with my hardware was causing false FIQ fires. No luck.
I've stepped through the code before and after that commit and the problem is that the config.txt parsing code is now disabling pvt before powerman is initialised (which is enabling it again).
I've moved the handling of disable_pvt to later. Here is a top-of-tree firmware with the fix added. Can you test?
So far, so good. There hasn't been an FIQ GPIO glitch (or kernel panic) in the past 45 minutes. I'm going to let it run longer on the RPi 1 B+ and then switch over to the RPi 2 code. I'll probably have that done by Monday (I haven't touched my RPi 2 code in a while). Then I guess we can close this issue and have a beer?
Thanks so much for looking into this -- I really appreciate it.
@subspclr4 - very cool project (and great drill down)!
If I may tack on a question: I am interested in the hardware-to-usercode notification path using fasync(). What latency and jitter do you observe on the path from external event to user-task notification?
(I could make good use of a low-latency notification path for this particular purpose: machinekit/machinekit#687. Any chance of that part (kernel module, userland code fragment) being shared?)
@mhaberler Thanks for the interest. Latency and jitter on fasync notification haven't been an issue for my application, so I haven't measured them in a while. If I remember correctly, the worst case latency was on par with the results I got from cyclictest:
Looking at my old notes, I unfortunately only have end-to-end plots that confound the fasync latency measurement with 300ish microseconds of control algorithm execution time. They show the total chain of signaling: FIQ » high priority IRQ at specific crank angle(s) » fasync » user space control algorithm with floating point math » ioctl back to kernel » my SPI driver » CAN packet to ECU. From those plots the upper bound on fasync is roughly consistent with the cyclictest results above.
I have been rethinking the entire code architecture, and fasync is actually on the chopping block. I want to explore using a plain, threaded blocking read instead of a signal since it will be simpler, will (hopefully) give similar performance, and will be way more standard than relying on signaling.
The results are, however, still TBD. I'll let you know how it goes (and put together some plots / code). As for my current code, it's pretty much exactly what's in LDD. I am not ready to release the entire code base, but it will happen soon. My goal is to rework it into something not tied to a niche automotive application.
@subspclr4 - interesting; I was wondering about the rationale for the fasync use, which I had not seen before.
I did some loosely related work on a BeagleBone, propagating a GPIO flip to a Xenomai thread using an RTDM driver, and I scoped the result: about 5 µs until the kernel code starts executing, and about 15 µs until userland gets unblocked.
While this is Xenomai/RTDM-specific, I think a plain Linux driver with userland doing, say, a read() might not show drastically different results; and since PREEMPT_RT will likely be the go-to kernel for RT support, I am really interested in your results!
@mhaberler Sorry for the delay writing back. I did run some preliminary tests last month, but I didn't see much of a difference between fasync and a blocking read() on the scope. I was hoping to refine those tests with hard latency numbers by now, but I haven't had the chance yet. Anyway, I figured it'd be good to at least mention that fasync is probably not a path to improved notification performance on machinekit.