led-chain – Flickering with Omega2 and WS2812 #3

Closed
bennigraf opened this issue Mar 3, 2019 · 11 comments

@bennigraf

Hey Lukas,

first off, thanks for open sourcing all this stuff!

I'm trying to run some ws2812 LEDs with an Onion Omega2, but have some trouble controlling them. Especially when writing data to /dev/ledchain0 at higher framerates (> 2fps), I get a lot of flickering/flashes (apparently wrong data pushed to the LEDs, since they keep the wrong color until the next update). Often they're at the correct color but fully turned on instead of dimmed, sometimes the color's completely off.

I'm trying to write to that file via python right now, but the same issues happen when using echo -en … directly from CLI.
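
For reference, a minimal Python sketch of such a write. It assumes 3 bytes per LED in R, G, B order and that a whole frame is sent with a single write; the helper names `build_frame` and `write_frame` are made up for illustration, and the driver's README is the authority on the actual byte layout.

```python
def build_frame(colors):
    """Pack (r, g, b) tuples into one bytes object, 3 bytes per LED (assumed layout)."""
    frame = bytearray()
    for r, g, b in colors:
        for channel in (r, g, b):
            if not 0 <= channel <= 255:
                raise ValueError(f"channel value out of range: {channel}")
            frame.append(channel)
    return bytes(frame)


def write_frame(path, frame):
    """Write one complete frame to the given path with a single unbuffered write."""
    # buffering=0 bypasses Python's userspace buffer, so the data is handed
    # to the driver by this write() call rather than at flush/close time.
    with open(path, "wb", buffering=0) as dev:
        dev.write(frame)
```

Under these assumptions, `write_frame("/dev/ledchain0", build_frame([(255, 0, 0), (0, 255, 0)]))` sends the six bytes FF 00 00 00 FF 00 for two LEDs.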

Strangely the last LED always lights up correctly – no matter how many LEDs I use in total. I tested with between 1 and 30 LEDs on two different strands.

Is that a known problem with the apparently demanding timing of the 2812 LEDs or could there be a different issue with how I use the driver? Do you have any ideas or maybe encountered a similar situation?

I'm using kmod-p44-ledchain_4.4.61+0.9-2_mipsel_24kc that I downloaded from plan44.ch (found a link in some forum thread...)

I'd appreciate any ideas on how to debug this.
Best,
Benjamin.

@plan44
Owner

plan44 commented Mar 3, 2019

Hi Benjamin,

if you are using more than one ledchain device, then the version you are using is definitely outdated - I found and fixed a bug in the interrupt handling that causes massive flickering when more than one chain is used at the same time.
But when using only a single chain, I am not aware of a bug that would cause these problems.

How did you connect the chain? Directly or via a level shifter? Without level shifter, flickering or not is a matter of luck and tiny differences in supply voltage (I had a case like: 4.99V works, 5.03V it doesn't). I really recommend a level shifter.

Have you checked the driver's statistics (cat /dev/ledchain0)? Does it show a lot of errors or overruns? Basically, 2fps should be no problem at all. I have an application with chains of 518 LEDs (ws2813, though) which works up to 50fps.
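
A rough back-of-the-envelope check of these frame rates, as a Python sketch. It assumes the nominal 800 kHz data rate (1.25 µs per bit) and 24 bits per LED, and it ignores the reset gap between frames unless one is passed in; the function names are hypothetical.

```python
BITS_PER_LED = 24    # 8 bits for each of the three colour channels
BIT_TIME_US = 1.25   # assumed nominal bit period at 800 kHz


def frame_time_us(num_leds):
    """Time in microseconds to clock out one frame for num_leds LEDs."""
    return num_leds * BITS_PER_LED * BIT_TIME_US


def max_fps(num_leds, reset_us=0.0):
    """Upper bound on frames per second, optionally including a reset gap."""
    return 1_000_000 / (frame_time_us(num_leds) + reset_us)


print(frame_time_us(518), max_fps(518))
print(frame_time_us(30), max_fps(30))
```

Under these assumptions 518 LEDs need about 15.5 ms per frame, which fits into the 20 ms budget of 50 fps, and 30 LEDs need under 1 ms, so 2 fps is far from any bandwidth limit.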

BTW: Onion now includes p44-ledchain in their Omega2 PRO firmware, and now that OpenWrt v18.06 is also available for the other Omegas, I guess p44-ledchain should be there as well.

Lukas

@bennigraf
Author

Hey,

thanks for catching up so fast.

I always only used one actual "device" (as in /dev/ledchainN) with only one LED strip connected with 1-30 LEDs. Sorry for the confusion.

Thanks for pointing out the new release – I did upgrade the firmware when starting with this project, but had to google around a bit to find out that I needed to upgrade to latest and would then get v18.06. p44-ledchain is indeed in /lib/modules for that new 4.14 kernel. Nice! (Only that by default it seems to load /dev/ledchain3 which isn't available on a non-plus Omega2, so I had to rmmod and insmod again with the correct device).

I tried that, but still no luck unfortunately:

The statistics show a lot of "retries" and some errors, but I don't know how to interpret these numbers:

root@Omega-4CED:~# cat /dev/ledchain0 
Ready
Last update: 1 repeats, last timeout=10001 nS, max irq=5847 nS
Totals: updates=6556, overruns=0, retries=3195, errors=13, irqs=19628
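
To put these totals into proportion, here is a small Python sketch that turns the `Totals:` line into numbers. The helper names are hypothetical, and the `name=value` format is simply taken from the output above.

```python
import re


def parse_totals(line):
    """Return the name=value pairs of a 'Totals:' line as a dict of ints."""
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", line)}


def retries_per_update(totals):
    """Average number of retries per update (0.0 if there were no updates)."""
    updates = totals.get("updates", 0)
    return totals.get("retries", 0) / updates if updates else 0.0


totals = parse_totals(
    "Totals: updates=6556, overruns=0, retries=3195, errors=13, irqs=19628"
)
print(totals, round(retries_per_update(totals), 2))
```

For the totals above this gives roughly 0.49 retries per update, i.e. on average about one retry for every two updates.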

I tried it both without and with a level shifter and even with running the LED strip directly on 3v3 (don't know if that should work, but at least logic level shouldn't be a problem then ;-) ). Funnily enough, it's always the same strange behaviour.

I'll still order an actual 74AHCT125 that you suggested in some other forum thread, since you mentioned that other shifters sometimes cannot cope with those HF signals. Let's see if that helps…

Thanks again,
Benjamin.

@plan44
Owner

plan44 commented Mar 5, 2019

Hi Benjamin,

I dug in my LED box to find some old WS2812 strips to do some experiments myself.

It turns out that the maximum pause between bits that a WS2812 tolerates without interpreting it as a chain reset can be shorter than what the driver assumed (10µS). When this happens, the update flickers badly, because the chain resets in the middle of the update.

To fix that, I added an optional parameter maxTpassive, which can set a different maximal bit pause time than the default for the LED type. I got my stone age WS2812 chain working with maxTpassive=5100(nS).

Have a look at the updated README.md, it explains the new settings and also the meaning of the statistics info.

Lukas

@plan44
Owner

plan44 commented Mar 5, 2019

BTW: you can find a built version here

@bennigraf
Author

Hey Lukas,

thanks a lot for that huge effort of digging into this!

I tried to load the module you provided directly (from the link above). insmod works, but the Omega crashes when I try to write data to /dev/ledchain…. I guess it's because of conflicting kernels:

root@Omega-4CED:~# uname -a
Linux Omega-4CED 4.14.81 #0 Thu Feb 21 20:59:23 2019 mips GNU/Linux

…while your module build apparently requires 4.14.95. (I used --force-depends when installing the package…)

I'm not sure how to go on from here…
– Is there a way for me to upgrade my Omega2 to 4.14.95? It seems even the official Omega build environment (https://github.com/OnionIoT/source/) is still at 4.14.81 – I wonder how you went to .95 in the first place ;-)
– If I installed their build environment, would I be able to build your module against 4.14.81?
– Or do you happen to have a build for 4.14.81 lying around as well?

Btw when getting device info right after insmod before writing data to it, I get strange data:

root@Omega-4CED:~# cat /dev/ledchain0 
Ready
Last update: 0 retries, last timeout=0 nS, min..max irq=0..0 nS
Totals: updates=2177551360, overruns=2206473472, retries=2177551360, errors=2177681792, irqs=0

Best,
Benjamin.

@plan44
Owner

plan44 commented Mar 5, 2019

hmm. strange. I don't think it's the kernel version, I've tested it yesterday on a 4.14.63 build with --force-depends.
Do you see something related (maybe the cause of the crash) when typing dmesg?

About where 4.14.95 comes from - I'm not using Onion's firmware, but my own OpenWrt builds, and these are now on OpenWrt v18.06.2 with 4.14.95 kernel (whereas Onion is currently on v18.06.1 AFAIK).

@bennigraf
Author

Hey,

dmesg doesn't show anything meaningful to me. (I can only run it after the Omega rebooted, since it becomes unresponsive instantly when running echo -en '\xFF\x00\x00\x00\xFF\x00' > /dev/ledchain0.) I uploaded it here.

Running logread -f in a parallel screen session also doesn't show any messages before becoming unresponsive.

It does show the following when doing insmod …, but that seems all right to me:

root@Omega-4CED:~# logread -f
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.675021] ledchain: pwm_base=0xB0005000
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.679393] ledchain: v2 - Device: /dev/ledchain0
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.684463] ledchain: - PWM channel    : 0
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.688620] ledchain: - PWM buffer size: 132
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.692987] ledchain: - Number of LEDs : 10
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.697226] ledchain: - Inverted       : 0
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.701388] ledchain: - LED type       : WS2812
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.705978] ledchain: - Max retries    : 3
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.710139] ledchain: - Max Tpassive   : 5100 nS

I don't know if there are any other logs available on the Omega. Next thing would probably be to try to read the serial/uart console somehow, but I'll only get to try that next weekend or so.

Uninstalling your package and re-installing the original one from the onion sources makes the ledchain device basically work again, but with these flashes/false updates.

@bennigraf
Author

Hello again,

Today I managed to connect to the serial console and read its output. There is an error happening when doing echo … after loading your v2 module: Unhandled kernel unaligned access[#1]…

I uploaded the full output here: https://gist.github.com/bennigraf/3d4207e0178f1b48b9c9152a11b55f6f

In the call trace it points to update_leds, but I don't really know how to read this output. Can you get some information out of this?

Best and thanks,
Benjamin.

@plan44
Owner

plan44 commented Mar 8, 2019

Thanks for the console log.

This looks very much like a memory corruption issue. In particular, it reminded me of your earlier observation about the strange data readouts when doing cat /dev/ledchain0.

I looked into the code and I see no way how these variables could be anything but zero right after initialisation. The device struct where they reside is allocated with kzalloc(), which zeroes the entire area. So getting anything but zero for these statistics before doing any write to /dev/ledchain0 means that the memory area gets overwritten by something else.

I see no way p44-ledchain's own code could be doing that. I also have a hard time believing that this could be a specific kernel dependency on 4.14.81. I rather suspect that the current Onion kernel (or libraries with direct physical memory access in the Onion fw) contains something that interferes with p44-ledchain.

So to narrow down the problem:

  • do you see the strange data also with the old kmod when you read out before writing anything?
  • are there any special drivers or libraries you are using that might directly access the physical memory?
  • maybe you can post the output of lsmod?

@bennigraf
Author

bennigraf commented Mar 12, 2019

Hello again,

TLDR: I've built an ipk of the updated version of plan44/p44-ledchain (which provides kmod-p44-ledchain) matching the current 4.14.81 kernel using the OnionIOT/source repo on the openwrt-18.06 branch. I've successfully loaded this module on both an Omega2 and an Omega2+ and used it to drive a WS2813 and a (previously horribly flickering) WS2812 LED chain. Here's the ipk, in case someone else needs it: kmod-p44-ledchain_4.14.81+2.0-7_mipsel_24kc.ipk.zip

Regarding your questions:

  • Both with your "old" kmod and with the current one which is included in OnionOS cat /dev/ledchain0 reports are apparently sane – all values start at 0 and rise more or less slowly when writing to the device. It was only your new build linked above which produced the strange readings. (With those numbers being so high and so similar – could that be some issue with overflowing or signed vs. unsigned values?)
    The build targeting the matching kernel also behaves nicely (starting at 0, rising slowly...), btw.
  • I don't have the devices at hand right now, so no lsmod for now unfortunately. But both devices I tested with were pretty much only running the original OnionOS firmware upgraded to the latest version. I only installed python3-lite and screen. I can still post lsmod later today, maybe something catches your eye ;-).
  • I guess it could still be some super strange hardware issue that causes the memory corruption, but since both devices behave so similarly, that seems not very likely to me. I also double checked the power supply, but don't have super nice tools at hand to test the electrical parts (no oscilloscope/lab power supply), so it's a bit of guesswork on my part. In my quick test yesterday the devices appeared quite stable, but I'll keep monitoring their behaviour, obviously ;-)…
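
On the signed vs. unsigned question, one quick check is to print the odd values in hex. The Python sketch below only uses the numbers from the readout posted earlier in this thread; the one outside fact it relies on is that MIPS32 places the unmapped, cached kernel segment kseg0 at 0x80000000 to 0x9FFFFFFF. The helper names are hypothetical.

```python
KSEG0_START = 0x80000000
KSEG0_END = 0x9FFFFFFF

# Counter values copied from the readout taken right after insmod.
readings = {
    "updates": 2177551360,
    "overruns": 2206473472,
    "retries": 2177551360,
    "errors": 2177681792,
}


def in_kseg0(value):
    """True if the value falls into the MIPS32 kseg0 address range."""
    return KSEG0_START <= value <= KSEG0_END


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value >= (1 << 31) else value


for name, value in readings.items():
    print(f"{name:9s} {value:#010x} signed={as_signed32(value)} kseg0={in_kseg0(value)}")
```

All four values land in the kseg0 range, so they look more like kernel addresses than like overflowed counters. That is only a hint, not proof, but it fits memory being overwritten rather than an arithmetic problem.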

Since you went out of your way to provide an updated version of p44-ledchain that successfully drives my shabby WS2812 and I have a working build of it now, we can close this issue in my opinion. But I'll leave it up to you, in case you have further questions or want to dig deeper into this issue above.

Best and thanks a lot,
Benjamin.

@plan44
Owner

plan44 commented Mar 12, 2019

Thanks a lot for your work! And thanks for providing the build for current OnionOS!

I think we can safely assume now that p44-ledchain code is ok, but running a kmod with another kernel version than the one it was built for is not. Not that we didn't know that already ;-)
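
As a first sanity check before installing a kmod package, the kernel version embedded in the package name can be compared with the running kernel. This is a minimal Python sketch; the function names are hypothetical, and the name pattern is inferred only from the two package names mentioned in this thread.

```python
import platform
import re


def kernel_version_from_kmod(package_name):
    """Extract the kernel version from a name like kmod-foo_4.14.81+2.0-7_arch."""
    match = re.search(r"_(\d+\.\d+\.\d+)\+", package_name)
    return match.group(1) if match else None


def matches_running_kernel(package_name, running=None):
    """Compare the package's kernel version with the running one (uname -r)."""
    running = running or platform.release()
    return kernel_version_from_kmod(package_name) == running


print(kernel_version_from_kmod("kmod-p44-ledchain_4.14.81+2.0-7_mipsel_24kc.ipk"))
```

A matching version string is necessary but not sufficient: it says nothing about the kernel build options the module was compiled against.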

But still, I was surprised about such strange side effects. Probably I was misguided by thinking only about the kernel APIs as such (none of those used by p44-ledchain have changed from 4.4.61 to 4.14.81 to 4.14.95). However, what is very likely to cause memory corruption is a kernel data struct changing its size, as kmod code uses sizeof() and offsetof() a lot. In particular, a change in size of struct cdev would obviously kill p44-ledchain. Now, this apparently hasn't happened between my 4.4.61 and 4.14.95 builds, but something is different in the Onion 4.14.81 build. Conclusion: most probably it's not the kernel version number, but a different set of kernel build options between my mostly vanilla OpenWrt builds and the current Onion build.
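
The effect of such a size change can be illustrated with a toy layout in Python's ctypes. The structs below are hypothetical stand-ins, not the real struct cdev or the real p44-ledchain device struct; they only show how one extra, option-dependent field shifts everything that follows an embedded struct.

```python
import ctypes


class CdevAsCompiled(ctypes.Structure):
    # Hypothetical layout the module was compiled against.
    _fields_ = [("ops", ctypes.c_uint32), ("count", ctypes.c_uint32)]


class CdevInRunningKernel(ctypes.Structure):
    # Same struct in a kernel built with one extra, option-dependent field.
    _fields_ = [
        ("ops", ctypes.c_uint32),
        ("extra", ctypes.c_uint32),
        ("count", ctypes.c_uint32),
    ]


class DeviceAsCompiled(ctypes.Structure):
    # The module embeds the struct and places its own counter right after it.
    _fields_ = [("cdev", CdevAsCompiled), ("updates", ctypes.c_uint32)]


class DeviceAsKernelSeesIt(ctypes.Structure):
    # What the layout would have to be for the running kernel's struct to fit.
    _fields_ = [("cdev", CdevInRunningKernel), ("updates", ctypes.c_uint32)]


print(ctypes.sizeof(CdevAsCompiled), ctypes.sizeof(CdevInRunningKernel))
print(DeviceAsCompiled.updates.offset, DeviceAsKernelSeesIt.updates.offset)
```

In this toy case the module reserves 8 bytes for the embedded struct while the kernel treats it as 12 bytes, so kernel code initialising the last field would write straight into the module's `updates` counter.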

Lesson learned!

I agree that we can close this issue now - thanks to your verification.

@plan44 plan44 closed this as completed Mar 12, 2019