Possible strange behaviour with HDMI on first boot #51

Open
sihil opened this Issue Mar 14, 2017 · 31 comments

sihil commented Mar 14, 2017

Apologies for duplicating my post on the Pine64 forum. Unfortunately I'm unable to reply further due to an anti-spam measure that they have introduced on the forums (according to my IRC conversation, as a new user I have to wait three days before I can make my second post).

For completeness I'm going to include my original text:

I've observed a weird issue with the xenial-pine64-bspkernel-20161218-1.img image whilst trying to get it to run headless on my Pine64. Based on an evening of flashing and re-flashing SD cards I have concluded that:
If an HDMI display is connected on the first ever boot then it seems that the OS will NEVER boot without an HDMI display.
If NO HDMI display is connected on the first ever boot then the OS will boot happily - with or without a display for ever more.
This has tripped me up on an OpenHABian derivative image that exhibits the same behaviour (see issue at openhab/openhabian#105).

I figure there is a script that is running on the first ever boot that sets a piece of configuration differently depending on whether a display is connected or not. Thus far I've not figured out what that is or how to fix it so that a system booted with HDMI the first time can later be booted headless.

Sadly I do not have a serial cable for my P64 so am unable to see the console and figure out what's happening.

Sounds suspiciously like unintended behaviour - if anyone has any suggestions then I'd be glad to hear them.

@longsleep kindly replied thus:

Well this sounds strange. The only thing that happens on the first boot is generating keys. This takes a lot of computing power. Maybe the power supply is not sufficient for this, and when HDMI is connected extra power is available through HDMI.

If the board does not boot, do you know what the error is? Where is it stuck? How did you find out that it did not boot?

sihil commented Mar 14, 2017

I'm using a Raspberry Pi 2A PSU that I had to hand so I'm reasonably confident that power is not an issue. Also, the issue only occurs on subsequent boots if an HDMI display was attached on the first boot - and it doesn't sound like it should be generating keys on subsequent boots.

My testing setup has been brutally simple: have it plugged into an ethernet port. My criterion for whether it has booted is whether the interface comes up and I see traffic on the port. I've been leaving my laptop pinging the IP address. Crude, but effective and reproducible many times.

I was looking at dmesg output and noticed that the sunxi disp2 is initialised once on first boot and twice on subsequent boots. I have no idea if that's connected.

Sadly it's impossible to tell where it is stuck without a display or console attached. I've just ordered a USB/UART cable so I can do that (been regretting not buying the Pine64 adaptor in the first place). I might try seeing if I can connect it to the serial port of a Raspberry Pi tonight rather than waiting for that delivery.

I'd be intrigued to know if anyone else was able to re-produce it (or not able to re-produce it) - would give me more confidence that this is actually a thing rather than it being something silly that I've done or my particular board.

I'll write more when I discover anything new.
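
For anyone wanting to repeat the ping-based check described above without watching a terminal, it could be automated roughly like this. This is only a sketch: the function names are made up for illustration, `ping_once` assumes a Linux-style `ping` (`-c` count, `-W` timeout in seconds), and the 120 second deadline is an arbitrary assumption.

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send a single ICMP echo request with the system ping (Linux-style flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_boot(probe, deadline_s=120.0, interval_s=2.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Call probe() repeatedly until it returns True or the deadline passes.

    Returns the number of seconds until the first success, or None if the
    board never answered. clock and sleep are injectable so the loop can be
    exercised without real waiting.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= deadline_s:
            return None
        sleep(interval_s)
```

Calling `wait_for_boot(lambda: ping_once("192.168.1.50"))` right after applying power (the address is a placeholder) returns either the time to first reply or `None`, which makes it easy to log many power cycles with and without HDMI attached.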

Owner

longsleep commented Mar 14, 2017

Well, just to be clear: I have flashed my images many times and usually do not have HDMI connected at all. I guess the issue is specific to your particular setup.

sihil commented Mar 14, 2017

Yes, and that works. Unfortunately I built a machine that happened to be connected to HDMI on first boot and now I can't unplug the display to hide it in a cupboard as it won't boot :(

The simplest answer for me is to rebuild it and start over (which is now my plan for tonight), but that won't solve it for future users and violates the principle of least surprise.

Contributor

pfeerick commented Mar 15, 2017

I'll test that tonight, as I can't say with certainty I've done exactly that... connected with HDMI in the first instance, and then run the pine64 headless afterwards. I have mostly run it with HDMI connected all the time as it was a GUI image, or with no HDMI connected right from the start as I have run it with a console cable connected for the initial configuration.

btw, you should be able to post 1 message per day during the settling in period. If not, please send me a PM (same handle on the forum), as it means something has been misconfigured.

sihil commented Mar 15, 2017

@pfeerick I am able to post again. It would be really helpful if you could add another line of text to the error page that indicates that rate limiting might be the reason.

I'm really interested to hear what your results are :)

Contributor

pfeerick commented Mar 15, 2017

I wasn't able to reproduce that behaviour. Here was my test methodology so we can verify we are on the same page.

I have booted a fresh image of Ubuntu (https://www.stdin.xyz/downloads/people/longsleep/pine64-images/ubuntu/xenial-pine64-bspkernel-20161218-1.img.xz). I plugged in a wireless USB keyboard/mouse dongle, ethernet, and HDMI. Powered up the pine64, let it boot up, logged in, rebooted. I pulled the HDMI as the pine64 was shutting down. Watched the ethernet lights, the pine64 came back up again, and I was able to log in via SSH.

So it has booted up with HDMI in the first instance, and had no problems. Booting up without the HDMI also appears to be fine. I tried powering the pine64 up and down a few times, and it continued to start up flawlessly, so it wasn't a one-off brought about by rebooting it.

My power supply is a 5A capable 12v to quad-usb converter, and it is tuned to the slightly higher voltage of 5.2v. Hopefully that will start to determine what is the cause of the problem. If you have a similar setup bar the power supply, then it does start sounding like it is power related.

sihil commented Mar 15, 2017

Hmmm, curious. That does sound similar - except I have not plugged in a mouse or keyboard, just HDMI (that sounds ridiculous now I'm writing it down, but none the less).

I'll have another go tonight.

Owner

longsleep commented Mar 15, 2017

Thanks for testing this. I am very interested in getting this resolved. @sihil do you have an alternative power supply which you could try? Preferably power via the pins on the Euler connector.

Also, connecting any extra USB devices like a keyboard or mouse requires even more power, unless they are connected via a powered USB hub, which might in turn feed power to the Pine64.

Contributor

pfeerick commented Mar 15, 2017

Doesn't sound too ridiculous... you can always plug in the keyboard/mouse after the pine64 has booted and you can see stuff on the screen... or you might have the screen connected just to see boot messages ;)

Another thing to consider is kernel/uboot updates. If you had done that on the first boot, and something went wrong (it can happen, but it is likely to be power or sd card corruption related), that could be the cause, not the first boot with HDMI. In other words, don't do it (just in case that is the issue). And as longsleep said, alternate power supply to the euler pins would be great also, as that will provide more reliable power to the pine64.

sihil commented Mar 16, 2017

I experienced the same issue again. I'll see if I can borrow a workbench PSU and do as you suggest.

RyanRamchandar commented Mar 31, 2017

I am seeing behaviour similar to yours, @sihil, after flashing xenial-pine64-bspkernel-20161218-1.img. My goal is to run headless and only access the board over SSH.

After flashing the board, I did not connect any cables except power (5V 2A) and ethernet. The board sometimes would come up though other times it would not. I read your post on the forum that it had some success when connecting an HDMI display so I tried that. And to my luck it came up just fine. I then unplugged the HDMI cable and used it headless.

However, if I reboot the board or power is lost, there is a good chance it won't come back up unless I connect an HDMI monitor and power cycle it a few times.

Note about power draw [1]:

On the 1GB and 2GB Pine64+ variants a DC5V/BAT POWER switch can be used to bypass the MT3608 boost converter (input voltage to 5V). If the board is powered from DC-IN (micro-USB or Euler connector), the DC5V setting connects the input voltage to the USB power supply rails, in BAT setting 5V is generated from any of the connected power sources (e.g. battery or DC-IN). The USB ports are current-limited to about 650mA per port in either setting.

Please be aware that when using the jumper in DC5V position an insufficient supply voltage is directly visible on the USB ports. If the Pine64+ is running on battery, the USB ports are only powered when the BAT setting is used.

[1] http://linux-sunxi.org/Pine64#DC5V.2FBAT_POWER_jumper

Owner

longsleep commented Apr 1, 2017

@RyanRamchandar - so far I have seen no indication that there is a general issue with my image. I strongly suggest you get a better power supply or a lower-AWG (thicker) cable, as I still think you guys are suffering from a voltage drop which makes things go sideways on boot, and HDMI just gives the extra juice to cope with that.
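
To put rough numbers on the cable point: the sketch below estimates the voltage lost in a copper cable from its AWG number, using the standard AWG diameter formula and the resistivity of annealed copper at 20 °C. The 1 m length and 2 A load in the note afterwards are assumptions chosen for illustration, not measurements from this thread, and connector resistance is ignored.

```python
import math

# Resistivity of annealed copper at 20 degrees C, in ohm-metres.
COPPER_RESISTIVITY_OHM_M = 1.724e-8


def awg_diameter_mm(awg):
    """Conductor diameter from the AWG definition (36 AWG = 0.127 mm)."""
    return 0.127 * 92 ** ((36 - awg) / 39)


def cable_drop_volts(awg, length_m, current_a):
    """Voltage lost across a cable of the given one-way length.

    The current flows out on the supply wire and back on the ground wire,
    so the conductor length that matters is twice the cable length.
    """
    area_m2 = math.pi / 4 * (awg_diameter_mm(awg) / 1000) ** 2
    resistance_ohm = COPPER_RESISTIVITY_OHM_M * (2 * length_m) / area_m2
    return current_a * resistance_ohm
```

Under those assumptions, `cable_drop_volts(28, 1.0, 2.0)` comes out at roughly 0.85 V while `cable_drop_volts(20, 1.0, 2.0)` is roughly 0.13 V, which is why a thin micro-USB cable can leave a nominal 5 V supply well short at the board.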

TinkerBear commented Apr 11, 2017

I didn't want to think it was a power supply issue either, but when running off a bench power supply (5A, good filtering), my previously 100% repro crash went away.

Possible solution: A 10µF tantalum (low ESR) capacitor soldered between the DC IN and GND pins of the Euler connector (via a 2x3 female header). Result: It's not 100% successful, but I've had 4 successful boots out of 5 now. Maybe a bigger cap will do it.
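
Five boots is a small sample, so it may help to put an error bar on the 4-out-of-5 figure. Below is a rough sketch using the Wilson score interval, a standard textbook formula that nobody in this thread has used; the function name is made up for illustration.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a success proportion (z=1.96 is about 95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / trials + z * z / (4 * trials * trials)
    ) / denom
    return centre - half_width, centre + half_width
```

For 4 successes in 5 boots this gives an interval of roughly 38% to 96%, so many more power cycles would be needed before comparing capacitor values against each other.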

Owner

longsleep commented Apr 11, 2017

> I didn't want to think it was a power supply issue either, but when running off a bench power supply (5A, good filtering), my previously 100% repro crash went away.
>
> Possible solution: A 10µF tantalum (low ESR) capacitor soldered between the DC IN and GND pins of the Euler connector (via a 2x3 female header). Result: It's not 100% successful, but I've had 4 successful boots out of 5 now. Maybe a bigger cap will do it.

So what are you saying? It does not crash with your bench PSU? What is the reason for the capacitor? Did you try slightly increasing the voltage on the bench PSU to 5.1V or 5.2V?

TinkerBear commented Apr 11, 2017

Yes, with my bench supply (set at 5.00v as exactly as possible) no crash. With all my other power supplies it crashed. Didn't try a higher voltage on the bench supply, because it works fine.

Adding a capacitor between DC IN and GND on the Euler connector gets booting working on several of those supplies... most of the time (roughly 80%).

whongx commented Apr 11, 2017

Hi, I encounter the same issue using the headless image with kernel 3.10.105. However, in my case it is not caused by HDMI but by the ethernet. Without ethernet plugged in it cannot boot at all and shows "BUG: soft lockup - CPU#0 stuck for 22s!", but with ethernet plugged in it sometimes boots successfully. So, is it related to the power supply issue too?
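
When a serial console log is available, soft-lockup reports like the one quoted above can be pulled out automatically. The sketch below is only an illustration: the function name is invented, and the pattern is based solely on the message format seen in this thread (the bracketed task name that follows the message in full logs is treated as optional).

```python
import re

# Matches e.g. "BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:30]"
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
    r"(?:\s*\[(?P<task>[^\]]+)\])?"
)


def find_soft_lockups(log_text):
    """Return a (cpu, seconds, task) tuple for each soft-lockup report.

    task is None when the log line carries no bracketed task name.
    """
    return [
        (int(m["cpu"]), int(m["secs"]), m["task"])
        for m in SOFT_LOCKUP_RE.finditer(log_text)
    ]
```

Running this over logs from several boots gives a quick count of how often, and on which CPU and task, the lockup is reported for each kernel version.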

@longsleep

This comment has been minimized.

Show comment
Hide comment
@longsleep

longsleep Apr 11, 2017

Owner

@whongx yes - Ethernet draws quite some power and Gigabit Ethernet even more.

Owner

longsleep commented Apr 11, 2017

@whongx yes - Ethernet draws quite some power and Gigabit Ethernet even more.

whongx commented Apr 11, 2017

@longsleep OK! But it cannot boot up when the ethernet is not plugged in. And I forgot to mention that the issue does not occur with kernel 3.10.104.

Owner

longsleep commented Apr 25, 2017

@whongx what does "cannot boot up" mean? Do you have logs or at least an error message?

zador-blood-stained commented Apr 26, 2017

@longsleep
Most likely related: a similar issue can be reproduced with Armbian builds (your BSP kernel source with a slightly different configuration). The kernel randomly stalls on boot, with a stall-to-success rate that varies depending on whether Ethernet, an HDMI display, etc. are connected, but there is no clear connection between these factors.
Dmesg logs with stack traces can be found in attachments in this thread, I'm attaching one of them here:
BOOTFail_2017-04-15-C1.txt

According to my understanding it locks up somewhere here when setting up IRQ for the DE2 HDMI driver:

[   45.232803] [<ffffffc000083dc0>] el1_irq+0x80/0xe4
[   45.241520] [<ffffffc000125844>] __setup_irq+0x318/0x3e0
[   45.250792] [<ffffffc000125a84>] request_threaded_irq+0xe0/0x124
[   45.260858] [<ffffffc00041280c>] disp_sys_register_irq+0x88/0x98
[   45.270936] [<ffffffc000420610>] disp_hdmi_enable+0x1d4/0x278
[   45.280724] [<ffffffc000414540>] disp_device_attached_and_enable+0x1bc/0x1d4
[   45.291985] [<ffffffc0004146f8>] bsp_disp_device_switch+0xbc/0xe4
[   45.302194] [<ffffffc00040b50c>] start_work+0x174/0x1f0
[   45.311445] [<ffffffc0000cb788>] process_one_work+0x27c/0x42c
[   45.321274] [<ffffffc0000cc76c>] worker_thread+0x208/0x320
[   45.330810] [<ffffffc0000d27ec>] kthread+0xb4/0xbc

The part of the stack trace above this must be related to the watchdog that detects the lockup, but if it isn't, it may be related to the arch timer bug referenced in longsleep/linux-pine64#44

I am using a modified ATX power supply for tests, connected to the pin header, so underpowering should not be an issue in my setup.
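
Since load addresses shift between kernel builds, comparing traces like the one above is easier by function name than by eye. Here is a minimal sketch that extracts the call chain from frames in the `[<address>] function+0xoffset/0xsize` format shown in the trace; the function names `call_chain` and `same_call_chain` are made up for illustration.

```python
import re

# One frame looks like: [<ffffffc000125844>] __setup_irq+0x318/0x3e0
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def call_chain(trace_text):
    """Function names of every frame, innermost first, ignoring addresses."""
    return [m["func"] for m in FRAME_RE.finditer(trace_text)]


def same_call_chain(trace_a, trace_b):
    """True when two traces pass through the same functions in the same order."""
    return call_chain(trace_a) == call_chain(trace_b)
```

Two lockups that both run from `start_work` up through `disp_hdmi_enable` into `__setup_irq` would compare equal here even if the kernels were built separately and the addresses differ.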

Owner

longsleep commented May 13, 2017

I was able to reproduce a boot-up panic with a specific USB device connected. PR longsleep/linux-pine64#56 seems to fix that. If you can, please check whether that change also fixes your particular issue.

zador-blood-stained commented May 13, 2017

I'm getting these lockups with no USB devices connected (I even got one today with another good power supply when I was testing u-boot changes). While the problem could be power related, the stack traces look too strange to me.
Also, one time I got this log, pine64-lockup-debug3.txt, where it didn't happen in the initrd as usual but much later in the boot process.

Anyway I'll try to test the PR changes later.

Owner

longsleep commented May 13, 2017

Yes - I doubt that the USB change fixes lock-ups which happen later. I will also merge your backport-fsl-errata.patch now after reading up on the issue. But as you are probably already using a kernel with that patch, it does not fix every issue either. That FSL fix might resolve longsleep/linux-pine64#44 though.

zador-blood-stained commented May 13, 2017

> Yes - i doubt that the USB change does fix lock-ups which happen later.

The stack traces for the "stuck" kworker look too similar in both cases, so it looks like the same issue. And since I enabled a lot of debugging options for spinlocks and mutexes, I could see that each time the HDMI lock was still held by the disp_hdmi_enable() function.
Unfortunately it's still not clear what IRQs correspond to lines like el1_irq+0x84/0xec.

Owner

longsleep commented May 13, 2017

> I was able to reproduce a boot-up panic with a specific USB device connected. PR longsleep/linux-pine64#56 seems to fix that. If you can please try if that change also fixes your particular issue.

longsleep/linux-pine64#56 makes USB crash less often but it still crashes a lot on boot with "MOSART Semi. Rapoo 2.4G Wireless Touch Desktop" plugged in. Also the FSL fix does not help.

Owner

longsleep commented May 13, 2017

Btw, on Pinebook with exactly same Kernel - it works just fine every time.

zador-blood-stained commented May 14, 2017

@longsleep
Are you getting lockups with stack traces similar to those posted previously, with disp2 HDMI functions in them?

longsleep May 14, 2017

Owner

> @longsleep
> Are you getting lockups with stack traces similar to posted previously with disp2 HDMI functions in them?

@zador-blood-stained - Yes, very similar to pine64-lockup-debug3.txt - it has

[   39.838477] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:30]
[   39.851912] Modules linked in:
[   39.861726]
[   39.869831] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 3.10.105-- #35
[   39.883727] Workqueue: events start_work
[   39.894722] task: ffffffc078b52f80 ti: ffffffc078b54000 task.ti: ffffffc078b54000
[   39.909764] PC is at __do_softirq+0xb4/0x2d8
[   39.921341] LR is at __do_softirq+0x30/0x2d8

and

[   44.313504] [<ffffffc000083dc0>] el1_irq+0x80/0xe4
[   44.323414] [<ffffffc00012584c>] __setup_irq+0x318/0x3e0
[   44.333885] [<ffffffc000125a8c>] request_threaded_irq+0xe0/0x124
[   44.345147] [<ffffffc00040f004>] disp_sys_register_irq+0x88/0x98
[   44.356431] [<ffffffc00041cf9c>] disp_hdmi_enable+0x1d4/0x278
[   44.367423] [<ffffffc000410d38>] disp_device_attached_and_enable+0x1bc/0x1d4
[   44.379876] [<ffffffc000410ef0>] bsp_disp_device_switch+0xbc/0xe4
[   44.391253] [<ffffffc000407d04>] start_work+0x174/0x1f0
[   44.401655] [<ffffffc0000cb784>] process_one_work+0x27c/0x42c
[   44.412623] [<ffffffc0000cc768>] worker_thread+0x208/0x320
[   44.423315] [<ffffffc0000d27f0>] kthread+0xb4/0xbc
[   44.433240] kworker/1:1     S ffffffc0000853b8     0  

and

[   45.225365] [<ffffffc0000853b8>] __switch_to+0x7c/0x88
[   45.235455] [<ffffffc0007244f4>] __schedule+0x4fc/0x714
[   45.245628] [<ffffffc000724780>] schedule+0x74/0x7c
[   45.255409] [<ffffffc000722564>] schedule_timeout+0x34/0x27c
[   45.266012] [<ffffffc000723cbc>] wait_for_common+0x118/0x158
[   45.276588] [<ffffffc000723d24>] wait_for_completion+0x28/0x34
[   45.287325] [<ffffffc0000cb108>] flush_work+0xf8/0x11c
[   45.297312] [<ffffffc0000cccd4>] schedule_on_each_cpu+0xf8/0x124
[   45.308281] [<ffffffc00016c5f0>] lru_add_drain_all+0x1c/0x24
[   45.318875] [<ffffffc0001a4d54>] migrate_prep+0x14/0x20
[   45.328979] [<ffffffc000167d78>] alloc_contig_range+0xb8/0x26c
[   45.339729] [<ffffffc000493884>] dma_alloc_from_contiguous+0xa4/0x12c
[   45.351152] [<ffffffc0000928cc>] __dma_alloc_coherent+0xb0/0x118
[   45.362088] [<ffffffc000092a00>] __dma_alloc_noncoherent+0xcc/0x158
[   45.373319] [<ffffffc00019979c>] dma_pool_alloc+0xf0/0x1c4
[   45.383705] [<ffffffc0004ef388>] ehci_qh_alloc+0x4c/0xc4
[   45.393894] [<ffffffc0004f1408>] ehci_init+0x13c/0x3b8
[   45.403875] [<ffffffc0004f16a4>] sunxi_ehci_setup+0x20/0x38
[   45.414303] [<ffffffc0004de7a8>] usb_add_hcd+0x1c8/0x5a8
[   45.424417] [<ffffffc0004f5560>] sunxi_insmod_ehci+0x118/0x218
[   45.435096] [<ffffffc0004f56d8>] sunxi_usb_enable_ehci+0x78/0x88
[   45.445982] [<ffffffc00051144c>] usb_msg_center+0x88/0x104
[   45.456307] [<ffffffc00051057c>] usb_host_scan_thread+0x54/0x68
[   45.467110] [<ffffffc0000d27f0>] kthread+0xb4/0xbc

and

[   47.357995] [<ffffffc0000853b8>] __switch_to+0x7c/0x88
[   47.368085] [<ffffffc0007244f4>] __schedule+0x4fc/0x714
[   47.378228] [<ffffffc000724780>] schedule+0x74/0x7c
[   47.387959] [<ffffffc000722564>] schedule_timeout+0x34/0x27c
[   47.398562] [<ffffffc000723cbc>] wait_for_common+0x118/0x158
[   47.409169] [<ffffffc000723d24>] wait_for_completion+0x28/0x34
[   47.419962] [<ffffffc0000cb108>] flush_work+0xf8/0x11c
[   47.429992] [<ffffffc0000cccd4>] schedule_on_each_cpu+0xf8/0x124
[   47.440953] [<ffffffc00016c5f0>] lru_add_drain_all+0x1c/0x24
[   47.451515] [<ffffffc0001e5b24>] invalidate_bdev+0x30/0x4c
[   47.461872] [<ffffffc0002453b4>] ext4_put_super+0x264/0x2ec
[   47.472336] [<ffffffc0001b24d8>] generic_shutdown_super+0x68/0xd4
[   47.483396] [<ffffffc0001b27c0>] kill_block_super+0x30/0x7c
[   47.493872] [<ffffffc0001b2b44>] deactivate_locked_super+0x44/0x74
[   47.505016] [<ffffffc0001b2fb4>] deactivate_super+0x68/0x74
[   47.515443] [<ffffffc0001cdbd0>] mntput_no_expire+0x158/0x168
[   47.526039] [<ffffffc0001cef48>] SyS_umount+0x34c/0x36c

I have a rather reliable setup to reproduce this. With the new USB drivers it is less likely to trigger. I boot to initrd only (have simpleimage without rootfs). It just booted 4 times in a row without issue and then crashed twice in a row like this.

I am powering through Euler and have HDMI connected (but that does not seem to matter). When I disconnect the USB keyboard/mouse dongle it never crashes. I can also connect the dongle at any time later and it does not crash either.
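For anyone comparing their own boot logs against the traces above, here is a minimal sketch (not part of the kernel tree or any Pine64 tooling) that scans saved console output for soft-lockup reports and lists the stack-frame symbols from the display and EHCI paths. The helper names `find_soft_lockups` and `frames_with_prefix` and the `SAMPLE` text are made up for illustration; the regular expressions only assume the log format shown in this thread.

```python
import re

# Matches a symbolized frame such as
# "[   44.356431] [<ffffffc00041cf9c>] disp_hdmi_enable+0x1d4/0x278"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# Matches "BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:30]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]]+)\]"
)


def find_soft_lockups(log_text):
    """Return (cpu, seconds, task) for every soft-lockup report in the log."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3))
        for m in LOCKUP_RE.finditer(log_text)
    ]


def frames_with_prefix(log_text, prefixes):
    """Return stack-frame symbols that start with any of the given prefixes."""
    symbols = FRAME_RE.findall(log_text)
    return [s for s in symbols if s.startswith(tuple(prefixes))]


# Hypothetical excerpt assembled from lines quoted in this thread.
SAMPLE = """\
[   39.838477] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:30]
[   44.356431] [<ffffffc00041cf9c>] disp_hdmi_enable+0x1d4/0x278
[   45.383705] [<ffffffc0004ef388>] ehci_qh_alloc+0x4c/0xc4
[   45.467110] [<ffffffc0000d27f0>] kthread+0xb4/0xbc
"""

if __name__ == "__main__":
    print(find_soft_lockups(SAMPLE))
    print(frames_with_prefix(SAMPLE, ["disp_", "ehci_"]))
```

Running it over a full capture (for example a serial console log saved to a file and read with `open(path).read()`) should make it easier to see whether a given crash involves the disp2 HDMI functions, the USB host setup, or both.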

longsleep May 19, 2017

Owner

I tested this in detail yesterday. It can still crash in exactly the same way even when powered at 5.2V via Euler. It never draws more than 400mA during bootup either.

zador-blood-stained May 25, 2017

I did some more tests and compiled the kernel with debug info. Looks like it's actually stuck in a softirq, but it's relatively hard to debug since the stack trace is incomplete in this case, and I'm not sure whether the info I got after applying an extra patch is correct:

[   42.584359] Last softirq was rcu_process_callbacks+0x0/0x3f8

Icenowy Mar 5, 2018

P.S. It seems that this behavior also occurred on my SoPine w/ Baseboard, running mainline kernel w/ HDMI driver patched. Strange.
