Plugging in Sensors prevents boot. #623

Closed
astr0n8t opened this issue May 3, 2016 · 53 comments

@astr0n8t

astr0n8t commented May 3, 2016

System information:

  • I am using: LEGO MINDSTORMS EV3
  • I installed ev3dev using this image file: ev3-ev3dev-jessie-2015-12-30.img
  • My kernel version is: 3.16.7-ckt21-9-ev3dev-ev3
  • My host computer is running: Linux
  • I am connecting using: Bluetooth

About my issue:

Basically, the EV3 will not boot if I keep my sensors and motors plugged in. This is kind of annoying. I was having an issue where Bluetooth wasn't starting, and then the EV3 got stuck on the ev3dev boot screen and wouldn't go on to Brickman, so I unplugged everything and it booted. I'm not sure if this is a known bug or intended behavior, but it is really annoying to me. I am going to test this more later, but I believe the plugged-in devices were the reason it wasn't booting, and in some cases, oddly, just Bluetooth wouldn't show up. This is odd because I was using it yesterday and booted it at least three times with everything plugged in, and it worked fine. I am definitely going to test this further, because I was also having issues with my computer's Bluetooth setup, which I have now resolved.

@dlech
Member

dlech commented May 3, 2016

Which sensors and motors (model numbers) do you have plugged into which port?

Bluetooth not starting is a known problem (#572) and happens more often when more sensors are plugged in. I haven't ever had it not finish booting though.

@astr0n8t
Author

astr0n8t commented May 3, 2016

I don't know the motor model numbers, but they are all LEGO ones. I have a
touch sensor plugged into port 1, gyro into 2, color into 3, and
ultrasonic into 4. For motors, I have two large ones, one plugged into C and
one plugged into B. I am performing an upgrade on the EV3 to make sure a fix
hasn't already been released, but I will also check the thread you mentioned.

@astr0n8t
Author

astr0n8t commented May 4, 2016

So, I have tried it at least 10 times. I successfully booted it 4 times in a row with the sensors plugged in, but then twice it got stuck during boot: the two LEDs on each side of the center button blinked every second, but it stayed on the ev3dev boot screen. The first time it failed, it seemed that it had not connected over Bluetooth to my computer (or something similar) on the previous boot. It was assigned an IP address, showing that it connected at some point, but when I looked at it, Bluetooth did not have the black background on it. I would conclude it has to do with Bluetooth connecting automatically and the sensors causing Bluetooth to have issues, although I have not really tested much without the sensors plugged in.

@rhempel
Member

rhempel commented May 4, 2016

I think I have been seeing this lately as well with two of my EV3s - running the ckt-9 kernel right now. I'll try to see if the boot is more reliable with NO devices plugged in anywhere.

@ghost

ghost commented May 6, 2016

Have you tried using an older version?

@astr0n8t
Author

astr0n8t commented May 6, 2016

I downgraded the kernel because of a motor issue, and it hasn't not booted,
although I have had the Bluetooth not starting issue.


@dlech
Member

dlech commented May 7, 2016

I have a suspicion that it has to do with this commit: ev3dev/lego-linux-drivers@fa0d807

In your original post, you say that the problem occurs with kernel version 3.16.7-ckt21-9-ev3dev-ev3; however, in your last comment, you said "I downgraded the kernel because of a motor issue, and it hasn't not booted".

Can you confirm which kernel version has the problem and which does not?

Also, can you confirm that the issue only happens when EV3/UART sensors are plugged in, but not when any other type of sensor is plugged in?

If you have anything to say about Bluetooth, leave a comment in #572 and not here so that we don't confuse the two issues.

@dlech dlech added the bug label May 7, 2016
@astr0n8t
Author

astr0n8t commented May 7, 2016

I'm a bit confused now that I think about it. I upgraded my kernel after
that and motors no longer worked, so I downgraded, but I'm not sure what I
downgraded to. I'm on mobile so I can't check it right now. I can't test
any other sensors either because I only have LEGO ones.


@astr0n8t
Author

astr0n8t commented May 7, 2016

Okay, the kernel version is:

3.16.7-ckt21-9-ev3dev-ev3

So I must have upgraded to 3.16.7-ckt21-10-ev3dev-ev3, and then when I discovered that motors no longer worked from Python, I went back down to 3.16.7-ckt21-9-ev3dev-ev3. For some reason it has always booted since I did that, although I haven't booted it many times since then. Bluetooth seems to be semi-reliable, and I usually leave the EV3 on the whole time when I'm working on a program, so a quick reboot when it's not working is fine until it starts working. I will report back if the EV3 does not boot, but so far it seems to be good.

dlech added a commit to ev3dev/lego-linux-drivers that referenced this issue May 8, 2016
This reverts commit fa0d807.

It is possibly causing deadlocks when multiple EV3/UART sensors are
plugged in during boot.

Issue ev3dev/ev3dev#623
dlech added a commit to ev3dev/lego-linux-drivers that referenced this issue May 8, 2016
This reverts commit 0a6dc8e.

This is possibly causing deadlocks on boot.

Issue ev3dev/ev3dev#623
@dlech
Member

dlech commented May 8, 2016

I have just released kernel v4.4.9-11-ev3dev-ev3. Please give it a try.

I was able to reproduce this problem when testing this kernel. Unfortunately, I was not able to get to the root cause. There seem to be some data corruption issues related to the UARTs (reminiscent of #47). However, I have done a number of things that seem to help.

  1. I reverted a couple of workqueue commits from kernel cycle 10. I was seeing some deadlocks related to workqueues. Not sure if it was these exactly, but I reverted them just to be safe.
  2. The Bluetooth drivers are now compiled into the kernel instead of being loaded as a module. This causes them to load much earlier in the boot sequence and it seems to improve things a bit.
  3. I fixed up the SoC tty drivers a bit. We were using patches from the official LEGO firmware. However, I'm not sure what they did was quite right. They were overriding the UART type as AR7, but 16550A looks to be more appropriate. Furthermore, 16550A does not have auto-flow control (UART_CAP_AFE), which is required for communicating with the Bluetooth chip, so I added a new UART type for this. ev3dev/ev3-kernel@9255afe (Also picked up a proposed mainline patch to replace part of LEGO's changes: ev3dev/ev3-kernel@9e754d8)

I still had a crash on shutdown once related to EV3/UART sensors (the line discipline driver) but haven't been able to repeat it. So, I am afraid that there are still some bugs lurking about, but I think this issue is "fixed" well enough, I hope.

@dlech
Member

dlech commented May 10, 2016

Dang it. I just got a deadlock during boot of 4.4.9-11-ev3dev-ev3.

@kortschak
Member

I have noticed similar problems, I think more often (only?) when devices are attached to the EV3. But I've just done a couple dozen boot cycles with the serial console attached (and many devices) and I have not had a hang (this is after three hangs in the preceding hour - NXT temp sensor, 2 large motors, the LEGO IR and a medium motor).

@dlech
Member

dlech commented May 12, 2016

I have noticed similar problems...

which kernel are you using?

@kortschak
Member

3.16.7-ckt26-10-ev3dev-ev3 and 4.4.9-11-ev3dev-ev3 both showed this behaviour.

@bmegli
Member

bmegli commented Jun 9, 2016

Setup:
-Linux ev3dev 4.4.9-11-ev3dev-ev3
-Ultrasonic and Gyro sensors in port 1 and 2

It occasionally stops when booting.

Interesting lines from dmesg (when it succeeds!).
In my experiments, these lines are not present without sensors plugged in.

[    2.402921] serial8250: too much work for irq53
[    2.598897] serial8250: too much work for irq53
[    2.794045] serial8250: too much work for irq53
[    2.989177] serial8250: too much work for irq53
[    3.185087] serial8250: too much work for irq53
[    3.380985] serial8250: too much work for irq53
[    3.576130] serial8250: too much work for irq53
[    3.771258] serial8250: too much work for irq53
[    3.967129] serial8250: too much work for irq53
[    4.162236] serial8250: too much work for irq53
[    4.910667] ttyS1: 1 input overrun(s)
[    6.754650] ttyS1: 3 input overrun(s)
[    7.512047] serial8250_interrupt: 11 callbacks suppressed
[    7.516772] serial8250: too much work for irq53
[    7.713071] serial8250: too much work for irq53
[    7.910077] serial8250: too much work for irq53
[    8.105974] serial8250: too much work for irq53
[    8.301875] serial8250: too much work for irq53
[    8.497750] serial8250: too much work for irq53
[    8.693666] serial8250: too much work for irq53
[    8.888793] serial8250: too much work for irq53
[    9.084712] serial8250: too much work for irq53
[    9.280622] serial8250: too much work for irq53
[    9.482399] ttyS1: 1 input overrun(s)

#[...]

[   13.102983] serial8250_interrupt: 3 callbacks suppressed
[   13.107066] serial8250: too much work for irq53
[   13.303575] serial8250: too much work for irq53
[   13.499547] serial8250: too much work for irq53
[   13.695461] serial8250: too much work for irq53
[   13.890594] serial8250: too much work for irq53
[   14.086511] serial8250: too much work for irq53
[   14.281647] serial8250: too much work for irq53
[   14.477522] serial8250: too much work for irq53
[   14.672625] serial8250: too much work for irq53
[   14.867742] serial8250: too much work for irq53
[   18.457560] serial8250_interrupt: 16 callbacks suppressed
[   18.461727] serial8250: too much work for irq53
[   18.657644] serial8250: too much work for irq53
[   18.853545] serial8250: too much work for irq53
[   19.049429] serial8250: too much work for irq53
[   19.245357] serial8250: too much work for irq53
[   19.442005] serial8250: too much work for irq53
[   19.637983] serial8250: too much work for irq53
[   19.833906] serial8250: too much work for irq53
[   20.029903] serial8250: too much work for irq53
[   20.225062] serial8250: too much work for irq53
#[...]
[   23.771228] serial8250_interrupt: 16 callbacks suppressed
[   23.775403] serial8250: too much work for irq53
[   23.971342] serial8250: too much work for irq53
[   24.167508] serial8250: too much work for irq53
[   24.363088] serial8250: too much work for irq53
[   24.559083] serial8250: too much work for irq53
[   24.755000] serial8250: too much work for irq53
[   24.950883] serial8250: too much work for irq53
[   25.146026] serial8250: too much work for irq53
[   25.341186] serial8250: too much work for irq53
[   25.536342] serial8250: too much work for irq53
[   25.926385] systemd[1]: Started Journal Service.
[   26.325724] EXT4-fs (mmcblk0p2): re-mounted. Opts: errors=remount-ro
[   29.514846] systemd-udevd[107]: starting version 215
[   32.691103] random: nonblocking pool is initialized
[   32.789137] lego-port port0: Registered 'in1' on 'legoev3-ports'.
[   32.789967] lego-port port1: Registered 'in2' on 'legoev3-ports'.
[   32.790866] lego-port port2: Registered 'in3' on 'legoev3-ports'.
[   32.791697] lego-port port3: Registered 'in4' on 'legoev3-ports'.
[   32.792636] lego-port port4: Registered 'outA' on 'legoev3-ports'.
[   32.793581] lego-port port5: Registered 'outB' on 'legoev3-ports'.
[   32.950294] lego-port port6: Registered 'outC' on 'legoev3-ports'.
[   32.951379] lego-port port7: Registered 'outD' on 'legoev3-ports'.
[   33.340688] lego-port port0: Added new device 'in1:ev3-uart-host'
[   33.516522] ti_omapl_pru_suart ti_omapl_pru_suart.1: fw size 3772. downloading...
[   33.520692] ti_omapl_pru_suart.1: ttySU0 at MMIO 0x1d00000 (irq = 3, base_baud = 8250000) is a suart_tty
[   33.582530] ti_omapl_pru_suart.1: ttySU1 at MMIO 0x1d00000 (irq = 4, base_baud = 8250000) is a suart_tty
[   33.626532] ti_omapl_pru_suart ti_omapl_pru_suart.1: ti_omapl_pru_suart device registered(pru_clk=150000000, asp_clk=132000000)

#[...]

[   47.673803] Registered EV3 UART sensor line discipline. (29)
[   49.534774] lego-sensor sensor0: Registered 'lego-ev3-gyro' on 'in1'.
[   49.904757] lego-sensor sensor1: Registered 'lego-ev3-us' on 'in2'.

The serial8250 driver is responsible for the UARTs on input ports 1 and 2.

The interesting lines are those like this:

[    6.754650] ttyS1: 3 input overrun(s)

I don't know how overruns are handled in the code but if it happened after

[   47.673803] Registered EV3 UART sensor line discipline. (29)

and was unhandled, it could have led to a crash before the recent fix by @dlech:

ev3dev/lego-linux-drivers@2fefd7d

It would be interesting to see if it still crashes with most recent changes (kernel build from github master)
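
To compare boots with and without sensors, the messages above can be tallied from saved dmesg output. This is only a minimal sketch: the helper name `summarize_uart_errors` is hypothetical, and the regular expressions are written to match the exact message formats shown in the excerpt above, nothing more.

```python
import re
from collections import Counter

# Patterns for the three kinds of message seen in the dmesg excerpt above.
IRQ_RE = re.compile(r"serial8250: too much work for irq(\d+)")
OVERRUN_RE = re.compile(r"(ttyS\d+): (\d+) input overrun\(s\)")
SUPPRESSED_RE = re.compile(r"serial8250_interrupt: (\d+) callbacks suppressed")


def summarize_uart_errors(dmesg_text):
    """Tally UART-related complaints in dmesg text (hypothetical helper)."""
    summary = Counter()
    for line in dmesg_text.splitlines():
        m = IRQ_RE.search(line)
        if m:
            summary["irq%s too much work" % m.group(1)] += 1
            continue
        m = OVERRUN_RE.search(line)
        if m:
            # The message carries its own count, so add that rather than 1.
            summary["%s overruns" % m.group(1)] += int(m.group(2))
            continue
        m = SUPPRESSED_RE.search(line)
        if m:
            summary["suppressed callbacks"] += int(m.group(1))
    return summary


# A few lines copied from the log above.
sample = """\
[    4.162236] serial8250: too much work for irq53
[    4.910667] ttyS1: 1 input overrun(s)
[    6.754650] ttyS1: 3 input overrun(s)
[    7.512047] serial8250_interrupt: 11 callbacks suppressed
[    7.516772] serial8250: too much work for irq53
"""
print(dict(summarize_uart_errors(sample)))
```

Running the same tally on a boot with the sensors moved to ports 3 and 4 would show whether the serial8250 messages disappear entirely.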

@bmegli
Member

bmegli commented Jun 9, 2016

I tried it, it didn't help.

Setup:
-kernel from https://github.com/ev3dev/ev3-kernel (Latest commit 3aaf365 on 8 May)
-Ultrasonic and Gyro sensors in port 1 and 2

Still hanging occasionally on boot.

@bmegli
Member

bmegli commented Jun 9, 2016

This is interesting.

I am unable to reproduce this problem when the sensors are connected to port 3 and 4 (PRU SUARTS, software uarts, not hardware SoCs).

10/10 boots without a problem so far with Ultrasonic and gyro on port 3 and 4.

@bmegli
Member

bmegli commented Jun 9, 2016

It would be interesting to blacklist some modules, e.g. ev3_uart_sensor_ld, and check if we still get boot problems on ports 1 and 2.

Unfortunately I have no more time today.

@dlech
Member

dlech commented Jun 9, 2016

@bmegli, Thanks for the dmesg output. I have missed this because I usually have port 1 setup as serial debug. I think what is happening is this:

  • Input port 1 has tty line discipline attached in early boot for printing kernel messages
  • The sensor is sending data to input port 1 at a high rate - this is like someone typing random keys very, very fast on the serial tty (the gyro sensor produces data at the highest rate of all the EV3 sensors)
  • Since the kernel is very busy during boot, it easily gets overruns (hardware buffer is only 16 bytes)
  • Once we get to the point that systemd is starting, the kernel is much less busy and we stop getting errors.
  • Somewhere between Registered EV3 UART sensor line discipline. (29) and lego-sensor sensor0: Registered 'lego-ev3-gyro' on 'in1', the tty line discipline is replaced with the EV3 UART sensor line discipline.

I don't think blacklisting any modules will make any difference since the errors are happening in early boot before modules are loaded. Long ago, I tried disabling the tty on input port 1, but it had a weird interaction with FIQ - see #47.

tty on input port 1 (ttyS1) needs to be disabled using the kernel command line. See /etc/default/flash-kernel and /usr/share/flash-kernel/bootscript/bootscr.ev3.
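
To get a feel for why a 16-byte hardware buffer overruns so easily, here is a rough back-of-the-envelope sketch of how long continuous input takes to fill it. The baud rates are illustrative assumptions, not values measured on an EV3 or taken from this thread, and the 10 bits per byte assumes 8N1 framing.

```python
def fifo_fill_time_ms(baud, fifo_bytes=16, bits_per_byte=10):
    """Milliseconds of continuous input needed to fill the receive FIFO.

    bits_per_byte=10 assumes 8N1 framing (1 start + 8 data + 1 stop bit).
    """
    bytes_per_second = baud / bits_per_byte
    return 1000.0 * fifo_bytes / bytes_per_second


# Baud rates below are illustrative assumptions, not values measured on an EV3.
for baud in (2400, 57600, 115200, 460800):
    print("%7d baud: FIFO full after %.2f ms" % (baud, fifo_fill_time_ms(baud)))
```

If the interrupt handler is delayed for longer than that window while the kernel is busy booting, incoming bytes are dropped and reported as overruns.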

@dlech
Member

dlech commented Jun 9, 2016

If anyone wants to try this before I get around to it, replace

console=${console}

with

console=tty0

in /usr/share/flash-kernel/bootscript/bootscr.ev3.

Be aware that this file will be written over when the flash-kernel package is updated, so it is just for testing, not a permanent solution.
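
For anyone who prefers to script the test edit, the substitution can be sketched as below. This only operates on a string; the `set_console` helper and the example bootargs line are hypothetical, since the real contents of bootscr.ev3 are not shown in this thread. Back up the file before writing anything back to it.

```python
def set_console(bootscript_text, new_console="tty0"):
    """Replace the console=${console} argument with a fixed console.

    Hypothetical helper: raises instead of silently doing nothing if the
    expected placeholder is missing.
    """
    old = "console=${console}"
    if old not in bootscript_text:
        raise ValueError("console=${console} not found; script layout differs")
    return bootscript_text.replace(old, "console=" + new_console)


# Illustrative line only; not copied from the real bootscr.ev3.
example = "setenv bootargs root=/dev/mmcblk0p2 console=${console} rw"
print(set_console(example))
```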

@bmegli
Member

bmegli commented Jun 9, 2016

@dlech

I suppose your explanation for messages is correct!

I am not sure however if this is what stops boot process.

I suppose that it's worth trying to disable tty on port 1 and check. Maybe tomorrow (EV3 stayed at work...)

Alternatively we could:
-confirm that with sensors on ports 3 and 4 there are no problems (10/10 ok boots so far)
-check that port 2 is also safe
-make sure that plugging only to port 1 would hang on booting occasionally

@bmegli
Member

bmegli commented Jun 9, 2016

I am unable to reproduce this problem when the sensors are connected to port 3 and 4 (PRU SUARTS, software uarts, not hardware SoCs).

I mean booting problem! ;-)

I don't think blacklisting any modules will make any difference since the errors are happening in early boot before modules are loaded

Overruns yes - before. Do we know when booting hangs? (I don't have a serial cable so I can't see) I meant blacklisting modules to pinpoint boot problem.

@bmegli
Member

bmegli commented Jun 9, 2016

@kortschak said

I have noticed similar problems, I think more often (only?) when devices are attached to the EV3. But I've just done a couple dozen boot cycles with the serial console attached (and many devices) and I have not had a hang (this is after three hangs in the preceding hour - NXT temp sensor, 2 large motors, the LEGO IR and a medium motor).

@kortschak do you remember if those earlier hangs were with some sensor plugged to port 1 or also the serial cable?

@dlech
Member

dlech commented Jun 9, 2016

Do we know when booting hangs? (I don't have a serial cable so I can't see) I meant blacklisting modules to pinpoint boot problem.

This is why this problem is so hard to fix. If we attach a serial cable, it doesn't hang - at least since we fixed #662. Maybe we can change it to console=ttyS0 and attach a serial cable to input port 2 instead, with the LEGO EV3 Gyro sensor on input port 1.

@kortschak
Member

do you remember if those earlier hangs were with some sensor plugged to port 1 or also the serial cable?

Never when serial cable is connected. This is the classic Heisenbug.

I'll have a chance to look at some of this this weekend. Sorry for the silence.

@bmegli
Member

bmegli commented Jun 11, 2016

@dlech Have you tried debugging both to 1 and 2 at the same time?

console=ttyS1,some_baudrate console=ttyS0,some_baudrate

Is that possible?

I believe we can still debug to both ttyS1 and lcd (tty0?)

console=ttyS1,baudrate console=tty0

Maybe we could debug to both ttyS1 and ttyUSB0.

I don't fully understand what the limitations are here (from kernel.org docs).
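
To check which consoles a given command line actually requests, the console= arguments can be pulled out with a few lines. This is a sketch, not an ev3dev tool: `console_args` is a hypothetical helper and the example command line (including the 115200n8 setting) is an assumption. As far as I understand the kernel's serial console documentation, output goes to every listed console, only one console per device type (serial, video) is honored, and the last one listed is what /dev/console opens.

```python
def console_args(cmdline):
    """Return the console= values from a kernel command line, in order."""
    return [
        arg.split("=", 1)[1]
        for arg in cmdline.split()
        if arg.startswith("console=")
    ]


# Assumed example; on a running system the real text is in /proc/cmdline.
cmdline = "root=/dev/mmcblk0p2 console=ttyS1,115200n8 console=tty0 rw"
consoles = console_args(cmdline)
print("consoles requested:", consoles)
if consoles:
    print("last listed (used for /dev/console):", consoles[-1])
```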

@dlech
Member

dlech commented Jun 11, 2016

Is that possible?

I think so, but in order to reproduce the problem, I have to have sensors on both port 1 and port 2, so no serial debugging.

I believe we can still debug to both ttyS1 and lcd (tty0?)

Yes, however, Heisenbug strikes again. I have just tried this, replacing console=${console} with console=${console} console=tty0, and it fixes the serial8250: too much work for irq53/serial8250_interrupt: 10 callbacks suppressed errors.

@bmegli
Member

bmegli commented Jun 11, 2016

Hey, this bug is funny! ;-)

On my EV3, having the console set to port 1 and a gyro plugged into port 1 is enough to make it hang.

I am searching the net, and original firmware EV3 also sometimes hangs at boot.

Edit: I am not sure I have tried with original firmware and UART sensor plugged to port 1 many times

@dlech
Member

dlech commented Jun 11, 2016

I am searching the net, and original firmware EV3 also sometimes hangs at boot.

you should paste the links here

@bmegli
Member

bmegli commented Jun 11, 2016

http://forums.usfirst.org/showthread.php?20835-EV3-lock-ups-and-rebooting

The interesting fragment:

And... the kids are ready to take a hammer to the EV3 after it failed to boot up 5 minutes before robot design judging at the QT today. It required a battery pull to get the thing started, which required some robot disassembly -- three-finger trick didn't seem to work as it hung during boot-up. It hadn't locked up in a month (see date of OP) until this morning. They got it back up and running again in the nick of time, and it worked out okay in the end.

I'll be calling Lego Ed tech support on Monday. It probably won't help immediately, but maybe they'll have an idea, and we'll see if they've heard any more about the issue.

So it probably happens with multiple sensors/motors plugged in when booting.

Edit: Another one:

It took three more weeks, but the same thing happened again -- this time at our meeting yesterday -- and I was able to snatch the robot and try the three-button reset myself (while carefully re-reading the instructions) before anyone ripped the robot apart.

The result: it doesn't work. The only way we could get it to boot up was to do a battery pull. So I'm finally getting around to opening a ticket with Lego Education support -- we'll see if they can help.

@dlech
Member

dlech commented Jun 11, 2016

I am thinking about just changing the kernel command line for everyone. It does have the effect of showing one line of text under the boot logo. It also shows messages when shutting down, which is actually kind of good if you are wondering why shutdown is taking so long.

@bmegli
Member

bmegli commented Jun 11, 2016

I am thinking about just changing the kernel command line for everyone. It does have the effect of showing one line of text under the boot logo. It also shows messages when shutting down, which is actually kind of good if you are wondering why shutdown is taking so long.

It could be good for troubleshooting boot problems for newcomers. Those that are unable to boot ev3dev at all. We can't even reproduce it.

@dlech
Member

dlech commented Jun 11, 2016

Yes, I will have to change the boot logo and font size though. Right now, it only has enough room to show [ OK ] Starting A...

dlech added a commit to dlech/ev3dev-kernel that referenced this issue Jun 12, 2016
Since we are going to enable kernel messages on tty0, we need a smaller
font so that we can actually fit the messages on the screen. Also, we need
a smaller logo so there is room for the messages on the screen.

Issue ev3dev/ev3dev#623
dlech added a commit to dlech/ev3dev-kernel that referenced this issue Jun 14, 2016
Since we are going to enable kernel messages on tty0, we need a smaller
font so that we can actually fit the messages on the screen. Also, we need
a smaller logo so there is room for the messages on the screen.

Issue ev3dev/ev3dev#623
dlech added a commit to ev3dev/flash-kernel that referenced this issue Jun 14, 2016
Unfortunaly, this is done mainly to work around issues with hardware
UARTS on EV3 that we can't figure out.

The side effect is that console messages will print on the screen during
boot, which can be useful for troubleshooting.

Issue: ev3dev/ev3dev#623
dlech added a commit to ev3dev/ev3-kernel that referenced this issue Jun 14, 2016
Since we are going to enable kernel messages on tty0, we need a smaller
font so that we can actually fit the messages on the screen. Also, we need
a smaller logo so there is room for the messages on the screen.

Issue ev3dev/ev3dev#623
@bmegli
Member

bmegli commented Jun 15, 2016

I think I may be getting to the root cause or at least we are no longer "flying blind".

Setup:
kernel: Linux ev3dev 4.4.9-11-ev3dev-ev3
sensors: gyro in input port 1

When booting freezes, it's possible to:

  • unplug the sensor
  • plug serial cable
  • connect with serial console
  • hit a few keys
  • the boot continues as it should

Here's dmesg output from such scenario (expires in 30 days):
http://pastebin.com/nukpE5Jk

Nothing obvious seems to be there. A lot of too much work for IRQ53.

It's possible that we pause the boot process with some random sequence from the UART sensor and resume with the serial cable. Now, the interesting part is that I am unable to reproduce it (pausing boot with break, scroll lock, ctrl+s doesn't work for me).

Other dmesg outputs were so flooded with too much work for IRQ53 that they were beginning with it (out of dmesg buffer space probably).

There may also be another possibility: the UART sensor accidentally sends the l key in the early boot phase. In that case U-Boot autoboot will be canceled, but I don't think we are getting this problem. To recover from that, it should be enough to type boot and hit enter.

I once had a scenario where I ended in Welcome to emergency mode! but I guess this was because of what I was pressing rather than the original problem.

Edit:

I once had a scenario where I ended in Welcome to emergency mode! but I guess this was because of what I was pressing rather than the original problem.

Hmm, if I could do it from the serial console then in theory also UART sensor could do it (if sending such data)

@dlech
Member

dlech commented Jun 15, 2016

This is interesting because with the 12-ev3dev kernel, I have also had some unexplained lockups during shutdown. No error message or anything.

Please try upgrading flash-kernel as explained here. This will set the console to both ttyS1 and tty0. I'm wondering if we should just disable console on ttyS1 altogether.

@bmegli
Member

bmegli commented Jun 15, 2016

Ok.

In what scenario did you have the lockups? UART sensor plugged to ttyS1 or something different?

For now, I have not been setting the console to both, in order to pinpoint the boot problem (for which you have found a workaround... but it's still a good riddle)

@dlech
Member

dlech commented Jun 15, 2016

Even with console on both ttyS1 and tty0 and LEGO EV3 Gyro sensor on ttyS1 I get occasional lockup on shutdown but not on startup.

dlech added a commit to ev3dev/flash-kernel that referenced this issue Jun 17, 2016
This is still causing problems with UART sensors on input port 1, so
disable it completely by default

Issue ev3dev/ev3dev#623
@bmegli
Member

bmegli commented Jun 17, 2016

Even with console on both ttyS1 and tty0 and LEGO EV3 Gyro sensor on ttyS1 I get occasional lockup on shutdown but not on startup.

I haven't been able to reproduce the lockup on shutdown, but I can shed some light on bootup. Maybe shutdown is in some way similar.

Setup:
kernel: v4.4.13-12-ev3dev-ev3
sensors: gyro in input port 1 eventually replaced by serial debug
console: only ttyS1 (without tty0)

Up to some point it's possible to recover by unplugging the gyro, plugging in the serial cable and typing something.
The boots that are fatal (when I waited too long) look something like this:

[  OK  ] Mounted FUSE Control File System.
[  OK  ] Mounted Configuration File System.
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Create Static Device Nodes in /dev.
[ TIME ] Timed out waiting for device dev-mmcblk0p1.device.
[DEPEND] Dependency failed for /boot/flash.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for File System Check on /dev/mmcblk0p1.
[  OK  ] Stopped LEGO MINDSTORMS EV3 LEDs.
[  OK  ] Stopped target Graphical Interface.
[  OK  ] Stopped Brick Manager.
[  OK  ] Stopped target Multi-User System.
[  OK  ] Stopped Enable support for additional executable binary formats.
[  OK  ] Stopped OpenBSD Secure Shell server.
[  OK  ] Stopped Login Service.
[  OK  ] Stopped LSB: start Samba NetBIOS nameserver (nmbd).
[  OK  ] Stopped LSB: Autogenerate and use a swap file.
[  OK  ] Stopped LSB: Start NTP daemon.
[  OK  ] Stopped Avahi mDNS/DNS-SD Stack.
[  OK  ] Closed Avahi mDNS/DNS-SD Stack Activation Socket.
[  OK  ] Stopped D-Bus System Message Bus.
[  OK  ] Closed D-Bus System Message Bus Socket.
         Starting Connection service...
[  OK  ] Stopped Permit User Sessions.
         Starting Set console font and keymap...
[  OK  ] Reached target Remote File Systems.
         Starting Trigger Flushing of Journal to Persistent Storage...
[  OK  ] Stopped /etc/rc.local Compatibility.
[  OK  ] Stopped Turn on global VT cursor.
[  OK  ] Stopped getty on tty2-tty6 if dbus and logind are not available.
[  OK  ] Reached target Login Prompts.
[  OK  ] Stopped USB Gadget for LEGO MINDSTORMS EV3 hardware.
[  OK  ] Stopped target Basic System.
[  OK  ] Reached target Timers.
[  OK  ] Reached target Sockets.
[  OK  ] Stopped target System Initialization.
         Starting Restore Sound Card State...
         Starting Create Volatile Files and Directories...
         Starting Emergency Shell...
[  OK  ] Started Emergency Shell.
[  OK  ] Reached target Emergency Mode.
         Starting udev Kernel Device Manager...
[...]

The timeout waiting for dev-mmcblk0p1.device seems to be fatal here. The reason for the timeout seems to be too much work for IRQ...

The award winning boot that I was still able to recover from (by unplugging gyro and plugging serial) had this fragment:

[  233.322081] serial8250: too much work for irq53
[  233.322667] ev3_uart_receive_buf: 3830 callbacks suppressed
[  233.322715] ttyS1: buffer overrun

So it seems that if too much work for irq53 happens in the wrong place it doesn't allow booting to proceed (for some reason) and eventually dev-mmcblk0p1.device times out.
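
When capturing many boots over serial, the fatal ones can be spotted by filtering the systemd status lines for anything other than OK. A minimal sketch, assuming the console output is saved as plain text in the format shown above; `failed_units` is a hypothetical helper name.

```python
import re

# Matches systemd status prefixes such as "[  OK  ]", "[ TIME ]", "[DEPEND]".
STATUS_RE = re.compile(r"^\[\s*(OK|TIME|DEPEND|FAILED)\s*\]\s+(.*)$")


def failed_units(console_log):
    """Return (status, message) pairs for every non-OK status line."""
    results = []
    for line in console_log.splitlines():
        m = STATUS_RE.match(line.strip())
        if m and m.group(1) != "OK":
            results.append((m.group(1), m.group(2)))
    return results


# Lines copied from the boot log above.
log = """\
[  OK  ] Started Create Static Device Nodes in /dev.
[ TIME ] Timed out waiting for device dev-mmcblk0p1.device.
[DEPEND] Dependency failed for /boot/flash.
[  OK  ] Stopped Brick Manager.
"""
for status, message in failed_units(log):
    print(status, "-", message)
```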

@dlech
Member

dlech commented Jun 17, 2016

Today, I have updated the flash-kernel package so that we only use console=tty1 by default, nothing on serial, so we can see if that makes a difference.

@dlech
Member

dlech commented Jul 11, 2016

FWIW, in the official LEGO firmware, they disable the interrupt on this port, I imagine to work around the same problems we are seeing here. https://github.com/mindboards/ev3sources/blob/master/lms2012/lms2012/source/lms2012.h#L155

dlech added a commit to ev3dev/ev3-kernel that referenced this issue Jul 24, 2016
Since we are going to enable kernel messages on tty0, we need a smaller
font so that we can actually fit the messages on the screen. Also, we need
a smaller logo so there is room for the messages on the screen.

Issue ev3dev/ev3dev#623
@dlech
Member

dlech commented Jul 24, 2016

I think we have this mostly sorted out. Please try with the latest nightly image build.

Download

dlech added a commit to ev3dev/ev3-kernel that referenced this issue Aug 15, 2016
Since we are going to enable kernel messages on tty0, we need a smaller
font so that we can actually fit the messages on the screen. Also, we need
a smaller logo so there is room for the messages on the screen.

Issue ev3dev/ev3dev#623
dlech added a commit to ev3dev/ev3-kernel that referenced this issue Sep 4, 2016
Since we are going to enable kernel messages on tty0, we need a smaller
font so that we can actually fit the messages on the screen. Also, we need
a smaller logo so there is room for the messages on the screen.

Issue ev3dev/ev3dev#623
@dlech
Member

dlech commented Oct 11, 2016

No one has said this is still a problem, so closing.

If you notice a similar problem, please open a new issue.
