Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kolibri service is timing out on boot on Ubuntu 18.04 (and possibly other OS's too?) #1489

Closed
holta opened this issue Feb 11, 2019 · 23 comments
Labels
Milestone

Comments

@holta
Copy link
Member

holta commented Feb 11, 2019

@arky have you seen this issue with Kolibri 0.11.1's service failing to start on reboots?

root@box:~# systemctl status kolibri
● kolibri.service - Kolibri
Loaded: loaded (/etc/systemd/system/kolibri.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Sun 2019-02-10 15:29:01 EST; 15h ago

Feb 10 15:27:29 box.lan systemd[1]: Starting Kolibri...
Feb 10 15:28:08 box.lan kolibri[311]: INFO Option HTTP_PORT in section [Deployment] being overridden by environment variable KOLIBRI_HTTP_PORT
Feb 10 15:28:08 box.lan kolibri[311]: INFO Option URL_PATH_PREFIX in section [Deployment] being overridden by environment variable KOLIBRI_URL_PATH_PREFIX
Feb 10 15:28:43 box.lan kolibri[311]: INFO Running Kolibri with the following settings: kolibri.deployment.default.settings.base
Feb 10 15:28:59 box.lan systemd[1]: kolibri.service: Start operation timed out. Terminating.
Feb 10 15:29:01 box.lan systemd[1]: kolibri.service: Failed with result 'timeout'.
Feb 10 15:29:01 box.lan systemd[1]: Failed to start Kolibri.

This seems to be happening on all Ubuntu 18.04 machines/VMs, and possibly others too?

(The kolibri service can be started manually. But this isn't good enough for day-to-day usage :-)

PS this did not used to happen in prior months!

@holta holta added the bug label Feb 11, 2019
@holta holta added this to the 6.7 milestone Feb 11, 2019
@holta holta changed the title Kolibri service is timing out on boot on Ubuntu 18.04 (and possibly others too?) Kolibri service is timing out on boot on Ubuntu 18.04 (and possibly other OS's too?) Feb 11, 2019
@holta
Copy link
Member Author

holta commented Feb 11, 2019

Confusing: this does not happen every time I/we reboot a BIG-sized IIAB on Ubuntu 18.04 VMs.

It does not seem to happen on BIG-sized Debian 9.7.

Worst case if we cannot track down the root cause, we'll need to document this in IIAB 6.7's Known Issues.

@holta
Copy link
Member Author

holta commented Feb 11, 2019

PATTERN: "systemctl status kolibri" consistently shows failed (Result: timeout) on Ubuntu Server 18.04.1 VM's after the 1st reboot, i.e. after IIAB installation has completed.

Whereas on the 2nd, 3rd, 4th and 5th reboots (etc) it seems to always work! Also interesting is that the 'kolibri' service shows "active (running)" very rapidly, within about 1 minute of booting.

This pattern is consistent with many results over the last week (if not longer) across many Ubuntu 18.04 VM's -- failure of the 'kolibri' service during the 1st reboot alone.

I don't know if other OS's/HW are affected.

Ideas ?

@arky
Copy link
Contributor

arky commented Feb 11, 2019

@holta Can you please attach any logs your find under /library/kolibri/*log and also syslog. That will help me find a solution.

@holta
Copy link
Member Author

holta commented Feb 11, 2019

@holta Can you please attach any logs your find under /library/kolibri/*log and also syslog. That will help me find a solution.

Thanks @arky: plz see these 4 pastebins from the most recent Ubuntu Server 18.04.1 VM — where BIG-sized http://d.iiab.io/6.7 was just installed 1-2 hours ago — before it was rebooted 5 times to try to understand this Kolibri issue:

root@box:~# journalctl -u kolibri | nc termbin.com 9999
https://termbin.com/fgd1e

root@box:~# cat /library/kolibri/kolibri.log | nc termbin.com 9999
https://termbin.com/9037

root@box:~# cat /library/kolibri/server.log | nc termbin.com 9999
https://termbin.com/lej0

root@box:~# grep -i kolibri /var/log/syslog | nc termbin.com 9999
https://termbin.com/mfj5

@holta
Copy link
Member Author

holta commented Feb 13, 2019

fyi problem has not been seen on Raspbian on Raspberry Pi.

One unproven theory is that Calibre might be colliding with Kolibri, to cause this Kolibri service timeout (#1489) during the 1st boot only. More testing is needed and ongoing, thanks all!

@arky
Copy link
Contributor

arky commented Feb 13, 2019

@holta Thanks! Unfortunately the solution for this kolibri service timeout evaded me so far.

Can you add "--debug" command line option to kolibri.service find more debugging output when the kolibri fails to start on first boot.

@holta holta modified the milestones: 6.7, 7.0 Feb 13, 2019
@holta
Copy link
Member Author

holta commented Feb 13, 2019

Can you add "--debug" command line option to kolibri.service find more debugging output when the kolibri fails to start on first boot.

Where does --debug belong in https://github.com/iiab/iiab/blob/master/roles/kolibri/templates/kolibri.service.j2 or at the command-line?

@arky
Copy link
Contributor

arky commented Feb 14, 2019

@holta You can just appended '--debug' in the ExecStart in the /etc/systemd/system/kolibri.service file directly.

ExecStart=/usr/local/bin/kolibri start --debug

@holta
Copy link
Member Author

holta commented Feb 14, 2019

@holta You can just appended '--debug' in the ExecStart in the /etc/systemd/system/kolibri.service file directly.

ExecStart=/usr/local/bin/kolibri start --debug

Do you recommend we turn this on in general for now?

@arky
Copy link
Contributor

arky commented Feb 15, 2019

No, I think it is good idea to find out what's causing these failures and patch them.

@holta
Copy link
Member Author

holta commented Mar 5, 2019

Symptoms are very similar...but not quite the same as?

#1094 Kolibri did not start within 2MIN timeout during BIG install on RPi 3

@holta
Copy link
Member Author

holta commented Jun 17, 2019

@KW4NP do you recall ever witnessing this problem?

(Or perhaps it's something unique to Ubuntu 18.04 & similar?)

@KW4NP
Copy link

KW4NP commented Jun 17, 2019 via email

@holta
Copy link
Member Author

holta commented Jun 17, 2019

What happens when you try to run it manually?

Manually starting it works on Debian 10 [pre-release] installed 2019-06-15 as seen here: (where a BIG-sized http://d.iiab.io was installed, as documented @ #1387 (comment))

root@box:~# /usr/local/kolibri/bin/kolibri --version
0.12.3

root@box:~# systemctl status kolibri
● kolibri.service - Kolibri
   Loaded: loaded (/etc/systemd/system/kolibri.service; enabled; vendor preset: enabled)
   Active: failed (Result: timeout) since Sat 2019-06-15 04:13:25 EDT; 1 day 21h ago
  Process: 285 ExecStart=/usr/local/kolibri/bin/kolibri start (code=killed, signal=TERM)

Jun 15 04:11:56 box.lan systemd[1]: Starting Kolibri...
Jun 15 04:13:17 box.lan kolibri[285]: INFO     Option HTTP_PORT in section [Deployment] being overridden by environment variable KOLIBRI_HTTP_PORT
Jun 15 04:13:17 box.lan kolibri[285]: INFO     Option URL_PATH_PREFIX in section [Deployment] being overridden by environment variable KOLIBRI_URL_PATH_PREFIX
Jun 15 04:13:24 box.lan systemd[1]: kolibri.service: Start operation timed out. Terminating.
Jun 15 04:13:25 box.lan systemd[1]: kolibri.service: Control process exited, code=killed, status=15/TERM
Jun 15 04:13:25 box.lan systemd[1]: kolibri.service: Failed with result 'timeout'.
Jun 15 04:13:25 box.lan systemd[1]: Failed to start Kolibri.

root@box:~# systemctl start kolibri

root@box:~# systemctl status kolibri
● kolibri.service - Kolibri
   Loaded: loaded (/etc/systemd/system/kolibri.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-06-17 01:21:01 EDT; 2s ago
  Process: 28804 ExecStart=/usr/local/kolibri/bin/kolibri start (code=exited, status=0/SUCCESS)
 Main PID: 28812 (kolibri)
    Tasks: 57 (limit: 2359)
   Memory: 142.1M
   CGroup: /system.slice/kolibri.service
           └─28812 /usr/local/kolibri/bin/python2 /usr/local/kolibri/bin/kolibri start

Jun 17 01:21:01 box.lan kolibri[28804]: Operations to perform:
Jun 17 01:21:01 box.lan kolibri[28804]:   Apply all migrations: admin, analytics, auth, content, contenttypes, device, discovery, exams, kolibriauth, lessons, logger, m
Jun 17 01:21:01 box.lan kolibri[28804]: Running migrations:
Jun 17 01:21:01 box.lan kolibri[28804]:   No migrations to apply.
Jun 17 01:21:01 box.lan kolibri[28804]: Operations to perform:
Jun 17 01:21:01 box.lan kolibri[28804]:   Apply all migrations: admin, analytics, auth, content, contenttypes, device, discovery, exams, kolibriauth, lessons, logger, m
Jun 17 01:21:01 box.lan kolibri[28804]: Running migrations:
Jun 17 01:21:01 box.lan kolibri[28804]:   No migrations to apply.
Jun 17 01:21:01 box.lan kolibri[28804]: Installed 2 object(s) from 1 fixture(s)
Jun 17 01:21:01 box.lan systemd[1]: Started Kolibri.

Then again the kolibri service (often!) starts directly on Debian 10 boot: (so why it had a TIMEOUT failure above I don't know?)

root@box:~# systemctl status kolibri
● kolibri.service - Kolibri
   Loaded: loaded (/etc/systemd/system/kolibri.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-06-17 01:25:16 EDT; 11s ago
  Process: 283 ExecStart=/usr/local/kolibri/bin/kolibri start (code=exited, status=0/SUCCESS)
 Main PID: 1655 (kolibri)
    Tasks: 57 (limit: 2359)
   Memory: 147.4M
   CGroup: /system.slice/kolibri.service
           └─1655 /usr/local/kolibri/bin/python2 /usr/local/kolibri/bin/kolibri start

Jun 17 01:25:15 box.lan kolibri[283]: Operations to perform:
Jun 17 01:25:15 box.lan kolibri[283]:   Apply all migrations: admin, analytics, auth, content, contenttypes, device, discovery, exams, kolibriauth, lessons, logger, mor
Jun 17 01:25:15 box.lan kolibri[283]: Running migrations:
Jun 17 01:25:15 box.lan kolibri[283]:   No migrations to apply.
Jun 17 01:25:15 box.lan kolibri[283]: Operations to perform:
Jun 17 01:25:15 box.lan kolibri[283]:   Apply all migrations: admin, analytics, auth, content, contenttypes, device, discovery, exams, kolibriauth, lessons, logger, mor
Jun 17 01:25:15 box.lan kolibri[283]: Running migrations:
Jun 17 01:25:15 box.lan kolibri[283]:   No migrations to apply.
Jun 17 01:25:15 box.lan kolibri[283]: Installed 2 object(s) from 1 fixture(s)
Jun 17 01:25:16 box.lan systemd[1]: Started Kolibri.

root@box:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

@arky may know more?

(We really need to re-test on a brand new install of http://d.iiab.io on a fresh copy of Ubuntu 18.04.2 4 months later, with Kolibri 0.12.3...or 0.12.5 in a few days!)

Ref: #1736

@KW4NP
Copy link

KW4NP commented Jun 17, 2019 via email

@holta
Copy link
Member Author

holta commented Jun 17, 2019

Thanks @KW4NP

Absolute worst case we will keep monitoring this in the IIAB 7.1 release cycle (beginning very shortly in coming weeks) as we close out IIAB 7.0 very shortly in coming days now: https://github.com/iiab/iiab/milestone/5

@holta holta modified the milestones: 7.0, 7.1 Jun 27, 2019
@holta
Copy link
Member Author

holta commented Jul 10, 2019

@kananigit as you reconfirm Kolibri's failure to start after adding channels (#1648) keep an eye on this issue too (#1489) that might possibly be related?

@ITGuyClaude
Copy link

I also had this issue on Ubuntu 18.04, PI 3 (Stretch) and PI4. Changing the Kolibri service to the below resolved it for me.

pi@box:/etc/systemd/system $ cat kolibri.service
[Unit]
Description=Kolibri

[Service]
Type=oneshot
RemainAfterExit=yes
User=pi
ExecStart=/usr/local/kolibri/bin/kolibri start
ExecStop=/usr/local/kolibri/bin/kolibri stop

[Install]
WantedBy=multi-user.target

@holta
Copy link
Member Author

holta commented Aug 13, 2019

@ITGuyClaude thanks for writing here and on #1094!

Can you explain why you removed these 7 lines? (in the [Service] section of /etc/systemd/system/kolibri.service)

Type=forking
...
Environment=KOLIBRI_USER=kolibri
Environment=KOLIBRI_HOME=/library/kolibri
Environment=KOLIBRI_HTTP_PORT=8009
Environment=KOLIBRI_URL_PATH_PREFIX=/kolibri
User=kolibri
Group=www-data

And replaced them with these 2 lines?

Type=oneshot
...
User=pi

Aside: what user did you use on Ubuntu 18.04, where the user pi does not exist?

@holta holta modified the milestones: 7.1, 7.0 Aug 13, 2019
@ITGuyClaude
Copy link

On Ubuntu everything IIAB used my "default" login which is claude, on the pi it is pi and they own the relevant directories.

I removed the below entries because I wanted to narrow out noise like the below in order to make troubleshooting easier, removed entries like port when I know Kolibri is running on 8080 and the below settings are being overwritten anyway as per the below making them redundant.

Aug 16 05:13:17 box.lan kolibri[285]: INFO Option HTTP_PORT in section [Deployment] being overridden by environment variable KOLIBRI_HTTP_PORT
Aug 16 05:13:17 box.lan kolibri[285]: INFO Option URL_PATH_PREFIX in section [Deployment] being overridden by environment variable KOLIBRI_URL_PATH_PREFIX

In kolbiri.service

Environment=KOLIBRI_USER=kolibri
Environment=KOLIBRI_HOME=/library/kolibri
Environment=KOLIBRI_HTTP_PORT=8009
Environment=KOLIBRI_URL_PATH_PREFIX=/kolibri

By default my IIAB 7.0 starts kolibri on port 8080

As for Type=oneshot, not sure why it works but it does for me.

@ITGuyClaude
Copy link

Kolibri is starting with the following settings: kolibri.deployment.default.settings.base and then overrides the Environment=KOL* settings in kolibri.service

pi@box:/library $ systemctl status kolibri.service
● kolibri.service - Kolibri
Loaded: loaded (/etc/systemd/system/kolibri.service; enabled; vendor preset: enabled)
Active: active (exited) since Tue 2019-08-13 20:23:54 SAST; 15min ago
Process: 25172 ExecStart=/usr/local/kolibri/bin/kolibri start (code=exited, status=0/SUCCESS)
Main PID: 25172 (code=exited, status=0/SUCCESS)
Tasks: 80 (limit: 4915)
Memory: 64.9M
CGroup: /system.slice/kolibri.service
└─25179 /usr/local/kolibri/bin/python2 /usr/local/kolibri/bin/kolibri start

Aug 13 20:23:37 box.lan kolibri[25172]: INFO Running Kolibri with the following settings: kolibri.deployment.default.settings.base
Aug 13 20:23:54 box.lan kolibri[25172]: INFO Sqlite database Vacuum finished.
Aug 13 20:23:54 box.lan kolibri[25172]: INFO Running 'kolibri start' as daemon (system service)
Aug 13 20:23:54 box.lan kolibri[25172]: INFO Kolibri running on:
Aug 13 20:23:54 box.lan kolibri[25172]:
Aug 13 20:23:54 box.lan kolibri[25172]: http://127.0.0.1:8080/
Aug 13 20:23:54 box.lan kolibri[25172]: http://192.168.0.114:8080/
Aug 13 20:23:54 box.lan kolibri[25172]: http://172.18.96.1:8080/
Aug 13 20:23:54 box.lan kolibri[25172]: INFO Going to daemon mode, logging to /home/pi/.kolibri/logs/kolibri.txt
Aug 13 20:23:54 box.lan systemd[1]: Started Kolibri.

@holta holta modified the milestones: 7.0, 7.1 Sep 30, 2019
@holta
Copy link
Member Author

holta commented Oct 21, 2019

Can we close this now, if indeed Kolibri's service is no longer timing out on boot?

@holta
Copy link
Member Author

holta commented Nov 1, 2019

Doesn't seem to be happening or reproducible. Please re-open if facts-on-the-ground change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants