LXD Install Stalls #151
When installing Canonical Kubernetes on localhost (LXD), the install process repeatedly stalls. (I tried twice with the default bundle, then again after cloning the repo to add/increase the resource limits in bundle.yaml, leaving the process running overnight. The cloned repo is as of commit a585523.)
In each case, the process stalls for hours with the Kubernetes master node state remaining on "Installing", and after about 8 hours it settles on "Rendering authentication templates". A screenshot of "juju status" after roughly 24 hours of install time is below.
Each time, the initial multi hour stall is after the master node logs show
Given eight hours or so, the node's status will update to 'Rendering Templates' as in the screenshot above.
Running grep over the logs to check for errors gives
The full Kubernetes master node logs are here - http://paste.ubuntu.com/23625873/
As troubleshooting steps so far, I've done a complete reinstall of Ubuntu 16.04.1, run "apt-get update && apt-get upgrade", installed the PPAs for LXD and Juju, updated their software, and run through "lxd init" keeping the defaults, before cloning the repo here to edit the bundle.yml and install with "juju deploy ./bundle.yml".
The title of the issue really had me going at first. I know that we have been testing on bare metal very aggressively (backed by MAAS), and seeing that it's stalling gave me cause for alarm.
The LXD defaults are what's causing you trouble. By default, LXD applies a very strict set of AppArmor profiles and limited access bits, and we haven't yet come up with a clear way to indicate this to the end user.
What you can do, if you prefer to keep testing on LXD, is install the
The big thing happening behind the scenes is that conjure-up creates and alters the profile assigned to the LXD container to allow the privilege escalation it requires to run Kubernetes.
If you want to know the exact bits that it's tuning, the profile edits can be found in the spell
Here's the profile it's actually using:
Let me know if this doesn't resolve your problem and I'm happy to hop in a hangout and do some real time troubleshooting to get you unblocked as quickly as possible.
Many thanks, that resolved it nicely - all installed well, and using conjure-up is IMO a much better user experience. I'd suggest it's worth changing the "Getting Started" video to use conjure-up as it is a far superior first run experience.
However, the documentation will need to show that users (currently) have to install conjure-up from the PPA rather than with "apt install" as the current documentation says, because the default version is unable to find canonical-kubernetes to install at all.
A minor note worth adding to the documentation somewhere: conjure-up will show an error about LXD not having been initialised and missing its network bridge if you have forgotten to add the user running the command to the lxc group. (Oops!)
Huge thanks for your time. Canonical Kubernetes is fantastic.
referenced this issue
Dec 15, 2016
Thanks for the feedback!
@castrojo It'd be good for us to re-do the introduction video with conjure-up
@battlemidget Where is conjure-up in the queue for updates for xenial?
This is a good point. We could add some validation to conjure-up that makes sure things like the group membership are in place, etc. I've opened conjure-up/conjure-up#521 to track this.
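For illustration, here is a minimal sketch in Python of the kind of group check suggested above. This is not conjure-up's actual code: `user_in_group` is a hypothetical helper, and the group name is left as a parameter because the correct name for a given LXD install is an assumption here.

```python
import grp
import pwd


def user_in_group(username, group_name):
    """Return True if username belongs to group_name.

    Hypothetical helper for illustration only. It checks both the
    group's supplementary member list and the user's primary group.
    """
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        # The group does not exist on this system at all.
        return False

    # Supplementary membership is listed directly on the group entry.
    if username in group.gr_mem:
        return True

    try:
        # The primary group is recorded on the user entry, not in gr_mem.
        return pwd.getpwnam(username).pw_gid == group.gr_gid
    except KeyError:
        # The user does not exist on this system.
        return False
```

A check along the lines of `user_in_group(getpass.getuser(), group_name)` could then run before deployment and print a hint about the missing group, rather than leaving the user with the bridge error.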