
Containerize build using LXD #92

Merged
merged 5 commits from feature/lxd into master on Feb 23, 2017

Conversation

johnsca
Contributor

@johnsca commented Feb 19, 2017

This is a pretty significant refactor, obviously. I'd really like to see all of the logic not directly related to managing the LXD image, Jenkins jobs, and Juju config (and possibly the release logic) moved into the underlying tooling (cwr, bundletester, matrix). Specifically, I think we need a well-defined way of providing general override information for bundles for the purposes of testing. This would need to cover not just overriding specific charms with other revs or builds from repos, but also things like adding a testing-specific charm, overriding the default number of units, etc. Having all of that in the tooling would make the charm much simpler.

In the meantime, we might consider moving much of the logic into the cwrbox image. It would allow us to push out updates to the logic in the container that would be picked up on the next build (unless a given deployment was using a locally attached resource version of the cwrbox image, in which case it would be manual for that deployment).

On the point of the image source, manually hosting the tarball in S3 was the quickest way to have it work out of the box, but is less than ideal. Ideally, we could run a public LXD remote server, but that would require more resources and a domain, and I'm not sure how to lock down all operations other than copying images from it, or whether you even can. I also looked into running a simplestreams host for the images, which would be read-only out of the box, but that requires repackaging the image that gets exported (because simplestreams doesn't support unified images and only supports xz compression), and we'd still need to host that.
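
For context, here is a minimal sketch of the two image-delivery paths being weighed above, in the style of the charm's Python code; the function names and the remote name/URL are illustrative and not part of this PR:

```python
import subprocess


def import_cwrbox_from_tarball(tarball_path):
    # Import a unified image tarball (e.g. downloaded from S3 or attached
    # as a charm resource) into the local LXD image store under one alias.
    subprocess.check_call(['lxc', 'image', 'import', tarball_path,
                           '--alias', 'cwrbox'])


def copy_cwrbox_from_remote(remote_name, remote_url):
    # Alternative path: add a public LXD remote and copy the image from it.
    # This assumes someone hosts such a remote, which is exactly the open
    # question above.
    subprocess.check_call(['lxc', 'remote', 'add', remote_name, remote_url,
                           '--protocol', 'lxd', '--public'])
    subprocess.check_call(['lxc', 'image', 'copy',
                           '{}:cwrbox'.format(remote_name), 'local:',
                           '--alias', 'cwrbox'])
```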

@ktsakalozos
Contributor

Nice work! It's a lot of work, though; I wish we could have done it in smaller steps so it would have been easier to review.

In any case, I have taken it for a spin on AWS and LXD; here are the errors I got: http://pastebin.ubuntu.com/24034264/ and http://pastebin.ubuntu.com/24033990/

Is noble-spider your pet? :)

@johnsca
Contributor Author

johnsca commented Feb 20, 2017

@ktsakalozos Ah, I missed that `lxd init` would need to be run when deploying on a fresh machine / VM. I also improved the job console output by turning off script debugging, adding some additional informational echoes, and ensuring that `set -e` is always on.

@ktsakalozos
Contributor

I removed the old cwr subordinate, added the new one, and got the following error:
http://pastebin.ubuntu.com/24039183/

Then I logged into Jenkins and did a `lxc image remove cwrbox`.
After resolving the above error I got this one:
http://pastebin.ubuntu.com/24039207/

On a clean install of jenkins+cwr on lxd:
http://pastebin.ubuntu.com/24039408/

When deployed on a new image, the LXD storage pool won't be configured.
The charm needs to ensure that `lxd init` is run to do so.  If deployed
on a localhost/lxd provider with an already initialized LXD, the charm
should continue gracefully.

Also turned off script debugging and added additional echoes to improve
the job console log.

Also ensure that immediate exit on any error is enabled for all jobs by
setting it at the top of cwr-helpers.sh.
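
A minimal sketch of the behaviour this commit describes, assuming the charm drives LXD via subprocess; the "already initialized?" probe below is illustrative, and the charm's actual check may differ:

```python
import subprocess


def ensure_lxd_initialized():
    # On a localhost/lxd provider the host's LXD is already set up; a root
    # disk device in the default profile is used here as a rough indicator
    # of that, so we continue gracefully without touching it.
    profile = subprocess.check_output(
        ['lxc', 'profile', 'show', 'default']).decode('utf-8')
    if 'root:' in profile:
        return
    # Fresh machine / VM: run a non-interactive init with default answers
    # so the storage pool gets configured.
    subprocess.check_call(['lxd', 'init', '--auto'])
```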
@johnsca
Contributor Author

johnsca commented Feb 21, 2017

I rebased against master and fixed the NoneType exception (run_as doesn't pass through kwargs like I thought it did).

The second failure is somewhat expected; if you delete the image, you'll also need to remove the signature file at /var/lib/jenkins/cwrbox.tar.gz.sig or the hash value from unitdata to get it to re-import the image. However, it looks like `set -e` is not working for some reason; that's a significant issue, but I can't see any obvious cause.

The last error I can't replicate, likely because I'm using ZFS for my LXD storage. I'll try to replicate by bootstrapping Juju with LXD on an Amazon instance, but any debugging you can do on your end would be appreciated.
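
To make the re-import gate described above concrete, here is a rough sketch assuming the hash is kept in charmhelpers unitdata; the key name and helper names are hypothetical:

```python
import hashlib

from charmhelpers.core import unitdata


def _tarball_hash(tarball_path):
    with open(tarball_path, 'rb') as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def image_needs_import(tarball_path):
    # Re-import only when the tarball's hash differs from the one recorded
    # at the last import.  Deleting the image with `lxc image remove` alone
    # is not enough; the recorded hash (or the cwrbox.tar.gz.sig file) has
    # to be cleared as well, which is the behaviour described above.
    return _tarball_hash(tarball_path) != unitdata.kv().get('cwrbox.image.hash')


def record_import(tarball_path):
    unitdata.kv().set('cwrbox.image.hash', _tarball_hash(tarball_path))
```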

@johnsca
Contributor Author

johnsca commented Feb 21, 2017

This seems to be the issue with `-e`: https://stackoverflow.com/questions/4072984/set-e-in-a-function. In short, `set -e` is ignored inside a function (or other compound command) that is invoked as part of a condition, e.g. under `if` or `||`.

When using directory-backed storage for LXD, the perms require that the
containers be marked as privileged.  We were already mapping the
container's root user to the charm's jenkins user, so we don't get any
additional security from unprivileged containers anyway.
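
As a sketch of what this amounts to when the container is created (assuming the charm launches it via the lxc CLI; the function and container names are illustrative):

```python
import subprocess


def launch_build_container(name):
    # With directory-backed storage the container must run privileged;
    # since the container's root is already mapped to the charm's jenkins
    # user, unprivileged containers weren't buying extra isolation anyway.
    subprocess.check_call([
        'lxc', 'launch', 'cwrbox', name,
        '-c', 'security.privileged=true',
    ])
```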
@johnsca
Contributor Author

johnsca commented Feb 21, 2017

All of the issues that @ktsakalozos hit are resolved now.

@kwmonroe
Contributor

kwmonroe commented Feb 21, 2017

This is working great for me. I tested with cwr-52 and ran a charm job and a bundle job concurrently. Watching `ps` on the jenkins unit, I saw multiple cwr processes with multiple containers active. This is a huge improvement -- previously, two simultaneous cwr processes had a high likelihood of stomping on each other's system-level deps.

I really want to push the merge button because I'm that excited about this. However, I'll let @ktsakalozos do it so he can verify his earlier comments have been addressed in cwr-52.

+1, lgtm.

@kwmonroe
Contributor

kwmonroe commented Feb 22, 2017

Nooooo! I spoke too soon. The bundle job finished clean, but the charm job hit a connection timeout :(

http://juju.does-it.net:8081/job/charm_openjdk_in_cs__kwmonroe_bundle_java_devenv/6/consoleFull

Edit: it seemed to be a transient issue; re-running both jobs succeeded. I retract my "Noooooo", but I would like to see the connection-timeout issue handled better.

This was from a previous attempt to manage networking with an older
version of lxd.
@johnsca
Contributor Author

johnsca commented Feb 22, 2017

@kwmonroe The timeout seems to be from deployer connecting to the API in the middle of a test run (during "reset"), so it doesn't seem related to this PR. It also seems to have cleared up on a subsequent run.

@lazypower

This looks super cool, but Travis seems to hate it :(

install_sources:
  description: PPAs from which to install LXD and Juju
  type: string
  default: |


I have a dumb question: why use the apt packages over the snaps? It seems like a lot of tooling isn't going to be maintained in debs anymore... I cite:

  • charm-tools
  • conjure-up

as two candidates in question. Are we signing up for pain later by not integrating with snaps out of the gate?

@johnsca
Contributor Author

I had run into issues with the snaps during development before I found out about the squashfuse workaround. It would probably be good to switch to snaps where possible, though snaps do make the restricted-network story more complicated. Is there a way to run a snap mirror similar to an apt mirror?


@johnsca
Contributor Author

johnsca commented Feb 22, 2017

@chuckbutler The Travis failures are due to an upstream packaging issue with libcharmstore when installing charm-tools on trusty. We're waiting on @marcoceppi to resolve that. I tried to use the snap, but that failed due to this issue. I'd like it if we could figure out a way to use the snap in Travis, but I have no idea how to proceed there.

@@ -54,7 +53,7 @@ def add_job():
         branch = "*/master"
     elif repo_access == 'poll':
         trigger = TRIGGER_PERIODICALLY
-        skip_builds = SKIP_BUILDS
+        skip_builds = 'skip_builds'
Contributor


Since this string is in two places, it's probably better to leave it in the constant. That avoids the problem where someone alters one down the line but not the other.
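
In other words, something along these lines (just a sketch of the suggestion, using the names from the diff above, not the actual job code):

```python
SKIP_BUILDS = 'skip_builds'  # single definition of the marker string


def add_job():
    ...
    # Reference the constant at every call site so a later edit to one
    # use cannot silently diverge from the other.
    skip_builds = SKIP_BUILDS
```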

@pengale
Copy link
Contributor

pengale commented Feb 22, 2017

Overall, I am +1 on this. Nothing major jumped out at me in a read-through of the code, and I'm able to deploy to AWS without errors and to set up and run the tests.

@pengale
Copy link
Contributor

pengale commented Feb 22, 2017

@kwmonroe The timeout that you ran into is more likely a problem with the charm in general, rather than a problem with containerizing, correct?

If so, I think that we should merge this ...

@ktsakalozos
Contributor

LGTM2! Merging it!

@ktsakalozos merged commit d1c8e73 into master Feb 23, 2017
@kwmonroe deleted the feature/lxd branch February 23, 2017 20:00