Use pivot.service to perform pivot #335

Merged
merged 4 commits into from Feb 12, 2019

8 participants
Member

jlebon commented Jan 21, 2019

Rather than directly executing pivot from within the container, use
its new service unit to run it. The main advantage of this is that
pivot can correctly run ostree as install_t which is required for
writing new SELinux labels. More generally though, we should be running
such host tools within the host context rather than from the container.

This works nicely though the downside is that we lose progress output
from pivot. I'd like to add journal proxying to fix that.

Closes: #314

Requires: openshift/pivot#25
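
For readers following along, here is a minimal Go sketch of the idea in the description, not the code in this PR: instead of exec'ing pivot directly, ask systemd on the host to start the unit. The helper name startUnitCommand is hypothetical, and the sketch assumes systemctl is reachable from wherever the caller runs. How the target image is handed to the unit is not shown in this thread, so it is left out.

```go
package main

import (
	"fmt"
	"os/exec"
)

// startUnitCommand (hypothetical helper) builds, but does not run, the
// command that asks systemd to start the given unit.
func startUnitCommand(unit string) *exec.Cmd {
	return exec.Command("systemctl", "start", unit)
}

func main() {
	cmd := startUnitCommand("pivot.service")
	// Only print the command here; calling cmd.Run() would actually
	// start the unit on a host that has it installed.
	fmt.Println(cmd.Args)
}
```

Because the unit is started by systemd rather than by the container process, it runs in the host context, which is the point made above about SELinux labeling.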

Member

ashcrow commented Jan 22, 2019

Infra flakes

/retest

Member

ashcrow commented Jan 22, 2019

Infra flake

/retest

Member Author

jlebon commented Jan 30, 2019

OK, this now also proxies journal logs from the pivot unit:

I0130 22:19:17.642994   44729 update.go:643] Updating OS to registry.svc.ci.openshift.org/rhcos/maipo@sha256:ede3888e50016d61a720af2fe3f80e67e86bd819e16516ac36538456d46e0d77
pivot.service: I0130 22:19:20.009133   44951 root.go:89] Resolved to: registry.svc.ci.openshift.org/rhcos/maipo@sha256:ede3888e50016d61a720af2fe3f80e67e86bd819e16516ac36538456d46e0d77
pivot.service: I0130 22:19:20.009160   44951 root.go:103] Pivoting to: 47.198 (66e5fd101880c1583362e6ec363fd24c630290ab68a126b3e614143a1c1a4cda)
pivot.service: I0130 22:19:20.009173   44951 run.go:15] Running: podman pull registry.svc.ci.openshift.org/rhcos/maipo@sha256:ede3888e50016d61a720af2fe3f80e67e86bd819e16516ac36538456d46e0d77

(Notice the pivot.service: prefix on the lines above; those lines come from the journal.)

Though one thing I'm unsure of is:

One note here is that this requires CGO_ENABLED=1 because the journal
bindings use the dlopen package, which has -ldl.

I'm not sure of the ramifications of this. IIUC, it's safe to enable this since we're not cross-compiling, right?

Member Author

jlebon commented Jan 30, 2019

vendor/github.com/coreos/go-systemd/sdjournal/journal.go:2...atal error: systemd/sd-journal.h: No such file or directory
 // #include <systemd/sd-journal.h>
                                 ^
compilation terminated.
error: build error: running 'WHAT=machine-config-controller ./hack/build-go.sh' failed with exit code 2

Ahh right, this will require adding systemd-devel as a buildreq.

Member

ashcrow left a comment

So far makes sense. One nit, but not a blocker.

@@ -31,4 +31,4 @@ fi
 mkdir -p ${BIN_PATH}
 
 echo "Building ${REPO}/cmd/${WHAT} (${VERSION_OVERRIDE})"
-CGO_ENABLED=0 GOOS=${GOOS} GOARCH=${GOARCH} go build ${GOFLAGS} -ldflags "${GLDFLAGS}" -o ${BIN_PATH}/${WHAT} ${REPO}/cmd/${WHAT}
+CGO_ENABLED=1 GOOS=${GOOS} GOARCH=${GOARCH} go build ${GOFLAGS} -ldflags "${GLDFLAGS}" -o ${BIN_PATH}/${WHAT} ${REPO}/cmd/${WHAT}

ashcrow Jan 31, 2019

Member

CGO 😦 ... but understandable.

Member

ashcrow commented Jan 31, 2019

Building fails due to lack of C headers:

vendor/github.com/coreos/go-systemd/sdjournal/journal.go:2...atal error: systemd/sd-journal.h: No such file or directory
 // #include <systemd/sd-journal.h>

Member Author

jlebon commented Jan 31, 2019

Building fails due to lack of C headers:

Yup, mentioned this higher up in #335 (comment). :) Just waiting to see if someone more versed in golang than I can confirm CGO_ENABLED=1 is indeed fine before fixing things up.

Member Author

jlebon commented Jan 31, 2019

@abhinavdahiya Any reason why this commit added CGO_ENABLED=0? Were we vendoring something which linked to something we didn't want?

Member

abhinavdahiya commented Jan 31, 2019

@abhinavdahiya Any reason why this commit added CGO_ENABLED=0? Were we vendoring something which linked to something we didn't want?

To force static binaries. Without it @crawford was unable to test binaries locally.

Also, don't turn on cgo for all components; I think we should do it only for the daemon.
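
As a rough illustration of that suggestion, the per-component choice could look like the Go sketch below. The real switch lives in the shell build script shown in the diff, so this only models the decision; the function name cgoEnabledFor is hypothetical, and the component names are the ones that appear elsewhere in this thread.

```go
package main

import "fmt"

// cgoEnabledFor (hypothetical helper) returns the CGO_ENABLED value to
// use when building a component: cgo only for the daemon, static
// pure-Go builds for everything else.
func cgoEnabledFor(component string) string {
	if component == "machine-config-daemon" {
		return "1"
	}
	return "0"
}

func main() {
	for _, c := range []string{"machine-config-daemon", "machine-config-controller"} {
		fmt.Printf("CGO_ENABLED=%s for %s\n", cgoEnabledFor(c), c)
	}
}
```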

Member

crawford commented Jan 31, 2019

Without it @crawford was unable to test binaries locally.

I run NixOS, which is a really good way of double checking if you've made too many assumptions ;) I don't have libc, bash, or even a dynamic linker in the traditional location. I'm on a personal crusade to kill the misuse of the term "Linux Binary" (looking right at you, https://golang.org/dl/).
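
One way to check whether a given binary makes that assumption is to look for a PT_INTERP program header, which is how an ELF file requests a dynamic linker at a fixed path. The sketch below uses Go's standard debug/elf package; the function name needsInterpreter is hypothetical, and the check is a heuristic rather than a full portability test.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// needsInterpreter (hypothetical helper) reports whether the ELF file
// at path has a PT_INTERP program header, i.e. whether it asks the
// kernel to load a dynamic linker from a hard-coded location.
func needsInterpreter(path string) (bool, error) {
	f, err := elf.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()
	for _, p := range f.Progs {
		if p.Type == elf.PT_INTERP {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Inspect the file named on the command line, or this program itself.
	path, err := os.Executable()
	if len(os.Args) > 1 {
		path, err = os.Args[1], nil
	}
	if err != nil {
		fmt.Println("cannot locate binary:", err)
		return
	}
	dynamic, err := needsInterpreter(path)
	if err != nil {
		fmt.Println("not inspectable as ELF:", err)
		return
	}
	fmt.Printf("%s needs a dynamic linker: %v\n", path, dynamic)
}
```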

Member Author

jlebon commented Feb 6, 2019

OK, comments addressed! This is mostly waiting for openshift/release#2783 now.

Member

ashcrow commented Feb 7, 2019

Added more to the pre-req for review ... also ping'd owners in chat. 🤞

Member

ashcrow commented Feb 7, 2019

openshift/release#2783 merged

/retest

Member

runcom commented Feb 7, 2019

the golang base image probably hasn't propagated yet (or been built, fwiw)

Member

runcom commented Feb 7, 2019

Clayton said the base golang image should be rebuilt now, trying...
@jlebon does this still need WIP?

/retest

Member Author

jlebon commented Feb 7, 2019

Yeah, want to retest it one last time now that all the prereqs are baked into the latest RHCOS image. But need to set up a new cluster first.

jlebon added some commits Jan 21, 2019

Use pivot.service to perform pivot
Rather than directly executing `pivot` from within the container, use
its new service unit to run it. The idea here is that `pivot` is part of
the host, and as such, we want it to execute from the host context, not
from within the container. The main advantage of this is that
`pivot` can correctly run `rpm-ostree` as `install_t` which is required
for writing new SELinux labels.

A follow-up patch will add journal proxying so that we don't lose the
logs from pivot and rpm-ostree.

Closes: #314

Vendor github.com/coreos/go-systemd/sdjournal
We will want to be able to proxy some of the host journal logs in the
MCD logs.

daemon/rpm-ostree: Follow journal logs from pivot.service
Proxy the journal logs from `pivot.service` and `rpm-ostreed.service`
after starting it so we can see the progress of the pivot operation
directly from the MCD container logs. This is important for
troubleshooting. We're also likely to proxy more journal logs in the
future, so this just breaks the ice.

One note here is that this requires `CGO_ENABLED=1` because the journal
bindings use the dlopen package, which has `-ldl`. We're enabling it
only for the machine-config-daemon component for now though.

jlebon force-pushed the jlebon:pr/pivot-rework branch from 3f77d41 to 43c676b Feb 9, 2019

Member

runcom commented Feb 9, 2019

not sure why rhel-images fails in the CI for journald; I thought we had already added systemd-devel to the base registry.svc.ci.openshift.org/openshift/release:golang-1.10, and indeed I can see the C header file. It also fails locally for me when doing make deploy-daemon with the very same error (I'm probably missing something stupid)

Nevermind, buildah and podman are doing heavy caching which I don't like

Member

runcom commented Feb 9, 2019

trying an upgrade to the latest oscontainer for ootpa, I get this:

20:08:05 [github.com/openshift/installer] ‹master*› oc logs -p machine-config-daemon-qghbm
I0209 19:01:35.548851   33639 start.go:52] Version: 3.11.0-586-g3f77d419
I0209 19:01:35.549179   33639 start.go:88] starting node writer
I0209 19:01:35.557321   33639 run.go:22] Running captured: chroot /rootfs rpm-ostree status --json
I0209 19:01:35.645087   33639 daemon.go:155] Booted osImageURL: registry.svc.ci.openshift.org/rhcos/maipo@sha256:def5c8ef775f634d99e3f603da0db7591a0113ad8307bc8d8545a170787a2825 (47.310)
I0209 19:01:35.645278   33639 daemon.go:227] Managing node: ip-10-0-175-8.us-west-2.compute.internal
I0209 19:01:35.658271   33639 start.go:146] Calling chroot("/rootfs")
I0209 19:01:35.671963   33639 daemon.go:432] Current+desired config: worker-bcb39e51c5bfba89da68ec52036f119e
I0209 19:01:35.671995   33639 daemon.go:786] No target osImageURL provided
I0209 19:01:35.672722   33639 daemon.go:547] Validated on-disk state
I0209 19:01:35.672740   33639 daemon.go:579] In desired config worker-bcb39e51c5bfba89da68ec52036f119e
I0209 19:01:35.672766   33639 start.go:165] Starting MachineConfigDaemon
I0209 19:01:35.672776   33639 daemon.go:248] Enabling Kubelet Healthz Monitor
I0209 19:06:06.374178   33639 update.go:110] Checking reconcilable for config worker-bcb39e51c5bfba89da68ec52036f119e to worker-de933fa1626a5b1943a242550a96e387
I0209 19:06:06.374211   33639 update.go:149] Checking if configs are reconcilable
I0209 19:06:06.374240   33639 update.go:297] Updating files
I0209 19:06:06.374250   33639 update.go:507] Writing file "/etc/containers/registries.conf"
I0209 19:06:06.376011   33639 update.go:507] Writing file "/etc/sysconfig/crio-network"
I0209 19:06:06.377286   33639 update.go:507] Writing file "/var/lib/kubelet/config.json"
I0209 19:06:06.378569   33639 update.go:507] Writing file "/etc/kubernetes/ca.crt"
I0209 19:06:06.379822   33639 update.go:507] Writing file "/etc/sysctl.d/forward.conf"
I0209 19:06:06.381061   33639 update.go:507] Writing file "/etc/kubernetes/kubelet-plugins/volume/exec/.dummy"
I0209 19:06:06.381683   33639 update.go:507] Writing file "/etc/kubernetes/kubelet.conf"
I0209 19:06:06.382953   33639 update.go:441] Writing systemd unit "kubelet.service"
I0209 19:06:06.383092   33639 update.go:479] Enabling systemd unit "kubelet.service"
I0209 19:06:06.383271   33639 update.go:388] /etc/systemd/system/multi-user.target.wants/kubelet.service already exists. Not making a new symlink
I0209 19:06:06.383292   33639 update.go:316] Deleting stale data
I0209 19:06:06.383307   33639 update.go:603] Writing SSHKeys at "/home/core/.ssh"
I0209 19:06:06.384459   33639 update.go:643] Updating OS to registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435
pivot.service: I0209 19:06:10.799839   37987 root.go:89] Resolved to: registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435
pivot.service: I0209 19:06:10.799874   37987 run.go:16] Running: podman pull registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435
pivot.service: Trying to pull registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435...Getting image source signatures
Copying blob 2a89ffbc2816: 543.98 MiB / 543.98 MiB  21s
Copying config b468bc463acf: 412 B / 412 B  0s
pivot.service: Writing manifest to image destination
pivot.service: Storing signatures
pivot.service: b468bc463acfcce89afec62b34535352c7ae7b513db7ad79ad97df8844e06266
pivot.service: I0209 19:06:51.200223   37987 run.go:16] Running: podman kill ostree-container-pivot
pivot.service: unable to find container ostree-container-pivot: no container with name or ID ostree-container-pivot found: no such container
pivot.service: W0209 19:06:51.242429   37987 run.go:69] (ignored) podman: exit status 125
pivot.service: I0209 19:06:51.242482   37987 run.go:16] Running: podman rm -f ostree-container-pivot
pivot.service: unable to find container ostree-container-pivot: no container with name or ID ostree-container-pivot found: no such container
pivot.service: W0209 19:06:51.338267   37987 run.go:69] (ignored) podman: exit status 125
pivot.service: I0209 19:06:51.338320   37987 run.go:16] Running: podman create --net=none --name ostree-container-pivot registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435
pivot.service: I0209 19:06:51.471997   37987 run.go:16] Running: podman mount 9eb7c6b28a7c5b03cf32dd7605d6e2516d5f1db05f4de7030f1991c413638656
pivot.service: I0209 19:06:51.591874   37987 root.go:114] Pivoting to: 48.254 (862a286d84c486f9accae3af5354389e4f9bfebdfb559392b537ff4cddbd244f)
pivot.service: I0209 19:06:51.591908   37987 run.go:16] Running: rpm-ostree rebase --experimental /var/lib/containers/storage/overlay/d624160f79436adc1754c78cdf6b971cd022f8113a36f69ba350f10fcd7330e2/merged/srv/repo:862a286d84c486f9accae3af5354389e4f9bfebdfb559392b537ff4cddbd244f --custom-origin-url pivot://registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435 --custom-origin-description Managed by pivot tool
rpm-ostreed.service: client(id:cli dbus:1.184 unit:pivot.service uid:0) added; new total=1
rpm-ostreed.service: client(id:cli dbus:1.184 unit:pivot.service uid:0) vanished; remaining=0
rpm-ostreed.service: In idle state; will auto-exit in 62 seconds
pivot.service: error: The connection is closed
pivot.service: F0209 19:06:51.668761   37987 run.go:62] rpm-ostree: exit status 1
I0209 19:06:52.456120   33639 daemon.go:649] Unable to apply update: Failed to run pivot: pivot service did not exit successfully
E0209 19:06:52.456573   33639 writer.go:85] Marking degraded due to: Failed to run pivot: pivot service did not exit successfully
F0209 19:06:52.466403   33639 start.go:170] failed to run: Failed to run pivot: pivot service did not exit successfully

pivot fails manually as well:

[core@ip-10-0-175-8 ~]$ sudo pivot registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435
pivot version 0.0.3
I0209 19:16:08.338706   47746 run.go:16] Running: rpm-ostree status --json
I0209 19:16:08.402832   47746 root.go:79] Previous pivot: registry.svc.ci.openshift.org/rhcos/maipo@sha256:def5c8ef775f634d99e3f603da0db7591a0113ad8307bc8d8545a170787a2825
I0209 19:16:08.402878   47746 run.go:16] Running: skopeo inspect docker://registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435
I0209 19:16:11.758957   47746 root.go:89] Resolved to: registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435
I0209 19:16:11.758987   47746 run.go:16] Running: podman pull registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435
Trying to pull registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435...Getting image source signatures
Skipping blob 2a89ffbc2816 (already present): 543.98 MiB / 543.98 MiB [=====] 0s
Copying config b468bc463acf: 412 B / 412 B [================================] 0s
Writing manifest to image destination
Storing signatures
b468bc463acfcce89afec62b34535352c7ae7b513db7ad79ad97df8844e06266
I0209 19:16:13.872569   47746 run.go:16] Running: podman kill ostree-container-pivot
can only kill running containers: container state improper
W0209 19:16:13.925101   47746 run.go:69] (ignored) podman: exit status 125
I0209 19:16:13.925131   47746 run.go:16] Running: podman rm -f ostree-container-pivot
9eb7c6b28a7c5b03cf32dd7605d6e2516d5f1db05f4de7030f1991c413638656
I0209 19:16:14.043458   47746 run.go:16] Running: podman create --net=none --name ostree-container-pivot registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435
I0209 19:16:14.166460   47746 run.go:16] Running: podman mount 5206ee860483464eb65a02397cf04e1fe712cc2a0f5f3aaa0c6aa246104fa666
I0209 19:16:14.280170   47746 root.go:114] Pivoting to: 48.254 (862a286d84c486f9accae3af5354389e4f9bfebdfb559392b537ff4cddbd244f)
I0209 19:16:14.280210   47746 run.go:16] Running: rpm-ostree rebase --experimental /var/lib/containers/storage/overlay/809bd4c302e119897bcb3401db6f4cb393722ce01e73dc18867225becb34a9a8/merged/srv/repo:862a286d84c486f9accae3af5354389e4f9bfebdfb559392b537ff4cddbd244f --custom-origin-url pivot://registry.svc.ci.openshift.org/rhcos/ootpa@sha256:66e45253e6e7897a409f1a5faed208f215f44331dafffd2293f2e3b61f1f3435 --custom-origin-description Managed by pivot tool
error: The connection is closed
F0209 19:16:14.330924   47746 run.go:62] rpm-ostree: exit status 1

besides ootpa, code LGTM

Member Author

jlebon commented Feb 9, 2019

Hmm, can you check the journal for errors, e.g. journalctl -u rpm-ostreed?

Member

runcom commented Feb 10, 2019

ok, found this after scraping the journal at around the time pivot failed:

Feb 10 21:44:37 ip-10-0-159-35 kernel: type=1400 audit(1549835077.480:8): avc:  denied  { read } for  pid=4040 comm="dbus-daemon" path="/var/lib/containers/storage/overlay/a3814813e35e2c58b629fb864e957a7856ac50ccd5f1ff4fefc7b4041b56c8cd/merged/srv/repo" dev="overlay" ino=100288 scontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:container_file_t:s0:c494,c933 tclass=dir permissive=0

Member

runcom commented Feb 10, 2019

alright, that seems to be the reason it still fails on ootpa: if I setenforce 0, the upgrade to ootpa goes through (it would also be nice if we tried to expose the real failure in some way, but I have no clue how)
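
On exposing the real failure: one possible starting point, purely as a sketch and not something this PR does, is to pull the key=value fields out of AVC denial records like the one quoted in the previous comment and log a one-line summary. The function name parseAVC is hypothetical, and the sample line is abridged from that denial (path, dev and ino omitted).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// avcField matches key=value pairs, where the value is either a
// double-quoted string or a run of non-space characters.
var avcField = regexp.MustCompile(`(\w+)=("[^"]*"|\S+)`)

// parseAVC (hypothetical helper) returns the key=value fields of an
// audit record, with surrounding quotes stripped from values.
func parseAVC(line string) map[string]string {
	fields := map[string]string{}
	for _, m := range avcField.FindAllStringSubmatch(line, -1) {
		fields[m[1]] = strings.Trim(m[2], `"`)
	}
	return fields
}

func main() {
	// Abridged from the denial quoted earlier in the thread.
	line := `type=1400 audit(1549835077.480:8): avc:  denied  { read } for  pid=4040 comm="dbus-daemon" scontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:container_file_t:s0:c494,c933 tclass=dir permissive=0`
	f := parseAVC(line)
	fmt.Printf("%s (%s) was denied access to a %s labeled %s\n",
		f["comm"], f["scontext"], f["tclass"], f["tcontext"])
}
```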

Member

runcom commented Feb 10, 2019

we can follow up on ootpa btw, maipo upgrade went well for me

/approve

Member Author

jlebon commented Feb 10, 2019

alright, that seems to be the reason it still fails on ootpa: if I setenforce 0, the upgrade to ootpa goes through

Ahh yup, that's https://bugzilla.redhat.com/show_bug.cgi?id=1672404 (see also projectatomic/rpm-ostree#1732 (comment)). The dev rpm-ostree uses --disable-dfd-over-dbus, but I had forgotten to do the same internally. Just submitted a new build now so hopefully it's in tomorrow's spin.

Member Author

jlebon commented Feb 11, 2019

Hmm, looks like ci/prow/rhel-images is still failing due to the sdjournal dependency. Will have to look into that tomorrow.

Member

runcom commented Feb 11, 2019

Yeah, that image (the rhel one in this repo for the daemon) still doesn't have systemd-devel and the base image is different, even if I thought it's just mirrored.

FROM registry.svc.ci.openshift.org/ocp/builder:golang-1.10 AS builder

Member

ashcrow commented Feb 11, 2019

Yeah, that image (the rhel one in this repo for the daemon) still doesn't have systemd-devel and the base image is different, even if I thought it's just mirrored.

FROM registry.svc.ci.openshift.org/ocp/builder:golang-1.10 AS builder

Is there an open issue to fix that ... or should it flow into it and we just need to bug someone to speed up the change?

Member

runcom commented Feb 11, 2019

Is there an open issue to fix that ... or should it flow into it and we just need to bug someone to speed up the change?

I was going to bug someone already actually as my understanding was that rhel builder images are just mirrored

Member

runcom commented Feb 12, 2019

/retest

Member

ashcrow commented Feb 12, 2019

vendor/github.com/coreos/go-systemd/sdjournal/journal.go:2...atal error: systemd/sd-journal.h: No such file or directory

/cc @sosiouxme

Member

ashcrow commented Feb 12, 2019

/test rhel-images

I don't think the image is updated yet but let's try just in case.

Contributor

cgwalters commented Feb 12, 2019

We could fall back to just running journalctl as a subprocess.
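
A minimal Go sketch of that fallback, assuming journalctl is available on the PATH of whatever context runs it: it needs no cgo and no systemd headers at build time. The function name followUnitCommand is hypothetical; --follow, --unit and --output cat are standard journalctl options.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// followUnitCommand (hypothetical helper) builds a journalctl command
// that follows one unit's log and prints only the message text.
func followUnitCommand(unit string) *exec.Cmd {
	cmd := exec.Command("journalctl", "--follow", "--unit", unit, "--output", "cat")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	cmd := followUnitCommand("pivot.service")
	// Only print the command here. cmd.Start() would begin streaming;
	// because --follow never exits on its own, the caller would have to
	// stop the process once the unit has finished.
	fmt.Println(cmd.Args)
}
```

The trade-off is an extra process and text output instead of structured journal entries, in exchange for keeping the build free of the sd-journal.h dependency that is failing in CI.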

Member

ashcrow commented Feb 12, 2019

@cgwalters true. If this test fails again it's worth the change unless @sosiouxme believes the image will be updated today.

Member

runcom commented Feb 12, 2019

looks like it's built now?

/retest

Contributor

cgwalters commented Feb 12, 2019

/lgtm

openshift-ci-robot commented Feb 12, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,jlebon,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member Author

jlebon commented Feb 12, 2019

✔️ ci/prow/rhel-images — Job succeeded.

🎊

openshift-merge-robot merged commit c363860 into openshift:master Feb 12, 2019

6 checks passed

ci/prow/e2e-aws Job succeeded.
ci/prow/e2e-aws-op Job succeeded.
ci/prow/images Job succeeded.
ci/prow/rhel-images Job succeeded.
ci/prow/unit Job succeeded.
tide In merge pool.