pkg/daemon: log pending config to journal #711

runcom · 2019-05-06T22:52:39Z

Let's try to avoid losing a file write...

runcom · 2019-05-06T23:02:43Z

pkg/daemon/update.go

+	}
+
+	pendingConfigStr := fmt.Sprintf(`MESSAGE_ID=34c7912c5dd2454286097f8f92a22e9e
+MESSAGE=%s`, pending.GetName())


TODO write bootid

same, message_id can be just what we want...see https://github.com/openshift/machine-config-operator/pull/711/files#r281399898

runcom · 2019-05-06T23:02:54Z

pkg/daemon/update.go

@@ -702,6 +692,36 @@ func (dn *Daemon) updateOS(config *mcfgv1.MachineConfig) error {
 	return nil
 }

+func (dn *Daemon) readPendingConfig() (string, error) {
+	// TODO(runcom): msgid has been generated with journalctl --new-id128, move it to a const
+	journalOutput, err := exec.Command("journalctl", "-b", "-1", "-o", "cat", "MESSAGE_ID=34c7912c5dd2454286097f8f92a22e9e").CombinedOutput()


output to json to read BOOT_ID and MESSAGE

also, we might want to have a customized "machine-config-daemon-pending-config" as msgid - not sure about opinions on this?
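A minimal sketch of the JSON idea above, assuming journalctl is run with -o json (one JSON object per line) and that MESSAGE arrives as a plain string. The names journalEntry and parseJournalJSON are hypothetical and not taken from this PR.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// journalEntry keeps only the two fields of interest from `journalctl -o json`.
// Assumption: MESSAGE is valid UTF-8, so journalctl emits it as a JSON string.
type journalEntry struct {
	Message string `json:"MESSAGE"`
	BootID  string `json:"_BOOT_ID"`
}

// parseJournalJSON decodes newline-delimited JSON journal output,
// skipping blank lines and ignoring all other journal fields.
func parseJournalJSON(out []byte) ([]journalEntry, error) {
	var entries []journalEntry
	scanner := bufio.NewScanner(bytes.NewReader(out))
	for scanner.Scan() {
		line := bytes.TrimSpace(scanner.Bytes())
		if len(line) == 0 {
			continue
		}
		var e journalEntry
		if err := json.Unmarshal(line, &e); err != nil {
			return nil, fmt.Errorf("failed to decode journal entry: %v", err)
		}
		entries = append(entries, e)
	}
	return entries, scanner.Err()
}
```

Fed the output of something like journalctl -b -1 -o json MESSAGE_ID=..., the last element would be the most recent matching entry from the previous boot, and its BootID could be compared against the current boot.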

cgwalters · 2019-05-07T00:49:51Z

pkg/daemon/update.go

+		return fmt.Errorf("failed to get stdin pipe: %v", err)
+	}
+
+	pendingConfigStr := fmt.Sprintf(`MESSAGE_ID=34c7912c5dd2454286097f8f92a22e9e


This could use some abstraction into a wrapper that takes key=value strings as varargs or so.

(One other random thought: we could systemd-run a service on the host, send our logs to it, and have it log them; then we could properly systemctl status openshift-machine-config or so. It would also basically act as a host-side mutex, to make doubly sure there aren't multiple MCDs.)
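One possible shape for the varargs wrapper suggested above, as a sketch only. The name formatJournalFields is hypothetical; it assumes each field is a single-line KEY=VALUE string, since logger --journald expects one field per line.

```go
package main

import (
	"fmt"
	"strings"
)

// formatJournalFields joins KEY=VALUE strings into the newline-separated
// payload that `logger --journald` reads from stdin.
// It rejects fields without a key or with embedded newlines, because
// each field has to sit on its own line.
func formatJournalFields(fields ...string) (string, error) {
	for _, f := range fields {
		if strings.Index(f, "=") <= 0 {
			return "", fmt.Errorf("invalid journal field %q: expected KEY=VALUE", f)
		}
		if strings.Contains(f, "\n") {
			return "", fmt.Errorf("invalid journal field %q: contains a newline", f)
		}
	}
	return strings.Join(fields, "\n") + "\n", nil
}
```

With this, the call site could read formatJournalFields("MESSAGE_ID=...", "MESSAGE="+pending.GetName()) instead of building a raw multi-line format string.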

cgwalters · 2019-05-07T01:05:22Z

I know I suggested this...but I'm wavering a bit. I'd be a lot happier if we had a stronger idea of what was happening...I really really want to get live access to an affected cluster.

Maybe the best bet is to keep throwing in more logging PRs and see what comes from that.

runcom · 2019-05-07T08:39:38Z

> I know I suggested this...but I'm wavering a bit. I'd be a lot happier if we had a stronger idea of what was happening...I really really want to get live access to an affected cluster.
>
> Maybe the best bet is to keep throwing in more logging PRs and see what comes from that.

I concur with that. I opened this to validate whether it's something we might pursue.

runcom · 2019-05-07T15:52:55Z

/retest

runcom · 2019-05-07T17:08:47Z

console/authentication failures

/retest

ashcrow · 2019-05-07T17:54:06Z

/retest

ashcrow · 2019-05-07T19:03:07Z

level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console: timed out waiting for the condition"

ashcrow · 2019-05-07T19:05:43Z

pkg/daemon/update.go

+		}
+		return "", nil
+	}
+	return "", fmt.Errorf("no pending config found in journal: %v", string(journalOutput))


Could this end up being a very large error to return? IE: journalOutput containing lots of content?

We're filtering the journal on our message ID only, so even if we reboot 100 times, we only get 100 lines. This error is unreachable, though; I need to remove it.

ashcrow · 2019-05-07T19:13:03Z

pkg/daemon/update.go

+}
+
+func (dn *Daemon) logPendingConfig(pending *mcfgv1.MachineConfig, isPending int) error {
+	logger := exec.Command("logger", "--journald")


It may be worth noting why writing to stdin is used rather than executing the command directly with the message. At first I was going to recommend simplifying via a direct call, until I gave logger a run and realized that calling it directly is awkward (double enter to end input).
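For illustration, a sketch of feeding a command through stdin so that end of input is signalled by EOF rather than interactively. Note that this swaps in cmd.Stdin with a reader instead of the StdinPipe call the PR uses, and runWithStdin is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// runWithStdin runs a command with the given input on its standard input.
// The reader reaches EOF once the input is consumed, which is what tells
// a tool such as `logger --journald` that the entry is complete.
func runWithStdin(input, name string, args ...string) ([]byte, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin = strings.NewReader(input)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return out, fmt.Errorf("%s failed: %v: %s", name, err, out)
	}
	return out, nil
}
```

Under these assumptions, logging an entry would look like runWithStdin("MESSAGE_ID=...\nMESSAGE=...\n", "logger", "--journald").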

ashcrow · 2019-05-07T19:15:38Z

/retest

imcleod · 2019-05-07T20:17:50Z

/retest

runcom · 2019-05-07T21:57:10Z

/retest

runcom · 2019-05-08T00:10:53Z

Oh, so nice: this worked, and we're now reading and writing the pending config to the journal.

kikisdeliveryservice · 2019-05-08T00:13:22Z

@runcom are we going this route for now then?

runcom · 2019-05-08T00:15:40Z

> @runcom are we going this route for now then?

Let's hear back from @cgwalters, but I would love to merge this and validate it properly in the upgrade jobs (this needs to land on master for the upgrade job to pick up the change as its starting point).
I share Colin's hesitation about going this route, and I still want to find the root cause of the Bugzilla issue and handle it properly. That said, however close we might be to that, if this turns out to be stable enough, why not have it?

cgwalters · 2019-05-08T01:30:08Z

I think what's pushing me towards merging this the most is that it will make auditing events easier.

The code design looks good to me.

/lgtm

imcleod · 2019-05-08T02:34:33Z

/retest

openshift-ci-robot · 2019-05-08T09:16:13Z

New changes are detected. LGTM label has been removed.

runcom · 2019-05-08T09:18:49Z

rebased and re-pushed.
/lgtm

openshift-ci-robot · 2019-05-08T09:18:50Z

@runcom: you cannot LGTM your own PR.

In response to this:

> rebased and re-pushed.
> /lgtm

openshift-ci-robot · 2019-05-08T09:18:59Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, runcom

Needs approval from an approver in each of these files:

~~OWNERS~~ [cgwalters,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2019-05-08T09:25:32Z

New changes are detected. LGTM label has been removed.

openshift-ci-robot · 2019-05-08T09:39:59Z

New changes are detected. LGTM label has been removed.

openshift-ci-robot requested review from ashcrow and kikisdeliveryservice May 6, 2019 22:52

runcom force-pushed the log-pending-config branch 2 times, most recently from cff0cb9 to 534dcca Compare May 6, 2019 22:57

runcom commented May 6, 2019

runcom force-pushed the log-pending-config branch from 534dcca to 3f3f9c3 Compare May 6, 2019 23:03

cgwalters reviewed May 7, 2019

runcom force-pushed the log-pending-config branch from 3f3f9c3 to 3480e82 Compare May 7, 2019 08:39

runcom force-pushed the log-pending-config branch from 3480e82 to f9c5c6e Compare May 7, 2019 10:02

openshift-ci-robot added the size/L label (a PR that changes 100-499 lines, ignoring generated files) and removed the size/M label (a PR that changes 30-99 lines, ignoring generated files) May 7, 2019

runcom force-pushed the log-pending-config branch from f9c5c6e to 9d537a8 Compare May 7, 2019 13:25

runcom force-pushed the log-pending-config branch from 9d537a8 to bfa8eaa Compare May 7, 2019 17:56

ashcrow reviewed May 7, 2019

runcom force-pushed the log-pending-config branch 2 times, most recently from 769b887 to b0cbf41 Compare May 7, 2019 23:00

runcom changed the title ~~WIP pkg/daemon: log pending config to journal~~ pkg/daemon: log pending config to journal May 7, 2019

openshift-ci-robot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) May 7, 2019

runcom force-pushed the log-pending-config branch from b0cbf41 to 023f01e Compare May 8, 2019 00:13

openshift-ci-robot assigned cgwalters May 8, 2019

openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) May 8, 2019

runcom force-pushed the log-pending-config branch from 023f01e to 94c974b Compare May 8, 2019 09:16

openshift-ci-robot removed the lgtm label May 8, 2019

runcom added the lgtm label May 8, 2019

openshift-ci-robot removed the lgtm label May 8, 2019

runcom added the lgtm label May 8, 2019

runcom added 2 commits May 8, 2019 11:39

pkg/daemon: log pending config to journal

2672424

Let's try to avoid losing a file write...

DROP: add a compat layer to pass upgrades job

d6d963b

runcom force-pushed the log-pending-config branch from e7afeed to d6d963b Compare May 8, 2019 09:39

openshift-ci-robot removed the lgtm label May 8, 2019

runcom added the lgtm label May 8, 2019

openshift-merge-robot merged commit fe5ae49 into openshift:master May 8, 2019

runcom deleted the log-pending-config branch May 8, 2019 11:05

runcom mentioned this pull request May 8, 2019

pkg/daemon: fix pending config logic #719

Merged
