pkg/daemon: log pending config to journal #711
cff0cb9
to
534dcca
Compare
pkg/daemon/update.go
Outdated
}

pendingConfigStr := fmt.Sprintf(`MESSAGE_ID=34c7912c5dd2454286097f8f92a22e9e
MESSAGE=%s`, pending.GetName())
TODO write bootid
same, message_id can be just what we want...see https://github.com/openshift/machine-config-operator/pull/711/files#r281399898
pkg/daemon/update.go
Outdated
@@ -702,6 +692,36 @@ func (dn *Daemon) updateOS(config *mcfgv1.MachineConfig) error {
return nil
}

func (dn *Daemon) readPendingConfig() (string, error) {
// TODO(runcom): msgid has been generated with journalctl --new-id128, move it to a const
journalOutput, err := exec.Command("journalctl", "-b", "-1", "-o", "cat", "MESSAGE_ID=34c7912c5dd2454286097f8f92a22e9e").CombinedOutput()
output to json to read BOOT_ID and MESSAGE
also, we might want to have a customized "machine-config-daemon-pending-config" as msgid - not sure about opinions on this?
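To illustrate the suggestion of switching to JSON output, here is a minimal sketch of decoding `journalctl -o json` output, which is emitted as one JSON object per line. The names `pendingEntry` and `parsePendingEntries` are hypothetical and not part of this PR. Note that journald exposes the boot ID as the `_BOOT_ID` field, and the sketch assumes `MESSAGE` is always a plain string (journalctl can encode non-UTF-8 payloads differently).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// pendingEntry holds the two journal fields the review asks for.
// Hypothetical type, not taken from the PR.
type pendingEntry struct {
	BootID  string `json:"_BOOT_ID"`
	Message string `json:"MESSAGE"`
}

// parsePendingEntries decodes the output of `journalctl -o json`,
// which contains one JSON object per line. Other fields are ignored.
func parsePendingEntries(journalOutput string) ([]pendingEntry, error) {
	var entries []pendingEntry
	for _, line := range strings.Split(journalOutput, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue // skip blank lines, e.g. the trailing newline
		}
		var e pendingEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			return nil, fmt.Errorf("decoding journal entry: %w", err)
		}
		entries = append(entries, e)
	}
	return entries, nil
}
```

With this shape, `readPendingConfig` could pass `"-o", "json"` instead of `"-o", "cat"` and compare each entry's boot ID before trusting its message.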
pkg/daemon/update.go
Outdated
return fmt.Errorf("failed to get stdin pipe: %v", err)
}

pendingConfigStr := fmt.Sprintf(`MESSAGE_ID=34c7912c5dd2454286097f8f92a22e9e
This could use some abstraction into a wrapper that takes `key=value` strings as varargs or so.
(One other random thought I had is that we `systemd-run` a service on the host, send our logs to it, and have it log them; then we can properly `systemctl status openshift-machine-config` or so. It would also basically act as a host-side mutex to make doubly sure there aren't multiple MCDs.)
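One possible shape for the wrapper suggested above, as a rough sketch only: a variadic helper that validates `KEY=value` strings and pipes them to `logger --journald` on stdin. The function names `formatJournalFields` and `logToJournal` are hypothetical, and rejecting values that contain newlines is an assumption made here to keep the sketch simple.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// formatJournalFields joins KEY=value fields into the newline-separated
// payload that `logger --journald` reads from stdin. Hypothetical helper.
func formatJournalFields(fields ...string) (string, error) {
	if len(fields) == 0 {
		return "", fmt.Errorf("no journal fields given")
	}
	for _, f := range fields {
		// Require a non-empty key before the first '='.
		if i := strings.Index(f, "="); i <= 0 {
			return "", fmt.Errorf("field %q is not in KEY=value form", f)
		}
		// Assumption: one field per line, so embedded newlines are rejected.
		if strings.Contains(f, "\n") {
			return "", fmt.Errorf("field %q contains a newline", f)
		}
	}
	return strings.Join(fields, "\n") + "\n", nil
}

// logToJournal writes one journal entry by piping the fields to
// `logger --journald` on stdin.
func logToJournal(fields ...string) error {
	payload, err := formatJournalFields(fields...)
	if err != nil {
		return err
	}
	cmd := exec.Command("logger", "--journald")
	cmd.Stdin = strings.NewReader(payload)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("logger failed: %v: %s", err, out)
	}
	return nil
}
```

A caller could then write `logToJournal("MESSAGE_ID=34c7912c5dd2454286097f8f92a22e9e", "MESSAGE="+pending.GetName())` instead of building the string by hand with `fmt.Sprintf`.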
I know I suggested this...but I'm wavering a bit. I'd be a lot happier if we had a stronger idea of what was happening...I really really want to get live access to an affected cluster. Maybe the best bet is to keep throwing in more logging PRs and see what comes from that.
I concur with that - I opened this to validate whether it was something we might pursue.
/retest
console/authentication failures /retest
/retest
pkg/daemon/update.go
Outdated
}
return "", nil
}
return "", fmt.Errorf("no pending config found in journal: %v", string(journalOutput))
Could this end up being a very large error to return? I.e., could `journalOutput` contain lots of content?
We filter the journal on our message ID only, so even if we reboot 100 times we only get 100 lines. This error is unreachable though, so I need to remove it.
}

func (dn *Daemon) logPendingConfig(pending *mcfgv1.MachineConfig, isPending int) error {
logger := exec.Command("logger", "--journald")
It may be worth noting why writing to stdin is used rather than directly executing the command with the message. At first I was going to recommend simplifying via a direct call, until I gave `logger` a run and realized that calling it directly is awkward (double enter to end input).
/retest
769b887
to
b0cbf41
Compare
Oh, so nice: this worked, and we're now reading and writing the pending config to the journal.
@runcom are we going this route for now then?
Let's hear back from @cgwalters, but I would greatly love to merge this and validate it properly in the upgrade jobs (we need this to land in master in order for the upgrade to pick up this change as the starting point of the job itself).
I think what's pushing me towards merging this the most is that it will make auditing events easier. The code design looks good to me. /lgtm
/retest
New changes are detected. LGTM label has been removed.
Rebased and re-pushed.
@runcom: you cannot LGTM your own PR.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters, runcom
New changes are detected. LGTM label has been removed.
Let's try to avoid losing a file write...