New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: Log reboot reason to systemd journal as well #216

Merged
merged 2 commits into from Dec 7, 2018

Conversation

Projects
None yet
6 participants
@cgwalters
Contributor

cgwalters commented Dec 5, 2018

I saw my master reboot (pretty sure it was the MCD) after I killed
the MCD pod to test something. There was no event (must not have
gotten flushed? - wonder if we should wait a little bit for the
event to propagate...or be more conservative about rebooting masters?)

Anyways let's log to the systemd journal too. Yes, this implementation
is crude but it works.

@cgwalters

This comment has been minimized.

Contributor

cgwalters commented Dec 5, 2018

/hold

@cgwalters

This comment has been minimized.

Contributor

cgwalters commented Dec 5, 2018

I am still testing this; am having issues where for some reason trying to deploy my testing MCD is causing reboots (from the vanilla one) and of course that is missing this rationale...

@ashcrow

This comment has been minimized.

Member

ashcrow commented Dec 6, 2018

Test flakes, but needs a rebase.

}
go func() {
defer stdin.Close()

This comment has been minimized.

@ashcrow

ashcrow Dec 6, 2018

Member

This isn't problematic, but why not do the WriteString followed by Close since defer executes on function end?

This comment has been minimized.

@cgwalters

cgwalters Dec 6, 2018

Contributor

Because I write Go sometimes by cargo culting bits from the Internet 😄

This comment has been minimized.

@cgwalters

cgwalters Dec 6, 2018

Contributor

I think the general idea is that it's a best practice to always defer..*close since that way if e.g. if one later refactors a function to include an early return it still works?

This comment has been minimized.

@ashcrow

ashcrow Dec 6, 2018

Member

Works for me. 👍

@ashcrow

Change itself looks good, though a rebase is needed.

daemon: Log reboot reason to systemd journal as well
I saw my master reboot (pretty sure it was the MCD) after I killed
the MCD pod to test something.  There was no event (must not have
gotten flushed? - wonder if we should wait a little bit for the
event to propagate...or be more conservative about rebooting masters?)

Anyways let's log to the systemd journal too.  Yes, this implementation
is crude but it works.
@kikisdeliveryservice

This comment has been minimized.

Member

kikisdeliveryservice commented Dec 6, 2018

I am still testing this; am having issues where for some reason trying to deploy my testing MCD is causing reboots (from the vanilla one) and of course that is missing this rationale...

What behaviour are you seeing exactly @cgwalters ?

@cgwalters cgwalters force-pushed the cgwalters:journal-reboots branch from 50f75c7 to 83f938a Dec 6, 2018

@cgwalters

This comment has been minimized.

Contributor

cgwalters commented Dec 6, 2018

OK rebased 🏄 and tested!

[root@osiris-worker-0-z2v8p ~]# journalctl |grep machine-config-daemon.*reboot
Dec 06 17:35:28 osiris-worker-0-z2v8p root[25887]: machine-config-daemon initiating reboot: Node will reboot with new config.
Dec 06 17:35:57 osiris-worker-0-z2v8p root[6234]: machine-config-daemon initiating reboot: Node will reboot with new config.
Dec 06 17:36:25 osiris-worker-0-z2v8p root[6433]: machine-config-daemon initiating reboot: Node will reboot with new config.
Dec 06 18:10:24 osiris-worker-0-z2v8p root[17944]: machine-config-daemon initiating reboot: Node will reboot with new config.
Dec 06 18:17:02 osiris-worker-0-z2v8p root[9908]: machine-config-daemon initiating reboot: Node will reboot with new config.
Dec 06 18:23:12 osiris-worker-0-z2v8p root[9823]: machine-config-daemon initiating reboot: Node will reboot with new config.
Dec 06 19:16:40 osiris-worker-0-z2v8p root[23805]: machine-config-daemon initiating reboot: Node will reboot with new config.
Dec 06 19:38:50 osiris-worker-0-z2v8p root[14589]: machine-config-daemon initiating reboot: Node will reboot with new config.

(Which um...hm, that worker rebooted quite a bit, a lot more than I changed configs...)

@cgwalters

This comment has been minimized.

Contributor

cgwalters commented Dec 6, 2018

What behaviour are you seeing exactly @cgwalters ?

Following the HACKING.md file and deploying a new MCD container was (I believe) causing reboots.

@jlebon

This comment has been minimized.

Member

jlebon commented Dec 6, 2018

/lgtm

@cgwalters

This comment has been minimized.

Contributor

cgwalters commented Dec 6, 2018

/hold cancel

daemon: Add config hash to log messages
We want to be clear *which* config we're trying to use (and
which was the previous).
@cgwalters

This comment has been minimized.

Contributor

cgwalters commented Dec 6, 2018

/retest

@openshift-ci-robot openshift-ci-robot removed the lgtm label Dec 6, 2018

@cgwalters

This comment has been minimized.

Contributor

cgwalters commented Dec 6, 2018

OK since CI is being flaky again I went ahead and added in a new commit I was going to do as a followup.

@ashcrow

ashcrow approved these changes Dec 6, 2018

@ashcrow

This comment has been minimized.

Member

ashcrow commented Dec 6, 2018

Looks good @cgwalters. 🤞 for CI!

@jlebon

This comment has been minimized.

Member

jlebon commented Dec 6, 2018

Yup, looks sane to me!
/lgtm

@openshift-ci-robot

This comment has been minimized.

openshift-ci-robot commented Dec 6, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ashcrow

This comment has been minimized.

Member

ashcrow commented Dec 6, 2018

/test e2e-aws

@cgwalters

This comment has been minimized.

Contributor

cgwalters commented Dec 7, 2018

/retest

1 similar comment
@ashcrow

This comment has been minimized.

Member

ashcrow commented Dec 7, 2018

/retest

@openshift-merge-robot openshift-merge-robot merged commit 16b24e1 into openshift:master Dec 7, 2018

5 checks passed

ci/prow/e2e-aws Job succeeded.
Details
ci/prow/images Job succeeded.
Details
ci/prow/rhel-images Job succeeded.
Details
ci/prow/unit Job succeeded.
Details
tide In merge pool.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment