Log operator initial sync timings #329

cgwalters · 2019-01-18T22:17:38Z

No description provided.

cgwalters · 2019-01-18T22:42:30Z

Timings from a CI run:

I0118 22:40:11.214606       1 sync.go:47] [init mode] synced pools in 35.793377ms
I0118 22:40:18.681254       1 sync.go:47] [init mode] synced mcs in 7.453014364s
I0118 22:40:32.455741       1 sync.go:47] [init mode] synced mcd in 13.704618657s
I0118 22:40:46.895299       1 sync.go:47] [init mode] synced mcc in 14.356922167s
I0118 22:40:54.927609       1 sync.go:47] [init mode] synced required-pools in 8.000336266s
I0118 22:40:54.947661       1 sync.go:58] Initialization complete

This also looks like a clean run, no degraded nodes. Less spam in the MCC log than usual (other than the new spam about being unable to prune MCs).
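As a rough cross-check of the timings above, the per-step durations can be summed straight from the log lines. This is only a sketch: `totalSyncTime` and `syncedRe` are hypothetical names made up for illustration, and the regular expression assumes the `[init mode] synced <name> in <duration>` format quoted in this comment.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// syncedRe matches lines ending in "[init mode] synced <name> in <duration>".
// The pattern is an assumption based on the log excerpt in this thread.
var syncedRe = regexp.MustCompile(`\[init mode\] synced (\S+) in (\S+)$`)

// totalSyncTime sums the per-step durations found in the given log lines.
// Lines that do not match the pattern (e.g. "Initialization complete") are skipped.
func totalSyncTime(lines []string) (time.Duration, error) {
	var total time.Duration
	for _, line := range lines {
		m := syncedRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		d, err := time.ParseDuration(m[2])
		if err != nil {
			return 0, fmt.Errorf("parsing duration %q: %v", m[2], err)
		}
		total += d
	}
	return total, nil
}

func main() {
	lines := []string{
		"I0118 22:40:11.214606       1 sync.go:47] [init mode] synced pools in 35.793377ms",
		"I0118 22:40:18.681254       1 sync.go:47] [init mode] synced mcs in 7.453014364s",
		"I0118 22:40:32.455741       1 sync.go:47] [init mode] synced mcd in 13.704618657s",
		"I0118 22:40:46.895299       1 sync.go:47] [init mode] synced mcc in 14.356922167s",
		"I0118 22:40:54.927609       1 sync.go:47] [init mode] synced required-pools in 8.000336266s",
		"I0118 22:40:54.947661       1 sync.go:58] Initialization complete",
	}
	total, err := totalSyncTime(lines)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("total init sync time:", total)
}
```

For the run quoted above, the five steps add up to roughly 43.55s, most of it spent in the mcd and mcc steps.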

jlebon · 2019-01-18T22:57:02Z

pkg/operator/sync.go

+		startTime := time.Now()
+		errs = append(errs, sf.fn(rconfig))
+		if optr.inClusterBringup {
+			glog.Infof("[init mode] synced %s in %v", sf.name, time.Since(startTime))


Oh this is neat!

Why report only on cluster bringup?

Currently there are some `.V(4)`-level logs in `sync()`. It wasn't clear to me at what cadence we'd end up logging. Glancing at the operator, it watches all daemonsets in our namespace, so we'll do a sync every time, e.g., a node updates and the MCD state changes, right?

I guess we can do more sync logging if we more consistently add some rate-limiting/delays like #337?
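To make the trade-off in this thread concrete, here is a minimal, self-contained sketch of timing each sync step and logging it only during bringup. It is not the operator's actual code: `syncFunc`, `runSyncs`, `errDemo`, and the `logf` callback are hypothetical stand-ins (the real diff uses `glog.Infof` and `optr.inClusterBringup`), and the steps here are dummies.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// syncFunc pairs a step name with the function that performs it.
// Hypothetical type for this sketch.
type syncFunc struct {
	name string
	fn   func() error
}

// errDemo is a placeholder failure used by the example steps.
var errDemo = errors.New("demo failure")

// runSyncs runs every step in order and collects any errors.
// Per-step timings are passed to logf only while inBringup is true, so
// steady-state syncs (which may fire on every daemonset change) stay quiet.
func runSyncs(funcs []syncFunc, inBringup bool, logf func(format string, args ...interface{})) []error {
	var errs []error
	for _, sf := range funcs {
		start := time.Now()
		if err := sf.fn(); err != nil {
			errs = append(errs, fmt.Errorf("%s: %v", sf.name, err))
		}
		if inBringup {
			logf("[init mode] synced %s in %v", sf.name, time.Since(start))
		}
	}
	return errs
}

func main() {
	logf := func(format string, args ...interface{}) {
		fmt.Printf(format+"\n", args...)
	}
	steps := []syncFunc{
		{"pools", func() error { return nil }},
		{"mcs", func() error { time.Sleep(10 * time.Millisecond); return nil }},
		{"mcd", func() error { return errDemo }},
	}
	// During bringup: one timing line per step.
	fmt.Println("errors:", runSyncs(steps, true, logf))
	// After bringup: same work, no timing lines.
	fmt.Println("errors:", runSyncs(steps, false, logf))
}
```

Gating on a bringup flag keeps the log volume bounded regardless of how often a sync is triggered; logging every sync would likely want the kind of rate-limiting mentioned above.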

cgwalters · 2019-01-22T22:23:06Z

I'm lifting WIP on this since I think it's an improvement.

cgwalters · 2019-01-29T15:01:38Z

OK I rebased 🏄‍♂️ and dropped the reordering. This one now just adds "initial sync time" logging; goal is increased observability to help us debug, that's it.

jlebon · 2019-01-29T19:38:38Z

This looks good to me. Will let @abhinavdahiya have another look.
/approve

abhinavdahiya · 2019-01-29T19:47:35Z

> This looks good to me. Will let @abhinavdahiya have another look.
> /approve

will defer to the team, I have no strong opinions.

jlebon · 2019-01-29T20:09:57Z

/lgtm

openshift-bot · 2019-01-29T20:41:14Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-01-29T22:42:12Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-01-30T00:43:19Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-01-30T02:44:09Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-01-30T04:45:10Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-01-30T06:46:10Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-01-30T08:47:13Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-01-30T10:48:14Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-01-30T12:49:15Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-01-30T14:50:13Z

/retest

Please review the full test history for this PR and help us cut down flakes.

ashcrow · 2019-01-30T17:30:15Z

aws_internet_gateway.igw: error attaching EC2 <snip>: timeout while waiting for state to become 'success' (timeout: 2m0s)

/retest

ashcrow · 2019-01-30T19:11:12Z

/retest

openshift-bot · 2019-01-30T20:52:32Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2019-02-01T13:14:57Z

/retest

Please review the full test history for this PR and help us cut down flakes.

cgwalters · 2019-02-01T14:13:09Z

Looking at the logs in this PR, we kept printing "Initialization complete". Fixed and rebased 🏄‍♂️

ashcrow · 2019-02-01T14:42:45Z

/retest

ashcrow · 2019-02-01T15:34:06Z

/retest

jlebon · 2019-02-01T17:01:48Z

/lgtm

openshift-ci-robot · 2019-02-01T17:02:25Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [cgwalters,jlebon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2019-02-01T23:44:38Z

/retest

Please review the full test history for this PR and help us cut down flakes.

systemd's implementation of `InaccessiblePaths` has quadratic behavior in number of mounts. The fix to rpm-ostree is inbound. But...adding this workaround which has the MCD dynamically reconfigure the system to drop that config will help unstick clusters so they can get the real fix. Manual backport of PR openshift#329

openshift-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Jan 18, 2019

openshift-ci-robot requested review from abhinavdahiya and jlebon January 18, 2019 22:17

openshift-ci-robot added the approved (indicates a PR has been approved by an approver from all required OWNERS files) and size/M (denotes a PR that changes 30-99 lines, ignoring generated files) labels Jan 18, 2019

This was referenced Jan 18, 2019

MachineConfigs can be garbage collected while a node is still booting #301

Closed

WIP: Render osImageURL, handle "bootstrap" case in MCD #324

Closed

jlebon reviewed Jan 18, 2019

cgwalters changed the title ~~WIP: Operator pause on cluster start~~ Operator pause on cluster start Jan 22, 2019

openshift-ci-robot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Jan 22, 2019

cgwalters force-pushed the operator-pause branch from c8109db to 0f999e4 on January 29, 2019 15:01

cgwalters changed the title ~~Operator pause on cluster start~~ Log operator initial sync timings Jan 29, 2019

openshift-ci-robot assigned jlebon Jan 29, 2019

openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) Jan 29, 2019

operator: Log timing and status for initial cluster bringup

b8876d9

I'd like to know how long the parts of the initial sync take.

cgwalters force-pushed the operator-pause branch from 0f999e4 to b8876d9 on February 1, 2019 14:12

openshift-ci-robot removed the lgtm label (indicates that a PR is ready to be merged) Feb 1, 2019

openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) Feb 1, 2019

openshift-merge-robot merged commit 4d5d7e8 into openshift:master Feb 2, 2019

osherdp pushed a commit to osherdp/machine-config-operator that referenced this pull request Apr 13, 2021

Merge pull request openshift#329 from waveywaves/library-sync

a543ba6

Sync w/ library for updating jenkins nodejs agent image

sinnykumari mentioned this pull request Aug 17, 2022

OCPBUGS-197: daemon: Add a workaround for bug 2111817 #3292

Merged
