Log operator initial sync timings #329
Conversation
Timings from a CI run:
This also looks like a clean run, no degraded nodes. Less spam in the MCC log than usual (other than the new spam about being unable to prune MCs).
    startTime := time.Now()
    errs = append(errs, sf.fn(rconfig))
    if optr.inClusterBringup {
        glog.Infof("[init mode] synced %s in %v", sf.name, time.Since(startTime))
    }
Oh this is neat!
Why report only on cluster bringup?
Currently there are some .V(4)-level logs in sync(). It wasn't clear to me at what cadence we'd end up logging. Glancing at the operator... it watches all daemonsets in our namespace, so we'll do a sync every time e.g. a node updates and the MCD state changes, right?
I guess we can do more sync logging if we more consistently add rate limiting/delays like in #337?
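For readers unfamiliar with the distinction: glog's V-level logs are emitted only when the process runs at that verbosity or higher, while plain Info/Infof always emits. A small standalone illustration (not the operator's code):

    package main

    import (
        "flag"

        "github.com/golang/glog"
    )

    func main() {
        flag.Parse() // glog registers -v, -logtostderr, etc. on the default flag set

        // Emitted only when the process runs with -v=4 or higher,
        // e.g. ./demo -logtostderr -v=4
        glog.V(4).Info("per-resync detail: fires on every watch-triggered sync")

        // Always emitted at default verbosity, which is why the PR gates
        // the new timing line on bringup rather than on a V level.
        glog.Info("initial-sync timing: visible by default")

        glog.Flush()
    }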
I'm lifting WIP on this since I think it's an improvement.
Force-pushed from c8109db to 0f999e4.
OK I rebased 🏄‍♂️ and dropped the reordering. This one now just adds "initial sync time" logging; the goal is increased observability to help us debug, that's it.
This looks good to me. Will let @abhinavdahiya have another look.
Will defer to the team; I have no strong opinions.
/lgtm
/retest Please review the full test history for this PR and help us cut down flakes.
I'd like to know how long the parts of the initial sync take.
Force-pushed from 0f999e4 to b8876d9.
Looking at the logs in this PR, we kept printing "Initialization complete". Fixed and rebased 🏄‍♂️
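One plausible shape for that fix, continuing the sketch above (hypothetical; the actual commit may differ): clear the bringup flag once the first full sync finishes, so the completion message is logged exactly once.

    // Hypothetical: at the end of the first full sync during bringup, log
    // completion once and leave bringup mode so later resyncs stay quiet.
    if optr.inClusterBringup {
        glog.Info("Initialization complete")
        optr.inClusterBringup = false
    }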
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters, jlebon.
/retest Please review the full test history for this PR and help us cut down flakes.