Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log operator initial sync timings #329

Merged
merged 1 commit into from
Feb 2, 2019

Conversation

cgwalters
Copy link
Member

No description provided.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 18, 2019
@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 18, 2019
@cgwalters
Copy link
Member Author

cgwalters commented Jan 18, 2019

Timings from a CI run:

I0118 22:40:11.214606       1 sync.go:47] [init mode] synced pools in 35.793377ms
I0118 22:40:18.681254       1 sync.go:47] [init mode] synced mcs in 7.453014364s
I0118 22:40:32.455741       1 sync.go:47] [init mode] synced mcd in 13.704618657s
I0118 22:40:46.895299       1 sync.go:47] [init mode] synced mcc in 14.356922167s
I0118 22:40:54.927609       1 sync.go:47] [init mode] synced required-pools in 8.000336266s
I0118 22:40:54.947661       1 sync.go:58] Initialization complete

This also looks like a clean run, no degraded nodes. Less spam in the MCC log than usual (other than the new spam about being unable to prune MCs).

startTime := time.Now()
errs = append(errs, sf.fn(rconfig))
if optr.inClusterBringup {
glog.Infof("[init mode] synced %s in %v", sf.name, time.Since(startTime))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this is neat!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why report only on cluster bringup ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there are some .V(4) level logs in sync(). It wasn't clear to me at what cadence we'd end up logging. Glancing at the operator...it watches all daemonsets in our namespace, so we'll do a sync every time e.g. a node updates and the MCD state changes right?

I guess we can do more sync logging if we more consistently add some rate-limiting/delays like #337 ?

@cgwalters cgwalters changed the title WIP: Operator pause on cluster start Operator pause on cluster start Jan 22, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2019
@cgwalters
Copy link
Member Author

I'm lifting WIP on this since I think it's an improvement.

@cgwalters cgwalters changed the title Operator pause on cluster start Log operator initial sync timings Jan 29, 2019
@cgwalters
Copy link
Member Author

cgwalters commented Jan 29, 2019

OK I rebased 🏄‍♂️ and dropped the reordering. This one now just adds "initial sync time" logging; goal is increased observability to help us debug, that's it.

@jlebon
Copy link
Member

jlebon commented Jan 29, 2019

This looks good to me. Will let @abhinavdahiya have another look.
/approve

@abhinavdahiya
Copy link
Contributor

This looks good to me. Will let @abhinavdahiya have another look.
/approve

will defer to the team, I have no strong opinions.

@jlebon
Copy link
Member

jlebon commented Jan 29, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2019
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

9 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@ashcrow
Copy link
Member

ashcrow commented Jan 30, 2019

aws_internet_gateway.igw: error attaching EC2 <snip>: timeout while waiting for state to become 'success' (timeout: 2m0s)

/retest

@ashcrow
Copy link
Member

ashcrow commented Jan 30, 2019

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

I'd like to know how long the parts of the initial sync take.
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 1, 2019
@cgwalters
Copy link
Member Author

Looking at the logs in this PR, we kept printing "Initialization complete". Fixed and rebased 🏄‍♂️

@ashcrow
Copy link
Member

ashcrow commented Feb 1, 2019

/retest

1 similar comment
@ashcrow
Copy link
Member

ashcrow commented Feb 1, 2019

/retest

@jlebon
Copy link
Member

jlebon commented Feb 1, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 1, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 4d5d7e8 into openshift:master Feb 2, 2019
osherdp pushed a commit to osherdp/machine-config-operator that referenced this pull request Apr 13, 2021
Sync w/ library for updating jenkins nodejs agent image
sinnykumari added a commit to sinnykumari/machine-config-operator that referenced this pull request Aug 17, 2022
systemd's implementation of `InaccessiblePaths` has
quadratic behavior in number of mounts.
The fix to rpm-ostree is inbound.  But...adding this workaround which
has the MCD dynamically reconfigure the system to drop that config will
help unstick clusters so they can get the real fix.

Manual backport of PR openshift#329
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants