Bug 1907929: enable madvdontneed in system components #2299

Merged
merged 1 commit into from Jan 22, 2021

Conversation

rphillips
Contributor

@rphillips rphillips commented Dec 14, 2020

Golang 1.12 switched the runtime to MADV_FREE when returning memory to the kernel. MADV_FREE is somewhat faster than MADV_DONTNEED; however, the kernel will not try to reclaim memory from golang processes until the system is memory constrained. In that state, cloud servers can potentially page out executable caches, which can cause IOPS throttling. Since golang 1.16 will be changing the default back to MADV_DONTNEED, we are going to enable the madvdontneed flag by default.

golang/go#42330
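For readers who want to confirm what a given process actually received, here is a minimal Go sketch that parses a GODEBUG-style string and reports whether `madvdontneed=1` is present. The helper name `godebugValue` is hypothetical, and the last-occurrence-wins behaviour is a choice made for this sketch, not a statement about how the Go runtime parses the variable.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// godebugValue looks up key in a GODEBUG-style string, which is a
// comma-separated list of key=value pairs. It returns the value and
// whether the key was present. If the key is repeated, this sketch
// keeps the last occurrence (an assumption made for illustration).
func godebugValue(godebug, key string) (string, bool) {
	val, found := "", false
	for _, field := range strings.Split(godebug, ",") {
		parts := strings.SplitN(field, "=", 2)
		if len(parts) == 2 && strings.TrimSpace(parts[0]) == key {
			val, found = parts[1], true
		}
	}
	return val, found
}

func main() {
	// Inspect the environment this process was started with.
	v, ok := godebugValue(os.Getenv("GODEBUG"), "madvdontneed")
	switch {
	case ok && v == "1":
		fmt.Println("madvdontneed=1 is set for this process")
	case ok:
		fmt.Printf("madvdontneed is set to %q\n", v)
	default:
		fmt.Println("madvdontneed is not set; the runtime default applies")
	}
}
```

Running it as `GODEBUG=madvdontneed=1 ./check` should take the first branch; without the variable it falls through to the default case.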

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 14, 2020
@rphillips rphillips force-pushed the test_madvdontneed branch 2 times, most recently from 0df46c8 to ab74617 Compare December 14, 2020 19:22
@rphillips
Contributor Author

We would also need to tweak the base image to set GODEBUG: https://github.com/openshift/images/blob/515726d73cf1b35abbe0183860f5cde0aad5f387/base/Dockerfile.rhel#L23

@cgwalters
Member

Is the thought that we should do this OpenShift-wide? The basic tradeoff here is latency vs returning memory to the OS, right? That seems to be what https://go-review.googlesource.com/c/go/+/135395/ is claiming.

One thing I'd note: at least for experimenting with this with kubelet and crio, there's no need to patch the MCO. One could provide a MachineConfig object that adds a systemd drop-in for those two units and get most of the desired effect here, I think.
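As a rough illustration of the drop-in idea, the Go sketch below renders the path and contents of a systemd drop-in that sets GODEBUG for a unit. The helper `dropIn` and the file name `10-godebug.conf` are hypothetical and are not taken from this PR; the unit names `kubelet.service` and `crio.service` and the `/etc/systemd/system/<unit>.d/` location are assumed from common systemd conventions. Wrapping the result in a MachineConfig object is left out.

```go
package main

import "fmt"

// dropIn returns a drop-in path and file contents that set GODEBUG for
// the given systemd unit. The file name "10-godebug.conf" is a
// placeholder chosen for this sketch.
func dropIn(unit, godebug string) (path, contents string) {
	path = fmt.Sprintf("/etc/systemd/system/%s.d/10-godebug.conf", unit)
	contents = fmt.Sprintf("[Service]\nEnvironment=\"GODEBUG=%s\"\n", godebug)
	return path, contents
}

func main() {
	// One drop-in per unit; other components could get their own later.
	for _, unit := range []string{"kubelet.service", "crio.service"} {
		path, contents := dropIn(unit, "madvdontneed=1")
		fmt.Printf("# %s\n%s\n", path, contents)
	}
}
```

One caveat with this approach: if a unit already sets GODEBUG elsewhere, a later `Environment=` assignment for the same variable replaces the earlier value, so any existing settings would need to be carried into the new string.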

@rphillips rphillips changed the title WIP: test MADV_DONTNEED defaulted Bug 1907929: enable madvdontneed in system components Dec 15, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Dec 15, 2020
@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1907929, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1907929: enable madvdontneed in system components

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Dec 15, 2020
@rphillips
Contributor Author

rphillips commented Dec 15, 2020

@cgwalters correct. The thought is that, instead of getting into a memory pressure scenario, we have Go free memory back to the kernel more aggressively. This will help in quite a few scenarios: scheduling, NodeNotReady, etc. Since Golang 1.16 is going to make this option the default, it should have minimal impact on the system.
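To see the runtime's side of this accounting, the following Go sketch allocates a buffer, drops it, forces a release with `debug.FreeOSMemory`, and reads `HeapReleased` from `runtime.MemStats`. The function name `heapReleasedAfterFree` is hypothetical. Note that `HeapReleased` only reports what the runtime has handed back through madvise; with MADV_FREE the kernel may keep those pages counted in the process RSS until it is under memory pressure, which is the gap the flag in this PR is meant to close.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps the allocation reachable until we explicitly drop it, so
// the compiler cannot optimize the allocation away.
var sink []byte

// heapReleasedAfterFree allocates n bytes, touches each page, drops the
// buffer, asks the runtime to return as much memory as possible to the
// OS, and reports the runtime's HeapReleased counter in bytes.
func heapReleasedAfterFree(n int) uint64 {
	sink = make([]byte, n)
	for i := 0; i < len(sink); i += 4096 {
		sink[i] = 1 // touch the page so it is actually backed by memory
	}
	sink = nil

	// Forces a garbage collection and then releases freed spans.
	debug.FreeOSMemory()

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapReleased
}

func main() {
	released := heapReleasedAfterFree(64 << 20)
	fmt.Printf("HeapReleased after FreeOSMemory: %d MiB\n", released>>20)
}
```

Comparing the process RSS (for example from `/proc/<pid>/status` on Linux) with and without `GODEBUG=madvdontneed=1` is one way to observe the difference on a Go 1.12 to 1.15 binary.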

This PR is Part 2.
Part 1 of the PR for Openshift Images is here: openshift/images#61

@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1907929, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1907929: enable madvdontneed in system components


@rphillips
Contributor Author

/retest

@rphillips
Contributor Author

/test e2e-aws-serial

@openshift-merge-robot
Contributor

@rphillips: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-serial | 6beb42d258623eed0c252d4067789f425f0d5b20 | link | /test e2e-aws-serial |

Full PR test history. Your PR dashboard.


@rphillips
Contributor Author

/close

closing for now...

@openshift-ci-robot
Contributor

@rphillips: Closed this PR.

In response to this:

/close

closing for now...


@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1907929. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.

In response to this:

Bug 1907929: enable madvdontneed in system components


@rphillips
Contributor Author

/reopen

cc @smarterclayton

@openshift-ci-robot
Contributor

@rphillips: Reopened this PR.

In response to this:

/reopen

cc @smarterclayton


@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1907929, which is invalid:

  • expected the bug to target the "4.7.0" release, but it targets "4.8.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1907929: enable madvdontneed in system components


@openshift-ci-robot openshift-ci-robot added bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. and removed bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jan 22, 2021
@smarterclayton
Contributor

Is this EVERY component, or just openshift / rhcos system component services?

@rphillips
Contributor Author

This is every component. Since Golang 1.16 is going to make the option the default, it seems reasonable to default it everywhere.

@smarterclayton
Contributor

Every component inside AND outside a container? Or just all system golang services not running in a container? If it's the latter, I think that's ok (we control all of them), BUT we would need to verify that teams in RHCOS that use golang are ready for it. We probably can't do the former because we don't control all the possible images that run on the system or the golang code they were compiled with (a customer could have a golang app that doesn't work with this, and we don't want to take that out of their control).

In general we don't inject our logic into our customers' code unless we have to in order to ensure it functions.

@rphillips
Contributor Author

This option does not get passed down into the container, so "all system golang services not running in a container" is correct.

Customer workloads do not see this option.

@smarterclayton
Contributor

Ok, then I think this is acceptable for RHCOS and OCP and OKD to enforce.

@rphillips
Contributor Author

/bugzilla refresh

@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1907929, which is invalid:

  • expected the bug to target the "4.7.0" release, but it targets "4.8.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh


@rphillips
Contributor Author

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jan 22, 2021
@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1907929, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh


@rphillips
Contributor Author

This change is safer. It'll only target crio and kubelet. If other system components would like to enable it, we can add another drop-in for that component.

@kikisdeliveryservice
Contributor

Is the thought that we should do this OpenShift-wide? The basic tradeoff here is latency vs returning memory to the OS, right? That seems to be what https://go-review.googlesource.com/c/go/+/135395/ is claiming.

One thing I'd note: at least for experimenting with this with kubelet and crio, there's no need to patch the MCO. One could provide a MachineConfig object that adds a systemd drop-in for those two units and get most of the desired effect here, I think.

The new changes seem to fulfill @cgwalters' ask and are limited to crio/kubelet, so this makes sense.

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 22, 2021
@rphillips
Contributor Author

/retest

@mrunalp
Member

mrunalp commented Jan 22, 2021

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 22, 2021
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kikisdeliveryservice, mrunalp, rphillips

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Jan 22, 2021

@rphillips: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-workers-rhel7 | 0c7541e | link | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-ovn-step-registry | 0c7541e | link | /test e2e-ovn-step-registry |

Full PR test history. Your PR dashboard.


@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 52dd3ba into openshift:master Jan 22, 2021
@openshift-ci-robot
Contributor

@rphillips: All pull requests linked via external trackers have merged:

Bugzilla bug 1907929 has been moved to the MODIFIED state.

In response to this:

Bug 1907929: enable madvdontneed in system components


@rphillips
Contributor Author

/cherry-pick release-4.6

@openshift-cherrypick-robot

@rphillips: new pull request created: #2397

In response to this:

/cherry-pick release-4.6


@rphillips
Contributor Author

/cherry-pick release-4.6

@openshift-cherrypick-robot

@rphillips: new pull request created: #2632

In response to this:

/cherry-pick release-4.6


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

9 participants