Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-17157: opm: always serve pprof endpoints, improve server allocations #1129

Conversation

stevekuznetsov
Copy link
Member

There's no substantial runtime cost to serving the endpoints by default, and when things hit the fan you always need to query them in situ.

Description of the change:

Motivation for the change:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /docs
  • Commit messages sensible and descriptive

@codecov
Copy link

codecov bot commented Jul 27, 2023

Codecov Report

Merging #1129 (c2e5511) into master (bca2bfb) will increase coverage by 0.04%.
Report is 1 commits behind head on master.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##           master    #1129      +/-   ##
==========================================
+ Coverage   53.77%   53.81%   +0.04%     
==========================================
  Files         108      108              
  Lines       10163    10166       +3     
==========================================
+ Hits         5465     5471       +6     
+ Misses       3738     3734       -4     
- Partials      960      961       +1     
Files Changed Coverage Δ
pkg/cache/json.go 57.02% <33.33%> (+0.35%) ⬆️
pkg/cache/tar.go 63.15% <66.66%> (+2.04%) ⬆️

... and 2 files with indirect coverage changes

cmd/opm/serve/serve.go Outdated Show resolved Hide resolved
@stevekuznetsov stevekuznetsov force-pushed the skuznets/pprof-always branch 2 times, most recently from 3ec0ea3 to 2796254 Compare July 27, 2023 14:22
Previously, new buffers were allocated on each file we read in, which
was unnecessary and wasteful.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
@stevekuznetsov
Copy link
Member Author

Note on the first commit: io.Copy will allocate the exact same size buffer (buf := make([]byte, 32*1024)) for each file, as long as the file is larger than the buffer, which almost all are. In the catalog I am looking at, there are 1703 files on disk and I saw ~50MiB allocated at startup. Doing the math and looking at pprof, that's the clear culprit. While this is un-used memory, it will take a long time for the kernel to reclaim it from Go's runtime and there's no need to do the allocations. Each file is read in serially, so using a singular buffer should not impact how long it takes to run this check at startup.

@stevekuznetsov
Copy link
Member Author

unit tests broken on codecov ...

[2023-07-27T14:38:37.937Z] ['error'] There was an error running the uploader: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}
Error: Codecov: Failed to properly upload: The process '/home/runner/work/_actions/codecov/codecov-action/v3/dist/codecov' failed with exit code 255

@stevekuznetsov stevekuznetsov changed the title opm: always serve pprof endpoints opm: always serve pprof endpoints, improve server allocations Jul 27, 2023
There is no substantial runtime cost to serving pprof endpoints, and
when things hit the fan and we need to investigate performance in situ,
there is no time to restart pods and change flags. Capturing profiles
remains opt-in, since those are costly.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 30, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: grokspawn, stevekuznetsov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 30, 2023
@grokspawn
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2023
@openshift-merge-robot openshift-merge-robot merged commit 68e13df into operator-framework:master Jul 30, 2023
12 of 13 checks passed
@stevekuznetsov stevekuznetsov changed the title opm: always serve pprof endpoints, improve server allocations OCPBUGS-17157: opm: always serve pprof endpoints, improve server allocations Aug 1, 2023
@openshift-ci-robot
Copy link

@stevekuznetsov: Jira Issue OCPBUGS-17157: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-17157 has not been moved to the MODIFIED state.

In response to this:

There's no substantial runtime cost to serving the endpoints by default, and when things hit the fan you always need to query them in situ.

Description of the change:

Motivation for the change:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /docs
  • Commit messages sensible and descriptive

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants