Skip to content

Conversation

@grokspawn
Copy link
Contributor

@grokspawn grokspawn commented May 27, 2022

Description of the change:
Add pprof cpu/mem profiling to opm serve command.
Profiling options including dumping cpu/mem profile files to disk, or serving a /pprof HTTP endpoint. Either/both options can be used.

Usage:

% ./bin/opm serve  -h
This command serves declarative configs via a GRPC server.

NOTE: The declarative config directory is loaded by the serve command at
startup. Changes made to the declarative config after the this command starts
will not be reflected in the served content.

Usage:
  opm serve <source_path> [flags]

Flags:
      --debug                    enable debug logging
  -h, --help                     help for serve
  -p, --port string              port number to serve on (default "50051")
      --pprof-addr string        address of startup profiling endpoint (addr:port format)
  -t, --termination-log string   path to a container termination log file (default "/dev/termination-log")

Global Flags:
      --skip-tls-verify   skip TLS certificate verification for container image registries while pulling bundles
      --use-http          use plain HTTP for container image registries while pulling bundles

Update the Dockerfile to generate the profiling data (and set optional paths/filenames):

...
COPY amd64 /configs
COPY --from=builder /bin/opm/ /bin/opm
COPY --from=builder /bin/grpc_health_probe /bin/grpc_health_probe
WORKDIR /tmp
ARG user=1001
RUN microdnf clean all && chmod -vR g=u /etc/passwd
USER ${user}
ENTRYPOINT ["/bin/opm"]
CMD ["serve", "--pprof-addr", "localhost:60066", /configs"]

output example:

% ./bin/opm serve --pprof-addr localhost ~/devel/test-index/ppc64le
INFO[0000] start caching startup cpu profile data at "/debug/pprof/startup/cpu"  address="localhost:60066"
INFO[0000] starting default pprof endpoint               address="localhost:60066"
WARN[0000] unable to set termination log path            error="open /dev/termination-log: operation not permitted"
INFO[0000] serving registry                              configs=/Users/jordan/devel/test-index/ppc64le port=50051
INFO[0000] stop caching startup cpu profile data         address="localhost:60066"
^CINFO[0102] shutting down...                              configs=/Users/jordan/devel/test-index/ppc64le port=50051

pprof may be connected to the custom endpoint "/debug/pprof/startup/cpu" like

% go tool pprof http://localhost:60066/debug/pprof/startup/cpu
Fetching profile over HTTP from http://localhost:60066/debug/pprof/startup/cpu
Saved profile in /Users/jordan/pprof/pprof.samples.cpu.007.pb.gz
Type: cpu
Time: Jun 3, 2022 at 3:49pm (CDT)
Duration: 503.93ms, Total samples = 360ms (71.44%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 250ms, 69.44% of 360ms total
Showing top 10 nodes out of 137
      flat  flat%   sum%        cum   cum%
      50ms 13.89% 13.89%       50ms 13.89%  encoding/json.stateInString
      50ms 13.89% 27.78%       50ms 13.89%  runtime.madvise
      40ms 11.11% 38.89%       60ms 16.67%  encoding/json.checkValid
      30ms  8.33% 47.22%       60ms 16.67%  encoding/json.(*decodeState).skip
      20ms  5.56% 52.78%       20ms  5.56%  runtime.memmove
      20ms  5.56% 58.33%       20ms  5.56%  sync.(*Map).Load
      10ms  2.78% 61.11%       10ms  2.78%  encoding/binary.bigEndian.PutUint64
      10ms  2.78% 63.89%       10ms  2.78%  encoding/json.compact
      10ms  2.78% 66.67%       10ms  2.78%  encoding/json.unquoteBytes
      10ms  2.78% 69.44%       10ms  2.78%  gopkg.in/yaml%2ev2.read
(pprof) web encoding/json.checkValid
(pprof) q

Motivation for the change:
Downstream issues with FBC serving on alternate architectures, needing more information which could be generally useful.

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /docs
  • Commit messages sensible and descriptive

@openshift-ci openshift-ci bot requested review from ankitathomas and jmrodri May 27, 2022 16:17
@codecov
Copy link

codecov bot commented May 27, 2022

Codecov Report

Merging #968 (6c4422f) into master (0899512) will not change coverage.
The diff coverage is n/a.

❗ Current head 6c4422f differs from pull request most recent head fc03fa6. Consider uploading reports for the commit fc03fa6 to get more accurate results

@@           Coverage Diff           @@
##           master     #968   +/-   ##
=======================================
  Coverage   52.48%   52.48%           
=======================================
  Files         103      103           
  Lines        9240     9240           
=======================================
  Hits         4850     4850           
  Misses       3468     3468           
  Partials      922      922           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0899512...fc03fa6. Read the comment docs.

@grokspawn grokspawn force-pushed the grokspawn-pprof-playing branch from fb0f968 to 3713770 Compare May 27, 2022 16:23
@joelanford
Copy link
Member

Should we run a /pprof http endpoint instead? That would make it easier to pull down the pprof data in the context of a pod running in the cluster.

@grokspawn
Copy link
Contributor Author

grokspawn commented May 29, 2022

Should we run a /pprof http endpoint instead? That would make it easier to pull down the pprof data in the context of a pod running in the cluster.

I gave this some thought, and based on our current diagnosis needs, I thought offline info would be easier to stick on a bz. Endpoint doesn't seem to capture the opm serve process start, but I implemented both cases so we have options.

@grokspawn grokspawn force-pushed the grokspawn-pprof-playing branch from 3713770 to 0ca0a31 Compare May 31, 2022 19:52
@grokspawn grokspawn changed the title add offline pprof data for opm serve command add pprof options for opm serve command May 31, 2022
Copy link
Member

@anik120 anik120 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 31, 2022
Copy link
Contributor

@exdx exdx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good. We can definitely use a pprof implementation like this in the newer set of APIs.

@grokspawn grokspawn force-pushed the grokspawn-pprof-playing branch from 0ca0a31 to 3fa5a94 Compare June 1, 2022 21:00
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 1, 2022
@exdx
Copy link
Contributor

exdx commented Jun 2, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 2, 2022
@joelanford
Copy link
Member

Sorry for the driveby, but a potential option for the "where to put the startup profile" is to capture it to a bytes.Buffer{}, and then run an http endpoint that writes that buffer to the http response.

@grokspawn grokspawn force-pushed the grokspawn-pprof-playing branch from 3fa5a94 to 3dd5fa7 Compare June 3, 2022 21:36
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2022
@grokspawn grokspawn requested review from anik120, exdx and joelanford June 3, 2022 21:40
Copy link
Contributor

@tylerslaton tylerslaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2022
@grokspawn grokspawn force-pushed the grokspawn-pprof-playing branch from 3dd5fa7 to 9c0209b Compare June 10, 2022 17:50
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2022
serves cached startup cpu profile data via custom endpoint

Signed-off-by: Jordan Keister <jordan@nimblewidget.com>
Signed-off-by: Jordan Keister <jordan@nimblewidget.com>
@timflannagan
Copy link
Member

I mentioned in slack that I think we can refactor this implementation, but it seems sufficient for now.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 17, 2022
Signed-off-by: Jordan Keister <jordan@nimblewidget.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 17, 2022
@exdx
Copy link
Contributor

exdx commented Jun 21, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 21, 2022
Copy link
Member

@awgreene awgreene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great, nice work!

port string
terminationLog string
debug bool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this newline?


// goroutine exits with main
go func() {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove newline.

@awgreene
Copy link
Member

/lgtm

Copy link
Member

@dinhxuanvu dinhxuanvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 23, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dinhxuanvu, exdx, grokspawn, tylerslaton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 23, 2022
@openshift-ci openshift-ci bot merged commit ad7e7ac into master Jun 23, 2022
@grokspawn grokspawn deleted the grokspawn-pprof-playing branch June 23, 2022 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants