[jb]: monitor low memory notifications #10558

Merged
merged 1 commit into from
Jun 16, 2022

Conversation

akosyakov
Member

@akosyakov akosyakov commented Jun 9, 2022

Description

So far we monitor only the JVM's used and max memory. That is not enough to detect performance degradation, since GC can kick in and release memory. It turns out, though, that the JB backend already monitors performance degradation internally and notifies the user about it with a low memory notification after GC. This PR adds a counter of low memory notifications. A steady increase in such notifications indicates performance degradation for the user.
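
To illustrate why a post-GC signal is more telling than raw used memory, here is a minimal simulation. It is not the code from this PR: `HeapSample`, `LowMemoryCounter`, the 95% threshold and the sample values are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class HeapSample:
    """One hypothetical observation of a JVM heap, in MiB."""
    used_before_gc: int
    used_after_gc: int
    max_heap: int


class LowMemoryCounter:
    """Counts samples where the heap is still nearly full after a GC."""

    def __init__(self, threshold: float = 0.95) -> None:
        # Assumed fraction of max heap; the real backend's threshold is not stated here.
        self.threshold = threshold
        self.count = 0

    def observe(self, sample: HeapSample) -> None:
        # Only memory that survives a collection counts as "low memory".
        if sample.used_after_gc / sample.max_heap >= self.threshold:
            self.count += 1


samples = [
    # Healthy: heap looks full before GC, but GC frees most of it.
    HeapSample(used_before_gc=2000, used_after_gc=600, max_heap=2048),
    # Degraded: GC frees almost nothing.
    HeapSample(used_before_gc=2040, used_after_gc=1990, max_heap=2048),
    HeapSample(used_before_gc=2045, used_after_gc=2000, max_heap=2048),
]

counter = LowMemoryCounter()
for sample in samples:
    counter.observe(sample)

# Used/max before GC looks alarming in every sample, healthy or not.
peak_used_ratio = [round(s.used_before_gc / s.max_heap, 2) for s in samples]
```

In this sketch the pre-GC ratio is close to 1.0 for all three samples, while the counter only increments for the two samples where GC could not reclaim memory.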

Related Issue(s)

Monitoring for #8704

How to test

Reproducing low memory notifications

Screenshot 2022-06-16 at 14 13 23

  • Double Shift → Find and open LanguageServerImpl.java file
  • Find some method and start typing to get auto triggered completions:
    • for local symbols like parameters
    • for global symbols like IllegalStateExceptions
  • Now try to trigger completions manually via Ctrl+Space.
  • Repeat the 2 steps above until you get low memory notifications. This usually happens once indexing of the JDK is done, while IntelliJ is indexing the Gradle project itself and already serving completions. After you have seen one, keep going a bit longer to trigger more notifications. You won't see them in the UI now, but the backend will count them.

Screenshot 2022-06-16 at 14 15 11

Monitoring

  • Start a dev workspace https://gitpod.io#https://github.com/gitpod-io/gitpod/pull/10558
  • Run ./dev/preview/portforward-monitoring-satellite.sh -c harvester to port-forward the Prometheus API endpoint.
  • Run gp preview $(gp url 3000)/d/oamBLUC7k/jetbrains-overview?orgId=1 --external to open the JetBrains Overview dashboard.
  • You can see the rate of low-memory-after-GC notifications over the last 5 minutes, as well as the top 10 pods.
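
As a rough illustration of what the two dashboard panels compute, the sketch below derives a per-pod rate over a 5 minute window from counter samples and then picks the top pods. It is a simplification under stated assumptions: pod names and sample values are made up, samples are assumed to be sorted by time, and unlike Prometheus it does not handle counter resets or extrapolation.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def window_rate(samples: List[Sample], now: float, window: float = 300.0) -> float:
    """Per-second increase of a monotonic counter over the last `window` seconds."""
    recent = [s for s in samples if now - window <= s[0] <= now]
    if len(recent) < 2:
        return 0.0
    (t0, v0), (t1, v1) = recent[0], recent[-1]
    if t1 == t0:
        return 0.0
    # Clamp at zero: this sketch ignores counter resets.
    return max(v1 - v0, 0.0) / (t1 - t0)


def top_pods(rates: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
    """Pods with the highest notification rate, worst first."""
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical per-pod counter samples of low memory notifications.
series = {
    "pod-a": [(0.0, 0.0), (150.0, 3.0), (300.0, 6.0)],
    "pod-b": [(0.0, 1.0), (150.0, 1.0), (300.0, 1.0)],
    "pod-c": [(0.0, 0.0), (150.0, 0.0), (300.0, 3.0)],
}

rates = {pod: window_rate(s, now=300.0) for pod, s in series.items()}
total_rate = sum(rates.values())  # system-wide panel
ranking = top_pods(rates, k=2)    # "top pods" panel, shortened to 2 here
```

A flat counter such as `pod-b` contributes nothing to the rate, so only pods that keep receiving notifications show up in the ranking.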

Release Notes

NONE

Documentation

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.20 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.21 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.22 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.23 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.24 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.25 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.26 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.27 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.28 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.29 because the annotations in the pull request description changed
(with .werft/ from main)

@akosyakov akosyakov changed the title from "[jb]: monitor low memory notifications and gc overhead" to "[jb]: monitor low memory notifications" Jun 16, 2022
@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.30 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.31 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.33 because the annotations in the pull request description changed
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.34 because the annotations in the pull request description changed
(with .werft/ from main)

@akosyakov
Member Author

akosyakov commented Jun 16, 2022

/werft run with-clean-slate-deployment=true

👍 started the job as gitpod-build-ak-jb-gc-pause.35
(with .werft/ from main)

@werft-gitpod-dev-com

started the job as gitpod-build-ak-jb-gc-pause.36 because the annotations in the pull request description changed
(with .werft/ from main)

@akosyakov akosyakov marked this pull request as ready for review June 16, 2022 13:01
@akosyakov akosyakov requested a review from a team June 16, 2022 13:01
@akosyakov
Member Author

akosyakov commented Jun 16, 2022

@mustard-mh Could you have a look please? 🙏

Contributor

@mustard-mh mustard-mh left a comment

The Prometheus and Grafana graphs show up as expected (I can see the data, but it is not clear what the numbers mean)

P G
image image

When I click the Configuration option of the low memory notification, restart the IDE, and reconnect, it does not seem to work

Set New Conn
image image

Contributor

@mustard-mh mustard-mh left a comment

Code LGTM

/hold

Not sure if I need to test with observability too? Since we can see it in dev Grafana

@akosyakov
Member Author

Not sure if I need to test with observability too? Since we can see it in dev Grafana

dev Grafana should be enough

The Prometheus and Grafana graphs show up as expected (I can see the data, but it is not clear what the numbers mean)

The first graph shows how many low memory cases happen across the whole system (among all pods). The second shows the top 10 pods with the highest number of notifications. Ideally, the first graph should be clean. The second should be used to investigate the worst cases.

When I click the Configuration option of the low memory notification, restart the IDE, and reconnect, it does not seem to work

Yes, it does not work. I will file an issue to investigate how to fix it.

@akosyakov
Member Author

/unhold

@roboquat roboquat merged commit 4ff0c7e into main Jun 16, 2022
@roboquat roboquat deleted the ak/jb_gc_pause branch June 16, 2022 14:11
@roboquat roboquat added the labels "deployed: IDE" (IDE change is running in production) and "deployed" (Change is completely running in production) Jun 16, 2022
@akosyakov
Member Author

A follow-up to upgrade Xmx: #10715

Labels
deployed: IDE (IDE change is running in production), deployed (Change is completely running in production), release-note-none, size/XL, team: IDE

3 participants