Potential memory leak after upgrading to 3.0.1 from 2.0.0 #269

Closed
SimonTheLeg opened this issue Aug 12, 2021 · 5 comments

Comments

@SimonTheLeg

What happened:
After upgrading from grafana-image-renderer 2.0.0 to 3.0.1, we are seeing increased memory usage across some of our instances. The increase also happens gradually over time, which suggests a possible memory leak.

[screenshot: memory usage over time, one line per grafana-image-renderer instance]

Notes on the picture above:

  • we have one instance of grafana-image-renderer per customer; each line in the graph represents one instance
  • all instances were upgraded around the same time, but we are not seeing the issue across all of them. However, this could be because we only use the image-renderer on one page, and usage of that page might differ between customers

Here is a more fine-grained chart of one of the instances:

[screenshot: fine-grained memory usage chart of a single instance]

What you expected to happen:
Memory consumption to be roughly the same as before the upgrade

How to reproduce it (as minimally and precisely as possible):

I am having a tough time reproducing this issue in a controlled environment, so I will try to explain what we are doing; please let me know if there is any additional info I can fetch to help you out.

First off, we are only making one call to get a chart. The chart query is not too fancy and takes around 2 seconds to finish. Here is an example using curl:

curl -w "@/tmp/curl-format.txt" -s '<<sanitized>>/render/d-solo/_eAFi6uWk/meshfed?refresh=1m&orgId=1&panelId=13&theme=light&width=1000&height=500&tz=Europe/Berlin' --user "${UNAME}:${PW}"

     time_namelookup:  0.001459s
        time_connect:  0.023928s
     time_appconnect:  0.071088s
    time_pretransfer:  0.071160s
       time_redirect:  0.000000s
  time_starttransfer:  2.251392s
                     -----------
          time_total:  2.251415s
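
For reference, /tmp/curl-format.txt is just a standard curl write-out template, roughly along these lines (reconstructed from the output above; our exact file may differ slightly):

         time_namelookup:  %{time_namelookup}s\n
            time_connect:  %{time_connect}s\n
         time_appconnect:  %{time_appconnect}s\n
        time_pretransfer:  %{time_pretransfer}s\n
           time_redirect:  %{time_redirect}s\n
      time_starttransfer:  %{time_starttransfer}s\n
                         -----------\n
              time_total:  %{time_total}s\n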

I have already tried to manually put a high load on the system by calling the aforementioned command in a loop, but there was no indication of a memory leak: the renderer freed the memory again after the load, dropping consumption back to its idle state.
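
The "loop" was essentially just repeated invocations of the command above, something like this (a rough sketch; the exact iteration count and pacing varied):

    # hammer the renderer with the same render request in a simple shell loop
    for i in $(seq 1 500); do
      curl -s -o /dev/null \
        '<<sanitized>>/render/d-solo/_eAFi6uWk/meshfed?refresh=1m&orgId=1&panelId=13&theme=light&width=1000&height=500&tz=Europe/Berlin' \
        --user "${UNAME}:${PW}"
    done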

[screenshot: memory usage during the load test, dropping back to idle afterwards]

Anything else we need to know?:

Environment:

  • Grafana Image Renderer version: 3.0.1
  • Grafana version: v8.0.6
  • Installed plugin or remote renderer service: // not sure what is meant here. We are running the standard docker image as a remote renderer service
  • OS Grafana Image Renderer is installed on: alpine (we are running the standard docker image)
@SimonTheLeg changed the title from "Memory leak after upgrading to 3.0.1 from 2.0.0" to "Potential memory leak after upgrading to 3.0.1 from 2.0.0" on Aug 12, 2021
@joanlopez
Collaborator

Hey @SimonTheLeg,

Looking at the plugin's changelog, I see that one of the major changes between the two versions is support for rendering CSV files.
Could that be the reason some instances are experiencing an increase in memory usage while others are not?
Have you tried that feature while trying to reproduce the error locally?

Any information related to that feature would help narrow down the debugging.
Otherwise, I guess we could explore ways to extract some memory profiles from production, if possible.

Thanks! 🙌🏻

@SimonTheLeg
Author

Hi @joanlopez
Thank you for the fast response. I was out for the past few weeks, but I am back with some good news, I think: I have managed to replicate the issue in the dev environment.

So after the "load-test" mentioned before:

I have already tried to manually put a high load on the system by calling the aforementioned command in a loop, but there was no indication of a memory leak: the renderer freed the memory again after the load, dropping consumption back to its idle state.

I have noticed that the previous statement is only partially true: memory does return to normal at first, but after a while exactly the behaviour described in production occurs.

[screenshot: dev environment memory usage showing the same gradual increase after the load test]

With that, I think we could also try to extract memory profiles from the dev environment while replicating the situation. Could you give me some pointers on how to do this and what to post in this issue?

@AgnesToulet
Contributor

Hi @SimonTheLeg,

Are you using the default rendering mode? (See https://github.com/grafana/grafana-image-renderer/blob/master/docs/remote_rendering_using_docker.md#environment-variables, section "Change how browser instances are created".) And did you see some requests time out during your load test?

If so, you probably encountered the issue fixed by this PR. We'll release a new version containing this fix very soon (planned for today if nothing goes wrong), but you can also work around it anytime by changing your configuration (through environment variables, see the doc above, or through the configuration file).
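
In case it helps anyone landing here before upgrading: switching the rendering mode on the standard Docker image looks roughly like this (a minimal sketch; please check the linked doc for the exact variable names and values supported by your version):

    # run the remote renderer with a non-default rendering mode
    docker run -d --name renderer -p 8081:8081 \
      -e RENDERING_MODE=clustered \
      -e RENDERING_CLUSTERING_MODE=browser \
      grafana/grafana-image-renderer:3.0.1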

@AgnesToulet
Contributor

@SimonTheLeg Version 3.1.0 is released. I'm closing this issue as it should be fixed by upgrading the plugin, but feel free to reopen it if that didn't fix it.

@SimonTheLeg
Author

Hi @AgnesToulet. Sounds good! I have tried it with 3.2.0 and the leak no longer occurs. Thank you for your work!
