
Revoke storing target and metadata cache in context. #10590

Merged
merged 1 commit into prometheus:release-2.35 on Apr 14, 2022

Conversation

roidelapluie (Member)

Storing the scrape cache and the target (which also contains that cache) in the context is apparently causing a huge memory increase. I think we might not control the lifespan of the context enough, so old objects keep living in memory for longer than needed.

Let's unblock the release and look for an alternative so that downstream consumers can get access to that data.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
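
For context, a minimal Go sketch (illustration only, with hypothetical names rather than the actual Prometheus identifiers) of the pattern this PR reverts: attaching the scrape cache and target to a `context.Context` via `context.WithValue`. Anything that retains that context, or a context derived from it, also keeps the attached objects reachable, which matches the lifespan concern above.

```go
package example

import "context"

// ctxKey is an unexported key type, as recommended for context values.
type ctxKey int

const scrapeInfoKey ctxKey = iota

// scrapeInfo stands in for the target plus its metadata/scrape cache.
type scrapeInfo struct {
	target string
	cache  map[string][]byte // potentially large and long-lived
}

// withScrapeInfo attaches the cache and target to the context. The value stays
// reachable for as long as any context derived from ctx is retained anywhere,
// which is the suspected cause of the memory growth described above.
func withScrapeInfo(ctx context.Context, info *scrapeInfo) context.Context {
	return context.WithValue(ctx, scrapeInfoKey, info)
}

// scrapeInfoFrom is how a downstream consumer would read the data back.
func scrapeInfoFrom(ctx context.Context) (*scrapeInfo, bool) {
	info, ok := ctx.Value(scrapeInfoKey).(*scrapeInfo)
	return info, ok
}
```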

@roidelapluie (Member Author)

/prombench v2.34.0

@prombot (Contributor) commented Apr 14, 2022

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-10590 and v2.34.0

After successful deployment, the benchmarking metrics can be viewed at:

Other Commands:
To stop benchmark: /prombench cancel
To restart benchmark: /prombench restart v2.34.0

@roidelapluie (Member Author)

Maybe #10588 fixes this too.

@roidelapluie (Member Author)

/prombench cancel

@prombot (Contributor) commented Apr 14, 2022

Benchmark cancel is in progress.

@gouthamve (Member)

Hi, sorry for all the trouble. I'll look into this now, but just to be clear, #10588 did not fix it, right?

@roidelapluie (Member Author)

> Hi, sorry for all the trouble. I'll look into this now, but just to be clear, #10588 did not fix it, right?

No, it did not. We immediately see a huge increase in RSS in prombench.

@Nexucis (Member) commented Apr 14, 2022

Thanks @roidelapluie! Fingers crossed this fixes the memory leak.

@Nexucis merged commit db8c550 into prometheus:release-2.35 on Apr 14, 2022
gouthamve added a commit that referenced this pull request Apr 27, 2022
The previous attempt was rolled back in #10590 due to memory issues.

`sl.parentCtx` and `sl.ctx` both had a copy of the cache and target info in the previous attempt, and it was hard to pinpoint where the context was being retained and causing the memory increase.

I've experimented a bunch in #10627 and figured out that this approach doesn't cause a memory increase, but that just using this info in _any_ other context does.

The change fixed a bunch of long-standing issues in the OTel Collector that the community was waiting on, and a release is blocked on a few downstream distributions of the OTel Collector waiting on a fix. I propose to merge this change while I investigate what is happening.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
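
As a rough illustration of the retention issue described in this commit message (hypothetical names, not the real `scrapeLoop` code): if the value lives on the long-lived parent context, it stays reachable for the whole life of the loop; deriving a short-lived per-scrape context and attaching the cache there bounds its lifetime instead.

```go
package example

import "context"

type scrapeCacheKeyType int

const scrapeCacheKey scrapeCacheKeyType = iota

// scrapeCache stands in for the per-target metadata/series cache.
type scrapeCache struct{ entries map[string]string }

// scrapeLoop mirrors the shape described above: a long-lived parent context
// and a context derived from it.
type scrapeLoop struct {
	parentCtx context.Context // lives as long as the scrape loop itself
	ctx       context.Context // derived from parentCtx
}

// perScrapeContext derives a fresh context for a single scrape and attaches the
// cache there rather than on parentCtx, so the value becomes unreachable (and
// collectible) once the per-scrape context is dropped, instead of being pinned
// for the lifetime of the loop.
func (sl *scrapeLoop) perScrapeContext(cache *scrapeCache) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(sl.ctx)
	return context.WithValue(ctx, scrapeCacheKey, cache), cancel
}
```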
gouthamve added a commit that referenced this pull request May 2, 2022
The previous attempt was rolled back in #10590 due to memory issues.

`sl.parentCtx` and `sl.ctx` both had a copy of the cache and target info in the previous attempt, and it was hard to pinpoint where the context was being retained and causing the memory increase.

I've experimented a bunch in #10627 and figured out that this approach doesn't cause a memory increase, but that just using this info in _any_ other context does.

The change fixed a bunch of long-standing issues in the OTel Collector that the community was waiting on, and a release is blocked on a few downstream distributions of the OTel Collector waiting on a fix. I propose to merge this change while I investigate what is happening.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
gouthamve added a commit that referenced this pull request May 3, 2022
* Send target and metadata cache in context (again)

The previous attempt was rolled back in #10590 due to memory issues.

`sl.parentCtx` and `sl.ctx` both had a copy of the cache and target info in the previous attempt, and it was hard to pinpoint where the context was being retained and causing the memory increase.

I've experimented a bunch in #10627 and figured out that this approach doesn't cause a memory increase, but that just using this info in _any_ other context does.

The change fixed a bunch of long-standing issues in the OTel Collector that the community was waiting on, and a release is blocked on a few downstream distributions of the OTel Collector waiting on a fix. I propose to merge this change while I investigate what is happening.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Gate the change behind a manager option

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
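
A sketch of what "gate the change behind a manager option" could look like; the `Options` field name below is hypothetical and only illustrates the shape of such a gate, not the actual Prometheus API.

```go
package example

import "context"

type metadataKeyType int

const metadataKey metadataKeyType = iota

// Options is a hypothetical manager-level configuration struct; the field name
// below only illustrates the shape of the gate, not the real Prometheus API.
type Options struct {
	PassMetadataInContext bool
}

type Manager struct {
	opts Options
}

// contextWithMetadata attaches the metadata cache to a scrape context only when
// the option is enabled, so users who do not opt in keep the previous behaviour
// and pay no extra memory cost.
func (m *Manager) contextWithMetadata(ctx context.Context, cache interface{}) context.Context {
	if !m.opts.PassMetadataInContext {
		return ctx
	}
	return context.WithValue(ctx, metadataKey, cache)
}
```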
roidelapluie added a commit to roidelapluie/prometheus that referenced this pull request Jun 22, 2022
Storing the scrape cache and the target (which also contains that cache) in the context is apparently causing a huge memory increase. I think we might not control the lifespan of the context enough, so old objects keep living in memory for longer than needed.

Let's unblock the release and look for an alternative so that downstream consumers can get access to that data.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
roidelapluie pushed a commit to roidelapluie/prometheus that referenced this pull request Jun 22, 2022
* Send target and metadata cache in context (again)

The previous attempt was rolled back in prometheus#10590 due to memory issues.

`sl.parentCtx` and `sl.ctx` both had a copy of the cache and target info in the previous attempt, and it was hard to pinpoint where the context was being retained and causing the memory increase.

I've experimented a bunch in prometheus#10627 and figured out that this approach doesn't cause a memory increase, but that just using this info in _any_ other context does.

The change fixed a bunch of long-standing issues in the OTel Collector that the community was waiting on, and a release is blocked on a few downstream distributions of the OTel Collector waiting on a fix. I propose to merge this change while I investigate what is happening.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Gate the change behind a manager option

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>