Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spanmetrics connector has memory leak #27654

Closed
bripkens opened this issue Oct 12, 2023 · 5 comments · Fixed by #28847
Closed

spanmetrics connector has memory leak #27654

bripkens opened this issue Oct 12, 2023 · 5 comments · Fixed by #28847
Labels

Comments

@bripkens
Copy link
Contributor

Component(s)

connector/spanmetrics

What happened?

Description

The spanmetrics connector has a memory leak that occurs when the cumulative temporality is used. The connectorImp#resourceMetrics map is only ever cleaned up in delta temporality.

func (p *connectorImp) resetState() {
// If delta metrics, reset accumulated data
if p.config.GetAggregationTemporality() == pmetric.AggregationTemporalityDelta {
p.resourceMetrics = make(map[resourceKey]*resourceMetrics)
p.metricKeyToDimensions.Purge()
p.startTimestamp = pcommon.NewTimestampFromTime(time.Now())
} else {
p.metricKeyToDimensions.RemoveEvictedItems()
// Exemplars are only relevant to this batch of traces, so must be cleared within the lock
if p.config.Histogram.Disable {
return
}
for _, m := range p.resourceMetrics {
m.histograms.Reset(true)
}
}
}

Steps to Reproduce

  1. Enable cumulative temporality
  2. Observe many changing resources

Expected Result

connectorImp#resourceMetrics is eventually cleaned up.

Actual Result

connectorImp#resourceMetrics keeps growing over time.

Collector version

628a6cb

Environment information

Not relevant

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@bripkens bripkens added bug Something isn't working needs triage New item requiring triage labels Oct 12, 2023
@github-actions
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@bryan-aguilar bryan-aguilar added priority:p1 High and removed needs triage New item requiring triage labels Oct 13, 2023
@bryan-aguilar
Copy link
Contributor

Thanks for the report @bripkens, are you interested in working on a fix?

bripkens added a commit to bripkens/opentelemetry-collector-contrib that referenced this issue Nov 1, 2023
# Why

The `spanmetrics` connector has a memory leak that occurs when the
cumulative temporality is used. The `connectorImp#resourceMetrics` map
is only ever cleaned up in delta temporality.

# What

Turn `connectorImp#resourceMetrics` into a LRU cache with a maximum
size. To correctly handle metric resets we also introduce a start
timestamp per `resourceMetric` instance.

# References

Fixes open-telemetry#27654
bripkens added a commit to bripkens/opentelemetry-collector-contrib that referenced this issue Nov 1, 2023
# Why

The `spanmetrics` connector has a memory leak that occurs when the
cumulative temporality is used. The `connectorImp#resourceMetrics` map
is only ever cleaned up in delta temporality.

# What

Turn `connectorImp#resourceMetrics` into a LRU cache with a maximum
size. To correctly handle metric resets we also introduce a start
timestamp per `resourceMetric` instance.

# References

Fixes open-telemetry#27654
bripkens added a commit to bripkens/opentelemetry-collector-contrib that referenced this issue Nov 1, 2023
# Why

The `spanmetrics` connector has a memory leak that occurs when the
cumulative temporality is used. The `connectorImp#resourceMetrics` map
is only ever cleaned up in delta temporality.

# What

Turn `connectorImp#resourceMetrics` into a LRU cache with a maximum
size. To correctly handle metric resets we also introduce a start
timestamp per `resourceMetric` instance.

# References

Fixes open-telemetry#27654
@bripkens
Copy link
Contributor Author

bripkens commented Nov 1, 2023

Fix proposal added via #28847

bripkens added a commit to bripkens/opentelemetry-collector-contrib that referenced this issue Nov 1, 2023
# Why

The `spanmetrics` connector has a memory leak that occurs when the
cumulative temporality is used. The `connectorImp#resourceMetrics` map
is only ever cleaned up in delta temporality.

# What

Turn `connectorImp#resourceMetrics` into a LRU cache with a maximum
size. To correctly handle metric resets we also introduce a start
timestamp per `resourceMetric` instance.

# References

Fixes open-telemetry#27654
bripkens added a commit to bripkens/opentelemetry-collector-contrib that referenced this issue Nov 1, 2023
# Why

The `spanmetrics` connector has a memory leak that occurs when the
cumulative temporality is used. The `connectorImp#resourceMetrics` map
is only ever cleaned up in delta temporality.

# What

Turn `connectorImp#resourceMetrics` into a LRU cache with a maximum
size. To correctly handle metric resets we also introduce a start
timestamp per `resourceMetric` instance.

# References

Fixes open-telemetry#27654
bripkens added a commit to bripkens/opentelemetry-collector-contrib that referenced this issue Nov 13, 2023
# Why

The `spanmetrics` connector has a memory leak that occurs when the
cumulative temporality is used. The `connectorImp#resourceMetrics` map
is only ever cleaned up in delta temporality.

# What

Turn `connectorImp#resourceMetrics` into a LRU cache with a maximum
size. To correctly handle metric resets we also introduce a start
timestamp per `resourceMetric` instance.

# References

Fixes open-telemetry#27654
@mfilipe
Copy link

mfilipe commented Nov 27, 2023

@bripkens do you think the comments I made here are related to your issue?

@bripkens
Copy link
Contributor Author

Sounds related, yes. I'd argue however that just pruning histograms is not sufficient. See my (open) PR which would fix this: #28847

bripkens added a commit to bripkens/opentelemetry-collector-contrib that referenced this issue Dec 5, 2023
# Why

The `spanmetrics` connector has a memory leak that occurs when the
cumulative temporality is used. The `connectorImp#resourceMetrics` map
is only ever cleaned up in delta temporality.

# What

Turn `connectorImp#resourceMetrics` into a LRU cache with a maximum
size. To correctly handle metric resets we also introduce a start
timestamp per `resourceMetric` instance.

# References

Fixes open-telemetry#27654

Co-authored-by: Jared Tan <jian.tan@daocloud.io>
TylerHelmuth pushed a commit that referenced this issue Dec 5, 2023
# Why

The `spanmetrics` connector has a memory leak that occurs when the
cumulative temporality is used. The `connectorImp#resourceMetrics` map
is only ever cleaned up in delta temporality.

# What

Turn `connectorImp#resourceMetrics` into a LRU cache with a maximum
size. To correctly handle metric resets we also introduce a start
timestamp per `resourceMetric` instance.

# References

Fixes
#27654

Co-authored-by: Jared Tan <jian.tan@daocloud.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants