
[processor/deltatocumulative] limit tracked streams #31488

Merged
merged 5 commits into open-telemetry:main on Mar 11, 2024

Conversation

@sh0rez (Contributor) commented Feb 28, 2024

Description: Adds a configurable upper limit to the number of tracked streams. This introduces an upper bound on memory usage.

Testing: Test case was added

Documentation: README was updated
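
For illustration, a minimal Go sketch of what the processor's configuration could look like with this change; the struct shape and field names (`max_stale`, `max_streams`) are assumptions based on this description, not a verified copy of the merged code:

```go
package deltatocumulativeprocessor

import "time"

// Config is an illustrative sketch, not the actual merged struct.
type Config struct {
	// MaxStale: drop streams that have received no samples for this long
	// (the pre-existing staleness setting referenced in this thread).
	MaxStale time.Duration `mapstructure:"max_stale"`

	// MaxStreams: assumed name for the new limit; caps how many streams are
	// tracked at once, bounding the processor's memory usage.
	MaxStreams int `mapstructure:"max_streams"`
}
```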

@RichieSams (Contributor)

I realize this is still a draft and you're iterating. But when everything is ready, can we split the internal/exp/metrics changes into a separate PR?

@sh0rez sh0rez marked this pull request as ready for review March 5, 2024 13:07
@sh0rez sh0rez requested a review from a team as a code owner March 5, 2024 13:07
@sh0rez sh0rez requested a review from TylerHelmuth March 5, 2024 13:07
@sh0rez (Contributor, Author) commented Mar 5, 2024

This became a lot simpler after the recent staleness PR was merged.

It's ready for final review @RichieSams @djaglowski @ethercrow

@RichieSams (Contributor)

Just a small comment. Otherwise, LGTM

// store the new value; a genuine storage failure is returned as-is
if err := m.Map.Store(id, v); err != nil {
	return err
}
// the store itself succeeded, but an existing stream (`gone`) was evicted to make room
return ErrEvicted{ErrLimit: errl, id: gone}
@RichieSams (Contributor):

Should this return an error? Yes, we had to evict something, but the store itself was a "success". IMO, these aren't actual errors, just statistics.

@sh0rez (Contributor, Author):

At this point they are errors, IMO; the consumer (the processor) should decide what to do with them.
When #31363 is merged, those will turn into metrics.
Until then I think they are best treated as errors.

@RichieSams (Contributor):

Sure. But I guess my point is that eviction itself isn't an error; it's just something that happens. I.e., eviction should not increment the metrics_processed{error=true} metric (I don't recall the correct name :P).

Perhaps the eviction yes/no value should be returned as an additional return value, instead of being conflated with the error status. Thoughts?
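
As a minimal sketch of this suggestion, assuming a hypothetical capped map type (names here are illustrative, not the PR's actual types), the eviction signal could travel as a separate return value rather than inside the error:

```go
package streams

// Ident and Value stand in for the processor's stream identity and data types.
type Ident string
type Value any

// Limited is a hypothetical capped map used only to illustrate the API shape.
type Limited struct {
	max  int
	data map[Ident]Value
}

// Store inserts v and reports whether an existing stream had to be evicted.
// A real storage failure would still be returned via err; eviction is not one.
func (m *Limited) Store(id Ident, v Value) (evicted bool, err error) {
	if _, exists := m.data[id]; !exists && len(m.data) >= m.max {
		for old := range m.data { // evict an arbitrary stream to make room
			delete(m.data, old)
			evicted = true
			break
		}
	}
	m.data[id] = v
	return evicted, nil
}
```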

@sh0rez (Contributor, Author) commented Mar 6, 2024:

I agree eviction should not increment that specific counter; it should rather be a streams_evicted counter, so you can easily alert on it. IMO eviction should never be part of normal operations, as it signals a fairly serious capacity issue and is thus very close to an error scenario.

I think it's not uncommon in Go to use errors for non-fatal occurrences; that's the whole point of having a value-based, easy-to-manipulate error system.
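
A minimal sketch of the value-based error handling described here, assuming a hypothetical ErrEvicted stand-in (the real type also carries the evicted stream identity and the underlying limit error): the consumer detects eviction with errors.As, counts it, and treats only everything else as a failure.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEvicted is a simplified stand-in for the PR's eviction error value.
type ErrEvicted struct{ ID string }

func (e ErrEvicted) Error() string { return fmt.Sprintf("stream %q evicted", e.ID) }

func main() {
	err := store() // pretend this is the limited map's Store call

	var ev ErrEvicted
	switch {
	case errors.As(err, &ev):
		// non-fatal: bump something like a streams_evicted counter and continue
		fmt.Println("evicted:", ev.ID)
	case err != nil:
		// a genuine failure: surface it to the caller
		fmt.Println("store failed:", err)
	}
}

// store simulates a successful insert that evicted an older stream.
func store() error { return ErrEvicted{ID: "stream-0"} }
```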

@jpkrohling (Member)

I'm merging without final approval from @djaglowski as he's not available this week, and I believe there's nothing here that can't be worked on via follow-up PRs.

@jpkrohling jpkrohling merged commit 97f685e into open-telemetry:main Mar 11, 2024
142 checks passed
@github-actions github-actions bot added this to the next release milestone Mar 11, 2024
@sh0rez sh0rez deleted the deltatocumulative-limits branch March 11, 2024 13:46
jpkrohling pushed a commit that referenced this pull request Mar 11, 2024
**Description:**

#31488 was recently merged and broke `build-and-test` for all builds due to a lint failure. It looks like two PRs were worked on in parallel: the aforementioned one and #31625. #31625 renamed `exp` to `stale`, but the most recently merged PR still referenced the original `exp` variable.

This is an unreleased component, and is simply fixing a bug, so I don't
think this should have a changelog.