Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need for EvictedHashMap, EvictedQueue #1283

Closed
cijothomas opened this issue Oct 3, 2023 · 6 comments · Fixed by #1381
Closed

Need for EvictedHashMap, EvictedQueue #1283

cijothomas opened this issue Oct 3, 2023 · 6 comments · Fixed by #1381

Comments

@cijothomas
Copy link
Member

We currently maintain a EvictedHashMap and EvictedQueue structures.

The main reason, as I understand it, for the usage of the above structures are to enforce the SpanLimits from the spec.

The current SDK implementation seem to be copying all attributes into these data structures : attributes, events, links.

The above has perf overheads. However, the spec does not require the kind of "eviction" behavior. i.e if user tries to add more attributes/events/links than allowed, ignore the newer one. There is no need of "evict" an existing one. We could achieve this with regular Vec/HashMap.

@TommyCpp mentioned the desire to give predictable behavior for users, if we are dropping attributes. I guess it can be achieved with modifications to the API/contract:
AddAttribute(KeyValue) - adds attribute. If size limit is reached, drop this one.
AddAttribute(Array/IntoIterator of KeyValues) - adds attributes in the order, and drops from the moment size limit is reached. If user wants to control the ones that gets dropped, they can provide attributes in that order.

Related to this the SpanBuilder being a Struct in the API, exposing the data structures used to store Attributes, Events and Links.. I think this should be a trait in the API, and the implementation (i.e SDK) should decide the data structures to store it. See similar issue for Logs.

Don't have an PR with some proposal or benchmark numbers yet, but opening the issue to get feedback about the general direction:

  1. Do not force every user to pay for the cost of "eviction" logic.
  2. If a user want predictable behavior when something is being dropped - they can opt-in to pay the cost. Others users should not pay the penalty.
@cijothomas
Copy link
Member Author

Linking to previous issue which is related to this : #794

@cijothomas
Copy link
Member Author

The original intention of the attribute limits feature is to protect buggy code from eating up whole memory, causing OOM etc. However, given TracerBuilder is in the API, and it holds the vec/map user provides, the OTel SDK cannot prevent any buggy code from causing OOM. User has to ensure they do not cause this!

Once within the OTel SDK, it is possible to do the trim logic (as SDK has access to the configured limits). However, the approach of truncate (as done in some places in sdk today), will not help, as truncate won't free up actual space. So again, not sure if the SDK should attempt to do trimming..

If the concern is more about keeping multiple spans, each with large number of events/attributes etc., then this problem is faced by BatchingExportProcessor only, as that is the component who would batch more than one Span. If this is the case, then it is better handled by BatchingExportProcessor, so as to not penalize others.

It is also possible that the trimming is required as backends have restrictions - in that case it is best handled by the Exporter. (Languages like .NET has taken this approach.)..

To achieve highest performance, we need to cut as much logic as possible from hot paths. I would vote to make the trimming an optional-opt-in feature. (some languages like C++ do not implement this feature at all)

As part of or after #1293, I'll send some proposals on how/where to handle the trimming capabilities. Happy to hear more thoughts on this topic.

@wperron
Copy link
Contributor

wperron commented Oct 11, 2023

Hey 👋 I authored #1148 -- We wanted to set the limit to 0 because we were dealing with a lot of spammy events coming from tracing that we wanted to blackhole entirely. There's a difference in philosophy here between the two libraries, and tracing being a drop-in replacement for log, generally speaking that tends to generate a lot of events in opentelemetry-land.

As for the eviction behavior, I too noticed while writing that PR that the EvictedQueue and EvictedHashMap were keeping a copy of the dropped attributes. I know for a fact it's not what's done in other SDKs like Ruby or Go. I personally have no concerns with removing that behavior. As long as the Rust SDK respects the span limits environment variables spec, I'm a happy user.

@cijothomas
Copy link
Member Author

cijothomas commented Oct 11, 2023

@wperron Would it solve the issue you are facing, if tracing-opentelemetry crate provide a way to simply don't sent its events as SpanEvents? So you can fix the issue at the source itself.

As long as the Rust SDK respects the span limits environment variables spec, I'm a happy user.

I cannot guarantee that. Its done today, but I will propose to remove it. If other maintainers/approvers agree and based on further feedback, we may remove that capability completely from otel-rust-sdk. (And offer it only as part of OTLP/other exporters). If your use case is to simply set limit to 0 - check my above response, and see if that helps.

Related discussion around tracing-opentelemetry's handling of events as SpanEvents. : #1111

@wperron
Copy link
Contributor

wperron commented Oct 11, 2023

if tracing-opentelemetry crate provide a way to simply don't sent its events as SpanEvents? So you can fix the issue at the source itself.

The assumption here is that users will always use tracing to instrument, and that opentelemetry is just there to export, with tracing-opentelementry to bridge between them. That's certainly true of a lot of applications right now, but I don't think it's a safe assumption to make.

For example we've found that the impedance mismatch between tracing and opentelemetry is simply not worth it, and we would rather just use opentelemetry directly to instrument (provided it can play nice with async runtimes but that's another discussion). We're aware of #1111 and would very much like to see it move forward, it's been a source of pain for us for sure.

In my opinion, opentelemetry-rust should be usable standalone, and as such should respect the spec wherever possible. From my understanding of the initial description of the issue, the performance drawbacks of the span limits are primarily because the SDK is keeping track of every attribute/event/link that gets evicted. We should fix that, not remove the feature entirely.

@cijothomas
Copy link
Member Author

The assumption here is that users will always use tracing to instrument

Sorry that was not the message I was intending to convey! I used that because you were setting the limit to 0, to completely silence SpanEvents! That is not the original intent of the SpanEvent limit feature. If someone is setting it to 0, then I think it is because someone is producing SpanEvents, and that is undesired for me (either because of too much noise or my backed dont accept it etc.), so a better option is to ask the producer to stop producing SpanEvents in the first place, than them producing it, and then dropping it at the OTel side! An example from another repo : open-telemetry/opentelemetry-dotnet-contrib#416

In my opinion, opentelemetry-rust should be usable standalone, and as such should respect the spec wherever possible. From my understanding of the initial description of the issue, the performance drawbacks of the span limits are primarily because the SDK is keeping track of every attribute/event/link that gets evicted. We should fix that, not remove the feature entirely.

Agree! One of the fix here is to enforce the limit elsewhere, not in the SDK itself. (But that is not settled. Its possible we'll just do it in the SDK itself. I'll try some benchmarks and make proposals based on that).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants