Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve lingering question of LogRecord mutability in LogRecordProcessor #2955

Closed
jack-berg opened this issue Nov 16, 2022 · 5 comments · Fixed by #2969
Closed

Resolve lingering question of LogRecord mutability in LogRecordProcessor #2955

jack-berg opened this issue Nov 16, 2022 · 5 comments · Fixed by #2969
Assignees
Labels
spec:logs Related to the specification/logs directory

Comments

@jack-berg
Copy link
Member

#2681 made it possible for LogRecordProcessor to mutate LogRecords in their onEmit implementations. Mutating LogRecords is critical for use cases like enriching with baggage, redacting sensitive information, and more.

After merging, a conversation took place regarding potential performance impact of allowing mutation due to locking requirements.

Option 1: Do Nothing

All LogRecordProcessors are allowed to mutate LogRecords. SDKs have to deal with this by coming up with appropriate locking strategies.

It may be possible to still have lock-free high performance if none of the registered LogRecordProcessors perform mutations. Consider a strategy where mutations are tracked using a lock free CAS (compare and set) counter like java's AtomicInteger. If no mutation ever occurs, then BatchLogProcessor can transform to an immutable ReadableLogRecord version needed to call LogRecordExporter#export without taking a lock. If mutations do occur, a lock would have to be obtained.

Option 2: Force LogRecordProcessor implementations to indicate whether they mutate

For example, by adding method to LogRecordProcessor called boolean isMutating().

This makes it somewhat easier to avoid locks in configurations where no LogRecordProcessors perform mutations.

If an implementation trying to mutate the LogRecord when isMutating() == false, reject the mutation and log a warning.

As noted here, this is the strategy used by collector processors.

Option 3: Split out start and end phases of emitting a LogRecord

SpanProcessor is split into two phases: 1. onStart runs when the span is starting, and has the ability to mutate data. 2. onEnd runs when the span has ended and DOES NOT have the ability to mutate the data.

The Log API could similarly be split into multiple phases of starging and log record and emitting it. LogRecordProcessor could then be split into multiple phases as well, with onStart allowed to mutate while the LogRecord sent to onEnd is immutable.

This produces awkward ergonomics for the API since people don't think of emitting a log in multiple phases. Additionally, we'd have to answer questions like what types of data can / can not be set in the start phase, and what data must be set in the start phase.

Not sure of any advantages of this approach above Option 2.

Let's talk about this and come to some conclusion so we can move forward with confidence.

@jack-berg jack-berg added the spec:logs Related to the specification/logs directory label Nov 16, 2022
@tigrannajaryan
Copy link
Member

Option 4 is to do something similar to Collector, where data is exclusively owned by one processor at a time: https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor#data-ownership

In this case there is no need for locks and processors can freely mutate the data while they own it.

@jack-berg
Copy link
Member Author

In this case there is no need for locks and processors can freely mutate the data while they own it.

Thanks for sharing that link. It looks like the collector relies on chaining to define the ownership boundaries. I.e. a processor is responsible for calling the next processor in the chain, and owns the data until it calls the next:

The duration of ownership of the data by processor is from the beginning of ConsumeTraces/ConsumeMetrics/ConsumeLogs call until the processor calls the next processor's ConsumeTraces/ConsumeMetrics/ConsumeLogs function, which passes the ownership to the next processor. After that the processor must no longer read or write the data since it may be concurrently modified by the new owner.

Changing the architecture of LogRecordProcessors to have each responsible for calling the next is different from the trace SDK design, which the log SDK was modeled after.

I wonder if we can define similar concepts of exclusive ownership, shared ownership, and ownership handoff while not requiring processors to call the next in the chain. Something like:

  • LogRecordProcessors declare intent to mutate by implementing boolean isMutating.
  • Log SDK operates in exclusive ownership mode if any LogRecordProcessor intends to mutate, else log SDK operates in shared ownership mode. There is no functional different between exclusive and shared ownership mode since the SDK doesn't have the fanout considerations of the collector.
  • In exclusive ownership mode, a processor can freely modify the data in its implementation of onEmit until onEmit returns.
  • In shared ownership mode, a processor is not allowed to modify the data unless it makes a copy.

The consequence of this design would be that asynchronous processors like BatchLogRecordProcessor wouldn't be allowed to mutate the data without making a copy.

This seems reasonable.

@tsloughter
Copy link
Member

I like just doing chaining, but even if not going that direction, requiring a copy for modification is essentially what OnEnd enforces, so it would be matching with the SpanProcessor design if that were done, right?

The rest (isMutating and ownership mode) seems implementation specific and/or to be optimizations so may be over complicating the discussion.

@jack-berg
Copy link
Member Author

I like just doing chaining

I like chaining as well, but think the design should be similar to tracing unless there's a strong reason to make it different.

The rest (isMutating and ownership mode) seems implementation specific and/or to be optimizations so may be over complicating the discussion.

Now that you mention it, there's no actual behavior change in an SDK in response to isMutation() in the scheme I outlined. If we were to define the ownership model as "a processor can freely modify the data in its implementation of onEmit until onEmit returns", then we sidestep the extra complexity and simply trust that processors will do the right think and not modify asynchronously without first making a copy.

Maybe this is enough? @tigrannajaryan does the collector somehow enforce its policy that no modifications can be made after the "processor calls the next processor's ConsumeTraces/ConsumeMetrics/ConsumeLogs function"?

@tigrannajaryan
Copy link
Member

Maybe this is enough? @tigrannajaryan does the collector somehow enforce its policy that no modifications can be made after the "processor calls the next processor's ConsumeTraces/ConsumeMetrics/ConsumeLogs function"?

@jack-berg It doesn't currently. We had one case recently when an exporter that was no supposed to mutate the date did mutate it and caused a crash. We discussed a possible "debug" option to enforce/catch such violations, but it will likely remain a debug-only capability since it has performance implications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:logs Related to the specification/logs directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants