
Logs: Add log record pooling #3385

Merged: 20 commits into open-telemetry:main on Jun 30, 2022
Conversation

@CodeBlanch (Member) commented Jun 18, 2022:

[Goal is to pare down the number of changes from #3305.]

Fixes #1582
Fixes #2905

Changes

  • Adds two LogRecord pool implementations: one uses a ThreadStatic cache and the other implements a circular buffer. The type of pool to use is determined by the processor(s) registered with OpenTelemetryLoggerProvider (see the sketch after this list).
  • Buffers state values when using the batch exporter or the in-memory exporter.
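
For illustration, here is a minimal, self-contained sketch of the two strategies. All names in it (Record, ThreadStaticPool, SharedPool) are hypothetical and are not the internals merged in this PR; the shared pool's actual Return loop is quoted further down in this conversation.

    using System;
    using System.Threading;

    internal sealed class Record
    {
        public object? State;

        public void Clear() => this.State = null;
    }

    // Strategy 1: one cached instance per thread. Sufficient when a record
    // never outlives the logging call, i.e. every processor is done with it
    // by the time OnEnd returns.
    internal static class ThreadStaticPool
    {
        [ThreadStatic]
        private static Record? cached;

        public static Record Rent()
        {
            var record = cached;
            cached = null;
            return record ?? new Record();
        }

        public static void Return(Record record)
        {
            record.Clear();
            cached = record;
        }
    }

    // Strategy 2: a shared circular buffer. Needed when a batch (or in-memory)
    // exporter holds records after OnEnd returns and gives them back later,
    // possibly from a different thread.
    internal sealed class SharedPool
    {
        private readonly Record?[] pool;
        private long rentIndex;
        private long returnIndex;

        public SharedPool(int capacity) => this.pool = new Record?[capacity];

        public Record Rent()
        {
            while (true)
            {
                var rentSnapshot = Volatile.Read(ref this.rentIndex);
                if (rentSnapshot >= Volatile.Read(ref this.returnIndex))
                {
                    return new Record(); // pool empty, allocate a fresh record
                }

                if (Interlocked.CompareExchange(ref this.rentIndex, rentSnapshot + 1, rentSnapshot) == rentSnapshot)
                {
                    // The slot may be empty if a racing Return has not finished
                    // writing yet; fall back to allocation in that case.
                    return Interlocked.Exchange(ref this.pool[rentSnapshot % this.pool.Length], null) ?? new Record();
                }
            }
        }

        public void Return(Record record)
        {
            record.Clear();

            while (true)
            {
                var rentSnapshot = Volatile.Read(ref this.rentIndex);
                var returnSnapshot = Volatile.Read(ref this.returnIndex);

                if (returnSnapshot - rentSnapshot >= this.pool.Length)
                {
                    return; // buffer is full, let the GC reclaim the record
                }

                if (Interlocked.CompareExchange(ref this.returnIndex, returnSnapshot + 1, returnSnapshot) == returnSnapshot)
                {
                    this.pool[returnSnapshot % this.pool.Length] = record;
                    return;
                }
            }
        }
    }

The shared pool is deliberately lossy: if a return races and a record cannot be placed back, it is simply dropped and reclaimed by the GC.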

TODOs

  • Appropriate CHANGELOG.md updated for non-trivial changes

@CodeBlanch requested a review from a team as a code owner on June 18, 2022 17:46
codecov bot commented Jun 18, 2022:

Codecov Report

Merging #3385 (c9dd94e) into main (09ef47d) will decrease coverage by 0.13%.
The diff coverage is 96.64%.


@@            Coverage Diff             @@
##             main    #3385      +/-   ##
==========================================
- Coverage   86.36%   86.22%   -0.14%     
==========================================
  Files         258      261       +3     
  Lines        9269     9411     +142     
==========================================
+ Hits         8005     8115     +110     
- Misses       1264     1296      +32     
Impacted Files Coverage Δ
src/OpenTelemetry/BatchExportProcessor.cs 84.11% <66.66%> (+2.20%) ⬆️
...penTelemetry/Logs/BatchLogRecordExportProcessor.cs 91.66% <75.00%> (-8.34%) ⬇️
src/OpenTelemetry/Logs/OpenTelemetryLogger.cs 86.66% <90.90%> (-2.81%) ⬇️
src/OpenTelemetry/Batch.cs 97.58% <94.11%> (-1.34%) ⬇️
...rter.InMemory/InMemoryExporterLoggingExtensions.cs 90.00% <100.00%> (-0.91%) ⬇️
src/OpenTelemetry/CompositeProcessor.cs 95.31% <100.00%> (ø)
src/OpenTelemetry/Logs/LogRecord.cs 64.35% <100.00%> (-19.65%) ⬇️
.../OpenTelemetry/Logs/OpenTelemetryLoggerProvider.cs 96.00% <100.00%> (+0.83%) ⬆️
src/OpenTelemetry/Logs/Pool/LogRecordPoolHelper.cs 100.00% <100.00%> (ø)
src/OpenTelemetry/Logs/Pool/LogRecordSharedPool.cs 100.00% <100.00%> (ø)
... and 9 more

@reyang (Member) left a comment:

LGTM with a changelog entry.

}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal int RemoveReference()

Member Author:

How passionately do you feel about this? I looked around and don't see any good example or prior art in .NET. AddRef with Release seems a bit odd to me for a .NET API. AddRef and RemoveRef would be better, I think, or AddRef with ReleaseRef.
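
For context, a minimal sketch of the reference-counting pair being discussed. Only RemoveReference appears in the diff context above; the class name, the field, and the AddReference counterpart below are assumptions for illustration:

    using System.Runtime.CompilerServices;
    using System.Threading;

    internal sealed class PooledRecord // hypothetical stand-in for LogRecord
    {
        // Starts at 1 for whoever rented the record; each additional holder
        // (e.g. a Batch<LogRecord> containing it) adds a reference and
        // releases it when done.
        private int poolReferenceCount = 1;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal int AddReference() => Interlocked.Increment(ref this.poolReferenceCount);

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal int RemoveReference() => Interlocked.Decrement(ref this.poolReferenceCount);
    }

Whatever the pair ends up being called, the shape is the same: the record only goes back to the pool when the count drops to zero.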

Member:

only 1% 😄

src/OpenTelemetry/Batch.cs (outdated thread, resolved)
@TimothyMothra (Contributor) left a comment:

LGTM

return true;
}

current = current.Next;
Member:

I guess this may be improbable but still possible: there might be a concurrency issue here in the event another thread changes the value of Next.

Member Author:

Now that you mention it, yeah, there are some issues here. I don't think AddProcessor is thread-safe for tracer or logger providers. I'm not going to tackle it in this PR. I don't even know if it is worth tackling at all, but we could at least add comments on our "AddProcessor" methods about the thread safety?

@reyang (Member) commented Jun 23, 2022:

This is the current behavior of AddProcessor:

  1. If multiple threads are calling AddProcessor simultaneously, the caller should be responsible for doing proper synchronization (this follows the .NET general approach - member functions are not thread safe by default).
  2. It is safe if a thread is calling AddProcessor while other threads are using the processors (e.g. by generating Activity objects which go through the processor chain).

And here is my general thinking regarding thread safety:

  1. All the telemetry APIs must be thread safe, following the OpenTelemetry spec.
  2. Any other APIs (e.g. the APIs exposed by the SDK package) should follow the general .NET guidance, which is:
    • static methods should be thread safe
    • instance member functions are not guaranteed to be thread safe
  3. Certain functions might provide additional thread safety guarantees in addition to 2), and this should be called out explicitly in the API XML documentation:
    /// <remarks>
    /// This function guarantees thread-safety.
    /// </remarks>
    public bool ForceFlush(int timeoutMilliseconds = Timeout.Infinite)

@alanwest (Member) commented Jun 23, 2022:

👍 makes sense. Agreed we should follow .NET's guidance here.

Different concern, but same area of code... someone could do the following:

var processor = new CompositeProcessor<LogRecord>(new SomeProcessor());
loggerProvider.AddProcessor(processor);
processor.AddProcessor(new BatchLogRecordExportProcessor(new SomeExporter()));

The provider would end up using the ThreadStaticPool, correct? Maybe this is far-fetched as well and not worth doing anything about 😆. I suppose it would require constantly inspecting for any changes to the processor pipeline.

Member Author:

Ya, that's a problem. We could add an event on CompositeProcessor, like internal EventHandler ProcessorAdded, that we fire and listen to for changes (a rough sketch follows below). Just an idea; I'm not going to do anything on this PR to address it.
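
A rough sketch of that idea (not implemented in this PR). It assumes CompositeProcessor<T> is not sealed and that AddProcessor is not virtual, so the sketch hides the method in a derived type; the suggestion above is really to put the event on CompositeProcessor itself:

    using System;
    using System.Collections.Generic;
    using OpenTelemetry;

    // Hypothetical: raise an event whenever a child processor is appended so
    // the provider could re-evaluate which LogRecord pool it should be using.
    internal sealed class ObservableCompositeProcessor<T> : CompositeProcessor<T>
    {
        public ObservableCompositeProcessor(IEnumerable<BaseProcessor<T>> processors)
            : base(processors)
        {
        }

        internal event EventHandler? ProcessorAdded;

        public new CompositeProcessor<T> AddProcessor(BaseProcessor<T> processor)
        {
            var result = base.AddProcessor(processor);
            this.ProcessorAdded?.Invoke(this, EventArgs.Empty);
            return result;
        }
    }

Because the method is hidden rather than overridden, calls made through a CompositeProcessor<T>-typed reference would bypass the event, which is exactly why it would have to live on CompositeProcessor itself to be reliable.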

Member:

You probably won't like this idea, but could drop the ThreadStaticPool for now - if we went with the event solution, definitely seems fit for its own PR. I assume the SharedPool would still have perf benefits even when no batching is involved.

That said, not a blocker at all, I think it's probably fine to roll with this as a known edge case - the thing is, I think it's an edge case. I haven't encountered users who are modifying their processor pipeline after initialization.

// only one of the logRecords will make it back into the
// pool. Anything lost in the race will be collected by the GC
// and the pool will issue new instances as needed
this.pool[returnSnapshot % this.Capacity] = logRecord;
Contributor:

Instead of losing the logRecord object, we could try to write it to the next available index by using CompareExchange instead of Exchange:

            while (true)
            {
                var rentSnapshot = Volatile.Read(ref this.rentIndex);
                var returnSnapshot = Volatile.Read(ref this.returnIndex);

                if (returnSnapshot - rentSnapshot >= this.Capacity)
                {
                    return; // buffer is full
                }

                if (Interlocked.CompareExchange(ref this.returnIndex, returnSnapshot + 1, returnSnapshot) == returnSnapshot)
                {
-                    this.pool[returnSnapshot % this.Capacity] = logRecord;

+                    if (Interlocked.CompareExchange(ref this.pool[returnSnapshot % this.Capacity], logRecord, null) == null)
+                    {
+                        return;
+                    }
                }
            }

Member Author:

I like this idea; however, I think it might be optimizing for the wrong case. The race called out in the comment is for "many threads hammering rent/return", which is more of a theoretical exercise. The actual use (right now) is many threads renting/reading (log emitters) and one thread returning/writing (batch exporter). In that scenario, this race should essentially never be hit. The Interlocked.CompareExchange is more expensive than just doing this.pool[returnSnapshot % this.Capacity] = logRecord, which should work for the actual use case we have right now. Are you open to me updating the comment with the solution and considerations so we have it captured (in the event the use case changes), but leaving it as-is for now so we have better perf for exporters returning records?

Member Author:

Just updated the comment with more details.

Contributor:

> The race called out in the comment is for "many threads hammering rent/return" which is more of a theoretic exercise. The actual use (right now) is many threads renting/reading (log emitters), one thread returning/writing (batch exporter). In that scenario, this race should really never be hit.

I think we could still have a race for multiple threads returning/writing. We could have an emitter thread returning because OnExport failed and the exporter thread returning simultaneously but I guess it's okay to ignore that scenario as it might not be as common.

@@ -57,9 +57,14 @@ public override void OnEnd(LogRecord data)
// happen here.
Debug.Assert(data != null, "LogRecord was null.");

-data!.BufferLogScopes();
+data!.Buffer();
Member:

Let's add a TODO to add a targeted test for #2905.

Member Author:

I had some tests I did on #2932, so I just updated them and added them to this PR.
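
For readers coming here from #2905, a minimal sketch of the kind of targeted test being discussed; it is not the test code pulled in from #2932, and the DisposingState helper and assertions are assumptions:

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using Microsoft.Extensions.Logging;
    using OpenTelemetry.Logs;
    using Xunit;

    public class LogRecordStateBufferingTests
    {
        [Fact]
        public void ExportedLogRecordSurvivesStateDisposal()
        {
            var exportedItems = new List<LogRecord>();

            using var loggerFactory = LoggerFactory.Create(builder =>
                builder.AddOpenTelemetry(options =>
                {
                    options.ParseStateValues = true;
                    options.AddInMemoryExporter(exportedItems);
                }));

            var logger = loggerFactory.CreateLogger("test");

            var state = new DisposingState("answer", 42);
            logger.Log(LogLevel.Information, default, state, null, (s, e) => "formatted message");

            // Simulate the logging framework disposing/reusing the state
            // object after the Log call returns.
            state.Dispose();

            // The buffered values must still be readable even though the
            // original state object is gone.
            var stateValues = Assert.Single(exportedItems).StateValues;
            Assert.NotNull(stateValues);
            Assert.Equal(42, Assert.Single(stateValues!).Value);
        }

        private sealed class DisposingState : IReadOnlyList<KeyValuePair<string, object?>>, IDisposable
        {
            private List<KeyValuePair<string, object?>>? values;

            public DisposingState(string key, object? value)
                => this.values = new() { new(key, value) };

            public int Count => this.Values.Count;

            public KeyValuePair<string, object?> this[int index] => this.Values[index];

            private List<KeyValuePair<string, object?>> Values
                => this.values ?? throw new ObjectDisposedException(nameof(DisposingState));

            public void Dispose() => this.values = null;

            public IEnumerator<KeyValuePair<string, object?>> GetEnumerator() => this.Values.GetEnumerator();

            IEnumerator IEnumerable.GetEnumerator() => this.GetEnumerator();
        }
    }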


internal void Buffer()
{
this.BufferLogStateValues();
Member:

add a comment/todo for "state" - do we need to do similar for state as well?

Member Author:

Updated

}

var attributeStorage = this.AttributeStorage ??= new List<KeyValuePair<string, object?>>(stateValues.Count);
attributeStorage.AddRange(stateValues);
Member:

Add a comment that this is what triggers the iteration/enumeration to "snap" what we need before things could get disposed.

Member Author:

Updated
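
For readers following along, a sketch of the point being made; the exact wording of the comment added to the PR may differ:

    var attributeStorage = this.AttributeStorage ??= new List<KeyValuePair<string, object?>>(stateValues.Count);

    // AddRange enumerates stateValues right now, copying the key/value pairs
    // into storage owned by the LogRecord, so the data survives even if the
    // original state object is later disposed or reused by the logging
    // framework.
    attributeStorage.AddRange(stateValues);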

@cijothomas (Member) left a comment:

LGTM!
Left a few nits, mostly about adding comments or TODOs.

@CodeBlanch merged commit 9858e3a into open-telemetry:main on Jun 30, 2022
@CodeBlanch deleted the logrecordpool branch on June 30, 2022 16:33
Successfully merging this pull request may close these issues.

  • [Logs] ObjectDisposedException when using ParseStateValues
  • Perf improvements for LogRecord
6 participants