
CAS and volatile approach to fix delta concurrency bug #5976

Merged — 5 commits, Nov 13, 2023

Conversation

@jack-berg (Member) commented Nov 10, 2023

After merging #5932, @trask pointed out that the approach isn't actually non-blocking because it doesn't use readLock().tryLock(), and the sequence below can result in the call to readLock().lock() blocking:

Record thread:
time=1 - Lock readLock = aggregatorHolder.lock.readLock();
time=6 - readLock.lock();

Collect thread:
time=2 - AggregatorHolder<T, U> holder = this.aggregatorHolder;
time=3 - this.aggregatorHolder = new AggregatorHolder<>();
time=4 - Lock writeLock = holder.lock.writeLock();
time=5 - writeLock.lock();
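
For context, a minimal sketch of the read/write-lock scheme from #5932 that this interleaving breaks (hypothetical class and method names, not the actual SDK code): record threads take the read lock of whichever holder they saw, while collect swaps the holder and takes the write lock of the old one.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical simplified holder; the real class is an inner
// AggregatorHolder in DefaultSynchronousMetricStorage.
class Holder {
  final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
}

public class LockSketch {
  volatile Holder holder = new Holder();

  void record() {
    // Read the volatile once; a context switch right here is the hazard:
    Holder h = holder;
    h.lock.readLock().lock(); // blocks if collect already write-locked h
    try {
      // ... aggregate the measurement into h ...
    } finally {
      h.lock.readLock().unlock();
    }
  }

  void collect() {
    Holder old = holder;
    holder = new Holder();       // new recordings go to the fresh holder
    old.lock.writeLock().lock(); // wait for in-flight recordings on old
    old.lock.writeLock().unlock();
    // ... read aggregated state out of old ...
  }

  public static void main(String[] args) {
    LockSketch s = new LockSketch();
    s.record();
    s.collect();
    System.out.println("ok");
  }
}
```

With the timeline above, the record thread captures a reference to the old holder at time=1, collect write-locks it at time=5, and the record thread's lock() at time=6 then blocks.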

I wrote a test and verified this is indeed the case, so back to the drawing board. I adjusted the code to repeatedly try to read the volatile AggregatorHolder, with a tryLock as the loop condition. Something like:

    AggregatorHolder<T, U> aggregatorHolder;
    do {
      aggregatorHolder = this.aggregatorHolder;
    } while (!aggregatorHolder.lock.readLock().tryLock());

But after running the unit test I wrote to verify correctness 100 times, this occasionally fails. The problem with this code is that it's subject to (essentially) the same sequence outlined above: a record thread's call to readLock().tryLock() can succeed after a collect thread has already unlocked its writeLock(), causing lost writes.

After thinking about this more, I came up with a solution where I use AtomicInteger to track the number of outstanding record operations, incrementing and decrementing as they start and complete. The collect thread does a CAS loop waiting for 0 outstanding record operations, and setting the AtomicInteger to -1, which acts as a signal for record threads to re-read the volatile AggregatorHolder.

The performance of this is better than the readWriteLock solution. And it appears to be correct in all cases (I re-ran the unit test 100 times without failure).
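
A minimal sketch of the counter idea described above (the names and the toy sum field are illustrative, not the merged implementation; see the actual diff for the real code): record threads increment an AtomicInteger on entry and decrement on exit, and collect CASes it from 0 to a negative sentinel that tells late-arriving record threads to re-read the volatile holder.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterHolder {
  // Negative value signals "this holder has been swapped out by collect".
  final AtomicInteger activeRecordingThreads = new AtomicInteger();
  long sum; // toy stand-in for the aggregated state
}

public class CasSketch {
  volatile CounterHolder holder = new CounterHolder();

  void record(long value) {
    CounterHolder h;
    while (true) {
      h = holder;
      int active = h.activeRecordingThreads.get();
      // Only enter while the count is non-negative; a negative count means
      // collect owns this holder, so re-read the volatile and retry.
      if (active >= 0
          && h.activeRecordingThreads.compareAndSet(active, active + 1)) {
        break;
      }
    }
    try {
      h.sum += value; // toy aggregator update
    } finally {
      h.activeRecordingThreads.decrementAndGet();
    }
  }

  long collect() {
    CounterHolder old = holder;
    holder = new CounterHolder();
    // CAS loop: wait for 0 outstanding record operations, then park the
    // counter at -1 so stale record threads re-read the volatile.
    while (!old.activeRecordingThreads.compareAndSet(0, -1)) {
      // busy-wait; collect only runs once per export interval
    }
    return old.sum;
  }

  public static void main(String[] args) {
    CasSketch s = new CasSketch();
    s.record(2);
    s.record(3);
    System.out.println(s.collect()); // 5 in this single-threaded run
  }
}
```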

codecov bot commented Nov 10, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Files Coverage Δ
...nternal/state/DefaultSynchronousMetricStorage.java 96.59% <100.00%> (+0.24%) ⬆️

... and 2 files with indirect coverage changes


@jack-berg mentioned this pull request Nov 11, 2023
@trask (Member) left a comment

very nice 👍

I think better comments and naming could make it easier to follow but that could be done in a follow-up.

@asafm (Contributor) commented Nov 13, 2023

@jack-berg I wrote that in my original comment design :)

When we record, we call readLock().tryLock(). If we get false, it means the write lock has been taken and we need to refresh the value from activeAggregatorHandles (explained below why); hence, upon false, we re-read the value at activeAggregatorHandles and call tryLock() again. This should never fail; if it does, fail.

But it doesn't matter, since it's flawed, as you wrote. Very nice catch! I hadn't considered the possibility that the thread reading the aggregatorHandles from the volatile would be context-switched between reading the value and the write lock being released, leaving the map stale.

The currently proposed odd/even AtomicInteger approach has 2 issues:

  1. It busy-waits during collect(), which is not a good approach as it keeps a CPU core at 100% while waiting. Compare this with the writeLock().lock() approach, which to the best of my knowledge doesn't busy-wait.
  2. When reading this you're a bit puzzled; it takes a few "hops" to "get it".

I have a suggestion:

  1. Use read/write lock as before, but with the change of using tryLock as outlined in the description of this PR.
  2. Add a boolean volatile openForRecordings to the holder, with initial value of true.
  3. After taking write lock, you set openForRecordings to false.
  4. Upon readLock().tryLock() returning true, you read openForRecordings; if it is false, this is the signal to re-read the holder volatile and re-obtain the read lock, since you're using a stale value. It will succeed on the 2nd attempt.

This solves the scenario in which you obtain a read lock on a "finished-write" holder (one whose write lock has already been released).
It doesn't busy-wait in collect(), but uses the existing writeLock().lock() approach to wait for all outstanding read locks.
I think it might be a bit easier to understand, although it still requires docs to explain this weird piece of code to the first-time reader.
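
A rough sketch of the suggestion above (not tested against the SDK; the holder shape, the toy sum field, and the openForRecordings name follow the naming proposed in this comment):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class FlagHolder {
  final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  volatile boolean openForRecordings = true;
  long sum; // toy stand-in for the aggregated state
}

public class FlagSketch {
  volatile FlagHolder holder = new FlagHolder();

  void record(long value) {
    while (true) {
      FlagHolder h = holder;
      if (!h.lock.readLock().tryLock()) {
        continue; // write lock held: re-read the volatile and retry
      }
      try {
        if (h.openForRecordings) {
          h.sum += value; // toy aggregator update
          return;
        }
        // "finished-write" holder: collect already closed it; retry on
        // the fresh holder re-read from the volatile.
      } finally {
        h.lock.readLock().unlock();
      }
    }
  }

  long collect() {
    FlagHolder old = holder;
    holder = new FlagHolder();
    old.lock.writeLock().lock();   // waits, without busy-looping, for readers
    old.openForRecordings = false; // set after taking the write lock
    old.lock.writeLock().unlock();
    return old.sum;
  }

  public static void main(String[] args) {
    FlagSketch s = new FlagSketch();
    s.record(4);
    s.record(6);
    System.out.println(s.collect()); // 10 in this single-threaded run
  }
}
```

The flag closes the gap in the plain tryLock loop: even if tryLock succeeds on a holder whose write lock was already released, openForRecordings is false by then, so the record thread discards the stale holder and retries.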

WDYT?

@jack-berg (Member, Author)

I think there are improvements that could be made to this approach, but we're out of time. The release intended for Friday is 3 days late, and going through another round of development / performance testing / review would likely push it at least another day (based on the schedule I have). Our options are essentially: merge this PR, or revert the original fix #5932 (there's no point including code which is both blocking and not correct all the time).

I'm going to push some minor updates to improve the naming / comments, merge this PR, and cut the release. I'm happy to continue iterating on this for the December release.

@trask (Member) commented Nov 13, 2023

> It uses busy wait during collect()

The key point to me is that it is effectively only an "at most 2 iterations" loop, since collect() is only called every OTEL_METRIC_EXPORT_INTERVAL.

An option could be to codify that in the algorithm (and avoid some super weird degenerate thread prioritization) by no-op'ing record() on the third or fourth loop iteration.

ugh, I mixed up collect() and record() in my head, please disregard this comment

@jack-berg merged commit 72a5bb1 into open-telemetry:main Nov 13, 2023
18 checks passed
@jack-berg (Member, Author)

> It uses busy wait during collect()

> the key to me is that it is effectively only an "at most 2 times" loop, since the collect() thread is only called every OTEL_METRIC_EXPORT_INTERVAL

The record threads loop at most 2 times, but the collect thread could loop more, since its loop waits for the record threads to finish and decrement before continuing:

    int recordsInProgress = holder.activeRecordingThreads.addAndGet(1);
    while (recordsInProgress > 1) {
      recordsInProgress = holder.activeRecordingThreads.get();
    }

@trask (Member) commented Nov 13, 2023

oh I mixed up collect() and record() in my head (again?) 🤦‍♂️
