Skip to content

Commit

Permalink
Make LifetimeManager lockfree
Browse files Browse the repository at this point in the history
We only need to be able to determine if a lease is active
or there was any activity in the last timer period.
That is now done 2 atomic counters - a count of leased
leases and a count returned leases.

The timer still has a potential race if a new lease occurs
right after the timer checks it didn't occur and attempts to
delete it. This was there even with the locks and this doesn't
make any worse (AFAIK)

Performance of single-threaded benchmark is unaffected,
the 16 thread test sees 1.5-3x improvement (it is noisy :/)

Counter (4 runs) | 1339ms...1971ms -> 881ms...939ms
Histogram (4 runs) | 2021ms...2158ms -> 771ms...928ms

The improvement is this small only because there is also
a RWLock which not that trivial to remove (and no,
.NET RWLock is not lockfree with purely read workload)
  • Loading branch information
exyi committed Nov 29, 2023
1 parent 170ecdd commit 4e86977
Showing 1 changed file with 25 additions and 33 deletions.
58 changes: 25 additions & 33 deletions Prometheus/ManagedLifetimeMetricHandle.cs
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
using System.Collections.Concurrent;
using System.Collections.Concurrent;

namespace Prometheus;

Expand Down Expand Up @@ -96,12 +96,14 @@ public LifetimeManager(TChild child, TimeSpan expiresAfter, IDelayer delayer, Ac
private readonly Action<TChild> _remove;

private readonly object _lock = new();
private int _leaseCount = 0;

// Taking or releasing a lease will always start a new epoch. The expiration timer simply checks whether the epoch changes between two ticks.
// If the epoch changes, it must mean there was some lease-related activity and it will do nothing. If the epoch remains the same and the lease
// count is 0, the metric has expired and will be removed.
private int _epoch = 0;
// We count the number of Lease and Release calls, this allows us to determine whether
// 1. there are any active leases (when _leaseCount != _releaseCount)
// 2. whether there was any activity during the last timer iteration

// If both conditions are false, the metric has expired and will be removed.
private long _leaseCount = 0;
// 1 cache line padding - 64 bytes
private long _releaseCount = 0;

// We start the expiration timer the first time a lease is taken.
private bool _timerStarted;
Expand All @@ -127,55 +129,45 @@ internal IDisposable TakeLeaseFast()

private void TakeLeaseCore()
{
lock (_lock)
{
EnsureExpirationTimerStarted();

_leaseCount++;
unchecked { _epoch++; }
}
EnsureExpirationTimerStarted();
Interlocked.Increment(ref _leaseCount);
}

private void ReleaseLease()
{
lock (_lock)
{
_leaseCount--;
unchecked { _epoch++; }
}
Interlocked.Increment(ref _releaseCount);
}

private void EnsureExpirationTimerStarted()
{
if (_timerStarted)
if (Volatile.Read(ref _timerStarted))
return;

_timerStarted = true;
lock(_lock)
{
if (_timerStarted)
return;
_timerStarted = true;

_ = Task.Run(ExecuteExpirationTimer);
_ = Task.Run(ExecuteExpirationTimer);
}
}

private async Task ExecuteExpirationTimer()
{
while (true)
{
int epochBeforeDelay;

lock (_lock)
epochBeforeDelay = _epoch;
long leaseCountBeforeDelay = _leaseCount;

// We iterate on the expiration interval. This means that the real lifetime of a metric may be up to 2x the expiration interval.
// This is fine - we are intentionally loose here, to avoid the timer logic being scheduled too aggressively. Approximate is good enough.
await _delayer.Delay(_expiresAfter);

lock (_lock)
{
if (_leaseCount != 0)
continue; // Will not expire if there are active leases.
if (_leaseCount != _releaseCount)
continue; // Will not expire if there are active leases.

if (_epoch != epochBeforeDelay)
continue; // Will not expire if some leasing activity happened during this interval.
}
if (_leaseCount != leaseCountBeforeDelay)
continue; // Will not expire if some leasing activity happened during this interval.

// Expired!
//
Expand Down

0 comments on commit 4e86977

Please sign in to comment.