[C#] Could Disk Reads be FASTER? #238
Comments
There is a tension between sync and async code here. For sync, the better alternative is `Thread.Yield` (benchmarked in the past). But for async, the better alternative might be what you suggest (spin), although I have not benchmarked this yet. Is the observed cost (throughput) with async degraded enough to justify a change or a config here?
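For reference, a rough sketch of the two completion paths under discussion (FASTER v1-era `ClientSession` API; exact signatures vary by version, so treat this as illustrative):

```csharp
// Sync path: CompletePending(true) spin-waits (currently via Thread.Yield)
// until the session's pending disk reads have drained.
session.CompletePending(true);

// Async path: no spin loop on a thread at all; the continuation resumes
// once the outstanding I/O completes.
await session.CompletePendingAsync();
```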
Actually, async code is already avoiding the `Thread.Yield()`. The other issues, though, like batching reads and chaining RMWs, are still alive.
Then is this PR still relevant: #241? Will consider the other comments separately.
I closed the PR.
Closing as we have optimized the main goal of reads. We can create new issues if we plan to work on any other issues/ideas mentioned here.
I've got pretty much every basic operation covered on my POC.
Repo: https://github.com/ThiagoT1/FasterDictionary/tree/unsafe-0.0.3-fastread
Iteration through the whole KV has OK(ish) performance.
https://github.com/ThiagoT1/FasterDictionary/blob/96407cf3e476f612eb2eb72e895af4c6f9358850/src/FasterDictionary/FasterDictionary.Iterator.cs#L68-L71
Issue
What's killing me now is random-key read performance. I'm talking single-threaded access to the KV, where a single reader has to retrieve values for thousands of random keys.
Adding, updating once, and iterating over 1M keys takes less than 2 minutes:
https://github.com/ThiagoT1/FasterDictionary/blob/026248db7bfca8f5b72c22481f732babd8237c8e/tests/FasterDictionary.Tests/UseCases/ArrayAsKeyTests.cs#L157-L190
However, doing the same while replacing the iteration with a 1M key lookup takes 30 minutes:
https://github.com/ThiagoT1/FasterDictionary/blob/026248db7bfca8f5b72c22481f732babd8237c8e/tests/FasterDictionary.Tests/UseCases/ArrayAsKeyTests.cs#L136-L141
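For context, this is roughly the per-key lookup pattern behind those numbers (a sketch against the FASTER v1 session API; `keys`, `input`, and `output` are placeholders):

```csharp
// Each random key issues an independent Read. Keys that miss memory go PENDING,
// and each pending disk read is then drained synchronously, one at a time,
// which is where the wall-clock time goes.
for (int i = 0; i < keys.Length; i++)
{
    var status = session.Read(ref keys[i], ref input, ref output, Empty.Default, i);
    if (status == Status.PENDING)
        session.CompletePending(true); // blocks until the single disk read returns
}
```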
This is a regular SSD, with roughly 500 MB/s read/write.
Looking at the CPU time eaters in VS 2019, we see that the culprit is `ClientSession<,,,,,>.CompletePending`, which in turn blames the `Thread.Yield()` inside `FasterKV<,,,,,>.InternalCompletePending`.

Replacing the `Thread.Yield()` with a `SpinWait.SpinOnce()` maintains functionality while greatly reducing CPU usage. Throughput, however, stays low, since the real bottleneck is the granularity of disk reads (the 30-minute mark above is with this change).
Use Cases
There are two scenarios where one of these strategies (single-threaded iteration or massive random-key lookups) comes into play:
1) Iterate through a subset of the key set (Read Only)
In this case, every key in the subset will have to be passed to `KVSession.Read`, which will have awful performance. This strategy could be replaced (by the KV user) with a whole-KV iteration, dismissing keys not in the subset. Not great, though.

Proposal
The knowledge that multiple keys need retrieval should be passed to the KV, by means of a Read method that accepts multiple keys and returns an `IAsyncEnumerable<ReadResult>` that can be used to process all recovered values.
Such a KV method would use that key array to optimize disk access (leveraging address and segment locality), by means of reading larger chunks less often, whenever possible.
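A possible shape for that API (entirely hypothetical; `ReadResult` and `ReadBatchAsync` are names invented here for illustration):

```csharp
using System.Collections.Generic;
using System.Threading;

public readonly struct ReadResult<TKey, TValue>
{
    public readonly TKey Key;
    public readonly TValue Value;
    public readonly bool Found;

    public ReadResult(TKey key, TValue value, bool found)
    {
        Key = key; Value = value; Found = found;
    }
}

public interface IBatchReadSession<TKey, TValue>
{
    // Sorts the pending keys by log address internally, so adjacent records on
    // the same page/segment can be fetched with one larger read; results are
    // yielded as they complete, not necessarily in input order.
    IAsyncEnumerable<ReadResult<TKey, TValue>> ReadBatchAsync(
        IReadOnlyList<TKey> keys,
        CancellationToken cancellationToken = default);
}
```

Consumption would then be a simple `await foreach` over the returned stream.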
2) Massive RMW of Individual Keys Already on Disk
In this case, both the RMW'er and the reader are currently slow; plus (AFAIK), the reader may get an old value while we wait for the disk access to complete (and the RMW continuation to run).
Proposal
RMWs should atomically mark the record as dirty and cause any Read of that key to yield a status meaning we are waiting on an RMW to complete. The reader could optionally continue reading the (possibly) old value, or place a continuation for when the RMW completes, which would guarantee a read that is fresh up to that point.
This would also let the RMW'er quickly keep issuing RMW commands for the next keys, since a continuation for the actual RMW operation can be placed and there's a guarantee that any readers can wait for the RMW to complete.
The code that actually does the RMW after the disk read could be provided by a new Functions instance, as per #230.
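Sketched as reader-side code (the status value and wait API are hypothetical, invented here only to illustrate the proposed contract):

```csharp
var status = session.Read(ref key, ref input, ref output, Empty.Default, serialNo);

if (status == Status.RMW_IN_PROGRESS) // hypothetical status: record marked dirty by an in-flight RMW
{
    // Option A: accept the (possibly old) value that Read produced.
    // Option B: wait for the in-flight RMW, guaranteeing freshness up to this point.
    await session.WhenRmwCompletesAsync(key); // hypothetical continuation hook
    status = session.Read(ref key, ref input, ref output, Empty.Default, serialNo + 1);
}
```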
It should also be noted that FASTER was first written without the TPL in mind, and full adherence to TPL semantics would see the Functions class callbacks gone. But that's probably for another day, isn't it? :)
Also, please note that the same tests without updates complete in under 40 seconds. The detail here is that those updates change the value length on purpose. This forces the `ConcurrentWriter` call to return false, which in turn requires allocating a new slot on the log.
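For readers unfamiliar with that path, a sketch of why a length change forces a new record (modeled on FASTER's `IFunctions.ConcurrentWriter`, whose exact signature varies by version; the `Length`/`CopyTo` members belong to a hypothetical variable-length value type):

```csharp
public override bool ConcurrentWriter(ref Key key, ref Value src, ref Value dst)
{
    // In-place update is only safe when the new value fits in the existing record.
    if (src.Length > dst.Length)
        return false; // doesn't fit: FASTER must allocate a new slot on the log

    src.CopyTo(ref dst); // fits: update in place
    return true;
}
```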
Thanks a lot!