Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 246 continuations wiring (II) #248

Merged
merged 22 commits into from
Mar 9, 2020

Conversation

ThiagoT1
Copy link
Contributor

@ThiagoT1 ThiagoT1 commented Feb 21, 2020

Important:

  1. Sync methods behavior and API are unchanged
  2. The only flow control changed is ReadAsync. Other async methods are unchanged.

Properly Wire ReadAsync Continuations.

Every commit tries to stage a step towards the PR objective.

The main point of concern is the RelaxedCPR treatment on cs/src/core/Index/FASTER/FASTER.cs

Fixes #246

@ThiagoT1
Copy link
Contributor Author

Last commit done in attempt to meet FASTER concurrency model requirements.

All tests passing, including comprehensive ones on dependent libs, like FasterDictionary:

image

image

cs/src/core/Allocator/AsyncIOContext.cs Outdated Show resolved Hide resolved
cs/src/core/Index/FASTER/FASTER.cs Outdated Show resolved Hide resolved
cs/src/core/Index/FASTER/FASTERThread.cs Show resolved Hide resolved
@badrishc
Copy link
Contributor

I am worried about the depth of changes in this PR as they break subtle invariants in the system. We will step back and (1) fix the multi-async bug; (2) make sure your use case gets high performance. I think these goals can be accomplished without modifying the mono-threaded session access invariant.

@badrishc
Copy link
Contributor

badrishc commented Feb 26, 2020

Also, FASTER is a latch-free library. We try to not take locks anywhere, except in one case during checkpointing, when forcing a session to move from one version to another. This is achieved by ensuring that only the mono-threaded session accesses/updates any state.

Thus, we need the async continuations to simply be added to the async response queue (which already uses a latch-free ConcurrentQueue implementation), from which the mono-threaded session will remove elements.

Copy link
Contributor

@badrishc badrishc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the user does RMWAsync followed by ReadAsync with waitForCommit true. We would require the older RMW to complete as well, as part of this wait for commit.


await diskRequest.ValueTaskOfT;

lock (clientSession.ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not want to take the overhead of locking the client session. This is the reason FASTER is mono-threaded. All work for a session is done on a single thread, and this PR breaks that contract.

Copy link
Contributor Author

@ThiagoT1 ThiagoT1 Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the user does RMWAsync followed by ReadAsync with waitForCommit true. We would require the older RMW to complete as well, as part of this wait for commit.

That should be ensured by ClientSession.WaitForCommitAsync (that went unchanged), no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If user does this sequence, the expectation is that all operations until that point are committed:

s.RMWAsync();
await s.ReadAsync(waitForCommit: true);

Copy link
Contributor Author

@ThiagoT1 ThiagoT1 Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not want to take the overhead of locking the client session. This is the reason FASTER is mono-threaded. All work for a session is done on a single thread, and this PR breaks that contract.

I see.

However, currently there's other places where there's shared memory, which current synchronization techniques are equivalent to locks, even with the possibility of a context switch. Those are ConcurrentDictionaries and ConcurrentQueues.

Would that be better if we switched the lock to Interlocked calls and a Spinner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If user does this sequence, the expectation is that all operations until that point are committed:

s.RMWAsync();
await s.ReadAsync(waitForCommit: true);

This is happening.
RMWAsync will put an entry inside the pending requests collection, that in turn will be monitored by the CompletePendingAsync call inside WaitForCommitAsync

@ThiagoT1
Copy link
Contributor Author

This is achieved by ensuring that only the mono-threaded session accesses/updates any state.

if this is about FASTER.cs:389 lock (clientSession.ctx), it was added since Native32.Read callbacks do not get executed in a mono-threaded way. And the continuation in the PR is following the same execution path.
Thus, such code will also be executed in a multi threaded way, hence the lock, that by the way also happens inside ConcurrentQuere, albeit by way of Interlocked methods and spinners.

Thus, we need the async continuations to simply be added to the async response queue (which already uses a latch-free ConcurrentQueue implementation), from which the mono-threaded session will remove elements.

This would have ate least these consequences:

  1. The AsyncQueue only releases a single awaiter to go, every time an enqueue happens, so
  2. Even if this awaiter completed someone elses continuation and stored it somewhere, the actual awaiter would still be hung on DequeueAsync (maybe forever)
  3. To counter that, we would need a Timeout (lots of allocation, bcz its a task array with two tasks in it, at least)
  4. other option would be to Re-Enqueue the continuation on AsyncQueue. Then we could have endless racing between several awaiters, plus a new Task and StateMachine every loop.

In the end, we would need to use other type of queue, that would signal (by a task) every awaiter, so they can all just discard a continuation thats not theirs and re await for the next completion, until theirs come around, and they can return.

That would produce a task for every completion multiplied by maybe a new state machine for every awaiter loop interation.

try
{
do
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop is not identical in behavior to the original code. For example, I notice that InternalRefresh would not be called in this tight loop, resulting in never getting out of CPR. The debug asserts for version correctness are also skipped. We need to ensure that all code paths are functionally identical to the older case.

Why not use an async equivalent of HandleOperationStatus? This keeps the main FasterKV.ReadAsync method small so it can be inlined, leaving the complexity of uncommon cases to a sub-method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop is not identical in behavior to the original code. For example, I notice that InternalRefresh would not be called in this tight loop, resulting in never getting out of CPR. The debug asserts for version correctness are also skipped. We need to ensure that all code paths are functionally identical to the older case.

You're right. Thank you! The last commit just fixed this.

Why not use an async equivalent of HandleOperationStatus? This keeps the main FasterKV.ReadAsync method small so it can be inlined, leaving the complexity of uncommon cases to a sub-method.

It was done this way because HandleOperationStatus has a lot of cases to thread.

Actually, any status different from Success or NotFound will cause it to be called.
It will even trigger the disk IO if needed.

That's where we need to fork paths.

When the disk IO is needed, the ReadAsync path will do the new wiring, thus we cannot call HandleOperationStatus at this point, and should just prepare for the await.

It's important to notice that the only changed path was the ReadAsync one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, current code will error out the Read if 2 consecutive CPR_SHIFT_DETECTED happen.

Should we keep this? I think this could be in error.

What is keeping the overall state to be in CPR_SHIFT_DETECTED twice, while being on other states in between the two tries, in a classic ABA Problem?

Copy link
Contributor Author

@ThiagoT1 ThiagoT1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesterday's comments handled.

try
{
do
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop is not identical in behavior to the original code. For example, I notice that InternalRefresh would not be called in this tight loop, resulting in never getting out of CPR. The debug asserts for version correctness are also skipped. We need to ensure that all code paths are functionally identical to the older case.

You're right. Thank you! The last commit just fixed this.

Why not use an async equivalent of HandleOperationStatus? This keeps the main FasterKV.ReadAsync method small so it can be inlined, leaving the complexity of uncommon cases to a sub-method.

It was done this way because HandleOperationStatus has a lot of cases to thread.

Actually, any status different from Success or NotFound will cause it to be called.
It will even trigger the disk IO if needed.

That's where we need to fork paths.

When the disk IO is needed, the ReadAsync path will do the new wiring, thus we cannot call HandleOperationStatus at this point, and should just prepare for the await.

It's important to notice that the only changed path was the ReadAsync one.

try
{
do
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, current code will error out the Read if 2 consecutive CPR_SHIFT_DETECTED happen.

Should we keep this? I think this could be in error.

What is keeping the overall state to be in CPR_SHIFT_DETECTED twice, while being on other states in between the two tries, in a classic ABA Problem?

… it.

It then is the user role to await every ReadAsync call eventually
@ThiagoT1
Copy link
Contributor Author

ThiagoT1 commented Mar 7, 2020

As per last discussions, a new struct is wrapping the compute completion logic, so the user can trigger it.
The completion lock is gone and the user must call ReadAsyncResult.CompleteRead in a single threaded manner.

@ThiagoT1
Copy link
Contributor Author

ThiagoT1 commented Mar 7, 2020

BTW,

Thank you @badrishc for the patience and the high level discussions about this issue.

/// <summary>
/// Async Operation ValueTask backer
/// </summary>
public FasterAsyncOperation<AsyncIOContext<Key, Value>> asyncOperation;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to use all the complicated logic in FasterAsyncOperation vs a TaskCompletionSource that we set completed when IO is done?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid introducing new code if we can avoid it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
TCS will yield a task (class) while our FAO will yield and back a ValueTask (struct).
Its one allocation less.

Lots of code inside .net core itself are using IValueTaskSource as is FasterAsyncOperation, as to avoid allocating a Task when possible.

@badrishc badrishc merged commit 26569b2 into microsoft:master Mar 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C#] Continuations Should be Properly Wired
2 participants