Support async/await session model #130

badrishc · 2019-05-21T00:56:32Z

The FASTER model is based on a set of threads, each of which can start a session with FASTER, perform a sequence of operations, and later stop the session. Operations consist of bursts of activity, with periodic invocations of a Refresh() call to FASTER in between. It would not be okay for a thread/session to go to sleep (i.e., stop calling Refresh) without stopping its session. Sessions and threads are tightly integrated in this model.

However, this tight coupling of threads and sessions in FASTER is not ideal for an async/await API using the C# thread pool. This PR aims to investigate and prototype alternatives to this approach, and implement the best one. There is next to no code right now, this PR for now serves as a place to discuss options and alternatives.

Adding @JorgeCandeias @gunaprsd @ReubenBond for comments.

Fix #140

badrishc · 2019-05-21T01:05:12Z

The first option is to eliminate thread-local variables in FASTER entirely, and have logical sessions that may move across threads. This will allow FASTER to provide persistence guarantees for a logical session (e.g., commit/persist all operations with LSN < k on the logical session), regardless of which thread the session is currently executing on. This means users can seamlessly use other async operations in their code.

Every FASTER operation would then need a session ID parameter, which is a bit ugly. Perhaps the default could still be “threads==sessions” behavior by using a session-free overload, where a thread-local variable is used to store the default session ID.

Currently, FASTER operations that go async are affinitized back to the thread that issued the I/O. The client is supposed to call CompletePending to "resume" these operations on that thread. We could leave this as it is, with logical sessions expected to call CompletePending to complete the I/Os on that session. However, it may instead be interesting to support the async model for these operations, where the operation returns a Task instead.

Internally, we could use TaskCompletionSource to handle the async operations. We could create a simplified async API layer for FASTER that simply uses FASTER’s Context generic parameter to store the TaskCompletionSource. An async callback can then complete the task using this context. Since we are in async world, there is no need for a separate user-facing Context anyway.

Note that switching a session from one thread to another is generally expensive, as the session context needs to move to the other thread. However, ideally, this would happen rarely. In the common synchronous case, we would expect a thread to have a single session to FASTER, leading to high performance very similar to the current thread-local session model.

itn3000 · 2019-05-27T02:19:40Z

How would you like to use ValueTask + IValueTaskSource?
It is used in library which is required high performance like System.Threading.Channels and System.IO.Pipelines.
It may be more efficient than using TaskCompletionSource.

JorgeCandeias · 2019-05-28T16:01:00Z

The first option is to eliminate thread-local variables in FASTER entirely, and have logical sessions that may move across threads. This will allow FASTER to provide persistence guarantees for a logical session (e.g., commit/persist all operations with LSN < k on the logical session), regardless of which thread the session is currently executing on. This means users can seamlessly use other async operations in their code.

Every FASTER operation would then need a session ID parameter, which is a bit ugly. Perhaps the default could still be “threads==sessions” behavior by using a session-free overload, where a thread-local variable is used to store the default session ID.

One option to do away with threads could be for the lookup to pool sessions, renting them out and taking them back as needed. That could enable user syntax such as this:

using (ISession session = lookup.OpenSession())
{
    await session.UpsertAsync( /* stuff here */ )
}

In the model above, the ISession is just a shell for an object that contains a unique session id and a reference to the owning lookup. Calls on the session translate to calls on the lookup using the actual session. Disposing ISession returns the underlying rented session back to the owning session pool, while ensuring Refresh() happens at some point.

Currently, FASTER operations that go async are affinitized back to the thread that issued the I/O. The client is supposed to call CompletePending to "resume" these operations on that thread. We could leave this as it is, with logical sessions expected to call CompletePending to complete the I/Os on that session. However, it may instead be interesting to support the async model for these operations, where the operation returns a Task instead.

Internally, we could use TaskCompletionSource to handle the async operations. We could create a simplified async API layer for FASTER that simply uses FASTER’s Context generic parameter to store the TaskCompletionSource. An async callback can then complete the task using this context. Since we are in async world, there is no need for a separate user-facing Context anyway.

This what the Orleans sample is doing now. The drawback is that we now need an extra allocation for every operation, even those completing on the sync path. At this time is hard to tell what performance difference it makes due to the overhead from everything else. An implementation using ValueTask and a pooled task factory may have some merit.

Note that switching a session from one thread to another is generally expensive, as the session context needs to move to the other thread. However, ideally, this would happen rarely. In the common synchronous case, we would expect a thread to have a single session to FASTER, leading to high performance very similar to the current thread-local session model.

I think the pooled session model above could work, by making the session pool return the session that is affinitized to the running thread (instead of just an id) and only create a new one if it does not exist yet. So it would be more of a GetOrCreate()-style lookup than an actual pool. I suspect there would be some overhead as the ThreadPool adds and removes threads but in the case of Orleans that would be amortized down as the system lives on.

I'll take a stab at the AsyncFasterKV in this branch to see if this could work.

JorgeCandeias · 2019-05-28T16:03:06Z

How would you like to use ValueTask + IValueTaskSource?
It is used in library which is required high performance like System.Threading.Channels and System.IO.Pipelines.
It may be more efficient than using TaskCompletionSource.

Indeed, this may have some merit, as operations in FASTER can complete sync or async.

badrishc · 2019-05-29T01:16:53Z

The idea of an ISession is nice. User-facing sessions are Guids, and may need to be mapped to internal "sessions" which will simply be offsets in the epoch table. A session abstraction can potentially avoid an explicit map between session Guid and the table offset.
Yeah, the extra allocation is not good. We really need the normal sync code path to be as lightweight as it is right now. One crazy idea is to make context an out param, allowing a user defined callback to create the context when it is needed.

I think the pooled session model above could work, by making the session pool return the session that is affinitized to the running thread (instead of just an id) and only create a new one if it does not exist yet.

If the pooled session was simply going to return the session associated with the thread, then what would happen after an async call returns on a different thread? Would the code continue to use that same old session, now on a different continuation thread? What about another session that now tries to continue on the older thread? It seems that we would need sessions to be independent of threads, right?

JorgeCandeias · 2019-05-29T10:34:41Z

One crazy idea is to make context an out param, allowing a user defined callback to create the context when it is needed.

At least this can get the shell AsyncFasterKV working early without affecting the sync path or the core FasterKV too much.

If the pooled session was simply going to return the session associated with the thread, then what would happen after an async call returns on a different thread? Would the code continue to use that same old session, now on a different continuation thread? What about another session that now tries to continue on the older thread? It seems that we would need sessions to be independent of threads, right?

You're right, that would open a can of worms, if it would even work at all. Best to split those concerns from the onset.

As a first step, I'd just exploit the context parameter as the Orleans sample does it, and either make the context on the FasterKV an out or a ref to allow delaying the allocation up to when we hit the async path. That will be a breaking change on the current surface, but at least we can test it. And while a ValueTask sync path may have merit, the problem is that we can only await on it once. That's fine for tight-controlled internals, but prone to bugs on a public surface. Let's see how far we can get with plain old Tasks.

ReubenBond · 2019-06-26T17:43:52Z

I think the pooled session model is appropriate. The sessions would essentially be async-local instead of thread-local.

Alternatively, each grain could rent a session at activation and return it at deactivation. That would require something to pulse Refresh(), which we could potentially add hooks for (or use a timer).

Thread-local has issues with a dynamically sized thread pool such as .NET's ThreadPool (which grains do not use currently)

badrishc · 2019-07-10T21:22:11Z

Okay, I'm just getting back to this topic after some travel. I plan to work on decoupling threads and sessions in Faster now, on this branch.

The epoch table will contain one row per session. There will be no thread local variables any more. When someone closes a session, the epoch table entry will become available for another session to rent out.

@ReubenBond, the session is a local variable for the application and thus effectively 'async local'. Is this what you meant, or were you thinking of AsyncLocal? The latter seems not very useful to me, perhaps I am missing something ...

…ork: session-based acccess just updates existing thread-local variables to set the session for the current thread.

badrishc · 2019-07-11T02:33:44Z

Rough sketch of session checked in, without touching the existing thread-local framework: session-based access will just update existing thread-local variables using the session's values, before proceeding with usual operation normally.

badrishc · 2019-07-13T01:45:33Z

@JorgeCandeias -- I added some preliminary support for pooled sessions. we need to see if this can improve the Orleans integration. See samples at https://github.com/microsoft/FASTER/blob/async-support/cs/test/SessionFASTERTests.cs

App can request a session and perform FASTER operations on it as it hops across threads (e.g., due to other async calls it might make)
Two apps can request two different sessions on the same thread.
An app can dispose a session, i.e., the next app that requests a session will not resume that session.
An app can "return" a session to FASTER. In this case, another app can call GetSharedSession and potentially get back the same session to continue operations on.
Apps can always call CreateSharedSession to get a fresh session

I'm still a bit fuzzy on how exactly this would map to the Orleans threading model, and would appreciate your thoughts. E.g., since individual grains may perform other async ops at any time, they may be holding an active FASTER session when they go async, right? In that case, when a different grain is scheduled on that thread, it would need to "rent" a different session. Thus, the number of active sessions may grow to more than the number of threads -- it would depend on the number of pending activated grains.

badrishc · 2019-07-26T01:28:04Z

@JorgeCandeias -- any comments? Would this version be of interest to you to improve the FASTER-Orleans sample? And what do you think of adding a true async interface to FASTER?

JorgeCandeias · 2019-07-26T14:25:25Z

Hi @badrishc, many apologies for the delay, other things moved up on my list in the meantime, but this one is still on it.

Yes, this version is already useful for improving the sample, so I'll take it on-board, thank you!

That said, being a synchronous interface, it will still have to run on the thread pool in order not to limit throughput on the Orleans task scheduler when spinning for I/O completions. A true async interface with no blocking or background threads will enable direct integration with the Orleans scheduler and therefore not having to rely on the thread pool.

More importantly, it will become easier to promote this to product developers looking for straightforward solutions. 😄

E.g., since individual grains may perform other async ops at any time, they may be holding an active FASTER session when they go async, right? In that case, when a different grain is scheduled on that thread, it would need to "rent" a different session. Thus, the number of active sessions may grow to more than the number of threads -- it would depend on the number of pending activated grains.

This is correct. It has less to do with grains and more with TPL tasks. Although the Orleans scheduler only creates as many worker threads as processors, there can be great many tasks pending in the system, due to awaiting something non-CPU bound to complete, like I/O operations. In the meantime other tasks will run.

For background on this, Orleans does this to keep the CPU doing actual user work as much as possible and avoid wasting cycles on context switching or spinning for I/O. The reason is that grain requests on Orleans tend to be very fine-grained on the CPU, e.g. in the microseconds and lower (even if they await I/O tasks for longer), so having an Orleans worker thread spinning even for a few milliseconds can lower the throughput ceiling. Orleans is opinionated on this and enforces responsiveness in the system by killing any task taking over 30s and issuing warnings when tasks take over 200ms, regardless of CPU usage.

In my head, for an async/await model to work with the Orleans scheduler, I think...

For isolated sessions, there must be no (or very high) upper bound for the number of in-use sessions against a lookup, even if the pool itself has some limit. If faster can support this, then the sessions do not need to be thread-safe. Each grain task can request one, do work, go async, do more work, then return the session to the session pool for another task to request. This assumes developers won't pass around sessions between threads, just like we don't pass SqlConnections around.
For shared sessions, the sessions will need to be thread-safe. Let's say task 1 requests session A on thread 1 and awaits, then task 2 requests session A again on thread 1 and also awaits. When either continuation resumes, there's no guarantee the work will be performed on the same thread, as far as I know (@ReubenBond please correct me on this). We may end up with the continuations from tasks 1 and 2 accessing the same session concurrently.

I'm leaning in favour of isolated/pooled sessions because they mimic the behavior of well-known expensive resources such as SqlConnection, which product developers already tend to handle with care. Also, thread-safety and performance don't tend to mix, not sure how much concurrency handling would affect session performance, which already exists to avoid concurrency in the first place. What is your opinion?

I attempted to apply the TaskCompletionSource trick recommended by @ReubenBond on my PR clone, but wasn't able to do it on the "async lookup shell" (for lack of a better name) without it requiring changing the underlying lookup code and without avoiding allocation of TaskCompletionSource instances upfront.

Maybe you two have a better trick for this that I can learn from?
I'll try again on this new code base over the weekend and some good coffee.

Alternatively, each grain could rent a session at activation and return it at deactivation. That would require something to pulse Refresh(), which we could potentially add hooks for (or use a timer).

This matches the intent of the sample and naturally follows into creating generic grain templates for any type, or with an eye in the future, providing an injected faster lookup for some type and underlying storage via DI bindings, azure function style.

ReubenBond · 2019-07-26T14:35:54Z

Orleans is opinionated on this and enforces responsiveness in the system by killing any task taking over 30s

One minor point: we wont kill Tasks, since .NET Tasks are co-operative & aren't preemptible. We will throw a timeout exception back to the caller, though, so that they aren't left hanging forever. We will also destroy the grain activation if it has become unresponsive for a very long time.

This assumes developers won't pass around sessions between threads,

Just to be precise here: a grain method call can hop between threads between await points, so the session wouldn't strictly be thread-local

When either continuation resumes, there's no guarantee the work will be performed on the same thread, as far as I know

Yes, precisely correct. This goes for any async method in .NET (i.e, it's not Orleans-specific)

JorgeCandeias · 2019-07-26T14:58:26Z

One minor point: we wont kill Tasks, since .NET Tasks are co-operative & aren't preemptible. We will throw a timeout exception back to the caller, though, so that they aren't left hanging forever. We will also destroy the grain activation if it has become unresponsive for a very long time.

Thanks for this important clarification 👍

This assumes developers won't pass around sessions between threads,

Just to be precise here: a grain method call can hop between threads between await points, so the session wouldn't strictly be thread-local

What I meant was it assumes developers won't willingly pass sessions around in a way that allows concurrent access.

badrishc · 2019-07-26T18:11:12Z

I attempted to apply the TaskCompletionSource trick recommended by @ReubenBond on my PR clone, but wasn't able to do it on the "async lookup shell" (for lack of a better name) without it requiring changing the underlying lookup code and without avoiding allocation of TaskCompletionSource instances upfront.

How about this: we slightly modify FASTER's API to support "ref Context" instead of "Context". User then passes in a "null" context to the lookup. If/when an operation goes async, it can fill up/augment this context via a new user-specified context provider callback. In our use case, this callback will allocate the TaskCompletionSource instance. It would be "ref" instead of "out" because users might still want to add stuff to the context (from the stack) before invoking the operation.

void UpdatePendingContext(ref context);
{
   context = new TaskCompletionSource<TOutput>(...);
}

…nc-support

…nto async-support

gunaprsd

I have some queries that need to be resolved. See comments. O/w looks good to me.

cs/src/core/Allocator/AllocatorBase.cs

cs/src/core/Index/FASTER/FASTERThread.cs

cs/src/core/Index/Recovery/Checkpoint.cs

cs/src/core/ClientSession/FASTERAsync.cs

badrishc · 2019-12-10T07:00:02Z

We now have a full async-compliant sessions interface to FasterKv! You no longer need to call refresh or worry about thread affinity. The overall idea is to create new sessions to FasterKv and perform a sequence of operations on a session. You can also have a separate async committer (checkpointer) thread for durability. The async session operations can be configured to await until either after they are completed in memory or after they are made durable by a checkpoint. This is similar to the new FasterLog API.

See any example in the playground for details (older API is marked obsolete for now). A good one is the newly added sample here.

I think it almost cannot get easier than this for C# users, but would love to hear feedback from everyone.

badrishc · 2019-12-12T00:30:18Z

Async Sessions API

// Create new shared FASTER KV
var faster = new FasterKV(...);

// Create any number of sessions to FasterKV
var session = faster.NewSession();

// Async read, retrieves from disk if needed, reads uncommited if present
(Status, Output) result = await session.ReadAsync(key);

// Read waits for uncommitted read value to checkpoint/commit before returning
(Status, Output) result = await session.ReadAsync(key, waitForCommit: true);

// Upsert into mutable region, does not wait for commit
await session.UpsertAsync(key, value);

// Upsert into mutable region, return after this (and all prev) upserts commit
await session.UpsertAsync(key, value, waitForCommit: true);

// RMW into mutable region, does not wait for commit
await session.RMWAsync(key, input);

// RMW into mutable region, wait for commit
await session.RMWAsync(key, input, waitForCommit: true);

// Waits for all prev operations on session to commit before returning
await session.WaitForCommit();

Commits can be performed periodically by a separate thread/task:

int i = 0;
while (true)
{
  Thread.Sleep(5000);
  if (i++ % 100 == 0)
     faster.TakeFullCheckpoint(out _); // periodically take full index + log checkpoint
  else
     faster.TakeHybridLogCheckpoint(out _); // take (incremental) log checkpoint
}

tli2

LGTM other than minor nits and clarifications. I didn't look at the internals too closely since I am not as familiar.

cs/src/core/Index/Common/Contexts.cs

cs/src/core/Index/FASTER/FASTER.cs

cs/src/core/Index/Interfaces/IFasterKV.cs

…nto async-support

Initial checkin

484c674

badrishc mentioned this pull request May 21, 2019

[WIP] [DRAFT] Microsoft Faster Integration Sample dotnet/orleans#5618

Closed

badrishc added 2 commits May 20, 2019 18:13

Merge branch 'master' into async-support

fc89978

Merge branch 'master' into async-support

63d1276

badrishc mentioned this pull request Jun 18, 2019

Better support for async/await threading model #140

Closed

Merge branch 'master' into async-support

0c7bddd

badrishc added the work in progress Work in progress label Jul 10, 2019

badrishc added 2 commits July 10, 2019 15:12

Merge branch 'master' into async-support

3d17a2d

Rough sketch of session without touching currrent thread-local framew…

2b7e24f

…ork: session-based acccess just updates existing thread-local variables to set the session for the current thread.

badrishc added 2 commits July 11, 2019 18:18

Improved support, added session pooling option, added session testcases.

00dc474

Updates to session support

533652b

badrishc added 2 commits July 23, 2019 19:54

Merge branch 'master' into async-support

2fff5cb

Merge branch 'master' into async-support

6c5420a

badrishc added 2 commits July 26, 2019 11:51

Merge branch 'master' of https://github.com/Microsoft/FASTER into asy…

4412113

…nc-support

Merge branch 'async-support' of https://github.com/Microsoft/FASTER i…

366a8cc

…nto async-support

badrishc added 4 commits October 7, 2019 16:37

Improved sample, changed GetMemory to use byte[] instead of Span<byte>

5caea66

Merging from fasterlog

e0f745f

Merging goodness from master.

fc47dbd

Merge branch 'master' into async-support

c4fba23

gunaprsd reviewed Nov 2, 2019

View reviewed changes

badrishc added 9 commits November 14, 2019 11:46

Merge branch 'master' into async-support

ca15128

Merge branch 'master' into async-support

0f65fbb

Fixed break

72ecc4f

Merge branch 'master' into async-support

a1b87a5

Merge branch 'master' into async-support

6b03887

Merging from master

876c0d3

Fix merge

b3cccd0

Major API cleanup and porting to session-based interface

1eca340

Added async API to sessions, added first draft of async sample.

0237a0c

badrishc requested a review from tli2 December 10, 2019 17:27

badrishc added 2 commits December 11, 2019 16:30

Added tests

255c000

Update README.md

a16c5eb

badrishc added enhancement New feature or request and removed work in progress Work in progress labels Dec 13, 2019

badrishc changed the title ~~[WIP] Better support for async/await threading model in FASTER~~ Support async/await session model Dec 14, 2019

Merge branch 'master' into async-support

95c0e58

tli2 reviewed Dec 15, 2019

View reviewed changes

cs/src/core/Index/Common/Contexts.cs Show resolved Hide resolved

cs/src/core/Index/FASTER/FASTER.cs Outdated Show resolved Hide resolved

cs/src/core/Index/FASTER/FASTER.cs Outdated Show resolved Hide resolved

cs/src/core/Index/Interfaces/IFasterKV.cs Outdated Show resolved Hide resolved

badrishc added 4 commits December 18, 2019 19:33

Updates based on review.

0624fd5

Merge branch 'async-support' of https://github.com/microsoft/FASTER i…

b31631b

…nto async-support

Minor fixes.

ecbf696

fixed call to newsession

49866bf

badrishc merged commit a8e1f3c into master Dec 20, 2019

badrishc mentioned this pull request Feb 24, 2020

[C#] Continuations Should be Properly Wired #246

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support async/await session model #130

Support async/await session model #130

badrishc commented May 21, 2019 •

edited

Loading

badrishc commented May 21, 2019

itn3000 commented May 27, 2019

JorgeCandeias commented May 28, 2019

JorgeCandeias commented May 28, 2019

badrishc commented May 29, 2019

JorgeCandeias commented May 29, 2019

ReubenBond commented Jun 26, 2019

badrishc commented Jul 10, 2019

badrishc commented Jul 11, 2019

badrishc commented Jul 13, 2019

badrishc commented Jul 26, 2019

JorgeCandeias commented Jul 26, 2019

ReubenBond commented Jul 26, 2019 •

edited

Loading

JorgeCandeias commented Jul 26, 2019

badrishc commented Jul 26, 2019

gunaprsd left a comment

badrishc commented Dec 10, 2019 •

edited

Loading

badrishc commented Dec 12, 2019

tli2 left a comment

Support async/await session model #130

Support async/await session model #130

Conversation

badrishc commented May 21, 2019 • edited Loading

badrishc commented May 21, 2019

itn3000 commented May 27, 2019

JorgeCandeias commented May 28, 2019

JorgeCandeias commented May 28, 2019

badrishc commented May 29, 2019

JorgeCandeias commented May 29, 2019

ReubenBond commented Jun 26, 2019

badrishc commented Jul 10, 2019

badrishc commented Jul 11, 2019

badrishc commented Jul 13, 2019

badrishc commented Jul 26, 2019

JorgeCandeias commented Jul 26, 2019

ReubenBond commented Jul 26, 2019 • edited Loading

JorgeCandeias commented Jul 26, 2019

badrishc commented Jul 26, 2019

gunaprsd left a comment

Choose a reason for hiding this comment

badrishc commented Dec 10, 2019 • edited Loading

badrishc commented Dec 12, 2019

Async Sessions API

tli2 left a comment

Choose a reason for hiding this comment

badrishc commented May 21, 2019 •

edited

Loading

ReubenBond commented Jul 26, 2019 •

edited

Loading

badrishc commented Dec 10, 2019 •

edited

Loading