Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support async/await session model #130

Merged
merged 118 commits into from
Dec 20, 2019
Merged

Support async/await session model #130

merged 118 commits into from
Dec 20, 2019

Conversation

badrishc
Copy link
Contributor

@badrishc badrishc commented May 21, 2019

The FASTER model is based on a set of threads, each of which can start a session with FASTER, perform a sequence of operations, and later stop the session. Operations consist of bursts of activity, with periodic invocations of a Refresh() call to FASTER in between. It would not be okay for a thread/session to go to sleep (i.e., stop calling Refresh) without stopping its session. Sessions and threads are tightly integrated in this model.

However, this tight coupling of threads and sessions in FASTER is not ideal for an async/await API using the C# thread pool. This PR aims to investigate and prototype alternatives to this approach, and implement the best one. There is next to no code right now, this PR for now serves as a place to discuss options and alternatives.

Adding @JorgeCandeias @gunaprsd @ReubenBond for comments.

Fix #140

@badrishc
Copy link
Contributor Author

The first option is to eliminate thread-local variables in FASTER entirely, and have logical sessions that may move across threads. This will allow FASTER to provide persistence guarantees for a logical session (e.g., commit/persist all operations with LSN < k on the logical session), regardless of which thread the session is currently executing on. This means users can seamlessly use other async operations in their code.

Every FASTER operation would then need a session ID parameter, which is a bit ugly. Perhaps the default could still be “threads==sessions” behavior by using a session-free overload, where a thread-local variable is used to store the default session ID.

Currently, FASTER operations that go async are affinitized back to the thread that issued the I/O. The client is supposed to call CompletePending to "resume" these operations on that thread. We could leave this as it is, with logical sessions expected to call CompletePending to complete the I/Os on that session. However, it may instead be interesting to support the async model for these operations, where the operation returns a Task instead.

Internally, we could use TaskCompletionSource to handle the async operations. We could create a simplified async API layer for FASTER that simply uses FASTER’s Context generic parameter to store the TaskCompletionSource. An async callback can then complete the task using this context. Since we are in async world, there is no need for a separate user-facing Context anyway.

Note that switching a session from one thread to another is generally expensive, as the session context needs to move to the other thread. However, ideally, this would happen rarely. In the common synchronous case, we would expect a thread to have a single session to FASTER, leading to high performance very similar to the current thread-local session model.

@itn3000
Copy link
Contributor

itn3000 commented May 27, 2019

How would you like to use ValueTask + IValueTaskSource?
It is used in library which is required high performance like System.Threading.Channels and System.IO.Pipelines.
It may be more efficient than using TaskCompletionSource.

@JorgeCandeias
Copy link

The first option is to eliminate thread-local variables in FASTER entirely, and have logical sessions that may move across threads. This will allow FASTER to provide persistence guarantees for a logical session (e.g., commit/persist all operations with LSN < k on the logical session), regardless of which thread the session is currently executing on. This means users can seamlessly use other async operations in their code.

Every FASTER operation would then need a session ID parameter, which is a bit ugly. Perhaps the default could still be “threads==sessions” behavior by using a session-free overload, where a thread-local variable is used to store the default session ID.

One option to do away with threads could be for the lookup to pool sessions, renting them out and taking them back as needed. That could enable user syntax such as this:

using (ISession session = lookup.OpenSession())
{
    await session.UpsertAsync( /* stuff here */ )
}

In the model above, the ISession is just a shell for an object that contains a unique session id and a reference to the owning lookup. Calls on the session translate to calls on the lookup using the actual session. Disposing ISession returns the underlying rented session back to the owning session pool, while ensuring Refresh() happens at some point.

Currently, FASTER operations that go async are affinitized back to the thread that issued the I/O. The client is supposed to call CompletePending to "resume" these operations on that thread. We could leave this as it is, with logical sessions expected to call CompletePending to complete the I/Os on that session. However, it may instead be interesting to support the async model for these operations, where the operation returns a Task instead.

Internally, we could use TaskCompletionSource to handle the async operations. We could create a simplified async API layer for FASTER that simply uses FASTER’s Context generic parameter to store the TaskCompletionSource. An async callback can then complete the task using this context. Since we are in async world, there is no need for a separate user-facing Context anyway.

This what the Orleans sample is doing now. The drawback is that we now need an extra allocation for every operation, even those completing on the sync path. At this time is hard to tell what performance difference it makes due to the overhead from everything else. An implementation using ValueTask and a pooled task factory may have some merit.

Note that switching a session from one thread to another is generally expensive, as the session context needs to move to the other thread. However, ideally, this would happen rarely. In the common synchronous case, we would expect a thread to have a single session to FASTER, leading to high performance very similar to the current thread-local session model.

I think the pooled session model above could work, by making the session pool return the session that is affinitized to the running thread (instead of just an id) and only create a new one if it does not exist yet. So it would be more of a GetOrCreate()-style lookup than an actual pool. I suspect there would be some overhead as the ThreadPool adds and removes threads but in the case of Orleans that would be amortized down as the system lives on.

I'll take a stab at the AsyncFasterKV in this branch to see if this could work.

@JorgeCandeias
Copy link

How would you like to use ValueTask + IValueTaskSource?
It is used in library which is required high performance like System.Threading.Channels and System.IO.Pipelines.
It may be more efficient than using TaskCompletionSource.

Indeed, this may have some merit, as operations in FASTER can complete sync or async.

@badrishc
Copy link
Contributor Author

  • The idea of an ISession is nice. User-facing sessions are Guids, and may need to be mapped to internal "sessions" which will simply be offsets in the epoch table. A session abstraction can potentially avoid an explicit map between session Guid and the table offset.

  • Yeah, the extra allocation is not good. We really need the normal sync code path to be as lightweight as it is right now. One crazy idea is to make context an out param, allowing a user defined callback to create the context when it is needed.

I think the pooled session model above could work, by making the session pool return the session that is affinitized to the running thread (instead of just an id) and only create a new one if it does not exist yet.

  • If the pooled session was simply going to return the session associated with the thread, then what would happen after an async call returns on a different thread? Would the code continue to use that same old session, now on a different continuation thread? What about another session that now tries to continue on the older thread? It seems that we would need sessions to be independent of threads, right?

@JorgeCandeias
Copy link

One crazy idea is to make context an out param, allowing a user defined callback to create the context when it is needed.

At least this can get the shell AsyncFasterKV working early without affecting the sync path or the core FasterKV too much.

If the pooled session was simply going to return the session associated with the thread, then what would happen after an async call returns on a different thread? Would the code continue to use that same old session, now on a different continuation thread? What about another session that now tries to continue on the older thread? It seems that we would need sessions to be independent of threads, right?

You're right, that would open a can of worms, if it would even work at all. Best to split those concerns from the onset.

As a first step, I'd just exploit the context parameter as the Orleans sample does it, and either make the context on the FasterKV an out or a ref to allow delaying the allocation up to when we hit the async path. That will be a breaking change on the current surface, but at least we can test it. And while a ValueTask sync path may have merit, the problem is that we can only await on it once. That's fine for tight-controlled internals, but prone to bugs on a public surface. Let's see how far we can get with plain old Tasks.

@ReubenBond
Copy link
Member

I think the pooled session model is appropriate. The sessions would essentially be async-local instead of thread-local.

Alternatively, each grain could rent a session at activation and return it at deactivation. That would require something to pulse Refresh(), which we could potentially add hooks for (or use a timer).

Thread-local has issues with a dynamically sized thread pool such as .NET's ThreadPool (which grains do not use currently)

@badrishc badrishc added the work in progress Work in progress label Jul 10, 2019
@badrishc
Copy link
Contributor Author

Okay, I'm just getting back to this topic after some travel. I plan to work on decoupling threads and sessions in Faster now, on this branch.

The epoch table will contain one row per session. There will be no thread local variables any more. When someone closes a session, the epoch table entry will become available for another session to rent out.

@ReubenBond, the session is a local variable for the application and thus effectively 'async local'. Is this what you meant, or were you thinking of AsyncLocal? The latter seems not very useful to me, perhaps I am missing something ...

…ork: session-based acccess just updates existing thread-local variables to set the session for the current thread.
@badrishc
Copy link
Contributor Author

Rough sketch of session checked in, without touching the existing thread-local framework: session-based access will just update existing thread-local variables using the session's values, before proceeding with usual operation normally.

@badrishc
Copy link
Contributor Author

@JorgeCandeias -- I added some preliminary support for pooled sessions. we need to see if this can improve the Orleans integration. See samples at https://github.com/microsoft/FASTER/blob/async-support/cs/test/SessionFASTERTests.cs

  • App can request a session and perform FASTER operations on it as it hops across threads (e.g., due to other async calls it might make)
  • Two apps can request two different sessions on the same thread.
  • An app can dispose a session, i.e., the next app that requests a session will not resume that session.
  • An app can "return" a session to FASTER. In this case, another app can call GetSharedSession and potentially get back the same session to continue operations on.
  • Apps can always call CreateSharedSession to get a fresh session

I'm still a bit fuzzy on how exactly this would map to the Orleans threading model, and would appreciate your thoughts. E.g., since individual grains may perform other async ops at any time, they may be holding an active FASTER session when they go async, right? In that case, when a different grain is scheduled on that thread, it would need to "rent" a different session. Thus, the number of active sessions may grow to more than the number of threads -- it would depend on the number of pending activated grains.

@badrishc
Copy link
Contributor Author

@JorgeCandeias -- any comments? Would this version be of interest to you to improve the FASTER-Orleans sample? And what do you think of adding a true async interface to FASTER?

@JorgeCandeias
Copy link

Hi @badrishc, many apologies for the delay, other things moved up on my list in the meantime, but this one is still on it.

Yes, this version is already useful for improving the sample, so I'll take it on-board, thank you!

That said, being a synchronous interface, it will still have to run on the thread pool in order not to limit throughput on the Orleans task scheduler when spinning for I/O completions. A true async interface with no blocking or background threads will enable direct integration with the Orleans scheduler and therefore not having to rely on the thread pool.

More importantly, it will become easier to promote this to product developers looking for straightforward solutions. 😄


E.g., since individual grains may perform other async ops at any time, they may be holding an active FASTER session when they go async, right? In that case, when a different grain is scheduled on that thread, it would need to "rent" a different session. Thus, the number of active sessions may grow to more than the number of threads -- it would depend on the number of pending activated grains.

This is correct. It has less to do with grains and more with TPL tasks. Although the Orleans scheduler only creates as many worker threads as processors, there can be great many tasks pending in the system, due to awaiting something non-CPU bound to complete, like I/O operations. In the meantime other tasks will run.

For background on this, Orleans does this to keep the CPU doing actual user work as much as possible and avoid wasting cycles on context switching or spinning for I/O. The reason is that grain requests on Orleans tend to be very fine-grained on the CPU, e.g. in the microseconds and lower (even if they await I/O tasks for longer), so having an Orleans worker thread spinning even for a few milliseconds can lower the throughput ceiling. Orleans is opinionated on this and enforces responsiveness in the system by killing any task taking over 30s and issuing warnings when tasks take over 200ms, regardless of CPU usage.

In my head, for an async/await model to work with the Orleans scheduler, I think...

  • For isolated sessions, there must be no (or very high) upper bound for the number of in-use sessions against a lookup, even if the pool itself has some limit. If faster can support this, then the sessions do not need to be thread-safe. Each grain task can request one, do work, go async, do more work, then return the session to the session pool for another task to request. This assumes developers won't pass around sessions between threads, just like we don't pass SqlConnections around.

  • For shared sessions, the sessions will need to be thread-safe. Let's say task 1 requests session A on thread 1 and awaits, then task 2 requests session A again on thread 1 and also awaits. When either continuation resumes, there's no guarantee the work will be performed on the same thread, as far as I know (@ReubenBond please correct me on this). We may end up with the continuations from tasks 1 and 2 accessing the same session concurrently.

I'm leaning in favour of isolated/pooled sessions because they mimic the behavior of well-known expensive resources such as SqlConnection, which product developers already tend to handle with care. Also, thread-safety and performance don't tend to mix, not sure how much concurrency handling would affect session performance, which already exists to avoid concurrency in the first place. What is your opinion?


I attempted to apply the TaskCompletionSource trick recommended by @ReubenBond on my PR clone, but wasn't able to do it on the "async lookup shell" (for lack of a better name) without it requiring changing the underlying lookup code and without avoiding allocation of TaskCompletionSource instances upfront.

Maybe you two have a better trick for this that I can learn from?
I'll try again on this new code base over the weekend and some good coffee.


Alternatively, each grain could rent a session at activation and return it at deactivation. That would require something to pulse Refresh(), which we could potentially add hooks for (or use a timer).

This matches the intent of the sample and naturally follows into creating generic grain templates for any type, or with an eye in the future, providing an injected faster lookup for some type and underlying storage via DI bindings, azure function style.

@ReubenBond
Copy link
Member

ReubenBond commented Jul 26, 2019

Orleans is opinionated on this and enforces responsiveness in the system by killing any task taking over 30s

One minor point: we wont kill Tasks, since .NET Tasks are co-operative & aren't preemptible. We will throw a timeout exception back to the caller, though, so that they aren't left hanging forever. We will also destroy the grain activation if it has become unresponsive for a very long time.

This assumes developers won't pass around sessions between threads,

Just to be precise here: a grain method call can hop between threads between await points, so the session wouldn't strictly be thread-local

When either continuation resumes, there's no guarantee the work will be performed on the same thread, as far as I know

Yes, precisely correct. This goes for any async method in .NET (i.e, it's not Orleans-specific)

@JorgeCandeias
Copy link

One minor point: we wont kill Tasks, since .NET Tasks are co-operative & aren't preemptible. We will throw a timeout exception back to the caller, though, so that they aren't left hanging forever. We will also destroy the grain activation if it has become unresponsive for a very long time.

Thanks for this important clarification 👍

This assumes developers won't pass around sessions between threads,

Just to be precise here: a grain method call can hop between threads between await points, so the session wouldn't strictly be thread-local

What I meant was it assumes developers won't willingly pass sessions around in a way that allows concurrent access.

@badrishc
Copy link
Contributor Author

I attempted to apply the TaskCompletionSource trick recommended by @ReubenBond on my PR clone, but wasn't able to do it on the "async lookup shell" (for lack of a better name) without it requiring changing the underlying lookup code and without avoiding allocation of TaskCompletionSource instances upfront.

How about this: we slightly modify FASTER's API to support "ref Context" instead of "Context". User then passes in a "null" context to the lookup. If/when an operation goes async, it can fill up/augment this context via a new user-specified context provider callback. In our use case, this callback will allocate the TaskCompletionSource instance. It would be "ref" instead of "out" because users might still want to add stuff to the context (from the stack) before invoking the operation.

void UpdatePendingContext(ref context);
{
   context = new TaskCompletionSource<TOutput>(...);
}

Copy link
Contributor

@gunaprsd gunaprsd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some queries that need to be resolved. See comments. O/w looks good to me.

cs/src/core/Allocator/AllocatorBase.cs Show resolved Hide resolved
cs/src/core/Index/FASTER/FASTERThread.cs Show resolved Hide resolved
cs/src/core/Index/Recovery/Checkpoint.cs Show resolved Hide resolved
cs/src/core/ClientSession/FASTERAsync.cs Outdated Show resolved Hide resolved
cs/src/core/ClientSession/FASTERAsync.cs Show resolved Hide resolved
cs/src/core/ClientSession/FASTERAsync.cs Show resolved Hide resolved
@badrishc
Copy link
Contributor Author

badrishc commented Dec 10, 2019

We now have a full async-compliant sessions interface to FasterKv! You no longer need to call refresh or worry about thread affinity. The overall idea is to create new sessions to FasterKv and perform a sequence of operations on a session. You can also have a separate async committer (checkpointer) thread for durability. The async session operations can be configured to await until either after they are completed in memory or after they are made durable by a checkpoint. This is similar to the new FasterLog API.

See any example in the playground for details (older API is marked obsolete for now). A good one is the newly added sample here.

I think it almost cannot get easier than this for C# users, but would love to hear feedback from everyone.

@badrishc badrishc requested a review from tli2 December 10, 2019 17:27
@badrishc
Copy link
Contributor Author

Async Sessions API

// Create new shared FASTER KV
var faster = new FasterKV(...);

// Create any number of sessions to FasterKV
var session = faster.NewSession();

// Async read, retrieves from disk if needed, reads uncommited if present
(Status, Output) result = await session.ReadAsync(key);

// Read waits for uncommitted read value to checkpoint/commit before returning
(Status, Output) result = await session.ReadAsync(key, waitForCommit: true);

// Upsert into mutable region, does not wait for commit
await session.UpsertAsync(key, value);

// Upsert into mutable region, return after this (and all prev) upserts commit
await session.UpsertAsync(key, value, waitForCommit: true);

// RMW into mutable region, does not wait for commit
await session.RMWAsync(key, input);

// RMW into mutable region, wait for commit
await session.RMWAsync(key, input, waitForCommit: true);

// Waits for all prev operations on session to commit before returning
await session.WaitForCommit();

Commits can be performed periodically by a separate thread/task:

int i = 0;
while (true)
{
  Thread.Sleep(5000);
  if (i++ % 100 == 0)
     faster.TakeFullCheckpoint(out _); // periodically take full index + log checkpoint
  else
     faster.TakeHybridLogCheckpoint(out _); // take (incremental) log checkpoint
}

@badrishc badrishc added enhancement New feature or request and removed work in progress Work in progress labels Dec 13, 2019
@badrishc badrishc changed the title [WIP] Better support for async/await threading model in FASTER Support async/await session model Dec 14, 2019
Copy link
Contributor

@tli2 tli2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than minor nits and clarifications. I didn't look at the internals too closely since I am not as familiar.

cs/src/core/Index/Common/Contexts.cs Show resolved Hide resolved
cs/src/core/Index/FASTER/FASTER.cs Outdated Show resolved Hide resolved
cs/src/core/Index/FASTER/FASTER.cs Outdated Show resolved Hide resolved
cs/src/core/Index/Interfaces/IFasterKV.cs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better support for async/await threading model
6 participants