Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native support for task { ... } #581

Open
dustinmoris opened this Issue Jun 15, 2017 · 39 comments

Comments

Projects
None yet
@dustinmoris
Copy link

dustinmoris commented Jun 15, 2017

I propose we implement a task {} workflow, similar to async {} which can execute and unwrap a Task, Task<'T> and ValueTask<'T> without having to convert the task into an async workflow first.

Given that most .NET libraries (almost everything if not everything) implements asynchronous methods which work with Task and Task<'T> it doesn't make sense to constantly have to convert back and forth between async workflows and tasks with the static Async.{...} methods if F# could just natively work with tasks instead.

My proposal is to add a task {} workflow which can exist side by side to the already existing async {} workflow. It would be a non-breaking addition to the F# language and probably (over time) supersede async {} altogether (I think async would only be used for backwards compatibility at some point).

Pros and Cons

The advantages of making this adjustment to F# would be better performance (as I understand that Tasks are slightly faster than Async and it will cut down the additional overhead cost of converting from Task to Async and from Async back to Task in every application. For example this is typical overhead in a standard .NET web application written in F#).

There are no disadvantages, because Async will still exist.

Extra information

FSharpX has a task computation expression and someone else has further improved it in this GitHub repository, which claims to be a lot faster.

Estimated cost (XS, S, M, L, XL, XXL):
S-M

Related suggestions:

The difference between #559 and this suggestion is that I don't propose to add more conversions between Task and Async, but to add native support for Tasks instead.

Affidavit (must be submitted)

Please tick this by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

  • This is not a breaking change to the F# language design
  • I would be willing to help implement and/or test this
  • I or my company would be willing to help crowdfund F# Software Foundation members to work on this
@lambdakris

This comment has been minimized.

Copy link

lambdakris commented Jun 15, 2017

Overall, I think it is a good idea for pragmatic purposes, but to be fair, there are some cons:

  1. Having two different but similar ways of creating asynchronous computations would add some cognitive overhead (even now there are times when I am indecisive between using async or mailbox for certain parallelism/concurrency scenarios).
  2. I am still getting a grasp of the issue, but I have seen it mentioned several times that Async composes better than Task and this would mean that they cannot be considered completely interchangeable (though I am unsure if this is inherent to Task or if it could be resolved by proper design of a task ce)

Nonetheless, I do agree that it would make it more approachable to work with the majority of .NET libraries and it certainly is an issue that gets raised every so often with regards to good F# support in this or that library/framework. In particular, I remember that there was an Orleans issue about this very thing and while that particular issue never got resolved it was addressed in Orleankka by implementing a custom task ce. So I would say that if authors of both ASP.NET and Orleans based F# libraries have found a need for a task ce, it might make sense to make it official.

@rmunn

This comment has been minimized.

Copy link

rmunn commented Jun 15, 2017

One of the most important differences between Async and Task, and this would need to be well documented in the docs for the task CE, is that when you run a function that returns an Async, the Async has not started yet. Whereas by the time you're holding a Task object, the Task is already running. That is why Asyncs compose better than Tasks, and why even if we do add a task computation expression, we won't want to get rid of async. See http://tomasp.net/blog/async-csharp-differences.aspx/ for more details.

@matthid

This comment has been minimized.

Copy link

matthid commented Jun 15, 2017

@rmunn I don't want to confuse anyone here. But that is just how the C# compiler is doing it. You actually can have a "not-started" task object (and start it with the "Start" method). I have actually used that in the past. But I agree that the general philosophy is that Tasks are already running, while Asyncs are not.

Starting Tasks manually is incredible painful and error prone.

@ReedCopsey

This comment has been minimized.

Copy link

ReedCopsey commented Jun 15, 2017

An alternative to this suggestion might be to just ease the interop between async and task. FusionTasks is an example of an approach to this, and I'd love to see something similar baked into the async builders and support in F#.

@dustinmoris

This comment has been minimized.

Copy link
Author

dustinmoris commented Jun 15, 2017

@ReedCopsey One motivation of this proposal is to cut down overhead as well where it is not required. FusionTaks is nice, but it still has to convert between the two models where there are many cases where an application could live entirely without a single conversion if there would be native support for tasks.

@ReedCopsey

This comment has been minimized.

Copy link

ReedCopsey commented Jun 15, 2017

A true implementation to handle Task and ValueTask probably should, ideally, use the functionality from C#'s task like types.

However, I expect that'd change the scope of work from S-M to more L+...

@buybackoff

This comment has been minimized.

Copy link

buybackoff commented Jun 16, 2017

As my experiments are linked here, a small clarification:

FSharpX has a task computation expression and someone else has further improved it in this GitHub repository, which claims to be a lot faster.

Not my rewrite but the Task CE from FSharpX is a lot faster than crossing async <-> TPL boundary. The rewrite just eliminates one allocation, its perf difference is not clear.

Using a single let! from that CE is quite fast and convenient and is on par with using awaiters with OnCompleted callbacks manually (this is what the CE does inside, but the task{} syntax is very nice). Loops are the problematic thing - while and for from the CE generate so much overheads and garbage that they become unusable on hot paths. I tried to rewrite it in many different ways, but could't reduce that. Every iteration of a loop goes through several methods of CE machinery - and this is the pain point where native task{} implementation could be very helpful. A gotcha was to use recursive functions with simple let! instead of loops in CEs. Maybe someone more experienced with CEs could just rewrite the task builder in more efficient way using awaiters and callbacks.

@0x53A

This comment has been minimized.

Copy link
Contributor

0x53A commented Jun 16, 2017

One thing I would like to call out is to NOT repeat the mistake of C# with regard to context capture.

To ease working with Tasks in a GUI context, they enabled capture by default, which means that in a non-gui context you need to add .ConfigureAwait(false) everywhere.

I've been using TaskBuilder.fs (https://github.com/rspeele/TaskBuilder.fs) in a few projects, and it works quite well.

And the best feature is that it acually has two taskbuilders, ContextSensitive, which behaves like C#, and ContextInsensitive, which does not capture the context.

So I am all for adding a native optimized task CE to F#, but please take care of the context handling.

@gerardtoconnor

This comment has been minimized.

Copy link

gerardtoconnor commented Jun 19, 2017

I think having an inbuilt task CE is essential for easy .Net Library Interop without a performance hit on conversions back and forth to async, but I think the async and task would perform two different functions.
@dustinmoris

and probably (over time) supersede async {} altogether

The starting and running of async work pipelines with Async is safer and easier so I think Async still serves a purpose although if the two could someday converge it would simplify things ... have a f# Task module that safely started/managed/parallelised task workloads the way Async does today.

@lambdakris

Having two different but similar ways of creating asynchronous computations would add some cognitive overhead (even now there are times when I am indecisive between using async or mailbox for certain parallelism/concurrency scenarios).

@rmunn

when you run a function that returns an Async, the Async has not started yet.

If it is clear that task CE is a helper CE for .net application/librarys that predominantly manage the starting of their own tasks, task CE would just be an async function pipeline, while Async in its own right can manage the whole pipeline of starting, parrallel, async etc. The task CE would not have any parallelism built in (for/while in Async CE are not parallel). This narrows the use case down to working with .net libraries where no Async is needed as library manages task lifecycle. I agree that task CE is not a substitute for extensive functionality Async, would have a specific use case of .net task library interop without the expensive conversions back and forth to Async that are unneeded.

@ReedCopsey

A true implementation to handle Task and ValueTask probably should, ideally, use the functionality from C#'s task like types.

Would that be needed for just a task CE or to impliment a Task Module with comparable fucntionality to Async? If the later, I do agree it would be a big job. If the former, just the task CE, you will know better then me but CE overloads & type inference is a bit limiting now, when trying to overload task CE with plain Task as well as standard Task<'T>, the compiler complains that some bindings need type annotaitons as types cannot be inferred. if the CE bind was an inline function and not a class method overload, we could put type constraint on the bind type ('T when 'T : (member GetAwaiter : unit -> Awaiter<'U>)), but if we are focusing on CE, I dont know if we can improve type inference from type constraints as CE uses method overloads and not inline function ... if there is a way, I would greatly appreciate some direction?

@0x53A
Agreed but I think perhaps we make the default simple task CE ContextInsensitive as if people want to handle parrallel compute on shared state/data, they should be using alternaive mechanism unless GUI. One of the motivations for introducing a task CE would be for easy interop with ASP.NET core and it should be noted, there is no sync context in ASP.NET core, they did away with it for obvious performance reasons and the lack of GUI requirements etc, I would focus on just the insensitive one now and then perhaps introduce additional synctask {} after if GUI (etc) people want it?

@buybackoff

Maybe someone more experienced with CEs could just rewrite the task builder in more efficient way using awaiters and callbacks.

This is very helpful for team building it to know ahead of time, if for and while loop can perhaps use lightweight recursive task unwrapping or somehow reuse task completion source, would help reduce overhead you identified.

@gerardtoconnor

This comment has been minimized.

Copy link

gerardtoconnor commented Jun 19, 2017

Ideally, if we could bind based on the member GetAwaiter() we could do something like below (and use on Task<'T>, ValueTask<'T> and any other awaiter) but I get a compliler error saying :

could not be generalised because it would escape scope

   // General Task type case that can accept Task<'T>, ValueTask<'T> and any other custom awaiter type 
   member this.Bind<'T,'S,'U when 'T : (member GetAwaiter : unit -> TaskAwaiter<'S>)>(m:'T, f:'S->Task<'U>) : Task<'U> =
      let a = m.GetAwaiter()
      if a.IsCompleted then a.GetResult() |> f
      else
            let tcs = TaskCompletionSource<Task<'U>>()
            let t = tcs.Task 
            a.Completed(fun () -> tcs.SetResult( a.GetResult() |> f ))
            t.Unwrap()
@isaacabraham

This comment has been minimized.

Copy link

isaacabraham commented Jun 19, 2017

@matthid but aren't the majority of Task-returning methods in the BCL "hot" tasks? In which case it would be pretty difficult to reason about what "kind" of a task you already have. In this respect Async is very consistent.

@dustinmoris

This comment has been minimized.

Copy link
Author

dustinmoris commented Jun 23, 2017

This is true, what many already pointed out, Task cannot replace Async in F#, because having the control of when an async computation gets started makes more sense in a functional language. The Task support would have to be an addition to better interop with other .NET code.

@Tarmil

This comment has been minimized.

Copy link

Tarmil commented Jun 26, 2017

One of the most important differences between Async and Task, and this would need to be well documented in the docs for the task CE, is that when you run a function that returns an Async, the Async has not started yet. Whereas by the time you're holding a Task object, the Task is already running.

Not only has the Async not started yet, but you may never start it, or start it multiple times. Conceptually Async<'T> is isomorphic to unit -> Task<'T>.

@dsyme dsyme changed the title Native support for Task and Task<'T> Native support for task { ... } Nov 16, 2017

@dsyme

This comment has been minimized.

Copy link
Collaborator

dsyme commented Nov 16, 2017

I am marking this as approved. I would love to see an RFC and proposed implementation for this, with the people in the community who need it taking a strong lead on it?

@gerardtoconnor

This comment has been minimized.

Copy link

gerardtoconnor commented Nov 16, 2017

I think one of the best implementations to date, that we adopted for Giraffe, is rspeele/TaskBuilder.fs as it can accept Task-like objects like Task<'T>, Task & ValueTask<'T> without complier having inference issues thanks to extension methods approach.

Robert has implemented using AsyncTaskMethodBuilder similar to how c# does async/await so better then chaining ContinueWith, but with little added overhead of allocations on DU etc. There may be some room to improve slightly using structs and using IAsyncStateMachine SetStateMachine for pauses but the current number/size of refs being copied etc would make it slower so this would need further testing & investigation in line with how C# async/await is able to overcome this.

@NinoFloris

This comment has been minimized.

Copy link

NinoFloris commented Nov 16, 2017

I've had a few conversations about it with different people spanning half a year or so @cartermp @gerardtoconnor @ReedCopsey
It's great to see this happening, as it's a painful point to pleasantly interop with increasingly bigger portions of 'asyncified' .NET without anything in the box. So whatever gets into core is already very welcome when viewed from a no hassle compat standpoint.

On the performance end we've used some ugly hacks in parts of our codebase to try get closer to C# speed where needed but it's difficult to get on par with exec time and allocs. We got close enough for our liking but the allocs are still substantially higher than C#'s

We probably never quite will because there's just a lot more closure capturing etc going on vs having the compiler rewrite the method body to goto's for each step and it also generating correctly shaped types where all locals can reside in unboxed form while on the heap.
It's also feels like it's not something to focus on in the first iteration, for fear of losing steam, as I'd like to see it merged in yesterday already ;)

Nevertheless I'm very excited to see a green light for it and as we (my company) feel the pain I'm eager to help!

@dsyme

This comment has been minimized.

Copy link
Collaborator

dsyme commented Nov 16, 2017

I think one of the best implementations to date, that we adopted for Giraffe, is rspeele/TaskBuilder.fs as it can accept Task-like objects like Task<'T>, Task & ValueTask<'T> without complier having inference issues thanks to extension methods approach.

I took a look and it is great. Very impressed. Would love to see this as the basis for a PR

I also wonder about a task { ... } that propagates cancellation tokens implicitly (and am concerned by the implicit bind to Async using StartAsTask without a cancellation token)

@dsyme dsyme added the needs rfc label Dec 1, 2017

@buybackoff

This comment has been minimized.

Copy link

buybackoff commented May 23, 2018

The TaskBuilder.fs has a contrived benchmark of reading from file system. It's real IO - so why no just use native F#'s async that is slow for CPU-intensive tasks - performance is irrelevant when network/disks are touched, it is already slow. TaskBuilder.fs still allocates a lot on a hot path of synchronous completion. My hot path is intentionally synchronous for default implementation - but that is the point: I try to make it so + if @stephentoub et. al. have spent a great deal of work to make C# shine and to short-circuit syncronous cases - then why it doesn't reflect on F#!?

In TaskBuilder.fs, making Step a struct and other tricks just make the whole thing up to 5x slower. Have spent several hours y'day to tweak it but failed (warn: trash there) and opted to rewrite async parts in c#.

We do need native F# task{} modeled after C# - structs, no allocations on synchronous path... It's not that F# is "bad" but TPL stack (as is awailable from C#) is so awesome and becoming more and more so - please make F# even with that!

Maybe someone could show me, please, where to start/play on this? I'm trying to grok Rust's async story that is evolving fast and, frankly, didn't look into low level async state machines before Rust. But want do learn how that works across languages and platforms.

@dsyme

This comment has been minimized.

Copy link
Collaborator

dsyme commented May 24, 2018

@buybackoff It's useful feedback: basically you're saying that if we add native Task support for F# then it should be done in the same way as C# to get full performance parity. Rather than just adding the TaskBuilder task { ... } to the FSharp.Core library (we might still add task { .. } but give it a special compilation, as is done for seq { ... }).

After also looking at async debugging I can see the wisdom in that approach. That's a big job, but perhaps it's feasible since it would just be emitting code as C# does, wee would be following an established spec.

@buybackoff

This comment has been minimized.

Copy link

buybackoff commented May 24, 2018

@dsyme

basically you're saying that if we add native Task support for F# then it should be done in the same way as C# to get full performance parity.

Yes, I see no reason why to differ from TPL. There is already async {} that differs in behavior. task {} should be complied to the same IL that C#'s async and reuse state machines - there is so much optimization work done there and F# should leverage it. E.g. in .NET Core 2.1 socket API with ValueTask is specially optimized for minimum allocations. It would be very hard to justify using F# for network code if it allocates on every received packet. And the problem is that even if the networking code is written in C#, F# cannot consume it without allocations since they happen on every await (let!/do!). With DUs as structs and value tuples built in the language F# is well suitable for high performance synchronous code and micro-optimization. inline keyword often helps to inline lambdas and that is missing in C#. The expensive bridge between Tasks is the only thing that remains popping up in profilers.

Just adding TaskBuilder.fs is kind of meaningless since anyone could add that file to existing projects. In TaskBuilder.fs I tried to make Step a struct and pass it by reference everywhere, because just making it a struct is slow likely due to excessive copying. But it was impossible to finish that because byref cannot be a field or passed to closures. Maybe your recent addition of ref structs will allow to rewrite TaskBuilder.fs in more efficient way.

@zpodlovics

This comment has been minimized.

Copy link

zpodlovics commented May 30, 2018

Is there any reason why NOT solve task support with full performance parity to C#? It's a known and solvable problem, and there is a recepie how to solve it. Unless there is a better, more performant solution, with less allocation, more optimization opportunity why not adapt the usable / working in real-world approach?

I tried the same struct approach with TaskBuilder.fs due the excessive memory allocations. However due the different semantics, highly concurrent code with struct will crash horribly under high load (heap corruption during the execution, that will cause issues later). With no debugger support (only SOS + WinDBG / LLDB), it's nearly impossible to reproduce, examine and fix these problems. Good luck examining the on-the-fly generated code at IL/ASM and raw byte level. While the internals of CoreCLR could be interesting and fun, debugging these concurrency/corruption issues for weeks is not (been there, done that).

I am sorry, but I have to say, that the current F# task solution is not really supportable in real-world software. I guess the problem comes from the fact that there is lot less "dogfooding" with F# than C#. It's far better to have a static code generated "dumb" code (aka C# approach), that is relatively easy to understand and (relatively) easy to examine and debug and support, than a complex and elegant one, that is not supportable in real-world...

F# should have lot more focus on performance, especially on avoiding garbage generation (both heap and stack). Sooner or later every business will have this question: why we leave 2x-3x improvements (cost/performance/whatever) on the table by using X instead of Y?

@NinoFloris

This comment has been minimized.

Copy link

NinoFloris commented May 31, 2018

Reading through https://news.ycombinator.com/item?id=17191087 I came across yet another comment on F#s painful TPL integration.

I'm also seeing problems with new hosting primitives being used in F# without understanding the limits of the TaskBuilder CE.

For instance take that persons example of the new Channels<T> api introduced in 2.1
https://gist.github.com/AlgorithmsAreCool/b0960ce8a3400305e43fe8ffdf89b32c

Specifically:

   private static async Task ListenToChannel(ChannelReader<string> reader)
        {
            //because async methods use a state machine to handle awaits
            //it is safe to await in an infinte loop. Thank you C# compiler gods!

            while (await reader.WaitToReadAsync())//if this returns false the channel is completed
            {

...etc

Doing such an endless while loop in the TaskBuilder CE however will at some point OOM due to an endlessly growing Step continuation depth. This stuff leads to extremely nasty bugs. 1) They'll usually only show up after prolonged use (i.e. production) 2) The exception will of course happen in an unrelated area all together.

Code like this is going to be written so much more with the unstoppable rising popularity of the TPL, asp.net generic host api for web and non web usage, and other async message models like actors/channels/etc.

There are all these correctness and performance gotchas with a non native implementation that will hurt F# viability as these usages grow in popularity.

@cartermp

This comment has been minimized.

Copy link
Member

cartermp commented May 31, 2018

There are all these correctness and performance gotchas with a non native implementation that will hurt F# viability as these usages grow in popularity.

Agree 100%. Although this is not something we have the capacity to take for F# 4.5, I'm very strongly in favor of doing so in the release afterwards.

@dsyme

This comment has been minimized.

Copy link
Collaborator

dsyme commented May 31, 2018

Is there any reason why not solve task support with parity to C#?

Yes, I think we're all agreed on this approach now.

FWIW I find async still much simpler and easier to get correct (particularly since you don't plumb the cancellation tokens explicitly, and particularly because of cold start), so I'm happy to continue to make it the focus, just as immutable linked lists are the focus for F#. Async is also sufficiently performant for the vast majority of applications outside of systems programming.

But I'm also happy to "approve-in-principle" inbuilt support for task that goes as close as possible to C# code output. The exact details would need to be ironed out.

@buybackoff

This comment has been minimized.

Copy link

buybackoff commented Jun 1, 2018

The key point not to miss is ValueTask with its hot path and the bridge (lack of) between F# code and TPL state machines. E.g.

open CsharpPipe
...
let loop () =
   task {
       while true do
          let! message : 'T = csharpPipe.ReadAsync() // () -> ValueTask<'T>
          let result: 'U = fsharpProcessor(message)
          do! csharpPipe.WriteAsync(result) // 'U -> ValueTask
   }

As @davidfowl kindly explained recently, async endless loop just use the same chunky state machine (aspnet/KestrelHttpServer#2620 (comment)):

Usually, if you can make the async state machine chunky (large long running async loops) then it's fine. The thing that ends up killing perf usually are short lived async state machines (the overhead of the async machinery might be too high for a short lived async method). This usually means receive loops are fine, but writing isn't.

If the bridge cannot be seamless and a long async pipeline with F# and C# parts cannot be parts of the same state machine and every iteration will allocate something or be slow - then the effort of native task{} would be useless.

But even as of now, another pattern is quite possible from C#:

using FSharpProcessor;
...
public async Task Loop() 
{
      while (true) 
      {
          var message = await csharpPipe.ReadAsync();
          var result = fsharpProcessor.Invoke(message);
          await csharpPipe.WriteAsync(result);
      }
}

I see little difference in usability. E.g. F# could be used for synchronous processing of messages and keep some complex (logical) state machine for message processing/decision making, and C#'s part could consult F# code but do the efficient work of message delivery.

for the vast majority of applications outside of systems programming

Just realized I am mostly doing "kind of" system programming lately to deliver data for analysis as fast as possible, unfortunately without much complex algos where F# could shine. My use case is the pattern I described above. And it's just fine not trying to use F# everywhere just for the sake of it, but it could be plugged into any processing pipeline easily now.

Maybe this native task {} is not a big deal at all... There are some other parts in F# that by default does not make friends with JIT or generate non-intuitive IL, e.g. the GenericEqualityComparer as a method call is very frequent unexpected fat (in %) line in profiler, when the types in question are primitive or implement IEquitable<T>. It is just too easy to use =.

I now think task {} is an interesting idea but given the potential huge amount of effort it shouldn't be a priority while there are known issues with synchronous code (and of cause tooling!).

@dsyme

This comment has been minimized.

Copy link
Collaborator

dsyme commented Jun 1, 2018

If the bridge cannot be seamless and a long async pipeline with F# and C# parts cannot be parts of the same state machine and every iteration will allocate something or be slow - then the effort of native task{} would be useless.

I'm not sure I understand the overall comment (and I'd like to). For the kind of very-allocation-sensitive code you are writing then your F# code would just use task { ... } everywhere, wouldn't it? So there would be no bridging.

@zpodlovics

This comment has been minimized.

Copy link

zpodlovics commented Jun 1, 2018

I guess he means the C# async implementation with AsyncTaskMethodBuilder. The builder will create a state machine - preferrably with small mutable state data - heap allocated:

"When IO is actually async (because read data is not yet available or the response buffers are full), seeing async allocations is completely expected. Since we don't want to block a thread, we have to move the continuation data off the stack and onto the heap."

However unless the F# somehow manages to merge to the same "generated" async state machine, and the end result is a big long running state machine, the second state machine - have to created (and allocated on the heap to continuation data) on each message (which is useless for performance due the allocations):

" I would focus on the per message allocations. Usually, if you can make the async state machine chunky (large long running async loops) then it's fine. The thing that ends up killing perf usually are short lived async state machines (the overhead of the async machinery might be too high for a short lived async method). This usually means receive loops are fine, but writing isn't."

Single state machine pseudocode with one long running allocation for the states:

let state = new CompiledStateMachine()
while(true) 
 let! input  = in.ReadAsync()
 let! newData = f input
 do! = out.WriteAsync newData
 state.MoveNewState()

Multi state machine pseudocode with one long running allocation for the main state machine and per message state machine allocation for each received message:

let state = new CompiledStateMachine()
while(true) 
 let! input  = in.ReadAsync()
 let messageProcessingState = new MessageProcessingCompiledStateMachine()
 let! newData = f input messageProcessingState
 do! = out.WriteAsync newData
 state.MoveNewState()

And using sync methods in fully async context is the worst thing that you can do:

" To expand on @davidfowl's point, while theoretically synchronous I/O could reduce async machinery related allocations at the cost of blocking threads (which use a lot of memory in their own right mostly for the stacks), this is not the case for ASP.NET since it written from the ground up using async/await meaning the async-related allocations will happen regardless.

When you use the synchronous Stream APIs with ASP.NET, you're basically adding hidden Task.Wait()s everywhere. We call this sync-over-async, and it gives you the worst of both worlds. In 2.0, we actually wanted to completely remove support for the synchronous Stream APIs in ASP.NET, but this broke too many things.

@buybackoff

This comment has been minimized.

Copy link

buybackoff commented Jun 1, 2018

@dsyme

For the kind of very-allocation-sensitive code you are writing then your F# code would just use task { ... } everywhere, wouldn't it? So there would be no bridging.

Unless IO stack is written in F# completely, there is always a point to cross between F# and TPL. This is not so much a language issue per se, but the fact that the current IO stack is using TPL with it's abstractions and concrete implementations, they managed to make it async and allocations-free on hot paths, yet F# cannot generate code that is an integral part of those async state machines implementation. I see the issue is not "be like C#", but natively support .NET async infrastucture.

@zpodlovics

However unless the F# somehow manages to merge to the same "generated" async state machine, and the end result is a big long running state machine, the second state machine - have to created (and allocated on the heap to continuation data) on each message

Exactly that was my point, async pipline should be as long as possible. If C#'s async method has 10 steps that are merged into a single state machine, and then F# awaits (let!) it and then does other 10 async steps inside task{} - the end result should be a single state machine with 20 steps, not 10+10, or is the worst case 10 + 1 + 1 .. + 1.

If that goal is technically or economically (time/human resources) not feasible then TaskBuilder.fs would be just fine to make it "just work fast enough" when needed. Even loops in Kestrel allocate now, but objects are small and GC0-collectable.

very-allocation-sensitive code you are writing

very is not possible in .NET in principle for a general-purpose code. I think .NET currently is perfectly pragmatic for most workloads. Trying to make it absolute performance winner will make it C/Rust, trying to make it pure FP will make it JavaScript :)

@gerardtoconnor

This comment has been minimized.

Copy link

gerardtoconnor commented Jul 27, 2018

In relation to the implementation for this, I would prefer if we could fix properly and make Task a first class citizen in the Fsharp compliler with a specialised Task CE like async & query.

To do this we should/could adopt the async/await model for nested task CEs, This uses a pre-allocated struct statemachine that is boxed back into the AMB at the end of a run on waits but needs to have fields available for all the captured values though out the entire pipeline of tasks.

How this differs from how the current TaskBuilder bindings work is that all continuations will be Action, functions without input, as the continuation capture would automatically include the task awaiter as a captured variable also, allowing the binding action to just be a assignment from the awaiter.GetResult().
In addition, rather then having one capture per bind, we are effectively capturing all code blocks for the whole pipeline in a single SM, progressing the calling of each Invoke#() as state progresses. the mechanics of the SM & AMB allow the SM struct to be boxed in and out of its slot efficiently so no loss of capture data/refs

Just want this to be a jump off point so please feel free to suggest better alternatives but I think it is best to keep in line with the C# async/await pattern to ensure we benefit from updates in future

module Task 

// initial example of a complex (pointless) task CE that needs to be wrapped up in a state machine
let get url1 url2 = task {
    let! res = HttpClient.GetAsync(url)
    if res.StatusCode = 200 then
           return res
    else
           return!  HttpClient.GetAsync(url)
}

let main () = task {
    let captureVariable1 = "captured value one"
    let! bind1 = task {
            use conn = SqlConn("connectionstring")
            let! DBData = conn.Query<MyType>("SELECT …")
            let captureVariable2 = "captured value two"  
            <expr uses captureVariable2>
            let! bind2 = task {
               let captureVariable3 = "captured value three"
               let result = 0
               for data in DBData do
                      Result  <- result + data.count 
                      let! res = Get data.url1 data.url2
                      Do! HttpClient.Post(capturedVariable3,res)
               return result
            }
            if bind2 > 0 then
               return true
            else
               return false
    }
    if bind1  then
           return! HttpClient.Post(capturedVariable1,"success")
    else
           return! HttpClient.Post(capturedVariable1,"failure")
    return ()       
}

////// Inlined and flattening the nested task CE's would look something like below, allowing removal of awaits on child computations

let main () = task {
    let captureVariable1 = "captured value one"
    use conn = SqlConn("connectionstring")
    let! DBData = conn.Query<MyType>("SELECT …")
    let captureVariable2 = "captured value two"  
    <expr uses captureVariable2>
    let captureVariable3 = "captured value three"
    let result = 0
    for data in DBData do
        Result  <- result + data.count 
        let! res = HttpClient.GetAsync(url)
        if res.StatusCode = 200 then
            Do! HttpClient.Post(capturedVariable3,res)
        else
            let! res' = HttpClient.GetAsync(url)
            Do! HttpClient.Post(capturedVariable3,res')
    
    if result > 0  then
        do! HttpClient.Post(capturedVariable1,"success")
    else
        do! HttpClient.Post(capturedVariable1,"failure")
    return ()
}

// a specialised StateMachine type created for the pipeline by compiler
type StateMachine'filename@# =
    struct
        val AMB : AsyncMethodBuilder<unit> // AMB return type for SM is fixed as initial Task type we need to return task for
        val mutable state : int
        
        // all captured variables in pipeline pre-allocated (potential for slot re-use where new variable needed after one finished with)
        [<DefaultValue>] val mutable capturedVariable1 : string
        // captureVariable2 is not used after a bind so not captured at any point                 
        [<DefaultValue>] val mutable capturedVariable3 : string
        [<DefaultValue>] val mutable result : int
        [<DefaultValue>] val mutable url : string
        [<DefaultValue>] val mutable res : HttpResponse

        // all enumerators preallocated
        [<DefaultValue>] val mutable DBData : Enumerator<MyType>

        // all unique awaiters in the pipeline pre-allocated (only one awaiter can be awaited at once so slot re-use fine)
        [<DefaultValue>] val mutable awaiter1 : TaskAwaiter<MyType seq>
        [<DefaultValue>] val mutable awaiter2 : TaskAwaiter<HttpResponse>
        [<DefaultValue>] val mutable awaiter3 : TaskAwaiter

        // all disposables pre-allocated
        [<DefaultValue>] val mutable sqlConn : IDisposable

        member inline x.Invoke0() =
            x.captureVariable1 <- "captured value one"
            x.sqlConn <- SqlConn("connectionstring")
            let work = conn.Query<MyType>("SELECT …")
            x.awaiter1 <- work.GetAwaiter()
            
            state <- 1
            x.AMB.AwaitUnsafeOnCompleted(&x.awaiter1,&x)

        member inline x.Invoke1() =
            x.DBData <- x.awaiter1.GetResult().GetEnumerator() // get result integrated into the method
            // not is similar to how bind works already with captures
            let captureVariable2 = "captured value two"  
            <expr uses captureVariable2>
            // binding of TaskCE (bind2) is inlined/flattened
            x.captureVariable3 <- "captured value three"
            x.result <- 0

            state <- 2  // for safety set state to two althrough already calling it
            x.Invoke2() // start loop

        member inline x.Invoke2() =    // start looping block
            if x.DBData.MoveNext() then
                x.result  <- x.result + x.DBData.Current.count
                // binding of TaskCE (res) is inlined/flattened
                let work = HttpClient.GetAsync(x.DBData.Current.Url1)
                x.awaiter2 <- work.GetAwaiter()

                state <- 3
                x.AMB.AwaitUnsafeOnCompleted(&x.awaiter2,&x)
            else
                // loop finished so execute following block
                if x.result > 0  then
                    let work = HttpClient.Post(x.capturedVariable1,"success")
                    awaiter3 <- work.GetAwaiter()

                    state <- 5
                    x.AMB.AwaitUnsafeOnCompleted(&x.awaiter3,&x) //re-using awaiter3 as same plain awaiter type
                else
                    let work = HttpClient.Post(x.capturedVariable1,"failure")
                    awaiter3 <- work.GetAwaiter()
                    
                    state <- 5
                    x.AMB.AwaitUnsafeOnCompleted(&x.awaiter3,&x) //re-using awaiter3 as same plain awaiter type

        member inline x.Invoke3() = 
            let res = x.awaiter2.GetResult()
            if res.StatusCode = 200 then
                let work = HttpClient.Post(x.capturedVariable3,res)
                x.awaiter3 <- work.GetAwaiter()   //re-using awaiter3 as same plain awaiter type

                state <- 2  // finished loop block so go back to start of loop block enumeration
                x.AMB.AwaitUnsafeOnCompleted(&x.awaiter3,&x)

            else
                let work = HttpClient.GetAsync(x.DBData.Current.Url2)
                x.awaiter2 <- work.GetAwaiter() //re-using awaiter2 as same HttpResponse awaiter type

                state <- 4
                x.AMB.AwaitUnsafeOnCompleted(&x.awaiter3,&x)
                

        member inline x.Invoke4() = 
            let res' = x.awaiter2.GetResult()
            let work = HttpClient.Post(x.capturedVariable3,res')
            x.awaiter3 <- work.GetAwaiter()  //re-using awaiter3 as same plain awaiter type
            
            state <- 2 // finished loop block so go back to start of loop block enumeration
            x.AMB.AwaitUnsafeOnCompleted(&x.awaiter3,&x)

        member inline x.Invoke5() = 
            x.sqlConn.Dispose()
            x.AMB.SetResult( () )

        // the state machine interface is 
        interface IAsyncStateMachine with
            x.MoveNext() =
                match x.state with
                | 0 -> x.Invoke0()
                | 1 -> x.Invoke1()
                | 2 -> x.Invoke2()
                | 3 -> x.Invoke3()
                | 4 -> x.Invoke4()
                | 5 -> x.Invoke5()
                | _ -> ()
            x.SetStateMachine sm = x.AMB.SetStateMachine sm

        new StateMachine'filename@#(amb) = { AMB = amb , state = 0 }
    end

/// then the replaced complied code for the CE looks like

let main () =
    let amb = AsyncMethodBuilder()
    let mutable sm = StateMachine'filename@#(amb)
    amb.Start(&sm)
    amb.Task
@NinoFloris

This comment has been minimized.

Copy link

NinoFloris commented Feb 28, 2019

I wrote http://github.com/crowded/ply a while back, it's not on Nuget yet but it would serve as the best starting point for this endevour.

It is super close to async await in terms of cpu time and I have a general outline below of what I think is needed to get Ply and CEs in general super close to C# async await in terms of allocations.

These are the sources of allocations in Ply:

  • An allocation per CE flow for an initial function value to pass into our StateMachine. This is for the purpose of running the entire CE body in an execution bubble (ExecCtx and SyncCtx). It's done by the AsyncTaskMethodBuilder (it uses internal setters on Thread) before it calls MoveNext.
  • Every bind continuation needs to be forced to a function value as we have to carry them around in our return value.
  • The way we have to thread the concrete awaiter type (to get maximum optimization by CoreFX) means we need an existential, on the CLR the only way to get those is to allocate.

I'm presenting language suggestions below which are all thought of in the spirit of maximum generality, possible of improving any CE. If we see these suggestions are a no-go we basically have to do a compiler level CE rewriting pass for task. Due to reasons I go into below such a rewriting has to happen inside fsc, FSharp.Core optimizations are not enough to get to the result we need. I'd really like to avoid that but nevertheless either path comes with quite some complexity and a maintenance burden on the compiler team.

As far as I know not even seq needs to have such invasive optimization to be fast as task. Simply because task is continuously dependent on forcing functions to real runtime values (read allocations). There is no weaseling about the meaning of a value in terms of optimizing those allocations away. Such a pattern is truly hard to optimize for at compile time.

To remove the existential allocation I have a prototype here: https://github.com/crowded/ply/tree/feature/experimental-state-passing it allocates a new computation builder instance per CE invocation to have it carry around the running AsyncTaskMethodBuilder as instance state. It directly enqueues awaiters in bind through this removing the need for returning an existential at all. It's dirty as it completely goes around monadic binding, but it works...

Beside it being dirty it makes expressing CE actions like Combine awkward. Now that you don't have the previous awaiter + continuation embedded in the monadic value that's passed into Combine you cannot easily produce a combined task. This might be worked around with very little overhead by storing the awaiter + continuation of any Ply on this but I haven't looked very closely yet.

On top of that to remove the other allocations I'd love to see two features to deal with most of that:

  • Struct object expressions with proper inlining optimization to inline away function value allocations.

    This already happens for HoF arguments + inline functions so it's not without precedent. However automatically adding fields to an object expression or using it as an optimization construct may be antithetical to the concept of object expressions. @dsyme what are your thoughts?

    If the function value was a closure, we must include the state reference in a field on the struct. It would allow you to embed the underlying code (or a call to it) of a function value directly into a struct object expression method. Say:

let inline (k: () -> Ply<'K, 'T>) = 
  // valueK is of type IFunction but won't locally lose it's true type so we can use the struct type for generic type constructors or constrained calls. 
  // Invoke actually calls the IL method here, not the allocated k instance Invoke method. 
  let valueK = { struct IFunction with member __.Invoke() = k() }

  // 'K where 'K :> IFunction , _ is inferred to be the actual struct type. 
  // This true type may or may not be surfaced to the user but that's something to discuss.
  Ply<_,'T>(valueK) 
  • Now you might have read the above and asked yourself "if the function value usually is a closure don't we still have to allocate a reference object per function value to carry that state?" and you would be correct. Which is why my second proposal would be to add a generally applicable CE rewriting to the compiler such that all CE continuations share the same 'variable space/closure' value, amortizing the allocations to effectively a single alloc.
    For example ComputationExpressionAttribute(SharedVariableSpace = true)

Sidebar: Instead of these two features a different rewrite could be added BindContinuationsAsStateMachine. It would assemble every piece of bind cexpr into the same function value, with a state increment per invocation. That does make such a function very special, you cannot use it freely as it should only be called once per bind. It would get us half way there but it would still not solve issues around While/For/Try body arguments requiring allocations.

Those two additions to F# would, should, get us within one alloc per entire CE block of async await, leaving only the single variable space allocation. The additions are general in design and applicable to other CEs. Besides CEs I'd say any function value optimizations are extremely useful for any future perf work you might want to do in F#.

Related discussion is struct/value lambdas in C#, specifically the last comment by Bart de Smet where he outlines a similar scheme. Their compiler does a rewriting for ref delegates (which have an implicit byref argument for the closure value) such that at call time these closure values get passed to the delegate.
dotnet/csharplang#1060 (comment)

Which brings me to that last allocation.

  • We could say that it's good enough and do nothing.
  • It could be as 'simple' as having a CE use site flag to change that closure state class to a struct, taking copying as an acceptable trade-off.
  • Or it could be as complex as the rewriting like Bart de Smet employed, which feels like a much more heavyweight addition to the language and the compiler.

Whether either of these get added isn't really important for me personally.
AFAIK Task returing methods in C# still get compiled with a class based state machine carrying all the state, so they would have a similar alloc profile. Only ValueTask returing methods get struct treatment, possibly this distinction was done to avoid runtime generic code bloat for AsyncTaskMethodBuilder.

To conclude, I'm not sure if I would advocate for special one-off compiler level CE rewriting anymore, even if that means we keep that single closure value allocation...

Thoughts?
@dsyme @cartermp @gerardtoconnor

P.S. I feel this is the right path forward but I may be failing to see some crucial thing which is a complete blocker for this approach. It's complex to reason through so please be a critical reader :)

@dsyme

This comment has been minimized.

Copy link
Collaborator

dsyme commented Mar 1, 2019

@NinoFloris It's a fascinating write up, I hadn't seen ply

To conclude, I'm not sure if I would advocate for special one-off compiler level CE rewriting anymore, even if that means we keep that single closure value allocation...

Quick response: In my heart I feel we need a one-off compiler rewrite just for ensuring good debugging alone. And it seems simpler to rely on than Optimizer.fs, given how fragile such thing can be.

But I may change my mind if/when I actually try to implement it!

@NinoFloris

This comment has been minimized.

Copy link

NinoFloris commented Mar 1, 2019

@dsyme Stack traces are actually very pleasant in ply. I made sure of that, alloc optimizations needed quite some inline already and that happened to match up with how I'd like to see the traces too. It's only showing the MoveNext method of the statemachine most of the times. Just like C# would.

Who knows indeed. I know I would generally like some extra power (without going the macro route) in F# CEs to have them be used more without restraint.

@7sharp9

This comment has been minimized.

Copy link
Member

7sharp9 commented Mar 5, 2019

@NinoFloris I wonder if some ast input/output generation could achieve something interesting here?

@NinoFloris

This comment has been minimized.

Copy link

NinoFloris commented Mar 5, 2019

Definitely 🙃 but that would fall under 'macros/code gen' and no end to end .net tool chain exists to do it well, yet. For now it doesn't seem an option worth recommending for general purpose production development.

@7sharp9

This comment has been minimized.

Copy link
Member

7sharp9 commented Mar 5, 2019

@NinoFloris I might be doing something for the Applied F# Challenge so it was just something I was pondering...

@rspeele

This comment has been minimized.

Copy link

rspeele commented Mar 6, 2019

Hey @NinoFloris , awesome work on Ply. I am amazed that you can get performance that close to C# -- I didn't think it could be done without compiler support. By the way, feel free to copy the tests from TaskBuilder.fs if you want'em as a starting point, they aren't much but they did help me out with catching correctness problems a time or two.

I am curious about something you said in a much older comment I didn't see till now:

Doing such an endless while loop in the TaskBuilder CE however will at some point OOM due to an endlessly growing Step continuation depth.

While TaskBuilder.fs definitely doesn't match the performance of C# and clutters stack traces, I do believe it should behave correctly. The case of infinite while loops is something I remember looking at specifically as that was a problem with the older builders based on Task.ContinueWith.

For example this code runs into at least the 100s of millions of iterations with minor fluctuations but no persistent increase in memory usage that I can see:

let one() =
    task {
        do! Task.Yield()
        return 1
    }

[<EntryPoint>]
let main argv = 
    (task {
        let mutable counter = 0
        while true do
            let! x = one()
            if x > 1 then failwith "Mistake"
            counter <- counter + 1
            if counter % 1_000_000 = 0 then
                printfn "%d" counter
    }).Wait()
    0 // return an integer exit code
@NinoFloris

This comment has been minimized.

Copy link

NinoFloris commented Mar 7, 2019

@rspeele Thanks, that means a lot coming from you!

Yeah the benchmarks helped catch mostly type errors too.
I was meaning to add some tests but at least it's running in two pretty sizable codebases in production. And to be frank the awaiter/statemachine code is quite straightforward. The hardest part was bending the compiler to my will ;)

I am curious about something you said in a much older comment I didn't see till now:

I think you can chalk my old comment up as some reasoning error.
I think the while is completely fine, it backs out of the stack after each iteration so that seems fine.

@Angr1st

This comment has been minimized.

Copy link

Angr1st commented Mar 10, 2019

If what @matthid is saying is true, that the compiler itself can choose whether to return a started task or not. Wouldn't it make sense to completly change the underlying way Fsharp handels the async ce to also use the TPL infrastructure to reap maximum performance benefits? Having the async ce be basically a cold task without context and a task ce a hot task, like csharp, without context and ontop a sync ce that also returns a hot task but with context? Could we then also get a syncAsync ce with cold task and context just to handle all cases? I would prefers this to be done by the compiler itself as to not introduce breaking changes in the existing async ce uses today. Is this basically the plan how to solve this or am I missing something?

@cartermp

This comment has been minimized.

Copy link
Member

cartermp commented Mar 10, 2019

@dsyme and I chatted about this topic - giving the compiler knowledge of both async and task ces - and we felt it was certainly worthwhile. I think it would be strange only to do that for one of them. Of course the biggest issue is testing and ironing out a bug tail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.