Await: asynchronous evaluation of sequence elements #205

atifaziz · 2016-10-27T11:13:23Z

This PR is just an idea, an RFC to solicit feedback. I'm not even sure this belongs in MoreLINQ as it may expand the scope of the library to cover async scenarios.

var beatles = new[]
{
    "John Lennon",
    "Paul McCartney",
    "George Harrison",
    "Ringo Starr",
};

var http = new HttpClient();
var results = await beatles.SelectAsync(
    async e => await http.SendAsync(new HttpRequestMessage(HttpMethod.Get, "https://en.wikipedia.org/wiki/" + e.Replace(' ', '_'))),
    (e, rsp) => new
    {
        Topic = e,
        rsp.StatusCode,
        rsp.Content.Headers.ContentLength,
        rsp.Content.Headers.ContentType.MediaType,
    });

foreach (var e in results)
    Console.WriteLine(e.ToString());

Output is:

{ Topic = John Lennon, StatusCode = OK, ContentLength = 516857, MediaType = text/html }
{ Topic = Paul McCartney, StatusCode = OK, ContentLength = 754121, MediaType = text/html }
{ Topic = George Harrison, StatusCode = OK, ContentLength = 591170, MediaType = text/html }
{ Topic = Ringo Starr, StatusCode = OK, ContentLength = 476451, MediaType = text/html }

fsateler · 2016-10-27T13:35:40Z

Interesting. Some thoughts, in no particular order:

This method consumes (and executes the tasks of) the entire source sequence. More general scenarios will likely require a maximum number of tasks to be executed at the same time.
If it is named SelectAsync, it should probably mimic Select and return just the TAsync instead of a KeyValuePair.
The method is not streaming, as it will wait for all tasks to finish before yielding the first result. This is another difference from plain Select.

While this is very useful, I'm unsure this is a good fit for MoreLINQ; perhaps it should instead be put in a separate library. Processing of async sequences is likely to require a lot more functions and functionality, and I don't know if this will end up polluting the MoreLINQ library with either more dependencies, or reimplementation of functionality available elsewhere[1]. Perhaps putting this into a separate dll (MoreAsyncLINQ.dll ?), within this same repository makes sense.

[1] For example, I made my own implementation of SelectAsync by building on top of Rx:

public static IEnumerable<TResult> SelectAsync<TSource, TResult>(this IEnumerable<TSource> src, Func<TSource, Task<TResult>> asyncSelector, int maxConcurrency) {
    // Merge iterates eagerly over the source sequence
    // However, Observable.FromAsync will only execute the function when it is subscribed to
    // Therefore, we *must* call the async selector inside the function, otherwise the maxConcurrency is useless
    return src.Select(s => Observable.FromAsync(() => asyncSelector(s))).Merge(maxConcurrency).ToEnumerable();
}

atifaziz · 2016-10-27T17:18:38Z

@fsateler Thanks for all that feedback! The initial implementation was a very poor crack at the problem. The later commits address all of your points & SelectAsync now pretty much looks (signature-wise) and behaves (streams) like Select.

The updated usage now looks like this:

var beatles = new[]
{
    "John Lennon",
    "Paul McCartney",
    "George Harrison",
    "Ringo Starr",
};

var http = new HttpClient();
var results =
    from e in beatles.SelectAsync(async e => new
    { 
        Topic = e,
        Response = await http.SendAsync(new HttpRequestMessage(HttpMethod.Get, "https://en.wikipedia.org/wiki/" + e.Replace(' ', '_')))
    })
    select new
    {
        e.Topic,
        e.Response.StatusCode,
        e.Response.Content.Headers.ContentLength,
        e.Response.Content.Headers.ContentType.MediaType,
    };

foreach (var e in results)
    Console.WriteLine(e.ToString());

> Processing of async sequences is likely to require a lot more functions and functionality,

Not sure. Take a look at the code now.

> …will end up polluting the MoreLINQ library with either more dependencies, or reimplementation of functionality available elsewhere.

That's what I'm afraid of too, which is why I consider this PR as an RFC (should have a label for that?) for now.

> Perhaps putting this into a separate dll (MoreAsyncLINQ.dll ?), within this same repository makes sense.

I don't want to create & maintain another whole library for just one method in the async category. Perhaps a MoreLinq.Sandbox would be better for all ideas that are cooking and unsupported?

> I made my own implementation of SelectAsync by building on top of Rx

Thanks for sharing that and I had done something very similar in the past in projects where I already had a dependency on Rx. However, for projects that don't, I felt it may be possible to implement it simply using existing infrastructure from the framework.
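For comparison with the Rx version, here is a minimal sketch of how a throttled, streaming projection might be built from framework types alone. It is not the code in this PR: the names ThrottledSelectSketch and SelectThrottled are hypothetical, and the sketch assumes that blocking the consuming thread while waiting for results is acceptable. It also keeps running selectors in the background if the consumer stops early or a selector fails.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledSelectSketch
{
    // Hypothetical helper (not the PR's SelectAsync): runs at most
    // maxConcurrency selectors at a time and yields results as they
    // complete, so source order is not preserved.
    public static IEnumerable<TResult> SelectThrottled<T, TResult>(
        this IEnumerable<T> source,
        Func<T, Task<TResult>> selector,
        int maxConcurrency)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        if (maxConcurrency <= 0) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        return _(); IEnumerable<TResult> _()
        {
            var completed = new BlockingCollection<Task<TResult>>();
            var gate = new SemaphoreSlim(maxConcurrency);

            // Starts one selector, waits for it, then publishes the finished
            // task and frees a slot. It never throws: failures stay inside the
            // published task and surface when the consumer reads its result.
            async Task RunOne(T item)
            {
                Task<TResult> task;
                try { task = selector(item); }
                catch (Exception e) { task = Task.FromException<TResult>(e); }
                try { await task.ConfigureAwait(false); }
                catch { /* rethrown to the consumer below */ }
                completed.Add(task);
                gate.Release();
            }

            // The pump walks the source on a pool thread, waiting for a free
            // slot before starting each selector.
            var pump = Task.Run(async () =>
            {
                var pending = new List<Task>();
                try
                {
                    foreach (var item in source)
                    {
                        await gate.WaitAsync().ConfigureAwait(false);
                        pending.Add(RunOne(item));
                    }
                }
                finally
                {
                    await Task.WhenAll(pending).ConfigureAwait(false);
                    completed.CompleteAdding();
                }
            });

            // Blocks until the next result is available; ends once the pump
            // has completed the collection.
            foreach (var task in completed.GetConsumingEnumerable())
                yield return task.GetAwaiter().GetResult();

            pump.GetAwaiter().GetResult(); // surfaces errors thrown by the source itself
        }
    }
}
```

Something like beatles.SelectThrottled(fetch, 2) would then start at most two requests at a time, where fetch stands for any Func<string, Task<TResult>>.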

fsateler · 2016-10-27T18:41:36Z

> I consider this PR as an RFC (should have a label for that?)

There is already a discussion label. Maybe this tag can be used for this too?

> I don't want to create & maintain another whole library for just one method in the async category. Perhaps a MoreLinq.Sandbox would be better for all ideas that are cooking and unsupported?

This sounds like a good idea, but orthogonal to what I proposed.

> > I made my own implementation of SelectAsync by building on top of Rx
>
> Thanks for sharing that and I had done something very similar in the past in projects where I already had a dependency on Rx. However, for projects that don't, I felt it may be possible to implement it simply using existing infrastructure from the framework.

It is certainly possible. But, then we get back to the most important issue:

> > …will end up polluting the MoreLINQ library with either more dependencies, or reimplementation of functionality available elsewhere.
>
> That's what I'm afraid of too

.

> > Processing of async sequences is likely to require a lot more functions and functionality,
>
> Not sure. Take a look at the code now.

What I mean is that mixing async code and sequences is usually more complex than just having a SelectAsync. Why not implement WhereAsync, SelectManyAsync, JoinAsync, etc too? Another example: the code now no longer preserves order of the source sequence. Should there be a new overload that does?
If there is potential for adding more async-related methods, I suspect the amount of reimplementation of stuff already present in other libraries will increase.

fsateler · 2016-10-27T18:49:39Z

MoreLinq/SelectAsync.cs

+            var queue = new BlockingCollection<object>();
+            using (var _ = source.GetEnumerator())
+            {
+                var item = _;


Why this indirection?

atifaziz · 2016-10-27

To stop ReSharper from complaining about potentially accessing a disposed closure. I found this was good enough for now instead of polluting the code with suppression comments.

fsateler · 2016-10-27T18:57:06Z

MoreLinq/SelectAsync.cs

+            if (maxConcurrency <= 0) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));
+            if (selector == null) throw new ArgumentNullException("selector");
+
+            var queue = new BlockingCollection<object>();


Why not use Tuple<ExceptionDispatchInfo, Task<TResult>> instead of object? Then, use an approach like the one in Go or JavaScript: the first item is null if no error occurred, the second item is null if an error occurred.

atifaziz · 2016-10-27

That's exactly how it started out! Don't believe me? See my Throttle implementation that was the basis for SelectAsync. In the end, I felt that Tuple represents a product and not a union/either/choice. With a tuple, you can get invalid combinations & while both approaches need if-ing on the various cases, using a super type like object as a union of the run-time types just seemed better as a shortcut. Ideally I'd have Notification<T> handy because that's exactly what's needed here but I didn't want to go out and add new types as I'm in two minds about adding this to MoreLINQ.
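To make the product-versus-union point concrete, here is a minimal sketch of a closed two-case type. Outcome<T> is a hypothetical name used only for illustration; it is neither Rx's Notification<T> nor anything in this PR. Unlike a Tuple<ExceptionDispatchInfo, Task<TResult>>, an instance holds either a value or an error and cannot hold both or neither.

```csharp
using System;
using System.Runtime.ExceptionServices;

// Hypothetical closed union for illustration; it is neither Rx's
// Notification<T> nor anything in the PR.
public abstract class Outcome<T>
{
    private Outcome() { } // only the nested cases below can derive

    public sealed class Value : Outcome<T>
    {
        public T Result { get; }
        public Value(T result) { Result = result; }
    }

    public sealed class Error : Outcome<T>
    {
        public ExceptionDispatchInfo Info { get; }
        public Error(ExceptionDispatchInfo info) { Info = info; }
    }

    // Returns the value, or rethrows the captured exception with its
    // original stack trace.
    public T GetOrThrow()
    {
        switch (this)
        {
            case Value value:
                return value.Result;
            case Error error:
                error.Info.Throw();
                throw new InvalidOperationException("Unreachable.");
            default:
                throw new InvalidOperationException("Unknown case.");
        }
    }
}
```

A queue of Outcome<T> would need the same case analysis as a queue of object, but the compiler would reject anything that is not one of the two cases.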

fsateler · 2016-10-27T19:07:21Z

> What I mean is that mixing async code and sequences is usually more complex than just having a SelectAsync. Why not implement WhereAsync, SelectManyAsync, JoinAsync, etc too? Another example: the code now no longer preserves order of the source sequence. Should there be a new overload that does?

For avoidance of doubt, I'm not suggesting you should provide these methods for SelectAsync to be useful. The questions are mostly thinking out loud trying to discover if the async-related methods will grow to a sizable number.

atifaziz · 2016-10-27T20:44:56Z

> Another example: the code now no longer preserves order of the source sequence. Should there be a new overload that does?

Select in LINQ to SQL and PLINQ does not preserve order either:

> In PLINQ, the goal is to maximize performance while maintaining correctness. A query should run as fast as possible but still produce the correct results. In some cases, correctness requires the order of the source sequence to be preserved; however, ordering can be computationally expensive. Therefore, by default, PLINQ does not preserve the order of the source sequence. In this regard, PLINQ resembles LINQ to SQL, but is unlike LINQ to Objects, which does preserve ordering.

I think it's understood that there needs to be a balancing act here. If streaming is the greater benefit then order can't be guaranteed and if order is important then you need to follow up with an OrderBy.
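One way to follow up with an OrderBy is to tag each element with its source position before the asynchronous stage and sort on that tag afterwards. The sketch below assumes the asynchronous stage carries the index through unchanged; SourceOrderSketch, WithIndex and InSourceOrder are hypothetical names, not part of MoreLINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SourceOrderSketch
{
    // Tags each element with its position in the source sequence.
    public static IEnumerable<(int Index, T Item)> WithIndex<T>(
        this IEnumerable<T> source) =>
        source.Select((item, index) => (Index: index, Item: item));

    // Restores source order from results that arrived in completion order.
    // OrderBy buffers the whole sequence, so this gives up streaming.
    public static IEnumerable<T> InSourceOrder<T>(
        this IEnumerable<(int Index, T Item)> results) =>
        results.OrderBy(r => r.Index).Select(r => r.Item);
}
```

The trade-off is the one described above: the first result cannot be yielded until every evaluation has completed.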

I have the same concerns as you about the proliferation of other async-enabled operators, which is why I continue to be in two minds about adding this to MoreLINQ. I reckon though that SelectAsync will be the most popular (the primary motivation being use of async lambdas in existing LINQ queries) & the rest may just fall under YAGNI. If anyone is doing anything more sophisticated then they'll probably be using Rx in the first place. Instead of guessing & worrying, I think the right answer to put all these fears to rest may be to have another library for hosting experiments, be that MoreLinq.Sandbox or MoreLinq.Labs. And it doesn't have to have tests or rigorous review.

atifaziz · 2016-10-27T20:54:27Z

Some open points:

What happens to tasks that are in-flight when the iterator is closed or an error is thrown by one of the asynchronous operations?
How would one support cancellation for the entire enumeration? The cancellation token would have to be re-created on each enumeration for the query to have repeatable semantics. Would cancellation ever be needed?

atifaziz · 2016-10-27T21:31:48Z

The IDisposable.Dispose docs clearly state that subsequent calls should be ignored:

> If an object's Dispose method is called more than once, the object must ignore all calls after the first one.

So it is safe (or not incorrect) to assume so.

atifaziz · 2016-10-28T10:15:37Z

> What happens to tasks that are in-flight when the iterator is closed or an error is thrown by one of the asynchronous operations?

There are now new overloads where the projection function receives an additional argument that's a CancellationToken. The cancellation token is signaled when either the iteration is stopped early (e.g. SelectAsync is combined with Take or TakeWhile) or the projection of an element throws an error, thus allowing in-flight tasks to cancel.
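The mechanism can be sketched with an iterator whose finally block signals the token. This is a simplified illustration under stated assumptions, not the implementation in this PR: it starts every evaluation up front with no concurrency limit and blocks the consuming thread, and the names CancellationSketch and AwaitAll are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationSketch
{
    // Hypothetical helper, not the PR's implementation: yields results in
    // completion order and cancels whatever is still running when the
    // enumeration ends for any reason.
    public static IEnumerable<TResult> AwaitAll<T, TResult>(
        this IEnumerable<T> source,
        Func<T, CancellationToken, Task<TResult>> evaluator)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (evaluator == null) throw new ArgumentNullException(nameof(evaluator));

        return _(); IEnumerable<TResult> _()
        {
            // A fresh source per enumeration keeps the query repeatable. It is
            // deliberately not disposed, since in-flight tasks may still
            // observe its token.
            var cancellation = new CancellationTokenSource();
            try
            {
                var running = source.Select(e => evaluator(e, cancellation.Token)).ToList();
                while (running.Count > 0)
                {
                    var finished = Task.WhenAny(running).GetAwaiter().GetResult();
                    running.Remove(finished);
                    yield return finished.GetAwaiter().GetResult(); // rethrows on failure
                }
            }
            finally
            {
                // Runs on normal completion, when an evaluation throws, and
                // when the consumer disposes the enumerator early (e.g. Take).
                cancellation.Cancel();
            }
        }
    }
}
```

Because the finally block of an iterator also runs when its enumerator is disposed, combining this with Take(2) signals the token as soon as the second result has been consumed.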

Below is an example using an overload that sends a CancellationToken to the projection function.

var beatles = new[]
{
    "John Lennon",
    "Paul McCartney",
    "George Harrison",
    "Ringo Starr",
};

var http = new HttpClient();
var results =
    from e in beatles.SelectAsync(async (e, ct) =>
    {
        try
        {
            var url = "https://en.wikipedia.org/wiki/" + e.Replace(' ', '_');
            return new
            { 
                Topic = e,
                Response = await http.SendAsync(new HttpRequestMessage(HttpMethod.Get, url), ct),
            };
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("ABORT! " + e);
            throw;
        }
    })
    select new
    {
        e.Topic,
        e.Response.StatusCode,
        e.Response.Content.Headers.ContentLength,
        e.Response.Content.Headers.ContentType.MediaType,
    };

foreach (var e in results.Take(2))
    Console.WriteLine(e.ToString());

Because we only care about 2 results (note the Take(2) in the foreach loop at the end), one possible output is:

{ Topic = John Lennon, StatusCode = OK, ContentLength = 518001, MediaType = text/html }
{ Topic = George Harrison, StatusCode = OK, ContentLength = 591180, MediaType = text/html }
ABORT! Ringo Starr
ABORT! Paul McCartney

Since SelectAsync does not respect the source order, the output above will vary with each run.

atifaziz · 2018-03-06T19:09:16Z

> but I haven't been able to come up with better alternatives. CollectTasks, Wait, Await don't sound any better to me.

SelectTask or SelectAwaitable also come to mind but I fear they won't bring much more clarity. Is Fetch so bad? Do we really need to make anything about tasks or awaiting obvious in the name?

> BTW, since I never explicitly said this before: I like the overall shape, and I think this can be a useful addition to MoreLINQ.

Cool, and I'd like to work towards moving it out of the experimental space. If we can settle on a name and release it as experimental then there'll be time to work out unit tests and iron out any kinks from battle-testing.

atifaziz · 2018-03-06T21:43:09Z

> CollectTasks, Wait, Await don't sound any better to me.

I realised that it is a bit useless to have a projection function in the first overload. The method can be reduced to simply being an extension of a sequence of tasks (IEnumerable<Task<T>>). See what it gives in 3819d3d. What's also interesting is that it then made sense to simply call the method Await, as one of your suggestions! This is either a good thing or it means that the first overload should be removed because no one in their right mind will use it; it'll waste work if the sequence is not fully consumed. The second overload makes you think about cancellation.

atifaziz · 2018-03-07T08:13:40Z

With 09497dc, I'm proposing to rename SelectAsync to Await entirely. With some careful re-wording, I think it brings about more clarity. That is, SelectAsync was possibly the wrong name all along because the function in the following overload felt like a projection when it's really an evaluation.

public static IAwaitQuery<TResult> Await<T, TResult>(
    this IEnumerable<T> source,
    Func<T, CancellationToken, Task<TResult>> evaluator)

Its purpose is not as much to project (T → TResult) as it is to simply supply or inject the CancellationToken into the evaluation in order to abort it if the iteration is terminated prematurely.

@fsateler What do you think? Am I twisting things to force retrofitting a definition that favours Await as a name or does it also make sense to you, and clarify/scope things better?

atifaziz · 2018-04-04T11:48:07Z

@fsateler Did I miss anything from the changes you requested in your last review? Wondering what's keeping this from shipping as an experiment?

fsateler

I like the new names better.

I think this is ready to go into the Experimental namespace.

atifaziz added 2 commits October 27, 2016 12:57

SelectAsync to asynchronously pair sequence elements with their projections

0944664

Make second SelectAsync overload public

d36166c

atifaziz added 4 commits October 27, 2016 17:43

Fix test build

1afd841

Make SelectAsync like Select and streaming

122a905

Source arg name

689c2d3

Concurrency control

e391074

fsateler reviewed Oct 27, 2016

atifaziz added discussion enhancement labels Oct 27, 2016

Eager disposal of source

fa2148e

atifaziz changed the title ~~SelectAsync to asynchronously pairs sequence elements with their projections~~ SelectAsync to asynchronously project elements of a sequence Oct 27, 2016

atifaziz added 2 commits October 28, 2016 11:08

SelectAsync is lazy so validate args eagerly

1144ae3

Cancellation support in case of error or early termination

b5b6f18

Back to an async awaiter loop so thread can return to pool

07578f8

atifaziz mentioned this pull request Nov 1, 2016

Namespace for experimental operators #209

Closed

atifaziz added 6 commits February 27, 2018 09:56

Merge branch 'master' into SelectAsync

aa671d8

# Conflicts: # MoreLinq.Test/project.json # MoreLinq/project.json

Exclude SelectAsync from .NET Standard 1.0 build

2b86eba

System.Collections.Concurrent is available in .NET Standard 1.1 and above, and we don't want to add a whole new target for that just now since it is covered by the .NET Standard 2.0 target.

Fix arg validation to pass tests

bbe2daf

Use nameof to get arg names

2854b07

Use local function pattern for implementation

403df2d

Add remarks that order is not preserved

34af1fd

atifaziz added 4 commits March 6, 2018 18:17

MaxConcurrency(int) + UnboundedConcurrency()

21314f4

Eager disposals of enumerator in CollectToAsync

6e22d29

Keep TupleComparer member parity with ValueTuple

811b956

Don't continue our awaits on captured context

e377fdb

Simplify first overload to an awaiter

3819d3d

atifaziz added 3 commits March 7, 2018 07:25

Rename to Await

d2857e8

Revert "Rename to Await"

3efb94d

This reverts commit d2857e8 that was partially complete.

Rename SelectAsync to Await & update doc wording

09497dc

This is what commit d2857e8 should have been.

atifaziz removed the discussion label Mar 7, 2018

fsateler approved these changes Apr 5, 2018

atifaziz added 5 commits April 5, 2018 10:25

Fix doc typo

7c9273d

Doc remarks about effects of hot/cold tasks

3bcaa35

Merge remote-tracking branch 'upstream/master' into SelectAsync

23c2ac9

Add Await to list of operators in package description

551ffbb

Add Await to readme

aa9dc5d

atifaziz added this to the 3.0.0 milestone Apr 5, 2018

atifaziz changed the title ~~SelectAsync to asynchronously project elements of a sequence~~ Await: asynchronous evaluation of sequence elements Apr 5, 2018

atifaziz modified the milestones: 3.0.0, 3.0.0 βeta 1 Apr 5, 2018

atifaziz added 3 commits April 9, 2018 09:41

Merge branch 'master' into SelectAsync

7cee7d4

Add await to new ops list in release notes

de2d232

Mark Await as experimental in package metadata

70edcbe

atifaziz merged commit 3e53e03 into master Apr 9, 2018

atifaziz deleted the SelectAsync branch April 9, 2018 09:17

atifaziz mentioned this pull request May 4, 2018

Add AwaitCompletion for await “tasks” #479

Merged

MrSmoke mentioned this pull request Dec 7, 2021

Experimental operators #834

Closed
