Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dispatch: group elements into ports/buckets by index #633

Open
Orace opened this issue Nov 2, 2019 · 8 comments
Open

Dispatch: group elements into ports/buckets by index #633

Orace opened this issue Nov 2, 2019 · 8 comments

Comments

@Orace
Copy link
Contributor

Orace commented Nov 2, 2019

I propose a Dispatch method that build up many IEnumerable<T> from one IEnumerable<T>

IReadonlyList<IEnumerable<T>> Dispatch<T>(this IEnumerable<T> source, int portCount, Func<T, int> portSelector);

For each element, the portSelector is evaluated and provide an index, the element is pushed onto port with the given index. We can use -1 as returned index to discard values.

Exemple:

var nums = new [] {0,1,-1,2,-2,3,-3};
var ports = nums.Dispatch(v => v < 0 ? -1 : v%2); // discard negative number, put even number in port 0 and odd numbers in port 1
  • ports[0] yield the sequence {0,2}
  • ports[1] yield the sequence {1,3}

We can add overloads with predefined count of ports that return a tuple:

(IEnumerable<T> First, IEnumerable<T> Second) Dispatch2<T>(this IEnumerable<T> source, Func<T, int> portSelector);
(IEnumerable<T> First, IEnumerable<T> Second, IEnumerable<T> Third) Dispatch3<T>(this IEnumerable<T> source, Func<T, int> portSelector);
@atifaziz
Copy link
Member

atifaziz commented Nov 5, 2019

You can achieve this with GroupBy from LINQ composed with Partition from MoreLINQ:

var nums = new[] { 0, 1, -1, 2, -2, 3, -3 };
var (evens, odds, _) = // note that third set is computed but discarded
    nums.GroupBy(v => v < 0 ? -1 : v % 2)
        .Partition(0, 1, ValueTuple.Create);

@Orace
Copy link
Contributor Author

Orace commented Nov 5, 2019

GroupBy will consume the full sequence on first iteration.

A handcrafted implementation may avoid that.

@atifaziz
Copy link
Member

atifaziz commented Nov 6, 2019

GroupBy will consume the full sequence on first iteration.

So ports is an array of sequences:

  • ports[0] yield the sequence {0,2}
  • ports[1] yield the sequence {1,3}

How's that any different?

@atifaziz atifaziz changed the title [Proposal] Dispatch method. Dispatch: group elements into ports/buckets by index Nov 6, 2019
@Orace
Copy link
Contributor Author

Orace commented Nov 6, 2019

ports is an array of lazy sequences. And with the help of some buffer the source sequence is not enumerate more than needed. Not like GroupBy that will consume the full sequence.

@Orace Orace changed the title Dispatch: group elements into ports/buckets by index Dispatch: group elements into ports/buckets by value Nov 6, 2019
@Orace Orace changed the title Dispatch: group elements into ports/buckets by value Dispatch: group elements into ports/buckets by index Nov 6, 2019
@atifaziz
Copy link
Member

atifaziz commented Nov 6, 2019

Perhaps something is missing in my understanding but I don't see how you can do this with sequences that are lazy. Also, your return value is a read-only list (of supposedly lazy sequence) and a list implies immediate execution semantics.

@Orace
Copy link
Contributor Author

Orace commented Nov 6, 2019

I will try to build a useful example.

Let says that you have the logs of a connection between a client and a server.
All client request start with Q and all server response start with A.
We assume that the response are given in the order of the question.
There is also some noise that start with #.

  Q Hy
  A Hello
  Q What's your name ?
  # the fuzz is buzzing
  Q Ans city ?
  A My name is Bob
  Q And age
  # the buzz is fuzzing
  A I live in Detroit
  A I'm 42
  ...

That is the input of type IEnumerable<string>
We want to match each Q with it's corresponding A.

Let's do it with my original proposition:

var ports = input.Dispatch(s => s.StartWith("Q") ? 0 : s.StartWith("A") ? 1 : -1);
var results = ports[0].Zip(ports[1], (question, answer) => (question, answer));

With an input like "Q", "Q" ..., "Q", "A" where the first answer is at the position N,
if we do port[1].First(), the original input is buffered until a "A" is found. not more.

From this example we can improve the signature:

IReadonlyDictionary<TKey, IEnumerable<TSource>> Dispatch<TKey, TSource>(
  this IEnumerable<TSource> source,
  IEnumerable<TKey> keys, // accepted keys, it's consumed immediately,
  Func<TSource, TKey> keySelector);


// usage

var keys = new[] {'Q', 'A' };
var ports = input.Dispatch(keys, s => s[0]);
var results = ports['Q'].Zip(ports['A'], (question, answer) => (question, answer));

@Orace
Copy link
Contributor Author

Orace commented Nov 11, 2019

some overload of Partition actually implement dispatch with 2 ports.

@Orace
Copy link
Contributor Author

Orace commented Dec 3, 2019

Perhaps something is missing in my understanding but I don't see how you can do this with sequences that are lazy. Also, your return value is a read-only list (of supposedly lazy sequence) and a list implies immediate execution semantics.

It's not full lazy, there are some caches.
Each IEnumerable in the returned list/dictionary is implemented has a queue. When an element is requested on one of the IEnumerable there are two options:

  • If the inner queue has some elements, the first element is removed and returned.
  • if the inner queue doesn't have any element, the source sequence iteration start (or continue from last position) and the elements are dispatched (using the key selector) and enqueued to their corresponding queue (or discarded). The iteration continue until an element matching the requested IEnumerable is found. The matching element is then returned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants