
Conversation

michalmuskala
Member

@whatyouhide
Member

  1. out of curiosity, have you measured the changes to tail recursion?
  2. you renamed many functions to *_list, but then protocol_into is the odd one out: maybe into_protocol then?

@lexmag
Member

lexmag commented Feb 26, 2017

@michalmuskala did you decide, after the style guide tweet, to start spying on me? 🤔
Just kidding. I made very similar changes yesterday in lexmag@7065156.
Let's fix the h and t usage like in that commit.
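For illustration, the rename looks like this (a hypothetical helper, not code from the commit):

```elixir
defmodule NamingExample do
  # before: defp do_reverse([h | t], acc), do: do_reverse(t, [h | acc])
  # after: single-letter h/t pattern variables become head/tail
  def do_reverse([head | tail], acc), do: do_reverse(tail, [head | acc])
  def do_reverse([], acc), do: acc
end
```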

maybe into_protocol then?

I think it's better to have a common prefix. 👍

@whatyouhide
Member

whatyouhide commented Feb 27, 2017

@lexmag you mean list_ and protocol_ then?

@michalmuskala
Member Author

michalmuskala commented Feb 27, 2017

I initially wanted to go with list_ instead of _list, but there were already some helper functions defined using the _list naming and I didn't want to create excessive diffs, so I decided to follow what I found.

@josevalim
Member

I prefer _list because it is easier to find: all_list is more straightforward to find than if we had many functions starting with list_.

@lexmag
Member

lexmag commented Feb 27, 2017

@whatyouhide by common prefix I meant a shared prefix between the helper function and the public function, for example the all_list helper for the all? public function.
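For illustration, the pairing looks roughly like this (a sketch, not the exact PR code):

```elixir
defmodule NamingSketch do
  # Public all?/2 dispatches to the private all_list/2 helper for lists.
  def all?(enumerable, fun) when is_list(enumerable) do
    all_list(enumerable, fun)
  end

  defp all_list([], _fun), do: true

  defp all_list([head | tail], fun) do
    if fun.(head), do: all_list(tail, fun), else: false
  end
end
```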

@whatyouhide
Member

okay, that sounds good :). I agree, let's have _list and into_protocol then :).

@whatyouhide
Member

Ping @michalmuskala? :)

@josevalim
Member

@whatyouhide maybe we should go ahead and merge it, doing the remaining changes ourselves, as this PR can easily get stale.

@michalmuskala
Member Author

michalmuskala commented Mar 4, 2017

I tried benchmarking the change to see if it really gives the expected benefits and saw some surprising results. For zip, as expected, the body-recursive version is faster for small lists (tested with 1_000 elements) and the tail-recursive version is faster for larger lists (tested with 10_000_000 elements); in both cases the difference is 10-15%. But for 100_000 elements, the body-recursive approach is 3 times slower 😱 and I was able to reproduce this result consistently. That's why I stopped working on this, hoping I'd be able to figure out what's going on, but unfortunately I haven't yet.
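For reference, the two shapes being compared look roughly like this (a sketch, not the exact PR code):

```elixir
defmodule ZipSketch do
  # Body-recursive: builds the result while unwinding the call stack.
  def zip_body([h1 | t1], [h2 | t2]), do: [{h1, h2} | zip_body(t1, t2)]
  def zip_body(_, _), do: []

  # Tail-recursive: accumulates in reverse, then reverses once at the end.
  def zip_tail(list1, list2), do: zip_tail(list1, list2, [])

  defp zip_tail([h1 | t1], [h2 | t2], acc), do: zip_tail(t1, t2, [{h1, h2} | acc])
  defp zip_tail(_, _, acc), do: :lists.reverse(acc)
end
```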

@josevalim
Member

@michalmuskala I have seen the same numbers in the past, except I never noticed a drastic slowdown at 100_000 elements. I would expect some slowdowns at certain sizes for maps, due to internal resizing, but not for lists.

We should probably merge the PR with only the renaming and keep the operations tail recursive, especially because Enumerable.reduce is body recursive, so it would probably be best to keep the same properties.

@josevalim
Member

Ping.

@michalmuskala
Member Author

I updated the naming to use head and tail for the variables.

Additionally, there are two new commits:

  • Enum.fetch performs only a single enumerable dispatch instead of the two that were previously the common case
  • Enum.filter/2 and Enum.reject/2 are optimised for lists with an inline function instead of for, which is 10% to 30% faster.

As to body-recursive vs tail-recursive: we already have some body-recursive functions, most notably Enum.map/2. Here's my reasoning:

  • For small collections the performance doesn't really matter.
  • For small to medium-sized collections (up to thousands of items) the body-recursive version is faster. This is the most common case.
  • For large collections the performance loss of body-recursive vs tail-recursive code is 10-15%, which is not a horrible outcome. Handling collections of millions of items is not very common, and with such collections there are usually other issues that lead to implementing the algorithms manually, so the defaults don't matter that much.

@OvermindDL1
Contributor

instead of for, which is 10% to 30% faster.

Why is for not generating the same style of private functions, so that it would have the same speed? I'm fairly sure Erlang's list/binary comprehensions do that, so the speed should be the same.

@michalmuskala
Member Author

Elixir's for handles all enumerables, not only lists.
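For example:

```elixir
# for works on any Enumerable, not just lists:
squares = for x <- 1..4, do: x * x

# a map is enumerated as {key, value} tuples:
keys = for {k, _v} <- %{a: 1, b: 2}, do: k
```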

Additionally, there was a guard erroneously removed from group_by in 99e44a1#diff-6881431a92cd4e3ea0de82bf2338f8eaL1032 - the guard was used for dispatch between current and deprecated function.

case Enumerable.count(enumerable) do
  {:error, _module} ->
    module = Enumerable.impl_for!(enumerable)
    case module.count(enumerable) do
Member

This change is backwards incompatible. It is ok for a module to return ANOTHER module which has the proper implementation. This change also defeats the whole point of returning {:error, __MODULE__}, which is to avoid two protocol dispatches.
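For context, the contract under discussion can be sketched with a toy implementation (the Wrapper struct is hypothetical, not from the PR):

```elixir
# When counting cheaply is impossible, count/1 returns {:error, __MODULE__}
# and Enum falls back to a full traversal via reduce/3.
defmodule Wrapper do
  defstruct [:list]
end

defimpl Enumerable, for: Wrapper do
  def count(_wrapper), do: {:error, __MODULE__}
  def member?(_wrapper, _value), do: {:error, __MODULE__}
  # slice/1 is required on newer Elixir versions; the protocol at the time
  # of this PR did not have it.
  def slice(_wrapper), do: {:error, __MODULE__}
  def reduce(%Wrapper{list: list}, acc, fun), do: Enumerable.List.reduce(list, acc, fun)
end
```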

Member Author

First, the documentation of the Enumerable protocol only talks about returning {:error, __MODULE__}. To me, this means returning anything other than the protocol module is not allowed.

Second, without this, we are doing double dispatch:

  • Enumerable.count/1 does a dispatch and the call returns {:ok, count},
  • in fetch_enumerable/3 we pass Enumerable,
  • fetch_enumerable/3 does another dispatch.

The only time a double-dispatch does not happen in this function is when count returns {:error, __MODULE__}.

Member

@michalmuskala we do a double dispatch only in the worst-case scenario. You do a double dispatch in all cases. If an implementation doesn't want to perform a double dispatch, it can always implement count.

Member

@michalmuskala please revert the changes and we will discuss how to proceed regarding count and member? separately.

Member Author

Sure, I will revert.

Could you explain where the second dispatch happens in my implementation, though? The first is during the impl_for! call, but I can't find the second one.

Member

You are right. There is no double protocol dispatch on this version of the code. Apologies. We should stick with the current version.

Member

And by current version I mean your version.

case Enumerable.count(enumerable) do
  {:error, module} ->
    module = Enumerable.impl_for!(enumerable)
    case module.count(enumerable) do
Member

Same here, please revert.

@josevalim
Member

Thanks @michalmuskala, I have added two final comments.

@michalmuskala michalmuskala force-pushed the enum-refactor branch 2 times, most recently from 01dbf2c to 536e5d2 on March 21, 2017 19:11
----- With input Big (10 Million) -----
Name           ips        average  deviation         median
tail          1.64      610.99 ms    ±13.78%      577.63 ms
body          1.48      675.90 ms    ±10.06%      681.08 ms
for           1.46      687.19 ms    ±12.84%      693.82 ms

Comparison:
tail          1.64
body          1.48 - 1.11x slower
for           1.46 - 1.12x slower

----- With input Middle (100 Thousand) -----
Name           ips        average  deviation         median
tail        201.60        4.96 ms    ±15.12%        4.81 ms
body        199.52        5.01 ms    ±14.03%        4.76 ms
for         178.95        5.59 ms    ±14.50%        5.39 ms

Comparison:
tail        201.60
body        199.52 - 1.01x slower
for         178.95 - 1.13x slower

----- With input Small (1 Thousand) -----
Name           ips        average  deviation         median
body       23.98 K       41.70 μs    ±38.90%       38.00 μs
tail       21.35 K       46.84 μs    ±35.64%       44.00 μs
for        18.64 K       53.63 μs    ±31.60%       50.00 μs

Comparison:
body       23.98 K
tail       21.35 K - 1.12x slower
for        18.64 K - 1.29x slower
It was used for dispatch between the new and deprecated version.
The guard was erroneously removed in elixir-lang@99e44a1#diff-6881431a92cd4e3ea0de82bf2338f8eaL1032
@michalmuskala
Member Author

I reverted the fetch change, and will open a separate PR.

@@ -806,7 +806,7 @@ defmodule Enum do
"""
@spec filter(t, (element -> as_boolean(term))) :: list
def filter(enumerable, fun) when is_list(enumerable) do
for item <- enumerable, fun.(item), do: item
filter_list(enumerable, fun)
Member

I assume we can use :lists.filter/2 here instead of defining an extra function?

Member

Ignore me, we need to handle falsey values.
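For context: Enum.filter/2 keeps any truthy value, while :lists.filter/2 requires the fun to return exactly true or false, so a fun returning nil would crash. A body-recursive helper along these lines would handle it (a sketch, not the exact PR code):

```elixir
defmodule FilterSketch do
  # Keep head whenever fun.(head) is truthy (anything but nil/false).
  def filter_list([head | tail], fun) do
    if fun.(head) do
      [head | filter_list(tail, fun)]
    else
      filter_list(tail, fun)
    end
  end

  def filter_list([], _fun), do: []
end
```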

@lexmag lexmag changed the title Enum refactor Enum refactoring Mar 21, 2017
@lexmag lexmag merged commit f046508 into elixir-lang:master Mar 22, 2017
@lexmag
Member

lexmag commented Mar 22, 2017

Great job @michalmuskala. 💛
