
Optimize Enum.map/2 for lists #3811

Merged
1 commit merged into elixir-lang:master from michalmuskala:optimize_map on Oct 1, 2015

Conversation

michalmuskala
Member

When writing an article about lists I ventured into implementing a couple of the core list functions myself. I was surprised to discover that the current implementation of Enum.map/2 is not optimal.

I came up with 3 options:

  • using for comprehensions (currently used)
  • using explicit recursion with accumulator
  • using simple consing

The third option seems to be the fastest. It's also the one used by :lists.map/2.

Tested implementations:

  # Body-recursive: cons the mapped head onto the mapped tail
  def map_cons([head | tail], fun), do: [fun.(head) | map_cons(tail, fun)]
  def map_cons([], _fun), do: []

  # Tail-recursive: build a reversed accumulator, reverse it at the end
  def map_recurse(list, fun, acc \\ [])
  def map_recurse([head | tail], fun, acc), do: map_recurse(tail, fun, [fun.(head) | acc])
  def map_recurse([], _fun, acc), do: :lists.reverse(acc)

  # Comprehension: the implementation Enum.map/2 currently uses
  def map_for(list, fun), do: for i <- list, do: fun.(i)

Results (benchmark and input size, iterations, average time per operation):

cons 100            500000   3.25 µs/op
recurse 100         500000   3.27 µs/op
for 100             500000   4.76 µs/op
cons 1000            50000   30.57 µs/op
recurse 1000         50000   32.59 µs/op
for 1000             50000   43.51 µs/op
cons 10000            5000   308.71 µs/op
recurse 10000         5000   339.36 µs/op
for 10000             5000   449.46 µs/op
cons 100000            500   3135.30 µs/op
recurse 100000         500   4178.06 µs/op
for 100000             500   5025.42 µs/op

Full testing code

The performance gain is steadily around 30%. The difference is not huge (and probably less significant if you do any real work inside the mapping function), but given how often Enum.map/2 is used, I think it warrants the change.

@josevalim
Member

Just to be double sure, please run every benchmark in a separate process. :)


@michalmuskala
Member Author

I must admit I took the lazy path this time: the new testing code

Results:

cons 100            500000   5.62 µs/op
recurse 100         500000   6.47 µs/op
for 100             200000   8.75 µs/op
cons 1000            50000   57.47 µs/op
recurse 1000         50000   59.15 µs/op
for 1000             50000   70.73 µs/op
cons 10000            5000   535.17 µs/op
recurse 10000         5000   573.46 µs/op
for 10000             5000   678.71 µs/op
cons 100000            200   7505.32 µs/op
recurse 100000         200   7814.68 µs/op
for 100000             200   9194.78 µs/op

@josevalim
Member

Yes. Comprehensions have been rewritten to always use Enum.reduce, which is why the for version became slower. If you find any other use of comprehensions in Enum, we should change it. Can you please send PRs for any other cases you see? Thank you! :)
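
For context, the for version above roughly expands to something like the following (a simplified sketch of the comprehension-to-reduce rewrite, not the compiler's literal output), which is where the extra cost relative to direct consing comes from:

  def map_for_expanded(list, fun) do
    list
    # build the mapped list in reverse through the reducer...
    |> Enum.reduce([], fn i, acc -> [fun.(i) | acc] end)
    # ...then pay for a second pass to restore the original order
    |> :lists.reverse()
  end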

josevalim added a commit that referenced this pull request Oct 1, 2015
@josevalim merged commit f8e2332 into elixir-lang:master on Oct 1, 2015
josevalim added a commit that referenced this pull request Oct 1, 2015
Optimize Enum.map/2 for lists

Signed-off-by: José Valim <jose.valim@plataformatec.com.br>
@splattael

Just to be double sure, please run every benchmark in a separate process.

@josevalim Interesting, could you explain why? Should we always run benchmarks in separate processes?

@michalmuskala
Member Author

I think the concern was to eliminate the effects of garbage collection. Each process starts with a fresh heap, so garbage collection behaves the same way on every run. When you run everything in the same process it's less predictable, because you can accumulate garbage from previous runs.
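
For anyone wondering what that looks like in practice, here is a minimal sketch of such a per-process runner (hypothetical code in the spirit of the benchp helper mentioned below; the real helper may differ):

  defmodule BenchP do
    # Run fun once per Task so every measurement starts from a fresh heap
    # and cannot inherit garbage from earlier runs.
    def run(fun, times) do
      for _ <- 1..times do
        task =
          Task.async(fn ->
            {micros, _result} = :timer.tc(fun)
            micros
          end)

        Task.await(task, :infinity)
      end
    end
  end

  # Usage: 100 isolated runs of Enum.map/2 over a 10_000-element list
  list = Enum.to_list(1..10_000)
  timings = BenchP.run(fn -> Enum.map(list, &(&1 * 2)) end, 100)
  IO.puts("average: #{Enum.sum(timings) / length(timings)} µs")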

@splattael

@michalmuskala That makes sense. Thank you for the explanation :)
benchp is pretty handy! Maybe it should be added to Benchfella :D

@ericmj
Member

ericmj commented Oct 3, 2015

I'm usually a proponent of body-recursive functions, but in the stdlib we should strive for the highest performance. @michalmuskala did you also measure the tail-recursive version that reverses the list at the end? Tail-recursive should be ~30% faster on x86 according to Erlang's efficiency guide.

@michalmuskala
Member Author

@ericmj I believe tail-recursive is the implementation I called recurse. Its performance is similar to the body-recursive solution I called cons, which I ended up using because it came out slightly faster.
I may be wrong on that. Maybe someone could run those tests on a machine other than mine.

I've read that part of the efficiency guide too. If my benchmarking was correct, it seems both solutions are currently equally performant.

@lexmag
Member

lexmag commented Oct 3, 2015

I think using :lists.map/2 makes it even better: comparable performance and obviously less code.
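
For illustration, the delegating clause could look like this (a hypothetical sketch, not the merged code; note that :lists.map/2 takes the function first and the list second):

  # Hypothetical list clause delegating to Erlang's :lists.map/2
  def map(list, fun) when is_list(list) do
    :lists.map(fun, list)
  end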

@josevalim
Member

In my previous benchmarks, tail-recursive was faster for long collections and body-recursive for shorter ones. :lists.map is also great because we inline its code. They are all great solutions.

@michalmuskala michalmuskala mentioned this pull request Feb 26, 2017
michalmuskala added a commit to michalmuskala/elixir that referenced this pull request Mar 21, 2017
lexmag pushed a commit that referenced this pull request Mar 22, 2017
* Avoid do_ prefixing

* Refactor list functions to be body-recursive where sensible

  The reasoning is the same as in #3811 and 99e0d8e.

* Use recursive functions to optimize filter/reject for lists (a body-recursive sketch follows this commit message)

  ----- With input: Big (10 Million) -----
  Name           ips        average  deviation         median
  tail          1.64      610.99 ms    ±13.78%      577.63 ms
  body          1.48      675.90 ms    ±10.06%      681.08 ms
  for           1.46      687.19 ms    ±12.84%      693.82 ms

  Comparison:
  tail          1.64
  body          1.48 - 1.11x slower
  for           1.46 - 1.12x slower

  ----- With input: Medium (100 Thousand) -----
  Name           ips        average  deviation         median
  tail        201.60        4.96 ms    ±15.12%        4.81 ms
  body        199.52        5.01 ms    ±14.03%        4.76 ms
  for         178.95        5.59 ms    ±14.50%        5.39 ms

  Comparison:
  tail        201.60
  body        199.52 - 1.01x slower
  for         178.95 - 1.13x slower

  ----- With input: Small (1 Thousand) -----
  Name           ips        average  deviation         median
  body       23.98 K       41.70 μs    ±38.90%       38.00 μs
  tail       21.35 K       46.84 μs    ±35.64%       44.00 μs
  for        18.64 K       53.63 μs    ±31.60%       50.00 μs

  Comparison:
  body       23.98 K
  tail       21.35 K - 1.12x slower
  for        18.64 K - 1.29x slower

* Add function guard back to group_by/3

  It was used to dispatch between the new and the deprecated versions.
  The guard was erroneously removed in 99e44a1#diff-6881431a92cd4e3ea0de82bf2338f8eaL1032.
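
The filter/reject item above follows the same pattern as the map change in this PR; as a rough illustration, the body-recursive shape could look like this (a sketch based on the commit's description, not the exact merged code):

  # Keep elements for which fun returns a truthy value, consing as we go
  def filter_list([head | tail], fun) do
    if fun.(head) do
      [head | filter_list(tail, fun)]
    else
      filter_list(tail, fun)
    end
  end

  def filter_list([], _fun), do: []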
@michalmuskala deleted the optimize_map branch on May 12, 2018