Optimize Enum.map/2 for lists #3811

Merged
merged 1 commit into elixir-lang:master from michalmuskala:optimize_map on Oct 1, 2015

Conversation

5 participants
@michalmuskala
Member

michalmuskala commented Sep 30, 2015

When writing an article about lists I ventured into implementing a couple of the core list functions myself. I was surprised to discover that the current implementation of Enum.map/2 is not optimal.

I came up with 3 options:

  • using for comprehensions (currently used)
  • using explicit recursion with accumulator
  • using simple consing

The third option seems to be the fastest. That's also the one used by :lists.map/2.

Tested implementations:

  def map_cons([head | tail], fun), do: [fun.(head) | map_cons(tail, fun)]
  def map_cons([], _fun), do: []

  def map_recurse(list, fun, acc \\ [])
  def map_recurse([head | tail], fun, acc), do: map_recurse(tail, fun, [fun.(head) | acc])
  def map_recurse([], _fun, acc), do: :lists.reverse(acc)

  def map_for(list, fun), do: for i <- list, do: fun.(i)
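
As a sanity check (illustrative, not part of the PR; the module name is made up), all three variants should agree with Enum.map/2 on the same input:

```elixir
# Illustrative equivalence check: the three map variants from above
# should all produce the same output as Enum.map/2.
defmodule MapVariants do
  def map_cons([head | tail], fun), do: [fun.(head) | map_cons(tail, fun)]
  def map_cons([], _fun), do: []

  def map_recurse(list, fun, acc \\ [])
  def map_recurse([head | tail], fun, acc), do: map_recurse(tail, fun, [fun.(head) | acc])
  def map_recurse([], _fun, acc), do: :lists.reverse(acc)

  def map_for(list, fun), do: for(i <- list, do: fun.(i))
end

input = Enum.to_list(1..1000)
double = &(&1 * 2)
expected = Enum.map(input, double)

# Pin-match raises if any variant disagrees with Enum.map/2.
^expected = MapVariants.map_cons(input, double)
^expected = MapVariants.map_recurse(input, double)
^expected = MapVariants.map_for(input, double)
```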

Results:

cons 100            500000   3.25 µs/op
recurse 100         500000   3.27 µs/op
for 100             500000   4.76 µs/op
cons 1000            50000   30.57 µs/op
recurse 1000         50000   32.59 µs/op
for 1000             50000   43.51 µs/op
cons 10000            5000   308.71 µs/op
recurse 10000         5000   339.36 µs/op
for 10000             5000   449.46 µs/op
cons 100000            500   3135.30 µs/op
recurse 100000         500   4178.06 µs/op
for 100000             500   5025.42 µs/op

Full testing code

The performance gain is consistently around 30%. The difference is not huge (and probably less significant if you do any real work inside the mapping function), but given how often Enum.map/2 is used, I think it warrants the change.

Optimize Enum.map/2 for lists
Out of 3 options:
* using for comprehensions
* using explicit recursion with accumulator
* using simple consing

The third option seems to be the fastest. That's also the one that
is used by :lists.map/2
@josevalim

Member

josevalim commented Oct 1, 2015

Just to be doubly sure, please run every benchmark in a separate process.
:)

José Valim
www.plataformatec.com.br
Skype: jv.ptec
Founder and Director of R&D

@michalmuskala

Member

michalmuskala commented Oct 1, 2015

I must admit I took the lazy path this time - the new testing code

Results:

cons 100            500000   5.62 µs/op
recurse 100         500000   6.47 µs/op
for 100             200000   8.75 µs/op
cons 1000            50000   57.47 µs/op
recurse 1000         50000   59.15 µs/op
for 1000             50000   70.73 µs/op
cons 10000            5000   535.17 µs/op
recurse 10000         5000   573.46 µs/op
for 10000             5000   678.71 µs/op
cons 100000            200   7505.32 µs/op
recurse 100000         200   7814.68 µs/op
for 100000             200   9194.78 µs/op
@josevalim

Member

josevalim commented Oct 1, 2015

Yes, comprehensions have been rewritten to always use Enum.reduce; that's why it became slower. If you find any other uses of comprehensions in Enum, we should change them. Can you please send PRs for any other cases you see? Thank you! :)
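
For the curious, a rough behavioral sketch of that equivalence (an illustration, not the compiler's actual expansion): a list comprehension like `for i <- list, do: fun.(i)` behaves roughly like a reduce that conses onto an accumulator and reverses at the end:

```elixir
# Rough behavioral equivalent of `for i <- list, do: fun.(i)`
# built on Enum.reduce. Illustrative only; the compiler's real
# expansion may differ.
map_via_reduce = fn list, fun ->
  list
  |> Enum.reduce([], fn i, acc -> [fun.(i) | acc] end)
  |> :lists.reverse()
end

[2, 4, 6] = map_via_reduce.([1, 2, 3], &(&1 * 2))
```

The intermediate reduce builds the result in reverse, so the final :lists.reverse/1 restores the original order, which is extra work the plain consing version avoids.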

josevalim added a commit that referenced this pull request Oct 1, 2015

@josevalim josevalim merged commit f8e2332 into elixir-lang:master Oct 1, 2015

josevalim added a commit that referenced this pull request Oct 1, 2015

Merge pull request #3811 from michalmuskala/optimize_map
Optimize Enum.map/2 for lists

Signed-off-by: José Valim <jose.valim@plataformatec.com.br>
@splattael

splattael commented Oct 1, 2015

Just to be doubly sure, please run every benchmark in a separate process.

@josevalim Interesting, could you explain why? Should we always run benchmarks in separate processes?

@michalmuskala

Member

michalmuskala commented Oct 1, 2015

I think the concern was to eliminate the effects of garbage collection. Each process starts with a fresh heap, so garbage collection happens at the same points in every run. When you run everything in the same process it's less predictable, since you can accumulate garbage from previous runs.
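
A minimal sketch of that idea (names are illustrative, not from the PR's benchmark code): spawn a fresh process per measurement so each run starts with a clean heap:

```elixir
# Illustrative sketch: run each timed measurement in a fresh process
# so every run starts with a new heap and GC behavior is comparable.
bench_in_process = fn fun ->
  task = Task.async(fn -> :timer.tc(fun) end)
  # :timer.tc/1 returns {microseconds, result}; we keep the time.
  {micros, _result} = Task.await(task, :infinity)
  micros
end

micros = bench_in_process.(fn -> Enum.map(1..10_000, &(&1 * 2)) end)
IO.puts("took #{micros} µs")
```

Because the timed function runs inside the Task's own process, any garbage it produces dies with that process and cannot skew the next measurement.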

@splattael

splattael Oct 1, 2015

@michalmuskala That makes sense. Thank you for the explanation :)
benchp is pretty handy! Maybe it should be added to Benchfella :D

@ericmj

Member

ericmj commented Oct 3, 2015

I'm usually a proponent of body-recursive functions, but in the stdlib we should strive for the highest performance. @michalmuskala did you also measure the tail-recursive version that reverses the list at the end? Tail-recursive should be ~30% faster on x86 according to Erlang's efficiency guide.

@michalmuskala

Member

michalmuskala commented Oct 3, 2015

@ericmj I believe the tail-recursive version is the implementation I called recurse. Its performance is similar to the body-recursive solution I called cons, which I ended up using because it came out slightly faster.
I may be wrong on that. Maybe someone could run those tests on a machine other than mine.

I've read that part of the efficiency guide too - if my benchmarking was correct, it seems both solutions are currently equally performant.

@lexmag

Member

lexmag commented Oct 3, 2015

I think using :lists.map/2 makes it even better: comparable performance and obviously less code.
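
The suggestion amounts to something like the following sketch (the module name is made up; note that :lists.map/2 takes the function as its first argument, the reverse of Enum.map/2):

```elixir
# Sketch of delegating the list clause to :lists.map/2.
# The Erlang function takes the fun first, so the arguments swap.
defmodule MyEnum do
  def map(list, fun) when is_list(list), do: :lists.map(fun, list)
end

[2, 4, 6] = MyEnum.map([1, 2, 3], &(&1 * 2))
```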

@josevalim

Member

josevalim commented Oct 4, 2015

In my previous benchmarks, tail-recursive was faster for long collections and body-recursive for shorter ones. :lists.map is also great because we inline its code. They are all great solutions.

michalmuskala added a commit to michalmuskala/elixir that referenced this pull request Feb 26, 2017

@michalmuskala michalmuskala referenced this pull request Feb 26, 2017

Merged

Enum refactoring #5818

michalmuskala added a commit to michalmuskala/elixir that referenced this pull request Mar 21, 2017

lexmag added a commit that referenced this pull request Mar 22, 2017

Enum refactoring (#5818)
* Avoid do_ prefixing

* Refactor list functions to be body-recursive where sensible

  The reasoning is the same as in #3811 and 99e0d8e.

* Use recursive functions to optimize filter/reject for lists

  ----- With input: Big (10 Million) -----
  Name           ips        average  deviation         median
  tail          1.64      610.99 ms    ±13.78%      577.63 ms
  body          1.48      675.90 ms    ±10.06%      681.08 ms
  for           1.46      687.19 ms    ±12.84%      693.82 ms

  Comparison:
  tail          1.64
  body          1.48 - 1.11x slower
  for           1.46 - 1.12x slower

  ----- With input: Medium (100 Thousand) -----
  Name           ips        average  deviation         median
  tail        201.60        4.96 ms    ±15.12%        4.81 ms
  body        199.52        5.01 ms    ±14.03%        4.76 ms
  for         178.95        5.59 ms    ±14.50%        5.39 ms

  Comparison:
  tail        201.60
  body        199.52 - 1.01x slower
  for         178.95 - 1.13x slower

  ----- With input: Small (1 Thousand) -----
  Name           ips        average  deviation         median
  body       23.98 K       41.70 μs    ±38.90%       38.00 μs
  tail       21.35 K       46.84 μs    ±35.64%       44.00 μs
  for        18.64 K       53.63 μs    ±31.60%       50.00 μs

  Comparison:
  body       23.98 K
  tail       21.35 K - 1.12x slower
  for        18.64 K - 1.29x slower

* Add function guard to group_by/3 back

  It was used for dispatch between the new and deprecated version.
  The guard was erroneously removed in 99e44a1#diff-6881431a92cd4e3ea0de82bf2338f8eaL1032.

ckampfe added a commit to ckampfe/elixir that referenced this pull request Jul 22, 2017

@michalmuskala michalmuskala deleted the michalmuskala:optimize_map branch May 12, 2018
