Improve performance of List.foldr #872
process-bot
Jun 12, 2017
Thanks for the pull request! Make sure it satisfies this checklist. My human colleagues will appreciate it!
Here is what to expect next, and if anyone wants to comment, keep these things in mind.
evancz
Jun 13, 2017
Member
Whoo, very cool! :D Three questions:
- Instead of having branches for every single scenario, what if you only had 0, 1, and 9? Or 0, 1, 2, 4, 8? Or some other pattern for reducing the number of branches? How much does that reduce the size of generated code? How much perf does it cost?
- Can you use `f a (f b (f c ...))` instead of using `|>`? No reason to require a compiler optimization to trigger if we do not need to.
- Is the generated code doing things like `xs._1._1._1._1.ctor`? Can that be avoided by writing nested cases instead of one big one? How does that change generated code size and performance?
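For readers following along, the second question can be made concrete with a minimal sketch of the two styles for a single three-element branch. The module and function names (`FoldStyles`, `viaPipes`, `viaDirectCalls`) are hypothetical, and the exact shape of the PR's branches is an assumption; the point is only that both forms compute the same value.

```elm
module FoldStyles exposing (viaDirectCalls, viaPipes)

-- Hypothetical helpers for illustration only; they are not part of the PR.


-- Pipeline style: each `|>` feeds the running result into the next call,
-- so the last element (c) is combined with the accumulator first.
viaPipes : (a -> b -> b) -> b -> a -> a -> a -> b
viaPipes f acc a b c =
    acc |> f c |> f b |> f a


-- Direct style: the same right fold written as nested applications,
-- which does not rely on the compiler rewriting `|>` away.
viaDirectCalls : (a -> b -> b) -> b -> a -> a -> a -> b
viaDirectCalls f acc a b c =
    f a (f b (f c acc))
```

With `f = (::)` and `acc = []`, both `viaPipes (::) [] 1 2 3` and `viaDirectCalls (::) [] 1 2 3` evaluate to `[ 1, 2, 3 ]`.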
Skinney
Jun 13, 2017
Contributor
- Won't compile without branches for every single scenario.
- Sure
- Not only did nested case statements reduce generated code size by about 550 bytes (2500 bytes -> 1950 bytes), they also increased performance. We're now in the 40-45% range of performance improvements (Chrome & Safari). Good call :)
Skinney
Jun 13, 2017
Contributor
Another interesting thing, for future reference, is that I consistently get a 50+% performance improvement (so another 5-10 percentage points) when inlining the `ctr > 500` comparison in the compiled code. By that I mean changing the compiled code from `...Util.cmp(ctr, 500) > 0` to `ctr > 500`.
evancz
Jun 13, 2017
Member
Neat! I want to do some thinking about whether this has implications on how pattern matching code is generated.
As for it not compiling, you need to change the pattern from `a :: b :: []` to `a :: b :: _` and do a recursive call.
Can you share the code that is doing the performance comparison? I would like to run it myself and see the numbers. With that ability, I can take a look at this on my own in the next few days.
And yeah, this is one of the first optimizations we can do if type information is kept all the way until code generation. Swapping in `===` and `<` for String, Int, etc. will surely make things faster.
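One possible reading of the `a :: b :: _` suggestion is sketched below with only three branches (0, 1, and 4 elements). The name `foldrFewBranches` is hypothetical, the choice of 4 is arbitrary, and the sketch ignores stack depth on very long lists; it only shows how open-ended patterns plus a recursive call remove the need for one closed branch per list length.

```elm
module FewBranches exposing (foldrFewBranches)

-- Hypothetical sketch, not the PR's code. Open-ended patterns (`:: rest`)
-- replace one closed pattern (`:: []`) per possible list length.


foldrFewBranches : (a -> b -> b) -> b -> List a -> b
foldrFewBranches f acc list =
    case list of
        a :: b :: c :: d :: rest ->
            -- Four or more elements: handle four, recurse on the tail.
            f a (f b (f c (f d (foldrFewBranches f acc rest))))

        a :: rest ->
            -- One to three elements: handle one, recurse on the tail.
            f a (foldrFewBranches f acc rest)

        [] ->
            acc
```

Because the second branch catches every non-empty list that the first one does not, the `case` is exhaustive with three branches regardless of how far the first branch is unrolled.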
evancz
Jun 16, 2017
Member
I merged it in, but it did not register here for some reason.
I ended up reducing the unrolling to 4 to make things a bit smaller without huge perf losses. Not sure if it's the right trade, but it seemed like a reasonable compromise.
Anyway, thanks for looking into this!
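Pulling the thread together, a nested-case version unrolled to 4 might look roughly like the sketch below. It is an illustration, not necessarily the merged code: the names `foldrUnrolled` and `foldrHelp` are hypothetical, and the behavior once `ctr > 500` (falling back to `List.foldl` over the reversed remainder to bound stack depth) is an assumption, since the thread only mentions the `ctr > 500` check itself.

```elm
module UnrolledFoldr exposing (foldrUnrolled)

-- Hypothetical sketch of a right fold unrolled four elements at a time.


foldrUnrolled : (a -> b -> b) -> b -> List a -> b
foldrUnrolled f acc list =
    foldrHelp f acc 0 list


-- `ctr` counts how many unrolled steps deep the recursion is.
foldrHelp : (a -> b -> b) -> b -> Int -> List a -> b
foldrHelp f acc ctr list =
    -- Nested cases inspect one cons cell at a time instead of matching
    -- one large pattern such as `a :: b :: c :: d :: rest`.
    case list of
        [] ->
            acc

        a :: r1 ->
            case r1 of
                [] ->
                    f a acc

                b :: r2 ->
                    case r2 of
                        [] ->
                            f a (f b acc)

                        c :: r3 ->
                            case r3 of
                                [] ->
                                    f a (f b (f c acc))

                                d :: r4 ->
                                    let
                                        rest =
                                            if ctr > 500 then
                                                -- Assumed fallback: a left fold over the
                                                -- reversed tail equals a right fold.
                                                List.foldl f acc (List.reverse r4)

                                            else
                                                foldrHelp f acc (ctr + 1) r4
                                    in
                                    f a (f b (f c (f d rest)))
```

As a quick sanity check, `foldrUnrolled (::) [] [ 1, 2, 3, 4, 5 ]` should return the input list unchanged, since a right fold with `(::)` and `[]` rebuilds the list.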
Skinney
Jun 12, 2017
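This PR takes the same idea presented in #707 and applies it to `List.foldr`. I see performance improvements of 30-40% on my machine (Safari and Chrome). Since `foldr` is used by many other list functions, this gives a nice performance boost across the board.
I tested each level of "unrollment" individually. 9 seemed to be the magic number.
I tried the same thing with `foldl`. It gave a modest improvement in Chrome (8-10%), but reduced performance in Safari (10-14%).
I have not tested if #707 still provides a performance improvement over a map fn using this `foldr` implementation.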
foldl. It gave a modest improvement in Chrome (8-10%), but reduced performance in Safari (10-14%).I have not tested if #707 still provides a performance improvement over a map fn using this foldr implementation.