Use tail-recursive version of List.take at large n #668
process-bot
Jul 18, 2016
Thanks for the pull request! Make sure it satisfies this checklist. My human colleagues will appreciate it!
Here is what to expect next, and if anyone wants to comment, keep these things in mind.
jvoigtlaender
referenced this pull request
Jul 18, 2016
Closed
List.take throws "Maximum call stack size exceeded" with large numbers #601
evancz
Jul 18, 2016
Member
This is really exciting! It looks like this implementation falls back on the slow/reliable version based on the initial input, whereas the approach Yaron suggested in his post only fell back on the other method after getting N deep. Did you try an implementation like that? I'd expect it to make the beginning of a long take faster. I think it would require having an additional Int argument that tracks stack depth in takeFast but I'm not certain.
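A minimal Elm sketch of the depth-tracking idea raised here, written under assumptions rather than taken from the PR: takeFast carries an extra Int counting how deep the recursion has gone and hands the remainder of the list to takeTailRec once a threshold is passed. The name maxDepth and its value of 1000 are illustrative (1000 is the limit asked about later in this thread), the helper takeReverse is hypothetical, unrolling is left out to keep the idea visible, and the syntax assumes Elm 0.19.

```elm
module TakeDepthSketch exposing (take)

-- Hypothetical threshold: how many nested takeFast calls are allowed
-- before switching to the stack-safe version.
maxDepth : Int
maxDepth =
    1000


take : Int -> List a -> List a
take n list =
    takeFast 0 n list


-- Ordinary (non-tail) recursion, plus a depth counter.
takeFast : Int -> Int -> List a -> List a
takeFast depth n list =
    if n <= 0 then
        []

    else
        case list of
            [] ->
                []

            x :: rest ->
                if depth >= maxDepth then
                    -- Too deep: finish the job with the tail-recursive version.
                    x :: takeTailRec (n - 1) rest

                else
                    x :: takeFast (depth + 1) (n - 1) rest


-- Stack-safe: accumulate in reverse, then reverse once at the end.
takeTailRec : Int -> List a -> List a
takeTailRec n list =
    List.reverse (takeReverse n list [])


takeReverse : Int -> List a -> List a -> List a
takeReverse n list kept =
    if n <= 0 then
        kept

    else
        case list of
            [] ->
                kept

            x :: xs ->
                takeReverse (n - 1) xs (x :: kept)
```

With this shape, a short take never leaves takeFast, while a long take pays the reversal cost only for the elements beyond roughly the first maxDepth.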
evancz
Jul 18, 2016
Member
Two other thoughts:
- I have a theory on why unrolling to 4x does not increase stack depth by 4x. The stack is a fixed size. Allocating a stack frame takes space, independent of the particulars of the function. So I guess we save on the fixed costs of allocating a stack frame, but the body of that stack frame is longer than before. And maybe that fixed cost is so high, that jamming more in the function body ultimately takes a bit less space. Otherwise, I'm not sure! :)
- I am curious if this approach would give us a faster map implementation as well. It'd be great to move more (all?) of List out of JS, and perhaps it'd be faster in the typical cases anyway. In other words, if you are interested in looking into this approach on other functions, I think there's a chance it would work really well!
evancz
Jul 18, 2016
Member
Okay, sorry for being disorganized. Here are the two crucial questions.
First, what is the impact of falling back to takeTailRec only after you have recursed through takeFast as far as possible?
If it works well, I think that's the way to go.
Second, what is the impact of doing some unrolling in takeTailRec?
I notice it does not do that at the moment. I assume unrolling makes the compiled code larger, so at some point, it becomes undesirable. Unclear how that tradeoff would play out, so I think it should be outside the scope of this PR. I just wanted to bring it up as soon as possible.
nphollon
Jul 18, 2016
Contributor
I hadn't tried switching implementations mid-stream the way Yaron does. But I'm looking at that now and it does look like an improvement. As you'd expect, the execution time drifts slowly from the takeFast curve to the takeTailRec curve. I will update the PR.
Unrolling takeTailRec gave a slight improvement in performance (about 10% faster for n=5000). This is a much weaker effect than it has on takeFast (about 75% faster at n=5000).
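For readers unfamiliar with the term, here is a hedged Elm sketch of what unrolling takeTailRec could mean. The unrolling factor tested above is not stated in this thread, so a factor of 2 is assumed purely for illustration, and the helper name takeReverse is hypothetical (Elm 0.19 syntax).

```elm
module TakeTailRecUnrolled exposing (takeTailRec)

-- Stack-safe take: build the result in reverse, then reverse it once.
takeTailRec : Int -> List a -> List a
takeTailRec n list =
    List.reverse (takeReverse n list [])


-- Unrolled by an assumed factor of 2: each iteration moves up to two
-- elements onto the accumulator instead of one.
takeReverse : Int -> List a -> List a -> List a
takeReverse n list kept =
    if n <= 0 then
        kept

    else
        case ( n, list ) of
            ( _, [] ) ->
                kept

            ( 1, x :: _ ) ->
                -- Only one element still wanted.
                x :: kept

            ( _, x :: y :: rest ) ->
                -- Self-call in tail position, so it can compile to a loop.
                takeReverse (n - 2) rest (y :: x :: kept)

            ( _, [ x ] ) ->
                -- More wanted, but only one element is left.
                x :: kept
```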
evancz
Jul 18, 2016
Member
About switching midway, great, glad it works!
That's really interesting on takeTailRec. Perhaps because it gets compiled to a while loop, the cost of looping back is not that high relative to everything else, whereas with a function call, the bookkeeping around the call dominates the cost so much that avoiding it is a big deal. Given the numbers, unrolling or not both seem fine to me, so I'll leave it up to your discretion.
nphollon
Jul 18, 2016
Contributor
Implementation has been updated. I left takeTailRec as is.
I'd be happy to take a look at the other List functions, but I am not sure I will have much time to contribute in the near future.
evancz
Jul 18, 2016
Member
Alright, looks good to me. Thanks for working on this and sharing your findings. I learned some cool stuff about JavaScript :)
evancz
merged commit e24730e
into
elm:master
Jul 18, 2016
1 check was pending
Glad I was able to help!
Mouvedia
commented
Dec 25, 2016
Was the choice of 1000 as the limit completely arbitrary?
nphollon
Dec 27, 2016
Contributor
Not completely. See the comments in original post about setting the limit at 5000. With the changes that happened after that, I needed to lower the limit further.
If the limit is too high, the stack may overflow. The exact size of the stack depends on the platform. It's probably safe to raise the limit a little bit, but since I could not test the code on every platform, I thought it better to err on the conservative side.
nphollon commented Jul 18, 2016
This is a follow up to #659. We are looking for a way of implementing List.take that doesn't run the risk of a stack overflow and has good performance over as wide a domain as possible. My previous PR failed the second criterion, but after taking a look at this post, I think I have found something that has acceptable performance.
Would love to get some feedback on this!
The Fast Way
takeFast is similar to the current version of take with a bit of loop unrolling. This means that we don't do quite as much recursion, so it can handle larger values of n without crashing. It is also slightly faster than the current implementation.
I expected that unrolling the loop 4 times would have raised the limit on n by a factor of 4, but this turned out to be wrong. In Node, it's only a factor of about 2.6.
If we unroll the loop further, we can speed up the function and raise the limit on n, but these effects drop off pretty quickly.
The Tail-Recursive Way
takeTailRec implements take so that the recursive calls can be optimized. The good news is we no longer have to worry about stack overflows. The bad news is we need to reverse the list before returning it.
Experimentally, the execution time for takeTailRec is about 1.5 times that of takeFast.
Switching between them
The new version of List.take in this PR uses takeFast for small n and switches to takeTailRec when n > 5000. This switch point is somewhat arbitrary. Obviously we want to use the faster implementation as much as we can, but different platforms have different stack sizes, so it's not clear exactly where the danger line is. Firefox started having issues with takeFast around n > 7000. Chrome and IE were better. If we want to test on other platforms, I can share the code for my (very primitive) test harness.
Methodology
These implementations of take were tested by repeatedly running take n [1..10000] for different values of n. Testing was done in Node, Firefox, and Chromium (on Linux Mint) and IE (on Windows 7). I would be happy to provide the raw data and the source code for the test harness.
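The three pieces described in this post could look roughly like the following Elm sketch. The function names and the n > 5000 switch point come from the description above; the bodies, the helper takeReverse, and the module name are assumptions for illustration (Elm 0.19 syntax), not necessarily the code in the PR.

```elm
module TakeSketch exposing (take)

-- Pick an implementation from the initial n, as described in the post.
take : Int -> List a -> List a
take n list =
    if n > 5000 then
        takeTailRec n list

    else
        takeFast n list


-- Unrolled 4x: each recursive call consumes up to four elements,
-- so the recursion is about a quarter as deep as the naive version.
takeFast : Int -> List a -> List a
takeFast n list =
    if n <= 0 then
        []

    else
        case ( n, list ) of
            ( _, [] ) ->
                []

            ( 1, x :: _ ) ->
                [ x ]

            ( 2, x :: y :: _ ) ->
                [ x, y ]

            ( 3, x :: y :: z :: _ ) ->
                [ x, y, z ]

            ( _, x :: y :: z :: w :: rest ) ->
                x :: y :: z :: w :: takeFast (n - 4) rest

            _ ->
                -- The list is shorter than both n and 4: keep all of it.
                list


-- Tail-recursive: no stack growth, at the cost of a final reverse.
takeTailRec : Int -> List a -> List a
takeTailRec n list =
    List.reverse (takeReverse n list [])


takeReverse : Int -> List a -> List a -> List a
takeReverse n list kept =
    if n <= 0 then
        kept

    else
        case list of
            [] ->
                kept

            x :: xs ->
                takeReverse (n - 1) xs (x :: kept)
```

Both paths should return the same list for any input; only the stack usage and running time differ.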