Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UniqueBy is implemented using recursion #61

Closed
michaeljones opened this issue Jun 22, 2017 · 9 comments
Closed

UniqueBy is implemented using recursion #61

michaeljones opened this issue Jun 22, 2017 · 9 comments

Comments

@michaeljones
Copy link

michaeljones commented Jun 22, 2017

I have experienced a problem with hitting the maximum call stack size when using uniqueBy on a long list (22,000 entries.) I am not very wise in the ways of these things but it occurs to me that it might be due to the recursive implementation.

I have put together a foldl based implementation that seems to run without issue though i haven't verified that is works properly yet. If it does, would you be interested in a pull request?

uniqueBy : (a -> comparable) -> List a -> List a
uniqueBy f list =
    let
        foldf entry ( set, current ) =
            let
                key =
                    f entry
            in
            if Set.member key set then
                ( set, current )
            else
                ( Set.insert key set, entry :: current )
    in
    List.foldl foldf ( Set.empty, [] ) list
        |> Tuple.second
@pzp1997
Copy link
Contributor

pzp1997 commented Jun 22, 2017

The current implementation is as it is for performance reasons. Could you make a benchmark comparing your version to the current implementation to see how performance will be affected using elm-benchmark? If there is a significant drop-off, I could implement a trick to have the current implementation switch to a tail-call optimized implementation when the list exceeds a certain number of elements (I happen to be currently working on doing this for the At functions). That way there won't be a performance hit for small lists (< 2000 elements or so) but the stack won't overflow for large lists like the one you're dealing with.

@michaeljones
Copy link
Author

Ah, that makes sense. Thanks for the explanation. I don't have any experience with elm-benchmark but I'll try to give it ago within the next week if I can. I did read of a similar switching approach in some other elm discussion thread when I was googling around. Seems like it might be necessary.

@pzp1997
Copy link
Contributor

pzp1997 commented Jun 22, 2017

I did some quick benchmarking and your implementation seems to be performing more or less on par with the current one. One thing I would be careful of is to make sure you're not reversing the list in the process.

@michaeljones
Copy link
Author

Thank you for doing that, very interesting to see. And thank you for pointing out the bug. I hadn't realised at all, though fortunately it isn't a big deal for my case as the code is already in production!

Playing with the benching marks I couldn't see a huge difference between the foldl + reverse and foldr but I assume there is an understood best approach for this. Otherwise happy to hear that it is within tolerance. Still happy to see a switch in there if it is deemed preferable.

Thanks again! A very educational experience all in.

@andys8
Copy link
Contributor

andys8 commented Jul 18, 2017

I've the same problem and I think the problem exists in many methods of this library because of their recursive implementation. In my opinion it should be solved in a way which will not possibly crash an entire application at runtime if a list is long enough.

list : List Int
list =
    List.range 0 100000
    |> List.Extra.unique

image

https://ellie-app.com/3MxzbGDZdNSa1/0

@michaeljones Thanks for the non-recursive example :)

@andys8
Copy link
Contributor

andys8 commented Jul 18, 2017

Out of interest I made an implementation based of the original uniqueHelp using an accumulator. Like the first implementation the list has to be reversed at the end.

uniqueByAcc : (a -> comparable) -> List a -> List a
uniqueByAcc f list =
    uniqueHelpAcc [] f Set.empty list


uniqueHelpAcc : List a -> (a -> comparable) -> Set comparable -> List a -> List a
uniqueHelpAcc acc f existing remaining =
    case remaining of
        [] ->
            List.reverse acc

        first :: rest ->
            let
                computedFirst =
                    f first
            in
                if Set.member computedFirst existing then
                    uniqueHelpAcc acc f existing rest
                else
                    uniqueHelpAcc (first :: acc) f (Set.insert computedFirst existing) rest

I benchmarked the two tail recursive versions and it seems to be a little bit faster depending on the number of collisions. The test is also testing "100.000 Ints" which would fail with the original implementation.

Benchmark

https://ellie-app.com/3MyCRxqSyNva1/2

Group: many collisions

image

Group: some collisions

image

Group: no collisions

image

@andys8
Copy link
Contributor

andys8 commented Jul 19, 2017

I could create a pull request. Do you want me to do it?

@Chadtech
Copy link
Collaborator

Yeah that would be awesome!

@andys8
Copy link
Contributor

andys8 commented Jul 19, 2017

@Chadtech Here you go #77

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants