OrderedSet is slower than OrderedDict; should use OrderedDict instead #73
Comments
FYI at https://github.com/pantsbuild/pants/blob/master/src/python/pants/util/ordered_set.py, we ~forked this library to create an I don't have benchmarks handy, but I believe We didn't upstream our implementation, as we made several major changes (simplifications) like no longer implementing
@Eric-Arellano Funny how this implementation is also inspired by Raymond Hettinger's, but uses a With a linked list, index access would be O(n), and deletion O(1). I am not so sure about the pull request anymore. I would never think of using index access on a set, but I suppose it also depends on the other people using this library.
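The deletion side of this trade-off can be probed with a small timing sketch. It assumes a set-plus-list layout for the current approach and a plain dict for the alternative; the function names and the problem size are made up for illustration, and the numbers will vary by machine and Python version.

```python
import timeit


def remove_from_list_backed(n):
    """Delete the first half of n items from a set + list layout."""
    items = list(range(n))
    index = set(items)
    for value in range(n // 2):
        index.discard(value)  # O(1) on the set
        items.remove(value)   # O(n): the list has to shift its remaining items
    return items


def remove_from_dict_backed(n):
    """Delete the first half of n items from an insertion-ordered dict."""
    items = dict.fromkeys(range(n))
    for value in range(n // 2):
        del items[value]      # O(1) on average
    return list(items)


# Hypothetical problem size, kept small so the sketch runs quickly.
N = 2000
list_time = timeit.timeit(lambda: remove_from_list_backed(N), number=3)
dict_time = timeit.timeit(lambda: remove_from_dict_backed(N), number=3)
print(f"set + list: {list_time:.4f}s, dict: {dict_time:.4f}s")
```

Both functions return the surviving items in insertion order, so the only thing being compared is the cost of removal.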
Indeed! We 95% of the time use So, we care far more about access by index than deletion. -- Possibly interesting to you - we also added a simple --
Yeah, this was a big reason we forked. I originally tried adding I think the
Cool! I am into functional programming (functional core / imperative shell), and I agree that immutability is valuable. I might "steal" your ideas 😁 I wish you good luck with your project!
Hmm, you have no access by index in your OrderedSet. Are you talking about the dict? Or perhaps you mean membership check?
I can totally understand. Perhaps the author and users might agree. In that case, I am not sure how to deal with the license (MIT vs. Apache). The author has to think about whether to keep your implementation under Apache (and have multiple licenses inside this project), or ask all contributors to that file for MIT permission (that is, if a copy of the file is desired instead of a re-implementation).
@Eric-Arellano One way you could still keep the mutation methods is by having them return a new object that is a copy of the old, but with some changed things. That way, you get immutable mutation! 😆
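A minimal sketch of that idea, assuming a hypothetical FrozenOrderedSet class (this is not the pants implementation nor this library's API): add and discard leave the original untouched and return a new object instead.

```python
from typing import Hashable, Iterable, Iterator


class FrozenOrderedSet:
    """Hypothetical immutable ordered set; the name and methods are illustrative."""

    __slots__ = ("_items",)

    def __init__(self, iterable: Iterable[Hashable] = ()) -> None:
        # A dict keeps insertion order (guaranteed by the language since Python 3.7)
        # and drops duplicates; the values are ignored.
        self._items = dict.fromkeys(iterable)

    def __contains__(self, value: object) -> bool:
        return value in self._items

    def __iter__(self) -> Iterator[Hashable]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, value: Hashable) -> "FrozenOrderedSet":
        # Returns a copy with the value appended; self is not modified.
        return FrozenOrderedSet([*self._items, value])

    def discard(self, value: Hashable) -> "FrozenOrderedSet":
        # Returns a copy without the value; self is not modified.
        return FrozenOrderedSet(v for v in self._items if v != value)


original = FrozenOrderedSet("abc")
extended = original.add("d")
print(list(original), list(extended))
```

Each "mutation" copies the whole set, so it costs O(n); that is the price of keeping the original object unchanged.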
Added a note to the top of the README about when you would prefer to use OrderedDict.
Since it requires Python >= 3.5 anyway, OrderedSet can be reduced to an adapter of collections.OrderedDict. One can easily use a dict as a set by ignoring the dict's values.
A synthetic benchmark I ran shows that, if CPU caching does not come into play (a big if), an OrderedDict is faster (especially for deletion, which is O(n) in the current implementation).
I am quite convinced there would be performance improvements if we changed the implementation to use an OrderedDict instead of a set and a list (or a Python 3.6 dict, which is ordered, as @Eric-Arellano suggests).
Edit: However, this would make access by index (like OrderedSet[1]) much slower (complexity O(n) instead of the current O(1)).
Would you accept a PR doing so?
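To make the proposal concrete, here is a rough sketch of such an adapter. The class name DictBackedOrderedSet is hypothetical and this is not the library's actual code; it only illustrates storing the elements as OrderedDict keys, which gives O(1) average deletion at the cost of O(n) access by index.

```python
from collections import OrderedDict
from collections.abc import MutableSet
from itertools import islice


class DictBackedOrderedSet(MutableSet):
    """Hypothetical ordered set stored as the keys of an OrderedDict."""

    def __init__(self, iterable=()):
        # Keys hold the elements in insertion order; the values are ignored.
        self._items = OrderedDict.fromkeys(iterable)

    def __contains__(self, value):
        return value in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, value):
        # Re-adding an existing element keeps its original position.
        self._items[value] = None

    def discard(self, value):
        # O(1) on average, with no list to shift.
        self._items.pop(value, None)

    def __getitem__(self, index):
        # O(n): the keys have to be walked from the start.
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        return next(islice(self._items, index, None))


s = DictBackedOrderedSet("abca")
s.discard("b")
s.add("d")
print(list(s), s[1])
```

Inheriting from MutableSet supplies remove, pop, clear and the set operators on top of the five methods defined here. Slicing is deliberately left out of this sketch.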