Unique Method #9
Comments
Yes! Thanks for filing this issue--I should have added it soon after the discussion at the conference but was distracted with talks and events. I hesitated to implement an "assert unique" method because I didn't want to add a feature that would break with larger-than-memory sets of data, but I didn't want to just persist it all to a temp file either. (The method would need to assert uniqueness for any given data type, not just integers, though integers were the use case discussed.) I did some thinking about it and I'm wondering if I could use a Bloom filter to take care of most of the work and then make a second pass over the data to eliminate any false positives. But maybe there's an even better approach that I just can't see right now. If you have an idea yourself, I'd be glad to hear it. I think the Bloom filter approach is promising though.
There's a bit of refactoring that I want to do first (some magic removal) but I'll look at implementing this "assert unique" behavior soon after.
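For readers following along, the Bloom filter idea floated above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the names `BloomFilter`, `check_and_add`, and `find_duplicates` are hypothetical, the data source is assumed to be re-iterable (passed as a callable that returns a fresh iterator), and `repr()` stands in for a real serialization of arbitrary data types. The first pass collects items the filter has "possibly seen"; the second pass counts only those candidates exactly, which discards the false positives.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Tiny Bloom filter backed by a bytearray (illustrative only)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive the bit positions from two halves of one SHA-256 digest
        # (double hashing). Assumption: repr() identifies a value well
        # enough; values that compare equal but have different reprs
        # (such as 1 and 1.0) would need a canonical serialization.
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def check_and_add(self, item):
        """Set the item's bits; return True if all were already set."""
        possibly_seen = True
        for pos in self._positions(item):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                possibly_seen = False
                self.bits[byte] |= mask
        return possibly_seen


def find_duplicates(make_iter, num_bits=1 << 20, num_hashes=4):
    """Return the set of items that occur more than once.

    make_iter is a callable returning a fresh iterator over the data,
    because the data is read twice.
    """
    bloom = BloomFilter(num_bits, num_hashes)

    # Pass 1: anything the filter has "possibly seen" is a candidate.
    # A Bloom filter has no false negatives, so every real duplicate
    # ends up here, alongside a few false positives.
    candidates = set()
    for item in make_iter():
        if bloom.check_and_add(item):
            candidates.add(item)

    # Pass 2: count only the candidates exactly; false positives
    # turn out to have a count of 1 and are dropped.
    counts = Counter(item for item in make_iter() if item in candidates)
    return {item for item, n in counts.items() if n > 1}
```

With this sketch, `find_duplicates(lambda: iter([3, 1, 4, 1, 5]))` returns `{1}`, and an empty result means every item is unique. Memory use is the fixed-size bit array plus the candidate set, so it stays small only when the filter is sized generously relative to the data and duplicates are rare.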
The initial work for this is done: 055f438. Currently, the implementation is unoptimized--it cannot run on data larger than available RAM. I've opened issue #13 for the planned future optimization. If anyone needs this method before the next release, you can install the development release with:
Hey Shawn - one of the problems you spoke about at PyCon 2016 was how to guarantee that all integers in a list are unique, in a way that stays efficient for large sets of data?