Efficiently creating a large IPSet from a list of IPRanges? #171
Comments
This is based on Henry's initial patch from GH-166. Changes:
* Fixed the return type of cidr_merge() so it's always IPNetworks, not IPRanges as it was in some cases in the original patch
* Added tests
* Updated documentation

This addresses GH-171.

Co-authored-by: Henry Stern <hstern@securityscorecard.io>
@varenc Right now ( This should do the job: `ip_set = IPSet(itertools.chain.from_iterable(range.cidrs() for range in ranges))` Using the ranges from your link I get the following numbers:
So this should already be satisfactory and I'll close this issue now. On top of that, starting with the next release you'll be able to pass an iterable of
Since 0.8.0 you can pass an iterable of
Thanks for the follow up! Awesome to see this fix. I'm not really using this stuff anymore, but it's heartwarming to see that my 2+ year old test cases were still useful and read. Cheers!
Hi there folks,
I'm creating an IPSet that contains ~3k IP ranges (specifically from this list).
I'm running into pretty poor performance on this though. It takes ~4 minutes to create the IPSet doing this the simplest way.
The slowdown here is that `IPSet.compact` gets run for every addition. `compact` seems to be `O(<len of set>)`, so as the set gets larger this gets slower and slower.

But crazily, if I do this, performance goes from 4 minutes to 45 seconds.
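As a rough illustration of the cost described above (a full compaction after every addition versus a single pass at the end), here is a toy cost model. It uses the standard library's `ipaddress` module rather than netaddr, and the helper names and the "entries scanned" counter are assumptions made for illustration, not netaddr's actual implementation.

```python
import ipaddress


def build_with_compact_per_add(networks):
    """Re-merge the whole collection after every single addition."""
    merged, entries_scanned = [], 0
    for net in networks:
        merged.append(net)
        # Toy cost model: each compaction is one full pass over the collection.
        entries_scanned += len(merged)
        merged = list(ipaddress.collapse_addresses(merged))
    return merged, entries_scanned


def build_with_single_compact(networks):
    """Collect everything first, then merge exactly once."""
    networks = list(networks)
    return list(ipaddress.collapse_addresses(networks)), len(networks)


# 200 networks that cannot be merged with each other (worst case for the set size).
nets = [ipaddress.ip_network(f"10.0.{i}.0/25") for i in range(200)]

slow_result, slow_cost = build_with_compact_per_add(nets)
fast_result, fast_cost = build_with_single_compact(nets)
```

Both variants produce the same networks, but with these 200 inputs the per-addition variant touches 20,100 entries in total while the single pass touches 200. That quadratic growth is consistent with the slowdown reported here.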
The reason for the speedup here is that when adding a CIDR by itself, IPSet only needs to run `_compact_single_network`, which is quite a bit speedier.

But if I do this, it only takes 700ms on my system!
Manually modifying the protected _cidrs is definitely naughty, but it avoids having to constantly call compact or _compact_single_network. The IPSet I generate this way seems to work just fine, but I wouldn't recommend it.
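The underlying idea (expand every range into CIDR blocks first, then merge only once) can be sketched without touching any protected attributes. This sketch uses the standard library's `ipaddress` module as a stand-in for netaddr. The `ranges_to_networks` helper and the sample ranges are hypothetical, and the result is a plain list of networks, not an `IPSet`.

```python
import ipaddress
from itertools import chain


def ranges_to_networks(ranges):
    """Turn (first, last) address pairs into a merged, sorted list of networks."""
    cidrs = chain.from_iterable(
        # Split each inclusive range into the minimal list of CIDR blocks.
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
        for first, last in ranges
    )
    # Merge adjacent and overlapping blocks in a single pass at the very end.
    return list(ipaddress.collapse_addresses(cidrs))


# Hypothetical sample data: two adjacent ranges plus one small unaligned range.
ranges = [
    ("10.0.0.0", "10.0.0.127"),
    ("10.0.0.128", "10.0.0.255"),
    ("192.168.1.5", "192.168.1.6"),
]
merged = ranges_to_networks(ranges)
print([str(net) for net in merged])
# → ['10.0.0.0/24', '192.168.1.5/32', '192.168.1.6/32']
```

The two adjacent halves collapse into a single /24, while the unaligned two-address range stays as two /32 blocks. Note that `collapse_addresses` expects all inputs to share one IP version, so mixed IPv4 and IPv6 ranges would need to be handled separately.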
So my question for you folks, is there a better way to do this?
If not, what do you think about me making a pull request to do one of these:
* Make it so that when you `.add` an `IPRange` you get the same performance as adding that range's CIDRs manually (letting us do `_compact_single_network` instead of `compact`). Though I'm not sure if this is always faster, or whether some pathological cases would be worse.
* Allow `IPSet.add` to take a list of IPRanges and then only run the compact step after that's all completed.
* Add an option to `.add` that comes with a big disclaimer on running compact afterward.