Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficiently creating a large IPSet from a list of IPRanges? #171

Closed
varenc opened this issue Apr 11, 2018 · 3 comments
Closed

Efficiently creating a large IPSet from a list of IPRanges? #171

varenc opened this issue Apr 11, 2018 · 3 comments

Comments

@varenc
Copy link

varenc commented Apr 11, 2018

Hi there folks,

I'm creating an IPSet that contains the ~3k IP ranges. (specifically from this list)

I'm running into pretty poor performance on this though. It takes ~4 minutes to create the IPSet doing this the simplest way.

bad_ip_set = IPSet()
for ip_start, ip_end in ip_ranges:
    bad_ip_set.add(IPRange(ip_start, ip_end))

The slow down here is that IPSet.compact gets ran for every addition. compact seems to be O(<len of set>) so as the set gets larger this gets slower and slower.

But crazily, if I do this performance goes from 4 minutes to 45 seconds.

bad_ip_set = IPSet()
for ip_start, ip_end in ip_ranges:
    for cidr in IPRange(ip_start, ip_end).cidrs():
        bad_ip_set.add(cidr)

The reason for the speed up here, is that when adding a cidr by itself, IPSet only needs to run _compact_single_network which is quite speedier.

But if I do this, it only takes 700ms on my system!

bad_ip_set = IPSet()
for ip_start, ip_end in ip_ranges:
    for cidr in IPRange(ip_start, ip_end).cidrs():
        bad_ip_set._cidrs[cidr] = True
bad_ip_set.compact()

Manually modifying the protected _cidrs is definitely naughty, but it avoids having to constantly call compact or _compact_single_network. The IPSet I generate this way seems to work just fine, but I wouldn't recommend it.

So my question for you folks, is there a better way to do this?

If not, what do you think about me making a pull request to do one of these:

  • Make it so when you .add an IPRange you get the same performance as adding that range's cidr's manually? (letting us do _compact_single_network instead of compact). Though I'm not sure if this is always faster, or some pathological cases would be worse.
  • Allow IPSet.add to take a list of IPRanges and then only run the compact step after that's all completed.
  • Alternatively, maybe just make a "skip_compact" option on .add that comes with a big disclaimer on running compact afterward.
jstasiak added a commit that referenced this issue Jun 22, 2020
This is based on Henry's initial patch from GH-166. Changes:

* Fixed the return type of cidr_merge() so it's always IPNetworks,
  not IPRanges as it was in some cases in the original patch
* Added tests
* Updated documentation

This addresses GH-171.

Co-authored-by: Henry Stern <hstern@securityscorecard.io>
@jstasiak
Copy link
Contributor

@varenc Right now (netaddr 0.7.19 and 0.7.20, not sure about the older releases) both IPSet's constructor and its updated() method accept (among others) an iterable of mix of IPNetwork and IPAddress objects, so if you convert IPRange to those with the cidrs() method you can efficiently create an IPSet() (or update an existing one).

This should do the job:

ip_set = IPSet(itertools.chain.from_iterable(range.cidrs() for range in ranges))

Using the ranges from your link I get the following numbers:

  • the bad_ip_set = IPSet(); for range in ranges: bad_ip_set.add(IPRange(ip_start, ip_end)) way – 2m 5s
  • the IPSet(itertools.chain.from_iterable(range.cidrs() for range in ranges)) way – 164ms

So this should be already satisfactory and I'll close this issue now. On top of that starting with next release you'll be able to pass an iterable of IPRange objects to IPSet constructor and the update() method like IPSet(ranges) and this cuts down the runtime to 112ms on my machine.

@jstasiak
Copy link
Contributor

jstasiak commented Jul 3, 2020

Since 0.8.0 you can pass an iterable of IPRange objects to IPSet.

@varenc
Copy link
Author

varenc commented Jul 4, 2020

Thanks for the follow up! Awesome to see this fix. I'm not really using this stuff anymore, but it's heartwarming to see that my 2+ year old test cases were still useful and read.

Cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants