Faster implementation to collapse consecutive ip-networks #65025
Comments
This alternative implementation runs over the list only once in the typical case. Technically both are O(n), so the best case is always the same. But the old implementation runs over the *complete* list multiple times until it cannot make any more optimisations. The new implementation only repeats the optimisation on elements which require reconciliation. Tests on a local machine have shown a considerable increase in speed on large collections of elements (about twice as fast on average, if I recall correctly).
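For context, the function under discussion is ipaddress.collapse_addresses(); a quick illustration of its behaviour:

```python
import ipaddress

nets = [ipaddress.ip_network(n) for n in
        ("192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24")]
# The two /25 halves merge into a single /24; the unrelated /24 is untouched.
print(list(ipaddress.collapse_addresses(nets)))
# -> [IPv4Network('192.0.2.0/24'), IPv4Network('198.51.100.0/24')]
```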
Hey exhuma, thanks for the patch. Can you give me an example list on which this shift/reduce approach works much better?
Sorry for the late reply. I wanted to take some time and give a more detailed explanation, at least to the best of my abilities :) I attached a zip file with my quick-and-dirty test rig; its contents are described below.
I am not sure how sensitive the data I am working with is, so I prefer not to put any of the real data on a public forum. Instead, I wrote a small script which generates a data-set that makes the performance difference visible. I then ran the operation 5 times on it.

Let me also outline an explanation of what is happening: it is possible that through one "merge" operation, a second merge becomes possible. For the sake of readability, let's use IPv4 addresses and consider the following list (an illustrative shape; a1..an and b1..bn stand in for networks that cannot be merged any further):
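```
[a1, ..., an,
 192.168.0.0/32,
 192.168.0.1/32,
 192.168.0.2/31,
 b1, ..., bn]
```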
This can be reduced to:
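```
[a1, ..., an,
 192.168.0.0/31,
 192.168.0.2/31,
 b1, ..., bn]
```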
Which in turn can then be reduced to:
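```
[a1, ..., an,
 192.168.0.0/30,
 b1, ..., bn]
```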
The current implementation sets a boolean flag (optimized) whenever a pass manages to merge two entries, and keeps re-running over the *complete* list for as long as that flag gets set, even when only a tiny portion of the list could still be merged.
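For illustration, the pre-patch strategy looks roughly like this (a paraphrase of the idea, not the exact stdlib code; it expects a sorted list of non-overlapping ipaddress network objects):

```python
def collapse_full_passes(nets):
    """Sweep the whole list repeatedly until a full pass merges nothing."""
    while True:
        merged_any = False
        out = []
        for net in nets:
            parent = out[-1].supernet() if out else None
            if (out and parent == net.supernet()
                    and parent.network_address == out[-1].network_address):
                # out[-1] is the lower half and net the upper half of the
                # same supernet: merge them. Note we never look further back,
                # so a merge that enables an *earlier* merge needs a new pass.
                out[-1] = parent
                merged_any = True
            else:
                out.append(net)
        nets = out
        if not merged_any:
            return nets
```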
With the shift/reduce approach, only as many nodes are visited as necessary. If a "reduce" is made, it "backtracks" as far as new merges remain possible, but no further. So in the above example, nodes [a1..an] will only be visited once, and [b1..bn] will only be visited once the complete optimisation of the example addresses has been performed. With the given data-set, this gives the following result:
If my thoughts are correct, both implementations should have a similar "best case", but the "worst case" differs significantly. I am not well-versed with Big-O notation, especially the best/average/worst-case distinction, and neither am I good at math. But I believe both are, strictly speaking, O(n), or more precisely O(k*n), where k is proportional to the number of reconciliation steps needed (that is, cases where one merge makes another merge possible). The k is much smaller in the shift/reduce approach, as only as many elements as necessary are revisited, instead of all of them.
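To make this concrete, here is a minimal sketch of the shift/reduce idea (my own illustration for this discussion, not the actual attached patch), again assuming a sorted list of non-overlapping ipaddress network objects:

```python
import ipaddress

def collapse_shift_reduce(nets):
    """Collapse a sorted list of non-overlapping networks, shift/reduce style."""
    stack = []
    for net in nets:
        # "Shift": push the next network onto the stack.
        stack.append(net)
        # "Reduce": while the two top-most entries are the lower and upper
        # half of the same supernet, replace them with that supernet. Each
        # merge re-exposes the previous entry, which is exactly the bounded
        # backtracking described above.
        while len(stack) >= 2:
            low, high = stack[-2], stack[-1]
            parent = low.supernet()
            if (parent == high.supernet()
                    and parent.network_address == low.network_address):
                stack[-2:] = [parent]
            else:
                break
    return stack

nets = [ipaddress.ip_network(n) for n in
        ("192.168.0.0/32", "192.168.0.1/32", "192.168.0.2/31")]
print(collapse_shift_reduce(nets))  # [IPv4Network('192.168.0.0/30')]
```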
From a quick look, the algorithm is indeed correct, and it should perform better on average than the previous one. Note: both algorithms are O(n**2) worst case, not O(n).
Here is a much faster patch, around 30x faster than the original code. With exhuma's data set and tester.py, the original code gives:

```
$ time python3.4 tester.py
Execution time: 5.949284339199949 seconds

real    0m30.152s
```

The patched code gives:

```
$ time ./python tester.py
Execution time: 0.25444041779992405 seconds

real    0m1.695s
```

exhuma, perhaps you want to test with other data sets?
Uh, those measurements are wrong, sorry, since tester.py doesn't consume the iterator. When consuming the iterator, the patch is ~4x faster than the original code, which is more reasonable :-)
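To illustrate the pitfall with a hypothetical harness (not the attached tester.py): collapse_addresses() returns an iterator, so the collapsing work only happens once the iterator is drained, e.g. by wrapping the call in list():

```python
import ipaddress
import timeit

# 256 host routes that collapse into a single /24 (illustrative data).
nets = [ipaddress.ip_network("192.0.2.%d/32" % i) for i in range(256)]

# list() forces the iterator to be consumed, so the timing covers
# the actual collapsing work rather than just creating the iterator.
elapsed = timeit.timeit(
    lambda: list(ipaddress.collapse_addresses(nets)), number=100)
print("Execution time: %f seconds" % elapsed)
```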
It seems like the speed of Network.supernet() is a bottleneck here. bpo-16531 would probably allow making supernet() much faster.
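For reference, supernet() returns the enclosing network one prefix-length up; in the sketches above it is called on almost every comparison, which is why its speed dominates:

```python
import ipaddress

net = ipaddress.ip_network("192.0.2.128/25")
print(net.supernet())  # 192.0.2.0/24
```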
Updated patch, a bit faster yet. Now that bpo-16531 is committed, it is ~15x faster than 3.4.
New changeset 8867874a2b7d by Antoine Pitrou in branch 'default':
I've now committed this. exhuma, if you have any further observations or results, don't hesitate to post them!