Hi, here is a modest contribution proposal to the AggregateApproxIPV4s feature.
TL;DR: In its current state, the algorithm that heuristically aggregates IPv4 addresses into approximate subnets gives different results depending on the order in which the addresses are provided.
The algorithm creates a new /24 subnet for each IP address that is not already contained in a previously seen subnet. Then, for each later address that matches that subnet, it re-specializes the subnet mask. The resulting subnets are therefore highly dependent on the order in which the IPs appear in the input.
Here is a short example, using the same three addresses in two different orders: 10.10.0.1, 10.10.0.2, 10.10.0.255 (ordered) and 10.10.0.1, 10.10.0.255, 10.10.0.2 (unordered).
In the first case (ordered input), the algorithm first creates a /24 subnet (10.10.0.1), then specializes it to /30 (10.10.0.2), and finally opens it back up to /24 to match the .255. This generates a nice "guessed" /24 subnet that contains all of the input.
However, in the second case, the algorithm creates a /24 (10.10.0.1), stays at /24 (10.10.0.255), and then specializes to /30 because of the .2. The output is a /30, which misleads the user into thinking that no IP outside the /30 range exists in the dataset, even though 10.10.0.255 does.
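To make the order dependence concrete, here is a minimal Python sketch of a toy model that reproduces the two cases described above. It is not the actual AggregateApproxIPV4s implementation: the function names (`toy_aggregate`, `common_prefix_len`) and the "last matching address sets the mask" rule are assumptions inferred from the behaviour described in this PR.

```python
import ipaddress


def common_prefix_len(a: int, b: int) -> int:
    """Number of leading bits shared by two 32-bit integers."""
    return 32 - (a ^ b).bit_length()


def toy_aggregate(ips):
    """Toy model (hypothetical, not the library code).

    Keeps one entry per /24. The first address seen opens the /24;
    every later address in the same /24 overwrites the prefix length
    with the prefix it shares with that first address.
    """
    subnets = {}  # /24 key -> (first address seen, current prefix length)
    for text in ips:
        addr = int(ipaddress.IPv4Address(text))
        key = addr >> 8  # upper 24 bits identify the /24
        if key not in subnets:
            subnets[key] = (addr, 24)
        else:
            first, _ = subnets[key]
            # Last write wins: this is what makes the result order-dependent.
            subnets[key] = (first, common_prefix_len(first, addr))
    return [
        str(ipaddress.IPv4Network((first, plen), strict=False))
        for first, plen in subnets.values()
    ]


ordered = ["10.10.0.1", "10.10.0.2", "10.10.0.255"]
unordered = ["10.10.0.1", "10.10.0.255", "10.10.0.2"]

print(toy_aggregate(ordered))    # ['10.10.0.0/24']
print(toy_aggregate(unordered))  # ['10.10.0.0/30']

# Sorting numerically first gives the /24 for either order in this toy model.
print(toy_aggregate(sorted(unordered, key=ipaddress.IPv4Address)))  # ['10.10.0.0/24']
```

In this model, sorting helps because the largest address in a /24 is processed last, and it shares the shortest prefix with the smallest one. Whether the same holds for the real function depends on details this sketch does not capture.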
The contribution I propose is to sort the input IP addresses at the start of the function, which:
Please let me know if this is a known/intended behaviour.
PS
The sort feature did not fix the output in my case: