Generate IPv4 networks, IPv6 addresses, IPv6 networks #112

davidchall · 2020-09-26T01:36:04Z

Description

I noticed charlatan has some TODO/FIXMEs related to generating:

IPv4 networks
IPv6 addresses
IPv6 networks

My ipaddress package supports randomly sampling the IPv6 address space, and can also do the bit masking needed to generate networks for both IPv4 and IPv6.
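
To make the bit masking concrete, here is a minimal base-R sketch for IPv4. It is not how ipaddress implements it; `mask_ipv4()` and `random_ipv4_network()` are hypothetical helpers written only for illustration, and the uniform choice of prefix length is an assumption.

```r
# Hypothetical helper: zero the host bits of an IPv4 address.
# `octets` is a vector of four values in 0..255, `prefix` is in 0..32.
mask_ipv4 <- function(octets, prefix) {
  # Number of network bits that fall inside each octet (0 to 8).
  keep <- pmin(pmax(prefix - 8 * (0:3), 0), 8)
  # Per-octet mask with the top `keep` bits set (e.g. keep = 2 gives 192).
  mask <- as.integer(256 - 2^(8 - keep))
  network <- bitwAnd(as.integer(octets), mask)
  paste0(paste(network, collapse = "."), "/", prefix)
}

# Hypothetical helper: random address, random prefix length, then mask.
random_ipv4_network <- function() {
  mask_ipv4(sample(0:255, 4, replace = TRUE), sample(0:32, 1))
}

mask_ipv4(c(67, 64, 200, 17), 18)
#> [1] "67.64.192.0/18"
```

The same idea carries over to IPv6 with 128 bits instead of 32.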

BTW the faker module won't generate an address in a reserved network (see here). We could achieve this using an accept-reject algorithm (see here), if this is something you're interested in?

Related Issue

None. The FIXMEs are in the code.

Example

library(charlatan)

x <- InternetProvider$new()
x$ipv4()
#> [1] "190.172.2.193"
x$ipv4(network = TRUE)
#> [1] "67.64.192.0/18"
x$ipv6()
#> [1] "40dc:98a8:380:548b:8822:4e97:1fce:6942"
x$ipv6(network = TRUE)
#> [1] "fba1:738d:df08:cb00::/58"

Created on 2020-09-25 by the reprex package (v0.3.0)

Produce IPv4 networks, IPv6 addresses, IPv6 networks

codecov-commenter · 2020-09-26T01:42:33Z

Codecov Report

Merging #112 into master will increase coverage by 0.62%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #112      +/-   ##
==========================================
+ Coverage   69.84%   70.46%   +0.62%     
==========================================
  Files          43       43              
  Lines         955      965      +10     
==========================================
+ Hits          667      680      +13     
+ Misses        288      285       -3

Impacted Files	Coverage Δ
R/internet-provider.R	`57.94% <100.00%> (+6.39%)`	⬆️
R/taxonomy-provider.R	`100.00% <0.00%> (+8.33%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update efd3585...9152346.

sckott · 2020-09-29T00:18:28Z

Thanks @davidchall! It's great to have some of the IP address stuff finished off.

Looks like there's a modest slowdown compared to iptools:

library(microbenchmark)

# Time a single random IPv4 address from each package, 10,000 repetitions
microbenchmark(
  ipaddress = ipaddress::sample_ipv4(1),
  iptools = iptools::ip_random(1),
  times = 10^4
)
#> Unit: microseconds
#>       expr    min     lq      mean  median     uq      max neval
#>  ipaddress 89.379 93.396 112.63896 95.3445 98.352 6342.843 10000
#>    iptools 12.678 14.065  17.35034 16.0760 16.834 6208.996 10000

I don't know if that's accurate or meaningful. I don't typically use this kind of data, so I'm not sure of the use cases. E.g., do people often want to generate millions of IP addresses at a time (in which case speed may become an issue), or do they most often generate tens to thousands of addresses at a time (in which case the speed difference is probably not an issue)?

> BTW the faker module won't generate an address in a reserved network (see here). We could achieve this using an accept-reject algorithm (see here), if this is something you're interested in?

I'm not super familiar with the terminology. By "We could achieve this", what do you mean exactly? Is it best to avoid generating addresses in a reserved network? That is, should we avoid that here as well?

davidchall · 2020-09-29T16:02:34Z

Hi @sckott,

Your benchmarking results are really interesting - thanks for bringing this to my attention! My first thought was that {ipaddress} supports both IPv4 and IPv6, and so there is some additional overhead involved. If we look at generating many addresses, then we see that {ipaddress} is faster than {iptools}:

library(microbenchmark)

# Time one vectorized call that generates 100,000 addresses
microbenchmark(
  ipaddress = ipaddress::sample_ipv4(1e5),
  iptools = iptools::ip_random(1e5)
)
#> Unit: milliseconds
#>       expr       min       lq     mean   median        uq      max neval
#>  ipaddress  6.717324 12.01752  39.4633 15.91629  22.50388 396.7741   100
#>    iptools 53.141831 63.32564 117.5727 73.44913 117.57090 680.6242   100

If people want to generate millions of IP addresses, I'd recommend using ipaddress::sample_ipv4() directly (instead of using charlatan::InternetProvider$ipv4()), because this takes advantage of a vectorized implementation.

> > BTW the faker module won't generate an address in a reserved network (see here). We could achieve this using an accept-reject algorithm (see here), if this is something you're interested in?

> I'm not super familiar with the terminology. By "We could achieve this", what do you mean exactly? Is it best to avoid generating addresses in a reserved network? That is, should we avoid that here as well?

The protocol reserves some regions of IP address space for special usage, and so a user would never be assigned one of these reserved addresses. Charlatan is creating fake user data, so I think it makes sense to exclude such addresses. The same idea applies to IPv6 too, though {faker} doesn't handle this (yet).

In reality, IP address allocation is very complicated. Here are a few other points to consider:

Public vs private: Some address ranges are reserved for private networks (e.g. LANs). But depending on the situation, {charlatan} users might want to generate these.
Unallocated: Although the IPv4 address space is now depleted, the IPv6 address space has only allocated a very small proportion of its addresses. So technically speaking, addresses shouldn't be generated in these unallocated regions. However, new addresses are getting allocated all the time...
Countries: Different address ranges are allocated to different countries. For {charlatan}, you could argue this should be incorporated into the localization model. However, there are many reasons this is not a strict rule (e.g. VPNs allow a user in country A to have an IP address in country B). And these country allocations also can change.

Yuck! It might make most sense for {charlatan} to avoid these complexities altogether and simply randomly generate any address (i.e. let's just forget about excluding networks). Let me know your decision and I can update the PR.

BTW -- I was suggesting that we could prevent {charlatan} from generating reserved addresses by using an accept-reject algorithm. In contrast, {faker} uses weighted sampling from the non-excluded networks. The {faker} implementation has a 100% acceptance rate (i.e. it uses the very first IP address it generates), whereas {charlatan} might need to generate 2 or more addresses before it finds an accepted one. However, the accept-reject algorithm is much easier to understand and the acceptance rate is expected to be high (roughly 87%).
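
For illustration, a minimal base-R sketch of the accept-reject idea. The `reserved` table is an assumed, incomplete subset of the reserved IPv4 blocks, and `ipv4_to_num()`, `is_reserved_ipv4()` and `sample_unreserved_ipv4()` are hypothetical helpers, not functions from {ipaddress} or {charlatan}.

```r
# Assumed subset of reserved IPv4 blocks (illustrative, not a full registry).
reserved <- data.frame(
  base = c("0.0.0.0", "10.0.0.0", "127.0.0.0", "169.254.0.0",
           "172.16.0.0", "192.168.0.0", "224.0.0.0", "240.0.0.0"),
  prefix = c(8, 8, 8, 16, 12, 16, 4, 4),
  stringsAsFactors = FALSE
)

# Convert dotted-quad text to a number in [0, 2^32).
ipv4_to_num <- function(x) {
  sum(as.numeric(strsplit(x, ".", fixed = TRUE)[[1]]) * 256^(3:0))
}

# TRUE if the address falls inside any block listed in `reserved`.
is_reserved_ipv4 <- function(x) {
  addr <- ipv4_to_num(x)
  base <- vapply(reserved$base, ipv4_to_num, numeric(1))
  block_size <- 2^(32 - reserved$prefix)
  # Two addresses share a block when their network parts are equal.
  any(floor(addr / block_size) == floor(base / block_size))
}

# Accept-reject: keep drawing uniformly until a candidate is not reserved.
sample_unreserved_ipv4 <- function() {
  repeat {
    candidate <- paste(sample(0:255, 4, replace = TRUE), collapse = ".")
    if (!is_reserved_ipv4(candidate)) return(candidate)
  }
}
```

Because most candidates are accepted, the loop usually ends after one or two draws.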

sckott · 2020-10-01T00:06:23Z

Good point that if a user wanted >1 address they'd be much better off with a vectorized approach. We should take advantage of vectorization whenever possible. This is a longer-term issue: charlatan, I think, largely does one thing at a time, and if you want many of those things you have to run the method that many times. I opened issue #113.

I like the simplicity of just randomly generating any address. And then we could point people to your package in the documentation if they want more control/etc. But, what do you prefer?

davidchall · 2020-10-01T01:07:14Z

Yeah, I like that approach. In the future, I might add a weighted sampling function (davidchall/ipaddress#67), similar to how {faker} handles this.
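
A rough base-R sketch of what weighted sampling could look like, as opposed to accept-reject. This is not the {faker} or {ipaddress} implementation; the `allowed` table is a made-up set of permitted blocks, and all helper names are hypothetical.

```r
# Made-up set of permitted blocks (together they cover 1.0.0.0 to 7.255.255.255).
allowed <- data.frame(
  base = c("1.0.0.0", "2.0.0.0", "4.0.0.0"),
  prefix = c(8, 7, 6),
  stringsAsFactors = FALSE
)

# Convert dotted-quad text to a number in [0, 2^32).
ipv4_to_num <- function(x) {
  sum(as.numeric(strsplit(x, ".", fixed = TRUE)[[1]]) * 256^(3:0))
}

# Convert numbers in [0, 2^32) back to dotted-quad text (vectorized).
num_to_ipv4 <- function(a) {
  paste(a %/% 2^24 %% 256, a %/% 2^16 %% 256, a %/% 2^8 %% 256, a %% 256,
        sep = ".")
}

# Pick a block with probability proportional to its size, then pick an
# address uniformly inside it. Every draw is accepted.
sample_weighted_ipv4 <- function(n) {
  base <- vapply(allowed$base, ipv4_to_num, numeric(1), USE.NAMES = FALSE)
  size <- 2^(32 - allowed$prefix)
  block <- sample(nrow(allowed), n, replace = TRUE, prob = size)
  offset <- floor(runif(n) * size[block])
  num_to_ipv4(base[block] + offset)
}
```

Weighting by block size keeps the result uniform over the permitted address space, at the cost of maintaining the list of permitted blocks.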

sckott · 2020-10-01T01:10:24Z

okay, let me know when you're done updating the PR

davidchall · 2020-10-01T01:12:07Z

The only thing I'm wondering about is whether you'd like me to update the NEWS and codemeta.json files, or is that something you handle? Otherwise, I'm done already 👍

sckott · 2020-10-01T15:27:21Z

no, i update news and codemeta before new releases to cran

Use ipaddress instead of iptools

0c69ae1

Produce IPv4 networks, IPv6 addresses, IPv6 networks

Add tests

9152346

sckott added this to the v0.5 milestone Oct 6, 2020

sckott merged commit 4b797ff into ropensci:master Oct 6, 2020

davidchall deleted the ipaddress branch April 1, 2021 03:07
