Generate IPv4 networks, IPv6 addresses, IPv6 networks #112
Produce IPv4 networks, IPv6 addresses, IPv6 networks
Codecov Report
@@ Coverage Diff @@
## master #112 +/- ##
==========================================
+ Coverage 69.84% 70.46% +0.62%
==========================================
Files 43 43
Lines 955 965 +10
==========================================
+ Hits 667 680 +13
+ Misses 288 285 -3
Continue to review full report at Codecov.
|
Thanks @davidchall! It's great to have some of the IP address stuff finished off. Looks like there's a modest slowdown compared to iptools:
microbenchmark(
  ipaddress = ipaddress::sample_ipv4(1),
  iptools = iptools::ip_random(1),
  times = 10^4
)
#> Unit: microseconds
#>       expr    min     lq      mean  median     uq      max neval
#>  ipaddress 89.379 93.396 112.63896 95.3445 98.352 6342.843 10000
#>    iptools 12.678 14.065  17.35034 16.0760 16.834 6208.996 10000
I don't know if that's accurate or meaningful. I don't typically use this kind of data, so I'm not sure of the use cases. For example, do people often want to generate millions of IP addresses at a time (in which case the speed may become an issue), or do people most often generate tens to hundreds or thousands of addresses at a time (in which case the speed difference is probably not an issue)?
I'm not super familiar with the terminology. By "We could achieve this", what do you mean exactly? Is it best to avoid generating addresses in a reserved network? That is, should we avoid that here as well? |
Hi @sckott, your benchmarking results are really interesting. Thanks for bringing this to my attention! My first thought was that {ipaddress} supports both IPv4 and IPv6, and so there is some additional overhead involved. If we look at generating many addresses, then we see that {ipaddress} is faster than {iptools}:
microbenchmark(
  ipaddress = ipaddress::sample_ipv4(1e5),
  iptools = iptools::ip_random(1e5)
)
#> Unit: milliseconds
#>       expr       min       lq     mean   median        uq      max neval
#>  ipaddress  6.717324 12.01752  39.4633 15.91629  22.50388 396.7741   100
#>    iptools 53.141831 63.32564 117.5727 73.44913 117.57090 680.6242   100
If people want to generate millions of IP addresses, I'd recommend using
The protocol reserves some regions of IP address space for special usage, and so a user would never be assigned one of these reserved addresses. Charlatan is creating fake user data, so I think it makes sense to exclude such addresses. The same idea applies to IPv6 too, though {faker} doesn't handle this (yet). In reality, IP address allocation is very complicated. Here are a few other points to consider:
Yuck! It might make most sense for {charlatan} to avoid these complexities altogether and simply randomly generate any address (i.e. let's just forget about excluding networks). Let me know your decision and I can update the PR. BTW, I was suggesting that we could prevent {charlatan} from generating reserved addresses by using an accept-reject algorithm. In contrast, {faker} uses weighted sampling from the non-excluded networks. The {faker} implementation has a 100% acceptance rate (i.e. it uses the very first IP address it generates), whereas {charlatan} might need to generate 2 or more addresses until it finds an accepted address. However, the accept-reject algorithm is much easier to understand and the acceptance rate is expected to be high (roughly 87%). |
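For readers unfamiliar with the term, the accept-reject idea mentioned above can be sketched in base R as follows. This is only an illustration, not the code in this PR or in {ipaddress}: the `reserved` table is a small hand-picked subset of reserved IPv4 networks, and the helper names (`ipv4_to_num`, `num_to_ipv4`, `is_reserved`, `sample_public_ipv4`) are hypothetical.

```r
# Illustrative subset of reserved IPv4 networks (an assumption for this
# sketch, not the full list that {faker} excludes).
reserved <- data.frame(
  network = c("0.0.0.0", "10.0.0.0", "127.0.0.0", "169.254.0.0",
              "172.16.0.0", "192.168.0.0", "224.0.0.0", "240.0.0.0"),
  prefix  = c(8, 8, 8, 16, 12, 16, 4, 4),
  stringsAsFactors = FALSE
)

# Dotted-quad strings -> numeric values (doubles hold 32-bit values exactly).
ipv4_to_num <- function(x) {
  octets <- do.call(rbind, lapply(strsplit(x, ".", fixed = TRUE), as.numeric))
  as.vector(octets %*% 256^(3:0))
}

# Numeric values -> dotted-quad strings.
num_to_ipv4 <- function(x) {
  paste((x %/% 256^3) %% 256, (x %/% 256^2) %% 256,
        (x %/% 256) %% 256, x %% 256, sep = ".")
}

# TRUE for every address that falls inside any network in `reserved`.
is_reserved <- function(addr) {
  block <- 2^(32 - reserved$prefix)          # number of addresses per network
  net <- ipv4_to_num(reserved$network)
  hits <- outer(addr, seq_along(net),
                function(a, i) a %/% block[i] == net[i] %/% block[i])
  rowSums(hits) > 0
}

# Accept-reject: draw uniformly from the whole address space, keep only the
# draws outside reserved networks, and redraw until n addresses are accepted.
sample_public_ipv4 <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    candidates <- floor(runif(n - length(out), 0, 2^32))
    out <- c(out, candidates[!is_reserved(candidates)])
  }
  num_to_ipv4(out)
}

set.seed(1)
sample_public_ipv4(5)
```

Because rejected draws are simply replaced, the loop usually finishes after one or two passes when most of the address space is accepted.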
Good point that if a user wanted more than one address they'd be much better off with a vectorized approach. We should take advantage of vectorization whenever possible. This is a longer-term issue: charlatan, I think, largely does one thing at a time, and if you want many of those things you have to run the method that many times. I opened issue #113 for this. I like the simplicity of just randomly generating any address. Then we could point people to your package in the documentation if they want more control, etc. But what do you prefer? |
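To make the vectorization point concrete, here is a minimal base R sketch contrasting one vectorized call with repeated single draws. The functions `rand_ipv4` and `one_at_a_time` are hypothetical names for this illustration and are not part of {charlatan}, {ipaddress}, or {iptools}.

```r
# Vectorized: draw all 4 * n octets in one call and assemble n addresses.
rand_ipv4 <- function(n) {
  octets <- matrix(sample.int(256L, 4L * n, replace = TRUE) - 1L, ncol = 4L)
  paste(octets[, 1], octets[, 2], octets[, 3], octets[, 4], sep = ".")
}

# One at a time: call the generator n times, as a per-item API would.
one_at_a_time <- function(n) {
  vapply(seq_len(n), function(i) rand_ipv4(1L), character(1))
}

# Both return n addresses; the looped version pays the function-call
# overhead n times, which is what dominates for large n.
system.time(rand_ipv4(1e5))
system.time(one_at_a_time(1e5))
```

Timings depend on the machine, so none are quoted here; the shape of the comparison is the point.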
Yeah, I like that approach. In the future, I might add a weighted sampling function (davidchall/ipaddress#67), similar to how {faker} handles this. |
okay, let me know when you're done updating the PR |
The only thing I'm wondering about is whether you'd like me to update the NEWS and codemeta.json files, or is that something you handle? Otherwise, I'm done already 👍 |
no, i update news and codemeta before new releases to cran |
Description
I noticed charlatan has some TODO/FIXMEs related to generating:
My ipaddress package supports randomly sampling the IPv6 address space, and can also do the bit masking needed to generate networks for both IPv4 and IPv6.
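As a rough illustration of the bit masking involved, the base R sketch below builds random IPv4 networks by zeroing the host bits of a random address. It is not how {ipaddress} implements this, it covers IPv4 only (128-bit IPv6 values do not fit in a double), and the names `num_to_ipv4` and `sample_ipv4_network` are hypothetical.

```r
# Numeric values -> dotted-quad strings.
num_to_ipv4 <- function(x) {
  paste((x %/% 256^3) %% 256, (x %/% 256^2) %% 256,
        (x %/% 256) %% 256, x %% 256, sep = ".")
}

# Draw random addresses and prefix lengths, then clear the host bits so each
# result is a valid network address in CIDR notation.
sample_ipv4_network <- function(n, prefix = sample(0:32, n, replace = TRUE)) {
  addr <- floor(runif(n, 0, 2^32))
  host_size <- 2^(32 - prefix)               # addresses covered by the prefix
  network <- (addr %/% host_size) * host_size  # arithmetic form of addr & netmask
  paste0(num_to_ipv4(network), "/", prefix)
}

set.seed(1)
sample_ipv4_network(3)
sample_ipv4_network(3, prefix = 24)
```

Integer division followed by multiplication stands in for a bitwise AND with the netmask, since R's `bitwAnd()` is limited to 32-bit signed integers.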
BTW the faker module won't generate an address in a reserved network (see here). We could achieve this using an accept-reject algorithm (see here), if this is something you're interested in?
Related Issue
None. The FIXMEs are in the code.
Example
Created on 2020-09-25 by the reprex package (v0.3.0)