[Feature Request] Requesting the support of IPinfo Free IP to Country ASN dataset #862

Closed
abdullahdevrel opened this issue Dec 12, 2023 · 10 comments · Fixed by #871
@abdullahdevrel

I am the DevRel of IPinfo. I would like to request support for the IPinfo free IP to Country dataset in Trippy. Features of the database:

  • Free as in CC-BY-SA 4.0
  • Full accuracy and daily updates
  • Includes IPv4 and IPv6 addresses in a single db
  • Country + ASN in a single db
  • Flat data structure
  • Comes in MMDB format

The database comes in MMDB format, so I believe it can be easily ingested in the project. Also, the data structure is flat and predictable. You can package our free IP to Country ASN database with the project. For that, we will provide an access token that you can use. By using the IPinfo dataset, you can get both country-level geolocation information and ASN information from a single source.

Please let me know what you think. If you need any assistance, please let me know. Thanks.

Schema: https://ipinfo.io/developers/ip-to-country-asn-database

FIELD NAME      EXAMPLE                       DATA TYPE  DESCRIPTION
start_ip        1.0.16.0                      TEXT       Starting IP address of an IP address range
end_ip          1.0.31.255                    TEXT       Ending IP address of an IP address range
country         JP                            TEXT       ISO 3166 country code of the location
country_name    Japan                         TEXT       Name of the country
continent       AS                            TEXT       Continent code of the country
continent_name  Asia                          TEXT       Name of the continent
asn             AS2519                        TEXT       Autonomous System Number
as_name         ARTERIA Networks Corporation  TEXT       Name of the AS (Autonomous System) organization
as_domain       arteria-net.com               TEXT       Official domain or website of the AS organization
@fujiapple852 fujiapple852 self-assigned this Dec 13, 2023
@fujiapple852 fujiapple852 added the enhancement New feature or request label Dec 13, 2023
@fujiapple852 fujiapple852 added this to the 0.10.0 milestone Dec 13, 2023
@fujiapple852
Owner

Hi there @abdullahdevrel thanks for following up!

Trippy currently reads City data from mmdb files, so I think users would need the (premium?) IP Geolocation Extended database to get that?

I tried downloading the sample in mmdb format but I was not able to get City data from it using the maxminddb crate. Perhaps it does not support the ipinfo flavour of mmdb file?

Test code:

use std::net::{IpAddr, Ipv4Addr};
use maxminddb::geoip2::City;

fn main() {
    let reader = maxminddb::Reader::open_readfile("ip_geolocation_extended_ipv4_sample.mmdb").unwrap();
    let addr = IpAddr::V4(Ipv4Addr::from([50, 220, 147, 113]));
    let city_data = reader.lookup::<City<'_>>(addr);
    println!("{city_data:?}");
}

Fails with Err(DecodingError("invalid type: string \"Royal Oak\", expected struct City"))

(I tried decoding as Country as well)

Perhaps this is what you mean by the data being "flat"? Perhaps I have to deserialise to a custom struct with the "flat" structure? Is there a recommended mmdb reader crate for Rust that supports the ipinfo flavour of mmdb files?

You can package our free IP to the Country ASN database with the project

I'd prefer to allow users to bring their own files rather than bundle one, to keep the size down and also to prevent stale data being used.

For that, we will provide an access token that you can use

I'm not quite sure what this is for. Presumably the token is used for looking up the ipinfo API? I see you have an API for that. Would the token be something that could be bundled in Trippy for all users or just for development use? I prefer user-provided mmdb files over API access as Trippy will often be used in data centre environments with no external internet access.

By using the IPinfo dataset, you can get both country-level geolocation information and ASN information from a single source.

Just to note that Trippy currently gets ASN data from the IP to ASN Mapping Service provided by Team Cymru via DNS TXT records, so it's mostly the GeoIp data (country, city, lat/long) that is needed.

@fujiapple852
Owner

fujiapple852 commented Dec 14, 2023

With some trial and error I was able to figure out it is a HashMap<String, String> (I'm sure this is mentioned in your docs somewhere?):

reader.lookup::<HashMap<String, String>>(addr)

Doing that works:

Ok({"latitude": "42.48948", "longitude": "-83.14465", "postal_code": "48067", "radius": "500", "country": "US", "region": "Michigan", "network": "50.220.147.113-50.220.147.113", "timezone": "America/Detroit", "city": "Royal Oak", "geoname_id": "5007804"})

So that looks great.

The question now is, how does Trippy know if a given mmdb file is MaxMind or IpInfo flavoured? Is there some trick to figuring that out? I guess it could try both and see if either works?

@fujiapple852
Owner

I see that the mmdb files have a metadata attribute which could help tell them apart. Comparing the MaxMind and IpInfo mmdb files I can see this for the database_type attribute:

MaxMind (GeoLite2-City.mmdb):

Metadata { database_type: "GeoLite2-City" }

IpInfo (ip_geolocation_extended_ipv4_sample.mmdb):

Metadata { database_type: "ipinfo ip_geolocation_extended_ipv4_sample.mmdb" }

So unlike the MaxMind file, the IpInfo file has a database_type with the format ipinfo <file>, is that guaranteed to be the case?

@fujiapple852
Owner

WIP impl: #871

@fujiapple852
Owner

fujiapple852 commented Dec 15, 2023

@abdullahdevrel I would like Trippy to be able to consume either the free "IP to Country + ASN Database" mmdb file or the premium "IP to Geolocation Extended Database" mmdb file.

One quirk I notice is that the free "IP to Country + ASN Database" mmdb file has both country (code) and country_name fields whereas the premium "IP to Geolocation Extended Database" mmdb file has only the country.

From https://ipinfo.io/developers/ip-to-country-asn-database:

FIELD NAME    EXAMPLE  DATA TYPE  DESCRIPTION
country       JP       TEXT       ISO 3166 country code of the location
country_name  Japan    TEXT       Name of the country

From https://ipinfo.io/developers/ip-to-geolocation-extended:

FIELD NAME  EXAMPLE  DATA TYPE  DESCRIPTION
country     US       TEXT       ISO 3166 country code of the location

Same story for continent.

@abdullahdevrel
Author

Hey @fujiapple852

My apologies for the late response. I really appreciate you considering our data for Trippy.

Just an FYI, my Rust skills are not very good.

How does Trippy know if a given mmdb file is MaxMind or IpInfo flavoured? Is there some trick to figuring that out? I guess it could try both and see if either works?

That is a very good question. MaxMind uses a nested data structure for their MMDB databases, while IPinfo uses a flat data structure.

MaxMind data structure for MMDB:

image

IPinfo data structure for MMDB:

image

As you have seen in MaxMind's MMDB reader library, they have declared the structs themselves, so they have native support for their different databases. In the case of IPinfo, you have to declare the struct based on the database schema, which you have already done in #871.

IPinfo has a flat and predictable data structure. A key returns an empty string even if the value does not exist, and boolean values are strings: "true" or "" (empty).
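
The two conventions above can be handled with a couple of small helpers. This is only a minimal sketch that works on a record already decoded into a HashMap<String, String>; the helper names `non_empty` and `flag` and the `is_anycast` field are hypothetical and used purely for illustration.

```rust
use std::collections::HashMap;

/// Returns the value for `key`, mapping a missing key or an empty string to `None`.
fn non_empty<'a>(record: &'a HashMap<String, String>, key: &str) -> Option<&'a str> {
    record.get(key).map(String::as_str).filter(|v| !v.is_empty())
}

/// Interprets a boolean-like field: "true" is true, "" (or a missing key) is false.
fn flag(record: &HashMap<String, String>, key: &str) -> bool {
    non_empty(record, key) == Some("true")
}

fn main() {
    let mut record = HashMap::new();
    record.insert("country".to_string(), "JP".to_string());
    // A key that exists but carries no value is stored as an empty string.
    record.insert("as_domain".to_string(), String::new());
    // `is_anycast` is a hypothetical boolean-like field, not part of the schema above.
    record.insert("is_anycast".to_string(), "true".to_string());

    println!("{:?}", non_empty(&record, "country"));
    println!("{:?}", non_empty(&record, "as_domain"));
    println!("{}", flag(&record, "is_anycast"));
}
```

Treating a missing key and an empty string the same way keeps calling code identical regardless of which fields a given database provides.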

For Rust, this is usually what I send to users: https://gist.github.com/abdullahdevrel/ace2c80bd53a7323a18bbf8c8ae6a4d2

So unlike the MaxMind file, the IpInfo file has a database_type with the format ipinfo <file>, is that guaranteed to be the case?

Yes. The database_type information will be prefixed with "ipinfo ".

$ mmdbctl metadata ipinfo_country_asn.mmdb
- Binary Format 2.0
- Database Type ipinfo country_asn.mmdb
- IP Version    6
- Record Size   32
- Node Count    5458524
- Description
    en ipinfo country_asn.mmdb
- Languages     en
- Build Epoch   1702629871

I think this database_type value is added when the data is compiled from the CSV file to the MMDB database.
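
As a rough illustration, a reader could use this prefix to tell the two flavours apart once it has the database_type string from the file metadata. This is a sketch only, not the code from #871; the `Flavour` enum and `detect_flavour` function are hypothetical names, and it assumes that anything without the "ipinfo " prefix is MaxMind-style.

```rust
/// The mmdb "flavour" inferred from the metadata `database_type` string.
#[derive(Debug, PartialEq, Eq)]
enum Flavour {
    IpInfo,
    MaxMind,
}

/// Classifies a database by its `database_type`, assuming IPinfo files
/// always start with "ipinfo " and anything else is MaxMind-style.
fn detect_flavour(database_type: &str) -> Flavour {
    if database_type.starts_with("ipinfo ") {
        Flavour::IpInfo
    } else {
        Flavour::MaxMind
    }
}

fn main() {
    // Both values appear earlier in this thread.
    for db_type in ["GeoLite2-City", "ipinfo country_asn.mmdb"] {
        println!("{db_type}: {:?}", detect_flavour(db_type));
    }
}
```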

I'm not quite sure what this is for, presumably the token is used for looking up the ipinfo API? I see you have an API for that. Would the token be something that could be bundled in Trippy for all users or just for development use?

The access token is for downloading the IPinfo database. To download the database, users need to run a command like this:

curl -L https://ipinfo.io/data/free/country_asn.mmdb?token=<ACCESS_TOKEN>

Our API also supports 1,000 tokenless requests/day and 50,000 requests/month with a token. Unlike our free IP database, the free API does provide city and zip code level information.

I would like Trippy to be able to consume either the free "IP to Country + ASN Database" mmdb file or the premium "IP to Geolocation Extended Database" mmdb file.

We would love if you could use the "IP to Country + ASN Database". It is free and easily accessible for the project and the users, but it does not compromise accuracy at all. Support for this database would be incredible.

Here is the mmdb version of that database: https://www.transfernow.net/dl/20231218MUeQ39J8 (available for 7 days)

One quirk I notice is that the free "IP to Country + ASN Database" mmdb file has both country (code) and country_name fields whereas the premium "IP to Geolocation Extended Database" mmdb file has only the country.

We wanted to make the free IP to Country ASN database as accessible as possible. In our geolocation database, we do not provide the full country name or continent name, and we usually recommend that users keep a reference object/dictionary for the full country name, currency, continent, isEu, etc.
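
A reference dictionary of this kind could look roughly like the sketch below. The function name `country_name` is hypothetical, and only a few entries are listed for illustration; a real table would need to cover every ISO 3166 code.

```rust
/// Maps an ISO 3166 alpha-2 country code to a display name.
/// Only a few illustrative entries are listed; a real table would cover every code.
fn country_name(code: &str) -> Option<&'static str> {
    match code {
        "DE" => Some("Germany"),
        "JP" => Some("Japan"),
        "US" => Some("United States"),
        _ => None,
    }
}

fn main() {
    // The geolocation database only carries the code, so the name is looked up locally.
    for code in ["JP", "US", "ZZ"] {
        println!("{code}: {:?}", country_name(code));
    }
}
```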


For posterity

You have addressed this issue, but admittedly, I have not prepared the best Rust documentation. I am addressing it here in case someone stumbles upon this.

let city_data = reader.lookup::<City<'_>>(addr);
Fails with Err(DecodingError("invalid type: string "Royal Oak", expected struct City"))

This happens because the crate has no built-in struct declarations for IPinfo databases. Users have to declare their own structs, and they should not pass a "generic argument" such as <City<'_>> to the lookup function.

Perhaps this is what you mean by the data being "flat"? Perhaps I have to deserialise to a custom struct with the "flat" structure? Is there a recommended mmdb reader crate for Rust that supports the ipinfo flavour of mmdb files?

Yes, this is spot on. The mmdb reader crate does its job of reading the mmdb files perfectly. However, because MaxMind developed this crate, it has native support for their databases through structs declared within the package.

IPinfo and MaxMind's databases are structured differently. So, when using IPinfo's database with mmdb reader crate, users need to declare the structs based on the database schema of the IPinfo database they are using.

Example:

@fujiapple852
Owner

fujiapple852 commented Dec 19, 2023

Hi again @abdullahdevrel and thank you for the comprehensive reply!

The key will return an empty string even if the value does not exist

That is good to know, I'll adjust my impl accordingly to treat an empty string as None (I don't think there are any boolean values to worry about here).

For Rust, this is usually what I send to users: https://gist.github.com/abdullahdevrel/ace2c80bd53a7323a18bbf8c8ae6a4d2

Ah yes, that works well.

and they should not declare a "generic argument" to the lookup function like <City<'_>>(addr).

Nit: note that these are equivalent (the latter infers the type parameter T from the return type of lookup, which must be IpinfoCountryASN to be assigned to record):

let record = reader.lookup::<IpinfoCountryASN>(ip_address).unwrap();
let record: IpinfoCountryASN = reader.lookup(ip_address).unwrap();
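
The same equivalence can be seen with a standard library generic, without needing an mmdb file. This sketch uses str::parse as a stand-in for lookup; the function names are hypothetical and exist only to put the two spellings side by side.

```rust
/// Names the type parameter explicitly with the turbofish syntax.
fn parse_turbofish(s: &str) -> u32 {
    let n = s.parse::<u32>().unwrap();
    n
}

/// Lets the compiler infer the type parameter from the annotation on the binding.
fn parse_annotated(s: &str) -> u32 {
    let n: u32 = s.parse().unwrap();
    n
}

fn main() {
    // Both forms resolve to the same `parse::<u32>` call.
    assert_eq!(parse_turbofish("2519"), parse_annotated("2519"));
    println!("{}", parse_turbofish("2519"));
}
```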

Yes. The database_type information will be prefixed with "ipinfo ".

Perfect, that was the key thing I needed to know.

We would love if you could use the "IP to Country + ASN Database". It is free and easily accessible for the project and the users, but it does not compromise accuracy at all. Support for this database would be incredible.

Trippy can certainly support that (it would use the country and continent names from that file, the AS data is not needed as it comes from elsewhere already).

Trippy can also support the extra attributes (city, postcode, lat/long/radius etc) provided by the premium files in a way where Trippy will look for and use these fields if available in the file provided. To put it another way, a user can provide either the "IP to Country + ASN Database" or the "IP to Geolocation Extended Database" mmdb file and Trippy will pick out the data it needs from either. Does that work?
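
To make the idea concrete, here is a minimal sketch of picking out whichever fields are present, assuming the record has already been decoded into a HashMap<String, String>. It is not the implementation in #871; the `GeoRecord` struct and `from_flat` function are hypothetical names, and the field keys are the ones shown in the schema and sample output earlier in this thread.

```rust
use std::collections::HashMap;

/// Location fields a tool might display; any of them may be absent.
#[derive(Debug, Default, PartialEq)]
struct GeoRecord {
    country: Option<String>,
    country_name: Option<String>,
    city: Option<String>,
    latitude: Option<f64>,
    longitude: Option<f64>,
}

impl GeoRecord {
    /// Builds a record from a flat key/value map, using only the keys that
    /// are present and non-empty, so either database schema can be supplied.
    fn from_flat(fields: &HashMap<String, String>) -> Self {
        let text = |key: &str| fields.get(key).filter(|v| !v.is_empty()).cloned();
        let number = |key: &str| text(key).and_then(|v| v.parse::<f64>().ok());
        Self {
            country: text("country"),
            country_name: text("country_name"),
            city: text("city"),
            latitude: number("latitude"),
            longitude: number("longitude"),
        }
    }
}

fn main() {
    // Shaped like the free "IP to Country + ASN Database".
    let mut free = HashMap::new();
    free.insert("country".to_string(), "JP".to_string());
    free.insert("country_name".to_string(), "Japan".to_string());

    // Shaped like the "IP to Geolocation Extended Database".
    let mut extended = HashMap::new();
    extended.insert("country".to_string(), "US".to_string());
    extended.insert("city".to_string(), "Royal Oak".to_string());
    extended.insert("latitude".to_string(), "42.48948".to_string());
    extended.insert("longitude".to_string(), "-83.14465".to_string());

    println!("{:?}", GeoRecord::from_flat(&free));
    println!("{:?}", GeoRecord::from_flat(&extended));
}
```

Making every field optional means the same code path serves both files, at the cost of the caller having to handle missing values when displaying them.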

Here is the mmdb version of that database: https://www.transfernow.net/dl/20231218MUeQ39J8 (available for 7 days)

Thank you, I have downloaded the file. Is this the same as the file I can download from https://ipinfo.io/account/data-downloads? (I registered account FujiApple on ipinfo.io a while ago).

@fujiapple852
Owner

@abdullahdevrel if you could help check the tests I added in #871 then we should be able to merge this.

@fujiapple852
Owner

Merged. This will be included in the 0.10.0 release of Trippy and will be mentioned in the release notes.

@fujiapple852 fujiapple852 removed their assignment Jan 8, 2024
@abdullahdevrel
Author

Thank you very much @fujiapple852!! Really appreciate it!!
