feat: updated dataset (2020-10-13) #80

Merged on Oct 20, 2020 (2 commits)

@lidel lidel commented Oct 19, 2020

This PR updates the IPv4 City geoip dataset to the latest free version provided on https://dev.maxmind.com/geoip/geoip2/geolite2/ (GeoLite2-City-CSV_20201013)

Closes #68
Closes #63

Changes

  • Refactored code behind npm run generate to be compatible with new input format without changing output format too much
    • BREAKING CHANGE: area_code and metro_code are no longer provided due to upstream changes
  • Generated a new b-tree from GeoLite2-City-CSV_20201013 dataset
    • Note: most of the naming bugs addressed in fix: update country/region names #78 have since been fixed in the upstream dataset,
      but I've kept our overrides to protect against future regressions.
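
Keeping overrides on top of a corrected upstream dataset can be sketched as a lookup that falls back to the upstream value. This is a minimal illustration only: `NAME_OVERRIDES` and `normalizeName` are hypothetical names, and the table entry is a placeholder, not one of the actual overrides from #78.

```javascript
// Hypothetical override table (placeholder entry, not real project data):
// key = name as it appears in the upstream CSV, value = name we want to emit.
const NAME_OVERRIDES = new Map([
  ['Upstream Name A', 'Preferred Name A']
])

// Return the overridden name when one exists, otherwise pass the upstream name through.
// If upstream already ships the preferred name, the override simply never matches,
// so keeping it is harmless and guards against the old name reappearing.
function normalizeName (name) {
  return NAME_OVERRIDES.has(name) ? NAME_OVERRIDES.get(name) : name
}
```

Names without an entry are returned unchanged, so the table only needs to list known problem cases.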

Not in this PR

  • no IPv6 – it is tracked in Add IPv6 support #60
  • did not change the DAG format – it still uses stringified JSON and ipfs.object API instead of ipfs.dag and dag-cbor

Test/Preview: ipfs-geoip on WebUI's Peers screen

I've run this PR against ipfs-webui and Peers screen looks good.

My node is located in Poland. I no longer see "USA" nodes with 30ms ping – faster-than-light networking is no more ;)

[Image: 2020-10-19--16-10-23]

With this updated dataset, we now see much better distribution across all continents:

[Image: Screenshot_2020-10-19 Peers IPFS(3)]

This updates the IPv4 City geoip dataset to the latest version
provided on https://dev.maxmind.com/geoip/geoip2/geolite2/

The b-tree format and lookup logic remain unchanged; however, the code
responsible for building the b-tree had to be changed to work with the
new CSV format of the source dataset.

Closes #68
Closes #63

BREAKING CHANGE: `area_code` and `metro_code` are no longer provided due to upstream changes

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
@lidel lidel marked this pull request as ready for review October 19, 2020 13:11
@lidel lidel changed the title from "feat: updated dataset (20201013)" to "feat: updated dataset (2020-10-13)" Oct 19, 2020

@jessicaschilling jessicaschilling left a comment

Thank you for following up on this!

// |- locationsCsv
// |- blocksCsv
const DATA_HASH = 'bafybeid3munsqqt36qhoumn3kvgwmft6dsswzgl3wiohsanlyqemczcsvi' // GeoLite2-City-CSV_20201013
const locationsCsv = 'GeoLite2-City-Locations-en.csv'

❓ Are we going to use other languages in the near future? I see that de, en, es, fr, ja, pt-BR, ru and zh-CN are available.

Member Author

Yes, this is certainly a possibility now, but not in the scope of this PR.
Created #81 to track that.

src/generate/index.js (outdated)
geonameData.push(
String(row.postal_code),
Number(row.latitude),
Number(row.longitude)
Contributor

📝 The last two could end up as NaN if the inputs aren't valid numbers. If NaN is not valid output, it would be worth enforcing that invariant here.
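
One way to uphold that invariant is to validate each coordinate before pushing it. The sketch below is an assumption about how such a guard could look, not the code that was merged: `toFiniteNumber` and `parseCoordinates` are hypothetical helpers, and the range checks assume latitude and longitude are in decimal degrees.

```javascript
// Hypothetical helper (not part of ipfs-geoip): parse a CSV field as a finite number.
function toFiniteNumber (value, fieldName) {
  // Number('') evaluates to 0, so blank or missing fields are rejected explicitly.
  if (value === undefined || value === null || String(value).trim() === '') {
    throw new TypeError(`${fieldName} is missing`)
  }
  const n = Number(value)
  // Rejects NaN (e.g. Number('abc')) as well as Infinity and -Infinity.
  if (!Number.isFinite(n)) {
    throw new TypeError(`${fieldName} is not a valid number: ${JSON.stringify(value)}`)
  }
  return n
}

// Hypothetical helper: validate the coordinates of one parsed CSV row.
// Assumes the row has `latitude` and `longitude` fields in decimal degrees.
function parseCoordinates (row) {
  const latitude = toFiniteNumber(row.latitude, 'latitude')
  const longitude = toFiniteNumber(row.longitude, 'longitude')
  if (latitude < -90 || latitude > 90) {
    throw new RangeError(`latitude out of range: ${latitude}`)
  }
  if (longitude < -180 || longitude > 180) {
    throw new RangeError(`longitude out of range: ${longitude}`)
  }
  return [latitude, longitude]
}
```

Whether an invalid row should throw, be skipped, or be logged is a design choice for the generator; throwing makes bad input fail loudly at build time instead of leaking NaN into the generated b-tree.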

Co-authored-by: Irakli Gozalishvili <contact@gozala.io>

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
@lidel lidel force-pushed the feat/GeoLite2-City-CSV_20201013 branch from f628658 to a1eba19 Compare October 20, 2020 19:55
@lidel lidel merged commit 1de0d2b into master Oct 20, 2020
@lidel lidel deleted the feat/GeoLite2-City-CSV_20201013 branch October 20, 2020 20:07
lidel added a commit to ipfs/public-gateway-checker that referenced this pull request Oct 20, 2020
Details: ipfs-shipyard/ipfs-geoip#80

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
lidel added a commit to ipfs/ipfs-webui that referenced this pull request Oct 21, 2020
lidel added a commit to ipfs/ipfs-webui that referenced this pull request Oct 21, 2020
Linked issues: Dataset Updating Plan; Update the dataset
5 participants