Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: update country names #76

Merged
merged 8 commits into from
Sep 1, 2020
Merged

fix: update country names #76

merged 8 commits into from
Sep 1, 2020

Conversation

jessicaschilling
Copy link
Contributor

@jessicaschilling jessicaschilling commented Aug 28, 2020

Summary

This PR updates country names for typographical consistency, simplicity (favoring shorter common names over full official ones), and political sensitivity. Full list of updates is below.

This PR addresses the need in ipfs/ipfs-webui#1603 , but that issue will need updating to the resulting ipfs-geoip release in order to close.

Done

To do

  • npm run generate to get new generated DAG representation (I'm getting a 405 error when I try to do this)
  • Update src/lookup.js#L9 for the new generated hash
  • Update README.md#L118 with the new generated hash
  • Release (as 4.2.0?) and update README.md#L28 accordingly

Updates made

Antigua And Barbuda -> Antigua and Barbuda
Bosnia & Herzegovina -> Bosnia and Herzegovina
"Bolivia, Plurinational State Of" -> Bolivia
"Bonaire, Saint Eustatius And Saba" -> "Bonaire, Saint Eustatius and Saba"
Democratic Republic Of Congo -> Democratic Republic of Congo
Republic Of Congo -> Republic of Congo
Czech Republic -> Czechia
"Ceuta, Mulilla" -> Ceuta and Mulilla
"Micronesia, Federated States Of" -> Micronesia
"France, Metropolitan" -> Metropolitan France
South Georgia And The South Sandwich Islands -> South Georgia and the South Sandwich Islands
Guinea-bissau -> Guinea-Bissau
Heard Island And McDonald Islands -> Heard Island and McDonald Islands
Isle Of Man -> Isle of Man
"Iran, Islamic Republic Of" -> Iran
Saint Kitts And Nevis -> Saint Kitts and Nevis
"Korea, Democratic People's Republic Of" -> DPR Korea
"Korea, Republic Of" -> Republic of Korea
Lao People's Democratic Republic -> Laos
"Macedonia, The Former Yugoslav Republic Of" -> North Macedonia
Saint Pierre And Miquelon -> Saint Pierre and Miquelon
"Palestinian Territory, Occupied" -> Palestine
"Saint Helena, Ascension And Tristan Da Cunha" -> "Saint Helena, Ascension and Tristan da Cunha"
Svalbard And Jan Mayen -> Svalbard and Jan Mayen
Syrian Arab Republic -> Syria
Turks And Caicos Islands -> Turks and Caicos Islands
Tristan de Cunha -> Tristan da Cunha
Trinidad And Tobago -> Trinidad and Tobago
"Taiwan, Province Of China" -> Taiwan
"Tanzania, United Republic Of" -> Tanzania
United States -> USA
Vatican City State -> Vatican City
Saint Vincent And The Grenadines -> Saint Vincent and the Grenadines
"Venezuela, Bolivarian Republic Of" -> Venezuela
Virgin Islands (British) -> British Virgin Islands
Virgin Islands (US) -> US Virgin Islands
Viet Nam -> Vietnam
Wallis And Futuna -> Wallis and Futuna

@jessicaschilling jessicaschilling marked this pull request as draft August 28, 2020 19:36
@jessicaschilling
Copy link
Contributor Author

@hacdias - are you able to help me regenerate this? Thank you!

cc @lidel for continuity.

@lidel lidel changed the title [WIP] Update country names fix: update country names Aug 29, 2020
@lidel
Copy link
Member

lidel commented Aug 29, 2020

@jessicaschilling the error looked familiar, so I fixed it, but side effects turned out to be yet another 🕳️ 🐇

Changes:

  • updated deps, including ipfsd-ctl and refactored tests to work with new apis
  • I thought generation is broken, but it just took a long time, so I've improved progress reporting to see whats happening:

    2020-08-29--01-38-34

I've started npm run generate but it takes long time on my machine (dying disk?), so was unable to confirm if output is correct. EOD for me, but I've left it running, will get back to this when done tomorrow/monday (or you can try running it locally).

ps. CI fails because commit titles in this repo need to follow type: title convention, and this PR has one commit which breaks that (we should rebase and fix the title at some point)

ps2. I worry those manual changes are not future proof, and will be pain to re-apply on updates. Perhaps we could have a normalization function normalizedName that modifies known problematic country names on the fly?
I think

-acc[row.alpha2] = row.name
+acc[row.alpha2] = normalizedName(row.name)

would do the trick here:

acc[row.alpha2] = row.name

This way we would not have to touch the source data.

@lidel
Copy link
Member

lidel commented Aug 29, 2020

> bin/generate

Finished with root hash QmVp6myP4AzFAPuP7rWS7oa4kPu76BP9dd7ahvtyEyfEnM

@jessicaschilling try fetching my fixes and running npm run generate – lets see if you get the same CID (you need IPFS node running while you do it)

@jessicaschilling
Copy link
Contributor Author

jessicaschilling commented Aug 30, 2020

Unfortunately npm run generate fails on Error: Cannot find module 'it-concat', and npm install then Failed at the bcrypto@5.1.0 install script when trying to install it-concat. Can't install brcypto directly, either. I'm afraid I'm a bit stuck.

@lidel
Copy link
Member

lidel commented Aug 31, 2020

Tested but does not pass.
Seems that a lot of stuff changed since geoip was created, APIs changed all over the place breaking random parts. Needs additional investigation.
Will block some time today to figure out if it can be fixed, or if its better to rewrite entire thing as part of #63

@hacdias
Copy link
Member

hacdias commented Aug 31, 2020

Thanks for stepping into this @lidel!

@lidel
Copy link
Member

lidel commented Aug 31, 2020

Quick update:

  • switched to CIDs as most of APIs return CID objects now, and lookup works as expected with the old index (QmRn43NNNBEibc6m7zVNcS6UusB1u3qTTfyoLmkugbeeGJ)
  • the new index at QmVp6myP4AzFAPuP7rWS7oa4kPu76BP9dd7ahvtyEyfEnM is 41MB while old one at QmRn43NNNBEibc6m7zVNcS6UusB1u3qTTfyoLmkugbeeGJ was 170MB.
    • old and new inputs are 98MB, confirmed the same problem occurs for original source files
    • most likely there is a bug in generate logic which skips a lot of data (unexpected result of API changes somewhere)
      • initial suspect is locations which should be 600k but parsed CSV returns empty array
      • will dig bit more after meetings today

@jessicaschilling
Copy link
Contributor Author

Thanks so much for the detour for this. LMK if there's any way I can be more useful.

jessicaschilling and others added 8 commits September 1, 2020 02:24
License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
The error looked familiar, so I fixed it locally:
- needed update of ipfsd-ctl and refactor of lookup tests
- refactor  JS APIs to Async Iterators where needed.

I thought generation is broken, but it simply took a long time,
so I've improved progress reporting.

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
Note we still use the old index for lookups, so this is
backward-compatible switch.

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
This change keeps source CSV intact. Instead, rules from
src/generate/overrides.js are applied during the build.

This way overrides can be re-applied to any future updates to geoip data.

A bug where postal codes were returned as Number instead of String is
also fixed.

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
@lidel lidel marked this pull request as ready for review September 1, 2020 00:54
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this should be good and ready for final checks.

Smoke-tested browser bundle locally with https://ipfs.github.io/public-gateway-checker/ and worked fine, showing USA instead of United States when hovering on flag 👌

My changes:

  • switched back to the original CSV data, overrides are now future proof, applied during the generate task
  • fixed papercut API changes, including external libraries for parsing csv and latin1 conversion
  • fixed a bug where postal and other codes were casted as numbers in countries where those are digits-only
    • not sure if anyone used postal codes (our gui apps did not), ut it is a breaking change so needs to be released as a major version
  • fixed CI by rewriting history of this PR to fix the commit title and dropping Node 10 support

@jessicaschilling
Copy link
Contributor Author

jessicaschilling commented Sep 1, 2020

@lidel Thanks so much for all this unexpected work!

The overrides look perfect, with one question: Some original items included double quotes to escape out commas in country names. In the overrides, some remove the comma and thus the need for double quotes (e.g. "Bolivia, Plurinational State Of" -> Bolivia) but some don't (e.g. "Bonaire, Saint Eustatius And Saba" -> "Bonaire, Saint Eustatius and Saba"). I believe your overrides.js may leave us in the situations of double quotes when we don't really need them -- do we care?

I don't have the permissions to cut a new release or update to npm, but if you do, I can take care of the remaining steps in ipfs/ipfs-webui#1603. Let me know if there's anything else I can do. 🙏

@lidel
Copy link
Member

lidel commented Sep 1, 2020

The double quotes story is pretty fun, and probably there is a commentary on the state of our civilization somewhere in there:

The original double quotes were part of .csv, source file format which uses comma as a field separator. Double quotes around strings are necessary in names which include comma (,), to keep those additional commas from being interpreted as field separators. Overrides execute after data is read from CSV, and at that stage of processing the double quotes around the name are already gone. So no issue here.

You may notice that some names in overrides.js have single quotes, and some use double ones, and those are different ones than in .csv. This is for similar-but-different reason: in JS we use single quotes by default, but when a string includes ' (and some names do), then use of double ones makes it work and is easier on the eyes than escaping '.


Anyway, I'll merge this and release later today, when done will comment on the ipfs-webui issue.

@lidel lidel merged commit 670b55b into ipfs-shipyard:master Sep 1, 2020
src/generate/overrides.js Show resolved Hide resolved
src/generate/overrides.js Show resolved Hide resolved
src/generate/overrides.js Show resolved Hide resolved
@jessicaschilling jessicaschilling deleted the chore/update-countrynames branch September 1, 2020 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants