3-letter iso country code #4
Comments
Thanks for your interest @ckhung! We're trying to "harmonise" country names to be consistent across datasets (see how we do it), so joins across datasets should work. It's possible there are some old datasets with non-harmonised names or typos, but the important and recent ones should be clean. We have a similar countries regions table with ISO codes and harmonised country names you could use. I'd also recommend taking a look at our catalog with a Python interface that lets you easily load entire datasets. Good luck with your data work and don't hesitate to give us feedback!
A quick note on the Faroe Islands: "Faeroe" is an alternative spelling that's less common now, but that's the one we've used historically in our database. (See merriam-webster.com)
Thanks, @edomt, for the note. Thanks, @Marigold, for the explanation and links. I've only read a few pages of the etl project. The answer to my following question may well lie somewhere in there, but if you could point me to a specific page it would be most helpful :-) How do I easily filter out entries representing aggregates (e.g. world, continent, G20, ...)?
Sorry, I should have kept it simple! The easiest way is to load the dataframe directly from our catalog with
that should give you all you need. We have more info about countries, like common aliases, other non-ISO codes, historical regions, etc., but you probably don't need that.
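For the aggregate-filtering question above, here is a minimal sketch of one possible approach. It is not the catalog snippet referred to in this comment and does not use the catalog's API: it assumes a regions table is available as CSV with columns named `name` and `iso_alpha3` (both names are hypothetical), and it treats any entity without an ISO code as an aggregate. The sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical regions table: harmonised name -> ISO 3166-1 alpha-3 code.
# Aggregates such as "World" or "G20" have no ISO code, so they are absent.
REGIONS_CSV = """name,iso_alpha3
France,FRA
Japan,JPN
Faeroe Islands,FRO
"""

# Hypothetical dataset mixing countries and aggregates (made-up values).
DATA_CSV = """Entity,Year,Value
World,2020,10.0
France,2020,3.5
G20,2020,8.1
Japan,2020,4.2
"""


def load_iso_codes(regions_text):
    """Return a dict mapping harmonised country name to ISO alpha-3 code."""
    reader = csv.DictReader(io.StringIO(regions_text))
    return {row["name"]: row["iso_alpha3"] for row in reader if row["iso_alpha3"]}


def drop_aggregates(data_text, iso_codes):
    """Keep only rows whose Entity appears in the ISO lookup."""
    reader = csv.DictReader(io.StringIO(data_text))
    return [row for row in reader if row["Entity"] in iso_codes]


iso_codes = load_iso_codes(REGIONS_CSV)
countries_only = drop_aggregates(DATA_CSV, iso_codes)
```

With the sample data, `countries_only` keeps the France and Japan rows and drops World and G20. The filter is only as good as the lookup: a country whose name is missing from the regions table would be dropped along with the aggregates.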
Thank you very much! Yes, that's exactly what I need. Appreciate it!
Hi, thanks for your work!
It will be very interesting to "join" several datasets and study correlations among possibly vastly different topics such as sanitation and transportation. Using the 3-letter ISO country code as the joining key between different tables (instead of the full country name) would make it much easier. Therefore I wrote a small program country-encode.py to prepend every row with the 3-letter code and the continent in which the country is located. Might it be possible to integrate this idea and code into the entire dataset?
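To make the idea concrete, here is a rough sketch of prepending the code and continent to every row and then joining two tables on the code. It is not the actual country-encode.py; the lookup table, column names, function names, and sample values are all hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical lookup: harmonised country name -> (ISO alpha-3 code, continent).
COUNTRY_INFO = {
    "France": ("FRA", "Europe"),
    "Japan": ("JPN", "Asia"),
    "Faeroe Islands": ("FRO", "Europe"),
}

# Two made-up tables on unrelated topics, keyed by country name.
SANITATION = "Entity,Year,Sanitation\nFrance,2020,98.0\nJapan,2020,99.0\nWorld,2020,54.0\n"
TRANSPORT = "Entity,Year,RoadDeaths\nJapan,2020,3.6\nFrance,2020,5.0\n"


def encode_rows(csv_text, entity_column="Entity"):
    """Prepend ISO code and continent to every row; blanks when the name is unknown."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    idx = header.index(entity_column)
    out = [["Code", "Continent"] + header]
    for row in reader:
        code, continent = COUNTRY_INFO.get(row[idx], ("", ""))
        out.append([code, continent] + row)
    return out


def join_on_code(left, right):
    """Inner-join two encoded tables on (Code, Year); rows with blank codes are skipped."""
    # Column layout after encode_rows: Code, Continent, Entity, Year, <value>
    right_index = {(r[0], r[3]): r[4] for r in right[1:] if r[0]}
    return [
        (r[0], r[3], r[4], right_index[(r[0], r[3])])
        for r in left[1:]
        if r[0] and (r[0], r[3]) in right_index
    ]


joined = join_on_code(encode_rows(SANITATION), encode_rows(TRANSPORT))
```

Joining on the code rather than the name means spelling variants such as "Faeroe" and "Faroe" no longer matter once each file has been encoded, and aggregates like "World" fall out of the join because they get a blank code.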
BTW, during the process I found that "Faroe Islands" is spelled "Faeroe Islands" in some files. At present 52 files have this problem, according to this command:
grep -l Faeroe */*.csv | wc
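In the command above, `grep -l` prints one filename per matching file, so the first number reported by `wc` is the file count. The same check could be sketched in Python roughly as follows; `files_containing` is a hypothetical helper name, and the default pattern assumes the same one-level directory layout as the shell glob.

```python
from pathlib import Path


def files_containing(root, needle, pattern="*/*.csv"):
    """Return sorted paths under root matching pattern whose text contains needle."""
    matches = []
    for path in sorted(Path(root).glob(pattern)):
        # Case-sensitive substring match, like plain grep without -i.
        if path.is_file() and needle in path.read_text(encoding="utf-8", errors="replace"):
            matches.append(path)
    return matches


# Example usage from the repository root:
# print(len(files_containing(".", "Faeroe")))
```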