3-letter iso country code #4

ckhung · 2023-04-15T04:48:50Z

Hi, thanks for your work!

It would be very interesting to "join" several datasets and study correlations among possibly very different topics, such as sanitation and transportation. Using the 3-letter ISO country code as the joining key between tables (instead of the full country name) would make this much easier. So I wrote a small program, country-encode.py, that prepends each row with the 3-letter code and the continent in which the country is located. Would it be possible to integrate this idea and code into the entire dataset?
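The idea behind country-encode.py can be sketched roughly as follows. This is not the actual script: the lookup table, the `Entity` column name and the `prepend_codes` helper are hypothetical, and a real version would load the lookup from a complete country table instead of hard-coding three entries.

```python
import csv
import io

# Hypothetical lookup: country name -> (ISO 3166-1 alpha-3 code, continent).
# A real script would build this from a complete country/region table.
LOOKUP = {
    "France": ("FRA", "Europe"),
    "Japan": ("JPN", "Asia"),
    "Kenya": ("KEN", "Africa"),
}


def prepend_codes(csv_text, name_column="Entity"):
    """Prepend iso3 and continent columns to every row of a CSV table.

    name_column is an assumed column name; rows whose name is not in
    LOOKUP (e.g. aggregates such as "World") get empty fields.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    name_idx = header.index(name_column)
    rows = [["iso3", "continent"] + header]
    for row in reader:
        iso3, continent = LOOKUP.get(row[name_idx], ("", ""))
        rows.append([iso3, continent] + row)
    return rows


sample = "Entity,Year,value\nFrance,2020,1.5\nWorld,2020,3.0\n"
for row in prepend_codes(sample):
    print(row)
```

Once every table carries the `iso3` column, two tables can be joined on that column rather than on the spelling of the country name.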

BTW, during the process I found that "Faroe Islands" is misspelled as "Faeroe Islands" in some files. Currently 52 files have this problem, according to this command: grep -l Faeroe */*.csv | wc -l
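For anyone without grep, the same count can be made in Python. This is a sketch; `count_files_containing` is a hypothetical helper, and it assumes the CSV files are UTF-8 text sitting one directory deep, as the `*/*.csv` pattern implies.

```python
from pathlib import Path


def count_files_containing(root, needle, pattern="*/*.csv"):
    """Count files matching pattern under root whose text contains needle.

    Mirrors `grep -l needle */*.csv | wc -l`: each file counts once,
    however many matching lines it contains.
    """
    count = 0
    for path in Path(root).glob(pattern):
        if not path.is_file():
            continue
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text:
            count += 1
    return count


if __name__ == "__main__":
    print(count_files_containing(".", "Faeroe"))
```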

Marigold · 2023-04-17T09:05:48Z

Thanks for your interest @ckhung! We try to "harmonise" country names so that they are consistent across datasets (see how we do it); joins across datasets should therefore work. There may be some old datasets with non-harmonised names or typos, but the important and recent ones should be clean.
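As a rough illustration of why harmonisation makes joins work: once every spelling is mapped to one canonical name, two tables can be joined on that name. The alias map, the toy values and the `harmonise`/`join_on_country` helpers below are hypothetical and are not OWID's actual harmonisation tooling.

```python
# Hypothetical alias map: alternative spelling -> canonical (harmonised) name.
ALIASES = {
    "Faroe Islands": "Faeroe Islands",
    "United States of America": "United States",
}


def harmonise(name):
    """Return the canonical spelling of a country name (unchanged if unknown)."""
    name = name.strip()
    return ALIASES.get(name, name)


def join_on_country(left, right):
    """Inner-join two {country: value} dicts after harmonising their keys."""
    left_h = {harmonise(k): v for k, v in left.items()}
    right_h = {harmonise(k): v for k, v in right.items()}
    return {c: (left_h[c], right_h[c]) for c in left_h if c in right_h}


# Toy values for illustration only, not real data.
sanitation = {"Faroe Islands": 99.0, "United States of America": 99.7}
transport = {"Faeroe Islands": 12.0, "United States": 8.5}
print(join_on_country(sanitation, transport))
```

Without the alias step, the two dicts above would share no keys and the join would come back empty.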

We have a similar countries/regions table with ISO codes and harmonised country names that you could use.

I'd also recommend taking a look at our catalog, whose Python interface lets you easily load entire datasets. Good luck with your data work, and don't hesitate to give us feedback!

edomt · 2023-04-17T09:27:48Z

A quick note on the Faroe Islands: "Faeroe" is an alternative spelling that's less common now, but that's the one we've used historically in our database. (See merriam-webster.com)

ckhung · 2023-04-18T14:18:56Z

Thanks, @edomt for the note.

Thanks, @Marigold, for the explanation and links. I've only read a few pages of the etl project so far. The answer to my next question may be somewhere in there, but if you could point me to a specific page, that would be most helpful :-) How do I easily filter out entries representing aggregates (e.g. world, continent, G20, ...)?
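One quick heuristic, independent of any particular table: keep only rows whose code looks like an ISO 3166-1 alpha-3 code (exactly three uppercase letters). This assumes aggregates such as World or continents carry either an empty code or a longer, non-ISO one; any genuine country that lacks an ISO code would be dropped too, so treat it as a rough sketch. The `is_iso3` and `drop_aggregates` names and the `Code` column are hypothetical.

```python
import re

# Exactly three uppercase ASCII letters, e.g. "FRA".
ISO3 = re.compile(r"[A-Z]{3}")


def is_iso3(code):
    """True if code is a non-empty string of exactly three uppercase letters."""
    return bool(code) and ISO3.fullmatch(code) is not None


def drop_aggregates(rows, code_key="Code"):
    """Keep rows whose code looks like an ISO alpha-3 code."""
    return [r for r in rows if is_iso3(r.get(code_key, ""))]


rows = [
    {"Entity": "France", "Code": "FRA"},
    {"Entity": "World", "Code": "OWID_WRL"},  # assumed non-ISO aggregate code
    {"Entity": "Africa", "Code": ""},
]
print([r["Entity"] for r in drop_aggregates(rows)])  # ['France']
```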

Marigold · 2023-04-21T08:07:54Z

> How do I easily filter out entries representing aggregates (e.g. world, continent, G20, ...)?

Sorry, I should have kept it simple! The easiest way is to load the dataframe directly from our catalog:

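```python
import pandas as pd

# Load the regions definitions table straight from the catalog
# (pd.read_feather needs the pyarrow package installed)
df = pd.read_feather('https://catalog.ourworldindata.org/garden/regions/2023-01-01/regions/definitions.feather')
df.head()  # preview the first five rows
```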

That should give you everything you need. We have more information about countries, such as common aliases, other non-ISO codes and historical regions, but you probably don't need that.
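With the definitions table loaded, filtering out aggregates could look roughly like this. The column names `code` and `region_type` and the label "country" are assumptions made for illustration; check `df.columns` for the real schema. The sketch works on plain rows (such as those from `df.to_dict("records")`) so it runs without network access, and `country_codes`/`keep_countries` are hypothetical helpers.

```python
def country_codes(definitions, code_key="code", type_key="region_type"):
    """Collect the codes of entries classified as countries.

    code_key, type_key and the "country" label are assumed names;
    adjust them to the actual columns of definitions.feather.
    """
    return {d[code_key] for d in definitions if d.get(type_key) == "country"}


def keep_countries(rows, codes, code_key="Code"):
    """Drop data rows whose code is not a known country code (i.e. aggregates)."""
    return [r for r in rows if r.get(code_key) in codes]


# Toy stand-in for df.to_dict("records") on the definitions table.
definitions = [
    {"code": "FRA", "name": "France", "region_type": "country"},
    {"code": "JPN", "name": "Japan", "region_type": "country"},
    {"code": "OWID_EUR", "name": "Europe", "region_type": "continent"},
]
# Toy data rows; values are illustrative only.
data = [
    {"Entity": "France", "Code": "FRA", "value": 1.0},
    {"Entity": "Europe", "Code": "OWID_EUR", "value": 2.0},
    {"Entity": "G20", "Code": None, "value": 3.0},
]
print(keep_countries(data, country_codes(definitions)))
```

Filtering against an explicit list of countries avoids guessing from the shape of the code, at the cost of depending on how the table classifies each entity.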

ckhung · 2023-04-21T14:10:12Z

Thank you very much! Yes, that's exactly what I need. Appreciate it!

ckhung closed this as completed Apr 21, 2023
