# About

The `country` column encodes each country as a two-letter code.  
I could not tell which country several of these codes refer to.

This notebook shows how to look up country names for the `country` column using `pycountry`:  
https://github.com/flyingcircusio/pycountry

`pycountry` provides country names based on [ISO 3166](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes).

# Prepare

## import

In [None]:
from __future__ import annotations  # for Type Hint
from pathlib import Path

import numpy as np
import pandas as pd

from joblib import Parallel, delayed

from tqdm.notebook import tqdm, trange

from matplotlib import pyplot as plt
import seaborn as sns

%matplotlib inline
sns.set()
# use the full option name; the bare "max_rows" can match more than one option
pd.set_option("display.max_rows", 500)

## set constants

In [None]:
ROOT = Path.cwd().parent
INPUT = ROOT / "input"
DATA = INPUT / "foursquare-location-matching"
WORK = ROOT / "working"

## read data

In [None]:
train = pd.read_csv(DATA / "train.csv")
test = pd.read_csv(DATA / "test.csv")
# pairs = pd.read_csv(DATA / "pairs.csv")
smpl_sub = pd.read_csv(DATA / "sample_submission.csv")

# Get Country Name by pycountry

Fortunately, the Kaggle Notebook environment includes `pycountry` 😄

In [None]:
import pycountry as pc

In [None]:
countries = train["country"].drop_duplicates().dropna().sort_values(ignore_index=True)

In [None]:
# number of unique country codes
len(countries)

In [None]:
countries.head()
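
Before looking the codes up, it can help to confirm that every value really has the two-letter shape described above. The following is a minimal sketch, not part of the original notebook: `is_alpha_2_shaped` and `sample_codes` are hypothetical names, and the sample values are made up. It only checks the shape of a value, not whether the code is actually assigned.

```python
import re

# ISO 3166-1 alpha-2 codes are exactly two uppercase ASCII letters.
ALPHA_2_PATTERN = re.compile(r"[A-Z]{2}")


def is_alpha_2_shaped(value: object) -> bool:
    """Return True if `value` looks like an alpha-2 code (shape only, not validity)."""
    return isinstance(value, str) and ALPHA_2_PATTERN.fullmatch(value) is not None


# Made-up sample values for illustration; not taken from the dataset.
sample_codes = ["AD", "jp", "USA", " U", "ZZ"]
malformed = [c for c in sample_codes if not is_alpha_2_shaped(c)]
print(malformed)
```

In this notebook the same check could be applied as `countries[~countries.map(is_alpha_2_shaped)]`. An empty result would suggest that any failed lookup comes from the code itself rather than from formatting.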

In [None]:
# example: look up one alpha-2 code
pc.countries.get(alpha_2="AD")

In [None]:
country_list = []
for c in countries:
    c_obj = pc.countries.get(alpha_2=c)
    if c_obj is None:
        # no match in pycountry: keep the code and leave the other fields empty
        country_list.append([c, None, None, None])
        continue
    country_list.append([
        c_obj.alpha_2,
        c_obj.name,
        # not every country has an official name
        getattr(c_obj, "official_name", None),
        c_obj.flag,
    ])

country_df = pd.DataFrame(
    country_list,
    columns=["country", "country_name",  "country_official_name", "country_flag"])
del country_list

In [None]:
country_df
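
A lookup table like `country_df` can be joined back onto the original rows so that each record carries its country name. This is a minimal sketch with toy stand-ins (`toy_train` and `toy_country_df` are hypothetical, and their rows are made up), assuming a left merge on `country` is what you want:

```python
import pandas as pd

# Toy stand-ins; the real notebook would use `train` and `country_df`.
toy_train = pd.DataFrame(
    {"id": ["a", "b", "c"], "country": ["AD", "ZZ", None]}
)
toy_country_df = pd.DataFrame(
    {"country": ["AD", "ZZ"], "country_name": ["Andorra", None]}
)

# how="left" keeps every row of toy_train, including rows whose code
# is missing or has no resolved name.
toy_merged = toy_train.merge(toy_country_df, on="country", how="left")

# Rows that have a code but no resolved name are the ones to review by hand.
unresolved = toy_merged[
    toy_merged["country"].notna() & toy_merged["country_name"].isna()
]
print(unresolved["country"].unique())
```

With the real data this would be `train.merge(country_df, on="country", how="left")`.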

However, `pycountry` has no match for some of the codes 😢

In [None]:
country_df[country_df.country_name.isnull()]

# EOF