This notebook visualizes the records whose country code does not exist in the ISO 3166[1] country code list ("AN", "EU", "XK", "XX").
These codes are also covered in a Kaggle discussion thread[2].

As discussed in [2]:

* `AN`: Netherlands Antilles[3]
* `XK`: Kosovo (one of the user-assigned code elements of ISO 3166[4])
* `XX`: "The code XX is being used by WIPO as an indicator for unknown states, other entities or organizations"[4]
* `EU`: European Union (one of the exceptional reservations[5])
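
The list above can be kept next to the data as a small lookup table. This is only a sketch: the names `NON_ISO_CODES` and `describe_code` are hypothetical and are not defined anywhere else in this notebook.

```python
# Hypothetical lookup table summarizing the list above.
# The descriptions are paraphrased from references [3], [4] and [5].
NON_ISO_CODES = {
    "AN": "Netherlands Antilles",
    "XK": "Kosovo (user-assigned code element)",
    "XX": "Unknown state, other entity or organization (WIPO usage)",
    "EU": "European Union (exceptional reservation)",
}


def describe_code(code):
    """Return the description of a non-ISO code, or None if it is not listed."""
    return NON_ISO_CODES.get(code)
```

For example, `describe_code("XK")` returns the Kosovo entry, while a regular ISO code such as `"JP"` returns `None`.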

## References

- [1] https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes
- [2] https://www.kaggle.com/competitions/foursquare-location-matching/discussion/324387#1784805
- [3] https://en.wikipedia.org/wiki/Netherlands_Antilles
- [4] https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#User-assigned_code_elements
- [5] https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Exceptional_reservations

In [None]:
!pip install nb-black > /dev/null

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import plotly.express as px
import plotly.offline as py
import plotly.graph_objects as go
import pyarrow.parquet as pq
import pyarrow as pa

from numpy import sin, cos, deg2rad
from plotly.offline import init_notebook_mode, iplot
from sklearn.metrics.pairwise import haversine_distances
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from tqdm.auto import tqdm

pd.set_option("max_colwidth", 256)
plt.style.use("ggplot")
init_notebook_mode(connected=True)

%load_ext lab_black

In [None]:
train = pd.read_csv("../input/foursquare-location-matching/train.csv")
country_of_world = pd.read_csv(
    "../input/countries-of-the-world/countries of the world.csv"
)
country_code = pd.read_csv("../input/country-list/data.csv")

In [None]:
country_code

In [None]:
cc_set_4sq = set(train["country"].unique())
cc_set_iso = set(country_code["Code"].to_list())

In [None]:
len(cc_set_iso & cc_set_4sq), len(cc_set_4sq), (cc_set_4sq - cc_set_iso)
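
One caveat worth checking when comparing code sets like this: by default `pd.read_csv` parses the string `NA` as a missing value, so a code such as `NA` (Namibia) can turn into `NaN` on load. Whether the input files used here are affected is an assumption, not something verified in this notebook. The sketch below uses a made-up two-row CSV to show the effect and how `keep_default_na=False` avoids it.

```python
import io

import pandas as pd

# Made-up sample data, not taken from the competition files.
csv_text = "Name,Code\nNamibia,NA\nJapan,JP\n"

# Default parsing: the literal string "NA" is read as a missing value.
default = pd.read_csv(io.StringIO(csv_text))

# With keep_default_na=False the string "NA" is kept as-is.
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default["Code"].isna().sum())  # 1
print(sorted(kept["Code"]))  # ['JP', 'NA']
```

If `NaN` appears in the set difference above, re-reading both files with `keep_default_na=False` is one way to rule this out.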

In [None]:
class CFG:
    COUNTRIES = ["AN", "EU", "XK", "XX"]

In [None]:
def compose(df, fns):
    """Apply each function in fns, in order, to a copy of df and return the result."""
    ret = df.copy()
    for fn in fns:
        ret = fn(ret)
    return ret

In [None]:
def filter_country(df):
    """Keep only the rows whose country code is listed in CFG.COUNTRIES."""
    countries = CFG.COUNTRIES  # referenced as @countries inside query()
    return df.query("country in @countries").copy()

In [None]:
train_ext = compose(train, [filter_country])
del train

In [None]:
fig = px.scatter_geo(
    train_ext,
    lat="latitude",
    lon="longitude",
    color="country",
    title="Geo Distribution",
)
fig.update_geos(lataxis_showgrid=True, lonaxis_showgrid=True)
fig.update_layout(width=960, height=400, margin={"r": 0, "t": 30, "l": 0, "b": 0})
fig.show()

In [None]:
train_ext.groupby("country").count()[["id"]]
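
An alternative way to get the per-country row counts is `groupby(...).size()`, which counts rows directly rather than non-null `id` values. The frame below is toy data invented for illustration, not the competition data, so the numbers carry no meaning.

```python
import pandas as pd

# Toy stand-in for train_ext; values are invented for illustration only.
toy = pd.DataFrame(
    {
        "id": ["a", "b", "c", "d"],
        "country": ["XX", "XK", "XX", "EU"],
    }
)

# size() counts rows per group, regardless of missing values in other columns.
counts = toy.groupby("country").size().rename("n_rows")
print(counts)
```

The two approaches agree whenever `id` has no missing values.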