## Handling country names using the pycountry Python package

The pycountry Python package provides the ISO databases for languages, countries, deleted countries, subdivisions of countries, currencies and scripts. The corresponding ISO standards are:

+ 3166 - Countries
+ 639 - Languages
+ 3166-3 - Deleted countries
+ 3166-2 - Subdivisions of countries
+ 4217 - Currencies
+ 15924 - Scripts

The package includes a copy of the data from Debian's pkg-isocodes and makes it accessible through a Python API.

Translation files for the various strings are included as well.

### Matching Country ISO codes

#### COUNTRIES (ISO 3166)
Countries are accessible through a database object that is already configured upon import of [pycountry](https://pypi.org/project/pycountry/) and works as an iterable.

In our first queries to _pycountry_, let's import the package and check how many countries are in the database (249 in total). Next we check which country comes first (Aruba), and finally we iterate over all records to see the fields available for each country.

As we can see, alpha_2 holds the ISO 3166 Alpha-2 code as described in the [International Standard](https://www.iban.com/country-codes) and alpha_3 holds the Alpha-3 code, followed by the short name, the numeric code and, for some countries, the official name.

In [31]:
import pycountry
len(pycountry.countries)

249

In [32]:
list(pycountry.countries)[0]

Country(alpha_2='AW', alpha_3='ABW', name='Aruba', numeric='533')

In [33]:
for country in pycountry.countries:
    print(country)

Country(alpha_2='AW', alpha_3='ABW', name='Aruba', numeric='533')
Country(alpha_2='AF', alpha_3='AFG', name='Afghanistan', numeric='004', official_name='Islamic Republic of Afghanistan')
Country(alpha_2='AO', alpha_3='AGO', name='Angola', numeric='024', official_name='Republic of Angola')
Country(alpha_2='AI', alpha_3='AIA', name='Anguilla', numeric='660')
Country(alpha_2='AX', alpha_3='ALA', name='Åland Islands', numeric='248')
Country(alpha_2='AL', alpha_3='ALB', name='Albania', numeric='008', official_name='Republic of Albania')
Country(alpha_2='AD', alpha_3='AND', name='Andorra', numeric='020', official_name='Principality of Andorra')
Country(alpha_2='AE', alpha_3='ARE', name='United Arab Emirates', numeric='784')
Country(alpha_2='AR', alpha_3='ARG', name='Argentina', numeric='032', official_name='Argentine Republic')
Country(alpha_2='AM', alpha_3='ARM', name='Armenia', numeric='051', official_name='Republic of Armenia')
Country(alpha_2='AS', alpha_3='ASM', name='American Samoa', n
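
The records above show that official_name appears only for some countries (Aruba has none). A minimal sketch of reading it defensively follows. The helper name `describe` is hypothetical, and the two objects are stand-ins copied from the output above so the sketch runs without pycountry installed.

```python
from types import SimpleNamespace


def describe(country):
    """Return a one-line summary of a country record."""
    # official_name is optional in the data, so fall back to the short name.
    official = getattr(country, "official_name", None) or country.name
    return f"{country.alpha_2} / {country.alpha_3} / {country.numeric}: {official}"


# Stand-ins mirroring two records printed above (not real pycountry objects).
aruba = SimpleNamespace(alpha_2="AW", alpha_3="ABW", name="Aruba", numeric="533")
angola = SimpleNamespace(alpha_2="AO", alpha_3="AGO", name="Angola",
                         numeric="024", official_name="Republic of Angola")

print(describe(aruba))
print(describe(angola))
```

With pycountry installed, the same function should work on a real record, for example `describe(pycountry.countries.get(alpha_2='AW'))`.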

Specific countries can be looked up by their various codes and provide the information included in the standard as attributes. I will use my country of birth as the example, retrieving the alpha_2 and alpha_3 codes, the numeric code, the name and the official_name. The last one is a name I am not very proud of... but anyway...

In [34]:
Venezuela = pycountry.countries.get(alpha_2='VE')
print(Venezuela)
print(Venezuela.alpha_2)
print(Venezuela.alpha_3)
print(Venezuela.numeric)
print(Venezuela.name)
print(Venezuela.official_name)

Country(alpha_2='VE', alpha_3='VEN', common_name='Venezuela', name='Venezuela, Bolivarian Republic of', numeric='862', official_name='Bolivarian Republic of Venezuela')
VE
VEN
862
Venezuela, Bolivarian Republic of
Bolivarian Republic of Venezuela


The historic_countries database contains former countries that have been removed from the standard and are now included in ISO 3166-3. For example, let's look up the former USSR (stored below in the variable `URSS`, its Spanish abbreviation) and compare it with the current record for Russia, its largest successor state.

In [35]:
URSS = pycountry.historic_countries.get(alpha_2='SU')
URSS

Country(alpha_2='SU', alpha_3='SUN', alpha_4='SUHH', name='USSR, Union of Soviet Socialist Republics', numeric='810', withdrawal_date='1992-08-30')

In [36]:
Russia = pycountry.countries.get(alpha_2='RU')
Russia

Country(alpha_2='RU', alpha_3='RUS', name='Russian Federation', numeric='643')

#### COUNTRY SUBDIVISIONS (ISO 3166-2)

The country subdivisions are a little more complex than the countries themselves because they provide a nested and typed structure.

All subdivisions can be accessed directly. First let's check how many subdivisions exist across all the ISO countries, and then which one comes first in the list: it belongs to country_code **'AD'** (Andorra) and is the parish of Canillo.

In [37]:
len(pycountry.subdivisions)

4844

In [38]:
list(pycountry.subdivisions)[0]

Subdivision(code='AD-02', country_code='AD', name='Canillo', parent_code=None, type='Parish')

Subdivisions can be accessed using their unique code and provide at least their code, name and type. In this example we query the Venezuelan subdivision that contains the capital city [Caracas](https://en.wikipedia.org/wiki/Caracas), listed here as _Distrito Federal_.

In [39]:
venezuelan_state = pycountry.subdivisions.get(code='VE-A')
print(venezuelan_state)

Subdivision(code='VE-A', country_code='VE', name='Distrito Federal', parent_code=None, type='Federal District')
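
Because each subdivision carries a country_code and a type, the records are easy to group. A minimal sketch follows. `count_types` is a hypothetical helper, and the sample list holds stand-in objects copied from the two outputs above rather than the real database, so it runs without pycountry.

```python
from collections import Counter
from types import SimpleNamespace


def count_types(subdivisions, country_code):
    """Count the subdivisions of one country by their type."""
    return Counter(
        s.type for s in subdivisions if s.country_code == country_code
    )


# Stand-ins copied from the outputs above (not the full database).
sample = [
    SimpleNamespace(code="AD-02", country_code="AD", name="Canillo",
                    parent_code=None, type="Parish"),
    SimpleNamespace(code="VE-A", country_code="VE", name="Distrito Federal",
                    parent_code=None, type="Federal District"),
]

print(count_types(sample, "VE"))
```

Since `pycountry.subdivisions` is iterable, the same function should accept it directly. The package README also describes `pycountry.subdivisions.get(country_code='VE')` for fetching all subdivisions of a single country.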


#### SCRIPTS (ISO 15924)

Scripts are available from a database similar to the countries one. First we check how many scripts are registered and print the information for each.

In the second example we get the complete record for the script of our mother tongue, which is _Latin_.

In [40]:
len(pycountry.scripts)

182

In [41]:
for scripts in pycountry.scripts:
    print(scripts)

Script(alpha_4='Adlm', name='Adlam', numeric='166')
Script(alpha_4='Afak', name='Afaka', numeric='439')
Script(alpha_4='Aghb', name='Caucasian Albanian', numeric='239')
Script(alpha_4='Ahom', name='Ahom, Tai Ahom', numeric='338')
Script(alpha_4='Arab', name='Arabic', numeric='160')
Script(alpha_4='Aran', name='Arabic (Nastaliq variant)', numeric='161')
Script(alpha_4='Armi', name='Imperial Aramaic', numeric='124')
Script(alpha_4='Armn', name='Armenian', numeric='230')
Script(alpha_4='Avst', name='Avestan', numeric='134')
Script(alpha_4='Bali', name='Balinese', numeric='360')
Script(alpha_4='Bamu', name='Bamum', numeric='435')
Script(alpha_4='Bass', name='Bassa Vah', numeric='259')
Script(alpha_4='Batk', name='Batak', numeric='365')
Script(alpha_4='Beng', name='Bengali', numeric='325')
Script(alpha_4='Bhks', name='Bhaiksuki', numeric='334')
Script(alpha_4='Blis', name='Blissymbols', numeric='550')
Script(alpha_4='Bopo', name='Bopomofo', numeric='285')
Script(alpha_4='Brah', name='Brahmi

In [42]:
latin = pycountry.scripts.get(name='Latin')
print(latin)

Script(alpha_4='Latn', name='Latin', numeric='215')


#### LANGUAGES (ISO 639)

Like the scripts, the languages are available from the _ISO 639_ database. As before, we check how many languages there are. We could iterate over the database in the same way, but to save space in the notebook I will not print all of the more than 7,000 languages.

In [43]:
len(pycountry.languages)

7847

We can look up Spanish by setting the **alpha_2** parameter to _es_, and English by setting it to _en_.

In [44]:
spanish = pycountry.languages.get(alpha_2='es')
print(spanish)

Language(alpha_2='es', alpha_3='spa', name='Spanish', scope='I', type='L')


In [45]:
english = pycountry.languages.get(alpha_2='en')
print(english)

Language(alpha_2='en', alpha_3='eng', name='English', scope='I', type='L')
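
Only a small share of languages have a two-letter ISO 639-1 code, so a lookup table keyed on alpha_2 has to skip records that lack it. A minimal sketch under that assumption follows. `index_by` is a hypothetical helper, the first two records mirror the outputs above, and the third is an invented placeholder without alpha_2.

```python
from types import SimpleNamespace


def index_by(records, field):
    """Map each record's value of `field` to the record, skipping records without it."""
    index = {}
    for record in records:
        key = getattr(record, field, None)
        if key is not None:
            index[key] = record
    return index


sample = [
    SimpleNamespace(alpha_2="es", alpha_3="spa", name="Spanish", scope="I", type="L"),
    SimpleNamespace(alpha_2="en", alpha_3="eng", name="English", scope="I", type="L"),
    # Invented placeholder with no two-letter code, to show the skip.
    SimpleNamespace(alpha_3="xxx", name="Example language", scope="I", type="L"),
]

by_alpha_2 = index_by(sample, "alpha_2")
print(sorted(by_alpha_2))
print(by_alpha_2["es"].name)
```

Building such an index once can be handy when many codes have to be resolved in a loop.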


### Scraping list of countries

Now we will use the _pycountry_convert_ package to get the continent name for every country in a list of countries extracted from Wikipedia.

First we import the _requests_ package to fetch the Wikipedia page, then we use the _BeautifulSoup_ package to parse the HTML.

In [46]:
import requests
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)').text

In [47]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url,'lxml')

In [48]:
My_table = soup.find('table',{'class':'wikitable sortable'})

In [49]:
links = My_table.find_all('a')
links

[<a href="/wiki/Gross_world_product" title="Gross world product">World</a>,
 <a href="#cite_note-IMF_Groups-20">[19]</a>,
 <a href="/wiki/United_States" title="United States">United States</a>,
 <a href="/wiki/European_Union" title="European Union">European Union</a>,
 <a href="#cite_note-24">[23]</a>,
 <a href="#cite_note-EU_note-26">[n 1]</a>,
 <a href="/wiki/China" title="China">China</a>,
 <a href="#cite_note-China-THM-27">[n 2]</a>,
 <a href="/wiki/Japan" title="Japan">Japan</a>,
 <a href="/wiki/Germany" title="Germany">Germany</a>,
 <a href="/wiki/India" title="India">India</a>,
 <a href="/wiki/United_Kingdom" title="United Kingdom">United Kingdom</a>,
 <a href="/wiki/France" title="France">France</a>,
 <a href="/wiki/Italy" title="Italy">Italy</a>,
 <a href="/wiki/Brazil" title="Brazil">Brazil</a>,
 <a href="/wiki/Canada" title="Canada">Canada</a>,
 <a href="/wiki/Russia" title="Russia">Russia</a>,
 <a href="#cite_note-Russia-28">[n 3]</a>,
 <a href="/wiki/South_Korea" title="So

We will iterate over the links found in the table and store the title of each one in the list _Countries_.

In [50]:
Countries = []
for link in links:
    Countries.append(link.get('title'))
print(Countries)

['Gross world product', None, 'United States', 'European Union', None, None, 'China', None, 'Japan', 'Germany', 'India', 'United Kingdom', 'France', 'Italy', 'Brazil', 'Canada', 'Russia', None, 'South Korea', 'Spain', 'Australia', 'Mexico', 'Indonesia', 'Netherlands', 'Saudi Arabia', 'Turkey', 'Switzerland', 'Taiwan', 'Poland', 'Thailand', 'Sweden', 'Belgium', 'Iran', 'Austria', 'Nigeria', 'Argentina', 'Norway', 'United Arab Emirates', 'Israel', 'Republic of Ireland', 'Hong Kong', 'Malaysia', 'Singapore', 'South Africa', 'Philippines', 'Denmark', 'Colombia', 'Bangladesh', 'Egypt', 'Chile', 'Pakistan', 'Finland', 'Vietnam', 'Czech Republic', 'Romania', 'Portugal', 'Peru', 'Iraq', 'Greece', 'New Zealand', 'Qatar', 'Algeria', 'Hungary', 'Kazakhstan', 'Ukraine', 'Kuwait', 'Morocco', 'Ecuador', 'Slovakia', 'Puerto Rico', 'Kenya', 'Angola', 'Ethiopia', 'Dominican Republic', 'Sri Lanka', 'Guatemala', 'Oman', 'Venezuela', 'Luxembourg', 'Panama', 'Ghana', 'Bulgaria', 'Myanmar', 'Tanzania', 'Bel
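
The None entries in this list come from footnote links, which have no title attribute. One way to avoid them is to keep only links that point at a wiki article. The sketch below illustrates the idea offline with the standard library's `html.parser` instead of BeautifulSoup, on a few anchors copied from the output above. The class name `WikiLinkTitles` and the set `NOT_COUNTRIES` are hypothetical.

```python
from html.parser import HTMLParser


class WikiLinkTitles(HTMLParser):
    """Collect the title of every <a> whose href points at a wiki article."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        title = attrs.get("title")
        # Footnote anchors use href="#cite_note-..." and carry no title.
        if href.startswith("/wiki/") and title:
            self.titles.append(title)


sample = '''
<a href="/wiki/Gross_world_product" title="Gross world product">World</a>
<a href="#cite_note-IMF_Groups-20">[19]</a>
<a href="/wiki/United_States" title="United States">United States</a>
<a href="/wiki/European_Union" title="European Union">European Union</a>
<a href="#cite_note-24">[23]</a>
<a href="/wiki/China" title="China">China</a>
'''

# Titles that link to articles but are not countries.
NOT_COUNTRIES = {"Gross world product", "European Union"}

parser = WikiLinkTitles()
parser.feed(sample)
countries_clean = [t for t in parser.titles if t not in NOT_COUNTRIES]
print(countries_clean)
```

The same two conditions (an href starting with `/wiki/` and a non-empty title) could be applied to the BeautifulSoup `links` list inside the loop above.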

Then we convert our list of countries into a DataFrame. Looking at it, we note entries that shouldn't be there, such as _Gross world product_ and _European Union_, as well as _None_ values that come from footnote links without a title. The filter below removes the two named entries; the _None_ values are still present afterwards, as the output shows, so they have to be dropped separately.

In [51]:
import pandas as pd
df = pd.DataFrame()
df['Country'] = Countries
df

| | Country |
|---|---|
| 0 | Gross world product |
| 1 | None |
| 2 | United States |
| 3 | European Union |
| 4 | None |
| ... | ... |
| 194 | Federated States of Micronesia |
| 195 | Palau |
| 196 | Marshall Islands |
| 197 | Kiribati |


In [52]:
World_Countries = df.loc[(df['Country'] !='Gross world product') & (df['Country'] !='European Union')]
World_Countries

| | Country |
|---|---|
| 1 | None |
| 2 | United States |
| 4 | None |
| 5 | None |
| 6 | China |
| ... | ... |
| 194 | Federated States of Micronesia |
| 195 | Palau |
| 196 | Marshall Islands |
| 197 | Kiribati |


In [53]:
World_Countries.Country.unique()

array([None, 'United States', 'China', 'Japan', 'Germany', 'India',
       'United Kingdom', 'France', 'Italy', 'Brazil', 'Canada', 'Russia',
       'South Korea', 'Spain', 'Australia', 'Mexico', 'Indonesia',
       'Netherlands', 'Saudi Arabia', 'Turkey', 'Switzerland', 'Taiwan',
       'Poland', 'Thailand', 'Sweden', 'Belgium', 'Iran', 'Austria',
       'Nigeria', 'Argentina', 'Norway', 'United Arab Emirates', 'Israel',
       'Republic of Ireland', 'Hong Kong', 'Malaysia', 'Singapore',
       'South Africa', 'Philippines', 'Denmark', 'Colombia', 'Bangladesh',
       'Egypt', 'Chile', 'Pakistan', 'Finland', 'Vietnam',
       'Czech Republic', 'Romania', 'Portugal', 'Peru', 'Iraq', 'Greece',
       'New Zealand', 'Qatar', 'Algeria', 'Hungary', 'Kazakhstan',
       'Ukraine', 'Kuwait', 'Morocco', 'Ecuador', 'Slovakia',
       'Puerto Rico', 'Kenya', 'Angola', 'Ethiopia', 'Dominican Republic',
       'Sri Lanka', 'Guatemala', 'Oman', 'Venezuela', 'Luxembourg',
       'Panama', 'Ghana', 

In [54]:
World_Countries.shape

(197, 1)
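
The filter above leaves the None rows in place (the array of unique values still starts with None). A possible way to drop both kinds of unwanted rows in one step is sketched below on a small stand-in DataFrame built from the first few scraped values. The names `not_countries` and `cleaned` are hypothetical.

```python
import pandas as pd

# Stand-in for the scraped column: the first few values from the list above.
df = pd.DataFrame({"Country": ["Gross world product", None, "United States",
                               "European Union", None, None, "China"]})

not_countries = ["Gross world product", "European Union"]

cleaned = (
    df.dropna(subset=["Country"])                       # drop the None rows
      .loc[lambda d: ~d["Country"].isin(not_countries)]  # drop non-country titles
      .reset_index(drop=True)                           # renumber rows from 0
)

print(cleaned["Country"].tolist())
```

Using `isin` keeps the exclusion list in one place, which is convenient if more non-country titles turn up in the scrape.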

#### Calling pycountry_convert package

Now it is time to call the [pycountry_convert](https://pypi.org/project/pycountry-convert/) package. Let's see how to convert a country name to its country code and continent code.

In [55]:
import pycountry_convert as pc

country_code = pc.country_name_to_country_alpha2("China", cn_name_format="default")
print(country_code)
continent_name = pc.country_alpha2_to_continent_code(country_code)
print(continent_name)

CN
AS


In [56]:
# Function to map a country name to its continent name
def country_to_continent(country_name):
    country_alpha2 = pc.country_name_to_country_alpha2(country_name)
    country_continent_code = pc.country_alpha2_to_continent_code(country_alpha2)
    country_continent_name = pc.convert_continent_code_to_continent_name(country_continent_code)
    return country_continent_name

In [57]:
country_name = 'India'
print(country_to_continent(country_name))

Asia
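
Scraped names such as 'Gross world product' or None are not countries, so a conversion function like the one above will fail on them. The sketch below wraps any such converter and returns a default instead. It assumes the converter signals an unrecognized name with KeyError, which is worth checking against the installed pycountry_convert version. `safe_continent` and `demo_converter` are hypothetical names, and the demo converter is a stand-in so the sketch runs without the package.

```python
def safe_continent(country_name, converter, default=None):
    """Return converter(country_name), or `default` when the name is not recognized."""
    if not country_name:  # covers None and empty strings from the scrape
        return default
    try:
        return converter(country_name)
    except KeyError:  # assumed failure mode for unknown names
        return default


# Stand-in converter with two known names (not the real package).
_DEMO = {"India": "Asia", "China": "Asia"}


def demo_converter(name):
    return _DEMO[name]  # raises KeyError for anything else


for name in ["India", "Gross world product", None]:
    print(name, "->", safe_continent(name, demo_converter, default="Unknown"))
```

In the notebook, `country_to_continent` would be passed in place of `demo_converter`.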


Now let's build a dictionary that maps continent codes to continent names, and combine it with the **country_name_to_country_alpha2** and **country_alpha2_to_continent_code** functions from pycountry_convert to obtain, for each country in our list, the continent where it is located.

After that we'll combine both lists (countries and continents) into a single pandas DataFrame.

In [58]:
from pycountry_convert import country_alpha2_to_continent_code, country_name_to_country_alpha2

# Continent codes returned by pycountry_convert, mapped to continent names
continent_names = {
    'NA': 'North America',
    'SA': 'South America',
    'EU': 'Europe',
    'AS': 'Asia',
    'OC': 'Oceania',
    'AF': 'Africa',
}

countries = ['United States', 'China', 'Japan', 'Germany', 'India',
       'United Kingdom', 'France', 'Italy', 'Brazil', 'Canada', 'Russia',
       'South Korea', 'Spain', 'Australia', 'Mexico', 'Indonesia',
       'Netherlands', 'Saudi Arabia', 'Turkey', 'Switzerland', 'Taiwan',
       'Poland', 'Thailand', 'Sweden', 'Belgium', 'Iran', 'Austria',
       'Nigeria', 'Argentina', 'Norway', 'United Arab Emirates', 'Israel',
       'Ireland', 'Hong Kong', 'Malaysia', 'Singapore',
       'South Africa', 'Philippines', 'Denmark', 'Colombia', 'Bangladesh',
       'Egypt', 'Chile', 'Pakistan', 'Finland', 'Vietnam',
       'Czech Republic', 'Romania', 'Portugal', 'Peru', 'Iraq', 'Greece',
       'New Zealand', 'Qatar', 'Algeria', 'Hungary', 'Kazakhstan',
       'Ukraine', 'Kuwait', 'Morocco', 'Ecuador', 'Slovakia',
       'Puerto Rico', 'Kenya', 'Angola', 'Ethiopia', 'Dominican Republic',
       'Sri Lanka', 'Guatemala', 'Oman', 'Venezuela', 'Luxembourg',
       'Panama', 'Ghana', 'Bulgaria', 'Myanmar', 'Tanzania', 'Belarus',
       'Costa Rica', 'Croatia', 'Uzbekistan', 'Syria', 'Uruguay',
       'Lebanon', 'Macau', 'Slovenia', 'Lithuania', 'Serbia',
       'Azerbaijan', 'Turkmenistan', 'Ivory Coast', 'Jordan', 'Bolivia',
       'Paraguay', 'Tunisia', 'Cameroon', 'Bahrain', 'Latvia', 'Libya', 'Estonia', 'Sudan',
       'Uganda', 'Yemen', 'Nepal', 'El Salvador', 'Cambodia', 'Honduras',
       'Cyprus', 'Zambia', 'Senegal', 'Iceland', 'Papua New Guinea',
       'Trinidad and Tobago', 'Bosnia and Herzegovina', 'Laos',
       'Afghanistan', 'Botswana', 'Mali', 'Gabon', 'Georgia',
       'Jamaica', 'Albania', 'Mozambique', 'Malta', 'Burkina Faso',
       'Mauritius', 'Benin', 'Namibia', 'Mongolia', 'Armenia', 'Guinea',
       'Zimbabwe', 'North Macedonia', 'Bahamas', 'Madagascar',
       'Nicaragua', 'Brunei', 'Equatorial Guinea', 'Moldova',
       'Chad', 'Rwanda', 'Niger', 'Haiti',
       'Kyrgyzstan', 'Tajikistan', 'Malawi', 'Maldives', 'Togo',
       'Mauritania', 'Montenegro', 'Fiji', 'Barbados', 'Somalia',
       'Eswatini', 'Sierra Leone', 'Guyana', 'Suriname', 'South Sudan',
       'Burundi', 'Liberia', 'Djibouti', 'Aruba', 'Bhutan',
       'Lesotho', 'Central African Republic', 'Eritrea', 'Belize',
       'Saint Lucia', 'Gambia', 'Antigua and Barbuda', 'Seychelles',
       'San Marino', 'Solomon Islands', 'Grenada', 'Comoros',
       'Saint Kitts and Nevis', 'Vanuatu', 'Samoa',
       'Saint Vincent and the Grenadines', 'Dominica', 'Tonga',
       'Federated States of Micronesia', 'Palau']

# Name -> alpha-2 code -> continent code -> continent name, for every country
continents = [continent_names[country_alpha2_to_continent_code(country_name_to_country_alpha2(country))] for country in countries]

In [59]:
countries_list = pd.DataFrame(
    {'Country': countries,
     'Continents': continents,
    })

countries_list

| | Country | Continents |
|---|---|---|
| 0 | United States | North America |
| 1 | China | Asia |
| 2 | Japan | Asia |
| 3 | Germany | Europe |
| 4 | India | Asia |
| ... | ... | ... |
| 178 | Saint Vincent and the Grenadines | North America |
| 179 | Dominica | North America |
| 180 | Tonga | Oceania |
| 181 | Federated States of Micronesia | Oceania |
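
Once every country has a continent, a natural next step is to count countries per continent. A minimal sketch with the standard library follows, using a few (country, continent) pairs taken from the rows shown above. The variable names are hypothetical.

```python
from collections import Counter

# A few (country, continent) pairs taken from the table above.
pairs = [
    ("United States", "North America"),
    ("China", "Asia"),
    ("Japan", "Asia"),
    ("India", "Asia"),
]

per_continent = Counter(continent for _, continent in pairs)
print(per_continent.most_common())
```

On the full DataFrame, `countries_list['Continents'].value_counts()` gives the equivalent counts in pandas.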


We can continue adding more features to any list of countries extracted from the web or from a CSV table imported into pandas, using the pycountry and pycountry-convert packages.