# Practice Exercise: Countries registered in the UN
**by: Gibrán Mendoza Magaña**

---

## Introduction

When analysing world data, merging different datasets can be challenging, especially when the spelling of a country's name in one database differs from the spelling used in other data sources. This is where a country's ISO code comes in handy. Swaziland? Eswatini? Try SWZ instead. Czech Republic? Czechia? Let's use CZE.
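To make the problem concrete, here is a minimal sketch with two small, hypothetical tables (`source_a` and `source_b` are illustrative names, not part of the notebook) that spell the same countries differently. Joining on the name finds nothing, while joining on the ISO code matches every row. It assumes only that pandas is installed.

```python
import pandas as pd

# Two hypothetical sources that spell the same countries differently.
source_a = pd.DataFrame({"Country": ["Swaziland", "Czech Republic"], "ISO": ["SWZ", "CZE"]})
source_b = pd.DataFrame({"Country": ["Eswatini", "Czechia"], "ISO": ["SWZ", "CZE"]})

# Joining on the name finds no matches, because the spellings differ.
by_name = source_a.merge(source_b, on="Country", how="inner")

# Joining on the ISO code matches every row; the suffixes keep both spellings.
by_iso = source_a.merge(source_b, on="ISO", how="inner", suffixes=("_a", "_b"))

print(len(by_name))  # 0
print(by_iso[["ISO", "Country_a", "Country_b"]])
```

The same idea scales to real datasets: once each source carries an ISO column, the country names can stay in whatever spelling each source prefers.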

## Objective

In this notebook, we create a function that retrieves the list of country names (in English) and the ISO alpha-3 code of each country from the United Nations Statistics Division (UNSD) website.

## Setup

For this project, the following libraries were used:

* urllib.- A Python package that provides several modules for working with URLs, such as opening and reading them, parsing them, and handling errors.
* requests.- A Python library that simplifies working with HTTP requests. It lets you send and receive data from web services using methods such as GET, POST, PUT, and DELETE.
* BeautifulSoup.- A Python library that helps you extract data from HTML and XML files. Here it is used with the lxml parser, which must be installed separately.
* Pandas.- A Python library for creating and manipulating tabular data structures (DataFrames); among its functionalities are reading and saving CSV and XLSX files.
* os.- A Python module that provides operating-system interfaces, such as retrieving the current working directory.
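As a small illustration of what BeautifulSoup does in this notebook, the sketch below parses a tiny, hypothetical HTML snippet shaped like the table being scraped: a `<div id="ENG_COUNTRIES">` holding rows of name, M49 code and ISO code cells. The snippet is simplified and is not the real page markup. It uses Python's built-in `"html.parser"` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified HTML shaped like the table the notebook scrapes.
html = """
<html><body>
  <div id="ENG_COUNTRIES">
    <table>
      <tr><th>Country or Area</th><th>M49 code</th><th>ISO-alpha3 code</th></tr>
      <tr><td>Afghanistan</td><td>004</td><td>AFG</td></tr>
      <tr><td>Albania</td><td>008</td><td>ALB</td></tr>
    </table>
  </div>
</body></html>
"""

# "html.parser" ships with Python; the notebook itself uses "lxml".
soup = BeautifulSoup(html, "html.parser")

# Locate the container by its id, then collect the text of every <td> cell.
eng_div = soup.find("div", attrs={"id": "ENG_COUNTRIES"})
cells = [td.get_text(strip=True) for td in eng_div.find_all("td")]

print(cells)  # ['Afghanistan', '004', 'AFG', 'Albania', '008', 'ALB']
```

Header cells written as `<th>` are not matched by `find_all("td")`, so only data cells end up in the list.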

## Libraries

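```python
import os                      # operating-system interfaces (current working directory)
import pandas as pd            # DataFrames
import requests                # HTTP client (imported but not used below)
from bs4 import BeautifulSoup  # HTML parsing (the lxml parser must be installed)
import urllib.request          # opening the URL with custom request headers
import numpy as np             # imported but not used below
```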

```python
# Store the current working directory
path = os.getcwd()
```

## Setting the URL and request headers

```python
# Define the user agent (the type of client making the request);
# some websites block requests that carry urllib's default user agent
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.9999.99 Safari/537.36'}
# URL of the United Nations Statistics Division's M49 standard page
url = "https://unstats.un.org/unsd/methodology/m49/"
```
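Defining `headers` does not send anything by itself; the dictionary is attached to a `urllib.request.Request` object, and the HTTP GET only happens when that request is passed to `urlopen()`. The sketch below builds the request without touching the network, using a shortened user-agent string for readability.

```python
import urllib.request

url = "https://unstats.un.org/unsd/methodology/m49/"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}  # shortened example string

# Build the request object only; nothing is sent until urlopen(req) is called.
req = urllib.request.Request(url, headers=headers)

# Without a data payload, urllib uses the GET method.
print(req.get_method())  # GET

# urllib normalises header names with str.capitalize(), so the key is stored as "User-agent".
print(req.get_header("User-agent"))  # Mozilla/5.0 (Windows NT 10.0; Win64; x64)
print(req.full_url)
```

Note the lookup key: asking for `"User-Agent"` with that exact capitalisation would return `None`, because of the normalisation described in the comment.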

```python
def get_country_members(url, headers):
    '''
    Download and parse the UN M49 page, collect every cell of the English
    table of countries and ISO codes, and combine the country names and
    ISO codes into a single pandas DataFrame.

    param url: URL of the UNSD M49 methodology page
    param headers: dictionary that defines the user agent making the request

    return: pandas DataFrame with the country name (in English) in the first
            column and the country's ISO alpha-3 code in the second column
    '''

    def get_parsed_html(url, headers):
        '''
        param url: string of a URL to be scraped
        param headers: dictionary that defines the user agent making the request
        return: bs4.BeautifulSoup
        '''
        request = urllib.request.Request(url, headers=headers)
        # The context manager closes the connection once the page has been read
        with urllib.request.urlopen(request) as response:
            html = response.read().decode("utf-8")
        return BeautifulSoup(html, features="lxml")

    def get_table_cells(parsed_html):
        '''
        param parsed_html: bs4.BeautifulSoup object
        return: flat list of table cells in row order (country name, M49 code,
                ISO code, then the next country), i.e. ['Afghanistan', '004', 'AFG', ...]
        '''
        # The English version of the table sits inside <div id="ENG_COUNTRIES">
        eng_data = parsed_html.body.find('div', attrs={'id': 'ENG_COUNTRIES'})
        table_data = eng_data.find_all('td')
        return [cell.text.strip() for cell in table_data]

    def format_countries(countries):
        '''
        param countries: flat list returned by get_table_cells
        return: pandas DataFrame with the country name in the first column
                and the ISO code in the second column
        '''
        # Each table row contributes three cells, so take every third item:
        # offset 0 is the name, offset 2 is the ISO code (the M49 code at offset 1 is dropped)
        country_name = countries[0::3]
        country_iso = countries[2::3]
        return pd.DataFrame({'Country': country_name, 'ISO': country_iso})

    parsed_html = get_parsed_html(url, headers)
    countries = get_table_cells(parsed_html)
    return format_countries(countries)
```
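The slicing in `format_countries` relies on the flat cell list repeating in groups of three. The sketch below walks through it on a short, hand-written sample list in the same order the scraper produces; `countries[2:][::3]` and `countries[2::3]` select the same items. It also shows a cheap guard against a table whose rows do not all have three cells.

```python
# Hand-written sample in scraper order: name, M49 code, ISO code, repeated.
cells = ["Afghanistan", "004", "AFG", "Åland Islands", "248", "ALA", "Albania", "008", "ALB"]

# The stride-3 slicing only lines up if every row contributes exactly three cells.
assert len(cells) % 3 == 0, "unexpected number of table cells"

names = cells[0::3]      # offset 0 of each triple: country names
m49_codes = cells[1::3]  # offset 1: M49 codes (dropped by format_countries)
iso_codes = cells[2::3]  # offset 2: ISO alpha-3 codes

# Two-step slicing gives the same result as a single slice with start and step.
assert cells[2:][::3] == iso_codes

# An equivalent row-wise view, regrouping the triples with zip().
rows = list(zip(names, m49_codes, iso_codes))
print(rows[1])  # ('Åland Islands', '248', 'ALA')
```

If the page ever added or removed a column, the length check would not necessarily fail, so comparing the number of `<tr>` rows with `len(cells) // 3` is a stricter check worth considering.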

### Countries and areas listed by the United Nations

```python
UN_countries = get_country_members(url, headers)
```

```python
UN_countries.shape
```

```
(248, 2)
```
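Before using the table as a merge key, a couple of quick sanity checks can catch scraping problems. The sketch below runs them on a small stand-in for `UN_countries` built from the `head()` rows shown in this notebook; on the real DataFrame, the same two lines apply unchanged. A well-formed ISO alpha-3 code is three uppercase letters, and duplicates would multiply rows in any merge on `ISO`.

```python
import pandas as pd

# Small stand-in for UN_countries, using the first rows displayed by head().
UN_countries = pd.DataFrame({
    "Country": ["Afghanistan", "Åland Islands", "Albania", "Algeria", "American Samoa"],
    "ISO": ["AFG", "ALA", "ALB", "DZA", "ASM"],
})

# Every code should be exactly three uppercase ASCII letters; empty cells would fail this.
well_formed = UN_countries["ISO"].str.fullmatch(r"[A-Z]{3}")

# No code should appear twice, otherwise a merge on ISO would duplicate rows.
no_duplicates = UN_countries["ISO"].is_unique

print(well_formed.all(), no_duplicates)  # True True
print(UN_countries.loc[~well_formed])    # rows with malformed codes, if any
```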

```python
UN_countries.head()
```

|   | Country | ISO |
|---|---------|-----|
| 0 | Afghanistan | AFG |
| 1 | Åland Islands | ALA |
| 2 | Albania | ALB |
| 3 | Algeria | DZA |
| 4 | American Samoa | ASM |


```python
UN_countries.tail()
```

|   | Country | ISO |
|---|---------|-----|
| 243 | Wallis and Futuna Islands | WLF |
| 244 | Western Sahara | ESH |
| 245 | Yemen | YEM |
| 246 | Zambia | ZMB |
| 247 | Zimbabwe | ZWE |
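The notebook stores the working directory in `path` but does not show the table being saved. As a minimal sketch, assuming CSV output and a hypothetical file name `UN_countries.csv`, the code below writes a few rows from the `tail()` output and reads them back. A temporary directory keeps the example self-contained; in the notebook, `path` could be used as the target directory instead.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for UN_countries, using rows displayed by tail().
UN_countries = pd.DataFrame({
    "Country": ["Yemen", "Zambia", "Zimbabwe"],
    "ISO": ["YEM", "ZMB", "ZWE"],
})

with tempfile.TemporaryDirectory() as out_dir:
    csv_path = os.path.join(out_dir, "UN_countries.csv")  # hypothetical file name
    # index=False avoids writing the row numbers as an extra, unnamed column.
    UN_countries.to_csv(csv_path, index=False, encoding="utf-8")
    # keep_default_na=False stops pandas from turning empty cells into NaN.
    reloaded = pd.read_csv(csv_path, keep_default_na=False)

print(reloaded["ISO"].tolist())  # ['YEM', 'ZMB', 'ZWE']
```

Saving without the index matters here: if the index is written, reading the file back produces an extra `Unnamed: 0` column.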
