# Wrangling Countries & UN Regions
## By: Scott Kustes

### Objective:
Wrangle UN regions, sub-regions, and their associated countries for insertion into a database.

#### Dataset:
The original dataset was downloaded from the UN Statistics Division M49 page: https://unstats.un.org/unsd/methodology/m49/

In [1]:
# Import necessary packages
import pandas as pd

### Gather

In [13]:
countries = pd.read_csv( 'countries.csv' )
countries.head()

Unnamed: 0,Global Code,Global Name,Region Code,Region Name,Sub-region Code,Sub-region Name,Intermediate Region Code,Intermediate Region Name,Country or Area,M49 Code,ISO-alpha3 Code,Least Developed Countries (LDC),Land Locked Developing Countries (LLDC),Small Island Developing States (SIDS),Developed / Developing Countries
0,1,World,19.0,Americas,419.0,Latin America and the Caribbean,29.0,Caribbean,Bonaire - Sint Eustatius and Saba,535,BES,,,x,Developing
1,1,World,142.0,Asia,30.0,Eastern Asia,,,China - Hong Kong Special Administrative Region,344,HKG,,,,Developing
2,1,World,142.0,Asia,30.0,Eastern Asia,,,China - Macao Special Administrative Region,446,MAC,,,,Developing
3,1,World,2.0,Africa,15.0,Northern Africa,,,Algeria,12,DZA,,,,Developing
4,1,World,2.0,Africa,15.0,Northern Africa,,,Egypt,818,EGY,,,,Developing


### Initial Assessment

In [14]:
countries['Global Code'].unique()

array([1], dtype=int64)

In [15]:
countries['Global Name'].unique()

array(['World'], dtype=object)

#### Issues Found:
<ol>
    <li>Drop columns: Global Code and Global Name (only 1 unique value each)</li>
    <li>Rename columns: replace spaces with underscores, replace uppercase with lowercase</li>
</ol>
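
The two `unique()` checks above could also be run across every column at once. The following is a minimal sketch on a toy frame, not the notebook's actual data; the `toy` frame and the `constant_columns` name are assumptions introduced for illustration.

```python
import pandas as pd

# Toy stand-in for the real data; values are illustrative only
toy = pd.DataFrame({
    'Global Code': [1, 1, 1],
    'Global Name': ['World', 'World', 'World'],
    'Region Name': ['Americas', 'Asia', 'Africa'],
})

# nunique(dropna=False) counts NaN as a value, so a column that mixes
# one value with missing entries is not reported as constant
unique_counts = toy.nunique(dropna=False)
constant_columns = unique_counts[unique_counts == 1].index.tolist()
print(constant_columns)
```

Columns reported this way are candidates for dropping, since they carry no information that distinguishes one row from another.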

### Initial Cleaning
#### 1) Drop columns Global Code and Global Name due to each having only 1 unique value (1 and World, respectively)

#### Code

In [22]:
# columns= already targets the column axis, so axis=1 is not needed
countries.drop( columns=['Global Code','Global Name'], inplace=True )

#### Test

In [24]:
countries.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 249 entries, 0 to 248
Data columns (total 13 columns):
Region Code                                248 non-null float64
Region Name                                248 non-null object
Sub-region Code                            248 non-null float64
Sub-region Name                            248 non-null object
Intermediate Region Code                   108 non-null float64
Intermediate Region Name                   108 non-null object
Country or Area                            249 non-null object
M49 Code                                   249 non-null int64
ISO-alpha3 Code                            248 non-null object
Least Developed Countries (LDC)            47 non-null object
Land Locked Developing Countries (LLDC)    32 non-null object
Small Island Developing States (SIDS)      53 non-null object
Developed / Developing Countries           248 non-null object
dtypes: float64(3), int64(1), object(9)
memory usage: 25.4+ KB
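
Reading the `info()` output by eye works for a frame this small, but the same test can be written as an assertion that fails loudly. This is a sketch under the assumption of a toy frame; `toy`, `dropped`, and `remaining` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the real data; values are illustrative only
toy = pd.DataFrame({
    'Global Code': [1, 1],
    'Global Name': ['World', 'World'],
    'Region Name': ['Asia', 'Africa'],
})

dropped = ['Global Code', 'Global Name']
toy = toy.drop(columns=dropped)

# Fail loudly if any column that should be gone is still present
remaining = set(dropped) & set(toy.columns)
assert not remaining, f"columns not dropped: {remaining}"
```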


#### 2) Rename Columns
Replace spaces with underscores and uppercase letters with lowercase. A few names are also shortened along the way (for example, `M49 Code` becomes `un_m49`).

#### Code

In [33]:
# Dictionary of new column names
column_names = {
    'Region Code': 'region_code',
    'Region Name': 'region_name',
    'Sub-region Code': 'subregion_code',
    'Sub-region Name': 'subregion_name',
    'Intermediate Region Code': 'intermediate_region_code',
    'Intermediate Region Name': 'intermediate_region_name',
    'Country or Area': 'country',
    'M49 Code': 'un_m49',
    'ISO-alpha3 Code': 'iso_alpha3',
    'Least Developed Countries (LDC)': 'least_developed_countries',
    'Land Locked Developing Countries (LLDC)': 'landlocked_developing_countries',
    'Small Island Developing States (SIDS)': 'small_island_developing_states',
    'Developed / Developing Countries': 'developed_developing_countries'
}

countries.rename( mapper=column_names, axis=1, inplace=True )
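
An explicit dictionary gives full control over each name. For comparison, a purely mechanical rename can be sketched by passing a function to `rename`. The `to_snake_case` helper below is hypothetical, and its output would not match the dictionary above exactly (it yields `sub_region_name` rather than `subregion_name`), which is one reason to prefer the hand-written mapping here.

```python
import re
import pandas as pd

def to_snake_case(name):
    """Hypothetical helper: lowercase a label and collapse every run of
    non-alphanumeric characters into a single underscore."""
    return re.sub(r'[^0-9a-z]+', '_', name.lower()).strip('_')

# Empty toy frame carrying a few of the original column labels
toy = pd.DataFrame(columns=['Region Code', 'Sub-region Name',
                            'Least Developed Countries (LDC)'])

# rename accepts a callable and applies it to every column label
toy = toy.rename(columns=to_snake_case)
print(list(toy.columns))
```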

#### Test

In [34]:
countries.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 249 entries, 0 to 248
Data columns (total 13 columns):
region_code                        248 non-null float64
region_name                        248 non-null object
subregion_code                     248 non-null float64
subregion_name                     248 non-null object
intermediate_region_code           108 non-null float64
intermediate_region_name           108 non-null object
country                            249 non-null object
un_m49                             249 non-null int64
iso_alpha3                         248 non-null object
least_developed_countries          47 non-null object
landlocked_developing_countries    32 non-null object
small_island_developing_states     53 non-null object
developed_developing_countries     248 non-null object
dtypes: float64(3), int64(1), object(9)
memory usage: 25.4+ KB


### Assessment

In [35]:
# DataFrame has no unique() method (it is defined on Series only);
# drop_duplicates() returns the distinct code/name pairs instead
countries[['region_code','region_name']].drop_duplicates()
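
Because `unique()` is defined on Series rather than DataFrame, distinct combinations across several columns are usually listed with `drop_duplicates()`. The sketch below uses a toy frame built from the region codes and names visible in the `head()` output; `toy` and `pairs` are names assumed for illustration.

```python
import pandas as pd

# Toy frame using region values seen in the head() output above
toy = pd.DataFrame({
    'region_code': [19.0, 142.0, 142.0, 2.0, 2.0],
    'region_name': ['Americas', 'Asia', 'Asia', 'Africa', 'Africa'],
})

# Keep the first occurrence of each (code, name) pair, then renumber the rows
pairs = (
    toy[['region_code', 'region_name']]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(pairs)
```

If each code maps to exactly one name, the result has one row per region, which makes it a convenient starting point for a lookup table.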

### Clean