# Unsupervised Learning on Country Data

**Clustering countries with unsupervised learning for HELP International**

**Objective:**
To categorize countries using the socio-economic and health factors that determine their overall development.

**About organization:**
HELP International is an international humanitarian NGO committed to fighting poverty and to providing people in underdeveloped countries with basic amenities and relief during disasters and natural calamities.

**Problem Statement:**
HELP International has raised around $10 million. The CEO of the NGO now needs to decide how to use this money strategically and effectively, which means choosing the countries that are in the direst need of aid. Your job as a data scientist is to categorize the countries using the socio-economic and health factors that determine their overall development, and then to suggest the countries the CEO should focus on most.

## Data Information

| Column Name    | Description                                                                                                 |
|----------------|-------------------------------------------------------------------------------------------------------------|
| **country**    | Name of the country.                                                                                         |
| **child_mort** | Death of children under 5 years of age per 1000 live births. (Metric: Mortality rate per 1000 live births)   |
| **exports**    | Exports of goods and services per capita, given as a percentage of GDP per capita. (Metric: Percentage of GDP per capita attributed to exports) |
| **health**     | Total health spending per capita, given as a percentage of GDP per capita. (Metric: Percentage of GDP per capita spent on healthcare) |
| **imports**    | Imports of goods and services per capita, given as a percentage of GDP per capita. (Metric: Percentage of GDP per capita attributed to imports) |
| **income**     | Net income per person. (Metric: Income per capita, Net)                                                      |
| **inflation**  | The measurement of the annual growth rate of the Total GDP. (Metric: Annual growth rate of the country's GDP) |
| **life_expec** | The average number of years a newborn child would live if current mortality patterns remained the same. (Metric: Life expectancy at birth) |
| **total_fer**  | The number of children that would be born to each woman if the current age-fertility rates remain the same. (Metric: Total fertility rate, children per woman) |
| **gdpp**       | The GDP per capita. Calculated as the Total GDP divided by the total population. (Metric: Gross Domestic Product per capita) |
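
Because `exports`, `health`, and `imports` are stated as percentages of GDP per capita, their per-capita amounts can be recovered by multiplying by `gdpp / 100`. The sketch below shows that arithmetic on two rows taken from the dataset preview in section II. The helper name `to_absolute` is hypothetical, and whether to cluster on percentages or on absolute amounts is a modeling choice the source does not prescribe.

```python
import pandas as pd

# Columns the data dictionary gives as a percentage of GDP per capita
PCT_COLS = ["exports", "health", "imports"]


def to_absolute(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the percentage columns converted to per-capita amounts."""
    out = df.copy()
    for col in PCT_COLS:
        out[col] = out[col] * out["gdpp"] / 100
    return out


# Two rows copied from the dataset preview (Afghanistan and Albania)
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania"],
    "exports": [10.0, 28.0],
    "health": [7.58, 6.55],
    "imports": [44.9, 48.6],
    "gdpp": [553, 4090],
})

# For Afghanistan, exports of 10.0% of a GDP per capita of 553 is 55.3 per person
print(to_absolute(sample))
```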






### I. Importing Libraries

In [16]:
import pandas as pd
import numpy as np
import seaborn as sns


### II. Import Data

In [18]:
df = pd.read_csv('../data/raw/Country-data.csv')
df.head()

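|   | country             | child_mort | exports | health | imports | income | inflation | life_expec | total_fer | gdpp  |
|---|---------------------|------------|---------|--------|---------|--------|-----------|------------|-----------|-------|
| 0 | Afghanistan         | 90.2       | 10.0    | 7.58   | 44.9    | 1610   | 9.44      | 56.2       | 5.82      | 553   |
| 1 | Albania             | 16.6       | 28.0    | 6.55   | 48.6    | 9930   | 4.49      | 76.3       | 1.65      | 4090  |
| 2 | Algeria             | 27.3       | 38.4    | 4.17   | 31.4    | 12900  | 16.1      | 76.5       | 2.89      | 4460  |
| 3 | Angola              | 119.0      | 62.3    | 2.85   | 42.9    | 5900   | 22.4      | 60.1       | 6.16      | 3530  |
| 4 | Antigua and Barbuda | 10.3       | 45.5    | 6.03   | 58.9    | 19100  | 1.44      | 76.8       | 2.13      | 12200 |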


In [7]:
dd = pd.read_csv('../data/data-dictionary.csv')
dd.head()

|   | Column Name | Description                                        |
|---|-------------|----------------------------------------------------|
| 0 | country     | Name of the country                                |
| 1 | child_mort  | Death of children under 5 years of age per 100...  |
| 2 | exports     | Exports of goods and services per capita. Give...  |
| 3 | health      | Total health spending per capita. Given as %ag...  |
| 4 | imports     | Imports of goods and services per capita. Give...  |


In [8]:
print(f"Dataset shape: {df.shape[0]} rows, {df.shape[1]} columns")

Dataset shape: 167 rows, 10 columns


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 167 entries, 0 to 166
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   country     167 non-null    object 
 1   child_mort  167 non-null    float64
 2   exports     167 non-null    float64
 3   health      167 non-null    float64
 4   imports     167 non-null    float64
 5   income      167 non-null    int64  
 6   inflation   167 non-null    float64
 7   life_expec  167 non-null    float64
 8   total_fer   167 non-null    float64
 9   gdpp        167 non-null    int64  
dtypes: float64(7), int64(2), object(1)
memory usage: 13.2+ KB


In [11]:
df.describe()

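|       | child_mort | exports   | health   | imports   | income       | inflation | life_expec | total_fer | gdpp         |
|-------|------------|-----------|----------|-----------|--------------|-----------|------------|-----------|--------------|
| count | 167.0      | 167.0     | 167.0    | 167.0     | 167.0        | 167.0     | 167.0      | 167.0     | 167.0        |
| mean  | 38.27006   | 41.108976 | 6.815689 | 46.890215 | 17144.688623 | 7.781832  | 70.555689  | 2.947964  | 12964.155689 |
| std   | 40.328931  | 27.41201  | 2.746837 | 24.209589 | 19278.067698 | 10.570704 | 8.893172   | 1.513848  | 18328.704809 |
| min   | 2.6        | 0.109     | 1.81     | 0.0659    | 609.0        | -4.21     | 32.1       | 1.15      | 231.0        |
| 25%   | 8.25       | 23.8      | 4.92     | 30.2      | 3355.0       | 1.81      | 65.3       | 1.795     | 1330.0       |
| 50%   | 19.3       | 35.0      | 6.32     | 43.3      | 9960.0       | 5.39      | 73.1       | 2.41      | 4660.0       |
| 75%   | 62.1       | 51.35     | 8.6      | 58.75     | 22800.0      | 10.75     | 76.8       | 3.88      | 14050.0      |
| max   | 208.0      | 200.0     | 17.9     | 174.0     | 125000.0     | 104.0     | 82.8       | 7.49      | 105000.0     |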


### III. Exploratory Data Analysis
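
This section has no cells yet. One plausible first step, offered as a sketch rather than the author's analysis, is to list the most strongly correlated pairs of numeric columns, since highly correlated features carry overlapping information into a clustering model. The helper name `top_correlations` and the `" ~ "` label format are hypothetical.

```python
import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n strongest pairwise Pearson correlations, ranked by absolute value."""
    corr = df.select_dtypes("number").corr()
    cols = corr.columns
    # Indices of the strict upper triangle, so each unordered pair appears once
    i, j = np.triu_indices(len(cols), k=1)
    pairs = pd.Series(
        corr.to_numpy()[i, j],
        index=[f"{cols[a]} ~ {cols[b]}" for a, b in zip(i, j)],
    )
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.loc[order].head(n)
```

With the DataFrame loaded in section II, `top_correlations(df)` would return a Series indexed by column pair; non-numeric columns such as `country` are ignored.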

### IV. Data Processing

In [14]:
df.shape

(167, 10)

In [12]:
df.isnull().sum()

country       0
child_mort    0
exports       0
health        0
imports       0
income        0
inflation     0
life_expec    0
total_fer     0
gdpp          0
dtype: int64
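
With no missing values to handle, the remaining concern visible in the `df.describe()` output is scale: `income` reaches 125000 while `total_fer` peaks at 7.49, so a distance-based method would be dominated by the large-valued columns. A minimal sketch of z-score standardization follows. The helper name `standardize` is hypothetical, and the source does not say which scaling, if any, the author intends to use.

```python
import pandas as pd


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Z-score every numeric column; non-numeric columns such as `country` are dropped."""
    num = df.select_dtypes("number")
    # ddof=0 is the population standard deviation, as used by scikit-learn's StandardScaler
    return (num - num.mean()) / num.std(ddof=0)
```

After this step every numeric column has mean 0 and unit variance, so each feature contributes on a comparable scale.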

### V. Model Training
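
This section has no cells yet. The sketch below assumes K-Means, which is only one of several clustering algorithms that would fit the objective; the source names none. The function name `cluster_countries`, the default `k=3`, and the seed are placeholders, not values taken from the notebook.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_countries(df: pd.DataFrame, k: int = 3, seed: int = 42) -> pd.DataFrame:
    """Return a copy of df with an integer `cluster` column from K-Means."""
    # Every column except `country` is numeric according to df.info()
    features = df.drop(columns=["country"])
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    out = df.copy()
    out["cluster"] = model.fit_predict(scaled)
    return out
```

Cluster labels are arbitrary integers, so label 0 in one run need not correspond to label 0 in another unless the seed is fixed.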

### VI. Hyperparameter Tuning
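
This section has no cells yet. If K-Means is the chosen model (an assumption, as the source does not name one), the main hyperparameter is the number of clusters. One common way to pick it is to compare silhouette scores across candidate values, sketched below. The function name `best_k` and the candidate range are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k(X: np.ndarray, k_values=range(2, 8), seed: int = 42):
    """Return (k with the highest silhouette score, dict of k -> score)."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores
```

`X` is expected to be the scaled feature matrix. The score is a guide rather than a rule, and a value of k that is easier to interpret for the aid decision may be preferable to the numerical optimum.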

### VII. Model Evaluation
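
This section has no cells yet. Since the problem statement asks which countries the CEO should focus on, one way to read a clustering result is to profile each cluster on a few development indicators and flag the one that looks worst off. The sketch below assumes a DataFrame that already carries a `cluster` column. The function name `neediest_cluster` and the ranking heuristic (high `child_mort`, low `income`, low `gdpp`) are assumptions for illustration, not the author's criteria.

```python
import pandas as pd


def neediest_cluster(labeled: pd.DataFrame):
    """Return (label of the cluster that looks worst off, per-cluster mean profile)."""
    profile = labeled.groupby("cluster")[["child_mort", "income", "gdpp"]].mean()
    # Rank 1 goes to the highest child mortality and to the lowest income and gdpp,
    # so the smallest rank sum marks the cluster in the direst need
    rank_sum = (
        profile["child_mort"].rank(ascending=False)
        + profile["income"].rank()
        + profile["gdpp"].rank()
    )
    return int(rank_sum.idxmin()), profile
```

The countries to recommend would then be the rows of `labeled` whose `cluster` equals the returned label.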