# Cluster analysis to determine countries in need of financial aid

# Table of Contents

* [About the dataset](#about)
     * [Description](#description)
     * [Data exploration](#exploration)
* [Pre-processing](#processing)
* [Data visualization](#visualization)
* [Dimensionality reduction (PCA)](#pca)
     * [Scaling](#scaling)
     * [Normalization](#norm)
* [K-means clustering](#ml)

<a id="about" ></a>
# About the dataset

We cluster countries on socio-economic and health factors, using only the numerical features, to decide how to allocate funding for each country's development.

<a id="description" ></a>
# Description

## Features
- country : Name of the country
- child_mort : Death of children under 5 years of age per 1000 live births
- exports : Exports of goods and services per capita, given as a percentage of GDP per capita
- health : Total health spending per capita, given as a percentage of GDP per capita
- imports : Imports of goods and services per capita, given as a percentage of GDP per capita
- income : Net income per person
- inflation : The measurement of the annual growth rate of the total GDP
- life_expec : The average number of years a new born child would live if the current mortality patterns are to rem...
- total_fer : The number of children that would be born to each woman if the current age-fertility rates remain th...
- gdpp : The GDP per capita. Calculated as the Total GDP divided by the total population.
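Because `exports`, `health` and `imports` are expressed as percentages of `gdpp`, two countries with the same percentage can differ widely in absolute terms. The sketch below shows one possible conversion to per-capita amounts in the same units as `gdpp`. The helper name `to_absolute` is hypothetical, and the two rows are the values for Afghanistan and Albania taken from the dataset's first rows; whether this conversion suits the analysis is a judgment call.

```python
import pandas as pd

# Values for Afghanistan and Albania, taken from the dataset's first rows.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania"],
    "exports": [10.0, 28.0],
    "health": [7.58, 6.55],
    "imports": [44.9, 48.6],
    "gdpp": [553, 4090],
})


def to_absolute(df, pct_cols=("exports", "health", "imports")):
    """Hypothetical helper: convert percentage-of-gdpp columns to per-capita amounts."""
    out = df.copy()  # leave the input frame untouched
    for col in pct_cols:
        out[col] = out[col] * out["gdpp"] / 100
    return out


absolute = to_absolute(sample)
print(absolute)
```

After the conversion, the three columns are on the same footing as `gdpp` and `income`, which can make distances between countries easier to interpret.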

<a id="exploration" ></a>
# Data exploration

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

countries = pd.read_csv('../input/unsupervised-learning-on-country-data/Country-data.csv')
countries.head()
```

| | country | child_mort | exports | health | imports | income | inflation | life_expec | total_fer | gdpp |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 90.2 | 10.0 | 7.58 | 44.9 | 1610 | 9.44 | 56.2 | 5.82 | 553 |
| 1 | Albania | 16.6 | 28.0 | 6.55 | 48.6 | 9930 | 4.49 | 76.3 | 1.65 | 4090 |
| 2 | Algeria | 27.3 | 38.4 | 4.17 | 31.4 | 12900 | 16.1 | 76.5 | 2.89 | 4460 |
| 3 | Angola | 119.0 | 62.3 | 2.85 | 42.9 | 5900 | 22.4 | 60.1 | 6.16 | 3530 |
| 4 | Antigua and Barbuda | 10.3 | 45.5 | 6.03 | 58.9 | 19100 | 1.44 | 76.8 | 2.13 | 12200 |
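Beyond `head()`, a few quick checks usually help before clustering: the shape, the column types, missing values, duplicated countries, and summary statistics. The sketch below runs them on the five rows shown above, rebuilt by hand so it works without the CSV file; on the real data the same calls would apply to the full `countries` frame.

```python
import pandas as pd

# The five rows shown above, rebuilt by hand so the sketch runs without the CSV.
countries = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Angola", "Antigua and Barbuda"],
    "child_mort": [90.2, 16.6, 27.3, 119.0, 10.3],
    "exports": [10.0, 28.0, 38.4, 62.3, 45.5],
    "health": [7.58, 6.55, 4.17, 2.85, 6.03],
    "imports": [44.9, 48.6, 31.4, 42.9, 58.9],
    "income": [1610, 9930, 12900, 5900, 19100],
    "inflation": [9.44, 4.49, 16.1, 22.4, 1.44],
    "life_expec": [56.2, 76.3, 76.5, 60.1, 76.8],
    "total_fer": [5.82, 1.65, 2.89, 6.16, 2.13],
    "gdpp": [553, 4090, 4460, 3530, 12200],
})

print(countries.shape)   # (rows, columns)
print(countries.dtypes)  # only `country` should be non-numeric

missing = countries.isnull().sum()                          # missing values per column
duplicates = countries.duplicated(subset="country").sum()   # repeated country names
summary = countries.describe()                              # statistics for numeric columns

print(missing)
print(duplicates)
print(summary)
```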


<a id="processing" ></a>
# Pre-processing

<a id="visualization" ></a>
# Data visualization

<a id="pca" ></a>
# Dimensionality reduction (PCA)
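PCA projects the numerical features onto a small number of uncorrelated components ordered by the variance they explain. A minimal sketch with scikit-learn follows, assuming the features are standardized first (PCA is driven by variance, so unscaled columns such as `income` would otherwise dominate). The choice of two components and the use of the five sample rows are illustrative assumptions, not settings taken from this notebook.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Five sample rows from the dataset, rebuilt by hand so the sketch is self-contained.
countries = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Angola", "Antigua and Barbuda"],
    "child_mort": [90.2, 16.6, 27.3, 119.0, 10.3],
    "exports": [10.0, 28.0, 38.4, 62.3, 45.5],
    "health": [7.58, 6.55, 4.17, 2.85, 6.03],
    "imports": [44.9, 48.6, 31.4, 42.9, 58.9],
    "income": [1610, 9930, 12900, 5900, 19100],
    "inflation": [9.44, 4.49, 16.1, 22.4, 1.44],
    "life_expec": [56.2, 76.3, 76.5, 60.1, 76.8],
    "total_fer": [5.82, 1.65, 2.89, 6.16, 2.13],
    "gdpp": [553, 4090, 4460, 3530, 12200],
})

# Keep only the numerical features; `country` is a label, not an input.
X = countries.drop(columns="country")

# Standardize, then project onto two principal components (illustrative choice).
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
components = pipeline.fit_transform(X)

# Share of total variance captured by each retained component.
ratio = pipeline.named_steps["pca"].explained_variance_ratio_
print(components.shape)
print(ratio)
```

On the full dataset, the cumulative sum of `explained_variance_ratio_` is a common guide for deciding how many components to keep.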

<a id="scaling" ></a>
# Scaling
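The features sit on very different scales: `child_mort` is counted per 1000 live births, while `income` and `gdpp` run into the thousands. One common remedy is z-score standardization, which gives every column zero mean and unit variance. The sketch below is one way to do it with plain pandas on three columns from the sample rows; it assumes the population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses.

```python
import pandas as pd

# Three numerical columns from the five sample rows of the dataset.
features = pd.DataFrame({
    "child_mort": [90.2, 16.6, 27.3, 119.0, 10.3],
    "income": [1610, 9930, 12900, 5900, 19100],
    "gdpp": [553, 4090, 4460, 3530, 12200],
})

# z-score: subtract each column's mean and divide by its population
# standard deviation (ddof=0).
scaled = (features - features.mean()) / features.std(ddof=0)
print(scaled)
```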

<a id="norm" ></a>
# Normalization 

<a id="ml" ></a>
# K-means clustering
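K-means assigns each country to the nearest of `k` centroids and then moves each centroid to the mean of its members, repeating until the assignments stabilize. The sketch below uses scikit-learn's `KMeans` on three standardized columns from the sample rows. The value `n_clusters=2`, the column subset, and `random_state=0` are illustrative assumptions chosen because only five rows are available here; on the full dataset, `k` would normally be picked with a diagnostic such as the elbow method or the silhouette score.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Three numerical columns from the five sample rows of the dataset.
countries = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Angola", "Antigua and Barbuda"],
    "child_mort": [90.2, 16.6, 27.3, 119.0, 10.3],
    "income": [1610, 9930, 12900, 5900, 19100],
    "gdpp": [553, 4090, 4460, 3530, 12200],
})

# Standardize so that no single feature dominates the Euclidean distances.
X = StandardScaler().fit_transform(countries.drop(columns="country"))

# n_clusters=2 is an illustrative choice for this tiny sample.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
countries["cluster"] = kmeans.fit_predict(X)

print(countries[["country", "cluster"]])
print(kmeans.inertia_)  # within-cluster sum of squared distances
```

Cluster labels are arbitrary integers, so a cluster has to be interpreted through its members' feature values (for example, high `child_mort` together with low `income` and `gdpp`) before it is read as a group in need of aid.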