# Human Development Index and Components

![](https://www.linkpicture.com/q/map-of-the-world-1005413_1280.jpg)

In this notebook we analyze the Human Development Index (HDI) and its components: life expectancy at birth, expected years of schooling, mean years of schooling, Gross National Income (GNI) per capita, and the GNI per capita rank minus HDI rank, using data from the World Human Development Report 2021.

In [1]:
import pandas as pd



In [2]:
with open('Human Development Index and Components.csv') as f:
    print(f)

<_io.TextIOWrapper name='Human Development Index and Components.csv' mode='r' encoding='UTF-8'>
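
Printing the file handle only shows the `TextIOWrapper` object, not the file contents. A minimal sketch for peeking at the first few raw lines instead; the helper name `peek_lines` and its defaults are hypothetical and not part of the original notebook:

```python
from itertools import islice


def peek_lines(path, n=5, encoding="utf-8"):
    """Return the first n lines of a text file without reading all of it."""
    # errors="replace" is an assumption: it avoids a crash if the file
    # contains bytes that are not valid in the chosen encoding
    with open(path, encoding=encoding, errors="replace") as f:
        # islice stops after n lines, so large files are not read fully
        return [line.rstrip("\n") for line in islice(f, n)]


# Example (assumes the CSV is in the working directory):
# for line in peek_lines('Human Development Index and Components.csv'):
#     print(line)
```

Looking at the raw header and the first rows this way makes trailing delimiters and placeholder values visible before the file is parsed.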


In [3]:

dataset = pd.read_csv('Human Development Index and Components.csv', encoding='unicode_escape')


In [4]:
human_development_df = dataset
human_development_df

Unnamed: 0,HDI rank,Country,HUMAN DEVELOPMENT,Human Development Index (HDI),Life expectancy at birth,Expected years of schooling,Mean years of schooling,Gross national income (GNI) per capita,GNI per capita rank minus HDI rank,HDI rank.1,Unnamed: 10,Unnamed: 11
0,1,Switzerland,VERY HIGH,0.962,84.0,16.5,13.9,66933,5,3,,
1,2,Norway,VERY HIGH,0.961,83.2,18.2,13.0,64660,6,1,,
2,3,Iceland,VERY HIGH,0.959,82.7,19.2,13.8,55782,11,2,,
3,4,"Hong Kong, China (SAR)",VERY HIGH,0.952,85.5,17.3,12.2,62607,6,4,,
4,5,Australia,VERY HIGH,0.951,84.5,21.1,12.7,49238,18,5,,
...,...,...,...,...,...,...,...,...,...,...,...,...
190,191,South Sudan,LOW,0.385,55.0,5.5,5.7,768,-1,191,,
191,192,Korea (Democratic People's Rep. of),OTHER,..,73.3,10.8,..,..,..,..,,
192,193,Monaco,OTHER,..,85.9,..,..,..,..,..,,
193,194,Nauru,OTHER,..,63.6,11.7,..,17730,..,..,,


## Data preparation & cleaning

In [6]:
type(human_development_df)

pandas.core.frame.DataFrame

In [7]:
human_development_df.shape

(195, 12)

Here we can see the number of rows (195) and columns (12) in the data frame.

Let's view basic information about the data frame.

In [8]:
human_development_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 12 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   HDI rank                                195 non-null    int64  
 1   Country                                 195 non-null    object 
 2   HUMAN DEVELOPMENT                       195 non-null    object 
 3   Human Development Index (HDI)           195 non-null    object 
 4   Life expectancy at birth                195 non-null    float64
 5   Expected years of schooling             195 non-null    object 
 6   Mean years of schooling                 195 non-null    object 
 7   Gross national income (GNI) per capita  195 non-null    object 
 8   GNI per capita rank minus HDI rank      195 non-null    object 
 9   HDI rank.1                              195 non-null    object 
 10  Unnamed: 10                             0 non-null      float64

In [9]:
human_development_df.describe()

Unnamed: 0,HDI rank,Life expectancy at birth,Unnamed: 10,Unnamed: 11
count,195.0,195.0,0.0,0.0
mean,97.815385,71.277949,,
std,56.467551,7.746484,,
min,1.0,52.5,,
25%,49.5,65.7,,
50%,97.0,71.7,,
75%,146.0,76.7,,
max,195.0,85.9,,


In [10]:
human_development_df.columns

Index(['HDI rank', 'Country', 'HUMAN DEVELOPMENT',
       'Human Development Index (HDI) ', 'Life expectancy at birth',
       'Expected years of schooling', 'Mean years of schooling',
       'Gross national income (GNI) per capita',
       'GNI per capita rank minus HDI rank', 'HDI rank.1', 'Unnamed: 10',
       'Unnamed: 11'],
      dtype='object')
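
The `columns` output shows that `'Human Development Index (HDI) '` carries a trailing space, which makes exact-name lookups fail. A small sketch of normalizing the headers, using a toy frame that stands in for the real data:

```python
import pandas as pd

# Toy frame standing in for the real data; note the trailing space in the header
toy = pd.DataFrame({
    "Country": ["Switzerland", "Norway"],
    "Human Development Index (HDI) ": ["0.962", "0.961"],
})

# Strip leading/trailing whitespace from every column name
toy.columns = toy.columns.str.strip()

print(list(toy.columns))
```

After this step the column can be selected as `toy["Human Development Index (HDI)"]` without the trailing space.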

### Handle missing, incorrect and invalid data

In [11]:
human_development_df.isnull().sum()

HDI rank                                    0
Country                                     0
HUMAN DEVELOPMENT                           0
Human Development Index (HDI)               0
Life expectancy at birth                    0
Expected years of schooling                 0
Mean years of schooling                     0
Gross national income (GNI) per capita      0
GNI per capita rank minus HDI rank          0
HDI rank.1                                  0
Unnamed: 10                               195
Unnamed: 11                               195
dtype: int64
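
`isnull()` reports no missing values in the data columns because unavailable figures are stored as the string `..`, which pandas treats as ordinary text. A hedged sketch of turning those placeholders into real `NaN` values and numeric dtypes; the toy rows are copied from the table above, and the list `numeric_cols` is an assumption about which fields should be numeric:

```python
import numpy as np
import pandas as pd

# Toy rows copied from the table above
toy = pd.DataFrame({
    "Country": ["Switzerland", "Monaco", "Nauru"],
    "Life expectancy at birth": [84.0, 85.9, 63.6],
    "Expected years of schooling": ["16.5", "..", "11.7"],
    "Gross national income (GNI) per capita": ["66933", "..", "17730"],
})

# Assumed list of columns that should hold numbers
numeric_cols = [
    "Expected years of schooling",
    "Gross national income (GNI) per capita",
]

cleaned = toy.copy()
# Replace the ".." placeholder with NaN, then convert text to numbers.
# errors="coerce" turns any other unparseable text into NaN as well.
cleaned[numeric_cols] = (
    cleaned[numeric_cols]
    .replace("..", np.nan)
    .apply(pd.to_numeric, errors="coerce")
)

print(cleaned.isnull().sum())
print(cleaned.dtypes)
```

Once the columns are numeric, `describe()` and `isnull().sum()` include them and report the real number of missing values.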

In [12]:
human_development_df.isna()

Unnamed: 0,HDI rank,Country,HUMAN DEVELOPMENT,Human Development Index (HDI),Life expectancy at birth,Expected years of schooling,Mean years of schooling,Gross national income (GNI) per capita,GNI per capita rank minus HDI rank,HDI rank.1,Unnamed: 10,Unnamed: 11
0,False,False,False,False,False,False,False,False,False,False,True,True
1,False,False,False,False,False,False,False,False,False,False,True,True
2,False,False,False,False,False,False,False,False,False,False,True,True
3,False,False,False,False,False,False,False,False,False,False,True,True
4,False,False,False,False,False,False,False,False,False,False,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...
190,False,False,False,False,False,False,False,False,False,False,True,True
191,False,False,False,False,False,False,False,False,False,False,True,True
192,False,False,False,False,False,False,False,False,False,False,True,True
193,False,False,False,False,False,False,False,False,False,False,True,True


As we can see, the columns `Unnamed: 10` and `Unnamed: 11` contain only missing values.

Since these two columns carry no data, we drop them and store the result in a new data frame.

In [13]:
human_develop_df = human_development_df.drop(['Unnamed: 10', 'Unnamed: 11'], axis=1)
human_develop_df

Unnamed: 0,HDI rank,Country,HUMAN DEVELOPMENT,Human Development Index (HDI),Life expectancy at birth,Expected years of schooling,Mean years of schooling,Gross national income (GNI) per capita,GNI per capita rank minus HDI rank,HDI rank.1
0,1,Switzerland,VERY HIGH,0.962,84.0,16.5,13.9,66933,5,3
1,2,Norway,VERY HIGH,0.961,83.2,18.2,13.0,64660,6,1
2,3,Iceland,VERY HIGH,0.959,82.7,19.2,13.8,55782,11,2
3,4,"Hong Kong, China (SAR)",VERY HIGH,0.952,85.5,17.3,12.2,62607,6,4
4,5,Australia,VERY HIGH,0.951,84.5,21.1,12.7,49238,18,5
...,...,...,...,...,...,...,...,...,...,...
190,191,South Sudan,LOW,0.385,55.0,5.5,5.7,768,-1,191
191,192,Korea (Democratic People's Rep. of),OTHER,..,73.3,10.8,..,..,..,..
192,193,Monaco,OTHER,..,85.9,..,..,..,..,..
193,194,Nauru,OTHER,..,63.6,11.7,..,17730,..,..
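
Naming the empty columns explicitly works here. An alternative sketch drops every column that contains only missing values, so it does not depend on the `Unnamed: N` labels; the toy frame is illustrative:

```python
import numpy as np
import pandas as pd

# Toy frame with two all-empty columns, mimicking the real data
toy = pd.DataFrame({
    "Country": ["Switzerland", "Norway"],
    "HDI rank": [1, 2],
    "Unnamed: 10": [np.nan, np.nan],
    "Unnamed: 11": [np.nan, np.nan],
})

# how="all" drops a column only if every value in it is missing
trimmed = toy.dropna(axis=1, how="all")

print(trimmed.columns.tolist())
```

Like `drop`, `dropna` returns a new data frame and leaves the original untouched.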


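With the two empty columns removed, we can start manipulating the data to obtain more specific information. Note that several numeric columns are still stored as `object`, because unavailable values appear as `..`; any numeric analysis on those columns needs a conversion first.

First, let's make a copy of the data frame to work on. The copy below is taken from the original `human_development_df`, so it still includes the `Unnamed: 10` and `Unnamed: 11` columns.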

In [15]:
human_development = human_development_df.copy()

In [16]:
human_development

Unnamed: 0,HDI rank,Country,HUMAN DEVELOPMENT,Human Development Index (HDI),Life expectancy at birth,Expected years of schooling,Mean years of schooling,Gross national income (GNI) per capita,GNI per capita rank minus HDI rank,HDI rank.1,Unnamed: 10,Unnamed: 11
0,1,Switzerland,VERY HIGH,0.962,84.0,16.5,13.9,66933,5,3,,
1,2,Norway,VERY HIGH,0.961,83.2,18.2,13.0,64660,6,1,,
2,3,Iceland,VERY HIGH,0.959,82.7,19.2,13.8,55782,11,2,,
3,4,"Hong Kong, China (SAR)",VERY HIGH,0.952,85.5,17.3,12.2,62607,6,4,,
4,5,Australia,VERY HIGH,0.951,84.5,21.1,12.7,49238,18,5,,
...,...,...,...,...,...,...,...,...,...,...,...,...
190,191,South Sudan,LOW,0.385,55.0,5.5,5.7,768,-1,191,,
191,192,Korea (Democratic People's Rep. of),OTHER,..,73.3,10.8,..,..,..,..,,
192,193,Monaco,OTHER,..,85.9,..,..,..,..,..,,
193,194,Nauru,OTHER,..,63.6,11.7,..,17730,..,..,,


Let's start by looking at the first ten rows.

In [17]:
human_development.head(10)

Unnamed: 0,HDI rank,Country,HUMAN DEVELOPMENT,Human Development Index (HDI),Life expectancy at birth,Expected years of schooling,Mean years of schooling,Gross national income (GNI) per capita,GNI per capita rank minus HDI rank,HDI rank.1,Unnamed: 10,Unnamed: 11
0,1,Switzerland,VERY HIGH,0.962,84.0,16.5,13.9,66933,5,3,,
1,2,Norway,VERY HIGH,0.961,83.2,18.2,13.0,64660,6,1,,
2,3,Iceland,VERY HIGH,0.959,82.7,19.2,13.8,55782,11,2,,
3,4,"Hong Kong, China (SAR)",VERY HIGH,0.952,85.5,17.3,12.2,62607,6,4,,
4,5,Australia,VERY HIGH,0.951,84.5,21.1,12.7,49238,18,5,,
5,6,Denmark,VERY HIGH,0.948,81.4,18.7,13.0,60365,6,5,,
6,7,Sweden,VERY HIGH,0.947,83.0,19.4,12.6,54489,9,9,,
7,8,Ireland,VERY HIGH,0.945,82.0,18.9,11.6,76169,-3,8,,
8,9,Germany,VERY HIGH,0.942,80.6,17.0,14.1,54534,6,7,,
9,10,Netherlands,VERY HIGH,0.941,81.7,18.7,12.6,55979,3,10,,


#### How many countries does the dataframe contain?

In [19]:
human_development.shape

(195, 12)
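
`shape` counts rows and columns, so its first number equals the number of countries only if every row is a distinct country. A sketch that checks this directly with `nunique()`; the toy frame is contrived and contains a deliberate duplicate to show the difference:

```python
import pandas as pd

# Contrived toy frame with one duplicated country
toy = pd.DataFrame({
    "Country": ["Switzerland", "Norway", "Iceland", "Norway"],
    "HDI rank": [1, 2, 3, 2],
})

n_rows = len(toy)                       # number of rows, duplicates included
n_countries = toy["Country"].nunique()  # number of distinct country names

print(n_rows, n_countries)
```

If the two numbers match on the real data, the row count from `shape` can be read as the number of countries.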

## Exploratory Analysis and Visualization

Now let's visualize the data to get a better sense of its distribution and to decide which questions to explore.

In [None]:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
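
With the plotting defaults set, a first chart might look like the following sketch: a histogram of life expectancy. It selects the `Agg` backend so it also runs outside a notebook, and the values are the ten shown by `head(10)` above; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

# Life expectancy at birth for the ten countries listed by head(10)
life_expectancy = [84.0, 83.2, 82.7, 85.5, 84.5, 81.4, 83.0, 82.0, 80.6, 81.7]

fig, ax = plt.subplots(figsize=(9, 5))
# hist returns the count per bin, the bin edges, and the drawn bars
counts, bin_edges, _ = ax.hist(life_expectancy, bins=5)
ax.set_xlabel("Life expectancy at birth (years)")
ax.set_ylabel("Number of countries")
ax.set_title("Life expectancy, top 10 countries by HDI rank")
```

On the full data set the same call would take the `Life expectancy at birth` column, which is already numeric.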