The World Bank defines its income groups as follows:

"For the current 2021 fiscal year, low-income economies are defined as those with a GNI per capita, calculated using the World Bank Atlas method, of $1,035 or less in 2019, lower middle-income economies are those with a GNI per capita between $1,036 and $4,045, upper middle-income economies are those with a GNI per capita between $4,046 and $12,535, high-income economies are those with a GNI per capita of $12,536 or more."

- Low income = $1,035 or less

- Lower-middle income = $1,036 to $4,045

- Upper-middle income = $4,046 to $12,535

- High income = $12,536 or more

Source: https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups
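The thresholds above map directly onto a small lookup. The following is a minimal sketch only: the function name `income_group` is hypothetical, the cut-offs are the FY2021 values quoted above, and whole-dollar inputs are assumed.

```python
def income_group(gni_per_capita: float) -> str:
    """Classify a GNI per capita (US$, Atlas method) using the FY2021 thresholds."""
    if gni_per_capita <= 1035:
        return "Low income"
    if gni_per_capita <= 4045:
        return "Lower middle income"
    if gni_per_capita <= 12535:
        return "Upper middle income"
    return "High income"


print(income_group(2500))
```

The returned labels match the `incomeLevel` values the API uses, so the function can be compared against the downloaded data.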


In [51]:
import requests
import pandas as pd
import numpy as np
#from plotly.offline import iplot
#import cufflinks as cf
#cf.go_offline(connected=True)

import plotly.express as px


In [2]:
# base end_point = http://api.worldbank.org/v2/country

<mark> Making a request to retrieve data from the endpoint, with the JSON format added as a parameter

In [3]:
param = {'page':1}
response = requests.get('http://api.worldbank.org/v2/country/?format=json',params=param)

In [4]:
response.url

'http://api.worldbank.org/v2/country/?format=json&page=1'

In [5]:
response.status_code

200

In [6]:
response.json()

[{'page': 1, 'pages': 7, 'per_page': '50', 'total': 304},
 [{'id': 'ABW',
   'iso2Code': 'AW',
   'name': 'Aruba',
   'region': {'id': 'LCN',
    'iso2code': 'ZJ',
    'value': 'Latin America & Caribbean '},
   'adminregion': {'id': '', 'iso2code': '', 'value': ''},
   'incomeLevel': {'id': 'HIC', 'iso2code': 'XD', 'value': 'High income'},
   'lendingType': {'id': 'LNX', 'iso2code': 'XX', 'value': 'Not classified'},
   'capitalCity': 'Oranjestad',
   'longitude': '-70.0167',
   'latitude': '12.5167'},
  {'id': 'AFG',
   'iso2Code': 'AF',
   'name': 'Afghanistan',
   'region': {'id': 'SAS', 'iso2code': '8S', 'value': 'South Asia'},
   'adminregion': {'id': 'SAS', 'iso2code': '8S', 'value': 'South Asia'},
   'incomeLevel': {'id': 'LIC', 'iso2code': 'XM', 'value': 'Low income'},
   'lendingType': {'id': 'IDX', 'iso2code': 'XI', 'value': 'IDA'},
   'capitalCity': 'Kabul',
   'longitude': '69.1761',
   'latitude': '34.5228'},
  {'id': 'AFR',
   'iso2Code': 'A9',
   'name': 'Africa',
   'reg

<mark> The output above is page 1 of the response. The metadata reports 7 pages of up to 50 records each, 304 records in total. I will extract the data from all 7 pages and keep only the attributes relevant for further analysis:
- iso2Code
- name
- region
- incomeLevel
- capitalCity
- longitude
- latitude
</mark>
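The metadata element already reports the page count (`pages`), so the loop bound can be read from the response rather than typed in. A minimal sketch, assuming a hypothetical helper `fetch_all` that receives a `get_json(page)` callable returning the parsed `[metadata, records]` pair; the fake two-page source below stands in for the real API.

```python
from typing import Callable

FIELDS = ("iso2Code", "name", "region", "incomeLevel",
          "capitalCity", "longitude", "latitude")


def fetch_all(get_json: Callable[[int], list]) -> list:
    """Collect FIELDS from every page; get_json(page) returns [metadata, records]."""
    meta, records = get_json(1)
    rows = [{key: rec[key] for key in FIELDS} for rec in records]
    # 'pages' comes from the metadata, so the page count is never hardcoded
    for page in range(2, int(meta["pages"]) + 1):
        _, records = get_json(page)
        rows.extend({key: rec[key] for key in FIELDS} for rec in records)
    return rows


# Stand-in for the API (hypothetical data shaped like the response above)
def fake_get_json(page: int) -> list:
    rec = {key: "" for key in FIELDS}
    rec.update(id="XXX", name=f"Country {page}")
    return [{"page": page, "pages": 2, "per_page": "50", "total": 2}, [rec]]


rows = fetch_all(fake_get_json)
print(len(rows), rows[0]["name"])
```

With `requests`, `get_json` could be `lambda page: requests.get('http://api.worldbank.org/v2/country/', params={'format': 'json', 'page': page}).json()`.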

In [7]:
countries = response.json()[1]

In [8]:
countries[0]['region']

{'id': 'LCN', 'iso2code': 'ZJ', 'value': 'Latin America & Caribbean '}

In [9]:
list1 = []

# Request each of the 7 pages and keep only the selected attributes
for page in range(1, 8):
    param = {'page': page}
    response = requests.get('http://api.worldbank.org/v2/country/?format=json', params=param)
    countries = response.json()[1]  # element 0 is the paging metadata

    for country in countries:
        record = {}
        record['iso2Code'] = country['iso2Code']
        record['name'] = country['name']
        record['region'] = country['region']
        record['incomeLevel'] = country['incomeLevel']
        record['capitalCity'] = country['capitalCity']
        record['longitude'] = country['longitude']
        record['latitude'] = country['latitude']
        list1.append(record)

In [10]:
list1

[{'iso2Code': 'AW',
  'name': 'Aruba',
  'region': {'id': 'LCN',
   'iso2code': 'ZJ',
   'value': 'Latin America & Caribbean '},
  'incomeLevel': {'id': 'HIC', 'iso2code': 'XD', 'value': 'High income'},
  'capitalCity': 'Oranjestad',
  'longitude': '-70.0167',
  'latitude': '12.5167'},
 {'iso2Code': 'AF',
  'name': 'Afghanistan',
  'region': {'id': 'SAS', 'iso2code': '8S', 'value': 'South Asia'},
  'incomeLevel': {'id': 'LIC', 'iso2code': 'XM', 'value': 'Low income'},
  'capitalCity': 'Kabul',
  'longitude': '69.1761',
  'latitude': '34.5228'},
 {'iso2Code': 'A9',
  'name': 'Africa',
  'region': {'id': 'NA', 'iso2code': 'NA', 'value': 'Aggregates'},
  'incomeLevel': {'id': 'NA', 'iso2code': 'NA', 'value': 'Aggregates'},
  'capitalCity': '',
  'longitude': '',
  'latitude': ''},
 {'iso2Code': 'AO',
  'name': 'Angola',
  'region': {'id': 'SSF', 'iso2code': 'ZG', 'value': 'Sub-Saharan Africa '},
  'incomeLevel': {'id': 'LMC',
   'iso2code': 'XN',
   'value': 'Lower middle income'},
  'cap

In [11]:
type(list1)

list

In [12]:
df = pd.DataFrame(list1)

In [13]:
df

Unnamed: 0,iso2Code,name,region,incomeLevel,capitalCity,longitude,latitude
0,AW,Aruba,"{'id': 'LCN', 'iso2code': 'ZJ', 'value': 'Lati...","{'id': 'HIC', 'iso2code': 'XD', 'value': 'High...",Oranjestad,-70.0167,12.5167
1,AF,Afghanistan,"{'id': 'SAS', 'iso2code': '8S', 'value': 'Sout...","{'id': 'LIC', 'iso2code': 'XM', 'value': 'Low ...",Kabul,69.1761,34.5228
2,A9,Africa,"{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...",,,
3,AO,Angola,"{'id': 'SSF', 'iso2code': 'ZG', 'value': 'Sub-...","{'id': 'LMC', 'iso2code': 'XN', 'value': 'Lowe...",Luanda,13.242,-8.81155
4,AL,Albania,"{'id': 'ECS', 'iso2code': 'Z7', 'value': 'Euro...","{'id': 'UMC', 'iso2code': 'XT', 'value': 'Uppe...",Tirane,19.8172,41.3317
...,...,...,...,...,...,...,...
299,A5,Sub-Saharan Africa excluding South Africa and ...,"{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...",,,
300,YE,"Yemen, Rep.","{'id': 'MEA', 'iso2code': 'ZQ', 'value': 'Midd...","{'id': 'LIC', 'iso2code': 'XM', 'value': 'Low ...",Sana'a,44.2075,15.352
301,ZA,South Africa,"{'id': 'SSF', 'iso2code': 'ZG', 'value': 'Sub-...","{'id': 'UMC', 'iso2code': 'XT', 'value': 'Uppe...",Pretoria,28.1871,-25.746
302,ZM,Zambia,"{'id': 'SSF', 'iso2code': 'ZG', 'value': 'Sub-...","{'id': 'LMC', 'iso2code': 'XN', 'value': 'Lowe...",Lusaka,28.2937,-15.3982


# Normalizing dataframe

In [14]:
# Extracting value from region column and renaming as Region

In [15]:
a = pd.json_normalize(df['region'])

In [16]:
Region = a['value']

In [17]:
# Extracting value from incomeLevel column and renaming as Income

In [18]:
b = pd.json_normalize(df['incomeLevel'])
#b

In [19]:
Income = b['value']

In [20]:
df['Region'] = Region
df['Income'] = Income
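Since only the `value` key is needed from each nested dictionary, a plain `map` is an alternative to `json_normalize`. The sketch below uses two records copied from the output above and also strips the trailing space visible in 'Latin America & Caribbean '; the frame name `sample` is illustrative.

```python
import pandas as pd

sample = pd.DataFrame([
    {"name": "Aruba",
     "region": {"id": "LCN", "iso2code": "ZJ", "value": "Latin America & Caribbean "},
     "incomeLevel": {"id": "HIC", "iso2code": "XD", "value": "High income"}},
    {"name": "Afghanistan",
     "region": {"id": "SAS", "iso2code": "8S", "value": "South Asia"},
     "incomeLevel": {"id": "LIC", "iso2code": "XM", "value": "Low income"}},
])

# Pull the 'value' entry out of each dict and trim stray whitespace
sample["Region"] = sample["region"].map(lambda d: d["value"].strip())
sample["Income"] = sample["incomeLevel"].map(lambda d: d["value"].strip())

print(sample[["name", "Region", "Income"]])
```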

In [21]:
df

Unnamed: 0,iso2Code,name,region,incomeLevel,capitalCity,longitude,latitude,Region,Income
0,AW,Aruba,"{'id': 'LCN', 'iso2code': 'ZJ', 'value': 'Lati...","{'id': 'HIC', 'iso2code': 'XD', 'value': 'High...",Oranjestad,-70.0167,12.5167,Latin America & Caribbean,High income
1,AF,Afghanistan,"{'id': 'SAS', 'iso2code': '8S', 'value': 'Sout...","{'id': 'LIC', 'iso2code': 'XM', 'value': 'Low ...",Kabul,69.1761,34.5228,South Asia,Low income
2,A9,Africa,"{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...",,,,Aggregates,Aggregates
3,AO,Angola,"{'id': 'SSF', 'iso2code': 'ZG', 'value': 'Sub-...","{'id': 'LMC', 'iso2code': 'XN', 'value': 'Lowe...",Luanda,13.242,-8.81155,Sub-Saharan Africa,Lower middle income
4,AL,Albania,"{'id': 'ECS', 'iso2code': 'Z7', 'value': 'Euro...","{'id': 'UMC', 'iso2code': 'XT', 'value': 'Uppe...",Tirane,19.8172,41.3317,Europe & Central Asia,Upper middle income
...,...,...,...,...,...,...,...,...,...
299,A5,Sub-Saharan Africa excluding South Africa and ...,"{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...","{'id': 'NA', 'iso2code': 'NA', 'value': 'Aggre...",,,,Aggregates,Aggregates
300,YE,"Yemen, Rep.","{'id': 'MEA', 'iso2code': 'ZQ', 'value': 'Midd...","{'id': 'LIC', 'iso2code': 'XM', 'value': 'Low ...",Sana'a,44.2075,15.352,Middle East & North Africa,Low income
301,ZA,South Africa,"{'id': 'SSF', 'iso2code': 'ZG', 'value': 'Sub-...","{'id': 'UMC', 'iso2code': 'XT', 'value': 'Uppe...",Pretoria,28.1871,-25.746,Sub-Saharan Africa,Upper middle income
302,ZM,Zambia,"{'id': 'SSF', 'iso2code': 'ZG', 'value': 'Sub-...","{'id': 'LMC', 'iso2code': 'XN', 'value': 'Lowe...",Lusaka,28.2937,-15.3982,Sub-Saharan Africa,Lower middle income


# Cleaning and Tidying Dataframe

In [22]:
df.columns

Index(['iso2Code', 'name', 'region', 'incomeLevel', 'capitalCity', 'longitude',
       'latitude', 'Region', 'Income'],
      dtype='object')

In [23]:
df.drop(columns=['iso2Code','region','incomeLevel'],inplace=True)

In [24]:
df

Unnamed: 0,name,capitalCity,longitude,latitude,Region,Income
0,Aruba,Oranjestad,-70.0167,12.5167,Latin America & Caribbean,High income
1,Afghanistan,Kabul,69.1761,34.5228,South Asia,Low income
2,Africa,,,,Aggregates,Aggregates
3,Angola,Luanda,13.242,-8.81155,Sub-Saharan Africa,Lower middle income
4,Albania,Tirane,19.8172,41.3317,Europe & Central Asia,Upper middle income
...,...,...,...,...,...,...
299,Sub-Saharan Africa excluding South Africa and ...,,,,Aggregates,Aggregates
300,"Yemen, Rep.",Sana'a,44.2075,15.352,Middle East & North Africa,Low income
301,South Africa,Pretoria,28.1871,-25.746,Sub-Saharan Africa,Upper middle income
302,Zambia,Lusaka,28.2937,-15.3982,Sub-Saharan Africa,Lower middle income


<mark>The 'Region' and 'Income' columns contain the value 'Aggregates' for rows that are groupings (such as 'Africa') rather than individual countries. I will drop those rows.

In [25]:
index_toDrop = df.query("Region=='Aggregates'").index

In [26]:
#df1 = df.drop(df[df['Region']=='Aggregates'].index)

In [27]:
index_toDrop

Int64Index([  2,   6,   7,  17,  18,  24,  28,  31,  37,  40,  43,  44,  45,
             51,  52,  60,  61,  67,  68,  70,  72,  74,  75,  77,  79,  80,
             81,  82,  84,  85,  86,  87,  88,  91,  96,  97, 103, 119, 122,
            126, 127, 128, 129, 130, 132, 135, 153, 159, 160, 161, 162, 165,
            166, 168, 175, 178, 181, 184, 189, 198, 199, 206, 209, 211, 213,
            215, 223, 229, 230, 234, 235, 238, 240, 250, 252, 253, 261, 266,
            267, 272, 274, 276, 277, 286, 296, 299],
           dtype='int64')

In [28]:
df2 = df.drop(index_toDrop)

In [29]:
df2.reset_index(inplace=True)

In [30]:
df2.shape

(218, 7)

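<mark> After removing the unnecessary rows and columns, I now have 218 rows. The shape reports 7 columns rather than 6 because `reset_index` kept the old index as an extra `index` column.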
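Passing `drop=True` to `reset_index` discards the old index instead of adding it as a column. A minimal sketch on a three-row stand-in frame (`sample` and `countries_only` are illustrative names): a boolean mask filters out the aggregates and the rows are renumbered in one step.

```python
import pandas as pd

sample = pd.DataFrame({
    "name": ["Aruba", "Africa", "Angola"],
    "Region": ["Latin America & Caribbean", "Aggregates", "Sub-Saharan Africa"],
})

# Keep only real countries, then renumber rows without adding an 'index' column
countries_only = sample[sample["Region"] != "Aggregates"].reset_index(drop=True)

print(countries_only.shape)
```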

In [31]:
df2.columns

Index(['index', 'name', 'capitalCity', 'longitude', 'latitude', 'Region',
       'Income'],
      dtype='object')

<mark> I will now reorder the columns for readability, which also leaves out the extra `index` column

In [32]:
df2 = df2[['name','capitalCity','Region','Income','longitude','latitude']]

In [33]:
df2

Unnamed: 0,name,capitalCity,Region,Income,longitude,latitude
0,Aruba,Oranjestad,Latin America & Caribbean,High income,-70.0167,12.5167
1,Afghanistan,Kabul,South Asia,Low income,69.1761,34.5228
2,Angola,Luanda,Sub-Saharan Africa,Lower middle income,13.242,-8.81155
3,Albania,Tirane,Europe & Central Asia,Upper middle income,19.8172,41.3317
4,Andorra,Andorra la Vella,Europe & Central Asia,High income,1.5218,42.5075
...,...,...,...,...,...,...
213,Kosovo,Pristina,Europe & Central Asia,Upper middle income,20.926,42.565
214,"Yemen, Rep.",Sana'a,Middle East & North Africa,Low income,44.2075,15.352
215,South Africa,Pretoria,Sub-Saharan Africa,Upper middle income,28.1871,-25.746
216,Zambia,Lusaka,Sub-Saharan Africa,Lower middle income,28.2937,-15.3982


# Data Analysis

In [35]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 218 entries, 0 to 217
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   name         218 non-null    object
 1   capitalCity  218 non-null    object
 2   Region       218 non-null    object
 3   Income       218 non-null    object
 4   longitude    218 non-null    object
 5   latitude     218 non-null    object
dtypes: object(6)
memory usage: 10.3+ KB


In [36]:
df2.describe()

Unnamed: 0,name,capitalCity,Region,Income,longitude,latitude
count,218,218.0,218,218,218.0,218.0
unique,218,212.0,7,4,212.0,212.0
top,"Iran, Islamic Rep.",,Europe & Central Asia,High income,,
freq,1,7.0,58,83,7.0,7.0
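`info()` reports every column as `object`, and `describe()` shows an empty string as the most frequent `longitude` and `latitude` value (7 rows). Before plotting coordinates it may help to convert them to numbers. A sketch, assuming blanks should become `NaN`; `sample` is a stand-in frame.

```python
import pandas as pd

sample = pd.DataFrame({
    "name": ["Aruba", "Afghanistan", "No coordinates"],
    "longitude": ["-70.0167", "69.1761", ""],
    "latitude": ["12.5167", "34.5228", ""],
})

# errors='coerce' turns unparseable entries into NaN
for col in ("longitude", "latitude"):
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

print(sample.dtypes)
print(sample["longitude"].isna().sum())
```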


In [37]:
by_Income = df2.groupby('Income').Income.count()

In [38]:
fig = px.bar(by_Income,color=by_Income.index)
fig.show()

<mark> There are 29 countries with a GNI per capita of $1,035 or less (low income), whereas 83 countries have a GNI per capita of $12,536 or more (high income).
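The counts behind the bar chart can also be read directly, listed in threshold order rather than alphabetical order. A sketch using `value_counts` and `reindex`; the `order` list and the sample values are illustrative.

```python
import pandas as pd

order = ["Low income", "Lower middle income", "Upper middle income", "High income"]

income = pd.Series(
    ["High income", "Low income", "Lower middle income", "High income", "Upper middle income"],
    name="Income",
)

# Count each group, then list the groups from lowest to highest threshold
counts = income.value_counts().reindex(order, fill_value=0)
print(counts)
```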


In [39]:
df2['Region'].nunique()

7


In [49]:
fig1 = px.bar(df2,x='Region',color='Income')
fig1.show()

<mark> The figure above shows that every country in North America is high income, and that Europe & Central Asia consists mostly of high-income and upper-middle-income countries. Sub-Saharan Africa has the most low-income countries. South Asia has no high-income country at all, which the query below confirms.

In [50]:
df2.query("Region=='South Asia'")

Unnamed: 0,name,capitalCity,Region,Income,longitude,latitude
1,Afghanistan,Kabul,South Asia,Low income,69.1761,34.5228
17,Bangladesh,Dhaka,South Asia,Lower middle income,90.4113,23.7055
29,Bhutan,Thimphu,South Asia,Lower middle income,89.6177,27.5768
89,India,New Delhi,South Asia,Lower middle income,77.225,28.6353
113,Sri Lanka,Colombo,South Asia,Lower middle income,79.8528,6.92148
124,Maldives,Male,South Asia,Upper middle income,73.5109,4.1742
146,Nepal,Kathmandu,South Asia,Lower middle income,85.3157,27.6939
150,Pakistan,Islamabad,South Asia,Lower middle income,72.8,30.5167
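The same check can be run for all regions at once with a cross-tabulation of region against income group. A minimal sketch on a few rows taken from the tables above (`sample` is a stand-in for `df2`):

```python
import pandas as pd

sample = pd.DataFrame({
    "name": ["Afghanistan", "India", "Maldives", "Andorra", "Albania"],
    "Region": ["South Asia", "South Asia", "South Asia",
               "Europe & Central Asia", "Europe & Central Asia"],
    "Income": ["Low income", "Lower middle income", "Upper middle income",
               "High income", "Upper middle income"],
})

# Rows are regions, columns are income groups, cells are country counts
table = pd.crosstab(sample["Region"], sample["Income"])
print(table)

# Regions without any high-income country
no_high = table.index[table["High income"] == 0].tolist()
print(no_high)
```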


In [52]:
import plotly.graph_objects as go

In [None]:
go.Scattergeo()


In [55]:
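# lat/lon place the markers; `locations` expects ISO-3 codes by default,
# so the country name is shown on hover instead
fig2 = px.scatter_geo(df2, lat='latitude', lon='longitude',
                      hover_name='name', color='Income')
fig2.show()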

In [58]:
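# choropleth_mapbox needs a `geojson` argument; px.choropleth can match on country names instead
# (World Bank names such as "Yemen, Rep." may not match and would be left blank)
fig3 = px.choropleth(data_frame=df2, locations='name', locationmode='country names', color='Income')
fig3.show()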