# 01. Importing Libraries

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import os

# 02. Import Data

In [61]:
path = r'C:\Users\isava\OneDrive\Documents\CareerFoundry\Data Immersion\AdvancedAnalytics\WorldRiskIndex'

In [63]:
df = pd.read_csv(os.path.join(path,'02 Data','OriginalData', 'world_risk_index.csv'))

# 04. Cleaning Data

### Missing Values

In [65]:
df.isnull().sum()

Region                          0
WRI                             0
Exposure                        0
Vulnerability                   0
Susceptibility                  0
Lack of Coping Capabilities     0
 Lack of Adaptive Capacities    1
Year                            0
Exposure Category               0
WRI Category                    1
Vulnerability Category          4
Susceptibility Category         0
dtype: int64

6 missing values
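
The count above is of missing cells, which is not necessarily the number of affected rows, because one row can be missing more than one value. A minimal sketch of the difference, using a small made-up frame (`demo` and its values are illustrative assumptions, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
demo = pd.DataFrame({
    'score': [1.0, np.nan, 3.0, np.nan],
    'category': ['Low', None, None, 'High'],
})

# Total number of missing cells across all columns.
missing_cells = int(demo.isnull().sum().sum())

# Number of rows that contain at least one missing cell.
rows_with_missing = int(demo.isnull().any(axis=1).sum())

print(missing_cells, rows_with_missing)
```

Here the second row is missing both values, so the cell count is higher than the row count.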

In [67]:
df[pd.isnull(df).any(axis = 1)] # Pull out the 5 rows that contain NaN: four null Vulnerability Category values, one null WRI Category, and one null Lack of Adaptive Capacities value.

Unnamed: 0,Region,WRI,Exposure,Vulnerability,Susceptibility,Lack of Coping Capabilities,Lack of Adaptive Capacities,Year,Exposure Category,WRI Category,Vulnerability Category,Susceptibility Category
1193,Österreich,2.87,13.18,21.75,13.63,39.27,12.34,2019,Medium,Very Low,,Very Low
1202,Deutschland,2.43,11.51,21.11,14.3,36.44,12.6,2019,Low,Very Low,,Very Low
1205,Norwegen,2.34,10.6,22.06,13.29,39.21,13.68,2019,Low,Very Low,,Very Low
1292,Föd. Staaten v. Mikronesien,7.59,14.95,50.77,31.79,72.13,48.39,2020,High,,High,High
1858,Korea Republic of 4.59,14.89,30.82,14.31,46.55,31.59,,2016,Very High,Very High,,High


Using the per-category ranges from the data description section below, I know that a Vulnerability score of 21.75 for row 1193 lies in the Very Low category. The same approach will be used for the other missing values. Row 1858 is also missing Lack of Adaptive Capacities for 2016. One option is to fill it with a mean of the Lack of Adaptive Capacities values from the other Korea Republic rows. I also noticed that the Region name in that row should not contain 4.59; it appears that the WRI score was merged into the name, which shifted the remaining values one column to the left.
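
The range lookup described above can be sketched as follows. The ranges are copied from the Vulnerability `groupby` output in the data description section; the helper name `matching_categories` is an assumption made for illustration. Because the observed ranges overlap, the helper returns every category that contains the score rather than a single answer.

```python
# Observed (min, max) Vulnerability range per category, copied from the
# groupby output in the data description section.
vulnerability_ranges = {
    'Very Low': (20.97, 36.81),
    'Low': (32.07, 46.5),
    'Medium': (40.93, 53.49),
    'High': (47.98, 63.76),
    'Very High': (61.5, 76.47),
}


def matching_categories(score, ranges):
    """Return every category whose observed [min, max] range contains score."""
    return [name for name, (low, high) in ranges.items() if low <= score <= high]


# Vulnerability scores of the three rows with a missing Vulnerability Category.
for row, score in [(1193, 21.75), (1202, 21.11), (1205, 22.06)]:
    print(row, score, matching_categories(score, vulnerability_ranges))
```

A score that falls inside an overlap, such as 35.0, matches both Very Low and Low, so the lookup alone cannot settle borderline cases.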

In [69]:
df.loc[1193,'Vulnerability Category'] = 'Very Low'
df.loc[1202,'Vulnerability Category'] = 'Very Low'
df.loc[1205,'Vulnerability Category'] = 'Very Low'
df.loc[1858,'Vulnerability Category'] = 'Very Low'
df.loc[1292,'WRI Category'] = 'Medium'

In [71]:
df.loc[df['Region'].str.contains('Korea')]

Unnamed: 0,Region,WRI,Exposure,Vulnerability,Susceptibility,Lack of Coping Capabilities,Lack of Adaptive Capacities,Year,Exposure Category,WRI Category,Vulnerability Category,Susceptibility Category
630,Korea Republic of,4.8,14.89,32.26,15.02,46.6,35.14,2014,High,Low,Very Low,Very Low
1858,Korea Republic of 4.59,14.89,30.82,14.31,46.55,31.59,,2016,Very High,Very High,Very Low,High


Given that there are only two rows for Korea, one of them malformed, the data is unlikely to be informative. I will remove both rows from the dataset entirely.

In [146]:
df.shape

(1917, 12)

In [73]:
df.drop([630, 1858], inplace = True)

In [199]:
df.shape

(1915, 12)

Two rows were removed.

In [75]:
df.isna().sum()

Region                          0
WRI                             0
Exposure                        0
Vulnerability                   0
Susceptibility                  0
Lack of Coping Capabilities     0
 Lack of Adaptive Capacities    0
Year                            0
Exposure Category               0
WRI Category                    0
Vulnerability Category          0
Susceptibility Category         0
dtype: int64

No missing values remain.

### Rename Columns

In [77]:
df = df.rename(columns= {' Lack of Adaptive Capacities': 'Lack of Adaptive Capacities'}) #Remove leading space from column name.
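
If more than one column name carried stray whitespace, stripping all names at once would avoid listing each one. A minimal sketch, assuming a toy frame `demo` (the trailing space in `'Year '` is a hypothetical example, not something seen in this dataset):

```python
import pandas as pd

# Toy frame with whitespace around some column names.
demo = pd.DataFrame(columns=['Region', ' Lack of Adaptive Capacities', 'Year '])

# Strip leading and trailing whitespace from every column name.
demo.columns = demo.columns.str.strip()

print(list(demo.columns))
```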

### Check Country Names

In [79]:
pd.options.display.max_rows = 550
print(df['Region'].drop_duplicates().sort_values())

14                           Afghanistan
553                              Albania
38                              Albanien
576                              Algeria
60                              Algerien
61                                Angola
692                  Antigua und Barbuda
647                            Argentina
129                          Argentinien
610                              Armenia
83                              Armenien
87                         Aserbaidschan
642                            Australia
118                           Australien
650                              Austria
614                           Azerbaijan
117                              Bahamas
168                              Bahrain
5                            Bangladesch
521                           Bangladesh
160                             Barbados
148                              Belarus
139                              Belgien
655                              Belgium
101             

**NOTE** This dataset comes from a German source, so some of the country names need to be translated to English. There are also duplicates such as Albania and Albanien; Albanien is simply the German name for Albania.
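
One compact way to express such a translation is a single lookup table passed to `Series.replace`. The sketch below is not the approach used in this notebook; it uses a toy frame and only a four-name excerpt (the names `german_to_english` and `demo` are assumptions for illustration).

```python
import pandas as pd

# Small excerpt of German-to-English names; a full table would list every variant.
german_to_english = {
    'Albanien': 'Albania',
    'Algerien': 'Algeria',
    'Deutschland': 'Germany',
    'Österreich': 'Austria',
}

demo = pd.DataFrame({'Region': ['Albanien', 'Albania', 'Deutschland', 'Vanuatu']})

# Exact-match replacement; names missing from the table are left unchanged.
demo['Region'] = demo['Region'].replace(german_to_english)

print(demo['Region'].tolist())
```

With this approach each variant should map directly to its final name, since the table is applied in one pass rather than as a chain of successive renames.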

In [81]:
df['Region'] = df['Region'].mask(df['Region'] == 'Albanien', 'Albania')
df['Region'] = df['Region'].mask(df['Region'] == 'Algerien', 'Algeria')
df['Region'] = df['Region'].mask(df['Region'] == 'Antigua und Barbuda', 'Antigua and Barbuda')
df['Region'] = df['Region'].mask(df['Region'] == 'Argentinien', 'Argentina')
df['Region'] = df['Region'].mask(df['Region'] == 'Armenien', 'Armenia')
df['Region'] = df['Region'].mask(df['Region'] == 'Aserbaidschan', 'Azerbaijan')
df['Region'] = df['Region'].mask(df['Region'] == 'Äthiopien', 'Ethiopia')
df['Region'] = df['Region'].mask(df['Region'] == 'Australien', 'Australia')
df['Region'] = df['Region'].mask(df['Region'] == 'Bangladesch', 'Bangladesh')
df['Region'] = df['Region'].mask(df['Region'] == 'Belgien', 'Belgium')
df['Region'] = df['Region'].mask(df['Region'] == 'Bosnien und Herzegowina', 'Bosnia and Herzegovina')
df['Region'] = df['Region'].mask(df['Region'] == 'Brasilien', 'Brazil')
df['Region'] = df['Region'].mask(df['Region'] == 'Bulgarien', 'Bulgaria')
df['Region'] = df['Region'].mask(df['Region'] == 'Dänemark', 'Denmark')
df['Region'] = df['Region'].mask(df['Region'] == 'Deutschland', 'Germany')
df['Region'] = df['Region'].mask(df['Region'] == 'Dominikanische Republik', 'Dominican Republic')
df['Region'] = df['Region'].mask(df['Region'] == 'Demokratische Rep. Kongo', 'Democratic Republic of the Congo')
df['Region'] = df['Region'].mask(df['Region'] == 'Elfenbeinküste', 'Ivory Coast')
df['Region'] = df['Region'].mask(df['Region'] == 'Estland', 'Estonia')
df['Region'] = df['Region'].mask(df['Region'] == 'Finnland', 'Finland')
df['Region'] = df['Region'].mask(df['Region'] == 'Föd. Staaten v. Mikronesien', 'Fps. States of Micronesia')
df['Region'] = df['Region'].mask(df['Region'] == 'Föd. Staaten von Mikronesien', 'Fps. States of Micronesia')
df['Region'] = df['Region'].mask(df['Region'] == 'Frankreich', 'France')
df['Region'] = df['Region'].mask(df['Region'] == 'Georgien', 'Georgia')
df['Region'] = df['Region'].mask(df['Region'] == 'Griechenland', 'Greece')

In [83]:
df['Region'] = df['Region'].mask(df['Region'] == 'Indien', 'India')
df['Region'] = df['Region'].mask(df['Region'] == 'Indonesien', 'Indonesia')
df['Region'] = df['Region'].mask(df['Region'] == 'Irak', 'Iraq')
df['Region'] = df['Region'].mask(df['Region'] == 'Irland', 'Ireland')
df['Region'] = df['Region'].mask(df['Region'] == 'Island', 'Iceland')
df['Region'] = df['Region'].mask(df['Region'] == 'Italien', 'Italy')
df['Region'] = df['Region'].mask(df['Region'] == 'Jamaika', 'Jamaica')
df['Region'] = df['Region'].mask(df['Region'] == 'Jemen', 'Yemen')
df['Region'] = df['Region'].mask(df['Region'] == 'Jordanien', 'Jordan')
df['Region'] = df['Region'].mask(df['Region'] == 'Kambodscha', 'Cambodia')
df['Region'] = df['Region'].mask(df['Region'] == 'Kamerun', 'Cameroon')
df['Region'] = df['Region'].mask(df['Region'] == 'Kanada', 'Canada')
df['Region'] = df['Region'].mask(df['Region'] == 'Kazachstan', 'Kazakhstan')
df['Region'] = df['Region'].mask(df['Region'] == 'Kap Verde', 'Cape Verde')
df['Region'] = df['Region'].mask(df['Region'] == 'Katar', 'Qatar')
df['Region'] = df['Region'].mask(df['Region'] == 'Kenia', 'Kenya')
df['Region'] = df['Region'].mask(df['Region'] == 'Kirgisistan', 'Kyrgyzstan')
df['Region'] = df['Region'].mask(df['Region'] == 'Kolumbien', 'Colombia')
df['Region'] = df['Region'].mask(df['Region'] == 'Kroatien', 'Croatia')
df['Region'] = df['Region'].mask(df['Region'] == 'Lao People\'s Democ. Republic', 'Laos')
df['Region'] = df['Region'].mask(df['Region'] == 'Lao People\'s Democratic Republic', 'Laos')
df['Region'] = df['Region'].mask(df['Region'] == 'Lettland', 'Latvia')
df['Region'] = df['Region'].mask(df['Region'] == 'Libanon', 'Lebanon')

In [85]:
df['Region'] = df['Region'].mask(df['Region'] == 'Libyen', 'Libya')
df['Region'] = df['Region'].mask(df['Region'] == 'Litauen', 'Lithuania')
df['Region'] = df['Region'].mask(df['Region'] == 'Luxemburg', 'Luxembourg')
df['Region'] = df['Region'].mask(df['Region'] == 'Malediven', 'Maldives')
df['Region'] = df['Region'].mask(df['Region'] == 'Marokko', 'Morocco')
df['Region'] = df['Region'].mask(df['Region'] == 'Mazedonien', 'Macedonia')
df['Region'] = df['Region'].mask(df['Region'] == 'Mexiko', 'Mexico')
df['Region'] = df['Region'].mask(df['Region'] == 'Moldawien', 'Moldova')
df['Region'] = df['Region'].mask(df['Region'] == 'Republic of Macedonia', 'Macedonia')
df['Region'] = df['Region'].mask(df['Region'] == 'Republic of Moldova', 'Moldova')
df['Region'] = df['Region'].mask(df['Region'] == 'Moldau', 'Moldova')
df['Region'] = df['Region'].mask(df['Region'] == 'Mongolei', 'Mongolia')
df['Region'] = df['Region'].mask(df['Region'] == 'Montenegro', 'Montenegro')
df['Region'] = df['Region'].mask(df['Region'] == 'Myanmar', 'Myanmar')
df['Region'] = df['Region'].mask(df['Region'] == 'Nepal', 'Nepal')
df['Region'] = df['Region'].mask(df['Region'] == 'Neuseeland', 'New Zealand')
df['Region'] = df['Region'].mask(df['Region'] == 'Niederlande', 'Netherlands')
df['Region'] = df['Region'].mask(df['Region'] == 'Nordmazedonien', 'North Macedonia')
df['Region'] = df['Region'].mask(df['Region'] == 'Norwegen', 'Norway')
df['Region'] = df['Region'].mask(df['Region'] == 'Österreich', 'Austria')
df['Region'] = df['Region'].mask(df['Region'] == 'Pakistan', 'Pakistan')
df['Region'] = df['Region'].mask(df['Region'] == 'Panama', 'Panama')
df['Region'] = df['Region'].mask(df['Region'] == 'Paraguay', 'Paraguay')
df['Region'] = df['Region'].mask(df['Region'] == 'Peru', 'Peru')

In [87]:
df['Region'] = df['Region'].mask(df['Region'] == 'Philippinen', 'Philippines')
df['Region'] = df['Region'].mask(df['Region'] == 'Polen', 'Poland')
df['Region'] = df['Region'].mask(df['Region'] == 'Portugal', 'Portugal')
df['Region'] = df['Region'].mask(df['Region'] == 'Ruanda', 'Rwanda')
df['Region'] = df['Region'].mask(df['Region'] == 'Rumänien', 'Romania')
df['Region'] = df['Region'].mask(df['Region'] == 'Russland', 'Russia')
df['Region'] = df['Region'].mask(df['Region'] == 'Saudi-Arabien', 'Saudi Arabia')
df['Region'] = df['Region'].mask(df['Region'] == 'Schweden', 'Sweden')
df['Region'] = df['Region'].mask(df['Region'] == 'Schweiz', 'Switzerland')
df['Region'] = df['Region'].mask(df['Region'] == 'Senegal', 'Senegal')
df['Region'] = df['Region'].mask(df['Region'] == 'Serbien', 'Serbia')
df['Region'] = df['Region'].mask(df['Region'] == 'Singapur', 'Singapore')
df['Region'] = df['Region'].mask(df['Region'] == 'Slowakei', 'Slovakia')
df['Region'] = df['Region'].mask(df['Region'] == 'Slowenien', 'Slovenia')
df['Region'] = df['Region'].mask(df['Region'] == 'Spanien', 'Spain')
df['Region'] = df['Region'].mask(df['Region'] == 'Sri Lanka', 'Sri Lanka')
df['Region'] = df['Region'].mask(df['Region'] == 'Südafrika', 'South Africa')
df['Region'] = df['Region'].mask(df['Region'] == 'Sudan', 'Sudan')
df['Region'] = df['Region'].mask(df['Region'] == 'Südkorea', 'South Korea')
df['Region'] = df['Region'].mask(df['Region'] == 'Syrien', 'Syria')
df['Region'] = df['Region'].mask(df['Region'] == 'Tadschikistan', 'Tajikistan')

In [89]:
df['Region'] = df['Region'].mask(df['Region'] == 'Salomonen', 'Solomon Islands')
df['Region'] = df['Region'].mask(df['Region'] == 'Sambia', 'Zambia')
df['Region'] = df['Region'].mask(df['Region'] == 'Saudia Arabia', 'Saudi Arabia')
df['Region'] = df['Region'].mask(df['Region'] == 'Seychelles', 'Seychelles')
df['Region'] = df['Region'].mask(df['Region'] == 'Sierra Leone', 'Sierra Leone')
df['Region'] = df['Region'].mask(df['Region'] == 'St. Vincent u. d. Grenadinen', 'Saint Vincent and the Grenadines')
df['Region'] = df['Region'].mask(df['Region'] == 'St. Vincent u. die Grenadinen', 'Saint Vincent and the Grenadines')
df['Region'] = df['Region'].mask(df['Region'] == 'St. Vincent und d. Grenadinen', 'Saint Vincent and the Grenadines')
df['Region'] = df['Region'].mask(df['Region'] == 'Surinam', 'Suriname')
df['Region'] = df['Region'].mask(df['Region'] == 'Swasiland', 'Eswatini')
df['Region'] = df['Region'].mask(df['Region'] == 'Swaziland', 'Eswatini')
df['Region'] = df['Region'].mask(df['Region'] == 'São Tomé und Príncipe', 'São Tomé and Príncipe')
df['Region'] = df['Region'].mask(df['Region'] == 'T. f. Yugo. Rep. of Macedonia', 'North Macedonia')
df['Region'] = df['Region'].mask(df['Region'] == 'Tansania', 'Tanzania')
df['Region'] = df['Region'].mask(df['Region'] == 'Timor-Leste', 'East Timor')
df['Region'] = df['Region'].mask(df['Region'] == 'Trinidad und Tobago', 'Trinidad and Tobago')
df['Region'] = df['Region'].mask(df['Region'] == 'Tschad', 'Chad')
df['Region'] = df['Region'].mask(df['Region'] == 'Tschechische Republik', 'Czech Republic')
df['Region'] = df['Region'].mask(df['Region'] == 'Ver. Arabische Emirate', 'United Arab Emirates')
df['Region'] = df['Region'].mask(df['Region'] == 'Ver. Staaten von Amerika', 'United States')
df['Region'] = df['Region'].mask(df['Region'] == 'Vereinigte Arabisch Emirate', 'United Arab Emirates')
df['Region'] = df['Region'].mask(df['Region'] == 'Vereinigte Staaten v. A.', 'United States')
df['Region'] = df['Region'].mask(df['Region'] == 'Vereinigte Staaten von Amerika', 'United States')
df['Region'] = df['Region'].mask(df['Region'] == 'Viet Nam', 'Vietnam')
df['Region'] = df['Region'].mask(df['Region'] == 'Zentralafrik. Republik', 'Central African Republic')
df['Region'] = df['Region'].mask(df['Region'] == 'Zentralafrikanische Republik', 'Central African Republic')
df['Region'] = df['Region'].mask(df['Region'] == 'Ägypten', 'Egypt')
df['Region'] = df['Region'].mask(df['Region'] == 'Äquatorialguinea', 'Equatorial Guinea')

In [91]:
df['Region'] = df['Region'].mask(df['Region'] == 'Brunei Darussalam', 'Brunei')
df['Region'] = df['Region'].mask(df['Region'] == 'Cape Verde', 'Cabo Verde')
df['Region'] = df['Region'].mask(df['Region'] == "Cote d'Ivoire", 'Ivory Coast')
df['Region'] = df['Region'].mask(df['Region'] == 'East Timor', 'Timor-Leste')
df['Region'] = df['Region'].mask(df['Region'] == 'Fps. States of Micronesia', 'Micronesia')
df['Region'] = df['Region'].mask(df['Region'] == 'Komoren', 'Comoros')
df['Region'] = df['Region'].mask(df['Region'] == 'Kongo', 'Congo')
df['Region'] = df['Region'].mask(df['Region'] == 'Kuba', 'Cuba')
df['Region'] = df['Region'].mask(df['Region'] == 'Libyan Arab Jamahiriya', 'Libya')
df['Region'] = df['Region'].mask(df['Region'] == 'Macedonia', 'North Macedonia')
df['Region'] = df['Region'].mask(df['Region'] == 'St. Lucia', 'Saint Lucia')
df['Region'] = df['Region'].mask(df['Region'] == 'Syrian Arab Republic', 'Syria')
df['Region'] = df['Region'].mask(df['Region'] == 'São Tomé and Príncipe', 'Sao Tome and Principe')
df['Region'] = df['Region'].mask(df['Region'] == 'United Republic of Tanzania', 'Tanzania')

In [93]:
df['Region'] = df['Region'].mask(df['Region'] == 'Bolivien', 'Bolivia')
df['Region'] = df['Region'].mask(df['Region'] == 'Botsuana', 'Botswana')
df['Region'] = df['Region'].mask(df['Region'] == 'Dschibuti', 'Djibouti')
df['Region'] = df['Region'].mask(df['Region'] == 'Seychellen', 'Seychelles')
df['Region'] = df['Region'].mask(df['Region'] == 'Simbabwe', 'Zimbabwe')
df['Region'] = df['Region'].mask(df['Region'] == 'Tunesien', 'Tunisia')
df['Region'] = df['Region'].mask(df['Region'] == 'Ungarn', 'Hungary')
df['Region'] = df['Region'].mask(df['Region'] == 'Usbekistan', 'Uzbekistan')
df['Region'] = df['Region'].mask(df['Region'] == 'Vereinigte Arabische Emirate', 'United Arab Emirates')
df['Region'] = df['Region'].mask(df['Region'] == 'Zypern', 'Cyprus')
df['Region'] = df['Region'].mask(df['Region'] == 'Iran (Islamic Republic of)', 'Iran')
df['Region'] = df['Region'].mask(df['Region'] == 'Timor-Leste', 'East Timor')

In [95]:
df['Region'] = df['Region'].mask(df['Region'] == 'Fidschi', 'Fiji')
df['Region'] = df['Region'].mask(df['Region'] == 'Gabun', 'Gabon')
df['Region'] = df['Region'].mask(df['Region'] == 'Kasachstan', 'Kazakhstan')
df['Region'] = df['Region'].mask(df['Region'] == 'Madagaskar', 'Madagascar')
df['Region'] = df['Region'].mask(df['Region'] == 'Mauretanien', 'Mauritania')
df['Region'] = df['Region'].mask(df['Region'] == 'Mongolien', 'Mongolia')
df['Region'] = df['Region'].mask(df['Region'] == 'Mosambik', 'Mozambique')
df['Region'] = df['Region'].mask(df['Region'] == 'Papua-Neuguinea', 'Papua New Guinea')
df['Region'] = df['Region'].mask(df['Region'] == 'Romänien', 'Romania')
df['Region'] = df['Region'].mask(df['Region'] == 'Russische Föderation', 'Russia')
df['Region'] = df['Region'].mask(df['Region'] == 'Türkei', 'Turkey')
df['Region'] = df['Region'].mask(df['Region'] == 'Vereinigtes Königreich', 'United Kingdom')
df['Region'] = df['Region'].mask(df['Region'] == 'Weißrussland', 'Belarus')

In [97]:
print(df['Region'].drop_duplicates().sort_values())

14                          Afghanistan
38                              Albania
60                              Algeria
61                               Angola
692                 Antigua and Barbuda
129                           Argentina
83                              Armenia
118                           Australia
143                             Austria
87                           Azerbaijan
117                             Bahamas
168                             Bahrain
5                            Bangladesh
160                            Barbados
148                             Belarus
139                             Belgium
101                              Belize
35                                Benin
17                               Bhutan
111                             Bolivia
96               Bosnia and Herzegovina
107                            Botswana
120                              Brazil
13                               Brunei
126                            Bulgaria


In [51]:
df.shape

(1915, 12)

# 03. Describing Data

In [9]:
df.head()

Unnamed: 0,Region,WRI,Exposure,Vulnerability,Susceptibility,Lack of Coping Capabilities,Lack of Adaptive Capacities,Year,Exposure Category,WRI Category,Vulnerability Category,Susceptibility Category
0,Vanuatu,32.0,56.33,56.81,37.14,79.34,53.96,2011,Very High,Very High,High,High
1,Tonga,29.08,56.04,51.9,28.94,81.8,44.97,2011,Very High,Very High,Medium,Medium
2,Philippinen,24.32,45.09,53.93,34.99,82.78,44.01,2011,Very High,Very High,High,High
3,Salomonen,23.51,36.4,64.6,44.11,85.95,63.74,2011,Very High,Very High,Very High,High
4,Guatemala,20.88,38.42,54.35,35.36,77.83,49.87,2011,Very High,Very High,High,High


- **Region** - Country
- **WRI** - World Risk Index; in this dataset the values match Exposure * Vulnerability / 100 (for example, Vanuatu 2011: 56.33 * 56.81 / 100 ≈ 32.00)
- **Exposure** - Occurrences of natural hazard risk
- **Vulnerability** - Societal vulnerability, made up of susceptibility, lack of coping capacities, and lack of adaptive capacities
- **Susceptibility** - Structural characteristics and conditions of a society that make it likely to suffer in a disaster situation
- **Lack of Coping Capabilities** - Government preparedness, early warning, medical care, and social and material security infrastructure
- **Lack of Adaptive Capacities** - Adaptability to natural events, climate change, and other challenges
- **Year** - year
- **Exposure Category** - Categorical exposure score
- **WRI Category** - WRI categorical score
- **Vulnerability Category** - Vulnerability Categorical Score
- **Susceptibility Category** - Susceptibility Categorical Score
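
The relationship between the score columns can be checked against the five rows shown by `df.head()` above. This is a minimal sketch: it only confirms that these five rows are consistent with WRI = Exposure * Vulnerability / 100 and with Vulnerability being the mean of its three components, to within rounding. The 0.01 tolerance is an assumption based on the two-decimal rounding in the data.

```python
import pandas as pd

# The five rows shown by df.head() above.
head = pd.DataFrame({
    'Region': ['Vanuatu', 'Tonga', 'Philippinen', 'Salomonen', 'Guatemala'],
    'WRI': [32.0, 29.08, 24.32, 23.51, 20.88],
    'Exposure': [56.33, 56.04, 45.09, 36.4, 38.42],
    'Vulnerability': [56.81, 51.9, 53.93, 64.6, 54.35],
    'Susceptibility': [37.14, 28.94, 34.99, 44.11, 35.36],
    'Lack of Coping Capabilities': [79.34, 81.8, 82.78, 85.95, 77.83],
    'Lack of Adaptive Capacities': [53.96, 44.97, 44.01, 63.74, 49.87],
})

# Candidate formula for WRI: product of exposure and vulnerability, scaled by 100.
wri_check = head['Exposure'] * head['Vulnerability'] / 100

# Candidate formula for Vulnerability: mean of its three components.
components = ['Susceptibility', 'Lack of Coping Capabilities', 'Lack of Adaptive Capacities']
vulnerability_check = head[components].mean(axis=1)

# Largest absolute deviation from the published values.
max_wri_error = (wri_check - head['WRI']).abs().max()
max_vulnerability_error = (vulnerability_check - head['Vulnerability']).abs().max()

print(max_wri_error < 0.01, max_vulnerability_error < 0.01)
```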

In [99]:
df['Year'].value_counts()

Year
2021    181
2020    181
2019    180
2011    173
2013    173
2012    173
2018    172
2015    171
2017    171
2014    170
2016    170
Name: count, dtype: int64

11 years in total. Not every country is recorded in every year.

In [101]:
df['Region'].value_counts()

Region
Vanuatu                             11
Australia                           11
Turkey                              11
Bolivia                             11
Jordan                              11
Iran                                11
Lebanon                             11
Moldova                             11
Italy                               11
Bahamas                             11
New Zealand                         11
Hungary                             11
Brazil                              11
Ireland                             11
Czech Republic                      11
Paraguay                            11
United Arab Emirates                11
Bulgaria                            11
Kazakhstan                          11
Uruguay                             11
Serbia                              11
Botswana                            11
Yemen                               11
Bosnia and Herzegovina              11
Equatorial Guinea                   11
Trinidad and Tobag

Expect a maximum of 11 rows per country, one for each of the 11 years.
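
Countries with fewer rows than there are years can be listed directly from the counts. A minimal sketch on a toy frame (the regions `A`, `B`, `C` and their years are made up for illustration):

```python
import pandas as pd

# Toy data: region A appears in all three years, B and C do not.
demo = pd.DataFrame({
    'Region': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Year': [2011, 2012, 2013, 2011, 2013, 2012],
})

# Number of distinct years in the data (11 in the real dataset).
expected_years = demo['Year'].nunique()

# Regions recorded in fewer years than expected.
counts = demo['Region'].value_counts()
incomplete = counts[counts < expected_years].sort_index()

print(incomplete.to_dict())
```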

In [20]:
df['Exposure Category'].value_counts()

Exposure Category
Very Low     393
Very High    383
Medium       381
Low          381
High         379
Name: count, dtype: int64

In [54]:
df.groupby('Exposure Category')['Exposure'].agg(['min','max']) #Get ranges per category

Unnamed: 0_level_0,min,max
Exposure Category,Unnamed: 1_level_1,Unnamed: 2_level_1
High,14.02,19.75
Low,9.16,12.3
Medium,11.53,14.83
Very High,17.54,99.88
Very Low,0.05,9.71


In [22]:
df['WRI Category'].value_counts()

WRI Category
Very Low     393
Very High    383
Medium       383
Low          379
High         378
Name: count, dtype: int64

In [56]:
df.groupby('WRI Category')['WRI'].agg(['min','max']) #Get ranges per category

Unnamed: 0_level_0,min,max
WRI Category,Unnamed: 1_level_1,Unnamed: 2_level_1
High,7.12,11.13
Low,3.23,5.8
Medium,5.46,7.71
Very High,10.3,56.71
Very Low,0.02,3.65


In [24]:
df['Vulnerability Category'].value_counts()

Vulnerability Category
Very Low     386
Medium       384
Very High    382
Low          381
High         380
Name: count, dtype: int64

In [58]:
df.groupby('Vulnerability Category')['Vulnerability'].agg(['min','max']) #Get ranges per category

Unnamed: 0_level_0,min,max
Vulnerability Category,Unnamed: 1_level_1,Unnamed: 2_level_1
High,47.98,63.76
Low,32.07,46.5
Medium,40.93,53.49
Very High,61.5,76.47
Very Low,20.97,36.81


In [26]:
df['Susceptibility Category'].value_counts()

Susceptibility Category
Very Low     390
Very High    383
Medium       382
High         381
Low          381
Name: count, dtype: int64

In [60]:
df.groupby('Susceptibility Category')['Susceptibility'].agg(['min','max']) #Get ranges per category

Unnamed: 0_level_0,min,max
Susceptibility Category,Unnamed: 1_level_1,Unnamed: 2_level_1
High,27.94,49.02
Low,16.52,22.89
Medium,20.69,33.17
Very High,44.93,70.83
Very Low,8.26,17.29


In [103]:
df.describe()

Unnamed: 0,WRI,Exposure,Vulnerability,Susceptibility,Lack of Coping Capabilities,Lack of Adaptive Capacities,Year
count,1915.0,1915.0,1915.0,1915.0,1915.0,1915.0,1915.0
mean,7.549368,15.380538,48.10165,30.739384,70.471023,43.094663,2016.050653
std,5.553269,10.239401,13.816607,15.666927,15.010563,13.553476,3.183362
min,0.02,0.05,20.97,8.26,35.16,11.16,2011.0
25%,3.74,10.14,37.08,17.795,59.39,33.185,2013.0
50%,6.52,12.76,47.1,25.4,74.23,43.09,2016.0
75%,9.385,16.45,60.115,42.625,83.01,53.07,2019.0
max,56.71,99.88,76.47,70.83,94.36,76.11,2021.0


The dataset has 1915 rows. WRI ranges from 0.02 to 56.71, with a mean of 7.55 that sits above the median of 6.52 and a 75th percentile of only 9.385. This indicates right-skewed data: most countries have a low risk score, and a few have very high ones.
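
Skew can also be measured rather than read off the summary table. A minimal sketch, assuming a small made-up series (`scores` is hypothetical, not the WRI column), that compares mean and median and computes the sample skewness with `Series.skew()`:

```python
import pandas as pd

# Hypothetical scores with one large value pulling the mean upward.
scores = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 50.0])

mean = scores.mean()
median = scores.median()
skew = scores.skew()  # positive values indicate a longer right tail

print(mean > median, skew > 0)
```

Applied to the real data, the same calls on `df['WRI']` would give a number to back up the right-skew reading.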

In [32]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1917 entries, 0 to 1916
Data columns (total 12 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Region                        1917 non-null   object 
 1   WRI                           1917 non-null   float64
 2   Exposure                      1917 non-null   float64
 3   Vulnerability                 1917 non-null   float64
 4   Susceptibility                1917 non-null   float64
 5   Lack of Coping Capabilities   1917 non-null   float64
 6    Lack of Adaptive Capacities  1916 non-null   float64
 7   Year                          1917 non-null   int64  
 8   Exposure Category             1917 non-null   object 
 9   WRI Category                  1916 non-null   object 
 10  Vulnerability Category        1913 non-null   object 
 11  Susceptibility Category       1917 non-null   object 
dtypes: float64(6), int64(1), object(5)
memory usage: 179.8+ KB


# 04. Ethical Considerations

This dataset is current only through 2021, so its risk index scores may not reflect a country's present situation. It should serve only as an indicator for possible future analyses of specific countries and of ways to improve their risk scores. Users should research each country individually rather than assume that the scores in this study reveal the specific kinds of aid needed. The scores provide a good overview, but not the answers needed for informed solutions at the country level.

# 05. Write Data

In [107]:
df.to_csv(os.path.join(path,'02 Data','CleanedData', 'world_risk_index_cleaned.csv'), index = False)
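
`to_csv` does not create missing folders, so the write fails if the `CleanedData` folder does not exist yet. A minimal sketch of creating it first; a temporary directory stands in for the project path and `demo` is a one-row placeholder frame, both assumptions for illustration:

```python
import os
import tempfile

import pandas as pd

# Placeholder frame standing in for the cleaned dataset.
demo = pd.DataFrame({'Region': ['Vanuatu'], 'WRI': [32.0]})

# Temporary directory used in place of the real project path.
base = tempfile.mkdtemp()
out_dir = os.path.join(base, '02 Data', 'CleanedData')

# Create the output folder (and any parents) if it does not already exist.
os.makedirs(out_dir, exist_ok=True)

out_file = os.path.join(out_dir, 'world_risk_index_cleaned.csv')
demo.to_csv(out_file, index=False)

# Read the file back to confirm the write succeeded.
reloaded = pd.read_csv(out_file)
print(reloaded.shape)
```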