## Using functions from scripts

One can import classes and functions from a script in the same folder by using the name of the file without the *.py* extension. In this case the file is *tools.py*, so we import it as shown below. It is also possible to set an alias for ease of use.

In [1]:
import tools as tls

The data cleaning functions are methods of the **data_cleaner** class. To use them, we create an instance of the class and call the specific methods when they are needed.

In [2]:
cleaner = tls.data_cleaner()
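
The import-and-instantiate pattern above can be tried in isolation with the minimal sketch below. Since *tools.py* itself is not shown here, the sketch writes a hypothetical stand-in module (`tools_demo.py`, with a made-up `strip_codes` method) to a temporary folder; these names are assumptions for illustration and not part of the real script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for tools.py; the real file is not shown in this notebook.
module_source = '''
class data_cleaner:
    """Toy class that groups cleaning helpers as methods."""

    def strip_codes(self, codes):
        # Example method: normalise country codes.
        return [c.strip().upper() for c in codes]
'''

demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "tools_demo.py").write_text(module_source)

# Python looks for modules in the folders listed in sys.path. A notebook's
# working directory is normally on that list, which is why a plain
# `import tools` finds a script sitting next to the notebook.
sys.path.insert(0, str(demo_dir))
importlib.invalidate_caches()

import tools_demo as tls_demo  # alias, mirroring `import tools as tls`

# Create one object and call its methods when needed.
demo_cleaner = tls_demo.data_cleaner()
print(demo_cleaner.strip_codes([" afg", "ago "]))
```

If only the class is needed, `from tools import data_cleaner` would be an equivalent alternative to importing the whole module under an alias.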

### Getting projected temperatures - 2020-2039

In [3]:
df_temp_proj = cleaner.temp_proj('raw data/tas_2020_2039_mavg_rcp26_AFG_CAF.csv')

In [None]:
df_temp_proj.to_csv('clean data/projected_temp_2020-2039.csv')
df_temp_proj

Unnamed: 0,Country,Temperature (°C)
0,AFG,12.369804
1,AGO,20.311982
2,ALB,14.062129
3,AND,12.352881
4,ARE,24.896902
...,...,...
188,XRK,11.533050
189,YEM,22.247077
190,ZAF,15.588705
191,ZMB,19.578873

### Getting climate factors 2013-2017

In [4]:
df_rain, df_temp, df_water, df_climate_factors = cleaner.climate_factors(
    'raw data/WORLDBANK_rainfall.csv',
    'raw data/WORLDBANK_temperature.csv',
    'raw data/AQUASTAT_water_resources.csv',
)

In [5]:
df_climate_factors.to_csv('clean data/climate_factors.csv')
df_climate_factors

Unnamed: 0,Country,Temperature (°C),Total Rainfall (mm),Total internal renewable water resources (IRWR),Total external renewable water resources (ERWR),Total renewable water resources,Dependency ratio,Total exploitable water resources
0,AFG,14.074742,349.736945,47.1500,18.18,65.3300,28.722600,
1,AGO,22.182196,960.024065,148.0000,0.40,148.4000,0.269542,
2,ALB,12.754647,1079.459168,26.9000,3.30,30.2000,10.927152,13.000
3,AND,12.402212,760.241065,0.3156,,0.3156,,
4,ARE,28.010773,64.449765,0.1500,0.00,0.1500,0.000000,
...,...,...,...,...,...,...,...,...
197,WSM,27.578074,3162.300825,,0.00,0.0000,0.000000,
198,YEM,24.211854,161.796177,2.1000,0.00,2.1000,0.000000,
199,ZAF,18.620716,403.002933,44.8000,6.55,51.3500,12.840467,11.972
200,ZMB,22.412762,895.648672,80.2000,24.60,104.8000,23.473282,


### Getting water stress indicators 2013-2017

In [6]:
df_waterstress = cleaner.water_stress('raw data/AQUASTAT_water_stress.csv')

In [None]:
df_waterstress.to_csv('clean data/water_stress.csv')
df_waterstress

Variable Name,Country,Water stress (MDG),Water use efficiency (SDG),Water stress (SDG)
0,AFG,31.045462,0.923778,54.757019
1,AGO,0.475539,142.467836,1.871883
2,ALB,3.933775,6.656907,7.139423
3,ARE,1708.000000,92.773763,1708.000000
4,ARG,4.301333,13.616564,10.456664
...,...,...,...,...
175,VNM,9.259150,2.349448,18.130315
176,YEM,169.761905,5.219357,169.761905
177,ZAF,37.740993,14.659097,62.055716
178,ZMB,1.500000,12.764894,2.835498

### Getting socioeconomic factors

In [7]:
df_aqua, df_unicef = cleaner.socioecon_factors('raw data/aquastat_socio_economic.csv', 'raw data/unicef_socio_economic.csv')

In [12]:
df_aqua.to_csv('clean data/aqua.csv')
df_unicef.to_csv('clean data/unicef.csv')

In [8]:
df_unicef

Unnamed: 0,Indicator,Country,Time,Value
0,"Fertility rate, total (births per woman)",AUS,2000,1.8
1,"Fertility rate, total (births per woman)",AUS,2001,1.7
2,"Fertility rate, total (births per woman)",AUS,2002,1.8
3,"Fertility rate, total (births per woman)",AUS,2003,1.7
4,"Fertility rate, total (births per woman)",AUS,2004,1.8
...,...,...,...,...
133768,"Prevalence of HIV, total (% of population ages...",LBY,2015,0.2
133769,"Prevalence of HIV, total (% of population ages...",LBY,2016,0.2
133770,"Prevalence of HIV, total (% of population ages...",LBY,2017,0.2
133771,"Prevalence of HIV, total (% of population ages...",LBY,2018,0.2


In [9]:
df_aqua

Unnamed: 0,Country,Variable,1998-2002,2003-2007,2008-2012,2013-2017
0,AFG,Rural population (1000 inhab),17086.91,20464.923,23280.663,26558.609
1,AFG,Urban population (1000 inhab),4893.013,6151.869,7416.295,8971.472
2,AFG,Population density (inhab/km2),34.6180957633,41.5104861686,47.73056398,55.5955534111
3,AFG,GDP per capita (current US$/inhab),194.958382,389.985586,694.885618,605.557362
4,AFG,Human Development Index (HDI) [highest = 1] (-),0.378,0.431,0.479,0.493
...,...,...,...,...,...,...
2181,ZMB,GDP per capita (current US$/inhab),377.129939,1104.582234,1690.361466,1513.27609
2182,ZMB,Human Development Index (HDI) [highest = 1] (-),0.445,0.492,0.552,0.589
2185,ZMB,Total population with access to safe drinking-...,54.7,59.0,63.0,65.4
2186,ZMB,Rural population with access to safe drinking-...,36.9,42.5,48.0,51.3


## Checking countries

We need to see which country codes are shared across all datasets. The AQUA and UNICEF datasets have already been checked against each other, so we can use just one of the two.
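
One way to confirm that two datasets cover the same country codes is to compare their code sets directly. The sketch below uses toy stand-ins for the two frames (`aqua_demo` and `unicef_demo` are hypothetical names with made-up rows), so it illustrates the check rather than reproducing how the original cross-check was done.

```python
import pandas as pd

# Toy stand-ins for the two socioeconomic frames; rows are made up for illustration.
aqua_demo = pd.DataFrame({"Country": ["AFG", "AGO", "ALB", "AFG"]})
unicef_demo = pd.DataFrame({"Country": ["ALB", "AFG", "AGO"]})

aqua_codes = set(aqua_demo["Country"])
unicef_codes = set(unicef_demo["Country"])

# The symmetric difference holds codes that appear in only one of the frames.
# An empty result means either frame can stand in for both.
only_in_one = aqua_codes ^ unicef_codes
print(only_in_one)
```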

In [None]:
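# df_temp and df_water are the frames returned by cleaner.climate_factors above.
# Assumption: they hold the historical temperature and water resources data.
set_temp = set.intersection(set(df_temp['Country']), set(df_temp_proj['Country']))
set_climate = set.intersection(set(df_rain['Country']), set(df_water['Country']))
shared_countries = set.intersection(set_temp, set_climate, set(df_aqua['Country']))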

In [None]:
len(shared_countries)

Thus, 135 countries are shared across the datasets. We will write a separate *.csv* file that lists all these countries.

In [None]:
import pandas as pd
# Sort the codes so the file is reproducible (set order is arbitrary).
shared = pd.DataFrame(sorted(shared_countries), columns=['Country'])
shared.to_csv('shared_country_codes.csv')
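
Once the shared codes are known, each dataset can be restricted to them. The sketch below shows one possible way to do this with `Series.isin`, using a toy frame and a toy shared set (`temp_demo` and `shared_demo` are hypothetical names, and the values are made up); it is not necessarily how the later steps of this project filter the data.

```python
import pandas as pd

# Toy stand-ins; the codes and values are made up for illustration.
shared_demo = {"AFG", "ALB"}
temp_demo = pd.DataFrame(
    {"Country": ["AFG", "AGO", "ALB"], "Temperature (°C)": [12.4, 20.3, 14.1]}
)

# Keep only the rows whose country code is in the shared set.
filtered = temp_demo[temp_demo["Country"].isin(shared_demo)].reset_index(drop=True)

# Codes present in this dataset but absent from the shared set, as a sanity check.
dropped = sorted(set(temp_demo["Country"]) - shared_demo)

print(filtered)
print(dropped)
```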