## Using functions from scripts

One can import classes and functions from a script in the same folder by using the file name without the *.py* extension. In this case the file is *tools.py*, so we import it as follows. It is also possible to set an alias for ease of use.

In [1]:
import tools as tls
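
As a minimal, self-contained sketch of the same mechanism, the snippet below writes a throwaway module to a temporary folder, makes that folder importable, and imports the module under an alias. The module name `demo_tools` and its `double` function are hypothetical stand-ins and are not part of *tools.py*.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway folder holding a tiny script (hypothetical stand-in for tools.py).
folder = Path(tempfile.mkdtemp())
(folder / "demo_tools.py").write_text(
    "def double(x):\n"
    "    return 2 * x\n"
)

# A notebook can import scripts from its working directory because that
# directory is on the module search path; here the temporary folder is
# added explicitly to get the same effect.
sys.path.insert(0, str(folder))
importlib.invalidate_caches()

import demo_tools as dt  # the alias keeps later calls short

result = dt.double(21)
print(result)
```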

The data cleaning functions are methods of the **data_cleaner** class. To use them, we create an instance of the class and call its methods as they are needed.

In [2]:
cleaner = tls.data_cleaner()

We can access the methods in the same way as for any other object from other packages. The docstring explains what each function does and shows the arguments that need to be provided.
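
One way to read a docstring and the expected arguments programmatically is the standard `inspect` module, sketched below. The class `DemoCleaner` is a hypothetical stand-in defined only for this example; the real `data_cleaner` methods may take different arguments.

```python
import inspect


class DemoCleaner:
    """Hypothetical stand-in for a cleaner class; not the real data_cleaner."""

    def temp_hist(self, path):
        """Load historical temperatures from the CSV file at `path`."""
        return path


demo = DemoCleaner()

# The docstring is the text that help() and Jupyter's `?` display.
doc = inspect.getdoc(demo.temp_hist)

# The signature lists the arguments the caller must supply (self is already bound).
sig = str(inspect.signature(demo.temp_hist))

print(doc)
print(sig)
```

In a notebook, `help(cleaner.temp_hist)` or `cleaner.temp_hist?` shows the same information interactively.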

### Getting the historical temperatures

In [14]:
df_temp_hist = cleaner.temp_hist('tas_1991_2016_all.csv')

df_temp_hist.dtypes

Temperature    float64
Year             int64
Month            int64
Name            object
Country         object
dtype: object

### Getting projected temperatures

In [4]:
df_temp_proj = cleaner.temp_proj('tas_2020_2039_mavg_rcp26_AFG_CAF.csv')
df_temp_proj

Unnamed: 0,Temperature,Year,Model,Month,Name,Country
0,-0.722797,2020-2039,bcc_csm1_1,1,Afghanistan,AFG
1,1.788925,2020-2039,bcc_csm1_1,2,Afghanistan,AFG
2,6.647932,2020-2039,bcc_csm1_1,3,Afghanistan,AFG
3,13.529653,2020-2039,bcc_csm1_1,4,Afghanistan,AFG
4,18.811602,2020-2039,bcc_csm1_1,5,Afghanistan,AFG
...,...,...,...,...,...,...
45139,1.701721,2020-2039,Ensemble (90th Percentile),8,Zimbabwe,ZWE
45140,1.770244,2020-2039,Ensemble (90th Percentile),9,Zimbabwe,ZWE
45141,2.253327,2020-2039,Ensemble (90th Percentile),10,Zimbabwe,ZWE
45142,2.153784,2020-2039,Ensemble (90th Percentile),11,Zimbabwe,ZWE


### Getting climate factors

In [5]:
df_rain, df_inflow = cleaner.climate_factors('raw data/WORLDBANK_rainfall.csv', 'raw data/AQUASTAT_water_resources.csv')

In [6]:
df_rain

Unnamed: 0,Total Rainfall (mm),Year,Statistics,Country
0,64.77650,1991,Jan Average,AFG
1,59.40250,1991,Feb Average,AFG
2,119.62500,1991,Mar Average,AFG
3,51.80250,1991,Apr Average,AFG
4,57.24380,1991,May Average,AFG
...,...,...,...,...
61147,2.03926,2016,Aug Average,ZWE
61148,0.48070,2016,Sep Average,ZWE
61149,9.13410,2016,Oct Average,ZWE
61150,72.95080,2016,Nov Average,ZWE


In [7]:
df_inflow

Country,Total internal renewable water resources (IRWR),Total external renewable water resources (ERWR),Total renewable water resources,Dependency ratio,Total exploitable water resources,Country
AFG,47.1500,18.18,65.3300,28.722600,,AFG
AGO,148.0000,0.40,148.4000,0.269542,,AGO
ALB,26.9000,3.30,30.2000,10.927152,13.000,ALB
AND,0.3156,,0.3156,,,AND
ARE,0.1500,0.00,0.1500,0.000000,,ARE
...,...,...,...,...,...,...
WSM,,0.00,0.0000,0.000000,,WSM
YEM,2.1000,0.00,2.1000,0.000000,,YEM
ZAF,44.8000,6.55,51.3500,12.840467,11.972,ZAF
ZMB,80.2000,24.60,104.8000,23.473282,,ZMB


### Adjusting time frame

We have a function within the tools script that provides the Yearly, Monthly, and Annual Average of the rainfall statistic.

It is a module-level function rather than a method of the cleaner, so it is called using the script alias and the function name.

In [8]:
df_rain_adj = tls.rainfall_time(df_rain, adjustment = 'Yearly')
df_rain_adj

Country,Year,Total Rainfall (mm)
AFG,1991,435.44990
AFG,1992,408.15623
AFG,1993,317.08530
AFG,1994,342.22238
AFG,1995,300.89815
...,...,...
ZWE,2012,543.74841
ZWE,2013,614.81309
ZWE,2014,607.29762
ZWE,2015,390.63457
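
The source of `rainfall_time` is not shown here, so the following is only a sketch of how a yearly adjustment could be computed with pandas, assuming that 'Yearly' means summing the monthly values per country and year. The function name `yearly_rainfall` and the sample numbers are made up for illustration.

```python
import pandas as pd

# Made-up sample in the same layout as df_rain (values are illustrative only).
sample = pd.DataFrame({
    "Total Rainfall (mm)": [10.0, 20.0, 30.0, 5.0, 15.0],
    "Year": [1991, 1991, 1992, 1991, 1991],
    "Statistics": ["Jan Average", "Feb Average", "Jan Average",
                   "Jan Average", "Feb Average"],
    "Country": ["AFG", "AFG", "AFG", "ZWE", "ZWE"],
})


def yearly_rainfall(df):
    """Sum rainfall per country and year (assumed meaning of 'Yearly')."""
    return df.groupby(["Country", "Year"])[["Total Rainfall (mm)"]].sum()


yearly = yearly_rainfall(sample)
print(yearly)
```

Grouping on both columns yields a two-level index of country and year, which matches the shape of the output shown above.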


### Getting socio-economic factors

In [9]:
aqua = 'raw data/aquastat_socio_economic.csv'
unicef = 'raw data/unicef_socio_economic.csv'
df_aqua, df_unicef = cleaner.socioecon_factors(aqua, unicef)

In [10]:
df_unicef

Unnamed: 0,Indicator,Country,Time,Value
0,"Fertility rate, total (births per woman)",AUS,2000,1.8
1,"Fertility rate, total (births per woman)",AUS,2001,1.7
2,"Fertility rate, total (births per woman)",AUS,2002,1.8
3,"Fertility rate, total (births per woman)",AUS,2003,1.7
4,"Fertility rate, total (births per woman)",AUS,2004,1.8
...,...,...,...,...
133768,"Prevalence of HIV, total (% of population ages...",LBY,2015,0.2
133769,"Prevalence of HIV, total (% of population ages...",LBY,2016,0.2
133770,"Prevalence of HIV, total (% of population ages...",LBY,2017,0.2
133771,"Prevalence of HIV, total (% of population ages...",LBY,2018,0.2


In [11]:
df_aqua

Unnamed: 0,Country,Variable,1998-2002,2003-2007,2008-2012,2013-2017
0,AFG,Rural population (1000 inhab),17086.91,20464.923,23280.663,26558.609
1,AFG,Urban population (1000 inhab),4893.013,6151.869,7416.295,8971.472
2,AFG,Population density (inhab/km2),34.6180957633,41.5104861686,47.73056398,55.5955534111
3,AFG,GDP per capita (current US$/inhab),194.958382,389.985586,694.885618,605.557362
4,AFG,Human Development Index (HDI) [highest = 1] (-),0.378,0.431,0.479,0.493
...,...,...,...,...,...,...
2181,ZMB,GDP per capita (current US$/inhab),377.129939,1104.582234,1690.361466,1513.27609
2182,ZMB,Human Development Index (HDI) [highest = 1] (-),0.445,0.492,0.552,0.589
2185,ZMB,Total population with access to safe drinking-...,54.7,59.0,63.0,65.4
2186,ZMB,Rural population with access to safe drinking-...,36.9,42.5,48.0,51.3


## Checking countries

We need to see which country codes are shared across all datasets. The AQUASTAT and UNICEF datasets have already been checked against each other, so we can use just one of the two.

In [15]:
set_temp = set.intersection(set(df_temp_hist['Country']), set(df_temp_proj['Country']))
set_climate = set.intersection(set(df_rain['Country']), set(df_inflow['Country']))
shared_countries = set.intersection(set_temp, set_climate, set(df_aqua['Country']))
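
To illustrate the logic on a small scale, the sketch below applies the same intersection to three made-up sets of country codes. It also computes the codes that are dropped, which can help when tracking down mismatched codes between datasets. All set names and contents here are invented for the example.

```python
# Made-up country codes standing in for the 'Country' columns of three datasets.
temp_codes = {"AFG", "AGO", "ALB", "ZWE"}
climate_codes = {"AFG", "ALB", "AND", "ZWE"}
socio_codes = {"AFG", "ALB", "ZMB", "ZWE"}

# Codes present in every dataset.
shared = set.intersection(temp_codes, climate_codes, socio_codes)

# The & operator expresses the same intersection.
assert shared == temp_codes & climate_codes & socio_codes

# Codes that appear in at least one dataset but not in all of them.
dropped = (temp_codes | climate_codes | socio_codes) - shared

print(sorted(shared))
print(sorted(dropped))
```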

In [18]:
len(shared_countries)

135

Thus we have 135 countries that are shared across the datasets. We will write a separate *.csv* file that includes all these countries.

In [28]:
import pandas as pd
shared = pd.DataFrame(list(shared_countries))
shared.to_csv('shared_country_codes.csv')
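
A set has no guaranteed order, so the rows of the file written above can come out in a different order from run to run. A possible variant, sketched below with made-up codes and a temporary file, sorts the codes, names the column, and leaves out the index so that the file reads back cleanly. The column name `Country` and the temporary location are assumptions made for this example.

```python
import os
import tempfile

import pandas as pd

codes = {"ZWE", "AFG", "ALB"}  # made-up stand-in for shared_countries

# Sorting makes the file reproducible; naming the column avoids a header of "0".
shared = pd.DataFrame({"Country": sorted(codes)})

path = os.path.join(tempfile.mkdtemp(), "shared_country_codes.csv")
shared.to_csv(path, index=False)  # index=False leaves out the row numbers

# Reading the file back returns the codes unchanged.
restored = pd.read_csv(path)["Country"].tolist()
print(restored)
```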