# Recapping global boundary data

At the end of the last class, we covered how to filter and export country boundary information. 

Today we will go over some of the more complicated aspects of this code. We will recap each step and do some exercises on each part before putting it all together.

Generally, the key steps include:

- Loading in the `countries.csv` metadata file using pandas.
- Looping over each country (row) in our `countries.csv` metadata file.
- Constraining the analysis to a single country (for now).
- Separating out variables of interest. 
- Specifying a country folder path. 
- Checking to see if the country folder path exists, and if not, creating it.
- Specifying a country regions folder path. 
- Checking to see if the country regions folder path exists, and if not, creating it.
- Specifying a global boundaries path for us to read in bespoke layer data. 
- Reading in existing global boundaries data.
- Subsetting desired boundaries.
- Specifying a global boundaries output path.
- Writing out the global boundaries subset for a country of interest. 


Here is the code we wrote for Rwanda. Granted, this is *a lot*, so today we will go over it in individual vignettes and then piece the whole lot together. 

In [None]:
# Example
import os           # for basic operating system functions
import pandas       # to load .csv data as a dataframe
import geopandas    # to load shapefiles to a geodataframe

path = os.path.join('..', 'data', 'countries.csv')
countries = pandas.read_csv(path, encoding='latin-1')

for idx, country in countries.iterrows():
    
    if not country['iso3'] == 'RWA': # if the current country iso3 does not match RWA...
        continue                     # continue in the loop to the next country 
    
    # subset variables of interest
    iso3 = country['iso3']             
    gid_region = country['gid_region']
    
    # specify folder out path
    country_folder_path = os.path.join('..', 'data', 'processed', iso3)
    if not os.path.exists(country_folder_path): # check if path exists or not
        os.makedirs(country_folder_path)        # if not, create the path 
    
    # specify folder out path for regions
    regions_folder_path = os.path.join('..', 'data', 'processed', iso3, 'regions')
    if not os.path.exists(regions_folder_path): # check if path exists or not
        os.makedirs(regions_folder_path)        # if not, create the path 
    
    # specify filename for desired layer, based on desired gid_region level (e.g., 1, 2 etc.)
    filename = 'gadm36_{}.shp'.format(gid_region)
    global_boundaries_path = os.path.join('..', 'data', 'raw', 'gadm36_levels_shp', filename) 
    
    # load global boundaries data using geopandas
    global_boundaries = geopandas.read_file(global_boundaries_path)
    
    # subset boundaries which match the iso3 code of our desired country (RWA)
    country_boundaries = global_boundaries[global_boundaries['GID_0'] == iso3]
    
    # specify the path out for the subset of boundaries 
    path_out = os.path.join('..', 'data', 'processed', iso3, 'regions', filename)
    country_boundaries.to_file(path_out) # export boundaries to .shp file
    
    print(iso3, gid_region)

## Selecting which countries to process

First, let's re-examine how we use **if logic** to decide which country (or countries) we want to analyze.

The point is to confine the code to a small, manageable area where we can interrogate and validate what our codebase is doing. 

This is always how you should begin working on a problem: get it right for a small, defined area that you can easily check, and then scale up the analysis to a whole country or the whole world. 

It is assumed that you understand how to load the `countries.csv` metadata file and then iterate over each country (that is, each row), one by one. 
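As a reminder, `iterrows()` yields one `(index, row)` pair per row, where each row is a pandas Series you can index by column name. The sketch below uses a small in-memory DataFrame as a stand-in for `countries.csv` (the three rows are illustrative only), so you can try the filtering logic without the data file.

```python
import pandas

# A tiny stand-in for countries.csv (illustrative rows, not the real metadata)
countries = pandas.DataFrame({'iso3': ['AZE', 'KEN', 'RWA']})

processed = []
for idx, country in countries.iterrows():  # idx is the row index, country is a Series
    if not country['iso3'] == 'RWA':       # skip every country except Rwanda
        continue
    processed.append(country['iso3'])

print(processed)  # ['RWA']
```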


In [None]:
# Example
path = os.path.join('..', 'data', 'countries.csv')
countries = pandas.read_csv(path, encoding='latin-1')

for idx, country in countries.iterrows():
    
    if not country['iso3'] == 'RWA': # if the current country iso3 does not match RWA...
        continue                     # continue in the loop to the next country 
        
    # subset variables of interest
    iso3 = country['iso3']
    gid_region = country['gid_region']
    
    print(iso3, gid_region)

If this is hard to follow, think of it like this first (without the `not`):

(Remember, the condition in an `if` statement evaluates to either `True` or `False`.)

In [None]:
# Example

iso3 = 'RWA'

if iso3 == 'RWA': 
    print('if iso3 == RWA is true') # if the current country iso3 does match RWA...
else:
    print('if iso3 == RWA is not true') # if the current country iso3 does not match RWA...

if iso3 == 'USA': 
    print('if iso3 == USA is true')  # if the current country iso3 does match USA...
else:
    print('if iso3 == USA is not true') # if the current country iso3 does not match USA...
    

Then we can add the `not` in.

In [None]:
# Example
iso3 = 'RWA'

if not iso3 == 'RWA': 
    print('if not iso3 == RWA is true') # if the current country iso3 does not match RWA...
else:
    print('if not iso3 == RWA is not true') # if the current country iso3 does match RWA...

if not iso3 == 'USA': 
    print('if not iso3 == USA is true')  # if the current country iso3 does not match USA...
else:
    print('if not iso3 == USA is not true') # if the current country iso3 does match USA...
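Python also provides the `!=` (not equal) operator, so `not iso3 == 'RWA'` and `iso3 != 'RWA'` give the same result. This quick check prints both forms side by side; either works in the loop.

```python
iso3 = 'RWA'

# `not a == b` and `a != b` always agree when comparing strings
for code in ['RWA', 'USA']:
    print(code, not iso3 == code, iso3 != code)
# RWA False False
# USA True True
```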

## Exercise

Given that this **if logic** ensures only Rwanda is processed, how would you change the loop to ensure every country is processed except Rwanda? Print the iso3 codes and gid_region levels to demonstrate. 

In [None]:
# Enter your attempt below:


And how would you change the code to ensure all countries are processed? Print the iso3 codes and gid_region levels to demonstrate. 

In [None]:
# Enter your attempt below:


The current code only matches a single country iso3 code, for example: `if not country['iso3'] == 'RWA':`

To process a combination of countries, you could replace the `==` with `in`, and then check whether the country iso3 code is in a user-defined list of iso3 codes. Have a go for Azerbaijan (`AZE`) and Kenya (`KEN`). 
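The `in` operator checks whether a value is a member of a list and returns `True` or `False`, just like `==`. Here is the operator on its own, using a hypothetical list called `selected`; wiring it into the loop is left for the exercise.

```python
selected = ['AZE', 'KEN']  # hypothetical list of iso3 codes to process

print('KEN' in selected)      # True
print('RWA' in selected)      # False
print(not 'RWA' in selected)  # True: `not x in y` means the same as `x not in y`
```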


In [None]:
# Enter your attempt below:


## Country folder structure

When we run our loop for all countries, we need to automate the creation of a sensible folder structure to store all our data. Generally, a sensible way is:

    /data
        /raw
        /processed
            /RWA
                /regions
            

We are going to aim to create this folder structure. 

First, we can specify our path. Remember we are going to need to move from our current directory `/global_assessment/notebooks`, up into the parent folder, and then down into `data/processed/`.
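To see what `'..'` does, you can resolve it against an example working directory. This sketch uses `posixpath` (the forward-slash flavour of `os.path`) so the output looks the same on every operating system; `/global_assessment/notebooks` is just the example directory from the text.

```python
import posixpath

notebook_dir = '/global_assessment/notebooks'  # example working directory
relative = posixpath.join('..', 'data', 'processed', 'RWA')
print(relative)  # ../data/processed/RWA

# normpath collapses the '..' step: up out of notebooks, then down into data/
resolved = posixpath.normpath(posixpath.join(notebook_dir, relative))
print(resolved)  # /global_assessment/data/processed/RWA
```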

In [None]:
# Example
# specify folder path
# remember '..' means to go up one folder
country_folder_path = os.path.join('..', 'data', 'processed', iso3)
country_folder_path

However, we do not want to create this folder every time if we have already done so. 

Therefore, we need to be able to check if a path already exists.

Let's use the `os.path.exists()` function to check if a path exists. 

In [None]:
# Example

country_folder_path = os.path.join('..', 'data', 'processed', iso3)

if os.path.exists(country_folder_path): # check if path exists or not
    print('path exists!')
else:
    print('path does not exist!')

Now we can add this check to our **if statement**. As we only want to create the folder if it does not already exist, we need to use `if not os.path.exists()`.

In [None]:
# Example
country_folder_path = os.path.join('..', 'data', 'processed', iso3)

if not os.path.exists(country_folder_path): # check if path exists or not
    print('then make the folder!')
else:
    print('do nothing if the path already exists!')

Finally, we can throw in the function to create the path. 

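We have multiple options:

- `os.mkdir()` creates a single folder, and fails if its parent folder does not already exist.
- `os.makedirs()` creates a folder along with any missing parent folders, so it can create multiple new folders at once.

For example, the following will cause an error (a `FileNotFoundError`), as 'test_directory1' does not exist.    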

In [None]:
# Example

example_path = os.path.join('..', 'data', 'processed', iso3, 'test_directory1', 'test_directory2')
os.mkdir(example_path)

Instead, you would need to use `os.makedirs()`, which creates any missing intermediate folders in one call. 
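To see the difference safely, this sketch works inside a throwaway temporary directory (created with `tempfile.TemporaryDirectory`) rather than your real `data` folder. `os.mkdir()` raises `FileNotFoundError` because the parent folder is missing, while `os.makedirs()` creates the whole chain.

```python
import os
import tempfile

mkdir_failed = False
with tempfile.TemporaryDirectory() as root:  # removed automatically at the end
    nested = os.path.join(root, 'test_directory1', 'test_directory2')

    try:
        os.mkdir(nested)            # fails: test_directory1 does not exist yet
    except FileNotFoundError:
        mkdir_failed = True
        print('os.mkdir failed: the parent folder is missing')

    os.makedirs(nested)             # creates test_directory1, then test_directory2
    created = os.path.exists(nested)
    print(created)                  # True
```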

Let us now put all of that together: 

In [None]:
country_folder_path = os.path.join('..', 'data', 'processed', iso3, 'demo')

if not os.path.exists(country_folder_path): # check if path exists or not
    os.makedirs(country_folder_path)        # if not, create the directories
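As an alternative, `os.makedirs()` accepts an `exist_ok=True` argument, which makes it do nothing when the folder already exists, so the explicit `os.path.exists()` check can be dropped. The sketch below uses a temporary directory so it is safe to run; the `RWA/regions` names simply mirror the folder structure above. Writing the check out by hand, as in the cell above, is still a good way to understand what is happening.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    regions_folder_path = os.path.join(root, 'processed', 'RWA', 'regions')

    os.makedirs(regions_folder_path, exist_ok=True)  # creates all missing folders
    os.makedirs(regions_folder_path, exist_ok=True)  # second call: no error, no change

    exists = os.path.isdir(regions_folder_path)
    print(exists)  # True
```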

## Exercise

Create a set of new folder paths, for Rwanda, Azerbaijan and Kenya. 

- Create a loop which selects just these three countries. 
- Create a folder for each country within `global_assessment/data/processed/`.
- Create a `regions` folder within each country's respective directory.
- Also create a `hazards` folder within each country's respective directory. 
- The `hazards` folder will require a `flooding` folder within it (so `global_assessment/data/processed/RWA/hazards/flooding`).
- The `flooding` folder will require a `regional` folder within it (so `global_assessment/data/processed/RWA/hazards/flooding/regional`).


## Specifying filenames in loops

We know we have multiple GADM boundary layers, e.g. gid_0 for national boundaries and gid_1 for level 1 boundaries. 

As each country requires different boundary layer files, we want to specify these in our loop, using the `countries.csv` metadata. 

Indeed, whenever you need to do something bespoke for each country, add an additional column to the `countries.csv` metadata file. 
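As a minimal sketch of this idea: the `gid_region` column is itself such a per-country setting, and any new column is read inside the loop in exactly the same way. The DataFrame below is an in-memory stand-in for `countries.csv`; only Rwanda's `gid_region` of 1 comes from the text, `XYZ` is a made-up placeholder row, and `process_hazards` is a hypothetical column, not one in the real metadata.

```python
import pandas

# In-memory stand-in for countries.csv. RWA's gid_region of 1 comes from the text;
# 'XYZ' is a made-up placeholder row showing a second, different setting.
countries = pandas.DataFrame({
    'iso3': ['RWA', 'XYZ'],
    'gid_region': [1, 2],
})

# A hypothetical extra column holding a bespoke per-country setting
countries['process_hazards'] = [True, False]

settings = []
for idx, country in countries.iterrows():
    iso3 = country['iso3']
    gid_region = int(country['gid_region'])
    process_hazards = bool(country['process_hazards'])  # read like any other column
    settings.append((iso3, gid_region, process_hazards))
    print(iso3, gid_region, process_hazards)
```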

So let's start back with our loop, where we extract the `gid_region` values we have already defined. Rwanda uses level 1, as follows:


In [None]:
# Example
path = os.path.join('..', 'data', 'countries.csv')
countries = pandas.read_csv(path, encoding='latin-1')

for idx, country in countries.iterrows():
    
    if not country['iso3'] == 'RWA': # if the current country iso3 does not match RWA...
        continue                     # continue in the loop to the next country 
    
    # subset variables of interest
    iso3 = country['iso3']             
    gid_region = country['gid_region']
    
    print(iso3, gid_region)

We have already covered how to insert a variable into a string using curly brackets and `.format()`. 

This is a good example: `'gadm36_{}.shp'.format(gid_region)`
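Here is that pattern on its own. The `{}` placeholder is replaced by whatever is passed to `.format()`; an f-string is an equivalent, more modern spelling.

```python
gid_region = 1  # Rwanda's level, as given in countries.csv

filename = 'gadm36_{}.shp'.format(gid_region)
print(filename)                    # gadm36_1.shp

# The same string built with an f-string
print(f'gadm36_{gid_region}.shp')  # gadm36_1.shp

# Pitfall: a float level puts '.0' into the filename; int() fixes it
print('gadm36_{}.shp'.format(1.0))       # gadm36_1.0.shp
print('gadm36_{}.shp'.format(int(1.0)))  # gadm36_1.shp
```

The last two lines matter in practice: if the `gid_region` column in `countries.csv` contains any blank cells, pandas reads the whole column as floats, and the filename would no longer match the GADM file on disk. Wrapping the value in `int()` avoids this.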

In [None]:
# Example

path = os.path.join('..', 'data', 'countries.csv')
countries = pandas.read_csv(path, encoding='latin-1')

for idx, country in countries.iterrows():
    
    if not country['iso3'] == 'RWA': # if the current country iso3 does not match RWA...
        continue                     # continue in the loop to the next country 
    
    # subset variables of interest
    iso3 = country['iso3']             
    gid_region = country['gid_region']
        
    # specify filename for desired layer, based on desired gid_region level (e.g., 1, 2 etc.)
    filename = 'gadm36_{}.shp'.format(gid_region)
    print(filename)
    

Hopefully you can see how this filename changes based on the `gid_region` level specified for each country in the `countries.csv` metadata.

Now we can use geopandas to load in the file:

In [None]:
# Example

path = os.path.join('..', 'data', 'countries.csv')
countries = pandas.read_csv(path, encoding='latin-1')

for idx, country in countries.iterrows():
    
    if not country['iso3'] == 'RWA': # if the current country iso3 does not match RWA...
        continue                     # continue in the loop to the next country 
    
    # subset variables of interest
    iso3 = country['iso3']             
    gid_region = country['gid_region']
        
    # specify filename for desired layer, based on desired gid_region level (e.g., 1, 2 etc.)
    filename = 'gadm36_{}.shp'.format(gid_region)
    global_boundaries_path = os.path.join('..', 'data', 'raw', 'gadm36_levels_shp', filename) 
    
    # load global boundaries data using geopandas
    global_boundaries = geopandas.read_file(global_boundaries_path)
    print(len(global_boundaries))
    

Subsetting data should be something you are already familiar with. We covered subsetting in the pandas tutorials. 

For example, `global_boundaries[global_boundaries['GID_0'] == iso3]` will subset the global boundaries where the `GID_0` column value matches our iso3 code. 
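A GeoDataFrame supports the same boolean-mask subsetting as a pandas DataFrame. This sketch uses a plain pandas DataFrame with a few made-up rows standing in for the GADM attribute table (only the `GID_0` column name follows GADM; the `region` column and all values are illustrative).

```python
import pandas

# Illustrative stand-in for the global boundaries attribute table
global_boundaries = pandas.DataFrame({
    'GID_0': ['KEN', 'RWA', 'RWA', 'AZE'],
    'region': ['a', 'b', 'c', 'd'],  # placeholder region labels
})

iso3 = 'RWA'
mask = global_boundaries['GID_0'] == iso3     # True/False for every row
print(mask.tolist())                          # [False, True, True, False]

country_boundaries = global_boundaries[mask]  # keep only the rows where mask is True
print(len(country_boundaries))                # 2
```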

Finally, we specify our output path, and write out the shapes using the geopandas function `to_file()`.

## Exercise

Set up a coding example that puts together what we have learnt today for Rwanda, Azerbaijan and Kenya.

This should include:

- The appropriate file structure.
- A subset of region shapefiles in the correct folder. 


Once complete, write and export the national boundary for each country, placing it in the main country folder (named `national_outline.shp`). 

Remember, start small for a single country, and once you have got the code working, then scale to other countries. 

Good luck!