# Working with global boundary data


In this tutorial we look at how to get started with global boundary data, for example from the GADM dataset we have already covered. We will:

- Import the desired GADM shapefiles using geopandas.
- Select desired shapes.
- Write out desired shapes.


## Using the country metadata lookup table

The `countries.csv` file we previously looked at provides country information for 250 territories.

The trick to analyzing the whole world is to use this data in a loop, so that we process each country one at a time.

We will need some packages to get started.

In [4]:
# Example
import os           # for basic operating system functions
import pandas       # to load .csv data as a dataframe
import geopandas    # to load shapefiles to a geodataframe

We need to specify our path to the `countries.csv` file first. 

Remember, as the file is in a separate folder (`data`), we first need to go up one directory and then into that folder.

You may feel this complicates things, but file management is easily 30% of the work when operating at the global scale. 

In [5]:
# Example
path = os.path.join('..', 'data', 'countries.csv')
path

'..\\data\\countries.csv'
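
The output above uses backslashes because it was produced on Windows; on Linux or macOS the same call returns `../data/countries.csv`. As a minimal sketch of why it is worth building paths from parts rather than typing separators by hand, the snippet below builds the same relative path with `os.path.join` and with the standard-library `pathlib` module (an alternative not used elsewhere in this tutorial). The variable names `same_path` and `parts` are illustrative only.

```python
import os
from pathlib import Path

# Build the relative path from its parts; the separator is chosen by the OS.
path = os.path.join('..', 'data', 'countries.csv')

# pathlib builds the same path using the / operator (an alternative approach).
same_path = Path('..') / 'data' / 'countries.csv'

# The individual components are the same on every operating system.
parts = same_path.parts

print(path)
print(parts)
```

Either form can be passed to `pandas.read_csv()`, so the choice is mostly one of taste.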

Now we have the path, we can load in the data using the `.read_csv()` function we covered previously. 

As some country names have symbols which may not be compliant with a standard encoding format, we have to specify `encoding='latin-1'` to avoid a potential error message.

In [6]:
# Example
countries = pandas.read_csv(path, encoding='latin-1')
countries.head(5)

Unnamed: 0,iso3,iso2,country,continent,gid_region,lowest,Exclude,Population,income_group,flood_region
0,ABW,AW,Aruba,North America,0,0,1,107195,HIC,South America
1,AFG,AF,Afghanistan,Asia,2,2,0,39835428,LIC,South Asia
2,AGO,AO,Angola,Africa,2,2,0,33933611,LMC,Sub-Saharan Africa
3,AIA,AI,Anguilla,North America,0,0,1,-,-,Central America & Caribbean
4,ALA,AX,Åland Islands,Europe,2,2,1,-,-,Western Europe
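
To see why the encoding argument matters, here is a small standard-library sketch. It assumes (this is not verified against the actual `countries.csv`) that the file stores names such as Åland Islands as Latin-1 bytes. Decoding those bytes as UTF-8 fails, while decoding them as Latin-1 recovers the name. The variable names are illustrative only.

```python
# One row of text, encoded the way we assume the file was saved (Latin-1).
raw = 'ALA,AX,Åland Islands'.encode('latin-1')

# Trying to read these bytes as UTF-8 raises an error,
# because the single byte used for 'Å' in Latin-1 is not valid UTF-8 here.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Reading the same bytes as Latin-1 works.
decoded = raw.decode('latin-1')

print(utf8_ok)
print(decoded)
```

This is the same kind of failure that `encoding='latin-1'` avoids when loading the file with `pandas.read_csv()`.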


Important information in this countries dataframe includes:

- `iso3` - the ISO 3166-1 alpha-3 (three-letter) country code.
- `gid_region` - the most desirable GADM geographic level to work with.

Now we can demonstrate how we loop over this data.

As it is a `pandas` dataframe, we can use the `.iterrows()` method, which returns each row's index together with the row itself.

In [11]:
# Example
for idx, country in countries.iterrows():
    print(country)

iso3                      ABW
iso2                       AW
country                 Aruba
continent       North America
gid_region                  0
lowest                      0
Exclude                     1
 Population           107,195
income_group              HIC
flood_region    South America
Name: 0, dtype: object
iso3                    AFG
iso2                     AF
country         Afghanistan
continent              Asia
gid_region                2
lowest                    2
Exclude                   0
 Population      39,835,428
income_group            LIC
flood_region     South Asia
Name: 1, dtype: object
iso3                           AGO
iso2                            AO
country                     Angola
continent                   Africa
gid_region                       2
lowest                           2
Exclude                          0
 Population             33,933,611
income_group                   LMC
flood_region    Sub-Saharan Africa
Name: 2, dtype: object
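
The printed output above shows that each `country` is a `pandas` Series whose labels are the column names. As a minimal sketch, the snippet below rebuilds a three-row stand-in for the dataframe (values copied from the rows shown earlier; the names `sample` and `seen` are illustrative) and collects the index and two fields from each row.

```python
import pandas

# A tiny stand-in for the countries dataframe, using values shown above.
sample = pandas.DataFrame({
    'iso3': ['ABW', 'AFG', 'AGO'],
    'gid_region': [0, 2, 2],
})

seen = []
for idx, country in sample.iterrows():
    # idx is the row label; country is a Series indexed by column name.
    seen.append((idx, country['iso3'], int(country['gid_region'])))

print(seen)
```

Fields are looked up by column name, so the order of the columns in the file does not matter.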

As we loop, we can pull out the information we need for each country, for example its `iso3` code and `gid_region` level.

In [None]:
# Example
for idx, country in countries.iterrows():
    
    iso3 = country['iso3']
    
    gid_region = country['gid_region']
    
    print(iso3, gid_region)


We can therefore use this loop to:

- Create a folder for each country.
- Add boundary information for each country.

We just need to add those processing steps to the body of the loop.

We can try to get this working for Rwanda first, as it is a very small country. To do this, we add a small piece of logic which skips any country that does not match Rwanda's iso3 code (`if not country['iso3'] == 'RWA'`).

Remember: `continue` skips the current item in a loop, and continues to the next. 
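
As a minimal illustration of `continue`, the sketch below loops over a plain list of codes (a made-up list, not the real dataframe) and only reaches the processing step for `'RWA'`. The names `codes` and `processed` are illustrative only.

```python
codes = ['ABW', 'AFG', 'RWA', 'AGO']  # illustrative list of iso3 codes

processed = []
for code in codes:
    if code != 'RWA':   # same test as `not code == 'RWA'`
        continue        # skip the rest of the loop body for this code
    processed.append(code)  # only reached when the code is RWA

print(processed)
```

Removing the `if` block later is all that is needed to run the same steps for every country.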

In [None]:
# Example
for idx, country in countries.iterrows():
    
    if not country['iso3'] == 'RWA': # if the current country iso3 does not match RWA...
        continue                     # continue in the loop to the next country 
    
    iso3 = country['iso3']
    gid_region = country['gid_region']
    
    print(iso3, gid_region)

Now we can begin by creating a folder for our country. We will do this within a directory called `processed`, which we will create within `data` (i.e. `data/processed`).

First, we will specify the folder path: `country_folder_path = os.path.join('..', 'data', 'processed', iso3)`.

And then we will check to see if this path exists or not already: `if not os.path.exists(country_folder_path):`.

Only if the path does not exist do we create it: `os.makedirs(country_folder_path)`.

In [12]:
# Example
for idx, country in countries.iterrows():
    
    if not country['iso3'] == 'RWA': # if the current country iso3 does not match RWA...
        continue                     # continue in the loop to the next country 
    
    iso3 = country['iso3']
    gid_region = country['gid_region']
    
    country_folder_path = os.path.join('..', 'data', 'processed', iso3)
    if not os.path.exists(country_folder_path):
        os.makedirs(country_folder_path)
    
    print(iso3, gid_region)  

RWA 1


You can now look in your data directory to confirm that the folder was created.

Our second aim is to create a folder called `regions` inside the country folder, where we will put our GADM regions once they have been subset from the global data.

We can reuse the code we used previously, which checks whether our desired new folder exists and, if not, creates it using `os.makedirs()`.

In [None]:
# Example
for idx, country in countries.iterrows():
    
    if not country['iso3'] == 'RWA': # if the current country iso3 does not match RWA...
        continue                     # continue in the loop to the next country 
    
    iso3 = country['iso3']
    gid_region = country['gid_region']
    
    country_folder_path = os.path.join('..', 'data', 'processed', iso3)
    if not os.path.exists(country_folder_path):
        os.makedirs(country_folder_path)
    
    regions_folder_path = os.path.join('..', 'data', 'processed', iso3, 'regions')
    if not os.path.exists(regions_folder_path):
        os.makedirs(regions_folder_path)
    
    print(iso3, gid_region)
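
As an aside, `os.makedirs()` creates any missing parent folders and accepts an `exist_ok=True` argument, so the check-then-create pattern can be written more compactly. The sketch below is one possible alternative, not a change to the tutorial's approach, and it works inside a temporary directory (an assumption made so the example leaves nothing behind).

```python
import os
import tempfile

# Work in a throwaway directory so the example does not touch real data.
with tempfile.TemporaryDirectory() as base:
    regions_folder_path = os.path.join(base, 'data', 'processed', 'RWA', 'regions')

    # Creates data/processed/RWA/regions and every missing parent in one call.
    os.makedirs(regions_folder_path, exist_ok=True)

    # Calling it again is harmless because exist_ok=True suppresses the error.
    os.makedirs(regions_folder_path, exist_ok=True)

    country_created = os.path.isdir(os.path.join(base, 'data', 'processed', 'RWA'))
    regions_created = os.path.isdir(regions_folder_path)

print(country_created, regions_created)
```

The explicit `os.path.exists()` check used in this tutorial does the same job and makes each step easier to follow.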

Now that our folder structure is set up, we need to load the boundaries, subset them, and write them out.

We will read the global data by using: `global_boundaries = geopandas.read_file(global_boundaries_path)`.

This imports all countries, therefore we subset for our country using: `country_boundaries = global_boundaries[global_boundaries['GID_0'] == iso3]`. 

Then we can write out our country boundaries using `path_out = os.path.join('..', 'data', 'processed', iso3, 'regions', filename)` followed by `country_boundaries.to_file(path_out)`.

In [10]:
# Example
for idx, country in countries.iterrows():
    
    if not country['iso3'] == 'RWA': # if the current country iso3 does not match RWA...
        continue                     # continue in the loop to the next country 
    
    iso3 = country['iso3']
    gid_region = country['gid_region']
    
    country_folder_path = os.path.join('..', 'data', 'processed', iso3)
    if not os.path.exists(country_folder_path):
        os.makedirs(country_folder_path)
    
    regions_folder_path = os.path.join('..', 'data', 'processed', iso3, 'regions')
    if not os.path.exists(regions_folder_path):
        os.makedirs(regions_folder_path)
    
    filename = 'gadm36_{}.shp'.format(gid_region)
    global_boundaries_path = os.path.join('..', 'data', 'raw', 'gadm36_levels_shp', filename) 
    global_boundaries = geopandas.read_file(global_boundaries_path)
    
    country_boundaries = global_boundaries[global_boundaries['GID_0'] == iso3]
    
    path_out = os.path.join('..', 'data', 'processed', iso3, 'regions', filename)
    country_boundaries.to_file(path_out)
    
    print(iso3, gid_region)

RWA 1
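
The subsetting step in the loop relies on a boolean mask: comparing the `GID_0` column to `iso3` gives one True/False value per row, and indexing with that mask keeps only the True rows. The sketch below shows the idea on a plain `pandas` dataframe standing in for the geodataframe. The rows and the `NAME_1` values are made up for illustration, and the names `global_table`, `mask` and `country_table` are hypothetical.

```python
import pandas

# A stand-in for the global boundaries table (no geometry, made-up rows).
global_table = pandas.DataFrame({
    'GID_0': ['RWA', 'RWA', 'UGA'],
    'NAME_1': ['region_a', 'region_b', 'region_c'],  # placeholder names
})

iso3 = 'RWA'

# One True/False value per row: does this row belong to our country?
mask = global_table['GID_0'] == iso3

# Indexing with the mask keeps only the rows where it is True.
country_table = global_table[mask]

print(mask.tolist())
print(len(country_table))
```

The same indexing works on a `geopandas` geodataframe, which is what the loop above does before writing the result out with `.to_file()`.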
