# Introductory demo to global assessment methods

This demo gives a basic overview of the key techniques we will use in this class to extract information from raster layers for specific boundary areas.

The packages utilized include the following:

- `os` which stands for 'operating system' and provides a portable way of using operating-system-dependent functionality, such as building file paths.
- `rasterio` which broadly stands for 'raster input-output', providing reading and writing functionality for raster formats. Once a dataset has been opened with `rasterio`, it provides a Python API based on NumPy N-dimensional arrays and GeoJSON.
- `geopandas` which stands for 'geographic panel data' and is an open source project to make working with geospatial data in Python easier. `geopandas` extends the datatypes used by `pandas` to allow spatial operations on geometric types. Geometric operations are performed by `shapely`, while file access relies on `fiona` (or `pyogrio`, the default engine since GeoPandas 1.0) and plotting on `matplotlib`.
- `rasterstats` is a Python module for summarizing geospatial raster datasets based on vector geometries. It includes functions for zonal statistics and interpolated point queries. The command-line interface allows for easy interoperability with other GeoJSON tools.
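If any of these imports fail later, it helps to confirm what is actually installed. Below is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and a package that is not installed is reported as `None`:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Hypothetical helper: map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions

print(installed_versions(['rasterio', 'geopandas', 'rasterstats', 'shapely']))
```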

This might seem like a lot, but it covers pretty much the full gamut of packages we will use in this `Global Assessment` class.

If you can master these packages in the relatively basic ways presented here, you will be in a very strong position to excel in this class, and subsequently get a good, well-paid job. 

Let's begin by importing these packages:


```python
# Example
import os
import rasterio
import geopandas as gpd
from rasterstats import zonal_stats
```

Next, we want to set the path to the data layers we have. 

Two different types of data exist here:

- a GADM shapefile containing the national boundary of Rwanda (GADM administrative level 0)
- a `WorldPop` population density map for Rwanda

You can view these in the `github.com/edwardoughton/global_assessment/data` folder. We can use the `os` module to state the path to these files. For example, the function `os.path.join()` stitches path components together using the correct separator for the current operating system (`/` on Linux and macOS, `\` on Windows).

We actually need to go up one folder using the double period indicator (`..`), and then into the `data` folder. The file we want is called `rwa_pd_2020_1km.tif`. This can be completed like so:

```python
# Example
path_population = os.path.join('..', 'data', 'rwa_pd_2020_1km.tif')
print(path_population)
```

```
../data/rwa_pd_2020_1km.tif
```
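Relative paths such as this one are resolved against the current working directory (in Jupyter, usually the folder containing the notebook), so it can be worth checking where `..` actually points. A small standard-library sketch; the `'notebooks'` folder name is purely hypothetical:

```python
import os

path_population = os.path.join('..', 'data', 'rwa_pd_2020_1km.tif')

# normpath() collapses '..' segments as text only; it does not touch the disk
print(os.path.normpath(os.path.join('notebooks', path_population)))

# abspath() resolves the path against the current working directory
print(os.path.abspath(path_population))

# exists() returns False if the file is not where the relative path points
print(os.path.exists(path_population))
```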


We are now ready to load this layer using `rasterio`. The `rasterio.open()` function opens the file and returns a dataset object (a `DatasetReader` when opened for reading), which is our `rasterio` object.

```python
# Example
data = rasterio.open(path_population)
print(data)
```

```
<open DatasetReader name='../data/rwa_pd_2020_1km.tif' mode='r'>
```
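Besides pixel values, the dataset object carries georeferencing metadata. Its `transform` attribute is an affine transform that maps pixel indices (row, col) to map coordinates, and rasterio also offers `data.xy(row, col)` to get a pixel's center directly. Below is a pure-Python sketch of the underlying arithmetic; the helper name and the coefficients are hypothetical, describing a made-up north-up grid rather than this file's real values:

```python
def pixel_center_to_xy(transform, row, col):
    """Hypothetical helper: map a (row, col) pixel index to the coordinates of its center.

    `transform` is a 6-tuple (a, b, c, d, e, f) in the same order as the
    `affine` package's Affine(a, b, c, d, e, f):
        x = a * col + b * row + c
        y = d * col + e * row + f
    """
    a, b, c, d, e, f = transform
    col_c, row_c = col + 0.5, row + 0.5  # shift by half a pixel to reach the center
    return a * col_c + b * row_c + c, d * col_c + e * row_c + f

# Hypothetical north-up grid: 0.5-degree pixels, top-left corner at (28.0, -1.0)
transform = (0.5, 0.0, 28.0, 0.0, -0.5, -1.0)
print(pixel_center_to_xy(transform, 0, 0))  # (28.25, -1.25)
print(pixel_center_to_xy(transform, 2, 3))  # (29.75, -2.25)
```

For a north-up raster, `b` and `d` are zero and `e` is negative, because row numbers increase downwards (southwards) while y coordinates decrease.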


The printed `DatasetReader` shows that the file opened successfully. Through this object we can read the raster's bands as NumPy N-dimensional arrays; `read(1)` returns the first band as a 2D array.

```python
# Example
array = data.read(1)   # read band 1 into a 2D NumPy array, keeping `data` as the dataset handle
print(array)
data.close()           # release the file handle now that we have the values
```

```
[[-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 ...
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]]
```
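The printed edges are all -99999, a fill value marking cells with no data. Summed naively, these fill values would swamp the real densities. A minimal NumPy sketch on a made-up 3×3 array, assuming -99999 is the fill value as the printout suggests:

```python
import numpy as np

NODATA = -99999.0  # assumed fill value, taken from the printout above

# A made-up 3x3 stand-in for the density array (people per km²)
array = np.array([[NODATA, NODATA, NODATA],
                  [NODATA, 120.5, 340.0],
                  [NODATA, 15.25, NODATA]])

valid = array != NODATA                             # True where a cell holds real data
print(int(valid.sum()))                             # 3 valid cells
print(float(array[valid].sum()))                    # 475.75: sum over valid cells only
print(float(array.sum()))                           # -599518.25: fill values swamp the total
print(float(np.where(array <= 0, 0, array).sum()))  # 475.75: zeroing fill values also works
```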


The -99999 values visible at the edges of the array are fill values for cells without data (for example, outside Rwanda's border), not real densities, so they need handling before we sum anything.

We could now extract data from the whole population density layer. In this demo, however, we will use a spatial boundary to extract data for a specific region.

This means we need to create, obtain, or load a boundary. Rather than create one from scratch (e.g., a grid), let's load our existing national boundary for Rwanda. First, we specify the path to the file:

```python
# Example
path_boundary = os.path.join('..', 'data', 'gadm41_RWA_0.shp')
print(path_boundary)
```

```
../data/gadm41_RWA_0.shp
```


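Now that we have the path, we can load the boundary using the `geopandas` function `read_file()`.

The coordinate reference system (CRS) is read from the shapefile's accompanying `.prj` file; for GADM data this is WGS84 (also known as EPSG:4326). `read_file()` has no argument for assigning a CRS, so if the `.prj` file were missing we would set it afterwards with `set_crs()`.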

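```python
# Example
boundary = gpd.read_file(path_boundary)        # the CRS is taken from the shapefile's .prj file
if boundary.crs is None:                       # only needed if the .prj file is missing
    boundary = boundary.set_crs('epsg:4326')
print(boundary)
```

```
  GID_0 COUNTRY                                           geometry
0   RWA  Rwanda  POLYGON ((29.71332 -2.81759, 29.71295 -2.81774...
```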

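Both data layers load correctly: we have opened them and checked their contents.

Let us now write the full code for querying the .tif population density layer within our chosen boundary. We open the raster in a `with` block rather than calling `rasterio.open()` on its own, because the `with` statement closes the file automatically as soon as the block finishes, even if an error occurs part-way through. Note that `zonal_stats` does not reproject anything, so the boundary and the raster must share the same CRS; WorldPop layers are typically distributed in WGS84, matching the GADM boundary.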
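The guarantee that `with` provides can be seen with a toy context manager. The `Resource` class here is hypothetical, standing in for a rasterio dataset: Python calls its `__exit__` method when the block ends, whether normally or through an exception.

```python
class Resource:
    """Hypothetical stand-in for a file or dataset handle."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def __enter__(self):
        return self  # the object bound by `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.closed = True  # runs on normal exit and when an exception escapes the block
        return False        # returning False lets any exception propagate

with Resource('normal') as r1:
    print(r1.closed)        # False: still open inside the block
print(r1.closed)            # True: closed once the block ends

try:
    with Resource('failing') as r2:
        raise ValueError('processing failed')
except ValueError:
    pass
print(r2.closed)            # True: closed even though the block raised
```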

```python
# Example
with rasterio.open(path_population) as src:

    affine = src.transform                       # affine transform mapping pixel (row, col) to map coordinates
    array = src.read(1)                          # read band 1 into a 2D NumPy array
    array[array <= 0] = 0                        # set the -99999 fill values (and any other negatives) to 0

    population = zonal_stats(   # returns one result dict per boundary geometry
        boundary['geometry'],   # <- providing our boundary
        array,                  # <- providing our .tif raster data
        affine=affine,          # <- the transform, so the geometries can be placed on the pixel grid
        nodata=-99999,          # <- the layer's fill value (after zeroing, no cell still holds it)
        stats=['sum'],          # <- stating which statistics we want
    )[0]['sum']                 # <- our boundary has a single feature, so take the first result

    # the sum of per-cell densities (people per km²) approximates the population
    # only to the extent that each cell covers about 1 km²
    print(round(population))
```

```
15658804
```
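To demystify what `zonal_stats` computes, here is a deliberately simplified NumPy sketch, not rasterstats' actual implementation: a cell counts towards the zone if its center falls inside the polygon (roughly mirroring rasterstats' default `all_touched=False`), and fill-value cells are skipped. The helper names, the 3×3 array, and the transform are hypothetical, and the point-in-polygon test handles a single ring without holes.

```python
import numpy as np

def point_in_ring(x, y, ring):
    """Even-odd (ray casting) test: is (x, y) inside the closed ring [(x0, y0), (x1, y1), ...]?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # the crossing lies to the right of the point
                inside = not inside
    return inside

def zonal_sum(array, transform, ring, nodata):
    """Sum the cells whose centers fall inside `ring`, skipping `nodata` cells."""
    a, b, c, d, e, f = transform  # same coefficient order as Affine(a, b, c, d, e, f)
    total = 0.0
    n_rows, n_cols = array.shape
    for row in range(n_rows):
        for col in range(n_cols):
            value = array[row, col]
            if value == nodata:
                continue
            x = a * (col + 0.5) + b * (row + 0.5) + c  # cell-center coordinates
            y = d * (col + 0.5) + e * (row + 0.5) + f
            if point_in_ring(x, y, ring):
                total += float(value)
    return total

NODATA = -99999.0
array = np.array([[1.0, 2.0, 3.0],
                  [4.0, NODATA, 6.0],
                  [7.0, 8.0, 9.0]])
transform = (1.0, 0.0, 0.0, 0.0, -1.0, 3.0)               # 1-unit pixels, top-left corner at (0, 3)
zone = [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0), (0.0, 3.0)]   # covers the left two columns

print(zonal_sum(array, transform, zone, NODATA))  # 1 + 2 + 4 + 7 + 8 = 22.0
```

Real boundaries have thousands of vertices, so rasterstats rasterizes the geometry onto the pixel grid in one pass instead of testing each cell in a Python loop; the sketch only shows the logic.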
