# Geotemporal Trends in Urbanization: Cameroon
*Using yearly estimates (2000-2015) of population, built-up area, and economic indicators to track city-by-city growth and change over time.*

---

## Research questions 

#### 1. How has the size of Settlement X changed over time? 

- Population size 

- Geographical extents 

- Population density 

#### 2. In what year did Settlement X become a new urban class?  

- From semi-dense to high-density city 

- Small settlement area to built-up area 

- When a hamlet area or small settlement area first appeared

#### 3. Is there a discernible relationship between the spatio-temporal distributions of economic density and population density? 

#### 4. How much of the urban space attributable to City X lies outside the administrative limits of the city? 

- When did the fragment(s) appear? 

- Which district/municipality/authority has purview over the fragment(s)? 

#### 5. For the questions above, how does the answer change based on different understandings of urban limits? 

- Scenario A: where "city" is delimited by an official administrative boundary 

- Scenario B: where "city" includes all contiguous (and near-contiguous) built up area 

#### 6. Subnational and inter-national comparisons. Examples: 

- Compare the growth rates (population, built-up area, economic…) of the fastest growing settlement in each ADM1 region. 

- Which African metropoles experience the most vs. the least fragmentation? Is there a relationship between the amount of urban fragmentation and the rate of densification? 

---

## Datasets
1. Most up-to-date administrative boundaries: **ADM3.**
2. Built-up area, yearly: **World Settlement Footprint Evolution.** Resolution: 30m. 
3. Settlement types: **GRID3 settlement extents.** Captured between 2009 and 2019.
4. Population, yearly: **WorldPop.** UN-adjusted, unconstrained. Resolution: 100m.
5. Nighttime lights, yearly: **Harmonization of DMSP and VIIRS.** Resolution: 1km.
6. City names: **UCDB, Africapolis, and GeoNames.**

---

## Joining primary datasets together in raster and vector space. (current notebook)

### 1. PREPARE WORKSPACE

In [2]:
# Installs

#import sys
#!{sys.executable} -m pip install voronoi-diagram-for-polygons xarray-spatial rioxarray pygeos

#!{sys.executable} -m pip install --user --upgrade pygeos

Collecting dea-tools
  Downloading dea_tools-0.2.7-py3-none-any.whl (143 kB)
     -------------------------------------- 143.9/143.9 kB 1.1 MB/s eta 0:00:00
Collecting odc-ui
  Downloading odc_ui-0.2.0a3-py3-none-any.whl (15 kB)
Collecting OWSLib
  Using cached OWSLib-0.27.2-py2.py3-none-any.whl (218 kB)
Collecting rasterstats
  Using cached rasterstats-0.17.0-py3-none-any.whl (16 kB)
Collecting tqdm
  Using cached tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
Collecting geopy
  Using cached geopy-2.2.0-py3-none-any.whl (118 kB)
Collecting datacube
  Downloading datacube-1.8.8-py2.py3-none-any.whl (351 kB)
     -------------------------------------- 351.4/351.4 kB 2.7 MB/s eta 0:00:00
Collecting numexpr
  Using cached numexpr-2.8.3-cp310-cp310-win_amd64.whl (92 kB)
Collecting scikit-image
  Using cached scikit_image-0.19.3-cp310-cp310-win_amd64.whl (12.0 MB)
Collecting ciso8601
  Downloading ciso8601-2.2.0.tar.gz (18 kB)
  Installing build dependencies: started
  Installing build dependenci

  error: subprocess-exited-with-error
  
  Building wheel for ciso8601 (pyproject.toml) did not run successfully.
  exit code: 1
  
  [11 lines of output]
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-cpython-310
  creating build\lib.win-amd64-cpython-310\ciso8601
  copying ciso8601\__init__.pyi -> build\lib.win-amd64-cpython-310\ciso8601
  copying ciso8601\py.typed -> build\lib.win-amd64-cpython-310\ciso8601
  running build_ext
  building 'ciso8601' extension
  error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
  [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for ciso8601
ERROR: Could not build wheels for ciso8601, which is required to install pyproject.toml-based projects


In [1]:
# Note: Not all of these packages are used in the final version of the notebook.

import os, sys
import glob
import re
import time

import geopandas as gpd 
import pandas as pd
from shapely.geometry import Point, LineString, Polygon, shape
from shapely.ops import voronoi_diagram
from shapely.validation import make_valid
from longsgis import voronoiDiagram4plg 
import fiona

from xrspatial import zonal_stats 
import xarray as xr 
import rasterio 
from rasterio.plot import show
from rasterio import features
from rasterio.features import shapes
import rioxarray 
from osgeo import gdal
from osgeo import gdal_array
from osgeo import ogr
import matplotlib.pyplot as plt
import numpy as np

# import deafrica_tools
# from deafrica_tools.datahandling import mostcommon_crs
# from deafrica_tools.spatial import xr_vectorize, xr_rasterize

In [2]:
ProjectFolder = os.getcwd()
print(ProjectFolder)

C:\Users\grace\GIS\povertyequity\urban_growth


### 2. DATA PREP
Projection for all datasets: Africa Albers Equal Area Conic (ESRI:102022).

Remove unnecessary fields (e.g. extra fields in the gazetteer data).

##### WSFE

In [3]:
test = gdal.Open("C:/Users/grace/GIS/povertyequity/urban_growth/WSFE_TCD.tif")
print(test.GetRasterBand(1).GetNoDataValue()) # What is NoData currently? We might want to change it before using this file as the archetype for other rasterized files.
test = None

99999.0


*Assigning a NoData value of 0 in each dataset so that those cells can be ignored later when vectorizing.*

In [None]:
# # OPEN QGIS FOR THIS PORTION. CODE DOCUMENTED HERE.
# Change NoData value to zero, as this won't interfere with a possible value of 99999 in GRID3 and ADM.
# Then make sure there are no values above 2015 (such as 99999) or below 1985 in the dataset by reclassifying them as NoData.
# Was having trouble with rasterio & gdal here, so moved to QGIS.

# processing.run("native:reclassifybytable", {'INPUT_RASTER':'C:/Users/grace/GIS/povertyequity/urban_growth/WSFE_TCD.tif','RASTER_BAND':1,'TABLE':['2016','','0','','1984','0'],'NO_DATA':0,'RANGE_BOUNDARIES':0,'NODATA_FOR_MISSING':False,'DATA_TYPE':5,'OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/WSFE.tif'})
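The reclassification rule described in the comments above can be written out in a few lines. This is a plain-Python sketch of the rule only, not of the QGIS step that was actually run on the raster; the sample pixel values are hypothetical.

```python
FIRST_YEAR = 1985   # earliest valid WSFE year, per the comments above
LAST_YEAR = 2015    # latest valid WSFE year
NODATA = 0          # NoData value chosen for all rasters in this notebook


def reclassify_year(value: float) -> int:
    """Return the year if it lies in the valid range, otherwise NoData (0)."""
    if FIRST_YEAR <= value <= LAST_YEAR:
        return int(value)
    return NODATA


# Hypothetical pixel values, including the old NoData value 99999.
sample_pixels = [99999.0, 1984.0, 1985.0, 2001.0, 2015.0, 2016.0]
cleaned = [reclassify_year(v) for v in sample_pixels]
print(cleaned)
```

Only values from 1985 through 2015 survive; everything else becomes 0.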

In [4]:
WSFE = rasterio.open("C:/Users/grace/GIS/povertyequity/urban_growth/WSFE.tif")
print(WSFE) # WSFE values are all 4 digits long (1985-2015)
print(dir(WSFE))
print(WSFE.crs)
print(WSFE.dtypes)
print(WSFE.nodatavals)
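wsfe_band = WSFE.read(1) # Read the band once; the raster is large, so avoid re-reading it for every statistic.
print(wsfe_band.min(), wsfe_band.mean(), np.median(wsfe_band), wsfe_band.max())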

<open DatasetReader name='C:/Users/grace/GIS/povertyequity/urban_growth/WSFE.tif' mode='r'>

##### WorldPop: Reproject to same CRS as the vector layers.

In [4]:
WpopFolder = r"C:\Users\grace\GIS\povertyequity\urban_growth\TCD_WorldPop"
WpopFiles = os.listdir(WpopFolder) # List the folder through the same path that is used for reading and writing below.
print(WpopFiles)

['tcd_ppp_2000_UNadj.tif', 'tcd_ppp_2001_UNadj.tif', 'tcd_ppp_2002_UNadj.tif', 'tcd_ppp_2003_UNadj.tif', 'tcd_ppp_2004_UNadj.tif', 'tcd_ppp_2005_UNadj.tif', 'tcd_ppp_2006_UNadj.tif', 'tcd_ppp_2007_UNadj.tif', 'tcd_ppp_2008_UNadj.tif', 'tcd_ppp_2009_UNadj.tif', 'tcd_ppp_2010_UNadj.tif', 'tcd_ppp_2011_UNadj.tif', 'tcd_ppp_2012_UNadj.tif', 'tcd_ppp_2013_UNadj.tif', 'tcd_ppp_2014_UNadj.tif', 'tcd_ppp_2015_UNadj.tif', 'Wpop_2000_albers.tif', 'Wpop_2001_albers.tif', 'Wpop_2002_albers.tif', 'Wpop_2003_albers.tif', 'Wpop_2004_albers.tif', 'wpop_2005_albers.tif', 'Wpop_2006_albers.tif', 'Wpop_2007_albers.tif', 'Wpop_2008_albers.tif', 'Wpop_2009_albers.tif', 'Wpop_2010_albers.tif', 'Wpop_2011_albers.tif', 'Wpop_2012_albers.tif', 'Wpop_2013_albers.tif', 'Wpop_2014_albers.tif', 'Wpop_2015_albers.tif']


In [57]:
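# Keep all warp settings (output format and target CRS) in one options object.
CRS = gdal.WarpOptions(format='GTiff', dstSRS='ESRI:102022')

# Only reproject the source rasters; skip any *_albers.tif outputs left in the folder by a previous run.
SourceFiles = [f for f in WpopFiles if f.endswith('_UNadj.tif')]

for item in SourceFiles:
    InputFileName = os.path.join(WpopFolder, item)
    InputRaster = gdal.Open(InputFileName)
    OutputFileName = "Wpop_" + re.sub(r'[^0-9]', '', item) + "_albers.tif" # Keep only the year digits from the source name.
    OutputRaster = os.path.join(WpopFolder, OutputFileName)
    Warp = gdal.Warp(OutputRaster, 
                     InputRaster, 
                     options=CRS) # Reproject to Africa Albers Equal Area Conic
    Warp = None # Closes the output file
    InputRaster = None # Closes the input file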
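
The naming rule used in the reprojection loop can be checked on its own, without GDAL. The sketch below applies the same digit-stripping regex to a short, hand-written listing. The `_UNadj.tif` suffix test is an assumption about how source files can be told apart from outputs. It also shows why files that already end in `_albers.tif` are best left out of the loop: the rule maps them onto their own name.

```python
import re

SOURCE_SUFFIX = "_UNadj.tif"   # assumed marker of an unprojected WorldPop file


def output_name(source_name: str) -> str:
    """Build the reprojected file name from the digits in the source name."""
    year = re.sub(r"[^0-9]", "", source_name)
    return "Wpop_" + year + "_albers.tif"


# Hand-written listing that mixes source files with an earlier output.
listing = ["tcd_ppp_2004_UNadj.tif", "tcd_ppp_2005_UNadj.tif", "Wpop_2004_albers.tif"]

for name in listing:
    target = output_name(name)
    if name.endswith(SOURCE_SUFFIX):
        print(name, "->", target)
    else:
        print(name, "skipped (would map onto", target + ")")
```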

In [5]:
CheckContents = gdal.Open(r'C:\Users\grace\GIS\povertyequity\urban_growth\TCD_WorldPop\Wpop_2005_albers.tif')
print(CheckContents.GetDescription())
print(CheckContents.GetProjection())
del CheckContents

C:\Users\grace\GIS\povertyequity\urban_growth\TCD_WorldPop\Wpop_2005_albers.tif
PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]


##### Nighttime Lights

In [None]:
NTL = 

##### GRID3 and Admin areas: Loaded as GeoDataFrames, to be rasterized right away.

In [5]:
ADM_vec = gpd.read_file('TCD_Urb_VDITransfer.gpkg', layer=0)
GRID3_vec = gpd.read_file('TCD_Urb_VDITransfer.gpkg', layer=1)
ADM_out = './ADM.tif'
GRID3_out = './GRID3.tif'
# WSFE loaded earlier. It will be the raster to snap and sample to.

print(ADM_vec.info(), "\n\n", 
      ADM_vec.crs, "\n\n", 
      len(str(ADM_vec['ADM2_CODE'].max()))) # Chad's ADM IDs are up to 4 digits long. (70 features, but their ADM2_CODEs start in the 100s)
print(GRID3_vec.info(), "\n\n", 
      GRID3_vec.crs, "\n\n", 
      len(str(GRID3_vec['OBJECTID'].max()))) # Chad's GRID3 object IDs are up to 6 digits long. (353,534 features)

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 70 entries, 0 to 69
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   ADM0_CODE  70 non-null     int64   
 1   ADM0_NAME  70 non-null     object  
 2   ADM1_CODE  70 non-null     int64   
 3   ADM1_NAME  70 non-null     object  
 4   ADM2_CODE  70 non-null     int64   
 5   ADM2_NAME  70 non-null     object  
 6   geometry   70 non-null     geometry
dtypes: geometry(1), int64(3), object(3)
memory usage: 4.0+ KB
None 

 PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_n

##### Place names: Three options to work with. 

In [25]:
UCDB = gpd.read_file(r'GHS_UCDB/GHS_STAT_UCDB2015MT_GLOBE_R2019A_V1_2.gpkg', layer=0)
Africapolis = gpd.read_file('AFRICAPOLIS2020.shp')
GeoNames = gpd.read_file('GeoNames_cities500_TCD.shp')
print(UCDB.info(), "\n\n\n", Africapolis.info(), "\n\n\n", GeoNames.info())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13135 entries, 0 to 13134
Columns: 161 entries, ID_HDC_G0 to geometry
dtypes: float64(143), geometry(1), object(17)
memory usage: 16.1+ MB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 7720 entries, 0 to 7719
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   agglosID    7720 non-null   int64   
 1   agglosName  7720 non-null   object  
 2   ISO3        7720 non-null   object  
 3   Longitude   7720 non-null   float64 
 4   Latitude    7720 non-null   float64 
 5   Pop2015     7720 non-null   int64   
 6   builtUp     7720 non-null   float64 
 7   Voronoi     7720 non-null   int64   
 8   Pop1950     7720 non-null   int64   
 9   Pop1960     7720 non-null   int64   
 10  Pop1970     7720 non-null   int64   
 11  Pop1980     7720 non-null   int64   
 12  Pop1990     7720 non-null   int64   
 13  Pop2000     7720 non-null   int64   
 14  Pop2010    

### 3. WSFE AND ADM; GRID3 AND ADM
RASTERIZE: Bring ADM and GRID3 into raster space.

RASTER MATH: "Join" the ADM ID onto GRID3 and onto WSFE by building a unique concatenated ID.

VECTORIZE: Bring joined data into vector space.

VECTOR MATH: Split unique ID from raster math step into separate columns.

##### Rasterize

In [6]:
# Copy and update the metadata from WSFE for the output
meta = WSFE.meta.copy()
meta.update(compress='lzw')

In [7]:
with rasterio.open(ADM_out, 'w+', **meta) as out:
    out_arr = out.read(1)

    # Generator of (geometry, value) pairs to burn into the raster.
    # Named geom_values so that it does not shadow rasterio.features.shapes, imported above.
    geom_values = ((geom, value) for geom, value in zip(ADM_vec.geometry, ADM_vec.ADM2_CODE))

    burned = features.rasterize(shapes=geom_values, fill=0, out=out_arr, transform=out.transform, all_touched=False)
    out.write_band(1, burned)

In [8]:
with rasterio.open(GRID3_out, 'w+', **meta) as out:
    out_arr = out.read(1)

    # Generator of (geometry, value) pairs to burn into the raster.
    # Named geom_values so that it does not shadow rasterio.features.shapes, imported above.
    geom_values = ((geom, value) for geom, value in zip(GRID3_vec.geometry, GRID3_vec.OBJECTID))

    burned = features.rasterize(shapes=geom_values, fill=0, out=out_arr, transform=out.transform, all_touched=False)
    out.write_band(1, burned)

In [9]:
ADM_rast = gdal.Open(r"ADM.tif")
GRID3_rast = gdal.Open(r"GRID3.tif")
WSFE_rast = gdal.Open(r"WSFE.tif")

ADM_band = ADM_rast.GetRasterBand(1)
GRID3_band = GRID3_rast.GetRasterBand(1)
WSFE_band = WSFE_rast.GetRasterBand(1)

print(gdal.GetDataTypeName(ADM_band.DataType), ADM_band.GetNoDataValue(),
      gdal.GetDataTypeName(GRID3_band.DataType), GRID3_band.GetNoDataValue(), 
      gdal.GetDataTypeName(WSFE_band.DataType), WSFE_band.GetNoDataValue())

Float32 0.0 Float32 0.0 Float32 0.0


##### Raster math
"Joining" the datasets, i.e. building a serial code from two ID layers, is faster in raster space than in vector space.
Here, we concatenate the ID fields of the two datasets into one serial number, which we split again in vector space later to recover the two ID fields.

In [10]:
RastersList = [rasterio.open(r"ADM.tif"), rasterio.open(r"GRID3.tif"), rasterio.open(r"WSFE.tif")]

for item in RastersList:
    print(item.name, "\nBands= ", item.count, "\nWxH= ", item.width, "x", item.height, "\n\n")

stats = []
for item in RastersList:
    band = item.read(1)
    stats.append({
        'raster': item.name,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})

# Show stats for each channel
print("\n", stats)

ADM.tif 
Bands=  1 
WxH=  39041 x 60394 


GRID3.tif 
Bands=  1 
WxH=  39041 x 60394 


WSFE.tif 
Bands=  1 
WxH=  39041 x 60394 



 [{'raster': 'ADM.tif', 'min': 0.0, 'mean': 814.0737, 'median': 301.0, 'max': 2303.0}, {'raster': 'GRID3.tif', 'min': 0.0, 'mean': 1171.708, 'median': 0.0, 'max': 353534.0}, {'raster': 'WSFE.tif', 'min': 0.0, 'mean': 0.52195686, 'median': 0.0, 'max': 2015.0}]


*The join ID is built by summation, which in effect concatenates the two ID strings. The "A" value is multiplied by a power of ten with as many zeros as there are digits in the maximum value of the "B" dataset, and "B" is then added. (E.g. Chad ADM codes go up to 4 digits, so the calc is (A\*10000)+B.)*
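
The arithmetic behind the join ID can be tried on single values before it is run on whole rasters. The sketch below is illustrative only: `pack_ids`, `unpack_ids`, and `through_float32` are hypothetical helper names, and the inputs are the maximum values reported in the statistics above (year 2015, GRID3 OBJECTID 353534, ADM code 2303). Because the rasters above are reported as Float32, the sketch also checks whether each serial survives storage as a 32-bit float and whether it fits in a signed 32-bit integer.

```python
import struct

ADM_DIGITS = 4                   # digits reserved for the "B" dataset (ADM codes)
MULTIPLIER = 10 ** ADM_DIGITS    # 10000, as in calc="(A*10000)+B"
INT32_MAX = 2**31 - 1            # 2147483647


def pack_ids(a: int, b: int) -> int:
    """Combine two non-negative integer IDs into one serial: (a * 10000) + b."""
    if not 0 <= b < MULTIPLIER:
        raise ValueError("b does not fit in the reserved digits")
    return a * MULTIPLIER + b


def unpack_ids(serial: int) -> tuple:
    """Split a serial back into (a, b)."""
    return divmod(serial, MULTIPLIER)


def through_float32(value: int) -> int:
    """Store a value as a 32-bit float and read it back as an integer."""
    return int(struct.unpack("f", struct.pack("f", value))[0])


wsfe_serial = pack_ids(2015, 2303)      # year + ADM code: 8 digits
grid3_serial = pack_ids(353534, 2303)   # GRID3 OBJECTID + ADM code: 10 digits

for name, serial in [("WSFE_ADM", wsfe_serial), ("GRID3_ADM", grid3_serial)]:
    print(name, serial,
          "| exact in float32:", through_float32(serial) == serial,
          "| fits in int32:", serial <= INT32_MAX,
          "| round trip:", unpack_ids(serial))
```

A 32-bit float stores every integer exactly only up to 2**24 (16777216), so 8- and 10-digit serials may be rounded. If that matters for a given run, `gdal_calc.py` has a `--type` option for choosing the output data type; whether a wider type suits the later polygonize step is not verified here.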

In [128]:
# # OPEN TERMINAL FOR THIS PORTION. CODE DOCUMENTED HERE.

# gdal_calc.py # To see info.

# gdal_calc.py -A C:\Users\grace\GIS\povertyequity\urban_growth\GRID3.tif -B  C:\Users\grace\GIS\povertyequity\urban_growth\ADM.tif --outfile=C:\Users\grace\GIS\povertyequity\urban_growth\GRID3_ADM.tif --overwrite --calc="(A*10000)+B"
# gdal_calc.py -A C:\Users\grace\GIS\povertyequity\urban_growth\WSFE.tif -B  C:\Users\grace\GIS\povertyequity\urban_growth\ADM.tif --outfile=C:\Users\grace\GIS\povertyequity\urban_growth\WSFE_ADM.tif --overwrite --calc="(A*10000)+B"

# # END TERMINAL-ONLY ASPECT. RETURN HERE FOR NEXT STEPS.


In [11]:
# Validation: check the basic statistics of the resulting datasets.
RastersList = [rasterio.open("WSFE_ADM.tif"), rasterio.open("GRID3_ADM.tif")]
for item in RastersList:
    print(item.name, "\nBands= ", item.count, "\nWxH= ", item.width, "x", item.height, "\n\n")
    
stats = []
for item in RastersList:
    band = item.read(1)
    stats.append({
        'raster': item.name,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})

# Show stats for each channel
print("\n", stats)

WSFE_ADM.tif 
Bands=  1 
WxH=  39041 x 60394 


GRID3_ADM.tif 
Bands=  1 
WxH=  39041 x 60394 




  ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)



 [{'raster': 'WSFE_ADM.tif', 'min': 19850100.0, 'mean': inf, 'median': inf, 'max': 3.4028235e+38}, {'raster': 'GRID3_ADM.tif', 'min': 10906.0, 'mean': inf, 'median': inf, 'max': 3.4028235e+38}]


##### Vectorize

In [None]:
# OPEN QGIS FOR THIS PORTION. CODE DOCUMENTED HERE.

# Due to dtype errors with both gdal and rasterio here, I ran the raster-to-polygon step in QGIS instead.
# QGIS functions can be run from within a Jupyter Notebook, but I used the GUI. ArcGIS or R are other options.
# The equivalent QGIS Python console calls are recorded below.

# processing.run("gdal:polygonize", {'INPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/GRID3_ADM.tif','BAND':1,'FIELD':'DN','EIGHT_CONNECTEDNESS':False,'EXTRA':'','OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/GRID3_ADM.shp'})
# processing.run("gdal:polygonize", {'INPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/WSFE_ADM.tif','BAND':1,'FIELD':'DN','EIGHT_CONNECTEDNESS':False,'EXTRA':'','OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/WSFE_ADM.shp'})

##### Vector math

In [12]:
# Load newly created vectorized datasets.
GRID3_ADM = gpd.read_file("GRID3_ADM.shp")
WSFE_ADM = gpd.read_file("WSFE_ADM.shp")
print(GRID3_ADM.info(), "\n\n", GRID3_ADM.sample(10), "\n\n", GRID3_ADM.crs, "\n\n", 
      WSFE_ADM.info(), "\n\n", WSFE_ADM.sample(10), "\n\n", WSFE_ADM.crs)

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 699644 entries, 0 to 699643
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   DN        699644 non-null  int64   
 1   geometry  699644 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 10.7 MB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 258724 entries, 0 to 258723
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   DN        258724 non-null  int64   
 1   geometry  258724 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 3.9 MB
None 

                 DN                                           geometry
308557   181591102  POLYGON ((-999027.533 1104229.464, -998970.975...
565402  1979512064  POLYGON ((-486353.195 1415020.616, -486240.078...
229515  1542450502  POLYGON ((-806163.118 1392580.605, -806078.280...
31287   2147483647  POLYGON ((-448826.644 169

In [13]:
# Split serial back into separate dataset fields.
# Remember, for Chad: WSFE and ADM: 4+4=8 digits. GRID3 and ADM: 6+4=10 digits.
GRID3_ADM['gridstring'] = GRID3_ADM['DN'].astype(str).str.zfill(10)
WSFE_ADM['gridstring'] = WSFE_ADM['DN'].astype(str).str.zfill(8)

GRID3_ADM['GRID3_OID'] = GRID3_ADM['gridstring'].str[:-4].astype(int) # Remove the last 4 digits to get the GRID3 portion.
GRID3_ADM['ADM'] = GRID3_ADM['gridstring'].str[-4:].astype(int) # Keep only the last 4 digits to get the ADM portion.
WSFE_ADM['year'] = WSFE_ADM['gridstring'].str[:-4].astype(int)
WSFE_ADM['ADM'] = WSFE_ADM['gridstring'].str[-4:].astype(int)

print(GRID3_ADM.sample(10), WSFE_ADM.sample(10))

                DN                                           geometry  \
553430  1856010368  POLYGON ((-646244.017 1432226.630, -646130.900...   
246118  1138772101  POLYGON ((-284015.234 1353235.383, -283986.955...   
407508  2147483647  POLYGON ((-339188.031 1643872.637, -339131.472...   
452142  2147483647  POLYGON ((-378892.084 1586388.908, -378807.246...   
245633  1070141501  POLYGON ((-422413.834 1354589.003, -422328.996...   
697868   218500896  POLYGON ((-864955.657 946758.338, -864899.099 ...   
506011  1427960576  POLYGON ((-987517.883 1519490.000, -987461.324...   
109418  2147483647  POLYGON ((-503688.369 1578176.947, -503631.810...   
638110   210431104  POLYGON ((-960822.280 1155366.220, -960737.443...   
71707   2147483647  POLYGON ((-957711.564 1619206.673, -957626.726...   

        gridstring  GRID3_OID   ADM  
553430  1856010368     185601   368  
246118  1138772101     113877  2101  
407508  2147483647     214748  3647  
452142  2147483647     214748  3647  
245633
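
The split can be checked on individual values taken from the outputs in this notebook. The sketch below repeats the zero-pad-and-slice logic in plain Python (the helper name `split_serial` is hypothetical) and confirms that it agrees with integer division by 10000.

```python
def split_serial(dn: int, total_digits: int, adm_digits: int = 4) -> tuple:
    """Zero-pad the serial, then separate the trailing ADM digits from the rest."""
    padded = str(dn).zfill(total_digits)
    return int(padded[:-adm_digits]), int(padded[-adm_digits:])


grid3_example = split_serial(1856010368, total_digits=10)  # DN of GRID3_ADM row 553430
wsfe_example = split_serial(20010801, total_digits=8)      # a WSFE_ADM serial: year 2001, ADM 801
print(grid3_example, wsfe_example)

# String slicing and integer arithmetic give the same answer.
assert grid3_example == divmod(1856010368, 10_000)
assert wsfe_example == divmod(20010801, 10_000)
```

Note that a DN of 2147483647, which appears several times in the sample, splits into 214748 and 3647, but it is also the largest value a signed 32-bit integer can hold. Rows with that DN may therefore be worth inspecting before the IDs are trusted.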

In [14]:
# Remove features where year, settlement, or admin area = 0.
# This was supposed to be handled earlier by the gdal_calc NoDataValue parameter, but zero-valued features still came through.

print("Before: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))
WSFE_ADM = WSFE_ADM.loc[(WSFE_ADM["year"] != 0) & (WSFE_ADM["ADM"] != 0)] # Since we change the datatype to integer, no need to include all digits. Otherwise, it would need to be: != '0000'
GRID3_ADM = GRID3_ADM.loc[(GRID3_ADM["GRID3_OID"] != 0) & (GRID3_ADM["ADM"] != 0)]
print("After: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))

Before: WSFE (258724, 5) and GRID3 (699644, 5)

After: WSFE (253434, 5) and GRID3 (699644, 5)



In [15]:
GRID3_ADM['GRID3_splitID'] = GRID3_ADM.index
WSFE_ADM['WSFE_ID'] = WSFE_ADM.index

### 4. WSFE AND GRID3
NEAR JOIN: Join GRID3 ID onto WSFE by spatial (within a distance) and attribute matching.

In [91]:
# GRID3_ADM = gpd.read_file("GRID3_ADM_cleaned.shp")
# WSFE_ADM = gpd.read_file("WSFE_ADM_cleaned.shp")

In [81]:
# GRID3_ADM['ADM_copy'] = pd.to_numeric(GRID3_ADM['ADM'])
# WSFE_ADM['ADM_copy'] = pd.to_numeric(WSFE_ADM['ADM'])

In [16]:
WSFE_ADM.ADM.unique()

array([2201, 2301,  202,  201, 2001, 2002, 2303, 1703, 1704, 1701,  602,
       1702,  601, 1401, 1403, 1901,  704,  102,  603,  702,  703,  701,
       1903,  101,  103,  501,  502,  503, 2102, 1402,  401, 2101,  404,
        302, 1801,  301,  402, 1502,  403, 1103, 1501,  303, 1101, 1303,
       1503, 1601, 1301, 1203, 1104, 1202, 1102, 1602, 1603, 1201, 1001,
        804,  803,  801,  901,  802, 1003, 1302,  904, 1002,  906,  902,
        905,  903, 2200, 2300,  200, 2000, 2304, 1700,  600, 1400, 1404,
       1900,  604,  700, 1904,  100,  104,  500,  504,  400, 2100, 1800,
        300, 1500,  304, 1100, 1304, 1504, 1600, 1300, 1204, 1604, 1200,
       1000,  800,  900, 1004])

In [17]:
GRID3_ADM.ADM.unique()

array([3647,  101,  102, 1902, 1702, 1403, 1701,  702,  103,  501,  603,
       1901, 1402, 1903,  502, 2102,  404,  701,  503, 2101,  401,  402,
        301,  302, 1801, 1502, 1501, 1503,  403, 1103,  303, 1101, 1301,
       1303, 1601, 1203, 1104, 1202, 1102, 1602, 1201, 1001, 1603,  804,
        901,  803,  904,  802,  801, 1002, 1003, 1302,  906,  902,  905,
        903,   80,   96,  112,  128,  144,  176,  160,   64,  192, 1920,
       1632, 1376, 1392, 1408, 1424, 1440, 1456, 1728, 1344, 1360, 1712,
       1472,  608,  624,  640,  672,  656,  704,  688,  592,  576, 1888,
        464,  480,  512,  544,  496,  528, 1904,  560,  448, 1872, 1936,
       1968, 1952, 1856, 1984, 2064, 2080, 2096, 2000,  400, 2016,  416,
        432, 2032,  320, 2048, 2112,  336,  368,  352,  384,  304,  288,
        240,  256,  272,  208,  224, 1776, 1744, 1808, 1488, 1504, 1520,
       1568, 1760, 1584, 1536, 1552, 1600, 1840, 1824, 2144, 1216, 1120,
       1088, 1056, 1072, 1264, 1312, 1296, 1280, 13

In [18]:
not_matching = list(set(WSFE_ADM.ADM.unique().tolist()) - set(GRID3_ADM.ADM.unique().tolist()))
not_matching

[2304,
 900,
 1800,
 1300,
 2200,
 2201,
 1304,
 1404,
 1700,
 1703,
 1704,
 300,
 2100,
 1204,
 700,
 703,
 1604,
 200,
 201,
 202,
 1100,
 2001,
 2002,
 600,
 601,
 602,
 604,
 1500,
 100,
 104,
 1000,
 504,
 1900,
 1004,
 500,
 1400,
 1401,
 2300,
 2301,
 2303]

In [19]:
print(WSFE_ADM.info(), "\n\n", WSFE_ADM.sample(5), "\n\n\n", GRID3_ADM.info(), "\n\n", GRID3_ADM.sample(5))

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 253434 entries, 0 to 258723
Data columns (total 6 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   DN          253434 non-null  int64   
 1   geometry    253434 non-null  geometry
 2   gridstring  253434 non-null  object  
 3   year        253434 non-null  int32   
 4   ADM         253434 non-null  int32   
 5   WSFE_ID     253434 non-null  int64   
dtypes: geometry(1), int32(2), int64(2), object(1)
memory usage: 11.6+ MB
<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 699644 entries, 0 to 699643
Data columns (total 6 columns):
 #   Column         Non-Null Count   Dtype   
---  ------         --------------   -----   
 0   DN             699644 non-null  int64   
 1   geometry       699644 non-null  geometry
 2   gridstring     699644 non-null  object  
 3   GRID3_OID      699644 non-null  int32   
 4   ADM            699644 non-null  int32   
 5   GRID3_splitID  6996

In [22]:
WSFE_matches = WSFE_ADM[~WSFE_ADM["ADM"].isin(not_matching)] # Take only the WSFE features that share an ADM with at least one GRID3 feature.
GRID3_matches = GRID3_ADM[GRID3_ADM["ADM"].isin(WSFE_matches["ADM"])] # Likewise, keep only the GRID3 features whose ADM also occurs in WSFE.

new_not_matching = list(set(WSFE_matches.ADM.unique().tolist()) - set(GRID3_matches.ADM.unique().tolist()))
new_not_matching # This should be empty.

[]
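
The matching logic rests on set operations. `not_matching` holds the codes present in WSFE_ADM but absent from GRID3_ADM; the reverse difference is a separate set. The sketch below uses a handful of codes copied from the two `unique()` outputs above to show both directions and the shared set.

```python
# A few ADM codes copied from the unique() outputs above, not the full lists.
wsfe_adm = {2201, 2301, 101, 102, 801, 900}
grid3_adm = {3647, 101, 102, 801, 1902}

only_in_wsfe = wsfe_adm - grid3_adm    # what not_matching captures
only_in_grid3 = grid3_adm - wsfe_adm   # not captured by not_matching
shared = wsfe_adm & grid3_adm          # codes usable as shard keys for the join

print(sorted(only_in_wsfe), sorted(only_in_grid3), sorted(shared))
```

Since the join that follows looks up a GRID3 shard for every ADM code found in WSFE, the property that matters is that each remaining WSFE code is in the shared set.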

In [20]:
# Shard the GRID3 dataframe into a dict keyed by ADM code
shards = {k:d for k, d in GRID3_matches.groupby('ADM')}

In [23]:
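# Group WSFE by ADM and run sjoin_nearest against the GRID3 shard with the same ADM code.
# max_distance is in CRS units (metres for ESRI:102022); features with no match within 500 m keep NaN in the right-hand columns.
WSFE_GRID3 = WSFE_matches.groupby('ADM').apply(
    lambda d: gpd.sjoin_nearest(
        d, shards[d['ADM'].values[0]],
        how='left',
        max_distance=500))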
WSFE_GRID3.sample(10)

| ADM | index | DN_left | geometry | gridstring_left | year | ADM_left | WSFE_ID | index_right | DN_right | gridstring_right | GRID3_OID | ADM_right | GRID3_splitID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 801 | 122278 | 20010801 | POLYGON ((-957372.213 1003760.780, -957343.934... | 20010801 | 2001 | 801 | 122278 | 341474.0 | 41490800.0 | 41490801.0 | 4149.0 | 801.0 | 341474.0 |
| 1202 | 70742 | 19961202 | POLYGON ((-1121561.479 1136656.183, -1121533.2... | 19961202 | 1996 | 1202 | 70742 | 296426.0 | 515191200.0 | 515191202.0 | 51519.0 | 1202.0 | 296426.0 |
| 1002 | 247315 | 19971002 | POLYGON ((-770446.438 1007881.801, -770418.159... | 19971002 | 1997 | 1002 | 247315 | 340273.0 | 377921000.0 | 377921002.0 | 37792.0 | 1002.0 | 340273.0 |
| 804 | 97507 | 19990804 | POLYGON ((-896996.036 1069275.988, -896967.757... | 19990804 | 1999 | 804 | 97507 | 321010.0 | 254020800.0 | 254020804.0 | 25402.0 | 804.0 | 321010.0 |
| 1602 | 205027 | 20151602 | POLYGON ((-955279.549 1119841.215, -955251.270... | 20151602 | 2015 | 1602 | 205027 | 302354.0 | 143461600.0 | 143461602.0 | 14346.0 | 1602.0 | 302354.0 |
| 804 | 227021 | 19980804 | POLYGON ((-932203.690 1063410.301, -932175.411... | 19980804 | 1998 | 804 | 227021 |  |  |  |  |  |  |
| 1603 | 88761 | 20051603 | POLYGON ((-938453.402 1089941.253, -938396.843... | 20051603 | 2005 | 1603 | 88761 | 313851.0 | 327831600.0 | 327831603.0 | 32783.0 | 1603.0 | 313851.0 |
| 302 | 189119 | 20040302 | POLYGON ((-1024139.498 1410388.228, -1024111.2... | 20040302 | 2004 | 302 | 189119 | 220108.0 | 1315070000.0 | 1315070302.0 | 131507.0 | 302.0 | 220108.0 |
| 1002 | 127527 | 19941002 | POLYGON ((-789959.113 978794.011, -789930.834 ... | 19941002 | 1994 | 1002 | 127527 | 345940.0 | 379921000.0 | 379921002.0 | 37992.0 | 1002.0 | 345940.0 |
| 1303 | 77496 | 19851303 | POLYGON ((-631538.812 1114547.057, -631425.695... | 19851303 | 1985 | 1303 | 77496 | 304502.0 | 918781300.0 | 918781303.0 | 91878.0 | 1303.0 | 304502.0 |
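
The idea behind the sharded near join can be shown without GeoPandas. The sketch below is a simplification under stated assumptions: features are reduced to single points with hypothetical coordinates, distances are straight-line, and ties keep one match, whereas `sjoin_nearest` works on full geometries. It only illustrates the two constraints in play: candidates must share the ADM code, and the nearest one must lie within 500 m.

```python
import math
from collections import defaultdict

MAX_DISTANCE = 500.0  # metres, as in max_distance=500


def nearest_within_group(left, right, max_distance=MAX_DISTANCE):
    """Left-join each left record to its nearest right record with the same ADM code.

    Records are (id, adm, x, y) tuples. Left records with no candidate
    inside max_distance are kept with None as the match.
    """
    shards = defaultdict(list)           # right-hand records grouped by ADM code
    for rid, adm, x, y in right:
        shards[adm].append((rid, x, y))

    joined = []
    for lid, adm, x, y in left:
        best_id, best_dist = None, max_distance
        for rid, rx, ry in shards.get(adm, []):
            dist = math.dist((x, y), (rx, ry))
            if dist <= best_dist:
                best_id, best_dist = rid, dist
        joined.append((lid, adm, best_id))
    return joined


# Hypothetical coordinates, for illustration only.
wsfe = [("w1", 801, 0.0, 0.0), ("w2", 801, 5000.0, 0.0), ("w3", 804, 10.0, 10.0)]
grid3 = [("g1", 801, 300.0, 0.0), ("g2", 804, 20.0, 10.0),
         ("g3", 801, 4900.0, 900.0), ("g4", 804, 4990.0, 0.0)]

result = nearest_within_group(wsfe, grid3)
print(result)
```

Here `w2` stays unmatched even though `g4` is only 10 m away, because `g4` belongs to a different ADM and the only same-ADM candidates are farther than 500 m. This mirrors the rows with empty right-hand columns in the sample above.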


In [24]:
# The grouping (sharding) approach created a multi-index of ADM code and original WSFE row index. Resetting it because it interferes with geoviz.
WSFE_GRID3 = WSFE_GRID3.reset_index(level=[0,1])
WSFE_GRID3.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 176309 entries, 0 to 176308
Data columns (total 14 columns):
 #   Column            Non-Null Count   Dtype   
---  ------            --------------   -----   
 0   ADM               176309 non-null  int64   
 1   level_1           176309 non-null  int64   
 2   DN_left           176309 non-null  int64   
 3   geometry          176309 non-null  geometry
 4   gridstring_left   176309 non-null  object  
 5   year              176309 non-null  int32   
 6   ADM_left          176309 non-null  int32   
 7   WSFE_ID           176309 non-null  int64   
 8   index_right       158850 non-null  float64 
 9   DN_right          158850 non-null  float64 
 10  gridstring_right  158850 non-null  object  
 11  GRID3_OID         158850 non-null  float64 
 12  ADM_right         158850 non-null  float64 
 13  GRID3_splitID     158850 non-null  float64 
dtypes: float64(5), geometry(1), int32(2), int64(4), object(2)
memory usage: 17.5+ MB


### 5. ADD NAMES
JOIN FEATURES: UCDB, Africapolis, and GeoNames onto the new GRID3 vectors.

In [26]:
# Clean up before joining: drop unnecessary columns, reproject, and clip.

UCDB = gpd.clip(UCDB[['UC_NM_MN', 'geometry']].rename(columns={"UC_NM_MN": "UCDB_Name"}).to_crs("ESRI:102022"), ADM_vec)
Africapolis = gpd.clip(Africapolis[['agglosName', 'geometry']].rename(columns={"agglosName": "Afpl_Name"}).to_crs("ESRI:102022"), ADM_vec)
GeoNames = gpd.clip(GeoNames[['field_2', 'geometry']].rename(columns={"field_2": "GeoName"}).to_crs("ESRI:102022"), ADM_vec)

print(UCDB.sample(5), "\n\n", Africapolis.sample(5), "\n\n", GeoNames.sample(5))

           UCDB_Name                                           geometry
3340  نيتكتن Bitkine  POLYGON ((-707252.037 1409151.654, -707284.782...
3261       Bourkouri  POLYGON ((-885530.535 1656631.599, -885565.497...
3364    Faya-Largeau  POLYGON ((-614881.354 2087477.502, -614991.892...
3344             Ati  POLYGON ((-693573.374 1550322.274, -693647.251...
3277            Gore  POLYGON ((-869748.783 936977.772, -869784.231 ... 

           Afpl_Name                                           geometry
3622       Dourbali  POLYGON ((-952298.121 1385081.244, -952555.167...
4784  Treguine camp  POLYGON ((-345428.357 1579749.669, -345364.348...
6859      Massakory  POLYGON ((-967110.462 1521595.239, -967169.717...
3865  Goz Amer camp  POLYGON ((-339139.528 1410083.470, -339186.683...
3598   Dosseye Camp  POLYGON ((-871418.652 943218.085, -871340.044 ... 

         GeoName                          geometry
42      Bébédja   POINT (-877950.903 1022673.632)
22          Mao  POINT (-1011099.065

In [27]:
WSFE_GRID3 = gpd.sjoin(WSFE_GRID3, GeoNames, how='left', predicate='contains', lsuffix="G3", rsuffix="GN") # GeoNames is a point layer, so test which polygons contain each point.
WSFE_GRID3 = gpd.sjoin(WSFE_GRID3, Africapolis, how='left', predicate='intersects', lsuffix="G3", rsuffix="Af") # Africapolis is a polygon layer, so test for intersection.
WSFE_GRID3 = gpd.sjoin(WSFE_GRID3, UCDB, how='left', predicate='intersects', lsuffix="G3", rsuffix="UC") # UCDB is a polygon layer, so test for intersection.
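
A left spatial join keeps every settlement row, leaves the name empty where nothing matches, and repeats a settlement row once per match when several names fall on it. The sketch below illustrates that behaviour in plain Python, assuming hypothetical bounding boxes and points in place of the real polygons and name layers. It is not how geopandas implements `sjoin`.

```python
# Hypothetical settlements as bounding boxes (xmin, ymin, xmax, ymax) and named points.
settlements = {"A": (0, 0, 10, 10), "B": (20, 0, 30, 10), "C": (40, 0, 50, 10)}
names = [("Town1", (5, 5)), ("Town2", (6, 7)), ("Town3", (25, 5))]

def contains(box, point):
    """True if the point lies strictly inside the box."""
    xmin, ymin, xmax, ymax = box
    x, y = point
    return xmin < x < xmax and ymin < y < ymax

def left_join(polygons, points):
    """Return (settlement, name) rows; unmatched settlements get None."""
    rows = []
    for key, box in polygons.items():
        matches = [name for name, pt in points if contains(box, pt)]
        if not matches:
            rows.append((key, None))
        rows.extend((key, name) for name in matches)
    return rows

joined = left_join(settlements, names)
# Settlements that appear more than once because several names matched them.
duplicated = sorted({key for key, _ in joined if sum(k == key for k, _ in joined) > 1})
print(joined)
print(duplicated)
```

Comparing the row count before and after each join is a quick way to spot this kind of duplication in the real data.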

In [33]:
print(WSFE_GRID3.info(), "\n\n", WSFE_GRID3.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 176310 entries, 0 to 176308
Data columns (total 20 columns):
 #   Column            Non-Null Count   Dtype   
---  ------            --------------   -----   
 0   ADM               176310 non-null  int64   
 1   level_1           176310 non-null  int64   
 2   DN_left           176310 non-null  int64   
 3   geometry          176310 non-null  geometry
 4   gridstring_left   176310 non-null  object  
 5   year              176310 non-null  int32   
 6   ADM_left          176310 non-null  int32   
 7   WSFE_ID           176310 non-null  int64   
 8   index_right       158851 non-null  float64 
 9   DN_right          158851 non-null  float64 
 10  gridstring_right  158851 non-null  object  
 11  GRID3_OID         158851 non-null  float64 
 12  ADM_right         158851 non-null  float64 
 13  GRID3_splitID     158851 non-null  float64 
 14  index_GN          53 non-null      float64 
 15  GeoName           53 non-null      object  

In [35]:
print(WSFE_GRID3['GeoName'].count(), WSFE_GRID3['Afpl_Name'].count(), WSFE_GRID3['UCDB_Name'].count())

53 75838 51484


In [37]:
WSFE_GRID3 = WSFE_GRID3[['ADM', 'geometry', 'year', 'WSFE_ID', 'GRID3_OID', 'GRID3_splitID', 'GeoName', 'Afpl_Name', 'UCDB_Name']]
print(WSFE_GRID3.info(), "\n\n", WSFE_GRID3.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 176310 entries, 0 to 176308
Data columns (total 9 columns):
 #   Column         Non-Null Count   Dtype   
---  ------         --------------   -----   
 0   ADM            176310 non-null  int64   
 1   geometry       176310 non-null  geometry
 2   year           176310 non-null  int32   
 3   WSFE_ID        176310 non-null  int64   
 4   GRID3_OID      158851 non-null  float64 
 5   GRID3_splitID  158851 non-null  float64 
 6   GeoName        53 non-null      object  
 7   Afpl_Name      75838 non-null   object  
 8   UCDB_Name      51484 non-null   object  
dtypes: float64(2), geometry(1), int32(1), int64(2), object(3)
memory usage: 12.8+ MB
None 

          ADM                                           geometry  year  \
163214  1801  POLYGON ((-1028126.871 1422390.325, -1028098.5...  2010   
11015    303  POLYGON ((-784670.896 1188424.628, -784642.616...  2015   
56595    804  POLYGON ((-904235.522 1061695.716, -904207.242...

In [39]:
# Save intermediate files.
WSFE_GRID3.to_file(driver='GPKG', filename='WSFE_GRID3.gpkg', layer='WSFE_GRID3')

### 6. WSFE CUMULATIVE
DISSOLVE BY YEAR SETS: Create a separate feature layer for each cumulative year.
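
The cumulative logic can be sketched without any geometry: for a given study year, keep every record built from the first year through that year (both ends included), then collapse the records of each settlement into one row. The records below are hypothetical stand-ins for rows of `WSFE_GRID3`, and `n_parts` is an illustrative extra field. The aggregation mirrors the latest-year, smallest-ID choice used in the dissolve.

```python
from collections import defaultdict

# Hypothetical toy records standing in for rows of WSFE_GRID3 (no geometry here).
records = [
    {"GRID3_splitID": 1, "year": 1985, "WSFE_ID": 10},
    {"GRID3_splitID": 1, "year": 2001, "WSFE_ID": 11},
    {"GRID3_splitID": 2, "year": 1999, "WSFE_ID": 20},
    {"GRID3_splitID": 2, "year": 2003, "WSFE_ID": 21},
]

def cumulative_summary(rows, start_year, end_year):
    """Group rows built between start_year and end_year (both inclusive) by settlement."""
    groups = defaultdict(list)
    for row in rows:
        if start_year <= row["year"] <= end_year:
            groups[row["GRID3_splitID"]].append(row)
    # Mirror the dissolve aggregation: latest year, smallest WSFE_ID.
    return {
        split_id: {
            "year": max(r["year"] for r in members),
            "WSFE_ID": min(r["WSFE_ID"] for r in members),
            "n_parts": len(members),
        }
        for split_id, members in groups.items()
    }

summary_2001 = cumulative_summary(records, 1985, 2001)
summary_2003 = cumulative_summary(records, 1985, 2003)
print(summary_2001)
print(summary_2003)
```

Each later study year can only add parts to a settlement, which is why the cumulative layers grow monotonically.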

In [3]:
def CreateList(r1, r2):
    return list(range(r1, r2 + 1)) # Inclusive of both r1 and r2.

CuStart, CuEnd = 1985, 2015
StudyStart, StudyEnd = 1999, 2015

AllCuYears = CreateList(CuStart, CuEnd)
AllStudyYears = CreateList(StudyStart, StudyEnd)
print(AllCuYears, AllStudyYears)

[1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015] [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]


In [57]:
for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    CuYearSet = WSFE_GRID3[WSFE_GRID3['year'].between(CuStart, item, inclusive="both")] # inclusive="both" keeps the endpoint years (1985 and "item") as well as the years between them; the boolean form (inclusive=True) is deprecated in pandas.
    print('Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    CuYearDissolve = CuYearSet.dissolve(by='GRID3_splitID', 
                                        aggfunc={"year": "max", "WSFE_ID":"min", "GRID3_OID":"min", 
                                                 "GeoName":"first", "Afpl_Name":"first", "UCDB_Name":"first"},)
    print('Write to file. %s\n' % time.ctime())
    CuYearName = ''.join(['Cu', str(item)])
    CuYearDissolve.to_file(driver='GPKG', filename='WSFE_CuDissolve.gpkg', layer=CuYearName)
    del CuYearSet, CuYearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1999. Mon Oct 10 20:24:52 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:24:52 2022



Write to file. Mon Oct 10 20:25:05 2022

Subsetting to cumulative area for year: 2000. Mon Oct 10 20:25:09 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:25:09 2022



Write to file. Mon Oct 10 20:25:29 2022

Subsetting to cumulative area for year: 2001. Mon Oct 10 20:25:33 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:25:33 2022



Write to file. Mon Oct 10 20:25:52 2022

Subsetting to cumulative area for year: 2002. Mon Oct 10 20:25:56 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:25:56 2022



Write to file. Mon Oct 10 20:26:15 2022

Subsetting to cumulative area for year: 2003. Mon Oct 10 20:26:20 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:26:20 2022



Write to file. Mon Oct 10 20:26:40 2022

Subsetting to cumulative area for year: 2004. Mon Oct 10 20:26:43 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:26:43 2022



Write to file. Mon Oct 10 20:27:05 2022

Subsetting to cumulative area for year: 2005. Mon Oct 10 20:27:08 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:27:08 2022



Write to file. Mon Oct 10 20:27:24 2022

Subsetting to cumulative area for year: 2006. Mon Oct 10 20:27:27 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:27:27 2022



Write to file. Mon Oct 10 20:27:43 2022

Subsetting to cumulative area for year: 2007. Mon Oct 10 20:27:47 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:27:47 2022



Write to file. Mon Oct 10 20:28:05 2022

Subsetting to cumulative area for year: 2008. Mon Oct 10 20:28:10 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:28:10 2022



Write to file. Mon Oct 10 20:28:31 2022

Subsetting to cumulative area for year: 2009. Mon Oct 10 20:28:34 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:28:34 2022



Write to file. Mon Oct 10 20:28:57 2022

Subsetting to cumulative area for year: 2010. Mon Oct 10 20:29:00 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:29:00 2022



Write to file. Mon Oct 10 20:29:17 2022

Subsetting to cumulative area for year: 2011. Mon Oct 10 20:29:22 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:29:22 2022



Write to file. Mon Oct 10 20:29:45 2022

Subsetting to cumulative area for year: 2012. Mon Oct 10 20:29:47 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:29:48 2022



Write to file. Mon Oct 10 20:30:06 2022

Subsetting to cumulative area for year: 2013. Mon Oct 10 20:30:08 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:30:08 2022



Write to file. Mon Oct 10 20:30:30 2022

Subsetting to cumulative area for year: 2014. Mon Oct 10 20:30:34 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:30:34 2022



Write to file. Mon Oct 10 20:30:57 2022

Subsetting to cumulative area for year: 2015. Mon Oct 10 20:31:01 2022

Dissolving so that each unique settlement (GRID3_splitID) has a single cumulative WSFE feature. Mon Oct 10 20:31:01 2022



Write to file. Mon Oct 10 20:31:24 2022

Done with all years in set. Mon Oct 10 20:31:27 2022


---

### Associating yearly datasets. (next notebook)

##### SETTLEMENT EXTENTS BY YEAR

Concatenate for cumulative year versions.

##### THIESSEN (VORONOI) POLYGONS

For each year, demarcate the surrounding space that is closer to a particular feature than to any other feature in the year set.

##### BUFFER

Buffer each built-up polygon and use that buffer to clip the Thiessen areas.
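
Taken together, the Thiessen and buffer steps assign each location to its nearest settlement, but only if that settlement lies within the buffer distance. The following is a minimal sketch of that rule, assuming hypothetical point seeds in projected metres. The real features are polygons, so actual distances would be measured to polygon edges rather than to a single point.

```python
import math

# Hypothetical settlement centroids (projected metres) standing in for built-up features.
seeds = {"s1": (0.0, 0.0), "s2": (5000.0, 0.0)}
BUFFER_M = 2000.0  # the 2 km buffer distance used in this notebook

def allocate(point, seeds, max_dist):
    """Return the id of the nearest seed, or None if that seed lies beyond max_dist."""
    nearest_id, nearest_dist = min(
        ((sid, math.dist(point, xy)) for sid, xy in seeds.items()),
        key=lambda pair: pair[1],
    )
    return nearest_id if nearest_dist <= max_dist else None

samples = [(1000.0, 0.0), (2400.0, 0.0), (3500.0, 0.0), (2600.0, 0.0)]
zones = [allocate(p, seeds, BUFFER_M) for p in samples]
print(zones)
```

Locations returned as `None` fall in the gap between settlements: they have a nearest neighbour, but the buffer clip removes them from every zone.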

##### MASK

Use the buffered Thiessen polygons to reduce the raster size and limit the area included in the zonal stats.

##### ZONAL STATISTICS

Summarize raster data by the buffered Thiessen polygons.
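
Masking and zonal statistics can be pictured on a small grid: cells outside every zone are skipped, and the remaining cell values are summed per zone. The sketch below assumes a hypothetical 3x4 population grid and a matching grid of zone ids. It illustrates the idea only and does not stand in for a raster library.

```python
# Hypothetical 3x4 population grid and a matching grid of zone ids (None = outside every zone).
population = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 1.0, 2.0, 3.0],
]
zone_ids = [
    ["a", "a", None, "b"],
    ["a", None, None, "b"],
    [None, None, "b", "b"],
]

def zonal_sum(values, zones):
    """Sum cell values per zone, skipping masked cells."""
    totals = {}
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None:  # masked out: cell is outside the buffered Thiessen polygons
                continue
            totals[zone] = totals.get(zone, 0.0) + value
    return totals

totals = zonal_sum(population, zone_ids)
print(totals)
```

Masking first means the summary only ever touches cells that belong to a zone, which is the reason for reducing the raster before running the statistics.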

#### Input datasets

In [4]:
BuiltAreaList = fiona.listlayers("WSFE_CuDissolve.gpkg")
print(BuiltAreaList)

['Cu1985', 'Cu1986', 'Cu1987', 'Cu1988', 'Cu1989', 'Cu1990', 'Cu1991', 'Cu1992', 'Cu1993', 'Cu1994', 'Cu1995', 'Cu1996', 'Cu1997', 'Cu1998', 'Cu1999', 'Cu2000', 'Cu2001', 'Cu2002', 'Cu2003', 'Cu2004', 'Cu2005', 'Cu2006', 'Cu2007', 'Cu2008', 'Cu2009', 'Cu2010', 'Cu2011', 'Cu2012', 'Cu2013', 'Cu2014', 'Cu2015']
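
Later loops read layers by name ("Cu" plus the year), so it can help to confirm up front that every study year has a matching layer. This is a small sketch of such a check, assuming a hypothetical shortened layer list. In the notebook the inputs would be `BuiltAreaList` and `AllStudyYears`.

```python
import re

# Hypothetical subset of layer names; in the notebook this would be BuiltAreaList.
layer_names = ["Cu1998", "Cu1999", "Cu2001"]
study_years = range(1999, 2002)  # 1999, 2000, 2001

# Pull the numeric portion of each layer name and compare against the study years.
layer_years = {int(re.sub(r"[^0-9]", "", name)) for name in layer_names}
missing_years = [year for year in study_years if year not in layer_years]
print(missing_years)
```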


In [5]:
Boundary = gpd.read_file('TCD_ADM2.shp')#; Boundary.crs = "ESRI:102022"
print(Boundary.info())
print(Boundary.crs)

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 70 entries, 0 to 69
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   ADM0_CODE  70 non-null     int64   
 1   ADM0_NAME  70 non-null     object  
 2   ADM1_CODE  70 non-null     int64   
 3   ADM1_NAME  70 non-null     object  
 4   ADM2_CODE  70 non-null     int64   
 5   ADM2_NAME  70 non-null     object  
 6   geometry   70 non-null     geometry
dtypes: geometry(1), int64(3), object(3)
memory usage: 4.0+ KB
None
PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["s

In [49]:
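PopRasterList = [rioxarray.open_rasterio(item, masked=True) 
                 for item in sorted(glob.glob(r'C:\Users\grace\GIS\povertyequity\urban_growth\TCD_WorldPop' 
                                              + r'\**\*.tif', recursive=True))] # Separator added before ** so the recursive wildcard also searches subfolders; sorted() keeps the rasters in file-name order.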

print(PopRasterList)

for item in PopRasterList:
    print(item.rio.crs) 

[<xarray.DataArray (band: 1, y: 19698, x: 11966)>
[235706268 values with dtype=float32]
Coordinates:
  * band         (band) int32 1
  * x            (x) float64 -1.208e+06 -1.208e+06 ... -1.039e+05 -1.038e+05
  * y            (y) float64 2.694e+06 2.694e+06 ... 8.768e+05 8.767e+05
    spatial_ref  int32 0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0, <xarray.DataArray (band: 1, y: 19698, x: 11966)>
[235706268 values with dtype=float32]
Coordinates:
  * band         (band) int32 1
  * x            (x) float64 -1.208e+06 -1.208e+06 ... -1.039e+05 -1.038e+05
  * y            (y) float64 2.694e+06 2.694e+06 ... 8.768e+05 8.767e+05
    spatial_ref  int32 0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0, <xarray.DataArray (band: 1, y: 19698, x: 11966)>
[235706268 values with dtype=float32]
Coordinates:
  * band         (band) int32 1
  * x            (x) float64 -1.208e+06 -1.208e+06 ... -1.039e+05 -1.038e+05
  * 

#### Settlement extents by year (cumulative built areas)

In [59]:
CheckingContents = gpd.read_file("WSFE_CuDissolve.gpkg", layer=5)
CheckingContents.info() # We want our BuiltAllYears dataframe below to have the same fields as the pre-appended layers.
del CheckingContents

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1619 entries, 0 to 1618
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   GRID3_splitID  1619 non-null   float64 
 1   ADM            1619 non-null   int64   
 2   year           1619 non-null   int64   
 3   WSFE_ID        1619 non-null   int64   
 4   GRID3_OID      1619 non-null   float64 
 5   GeoName        34 non-null     object  
 6   Afpl_Name      137 non-null    object  
 7   UCDB_Name      68 non-null     object  
 8   geometry       1619 non-null   geometry
dtypes: float64(2), geometry(1), int64(3), object(3)
memory usage: 114.0+ KB


In [6]:
BuiltAllYears = gpd.GeoDataFrame(
    columns=['GRID3_splitID', 'ADM', 'year', 'WSFE_ID', 'GRID3_OID', 'GeoName', 'Afpl_Name', 'UCDB_Name', 'geometry'], 
    geometry='geometry', crs = "ESRI:102022")
print(BuiltAllYears.info())
print(BuiltAllYears.crs)
print(BuiltAllYears.head())

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 0 entries
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   GRID3_splitID  0 non-null      object  
 1   ADM            0 non-null      object  
 2   year           0 non-null      object  
 3   WSFE_ID        0 non-null      object  
 4   GRID3_OID      0 non-null      object  
 5   GeoName        0 non-null      object  
 6   Afpl_Name      0 non-null      object  
 7   UCDB_Name      0 non-null      object  
 8   geometry       0 non-null      geometry
dtypes: geometry(1), object(8)
memory usage: 108.0+ bytes
None
ESRI:102022
Empty GeoDataFrame
Columns: [GRID3_splitID, ADM, year, WSFE_ID, GRID3_OID, GeoName, Afpl_Name, UCDB_Name, geometry]
Index: []


In [7]:
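# Read each cumulative layer, tag it with its cumulative year, and concatenate once at the end.
LayerFrames = []
for item in BuiltAreaList:
    CuYear = int(re.sub(r'[^0-9]', '', item)) # Pull the year (e.g. 2005) from the numeric portion of the layer name.
    TempItem = gpd.read_file("WSFE_CuDissolve.gpkg", layer=item)
    TempItem["CuYear"] = CuYear
    LayerFrames.append(TempItem)
BuiltAllYears = pd.concat([BuiltAllYears] + LayerFrames)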

BuiltAllYears.to_file(driver="GPKG", filename="WSFE_Cumulative_TCD.gpkg", layer="WSFE_AllYears")
BuiltAllYears.sample(20)

| | GRID3_splitID | ADM | year | WSFE_ID | GRID3_OID | GeoName | Afpl_Name | UCDB_Name | geometry | CuYear |
|---|---|---|---|---|---|---|---|---|---|---|
| 381 | 268380.0 | 1103.0 | 1985 | 65153 | 65796.0 | | | | MULTIPOLYGON (((-940348.111 1263475.337, -9403... | 1986 |
| 396 | 272656.0 | 1101.0 | 1985 | 65279 | 60383.0 | | | | MULTIPOLYGON (((-981692.359 1239561.384, -9816... | 1991 |
| 547 | 300613.0 | 1602.0 | 1985 | 74780 | 14359.0 | | | | MULTIPOLYGON (((-955194.712 1122187.490, -9551... | 1992 |
| 1028 | 337447.0 | 1002.0 | 1985 | 112834 | 38179.0 | | | | POLYGON ((-764309.843 1017657.945, -764309.843... | 1991 |
| 863 | 322251.0 | 804.0 | 1985 | 97883 | 25393.0 | | | | MULTIPOLYGON (((-907770.427 1066237.863, -9077... | 1998 |
| 230 | 199206.0 | | 2005 | 28198 | 127723.0 | | | | MULTIPOLYGON (((-1039325.450 1441250.764, -103... | 2005 |
| 1309 | 343781.0 | | 1999 | 124932 | 4123.0 | | | | MULTIPOLYGON (((-954911.919 992901.739, -95496... | 1999 |
| 1212 | 343502.0 | 1002.0 | 1994 | 124486 | 38088.0 | | | | MULTIPOLYGON (((-773048.128 993864.314, -77304... | 1997 |
| 259 | 201387.0 | | 1998 | 28458 | 131989.0 | | | | MULTIPOLYGON (((-1027080.539 1435776.123, -102... | 2013 |
| 515 | 293468.0 | | 2000 | 70346 | 91879.0 | | Alaroro | | MULTIPOLYGON (((-617229.517 1142732.433, -6172... | 2006 |


In [8]:
print(BuiltAllYears.crs)

ESRI:102022


#### Thiessen polygons (Voronoi polygons): For each year, demarcate the surrounding space that is closer to a particular feature than to any other feature in the year set.
#### Then, buffer each built-up polygon and use that buffer to clip the Thiessen areas.

In [None]:
# If CRSs do not match:
# PopRasterList = [item.rio.reproject(BuiltAllYears.crs) for item in PopRasterList]
# Boundary = Boundary.to_crs(BuiltAllYears.crs)

In [9]:
Boundary = Boundary.dissolve()
Boundary.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   geometry   1 non-null      geometry
 1   ADM0_CODE  1 non-null      int64   
 2   ADM0_NAME  1 non-null      object  
 3   ADM1_CODE  1 non-null      int64   
 4   ADM1_NAME  1 non-null      object  
 5   ADM2_CODE  1 non-null      int64   
 6   ADM2_NAME  1 non-null      object  
dtypes: geometry(1), int64(3), object(3)
memory usage: 172.0+ bytes


In [None]:
# AllStudyYears = CreateList(StudyStart, StudyEnd)

In [14]:
for item in AllStudyYears:
    print('Loading layer %s. %s\n' % (item, time.ctime()))
    CuYear = ''.join(['Cu', str(item)])
    Layer = gpd.read_file("WSFE_CuDissolve.gpkg", layer=CuYear) # Read in the layer as a geodataframe.
    print('Create a buffer around the features in the original layer. %s\n' % time.ctime())
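    BufferLayer = Layer.copy() # Work on a copy so the original layer is not modified through a shared reference.
    BufferLayer['geometry'] = BufferLayer['geometry'].apply(make_valid) # Repair invalid geometries (e.g. self-intersections) before buffering.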
    BufferLayer['geometry'] = BufferLayer['geometry'].buffer(2000) # Create a 2km buffer around the original feature.
    BufferLayer.to_file(driver='GPKG', filename='WSFE_Buffer.gpkg', layer=''.join(['Buff_', str(item)]))
    print('Buffered version finished and saved to file. %s\n' % time.ctime())
    del CuYear, Layer, BufferLayer
print("All years finished buffering. %s" % time.ctime())

Loading layer 1999. Mon Oct 10 20:59:43 2022

Create a buffer around the features in the original layer. Mon Oct 10 20:59:43 2022

Buffered version finished and saved to file. Mon Oct 10 21:03:36 2022

Loading layer 2000. Mon Oct 10 21:03:36 2022

Create a buffer around the features in the original layer. Mon Oct 10 21:03:36 2022

Buffered version finished and saved to file. Mon Oct 10 21:06:44 2022

Loading layer 2001. Mon Oct 10 21:06:45 2022

Create a buffer around the features in the original layer. Mon Oct 10 21:06:45 2022

Buffered version finished and saved to file. Mon Oct 10 21:09:50 2022

Loading layer 2002. Mon Oct 10 21:09:50 2022

Create a buffer around the features in the original layer. Mon Oct 10 21:09:50 2022

Buffered version finished and saved to file. Mon Oct 10 21:12:53 2022

Loading layer 2003. Mon Oct 10 21:12:53 2022

Create a buffer around the features in the original layer. Mon Oct 10 21:12:53 2022

Buffered version finished and saved to file. Mon Oct 10 21:15

In [15]:
for item in AllStudyYears:
    print('Loading layer %s. %s\n' % (item, time.ctime()))
    CuYear = ''.join(['Cu', str(item)])
    Layer = gpd.read_file("WSFE_CuDissolve.gpkg", layer=CuYear) # Read in the layer as a geodataframe.
    Buffer = gpd.read_file("WSFE_Buffer.gpkg", layer=''.join(['Buff_', str(item)]))
    print('Loaded. Assigning year field. %s\n' % time.ctime())
    Layer["CuYear"] = item # Give geodataframe a field where every value is the year of cumulative buildup represented by the layer. This will be useful if concatenating all the layers together into a single dataset.
    print('Assigned. Drawing Thiessen (Voronoi) polygons using buffer as the bounding area. %s\n' % time.ctime())
    ThiessenLayer = voronoiDiagram4plg(Layer, Buffer) # Demarcate the area around each feature which is closer to that feature than any other feature.
    ThiessenLayer.to_file(driver='GPKG', filename='WSFE_ThiessenBuffer.gpkg', layer=''.join(['ThBuff_', str(item)]))
    print('Polygons drawn and written to file. %s\n' % time.ctime())
    del CuYear, Buffer, Layer, ThiessenLayer
print("All years finished drawing near polygons. %s" % time.ctime())

Loading layer 1999. Mon Oct 10 22:45:36 2022

Loaded. Assigning year field. Mon Oct 10 22:45:37 2022

Assigned. Drawing Thiessen (Voronoi) polygons using buffer as the bounding area. Mon Oct 10 22:45:37 2022



TopologyException: Input geom 1 is invalid: Self-intersection at -1208584.4164925045 1316968.232589456


ValueError: Could not create Voronoi Diagram with the specified inputs.
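
The `TopologyException` above reports a self-intersection, meaning that two non-adjacent edges of a polygon ring cross each other. The sketch below shows that condition in plain Python on a hypothetical "bow-tie" ring. It only detects proper crossings (not touching or overlapping edges) and is meant to illustrate the error, not to replace a geometry library's validity check (for example shapely's `is_valid` and the `make_valid` function already used in this notebook).

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)

def segments_cross(a, b, c, d):
    """True if segment ab and segment cd cross at a single interior point."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)

def ring_self_intersects(ring):
    """Check every pair of non-adjacent edges of a closed ring (first vertex not repeated)."""
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                continue  # adjacent edges share a vertex by construction
            if segments_cross(*edges[i], *edges[j]):
                return True
    return False

square = [(0, 0), (2, 0), (2, 2), (0, 2)]   # valid ring
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2)]   # edges cross at (1, 1)
print(ring_self_intersects(square), ring_self_intersects(bowtie))
```

Since the error names "Input geom 1", one plausible place to look is the buffered layer passed as the bounding area. Re-validating the geometries after buffering, and not only before, may be worth trying.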

In [None]:
for item in BuiltAreaList:
    print('Loading Thiessen areas and buffered polygons from layer %s. %s\n' % (item, time.ctime()))
    CuYear = re.sub(r'[^0-9]', '', item) # Pull the year of feature layer (e.g. "2005") from the numeric portion of the layer name.
    ThiessenLayer = gpd.read_file(filename=''.join(['Thies_', CuYear, '.shp'])) # Read in the layer as a geodataframe.
    BufferLayer = gpd.read_file(filename=''.join(['Buff_', CuYear, '.shp']))
    Layer = gpd.read_file("WSFE_CuDissolve.gpkg", layer=item) # Original cumulative features, needed below to restore the attributes.
    print('Now clipping the Thiessen polygons with the buffer. %s\n' % time.ctime())
    ThiessenBufferLayer = gpd.clip(ThiessenLayer, BufferLayer) # Clip the demarcated area so that coverage ends at the 2km mark. This will be both the mask used to reduce the file size of the population rasters, and the zones used to summarize the pop data during zonal statistics.
    print('Clipped. Polygons did not retain feature attributes. Joining back on. %s\n' % time.ctime())
    ThiessenBufferLayer = ThiessenBufferLayer.merge(Layer.drop(columns='geometry'), how='left', left_index=True, right_index=True) # The Voronoi function does not retain the attributes (leaves them all null), so join them back on by index. The geometry column is dropped from Layer to avoid a duplicate geometry field.
    print('Finished! Writing to file. %s\n' % time.ctime())
    ThiessenBufferLayer.to_file(driver='ESRI Shapefile', filename=''.join(['ThBuf_', CuYear, '.shp']))
    print(ThiessenBufferLayer.sample(5))
    print('\nNext layer. %s\n' % time.ctime())
    del CuYear, ThiessenLayer, BufferLayer, Layer, ThiessenBufferLayer

#### Masking function to apply to each raster using each WSFE year

In [None]:
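import os
import numpy as np
def masking_function(Raster, PolyMask): 
    # Assumes Raster was opened with rioxarray (source path kept in .encoding) and that the year sits at characters 8-12 of the file name, e.g. "tcd_ppp_2005_UNadj.tif".
    RasterYear = int(os.path.basename(Raster.encoding["source"])[8:12]) 
    PolyMaskYear = PolyMask[PolyMask["MAX_year"] == RasterYear] # Keep only the zones belonging to the raster's year.
    ScaledRaster = np.trunc(Raster * 100) # Scale and truncate to whole numbers; values stay float so that masked cells remain NaN.
    MaskedRaster = ScaledRaster.rio.clip(PolyMaskYear.geometry, PolyMaskYear.crs, drop=True) 
    return MaskedRaster 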

In [None]:
MaskedRasterList = [masking_function(item, BuiltThiessenBuffer) for item in PopRasterList] 
print(MaskedRasterList)

#### Zonal statistics