# Spatiotemporal Trends in Urbanization: Cameroon
*Using yearly estimates (2000-2015) of population, built-area, and economic indicators to track city-by-city growth and change over time.*

---

### Research questions 

#### 1. How has the size of Settlement X changed over time? 

- Population size 

- Geographical extents 

- Population density 

#### 2. In what year did Settlement X become a new urban class?  

- From semi-dense to high-density city 

- Small settlement area to built-up area 

- When a hamlet area or small settlement area first appeared

#### 3. Is there a discernable pattern between the spatio-temporal distribution of economic density and population density? 

#### 4. How much of urban space attributable to City X is outside of the administrative limits of the city? 

- When did the fragment(s) appear? 

- Which district/municipality/authority has purview over the fragment(s)? 

#### 5. For the questions above, how does the answer change based on different understandings of urban limits? 

- Scenario A: where "city" is delimited by an official administrative boundary 

- Scenario B: where "city" includes all contiguous (and near-contiguous) built up area 

#### 6. Subnational and inter-national comparisons. Examples: 

- Compare the growth rates (population, built-up area, economic…) of the fastest-growing settlement in each ADM1 region. 

- Which African metropoles experience the most vs. the least fragmentation? Is there a confluence between amount of urban fragmentation and rate of densification? 

### Datasets
1. Most up-to-date administrative boundaries: **ADM3.**
2. Built-up area, yearly: **World Settlement Footprint Evolution.** Resolution: 30m.
3. Settlement types: **GRID3 settlement extents.** Captured between 2009-2019.
4. Population, yearly: **WorldPop.** UN-adjusted, unconstrained. Resolution: 100m.
5. Nighttime lights, yearly: **Harmonization of DMSP and VIIRS.** Resolution: 1km.
6. City names: **UCDB, Africapolis, and GeoNames.**

---

---

## 1. PREPARE WORKSPACE

### 1.1 Off-script

##### Off-script: Create folders in working directory.
> *ADM
<br>Buildup
<br>PlaceName
<br>Population
<br>Settlement
<br>NTL*

##### Off-script: Download datasets (as shapefile, GeoJSON, or tif where possible) and place or extract into corresponding folder:
- ADM: *Sourced internally.*
- Buildup: https://download.geoservice.dlr.de/WSF_EVO/files/
- PlaceName: 
    - GeoNames: (file: cities500.zip) https://download.geonames.org/export/dump/
    - Africapolis: https://africapolis.org/en/data
    - Urban Centres Database: https://ghsl.jrc.ec.europa.eu/ghs_stat_ucdb2015mt_r2019a.php
- Population: https://hub.worldpop.org/geodata/listing?id=69
- Settlement: https://data.grid3.org/datasets/GRID3::grid3-cameroon-settlement-extents-version-01-01-/explore
- NTL: https://figshare.com/articles/dataset/Harmonization_of_DMSP_and_VIIRS_nighttime_light_data_from_1992-2018_at_the_global_scale/9828827/2

##### Other off-script:
- Convert GeoNames from .txt file to shape (delimiter = tab, header rows = 0) and rename fields.
- If necessary, mosaic WSFE rasters that cover the area of interest to create a single file.
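
The GeoNames conversion can also be done in Python instead of a desktop GIS. The sketch below assumes the column order documented in the GeoNames readme; the short field names and the `read_geonames` helper are our own choices, and the sample row is invented for illustration only.

```python
import csv
import io

import pandas as pd

# Column order as documented in the GeoNames readme; the short names are our own.
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "feature_class", "feature_code", "country_code", "cc2", "admin1_code",
    "admin2_code", "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]


def read_geonames(source):
    """Read a tab-delimited GeoNames dump (no header row) into a DataFrame."""
    return pd.read_csv(
        source,
        sep="\t",
        header=None,                  # header rows = 0
        names=GEONAMES_COLUMNS,       # rename fields on load
        dtype={"cc2": str, "admin1_code": str, "admin2_code": str,
               "admin3_code": str, "admin4_code": str},
        na_values=[""],               # only empty fields count as missing...
        keep_default_na=False,        # ...so codes such as "NA" stay as text
        quoting=csv.QUOTE_NONE,       # place names may contain quote characters
    )


# Invented sample row, for illustration only.
SAMPLE_ROW = "\t".join([
    "1", "Example Town", "Example Town", "", "4.05", "9.70", "P", "PPL", "CM",
    "", "05", "", "", "", "12000", "", "15", "Africa/Douala", "2020-01-01",
])

places = read_geonames(io.StringIO(SAMPLE_ROW + "\n"))
print(places[["name", "latitude", "longitude", "country_code"]])
```

From there, `gpd.GeoDataFrame(places, geometry=gpd.points_from_xy(places.longitude, places.latitude), crs="EPSG:4326")` should give a point layer that can be saved into the PlaceName folder.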

### 1.2 Load all packages.

In [None]:
# Note: Most but not all of these packages were used in final form. 

import os, sys, glob, re, time
from os.path import exists
from functools import reduce

import geopandas as gpd 
import pandas as pd
from shapely.geometry import Point, LineString, Polygon, shape, MultiPoint
from shapely.ops import unary_union # Replaces cascaded_union, which was removed in Shapely 2.0.
from shapely.validation import make_valid, explain_validity
import shapely.wkt
import scipy

#from xrspatial import zonal_stats 
#import xarray as xr 
import numpy as np 
import fiona, rioxarray
import rasterio
from rasterio.plot import show
from rasterio import features
from rasterio.features import shapes
from rasterio import mask
from osgeo import gdal, osr, ogr, gdal_array
import matplotlib.pyplot as plt

### 1.3 Set workspace.

In [None]:
ProjectFolder = os.getcwd()
ResultsFolder = os.path.join(ProjectFolder, 'Results')
print(ProjectFolder)
print(ResultsFolder)

---

## 2. PREPARE BUILDUP, SETTLEMENT, AND ADMIN DATASETS
Projection for all datasets: Africa Albers Equal Area Conic

### 2.1 WSFE: Check contents and change NoData value as necessary.

##### Off-script: Run this block in QGIS.

In [None]:
# # OPEN QGIS FOR THIS PORTION. CODE DOCUMENTED HERE.
# Change NoData value to zero, as this won't interfere with a possible value of 99999 in GRID3 and ADM.
# Then make sure there are no values above 2015 (such as 99999) or below 1985 in the dataset by reclassifying them as NoData.
# Was having trouble with rasterio & gdal here, so moved to QGIS.

# processing.run("native:reclassifybytable", {'INPUT_RASTER':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Buildup/WSFE_CMN.tif','RASTER_BAND':1,'TABLE':['2016','','0','','1984','0'],'NO_DATA':0,'RANGE_BOUNDARIES':0,'NODATA_FOR_MISSING':False,'DATA_TYPE':5,'OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Buildup/WSFE.tif'})
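
For readers without QGIS, the reclassification rule can be sketched with NumPy on an in-memory array. This is only an illustration of the rule (anything outside 1985-2015 becomes the NoData value 0), not the command that was actually run; `clean_wsfe_years` is a hypothetical helper, and reading and writing the raster is left out.

```python
import numpy as np


def clean_wsfe_years(band, first_year=1985, last_year=2015, nodata=0):
    """Set every cell outside [first_year, last_year] to the NoData value."""
    band = np.asarray(band)
    valid = (band >= first_year) & (band <= last_year)
    return np.where(valid, band, nodata).astype(band.dtype)


# Small made-up array standing in for one WSFE band.
sample = np.array([[0, 1985, 2015],
                   [2016, 99999, 1984]], dtype=np.uint32)
cleaned = clean_wsfe_years(sample)
print(cleaned)
```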

### 2.2 Prepare raster locations for GRID3 and Admin areas

In [None]:
ADM_vec = gpd.read_file(glob.glob('ADM/*.shp')[0])[['geometry']].to_crs("ESRI:102022") # This glob() function pulls the first file ([0]) in the ADM folder which ended in '.shp'
GRID3_vec = gpd.read_file(glob.glob('Settlement/*.shp')[0])[['type','geometry']].to_crs("ESRI:102022")
ADM_vec['ADM_ID'] = range(0,len(ADM_vec))
GRID3_vec['G3_ID'] = range(0,len(GRID3_vec))
ADM_vec.to_file(driver='GPKG', filename=r'ADM/ADM_warp.gpkg', layer='ADM')
GRID3_vec.to_file(driver='GPKG', filename=r'Settlement/Settlement_warp.gpkg', layer='GRID3')

In [None]:
ADM_vec = gpd.read_file(r'ADM/ADM_warp.gpkg', layer='ADM')
GRID3_vec = gpd.read_file(r'Settlement/Settlement_warp.gpkg', layer='GRID3')

print(ADM_vec.info(), "\n\n", 
      ADM_vec.sample(5),
      ADM_vec.crs, "\n\n", 
      len(str(ADM_vec['ADM_ID'].max()))) # We need to know how many digits need to be allocated to each dataset in the "join" serial.
print(GRID3_vec.info(), "\n\n",
      GRID3_vec.sample(5),
      GRID3_vec.crs, "\n\n", 
      len(str(GRID3_vec['G3_ID'].max())))

---

## 3. WSFE AND ADM; GRID3 AND ADM
RASTERIZE: Bring ADM and GRID3 into raster space.

RASTER MATH: "Join" ADM ID onto GRID3 and onto WSFE by creating unique concatenation string.

VECTORIZE: Bring joined data into vector space.

VECTOR MATH: Split unique ID from raster math step into separate columns.

### 3.1 Reproject WSFE to project CRS.

In [None]:
WSFE_in = glob.glob('Buildup/*.tif')[0]
WSFE_warp = './Buildup/WSFE_warp.tif'
ProjCRS = gdal.WarpOptions(dstSRS='ESRI:102022')

In [None]:
Warp = gdal.Warp(WSFE_warp, # Where to store the warped raster
                 WSFE_in, # Which raster to warp
                 format='GTiff', 
                 options=ProjCRS) # Reproject to Africa Albers Equal Area Conic
Warp = None
print('Reprojected dataset. %s' % time.ctime())

try:  
    os.remove(os.path.join(ProjectFolder, WSFE_in))
except OSError:
    pass
print('Removed (or skipped if error) intermediate file. %s' % time.ctime())

In [None]:
WSFE = rasterio.open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))
print(WSFE) # WSFE values are all 4 digits long (1985-2015)
print(dir(WSFE))
print(WSFE.crs)
print(WSFE.dtypes)
NoDataValue = WSFE.nodatavals
print(NoDataValue)
print(WSFE.read(1).min(), WSFE.read(1).mean(), np.median(WSFE.read(1)), WSFE.read(1).max())

# If NoDataValue != 0, change to 0. (See step 2.1)

### 3.2 Rasterize admin areas and GRID3 using WSFE specs.

In [None]:
# Copy and update the metadata from WSFE for the output
meta = WSFE.meta.copy()
meta.update(compress='lzw')
WSFE.meta

ADM_out = './ADM/ADM_rasterized.tif'
GRID3_out = './Settlement/GRID3_rasterized.tif'

In [None]:
print("Rasterizing dataset. %s" % time.ctime())
with rasterio.open(ADM_out, 'w+', **meta) as out:
    out_arr = out.read(1)

    # Create a generator of (geometry, value) pairs to burn into the raster.
    # Named geom_values so it does not shadow rasterio.features.shapes imported in 1.2.
    geom_values = ((geom, value) for geom, value in zip(ADM_vec.geometry, ADM_vec.ADM_ID))

    burned = features.rasterize(shapes=geom_values, fill=0, out=out_arr, transform=out.transform)
    out.write_band(1, burned)
out = None

In [None]:
print("Rasterizing dataset. %s" % time.ctime())
with rasterio.open(GRID3_out, 'w+', **meta) as out:
    out_arr = out.read(1)

    # Create a generator of (geometry, value) pairs to burn into the raster.
    # Named geom_values so it does not shadow rasterio.features.shapes imported in 1.2.
    geom_values = ((geom, value) for geom, value in zip(GRID3_vec.geometry, GRID3_vec.G3_ID))

    burned = features.rasterize(shapes=geom_values, fill=0, out=out_arr, transform=out.transform)
    out.write_band(1, burned)
out = None

*Validation: Check the dimensions, data type, and basic stats of the three rasters. All three should share the same dimensions and NoData value.*

In [None]:
RastersList = [gdal.Open(r"ADM/ADM_rasterized.tif"), 
               gdal.Open(r"Settlement/GRID3_rasterized.tif"),
               gdal.Open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))]

for item in RastersList:
    print(gdal.GetDataTypeName(item.GetRasterBand(1).DataType), 
          item.GetRasterBand(1).GetNoDataValue(),
         "\n\n")

RastersList = None

RastersList = [rasterio.open(r"ADM/ADM_rasterized.tif"), 
               rasterio.open(r"Settlement/GRID3_rasterized.tif"), 
               rasterio.open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))]

for item in RastersList:
    print(item.name, "\nBands= ", item.count, "\nWxH= ", item.width, "x", item.height, "\n\n")

stats = []
for item in RastersList:
    band = item.read(1)
    stats.append({
        'raster': item.name,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})

# Show stats for each raster
print("\n", stats)

RastersList = None
band = None

### 3.2 Raster math to "join" admin to GRID3 and to WSFE.
Processing is faster when "joining" (that is, building serial codes from two datasets) in raster space rather than vector space.
Here, we concatenate the ID fields of the two datasets into a single serial number, which we later split in vector space to recover the two ID fields.

*The join ID is built by arithmetic: multiply raster A by a power of ten and add raster B, which has the same effect as concatenating their ID strings. The number of zeros in the multiplier equals the number of digits in the maximum value of the "B" dataset (e.g. Chad ADM codes go up to 4 digits, so there it is calc=(A*10000)+B; the commands below use (A*1000)+B because the ADM_ID values here have at most 3 digits).*
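
The arithmetic can be checked on a couple of values before running it on full rasters. The sketch below uses NumPy arrays in place of the rasters. `join_ids` is a hypothetical helper, and the sample values are picked so that the results match the minimum and maximum WSFE codes printed later in this notebook.

```python
import numpy as np


def join_ids(a, b, b_digits):
    """Build a serial code A * 10**b_digits + B from two integer ID arrays."""
    multiplier = 10 ** b_digits
    a = np.asarray(a, dtype=np.int64)
    b = np.asarray(b, dtype=np.int64)
    if b.max() >= multiplier:
        raise ValueError("B has more digits than allocated; codes would collide.")
    return a * multiplier + b


# Digits to allocate = number of digits in the largest "B" (ADM) value.
b_digits = len(str(328))

wsfe_year = np.array([1985, 2015])   # "A" values
adm_id = np.array([1, 328])          # "B" values
serial = join_ids(wsfe_year, adm_id, b_digits)
print(serial)

# The serial must also fit the output raster's data type (assumed uint32 here).
fits_uint32 = int(serial.max()) <= np.iinfo(np.uint32).max
```

The guard on `b.max()` matters because a "B" value with more digits than allocated would spill into the "A" part of the serial and make the later split ambiguous.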

In [None]:
# # OPEN TERMINAL FOR THIS PORTION. CODE DOCUMENTED HERE.

# Gdal_calc.py # To see info.

# gdal_calc.py -A C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Settlement\GRID3_rasterized.tif -B  C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\ADM\ADM_rasterized.tif --outfile=C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Settlement\GRID3_ADM.tif --overwrite --calc="(A*1000)+B"
# gdal_calc.py -A C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Buildup\WSFE_warp.tif -B  C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\ADM\ADM_rasterized.tif --outfile=C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Buildup\WSFE_ADM.tif --overwrite --calc="(A*1000)+B"

# # END TERMINAL-ONLY ASPECT. RETURN HERE FOR NEXT STEPS.

In [12]:
# Validation: check the basic statistics of the resulting datasets.
RastersList = [rasterio.open(r"Buildup/WSFE_ADM.tif"), 
               rasterio.open(r"Settlement/GRID3_ADM.tif")]
for item in RastersList:
    print(item.name, "\nBands= ", item.count, "\nWxH= ", item.width, "x", item.height, "\n\n")
    
stats = []
for item in RastersList:
    band = item.read(1)
    stats.append({
        'raster': item.name,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})

# Show stats for each raster
print("\n", stats)

RastersList = None
band = None

Buildup/WSFE_ADM.tif 
Bands=  1 
WxH=  26387 x 43889 


Settlement/GRID3_ADM.tif 
Bands=  1 
WxH=  26387 x 43889 



 [{'raster': 'Buildup/WSFE_ADM.tif', 'min': 1985001, 'mean': 4285775835.831103, 'median': 4294967293.0, 'max': 4294967293}, {'raster': 'Settlement/GRID3_ADM.tif', 'min': 1323, 'mean': 4243096252.2932878, 'median': 4294967293.0, 'max': 4294967293}]


### 3.3 Vectorize "joined" layers.

##### Off-script: Run this block in QGIS.

In [None]:
# OPEN QGIS FOR THIS PORTION. CODE DOCUMENTED HERE.

# Due to dtype errors with both gdal and rasterio here, I decided to run the raster to polygon function in QGIS instead.
# It is possible to run QGIS functions within a Jupyter Notebook, but I ran it within the GUI. Arc or R are other options.
# Command line code here.

# processing.run("gdal:polygonize", {'INPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Settlement/GRID3_ADM.tif','BAND':1,'FIELD':'gridcode','EIGHT_CONNECTEDNESS':False,'EXTRA':'','OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Settlement/GRID3_ADM.shp'})
# processing.run("gdal:polygonize", {'INPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Buildup/WSFE_ADM.tif','BAND':1,'FIELD':'gridcode','EIGHT_CONNECTEDNESS':False,'EXTRA':'','OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Buildup/WSFE_ADM.shp'})

### 3.4 Vector math to split raster strings into admin area, GRID3, and WSFE year assignments.

In [13]:
# Load newly created vectorized datasets.
GRID3_ADM = gpd.read_file(r"Settlement/GRID3_ADM.shp")
WSFE_ADM = gpd.read_file(r"Buildup/WSFE_ADM.shp")
print(GRID3_ADM.info(), "\n\n", GRID3_ADM.sample(10), "\n\n", GRID3_ADM.crs, "\n\n", 
      WSFE_ADM.info(), "\n\n", WSFE_ADM.sample(10), "\n\n", WSFE_ADM.crs)

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 208406 entries, 0 to 208405
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   gridcode  208406 non-null  int64   
 1   geometry  208406 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 3.2 MB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 538320 entries, 0 to 538319
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   gridcode  538320 non-null  int64   
 1   geometry  538320 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 8.2 MB
None 

          gridcode                                           geometry
138125   53824305  POLYGON ((-1552758.662 645500.557, -1552728.05...
121221   60928157  POLYGON ((-1549820.627 698354.567, -1549759.41...
12979   180559232  POLYGON ((-1157134.000 1262854.984, -1157042.1...
96208    98680191  POLYGON ((-1478053.022 757176.4

In [14]:
print(GRID3_ADM['gridcode'].max(), WSFE_ADM['gridcode'].max())

201820297 2015328


In [15]:
# Split serial back into separate dataset fields.
# For Cameroon: WSFE and ADM: 4+3=7 digits. GRID3 and ADM: 6+3=9 digits.
GRID3_ADM['gridstring'] = GRID3_ADM['gridcode'].astype(str).str.zfill(9)
WSFE_ADM['gridstring'] = WSFE_ADM['gridcode'].astype(str).str.zfill(7)

GRID3_ADM['Sett_ID'] = GRID3_ADM['gridstring'].str[:-3].astype(int) # Remove the last 3 digits to get the GRID3 portion.
GRID3_ADM['ADM_ID'] = GRID3_ADM['gridstring'].str[-3:].astype(int) # Keep only the last 3 digits to get the ADM portion.
WSFE_ADM['year'] = WSFE_ADM['gridstring'].str[:-3].astype(int)
WSFE_ADM['ADM_ID'] = WSFE_ADM['gridstring'].str[-3:].astype(int)

print(GRID3_ADM.sample(10), WSFE_ADM.sample(10))

         gridcode                                           geometry  \
177325   20280274  POLYGON ((-1428228.860 487917.870, -1428137.04...   
180658   20038276  POLYGON ((-1422628.232 476686.011, -1422597.62...   
31273   161716226  POLYGON ((-1177914.470 1183925.921, -1177822.6...   
5626    191969256  POLYGON ((-1045396.888 1301998.168, -1045274.4...   
66687   119296197  POLYGON ((-1288121.356 855631.208, -1288060.14...   
127234   56247156  POLYGON ((-1542291.915 686541.221, -1542077.68...   
39970   157432226  POLYGON ((-1183453.889 1149863.087, -1183392.6...   
168502   22520271  POLYGON ((-1405244.863 518889.647, -1405183.65...   
127353   59517156  POLYGON ((-1531825.168 678369.814, -1531733.35...   
156129   44406072  POLYGON ((-1081938.688 566785.725, -1081846.87...   

       gridstring  Sett_ID  ADM_ID  
177325  020280274    20280     274  
180658  020038276    20038     276  
31273   161716226   161716     226  
5626    191969256   191969     256  
66687   119296197   11
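
As a cross-check on the string slicing, the same split can be done with integer arithmetic. This is an alternative sketch rather than what the notebook ran; the three sample codes are taken from the outputs above, and the multiplier 1000 assumes the ADM portion occupies the last 3 digits.

```python
import pandas as pd

codes = pd.Series([20280274, 161716226, 1323], name="gridcode")

# Integer route: floor division gives the leading part, modulo the last 3 digits.
sett_id = codes // 1000
adm_id = codes % 1000

# String route, as in the cell above: pad to 9 characters, then slice.
gridstring = codes.astype(str).str.zfill(9)
sett_id_str = gridstring.str[:-3].astype(int)
adm_id_str = gridstring.str[-3:].astype(int)

print(pd.DataFrame({"Sett_ID": sett_id, "ADM_ID": adm_id}))
```

Both routes should agree; the integer version avoids the need to know the total string width in advance.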

In [16]:
# Dissolve any features that share the same Sett_ID and ADM_ID values so that we have a single unique feature per settlement.
# Note: we do NOT want to dissolve the WSFE features. Distinct features for noncontiguous built-up areas of the same year are necessary to separate them in the Near tool step.
GRID3_ADM = GRID3_ADM.dissolve(by=['Sett_ID', 'ADM_ID'], as_index=False)
print(GRID3_ADM.info(), GRID3_ADM.head())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 204257 entries, 0 to 204256
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   Sett_ID     204257 non-null  int64   
 1   ADM_ID      204257 non-null  int64   
 2   geometry    204257 non-null  geometry
 3   gridcode    204257 non-null  int64   
 4   gridstring  204257 non-null  object  
dtypes: geometry(1), int64(3), object(1)
memory usage: 7.8+ MB
None    Sett_ID  ADM_ID                                           geometry  \
0        1     323  MULTIPOLYGON (((-1572100.720 271452.083, -1572...   
1        2     323  POLYGON ((-1572896.437 272094.778, -1572804.62...   
2        3     323  POLYGON ((-1571274.398 273778.027, -1571213.18...   
3        4     323  POLYGON ((-1575038.754 274298.304, -1574977.54...   
4        5     323  POLYGON ((-1573416.714 277205.733, -1573233.08...   

   gridcode gridstring  
0      1323  000001323  
1      2323  000002323  

In [17]:
# Remove features where year, settlement, or admin area = 0.
# This was supposed to be resolved earlier with the gdal_calc NoDataValue parameter.

print("Before: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))
WSFE_ADM = WSFE_ADM.loc[(WSFE_ADM["year"] != 0) & (WSFE_ADM["ADM_ID"] != 0)] # Since we change the datatype to integer, no need to include all digits. Otherwise, it would need to be: != '0000'
GRID3_ADM = GRID3_ADM.loc[(GRID3_ADM["Sett_ID"] != 0) & (GRID3_ADM["ADM_ID"] != 0)]
print("After: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))

Before: WSFE (538320, 5) and GRID3 (204257, 5)

After: WSFE (538320, 5) and GRID3 (204257, 5)



In [18]:
# The Bounded_ID is our new unique settlement identifier for subsequent matching steps.
GRID3_ADM['Bounded_ID'] = GRID3_ADM.index
WSFE_ADM['WSFE_ID'] = WSFE_ADM.index
GRID3_ADM = GRID3_ADM[['Sett_ID', 'Bounded_ID', 'ADM_ID', 'geometry']]
WSFE_ADM = WSFE_ADM[['year', 'ADM_ID', 'geometry']]

---

## 4. UNIQUE SETTLEMENTS FROM WSFE AND GRID3: TWO VERSIONS

Note that there are 2 versions here, so that we can create a fragmentation index:
1. **Boundless, aka boundary-agnostic settlements**: Unique settlements are linked to GRID3 settlement IDs. Administrative areas do not influence the extents of the settlement.
2. **Bounded, aka politically-defined settlements**: Settlements in the Boundless dataset which spread across more than one administrative area are split into separate settlements in the Bounded dataset. The largest polygon after the split is considered the "principal" settlement, and polygons in other admin areas are considered "fragments." By dividing the fragment area(s) of the Bounded settlement by the area of the Boundless settlement, we can acquire a fragmentation index for each locality.
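
The fragmentation index described above can be sketched on a small table of areas. The numbers and the `area_km2` column are made up, `fragmentation_index` is a hypothetical helper, and the sketch assumes the Bounded pieces of a settlement add up exactly to its Boundless area.

```python
import pandas as pd

# One row per Bounded settlement piece (made-up areas).
bounded = pd.DataFrame({
    "Sett_ID":    [1, 1, 1, 2],
    "Bounded_ID": [10, 11, 12, 20],
    "area_km2":   [6.0, 3.0, 1.0, 4.0],
})


def fragmentation_index(df):
    """Share of each Boundless settlement's area lying outside its principal piece."""
    grouped = df.groupby("Sett_ID")["area_km2"]
    boundless_area = grouped.sum()   # area of the Boundless settlement
    principal_area = grouped.max()   # largest piece = "principal" settlement
    return ((boundless_area - principal_area) / boundless_area).rename("frag_index")


frag = fragmentation_index(bounded)
print(frag)
```

In this example, settlement 1 has 4 of its 10 km² in fragments (index 0.4), while settlement 2 sits entirely within one admin area (index 0).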

### 4.1 BOUNDED SETTLEMENTS: Near Join by ADM group.

In [19]:
# The sharding step below doesn't work if any ADM group contains features from only one of the two datasets.
WSFE_u = sorted(WSFE_ADM.ADM_ID.unique().tolist())
GRID3_u = sorted(GRID3_ADM.ADM_ID.unique().tolist())

not_matching = list(set(GRID3_u).symmetric_difference(set(WSFE_u)))
print(not_matching) # Validate: If there are many ADM_IDs in this list, investigate why GRID3 or WSFE is missing in so many areas.

# Keep only the features whose ADM_ID appears in both datasets.
WSFE_matching = WSFE_ADM[~WSFE_ADM["ADM_ID"].isin(not_matching)] 
GRID3_matching = GRID3_ADM[~GRID3_ADM["ADM_ID"].isin(not_matching)]

WSFE_u = sorted(WSFE_matching.ADM_ID.unique().tolist())
GRID3_u = sorted(GRID3_matching.ADM_ID.unique().tolist())

not_matching = list(set(GRID3_u).symmetric_difference(set(WSFE_u)))
print(not_matching) # This should now be empty.

del WSFE_u, GRID3_u, not_matching, WSFE_ADM, GRID3_ADM

[51]
[]
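
To see why the unmatched ADM groups have to be dropped first, here is a minimal sketch with plain Python collections. The ID lists are hypothetical stand-ins (which dataset lacked ADM 51 is not shown above), and the dictionary mimics the `shards` lookup used in the next cell.

```python
def unmatched_groups(left_ids, right_ids):
    """Return the group IDs present in only one of the two ID collections."""
    return sorted(set(left_ids).symmetric_difference(right_ids))


wsfe_adm_ids = [1, 2, 3, 51]   # hypothetical ADM_IDs found in WSFE
grid3_adm_ids = [1, 2, 3]      # hypothetical ADM_IDs found in GRID3

unmatched = unmatched_groups(wsfe_adm_ids, grid3_adm_ids)
print(unmatched)

# A per-ADM lookup built from one dataset fails for groups it does not contain.
shards = {adm_id: f"GRID3 features in ADM {adm_id}" for adm_id in grid3_adm_ids}
lookup_failed = False
try:
    shards[51]
except KeyError:
    lookup_failed = True

# Dropping the unmatched groups beforehand avoids that failure.
kept = [adm_id for adm_id in wsfe_adm_ids if adm_id not in unmatched]
```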


In [20]:
# Shard the dataframe whose variables we want to join into a dict
shards = {k:d for k, d in GRID3_matching.groupby('ADM_ID', as_index=False)}

# Take the dataframe whose geometry we want to retain.
# Group by ADM, then sjoin_nearest among the smaller dataframe's matching ADM shard
Bounded = WSFE_matching.groupby('ADM_ID', as_index=False).apply(
    lambda d: gpd.sjoin_nearest(
    d, shards[d['ADM_ID'].values[0]], 
        how='left', 
        max_distance=500))

print(Bounded.info())
print(Bounded.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
MultiIndex: 540184 entries, (0, 104264) to (325, 3696)
Data columns (total 7 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   year          540184 non-null  int32   
 1   ADM_ID_left   540184 non-null  int32   
 2   geometry      540184 non-null  geometry
 3   index_right   536223 non-null  float64 
 4   Sett_ID       536223 non-null  float64 
 5   Bounded_ID    536223 non-null  float64 
 6   ADM_ID_right  536223 non-null  float64 
dtypes: float64(4), geometry(1), int32(2)
memory usage: 48.1 MB
None
            year  ADM_ID_left  \
53  373202  2011           55   
265 220317  1999          268   
11  445552  1997           12   
116 372497  2010          118   
141 166566  1998          143   
35  500371  2007           36   
207 78724   2013          209   
5   227170  1985            6   
222 33397   2015          225   
193 80322   1999          195   

                             

In [21]:
# Now that features carry their administratively split ID (Bounded_ID), dissolve them by WSFE year and that ID.
Bounded = Bounded.dissolve(by=['year', 'Bounded_ID'], as_index=False)
print(Bounded.info(), Bounded.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 83356 entries, 0 to 83355
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype   
---  ------        --------------  -----   
 0   year          83356 non-null  int64   
 1   Bounded_ID    83356 non-null  float64 
 2   geometry      83356 non-null  geometry
 3   ADM_ID_left   83356 non-null  int32   
 4   index_right   83356 non-null  float64 
 5   Sett_ID       83356 non-null  float64 
 6   ADM_ID_right  83356 non-null  float64 
dtypes: float64(4), geometry(1), int32(1), int64(1)
memory usage: 4.1 MB
None        year  Bounded_ID                                           geometry  \
55826  2009     15299.0  POLYGON ((-1487601.633 455232.240, -1487571.02...   
78164  2014    186411.0  MULTIPOLYGON (((-1120745.222 1230506.004, -112...   
32907  2001    199988.0  POLYGON ((-1415313.751 423372.933, -1415283.14...   
69264  2012    148906.0  POLYGON ((-1179995.578 1090673.941, -1179964.9...   
26920  2000     1360

In [22]:
# Clean up and save to file.
Bounded = Bounded.rename(
    columns={"ADM_ID_left": "ADM_ID"})[['ADM_ID', 'year', 'Bounded_ID', 'Sett_ID', 'geometry']]
Bounded.to_file(
    driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

In [23]:
del WSFE_matching, GRID3_matching, shards

### 4.2 BOUNDLESS SETTLEMENTS: Simple near join.

In [24]:
# Fragments of any bounded settlement are combined into a single "boundless" settlement in this version.
# The grouping is based on "Sett_ID", which is carried over directly from the GRID3 settlement features.
Boundless = Bounded.dissolve(by=['year', 'Sett_ID'], as_index=False)
print(Boundless.info(), Boundless.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 80363 entries, 0 to 80362
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   year        80363 non-null  int64   
 1   Sett_ID     80363 non-null  float64 
 2   geometry    80363 non-null  geometry
 3   ADM_ID      80363 non-null  int32   
 4   Bounded_ID  80363 non-null  float64 
dtypes: float64(2), geometry(1), int32(1), int64(1)
memory usage: 2.8 MB
None        year   Sett_ID                                           geometry  \
14755  1993   97680.0  POLYGON ((-1480348.361 758278.222, -1480317.75...   
69479  2013   51742.0  MULTIPOLYGON (((-1554778.560 648652.823, -1554...   
65305  2012   51567.0  MULTIPOLYGON (((-1563194.804 621047.543, -1563...   
76471  2015    4407.0  MULTIPOLYGON (((-1329896.529 363235.046, -1329...   
70609  2013  146186.0  POLYGON ((-1178159.307 1100865.247, -1178128.7...   
11961  1988   46452.0  POLYGON ((-1664954.841 676900.797

In [25]:
# Clean up and save to file.
Boundless.to_file(driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

---

## 5. CUMULATIVE ANNUALIZED SETTLEMENT EXTENTS
DISSOLVE BY YEAR SETS: Create separate feature layers of each cumulative year.
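
The cumulative logic can be illustrated without geometry. The sketch below assumes that the yearly features of a settlement do not overlap, so that summing areas up to a given year gives the same total as dissolving the geometries; the table, the `area_km2` column, and `cumulative_area` are made up for illustration.

```python
import pandas as pd

# Non-cumulative areas: built-up area first detected in each year (made-up values).
yearly = pd.DataFrame({
    "Sett_ID":  [1, 1, 1, 2],
    "year":     [1985, 2000, 2015, 2001],
    "area_km2": [2.0, 0.5, 1.5, 0.8],
})


def cumulative_area(df, study_years):
    """For each study year, total the area detected in that year or any earlier one."""
    columns = []
    for year in study_years:
        subset = df[df["year"] <= year]   # all years up to and including this one
        totals = subset.groupby("Sett_ID")["area_km2"].sum()
        columns.append(totals.rename(year))
    return pd.concat(columns, axis=1)


cumulative = cumulative_area(yearly, [1999, 2000, 2001, 2015])
print(cumulative)
```

Settlement 2 has no value before 2001 because it had not yet appeared, which is the same reason earlier cumulative layers contain fewer settlements than the latest one.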

### 5.1 Define study years for each for loop.

In [26]:
# Boundless = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

def CreateList(r1, r2):
    return list(range(r1, r2 + 1)) # Inclusive of both r1 and r2.

CuStart, CuEnd = Boundless['year'].min(), Boundless['year'].max()
StudyStart, StudyEnd = 1999, Boundless['year'].max()

AllCuYears = CreateList(CuStart, CuEnd) # All years in the WSFE dataset
AllStudyYears = CreateList(StudyStart, StudyEnd) # All years for which there will be growth stats in the present study.
print(AllCuYears, '\n\n', AllStudyYears)

ReversedStudyYears = []
for i in AllStudyYears:
    ReversedStudyYears.insert(0,i)
ReversedStudyYears.remove(StudyEnd)
print('\n\n', ReversedStudyYears)

[1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015] 

 [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]


 [2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999]


### 5.2 Starting with main Boundless dataset, create a cumulative area feature layer for each year.

In [27]:
# For each year in the growth stats study, we are taking features from all years prior to and including that year, 
# dissolving those features, and exporting as its own file.

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    CuYearSet = Boundless[Boundless['year'].between(
        CuStart, item, inclusive='both')] # inclusive='both' keeps the endpoint years (1985 and "item") as well as those between them. The boolean form is deprecated in recent pandas.
    print('Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    CuYearDissolve = CuYearSet.dissolve(by='Sett_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min"}, # Though ADM_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    CuYearName = ''.join(['Cu', str(item), '_Boundless'])
    CuYearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=CuYearName)
    del CuYearSet, CuYearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1999. Thu Dec  8 16:03:29 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:03:29 2022

Write to file. Thu Dec  8 16:03:52 2022

Subsetting to cumulative area for year: 2000. Thu Dec  8 16:03:59 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:03:59 2022

Write to file. Thu Dec  8 16:04:24 2022

Subsetting to cumulative area for year: 2001. Thu Dec  8 16:04:31 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:04:31 2022

Write to file. Thu Dec  8 16:04:58 2022

Subsetting to cumulative area for year: 2002. Thu Dec  8 16:05:06 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:05:06 2022

Write to file. Thu Dec  8 16:05:34 2022

Subsetting to cumulative area for year: 2003. Thu Dec  8 16:05:41 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:05:41 2022

Write to file. Thu Dec  8 16:06:11 2022

Subsetting to cumulative area for year: 2004. Thu Dec  8 16:06:18 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:06:18 2022

Write to file. Thu Dec  8 16:06:49 2022

Subsetting to cumulative area for year: 2005. Thu Dec  8 16:06:56 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:06:56 2022

Write to file. Thu Dec  8 16:07:29 2022

Subsetting to cumulative area for year: 2006. Thu Dec  8 16:07:36 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:07:36 2022

Write to file. Thu Dec  8 16:08:09 2022

Subsetting to cumulative area for year: 2007. Thu Dec  8 16:08:16 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:08:16 2022

Write to file. Thu Dec  8 16:08:51 2022

Subsetting to cumulative area for year: 2008. Thu Dec  8 16:08:58 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:08:58 2022

Write to file. Thu Dec  8 16:09:34 2022

Subsetting to cumulative area for year: 2009. Thu Dec  8 16:09:41 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:09:41 2022

Write to file. Thu Dec  8 16:10:18 2022

Subsetting to cumulative area for year: 2010. Thu Dec  8 16:10:25 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:10:25 2022

Write to file. Thu Dec  8 16:11:03 2022

Subsetting to cumulative area for year: 2011. Thu Dec  8 16:11:10 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:11:10 2022

Write to file. Thu Dec  8 16:11:49 2022

Subsetting to cumulative area for year: 2012. Thu Dec  8 16:11:56 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:11:56 2022

Write to file. Thu Dec  8 16:12:37 2022

Subsetting to cumulative area for year: 2013. Thu Dec  8 16:12:43 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:12:43 2022

Write to file. Thu Dec  8 16:13:23 2022

Subsetting to cumulative area for year: 2014. Thu Dec  8 16:13:29 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:13:29 2022

Write to file. Thu Dec  8 16:14:10 2022

Subsetting to cumulative area for year: 2015. Thu Dec  8 16:14:17 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Dec  8 16:14:17 2022

Write to file. Thu Dec  8 16:14:59 2022

Done with all years in set. Thu Dec  8 16:15:06 2022


##### Join area information from each cumulative layer onto the latest year dataset.

In [28]:
# The latest year in the study contains all settlements. Merge all other years' areas onto this dataset.
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=
                          ''.join(['Cu', str(StudyEnd), '_Boundless'])) 
SettAreas['Area2015'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry') # We have settlement IDs, so no need to join spatially!


for item in ReversedStudyYears:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Boundless']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['Area', str(item)])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Sett_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, StudyEnd, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Sett_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))


print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s.csv' % (StudyStart, StudyEnd)))

Loading cumulative layer for year 2014. Thu Dec  8 16:15:07 2022

Adding area field and converting to non-spatial dataframe. Thu Dec  8 16:15:08 2022

Merging variables from 2014 onto our latest year (2015) via table join. Thu Dec  8 16:15:08 2022

Loading cumulative layer for year 2013. Thu Dec  8 16:15:08 2022

Adding area field and converting to non-spatial dataframe. Thu Dec  8 16:15:10 2022

Merging variables from 2013 onto our latest year (2015) via table join. Thu Dec  8 16:15:10 2022

Loading cumulative layer for year 2012. Thu Dec  8 16:15:10 2022

Adding area field and converting to non-spatial dataframe. Thu Dec  8 16:15:11 2022

Merging variables from 2012 onto our latest year (2015) via table join. Thu Dec  8 16:15:11 2022

Loading cumulative layer for year 2011. Thu Dec  8 16:15:11 2022

Adding area field and converting to non-spatial dataframe. Thu Dec  8 16:15:12 2022

Merging variables from 2011 onto our latest year (2015) via table join. Thu Dec  8 16:15:12 2022

Load
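
To make the table join above concrete, here is a minimal sketch on made-up data (the settlement IDs and areas are invented for illustration and are not taken from the results). It assumes that a settlement with no built-up area in a given year simply has no row in that year's layer, so the left join leaves `NaN` in that year's column. The `validate='one_to_one'` argument is an optional safeguard that the notebook itself does not use.

```python
import pandas as pd

# Latest year: contains every settlement (toy values, km^2).
latest = pd.DataFrame({"Sett_ID": [1, 2, 3], "Area2015": [5.0, 1.2, 0.3]})

# Earlier years: settlements that did not exist yet are simply absent.
yearly_layers = {
    2014: pd.DataFrame({"Sett_ID": [1, 2], "Area2014": [4.8, 1.0]}),
    2013: pd.DataFrame({"Sett_ID": [1], "Area2013": [4.5]}),
}

areas = latest.copy()
for year, layer in yearly_layers.items():
    # Left join keeps all latest-year settlements; validate guards against
    # accidental one-to-many matches that would duplicate rows.
    areas = areas.merge(layer, how="left", on="Sett_ID", validate="one_to_one")

print(areas)
```

Settlement 3 ends up with `NaN` for both earlier years, which is the same pattern that produces shrinking non-null counts in older area columns.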

In [29]:
del SettAreas

### 5.3 Repeat for Bounded dataset.

In [30]:
# Bounded = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive='both')] # inclusive='both' keeps the endpoint years (1985 and "item") rather than only the years between them. Passing True is deprecated in recent pandas.
    print('Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    CuYearDissolve = CuYearSet.dissolve(by='Bounded_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min", "Sett_ID":"min"}, # Though ADM_ID and Sett_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    CuYearName = ''.join(['Cu', str(item), '_Bounded'])
    CuYearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=CuYearName)
    del CuYearSet, CuYearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1999. Thu Dec  8 16:15:30 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:15:30 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:15:50 2022

Subsetting to cumulative area for year: 2000. Thu Dec  8 16:15:57 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:15:57 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:16:19 2022

Subsetting to cumulative area for year: 2001. Thu Dec  8 16:16:27 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:16:27 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:16:51 2022

Subsetting to cumulative area for year: 2002. Thu Dec  8 16:16:58 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:16:58 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:17:23 2022

Subsetting to cumulative area for year: 2003. Thu Dec  8 16:17:30 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:17:30 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:17:57 2022

Subsetting to cumulative area for year: 2004. Thu Dec  8 16:18:05 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:18:05 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:18:35 2022

Subsetting to cumulative area for year: 2005. Thu Dec  8 16:18:42 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:18:42 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:19:13 2022

Subsetting to cumulative area for year: 2006. Thu Dec  8 16:19:21 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:19:21 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:19:53 2022

Subsetting to cumulative area for year: 2007. Thu Dec  8 16:20:00 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:20:00 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:20:33 2022

Subsetting to cumulative area for year: 2008. Thu Dec  8 16:20:41 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:20:41 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:21:15 2022

Subsetting to cumulative area for year: 2009. Thu Dec  8 16:21:22 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:21:22 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:21:58 2022

Subsetting to cumulative area for year: 2010. Thu Dec  8 16:22:05 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:22:05 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:22:42 2022

Subsetting to cumulative area for year: 2011. Thu Dec  8 16:22:49 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:22:49 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:23:27 2022

Subsetting to cumulative area for year: 2012. Thu Dec  8 16:23:35 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:23:35 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:24:14 2022

Subsetting to cumulative area for year: 2013. Thu Dec  8 16:24:21 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:24:21 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:25:02 2022

Subsetting to cumulative area for year: 2014. Thu Dec  8 16:25:09 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:25:09 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:25:51 2022

Subsetting to cumulative area for year: 2015. Thu Dec  8 16:25:58 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Dec  8 16:25:58 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Dec  8 16:26:40 2022

Done with all years in set. Thu Dec  8 16:26:48 2022
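
The cumulative subsetting used in the loops above can be sketched without any geometry. This is a tabular stand-in under stated assumptions: the data are invented, `cumulative_subset` is a hypothetical helper, and `groupby(...).agg(...)` with a summed area only imitates what `dissolve()` does when it unions the geometries of each settlement. The 1985 start year is taken from the comment in the notebook code.

```python
import pandas as pd

# Toy built-up features: one row per feature, tagged with its detection year.
features = pd.DataFrame({
    "Sett_ID":  [1, 1, 2, 2, 3],
    "year":     [1990, 2001, 1999, 2003, 2002],
    "area_km2": [1.0, 0.5, 0.2, 0.1, 0.4],
})

def cumulative_subset(df, start, end):
    """Keep every feature detected from `start` through `end`, endpoints included."""
    return df[df["year"].between(start, end, inclusive="both")]

cu2001 = cumulative_subset(features, 1985, 2001)

# Tabular analogue of dissolve(by='Sett_ID', aggfunc={'year': 'max'}).
summary = cu2001.groupby("Sett_ID", as_index=False).agg(
    year=("year", "max"),
    area_km2=("area_km2", "sum"),
)
print(summary)
```

Settlement 3 is absent from the 2001 summary because its only feature appears in 2002.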


In [31]:
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', 
                          layer=''.join(['Cu', str(StudyEnd), '_Bounded']))
SettAreas['Area2015'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry')


for item in ReversedStudyYears:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Bounded']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['Area', str(item)])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Bounded_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, StudyEnd, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Bounded_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))

print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s_%s.csv' % (StudyStart, StudyEnd, 'Bounded')))

Loading cumulative layer for year 2014. Thu Dec  8 16:26:49 2022

Adding area field and converting to non-spatial dataframe. Thu Dec  8 16:26:51 2022

Merging variables from 2014 onto our latest year (2015) via table join. Thu Dec  8 16:26:51 2022

Loading cumulative layer for year 2013. Thu Dec  8 16:26:51 2022

Adding area field and converting to non-spatial dataframe. Thu Dec  8 16:26:52 2022

Merging variables from 2013 onto our latest year (2015) via table join. Thu Dec  8 16:26:52 2022

Loading cumulative layer for year 2012. Thu Dec  8 16:26:52 2022

Adding area field and converting to non-spatial dataframe. Thu Dec  8 16:26:53 2022

Merging variables from 2012 onto our latest year (2015) via table join. Thu Dec  8 16:26:53 2022

Loading cumulative layer for year 2011. Thu Dec  8 16:26:53 2022

Adding area field and converting to non-spatial dataframe. Thu Dec  8 16:26:55 2022

Merging variables from 2011 onto our latest year (2015) via table join. Thu Dec  8 16:26:55 2022

Load

In [32]:
del SettAreas

### 5.4 One settlement geofile to rule them all. ...and in the Sett_ID bind them.
The annualized values can be stored as separate non-spatial dataframes. Their Sett_IDs are then used to join them, along with the place names, onto this single spatial layer for the summary statistics.

In [33]:
Settlements = gpd.read_file(r'Results/CumulativeSettlements.gpkg', 
                           layer=''.join(['Cu', str(StudyEnd), '_Boundless']))[['Sett_ID', 'ADM_ID', 'geometry']]
print(Settlements.info())
Settlements.to_file(driver='GPKG', 
                       filename=r'Results/SETTLEMENTS.gpkg', 
                       layer='SETTLEMENTS')

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13622 entries, 0 to 13621
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   13622 non-null  float64 
 1   ADM_ID    13622 non-null  int64   
 2   geometry  13622 non-null  geometry
dtypes: float64(1), geometry(1), int64(1)
memory usage: 319.4 KB
None


### 5.5 Buffer the area of the Boundless dataset's latest year to mask raster data in later sections.
The Bounded dataset would also be fine for our purposes here. The buffer is dissolved to a single feature to be used for its total extents, which are identical between Bounded & Boundless datasets.

In [34]:
# Create buffer layer(s) to use as maximum distance for Near joins.

# Population buffer: 2km
Distance = 2000

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS')
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid repairs invalid geometries (e.g. self-intersections) before buffering.
print('Finished buffer layer creation. %s' % time.ctime())
BufferFileName1 = ''.join(['Buff', str(Distance), 'm_', str(StudyEnd)])
BufferLayer.to_file(driver='GPKG', filename=r'Results/Catchment.gpkg', layer=BufferFileName1)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Thu Dec  8 16:27:22 2022
Finished buffer layer creation. Thu Dec  8 17:36:33 2022
Saved to file. Thu Dec  8 17:36:38 2022


In [35]:
# NTL buffer: 250m
Distance = 250

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS')
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid repairs invalid geometries (e.g. self-intersections) before buffering.
print('Finished buffer layer creation. %s' % time.ctime())
BufferFileName2 = ''.join(['Buff', str(Distance), 'm_', str(StudyEnd)])
BufferLayer.to_file(driver='GPKG', filename=r'Results/Catchment.gpkg', layer=BufferFileName2)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Thu Dec  8 17:36:38 2022
Finished buffer layer creation. Thu Dec  8 17:38:43 2022
Saved to file. Thu Dec  8 17:38:49 2022
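
As a small illustration of the repair-then-buffer step, the sketch below applies `make_valid` to a deliberately self-intersecting polygon before buffering it. It assumes `make_valid` is the function from `shapely.validation` (the import is not shown in this section), and the bowtie coordinates are invented.

```python
from shapely.geometry import Polygon
from shapely.validation import make_valid

# A self-intersecting "bowtie" ring: a classic invalid polygon.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
print("Valid before repair:", bowtie.is_valid)

# make_valid returns a valid geometry covering the same area
# (here the bowtie is split into its two triangles).
repaired = make_valid(bowtie)
print("Valid after repair:", repaired.is_valid, repaired.geom_type)

# Buffer in the units of the coordinate system (metres in a projected CRS).
buffered = repaired.buffer(250)
print("Area grew:", buffered.area > repaired.area)
```

Note that `make_valid` repairs invalid shapes only. Missing (null) geometries would need to be dropped or filtered separately.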


---

## 6. PLACE NAMES
Join urban place names from UCDB, Africapolis, and GeoNames onto the settlement vectors.

### 6.1 Load placename datasets, filter, and project.

In [36]:
# If restarting here:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS')
Settlements['Area2015'] = Settlements['geometry'].area / 10**6

# Load, pull name field, rename, and reproject to match the catchments CRS.
UCDB = gpd.read_file('PlaceName/GHS_STAT_UCDB2015MT_GLOBE_R2019A_V1_2.gpkg', 
                     layer=0)[['UC_NM_MN', 'geometry']].rename(
    columns={"UC_NM_MN": "UCDB_Name"}).to_crs("ESRI:102022")

Africapolis = gpd.read_file('PlaceName/AFRICAPOLIS2020.shp')[['agglosName', 'geometry']].rename(
    columns={"agglosName": "Afpl_Name"}).to_crs("ESRI:102022")

GeoNames = gpd.read_file('PlaceName/GeoNames.gpkg', 
                         layer=0)[['GeoName', 'geometry']].to_crs("ESRI:102022")

print(Settlements.info(), UCDB.info(), Africapolis.info(), GeoNames.info())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13622 entries, 0 to 13621
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   13622 non-null  float64 
 1   ADM_ID    13622 non-null  int64   
 2   geometry  13622 non-null  geometry
 3   Area2015  13622 non-null  float64 
dtypes: float64(2), geometry(1), int64(1)
memory usage: 425.8 KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13135 entries, 0 to 13134
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   UCDB_Name  13135 non-null  object  
 1   geometry   13135 non-null  geometry
dtypes: geometry(1), object(1)
memory usage: 205.4+ KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 7720 entries, 0 to 7719
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   Afpl_Name  7720 non-null   object  
 1   ge

### 6.2 Join placenames onto settlements geodataframe.

In [37]:
# We wrap it in pd.DataFrame() since the sjoin() is the last time we need the geometry.

GeoNames = pd.DataFrame(gpd.sjoin(Settlements, GeoNames, 
                             how='left', predicate='contains', # Name file is point type, so 'contains' is sufficient.
                             lsuffix="G3", rsuffix="GN")).drop(columns='geometry')
Africapolis = pd.DataFrame(gpd.sjoin(Settlements, Africapolis, 
                             how='left', predicate='intersects', # Name file is polygon type.
                             lsuffix="G3", rsuffix="Af")).drop(columns='geometry')
UCDB = pd.DataFrame(gpd.sjoin(Settlements, UCDB, 
                             how='left', predicate='intersects', # Name file is polygon type.
                             lsuffix="G3", rsuffix="UC")).drop(columns='geometry')

In [38]:
print(GeoNames.info())
print(Africapolis.info())
print(UCDB.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13628 entries, 0 to 13621
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Sett_ID   13628 non-null  float64
 1   ADM_ID    13628 non-null  int64  
 2   Area2015  13628 non-null  float64
 3   index_GN  119 non-null    float64
 4   GeoName   119 non-null    object 
dtypes: float64(3), int64(1), object(1)
memory usage: 638.8+ KB
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 13629 entries, 0 to 13621
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    13629 non-null  float64
 1   ADM_ID     13629 non-null  int64  
 2   Area2015   13629 non-null  float64
 3   index_Af   1433 non-null   float64
 4   Afpl_Name  1433 non-null   object 
dtypes: float64(3), int64(1), object(1)
memory usage: 638.9+ KB
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 13623 entries, 0 to 13621
Data columns (tot

In [39]:
alldatasets = [Africapolis[['Sett_ID', 'Afpl_Name', 'Area2015']], 
               GeoNames[['Sett_ID', 'GeoName']],
               UCDB[['Sett_ID', 'UCDB_Name']]]

SettlementsNamed = reduce(lambda left,right: pd.merge(left,right,on=['Sett_ID'],
                                            how='left'), alldatasets).fillna('UNK')

In [40]:
print(SettlementsNamed.info())
print(SettlementsNamed.sample(10))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13637 entries, 0 to 13636
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    13637 non-null  float64
 1   Afpl_Name  13637 non-null  object 
 2   Area2015   13637 non-null  float64
 3   GeoName    13637 non-null  object 
 4   UCDB_Name  13637 non-null  object 
dtypes: float64(2), object(3)
memory usage: 639.2+ KB
None
        Sett_ID  Afpl_Name  Area2015 GeoName UCDB_Name
8830    64802.0        UNK  0.000937     UNK       UNK
12632  194607.0        UNK  0.003747     UNK       UNK
5139    24082.0        UNK  0.016859     UNK       UNK
7885    54849.0  Bafoussam  0.010303     UNK       UNK
2592     8155.0        UNK  0.020606     UNK       UNK
1236     3443.0        UNK  0.019669     UNK       UNK
5122    23965.0        UNK  0.000937     UNK       UNK
8277    58848.0        UNK  0.004683     UNK       UNK
10772  104569.0        UNK  0.032782     UNK       UNK
104

In [41]:
del UCDB, Africapolis, GeoNames

### 6.3 Reduce to single name column.

In [42]:
# A left join repeats a settlement row for every name feature it matches, so duplicates creep in (13,637 rows vs. 13,622 settlements).
# Keep the first match so the dataset has the same number of rows as the original Settlements file.
SettlementsNamed.drop_duplicates(subset=['Sett_ID'], inplace=True, keep='first')
SettlementsNamed.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13622 entries, 0 to 13636
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    13622 non-null  float64
 1   Afpl_Name  13622 non-null  object 
 2   Area2015   13622 non-null  float64
 3   GeoName    13622 non-null  object 
 4   UCDB_Name  13622 non-null  object 
dtypes: float64(2), object(3)
memory usage: 638.5+ KB
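
The row growth and the deduplication above can be reproduced with a plain table join. This is a sketch on invented data: the names are placeholders, and a pandas `merge` stands in for the spatial join, which behaves the same way when one settlement matches several name features.

```python
import pandas as pd

settlements = pd.DataFrame({"Sett_ID": [1, 2, 3]})

# Settlement 1 matches two name features; settlement 2 matches none.
matches = pd.DataFrame({
    "Sett_ID":   [1, 1, 3],
    "Afpl_Name": ["Name A", "Name B", "Name C"],
})

joined = settlements.merge(matches, how="left", on="Sett_ID").fillna("UNK")
print(len(joined), "rows after the left join")

# Keep only the first match per settlement to restore one row per Sett_ID.
deduped = joined.drop_duplicates(subset=["Sett_ID"], keep="first")
print(len(deduped), "rows after dropping duplicates")
```

Because `keep='first'` depends on row order, sorting by a preferred attribute before dropping duplicates would make the choice of name explicit.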


In [43]:
# Create a single name column where non-named settlements are "UNK" but all others use one of the three name sources.
SettlementsNamed['SettName'] = "UNK"

SettlementsNamed.loc[
    SettlementsNamed['Afpl_Name'] != "UNK", 
    'SettName'] = SettlementsNamed['Afpl_Name']

SettlementsNamed.loc[
    (SettlementsNamed['SettName'] == "UNK") & (SettlementsNamed['UCDB_Name'] != "UNK"), 
    'SettName'] = SettlementsNamed['UCDB_Name']

SettlementsNamed.loc[
    (SettlementsNamed['SettName'] == "UNK") & (SettlementsNamed['GeoName'] != "UNK"), 
    'SettName'] = SettlementsNamed['GeoName']

SettlementsNamed.sample(20)

        Sett_ID  Afpl_Name  Area2015 GeoName UCDB_Name   SettName
583      1643.0        UNK  0.034656     UNK       UNK        UNK
6070    34197.0        UNK  0.000937     UNK       UNK        UNK
7748    53857.0  Bafoussam  0.000937     UNK       UNK  Bafoussam
9341    68812.0        UNK  0.085234     UNK       UNK        UNK
7385    50636.0        UNK  0.091790     UNK       UNK        UNK
11967  172307.0        UNK  0.001873     UNK       UNK        UNK
10551   89322.0        UNK  0.014050     UNK       UNK        UNK
11822  163514.0        UNK  0.014986     UNK       UNK        UNK
4725    22476.0        UNK  0.087107     UNK       UNK        UNK
9239    68708.0        UNK  0.117080     UNK       UNK        UNK
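
The same source priority (Africapolis first, then UCDB, then GeoNames) can also be written as a short loop. This is only an alternative sketch on invented names, not the notebook's own method: it starts from "UNK" and lets each higher-priority source overwrite the lower ones wherever it has a name.

```python
import pandas as pd

names = pd.DataFrame({
    "Afpl_Name": ["Afpl A", "UNK",    "UNK",   "UNK"],
    "UCDB_Name": ["UNK",    "UCDB B", "UNK",   "UNK"],
    "GeoName":   ["UNK",    "Geo B",  "Geo C", "UNK"],
})

sett_name = pd.Series("UNK", index=names.index)

# Apply sources from lowest to highest priority so the best source wins.
for column in ["GeoName", "UCDB_Name", "Afpl_Name"]:
    has_name = names[column] != "UNK"
    sett_name = sett_name.mask(has_name, names[column])

names["SettName"] = sett_name
print(names)
```

Listing the sources in one place makes it easy to change the priority order or add a fourth source.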


### 6.4 Make sure place name is unique by stripping smaller localities of duplicated names.

In [44]:
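# Note: duplicated() defaults to keep='first', so the first occurrence of each repeated name is not flagged here.
Dupes = SettlementsNamed[ 
    (SettlementsNamed['SettName'] != 'UNK') & 
    (SettlementsNamed.duplicated('SettName')) ]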

Largest = Dupes.loc[Dupes.groupby(["SettName"])["Area2015"].idxmax()]
print(Largest)

        Sett_ID   Afpl_Name    Area2015    GeoName  UCDB_Name    SettName
6109    34755.0  Akonolinga    3.784013        UNK        UNK  Akonolinga
633      1753.0       Ambam    0.112396        UNK        UNK       Ambam
8291    59171.0       Awing    5.009134        UNK        UNK       Awing
6027    33965.0        Ayos    0.103967        UNK        UNK        Ayos
10542   89229.0     Babungo    0.001873        UNK        UNK     Babungo
...         ...         ...         ...        ...        ...         ...
12059  175759.0      Tourou    0.033719        UNK        UNK      Tourou
10444   85444.0         Wum    0.027162        UNK        UNK         Wum
11990  174679.0      Yagoua    0.003747        UNK        UNK      Yagoua
13509  199780.0     Yaounde  101.741235    Yaoundé    Yaounde     Yaounde
2381     7387.0   Yokadouma    1.451787  Yokadouma  Yokadouma   Yokadouma

[100 rows x 6 columns]


In [45]:
SettlementsNamed['SettName'].str.contains('UNK').value_counts()[False] # Count number of non-UNK settlements.

1493

In [46]:
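# Note: this resets every settlement that is not in Largest, including uniquely named ones,
# so only the largest flagged settlement of each duplicated name keeps its name.
SettlementsNamed.loc[~SettlementsNamed.Sett_ID.isin(Largest.Sett_ID), 'SettName'] = 'UNK'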

In [47]:
SettlementsNamed['SettName'].str.contains('UNK').value_counts()[False] # Count number of non-UNK settlements.

100
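
If the intent is only to strip the smaller localities of a duplicated name while keeping uniquely named settlements, one possible variant is sketched below. It is an assumption about the desired behavior rather than the notebook's method, and it uses invented data: for each name it keeps the largest settlement and resets all other settlements that share that name.

```python
import pandas as pd

named = pd.DataFrame({
    "Sett_ID":  [1, 2, 3, 4, 5],
    "SettName": ["Name A", "Name A", "Name B", "UNK", "UNK"],
    "Area2015": [100.0, 0.5, 2.0, 0.1, 0.2],
})

is_named = named["SettName"] != "UNK"

# Index label of the largest settlement for every name (unique names included).
largest_idx = named[is_named].groupby("SettName")["Area2015"].idxmax()

# Reset only named settlements that are not the largest holder of their name.
named.loc[is_named & ~named.index.isin(largest_idx), "SettName"] = "UNK"
print(named)
```

Here "Name B" survives even though it was never duplicated, and only the smaller "Name A" settlement is reset.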

In [48]:
print(SettlementsNamed.info(), SettlementsNamed.sample(20))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13622 entries, 0 to 13636
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    13622 non-null  float64
 1   Afpl_Name  13622 non-null  object 
 2   Area2015   13622 non-null  float64
 3   GeoName    13622 non-null  object 
 4   UCDB_Name  13622 non-null  object 
 5   SettName   13622 non-null  object 
dtypes: float64(2), object(4)
memory usage: 1.2+ MB
None         Sett_ID  Afpl_Name  Area2015 GeoName UCDB_Name SettName
3085    12352.0        UNK  0.018733     UNK       UNK      UNK
947      2665.0        UNK  0.117080     UNK       UNK      UNK
7718    53572.0  Bafoussam  0.000937     UNK       UNK      UNK
12638  194918.0        UNK  0.034656     UNK       UNK      UNK
3795    17065.0        UNK  0.005620     UNK       UNK      UNK
11449  145485.0        UNK  0.320330     UNK       UNK      UNK
4694    22044.0        UNK  0.103967     UNK       UNK      UNK
220

In [49]:
# Drop extra columns and save to file.
SettlementsNamed = SettlementsNamed[['Sett_ID', 'SettName']]
SettlementsNamed.to_csv(r'Results/PlaceNames.csv')

In [50]:
del SettlementsNamed

---

## 7. CREATE FRAGMENTATION INDEX
For each year, we determine what percentage of a settlement's area lies outside of its administrative zone.
The index ranges from 0 to 100: it is the percentage of the settlement's area that is fragmented.

For each Sett_ID:
((Area of Boundless settlement - Area of largest Bounded settlement feature) / Area of Boundless settlement) * 100
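
As a worked example of the formula above, the following sketch computes the index for a few invented area pairs. The function name `fragmentation_index` is hypothetical, and returning 0 for a settlement with no area is an assumption that mirrors how missing values are treated later in this section.

```python
def fragmentation_index(boundless_area, largest_bounded_area):
    """Percent of a settlement's area outside its largest bounded piece."""
    if not boundless_area:
        # No built-up area yet: treat the settlement as unfragmented.
        return 0.0
    return (boundless_area - largest_bounded_area) / boundless_area * 100

# 10 km^2 in total, 7.5 km^2 of it in the largest bounded feature.
print(fragmentation_index(10.0, 7.5))

# The whole settlement sits inside one administrative unit.
print(fragmentation_index(4.0, 4.0))

# Settlement with no area in this year.
print(fragmentation_index(0.0, 0.0))
```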

### 7.1 Load boundless and bounded cumulative settlements and clean.

In [51]:
BoundlessAreas = pd.read_csv(os.path.join(ResultsFolder, ('Areas%sto%s.csv' % (StudyStart, StudyEnd))))
print('Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. %s' 
      % time.ctime())
print(BoundlessAreas.info())

BoundedAreas = pd.read_csv(os.path.join(ResultsFolder, ('Areas%sto%s_%s.csv' % (StudyStart, StudyEnd, 'Bounded'))))
print('Loaded Bounded dataset, which will factor into the fragmentation calculation. %s' % time.ctime())
print(BoundedAreas.info())

Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. Thu Dec  8 17:39:08 2022
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13622 entries, 0 to 13621
Data columns (total 21 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  13622 non-null  int64  
 1   Sett_ID     13622 non-null  float64
 2   year        13622 non-null  int64  
 3   ADM_ID      13622 non-null  int64  
 4   Area2015    13622 non-null  float64
 5   Area2014    13514 non-null  float64
 6   Area2013    13375 non-null  float64
 7   Area2012    13253 non-null  float64
 8   Area2011    13133 non-null  float64
 9   Area2010    13033 non-null  float64
 10  Area2009    12931 non-null  float64
 11  Area2008    12865 non-null  float64
 12  Area2007    12759 non-null  float64
 13  Area2006    12664 non-null  float64
 14  Area2005    12578 non-null  float64
 15  Area2004    12494 non-null  float64
 16  Area2003    12429

In [52]:
LargestFragments = BoundedAreas.loc[BoundedAreas.groupby(["Sett_ID"])["Area2015"].idxmax()] 
print(LargestFragments.info())
print("Filtered the Bounded dataset to only rows where latest year's area is largest for each Sett_ID. %s" % time.ctime())
LargestFragments.columns = LargestFragments.columns.str.replace('Area', 'Largest')
LargestFragments = LargestFragments.drop(columns=['year', 'ADM_ID'])
print("Renamed columns to avoid duplication during merge, and dropped unnecessary columns. %s" % time.ctime())
FragIndices = BoundlessAreas.merge(LargestFragments, how='left', on='Sett_ID')
print(FragIndices.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13622 entries, 0 to 14268
Data columns (total 22 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  13622 non-null  int64  
 1   Bounded_ID  13622 non-null  float64
 2   year        13622 non-null  int64  
 3   ADM_ID      13622 non-null  int64  
 4   Sett_ID     13622 non-null  float64
 5   Area2015    13622 non-null  float64
 6   Area2014    13514 non-null  float64
 7   Area2013    13375 non-null  float64
 8   Area2012    13251 non-null  float64
 9   Area2011    13131 non-null  float64
 10  Area2010    13031 non-null  float64
 11  Area2009    12929 non-null  float64
 12  Area2008    12863 non-null  float64
 13  Area2007    12756 non-null  float64
 14  Area2006    12662 non-null  float64
 15  Area2005    12576 non-null  float64
 16  Area2004    12492 non-null  float64
 17  Area2003    12426 non-null  float64
 18  Area2002    12352 non-null  float64
 19  Area2001    12271 non-nul

In [53]:
del BoundlessAreas, BoundedAreas, LargestFragments

### 7.2 Merge and run fragmentation calculation.

In [54]:
for item in AllStudyYears:
    YY = str(item) # 4-digit year
    AreaYY = ''.join(["Area", YY]) # The Boundless area variable name
    LargestYY = ''.join(['Largest', YY]) # The Bounded largest area variable name
    FragYY = ''.join(["Frag", YY]) # Name for the fragmentation index variable
    print("Created names for Year %s's variables and temporary objects. %s" % (item, time.ctime()))
    
    FragIndices[FragYY] = ((FragIndices[AreaYY] - FragIndices[LargestYY]) / FragIndices[AreaYY]) * 100
    FragIndices[FragYY] = (FragIndices[FragYY].fillna(0).replace([np.inf, -np.inf], 0)).astype('int')
    print("Calculated fragmentation index for year %s. %s" % (item, time.ctime()))

# Remove unnecessary columns.
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Largest')]
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Area')]

print('Completed fragmentation index calculations for all years. %s' % time.ctime())
print(FragIndices.info())
print(FragIndices.sample(5))

Created names for Year 1999's variables and temporary objects. Thu Dec  8 17:39:09 2022
Calculated fragmentation index for year 1999. Thu Dec  8 17:39:09 2022
Created names for Year 2000's variables and temporary objects. Thu Dec  8 17:39:09 2022
Calculated fragmentation index for year 2000. Thu Dec  8 17:39:09 2022
Created names for Year 2001's variables and temporary objects. Thu Dec  8 17:39:09 2022
Calculated fragmentation index for year 2001. Thu Dec  8 17:39:09 2022
Created names for Year 2002's variables and temporary objects. Thu Dec  8 17:39:09 2022
Calculated fragmentation index for year 2002. Thu Dec  8 17:39:09 2022
Created names for Year 2003's variables and temporary objects. Thu Dec  8 17:39:09 2022
Calculated fragmentation index for year 2003. Thu Dec  8 17:39:09 2022
Created names for Year 2004's variables and temporary objects. Thu Dec  8 17:39:09 2022
Calculated fragmentation index for year 2004. Thu Dec  8 17:39:09 2022
Created names for Year 2005's variables and te

In [55]:
FragIndices = FragIndices.drop(columns=['Unnamed: 0_x', 'Unnamed: 0_y', 'year', 'ADM_ID'])
FragIndices.to_csv(os.path.join(ResultsFolder, 'FragIndex%sto%s.csv' % (StudyStart, StudyEnd)))
print('Saved to file. %s' % time.ctime())

Saved to file. Thu Dec  8 17:39:09 2022


In [56]:
del FragIndices

---

## 8. PREPARE YEARLY DATASETS: POPULATION
This section can serve as a template for other annualized rasters.

### 8.1 Reproject and reclassify with settlement buffer mask.
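Reproject each population raster to the settlements' CRS, then set every cell outside the settlement buffer (layer Buff2000m_2015) to NoData, so that later steps only process cells near settlements.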

In [57]:
ProjCRS = gdal.WarpOptions(dstSRS='ESRI:102022')
AnnualizedSourceFiles = [i for i in os.listdir('Population/') if i.endswith('.tif')]

with fiona.open(r'Results/Catchment.gpkg', mode="r", layer="Buff2000m_2015") as shapefile:
    MaskGeom = [feature["geometry"] for feature in shapefile] # Identify the bounding areas of the mask.
# Mask_out = './LatestYearBuffer.tif'
AnnualizedSourceFiles

['cmr_ppp_2000_UNadj.tif',
 'cmr_ppp_2001_UNadj.tif',
 'cmr_ppp_2002_UNadj.tif',
 'cmr_ppp_2003_UNadj.tif',
 'cmr_ppp_2004_UNadj.tif',
 'cmr_ppp_2005_UNadj.tif',
 'cmr_ppp_2006_UNadj.tif',
 'cmr_ppp_2007_UNadj.tif',
 'cmr_ppp_2008_UNadj.tif',
 'cmr_ppp_2009_UNadj.tif',
 'cmr_ppp_2010_UNadj.tif',
 'cmr_ppp_2011_UNadj.tif',
 'cmr_ppp_2012_UNadj.tif',
 'cmr_ppp_2013_UNadj.tif',
 'cmr_ppp_2014_UNadj.tif',
 'cmr_ppp_2015_UNadj.tif']
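
In the file names listed above, the year is the only run of digits, which is what the next cell relies on when it strips every non-digit character. That approach would silently concatenate digits if a file name carried any other number. A slightly more defensive sketch follows; `extract_year` is a hypothetical helper, and the last file name is made up to show the difference.

```python
import re

def extract_year(filename):
    """Return the first standalone 4-digit year (1900-2099) in a file name.

    Hypothetical helper; raises ValueError if no year is present.
    """
    match = re.search(r'(?<!\d)(?:19|20)\d{2}(?!\d)', filename)
    if match is None:
        raise ValueError('No 4-digit year found in %r' % filename)
    return match.group(0)

names = ['cmr_ppp_2000_UNadj.tif',
         'Harmonized_DN_NTL_2014_simVIIRS.tif',
         'v2_pop_2010.tif']  # made-up name with an extra digit

for name in names:
    stripped = re.sub(r'[^0-9]', '', name)  # approach used in the notebook
    print(name, '| strip:', stripped, '| search:', extract_year(name))
```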

In [58]:
# This codeblock changes each annual population raster's projection (gdal.Warp()), 
# then masks it to within a specified distance of the settlements (rasterio.mask.mask()).

for YearFile in AnnualizedSourceFiles:
    InputRasterName = os.path.join(ProjectFolder, "Population", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    InputRasterObject = gdal.Open(InputRasterName)
    TempOutputName = "Temp_" + Year + "_albers.tif"
    TempOutputPath = os.path.join(ProjectFolder, "Population", TempOutputName)
    if exists(TempOutputPath):
        pass
    else:
        # Reproject to same CRS as settlements.
        Warp = gdal.Warp(TempOutputPath, # Where to store the warped raster
                     InputRasterObject, # Which raster to warp
                     format='GTiff', 
                     options=ProjCRS) # Reproject to Africa Albers Equal Area Conic
        print('Finished gdal.Warp() for year %s. %s \n' % (Year, time.ctime()))
        
        Warp = None # Close the files
        InputRasterObject = None

        # Reclassify as nodata if outside settlement buffer zones.
        with rasterio.open(TempOutputPath) as InputRasterObject:
            MaskedOutputRaster, OutTransform = rasterio.mask.mask(
                InputRasterObject, MaskGeom, crop=True) # Anything outside the mask is reclassed to the raster's NoData value.
            OutMetaData = InputRasterObject.meta.copy()
        print('Finished rasterio.mask.mask() for year %s. %s \n' % (Year, time.ctime()))
            
        OutMetaData.update({"driver": "GTiff",
                         "height": MaskedOutputRaster.shape[1],
                         "width": MaskedOutputRaster.shape[2],
                         "transform": OutTransform})
        FinalOutputPath = os.path.join(ProjectFolder, "Population", ''.join(['Masked_', Year, '.tif']))
        with rasterio.open(FinalOutputPath, "w", **OutMetaData) as dest:
            dest.write(MaskedOutputRaster)
        print('Written to file. %s \n' % time.ctime())
    InputRasterObject = None
    
    try:  # Finally, remove the intermediate file from disk
        os.remove(TempOutputPath)
    except OSError:
        pass
    print('Removed intermediate file. %s \n' % time.ctime())

print('\n \n Finished all years in list. %s' % time.ctime())

Finished gdal.Warp() for year 2000. Thu Dec  8 17:39:15 2022 

Finished rasterio.mask.mask() for year 2000. Thu Dec  8 17:39:19 2022 

Written to file. Thu Dec  8 17:39:20 2022 

Removed intermediate file. Thu Dec  8 17:39:20 2022 

Finished gdal.Warp() for year 2001. Thu Dec  8 17:39:26 2022 

Finished rasterio.mask.mask() for year 2001. Thu Dec  8 17:39:30 2022 

Written to file. Thu Dec  8 17:39:31 2022 

Removed intermediate file. Thu Dec  8 17:39:31 2022 

Finished gdal.Warp() for year 2002. Thu Dec  8 17:39:37 2022 

Finished rasterio.mask.mask() for year 2002. Thu Dec  8 17:39:41 2022 

Written to file. Thu Dec  8 17:39:42 2022 

Removed intermediate file. Thu Dec  8 17:39:42 2022 

Finished gdal.Warp() for year 2003. Thu Dec  8 17:39:47 2022 

Finished rasterio.mask.mask() for year 2003. Thu Dec  8 17:39:52 2022 

Written to file. Thu Dec  8 17:39:54 2022 

Removed intermediate file. Thu Dec  8 17:39:54 2022 

Finished gdal.Warp() for year 2004. Thu Dec  8 17:39:59 2022 

Finis

In [59]:
print(os.listdir('Population/'))

['cmr_ppp_2000_UNadj.tif', 'cmr_ppp_2001_UNadj.tif', 'cmr_ppp_2002_UNadj.tif', 'cmr_ppp_2003_UNadj.tif', 'cmr_ppp_2004_UNadj.tif', 'cmr_ppp_2005_UNadj.tif', 'cmr_ppp_2006_UNadj.tif', 'cmr_ppp_2007_UNadj.tif', 'cmr_ppp_2008_UNadj.tif', 'cmr_ppp_2009_UNadj.tif', 'cmr_ppp_2010_UNadj.tif', 'cmr_ppp_2011_UNadj.tif', 'cmr_ppp_2012_UNadj.tif', 'cmr_ppp_2013_UNadj.tif', 'cmr_ppp_2014_UNadj.tif', 'cmr_ppp_2015_UNadj.tif', 'Masked_2000.tif', 'Masked_2001.tif', 'Masked_2002.tif', 'Masked_2003.tif', 'Masked_2004.tif', 'Masked_2005.tif', 'Masked_2006.tif', 'Masked_2007.tif', 'Masked_2008.tif', 'Masked_2009.tif', 'Masked_2010.tif', 'Masked_2011.tif', 'Masked_2012.tif', 'Masked_2013.tif', 'Masked_2014.tif', 'Masked_2015.tif']


In [60]:
AnnualizedSourceFiles = None

### 8.2 Raster values summarized by settlement.
1. Convert each annualized raster to .xyz.
2. Load the cells into vector space as points and assign each one the Sett_ID of its nearest settlement.
3. Aggregate the values to the settlement level as appropriate, and save the table to file.

XYZ is a plain-text format similar to .csv: each raster cell's center is stored as x and y, and the cell's value is stored as z.
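
To make the format concrete, the sketch below reads a tiny XYZ table and drops the NoData rows. It assumes the driver's default space separator and the `X Y Z` header line requested through `ADD_HEADER_LINE=YES` in the loop below. The coordinates and values are made up, and `read_xyz` is a hypothetical helper standing in for the pandas call.

```python
import io

NO_DATA = -99999  # same sentinel as NoDataVal in the next cell

# A made-up 2 x 2 raster: one header row, then one row per cell center.
xyz_text = """X Y Z
-1485701.0 652092.0 1.317379
-1485601.0 652092.0 -99999
-1485701.0 651992.0 0.113568
-1485601.0 651992.0 -99999
"""

def read_xyz(handle, no_data):
    """Yield (x, y, z) for every cell whose value is not NoData."""
    header = handle.readline().split()
    if header != ['X', 'Y', 'Z']:
        raise ValueError('Unexpected header: %r' % header)
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        x, y, z = (float(part) for part in line.split())
        if z != no_data:
            yield x, y, z

cells = list(read_xyz(io.StringIO(xyz_text), NO_DATA))
print(len(cells), 'cells with data')
print('sum of values:', round(sum(z for _, _, z in cells), 6))
```

Every cell of the raster becomes one text row, which is why masking to the settlement buffer first keeps the tables manageable.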

In [61]:
NoDataVal = -99999 
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS')
AllSummaries = pd.DataFrame(Settlements).drop(columns='geometry')

AnnualizedMaskedFiles = [i for i in os.listdir('Population/') if i.startswith('Masked') and i.endswith('.tif')]
AnnualizedMaskedFiles

['Masked_2000.tif',
 'Masked_2001.tif',
 'Masked_2002.tif',
 'Masked_2003.tif',
 'Masked_2004.tif',
 'Masked_2005.tif',
 'Masked_2006.tif',
 'Masked_2007.tif',
 'Masked_2008.tif',
 'Masked_2009.tif',
 'Masked_2010.tif',
 'Masked_2011.tif',
 'Masked_2012.tif',
 'Masked_2013.tif',
 'Masked_2014.tif',
 'Masked_2015.tif']

In [62]:
for YearFile in AnnualizedMaskedFiles:
    
### STEP 1: TIF TO XYZ ###
    InputRasterName = os.path.join(ProjectFolder, "Population", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    print('Loading data for year %s. %s \n' % (Year, time.ctime()))
    InputRasterObject = gdal.Open(InputRasterName)
    XYZOutputPath = r'Population/{}'.format(
        YearFile.replace('.tif', '.xyz')) # New file path will be the same as original, but .tif is replaced with .xyz
    
    # Create an .xyz version of the .tif
    XYZ = gdal.Translate(XYZOutputPath, # Specify a destination path
                         InputRasterObject, # Input is the masked .tif file
                         format='XYZ', 
                         creationOptions=["ADD_HEADER_LINE=YES"])
    print('Finished gdal.Translate() for year %s. %s \n' % (Year, time.ctime()))

#     # Remove the temporary masked tif file.
#     try:  
#         os.remove(InputRasterName)
#     except OSError:
#         pass
#     print('Removed (or skipped if error) intermediate tif file. %s \n' % time.ctime())
    
    InputRasterObject = None
    XYZ = None # Close the dataset so the .xyz file is fully written before it is reloaded below

    
### STEP 2: GENERATE GEODATAFRAME WITH SETT_ID FIELD ###
    InputXYZName = ''.join(['Masked_', Year, '.xyz'])
    InputXYZ = pd.read_table(os.path.join(ProjectFolder, 'Population', InputXYZName), sep=r'\s+') # sep=r'\s+' replaces delim_whitespace=True, which newer pandas versions deprecate
    InputXYZ = InputXYZ.loc[InputXYZ['Z'] != NoDataVal] # Subset to only the features that have a raster value.
    print('Loaded XYZ file as a pandas dataframe, year %s. %s \n' % (Year, time.ctime()))
    ValObject = gpd.GeoDataFrame(InputXYZ,
                                 geometry = gpd.points_from_xy(InputXYZ['X'], InputXYZ['Y']),
                                 crs = 'ESRI:102022')
    print('Created geodataframe from non-NoData points, year %s. %s \n' % (Year, time.ctime()))
    del InputXYZ
    
    # Sjoin_nearest: No need to group by ADM this time. 
    ValObject_withID = gpd.sjoin_nearest(ValObject, 
                                    Settlements, 
                                    how='left') # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.
    
    print('\nJoined settlement ID onto vectorized raster cells for year %s. %s \n' % (Year, time.ctime()))
    print(ValObject_withID.sample(10))
    del ValObject
    
    # We no longer need the spatial information of the raster values because we have their unique settlement ID.
    ValObject_withID = pd.DataFrame(ValObject_withID).drop(columns='geometry')
    
    ValObject_withID.to_csv(''.join([r'Population/', 'Masked_', Year, '.csv']))
    print('\nExported as table, year %s. %s \n' % (Year, time.ctime()))
    
    # Remove the temporary xyz file.
    try:  
        os.remove(os.path.join(ProjectFolder, 'Population', InputXYZName))
    except OSError:
        pass
    print('Removed (or skipped if error) intermediate xyz file. %s \n' % time.ctime())

    

### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###
    VariableName = ''.join(['PopSum', Year])
    
    ValAggregated = ValObject_withID.groupby('Sett_ID', 
                                      as_index=False)['Z'].sum().rename(columns={"Z": VariableName})
    print('\nValues aggregated to settlement level, year %s. %s \n' % (Year, time.ctime()))
    print(ValAggregated.sample(10))
    
    AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
    print('\nMerged year %s onto latest year settlement feature layer. %s \n' % (Year, time.ctime()))
    print(AllSummaries.sample(10))
    
    del ValObject_withID, ValAggregated
    print('\n\n')
    

print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())

AllSummaries.to_csv(os.path.join(ResultsFolder, 'Pop%sto%s.csv' % (2000, 2015)))
print('Saved to file. %s \n' % time.ctime())

Loading data for year 2000. Thu Dec  8 17:42:09 2022 

Finished gdal.Translate() for year 2000. Thu Dec  8 17:44:53 2022 

Loaded XYZ file as a pandas dataframe, year 2000. Thu Dec  8 17:48:01 2022 

Created geodataframe from non-NoData points, year 2000. Thu Dec  8 17:48:05 2022 


Joined settlement ID onto vectorized raster cells for year 2000. Thu Dec  8 17:59:44 2022 

                      X             Y         Z  \
77166079  -1.485701e+06  6.520920e+05  1.317379   
90963241  -1.263179e+06  4.969494e+05  0.113568   
105159018 -1.432571e+06  3.372770e+05  0.624236   
76041317  -1.520429e+06  6.647375e+05  2.629990   
94520722  -1.291773e+06  4.569369e+05  0.312197   
95215909  -1.411244e+06  4.491043e+05  0.967864   
74338324  -1.484569e+06  6.838944e+05  1.102470   
39510721  -1.160128e+06  1.075620e+06  0.160976   
28334650  -1.090200e+06  1.201320e+06  0.736261   
115023008 -1.001777e+06  2.263934e+05  0.221321   

                                   geometry  index_right   Set

Finished gdal.Translate() for year 2003. Thu Dec  8 18:39:28 2022 

Loaded XYZ file as a pandas dataframe, year 2003. Thu Dec  8 18:42:29 2022 

Created geodataframe from non-NoData points, year 2003. Thu Dec  8 18:42:32 2022 


Joined settlement ID onto vectorized raster cells for year 2003. Thu Dec  8 18:53:31 2022 

                      X             Y         Z  \
101197847 -1.490986e+06  3.818192e+05  0.363024   
105888592 -1.474377e+06  3.290669e+05  0.120589   
8288803   -1.066420e+06  1.426767e+06  0.253015   
91901549  -1.403223e+06  4.863800e+05  0.558720   
99596418  -1.372836e+06  3.998436e+05  0.288774   
46047639  -1.129081e+06  1.002107e+06  0.568933   
66551849  -1.449369e+06  7.714688e+05  0.263148   
38394281  -1.201367e+06  1.088171e+06  0.947264   
89207317  -1.471263e+06  5.166725e+05  0.137083   
67401773  -1.219769e+06  7.619375e+05  2.227901   

                                   geometry  index_right   Sett_ID  ADM_ID  
101197847   POINT (-1490985.697 381819.1


Exported as table, year 2005. Thu Dec  8 19:29:35 2022 

Removed (or skipped if error) intermediate xyz file. Thu Dec  8 19:29:36 2022 


Values aggregated to settlement level, year 2005. Thu Dec  8 19:29:36 2022 

        Sett_ID   PopSum2005
3462    14656.0   143.404916
8409    59697.0   112.879680
10723   99142.0   699.170236
8560    61323.0   191.171702
3649    15941.0    38.951498
10532   89285.0    74.171582
10440   85486.0   877.788448
945      2668.0    84.568843
10850  111619.0  1434.658957
5332    24656.0   132.939086

Merged year 2005 onto latest year settlement feature layer. Thu Dec  8 19:29:36 2022 

       Sett_ID  ADM_ID   PopSum2000   PopSum2001   PopSum2002   PopSum2003  \
2546    7845.0     167  1638.420614  1687.255548  1583.490713  1716.008143   
8006   56283.0     267   434.211739   450.195530   468.482166   450.711706   
9053   67069.0     286    12.579460    12.374531    11.008186    11.596449   
2066    6319.0      44   111.596905   137.708425   139.663374   1

Finished gdal.Translate() for year 2008. Thu Dec  8 20:07:20 2022 

Loaded XYZ file as a pandas dataframe, year 2008. Thu Dec  8 20:10:16 2022 

Created geodataframe from non-NoData points, year 2008. Thu Dec  8 20:10:19 2022 


Joined settlement ID onto vectorized raster cells for year 2008. Thu Dec  8 20:21:19 2022 

                      X             Y         Z  \
45374853  -1.271200e+06  1.009656e+06  0.264824   
58677190  -1.025652e+06  8.600813e+05  0.078509   
83990429  -1.252893e+06  5.753700e+05  0.182034   
84022137  -1.428042e+06  5.749925e+05  0.520043   
91104082  -1.433609e+06  4.953451e+05  1.029662   
60118805  -1.180134e+06  8.438498e+05  0.235516   
79934793  -1.515522e+06  6.209502e+05  0.926240   
113554466 -1.012818e+06  2.429080e+05  0.253989   
73532901  -1.473905e+06  6.929538e+05  0.133780   
111771079 -1.437007e+06  2.629142e+05  0.134126   

                                   geometry  index_right   Sett_ID  ADM_ID  
45374853   POINT (-1271200.265 1009656.2

Finished gdal.Translate() for year 2010. Thu Dec  8 20:42:17 2022 

Loaded XYZ file as a pandas dataframe, year 2010. Thu Dec  8 20:45:14 2022 

Created geodataframe from non-NoData points, year 2010. Thu Dec  8 20:45:17 2022 


Joined settlement ID onto vectorized raster cells for year 2010. Thu Dec  8 20:56:14 2022 

                      X             Y         Z  \
79464965  -1.509105e+06  6.262349e+05  0.783713   
105375744 -1.568463e+06  3.348234e+05  2.674244   
63207533  -1.100864e+06  8.091221e+05  0.056628   
101073663 -1.332351e+06  3.832347e+05  0.187392   
93821599  -1.543738e+06  4.647695e+05  0.202552   
76479522  -1.343676e+06  6.598303e+05  0.057565   
22619688  -1.155409e+06  1.265585e+06  1.005828   
97985178  -1.388690e+06  4.179625e+05  1.194914   
87460306  -1.630086e+06  5.363012e+05  1.292169   
5377235   -1.056133e+06  1.459513e+06  1.910453   

                                   geometry  index_right   Sett_ID  ADM_ID  
79464965    POINT (-1509104.547 626234.9

Finished gdal.Translate() for year 2012. Thu Dec  8 21:17:11 2022 

Loaded XYZ file as a pandas dataframe, year 2012. Thu Dec  8 21:20:08 2022 

Created geodataframe from non-NoData points, year 2012. Thu Dec  8 21:20:11 2022 


Joined settlement ID onto vectorized raster cells for year 2012. Thu Dec  8 21:31:10 2022 

                      X             Y          Z  \
99285913  -1.376422e+06  4.033353e+05   0.315939   
101627426 -1.336409e+06  3.770063e+05   0.407626   
100091665 -1.356038e+06  3.942759e+05   0.220534   
77769897  -1.517220e+06  6.452974e+05  28.306953   
95333319  -1.417283e+06  4.477831e+05   0.199549   
76712824  -1.499007e+06  6.571879e+05   2.309147   
95199380  -1.387369e+06  4.492930e+05   0.428813   
19632736  -1.132383e+06  1.299180e+06   6.770020   
42640666  -1.150502e+06  1.040421e+06   0.242711   
60151165  -1.293754e+06  8.434724e+05   0.113996   

                                   geometry  index_right   Sett_ID  ADM_ID  
99285913    POINT (-1376421.7

Finished gdal.Translate() for year 2014. Thu Dec  8 21:53:13 2022 

Loaded XYZ file as a pandas dataframe, year 2014. Thu Dec  8 21:56:25 2022 

Created geodataframe from non-NoData points, year 2014. Thu Dec  8 21:56:29 2022 


Joined settlement ID onto vectorized raster cells for year 2014. Thu Dec  8 22:11:41 2022 

                     X              Y         Z  \
96146494 -1.488249e+06  438629.306222  0.292688   
75520895 -1.537415e+06  670588.340297  2.020557   
72248139 -1.562517e+06  707392.255386  1.316793   
74942134 -1.516843e+06  677099.802197  1.335178   
94334222 -1.470885e+06  459013.013040  0.397369   
87051286 -1.428230e+06  540925.316367  2.055139   
92798794 -1.459089e+06  476282.542428  0.260160   
90498276 -1.589696e+06  502139.652004  0.563185   
47188904 -1.120682e+06  989272.497364  0.727111   
89300432 -1.394446e+06  515634.420870  0.182080   

                                 geometry  index_right   Sett_ID  ADM_ID  
96146494  POINT (-1488248.996 438629.306) 

Saved to file. Thu Dec  8 22:35:10 2022 



In [63]:
AllSummaries.sort_values('PopSum2010', ascending=False).head(20)

Unnamed: 0,Sett_ID,ADM_ID,PopSum2000,PopSum2001,PopSum2002,PopSum2003,PopSum2004,PopSum2005,PopSum2006,PopSum2007,PopSum2008,PopSum2009,PopSum2010,PopSum2011,PopSum2012,PopSum2013,PopSum2014,PopSum2015
3964,19120.0,12,1371739.0,1433621.0,1532100.0,1611711.0,1677785.0,1768640.0,1869954.0,1947824.0,2055855.0,2155441.0,2260273.0,2370697.0,2559540.0,2697901.0,2856110.0,2999944.0
3083,12521.0,9,755434.8,722916.9,806793.8,869591.9,878966.5,941934.2,1003072.0,991789.2,1040045.0,1135986.0,1178489.0,1235285.0,1295360.0,1351160.0,1472478.0,1600271.0
13493,199779.0,12,481384.0,495793.6,532797.8,588521.9,592590.1,635896.4,671004.6,719385.5,725204.2,764012.8,821075.0,843094.5,995688.9,1077169.0,1116253.0,1144993.0
13494,199780.0,12,481384.0,495793.6,532797.8,588521.9,592590.1,635896.4,671004.6,719385.5,725204.2,764012.8,821075.0,843094.5,995688.9,1077169.0,1116253.0,1144993.0
13495,199782.0,12,481384.0,495793.6,532797.8,588521.9,592590.1,635896.4,671004.6,719385.5,725204.2,764012.8,821075.0,843094.5,995688.9,1077169.0,1116253.0,1144993.0
13496,199787.0,12,481384.0,495793.6,532797.8,588521.9,592590.1,635896.4,671004.6,719385.5,725204.2,764012.8,821075.0,843094.5,995688.9,1077169.0,1116253.0,1144993.0
3135,12674.0,9,343418.5,326506.4,366842.4,396272.0,395475.9,429885.0,458517.2,455557.5,459513.9,521346.0,546524.8,572412.7,623601.1,665302.1,738566.4,799116.8
3105,12544.0,9,343020.8,326210.1,366444.6,395861.6,395085.1,429502.1,458076.0,455178.5,458989.6,520879.3,545938.2,571906.6,622982.7,664689.3,737948.1,798284.5
3137,12684.0,9,342807.8,325999.3,366211.6,395615.0,394830.4,429221.8,457786.9,454876.6,458704.7,520535.8,545594.2,571544.8,622584.1,664260.1,737464.7,797780.7
3133,12670.0,9,342715.1,325899.7,366112.6,395501.4,394717.1,429096.7,457653.6,454738.8,458578.3,520380.4,545447.8,571378.9,622414.3,664075.6,737251.0,797570.7
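
Settlements 199779, 199780, 199782, and 199787 show identical values in every year above. The source does not say why. One possibility worth checking is that their geometries coincide or overlap: `gpd.sjoin_nearest` returns one output row for each equally near neighbour, so a cell tied between several settlements is counted once for each of them. The sketch below reproduces that effect in plain Python with made-up point "settlements"; `nearest_join` is a hypothetical stand-in, not the geopandas implementation.

```python
import math
from collections import defaultdict

def nearest_join(cells, settlements):
    """Pair each cell with every settlement at the minimum distance.

    cells: list of (x, y, z); settlements: dict of id -> (x, y).
    A tie produces one row per tied settlement.
    """
    rows = []
    for x, y, z in cells:
        dists = {sid: math.hypot(x - sx, y - sy)
                 for sid, (sx, sy) in settlements.items()}
        best = min(dists.values())
        rows.extend((sid, z) for sid, d in dists.items() if d == best)
    return rows

# Made-up data: settlements A and B share a location, C is elsewhere.
settlements = {'A': (0.0, 0.0), 'B': (0.0, 0.0), 'C': (10.0, 0.0)}
cells = [(1.0, 0.0, 5.0), (9.0, 0.0, 2.0)]

totals = defaultdict(float)
for sid, z in nearest_join(cells, settlements):
    totals[sid] += z

print(dict(totals))
print('cell total:', sum(z for _, _, z in cells),
      '| joined total:', sum(totals.values()))
```

If ties turn out to be the cause, de-duplicating the settlement geometries before the join, or keeping one row per cell index after it, would count each cell once.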


---

## 9. PREPARE YEARLY DATASETS: NIGHTTIME LIGHTS

### 9.1 Reproject and reclassify with settlement buffer mask.
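Reproject each nighttime lights raster to the settlements' CRS, then set every cell outside the settlement buffer (layer Buff250m_2015) to NoData, so that later steps only process cells near settlements.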

In [64]:
ProjCRS = gdal.WarpOptions(dstSRS='ESRI:102022')
AnnualizedSourceFiles = [i for i in os.listdir('NTL/') if i.endswith('.tif')]

with fiona.open(r'Results/Catchment.gpkg', mode="r", layer="Buff250m_2015") as shapefile:
    MaskGeom = [feature["geometry"] for feature in shapefile] # Identify the bounding areas of the mask.
# Mask_out = './LatestYearBuffer.tif'
AnnualizedSourceFiles

['Harmonized_DN_NTL_1999_calDMSP.tif',
 'Harmonized_DN_NTL_2000_calDMSP.tif',
 'Harmonized_DN_NTL_2001_calDMSP.tif',
 'Harmonized_DN_NTL_2002_calDMSP.tif',
 'Harmonized_DN_NTL_2003_calDMSP.tif',
 'Harmonized_DN_NTL_2004_calDMSP.tif',
 'Harmonized_DN_NTL_2005_calDMSP.tif',
 'Harmonized_DN_NTL_2006_calDMSP.tif',
 'Harmonized_DN_NTL_2007_calDMSP.tif',
 'Harmonized_DN_NTL_2008_calDMSP.tif',
 'Harmonized_DN_NTL_2009_calDMSP.tif',
 'Harmonized_DN_NTL_2010_calDMSP.tif',
 'Harmonized_DN_NTL_2011_calDMSP.tif',
 'Harmonized_DN_NTL_2012_calDMSP.tif',
 'Harmonized_DN_NTL_2013_calDMSP.tif',
 'Harmonized_DN_NTL_2014_simVIIRS.tif',
 'Harmonized_DN_NTL_2015_simVIIRS.tif']

In [65]:
ValStart = 1999
ValEnd = 2015

In [66]:
# This codeblock changes each annual nighttime lights raster's projection (gdal.Warp()), 
# then masks it to within a specified distance of the settlements (rasterio.mask.mask()).

for YearFile in AnnualizedSourceFiles:
    InputRasterName = os.path.join(ProjectFolder, "NTL", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    InputRasterObject = gdal.Open(InputRasterName)
    TempOutputName = "Temp_" + Year + "_albers.tif"
    TempOutputPath = os.path.join(ProjectFolder, "NTL", TempOutputName)
    if exists(TempOutputPath):
        pass
    else:
        # Reproject to same CRS as settlements.
        Warp = gdal.Warp(TempOutputPath, # Where to store the warped raster
                     InputRasterObject, # Which raster to warp
                     format='GTiff', 
                     options=ProjCRS) # Reproject to Africa Albers Equal Area Conic
        print('Finished gdal.Warp() for year %s. %s \n' % (Year, time.ctime()))
        
        Warp = None # Close the files
        InputRasterObject = None

        # Reclassify as nodata if outside settlement buffer zones.
        with rasterio.open(TempOutputPath) as InputRasterObject:
            MaskedOutputRaster, OutTransform = rasterio.mask.mask(
                InputRasterObject, MaskGeom, crop=True) # Anything outside the mask is reclassed to the raster's NoData value.
            OutMetaData = InputRasterObject.meta.copy()
        print('Finished rasterio.mask.mask() for year %s. %s \n' % (Year, time.ctime()))
            
        OutMetaData.update({"driver": "GTiff",
                         "height": MaskedOutputRaster.shape[1],
                         "width": MaskedOutputRaster.shape[2],
                         "transform": OutTransform})
        FinalOutputPath = os.path.join(ProjectFolder, "NTL", ''.join(['Masked_', Year, '.tif']))
        with rasterio.open(FinalOutputPath, "w", **OutMetaData) as dest:
            dest.write(MaskedOutputRaster)
        print('Written to file. %s \n' % time.ctime())
    InputRasterObject = None
    
    try:  # Finally, remove the intermediate file from disk
        os.remove(TempOutputPath)
    except OSError:
        pass
    print('Removed intermediate file. %s \n' % time.ctime())

print('\n \n Finished all years in list. %s' % time.ctime())

Finished gdal.Warp() for year 1999. Thu Dec  8 22:36:22 2022 

Finished rasterio.mask.mask() for year 1999. Thu Dec  8 22:36:25 2022 

Written to file. Thu Dec  8 22:36:25 2022 

Removed intermediate file. Thu Dec  8 22:36:25 2022 

Finished gdal.Warp() for year 2000. Thu Dec  8 22:37:35 2022 

Finished rasterio.mask.mask() for year 2000. Thu Dec  8 22:37:38 2022 

Written to file. Thu Dec  8 22:37:38 2022 

Removed intermediate file. Thu Dec  8 22:37:38 2022 

Finished gdal.Warp() for year 2001. Thu Dec  8 22:38:47 2022 

Finished rasterio.mask.mask() for year 2001. Thu Dec  8 22:38:50 2022 

Written to file. Thu Dec  8 22:38:50 2022 

Removed intermediate file. Thu Dec  8 22:38:50 2022 

Finished gdal.Warp() for year 2002. Thu Dec  8 22:39:58 2022 

Finished rasterio.mask.mask() for year 2002. Thu Dec  8 22:40:01 2022 

Written to file. Thu Dec  8 22:40:01 2022 

Removed intermediate file. Thu Dec  8 22:40:01 2022 

Finished gdal.Warp() for year 2003. Thu Dec  8 22:41:09 2022 

Finis

In [67]:
print(os.listdir('NTL/'))

['Harmonized_DN_NTL_1999_calDMSP.tif', 'Harmonized_DN_NTL_2000_calDMSP.tif', 'Harmonized_DN_NTL_2001_calDMSP.tif', 'Harmonized_DN_NTL_2002_calDMSP.tif', 'Harmonized_DN_NTL_2003_calDMSP.tif', 'Harmonized_DN_NTL_2004_calDMSP.tif', 'Harmonized_DN_NTL_2005_calDMSP.tif', 'Harmonized_DN_NTL_2006_calDMSP.tif', 'Harmonized_DN_NTL_2007_calDMSP.tif', 'Harmonized_DN_NTL_2008_calDMSP.tif', 'Harmonized_DN_NTL_2009_calDMSP.tif', 'Harmonized_DN_NTL_2010_calDMSP.tif', 'Harmonized_DN_NTL_2011_calDMSP.tif', 'Harmonized_DN_NTL_2012_calDMSP.tif', 'Harmonized_DN_NTL_2013_calDMSP.tif', 'Harmonized_DN_NTL_2014_simVIIRS.tif', 'Harmonized_DN_NTL_2015_simVIIRS.tif', 'Masked_1999.tif', 'Masked_2000.tif', 'Masked_2001.tif', 'Masked_2002.tif', 'Masked_2003.tif', 'Masked_2004.tif', 'Masked_2005.tif', 'Masked_2006.tif', 'Masked_2007.tif', 'Masked_2008.tif', 'Masked_2009.tif', 'Masked_2010.tif', 'Masked_2011.tif', 'Masked_2012.tif', 'Masked_2013.tif', 'Masked_2014.tif', 'Masked_2015.tif']


In [68]:
AnnualizedSourceFiles = None

### 9.2 Raster values summarized by settlement.
1. Convert each annualized raster to .xyz.
2. Load the cells into vector space as points and assign each one the Sett_ID of its nearest settlement.
3. Aggregate the values to the settlement level as appropriate, and save the table to file.

XYZ is a plain-text format similar to .csv: each raster cell's center is stored as x and y, and the cell's value is stored as z.
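
For nighttime lights, the loop below keeps only cells with a value of 10 or more, then records three statistics per settlement: the count of retained cells, their summed value, and their mean. A minimal plain-Python sketch of that grouping follows, with made-up values. The cells are assumed to already carry a settlement ID, and `summarize_ntl` is a hypothetical name.

```python
from collections import defaultdict

THRESHOLD = 10  # same cutoff as the Z >= 10 filter in the loop below

def summarize_ntl(rows, threshold=THRESHOLD):
    """Return {sett_id: (count, total, mean)} for cells at or above threshold."""
    grouped = defaultdict(list)
    for sett_id, z in rows:
        if z >= threshold:
            grouped[sett_id].append(z)
    return {sid: (len(vals), sum(vals), sum(vals) / len(vals))
            for sid, vals in grouped.items()}

# Made-up (Sett_ID, Z) pairs.
rows = [(19120, 61), (19120, 17), (19120, 4), (12521, 12), (14656, 7)]
summary = summarize_ntl(rows)

for sid, (count, total, mean) in sorted(summary.items()):
    print(sid, count, total, mean)
```

A settlement with no cell at or above the threshold is absent from the summary, so it appears as NaN once the yearly table is left-merged onto the settlement list.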

In [69]:
NoDataVal = 0 # Not used below: cells are filtered with a brightness threshold (Z >= 10) instead
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS')
AllSummaries = pd.DataFrame(Settlements).drop(columns='geometry')

AnnualizedMaskedFiles = [i for i in os.listdir('NTL/') if i.startswith('Masked') and i.endswith('.tif')]
AnnualizedMaskedFiles

['Masked_1999.tif',
 'Masked_2000.tif',
 'Masked_2001.tif',
 'Masked_2002.tif',
 'Masked_2003.tif',
 'Masked_2004.tif',
 'Masked_2005.tif',
 'Masked_2006.tif',
 'Masked_2007.tif',
 'Masked_2008.tif',
 'Masked_2009.tif',
 'Masked_2010.tif',
 'Masked_2011.tif',
 'Masked_2012.tif',
 'Masked_2013.tif',
 'Masked_2014.tif',
 'Masked_2015.tif']

In [70]:
for YearFile in AnnualizedMaskedFiles:
    
### STEP 1: TIF TO XYZ ###
    InputRasterName = os.path.join(ProjectFolder, "NTL", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    print('Loading data for year %s. %s \n' % (Year, time.ctime()))
    InputRasterObject = gdal.Open(InputRasterName)
    XYZOutputPath = r'NTL/{}'.format(
        YearFile.replace('.tif', '.xyz')) # New file path will be the same as original, but .tif is replaced with .xyz
    
    # Create an .xyz version of the .tif
    XYZ = gdal.Translate(XYZOutputPath, # Specify a destination path
                         InputRasterObject, # Input is the masked .tif file
                         format='XYZ', 
                         creationOptions=["ADD_HEADER_LINE=YES"])
    print('Finished gdal.Translate() for year %s. %s \n' % (Year, time.ctime()))

#     # Remove the temporary masked tif file.
#     try:  
#         os.remove(InputRasterName)
#     except OSError:
#         pass
#     print('Removed (or skipped if error) intermediate tif file. %s \n' % time.ctime())
    
    InputRasterObject = None
    XYZ = None # Close the dataset so the .xyz file is fully written before it is reloaded below

    
### STEP 2: GENERATE GEODATAFRAME WITH SETT_ID FIELD ###
    InputXYZName = ''.join(['Masked_', Year, '.xyz'])
    InputXYZ = pd.read_table(os.path.join(ProjectFolder, 'NTL', InputXYZName), sep=r'\s+') # sep=r'\s+' replaces delim_whitespace=True, which newer pandas versions deprecate
    InputXYZ = InputXYZ.loc[InputXYZ['Z'] >= 10] # Subset to only the cells that contained values of 10 or higher.
    print('Loaded XYZ file as a pandas dataframe, year %s. %s \n' % (Year, time.ctime()))
    ValObject = gpd.GeoDataFrame(InputXYZ,
                                 geometry = gpd.points_from_xy(InputXYZ['X'], InputXYZ['Y']),
                                 crs = 'ESRI:102022')
    print('Created geodataframe from non-NoData points, year %s. %s \n' % (Year, time.ctime()))
    del InputXYZ
    
    # Sjoin_nearest: No need to group by ADM this time. 
    ValObject_withID = gpd.sjoin_nearest(ValObject, 
                                    Settlements, 
                                    how='left') # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.
    
    print('\nJoined settlement ID onto vectorized raster cells for year %s. %s \n' % (Year, time.ctime()))
    print(ValObject_withID.sample(10))
    del ValObject
    
    # We no longer need the spatial information of the raster values because we have their unique settlement ID.
    ValObject_withID = pd.DataFrame(ValObject_withID).drop(columns='geometry')
    
    ValObject_withID.to_csv(''.join([r'NTL/', 'Masked_', Year, '.csv']))
    print('\nExported as table, year %s. %s \n' % (Year, time.ctime()))
    
    # Remove the temporary xyz file.
    try:  
        os.remove(os.path.join(ProjectFolder, 'NTL', InputXYZName))
    except OSError:
        pass
    print('Removed (or skipped if error) intermediate xyz file. %s \n' % time.ctime())

    

### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###
    
    # Cell count
    VariableName = ''.join(['NTLct', Year])
    ValAggregated = ValObject_withID[
        ValObject_withID['Z'].notna()].groupby(
        'Sett_ID', as_index=False)['Z'].count().rename(columns={"Z": VariableName})
    print('\nCells per settlement counted, year %s. %s \n' % (Year, time.ctime()))
    print(ValAggregated.sample(10))
    AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
    
    # Sum
    VariableName = ''.join(['NTLsum', Year])
    ValAggregated = ValObject_withID[
        ValObject_withID['Z'].notna()].groupby(
        'Sett_ID', as_index=False)['Z'].sum().rename(columns={"Z": VariableName})
    print('\nValues summed to settlement level, year %s. %s \n' % (Year, time.ctime()))
    print(ValAggregated.sample(10))
    AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
    
    # Average
    VariableName = ''.join(['NTLavg', Year])
    ValAggregated = ValObject_withID[
        ValObject_withID['Z'].notna()].groupby(
        'Sett_ID', as_index=False)['Z'].mean().rename(columns={"Z": VariableName})
    print('\nValues averaged to settlement level, year %s. %s \n' % (Year, time.ctime()))
    print(ValAggregated.sample(10))
    AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
    print('\nMerged year %s onto latest year settlement feature layer. %s \n' % (Year, time.ctime()))
    
    
    print(AllSummaries.sample(10))
    del ValObject_withID, ValAggregated
    
    

print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())

AllSummaries.to_csv(os.path.join(ResultsFolder, 'NTL%sto%s.csv' % (ValStart, ValEnd)))
print('Saved to file. %s \n' % time.ctime())

Loading data for year 1999. Thu Dec  8 22:55:43 2022 

Finished gdal.Translate() for year 1999. Thu Dec  8 22:55:44 2022 

Loaded XYZ file as a pandas dataframe, year 1999. Thu Dec  8 22:55:46 2022 

Created geodataframe from non-NoData points, year 1999. Thu Dec  8 22:55:46 2022 


Joined settlement ID onto vectorized raster cells for year 1999. Thu Dec  8 22:55:47 2022 

                    X             Y   Z                          geometry  \
1090570 -1.399430e+06  4.586734e+05  61   POINT (-1399430.443 458673.358)   
896720  -1.517622e+06  6.469044e+05  17   POINT (-1517622.054 646904.443)   
1087870 -1.396804e+06  4.612998e+05  17   POINT (-1396803.962 461299.838)   
1093269 -1.402932e+06  4.560469e+05  61   POINT (-1402932.416 456046.877)   
1076843 -1.585035e+06  4.718058e+05  49   POINT (-1585035.047 471805.759)   
1074136 -1.588537e+06  4.744322e+05  61   POINT (-1588537.021 474432.239)   
1074139 -1.585911e+06  4.744322e+05  59   POINT (-1585910.541 474432.239)   
1070537 

Finished gdal.Translate() for year 2002. Thu Dec  8 22:55:56 2022 

Loaded XYZ file as a pandas dataframe, year 2002. Thu Dec  8 22:55:57 2022 

Created geodataframe from non-NoData points, year 2002. Thu Dec  8 22:55:57 2022 


Joined settlement ID onto vectorized raster cells for year 2002. Thu Dec  8 22:55:58 2022 

                    X             Y   Z                          geometry  \
1070538 -1.583284e+06  4.779342e+05  41   POINT (-1583284.060 477934.213)   
1096875 -1.401181e+06  4.525449e+05  62   POINT (-1401181.429 452544.904)   
1097777 -1.400306e+06  4.516694e+05  60   POINT (-1400305.936 451669.410)   
1094165 -1.407310e+06  4.551714e+05  29   POINT (-1407309.883 455171.384)   
1067835 -1.583284e+06  4.805607e+05  24   POINT (-1583284.060 480560.693)   
101680  -1.042229e+06  1.419965e+06  12  POINT (-1042229.128 1419965.131)   
1071443 -1.579782e+06  4.770587e+05  22   POINT (-1579782.087 477058.719)   
898526  -1.514120e+06  6.451535e+05  12   POINT (-1514120.080 6

Finished gdal.Translate() for year 2004. Thu Dec  8 22:56:03 2022 

Loaded XYZ file as a pandas dataframe, year 2004. Thu Dec  8 22:56:05 2022 

Created geodataframe from non-NoData points, year 2004. Thu Dec  8 22:56:05 2022 


Joined settlement ID onto vectorized raster cells for year 2004. Thu Dec  8 22:56:05 2022 

                    X              Y   Z                         geometry  \
1072334 -1.588537e+06  476183.226013  58  POINT (-1588537.021 476183.226)   
1075945 -1.582409e+06  472681.252338  47  POINT (-1582408.567 472681.252)   
1071440 -1.582409e+06  477058.719432  45  POINT (-1582408.567 477058.719)   
900326  -1.515871e+06  643402.468988  26  POINT (-1515871.067 643402.469)   
1105724 -1.542136e+06  443789.969521  11  POINT (-1542135.870 443789.970)   
1094179 -1.395053e+06  455171.383964  41  POINT (-1395052.975 455171.384)   
1098675 -1.402932e+06  450793.916870  54  POINT (-1402932.416 450793.917)   
1096871 -1.404683e+06  452544.903708  49  POINT (-1404683.403 4

Finished gdal.Translate() for year 2006. Thu Dec  8 22:56:11 2022 

Loaded XYZ file as a pandas dataframe, year 2006. Thu Dec  8 22:56:13 2022 

Created geodataframe from non-NoData points, year 2006. Thu Dec  8 22:56:13 2022 


Joined settlement ID onto vectorized raster cells for year 2006. Thu Dec  8 22:56:13 2022 

                    X              Y   Z                         geometry  \
1076839 -1.588537e+06  471805.758920  54  POINT (-1588537.021 471805.759)   
1069635 -1.585035e+06  478809.706269  52  POINT (-1585035.047 478809.706)   
1075946 -1.581533e+06  472681.252338  32  POINT (-1581533.074 472681.252)   
1071439 -1.583284e+06  477058.719432  51  POINT (-1583284.060 477058.719)   
1185265 -1.320636e+06  366746.548674  11  POINT (-1320636.035 366746.549)   
1008835 -1.175304e+06  538343.258742  11  POINT (-1175304.127 538343.259)   
1095065 -1.408185e+06  454295.890545  15  POINT (-1408185.377 454295.891)   
963742  -1.212950e+06  582117.929678  16  POINT (-1212950.344 5

Finished gdal.Translate() for year 2008. Thu Dec  8 22:56:18 2022 

Loaded XYZ file as a pandas dataframe, year 2008. Thu Dec  8 22:56:20 2022 

Created geodataframe from non-NoData points, year 2008. Thu Dec  8 22:56:20 2022 


Joined settlement ID onto vectorized raster cells for year 2008. Thu Dec  8 22:56:21 2022 

                    X              Y   Z                         geometry  \
1091474 -1.396804e+06  457797.864220  55  POINT (-1396803.962 457797.864)   
1075943 -1.584160e+06  472681.252338  56  POINT (-1584159.554 472681.252)   
1081347 -1.585911e+06  467428.291826  13  POINT (-1585910.541 467428.292)   
1103179 -1.403808e+06  446416.449777  22  POINT (-1403807.910 446416.450)   
1094165 -1.407310e+06  455171.383964  24  POINT (-1407309.883 455171.384)   
1081351 -1.582409e+06  467428.291826  16  POINT (-1582408.567 467428.292)   
1075949 -1.578907e+06  472681.252338  16  POINT (-1578906.593 472681.252)   
1077745 -1.584160e+06  470930.265501  51  POINT (-1584159.554 4

Finished gdal.Translate() for year 2010. Thu Dec  8 22:56:27 2022 

Loaded XYZ file as a pandas dataframe, year 2010. Thu Dec  8 22:56:28 2022 

Created geodataframe from non-NoData points, year 2010. Thu Dec  8 22:56:28 2022 


Joined settlement ID onto vectorized raster cells for year 2010. Thu Dec  8 22:56:29 2022 

                    X              Y   Z                         geometry  \
1077746 -1.583270e+06  470802.252092  54  POINT (-1583270.220 470802.252)   
1095080 -1.395039e+06  454167.872351  42  POINT (-1395039.081 454167.872)   
1095968 -1.406420e+06  453292.378681  44  POINT (-1406420.499 453292.379)   
1071437 -1.585021e+06  476930.707786  57  POINT (-1585021.208 476930.708)   
1211261 -1.437063e+06  341229.188850  14  POINT (-1437062.777 341229.189)   
670044  -1.188423e+06  867400.884853  12  POINT (-1188422.575 867400.885)   
1078647 -1.583270e+06  469926.758421  50  POINT (-1583270.220 469926.758)   
1105887 -1.399417e+06  443661.948304  35  POINT (-1399416.549 4

Finished gdal.Translate() for year 2012. Thu Dec  8 22:56:35 2022 

Loaded XYZ file as a pandas dataframe, year 2012. Thu Dec  8 22:56:37 2022 

Created geodataframe from non-NoData points, year 2012. Thu Dec  8 22:56:37 2022 


Joined settlement ID onto vectorized raster cells for year 2012. Thu Dec  8 22:56:37 2022 

                    X             Y   Z                          geometry  \
1076837 -1.590288e+06  4.718058e+05  56   POINT (-1590288.008 471805.759)   
1077668 -1.651573e+06  4.709303e+05  11   POINT (-1651572.547 470930.266)   
1137421 -1.400306e+06  4.131477e+05  11   POINT (-1400305.936 413147.700)   
1082253 -1.581533e+06  4.665528e+05  20   POINT (-1581533.074 466552.798)   
433959  -1.208573e+06  1.096908e+06  58  POINT (-1208572.877 1096908.060)   
1002026 -1.614802e+06  5.444717e+05  11   POINT (-1614801.824 544471.713)   
1092374 -1.397679e+06  4.569224e+05  63   POINT (-1397679.456 456922.371)   
898525  -1.514996e+06  6.451535e+05  24   POINT (-1514995.574 6

Finished gdal.Translate() for year 2014. Thu Dec  8 22:56:43 2022 

Loaded XYZ file as a pandas dataframe, year 2014. Thu Dec  8 22:56:45 2022 

Created geodataframe from non-NoData points, year 2014. Thu Dec  8 22:56:45 2022 


Joined settlement ID onto vectorized raster cells for year 2014. Thu Dec  8 22:56:46 2022 

                    X              Y   Z                         geometry  \
1092370 -1.401181e+06  456922.370801  61  POINT (-1401181.429 456922.371)   
1097773 -1.403808e+06  451669.410289  57  POINT (-1403807.910 451669.410)   
1076767 -1.651573e+06  471805.758920  10  POINT (-1651572.547 471805.759)   
1074144 -1.581533e+06  474432.239176  49  POINT (-1581533.074 474432.239)   
1071440 -1.582409e+06  477058.719432  52  POINT (-1582408.567 477058.719)   
1097780 -1.397679e+06  451669.410289  58  POINT (-1397679.456 451669.410)   
1098681 -1.397679e+06  450793.916870  56  POINT (-1397679.456 450793.917)   
1098676 -1.402057e+06  450793.916870  58  POINT (-1402056.923 4

Saved to file. Thu Dec  8 22:56:50 2022 
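
The loop above builds each year's `NTLavg` column with a filter, a groupby, and a left merge, and the saved table also carries `NTLct` and `NTLsum` columns. Assuming the count and sum steps follow the same pattern as the average, the three statistics could be computed in one pass with pandas named aggregation. The sketch below is a minimal illustration on toy data, not the notebook's actual implementation: `cells`, `settlements`, `year`, and `all_summaries` are hypothetical stand-ins for `ValObject_withID`, the settlement layer, `Year`, and `AllSummaries`.

```python
import pandas as pd

# Toy stand-in for ValObject_withID: one row per raster cell, holding the
# settlement it was joined to and its nighttime-lights value Z.
cells = pd.DataFrame({
    "Sett_ID": [1, 1, 1, 2, 2, 3],
    "Z": [10.0, 20.0, None, 5.0, 7.0, None],
})
year = "2010"  # hypothetical year label, kept as a string like `Year`

# Drop cells with a missing Z first (mirrors the notna() filter above),
# then compute count, sum, and mean per settlement in a single groupby.
summary = (
    cells[cells["Z"].notna()]
    .groupby("Sett_ID", as_index=False)
    .agg(**{
        f"NTLct{year}": ("Z", "count"),
        f"NTLsum{year}": ("Z", "sum"),
        f"NTLavg{year}": ("Z", "mean"),
    })
)

# Toy stand-in for the settlement table that accumulates yearly summaries.
settlements = pd.DataFrame({"Sett_ID": [1, 2, 3]})

# A left merge keeps every settlement; those with no valid cells get NaN.
all_summaries = settlements.merge(summary, how="left", on="Sett_ID")
print(all_summaries)
```

With this approach there is one merge per year instead of three, and settlements without any valid cells (here `Sett_ID` 3) still appear in the result with empty values.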



In [71]:
AllSummaries.sort_values('NTLsum2010', ascending=False).head(20)

Unnamed: 0,Sett_ID,ADM_ID,NTLct1999,NTLsum1999,NTLavg1999,NTLct2000,NTLsum2000,NTLavg2000,NTLct2001,NTLsum2001,...,NTLavg2012,NTLct2013,NTLsum2013,NTLavg2013,NTLct2014,NTLsum2014,NTLavg2014,NTLct2015,NTLsum2015,NTLavg2015
3964,19120.0,12,298.0,10469.0,35.130872,291.0,10540.0,36.219931,303.0,10930.0,...,39.063205,444.0,17058.0,38.418919,464.0,18567.0,40.015086,476.0,20262.0,42.567227
3083,12521.0,9,185.0,7030.0,38.0,185.0,6926.0,37.437838,185.0,6955.0,...,44.029536,237.0,10674.0,45.037975,238.0,10459.0,43.945378,238.0,10476.0,44.016807
13493,199779.0,12,145.0,7079.0,48.82069,145.0,7090.0,48.896552,146.0,7282.0,...,59.493151,146.0,8553.0,58.582192,146.0,8216.0,56.273973,146.0,8436.0,57.780822
13494,199780.0,12,145.0,7079.0,48.82069,145.0,7090.0,48.896552,146.0,7282.0,...,59.493151,146.0,8553.0,58.582192,146.0,8216.0,56.273973,146.0,8436.0,57.780822
13496,199787.0,12,145.0,7079.0,48.82069,145.0,7090.0,48.896552,146.0,7282.0,...,59.493151,146.0,8553.0,58.582192,146.0,8216.0,56.273973,146.0,8436.0,57.780822
13495,199782.0,12,145.0,7079.0,48.82069,145.0,7090.0,48.896552,146.0,7282.0,...,59.493151,146.0,8553.0,58.582192,146.0,8216.0,56.273973,146.0,8436.0,57.780822
3137,12684.0,9,125.0,5382.0,43.056,126.0,5283.0,41.928571,127.0,5312.0,...,54.782946,129.0,7110.0,55.116279,129.0,6669.0,51.697674,129.0,6621.0,51.325581
3105,12544.0,9,125.0,5382.0,43.056,126.0,5283.0,41.928571,127.0,5312.0,...,54.782946,129.0,7110.0,55.116279,129.0,6669.0,51.697674,129.0,6621.0,51.325581
3135,12674.0,9,125.0,5382.0,43.056,126.0,5283.0,41.928571,127.0,5312.0,...,54.782946,129.0,7110.0,55.116279,129.0,6669.0,51.697674,129.0,6621.0,51.325581
3133,12670.0,9,125.0,5382.0,43.056,126.0,5283.0,41.928571,127.0,5312.0,...,54.782946,129.0,7110.0,55.116279,129.0,6669.0,51.697674,129.0,6621.0,51.325581
