# Spatiotemporal Trends in Urbanization: Cameroon
*Using yearly estimates (2000-2015) of population, built-area, and economic indicators to track city-by-city growth and change over time.*

---

### Research questions 

#### 1. How has the size of Settlement X changed over time? 

- Population size 

- Geographical extents 

- Population density 

#### 2. In what year did Settlement X become a new urban class?  

- From semi-dense to high-density city 

- Small settlement area to built-up area 

- When a hamlet area or small settlement area first appeared

#### 3. Is there a discernable pattern between the spatio-temporal distribution of economic density and population density? 

#### 4. How much of urban space attributable to City X is outside of the administrative limits of the city? 

- When did this fragment(s) appear? 

- Which district/municipality/authority has purview over the fragment(s)? 

#### 5. For the questions above, how does the answer change based on different understandings of urban limits? 

- Scenario A: where "city" is delimited by an official administrative boundary 

- Scenario B: where "city" includes all contiguous (and near-contiguous) built up area 

#### 6. Subnational and inter-national comparisons. Examples: 

- Compare the rates (pop, build-up, economic…) of the fastest growing settlement of each ADM1 region. 

- Which African metropoles experience the most vs. the least fragmentation? Is there a confluence between amount of urban fragmentation and rate of densification? 

### Datasets
1. Most up-to-date administrative boundaries: **ADM3.**
2. Built-up area, yearly: **World Settlement Footprint Evolution.** Resolution: 30m.
3. Settlement types: **GRID3 settlement extents.** Captured between 2009-2019.
4. Population, yearly: **WorldPop.** UN-adjusted, unconstrained. Resolution: 100m.
5. Nighttime lights, yearly: **Harmonization of DMSP and VIIRS.** Resolution: 1km.
6. City names: **UCDB, Africapolis, and GeoNames.**

---

---

## 1. PREPARE WORKSPACE

### 1.1 Off-script

##### Off-script: Create folders in working directory.
> *ADM
<br>Buildup
<br>PlaceName
<br>Population
<br>Settlement
<br>NTL*

##### Off-script: Download datasets (as shapefile, GeoJSON, or tif where possible) and place or extract into corresponding folder:
- ADM: *Sourced internally.*
- Buildup: https://download.geoservice.dlr.de/WSF_EVO/files/
- PlaceName: 
    - GeoNames: (file: cities500.zip) https://download.geonames.org/export/dump/
    - Africapolis: https://africapolis.org/en/data
    - Urban Centres Database: https://ghsl.jrc.ec.europa.eu/ghs_stat_ucdb2015mt_r2019a.php
- Population: https://hub.worldpop.org/geodata/listing?id=69
- Settlement: https://data.grid3.org/datasets/GRID3::grid3-cameroon-settlement-extents-version-01-01-/explore
- NTL: https://figshare.com/articles/dataset/Harmonization_of_DMSP_and_VIIRS_nighttime_light_data_from_1992-2018_at_the_global_scale/9828827/2

##### Other off-script:
- Convert GeoNames from .txt file to shape (delimiter = tab, header rows = 0) and rename fields.
- If necessary, mosaic WSFE rasters that cover the area of interest to create a single file.

### 1.2 Load all packages.

In [1]:
# Note: Most but not all of these packages were used in final form. 

import os, sys, glob, re, time
from os.path import exists
from functools import reduce

import geopandas as gpd 
import pandas as pd
from shapely.geometry import Point, LineString, Polygon, shape, MultiPoint
from shapely.ops import cascaded_union
from shapely.validation import make_valid, explain_validity
import shapely.wkt
import scipy

#from xrspatial import zonal_stats 
#import xarray as xr 
import numpy as np 
import fiona, rioxarray
import rasterio
from rasterio.plot import show
from rasterio import features
from rasterio.features import shapes
from rasterio import mask
from osgeo import gdal, osr, ogr, gdal_array
import matplotlib.pyplot as plt

### 1.3 Set workspace.

In [2]:
ProjectFolder = os.getcwd()
ResultsFolder = os.path.join(ProjectFolder, 'Results')
print(ProjectFolder)
print(ResultsFolder)

C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon
C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Results


---

## 2. PREPARE BUILDUP, SETTLEMENT, AND ADMIN DATASETS
Projection for all datasets: Africa Albers Equal Area Conic

### 2.1 WSFE: Check contents and change NoData value as necessary.

In [None]:
WSFE = rasterio.open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))
print(WSFE) # WSFE values are all 4 digits long (1985-2015)
print(dir(WSFE))
print(WSFE.crs)
print(WSFE.dtypes)
NoDataValue = WSFE.nodatavals
print(NoDataValue)
print(WSFE.read(1).min(), WSFE.read(1).mean(), np.median(WSFE.read(1)), WSFE.read(1).max())

# If NoDataValue != 0, change to 0.

##### Off-script: Run this block in QGIS.

In [None]:
# # OPEN QGIS FOR THIS PORTION. CODE DOCUMENTED HERE.
# Change NoData value to zero, as this won't interfere with a possible value of 99999 in GRID3 and ADM.
# Then make sure there are no values above 2015 (such as 99999) or below 1985 in the dataset by reclassifying them as NoData.
# Was having trouble with rasterio & gdal here, so moved to QGIS.

# processing.run("native:reclassifybytable", {'INPUT_RASTER':'C:/Users/grace/GIS/povertyequity/urban_growth/WSFE_TCD.tif','RASTER_BAND':1,'TABLE':['2016','','0','','1984','0'],'NO_DATA':0,'RANGE_BOUNDARIES':0,'NODATA_FOR_MISSING':False,'DATA_TYPE':5,'OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/WSFE.tif'})

### 2.2 Prepare raster locations for GRID3 and Admin areas

In [None]:
ADM_vec = gpd.read_file(glob.glob('ADM/*.shp')[0])[['geometry']].to_crs("ESRI:102022") # This glob() function pulls the first file ([0]) in the ADM folder which ended in '.shp'
GRID3_vec = gpd.read_file(glob.glob('Settlement/*.shp')[0])[['type','geometry']].to_crs("ESRI:102022")
ADM_vec['ADM_ID'] = ADM_vec.index
GRID3_vec['G3_ID'] = GRID3_vec.index

In [None]:
ADM_out = './ADM/ADM.tif'
GRID3_out = './Settlement/GRID3.tif'

print(ADM_vec.info(), "\n\n", 
      ADM_vec.crs, "\n\n", 
      len(str(ADM_vec['ADM_ID'].max()))) # We need to know how many digits need to be allocated to each dataset in the "join" serial.
print(GRID3_vec.info(), "\n\n", 
      GRID3_vec.crs, "\n\n", 
      len(str(GRID3_vec['G3_ID'].max())))

---

## 3. WSFE AND ADM; GRID3 AND ADM
RASTERIZE: Bring ADM and GRID3 into raster space.

RASTER MATH: "Join" ADM ID onto GRID3 and onto WSFE by creating unique concatenation string.

VECTORIZE: Bring joined data into vector space.

VECTOR MATH: Split unique ID from raster math step into separate columns.

### 3.1 Rasterize admin areas and GRID3.

In [None]:
# Copy and update the metadata from WSFE for the output
meta = WSFE.meta.copy()
meta.update(compress='lzw')

In [None]:
with rasterio.open(ADM_out, 'w+', **meta) as out:
    out_arr = out.read(1)

    # this is where we create a generator of geom, value pairs to use in rasterizing
    shapes = ((geom,value) for geom, value in zip(ADM_vec.geometry, ADM_vec.ADM_ID))

    burned = features.rasterize(shapes=shapes, fill=0, out=out_arr, transform=out.transform, all_touched=False)
    out.write_band(1, burned)
out = None

In [None]:
with rasterio.open(GRID3_out, 'w+', **meta) as out:
    out_arr = out.read(1)

    # this is where we create a generator of geom, value pairs to use in rasterizing
    shapes = ((geom,value) for geom, value in zip(GRID3_vec.geometry, GRID3_vec.G3_ID))

    burned = features.rasterize(shapes=shapes, fill=0, out=out_arr, transform=out.transform, all_touched=False)
    out.write_band(1, burned)
out = None

*Validation: Check the dimensions, type, and basic stats of the three datasets. All should be the same dimension and NoData value.*

In [None]:
CheckContents = gdal.Open(r"ADM/ADM.tif")
print(gdal.GetDataTypeName(CheckContents.GetRasterBand(1).DataType), 
      CheckContents.GetRasterBand(1).GetNoDataValue())

CheckContents =  gdal.Open(r"Settlement/GRID3.tif")
print(gdal.GetDataTypeName(CheckContents.GetRasterBand(1).DataType), 
      CheckContents.GetRasterBand(1).GetNoDataValue())

CheckContents = gdal.Open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))
print(gdal.GetDataTypeName(CheckContents.GetRasterBand(1).DataType), 
      CheckContents.GetRasterBand(1).GetNoDataValue())

CheckContents = None

In [None]:
RastersList = [rasterio.open(r"ADM/ADM.tif"), 
               rasterio.open(r"Settlement/GRID3.tif"), 
               rasterio.open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))]

for item in RastersList:
    print(item.name, "\nBands= ", item.count, "\nWxH= ", item.width, "x", item.height, "\n\n")

stats = []
for item in RastersList:
    band = item.read(1)
    stats.append({
        'raster': item.name,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})

# Show stats for each channel
print("\n", stats)

RastersList = None
band = None

### 3.2 Raster math to "join" admin to GRID3 and to WSFE.
Processing is more rapid when "joining," i.e. creating serial codes out of two datasets, in raster rather than vector space.
Here, we are concatenating the ID fields of the two datasets to create a serial number that we can then split in vector space later to create two ID fields.

*Adding together the values to create join IDs. This is in effect a concatenation of their ID strings, by way of summation. The number of zeros in the calc multiplication corresponds with number of digits of the maximum value in the "B" dataset. (e.g. Chad ADM codes go up 4 digits, so it's calc=(A*10000)+B).*

In [None]:
# # OPEN TERMINAL FOR THIS PORTION. CODE DOCUMENTED HERE.

# Gdal_calc.py # To see info.

# gdal_calc.py -A C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Settlement\GRID3.tif -B  C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\ADM\ADM.tif --outfile=C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Settlement\GRID3_ADM.tif --overwrite --calc="(A*1000)+B"
# gdal_calc.py -A C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Buildup\WSFE_CMN.tif -B  C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\ADM\ADM.tif --outfile=C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Buildup\WSFE_ADM.tif --overwrite --calc="(A*1000)+B"

# # END TERMINAL-ONLY ASPECT. RETURN HERE FOR NEXT STEPS.

In [None]:
# Validation: check the basic statistics of the resulting datasets.
RastersList = [rasterio.open(r"Buildup/WSFE_ADM.tif"), 
               rasterio.open(r"Settlement/GRID3_ADM.tif")]
for item in RastersList:
    print(item.name, "\nBands= ", item.count, "\nWxH= ", item.width, "x", item.height, "\n\n")
    
stats = []
for item in RastersList:
    band = item.read(1)
    stats.append({
        'raster': item.name,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})

# Show stats for each channel
print("\n", stats)

RastersList = None
band = None

### 3.3 Vectorize "joined" layers.

##### Off-script: Run this block in QGIS.

In [None]:
# OPEN QGIS FOR THIS PORTION. CODE DOCUMENTED HERE.

# Due to dtype errors with both gdal and rasterio here, I decided to run the raster to polygon function in QGIS instead.
# It is possible to run QGIS functions within a Jupyter Notebook, but I ran it within the GUI. Arc or R are other options.
# Command line code here.

# processing.run("gdal:polygonize", {'INPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Settlement/GRID3_ADM.tif','BAND':1,'FIELD':'DN','EIGHT_CONNECTEDNESS':False,'EXTRA':'','OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Settlement/GRID3_ADM.shp'})
# processing.run("gdal:polygonize", {'INPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Buildup/WSFE_ADM.tif','BAND':1,'FIELD':'DN','EIGHT_CONNECTEDNESS':False,'EXTRA':'','OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Buildup/WSFE_ADM.shp'})

### 3.4 Vector math to split raster strings into admin area, GRID3, and WSFE year assignments.

In [3]:
# Load newly created vectorized datasets.
GRID3_ADM = gpd.read_file(r"Settlement/GRID3_ADM.shp")
WSFE_ADM = gpd.read_file(r"Buildup/WSFE_ADM.shp")
print(GRID3_ADM.info(), "\n\n", GRID3_ADM.sample(10), "\n\n", GRID3_ADM.crs, "\n\n", 
      WSFE_ADM.info(), "\n\n", WSFE_ADM.sample(10), "\n\n", WSFE_ADM.crs)

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 210151 entries, 0 to 210150
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   DN        210151 non-null  int64   
 1   geometry  210151 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 3.2 MB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 584286 entries, 0 to 584285
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   DN        584286 non-null  int64   
 1   geometry  584286 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 8.9 MB
None 

                DN                                           geometry
43974   153788236  POLYGON ((-1142457.003 1123208.708, -1142372.2...
204851    5387074  POLYGON ((-1363604.004 364564.530, -1363547.50...
175268   27143097  POLYGON ((-1372869.072 501011.097, -1372812.57...
196961   15957031  POLYGON ((-1433092.013 424251.9

In [4]:
# Split serial back into separate dataset fields.
# For Cameroon: WSFE and ADM: 4+3=7 digits. GRID3 and ADM: 6+3=9 digits.
GRID3_ADM['gridstring'] = GRID3_ADM['DN'].astype(str).str.zfill(9)
WSFE_ADM['gridstring'] = WSFE_ADM['DN'].astype(str).str.zfill(7)

GRID3_ADM['Sett_ID'] = GRID3_ADM['gridstring'].str[:-3].astype(int) # Remove the last 4 digits to get the GRID3 portion.
GRID3_ADM['ADM_ID'] = GRID3_ADM['gridstring'].str[-3:].astype(int) # Keep only the last 4 digits to get the ADM portion.
WSFE_ADM['year'] = WSFE_ADM['gridstring'].str[:-3].astype(int)
WSFE_ADM['ADM_ID'] = WSFE_ADM['gridstring'].str[-3:].astype(int)

print(GRID3_ADM.sample(10), WSFE_ADM.sample(10))

               DN                                           geometry  \
18460   172947271  POLYGON ((-1036332.430 1244448.881, -1036304.1...   
122370   90010299  POLYGON ((-1518906.879 699800.720, -1518793.89...   
148876   50663332  POLYGON ((-1545374.466 612545.733, -1545346.21...   
52166   136420214  POLYGON ((-1155111.730 1053973.773, -1154970.4...   
118979   74158142  POLYGON ((-1484332.358 709000.431, -1484247.61...   
70883   137072211  POLYGON ((-1037631.799 848134.199, -1037547.05...   
162644   65302342  POLYGON ((-1441396.677 548084.530, -1441283.68...   
71012   137037213  POLYGON ((-1086216.911 847754.830, -1086160.41...   
152486   67439131  POLYGON ((-1489784.059 598888.431, -1489727.56...   
162207  196463182  POLYGON ((-1665961.584 549665.237, -1665876.84...   

       gridstring  Sett_ID  ADM_ID  
18460   172947271   172947     271  
122370  090010299    90010     299  
148876  050663332    50663     332  
52166   136420214   136420     214  
118979  074158142    7

In [5]:
# Dissolve any features that have the same G3 and ADM values so that we have a single unique feature per settlement.
# Note: we do NOT want to dissolve the WSFE features. Distinct features for noncontiguous builtup areas of the same year is necessary to separate them in the Near tool step.
GRID3_ADM = GRID3_ADM.dissolve(by=['Sett_ID', 'ADM_ID'], as_index=False)
print(GRID3_ADM.info(), GRID3_ADM.head())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 205788 entries, 0 to 205787
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   Sett_ID     205788 non-null  int64   
 1   ADM_ID      205788 non-null  int64   
 2   geometry    205788 non-null  geometry
 3   DN          205788 non-null  int64   
 4   gridstring  205788 non-null  object  
dtypes: geometry(1), int64(3), object(1)
memory usage: 7.9+ MB
None    Sett_ID  ADM_ID                                           geometry    DN  \
0        1     354  POLYGON ((-1571983.289 271555.771, -1571870.30...  1354   
1        2     354  POLYGON ((-1572915.445 272093.211, -1572802.45...  2354   
2        3     354  POLYGON ((-1571390.099 273768.760, -1571361.85...  3354   
3        4     354  POLYGON ((-1575062.229 274274.586, -1574949.24...  4354   
4        5     354  POLYGON ((-1573395.647 277214.700, -1573226.16...  5354   

  gridstring  
0  000001354  
1  00000

In [6]:
# Remove features where year, settlement, or admin area = 0.
# This was supposed to be resolved earlier with the gdal_calc NoDataValue parameter.

print("Before: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))
WSFE_ADM = WSFE_ADM.loc[(WSFE_ADM["year"] != 0) & (WSFE_ADM["ADM_ID"] != 0)] # Since we change the datatype to integer, no need to include all digits. Otherwise, it would need to be: != '0000'
GRID3_ADM = GRID3_ADM.loc[(GRID3_ADM["Sett_ID"] != 0) & (GRID3_ADM["ADM_ID"] != 0)]
print("After: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))

Before: WSFE (584286, 5) and GRID3 (205788, 5)

After: WSFE (584286, 5) and GRID3 (205788, 5)



In [7]:
# The Bounded_ID is our new unique settlement identifier for subsequent matching steps.
GRID3_ADM['Bounded_ID'] = GRID3_ADM.index
WSFE_ADM['WSFE_ID'] = WSFE_ADM.index
GRID3_ADM = GRID3_ADM[['Sett_ID', 'Bounded_ID', 'ADM_ID', 'geometry']]
WSFE_ADM = WSFE_ADM[['year', 'ADM_ID', 'geometry']]

---

## 4. UNIQUE SETTLEMENTS FROM WSFE AND GRID3: TWO VERSIONS

Note that there are 2 versions here, so that we can create a fragmentation index:
1. **Boundless, aka boundary-agnostic settlements**: Unique settlements are linked to GRID3 settlement IDs. Administrative areas do not influence the extents of the settlement.
2. **Bounded, aka politically-defined settlements**: Settlements in the Boundless dataset which spread across more than one administrative area are split into separate settlements in the Bounded dataset. The largest polygon after the split is considered the "principal" settlement, and polygons in other admin areas are considered "fragments." By dividing the fragment area(s) of the Bounded settlement by the area of the Boundless settlement, we can acquire a fragmentation index for each locality.

### 4.1 BOUNDED SETTLEMENTS: Near Join by ADM group.

In [8]:
# The sharding step below doesn't work if any ADM group contains features from only one of the two datasets.
WSFE_u = sorted(WSFE_ADM.ADM_ID.unique().tolist())
GRID3_u = sorted(GRID3_ADM.ADM_ID.unique().tolist())

not_matching = list(set(GRID3_u).symmetric_difference(set(WSFE_u)))
print(not_matching) # Validate: If there are many ADM_IDs in this list, investigate why GRID3 or WSFE is missing in so many areas.

# Take only the features that share an ADM with at least one GRID3 feature.
WSFE_matching = WSFE_ADM[~WSFE_ADM["ADM_ID"].isin(not_matching)] 
GRID3_matching = GRID3_ADM[~GRID3_ADM["ADM_ID"].isin(not_matching)]

WSFE_u = sorted(WSFE_matching.ADM_ID.unique().tolist())
GRID3_u = sorted(GRID3_matching.ADM_ID.unique().tolist())

not_matching = list(set(GRID3_u).symmetric_difference(set(WSFE_u)))
print(not_matching) # This should now be empty.

del WSFE_u, GRID3_u, not_matching, WSFE_ADM, GRID3_ADM

[43]
[]


In [9]:
# Shard the dataframe whose variables we want to join into a dict
shards = {k:d for k, d in GRID3_matching.groupby('ADM_ID', as_index=False)}

# Take the dataframe whose geometry we want to retain.
# Group by ADM, then sjoin_nearest among the smaller dataframe's matching ADM shard
Bounded = WSFE_matching.groupby('ADM_ID', as_index=False).apply(
    lambda d: gpd.sjoin_nearest(
    d, shards[d['ADM_ID'].values[0]], 
        how='left', 
        max_distance=500))

print(Bounded.info())
print(Bounded.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
MultiIndex: 586113 entries, (0, 560868) to (356, 3948)
Data columns (total 7 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   year          586113 non-null  int32   
 1   ADM_ID_left   586113 non-null  int32   
 2   geometry      586113 non-null  geometry
 3   index_right   581895 non-null  float64 
 4   Sett_ID       581895 non-null  float64 
 5   Bounded_ID    581895 non-null  float64 
 6   ADM_ID_right  581895 non-null  float64 
dtypes: float64(4), geometry(1), int32(2)
memory usage: 50.8 MB
None
            year  ADM_ID_left  \
192 103057  1985          194   
257 36385   1999          260   
350 428584  2000          353   
237 50371   2013          240   
153 178397  1985          155   
169 404322  2000          171   
319 213840  2006          322   
8   566780  2014            9   
348 457860  1985          351   
198 107903  2003          200   

                             

In [10]:
# Now we can dissolve with the WSFE years, now that we can group them by their settlement ID.
Bounded = Bounded.dissolve(by=['year', 'Bounded_ID'], as_index=False)
print(Bounded.info(), Bounded.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 86839 entries, 0 to 86838
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype   
---  ------        --------------  -----   
 0   year          86839 non-null  int64   
 1   Bounded_ID    86839 non-null  float64 
 2   geometry      86839 non-null  geometry
 3   ADM_ID_left   86839 non-null  int32   
 4   index_right   86839 non-null  float64 
 5   Sett_ID       86839 non-null  float64 
 6   ADM_ID_right  86839 non-null  float64 
dtypes: float64(4), geometry(1), int32(1), int64(1)
memory usage: 4.3 MB
None        year  Bounded_ID                                           geometry  \
18043  1995    191485.0  MULTIPOLYGON (((-1129830.523 1276726.904, -112...   
71856  2012    124536.0  POLYGON ((-1138558.895 726419.814, -1138502.40...   
6182   1985     40964.0  POLYGON ((-1046021.205 500094.287, -1045964.71...   
31978  2001     39450.0  POLYGON ((-1120706.691 462030.880, -1120678.44...   
1739   1985      465

In [11]:
# Clean up and save to file.
Bounded = Bounded.rename(
    columns={"ADM_ID_left": "ADM_ID"})[['ADM_ID', 'year', 'Bounded_ID', 'Sett_ID', 'geometry']]
Bounded.to_file(
    driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

In [12]:
del WSFE_matching, GRID3_matching, shards

### 4.2 BOUNDLESS SETTLEMENTS: Simple near join.

In [13]:
# Fragments of any bounded settlement will be combined into a single "boundless" settlement in this version.
# It is based on their "Sett_ID", which is a direct loan from the GRID3 settlement features.
Boundless = Bounded.dissolve(by=['year', 'Sett_ID'], as_index=False)
print(Boundless.info(), Boundless.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 83061 entries, 0 to 83060
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   year        83061 non-null  int64   
 1   Sett_ID     83061 non-null  float64 
 2   geometry    83061 non-null  geometry
 3   ADM_ID      83061 non-null  int32   
 4   Bounded_ID  83061 non-null  float64 
dtypes: float64(2), geometry(1), int32(1), int64(1)
memory usage: 2.9 MB
None        year   Sett_ID                                           geometry  \
79899  2015   19531.0  POLYGON ((-1405720.516 522066.105, -1405692.26...   
30585  2001   48360.0  MULTIPOLYGON (((-1599298.291 553964.758, -1599...   
65778  2012    2665.0  POLYGON ((-1452074.103 347492.902, -1452045.85...   
70061  2013    4202.0  POLYGON ((-1355129.856 358494.617, -1355101.60...   
14605  1993    8049.0  POLYGON ((-1622319.725 476162.394, -1622291.47...   
82965  2015  198923.0  POLYGON ((-1413121.272 426085.618

In [14]:
# Clean up and save to file.
Boundless.to_file(driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

---

## 5. CUMULATIVE ANNUALIZED SETTLEMENT EXTENTS
DISSOLVE BY YEAR SETS: Create separate feature layers of each cumulative year.

### 5.1 Define study years for each for loop.

In [15]:
# Boundless = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

def CreateList(r1, r2):
    return [item for item in range(r1, r2+1)]

CuStart, CuEnd = Boundless['year'].min(), Boundless['year'].max()
StudyStart, StudyEnd = 1999, Boundless['year'].max()

AllCuYears = CreateList(CuStart, CuEnd) # All years in the WSFE dataset
AllStudyYears = CreateList(StudyStart, StudyEnd) # All years for which there will be growth stats in the present study.
print(AllCuYears, '\n\n', AllStudyYears)

ReversedStudyYears = []
for i in AllStudyYears:
    ReversedStudyYears.insert(0,i)
ReversedStudyYears.remove(StudyEnd)
print('\n\n', ReversedStudyYears)

[1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015] 

 [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]


 [2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999]


### 5.2 Starting with main Boundless dataset, create a cumulative area feature layer for each year.

In [16]:
# For each year in the growth stats study, we are taking features from all years prior to and including that year, 
# dissolving those features, and exporting as its own file.

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    CuYearSet = Boundless[Boundless['year'].between(
        CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.
    print('Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    CuYearDissolve = CuYearSet.dissolve(by='Sett_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min"}, # Though ADM_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    CuYearName = ''.join(['Cu', str(item), '_Boundless'])
    CuYearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=CuYearName)
    del CuYearSet, CuYearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1999. Sun Nov 20 16:23:51 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:23:51 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:24:12 2022

Subsetting to cumulative area for year: 2000. Sun Nov 20 16:24:19 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:24:19 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:24:43 2022

Subsetting to cumulative area for year: 2001. Sun Nov 20 16:24:49 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:24:49 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:25:15 2022

Subsetting to cumulative area for year: 2002. Sun Nov 20 16:25:22 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:25:22 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:25:49 2022

Subsetting to cumulative area for year: 2003. Sun Nov 20 16:25:55 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:25:55 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:26:24 2022

Subsetting to cumulative area for year: 2004. Sun Nov 20 16:26:30 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:26:30 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:27:01 2022

Subsetting to cumulative area for year: 2005. Sun Nov 20 16:27:07 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:27:07 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:27:38 2022

Subsetting to cumulative area for year: 2006. Sun Nov 20 16:27:44 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:27:44 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:28:17 2022

Subsetting to cumulative area for year: 2007. Sun Nov 20 16:28:23 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:28:23 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:28:56 2022

Subsetting to cumulative area for year: 2008. Sun Nov 20 16:29:02 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:29:02 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:29:37 2022

Subsetting to cumulative area for year: 2009. Sun Nov 20 16:29:43 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:29:43 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:30:19 2022

Subsetting to cumulative area for year: 2010. Sun Nov 20 16:30:25 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:30:25 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:31:02 2022

Subsetting to cumulative area for year: 2011. Sun Nov 20 16:31:08 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:31:08 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:31:46 2022

Subsetting to cumulative area for year: 2012. Sun Nov 20 16:31:52 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:31:52 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:32:31 2022

Subsetting to cumulative area for year: 2013. Sun Nov 20 16:32:37 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:32:37 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:33:18 2022

Subsetting to cumulative area for year: 2014. Sun Nov 20 16:33:24 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:33:24 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:34:07 2022

Subsetting to cumulative area for year: 2015. Sun Nov 20 16:34:13 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Sun Nov 20 16:34:13 2022



  CuYearSet = Boundless[Boundless['year'].between(


Write to file. Sun Nov 20 16:34:57 2022

Done with all years in set. Sun Nov 20 16:35:03 2022


##### Join area information from each cumulative layer onto the latest year dataset.

In [17]:
# The latest year in the study contains all settlements. Merge all other years' areas onto this dataset.
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=
                          ''.join(['Cu', str(StudyEnd), '_Boundless'])) 
SettAreas['Area15'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry') # We have settlement IDs, so no need to join spatially!


for item in ReversedStudyYears:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Boundless']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['Area', str(item)[2:]])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Sett_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, StudyEnd, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Sett_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))


print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s.csv' % (StudyStart, StudyEnd)))

Loading cumulative layer for year 2014. Sun Nov 20 16:35:04 2022

Adding area field and converting to non-spatial dataframe. Sun Nov 20 16:35:06 2022

Merging variables from 2014 onto our latest year (2015) via table join. Sun Nov 20 16:35:06 2022

Loading cumulative layer for year 2013. Sun Nov 20 16:35:06 2022

Adding area field and converting to non-spatial dataframe. Sun Nov 20 16:35:07 2022

Merging variables from 2013 onto our latest year (2015) via table join. Sun Nov 20 16:35:07 2022

Loading cumulative layer for year 2012. Sun Nov 20 16:35:07 2022

Adding area field and converting to non-spatial dataframe. Sun Nov 20 16:35:08 2022

Merging variables from 2012 onto our latest year (2015) via table join. Sun Nov 20 16:35:08 2022

Loading cumulative layer for year 2011. Sun Nov 20 16:35:08 2022

Adding area field and converting to non-spatial dataframe. Sun Nov 20 16:35:09 2022

Merging variables from 2011 onto our latest year (2015) via table join. Sun Nov 20 16:35:09 2022

Load

In [18]:
del SettAreas

### 5.3 Repeat for Bounded dataset.

In [19]:
# Bounded = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.
    print('Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    CuYearDissolve = CuYearSet.dissolve(by='Bounded_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min", "Sett_ID":"min"}, # Though ADM_ID and Sett_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    CuYearName = ''.join(['Cu', str(item), '_Bounded'])
    CuYearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=CuYearName)
    del CuYearSet, CuYearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1999. Sun Nov 20 16:35:36 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:35:36 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:35:56 2022

Subsetting to cumulative area for year: 2000. Sun Nov 20 16:36:02 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:36:02 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:36:24 2022

Subsetting to cumulative area for year: 2001. Sun Nov 20 16:36:31 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:36:31 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:36:54 2022

Subsetting to cumulative area for year: 2002. Sun Nov 20 16:37:01 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:37:01 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:37:26 2022

Subsetting to cumulative area for year: 2003. Sun Nov 20 16:37:32 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:37:32 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:37:58 2022

Subsetting to cumulative area for year: 2004. Sun Nov 20 16:38:05 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:38:05 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:38:32 2022

Subsetting to cumulative area for year: 2005. Sun Nov 20 16:38:39 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:38:39 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:39:07 2022

Subsetting to cumulative area for year: 2006. Sun Nov 20 16:39:14 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:39:14 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:39:43 2022

Subsetting to cumulative area for year: 2007. Sun Nov 20 16:39:50 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:39:50 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:40:21 2022

Subsetting to cumulative area for year: 2008. Sun Nov 20 16:40:27 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:40:27 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:40:59 2022

Subsetting to cumulative area for year: 2009. Sun Nov 20 16:41:06 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:41:06 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:41:39 2022

Subsetting to cumulative area for year: 2010. Sun Nov 20 16:41:45 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:41:45 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:42:19 2022

Subsetting to cumulative area for year: 2011. Sun Nov 20 16:42:26 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:42:26 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:43:01 2022

Subsetting to cumulative area for year: 2012. Sun Nov 20 16:43:07 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:43:07 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:43:44 2022

Subsetting to cumulative area for year: 2013. Sun Nov 20 16:43:50 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:43:50 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:44:28 2022

Subsetting to cumulative area for year: 2014. Sun Nov 20 16:44:35 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:44:35 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:45:14 2022

Subsetting to cumulative area for year: 2015. Sun Nov 20 16:45:20 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sun Nov 20 16:45:20 2022



  CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sun Nov 20 16:46:00 2022

Done with all years in set. Sun Nov 20 16:46:06 2022


In [20]:
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', 
                          layer=''.join(['Cu', str(StudyEnd), '_Bounded']))
SettAreas['Area15'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry')


for item in ReversedStudyYears:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Bounded']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['Area', str(item)[2:]])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Bounded_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, StudyEnd, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Bounded_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))

print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s_%s.csv' % (StudyStart, StudyEnd, 'Bounded')))

Loading cumulative layer for year 2014. Sun Nov 20 16:46:07 2022

Adding area field and converting to non-spatial dataframe. Sun Nov 20 16:46:08 2022

Merging variables from 2014 onto our latest year (2015) via table join. Sun Nov 20 16:46:09 2022

Loading cumulative layer for year 2013. Sun Nov 20 16:46:09 2022

Adding area field and converting to non-spatial dataframe. Sun Nov 20 16:46:10 2022

Merging variables from 2013 onto our latest year (2015) via table join. Sun Nov 20 16:46:10 2022

Loading cumulative layer for year 2012. Sun Nov 20 16:46:10 2022

Adding area field and converting to non-spatial dataframe. Sun Nov 20 16:46:11 2022

Merging variables from 2012 onto our latest year (2015) via table join. Sun Nov 20 16:46:11 2022

Loading cumulative layer for year 2011. Sun Nov 20 16:46:11 2022

Adding area field and converting to non-spatial dataframe. Sun Nov 20 16:46:12 2022

Merging variables from 2011 onto our latest year (2015) via table join. Sun Nov 20 16:46:12 2022

Load

In [22]:
del SettAreas

### 5.4 One settlement geofile to rule them all. ...and in the Sett_ID bind them.
The annualized values can be stored as distinct non-spatial dataframes. Their Sett_IDs will be used to join onto this geoversion with place names for the summary stats.

In [35]:
Settlements = gpd.read_file(r'Results/AnnualSettlements.gpkg', 
                           layer=''.join(['Cu', str(StudyEnd), '_Boundless']))[['Sett_ID', 'ADM_ID', 'geometry']]
print(Settlements.info())
Settlements.to_file(driver='GPKG', 
                       filename=r'Results/SETTLEMENTS.gpkg', 
                       layer='SETTLEMENTS')

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13753 entries, 0 to 13752
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   13753 non-null  float64 
 1   ADM_ID    13753 non-null  int64   
 2   geometry  13753 non-null  geometry
dtypes: float64(1), geometry(1), int64(1)
memory usage: 322.5 KB
None


### 5.5 Buffer the area of the Boundless dataset's latest year to mask raster data in later sections.
The Bounded dataset would also be fine for our purposes here. The buffer is dissolved to a single feature to be used for its total extents, which are identical between Bounded & Boundless datasets.

In [119]:
# Create buffer layer(s) to use as maximum distance for Near joins.

# Population buffer: 2km
Distance = 2000

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS')
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid is a workaround for any null geometries.
print('Finished buffer layer creation. %s' % time.ctime())
BufferFileName1 = ''.join(['Buff', str(Distance), 'm_', str(StudyEnd)])
BufferLayer.to_file(driver='GPKG', filename=r'Results/Catchment.gpkg', layer=BufferFileName1)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Mon Nov 21 13:33:06 2022
Finished buffer layer creation. Mon Nov 21 15:10:19 2022
Saved to file. Mon Nov 21 15:10:24 2022


In [120]:
# NTL buffer: 250m
Distance = 250

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS')
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid is a workaround for any null geometries.
print('Finished buffer layer creation. %s' % time.ctime())
BufferFileName2 = ''.join(['Buff', str(Distance), 'm_', str(StudyEnd)])
BufferLayer.to_file(driver='GPKG', filename=r'Results/Catchment.gpkg', layer=BufferFileName2)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Mon Nov 21 15:10:24 2022
Finished buffer layer creation. Mon Nov 21 15:11:16 2022
Saved to file. Mon Nov 21 15:11:23 2022


---

## 6. PLACE NAMES
Join urban place names from UCDB, Africapolis, and GeoNames onto the settlement vectors.

### 6.1 Load placename datasets, filter, and project.

In [91]:
# If restarting here:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS')

# Load, pull name field, rename, and reproject to match the catchments CRS.
UCDB = gpd.read_file('PlaceName/GHS_STAT_UCDB2015MT_GLOBE_R2019A_V1_2.gpkg', 
                     layer=0)[['UC_NM_MN', 'geometry']].rename(
    columns={"UC_NM_MN": "UCDB_Name"}).to_crs("ESRI:102022")

Africapolis = gpd.read_file('PlaceName/AFRICAPOLIS2020.shp')[['agglosName', 'geometry']].rename(
    columns={"agglosName": "Afpl_Name"}).to_crs("ESRI:102022")

GeoNames = gpd.read_file('PlaceName/GeoNames.gpkg', 
                         layer=0)[['GeoName', 'geometry']].to_crs("ESRI:102022")

print(Settlements.info(), UCDB.info(), Africapolis.info(), GeoNames.info())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13753 entries, 0 to 13752
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   13753 non-null  float64 
 1   ADM_ID    13753 non-null  int64   
 2   geometry  13753 non-null  geometry
dtypes: float64(1), geometry(1), int64(1)
memory usage: 322.5 KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13135 entries, 0 to 13134
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   UCDB_Name  13135 non-null  object  
 1   geometry   13135 non-null  geometry
dtypes: geometry(1), object(1)
memory usage: 205.4+ KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 7720 entries, 0 to 7719
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   Afpl_Name  7720 non-null   object  
 1   geometry   7720 non-null   geometry
dtypes

### 6.2 Join placenames onto settlements geodataframe.

In [92]:
# We wrap it in pd.DataFrame() since the sjoin() is the last time we need the geometry.

GeoNames = pd.DataFrame(gpd.sjoin(Settlements, GeoNames, 
                             how='inner', predicate='contains', # Name file is point type, so we can do contain.
                             lsuffix="G3", rsuffix="GN")).drop(columns='geometry')
Africapolis = pd.DataFrame(gpd.sjoin(Settlements, Africapolis, 
                             how='inner', predicate='intersects', # Name file is polygon type.
                             lsuffix="G3", rsuffix="Af")).drop(columns='geometry')
UCDB = pd.DataFrame(gpd.sjoin(Settlements, UCDB, 
                             how='inner', predicate='intersects', # Name file is polygon type.
                             lsuffix="G3", rsuffix="UC")).drop(columns='geometry')

In [93]:
print(GeoNames.info())
print(Africapolis.info())
print(UCDB.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 117 entries, 87 to 13644
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Sett_ID   117 non-null    float64
 1   ADM_ID    117 non-null    int64  
 2   index_GN  117 non-null    int64  
 3   GeoName   117 non-null    object 
dtypes: float64(1), int64(2), object(1)
memory usage: 4.6+ KB
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1427 entries, 34 to 12876
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    1427 non-null   float64
 1   ADM_ID     1427 non-null   int64  
 2   index_Af   1427 non-null   int64  
 3   Afpl_Name  1427 non-null   object 
dtypes: float64(1), int64(2), object(1)
memory usage: 55.7+ KB
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 195 entries, 140 to 12714
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     ------------

In [96]:
alldatasets = [pd.DataFrame(Settlements).drop(columns='geometry'), 
               GeoNames[['Sett_ID', 'GeoName']], 
               Africapolis[['Sett_ID', 'Afpl_Name']], 
               UCDB[['Sett_ID', 'UCDB_Name']]]

SettlementsNamed = reduce(lambda left,right: pd.merge(left,right,on=['Sett_ID'],
                                            how='left'), alldatasets).fillna('UNK')

In [99]:
print(SettlementsNamed.info())
print(SettlementsNamed.sample(10))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13767 entries, 0 to 13766
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    13767 non-null  float64
 1   ADM_ID     13767 non-null  int64  
 2   GeoName    13767 non-null  object 
 3   Afpl_Name  13767 non-null  object 
 4   UCDB_Name  13767 non-null  object 
dtypes: float64(1), int64(1), object(3)
memory usage: 645.3+ KB
None
        Sett_ID  ADM_ID GeoName  Afpl_Name UCDB_Name
4802    22713.0      82     UNK        UNK       UNK
8705    63112.0     105     UNK        UNK       UNK
7950    55179.0     312     UNK  Bafoussam       UNK
3056    11201.0      19     UNK        UNK       UNK
1971     5597.0      74     UNK        UNK       UNK
6232    36292.0      44     UNK        UNK       UNK
11776  155832.0     238     UNK        UNK       UNK
13659  199856.0     135     UNK  Bafoussam       UNK
5924    32233.0      90     UNK        UNK       UNK
11797  156118

In [100]:
del UCDB, Africapolis, GeoNames

### 6.3 Reduce to single name column.

In [101]:
SettlementsNamed.index.is_unique

True

In [None]:
# Create a single name column where non-named settlements are "UNK" but all others use one of the three name sources.
SettlementsNamed['SettName'] = "UNK"

SettlementsNamed.loc[
    SettlementsNamed['Afpl_Name'] != "UNK", 
    'SettName'] = SettlementsNamed['Afpl_Name']

SettlementsNamed.loc[
    (SettlementsNamed['SettName'] == "UNK") & (SettlementsNamed['UCDB_Name'] != "UNK"), 
    'SettName'] = SettlementsNamed['UCDB_Name']

SettlementsNamed.loc[
    (SettlementsNamed['SettName'] == "UNK") & (SettlementsNamed['GeoName'] != "UNK"), 
    'SettName'] = SettlementsNamed['GeoName']

In [106]:
SettlementsNamed.sample(20)

Unnamed: 0,Sett_ID,ADM_ID,GeoName,Afpl_Name,UCDB_Name,SettName
3168,12906.0,113,UNK,UNK,UNK,UNK
12339,188152.0,278,UNK,UNK,UNK,UNK
7565,51709.0,302,UNK,Bafoussam,UNK,Bafoussam
10801,104514.0,203,UNK,UNK,UNK,UNK
13034,196872.0,110,UNK,UNK,UNK,UNK
5384,24746.0,87,UNK,UNK,UNK,UNK
6980,46731.0,190,UNK,Mamfe,UNK,Mamfe
3496,14697.0,27,UNK,UNK,UNK,UNK
7881,54721.0,312,UNK,UNK,UNK,UNK
9032,66496.0,305,UNK,UNK,UNK,UNK


In [107]:
# Drop geometry and save to file.
SettlementsNamed = SettlementsNamed[['Sett_ID', 'SettName']]
SettlementsNamed.to_csv(r'Results/PlaceNames.csv')

In [108]:
del SettlementsNamed

---

## 7. CREATE FRAGMENTATION INDEX
We are determining what percentage of a settlement's area lies outside of its administrative zone each year.
The index is a range of 0 to 100, i.e. the percent of the settlement area which is fragmented.

For each Sett_ID:
((Area of Boundless settlement - Area of largest Bounded settlement feature) / Area of Boundless settlement) * 100

### 7.1 Load boundless and bounded cumulative settlements and clean.

In [113]:
BoundlessAreas = pd.read_csv(os.path.join(ResultsFolder, ('Areas%sto%s.csv' % (StudyStart, StudyEnd))))
print('Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. %s' 
      % time.ctime())
print(BoundlessAreas.info())

BoundedAreas = pd.read_csv(os.path.join(ResultsFolder, ('Areas%sto%s_%s.csv' % (StudyStart, StudyEnd, 'Bounded'))))
print('Loaded Bounded dataset, which will factor into the fragmentation calculation. %s' % time.ctime())
print(BoundedAreas.info())

Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. Mon Nov 21 11:41:18 2022
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13753 entries, 0 to 13752
Data columns (total 21 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  13753 non-null  int64  
 1   Sett_ID     13753 non-null  float64
 2   year        13753 non-null  int64  
 3   ADM_ID      13753 non-null  int64  
 4   Area15      13753 non-null  float64
 5   Area14      13649 non-null  float64
 6   Area13      13511 non-null  float64
 7   Area12      13388 non-null  float64
 8   Area11      13262 non-null  float64
 9   Area10      13164 non-null  float64
 10  Area09      13059 non-null  float64
 11  Area08      12992 non-null  float64
 12  Area07      12893 non-null  float64
 13  Area06      12801 non-null  float64
 14  Area05      12714 non-null  float64
 15  Area04      12632 non-null  float64
 16  Area03      12569

In [114]:
LargestFragments = BoundedAreas.loc[BoundedAreas.groupby(["Sett_ID"])["Area15"].idxmax()] 
print(LargestFragments.info())
print("Filtered the Bounded dataset to only rows where latest year's area is largest for each Sett_ID. %s" % time.ctime())
LargestFragments.columns = LargestFragments.columns.str.replace('Area', 'Largest')
LargestFragments = LargestFragments.drop(columns=['year', 'ADM_ID'])
print("Renamed columns to avoid duplication during merge, and dropped unnecessary columns. %s" % time.ctime())
FragIndices = BoundlessAreas.merge(LargestFragments, how='left', on='Sett_ID')
print(FragIndices.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13753 entries, 0 to 14467
Data columns (total 22 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  13753 non-null  int64  
 1   Bounded_ID  13753 non-null  float64
 2   year        13753 non-null  int64  
 3   ADM_ID      13753 non-null  int64  
 4   Sett_ID     13753 non-null  float64
 5   Area15      13753 non-null  float64
 6   Area14      13649 non-null  float64
 7   Area13      13510 non-null  float64
 8   Area12      13386 non-null  float64
 9   Area11      13260 non-null  float64
 10  Area10      13162 non-null  float64
 11  Area09      13057 non-null  float64
 12  Area08      12989 non-null  float64
 13  Area07      12889 non-null  float64
 14  Area06      12799 non-null  float64
 15  Area05      12712 non-null  float64
 16  Area04      12630 non-null  float64
 17  Area03      12566 non-null  float64
 18  Area02      12491 non-null  float64
 19  Area01      12408 non-nul

In [115]:
del BoundlessAreas, BoundedAreas, LargestFragments

### 7.2 Merge and run fragmentation calculation.

In [116]:
for item in AllStudyYears:
    YY = str(item)[2:] # 2-digit year
    AreaYY = ''.join(["Area", YY]) # The Boundless area variable name
    LargestYY = ''.join(['Largest', YY]) # The Bounded largest area variable name
    FragYY = ''.join(["Frag", YY]) # Name for the fragmentation index variable
    print("Created names for Year %s's variables and temporary objects. %s" % (item, time.ctime()))
    
    FragIndices[FragYY] = ((FragIndices[AreaYY] - FragIndices[LargestYY]) / FragIndices[AreaYY]) * 100
    FragIndices[FragYY] = (FragIndices[FragYY].fillna(0).replace([np.inf, -np.inf], 0)).astype('int')
    print("Calculated fragmentation index for year %s. %s" % (item, time.ctime()))

# Remove unnecessary columns.
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Largest')]
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Area')]

print('Completed fragmentation index calculations for all years. %s' % time.ctime())
print(FragIndices.info())
print(FragIndices.sample(5))

Created names for Year 1999's variables and temporary objects. Mon Nov 21 11:41:58 2022
Calculated fragmentation index for year 1999. Mon Nov 21 11:41:58 2022
Created names for Year 2000's variables and temporary objects. Mon Nov 21 11:41:58 2022
Calculated fragmentation index for year 2000. Mon Nov 21 11:41:58 2022
Created names for Year 2001's variables and temporary objects. Mon Nov 21 11:41:58 2022
Calculated fragmentation index for year 2001. Mon Nov 21 11:41:58 2022
Created names for Year 2002's variables and temporary objects. Mon Nov 21 11:41:58 2022
Calculated fragmentation index for year 2002. Mon Nov 21 11:41:58 2022
Created names for Year 2003's variables and temporary objects. Mon Nov 21 11:41:58 2022
Calculated fragmentation index for year 2003. Mon Nov 21 11:41:58 2022
Created names for Year 2004's variables and temporary objects. Mon Nov 21 11:41:58 2022
Calculated fragmentation index for year 2004. Mon Nov 21 11:41:58 2022
Created names for Year 2005's variables and te

In [117]:
FragIndices = FragIndices.drop(columns=['Unnamed: 0_x', 'Unnamed: 0_y', 'year', 'ADM_ID'])
FragIndices.to_csv(os.path.join(ResultsFolder, 'FragIndex%sto%s.csv' % (StudyStart, StudyEnd)))
print('Saved to file. %s' % time.ctime())

Saved to file. Mon Nov 21 11:43:45 2022


In [None]:
del FragIndices

---

## 8. PREPARE YEARLY DATASETS: POPULATION
Can use this as a template for other annualized rasters

### 8.1 Reproject and reclassify with settlement buffer mask.
Reclassify so that we only need to work with cells within X distance of settlements.

In [4]:
ProjCRS = gdal.WarpOptions(dstSRS='ESRI:102022')
AnnualizedSourceFiles = [i for i in os.listdir('Population/') if i.endswith('.tif')]

with fiona.open(r'Results/Catchment.gpkg', mode="r", layer="Buff2.0km_2015") as shapefile:
    MaskGeom = [feature["geometry"] for feature in shapefile] # Identify the bounding areas of the mask.
# Mask_out = './LatestYearBuffer.tif'
AnnualizedSourceFiles

['cmr_ppp_2000_UNadj.tif',
 'cmr_ppp_2001_UNadj.tif',
 'cmr_ppp_2002_UNadj.tif',
 'cmr_ppp_2003_UNadj.tif',
 'cmr_ppp_2004_UNadj.tif',
 'cmr_ppp_2005_UNadj.tif',
 'cmr_ppp_2006_UNadj.tif',
 'cmr_ppp_2007_UNadj.tif',
 'cmr_ppp_2008_UNadj.tif',
 'cmr_ppp_2009_UNadj.tif',
 'cmr_ppp_2010_UNadj.tif',
 'cmr_ppp_2011_UNadj.tif',
 'cmr_ppp_2012_UNadj.tif',
 'cmr_ppp_2013_UNadj.tif',
 'cmr_ppp_2014_UNadj.tif',
 'cmr_ppp_2015_UNadj.tif',
 'Masked_2000.tif',
 'Masked_2001.tif',
 'Masked_2002.tif',
 'Masked_2003.tif',
 'Masked_2004.tif',
 'Masked_2005.tif',
 'Masked_2006.tif',
 'Masked_2007.tif',
 'Masked_2008.tif',
 'Masked_2009.tif',
 'Masked_2010.tif',
 'Masked_2011.tif',
 'Masked_2012.tif',
 'Masked_2013.tif',
 'Masked_2014.tif',
 'Masked_2015.tif']

In [8]:
# This codeblock changes each annual population raster's projection (gdal.Warp()), 
# then masks it to within a specified distance of the settlements (rasterio.mask.mask()).

for YearFile in AnnualizedSourceFiles:
    InputRasterName = os.path.join(ProjectFolder, "Population", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    InputRasterObject = gdal.Open(InputRasterName)
    TempOutputName = "Temp_" + Year + "_albers.tif"
    TempOutputPath = os.path.join(ProjectFolder, "Population", TempOutputName)
    if exists(TempOutputPath):
        pass
    else:
        # Reproject to same CRS as settlements.
        Warp = gdal.Warp(TempOutputPath, # Where to store the warped raster
                     InputRasterObject, # Which raster to warp
                     format='GTiff', 
                     options=ProjCRS) # Reproject to Africa Albers Equal Area Conic
        print('Finished gdal.Warp() for year %s. %s \n' % (Year, time.ctime()))
        
        Warp = None # Close the files
        InputRasterObject = None

        # Reclassify as nodata if outside settlement buffer zones.
        with rasterio.open(TempOutputPath) as InputRasterObject:
            MaskedOutputRaster, OutTransform = rasterio.mask.mask(
                InputRasterObject, MaskGeom, crop=True) # Anything outside the mask is reclassed to the raster's NoData value.
            OutMetaData = InputRasterObject.meta.copy()
        print('Finished rasterio.mask.mask() for year %s. %s \n' % (Year, time.ctime()))
            
        OutMetaData.update({"driver": "GTiff",
                         "height": MaskedOutputRaster.shape[1],
                         "width": MaskedOutputRaster.shape[2],
                         "transform": OutTransform})
        FinalOutputPath = os.path.join(ProjectFolder, "Population", ''.join(['Masked_', Year, '.tif'])) # ''.join([r'Population/', 'Masked_', Year, '.tif']
        with rasterio.open(FinalOutputPath, "w", **OutMetaData) as dest:
            dest.write(MaskedOutputRaster)
        print('Written to file. %s \n' % time.ctime())
    InputRasterObject = None
    
    try:  # Finally, remove the intermediate file from disk
        os.remove(TempOutputPath)
    except OSError:
        pass
    print('Removed intermediate file. %s \n' % time.ctime())

print('\n \n Finished all years in list. %s' % time.ctime())

Finished gdal.Warp() for year 2000. Wed Nov 23 11:12:35 2022 

Finished rasterio.mask.mask() for year 2000. Wed Nov 23 11:12:38 2022 

Written to file. Wed Nov 23 11:12:39 2022 

Removed intermediate file. Wed Nov 23 11:12:39 2022 

Finished gdal.Warp() for year 2001. Wed Nov 23 11:12:43 2022 

Finished rasterio.mask.mask() for year 2001. Wed Nov 23 11:12:47 2022 

Written to file. Wed Nov 23 11:12:47 2022 

Removed intermediate file. Wed Nov 23 11:12:47 2022 

Finished gdal.Warp() for year 2002. Wed Nov 23 11:12:51 2022 

Finished rasterio.mask.mask() for year 2002. Wed Nov 23 11:12:55 2022 

Written to file. Wed Nov 23 11:12:55 2022 

Removed intermediate file. Wed Nov 23 11:12:55 2022 

Finished gdal.Warp() for year 2003. Wed Nov 23 11:12:59 2022 

Finished rasterio.mask.mask() for year 2003. Wed Nov 23 11:13:02 2022 

Written to file. Wed Nov 23 11:13:03 2022 

Removed intermediate file. Wed Nov 23 11:13:03 2022 

Finished gdal.Warp() for year 2004. Wed Nov 23 11:13:07 2022 

Finis

In [9]:
print(os.listdir('Population/'))

['cmr_ppp_2000_UNadj.tif', 'cmr_ppp_2001_UNadj.tif', 'cmr_ppp_2002_UNadj.tif', 'cmr_ppp_2003_UNadj.tif', 'cmr_ppp_2004_UNadj.tif', 'cmr_ppp_2005_UNadj.tif', 'cmr_ppp_2006_UNadj.tif', 'cmr_ppp_2007_UNadj.tif', 'cmr_ppp_2008_UNadj.tif', 'cmr_ppp_2009_UNadj.tif', 'cmr_ppp_2010_UNadj.tif', 'cmr_ppp_2011_UNadj.tif', 'cmr_ppp_2012_UNadj.tif', 'cmr_ppp_2013_UNadj.tif', 'cmr_ppp_2014_UNadj.tif', 'cmr_ppp_2015_UNadj.tif', 'Masked_2000.tif', 'Masked_2001.tif', 'Masked_2002.tif', 'Masked_2003.tif', 'Masked_2004.tif', 'Masked_2005.tif', 'Masked_2006.tif', 'Masked_2007.tif', 'Masked_2008.tif', 'Masked_2009.tif', 'Masked_2010.tif', 'Masked_2011.tif', 'Masked_2012.tif', 'Masked_2013.tif', 'Masked_2014.tif', 'Masked_2015.tif']


In [5]:
AnnualizedSourceFiles = None

### 8.2 Raster values summarized by settlement.
1. Convert each annualized raster to .xyz, 
2. then bring them to vector space and assign their Sett_ID,
3. and finally, aggregate the value as appropriate to the settlement level and save table to file.

XYZ is similar to .csv. Raster cell centers are stored as x and y, and their value is stored as z.

In [6]:
NoDataVal = -99999 
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS')
AllSummaries = pd.DataFrame(Settlements).drop(columns='geometry')

AnnualizedMaskedFiles = [i for i in os.listdir('Population/') if i.startswith('Masked') and i.endswith('.tif')]
AnnualizedMaskedFiles

['Masked_2000.tif',
 'Masked_2001.tif',
 'Masked_2002.tif',
 'Masked_2003.tif',
 'Masked_2004.tif',
 'Masked_2005.tif',
 'Masked_2006.tif',
 'Masked_2007.tif',
 'Masked_2008.tif',
 'Masked_2009.tif',
 'Masked_2010.tif',
 'Masked_2011.tif',
 'Masked_2012.tif',
 'Masked_2013.tif',
 'Masked_2014.tif',
 'Masked_2015.tif']

In [8]:
for YearFile in AnnualizedMaskedFiles:
    
### STEP 1: TIF TO XYZ ###
    InputRasterName = os.path.join(ProjectFolder, "Population", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    print('Loading data for year %s. %s \n' % (Year, time.ctime()))
    InputRasterObject = gdal.Open(InputRasterName)
    XYZOutputPath = r'Population/{}'.format(
        YearFile.replace('.tif', '.xyz')) # New file path will be the same as original, but .tif is replaced with .xyz
    
    # Create an .xyz version of the .tif
    XYZ = gdal.Translate(XYZOutputPath, # Specify a destination path
                         InputRasterObject, # Input is the masked .tif file
                         format='XYZ', 
                         creationOptions=["ADD_HEADER_LINE=YES"])
    print('Finished gdal.Translate() for year %s. %s \n' % (Year, time.ctime()))

#     # Remove the temporary masked tif file.
#     try:  
#         os.remove(InputRasterName)
#     except OSError:
#         pass
#     print('Removed (or skipped if error) intermediate tif file. %s \n' % time.ctime())
    
    InputRasterObject = None
    XYZ = None # Reload XYZ as a point geodataframe

    
### STEP 2: GENERATE GEODATAFRAME WITH SETT_ID FIELD ###
    InputXYZName = ''.join(['Masked_', Year, '.xyz'])
    InputXYZ = pd.read_table(os.path.join(ProjectFolder, 'Population', InputXYZName), delim_whitespace=True)
    InputXYZ = InputXYZ.loc[InputXYZ['Z'] != NoDataVal] # Subset to only the features that have a raster value.
    print('Loaded XYZ file as a pandas dataframe, year %s. %s \n' % (Year, time.ctime()))
    ValObject = gpd.GeoDataFrame(InputXYZ,
                                 geometry = gpd.points_from_xy(InputXYZ['X'], InputXYZ['Y']),
                                 crs = 'ESRI:102022')
    print('Created geodataframe from non-NoData points, year %s. %s \n' % (Year, time.ctime()))
    del InputXYZ
    
    # Sjoin_nearest: No need to group by ADM this time. 
    ValObject_withID = gpd.sjoin_nearest(ValObject, 
                                    Settlements, 
                                    how='left') # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.
    
    print('\nJoined settlement ID onto vectorized raster cells for year %s. %s \n' % (Year, time.ctime()))
    print(ValObject_withID.sample(10))
    del ValObject
    
    # We no longer need the spatial information of the raster values because we have their unique settlement ID.
    ValObject_withID = pd.DataFrame(ValObject_withID).drop(columns='geometry')
    
    ValObject_withID.to_csv(''.join([r'Population/', 'Masked_', Year, '.csv']))
    print('\nExported as table, year %s. %s \n' % (Year, time.ctime()))
    
    # Remove the temporary xyz file.
    try:  
        os.remove(os.path.join(ProjectFolder, 'Population', InputXYZName))
    except OSError:
        pass
    print('Removed (or skipped if error) intermediate xyz file. %s \n' % time.ctime())

    

### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###
    AbbrevYear = Year[2:]
    VariableName = ''.join(['PopSum', AbbrevYear])
    
    ValAggregated = ValObject_withID.groupby('Sett_ID', 
                                      as_index=False)['Z'].sum().rename(columns={"Z": VariableName})
    print('\nValues aggregated to settlement level, year %s. %s \n' % (Year, time.ctime()))
    print(ValAggregated.sample(10))
    
    AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
    print('\nMerged year %s onto latest year settlement feature layer. %s \n' % (Year, time.ctime()))
    print(AllSummaries.sample(10))
    
    del ValObject_withID, ValAggregated
    
    

print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())

AllSummaries.to_csv(os.path.join(ResultsFolder, 'Pop%sto%s.csv' % (2000, 2015)))
print('Saved to file. %s \n' % time.ctime())

Loading data for year 2000. Wed Nov 23 11:55:19 2022 

Finished gdal.Translate() for year 2000. Wed Nov 23 11:57:42 2022 

Loaded XYZ file as a pandas dataframe, year 2000. Wed Nov 23 12:00:31 2022 

Created geodataframe from non-NoData points, year 2000. Wed Nov 23 12:00:33 2022 


Joined settlement ID onto vectorized raster cells for year 2000. Wed Nov 23 12:11:08 2022 

                      X             Y         Z  \
1651220   -1.095013e+06  1.501413e+06  0.168058   
110798030 -1.408130e+06  2.738610e+05  0.081856   
94704429  -1.376327e+06  4.548608e+05  0.212822   
36867200  -1.193818e+06  1.105346e+06  0.993167   
81251345  -1.594414e+06  6.061343e+05  0.157392   
96593020  -1.318196e+06  4.336277e+05  0.200082   
99755951  -1.363116e+06  3.980506e+05  0.127772   
87024505  -1.580070e+06  5.412084e+05  0.872599   
72071853  -1.569689e+06  7.093740e+05  1.019240   
103404757 -1.483814e+06  3.570001e+05  0.162707   

                                   geometry  index_right   Set

Finished gdal.Translate() for year 2003. Wed Nov 23 12:47:56 2022 

Loaded XYZ file as a pandas dataframe, year 2003. Wed Nov 23 12:50:38 2022 

Created geodataframe from non-NoData points, year 2003. Wed Nov 23 12:50:41 2022 


Joined settlement ID onto vectorized raster cells for year 2003. Wed Nov 23 13:01:20 2022 

                      X             Y         Z  \
84635816  -1.320933e+06  5.681036e+05  0.112403   
44411412  -1.127476e+06  1.020509e+06  2.427579   
92345242  -1.500517e+06  4.813785e+05  0.145754   
45081181  -1.270068e+06  1.012959e+06  0.131340   
87347669  -1.173717e+06  5.376224e+05  1.741389   
95902435  -1.556289e+06  4.413660e+05  0.753063   
110864713 -1.450124e+06  2.731061e+05  0.039992   
106227042 -1.209294e+06  3.252921e+05  0.156116   
64168464  -1.481643e+06  7.982696e+05  0.183274   
86568256  -1.084067e+06  5.463987e+05  0.301804   

                                   geometry  index_right   Sett_ID  ADM_ID  
84635816    POINT (-1320932.735 568103.5


Exported as table, year 2005. Wed Nov 23 13:35:45 2022 

Removed (or skipped if error) intermediate xyz file. Wed Nov 23 13:35:45 2022 


Values aggregated to settlement level, year 2005. Wed Nov 23 13:35:45 2022 

        Sett_ID     PopSum05
9062    66545.0   164.534069
11635  146186.0  1252.802229
1465     4141.0  2861.036418
9279    68725.0   574.134731
10338   79802.0    24.912244
2863     9923.0    53.311167
4665    21852.0    95.038536
1799     5082.0   111.183273
2339     7351.0   206.979882
6789    46409.0   461.573238

Merged year 2005 onto latest year settlement feature layer. Wed Nov 23 13:35:45 2022 

        Sett_ID  ADM_ID     PopSum00     PopSum01     PopSum02     PopSum03  \
6699    45245.0     173  1973.978837  2070.213165  2153.684966  1960.787784   
11254  130278.0     217   558.106228   589.848288   603.088087   676.995308   
11629  146152.0     230   265.989819   321.885846   357.911314   367.160384   
6534    38297.0      52   134.187385   152.060852   119.13109

Finished gdal.Translate() for year 2008. Wed Nov 23 14:19:02 2022 

Loaded XYZ file as a pandas dataframe, year 2008. Wed Nov 23 14:21:40 2022 

Created geodataframe from non-NoData points, year 2008. Wed Nov 23 14:21:42 2022 


Joined settlement ID onto vectorized raster cells for year 2008. Wed Nov 23 14:31:56 2022 

                      X             Y         Z  \
93427682  -1.500423e+06  4.692049e+05  0.055304   
89630487  -1.129741e+06  5.119540e+05  0.131213   
106015895 -1.338768e+06  3.276513e+05  0.124690   
78114031  -1.507595e+06  6.414283e+05  0.985953   
5989470   -1.085293e+06  1.452624e+06  0.214365   
96692500  -1.432571e+06  4.324953e+05  0.119662   
79540037  -1.551382e+06  6.253856e+05  1.397560   
75320225  -1.470130e+06  6.728532e+05  1.556616   
110856605 -1.423417e+06  2.732004e+05  0.378678   
104034597 -1.435214e+06  3.499224e+05  0.358900   

                                   geometry  index_right   Sett_ID  ADM_ID  
93427682    POINT (-1500422.598 469204.8

Finished gdal.Translate() for year 2010. Wed Nov 23 14:51:17 2022 

Loaded XYZ file as a pandas dataframe, year 2010. Wed Nov 23 14:53:55 2022 

Created geodataframe from non-NoData points, year 2010. Wed Nov 23 14:53:57 2022 


Joined settlement ID onto vectorized raster cells for year 2010. Wed Nov 23 15:04:13 2022 

                      X             Y          Z  \
74405544  -1.475981e+06  6.831394e+05   0.754115   
22846354  -1.145218e+06  1.263037e+06   0.853730   
73055404  -1.399448e+06  6.983328e+05   0.084264   
95895441  -1.424456e+06  4.414604e+05   0.189197   
91857591  -1.592338e+06  4.868519e+05  14.253531   
102584345 -1.303946e+06  3.662483e+05   0.144345   
54573336  -1.087558e+06  9.062278e+05   0.218343   
26354250  -1.101997e+06  1.223591e+06   2.036159   
91236435  -1.613288e+06  4.938352e+05   1.516501   
75385670  -1.628953e+06  6.720982e+05   1.646775   

                                   geometry  index_right   Sett_ID  ADM_ID  
74405544    POINT (-1475981.0

Finished gdal.Translate() for year 2012. Wed Nov 23 15:23:33 2022 

Loaded XYZ file as a pandas dataframe, year 2012. Wed Nov 23 15:26:10 2022 

Created geodataframe from non-NoData points, year 2012. Wed Nov 23 15:26:12 2022 


Joined settlement ID onto vectorized raster cells for year 2012. Wed Nov 23 15:36:28 2022 

                      X             Y         Z  \
101890546 -1.053491e+06  3.740809e+05  0.120129   
24364645  -1.190515e+06  1.245956e+06  1.786115   
94066822  -1.366041e+06  4.620328e+05  0.299432   
7491714   -1.061229e+06  1.435732e+06  0.961995   
58836604  -1.027162e+06  8.582883e+05  0.172253   
31513445  -1.221845e+06  1.165554e+06  0.790347   
74211930  -1.534584e+06  6.853099e+05  2.278141   
92889233  -1.634898e+06  4.752445e+05  1.978616   
66210171  -1.227413e+06  7.753379e+05  0.783347   
101399919 -1.426154e+06  3.795543e+05  0.224992   

                                   geometry  index_right   Sett_ID  ADM_ID  
101890546   POINT (-1053490.952 374080.9

Finished gdal.Translate() for year 2014. Wed Nov 23 15:55:45 2022 

Loaded XYZ file as a pandas dataframe, year 2014. Wed Nov 23 15:58:23 2022 

Created geodataframe from non-NoData points, year 2014. Wed Nov 23 15:58:25 2022 


Joined settlement ID onto vectorized raster cells for year 2014. Wed Nov 23 16:08:40 2022 

                      X             Y         Z  \
53324041  -9.965864e+05  9.202887e+05  0.257418   
45979958  -1.181361e+06  1.002862e+06  0.580164   
67292710  -1.217976e+06  7.631643e+05  0.889021   
91644692  -1.095296e+06  4.893055e+05  0.578937   
76695677  -1.533546e+06  6.573767e+05  2.817606   
78112674  -1.635653e+06  6.414283e+05  0.424280   
92524250  -1.236567e+06  4.793967e+05  0.208393   
103162288 -1.401713e+06  3.597368e+05  0.464105   
99940009  -1.414452e+06  3.959745e+05  0.344184   
72945067  -1.517786e+06  6.995596e+05  2.412297   

                                   geometry  index_right   Sett_ID  ADM_ID  
53324041     POINT (-996586.438 920288.7

Saved to file. Wed Nov 23 16:25:35 2022 



In [10]:
AllSummaries.sample(20)

Unnamed: 0,Sett_ID,ADM_ID,PopSum00,PopSum01,PopSum02,PopSum03,PopSum04,PopSum05,PopSum06,PopSum07,PopSum08,PopSum09,PopSum10,PopSum11,PopSum12,PopSum13,PopSum14,PopSum15
4595,21133.0,295,2.190004,1.727115,1.641959,1.864361,1.752525,1.748426,1.638094,1.834905,1.815182,1.828446,2.53928,2.855308,3.334095,3.25351,3.217648,2.772022
4845,22770.0,86,209.400139,234.310859,222.566305,243.654188,230.61778,249.180102,252.629279,275.075324,260.975841,302.323238,249.659442,287.126995,316.431957,308.656657,279.990103,270.546047
10851,104749.0,203,353.676892,240.082718,251.939314,262.016045,233.882684,236.621804,235.7961,239.03797,262.463745,268.884971,225.061515,303.005873,289.200799,277.451022,366.55327,355.136125
12852,196170.0,181,675.244368,717.637488,710.57126,675.706987,670.082964,692.748582,708.367766,699.808414,762.582111,735.414647,873.667941,786.885417,747.582974,877.188596,840.260319,938.948393
9664,70561.0,138,304.280552,255.675006,248.62148,286.94556,281.680882,305.066905,291.007577,309.06109,284.353585,298.19439,300.89064,315.767939,274.858355,267.246658,305.043551,301.775216
6795,46411.0,189,109.902313,101.236738,152.718517,169.479533,177.433142,178.104989,178.34904,179.141797,169.249841,182.881142,169.420237,179.567743,205.91343,209.021485,191.019806,206.02004
5832,28445.0,121,223.599995,253.112718,237.151864,280.43223,286.95681,288.641641,296.764634,320.312347,318.915843,389.707968,342.747498,413.159256,484.969937,437.453251,1229.691935,1391.297408
4736,22540.0,290,168.373869,179.3394,170.389068,197.086953,183.574185,181.026619,171.377538,176.498052,187.201379,190.606766,187.666106,186.998355,193.450735,180.372608,165.52076,164.630812
8554,60793.0,155,2578.129185,2031.174007,2040.594864,2449.163953,2274.186983,2557.560822,2418.428217,2725.995616,2872.811403,2391.83455,2763.395366,2623.744151,2572.986253,2777.210593,2483.534465,2516.261756
11493,145491.0,227,804.084177,783.192272,798.141453,896.871611,863.311051,916.201023,941.660557,1011.218468,1035.654339,1136.564346,1098.0926,1214.947393,1369.440888,1403.342602,1415.601111,1588.510209


---

## 9. PREPARE YEARLY DATASETS: NIGHTTIME LIGHTS

### 9.1 Reproject and reclassify with settlement buffer mask.
Reclassify so that we only need to work with cells within X distance of settlements.

In [15]:
ProjCRS = gdal.WarpOptions(dstSRS='ESRI:102022')
AnnualizedSourceFiles = [i for i in os.listdir('NTL/') if i.endswith('.tif')]

with fiona.open(r'Results/Catchment.gpkg', mode="r", layer="Buff250m_2015") as shapefile:
    MaskGeom = [feature["geometry"] for feature in shapefile] # Identify the bounding areas of the mask.
# Mask_out = './LatestYearBuffer.tif'
AnnualizedSourceFiles

['Masked_1999.tif',
 'Masked_2000.tif',
 'Masked_2001.tif',
 'Masked_2002.tif',
 'Masked_2003.tif',
 'Masked_2004.tif',
 'Masked_2005.tif',
 'Masked_2006.tif',
 'Masked_2007.tif',
 'Masked_2008.tif',
 'Masked_2009.tif',
 'Masked_2010.tif',
 'Masked_2011.tif',
 'Masked_2012.tif',
 'Masked_2013.tif',
 'Masked_2014.tif',
 'Masked_2015.tif']

In [6]:
ValStart = 1999
ValEnd = 2015

In [8]:
# This codeblock changes each annual population raster's projection (gdal.Warp()), 
# then masks it to within a specified distance of the settlements (rasterio.mask.mask()).

for YearFile in AnnualizedSourceFiles:
    InputRasterName = os.path.join(ProjectFolder, "NTL", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    InputRasterObject = gdal.Open(InputRasterName)
    TempOutputName = "Temp_" + Year + "_albers.tif"
    TempOutputPath = os.path.join(ProjectFolder, "NTL", TempOutputName)
    if exists(TempOutputPath):
        pass
    else:
        # Reproject to same CRS as settlements.
        Warp = gdal.Warp(TempOutputPath, # Where to store the warped raster
                     InputRasterObject, # Which raster to warp
                     format='GTiff', 
                     options=ProjCRS) # Reproject to Africa Albers Equal Area Conic
        print('Finished gdal.Warp() for year %s. %s \n' % (Year, time.ctime()))
        
        Warp = None # Close the files
        InputRasterObject = None

        # Reclassify as nodata if outside settlement buffer zones.
        with rasterio.open(TempOutputPath) as InputRasterObject:
            MaskedOutputRaster, OutTransform = rasterio.mask.mask(
                InputRasterObject, MaskGeom, crop=True) # Anything outside the mask is reclassed to the raster's NoData value.
            OutMetaData = InputRasterObject.meta.copy()
        print('Finished rasterio.mask.mask() for year %s. %s \n' % (Year, time.ctime()))
            
        OutMetaData.update({"driver": "GTiff",
                         "height": MaskedOutputRaster.shape[1],
                         "width": MaskedOutputRaster.shape[2],
                         "transform": OutTransform})
        FinalOutputPath = os.path.join(ProjectFolder, "NTL", ''.join(['Masked_', Year, '.tif']))
        with rasterio.open(FinalOutputPath, "w", **OutMetaData) as dest:
            dest.write(MaskedOutputRaster)
        print('Written to file. %s \n' % time.ctime())
    InputRasterObject = None
    
    try:  # Finally, remove the intermediate file from disk
        os.remove(TempOutputPath)
    except OSError:
        pass
    print('Removed intermediate file. %s \n' % time.ctime())

print('\n \n Finished all years in list. %s' % time.ctime())

Finished gdal.Warp() for year 2000. Wed Nov 23 11:12:35 2022 

Finished rasterio.mask.mask() for year 2000. Wed Nov 23 11:12:38 2022 

Written to file. Wed Nov 23 11:12:39 2022 

Removed intermediate file. Wed Nov 23 11:12:39 2022 

Finished gdal.Warp() for year 2001. Wed Nov 23 11:12:43 2022 

Finished rasterio.mask.mask() for year 2001. Wed Nov 23 11:12:47 2022 

Written to file. Wed Nov 23 11:12:47 2022 

Removed intermediate file. Wed Nov 23 11:12:47 2022 

Finished gdal.Warp() for year 2002. Wed Nov 23 11:12:51 2022 

Finished rasterio.mask.mask() for year 2002. Wed Nov 23 11:12:55 2022 

Written to file. Wed Nov 23 11:12:55 2022 

Removed intermediate file. Wed Nov 23 11:12:55 2022 

Finished gdal.Warp() for year 2003. Wed Nov 23 11:12:59 2022 

Finished rasterio.mask.mask() for year 2003. Wed Nov 23 11:13:02 2022 

Written to file. Wed Nov 23 11:13:03 2022 

Removed intermediate file. Wed Nov 23 11:13:03 2022 

Finished gdal.Warp() for year 2004. Wed Nov 23 11:13:07 2022 

Finis

In [17]:
print(os.listdir('NTL/'))

['Harmonized_DN_NTL_1999_2015.zip', 'Masked_1999.tif', 'Masked_2000.tif', 'Masked_2001.tif', 'Masked_2002.tif', 'Masked_2003.tif', 'Masked_2004.tif', 'Masked_2005.tif', 'Masked_2006.tif', 'Masked_2007.tif', 'Masked_2008.tif', 'Masked_2009.tif', 'Masked_2010.tif', 'Masked_2011.tif', 'Masked_2012.tif', 'Masked_2013.tif', 'Masked_2014.tif', 'Masked_2015.tif']


In [5]:
AnnualizedSourceFiles = None

### 9.2 Raster values summarized by settlement.
1. Convert each annualized raster to .xyz, 
2. then bring them to vector space and assign their Sett_ID,
3. and finally, aggregate the value as appropriate to the settlement level and save table to file.

XYZ is similar to .csv. Raster cell centers are stored as x and y, and their value is stored as z.

In [14]:
NoDataVal = 0
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS')
AllSummaries = pd.DataFrame(Settlements).drop(columns='geometry')

AnnualizedMaskedFiles = [i for i in os.listdir('NTL/') if i.startswith('Masked') and i.endswith('.tif')]
AnnualizedMaskedFiles

['Masked_1999.tif',
 'Masked_2000.tif',
 'Masked_2001.tif',
 'Masked_2002.tif',
 'Masked_2003.tif',
 'Masked_2004.tif',
 'Masked_2005.tif',
 'Masked_2006.tif',
 'Masked_2007.tif',
 'Masked_2008.tif',
 'Masked_2009.tif',
 'Masked_2010.tif',
 'Masked_2011.tif',
 'Masked_2012.tif',
 'Masked_2013.tif',
 'Masked_2014.tif',
 'Masked_2015.tif']

In [15]:
for YearFile in AnnualizedMaskedFiles:
    
### STEP 1: TIF TO XYZ ###
    InputRasterName = os.path.join(ProjectFolder, "NTL", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    print('Loading data for year %s. %s \n' % (Year, time.ctime()))
    InputRasterObject = gdal.Open(InputRasterName)
    XYZOutputPath = r'NTL/{}'.format(
        YearFile.replace('.tif', '.xyz')) # New file path will be the same as original, but .tif is replaced with .xyz
    
    # Create an .xyz version of the .tif
    XYZ = gdal.Translate(XYZOutputPath, # Specify a destination path
                         InputRasterObject, # Input is the masked .tif file
                         format='XYZ', 
                         creationOptions=["ADD_HEADER_LINE=YES"])
    print('Finished gdal.Translate() for year %s. %s \n' % (Year, time.ctime()))

#     # Remove the temporary masked tif file.
#     try:  
#         os.remove(InputRasterName)
#     except OSError:
#         pass
#     print('Removed (or skipped if error) intermediate tif file. %s \n' % time.ctime())
    
    InputRasterObject = None
    XYZ = None # Reload XYZ as a point geodataframe

    
### STEP 2: GENERATE GEODATAFRAME WITH SETT_ID FIELD ###
    InputXYZName = ''.join(['Masked_', Year, '.xyz'])
    InputXYZ = pd.read_table(os.path.join(ProjectFolder, 'NTL', InputXYZName), delim_whitespace=True)
    InputXYZ = InputXYZ.loc[InputXYZ['Z'] != NoDataVal] # Subset to only the features that have a raster value.
    print('Loaded XYZ file as a pandas dataframe, year %s. %s \n' % (Year, time.ctime()))
    ValObject = gpd.GeoDataFrame(InputXYZ,
                                 geometry = gpd.points_from_xy(InputXYZ['X'], InputXYZ['Y']),
                                 crs = 'ESRI:102022')
    print('Created geodataframe from non-NoData points, year %s. %s \n' % (Year, time.ctime()))
    del InputXYZ
    
    # Sjoin_nearest: No need to group by ADM this time. 
    ValObject_withID = gpd.sjoin_nearest(ValObject, 
                                    Settlements, 
                                    how='left') # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.
    
    print('\nJoined settlement ID onto vectorized raster cells for year %s. %s \n' % (Year, time.ctime()))
    print(ValObject_withID.sample(10))
    del ValObject
    
    # We no longer need the spatial information of the raster values because we have their unique settlement ID.
    ValObject_withID = pd.DataFrame(ValObject_withID).drop(columns='geometry')
    
    ValObject_withID.to_csv(''.join([r'NTL/', 'Masked_', Year, '.csv']))
    print('\nExported as table, year %s. %s \n' % (Year, time.ctime()))
    
    # Remove the temporary xyz file.
    try:  
        os.remove(os.path.join(ProjectFolder, 'NTL', InputXYZName))
    except OSError:
        pass
    print('Removed (or skipped if error) intermediate xyz file. %s \n' % time.ctime())

    

### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###
    AbbrevYear = Year[2:]
    
    # Sum
    VariableName = ''.join(['NTLsum', AbbrevYear])
    ValAggregated = ValObject_withID[
        ValObject_withID['Z'].notna()].groupby(
        'Sett_ID', as_index=False)['Z'].sum().rename(columns={"Z": VariableName})
    print('\nValues summed to settlement level, year %s. %s \n' % (Year, time.ctime()))
    print(ValAggregated.sample(10))
    AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
    
    # Average
    VariableName = ''.join(['NTLavg', AbbrevYear])
    ValAggregated = ValObject_withID[
        ValObject_withID['Z'].notna()].groupby(
        'Sett_ID', as_index=False)['Z'].mean().rename(columns={"Z": VariableName})
    print('\nValues averaged to settlement level, year %s. %s \n' % (Year, time.ctime()))
    print(ValAggregated.sample(10))
    AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
    print('\nMerged year %s onto latest year settlement feature layer. %s \n' % (Year, time.ctime()))
    
    
    print(AllSummaries.sample(10))
    del ValObject_withID, ValAggregated
    
    

print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())

AllSummaries.to_csv(os.path.join(ResultsFolder, 'NTL%sto%s.csv' % (ValStart, ValEnd)))
print('Saved to file. %s \n' % time.ctime())

Loading data for year 1999. Thu Nov 24 17:01:55 2022 

Finished gdal.Translate() for year 1999. Thu Nov 24 17:01:57 2022 

Loaded XYZ file as a pandas dataframe, year 1999. Thu Nov 24 17:01:58 2022 

Created geodataframe from non-NoData points, year 1999. Thu Nov 24 17:01:58 2022 


Joined settlement ID onto vectorized raster cells for year 1999. Thu Nov 24 17:01:59 2022 

                    X             Y   Z                          geometry  \
837223  -1.544762e+06  7.046870e+05  11   POINT (-1544762.350 704687.008)   
1095073 -1.401181e+06  4.542959e+05  61   POINT (-1401181.429 454295.891)   
280900  -1.111393e+06  1.245742e+06  26  POINT (-1111393.108 1245741.941)   
679056  -1.186686e+06  8.587738e+05  11   POINT (-1186685.542 858773.850)   
425863  -1.197191e+06  1.104788e+06   5  POINT (-1197191.463 1104787.501)   
1071430 -1.591164e+06  4.770587e+05  59   POINT (-1591163.501 477058.719)   
1065120 -1.593790e+06  4.831872e+05  13   POINT (-1593789.981 483187.173)   
1085164 

Finished gdal.Translate() for year 2002. Thu Nov 24 17:02:07 2022 

Loaded XYZ file as a pandas dataframe, year 2002. Thu Nov 24 17:02:09 2022 

Created geodataframe from non-NoData points, year 2002. Thu Nov 24 17:02:09 2022 


Joined settlement ID onto vectorized raster cells for year 2002. Thu Nov 24 17:02:10 2022 

                    X             Y   Z                          geometry  \
1104079 -1.404683e+06  4.455410e+05  13   POINT (-1404683.403 445540.956)   
1069621 -1.597292e+06  4.788097e+05  16   POINT (-1597291.955 478809.706)   
429456  -1.206822e+06  1.101286e+06  13  POINT (-1206821.890 1101285.527)   
1091468 -1.402057e+06  4.577979e+05  62   POINT (-1402056.923 457797.864)   
1130437 -1.204195e+06  4.201516e+05   7   POINT (-1204195.410 420151.647)   
1089658 -1.409061e+06  4.595489e+05   8   POINT (-1409060.870 459548.851)   
1088769 -1.398555e+06  4.604243e+05  50   POINT (-1398554.949 460424.344)   
465527  -1.179682e+06  1.066266e+06  11  POINT (-1179681.594 10

Finished gdal.Translate() for year 2005. Thu Nov 24 17:02:19 2022 

Loaded XYZ file as a pandas dataframe, year 2005. Thu Nov 24 17:02:20 2022 

Created geodataframe from non-NoData points, year 2005. Thu Nov 24 17:02:20 2022 


Joined settlement ID onto vectorized raster cells for year 2005. Thu Nov 24 17:02:21 2022 

                    X              Y   Z                         geometry  \
987822  -1.429197e+06  558479.607373   7  POINT (-1429197.219 558479.607)   
939084  -1.502739e+06  605756.251983   5  POINT (-1502738.666 605756.252)   
1211264 -1.434450e+06  341357.239531   5  POINT (-1434450.179 341357.240)   
1080652 -1.405559e+06  468303.785245   5  POINT (-1405558.896 468303.785)   
1071434 -1.587662e+06  477058.719432  58  POINT (-1587661.528 477058.719)   
898520  -1.519373e+06  645153.455825  10  POINT (-1519373.041 645153.456)   
1071393 -1.623557e+06  477058.719432   5  POINT (-1623556.758 477058.719)   
1122917 -1.477349e+06  427155.594565   5  POINT (-1477349.357 4

Finished gdal.Translate() for year 2007. Thu Nov 24 17:02:26 2022 

Loaded XYZ file as a pandas dataframe, year 2007. Thu Nov 24 17:02:27 2022 

Created geodataframe from non-NoData points, year 2007. Thu Nov 24 17:02:28 2022 


Joined settlement ID onto vectorized raster cells for year 2007. Thu Nov 24 17:02:28 2022 

                    X             Y   Z                          geometry  \
679053  -1.189312e+06  8.587738e+05  14   POINT (-1189312.022 858773.850)   
897628  -1.511494e+06  6.460289e+05   5   POINT (-1511493.600 646028.949)   
1081286 -1.639316e+06  4.674283e+05   5   POINT (-1639315.639 467428.292)   
1007456 -1.593790e+06  5.392188e+05   7   POINT (-1593789.981 539218.752)   
225914  -1.133280e+06  1.299147e+06   8  POINT (-1133280.443 1299147.040)   
1057235 -1.397679e+06  4.910666e+05   8   POINT (-1397679.456 491066.614)   
311640  -1.018591e+06  1.215975e+06   8  POINT (-1018590.805 1215975.165)   
1070535 -1.585911e+06  4.779342e+05  58   POINT (-1585910.541 4

Finished gdal.Translate() for year 2009. Thu Nov 24 17:02:33 2022 

Loaded XYZ file as a pandas dataframe, year 2009. Thu Nov 24 17:02:35 2022 

Created geodataframe from non-NoData points, year 2009. Thu Nov 24 17:02:35 2022 


Joined settlement ID onto vectorized raster cells for year 2009. Thu Nov 24 17:02:35 2022 

                    X             Y   Z                          geometry  \
898522  -1.517608e+06  6.450255e+05  16   POINT (-1517608.195 645025.493)   
1076778 -1.641928e+06  4.716777e+05  16   POINT (-1641928.296 471677.746)   
464627  -1.178792e+06  1.067013e+06  10  POINT (-1178792.144 1067013.442)   
1060569 -1.634049e+06  4.874366e+05  13   POINT (-1634048.853 487436.632)   
1006129 -1.177917e+06  5.408417e+05  13   POINT (-1177916.651 540841.746)   
1094175 -1.398541e+06  4.550434e+05  59   POINT (-1398541.056 455043.366)   
1208559 -1.436187e+06  3.438557e+05   8   POINT (-1436187.284 343855.670)   
1006133 -1.174415e+06  5.408417e+05  16   POINT (-1174414.676 5

Finished gdal.Translate() for year 2011. Thu Nov 24 17:02:40 2022 

Loaded XYZ file as a pandas dataframe, year 2011. Thu Nov 24 17:02:42 2022 

Created geodataframe from non-NoData points, year 2011. Thu Nov 24 17:02:42 2022 


Joined settlement ID onto vectorized raster cells for year 2011. Thu Nov 24 17:02:43 2022 

                    X              Y   Z                         geometry  \
941784  -1.505351e+06  603001.796345   5  POINT (-1505351.284 603001.796)   
1059665 -1.636675e+06  488312.125503   8  POINT (-1636675.334 488312.126)   
867949  -1.464203e+06  674792.277331   7  POINT (-1464203.081 674792.277)   
896725  -1.513231e+06  646776.479873   7  POINT (-1513230.727 646776.480)   
543974  -1.127138e+06  989969.998730   8  POINT (-1127138.018 989969.999)   
1059037 -1.397666e+06  489187.619173  10  POINT (-1397665.562 489187.619)   
1092359 -1.410798e+06  456794.353363   6  POINT (-1410797.967 456794.353)   
1210459 -1.350389e+06  342104.682520   6  POINT (-1350388.904 3

Finished gdal.Translate() for year 2013. Thu Nov 24 17:02:48 2022 

Loaded XYZ file as a pandas dataframe, year 2013. Thu Nov 24 17:02:50 2022 

Created geodataframe from non-NoData points, year 2013. Thu Nov 24 17:02:50 2022 


Joined settlement ID onto vectorized raster cells for year 2013. Thu Nov 24 17:02:50 2022 

                    X             Y   Z                          geometry  \
939040  -1.541260e+06  6.057563e+05   7   POINT (-1541260.376 605756.252)   
1195803 -1.560521e+06  3.562406e+05   9   POINT (-1560521.232 356240.628)   
20535   -1.090381e+06  1.498760e+06   5  POINT (-1090381.266 1498759.539)   
102584  -1.039603e+06  1.419090e+06  32  POINT (-1039602.647 1419089.638)   
1089669 -1.399430e+06  4.595489e+05  63   POINT (-1399430.443 459548.851)   
901227  -1.515871e+06  6.425270e+05  22   POINT (-1515871.067 642526.976)   
1074978 -1.640191e+06  4.735567e+05  12   POINT (-1640191.133 473556.746)   
939080  -1.506241e+06  6.057563e+05   8   POINT (-1506240.640 6

Finished gdal.Translate() for year 2015. Thu Nov 24 17:02:56 2022 

Loaded XYZ file as a pandas dataframe, year 2015. Thu Nov 24 17:02:58 2022 

Created geodataframe from non-NoData points, year 2015. Thu Nov 24 17:02:58 2022 


Joined settlement ID onto vectorized raster cells for year 2015. Thu Nov 24 17:03:00 2022 

                    X             Y   Z                          geometry  \
939038  -1.543011e+06  6.057563e+05   7   POINT (-1543011.363 605756.252)   
1169901 -1.361784e+06  3.816299e+05   7   POINT (-1361784.226 381629.937)   
946287  -1.507116e+06  5.987523e+05   6   POINT (-1507116.133 598752.305)   
1143728 -1.400306e+06  4.070192e+05   7   POINT (-1400305.936 407019.246)   
896748  -1.493108e+06  6.469044e+05   7   POINT (-1493108.238 646904.443)   
1080452 -1.580658e+06  4.683038e+05  35   POINT (-1580657.580 468303.785)   
884087  -1.534256e+06  6.591614e+05   7   POINT (-1534256.429 659161.351)   
1136943 -1.029972e+06  4.140232e+05   6   POINT (-1029972.220 4

In [16]:
AllSummaries.sample(20)

Unnamed: 0,Sett_ID,ADM_ID,NTLsum99,NTLavg99,NTLsum00,NTLavg00,NTLsum01,NTLavg01,NTLsum02,NTLavg02,...,NTLsum11,NTLavg11,NTLsum12,NTLavg12,NTLsum13,NTLavg13,NTLsum14,NTLavg14,NTLsum15,NTLavg15
12927,196283.0,181,,,,,,,,,...,,,,,,,,,,
6229,36299.0,44,,,,,,,,,...,,,,,,,,,,
5052,23556.0,85,,,,,,,,,...,,,,,,,,,,
5452,25174.0,87,,,,,,,5.0,5.0,...,,,5.0,5.0,5.0,5.0,6.0,6.0,7.0,7.0
7398,50467.0,332,,,,,,,,,...,,,,,,,,,,
4015,19171.0,295,,,,,,,,,...,,,,,,,6.0,6.0,6.0,6.0
4898,22848.0,83,,,,,,,,,...,,,,,,,,,,
927,2676.0,9,,,,,,,,,...,,,,,,,,,,
409,1259.0,6,,,,,,,,,...,,,,,,,,,,
9426,68877.0,137,,,,,,,,,...,,,,,,,,,12.0,6.0


In [17]:
AllSummaries.sort_values('NTLsum10', ascending=False).head(20)

Unnamed: 0,Sett_ID,ADM_ID,NTLsum99,NTLavg99,NTLsum00,NTLavg00,NTLsum01,NTLavg01,NTLsum02,NTLavg02,...,NTLsum11,NTLavg11,NTLsum12,NTLavg12,NTLsum13,NTLavg13,NTLsum14,NTLavg14,NTLsum15,NTLavg15
3967,19120.0,80,11592.0,24.611465,11687.0,24.656118,12135.0,25.176349,12816.0,26.424742,...,15538.0,31.905544,17625.0,36.042945,17387.0,35.483673,18785.0,38.336735,20394.0,41.620408
3074,12521.0,284,7363.0,31.737069,7273.0,30.687764,7326.0,30.911392,7904.0,33.350211,...,10909.0,46.421277,10426.0,43.991561,10662.0,44.987342,10436.0,44.033755,10451.0,44.097046
3128,12684.0,352,2360.0,59.0,2348.0,58.7,2353.0,58.825,2419.0,60.475,...,2417.0,60.425,2465.0,61.625,2452.0,61.3,2362.0,59.05,2323.0,58.075
3126,12674.0,352,2360.0,59.0,2348.0,58.7,2353.0,58.825,2419.0,60.475,...,2365.0,60.641026,2465.0,61.625,2452.0,61.3,2362.0,59.05,2323.0,58.075
3124,12670.0,352,2360.0,59.0,2348.0,58.7,2353.0,58.825,2419.0,60.475,...,2365.0,60.641026,2465.0,61.625,2452.0,61.3,2362.0,59.05,2323.0,58.075
3096,12544.0,352,2360.0,59.0,2348.0,58.7,2353.0,58.825,2419.0,60.475,...,2365.0,60.641026,2465.0,61.625,2452.0,61.3,2362.0,59.05,2323.0,58.075
3141,12883.0,113,2009.0,30.439394,2064.0,31.272727,2061.0,31.227273,1936.0,29.333333,...,1833.0,28.640625,2600.0,39.393939,2506.0,37.969697,2234.0,33.848485,2172.0,32.909091
11499,145497.0,230,1609.0,21.171053,1699.0,22.355263,1898.0,24.973684,1798.0,23.350649,...,2072.0,26.564103,2364.0,30.307692,2339.0,29.987179,2061.0,26.423077,2038.0,26.128205
13626,199780.0,80,1701.0,53.15625,1720.0,53.75,1724.0,53.875,1705.0,53.28125,...,1753.0,54.78125,1929.0,60.28125,1886.0,58.9375,1813.0,56.65625,1853.0,57.90625
13629,199787.0,80,1701.0,53.15625,1720.0,53.75,1724.0,53.875,1705.0,53.28125,...,1753.0,54.78125,1929.0,60.28125,1886.0,58.9375,1813.0,56.65625,1853.0,57.90625
