# Spatiotemporal Trends in Urbanization: Cameroon
*Using yearly estimates (2000-2015) of population, built-up area, and economic indicators to track city-by-city growth and change over time.*

---

### Research questions 

#### 1. How has the size of Settlement X changed over time? 

- Population size 

- Geographical extents 

- Population density 

#### 2. In what year did Settlement X become a new urban class?  

- From semi-dense to high-density city 

- Small settlement area to built-up area 

- When a hamlet area or small settlement area first appeared

#### 3. Is there a discernible pattern between the spatio-temporal distribution of economic density and population density? 

#### 4. How much of the urban space attributable to City X is outside of the administrative limits of the city? 

- When did the fragment(s) appear? 

- Which district/municipality/authority has purview over the fragment(s)? 

#### 5. For the questions above, how does the answer change based on different understandings of urban limits? 

- Scenario A: where "city" is delimited by an official administrative boundary 

- Scenario B: where "city" includes all contiguous (and near-contiguous) built-up area 

#### 6. Subnational and inter-national comparisons. Examples: 

- Compare the rates (population, built-up area, economic…) of the fastest-growing settlement of each ADM1 region. 

- Which African metropoles experience the most vs. the least fragmentation? Is there a relationship between the amount of urban fragmentation and the rate of densification? 

### Datasets
1. Most up-to-date administrative boundaries: **ADM3.**
2. Built-up area, yearly: **World Settlement Footprint Evolution.** Resolution: 30m.
3. Settlement types: **GRID3 settlement extents.** Captured between 2009 and 2019.
4. Population, yearly: **WorldPop.** UN-adjusted, unconstrained. Resolution: 100m.
5. Nighttime lights, yearly: **Harmonization of DMSP and VIIRS.** Resolution: 1km.
6. City names: **UCDB, Africapolis, and GeoNames.**

---


## 1. PREPARE WORKSPACE

### 1.1 Off-script

##### Off-script: Create folders in working directory.
> *ADM
<br>Buildup
<br>PlaceName
<br>Population
<br>Settlement
<br>NTL*
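
The folder layout above could also be created programmatically. This is a minimal sketch, assuming the folder names listed above plus the `Results` folder referenced in section 1.3. The helper name `make_project_folders` is hypothetical and not part of the original notebook.

```python
import os
import tempfile

# Folder names taken from the list above; 'Results' is assumed from section 1.3.
FOLDERS = ["ADM", "Buildup", "PlaceName", "Population", "Settlement", "NTL", "Results"]

def make_project_folders(root, names=FOLDERS):
    """Create each data folder under root; existing folders are left untouched."""
    created = []
    for name in names:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created

# Demonstration in a throwaway directory; in the notebook, pass os.getcwd() instead.
demo_root = tempfile.mkdtemp()
demo_paths = make_project_folders(demo_root)
print(sorted(os.listdir(demo_root)))
```

Because `exist_ok=True` is set, rerunning the cell on an existing project does not raise an error or overwrite anything.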

##### Off-script: Download datasets (as shapefile, GeoJSON, or tif where possible) and place or extract into corresponding folder:
- ADM: *Sourced internally.*
- Buildup: https://download.geoservice.dlr.de/WSF_EVO/files/
- PlaceName: 
    - GeoNames: (file: cities500.zip) https://download.geonames.org/export/dump/
    - Africapolis: https://africapolis.org/en/data
    - Urban Centres Database: https://ghsl.jrc.ec.europa.eu/ghs_stat_ucdb2015mt_r2019a.php
- Population: https://hub.worldpop.org/geodata/listing?id=69
- Settlement: https://data.grid3.org/datasets/GRID3::grid3-cameroon-settlement-extents-version-01-01-/explore
- NTL: https://figshare.com/articles/dataset/Harmonization_of_DMSP_and_VIIRS_nighttime_light_data_from_1992-2018_at_the_global_scale/9828827/2

##### Other off-script:
- Convert the GeoNames .txt file to a shapefile (delimiter = tab, header rows = 0) and rename the fields.
- If necessary, mosaic WSFE rasters that cover the area of interest to create a single file.
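
The GeoNames conversion can also be scripted. The sketch below uses only the standard library to read a tab-delimited, headerless file and write point features as GeoJSON (one of the formats mentioned above). The column names follow the GeoNames readme for the main `geoname` table as best understood here and should be checked against the download. The function name `geonames_to_geojson` and the sample row are illustrative only.

```python
import csv
import io
import json

# Field order assumed from the GeoNames readme (19 tab-separated columns, no header row).
GEONAMES_FIELDS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "feature_class", "feature_code", "country_code", "cc2", "admin1_code",
    "admin2_code", "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]

def geonames_to_geojson(lines, country_code=None):
    """Convert tab-delimited GeoNames rows into a GeoJSON FeatureCollection of points."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    features = []
    for row in reader:
        record = dict(zip(GEONAMES_FIELDS, row))
        if country_code and record["country_code"] != country_code:
            continue
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON coordinate order is longitude, latitude.
                "coordinates": [float(record["longitude"]), float(record["latitude"])],
            },
            "properties": {
                "geonameid": int(record["geonameid"]),
                "name": record["name"],
                "population": int(record["population"] or 0),
            },
        })
    return {"type": "FeatureCollection", "features": features}

# Synthetic row for demonstration only (not real GeoNames data).
sample = "\t".join([
    "1", "Exampleville", "Exampleville", "", "4.05", "9.70", "P", "PPL", "CM", "",
    "", "", "", "", "12345", "", "13", "Africa/Douala", "2020-01-01",
]) + "\n"
collection = geonames_to_geojson(io.StringIO(sample), country_code="CM")
print(json.dumps(collection["features"][0]["geometry"]))
```

In the notebook, `lines` would be an open handle to the extracted text file, and the returned dictionary could be written with `json.dump` into the PlaceName folder.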

### 1.2 Load all packages.

In [1]:
# Note: Most, but not all, of these packages are used in the final version of the notebook.

import os, sys, glob, re, time
from os.path import exists

import geopandas as gpd 
import pandas as pd
from shapely.geometry import Point, LineString, Polygon, shape, MultiPoint
from shapely.ops import cascaded_union
from shapely.validation import make_valid, explain_validity
import shapely.wkt
import scipy

#from xrspatial import zonal_stats 
#import xarray as xr 
import numpy as np 
import fiona, rioxarray
import rasterio
from rasterio.plot import show
from rasterio import features
from rasterio.features import shapes
from rasterio import mask
from osgeo import gdal, osr, ogr, gdal_array
import matplotlib.pyplot as plt

### 1.3 Set workspace.

In [88]:
ProjectFolder = os.getcwd()
ResultsFolder = os.path.join(ProjectFolder, 'Results')
print(ProjectFolder)
print(ResultsFolder)

C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon
C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Results


---

## 2. PREPARE BUILDUP, SETTLEMENT, AND ADMIN DATASETS
Projection for all datasets: Africa Albers Equal Area Conic (ESRI:102022).

### 2.1 WSFE: Check contents and change NoData value as necessary.

In [None]:
WSFE = rasterio.open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))
print(WSFE) # WSFE values are all 4 digits long (1985-2015)
print(dir(WSFE))
print(WSFE.crs)
print(WSFE.dtypes)
NoDataValue = WSFE.nodatavals
print(NoDataValue)
band = WSFE.read(1) # Read the band once; every read() call loads the full raster into memory.
print(band.min(), band.mean(), np.median(band), band.max())

# If NoDataValue != 0, change to 0.

##### Off-script: Run this block in QGIS.

In [None]:
# # OPEN QGIS FOR THIS PORTION. CODE DOCUMENTED HERE.
# Change NoData value to zero, as this won't interfere with a possible value of 99999 in GRID3 and ADM.
# Then make sure there are no values above 2015 (such as 99999) or below 1985 in the dataset by reclassifying them as NoData.
# Was having trouble with rasterio & gdal here, so moved to QGIS.

# processing.run("native:reclassifybytable", {'INPUT_RASTER':'C:/Users/grace/GIS/povertyequity/urban_growth/WSFE_TCD.tif','RASTER_BAND':1,'TABLE':['2016','','0','','1984','0'],'NO_DATA':0,'RANGE_BOUNDARIES':0,'NODATA_FOR_MISSING':False,'DATA_TYPE':5,'OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/WSFE.tif'})
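
For reference, the reclassification performed by the QGIS call can be expressed as array logic. This is a sketch on a small in-memory array, assuming NumPy is available and that the valid WSFE range is 1985 to 2015 as noted above. It does not read or write a raster, and `reclassify_years` is a hypothetical helper rather than part of the original workflow.

```python
import numpy as np

def reclassify_years(band, first_year=1985, last_year=2015, nodata=0):
    """Return a copy of band where values outside [first_year, last_year] become nodata."""
    band = np.asarray(band)
    valid = (band >= first_year) & (band <= last_year)
    return np.where(valid, band, nodata)

# Toy 2x3 array containing a stray 99999 and a pre-1985 value.
toy = np.array([[1985, 99999, 2003],
                [1984, 2015, 0]])
cleaned = reclassify_years(toy)
print(cleaned)
```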

### 2.2 Prepare raster locations for GRID3 and Admin areas

In [None]:
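# glob() lists every file in the folder ending in '.shp'; [0] takes the first match.
ADM_vec = gpd.read_file(glob.glob('ADM/*.shp')[0])[['geometry']].to_crs("ESRI:102022")
GRID3_vec = gpd.read_file(glob.glob('Settlement/*.shp')[0])[['type','geometry']].to_crs("ESRI:102022")
# Note: these IDs start at 0, which is also the fill value used when rasterizing in section 3.1,
# so the feature at index 0 cannot be told apart from background. Consider `.index + 1` if that feature matters.
ADM_vec['ADM_ID'] = ADM_vec.index
GRID3_vec['G3_ID'] = GRID3_vec.index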

In [None]:
ADM_out = './ADM/ADM.tif'
GRID3_out = './Settlement/GRID3.tif'

print(ADM_vec.info(), "\n\n", 
      ADM_vec.crs, "\n\n", 
      len(str(ADM_vec['ADM_ID'].max()))) # We need to know how many digits need to be allocated to each dataset in the "join" serial.
print(GRID3_vec.info(), "\n\n", 
      GRID3_vec.crs, "\n\n", 
      len(str(GRID3_vec['G3_ID'].max())))

---

## 3. WSFE AND ADM; GRID3 AND ADM
RASTERIZE: Bring ADM and GRID3 into raster space.

RASTER MATH: "Join" ADM ID onto GRID3 and onto WSFE by creating unique concatenation string.

VECTORIZE: Bring joined data into vector space.

VECTOR MATH: Split unique ID from raster math step into separate columns.

### 3.1 Rasterize admin areas and GRID3.

In [None]:
# Copy and update the metadata from WSFE for the output
meta = WSFE.meta.copy()
meta.update(compress='lzw')

In [None]:
with rasterio.open(ADM_out, 'w+', **meta) as out:
    out_arr = out.read(1)

    # Create a generator of (geometry, value) pairs to burn into the raster.
    # Named adm_shapes to avoid shadowing rasterio.features.shapes, imported above.
    adm_shapes = ((geom, value) for geom, value in zip(ADM_vec.geometry, ADM_vec.ADM_ID))

    burned = features.rasterize(shapes=adm_shapes, fill=0, out=out_arr, transform=out.transform, all_touched=False)
    out.write_band(1, burned)
# The with block closes the file, so no manual cleanup is needed.

In [None]:
with rasterio.open(GRID3_out, 'w+', **meta) as out:
    out_arr = out.read(1)

    # Create a generator of (geometry, value) pairs to burn into the raster.
    # Named grid3_shapes to avoid shadowing rasterio.features.shapes, imported above.
    grid3_shapes = ((geom, value) for geom, value in zip(GRID3_vec.geometry, GRID3_vec.G3_ID))

    burned = features.rasterize(shapes=grid3_shapes, fill=0, out=out_arr, transform=out.transform, all_touched=False)
    out.write_band(1, burned)
# The with block closes the file, so no manual cleanup is needed.

*Validation: Check the dimensions, data type, and basic stats of the three datasets. All three should share the same dimensions and NoData value.*

In [None]:
CheckContents = gdal.Open(r"ADM/ADM.tif")
print(gdal.GetDataTypeName(CheckContents.GetRasterBand(1).DataType), 
      CheckContents.GetRasterBand(1).GetNoDataValue())

CheckContents =  gdal.Open(r"Settlement/GRID3.tif")
print(gdal.GetDataTypeName(CheckContents.GetRasterBand(1).DataType), 
      CheckContents.GetRasterBand(1).GetNoDataValue())

CheckContents = gdal.Open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))
print(gdal.GetDataTypeName(CheckContents.GetRasterBand(1).DataType), 
      CheckContents.GetRasterBand(1).GetNoDataValue())

CheckContents = None

In [None]:
RastersList = [rasterio.open(r"ADM/ADM.tif"), 
               rasterio.open(r"Settlement/GRID3.tif"), 
               rasterio.open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))]

for item in RastersList:
    print(item.name, "\nBands= ", item.count, "\nWxH= ", item.width, "x", item.height, "\n\n")

stats = []
for item in RastersList:
    band = item.read(1)
    stats.append({
        'raster': item.name,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})

# Show stats for each channel
print("\n", stats)

RastersList = None
band = None

### 3.2 Raster math to "join" admin to GRID3 and to WSFE.
Processing is more rapid when "joining," i.e. creating serial codes out of two datasets, in raster rather than vector space.
Here, we are concatenating the ID fields of the two datasets to create a serial number that we can then split in vector space later to create two ID fields.

*The join ID is built by arithmetic: multiplying the "A" values by a power of ten and adding the "B" values is in effect a concatenation of their ID strings. The number of zeros in the multiplier equals the number of digits in the maximum value of the "B" dataset (e.g. Chad ADM codes go up to 4 digits, so `calc=(A*10000)+B`; Cameroon ADM codes go up to 3 digits, so `calc=(A*1000)+B`).*
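
A small worked example of this arithmetic, using plain Python integers rather than rasters. The function names are hypothetical. The 3-digit ADM width matches the Cameroon `--calc="(A*1000)+B"` expression used in this section, and the 32-bit check is included only as a precaution, since the raster data type is not stated here.

```python
def make_serial(a_id, b_id, b_digits):
    """Concatenate two non-negative integer IDs: a_id followed by b_id padded to b_digits."""
    if not 0 <= b_id < 10 ** b_digits:
        raise ValueError("b_id does not fit in the digits reserved for it")
    return a_id * 10 ** b_digits + b_id

def split_serial(serial, b_digits):
    """Recover (a_id, b_id) from a serial built by make_serial."""
    return divmod(serial, 10 ** b_digits)

# Digits to reserve for the "B" dataset, derived from its maximum ID (illustrative value).
max_adm_id = 354
b_digits = len(str(max_adm_id))

serial = make_serial(181666, 243, b_digits)
print(serial)
print(split_serial(serial, b_digits))

# Precaution (assumption: the output raster stores signed 32-bit integers).
# 999999 and 999 are the largest values a 6-digit and a 3-digit ID can take.
INT32_MAX = 2 ** 31 - 1
largest_serial = make_serial(999999, 999, b_digits)
print(largest_serial <= INT32_MAX)
```

Integer division and modulo recover the same two IDs as the zero-padded string slicing used in section 3.4, without needing to know the total width of the serial.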

In [None]:
# # OPEN TERMINAL FOR THIS PORTION. CODE DOCUMENTED HERE.

# gdal_calc.py # Run with no arguments to see usage info.

# gdal_calc.py -A C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Settlement\GRID3.tif -B  C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\ADM\ADM.tif --outfile=C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Settlement\GRID3_ADM.tif --overwrite --calc="(A*1000)+B"
# gdal_calc.py -A C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Buildup\WSFE_CMN.tif -B  C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\ADM\ADM.tif --outfile=C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Buildup\WSFE_ADM.tif --overwrite --calc="(A*1000)+B"

# # END TERMINAL-ONLY ASPECT. RETURN HERE FOR NEXT STEPS.

In [None]:
# Validation: check the basic statistics of the resulting datasets.
RastersList = [rasterio.open(r"Buildup/WSFE_ADM.tif"), 
               rasterio.open(r"Settlement/GRID3_ADM.tif")]
for item in RastersList:
    print(item.name, "\nBands= ", item.count, "\nWxH= ", item.width, "x", item.height, "\n\n")
    
stats = []
for item in RastersList:
    band = item.read(1)
    stats.append({
        'raster': item.name,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})

# Show stats for each channel
print("\n", stats)

RastersList = None
band = None

### 3.3 Vectorize "joined" layers.

##### Off-script: Run this block in QGIS.

In [None]:
# OPEN QGIS FOR THIS PORTION. CODE DOCUMENTED HERE.

# Due to dtype errors with both gdal and rasterio here, I decided to run the raster to polygon function in QGIS instead.
# It is possible to run QGIS functions within a Jupyter Notebook, but I ran it within the GUI. Arc or R are other options.
# Command line code here.

# processing.run("gdal:polygonize", {'INPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Settlement/GRID3_ADM.tif','BAND':1,'FIELD':'DN','EIGHT_CONNECTEDNESS':False,'EXTRA':'','OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Settlement/GRID3_ADM.shp'})
# processing.run("gdal:polygonize", {'INPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Buildup/WSFE_ADM.tif','BAND':1,'FIELD':'DN','EIGHT_CONNECTEDNESS':False,'EXTRA':'','OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Buildup/WSFE_ADM.shp'})

### 3.4 Vector math to split raster strings into admin area, GRID3, and WSFE year assignments.

In [4]:
# Load newly created vectorized datasets.
GRID3_ADM = gpd.read_file(r"Settlement/GRID3_ADM.shp")
WSFE_ADM = gpd.read_file(r"Buildup/WSFE_ADM.shp")
print(GRID3_ADM.info(), "\n\n", GRID3_ADM.sample(10), "\n\n", GRID3_ADM.crs, "\n\n", 
      WSFE_ADM.info(), "\n\n", WSFE_ADM.sample(10), "\n\n", WSFE_ADM.crs)

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 210151 entries, 0 to 210150
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   DN        210151 non-null  int64   
 1   geometry  210151 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 3.2 MB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 584286 entries, 0 to 584285
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   DN        584286 non-null  int64   
 1   geometry  584286 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 8.9 MB
None 

                DN                                           geometry
9868    183794257  POLYGON ((-1098165.459 1276979.817, -1098137.2...
130798   71719143  POLYGON ((-1494360.099 672422.887, -1494331.85...
69733   117481203  POLYGON ((-1349141.459 851390.454, -1349056.71...
202997    1400005  POLYGON ((-1468146.736 382426.5

In [5]:
# Split serial back into separate dataset fields.
# For Cameroon: WSFE and ADM: 4+3=7 digits. GRID3 and ADM: 6+3=9 digits.
GRID3_ADM['gridstring'] = GRID3_ADM['DN'].astype(str).str.zfill(9)
WSFE_ADM['gridstring'] = WSFE_ADM['DN'].astype(str).str.zfill(7)

GRID3_ADM['Sett_ID'] = GRID3_ADM['gridstring'].str[:-3].astype(int) # Remove the last 3 digits to get the GRID3 portion.
GRID3_ADM['ADM_ID'] = GRID3_ADM['gridstring'].str[-3:].astype(int) # Keep only the last 3 digits to get the ADM portion.
WSFE_ADM['year'] = WSFE_ADM['gridstring'].str[:-3].astype(int)
WSFE_ADM['ADM_ID'] = WSFE_ADM['gridstring'].str[-3:].astype(int)

print(GRID3_ADM.sample(10), WSFE_ADM.sample(10))

               DN                                           geometry  \
8410    181666243  POLYGON ((-1169997.982 1283365.870, -1169913.2...   
151305   66537131  POLYGON ((-1501478.382 604673.816, -1501393.64...   
16445   179978243  POLYGON ((-1158868.602 1253427.293, -1158783.8...   
181164   35849335  POLYGON ((-1276122.555 481030.969, -1276009.56...   
14020   169809271  POLYGON ((-1032321.333 1263164.444, -1032151.8...   
39433   158862239  POLYGON ((-1150648.679 1164243.844, -1150535.6...   
209234    2132075  POLYGON ((-1431114.712 275191.396, -1431058.21...   
124427   73925140  POLYGON ((-1466338.918 692877.227, -1466282.42...   
2889    193386274  POLYGON ((-1039015.910 1406281.592, -1038959.4...   
75777   118213206  POLYGON ((-1266377.285 827806.316, -1266349.03...   

       gridstring  Sett_ID  ADM_ID  
8410    181666243   181666     243  
151305  066537131    66537     131  
16445   179978243   179978     243  
181164  035849335    35849     335  
14020   169809271   16

In [6]:
# Dissolve any features that have the same G3 and ADM values so that we have a single unique feature per settlement.
# Note: we do NOT want to dissolve the WSFE features. Distinct features for noncontiguous built-up areas of the same year are necessary to separate them in the Near tool step.
GRID3_ADM = GRID3_ADM.dissolve(by=['Sett_ID', 'ADM_ID'], as_index=False)
print(GRID3_ADM.info(), GRID3_ADM.head())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 205788 entries, 0 to 205787
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   Sett_ID     205788 non-null  int64   
 1   ADM_ID      205788 non-null  int64   
 2   geometry    205788 non-null  geometry
 3   DN          205788 non-null  int64   
 4   gridstring  205788 non-null  object  
dtypes: geometry(1), int64(3), object(1)
memory usage: 7.9+ MB
None    Sett_ID  ADM_ID                                           geometry    DN  \
0        1     354  POLYGON ((-1571983.289 271555.771, -1571870.30...  1354   
1        2     354  POLYGON ((-1572915.445 272093.211, -1572802.45...  2354   
2        3     354  POLYGON ((-1571390.099 273768.760, -1571361.85...  3354   
3        4     354  POLYGON ((-1575062.229 274274.586, -1574949.24...  4354   
4        5     354  POLYGON ((-1573395.647 277214.700, -1573226.16...  5354   

  gridstring  
0  000001354  
1  00000

In [7]:
# Remove features where year, settlement, or admin area = 0.
# This was supposed to be resolved earlier with the gdal_calc NoDataValue parameter.

print("Before: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))
WSFE_ADM = WSFE_ADM.loc[(WSFE_ADM["year"] != 0) & (WSFE_ADM["ADM_ID"] != 0)] # Since we change the datatype to integer, no need to include all digits. Otherwise, it would need to be: != '0000'
GRID3_ADM = GRID3_ADM.loc[(GRID3_ADM["Sett_ID"] != 0) & (GRID3_ADM["ADM_ID"] != 0)]
print("After: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))

Before: WSFE (584286, 5) and GRID3 (205788, 5)

After: WSFE (584286, 5) and GRID3 (205788, 5)



In [8]:
# The Bounded_ID is our new unique settlement identifier for subsequent matching steps.
GRID3_ADM['Bounded_ID'] = GRID3_ADM.index
WSFE_ADM['WSFE_ID'] = WSFE_ADM.index
GRID3_ADM = GRID3_ADM[['Sett_ID', 'Bounded_ID', 'ADM_ID', 'geometry']]
WSFE_ADM = WSFE_ADM[['year', 'ADM_ID', 'geometry']]

---

## 4. UNIQUE SETTLEMENTS FROM WSFE AND GRID3: TWO VERSIONS

Note that there are 2 versions here, so that we can create a fragmentation index:
1. **Boundless, aka boundary-agnostic settlements**: Unique settlements are linked to GRID3 settlement IDs. Administrative areas do not influence the extents of the settlement.
2. **Bounded, aka politically-defined settlements**: Settlements in the Boundless dataset which spread across more than one administrative area are split into separate settlements in the Bounded dataset. The largest polygon after the split is considered the "principal" settlement, and polygons in other admin areas are considered "fragments." By dividing the fragment area(s) of the Bounded settlement by the area of the Boundless settlement, we can acquire a fragmentation index for each locality.
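
The fragmentation index described above can be sketched in a few lines. This is an illustration with made-up areas, not the notebook's implementation. It assumes that the Boundless area equals the sum of its Bounded parts, and `fragmentation_index` is a hypothetical helper.

```python
def fragmentation_index(part_areas):
    """Share of a settlement's area lying outside its principal (largest) part.

    part_areas: areas of the Bounded pieces sharing one Sett_ID, one per admin area.
    """
    total = sum(part_areas)
    if total <= 0:
        raise ValueError("settlement has no area")
    principal = max(part_areas)
    return (total - principal) / total

# Made-up areas in square kilometers, keyed by a hypothetical Sett_ID.
bounded_parts = {
    101: [12.0],             # entirely inside one admin area
    102: [6.0, 3.0, 1.0],    # principal part plus two fragments
}
indices = {sett: fragmentation_index(areas) for sett, areas in bounded_parts.items()}
print(indices)
```

A value of 0 means the settlement sits entirely within one administrative area. Values approaching 1 mean most of its area lies in fragments outside the principal part.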

### 4.1 BOUNDED SETTLEMENTS: Near Join by ADM group.
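
The idea behind the grouped near join is that a built-up feature may only be matched to a settlement in the same admin area, and only within a maximum distance. The sketch below illustrates that logic with points and straight-line distances in plain Python. It is a simplification for illustration (the notebook joins polygons with `gpd.sjoin_nearest`), and all names and coordinates are made up.

```python
import math
from collections import defaultdict

def near_join_by_group(left, right, max_distance):
    """For each left record, find the nearest right record within the same group.

    left, right: lists of dicts with 'id', 'group', 'x', 'y'.
    Returns {left_id: right_id or None}; None means no candidate within max_distance.
    """
    # Shard the right-hand records by group so each lookup only scans its own group.
    shards = defaultdict(list)
    for rec in right:
        shards[rec["group"]].append(rec)

    matches = {}
    for rec in left:
        best_id, best_dist = None, max_distance
        for cand in shards.get(rec["group"], []):
            dist = math.dist((rec["x"], rec["y"]), (cand["x"], cand["y"]))
            if dist <= best_dist:
                best_id, best_dist = cand["id"], dist
        matches[rec["id"]] = best_id
    return matches

settlements = [
    {"id": "S1", "group": 243, "x": 0.0, "y": 0.0},
    {"id": "S2", "group": 131, "x": 100.0, "y": 0.0},
]
builtup = [
    {"id": "B1", "group": 243, "x": 90.0, "y": 0.0},    # closer to S2, but S2 is in another group
    {"id": "B2", "group": 243, "x": 900.0, "y": 0.0},   # farther than max_distance from S1
]
print(near_join_by_group(builtup, settlements, max_distance=500))
```

Unlike this sketch, which keeps a single match per record, `sjoin_nearest` returns one row for every equidistant nearest match.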

In [10]:
# The sharding step below doesn't work if any ADM group contains features from only one of the two datasets.
WSFE_u = sorted(WSFE_ADM.ADM_ID.unique().tolist())
GRID3_u = sorted(GRID3_ADM.ADM_ID.unique().tolist())

not_matching = list(set(GRID3_u).symmetric_difference(set(WSFE_u)))
print(not_matching) # Validate: If there are many ADM_IDs in this list, investigate why GRID3 or WSFE is missing in so many areas.

# Keep only the features whose ADM_ID appears in both datasets.
WSFE_matching = WSFE_ADM[~WSFE_ADM["ADM_ID"].isin(not_matching)] 
GRID3_matching = GRID3_ADM[~GRID3_ADM["ADM_ID"].isin(not_matching)]

WSFE_u = sorted(WSFE_matching.ADM_ID.unique().tolist())
GRID3_u = sorted(GRID3_matching.ADM_ID.unique().tolist())

not_matching = list(set(GRID3_u).symmetric_difference(set(WSFE_u)))
print(not_matching) # This should now be empty.

del WSFE_u, GRID3_u, not_matching, WSFE_ADM, GRID3_ADM

[43]
[]


In [11]:
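# Shard the dataframe whose attributes we want to join (GRID3) into a dict keyed by ADM_ID.
shards = {k: d for k, d in GRID3_matching.groupby('ADM_ID', as_index=False)}

# Take the dataframe whose geometry we want to retain (WSFE).
# Group it by ADM_ID, then run sjoin_nearest against the GRID3 shard with the same ADM_ID.
# max_distance is in CRS units (meters here); features with no settlement within 500 m get NaN.
# Equidistant matches each produce a row, so the result can have more rows than the input.
Bounded = WSFE_matching.groupby('ADM_ID', as_index=False).apply(
    lambda d: gpd.sjoin_nearest(
        d, shards[d['ADM_ID'].values[0]],
        how='left',
        max_distance=500))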

print(Bounded.info())
print(Bounded.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
MultiIndex: 586113 entries, (0, 560868) to (356, 3948)
Data columns (total 7 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   year          586113 non-null  int32   
 1   ADM_ID_left   586113 non-null  int32   
 2   geometry      586113 non-null  geometry
 3   index_right   581895 non-null  float64 
 4   Sett_ID       581895 non-null  float64 
 5   Bounded_ID    581895 non-null  float64 
 6   ADM_ID_right  581895 non-null  float64 
dtypes: float64(4), geometry(1), int32(2)
memory usage: 50.8 MB
None
            year  ADM_ID_left  \
338 493162  1995          341   
286 229845  2006          289   
344 418376  1999          347   
118 352670  2015          120   
311 305584  1985          314   
177 344591  2007          179   
348 460159  2005          351   
76  582967  2012           78   
169 394489  2006          171   
174 369227  2010          176   

                             

In [12]:
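# Each WSFE feature now carries a settlement ID, so we can dissolve by year and Bounded_ID.
# Features left unmatched by the near join (NaN Bounded_ID) drop out here, since dissolve() excludes NaN group keys by default.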
Bounded = Bounded.dissolve(by=['year', 'Bounded_ID'], as_index=False)
print(Bounded.info(), Bounded.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 86839 entries, 0 to 86838
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype   
---  ------        --------------  -----   
 0   year          86839 non-null  int64   
 1   Bounded_ID    86839 non-null  float64 
 2   geometry      86839 non-null  geometry
 3   ADM_ID_left   86839 non-null  int32   
 4   index_right   86839 non-null  float64 
 5   Sett_ID       86839 non-null  float64 
 6   ADM_ID_right  86839 non-null  float64 
dtypes: float64(4), geometry(1), int32(1), int64(1)
memory usage: 4.3 MB
None        year  Bounded_ID                                           geometry  \
11200  1985    191769.0  MULTIPOLYGON (((-1138897.861 1310016.578, -113...   
7834   1985     61487.0  POLYGON ((-1540628.944 682539.408, -1540600.69...   
19188  1996    159211.0  POLYGON ((-1217735.679 1185867.906, -1217707.4...   
81407  2014    179149.0  MULTIPOLYGON (((-1152851.957 1282006.463, -115...   
78018  2014      578

In [15]:
# Clean up and save to file.
Bounded = Bounded.rename(
    columns={"ADM_ID_left": "ADM_ID"})[['ADM_ID', 'year', 'Bounded_ID', 'Sett_ID', 'geometry']]
Bounded.to_file(
    driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

In [None]:
del WSFE_matching, GRID3_matching, shards

### 4.2 BOUNDLESS SETTLEMENTS: Simple near join.

In [16]:
# Fragments of any bounded settlement are combined into a single "boundless" settlement in this version.
# The grouping is based on "Sett_ID", which is carried over directly from the GRID3 settlement features.
Boundless = Bounded.dissolve(by=['year', 'Sett_ID'], as_index=False)
print(Boundless.info(), Boundless.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 83061 entries, 0 to 83060
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   year        83061 non-null  int64   
 1   Sett_ID     83061 non-null  float64 
 2   geometry    83061 non-null  geometry
 3   ADM_ID      83061 non-null  int32   
 4   Bounded_ID  83061 non-null  float64 
dtypes: float64(2), geometry(1), int32(1), int64(1)
memory usage: 2.9 MB
None        year   Sett_ID                                           geometry  \
80183  2015   33901.0  MULTIPOLYGON (((-1338944.235 411511.506, -1338...   
26068  2000    6321.0  POLYGON ((-1001955.638 252207.926, -1001927.39...   
67007  2012   37880.0  POLYGON ((-1135310.472 389729.374, -1135282.22...   
37216  2003   63066.0  MULTIPOLYGON (((-1422216.857 535027.896, -1422...   
29526  2001    7283.0  POLYGON ((-1083392.195 404366.714, -1083335.70...   
82500  2015  175813.0  MULTIPOLYGON (((-1152908.451 1282

In [17]:
# Clean up and save to file.
Boundless.to_file(driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

---

## 5. CUMULATIVE ANNUALIZED SETTLEMENT EXTENTS
DISSOLVE BY YEAR SETS: Create separate feature layers of each cumulative year.

### 5.1 Define study years for each for loop.

In [4]:
# Boundless = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

def CreateList(r1, r2):
    # Inclusive list of years from r1 through r2.
    return list(range(r1, r2 + 1))

CuStart, CuEnd = Boundless['year'].min(), Boundless['year'].max()
StudyStart, StudyEnd = 1999, Boundless['year'].max()

AllCuYears = CreateList(CuStart, CuEnd) # All years in the WSFE dataset
AllStudyYears = CreateList(StudyStart, StudyEnd) # All years for which there will be growth stats in the present study.
print(AllCuYears, '\n\n', AllStudyYears)

# Study years in descending order, excluding the final year (StudyEnd).
ReversedStudyYears = AllStudyYears[-2::-1]
print('\n\n', ReversedStudyYears)

[1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015] [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]


### 5.2 Starting with main Boundless dataset, create a cumulative area feature layer for each year.
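
As a conceptual illustration of "cumulative" extents: the layer for a given study year contains everything first detected in that year or any earlier year. The plain-Python sketch below uses sets of made-up cell IDs in place of geometries, with set union standing in for the dissolve. `cumulative_extents` is a hypothetical helper.

```python
def cumulative_extents(cells_by_year, study_years):
    """Map each study year to the union of all cells detected in that year or earlier."""
    result = {}
    for study_year in study_years:
        merged = set()
        for year, cells in cells_by_year.items():
            if year <= study_year:
                merged |= cells
        result[study_year] = merged
    return result

# Made-up detections: the year in which each cell first appears as built-up.
detections = {1985: {"a", "b"}, 1999: {"c"}, 2001: {"d", "e"}}
extents = cumulative_extents(detections, study_years=[1999, 2000, 2001])
for year, cells in extents.items():
    print(year, sorted(cells))
```

A year with no new detections (2000 in this toy data) simply repeats the previous year's extent, which is why cumulative areas never shrink.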

In [19]:
# For each year in the growth stats study, we are taking features from all years prior to and including that year, 
# dissolving those features, and exporting as its own file.

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    CuYearSet = Boundless[Boundless['year'].between(
        CuStart, item, inclusive='both')] # inclusive='both' keeps the endpoint years (CuStart and "item"); the boolean form inclusive=True is deprecated.
    print('Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    CuYearDissolve = CuYearSet.dissolve(by='Sett_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min"}, # Though ADM_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    CuYearName = ''.join(['Cu', str(item), '_Boundless'])
    CuYearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=CuYearName)
    del CuYearSet, CuYearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1999. Thu Nov 17 14:10:45 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:10:45 2022





Write to file. Thu Nov 17 14:11:10 2022

Subsetting to cumulative area for year: 2000. Thu Nov 17 14:11:17 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:11:17 2022





Write to file. Thu Nov 17 14:11:44 2022

Subsetting to cumulative area for year: 2001. Thu Nov 17 14:11:51 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:11:51 2022





Write to file. Thu Nov 17 14:12:20 2022

Subsetting to cumulative area for year: 2002. Thu Nov 17 14:12:27 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:12:27 2022





Write to file. Thu Nov 17 14:12:56 2022

Subsetting to cumulative area for year: 2003. Thu Nov 17 14:13:03 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:13:03 2022





Write to file. Thu Nov 17 14:13:35 2022

Subsetting to cumulative area for year: 2004. Thu Nov 17 14:13:41 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:13:41 2022





Write to file. Thu Nov 17 14:14:13 2022

Subsetting to cumulative area for year: 2005. Thu Nov 17 14:14:20 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:14:20 2022





Write to file. Thu Nov 17 14:14:55 2022

Subsetting to cumulative area for year: 2006. Thu Nov 17 14:15:02 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:15:02 2022





Write to file. Thu Nov 17 14:15:38 2022

Subsetting to cumulative area for year: 2007. Thu Nov 17 14:15:44 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:15:44 2022





Write to file. Thu Nov 17 14:16:21 2022

Subsetting to cumulative area for year: 2008. Thu Nov 17 14:16:28 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:16:28 2022





Write to file. Thu Nov 17 14:17:05 2022

Subsetting to cumulative area for year: 2009. Thu Nov 17 14:17:12 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:17:12 2022





Write to file. Thu Nov 17 14:17:50 2022

Subsetting to cumulative area for year: 2010. Thu Nov 17 14:17:56 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:17:56 2022





Write to file. Thu Nov 17 14:18:36 2022

Subsetting to cumulative area for year: 2011. Thu Nov 17 14:18:42 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:18:43 2022





Write to file. Thu Nov 17 14:19:25 2022

Subsetting to cumulative area for year: 2012. Thu Nov 17 14:19:31 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:19:31 2022





Write to file. Thu Nov 17 14:20:14 2022

Subsetting to cumulative area for year: 2013. Thu Nov 17 14:20:21 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:20:21 2022



  CuYearSet = Settlements[Settlements['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Nov 17 14:21:03 2022

Subsetting to cumulative area for year: 2014. Thu Nov 17 14:21:09 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:21:09 2022



  CuYearSet = Settlements[Settlements['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Nov 17 14:21:55 2022

Subsetting to cumulative area for year: 2015. Thu Nov 17 14:22:01 2022

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Nov 17 14:22:01 2022



  CuYearSet = Settlements[Settlements['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Thu Nov 17 14:22:48 2022

Done with all years in set. Thu Nov 17 14:22:54 2022


##### Join area information from each cumulative layer onto the latest year dataset.

In [21]:
# The latest year in the study contains all settlements. Merge all other years' areas onto this dataset.
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=
                          ''.join(['Cu', str(StudyEnd), '_Boundless'])) 
SettAreas['Area15'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry') # We have settlement IDs, so no need to join spatially!


for item in ReversedStudyYears:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Boundless']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['Area', str(item)[2:]])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Sett_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, StudyEnd, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Sett_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))


print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s.csv' % (StudyStart, StudyEnd)))

Loading cumulative layer for year 2014. Thu Nov 17 14:22:55 2022

Adding area field and converting to non-spatial dataframe. Thu Nov 17 14:22:57 2022

Merging variables from 2014 onto our latest year (2015) via table join. Thu Nov 17 14:22:57 2022

Loading cumulative layer for year 2013. Thu Nov 17 14:22:57 2022

Adding area field and converting to non-spatial dataframe. Thu Nov 17 14:22:58 2022

Merging variables from 2013 onto our latest year (2015) via table join. Thu Nov 17 14:22:58 2022

Loading cumulative layer for year 2012. Thu Nov 17 14:22:58 2022

Adding area field and converting to non-spatial dataframe. Thu Nov 17 14:22:59 2022

Merging variables from 2012 onto our latest year (2015) via table join. Thu Nov 17 14:22:59 2022

Loading cumulative layer for year 2011. Thu Nov 17 14:22:59 2022

Adding area field and converting to non-spatial dataframe. Thu Nov 17 14:23:00 2022

Merging variables from 2011 onto our latest year (2015) via table join. Thu Nov 17 14:23:00 2022

Load
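The table join used above can be illustrated on toy data. This is a minimal sketch with invented settlement IDs and areas (not values from the study): a left merge keeps every settlement present in the latest year and leaves `NaN` wherever a settlement did not yet exist in an earlier cumulative layer.

```python
import pandas as pd

# Hypothetical toy tables; IDs and areas are invented for illustration.
latest = pd.DataFrame({"Sett_ID": [1, 2, 3], "Area15": [4.0, 1.5, 0.2]})
earlier = {
    2014: pd.DataFrame({"Sett_ID": [1, 2], "Area14": [3.8, 1.1]}),
    2013: pd.DataFrame({"Sett_ID": [1], "Area13": [3.5]}),
}

areas = latest.copy()
for year, table in earlier.items():
    # Left join: every latest-year settlement is kept; unmatched years become NaN.
    areas = areas.merge(table, how="left", on="Sett_ID")

print(areas)
```

This is one way to read the declining non-null counts in the `info()` output: settlements that appear later simply have no area recorded for earlier years.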

In [None]:
del SettAreas

### 5.3 Repeat for Bounded dataset.

In [8]:
# Bounded = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive='both')] # inclusive='both' means we include the years 1985 and "item" rather than only the years between them. (The boolean form is deprecated in recent pandas.)
    print('Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    CuYearDissolve = CuYearSet.dissolve(by='Bounded_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min", "Sett_ID":"min"}, # Though ADM_ID and Sett_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    CuYearName = ''.join(['Cu', str(item), '_Bounded'])
    CuYearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=CuYearName)
    del CuYearSet, CuYearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1999. Sat Nov 19 15:42:00 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:42:00 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:42:53 2022

Subsetting to cumulative area for year: 2000. Sat Nov 19 15:43:13 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:43:13 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:44:07 2022

Subsetting to cumulative area for year: 2001. Sat Nov 19 15:44:25 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:44:25 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:45:25 2022

Subsetting to cumulative area for year: 2002. Sat Nov 19 15:45:43 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:45:43 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:46:50 2022

Subsetting to cumulative area for year: 2003. Sat Nov 19 15:47:09 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:47:09 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:48:11 2022

Subsetting to cumulative area for year: 2004. Sat Nov 19 15:48:25 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:48:25 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:49:32 2022

Subsetting to cumulative area for year: 2005. Sat Nov 19 15:49:50 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:49:50 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:51:01 2022

Subsetting to cumulative area for year: 2006. Sat Nov 19 15:51:20 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:51:20 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:52:28 2022

Subsetting to cumulative area for year: 2007. Sat Nov 19 15:52:44 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:52:44 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:54:02 2022

Subsetting to cumulative area for year: 2008. Sat Nov 19 15:54:20 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:54:20 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:55:35 2022

Subsetting to cumulative area for year: 2009. Sat Nov 19 15:55:49 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:55:49 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:57:03 2022

Subsetting to cumulative area for year: 2010. Sat Nov 19 15:57:18 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:57:18 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 15:58:42 2022

Subsetting to cumulative area for year: 2011. Sat Nov 19 15:58:57 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 15:58:57 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 16:00:23 2022

Subsetting to cumulative area for year: 2012. Sat Nov 19 16:00:41 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 16:00:41 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 16:02:15 2022

Subsetting to cumulative area for year: 2013. Sat Nov 19 16:02:30 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 16:02:30 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 16:03:51 2022

Subsetting to cumulative area for year: 2014. Sat Nov 19 16:04:06 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 16:04:06 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 16:05:29 2022

Subsetting to cumulative area for year: 2015. Sat Nov 19 16:05:44 2022

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Sat Nov 19 16:05:44 2022



  CuYearSet = Settlements_Bounded[Settlements_Bounded['year'].between(CuStart, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Sat Nov 19 16:08:00 2022

Done with all years in set. Sat Nov 19 16:08:24 2022
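The cumulative subsetting in the loop above depends on how `Series.between` treats its endpoints. The sketch below uses invented rows and stand-in variable names (`cu_start`, `item`), and assumes pandas 1.3 or later, where `inclusive` accepts the strings `"both"`, `"neither"`, `"left"`, and `"right"`.

```python
import pandas as pd

# Hypothetical toy records: one row per built-up patch with its first-detection year.
patches = pd.DataFrame({"Sett_ID": [1, 1, 2, 3], "year": [1985, 2001, 2000, 2002]})

cu_start, item = 1985, 2001  # stand-ins for CuStart and the loop variable

# inclusive="both" keeps rows where year equals either endpoint.
cumulative = patches[patches["year"].between(cu_start, item, inclusive="both")]
# inclusive="neither" drops both endpoints instead.
strict = patches[patches["year"].between(cu_start, item, inclusive="neither")]

print(cumulative["year"].tolist())  # [1985, 2001, 2000]
print(strict["year"].tolist())      # [2000]
```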


In [10]:
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', 
                          layer=''.join(['Cu', str(StudyEnd), '_Bounded']))
SettAreas['Area15'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry')


for item in ReversedStudyYears:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Bounded']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['Area', str(item)[2:]])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Bounded_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, StudyEnd, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Bounded_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))

print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s_%s.csv' % (StudyStart, StudyEnd, 'Bounded')))

Loading cumulative layer for year 2014. Sat Nov 19 16:08:30 2022

Adding area field and converting to non-spatial dataframe. Sat Nov 19 16:08:35 2022

Merging variables from 2014 onto our latest year (2015) via table join. Sat Nov 19 16:08:36 2022

Loading cumulative layer for year 2013. Sat Nov 19 16:08:36 2022

Adding area field and converting to non-spatial dataframe. Sat Nov 19 16:08:41 2022

Merging variables from 2013 onto our latest year (2015) via table join. Sat Nov 19 16:08:41 2022

Loading cumulative layer for year 2012. Sat Nov 19 16:08:41 2022

Adding area field and converting to non-spatial dataframe. Sat Nov 19 16:08:46 2022

Merging variables from 2012 onto our latest year (2015) via table join. Sat Nov 19 16:08:46 2022

Loading cumulative layer for year 2011. Sat Nov 19 16:08:46 2022

Adding area field and converting to non-spatial dataframe. Sat Nov 19 16:08:51 2022

Merging variables from 2011 onto our latest year (2015) via table join. Sat Nov 19 16:08:51 2022

Load

In [None]:
del SettAreas

### 5.4 One settlement geofile to rule them all. ...and in the Sett_ID bind them.
The annualized values can be stored as separate non-spatial dataframes. Their Sett_IDs are used to join them, together with the place names, onto this single geometry layer for the summary statistics.

In [None]:
Settlements = gpd.read_file('AnnualSettlements.gpkg', 
                           layer=''.join(['Cu', str(StudyEnd), '_Boundless']))[['Sett_ID', 'ADM_ID', 'geometry']]
print(Settlements.info())
Settlements.to_file(driver='GPKG', 
                       filename=r'Results/SETTLEMENTS.gpkg', 
                       layer='SETTLEMENTS')

---

## 6. PLACE NAMES
Join urban place names from UCDB, Africapolis, and GeoNames onto the settlement vectors.

### 6.1 Load placename datasets, filter, and project.

In [85]:
# Load, pull name field, rename, and reproject to match the catchments CRS.
UCDB = gpd.read_file('PlaceName/GHS_STAT_UCDB2015MT_GLOBE_R2019A_V1_2.gpkg', 
                     layer=0)[['UC_NM_MN', 'geometry']].rename(
    columns={"UC_NM_MN": "UCDB_Name"}).to_crs("ESRI:102022")

Africapolis = gpd.read_file('PlaceName/AFRICAPOLIS2020.shp')[['agglosName', 'geometry']].rename(
    columns={"agglosName": "Afpl_Name"}).to_crs("ESRI:102022")

GeoNames = gpd.read_file('PlaceName/GeoNames.gpkg', 
                         layer=0)[['GeoName', 'geometry']].to_crs("ESRI:102022")

print(UCDB.info(), Africapolis.info(), GeoNames.info())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13135 entries, 0 to 13134
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   UCDB_Name  13135 non-null  object  
 1   geometry   13135 non-null  geometry
dtypes: geometry(1), object(1)
memory usage: 205.4+ KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 7720 entries, 0 to 7719
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   Afpl_Name  7720 non-null   object  
 1   geometry   7720 non-null   geometry
dtypes: geometry(1), object(1)
memory usage: 120.8+ KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 199390 entries, 0 to 199389
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   GeoName   199390 non-null  object  
 1   geometry  199390 non-null  geometry
dtypes: geometry(1), object(1)
memory usage: 3.0+ 

### 6.2 Join placenames onto settlements geodataframe.

In [None]:
SettlementsNamed = gpd.sjoin(Settlements, GeoNames, 
                             how='left', predicate='contains', # Name file is point type, so 'contains' is sufficient.
                             lsuffix="G3", rsuffix="GN") 
SettlementsNamed = gpd.sjoin(SettlementsNamed, Africapolis, 
                             how='left', predicate='intersects', # Name file is polygon type.
                             lsuffix="G3", rsuffix="Af") 
SettlementsNamed = gpd.sjoin(SettlementsNamed, UCDB, 
                             how='left', predicate='intersects', # Name file is polygon type.
                             lsuffix="G3", rsuffix="UC") 

In [None]:
print(SettlementsNamed.info())
print(SettlementsNamed.sample(10))

In [None]:
print(SettlementsNamed['GeoName'].count(), 
      SettlementsNamed['Afpl_Name'].count(), 
      SettlementsNamed['UCDB_Name'].count())

In [None]:
SettlementsNamed = SettlementsNamed[['year', 'Sett_ID', 'GeoName', 'Afpl_Name', 'UCDB_Name', 'geometry']]

### 6.3 Reduce to single name column.

In [None]:
# Create a single name column where non-named settlements are "UNK" but all others use one of the three name sources.
SettlementsNamed['SettName'] = "UNK"

# Priority 1: Africapolis name wherever one exists.
SettlementsNamed.loc[
    SettlementsNamed['Afpl_Name'].notna(), 
    'SettName'] = SettlementsNamed['Afpl_Name']

# Priority 2: UCDB name for settlements still unnamed. Parentheses are needed because & binds tighter than ==.
SettlementsNamed.loc[
    (SettlementsNamed['SettName'] == "UNK") & SettlementsNamed['UCDB_Name'].notna(), 
    'SettName'] = SettlementsNamed['UCDB_Name']

# Priority 3: GeoNames name for whatever remains unnamed.
SettlementsNamed.loc[
    (SettlementsNamed['SettName'] == "UNK") & SettlementsNamed['GeoName'].notna(), 
    'SettName'] = SettlementsNamed['GeoName']

SettlementsNamed.sample(20)
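The priority order used above (Africapolis, then UCDB, then GeoNames, otherwise "UNK") can also be written as a chain of `fillna` calls. This is only a sketch on an invented table, not the notebook's method; the column names match the notebook and the values are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table; the names are illustrative only.
names = pd.DataFrame({
    "Afpl_Name": ["Douala", np.nan, np.nan, np.nan],
    "UCDB_Name": ["Douala", "Yaounde", np.nan, np.nan],
    "GeoName":   ["Douala", "Yaounde", "Mbalmayo", np.nan],
})

# Take the first non-missing value in priority order, then fall back to "UNK".
names["SettName"] = (
    names["Afpl_Name"]
    .fillna(names["UCDB_Name"])
    .fillna(names["GeoName"])
    .fillna("UNK")
)
print(names["SettName"].tolist())  # ['Douala', 'Yaounde', 'Mbalmayo', 'UNK']
```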

In [None]:
# Drop geometry and save to file.
SettlementsNamed = pd.DataFrame(SettlementsNamed).drop(columns='geometry')
SettlementsNamed.to_csv(r'Results/PlaceNames.csv')

---

## 6. CREATE FRAGMENTATION INDEX
We determine what percentage of each settlement's area lies outside of its administrative zone in each year.
The index ranges from 0 to 100: it is the percentage of the settlement's area that is fragmented.

For each Sett_ID:
((Area of Boundless settlement - Area of largest Bounded settlement feature) / Area of Boundless settlement) * 100
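As a worked example of the formula, the sketch below applies it to invented areas in km². The function name is hypothetical. Treating a missing or zero Boundless area as an index of 0, and truncating to an integer, mirror the `fillna(0)` and `astype('int')` steps used later in this section.

```python
def fragmentation_index(boundless_area, largest_bounded_area):
    """Percent of a settlement's area lying outside its largest bounded piece."""
    if not boundless_area:  # settlement absent (None) or zero area that year
        return 0
    share = (boundless_area - largest_bounded_area) / boundless_area
    return int(share * 100)  # truncated toward zero, not rounded

# Hypothetical areas in km^2.
print(fragmentation_index(8.0, 6.0))  # 25
print(fragmentation_index(3.0, 2.0))  # 33
print(fragmentation_index(5.0, 5.0))  # 0
```

A settlement that sits entirely inside one administrative zone has identical Boundless and largest Bounded areas, so its index is 0.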

In [78]:
FragIndices = pd.DataFrame(
    gpd.read_file('AnnualizedOntoLatestYear.gpkg', 
                  layer=('Areas%sto%s_%s' % (StudyStart, StudyEnd, 'Boundless')))).drop(
    columns='geometry') # We can assign geometries back to the settlements after calculating fragmentation.
print('Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. %s' % time.ctime())
print(FragIndices.info())
Bounded = pd.DataFrame(
    gpd.read_file('AnnualizedOntoLatestYear.gpkg', 
                  layer=('Areas%sto%s_%s' % (StudyStart, StudyEnd, 'Bounded')))).drop(
    columns='geometry')
print('Loaded Bounded dataset, which will factor into the fragmentation calculation. %s' % time.ctime())
print(Bounded.info())

Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. Sun Nov 20 09:37:43 2022
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13753 entries, 0 to 13752
Data columns (total 20 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Sett_ID  13753 non-null  float64
 1   year     13753 non-null  int64  
 2   ADM_ID   13753 non-null  int64  
 3   Area15   13753 non-null  float64
 4   Area14   13649 non-null  float64
 5   Area13   13511 non-null  float64
 6   Area12   13388 non-null  float64
 7   Area11   13262 non-null  float64
 8   Area10   13164 non-null  float64
 9   Area09   13059 non-null  float64
 10  Area08   12992 non-null  float64
 11  Area07   12893 non-null  float64
 12  Area06   12801 non-null  float64
 13  Area05   12714 non-null  float64
 14  Area04   12632 non-null  float64
 15  Area03   12569 non-null  float64
 16  Area02   12494 non-null  float64
 17  Area01   12415 non-null  float64


In [79]:
LargestFragments = Bounded.loc[Bounded.groupby(["Sett_ID"])["Area15"].idxmax()] 
print(LargestFragments.info())
print("Filtered the Bounded dataset to only rows where latest year's area is largest for each Sett_ID. %s" % time.ctime())
LargestFragments.columns = LargestFragments.columns.str.replace('Area', 'Largest')
LargestFragments = LargestFragments.drop(columns=['year', 'ADM_ID'])
print("Renamed columns to avoid duplication during merge, and dropped unnecessary columns. %s" % time.ctime())
FragIndices = FragIndices.merge(LargestFragments, how='left', on='Sett_ID')
print(FragIndices.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13753 entries, 0 to 14467
Data columns (total 21 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Bounded_ID  13753 non-null  float64
 1   year        13753 non-null  int64  
 2   ADM_ID      13753 non-null  int64  
 3   Sett_ID     13753 non-null  float64
 4   Area15      13753 non-null  float64
 5   Area14      13649 non-null  float64
 6   Area13      13510 non-null  float64
 7   Area12      13386 non-null  float64
 8   Area11      13260 non-null  float64
 9   Area10      13162 non-null  float64
 10  Area09      13057 non-null  float64
 11  Area08      12989 non-null  float64
 12  Area07      12889 non-null  float64
 13  Area06      12799 non-null  float64
 14  Area05      12712 non-null  float64
 15  Area04      12630 non-null  float64
 16  Area03      12566 non-null  float64
 17  Area02      12491 non-null  float64
 18  Area01      12408 non-null  float64
 19  Area00      12314 non-nul
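The `groupby(...).idxmax()` selection above can be checked on a toy table. The IDs and areas below are invented; settlement 1 is split into two bounded pieces and settlement 2 has one.

```python
import pandas as pd

# Hypothetical toy data: settlement 1 is split across two administrative zones.
bounded = pd.DataFrame({
    "Bounded_ID": [10, 11, 20],
    "Sett_ID":    [1, 1, 2],
    "Area15":     [6.0, 2.0, 3.0],
})

# idxmax returns, per Sett_ID, the index label of the row with the largest Area15.
largest = bounded.loc[bounded.groupby("Sett_ID")["Area15"].idxmax()]
print(largest["Bounded_ID"].tolist())  # [10, 20]
```

Because the selection is made on `Area15`, the same bounded piece is used for every year, even if a different piece happened to be larger in an earlier year.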

In [82]:
for item in AllStudyYears:
    YY = str(item)[2:] # 2-digit year
    AreaYY = ''.join(["Area", YY]) # The Boundless area variable name
    LargestYY = ''.join(['Largest', YY]) # The Bounded largest area variable name
    FragYY = ''.join(["Frag", YY]) # Name for the fragmentation index variable
    print("Created names for Year %s's variables and temporary objects. %s" % (item, time.ctime()))
    
    FragIndices[FragYY] = ((FragIndices[AreaYY] - FragIndices[LargestYY]) / FragIndices[AreaYY]) * 100
    FragIndices[FragYY] = (FragIndices[FragYY].fillna(0).replace([np.inf, -np.inf], 0)).astype('int')
    print("Calculated fragmentation index for year %s. %s" % (item, time.ctime()))

FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Largest')]
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Area')]

print('Completed fragmentation index calculations for all years. %s' % time.ctime())

Created names for Year 1999's variables and temporary objects. Sun Nov 20 09:41:24 2022
Calculated fragmentation index for year 1999. Sun Nov 20 09:41:24 2022
Created names for Year 2000's variables and temporary objects. Sun Nov 20 09:41:24 2022
Calculated fragmentation index for year 2000. Sun Nov 20 09:41:24 2022
Created names for Year 2001's variables and temporary objects. Sun Nov 20 09:41:24 2022
Calculated fragmentation index for year 2001. Sun Nov 20 09:41:24 2022
Created names for Year 2002's variables and temporary objects. Sun Nov 20 09:41:24 2022
Calculated fragmentation index for year 2002. Sun Nov 20 09:41:24 2022
Created names for Year 2003's variables and temporary objects. Sun Nov 20 09:41:24 2022
Calculated fragmentation index for year 2003. Sun Nov 20 09:41:24 2022
Created names for Year 2004's variables and temporary objects. Sun Nov 20 09:41:24 2022
Calculated fragmentation index for year 2004. Sun Nov 20 09:41:24 2022
Created names for Year 2005's variables and te

In [83]:
print(FragIndices.info())
print(FragIndices.sample(15))

Settlements = gpd.read_file('AnnualizedOntoLatestYear.gpkg', 
                  layer=('Areas%sto%s_%s' % (StudyStart, StudyEnd, 'Boundless')))[['Sett_ID', 'geometry']]
Settlements = Settlements.merge(FragIndices, how='left', on='Sett_ID')
Settlements.to_file(driver='GPKG', 
                    filename='AnnualizedOntoLatestYear.gpkg', 
                    layer=('FragIndex%sto%s' % (StudyStart, StudyEnd)))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13753 entries, 0 to 13752
Data columns (total 21 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Sett_ID     13753 non-null  float64
 1   year        13753 non-null  int64  
 2   ADM_ID      13753 non-null  int64  
 3   Bounded_ID  13753 non-null  float64
 4   Frag99      13753 non-null  int32  
 5   Frag00      13753 non-null  int32  
 6   Frag01      13753 non-null  int32  
 7   Frag02      13753 non-null  int32  
 8   Frag03      13753 non-null  int32  
 9   Frag04      13753 non-null  int32  
 10  Frag05      13753 non-null  int32  
 11  Frag06      13753 non-null  int32  
 12  Frag07      13753 non-null  int32  
 13  Frag08      13753 non-null  int32  
 14  Frag09      13753 non-null  int32  
 15  Frag10      13753 non-null  int32  
 16  Frag11      13753 non-null  int32  
 17  Frag12      13753 non-null  int32  
 18  Frag13      13753 non-null  int32  
 19  Frag14      13753 non-nul

In [84]:
Settlements.sample(5)

| | Sett_ID | geometry | year | ADM_ID | Bounded_ID | Frag99 | Frag00 | Frag01 | Frag02 | Frag03 | ... | Frag06 | Frag07 | Frag08 | Frag09 | Frag10 | Frag11 | Frag12 | Frag13 | Frag14 | Frag15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 221 | 766.0 | MULTIPOLYGON (((-1515714.950 357166.824, -1515... | 1985 | 4 | 655.0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 12485 | 191351.0 | MULTIPOLYGON (((-1061754.872 1428537.936, -106... | 2015 | 279 | 195070.0 | 3 | 3 | 3 | 3 | 2 | ... | 3 | 3 | 3 | 4 | 4 | 4 | 5 | 7 | 7 | 7 |
| 244 | 832.0 | POLYGON ((-1517212.050 367694.328, -1517183.80... | 1985 | 4 | 728.0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3176 | 12922.0 | MULTIPOLYGON (((-1599213.550 492380.440, -1599... | 2015 | 113 | 13434.0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10243 | 79589.0 | MULTIPOLYGON (((-1578734.360 682001.968, -1578... | 2015 | 164 | 81596.0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


---

## 7. PREPARE YEARLY DATASETS
Population, nighttime lights, and any other annualized rasters

### 7.1 Buffer the area of the Boundless dataset's latest year to mask raster data in next section.
The Bounded dataset would also be fine for our purposes here. The buffer is dissolved to a single feature to be used for its total extents, which are identical between Bounded & Boundless datasets.

In [7]:
# Create buffer layer(s) to use as maximum distance for Near joins.

Distance = 2000

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = gpd.read_file('AnnualSettlements.gpkg', layer=''.join(['Cu', str(StudyEnd), '_Boundless']))
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) #.dissolve() # make_valid repairs any invalid geometries before buffering.
print('Finished buffer layer creation. %s' % time.ctime())
BufferFileName = ''.join(['Buff', str(Distance/1000), 'km_', str(StudyEnd)])
BufferLayer.to_file(driver='GPKG', filename='Catchment.gpkg', layer=BufferFileName)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Thu Nov 17 16:42:08 2022
Finished buffer layer creation. Thu Nov 17 18:05:13 2022
Saved to file. Thu Nov 17 18:05:18 2022


### 7.2 Population: Reproject and reclassify with settlement buffer mask.
Reclassify cells beyond the settlement buffer as NoData, so that we only need to work with cells within the buffer distance of settlements.
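The idea behind the masking step can be shown with plain NumPy, independent of any raster library. The array, the mask, and the `NODATA` constant below are all invented for illustration; in the notebook the mask comes from the buffer polygons rather than from a hand-made boolean array.

```python
import numpy as np

NODATA = -99999.0  # assumed NoData value for this sketch

# Hypothetical 4x4 population raster and a boolean mask of cells inside the buffer.
population = np.arange(16, dtype="float64").reshape(4, 4)
inside_buffer = np.zeros((4, 4), dtype=bool)
inside_buffer[1:3, 1:3] = True

# Cells outside the buffer are reclassified to NoData; cells inside are unchanged.
masked = np.where(inside_buffer, population, NODATA)

# Only the cells that kept a real value need further processing.
kept = masked[masked != NODATA]
print(kept.sum())  # 30.0
```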

In [None]:
ProjCRS = gdal.WarpOptions(dstSRS='ESRI:102022')
AnnualizedSourceFiles = [i for i in os.listdir('Population/') if i.endswith('.tif')]

with fiona.open("Catchment.gpkg", mode="r", layer=BufferFileName) as shapefile:
    MaskGeom = [feature["geometry"] for feature in shapefile] # Identify the bounding areas of the mask.
# Mask_out = './LatestYearBuffer.tif'
AnnualizedSourceFiles

In [None]:
# This code block changes each annual population raster's projection (gdal.Warp()), 
# then masks it to within a specified distance of the settlements (rasterio.mask.mask()).

for YearFile in AnnualizedSourceFiles:
    InputRasterName = os.path.join(ProjectFolder, "Population", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    InputRasterObject = gdal.Open(InputRasterName)
    TempOutputName = "Wpop_" + Year + "_albers.tif"
    TempOutputPath = os.path.join(ProjectFolder, "Population", TempOutputName)
    if exists(TempOutputPath):
        pass
    else:
        # Reproject to same CRS as settlements.
        Warp = gdal.Warp(TempOutputPath, # Where to store the warped raster
                     InputRasterObject, # Which raster to warp
                     format='GTiff', 
                     options=ProjCRS) # Reproject to Africa Albers Equal Area Conic
        print('Finished gdal.Warp() for year %s. %s \n' % (Year, time.ctime()))
        
        Warp = None # Close the files
        InputRasterObject = None

        # Reclassify as nodata if outside settlement buffer zones.
        with rasterio.open(TempOutputPath) as InputRasterObject:
            MaskedOutputRaster, OutTransform = rasterio.mask.mask(InputRasterObject, MaskGeom) # Anything outside the mask is reclassed to the raster's NoData value.
            OutMetaData = InputRasterObject.meta.copy()
        print('Finished rasterio.mask.mask() for year %s. %s \n' % (Year, time.ctime()))
            
        OutMetaData.update({"driver": "GTiff",
                         "height": MaskedOutputRaster.shape[1],
                         "width": MaskedOutputRaster.shape[2],
                         "transform": OutTransform})
        FinalOutputPath = os.path.join(ProjectFolder, "Population", ''.join(['Masked_', Year, '.tif'])) # ''.join([r'Population/', 'Masked_', Year, '.tif']
        with rasterio.open(FinalOutputPath, "w", **OutMetaData) as dest:
            dest.write(MaskedOutputRaster)
        print('Written to file. %s \n' % time.ctime())
    InputRasterObject = None
    
    try:  # Finally, remove the intermediate file from disk
        os.remove(TempOutputPath)
    except OSError:
        pass
    print('Removed intermediate file. %s \n' % time.ctime())

print('\n \n Finished all years in list. %s' % time.ctime())
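The loop above derives `Year` by stripping every non-digit character from the file name, which works as long as the year is the only number in the name. A more defensive alternative is sketched below; the helper name and the example file names are hypothetical, not taken from the project folder.

```python
import re

def year_from_filename(name):
    """Return the first standalone 4-digit year (19xx or 20xx) in a file name, else None."""
    match = re.search(r"(?<!\d)(?:19|20)\d{2}(?!\d)", name)
    return match.group(0) if match else None

# Hypothetical file names for illustration.
print(year_from_filename("Masked_2015.tif"))          # 2015
print(year_from_filename("pop_100m_2003_UNadj.tif"))  # 2003

# Stripping every non-digit, by contrast, merges all numbers in the name.
print(re.sub(r"[^0-9]", "", "pop_100m_2003_UNadj.tif"))  # 1002003
```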

In [None]:
print(os.listdir('Population/'))

### 7.3 Convert each annualized raster to .xyz where cell centers are stored as x and y. 
The .xyz format is a plain-text table, similar to .csv, with one row per raster cell.
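To make the format concrete, the sketch below parses a small invented .xyz snippet from memory. It assumes whitespace-separated columns with an `X Y Z` header line (as requested via `ADD_HEADER_LINE=YES`) and a NoData value of -99999. The coordinates and values are made up.

```python
import io
import pandas as pd

# Hypothetical contents of a tiny .xyz file: cell-center coordinates and the cell value.
xyz_text = """X Y Z
100.0 200.0 12.5
200.0 200.0 -99999
100.0 100.0 3.0
"""

NODATA = -99999  # assumed NoData value

# One row per cell; any run of whitespace separates the columns.
cells = pd.read_csv(io.StringIO(xyz_text), sep=r"\s+")

# Drop cells that carry the NoData value.
valid = cells.loc[cells["Z"] != NODATA]
print(valid)
```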

In [None]:
AnnualizedMaskedFiles = [i for i in os.listdir('Population/') if i.startswith('Masked') and i.endswith('.tif')]
AnnualizedMaskedFiles

In [None]:
for YearFile in AnnualizedMaskedFiles:
    InputRasterName = os.path.join(ProjectFolder, "Population", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    InputRasterObject = gdal.Open(InputRasterName)
    XYZOutputPath = r'Population/{}'.format(
        YearFile.replace('.tif', '.xyz')) # New file path will be the same as original, but .tif is replaced with .xyz
    
    # Create an .xyz version of the .tif
    XYZ = gdal.Translate(XYZOutputPath, # Specify a destination path
                         InputRasterObject, # Input is the masked .tif file
                         format='XYZ', 
                         creationOptions=["ADD_HEADER_LINE=YES"])
    print('Finished gdal.Translate() for year %s. %s \n' % (Year, time.ctime()))
    
    InputRasterObject = None # Close the files
    XYZ = None 

### 7.4 Create spatial objects (gdf) from the x,y fields.

In [None]:
# Remove all vectorized cells that hold the raster's NoData value before the nearest join to settlements.
NoDataVal = -99999 

# If starting script from this section:
ADM_vec = gpd.read_file(glob.glob('ADM/*.shp')[0])[['geometry']].to_crs("ESRI:102022")
ADM_vec['ADM_ID'] = ADM_vec.index

AnnualizedXYZFiles = [i for i in os.listdir('Population/') if i.startswith('Masked') and i.endswith('.xyz')]
AnnualizedXYZFiles

In [None]:
for YearFile in AnnualizedXYZFiles:
    InputXYZName = os.path.join(ProjectFolder, "Population", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    InputXYZ = pd.read_table(InputXYZName, sep=r'\s+') # Whitespace-delimited; delim_whitespace is deprecated in recent pandas
    InputXYZ = InputXYZ.loc[InputXYZ['Z'] != NoDataVal] # Subset to only the features that have a raster value.
    print('Loaded XYZ file as a pandas dataframe, year %s. %s \n' % (Year, time.ctime()))
    OutputSpatialObject = gpd.GeoDataFrame(InputXYZ, 
                                           geometry = gpd.points_from_xy(InputXYZ['X'], InputXYZ['Y']), 
                                           crs = 'ESRI:102022')
    print('Created geodataframe from non-NoData points, year %s. %s \n' % (Year, time.ctime()))
    
    # Add ADM field for thoroughness.
    OutputSpatialObject = gpd.sjoin(OutputSpatialObject, ADM_vec, how='left', predicate='within') 
#   OutputSpatialObject = OutputSpatialObject[['Z', 'ADM_ID', 'geometry']]
    
    OutputVectorName = YearFile.replace('.xyz', '')
    OutputSpatialObject.to_file(driver='GPKG', filename='Population/PopulationWithinBuffer.gpkg', layer=OutputVectorName)
    print('Exported as geopackage layer, year %s. %s \n' % (Year, time.ctime()))
    
#   # The XYZ files are very large and unnecessary now that we have point objects in a geopackage. Removing XYZ file.
#     try:  
#         os.remove(InputXYZName)
#     except OSError:
#         pass
#     print('Removed intermediate XYZ file. %s \n' % time.ctime())

print('\n \n Finished generating raster value points within settlement catchments, all years. %s' % time.ctime())

---

## 8. JOIN RASTER DATA BY SETTLEMENT GROUP AND PERFORM SUMMARY STATS

### 8.1 Merge Sett_ID onto value vectors.
Attach the ID of the nearest settlement (Sett_ID) to each raster cell that we vectorized in the previous section.
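
To illustrate what a nearest join returns, here is a small sketch using `gpd.sjoin_nearest` on toy geometries. The three points, the two square settlements, and their `Sett_ID` values are hypothetical; `distance_col` is optional and is added here only to make the result easier to inspect.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical vectorized raster cells (points) with a value column Z.
cells = gpd.GeoDataFrame(
    {"Z": [12.0, 3.5, 8.0]},
    geometry=[Point(1, 1), Point(9, 9), Point(4, 1)],
    crs="ESRI:102022",
)

# Hypothetical settlement polygons, each with a unique ID.
settlements = gpd.GeoDataFrame(
    {"Sett_ID": [101, 202]},
    geometry=[box(0, 0, 2, 2), box(8, 8, 10, 10)],
    crs="ESRI:102022",
)

# Each cell receives the attributes of its nearest settlement.
# Cells that fall inside a polygon are at distance 0 from it.
joined = gpd.sjoin_nearest(cells, settlements, how="left", distance_col="dist_m")
print(joined[["Z", "Sett_ID", "dist_m"]])
```

The third cell lies outside both squares and is assigned to the closer one. When a cell is equally near to two settlements, the join returns one row per match, so it may be worth checking for duplicated cells before summing values.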

In [None]:
LatestYearSett = gpd.read_file('AnnualSettlements.gpkg', layer='Cu2015')

def CreateList(r1, r2):
    # Return every integer from r1 to r2, inclusive.
    return list(range(r1, r2 + 1))

AllValYears = CreateList(2000, 2015) # All years for which there will be growth stats in the present study.
print(AllValYears)

In [None]:
LatestYearSett.info()

In [None]:
# Join nearest settlement ID to each value cell.

for item in AllValYears:
    Year = str(item)
    ValFileName = ''.join(['Masked_', str(item)])
    ValObject = gpd.read_file('Population/PopulationWithinBuffer.gpkg', layer=ValFileName)[['Z', 'ADM_ID', 'geometry']]
    print('Loaded value data for year %s. %s \n' % (Year, time.ctime()))
    
    # Sjoin_nearest: No need to group by ADM this time. 
    ValObject_withID = gpd.sjoin_nearest(ValObject, 
                                    LatestYearSett, 
                                    how='left') # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.
    
    print('\nJoined settlement ID onto vectorized raster cells for year %s. %s \n' % (Year, time.ctime()))
    print(ValObject_withID.sample(10))
    
    # We no longer need the spatial information of the raster values because we have their unique settlement ID.
    ValObject_withID = pd.DataFrame(ValObject_withID).drop(columns='geometry')
    
    ValObject_withID.to_csv(''.join([r'Population/', ValFileName, '.csv']))
    print('\nExported as table, year %s. %s \n' % (Year, time.ctime()))

print('\n \n Finished joining settlement IDs onto raster value points, all years. %s' % time.ctime())

### 8.2 Perform settlement summary stats.
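The aggregation in this step follows a group-then-merge pattern: sum the cell values per settlement ID, then left-join the totals onto the settlement table. The sketch below shows that pattern on toy tables; the IDs, values, and the names `settlements`, `cells_2015`, `pop_sum`, and `summary` are hypothetical, while the `PopSum15` column name mirrors the naming used in this notebook.

```python
import pandas as pd

# Hypothetical settlement table and one year of cell values with joined IDs.
settlements = pd.DataFrame({"Sett_ID": [101, 202, 303]})
cells_2015 = pd.DataFrame({"Sett_ID": [101, 101, 202], "Z": [12.0, 8.0, 3.5]})

# Sum the cell values within each settlement and name the column by year.
pop_sum = (
    cells_2015.groupby("Sett_ID", as_index=False)["Z"]
    .sum()
    .rename(columns={"Z": "PopSum15"})
)

# A left merge keeps every settlement, including those with no cells.
summary = settlements.merge(pop_sum, how="left", on="Sett_ID")
print(summary)
```

Settlement 303 has no cells in this toy example, so its `PopSum15` is NaN after the left merge. Whether such gaps should stay missing or be filled with zero depends on the analysis.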

In [None]:
# Aggregate to settlement level, then join onto settlements geodataframe.
AllSummaries = LatestYearSett

for item in AllValYears:
    Year = str(item)
    AbbrevYear = Year[2:]
    VariableName = ''.join(['PopSum', AbbrevYear])
    ValFileName = ''.join(['Masked_', str(item)])
    ValObject = pd.read_csv(''.join([r'Population/', ValFileName, '.csv']))
    print('Loaded value data for year %s. %s \n' % (Year, time.ctime()))
    
    ValAggregated = ValObject.groupby('Sett_ID', 
                                      as_index=False)['Z'].sum().rename(columns={"Z": VariableName})
    
    print('\nValues aggregated to settlement level, year %s. %s \n' % (Year, time.ctime()))
    print(ValAggregated.sample(10))
    
    AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
    print('\nMerged year %s onto latest year settlement feature layer. %s \n' % (Year, time.ctime()))
    print(AllSummaries.sample(10))

print('\n \n Finished merging aggregate raster data onto settlements for all years. %s' % time.ctime())

AllSummaries.to_file(driver='GPKG', filename='AnnualizedOntoLatestYear.gpkg', layer=('Pop%sto%s' % (2000, 2015)))

print('\n \n Written to file. %s' % time.ctime())
AllSummaries.info()

---