# Spatiotemporal Trends in Urbanization
## Overview
This repository investigates year-by-year change in cities and settlements in Central and Western Africa (CWA). The goal is to capture activity for every settlement locality in a country to produce indicators that are high frequency, spatially granular, and timely. The Jupyter Notebook is the primary script used to construct each country's dataset. It tracks population, built-up area, and economic and climate indicators across a 16-year timeframe from 2000 to 2015.

The repository is split into three sections: methodology and notebooks, source data, and outputs. Outputs are organized by country and include growth tables ("urban panel datasets"), charts, and country briefs.


## Datasets
Datasets used to create each African country's urban panel data are as follows:
1. Most up-to-date administrative boundaries.
2. City names: **UCDB, Africapolis, and GeoNames.**
3. Settlement types: **GRID3 settlement extents.** Captured between 2009 and 2019.
4. Built-up area, yearly: **World Settlement Footprint Evolution.** Resolution: 30m.
5. Population, yearly: **WorldPop.** UN-adjusted, unconstrained. Resolution: 100m.
6. Nighttime lights, yearly: **Harmonization of DMSP and VIIRS.** Resolution: 1km and 500m.
7. Flood extents, by return period: **FATHOM.** Resolution: 90m.


## Accessing Data
Source data are publicly available from the providers listed in the previous section, with the exception of flood data. Please note that the source data files in this repository have been prepared for the purposes of this study and may not cover your area of interest. Some sources are also not global: GRID3 settlement extents are only available for sub-Saharan Africa, and Africapolis names only for Africa.

Results from the analysis are currently available for Cameroon and are under development for the Central African Republic and four Sahel countries: Burkina Faso, Chad, Mali, and Niger. Results are stored in the outputs folder by country. Please contact the CWA Geospatial team to inquire about new locations.
<br>
> **Walker Kosmidou-Bradley**, wkosmidoubradley@worldbank.org
<br>
> **Grace Doherty**, gdoherty2@worldbank.org

## License
Materials in this repository are open source under an MIT license. The community is invited to test, adapt, and re-purpose materials as needed.

---

## 1. PREPARE WORKSPACE

### 1.1 Off-script

##### Off-script: Create folders in your working directory (the folder where you are storing this script).
> *ADM
<br>Buildup
<br>PlaceName
<br>Population
<br>Settlement
<br>NTL*

##### Before starting: Download datasets (as shapefile, GeoJSON, or tif where possible) and place or extract them into the corresponding folder. You can download the cleaned files from our [GitHub Repository](https://github.com/worldbank/Urban_Spatio_Temporal_Trends) or access the original sources here:
- ADM: *Varies by source.*
- Buildup: https://download.geoservice.dlr.de/WSF_EVO/files/
- PlaceName: 
    - GeoNames: (file: cities500.zip) https://download.geonames.org/export/dump/
    - Africapolis: https://africapolis.org/en/data
    - Urban Centres Database: https://ghsl.jrc.ec.europa.eu/ghs_stat_ucdb2015mt_r2019a.php
- Population: https://hub.worldpop.org/geodata/listing?id=69
- Settlement: https://data.grid3.org/datasets/GRID3::grid3-cameroon-settlement-extents-version-01-01-/explore
- NTL (nighttime lights): https://eogdata.mines.edu/products/dmsp/#v4 and https://eogdata.mines.edu/products/vnl/#annual_v2

##### Other off-script:
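- Convert GeoNames from a .txt file to a vector layer (delimiter = tab, header rows = 0) and rename fields. Section 6.1 reads the result from `PlaceName/GeoNames.gpkg` and expects a `GeoName` name field.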
- If necessary, mosaic WSFE rasters that cover the area of interest to create a single file.
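
The GeoNames conversion can also be scripted. The block below is a minimal sketch of the parsing half only, using the standard library: it reads a headerless, tab-delimited dump and attaches field names. The column list follows the GeoNames export readme; the helper name `read_geonames` and the sample row are made up for illustration, and writing the points to a geopackage is left to your GIS tool of choice.

```python
import csv
import io

# Field order of the GeoNames dump (19 tab-separated fields, no header row).
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "feature_class", "feature_code", "country_code", "cc2", "admin1_code",
    "admin2_code", "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]

def read_geonames(handle):
    """Parse a GeoNames dump into dicts with float coordinates (hypothetical helper)."""
    # QUOTE_NONE: place names may contain quote characters that are not CSV quoting.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        record = dict(zip(GEONAMES_COLUMNS, row))
        record["latitude"] = float(record["latitude"])
        record["longitude"] = float(record["longitude"])
        records.append(record)
    return records

# Made-up row used only to exercise the parser; not real GeoNames data.
sample = ("1\tExampleville\tExampleville\t\t4.05\t9.70\tP\tPPL\tCM\t\t05"
          "\t\t\t\t1000\t\t13\tAfrica/Douala\t2020-01-01\n")
places = read_geonames(io.StringIO(sample))
print(places[0]["name"], places[0]["longitude"], places[0]["latitude"])
```

With a real file, pass `open('cities500.txt', encoding='utf-8', newline='')` instead of the in-memory sample.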

### 1.2 Load all packages.

In [1]:
# Built-in:
# dir(), print(), range(), format(), int(), len(), list(), max(), min(), zip(), sorted(), sum(), open(), del, = None, try except, with as, for in, if elif else
# Also: list.append(), string.zfill(), list.insert(), list.remove(), string.join(), count(), startswith(), endswith(), contains(), replace()

import os, sys, glob, re, time, subprocess # os.getcwd(), os.path.join(), os.listdir(), os.remove(), time.ctime(), glob.glob()
from os.path import exists # exists()
from functools import reduce # reduce()

import geopandas as gpd # read_file(), GeoDataFrame(), sjoin_nearest(), to_crs(), to_file(), .crs, buffer(), dissolve()
import pandas as pd # .dtypes, Series(), concat(), DataFrame(), read_table(), merge(), to_csv(), .loc[], head(), sample(), astype(), unique(), rename(), between(), drop(), fillna(), idxmax(), isna(), isin(), apply(), info(), sort_values(), notna(), groupby(), value_counts(), duplicated(), drop_duplicates()
from shapely.geometry import Point, LineString, Polygon, shape, MultiPoint
from shapely.ops import cascaded_union
from shapely.validation import make_valid  # in apply(make_valid)
import shapely.wkt

import numpy as np # median(), mean(), tolist(), .inf
import fiona, rioxarray # fiona.open()
import rasterio # open(), write_band(), .name, .count, .width, .height. nodatavals, .meta, update(), copy(), write()
from rasterio.plot import show
from rasterio import features # features.rasterize()
from rasterio.features import shapes
from rasterio import mask # rasterio.mask.mask()
from osgeo import gdal, osr, ogr, gdal_array, gdalconst # Open(), SpatialReference, WarpOptions(), Warp(), GetDataTypeName(), GetRasterBand(), GetNoDataValue(), Translate(), GetProjection(), GetAttrValue()

### 1.3 Set workspace.

In [2]:
ProjectFolder = os.getcwd()
ResultsFolder = os.path.join(ProjectFolder, 'Results')
print(ProjectFolder)
print(ResultsFolder)

Q:\GIS\povertyequity\urban_growth\Cameroon
Q:\GIS\povertyequity\urban_growth\Cameroon\Results


---

## 2. PREPARE BUILDUP, SETTLEMENT, AND ADMIN DATASETS
Projection for all datasets: Africa Albers Equal Area Conic

### 2.1 Prepare raster locations for GRID3 and Admin areas

In [None]:
ADM_vec = gpd.read_file(glob.glob('ADM/*.shp')[0])[['geometry']].to_crs("ESRI:102022") # glob() returns the files in the ADM folder that end in '.shp'; [0] takes the first one.
GRID3_vec = gpd.read_file(glob.glob('Settlement/*.shp')[0])[['type','geometry']].to_crs("ESRI:102022")
ADM_vec['ADM_ID'] = range(0,len(ADM_vec))
GRID3_vec['G3_ID'] = range(0,len(GRID3_vec))
ADM_vec.to_file(driver='GPKG', filename=r'ADM/ADM_warp.gpkg', layer='ADM')
GRID3_vec.to_file(driver='GPKG', filename=r'Settlement/Settlement_warp.gpkg', layer='GRID3')

In [None]:
ADM_vec = gpd.read_file(r'ADM/ADM_warp.gpkg', layer='ADM')
GRID3_vec = gpd.read_file(r'Settlement/Settlement_warp.gpkg', layer='GRID3')

print(ADM_vec.info(), "\n\n", 
      ADM_vec.sample(5),
      ADM_vec.crs, "\n\n", 
      len(str(ADM_vec['ADM_ID'].max()))) # We need to know how many digits need to be allocated to each dataset in the "join" serial.
print(GRID3_vec.info(), "\n\n",
      GRID3_vec.sample(5),
      GRID3_vec.crs, "\n\n", 
      len(str(GRID3_vec['G3_ID'].max())))

### 2.2 Reproject WSFE to project CRS.

In [None]:
WSFE_in = glob.glob('Buildup/*.tif')[0]
WSFE_warp = './Buildup/WSFE_warp.tif'
ProjCRS = gdal.WarpOptions(dstSRS='ESRI:102022')

In [None]:
Warp = gdal.Warp(WSFE_warp, # Where to store the warped raster
                 WSFE_in, # Which raster to warp
                 format='GTiff', 
                 options=ProjCRS) # Reproject to Africa Albers Equal Area Conic
Warp = None
print('Reprojected dataset. %s' % time.ctime())

try:  
    os.remove(os.path.join(ProjectFolder, WSFE_in))
except OSError:
    pass
print('Removed (or skipped if error) intermediate file. %s' % time.ctime())

In [None]:
WSFE = rasterio.open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))
print(WSFE) # WSFE values are all 4 digits long (1985-2015)
print(dir(WSFE))
print(WSFE.crs)
print(WSFE.dtypes)
NoDataValue = WSFE.nodatavals
print(NoDataValue)
print(WSFE.read(1).min(), WSFE.read(1).mean(), np.median(WSFE.read(1)), WSFE.read(1).max())

# If NoDataValue != 0, change to 0. (See step 2.1)

---

## 3. WSFE AND ADM; GRID3 AND ADM
RASTERIZE: Bring ADM and GRID3 into raster space.

RASTER MATH: "Join" ADM ID onto GRID3 and onto WSFE by creating unique concatenation string.

VECTORIZE: Bring joined data into vector space.

VECTOR MATH: Split unique ID from raster math step into separate columns.

### 3.1 Rasterize admin areas and GRID3 using WSFE specs.

In [None]:
# Copy and update the metadata from WSFE for the output
meta = WSFE.meta.copy()
meta.update(compress='lzw')
WSFE.meta

ADM_out = './ADM/ADM_rasterized.tif'
GRID3_out = './Settlement/GRID3_rasterized.tif'

In [None]:
print("Rasterizing dataset. %s" % time.ctime())
with rasterio.open(ADM_out, 'w+', **meta) as out:
    out_arr = out.read(1)

    # this is where we create a generator of geom, value pairs to use in rasterizing
    shapes = ((geom,value) for geom, value in zip(ADM_vec.geometry, ADM_vec.ADM_ID))

    burned = features.rasterize(shapes=shapes, fill=0, out=out_arr, transform=out.transform)
    out.write_band(1, burned)
out = None

In [None]:
print("Rasterizing dataset. %s" % time.ctime())
with rasterio.open(GRID3_out, 'w+', **meta) as out:
    out_arr = out.read(1)

    # this is where we create a generator of geom, value pairs to use in rasterizing
    shapes = ((geom,value) for geom, value in zip(GRID3_vec.geometry, GRID3_vec.G3_ID))

    burned = features.rasterize(shapes=shapes, fill=0, out=out_arr, transform=out.transform)
    out.write_band(1, burned)
out = None

*Validation: Check the dimensions, type, and basic stats of the three datasets. All three should share the same dimensions and NoData value.*
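
The visual check can be complemented by a small programmatic comparison. This is a sketch only: `find_mismatches` is a hypothetical helper, and the dictionaries below are invented stand-ins shaped like rasterio's `dataset.meta` (which carries `width`, `height`, and `nodata` among other keys).

```python
def find_mismatches(metas, keys=("width", "height", "nodata")):
    """Return {key: set of values} for every key that differs across rasters."""
    mismatches = {}
    for key in keys:
        values = {meta.get(key) for meta in metas}
        if len(values) > 1:
            mismatches[key] = values
    return mismatches

# Hypothetical metadata; in the notebook these would come from rasterio.open(path).meta
adm = {"width": 1200, "height": 900, "nodata": 0}
grid3 = {"width": 1200, "height": 900, "nodata": 0}
wsfe = {"width": 1200, "height": 900, "nodata": 255}

print(find_mismatches([adm, grid3]))        # empty dict: the rasters agree
print(find_mismatches([adm, grid3, wsfe]))  # reports the differing nodata values
```

An empty result means the rasters can be combined cell by cell in the raster math step.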

In [None]:
RastersList = [gdal.Open(r"ADM/ADM_rasterized.tif"), 
               gdal.Open(r"Settlement/GRID3_rasterized.tif"),
               gdal.Open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))]

for item in RastersList:
    print(gdal.GetDataTypeName(item.GetRasterBand(1).DataType), 
          item.GetRasterBand(1).GetNoDataValue(),
         "\n\n")

RastersList = None

RastersList = [rasterio.open(r"ADM/ADM_rasterized.tif"), 
               rasterio.open(r"Settlement/GRID3_rasterized.tif"), 
               rasterio.open(os.path.join(ProjectFolder, "Buildup", os.listdir('Buildup/')[0]))]

for item in RastersList:
    print(item.name, "\nBands= ", item.count, "\nWxH= ", item.width, "x", item.height, "\n\n")

stats = []
for item in RastersList:
    band = item.read(1)
    stats.append({
        'raster': item.name,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})

# Show stats for each channel
print("\n", stats)

RastersList = None
band = None

### 3.2 Raster math to "join" admin to GRID3 and to WSFE.
Processing is more rapid when "joining," i.e. creating serial codes out of two datasets, in raster rather than vector space.
Here, we are concatenating the ID fields of the two datasets to create a serial number that we can then split in vector space later to create two ID fields.

*The values are combined arithmetically to create join IDs. Multiplying "A" by a power of ten and adding "B" is in effect a concatenation of their ID strings. The number of zeros in the multiplier equals the number of digits in the maximum value of the "B" dataset (e.g. Chad ADM codes go up to 4 digits, so calc=(A*10000)+B).*
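
The arithmetic can be checked outside of GDAL. The sketch below uses plain integers; the function names and the maximum ADM_ID of 359 are assumptions chosen for illustration, not values from this project.

```python
def multiplier_for(max_b):
    """Power of ten with one zero per digit of max_b, e.g. 359 -> 1000."""
    return 10 ** len(str(int(max_b)))

def encode(a, b, multiplier):
    """Pack two non-negative integer IDs into one serial: (A * multiplier) + B."""
    return a * multiplier + b

def decode(serial, multiplier):
    """Recover (A, B) from a serial built with the same multiplier."""
    return divmod(serial, multiplier)

m = multiplier_for(359)        # hypothetical maximum ADM_ID with 3 digits
serial = encode(2015, 42, m)   # WSFE year 2015 inside admin area 42
print(m, serial, decode(serial, m))
# → 1000 2015042 (2015, 42)
```

Note that the largest serial has to fit in the integer type of the output raster: nine digits fit in a signed 32-bit integer, but not in a 16-bit one.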

In [None]:
# # OPEN TERMINAL FOR THIS PORTION. CODE DOCUMENTED HERE.

# gdal_calc.py # Run on its own to see usage info.

# gdal_calc.py -A C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Settlement\GRID3_rasterized.tif -B  C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\ADM\ADM_rasterized.tif --outfile=C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Settlement\GRID3_ADM.tif --overwrite --calc="(A*1000)+B"
# gdal_calc.py -A C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Buildup\WSFE_warp.tif -B  C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\ADM\ADM_rasterized.tif --outfile=C:\Users\grace\GIS\povertyequity\urban_growth\Cameroon\Buildup\WSFE_ADM.tif --overwrite --calc="(A*1000)+B"

# # END TERMINAL-ONLY ASPECT. RETURN HERE FOR NEXT STEPS.
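
The terminal commands above use absolute paths from one machine. As a hedged alternative, the argument list can be assembled from the project folder so that it works wherever the notebook is stored. `gdal_calc_command` is a hypothetical helper; it only uses the flags already shown above and does not run GDAL itself.

```python
import os

def gdal_calc_command(project_folder, a_path, b_path, out_path, multiplier):
    """Build the argument list for the gdal_calc.py call documented above."""
    return [
        "gdal_calc.py",
        "-A", os.path.join(project_folder, a_path),
        "-B", os.path.join(project_folder, b_path),
        "--outfile=" + os.path.join(project_folder, out_path),
        "--overwrite",
        "--calc=(A*%d)+B" % multiplier,
    ]

cmd = gdal_calc_command(os.getcwd(),
                        os.path.join("Settlement", "GRID3_rasterized.tif"),
                        os.path.join("ADM", "ADM_rasterized.tif"),
                        os.path.join("Settlement", "GRID3_ADM.tif"),
                        1000)
print(" ".join(cmd))
# To execute from Python, assuming gdal_calc.py is on PATH:
# subprocess.run(cmd, check=True)
```

On some installations the script has to be launched through the Python interpreter instead, so treat the final `subprocess.run` line as an assumption about your environment.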

In [None]:
# Validation: check the basic statistics of the resulting datasets.
RastersList = [rasterio.open(r"Buildup/WSFE_ADM.tif"), 
               rasterio.open(r"Settlement/GRID3_ADM.tif")]
for item in RastersList:
    print(item.name, "\nBands= ", item.count, "\nWxH= ", item.width, "x", item.height, "\n\n")
    
stats = []
for item in RastersList:
    band = item.read(1)
    stats.append({
        'raster': item.name,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})

# Show stats for each channel
print("\n", stats)

RastersList = None
band = None

### 3.3 Vectorize "joined" layers.

##### Off-script: Run this block in QGIS.

In [None]:
# OPEN QGIS FOR THIS PORTION. CODE DOCUMENTED HERE.

# Due to dtype errors with both gdal and rasterio here, I decided to run the raster to polygon function in QGIS instead.
# It is possible to run QGIS functions within a Jupyter Notebook, but I ran it within the GUI. Arc or R are other options.
# The QGIS Python console commands are documented here.

# processing.run("gdal:polygonize", {'INPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Settlement/GRID3_ADM.tif','BAND':1,'FIELD':'gridcode','EIGHT_CONNECTEDNESS':False,'EXTRA':'','OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Settlement/GRID3_ADM.shp'})
# processing.run("gdal:polygonize", {'INPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Buildup/WSFE_ADM.tif','BAND':1,'FIELD':'gridcode','EIGHT_CONNECTEDNESS':False,'EXTRA':'','OUTPUT':'C:/Users/grace/GIS/povertyequity/urban_growth/Cameroon/Buildup/WSFE_ADM.shp'})

### 3.4 Vector math to split raster strings into admin area, GRID3, and WSFE year assignments.

In [None]:
# Load newly created vectorized datasets.
GRID3_ADM = gpd.read_file(r"Settlement/GRID3_ADM.shp")
WSFE_ADM = gpd.read_file(r"Buildup/WSFE_ADM.shp")
print(GRID3_ADM.info(), "\n\n", GRID3_ADM.sample(10), "\n\n", GRID3_ADM.crs, "\n\n", 
      WSFE_ADM.info(), "\n\n", WSFE_ADM.sample(10), "\n\n", WSFE_ADM.crs)

In [None]:
print(GRID3_ADM['gridcode'].max(), WSFE_ADM['gridcode'].max())

In [None]:
# Split serial back into separate dataset fields.
# For Burkina: WSFE and ADM: 4+3=7 digits. GRID3 and ADM: 6+3=9 digits.
GRID3_ADM['gridstring'] = GRID3_ADM['gridcode'].astype(str).str.zfill(9)
WSFE_ADM['gridstring'] = WSFE_ADM['gridcode'].astype(str).str.zfill(7)

GRID3_ADM['Sett_ID'] = GRID3_ADM['gridstring'].str[:-3].astype(int) # Remove the last 3 digits to get the GRID3 portion.
GRID3_ADM['ADM_ID'] = GRID3_ADM['gridstring'].str[-3:].astype(int) # Keep only the last 3 digits to get the ADM portion.
WSFE_ADM['year'] = WSFE_ADM['gridstring'].str[:-3].astype(int)
WSFE_ADM['ADM_ID'] = WSFE_ADM['gridstring'].str[-3:].astype(int)

print(GRID3_ADM.sample(10), WSFE_ADM.sample(10))

In [None]:
# Dissolve any features that have the same G3 and ADM values so that we have a single unique feature per settlement.
# Note: we do NOT want to dissolve the WSFE features. Distinct features for noncontiguous built-up areas of the same year are necessary to separate them in the Near join step.
GRID3_ADM = GRID3_ADM.dissolve(by=['Sett_ID', 'ADM_ID'], as_index=False)
print(GRID3_ADM.info(), GRID3_ADM.head())

In [None]:
# Remove features where year, settlement, or admin area = 0.
# This was supposed to be resolved earlier with the gdal_calc NoDataValue parameter.

print("Before: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))
WSFE_ADM = WSFE_ADM.loc[(WSFE_ADM["year"] != 0) & (WSFE_ADM["ADM_ID"] != 0)] # Since we change the datatype to integer, no need to include all digits. Otherwise, it would need to be: != '0000'
GRID3_ADM = GRID3_ADM.loc[(GRID3_ADM["Sett_ID"] != 0) & (GRID3_ADM["ADM_ID"] != 0)]
print("After: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))

In [None]:
# The Bounded_ID is our new unique settlement identifier for subsequent matching steps.
GRID3_ADM['Bounded_ID'] = GRID3_ADM.index
WSFE_ADM['WSFE_ID'] = WSFE_ADM.index
GRID3_ADM = GRID3_ADM[['Sett_ID', 'Bounded_ID', 'ADM_ID', 'geometry']]
WSFE_ADM = WSFE_ADM[['WSFE_ID', 'year', 'ADM_ID', 'geometry']]

In [None]:
# Validation: 
# The first two printed numbers should be the same. There shouldn't be any GRID3 rows with matching Sett_ID and ADM_IDs.
# The latter two numbers should be different, and the first should be larger. We never dissolved WSFE by any column.

print(len(GRID3_ADM[['Sett_ID', 'ADM_ID']]),
      len(GRID3_ADM[['Sett_ID', 'ADM_ID']].drop_duplicates()),
      len(WSFE_ADM[['year', 'ADM_ID']]),
      len(WSFE_ADM[['year', 'ADM_ID']].drop_duplicates()))

In [None]:
GRID3_ADM.to_file(
    driver='GPKG', filename='Settlement/GRID3_ADM.gpkg', layer='GRID3_ADM_cleaned')
WSFE_ADM.to_file(
    driver='GPKG', filename=r'Buildup/WSFE_ADM.gpkg', layer='WSFE_ADM_cleaned')

---

## 4. UNIQUE SETTLEMENTS FROM WSFE AND GRID3: TWO VERSIONS

Note that there are 2 versions here, so that we can create a fragmentation index:
1. **Boundless, aka boundary-agnostic settlements**: Unique settlements are linked to GRID3 settlement IDs. Administrative areas do not influence the extents of the settlement.
2. **Bounded, aka politically-defined settlements**: Settlements in the Boundless dataset which spread across more than one administrative area are split into separate settlements in the Bounded dataset. The largest polygon after the split is considered the "principal" settlement, and polygons in other admin areas are considered "fragments." By dividing the fragment area(s) of the Bounded settlement by the area of the Boundless settlement, we can acquire a fragmentation index for each locality.
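
As a worked example of the fragmentation index described above, the sketch below takes the areas of the Bounded pieces of one Boundless settlement, treats the largest as the principal settlement, and divides the remaining fragment area by the total. It assumes the Boundless area equals the sum of its Bounded pieces; the function name and the areas are invented for illustration.

```python
def fragmentation_index(piece_areas):
    """Share of a Boundless settlement's area lying outside its principal piece.

    piece_areas: areas (km2) of the Bounded pieces, one per admin area.
    """
    total = sum(piece_areas)
    if total == 0:
        return 0.0
    principal = max(piece_areas)          # largest piece = principal settlement
    return (total - principal) / total    # fragments divided by Boundless area

print(fragmentation_index([8.0, 1.5, 0.5]))  # settlement split across three admin areas
print(fragmentation_index([3.0]))            # settlement inside a single admin area
```

A settlement contained in one admin area scores 0, and the index rises as more of its area falls outside the principal piece.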

### 4.1 BOUNDED SETTLEMENTS: Near Join by ADM group.

In [None]:
print("Number of admin areas with GRID3 features: %s" % len(GRID3_ADM['ADM_ID'].unique().tolist()))
print("Number of admin areas with WSFE features: %s" % len(WSFE_ADM['ADM_ID'].unique().tolist()))
print("Number of admin areas where one dataset is observed but the other is not: %s" % len(
    set(GRID3_ADM['ADM_ID']) ^ set(WSFE_ADM['ADM_ID']))) # Symmetric difference: IDs present in exactly one of the two datasets.

In [None]:
ADM_IDs = sorted(GRID3_ADM['ADM_ID'].unique().tolist())
ADM_IDs

In [None]:
# We're creating this field to help in removing duplicates from the sjoin_nearest, next section.
GRID3_ADM['G3_Area'] = GRID3_ADM['geometry'].area / 10**6

In [None]:
# Create empty geodataframe to append onto using the dataframe whose geometry we want to retain.
Bounded = GRID3_ADM[0:0].copy() # .copy() so that adding a column does not act on a slice of GRID3_ADM.
Bounded["year"] = pd.Series(dtype='int')
Bounded.info()

In [None]:
for ID in ADM_IDs:
    WSFE_shard = WSFE_ADM.loc[WSFE_ADM['ADM_ID'] == ID]
    GRID3_shard = GRID3_ADM.loc[GRID3_ADM['ADM_ID'] == ID]
    WSFE_GRID3_shard = gpd.sjoin_nearest(WSFE_shard, 
                                         GRID3_shard, 
                                         how='inner',
                                         max_distance=500)
    Bounded = pd.concat([Bounded, WSFE_GRID3_shard])
    print('Completed near join in admin area %s. %s \n' % (ID, time.ctime()))
print('Completed near join for all ADMs. %s \n' % time.ctime())

del WSFE_shard, GRID3_shard, WSFE_GRID3_shard

In [None]:
Bounded.sample(20)

In [None]:
Bounded.info()

In [None]:
# Remove WSFE features that did not match any GRID3 settlements.
Bounded = Bounded.loc[~Bounded['Sett_ID'].isna()]
Bounded.info()

In [None]:
del GRID3_ADM, ADM_IDs

### 4.2 Remove duplicates: where buildup polygons intersected with more than one GRID3 settlement extent.
This happens when a feature of the first dataset (WSFE) intersects (distance = 0) with more than one feature of the second dataset (GRID3). It is more common for large cities. For example, Yaoundé, Cameroon has a large contiguous 1985 WSFE polygon which overlaps several small GRID3 features that are not Yaoundé.

In [None]:
# The first number should always be zero. 
# The second tells us whether/how many WSFE polygons were duplicated by the Near join.

print(len(WSFE_ADM[WSFE_ADM.duplicated('WSFE_ID')]), len(Bounded[Bounded.duplicated('WSFE_ID')]))

In [None]:
# If there are duplicate WSFE_IDs, then we need to choose between them.
# We'll pick the one that joined with the largest GRID3 polygon.
# To do that, we can just sort the dataframe by GRID3 areas, then drop_duplicates. 
# It will retain the first row of each WSFE_ID group.
Bounded = Bounded.sort_values('G3_Area', ascending=False).drop_duplicates(['WSFE_ID'])
Bounded.info()
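
The deduplication rule used in this section (keep the match with the largest GRID3 polygon) can be spelled out in plain Python. This is an illustrative sketch with invented rows, not a replacement for the `sort_values` and `drop_duplicates` call in the notebook; `keep_largest_match` is a hypothetical name.

```python
def keep_largest_match(rows):
    """Keep one row per WSFE_ID: the match whose GRID3 polygon has the largest G3_Area."""
    best = {}
    for row in rows:
        current = best.get(row["WSFE_ID"])
        if current is None or row["G3_Area"] > current["G3_Area"]:
            best[row["WSFE_ID"]] = row
    return list(best.values())

# Hypothetical join output: WSFE polygon 7 touched two GRID3 extents.
joined = [
    {"WSFE_ID": 7, "Sett_ID": 101, "G3_Area": 0.4},
    {"WSFE_ID": 7, "Sett_ID": 55, "G3_Area": 182.0},
    {"WSFE_ID": 8, "Sett_ID": 101, "G3_Area": 0.4},
]
for row in keep_largest_match(joined):
    print(row["WSFE_ID"], row["Sett_ID"])
```

Here polygon 7 is assigned to settlement 55, the larger of its two candidate extents, while polygon 8 keeps its only match.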

In [None]:
print(len(Bounded[Bounded.duplicated('WSFE_ID')]))

In [None]:
# Each WSFE polygon now carries exactly one administratively split settlement ID (Bounded_ID), so we can dissolve by year and Bounded_ID.
Bounded = Bounded.dissolve(by=['year', 'Bounded_ID'], as_index=False)
print(Bounded.info(), Bounded.sample(10))

In [None]:
# Clean up and save to file.
Bounded = Bounded[['ADM_ID_left', 'year', 'Bounded_ID', 'Sett_ID', 'geometry']].rename(columns={"ADM_ID_left": "ADM_ID"})
Bounded = Bounded.astype({"ADM_ID":'int', "Bounded_ID":'int', "Sett_ID":'int', "year":'int'})
print(Bounded.sample(10))
Bounded.to_file(
    driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

In [None]:
del WSFE_ADM

### 4.3 BOUNDLESS SETTLEMENTS: Dissolve features that were split by an ADM boundary.

In [None]:
# Fragments of any bounded settlement will be combined into a single "boundless" settlement in this version.
# It is based on their "Sett_ID", which is taken directly from the GRID3 settlement features.
Boundless = Bounded.dissolve(by=['year', 'Sett_ID'], as_index=False)
print(Boundless.info(), Boundless.sample(10))

In [None]:
# Clean up and save to file.
Boundless.to_file(driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

---

## 5. CUMULATIVE ANNUALIZED SETTLEMENT EXTENTS
DISSOLVE BY YEAR SETS: Create separate feature layers of each cumulative year.

### 5.1 Define study years for each for loop.

In [24]:
Boundless = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

def CreateList(r1, r2):
    return list(range(r1, r2 + 1)) # Inclusive of both endpoints.

CuStart, CuEnd = Boundless['year'].min(), Boundless['year'].max()
StudyStart, StudyEnd = 1999, Boundless['year'].max()

AllCuYears = CreateList(CuStart, CuEnd) # All years in the WSFE dataset
AllStudyYears = CreateList(StudyStart, StudyEnd) # All years for which there will be growth stats in the present study.
print(AllCuYears, '\n\n', AllStudyYears)

# Study years in descending order, excluding the latest year (StudyEnd).
ReversedStudyYears = AllStudyYears[-2::-1]
print('\n\n', ReversedStudyYears)

[1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015] 

 [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]


 [2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999]


### 5.2 Starting with main Boundless dataset, create a cumulative area feature layer for each year.

In [None]:
# For each year in the growth stats study, we are taking features from all years prior to and including that year, 
# dissolving those features, and exporting as its own file.

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    CuYearSet = Boundless[Boundless['year'].between(
        CuStart, item, inclusive='both')] # inclusive='both' (pandas >= 1.3; the boolean form is no longer accepted in pandas 2) keeps the years 1985 and "item" rather than only those between them.
    print('Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    CuYearDissolve = CuYearSet.dissolve(by='Sett_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min"}, # Though ADM_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    CuYearName = ''.join(['Cu', str(item), '_Boundless'])
    CuYearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=CuYearName)
    del CuYearSet, CuYearDissolve
print("Done with all years in set. %s" % time.ctime())
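
The cumulative logic of this loop can be illustrated without geometries. The sketch below assumes that built-up polygons from different years do not overlap (each location carries a single detection year), so that adding their areas matches the area of the dissolved feature. The function name and the input tuples are made up for illustration.

```python
def cumulative_area_by_year(features, study_years):
    """features: (Sett_ID, year first detected, area km2) tuples.

    Returns {study year: {Sett_ID: cumulative area up to and including that year}}.
    """
    result = {}
    for study_year in study_years:
        totals = {}
        for sett_id, year, area in features:
            if year <= study_year:  # include every year up to and including study_year
                totals[sett_id] = totals.get(sett_id, 0.0) + area
        result[study_year] = totals
    return result

# Hypothetical, non-overlapping built-up areas
features = [(1, 1985, 2.0), (1, 2001, 0.5), (2, 2000, 1.0)]
areas = cumulative_area_by_year(features, [1999, 2000, 2001])
print(areas[1999])
print(areas[2001])
```

Settlement 2 is absent from the 1999 result because none of its built-up area had been detected by then, which is why the later table join is a left join onto the latest year.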

##### Join area information from each cumulative layer onto the latest year dataset.

In [None]:
# The latest year in the study contains all settlements. Merge all other years' areas onto this dataset.
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=
                          ''.join(['Cu', str(StudyEnd), '_Boundless'])) 
SettAreas['Area2015'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry') # We have settlement IDs, so no need to join spatially!


for item in ReversedStudyYears:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Boundless']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['Area', str(item)])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Sett_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, StudyEnd, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Sett_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))


print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s.csv' % (StudyStart, StudyEnd)))
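
The table join above can be pictured as follows. This plain-Python sketch mimics a left join on `Sett_ID` with invented values; where pandas would fill a missing year with NaN, the sketch uses `None`. `merge_area_tables` is a hypothetical helper.

```python
def merge_area_tables(latest, earlier_tables):
    """Left-join earlier years' areas onto the latest year's settlements.

    latest: {Sett_ID: {column: value}}
    earlier_tables: {column name: {Sett_ID: area}}
    """
    merged = {sett_id: dict(row) for sett_id, row in latest.items()}
    for column, table in earlier_tables.items():
        for sett_id, row in merged.items():
            row[column] = table.get(sett_id)  # None when the settlement did not exist yet
    return merged

latest = {1: {"Area2015": 3.0}, 2: {"Area2015": 1.2}}
earlier = {"Area2014": {1: 2.8, 2: 1.0}, "Area1999": {1: 2.0}}
wide = merge_area_tables(latest, earlier)
print(wide[1])
print(wide[2])
```

Every settlement in the latest year keeps its row, and earlier years in which it had no built-up area show up as missing values rather than dropping the settlement.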

In [None]:
del SettAreas

### 5.3 Repeat for Bounded dataset.

In [None]:
# Bounded = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    CuYearSet = Bounded[Bounded['year'].between(CuStart, item, inclusive='both')] # inclusive='both' (pandas >= 1.3; the boolean form is no longer accepted in pandas 2) keeps the years 1985 and "item" rather than only those between them.
    print('Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    CuYearDissolve = CuYearSet.dissolve(by='Bounded_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min", "Sett_ID":"min"}, # Though ADM_ID and Sett_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    CuYearName = ''.join(['Cu', str(item), '_Bounded'])
    CuYearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=CuYearName)
    del CuYearSet, CuYearDissolve
print("Done with all years in set. %s" % time.ctime())

In [None]:
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', 
                          layer=''.join(['Cu', str(StudyEnd), '_Bounded']))
SettAreas['Area2015'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry')


for item in ReversedStudyYears:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Bounded']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['Area', str(item)])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Bounded_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, StudyEnd, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Bounded_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))

print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s_%s.csv' % (StudyStart, StudyEnd, 'Bounded')))

In [None]:
del SettAreas

### 5.4 One settlement geofile to rule them all. ...and in the Sett_ID bind them.
The annualized values can be stored as distinct non-spatial dataframes. Their Sett_IDs will be used to join onto this geoversion with place names for the summary stats.

In [32]:
Settlements = gpd.read_file(r'Results/CumulativeSettlements.gpkg', 
                           layer=''.join(['Cu', str(StudyEnd), '_Boundless']))[['Sett_ID', 'ADM_ID', 'geometry']]
print(Settlements.info())
print(Settlements.crs)
Settlements.to_file(driver='GPKG', 
                       filename=r'Results/SETTLEMENTS.gpkg', 
                       layer='SETTLEMENTS_albers')

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13292 entries, 0 to 13291
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   13292 non-null  int64   
 1   ADM_ID    13292 non-null  int64   
 2   geometry  13292 non-null  geometry
dtypes: geometry(1), int64(2)
memory usage: 311.7 KB
None
PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["ESRI","102022"]]


In [26]:
# Saving all the final products as WGS84.
Settlements_WGS = Settlements.to_crs(4326) 
print(Settlements_WGS.info())
print(Settlements_WGS.crs)
Settlements_WGS.to_file(driver='GPKG', 
                       filename=r'Results/SETTLEMENTS.gpkg', 
                       layer='SETTLEMENTS')

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13292 entries, 0 to 13291
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   13292 non-null  int64   
 1   ADM_ID    13292 non-null  int64   
 2   geometry  13292 non-null  geometry
dtypes: geometry(1), int64(2)
memory usage: 311.7 KB
None
epsg:4326


### 5.5 Buffer the area of the Boundless dataset's latest year to mask raster data in later sections.
The Bounded dataset would also be fine for our purposes here. The buffer is dissolved to a single feature to be used for its total extents, which are identical between Bounded & Boundless datasets.

In [28]:
# Create buffer layer(s) to use as maximum distance for Near joins.

# Population buffer: 2km
Distance = 2000 # We start with the Africa Albers projection, which is in meters, before saving to file in the standard 4326.

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = Settlements[['Sett_ID', 'geometry']].copy() # .copy() avoids pandas' SettingWithCopyWarning on the assignment below.
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid repairs any invalid geometries before buffering.
print('Finished buffer layer creation. %s' % time.ctime())
BufferLayer = BufferLayer.to_crs(4326) # Saving all the final products as WGS84.
BufferFileName1 = ''.join(['Buff', str(Distance), 'm_', str(StudyEnd)])
BufferLayer.to_file(driver='GPKG', filename=r'Results/Catchment.gpkg', layer=BufferFileName1)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Thu Jan 19 10:23:57 2023
Finished buffer layer creation. Thu Jan 19 11:45:22 2023
Saved to file. Thu Jan 19 11:45:27 2023


In [29]:
# Nighttime Lights buffer: 250m
Distance = 250

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = Settlements[['Sett_ID', 'geometry']]
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid repairs any invalid geometries before they are buffered.
print('Finished buffer layer creation. %s' % time.ctime())
BufferLayer = BufferLayer.to_crs(4326) # Saving all the final products as WGS84.
BufferFileName2 = ''.join(['Buff', str(Distance), 'm_', str(StudyEnd)])
BufferLayer.to_file(driver='GPKG', filename=r'Results/Catchment.gpkg', layer=BufferFileName2)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Thu Jan 19 11:45:27 2023


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Finished buffer layer creation. Thu Jan 19 11:47:00 2023
Saved to file. Thu Jan 19 11:47:07 2023


---

## 6. PLACE NAMES
Join urban place names from UCDB, Africapolis, and GeoNames onto the settlement vectors.

### 6.1 Load placename datasets, filter, and project.

In [None]:
# Anytime we use a spatial join or work with area, 
# my preference is to keep it in a planar, equal area, meters projection. So we'll load as the Africa Albers.
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_albers')
Settlements['Area2015'] = Settlements['geometry'].area / 10**6 # Square meters to square kilometers.

# Load, pull name field, rename, and reproject to match the catchments CRS.
UCDB = gpd.read_file('PlaceName/GHS_STAT_UCDB2015MT_GLOBE_R2019A_V1_2.gpkg', 
                     layer=0)[['UC_NM_MN', 'geometry']].rename(
    columns={"UC_NM_MN": "UCDB_Name"}).to_crs("ESRI:102022")

Africapolis = gpd.read_file('PlaceName/AFRICAPOLIS2020.shp')[['agglosName', 'geometry']].rename(
    columns={"agglosName": "Afpl_Name"}).to_crs("ESRI:102022")

GeoNames = gpd.read_file('PlaceName/GeoNames.gpkg', 
                         layer=0)[['GeoName', 'geometry']].to_crs("ESRI:102022")

for Layer in (Settlements, UCDB, Africapolis, GeoNames):
    Layer.info() # info() prints its own summary and returns None, so it does not need to be wrapped in print().

### 6.2 Join placenames onto settlements geodataframe.

In [None]:
# We wrap it in pd.DataFrame() since the sjoin() is the last time we need the geometry.

GeoNames = pd.DataFrame(gpd.sjoin_nearest(GeoNames, Settlements, 
                             how='left', distance_col="distGN", max_distance=250, 
                             lsuffix="G3", rsuffix="GN")).drop(columns='geometry')
Africapolis = pd.DataFrame(gpd.sjoin_nearest(Africapolis, Settlements, 
                             how='left', distance_col="distAF", max_distance=250,
                             lsuffix="G3", rsuffix="Af")).drop(columns='geometry')
UCDB = pd.DataFrame(gpd.sjoin_nearest(UCDB, Settlements, 
                             how='left', distance_col="distUC", max_distance=250,
                             lsuffix="G3", rsuffix="UC")).drop(columns='geometry')

In [None]:
print(GeoNames.info())
print(Africapolis.info())
print(UCDB.info())

In [None]:
alldatasets = [pd.DataFrame(Settlements).drop(columns='geometry'),
               Africapolis[['Sett_ID', 'Afpl_Name', 'distAF']], 
               GeoNames[['Sett_ID', 'GeoName', 'distGN']],
               UCDB[['Sett_ID', 'UCDB_Name', 'distUC']]]

SettlementsNamed = reduce(lambda left,right: pd.merge(left,right,on=['Sett_ID'], how='left'), alldatasets)
SettlementsNamed[['Afpl_Name', 'GeoName', 'UCDB_Name']] = SettlementsNamed[['Afpl_Name', 'GeoName', 'UCDB_Name']].fillna('UNK')

# Replace NaN values with a countable distance.
SettlementsNamed[['distAF', 'distGN', 'distUC']] = SettlementsNamed[['distAF', 'distGN', 'distUC']].fillna(-1)

In [None]:
print(SettlementsNamed.info())
print(SettlementsNamed.sample(10))

In [None]:
del UCDB, Africapolis, GeoNames

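The nearest joins should prevent duplicated rows, but sjoin_nearest() returns one row per match when a feature in the left dataframe intersects, or is equidistant from, more than one feature in the right dataframe. Two of our placename sources are polygons, so this can happen. Several names can also be nearest to the same settlement, which repeats that Sett_ID after the merge.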

In [None]:
SettlementsNamed[SettlementsNamed.duplicated('Sett_ID', keep=False)]

In [None]:
SettlementsNamed.drop_duplicates(subset=['Sett_ID'], inplace=True, keep='first')
SettlementsNamed.info() # Range of entries should be the same as original Settlements file.

### 6.3 Reduce to single name column.

In [None]:
# Determine which source has a name geometrically closest to the settlement.
# Missing distances were set to -1 earlier, so swap them for infinity here; otherwise a missing source 
# would always look closest. In the event of a tie, i.e. when more than one source is 0.0 meters 
# from the settlement, idxmin() takes the value from the first column. 
# Rows with no name at all resolve to distAF, whose name is already "UNK".
SettlementsNamed['SettName'] = "UNK"
SettlementsNamed['closest'] = SettlementsNamed[['distAF', 'distGN', 'distUC']].replace(-1, np.inf).idxmin(axis=1)
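As a small illustration of picking the closest name source per settlement, the sketch below uses made-up distances and the same -1 placeholder for "no name found". It treats the placeholder as infinitely far away and then takes the column with the smallest distance in each row. The table values are assumptions for demonstration only.

```python
import numpy as np
import pandas as pd

# Illustrative distances (meters) from each settlement to its joined name,
# with -1 marking "no name found".
named = pd.DataFrame({
    "Sett_ID": [1, 2, 3, 4],
    "distAF": [0.0, -1.0, 120.0, -1.0],
    "distGN": [35.0, 80.0, 10.0, -1.0],
    "distUC": [0.0, -1.0, -1.0, -1.0],
})
dist_cols = ["distAF", "distGN", "distUC"]

# Treat the -1 placeholder as infinitely far away so it can never be the minimum,
# then take the column holding the smallest distance in each row.
distances = named[dist_cols].replace(-1, np.inf)
named["closest"] = distances.idxmin(axis=1)

print(named[["Sett_ID", "closest"]])
```

Settlement 1 is a tie at 0.0 m and resolves to the first column. Settlement 4 has no name from any source and also falls back to the first column, where the name would be the "UNK" placeholder anyway.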

In [None]:
SettlementsNamed.sample(20)

In [None]:
# Create a single name column where non-named settlements are "UNK" but all others use one of the three name sources.
SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distAF", 
    'SettName'] = SettlementsNamed['Afpl_Name']

SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distUC", 
    'SettName'] = SettlementsNamed['UCDB_Name']

SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distGN", 
    'SettName'] = SettlementsNamed['GeoName']

In [None]:
SettlementsNamed.sample(20)

### 6.4 Make sure place name is unique by stripping smaller localities of duplicated names.

In [None]:
Dupes = SettlementsNamed[ 
    (SettlementsNamed['SettName'] != 'UNK') & 
    (SettlementsNamed.duplicated('SettName', keep=False)) ] # keep=False is necessary to retain *all* duplicates, not just first or last in each group.

print("Number of named settlements: %s" % (SettlementsNamed['SettName'] != 'UNK').sum()) # Exact match, so names that merely contain "UNK" are still counted.
print("Number of named settlements where name is duplicated at least once: %s" % len(Dupes))

In [None]:
Largest = Dupes.loc[Dupes.groupby(["SettName"])["Area2015"].idxmax()]
print(Largest)

In [None]:
# Filter to settlements which have a duplicated name and are not the largest of those with that name, then replace with UNK.
SettlementsNamed.loc[(~SettlementsNamed.Sett_ID.isin(Largest.Sett_ID)) 
                     & (SettlementsNamed.Sett_ID.isin(Dupes.Sett_ID)), 
                     'SettName'] = 'UNK'
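The de-duplication rule in this section (the largest settlement keeps a duplicated name, the others revert to "UNK") can be sketched on a toy table. The names and areas below are made up for illustration.

```python
import pandas as pd

# Illustrative settlements; names and areas are made up.
named = pd.DataFrame({
    "Sett_ID": [1, 2, 3, 4, 5],
    "SettName": ["Alpha", "Alpha", "Beta", "UNK", "UNK"],
    "Area2015": [12.0, 3.5, 7.0, 0.4, 0.9],
})

# Every named row whose name appears more than once.
is_named = named["SettName"] != "UNK"
dupes = named[is_named & named.duplicated("SettName", keep=False)]

# Within each duplicated name, the row with the largest area keeps the name.
largest = dupes.loc[dupes.groupby("SettName")["Area2015"].idxmax()]

# All other duplicates fall back to the "UNK" placeholder.
strip = named["Sett_ID"].isin(dupes["Sett_ID"]) & ~named["Sett_ID"].isin(largest["Sett_ID"])
named.loc[strip, "SettName"] = "UNK"

print(named)
```

Only settlement 2 loses its name here, because it shares "Alpha" with the larger settlement 1.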

In [None]:
# Second number should now be zero.

print("Number of named settlements: %s" % (SettlementsNamed['SettName'] != 'UNK').sum())
print("Number of named settlements where name is duplicated at least once: %s" % len(SettlementsNamed[ 
    (SettlementsNamed['SettName'] != 'UNK') & 
    (SettlementsNamed.duplicated('SettName', keep=False)) ]))

In [None]:
print(SettlementsNamed.info(), SettlementsNamed[SettlementsNamed['SettName'] != "UNK"].sample(20))

In [None]:
# Drop extra columns and save to file.
SettlementsNamed = SettlementsNamed[['Sett_ID', 'SettName']]
SettlementsNamed.to_csv(r'Results/PlaceNames.csv')

In [None]:
del SettlementsNamed

---

## 7. CREATE FRAGMENTATION INDEX
For each year, we determine what percentage of a settlement's area lies outside the administrative zone that holds its largest piece.
The index ranges from 0 to 100, i.e. the percent of the settlement area which is fragmented.

For each Sett_ID:
((Area of Boundless settlement - Area of largest Bounded settlement feature) / Area of Boundless settlement) * 100
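A worked example of the formula above, as a minimal sketch in plain Python. The function name and the sample areas are hypothetical; the zero-area fallback and the truncation to a whole percent are assumptions chosen to mirror how the calculation cell later fills undefined ratios with 0 and casts to integer.

```python
def fragmentation_index(boundless_area, largest_bounded_area):
    """Percent of a settlement's area lying outside its largest bounded fragment."""
    if boundless_area <= 0:
        # No settlement area in this year: report 0 rather than dividing by zero.
        return 0
    share_outside = (boundless_area - largest_bounded_area) / boundless_area
    return int(share_outside * 100)  # truncated to a whole percent


# A 10 sq km settlement whose largest single-zone fragment covers 7.5 sq km.
print(fragmentation_index(10.0, 7.5))
# A settlement contained entirely within one administrative zone.
print(fragmentation_index(4.0, 4.0))
```

A settlement that sits wholly inside one zone scores 0, and the score rises as more of its area spills into neighboring zones.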

### 7.1 Load boundless and bounded cumulative settlements and clean.

In [None]:
BoundlessAreas = pd.read_csv(os.path.join(ResultsFolder, ('Areas%sto%s.csv' % (StudyStart, StudyEnd))))
print('Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. %s' 
      % time.ctime())
print(BoundlessAreas.info())

BoundedAreas = pd.read_csv(os.path.join(ResultsFolder, ('Areas%sto%s_%s.csv' % (StudyStart, StudyEnd, 'Bounded'))))
print('Loaded Bounded dataset, which will factor into the fragmentation calculation. %s' % time.ctime())
print(BoundedAreas.info())

In [None]:
LargestFragments = BoundedAreas.loc[BoundedAreas.groupby(["Sett_ID"])["Area2015"].idxmax()] 
print(LargestFragments.info())
print("Filtered the Bounded dataset to only rows where latest year's area is largest for each Sett_ID. %s" % time.ctime())
LargestFragments.columns = LargestFragments.columns.str.replace('Area', 'Largest')
LargestFragments = LargestFragments.drop(columns=['year', 'ADM_ID'])
print("Renamed columns to avoid duplication during merge, and dropped unnecessary columns. %s" % time.ctime())
FragIndices = BoundlessAreas.merge(LargestFragments, how='left', on='Sett_ID')
print(FragIndices.info())

In [None]:
del BoundlessAreas, BoundedAreas, LargestFragments

### 7.2 Merge and run fragmentation calculation.

In [None]:
for item in AllStudyYears:
    YY = str(item) # 4-digit year
    AreaYY = ''.join(["Area", YY]) # The Boundless area variable name
    LargestYY = ''.join(['Largest', YY]) # The Bounded largest area variable name
    FragYY = ''.join(["Frag", YY]) # Name for the fragmentation index variable
    print("Created names for Year %s's variables and temporary objects. %s" % (item, time.ctime()))
    
    FragIndices[FragYY] = ((FragIndices[AreaYY] - FragIndices[LargestYY]) / FragIndices[AreaYY]) * 100
    FragIndices[FragYY] = (FragIndices[FragYY].fillna(0).replace([np.inf, -np.inf], 0)).astype('int')
    print("Calculated fragmentation index for year %s. %s" % (item, time.ctime()))

# Remove unnecessary columns.
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Largest')]
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Area')]

print('Completed fragmentation index calculations for all years. %s' % time.ctime())
print(FragIndices.info())
print(FragIndices.sample(5))

In [None]:
FragIndices = FragIndices.drop(columns=['Unnamed: 0_x', 'Unnamed: 0_y', 'year', 'ADM_ID'])
FragIndices.to_csv(os.path.join(ResultsFolder, 'FragIndex%sto%s.csv' % (StudyStart, StudyEnd)))
print('Saved to file. %s' % time.ctime())

In [None]:
del FragIndices

---

## 8. PREPARE YEARLY DATASETS: POPULATION
This section can also serve as a template for other annualized rasters.

### 8.1 Reproject and reclassify with settlement buffer mask.
Reclassify so that we only need to work with cells within the buffer distance (2 km) of settlements.

In [None]:
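ProjCRS = gdal.WarpOptions(dstSRS='ESRI:102022')
# Skip outputs of earlier runs (Masked_*.tif, Temp_*.tif) so that only the source rasters are processed.
AnnualizedSourceFiles = [i for i in os.listdir('Population/') 
                         if i.endswith('.tif') and not i.startswith(('Masked_', 'Temp_'))]

# The buffer layer was saved in EPSG:4326, so reproject it to the rasters' target CRS before masking.
# rasterio.mask.mask() expects the mask shapes to be in the same CRS as the raster.
MaskGeom = list(gpd.read_file(r'Results/Catchment.gpkg', layer='Buff2000m_2015').to_crs('ESRI:102022').geometry) # Identify the bounding areas of the mask.
# Mask_out = './LatestYearBuffer.tif'
AnnualizedSourceFiles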

In [None]:
# This codeblock changes each annual population raster's projection (gdal.Warp()), 
# then masks it to within a specified distance of the settlements (rasterio.mask.mask()).

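for YearFile in AnnualizedSourceFiles:
    InputRasterName = os.path.join(ProjectFolder, "Population", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile)) # Keep only the digits, assuming the year is the only number in the file name.
    TempOutputName = "Temp_" + Year + "_albers.tif"
    TempOutputPath = os.path.join(ProjectFolder, "Population", TempOutputName)
    FinalOutputPath = os.path.join(ProjectFolder, "Population", ''.join(['Masked_', Year, '.tif']))
    if exists(FinalOutputPath): # If we already did both the warp and the mask, the file will be here and we can skip it.
        pass
    else:
        InputRasterObject = gdal.Open(InputRasterName)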
        # Reproject to same CRS as settlements.
        Warp = gdal.Warp(TempOutputPath, # Where to store the warped raster
                     InputRasterObject, # Which raster to warp
                     format='GTiff', 
                     options=ProjCRS) # Reproject to Africa Albers Equal Area Conic
        print('Finished gdal.Warp() for year %s. %s \n' % (Year, time.ctime()))
        
        Warp = None # Close the files
        InputRasterObject = None

        # Reclassify as nodata if outside settlement buffer zones.
        with rasterio.open(TempOutputPath) as InputRasterObject:
            MaskedOutputRaster, OutTransform = rasterio.mask.mask(
                InputRasterObject, MaskGeom, crop=True) # Anything outside the mask is reclassed to the raster's NoData value.
            OutMetaData = InputRasterObject.meta.copy()
        print('Finished rasterio.mask.mask() for year %s. %s \n' % (Year, time.ctime()))
            
        OutMetaData.update({"driver": "GTiff",
                         "height": MaskedOutputRaster.shape[1],
                         "width": MaskedOutputRaster.shape[2],
                         "transform": OutTransform})
        FinalOutputPath = os.path.join(ProjectFolder, "Population", ''.join(['Masked_', Year, '.tif'])) # ''.join([r'Population/', 'Masked_', Year, '.tif']
        with rasterio.open(FinalOutputPath, "w", **OutMetaData) as dest:
            dest.write(MaskedOutputRaster)
        print('Written to file. %s \n' % time.ctime())
    InputRasterObject = None
    
    try:  # Finally, remove the intermediate file from disk
        os.remove(TempOutputPath)
    except OSError:
        pass
    print('Removed intermediate file. %s \n' % time.ctime())

print('\n \n Finished all years in list. %s' % time.ctime())

In [None]:
print(os.listdir('Population/'))

In [None]:
AnnualizedSourceFiles = None

### 8.2 Raster values summarized by settlement.
1. Convert each annualized raster to .xyz, 
2. then bring them to vector space and assign their Sett_ID,
3. and finally, aggregate the value as appropriate to the settlement level and save table to file.

XYZ is similar to .csv. Raster cell centers are stored as x and y, and their value is stored as z.
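To make the XYZ layout concrete, here is a minimal sketch that parses a tiny in-memory XYZ table, drops NoData cells, and sums the values per settlement. The coordinates, values, and Sett_ID assignments are made up; in the notebook the IDs come from a nearest-neighbor spatial join rather than a hard-coded list, and -99999 is used only because it matches the NoData value set in the next cell.

```python
import io

import pandas as pd

NO_DATA = -99999  # NoData marker assumed for this illustration

# A tiny XYZ file: a header line, then one "x y value" row per raster cell center.
xyz_text = """X Y Z
1000.0 2000.0 12.5
1100.0 2000.0 -99999
1000.0 1900.0 3.0
1100.0 1900.0 4.5
"""

cells = pd.read_csv(io.StringIO(xyz_text), sep=r"\s+")
cells = cells[cells["Z"] != NO_DATA].copy()  # keep only cells that carry a value

# Hypothetical cell-to-settlement assignment for the three remaining cells.
cells["Sett_ID"] = [101, 101, 102]

pop_by_settlement = cells.groupby("Sett_ID", as_index=False)["Z"].sum()
print(pop_by_settlement)
```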

In [None]:
NoDataVal = -99999 
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_albers') # Albers layer, to match the CRS of the raster-derived points below.
AllSummaries = pd.DataFrame(Settlements).drop(columns='geometry')

AnnualizedMaskedFiles = [i for i in os.listdir('Population/') if i.startswith('Masked') and i.endswith('.tif')]
AnnualizedMaskedFiles

In [None]:
for YearFile in AnnualizedMaskedFiles:
    
### STEP 1: TIF TO XYZ ###
    InputRasterName = os.path.join(ProjectFolder, "Population", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    print('Loading data for year %s. %s \n' % (Year, time.ctime()))
    InputRasterObject = gdal.Open(InputRasterName)
    XYZOutputPath = r'Population/{}'.format(
        YearFile.replace('.tif', '.xyz')) # New file path will be the same as original, but .tif is replaced with .xyz
    
    # Create an .xyz version of the .tif
    XYZ = gdal.Translate(XYZOutputPath, # Specify a destination path
                         InputRasterObject, # Input is the masked .tif file
                         format='XYZ', 
                         creationOptions=["ADD_HEADER_LINE=YES"])
    print('Finished gdal.Translate() for year %s. %s \n' % (Year, time.ctime()))

#     # Remove the temporary masked tif file.
#     try:  
#         os.remove(InputRasterName)
#     except OSError:
#         pass
#     print('Removed (or skipped if error) intermediate tif file. %s \n' % time.ctime())
    
    InputRasterObject = None
    XYZ = None # Close the dataset so the .xyz file is flushed to disk before it is reloaded below.

    
### STEP 2: GENERATE GEODATAFRAME WITH SETT_ID FIELD ###
    InputXYZName = ''.join(['Masked_', Year, '.xyz'])
    InputXYZ = pd.read_table(os.path.join(ProjectFolder, 'Population', InputXYZName), sep=r'\s+') # delim_whitespace is deprecated in recent pandas; sep=r'\s+' is equivalent.
    InputXYZ = InputXYZ.loc[InputXYZ['Z'] != NoDataVal] # Subset to only the features that have a raster value.
    print('Loaded XYZ file as a pandas dataframe, year %s. %s \n' % (Year, time.ctime()))
    ValObject = gpd.GeoDataFrame(InputXYZ,
                                 geometry = gpd.points_from_xy(InputXYZ['X'], InputXYZ['Y']),
                                 crs = 'ESRI:102022')
    print('Created geodataframe from non-NoData points, year %s. %s \n' % (Year, time.ctime()))
    del InputXYZ
    
    # Sjoin_nearest: No need to group by ADM this time. 
    ValObject_withID = gpd.sjoin_nearest(ValObject, 
                                    Settlements, 
                                    how='left') # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.
    
    print('\nJoined settlement ID onto vectorized raster cells for year %s. %s \n' % (Year, time.ctime()))
    print(ValObject_withID.sample(10))
    del ValObject
    
    # We no longer need the spatial information of the raster values because we have their unique settlement ID.
    ValObject_withID = pd.DataFrame(ValObject_withID).drop(columns='geometry')
    
    ValObject_withID.to_csv(''.join([r'Population/', 'Masked_', Year, '.csv']))
    print('\nExported as table, year %s. %s \n' % (Year, time.ctime()))
    
    # Remove the temporary xyz file.
    try:  
        os.remove(os.path.join(ProjectFolder, 'Population', InputXYZName))
    except OSError:
        pass
    print('Removed (or skipped if error) intermediate xyz file. %s \n' % time.ctime())

    

### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###
    VariableName = ''.join(['PopSum', Year])
    
    ValAggregated = ValObject_withID.groupby('Sett_ID', 
                                      as_index=False)['Z'].sum().rename(columns={"Z": VariableName})
    print('\nValues aggregated to settlement level, year %s. %s \n' % (Year, time.ctime()))
    print(ValAggregated.sample(10))
    
    AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
    print('\nMerged year %s onto latest year settlement feature layer. %s \n' % (Year, time.ctime()))
    print(AllSummaries.sample(10))
    
    del ValObject_withID, ValAggregated
    print('\n\n')
    

print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())

AllSummaries.to_csv(os.path.join(ResultsFolder, 'Pop%sto%s.csv' % (2000, 2015)))
print('Saved to file. %s \n' % time.ctime())

In [None]:
# Check contents
AllSummaries.sort_values('PopSum2010', ascending=False).head(20)

---

## 9. PREPARE YEARLY DATASETS: NIGHTTIME LIGHTS

### 9.1 Reclassify with settlement buffer mask.
Reclassify so that we only need to work with cells within the buffer distance (250 m) of settlements. The two NTL sources were already reprojected and cropped to Central & Western Africa in a separate script.

In [3]:
ProjCRS = gdal.WarpOptions(dstSRS='EPSG:4326')
with fiona.open(r'Results/Catchment.gpkg', mode="r", layer="Buff250m_2015") as shapefile:
    MaskGeom = [feature["geometry"] for feature in shapefile] # Identify the bounding areas of the mask.
# Mask_out = './LatestYearBuffer.tif'

In [4]:
AnnualizedSourceFiles = [i for i in os.listdir('NTL/SourceFiles/') if i.endswith('tif')]
    
AnnualizedSourceFiles

['D_1999_avg.tif',
 'D_1999_cfc.tif',
 'D_2000_avg.tif',
 'D_2000_cfc.tif',
 'D_2001_avg.tif',
 'D_2001_cfc.tif',
 'D_2002_avg.tif',
 'D_2002_cfc.tif',
 'D_2003_avg.tif',
 'D_2003_cfc.tif',
 'D_2004_avg.tif',
 'D_2004_cfc.tif',
 'D_2005_avg.tif',
 'D_2005_cfc.tif',
 'D_2006_avg.tif',
 'D_2006_cfc.tif',
 'D_2007_avg.tif',
 'D_2007_cfc.tif',
 'D_2008_avg.tif',
 'D_2008_cfc.tif',
 'D_2009_avg.tif',
 'D_2009_cfc.tif',
 'D_2010_avg.tif',
 'D_2010_cfc.tif',
 'D_2011_avg.tif',
 'D_2011_cfc.tif',
 'D_2012_avg.tif',
 'D_2012_cfc.tif',
 'D_2013_avg.tif',
 'D_2013_cfc.tif',
 'V_2012_avg.tif',
 'V_2012_cfc.tif',
 'V_2013_avg.tif',
 'V_2013_cfc.tif',
 'V_2014_avg.tif',
 'V_2014_cfc.tif',
 'V_2015_avg.tif',
 'V_2015_cfc.tif']
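The file names listed above appear to follow a sensor_year_indicator pattern (D and V presumably standing for DMSP and VIIRS). As a hedged alternative to fixed-position slicing, the sketch below parses that pattern with a regular expression and fails loudly on anything unexpected. The pattern and the helper name parse_ntl_name are assumptions inferred from the listing, not part of the notebook.

```python
import re

# Pattern assumed from the file names listed above: <sensor>_<year>_<indicator>.tif
NAME_PATTERN = re.compile(r"^(?P<sensor>[DV])_(?P<year>\d{4})_(?P<indicator>avg|cfc)\.tif$")


def parse_ntl_name(file_name):
    """Split an NTL file name into (sensor, year, indicator), or raise ValueError."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError("Unexpected NTL file name: %s" % file_name)
    return match.group("sensor"), match.group("year"), match.group("indicator")


print(parse_ntl_name("D_1999_avg.tif"))
print(parse_ntl_name("V_2015_cfc.tif"))
```

Compared with slicing, this costs a few more lines but catches a stray file in the folder before it is silently mislabeled.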

In [5]:
# This codeblock changes each annual raster's projection (gdal.Warp()), 
# then masks it to within a specified distance of the settlements (rasterio.mask.mask()).

for YearFile in AnnualizedSourceFiles:
    
    InputRasterPath = os.path.join(ProjectFolder, "NTL/SourceFiles", YearFile)
    Sensor = YearFile[:1] # First character of the file name: 'D' or 'V'.
    Year = YearFile[2:6] # Third through sixth characters: the four-digit year.
    # Slices include the start index and exclude the end index, so [2:6] covers positions 2, 3, 4, and 5.
    
    if YearFile.endswith('avg.tif'):
        IndicType = '_avg'
    else:
        IndicType = '_cfc'
        
    TempOutputName = 'Temp_' + Sensor + '_' + Year + IndicType + '.tif'
    TempOutputPath = os.path.join(ProjectFolder, "NTL", TempOutputName)
    FinalOutputName = 'Msk_' + Sensor + '_' + Year + IndicType + '.tif'
    FinalOutputPath = os.path.join(ProjectFolder, "NTL", FinalOutputName)
    
    if exists(FinalOutputPath): # If we already did both the warp and the mask, the file will be here and we can skip it.
        pass
    else:
        # Reproject to WGS 84, the CRS of the buffer mask, unless the raster is already in it.
        InputRasterObject = gdal.Open(InputRasterPath)
        
        if osr.SpatialReference(wkt=InputRasterObject.GetProjection()).GetAttrValue('geogcs') != 'WGS 84':
            Warp = gdal.Warp(TempOutputPath, # Where to store the warped raster
                         InputRasterObject, # Which raster to warp
                         format='GTiff', 
                         options=ProjCRS) # Reproject to WGS 84 (EPSG:4326)
            print('Finished gdal.Warp() for %s. %s \n' % (YearFile, time.ctime()))

            Warp = None # Close the files
        else:
            pass
        
        InputRasterObject = None


        # Reclassify as nodata if outside settlement buffer zones.
        if exists(TempOutputPath):
            NewInputPath = TempOutputPath # If we warped the data, then use that file for next step.
        else:
            NewInputPath = InputRasterPath # Otherwise, we must have skipped the warp and can use the source file.
            
        with rasterio.open(NewInputPath) as InputRasterObject:
            MaskedOutputRaster, OutTransform = rasterio.mask.mask(
                InputRasterObject, MaskGeom, crop=True) # Anything outside the mask is reclassed to the raster's NoData value.
            OutMetaData = InputRasterObject.meta.copy()
        print('Finished rasterio.mask.mask() for %s. %s \n' % (YearFile, time.ctime()))
            
        OutMetaData.update({"driver": "GTiff",
                         "height": MaskedOutputRaster.shape[1],
                         "width": MaskedOutputRaster.shape[2],
                         "transform": OutTransform})

        with rasterio.open(FinalOutputPath, "w", **OutMetaData) as dest:
            dest.write(MaskedOutputRaster)
        print('Written to file. %s \n' % time.ctime())
        
    InputRasterObject = None
    
    if exists(TempOutputPath):
        try:  # Finally, remove the intermediate file from disk
            os.remove(TempOutputPath)
        except OSError:
            pass
        print('Removed intermediate file. %s \n' % time.ctime())
    else:
        pass


print('\n \n Finished all years in list. %s' % time.ctime())

Finished rasterio.mask.mask() for D_1999_avg.tif. Thu Jan 19 17:21:34 2023 

Written to file. Thu Jan 19 17:21:34 2023 

Finished rasterio.mask.mask() for D_1999_cfc.tif. Thu Jan 19 17:21:36 2023 

Written to file. Thu Jan 19 17:21:36 2023 

Finished rasterio.mask.mask() for D_2000_avg.tif. Thu Jan 19 17:21:38 2023 

Written to file. Thu Jan 19 17:21:39 2023 

Finished rasterio.mask.mask() for D_2000_cfc.tif. Thu Jan 19 17:21:41 2023 

Written to file. Thu Jan 19 17:21:41 2023 

Finished rasterio.mask.mask() for D_2001_avg.tif. Thu Jan 19 17:21:43 2023 

Written to file. Thu Jan 19 17:21:43 2023 

Finished rasterio.mask.mask() for D_2001_cfc.tif. Thu Jan 19 17:21:45 2023 

Written to file. Thu Jan 19 17:21:45 2023 

Finished rasterio.mask.mask() for D_2002_avg.tif. Thu Jan 19 17:21:47 2023 

Written to file. Thu Jan 19 17:21:47 2023 

Finished rasterio.mask.mask() for D_2002_cfc.tif. Thu Jan 19 17:21:49 2023 

Written to file. Thu Jan 19 17:21:49 2023 

Finished rasterio.mask.mask() fo

In [6]:
print(os.listdir('NTL/'))

['Msk_D_1999_avg.tif', 'Msk_D_1999_cfc.tif', 'Msk_D_2000_avg.tif', 'Msk_D_2000_cfc.tif', 'Msk_D_2001_avg.tif', 'Msk_D_2001_cfc.tif', 'Msk_D_2002_avg.tif', 'Msk_D_2002_cfc.tif', 'Msk_D_2003_avg.tif', 'Msk_D_2003_cfc.tif', 'Msk_D_2004_avg.tif', 'Msk_D_2004_cfc.tif', 'Msk_D_2005_avg.tif', 'Msk_D_2005_cfc.tif', 'Msk_D_2006_avg.tif', 'Msk_D_2006_cfc.tif', 'Msk_D_2007_avg.tif', 'Msk_D_2007_cfc.tif', 'Msk_D_2008_avg.tif', 'Msk_D_2008_cfc.tif', 'Msk_D_2009_avg.tif', 'Msk_D_2009_cfc.tif', 'Msk_D_2010_avg.tif', 'Msk_D_2010_cfc.tif', 'Msk_D_2011_avg.tif', 'Msk_D_2011_cfc.tif', 'Msk_D_2012_avg.tif', 'Msk_D_2012_cfc.tif', 'Msk_D_2013_avg.tif', 'Msk_D_2013_cfc.tif', 'Msk_V_2012_avg.tif', 'Msk_V_2012_cfc.tif', 'Msk_V_2013_avg.tif', 'Msk_V_2013_cfc.tif', 'Msk_V_2014_avg.tif', 'Msk_V_2014_cfc.tif', 'Msk_V_2015_avg.tif', 'Msk_V_2015_cfc.tif', 'SourceFiles']


In [7]:
AnnualizedSourceFiles = None

### 9.2 Raster values summarized by settlement.
1. Convert each annualized raster to a dataframe of cell centers and values, 
2. then bring them to vector space and assign their Sett_ID,
3. and finally, aggregate the value as appropriate to the settlement level and save table to file.

Unlike the population section, no intermediate .xyz file is written here: rioxarray reads each raster directly into a dataframe, with cell centers stored as x and y and the cell value stored as value.
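The aggregation step can be sketched compactly with a single pandas groupby that computes several statistics at once. The cell values and the column suffix below are illustrative assumptions; the prefixes follow the naming used in the aggregation cell further down.

```python
import pandas as pd

# Illustrative cell values already tagged with a settlement ID.
cells = pd.DataFrame({
    "Sett_ID": [1, 1, 1, 2, 2],
    "value": [2.0, 4.0, 6.0, 10.0, 30.0],
})

# Hypothetical column suffix built from sensor, indicator type, and year.
suffix = "D_avg1999"

summary = cells.groupby("Sett_ID")["value"].agg(
    ["count", "sum", "mean", "max", "min"]
).rename(columns={
    "count": "NTLct_" + suffix,
    "sum": "NTLsum_" + suffix,
    "mean": "NTLavg_" + suffix,
    "max": "NTLmax_" + suffix,
    "min": "NTLmin_" + suffix,
}).reset_index()

print(summary)
```

One pass over the grouped values yields all five statistics, so the result can be merged onto the summaries table in a single step per file.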

In [33]:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_albers')[['Sett_ID', 'geometry']]
AllSummaries = pd.DataFrame(Settlements).drop(columns='geometry') # Create an ID-only dataframe on which to merge results.

AnnualizedMaskedFiles = [i for i in os.listdir('NTL') if i.endswith('.tif')]
AnnualizedMaskedFiles

['Msk_D_1999_avg.tif',
 'Msk_D_1999_cfc.tif',
 'Msk_D_2000_avg.tif',
 'Msk_D_2000_cfc.tif',
 'Msk_D_2001_avg.tif',
 'Msk_D_2001_cfc.tif',
 'Msk_D_2002_avg.tif',
 'Msk_D_2002_cfc.tif',
 'Msk_D_2003_avg.tif',
 'Msk_D_2003_cfc.tif',
 'Msk_D_2004_avg.tif',
 'Msk_D_2004_cfc.tif',
 'Msk_D_2005_avg.tif',
 'Msk_D_2005_cfc.tif',
 'Msk_D_2006_avg.tif',
 'Msk_D_2006_cfc.tif',
 'Msk_D_2007_avg.tif',
 'Msk_D_2007_cfc.tif',
 'Msk_D_2008_avg.tif',
 'Msk_D_2008_cfc.tif',
 'Msk_D_2009_avg.tif',
 'Msk_D_2009_cfc.tif',
 'Msk_D_2010_avg.tif',
 'Msk_D_2010_cfc.tif',
 'Msk_D_2011_avg.tif',
 'Msk_D_2011_cfc.tif',
 'Msk_D_2012_avg.tif',
 'Msk_D_2012_cfc.tif',
 'Msk_D_2013_avg.tif',
 'Msk_D_2013_cfc.tif',
 'Msk_V_2012_avg.tif',
 'Msk_V_2012_cfc.tif',
 'Msk_V_2013_avg.tif',
 'Msk_V_2013_cfc.tif',
 'Msk_V_2014_avg.tif',
 'Msk_V_2014_cfc.tif',
 'Msk_V_2015_avg.tif',
 'Msk_V_2015_cfc.tif']

In [34]:
AllSummaries

|  | Sett_ID |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| ... | ... |
| 13287 | 201743 |
| 13288 | 201776 |
| 13289 | 201797 |
| 13290 | 201799 |


In [35]:
for YearFile in AnnualizedMaskedFiles:
    
### STEP 1: TIF TO DATAFRAME OBJECT ###
    InputRasterPath = os.path.join(ProjectFolder, "NTL", YearFile)
    Sensor = YearFile[4:5]
    Year = YearFile[6:10]
    
    print('Loading data for %s. %s \n' % (YearFile, time.ctime()))
    
    ValDF = rioxarray.open_rasterio(InputRasterPath).to_dataframe("value").reset_index()
    ValDF = ValDF[ValDF.value < 100000] # Excluding NoData values by choosing a number far greater than the highest NTL value.
    ValDF = ValDF[ValDF.value > 0] # Keep only cells with a positive value; zeros are excluded from every statistic below.

    print('Created pandas dataframe from %s. %s \n' % (YearFile, time.ctime()))
    
    
### STEP 2: GENERATE GEODATAFRAME WITH SETT_ID FIELD ###
    ValDF = gpd.GeoDataFrame(ValDF,
                              geometry = gpd.points_from_xy(ValDF['x'], ValDF['y']),
                              crs = 'EPSG:4326')[['value', 'geometry']]
    print('Created geodataframe from non-NoData points, %s. %s \n' % (YearFile, time.ctime()))

    # Remember to reproject our new raster-derived geodataframe into the same equal area projection as Settlements.
    ValDF = ValDF.to_crs("ESRI:102022")
    
    # Sjoin_nearest: No need to group by ADM this time. 
    ValDF_withID = gpd.sjoin_nearest(ValDF, 
                                    Settlements, 
                                    how='left') # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.
    
    print('\nJoined settlement ID onto vectorized raster cells for %s. %s \n' % (YearFile, time.ctime()))
    try:
        print(ValDF_withID.sample(10))
    except ValueError:
        pass
    del ValDF
    
    # We no longer need the spatial information of the raster values because we have their unique settlement ID.
    ValDF_withID = pd.DataFrame(ValDF_withID).drop(columns='geometry')
    
    ValDF_withID.to_csv(''.join([r'NTL/', YearFile.replace('.tif', '.csv')]))
    print('\nExported as table. %s. %s \n' % (YearFile, time.ctime()))

    
### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###
 
    if YearFile.endswith('_avg.tif'):
        IndicType = '_avg'
    else:
        IndicType = '_cfc'

        
    if IndicType == '_avg':
        # Cell count
        VariableName = ''.join(['NTLct_', Sensor, IndicType, Year])
        ValAggregated = ValDF_withID[
            ValDF_withID['value'].notna()].groupby(
            'Sett_ID', as_index=False)['value'].count().rename(columns={'value': VariableName})
        print('\nCells per settlement counted, year %s. %s \n' % (Year, time.ctime()))
        try:
            print(ValAggregated.sample(10))
        except ValueError:
            pass
        AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
        del VariableName, ValAggregated
    
        # Sum
        VariableName = ''.join(['NTLsum_', Sensor, IndicType, Year])
        ValAggregated = ValDF_withID[
            ValDF_withID['value'].notna()].groupby(
            'Sett_ID', as_index=False)['value'].sum().rename(columns={'value': VariableName})
        print('\nValues summed to settlement level, year %s. %s \n' % (Year, time.ctime()))
        try:
            print(ValAggregated.sample(10))
        except ValueError:
            pass
        AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
        del VariableName, ValAggregated
    
        # Average
        VariableName = ''.join(['NTLavg_', Sensor, IndicType, Year])
        ValAggregated = ValDF_withID[
            ValDF_withID['value'].notna()].groupby(
            'Sett_ID', as_index=False)['value'].mean().rename(columns={'value': VariableName})
        print('\nValues averaged to settlement level, year %s. %s \n' % (Year, time.ctime()))
        try:
            print(ValAggregated.sample(10))
        except ValueError:
            pass
        AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
        print('\nMerged year %s onto latest year settlement feature layer. %s \n' % (Year, time.ctime()))
        del VariableName, ValAggregated
        
        # Maximum
        VariableName = ''.join(['NTLmax_', Sensor, IndicType, Year])
        ValAggregated = ValDF_withID[
            ValDF_withID['value'].notna()].groupby(
            'Sett_ID', as_index=False)['value'].max().rename(columns={'value': VariableName})
        print('\nMax value of available pixels assigned at settlement level, year %s. %s \n' % (Year, time.ctime()))
        try:
            print(ValAggregated.sample(10))
        except ValueError:
            pass
        AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
        print('\nMerged year %s onto latest year settlement feature layer. %s \n' % (Year, time.ctime()))
        del VariableName, ValAggregated
        
        # Minimum
        VariableName = ''.join(['NTLmin_', Sensor, IndicType, Year])
        ValAggregated = ValDF_withID[
            ValDF_withID['value'].notna()].groupby(
            'Sett_ID', as_index=False)['value'].min().rename(columns={'value': VariableName})
        print('\nMin value of available pixels assigned at settlement level, year %s. %s \n' % (Year, time.ctime()))
        try:
            print(ValAggregated.sample(10))
        except ValueError:
            pass
        AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
        print('\nMerged year %s onto latest year settlement feature layer. %s \n' % (Year, time.ctime()))
        del VariableName, ValAggregated


    else:
        # Average of cloud-free observations
        VariableName = ''.join(['NTLcfc_', Sensor, '_', Year])
        ValAggregated = ValDF_withID[
            ValDF_withID['value'].notna()].groupby(
            'Sett_ID', as_index=False)['value'].mean().rename(columns={'value': VariableName})
        print('\nCount of cloud-free observations averaged to settlement level, year %s. %s \n' % (Year, time.ctime()))
        try:
            print(ValAggregated.sample(10))
        except ValueError:
            pass
        AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
        print('\nMerged year %s onto latest year settlement feature layer. %s \n' % (Year, time.ctime()))
        del VariableName, ValAggregated
    
    print(AllSummaries.sample(10))

    

print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())

AllSummaries.to_csv(os.path.join(ResultsFolder, 'NTL%sto%s.csv' % (1999, 2015)))
print('Saved to file. %s \n' % time.ctime())

Loading data for Msk_D_1999_avg.tif. Thu Jan 19 19:22:44 2023 

Created pandas dataframe from Msk_D_1999_avg.tif. Thu Jan 19 19:22:44 2023 

Created geodataframe from non-NoData points, Msk_D_1999_avg.tif. Thu Jan 19 19:22:44 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_1999_avg.tif. Thu Jan 19 19:22:49 2023 

             value                          geometry  index_right  Sett_ID
839228    4.605854   POINT (-1534636.267 618905.242)         7013    49131
833735    3.138605   POINT (-1563253.200 624621.214)         7280    51594
83561     2.417447  POINT (-1067902.905 1430526.678)        12074   191358
914267    3.116667   POINT (-1171378.545 540260.867)         9780    74882
970209   57.097740   POINT (-1584812.137 476873.424)         3026    12521
1039554   2.958242   POINT (-1424251.015 402951.542)         4701    22721
898496    3.903226   POINT (-1432068.881 555601.140)         8372    63000
826509    3.833730   POINT (-1516534.952 632778.194)         7015


Joined settlement ID onto vectorized raster cells for Msk_D_2000_avg.tif. Thu Jan 19 19:24:13 2023 

             value                         geometry  index_right  Sett_ID
1067798   2.551342  POINT (-1394657.008 372545.426)         4710    22731
1049566   3.027961  POINT (-1422457.221 392113.396)         4687    22707
606667    4.541837  POINT (-1190308.817 871857.747)        10942   133991
968379   41.890804  POINT (-1593480.324 478787.304)         3087    12883
970200   48.277779  POINT (-1592601.928 476822.713)         3087    12883
875087    3.168706  POINT (-1214876.909 582357.504)         9829    74961
1007747   4.625306  POINT (-1387240.562 437672.994)         5119    24517
715449    3.980515  POINT (-1551939.042 752315.677)        10152    85422
1041744   2.640152  POINT (-1104090.980 402638.141)         6303    38127
1087823   2.724042  POINT (-1390207.483 350864.082)         1337     3809

Exported as table. Msk_D_2000_avg.tif. Thu Jan 19 19:24:13 2023 


Cells per settle


Joined settlement ID onto vectorized raster cells for Msk_D_2001_avg.tif. Thu Jan 19 19:25:30 2023 

             value                          geometry  index_right  Sett_ID
997720   29.755117   POINT (-1402014.539 448430.726)         3884    19120
1110544   2.741602   POINT (-1415151.796 326047.248)          998     2741
815581    2.873044   POINT (-1523537.194 644526.118)         7015    49133
794644    2.946522   POINT (-1529741.588 667080.734)         7786    56245
32564     3.079179  POINT (-1100294.417 1484210.512)        12226   191801
1183819   2.757239   POINT (-1003950.914 249088.582)         2034     6320
811906    2.853117   POINT (-1553876.630 648264.913)         7398    51744
1094898   5.977622   POINT (-1567498.887 341894.420)          158      448
38005     2.804409  POINT (-1116781.150 1478373.823)        12208   191757
809193    2.818343   POINT (-1539170.817 651305.321)         7785    56244

Exported as table. Msk_D_2001_avg.tif. Thu Jan 19 19:25:30 2023 


Cells


Exported as table. Msk_D_2001_cfc.tif. Thu Jan 19 19:26:46 2023 


Count of cloud-free observations averaged to settlement level, year 2001. Thu Jan 19 19:26:46 2023 

       Sett_ID  NTLcfc_D_2001
7243     54805      48.000000
5573     33159     253.638889
4460     22687     255.000000
5572     33156     255.000000
2055      6416     255.000000
9656     97910     255.000000
933       2717     255.000000
5935     36525     243.476190
10462   146039     251.920635
2398      7825     211.333333

Merged year 2001 onto latest year settlement feature layer. Thu Jan 19 19:26:46 2023 

       Sett_ID  NTLct_D_avg1999  NTLsum_D_avg1999  NTLavg_D_avg1999  \
6052     36262              1.0          2.950807          2.950807   
2075      6361              2.0          5.629714          2.814857   
2395      7656              NaN               NaN               NaN   
10467   104520              1.0          3.126068          3.126068   
4458     20788              NaN               NaN         

Created pandas dataframe from Msk_D_2002_cfc.tif. Thu Jan 19 19:26:48 2023 

Created geodataframe from non-NoData points, Msk_D_2002_cfc.tif. Thu Jan 19 19:26:48 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2002_cfc.tif. Thu Jan 19 19:28:03 2023 

         value                          geometry  index_right  Sett_ID
607002     255   POINT (-1688785.846 867944.087)        10193    88984
1139887    255   POINT (-1222098.628 295499.399)         1846     5226
103149     255  POINT (-1443147.877 1407428.432)        11757   175762
888903     255   POINT (-1071071.495 568275.000)         6409    38382
90258      255  POINT (-1574428.008 1420096.938)        11729   175220
1113871    255   POINT (-1685867.779 320377.813)            7        9
160770     255  POINT (-1190021.791 1348063.810)        11945   190217
578582     255   POINT (-1082076.194 902668.315)        10998   136073
39562      255  POINT (-1345339.257 1475304.608)        12280   192845
1110449    255   PO

Created pandas dataframe from Msk_D_2003_cfc.tif. Thu Jan 19 19:28:08 2023 

Created geodataframe from non-NoData points, Msk_D_2003_cfc.tif. Thu Jan 19 19:28:08 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2003_cfc.tif. Thu Jan 19 19:29:22 2023 

        value                          geometry  index_right  Sett_ID
308731    255  POINT (-1509569.748 1188618.061)        11203   145682
50609     255  POINT (-1234908.699 1464357.258)        11979   191089
278243    255  POINT (-1117356.794 1223678.879)        11533   163730
749838    255    POINT (-928776.537 719188.158)        10652   122071
715827    255   POINT (-1224397.245 754176.682)        10566   111590
906032    255   POINT (-1210381.521 548928.383)         9814    74935
92713     255  POINT (-1022679.157 1421096.578)        11949   190914
650470    255   POINT (-1083467.696 825385.631)        10836   126226
29402     255  POINT (-1475763.689 1485117.599)        12280   192845
536829    255    POINT (-9894

Created pandas dataframe from Msk_D_2004_cfc.tif. Thu Jan 19 19:29:27 2023 

Created geodataframe from non-NoData points, Msk_D_2004_cfc.tif. Thu Jan 19 19:29:27 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2004_cfc.tif. Thu Jan 19 19:30:41 2023 

         value                          geometry  index_right  Sett_ID
1030466     49   POINT (-1413924.916 412872.680)        12963   198721
1017698    255   POINT (-1438234.822 426533.986)         3605    16152
454264     255  POINT (-1566733.418 1032742.397)        10459   104213
514354     255   POINT (-1540303.021 968567.092)        10459   104213
975400     255   POINT (-1029981.479 474919.749)         6444    38861
991264     255   POINT (-1476483.321 454890.238)         3409    14684
353317     255  POINT (-1512748.019 1141045.034)        11119   143025
724988     255   POINT (-1171492.731 744630.210)        10730   122912
818867     255   POINT (-1041955.371 644108.038)         6418    38401
340518     255  POI

Created pandas dataframe from Msk_D_2005_cfc.tif. Thu Jan 19 19:30:45 2023 

Created geodataframe from non-NoData points, Msk_D_2005_cfc.tif. Thu Jan 19 19:30:46 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2005_cfc.tif. Thu Jan 19 19:32:00 2023 

         value                          geometry  index_right  Sett_ID
7618       255  POINT (-1427227.734 1508438.781)        12280   192845
863691     255   POINT (-1627125.116 591760.265)         6558    45288
1076133    255   POINT (-1269162.089 364351.292)         6242    36757
933089     255   POINT (-1420595.646 518264.732)         4031    19304
1195544    255   POINT (-1094689.251 235855.067)         2155     6476
65108      255  POINT (-1287838.947 1448705.500)        11979   191089
1183747    255   POINT (-1066211.045 248824.268)         2155     6476
216483     255  POINT (-1013447.356 1289941.600)        11626   169885
1144704    255    POINT (-991152.656 291610.283)         2201     6934
610361     255   PO

Created pandas dataframe from Msk_D_2006_cfc.tif. Thu Jan 19 19:32:04 2023 

Created geodataframe from non-NoData points, Msk_D_2006_cfc.tif. Thu Jan 19 19:32:05 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2006_cfc.tif. Thu Jan 19 19:33:20 2023 

         value                          geometry  index_right  Sett_ID
1194216    255   POINT (-1456129.042 234952.248)          585     1713
933843     255   POINT (-1555635.097 516456.842)        12592   197134
1132600    255   POINT (-1228192.631 303367.683)         1787     5075
525939     255    POINT (-963514.446 959809.979)        11045   136271
1124688    255    POINT (-987781.130 313345.382)         2216     7025
579644     255    POINT (-950281.784 902238.814)        11045   136271
506294     255   POINT (-1427608.209 978031.277)        10620   116741
197642     255  POINT (-1570222.864 1306333.911)        11203   145682
1011356    255   POINT (-1414044.897 433576.129)        12684   197688
263182     255  POI

Created pandas dataframe from Msk_D_2007_cfc.tif. Thu Jan 19 19:33:24 2023 

Created geodataframe from non-NoData points, Msk_D_2007_cfc.tif. Thu Jan 19 19:33:25 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2007_cfc.tif. Thu Jan 19 19:34:38 2023 

         value                          geometry  index_right  Sett_ID
904437     255   POINT (-1015587.062 551784.716)         6399    38356
1040513    255   POINT (-1381847.168 402209.227)         4741    22769
160726     255  POINT (-1228241.151 1347875.124)        11933   189942
392340     255  POINT (-1605337.764 1098670.208)        10185    85519
934382     255   POINT (-1089034.922 518984.649)         6390    38341
663229     255   POINT (-1066939.245 811749.863)        10653   122072
809335     255   POINT (-1416177.183 652049.505)         5697    28436
1137800    255   POINT (-1453035.204 296207.528)          634     1781
1166991    255   POINT (-1391447.523 264965.242)          834     2526
471708     255  POI

Created pandas dataframe from Msk_D_2008_cfc.tif. Thu Jan 19 19:34:42 2023 

Created geodataframe from non-NoData points, Msk_D_2008_cfc.tif. Thu Jan 19 19:34:42 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2008_cfc.tif. Thu Jan 19 19:34:44 2023 

         value                          geometry  index_right  Sett_ID
813795      42   POINT (-1494100.887 646672.993)         8927    68641
1008732     37   POINT (-1322330.671 437047.662)         5882    33920
1082414     37   POINT (-1346120.774 357031.935)         1504     4194
821943      36   POINT (-1530423.106 637604.981)         7350    51689
798357      39   POINT (-1466484.840 663540.131)         9359    69169
764602      40   POINT (-1540340.376 699414.268)         8080    59177
836462      39   POINT (-1565832.462 621656.015)         7268    51573
400985      64  POINT (-1210471.918 1092190.534)        11165   145497
852884      34   POINT (-1529347.691 604192.058)         7010    49128
441041      64  POI

Created pandas dataframe from Msk_D_2009_cfc.tif. Thu Jan 19 19:34:47 2023 

Created geodataframe from non-NoData points, Msk_D_2009_cfc.tif. Thu Jan 19 19:34:47 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2009_cfc.tif. Thu Jan 19 19:34:48 2023 

         value                          geometry  index_right  Sett_ID
968391      35   POINT (-1583093.853 478854.901)         3026    12521
985885      35   POINT (-1406415.530 461216.803)         3884    19120
229185      28  POINT (-1046389.299 1276265.098)        11680   173672
823774      23   POINT (-1520883.739 635699.257)         7015    49133
864848      39   POINT (-1413231.689 592108.580)         5690    28422
938319      27   POINT (-1619662.159 511117.167)         2495     7835
942811      33   POINT (-1669836.363 505855.482)        12410   196158
612126      26   POINT (-1191147.153 865985.237)        10842   126275
816561      29   POINT (-1462902.374 643914.335)         9609    72423
1093229     31   PO

Created pandas dataframe from Msk_D_2010_cfc.tif. Thu Jan 19 19:34:51 2023 

Created geodataframe from non-NoData points, Msk_D_2010_cfc.tif. Thu Jan 19 19:34:51 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2010_cfc.tif. Thu Jan 19 19:34:53 2023 

         value                         geometry  index_right  Sett_ID
805665      51  POINT (-1442184.988 655826.762)         9195    68944
851037      54  POINT (-1552741.754 606010.809)         7057    49179
807394      49  POINT (-1520993.878 653383.969)         7643    54700
850164      54  POINT (-1520706.220 607195.601)        12330   195330
1036841     53  POINT (-1409558.510 405995.325)        12938   198644
986324      58  POINT (-1026469.644 463110.759)         2129     6415
821038      47  POINT (-1526098.734 638614.678)         7015    49133
781095      53  POINT (-1442341.567 682342.262)         9701    73542
966552      64  POINT (-1599552.011 480717.854)         3087    12883
979315      60  POINT (-15795

Created pandas dataframe from Msk_D_2011_cfc.tif. Thu Jan 19 19:34:55 2023 

Created geodataframe from non-NoData points, Msk_D_2011_cfc.tif. Thu Jan 19 19:34:55 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2011_cfc.tif. Thu Jan 19 19:34:57 2023 

         value                          geometry  index_right  Sett_ID
964733      50   POINT (-1598699.371 482693.655)         3087    12883
1158777     50   POINT (-1412253.738 273734.556)          827     2511
872004      49   POINT (-1520559.242 583595.824)        12575   196753
845569      48   POINT (-1559707.354 611865.215)         3241    13090
891254      61   POINT (-1399213.141 563663.260)         5677    28382
348275      65  POINT (-1149996.714 1148845.025)        11394   155469
814637      52   POINT (-1552991.747 645323.102)         7390    51734
976585      48   POINT (-1579574.305 470010.810)         3026    12521
821948      56   POINT (-1526092.593 637632.082)         7015    49133
760983      49   PO

Created pandas dataframe from Msk_D_2012_cfc.tif. Thu Jan 19 19:34:59 2023 

Created geodataframe from non-NoData points, Msk_D_2012_cfc.tif. Thu Jan 19 19:34:59 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2012_cfc.tif. Thu Jan 19 19:35:01 2023 

         value                         geometry  index_right  Sett_ID
836528      56  POINT (-1508673.393 622015.850)         7015    49133
1077769     54  POINT (-1428334.781 361497.875)          950     2691
770087      52  POINT (-1518645.714 693660.649)        10195    89141
912152      59  POINT (-1426787.732 540870.153)         8441    63115
797409      54  POINT (-1499406.244 664322.417)         9498    71391
907591      63  POINT (-1436340.038 545735.155)         8626    65133
491270      70  POINT (-1040856.981 996562.768)        10965   135910
842852      50  POINT (-1548467.908 614885.923)         7073    49199
783675      40  POINT (-1572261.559 678594.872)        10029    79791
798281      44  POINT (-15323

Created pandas dataframe from Msk_D_2013_cfc.tif. Thu Jan 19 19:35:03 2023 

Created geodataframe from non-NoData points, Msk_D_2013_cfc.tif. Thu Jan 19 19:35:03 2023 


Joined settlement ID onto vectorized raster cells for Msk_D_2013_cfc.tif. Thu Jan 19 19:35:05 2023 

         value                          geometry  index_right  Sett_ID
997725      66   POINT (-1397687.381 448455.549)         3884    19120
94510       76  POINT (-1042655.228 1419087.368)        11949   190914
990345      63   POINT (-1484278.404 455828.429)         3769    17627
228177      67  POINT (-1131493.719 1276853.790)        11870   187865
999529      70   POINT (-1411522.882 446404.724)        12720   197726
1041499     65   POINT (-1316081.464 401586.673)         2007     6115
853816      59   POINT (-1510289.959 603327.457)         8738    66478
982253      66   POINT (-1399514.318 465197.990)         3884    19120
975673      61   POINT (-1581311.705 470984.825)         3026    12521
781076      54   PO

Created pandas dataframe from Msk_V_2012_cfc.tif. Thu Jan 19 19:35:07 2023 

Created geodataframe from non-NoData points, Msk_V_2012_cfc.tif. Thu Jan 19 19:35:07 2023 


Joined settlement ID onto vectorized raster cells for Msk_V_2012_cfc.tif. Thu Jan 19 19:35:15 2023 

         value                         geometry  index_right  Sett_ID
3713622     21  POINT (-1358729.442 524031.846)         5765    32216
3331060     31  POINT (-1516062.013 626392.937)         7015    49133
3229119     28  POINT (-1525327.499 653848.024)         7794    56253
3251025     29  POINT (-1496708.084 648130.641)         8974    68697
3214546     29  POINT (-1530982.170 657741.930)         7785    56244
3360075     24  POINT (-1561479.996 618243.478)         7274    51585
3210902     27  POINT (-1532720.682 658713.304)         7785    56244
4075697     23  POINT (-1403614.744 425749.624)        12674   197678
3772962     22  POINT (-1670268.944 505852.422)        12410   196158
4708969     20  POINT (-14406

Created pandas dataframe from Msk_V_2013_cfc.tif. Thu Jan 19 19:35:17 2023 

Created geodataframe from non-NoData points, Msk_V_2013_cfc.tif. Thu Jan 19 19:35:17 2023 


Joined settlement ID onto vectorized raster cells for Msk_V_2013_cfc.tif. Thu Jan 19 19:35:25 2023 

         value                          geometry  index_right  Sett_ID
3791288     43   POINT (-1615701.294 501295.808)         3100    12896
3880451     33   POINT (-1622898.581 477115.607)         2418     7748
3842206     32   POINT (-1633786.985 487386.477)         2420     7750
3873231     38   POINT (-1596945.467 479257.127)         3087    12883
3111034     50   POINT (-1432402.354 686327.903)         9314    69097
3003517     48   POINT (-1491918.993 714926.065)        10325    91403
3334691     55   POINT (-1519953.178 625385.901)         7015    49133
3981163     41   POINT (-1357893.722 451636.810)         5539    27235
3913313     33   POINT (-1578699.000 468538.449)         3026    12521
1237208     78  POI

Created pandas dataframe from Msk_V_2014_cfc.tif. Thu Jan 19 19:35:27 2023 

Created geodataframe from non-NoData points, Msk_V_2014_cfc.tif. Thu Jan 19 19:35:27 2023 


Joined settlement ID onto vectorized raster cells for Msk_V_2014_cfc.tif. Thu Jan 19 19:35:35 2023 

         value                         geometry  index_right  Sett_ID
3982896     48  POINT (-1395537.653 450931.659)         3884    19120
3143802     52  POINT (-1428885.558 677511.788)         9248    69014
4152213     40  POINT (-1370615.068 405230.973)         4974    23720
3659346     54  POINT (-1218559.592 539537.668)         9878    75080
3338313     37  POINT (-1527741.560 624354.434)         7015    49133
3130903     46  POINT (-1497772.970 680535.747)        10276    89559
3509253     54  POINT (-1588064.807 577756.158)         6897    48413
4490941     56  POINT (-1280144.385 313960.326)         1425     4087
2392763     63  POINT (-1160876.953 881289.256)        10857   126314
3192742     51  POINT (-15154

Created pandas dataframe from Msk_V_2015_cfc.tif. Thu Jan 19 19:35:38 2023 

Created geodataframe from non-NoData points, Msk_V_2015_cfc.tif. Thu Jan 19 19:35:38 2023 


Joined settlement ID onto vectorized raster cells for Msk_V_2015_cfc.tif. Thu Jan 19 19:35:45 2023 

         value                         geometry  index_right  Sett_ID
4419794     53  POINT (-1352474.770 332818.497)         1698     4570
3656659     52  POINT (-1593865.848 537867.799)         3091    12887
3403775     58  POINT (-1552744.642 606502.282)         7057    49179
4303382     56  POINT (-1323235.325 364557.179)         1901     5393
3440105     59  POINT (-1582990.610 596475.492)         6920    48441
3063708     68  POINT (-1435076.152 699073.752)         9334    69125
4681674     43  POINT (-1438557.418 261234.733)          695     1961
4559719     55  POINT (-1445239.237 294278.955)          791     2352
2829263     62  POINT (-1290301.348 763151.217)        10564   111588
3241876     60  POINT (-15179

In [36]:
# Check contents
AllSummaries.sort_values('NTLsum_D_avg2012', ascending=False).head(20)

Unnamed: 0,Sett_ID,NTLct_D_avg1999,NTLsum_D_avg1999,NTLavg_D_avg1999,NTLmax_D_avg1999,NTLmin_D_avg1999,NTLcfc_D_1999,NTLct_D_avg2000,NTLsum_D_avg2000,NTLavg_D_avg2000,...,NTLavg_V_avg2014,NTLmax_V_avg2014,NTLmin_V_avg2014,NTLcfc_V_2014,NTLct_V_avg2015,NTLsum_V_avg2015,NTLavg_V_avg2015,NTLmax_V_avg2015,NTLmin_V_avg2015,NTLcfc_V_2015
3884,19120,438.0,9611.361328,21.943747,61.909538,3.231744,87.160279,438.0,9599.539062,21.916756,...,7.43365,53.476597,0.353381,48.342286,1728.0,14608.425781,8.45395,63.478222,0.453195,62.276
3026,12521,210.0,6557.554199,31.22645,62.346153,4.289488,136.992063,210.0,6252.830078,29.775381,...,8.392214,35.826717,0.767897,40.35446,852.0,6976.91748,8.18887,33.587158,0.917058,54.590376
7015,49133,795.0,3117.941162,3.921938,22.779999,2.435407,86.184615,795.0,3027.046143,3.807605,...,1.008841,7.41559,0.008234,48.989076,378.0,443.144897,1.172341,7.096574,0.116093,60.507491
7785,56244,594.0,1916.081665,3.225727,7.825595,2.316017,60.635569,594.0,1881.4552,3.167433,...,0.965479,4.385991,0.002007,47.797247,190.0,163.064102,0.858232,3.687119,0.101485,59.085941
3087,12883,59.0,1648.25061,27.936451,60.402977,7.404324,148.153846,59.0,1543.542725,26.161741,...,5.113477,17.031408,0.851426,38.226496,234.0,1135.911011,4.854321,16.127314,0.98259,56.299145
11165,145497,71.0,1310.021362,18.451004,40.610146,3.764438,128.641026,71.0,1277.427124,17.991932,...,4.370294,28.266197,0.102738,69.014388,276.0,1161.651123,4.208881,25.3549,0.218844,81.003597
8080,59177,142.0,850.203857,5.987351,17.179167,2.729798,128.144531,142.0,859.820251,6.055072,...,1.434916,7.003358,0.073521,51.33871,341.0,490.429718,1.43821,8.337029,0.282365,64.310036
10792,126049,62.0,698.9422,11.273262,26.21007,3.438859,203.679104,62.0,670.161926,10.809064,...,2.271091,18.159517,0.186436,66.702041,229.0,598.111206,2.611839,19.830303,0.212812,65.885714
11949,190914,30.0,311.628357,10.387611,27.949089,2.920383,253.327422,30.0,318.997803,10.63326,...,1.697803,4.522364,0.196655,100.570175,110.0,194.188782,1.765353,3.902459,0.339373,113.763158
9780,74882,44.0,331.212311,7.527553,15.69958,2.848894,218.813433,44.0,339.305237,7.711483,...,1.833448,8.98857,0.213823,56.372222,178.0,337.004608,1.893284,8.955063,0.291159,61.994444
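
The count, sum, average, maximum, and minimum columns in the table above are each produced by a separate `groupby` and merge. As a hedged sketch, the same per-settlement statistics could be computed in one pass with named aggregation. The helper name `summarize_by_settlement` and the toy input are illustrative assumptions; only the column prefixes follow the notebook output.

```python
import pandas as pd

def summarize_by_settlement(val_df, sensor, indic_type, year):
    """Return one row per Sett_ID with count, sum, mean, max, and min of 'value'."""
    # Suffix mirrors the notebook's naming, e.g. 'D' + '_avg' + '1999'.
    suffix = ''.join([sensor, indic_type, str(year)])
    # Drop NoData cells before aggregating, as the notebook does.
    valid = val_df[val_df['value'].notna()]
    return valid.groupby('Sett_ID', as_index=False).agg(
        **{
            'NTLct_' + suffix: ('value', 'count'),
            'NTLsum_' + suffix: ('value', 'sum'),
            'NTLavg_' + suffix: ('value', 'mean'),
            'NTLmax_' + suffix: ('value', 'max'),
            'NTLmin_' + suffix: ('value', 'min'),
        }
    )

# Toy input (made-up values): one row per raster cell, already joined to a settlement.
toy = pd.DataFrame({
    'Sett_ID': [49133, 49133, 12521, 12521, 12521],
    'value': [2.0, 4.0, 10.0, None, 30.0],
})
ntl_summary = summarize_by_settlement(toy, 'D', '_avg', 1999)
print(ntl_summary)
```

The result can be merged onto `AllSummaries` with a single left join on `Sett_ID` per raster instead of one join per statistic.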


---

## 10. FLOOD EXPOSURE BY RETURN PERIOD

Reclassify flood layer:
 - -9999 to 0
 - More than 0 to 1

Reclassify buildup layer:
 - Less than 1985 to 0
 - More than 2015 to 0
 - else: same as original
 - Assign 0 as the NoData value for both datasets.
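
As a minimal sketch of the two reclassification rules above, assuming each raster band has already been read into a NumPy array, the logic might look like this. The function names `reclass_flood` and `reclass_buildup` and the toy arrays are illustrative and not part of the notebook.

```python
import numpy as np

def reclass_flood(depth):
    """Binary flood mask: values at or below 0 (including -9999) become 0, values above 0 become 1."""
    return (depth > 0).astype(np.uint32)

def reclass_buildup(year, first=1985, last=2015):
    """Keep build-up years within [first, last]; everything else becomes 0 (the NoData value)."""
    out = year.copy()
    out[(out < first) | (out > last)] = 0
    return out.astype(np.uint32)

# Toy arrays (made-up values) standing in for the flood depth and WSFE year rasters.
flood = np.array([[-9999.0, 0.0], [0.3, 2.5]])
wsfe = np.array([[0, 1984], [1985, 2016]])

flood_bin = reclass_flood(flood)
wsfe_reclass = reclass_buildup(wsfe)
print(flood_bin)
print(wsfe_reclass)
```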

Resample flood layer to buildup's shape:
 - Check CRS match. If not, warp Flood to same CRS as WSFE.
 - Crop Flood to WSFE.
 - Change Flood spatial resolution to WSFE with gdal.Warp().

Retain year info for only WSFE cells that are flooded.
 - WSFE * Flood = WSFE_Flooded (0 and NoData get assigned to any non-flooded buildup)
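
The multiplication works because the flood layer is binary: a flooded cell (1) keeps its build-up year, and a non-flooded cell (0) is zeroed out. A small sketch with made-up arrays, assuming both rasters already share the same shape:

```python
import numpy as np

# Toy build-up years (0 = NoData) and a binary flood mask on the same grid.
wsfe_year = np.array([[0, 1990, 2005],
                      [2010, 0, 1985]], dtype=np.uint32)
flood_mask = np.array([[1, 1, 0],
                       [1, 0, 0]], dtype=np.uint32)

# Element-wise multiplication only makes sense when the grids are aligned.
assert wsfe_year.shape == flood_mask.shape

wsfe_flooded = wsfe_year * flood_mask
print(wsfe_flooded)
```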

Generate a spatial object from flooded-only buildup cells
 - WSFE_Flooded to numpy array to Pandas points dataframe, excluding 0 and NoData.
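
One way to sketch this conversion, assuming a north-up raster whose GDAL-style geotransform has no rotation terms, is to take the row and column indices of the non-zero cells and turn them into cell-center coordinates. The helper name `raster_to_points` and the toy grid are hypothetical.

```python
import numpy as np
import pandas as pd

def raster_to_points(array, geotransform, nodata=0):
    """Return cell-center coordinates and values for every cell that is not NoData."""
    x_origin, x_res, _, y_origin, _, y_res = geotransform
    rows, cols = np.nonzero(array != nodata)
    return pd.DataFrame({
        'x': x_origin + (cols + 0.5) * x_res,   # center of the cell, not its corner
        'y': y_origin + (rows + 0.5) * y_res,   # y_res is negative for north-up rasters
        'value': array[rows, cols],
    })

# Toy 2x2 grid with 0.5-unit cells and its upper-left corner at (10, 5).
gt = (10.0, 0.5, 0.0, 5.0, 0.0, -0.5)
arr = np.array([[0, 1990],
                [2010, 0]])
pts = raster_to_points(arr, gt)
print(pts)
```

The `x` and `y` columns could then be passed to `geopandas.points_from_xy` to build the point geometries used in the spatial join.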

Calculate area at risk of flood per settlement per year (that year's new buildup only)
 - Create "area" field for each feature with the value 30 (30 m spatial resolution)
 - Spatial join Settlement IDs onto WSFE_Flooded_dataframe
 - Calculate sum of area field for each [Sett_ID, year] group. (new dataframe created)
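
A hedged sketch of the grouping step, assuming the spatial join has already attached `Sett_ID` to each flooded cell. The per-cell value of 30 is taken from the step above; the column name `year` and the toy rows are assumptions.

```python
import pandas as pd

CELL_AREA = 30  # per-cell area value as stated in the step above

# Toy flooded build-up cells (made-up values), one row per cell.
flooded_cells = pd.DataFrame({
    'Sett_ID': [1, 1, 1, 2, 2],
    'year': [1990, 1990, 2005, 2005, 2010],
})
flooded_cells['area'] = CELL_AREA

# New dataframe: total flooded area added in each year, per settlement.
area_by_year = (
    flooded_cells
    .groupby(['Sett_ID', 'year'], as_index=False)['area']
    .sum()
)
print(area_by_year)
```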

Calculate area at risk of flood per settlement per year (cumulative area in that year)
 - Iterating through the study years, subset the sum area dataframe to that year's build-up to-date
 - Calculate sum of area field for each Sett_ID
 - Assign sum of area for that cumulative year to a Settlements dataframe by table join on Sett_ID
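
The cumulative step could be sketched as below. The function name `cumulative_area`, the output column pattern `FloodArea_<year>`, and filling settlements that have no flooded build-up with 0 are all assumptions made for illustration.

```python
import pandas as pd

def cumulative_area(area_by_year, study_years):
    """For each study year, sum the area of all cells built in or before that year, per settlement."""
    settlements = area_by_year[['Sett_ID']].drop_duplicates().reset_index(drop=True)
    for yr in study_years:
        # Build-up to date: everything built in this year or earlier.
        built_to_date = area_by_year[area_by_year['year'] <= yr]
        totals = (
            built_to_date.groupby('Sett_ID', as_index=False)['area'].sum()
            .rename(columns={'area': 'FloodArea_%s' % yr})
        )
        settlements = settlements.merge(totals, how='left', on='Sett_ID')
    # Assumption: no flooded build-up yet means zero flooded area.
    return settlements.fillna(0)

# Toy per-year sums (made-up values).
area_by_year = pd.DataFrame({
    'Sett_ID': [1, 1, 2, 2],
    'year': [1990, 2005, 2005, 2010],
    'area': [60, 30, 30, 30],
})
cumulative = cumulative_area(area_by_year, [2000, 2005, 2010])
print(cumulative)
```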

Calculate percent flooded
 - Area_flooded_year / Area_year = percent_flooded_year
 - Save to file as csv.
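
A small sketch of the ratio, with two assumptions stated up front: the result is multiplied by 100 so it reads as a percentage, and settlements with zero built-up area get NaN rather than a division error. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

def percent_flooded(area_flooded, area_total):
    """Flooded share of built-up area as a percentage; zero total area yields NaN."""
    total = area_total.replace(0, np.nan)
    return 100 * area_flooded / total

# Toy table (made-up values) for one study year.
table = pd.DataFrame({
    'Sett_ID': [1, 2, 3],
    'FloodArea_2010': [90.0, 60.0, 0.0],
    'BuiltArea_2010': [900.0, 240.0, 0.0],
})
table['PctFlood_2010'] = percent_flooded(table['FloodArea_2010'], table['BuiltArea_2010'])
print(table)
```

The finished table can be written out with `DataFrame.to_csv`, as in the nighttime lights section.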

### 10.1 Reclassify and set NoData values of rasters.

In [22]:
# From Stack Exchange @RutgerH
# https://gis.stackexchange.com/questions/163685/reclassify-a-raster-value-to-9999-and-set-it-to-the-nodata-value-using-python-a

# Default arguments can be changed here, or can be specified below when running the functions.

def readRaster(filename):
    filehandle = gdal.Open(filename)
    band1 = filehandle.GetRasterBand(1)
    geotransform = filehandle.GetGeoTransform()
    geoproj = filehandle.GetProjection()
    Z = band1.ReadAsArray()
    xsize = filehandle.RasterXSize
    ysize = filehandle.RasterYSize
    return xsize,ysize,geotransform,geoproj,Z

def writeRaster(filename,geotransform,geoprojection,data):
    (x,y) = data.shape
    Dformat = "GTiff"
    driver = gdal.GetDriverByName(Dformat)
    # UInt32 cannot store negative values such as -9999, so only write arrays that are already reclassified to non-negative values.
    dst_datatype = gdal.GDT_UInt32
    dst_ds = driver.Create(filename,y,x,1,dst_datatype)
    dst_ds.GetRasterBand(1).WriteArray(data)
    dst_ds.SetGeoTransform(geotransform)
    dst_ds.SetProjection(geoprojection)
    dst_ds.GetRasterBand(1).SetNoDataValue(0) # For both buildup and flood, we want NoData to be zero after reclassification.
    return 1


# Based on Stack Overflow @Kurt Schwehr:
# https://stackoverflow.com/questions/10454316/how-to-project-and-resample-a-grid-to-match-another-grid-with-gdal-python

def resampleRaster(InRaster_Path, MatchRaster_Path, OutFile_Path):

    RasterObject = gdal.Open(InRaster_Path)
    In_proj = RasterObject.GetProjection()
    print('Loading for %s. %s' % (InRaster_Path, time.ctime()))
    
    [Match_x, Match_y, Match_geo, Match_proj, Match_Z] = readRaster(MatchRaster_Path)
    print('---Specs to match to: \n', 
      Match_proj, '\n', Match_geo, '\n', Match_x, '\n', Match_y, '\n')
        
    OutFile = gdal.GetDriverByName('GTiff').Create(OutFile_Path, Match_x, Match_y, 1, gdalconst.GDT_UInt32)
    OutFile.SetGeoTransform(Match_geo)
    OutFile.SetProjection(Match_proj)
    print('---Created raster file for upsampled version. %s' % time.ctime())
    
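    gdal.ReprojectImage(RasterObject, OutFile, In_proj, Match_proj, gdal.GRA_NearestNeighbour) # Nearest because categorical.
    print('---Resampled flood values onto an empty raster matching the dimensions of the buildup layer. %s' % time.ctime())
    OutFile.GetRasterBand(1).SetNoDataValue(0) # Keep NoData as zero, as in the reclassified inputs.
    RasterObject = OutFile = None # Close the datasets so the output is flushed to disk.
    return 1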

##### Flood layers

In [12]:
InRasters = ['FD_1in20.tif', 'FD_1in100.tif', 'FD_1in1000.tif']
BinaryRasters = []

for Raster in InRasters:
    InPath = os.path.join(ProjectFolder, 'Hazard', Raster)
    OutPath = os.path.join(ProjectFolder, 'Hazard', Raster.replace('.tif', '_binary.tif'))
    
    # Together, x and y define the data's "shape".
    # geotransform contains the parameters detailing how the raster should be stretched and aligned.
    # geoproj is the map projection
    # Z are the values in the raster band.
    [xsize, ysize, geotransform, geoproj, Z] = readRaster(InPath)
    
    Z[Z<=0] = 0
    Z[Z>0] = 1
    
    writeRaster(OutPath,geotransform,geoproj,Z)
    InPath = OutPath = None
    BinaryRasters = BinaryRasters + [Raster.replace('.tif', '_binary.tif')]
    
    print('Finished reclassifying %s. %s' % (Raster, time.ctime()))

print('\nNew flood set: %s' % BinaryRasters)

Finished reclassifying FD_1in20.tif. Mon Jan 23 16:24:21 2023
Finished reclassifying FD_1in100.tif. Mon Jan 23 16:24:27 2023
Finished reclassifying FD_1in1000.tif. Mon Jan 23 16:24:35 2023

New flood set: ['FD_1in20_binary.tif', 'FD_1in100_binary.tif', 'FD_1in1000_binary.tif']


##### Buildup
This dataset should already be classed correctly, but the NoData value needs to be zero so we'll run it anyway.

In [13]:
InPath = os.path.join(ProjectFolder, 'Buildup', 'WSFE_WGS84.tif')
OutPath = os.path.join(ProjectFolder, 'Buildup', 'WSFE_reclass.tif')

[xsize,ysize,geotransform,geoproj,Z] = readRaster(InPath)
Z[Z<1985] = 0
Z[Z>2015] = 0

writeRaster(OutPath,geotransform,geoproj,Z)
InPath = None
OutPath = None

print('Finished reclassifying WSFE. %s' % time.ctime())

Finished reclassifying WSFE. Mon Jan 23 16:25:23 2023


### 10.2 Resample flood data to match buildup layer.

 - Align flood to WSFE: CRS, extent, origin, and resolution.
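
Origin and resolution are both stored in a raster's GDAL-style geotransform, and the extent follows from those plus the raster size. As a rough sketch of what "aligned" means here, the helpers below derive the bounding box and compare two grids. The function names and toy values are hypothetical, and a north-up raster without rotation is assumed.

```python
def raster_bounds(geotransform, xsize, ysize):
    """Return (xmin, ymin, xmax, ymax) for a north-up raster."""
    x_origin, x_res, _, y_origin, _, y_res = geotransform
    xmax = x_origin + xsize * x_res
    ymin = y_origin + ysize * y_res  # y_res is negative, so this moves south
    return (x_origin, ymin, xmax, y_origin)

def same_grid(gt_a, shape_a, gt_b, shape_b, tol=1e-9):
    """True when two rasters share origin, resolution, and dimensions."""
    return shape_a == shape_b and all(abs(a - b) <= tol for a, b in zip(gt_a, gt_b))

# Toy grids: the flood raster is coarser than the build-up raster before resampling.
wsfe_gt, wsfe_shape = (8.0, 0.25, 0.0, 13.0, 0.0, -0.25), (4, 8)
flood_gt, flood_shape = (8.0, 0.5, 0.0, 13.0, 0.0, -0.5), (2, 4)

wsfe_bounds = raster_bounds(wsfe_gt, *wsfe_shape)
flood_bounds = raster_bounds(flood_gt, *flood_shape)
aligned_before = same_grid(wsfe_gt, wsfe_shape, flood_gt, flood_shape)
print(wsfe_bounds, flood_bounds, aligned_before)
```

In this toy case the two rasters cover the same extent but differ in resolution and shape, which is the situation the resampling step is meant to resolve.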

In [None]:
# BinaryRasters = ['FD_1in20_binary.tif', 'FD_1in100_binary.tif', 'FD_1in1000_binary.tif']
ResampledRasters = []

WSFEPath = os.path.join(ProjectFolder, 'Buildup', 'WSFE_reclass.tif')

for Raster in BinaryRasters:
    RasterPath = os.path.join(ProjectFolder, 'Hazard', Raster)
    OutPath = os.path.join(ProjectFolder, 'Hazard', Raster.replace('_binary', '_upsample'))
    resampleRaster(RasterPath, WSFEPath, OutPath)
    ResampledRasters = ResampledRasters + [Raster.replace('_binary', '_upsample')]
    
print('Done. %s' % time.ctime())

Specs to match to: 
 GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]] 
 (8.408148714860227, 0.0002701731237957677, 0.0, 13.11588340291311, 0.0, -0.0002701731237957677) 
 28990 
 42564 

Loading for Q:\GIS\povertyequity\urban_growth\Cameroon\Hazard\FD_1in20_binary.tif. Mon Jan 23 17:07:11 2023
---Created raster file for upsampled version. Mon Jan 23 17:07:11 2023


In [3]:
### STEP 2: LOAD THE PRE-RESAMPLED FLOOD DATASET ###
for Raster in BinaryRasters:
    RasterObject = gdal.Open(os.path.join(ProjectFolder, 'Hazard', Raster))
    In_Proj = RasterObject.GetProjection()
    print('Loading for %s. %s' % (Raster, time.ctime()))

### STEP 3: APPLY THE PARAMETERS FROM STEP 1 TO THE OBJECT THAT WILL BECOME OUR NEW UPSAMPLED FLOOD RASTER. ###
    OutName = Raster.replace('_binary', '_upsample')
    OutFile = gdal.GetDriverByName('GTiff').Create(os.path.join(ProjectFolder, 'Hazard', OutName), 
                                                   W_xsize, W_ysize, 1, gdalconst.GDT_UInt32)
    OutFile.SetGeoTransform(Match_GeoTr)
    OutFile.SetProjection(Match_Proj)
    print('---Created raster file for upsampled version of %s. %s' % (Raster, time.ctime()))

### STEP 4: APPLY FLOOD DATA TO THE PREPARED RASTER LOCATION. ###
    gdal.ReprojectImage(RasterObject, OutFile, In_Proj, Match_Proj, gdalconst.GRA_NearestNeighbour) # Nearest neighbour because the flood mask is categorical (0/1).
    print('---Resampled %s flood values onto raster that matches buildup. %s' % (Raster, time.ctime()))

### STEP 5: ENSURE NODATA IS STILL ASSIGNED AS ZERO. ###
    OutFile.GetRasterBand(1).SetNoDataValue(0)
    print(OutFile.GetRasterBand(1).GetNoDataValue())
    
    
    ResampledRasters = ResampledRasters + [OutName]
    RasterObject = OutFile = None
    
print('Completed resampling to match built-up areas. %s' % time.ctime())
print('New flood set: %s' % ResampledRasters)
WSFE = None

Built-up area raster information: 
 GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]] 
 (8.408148714860227, 0.0002701731237957677, 0.0, 13.11588340291311, 0.0, -0.0002701731237957677) 
 28990 
 42564 

Loading for FD_1in20_binary.tif. Fri Jan 20 20:23:48 2023
---Created raster file for upsampled version of FD_1in20_binary.tif. Fri Jan 20 20:23:48 2023
---Resampled FD_1in20_binary.tif flood values onto raster that matches buildup. Fri Jan 20 20:25:24 2023
0.0
Loading for FD_1in100_binary.tif. Fri Jan 20 20:25:30 2023
---Created raster file for upsampled version of FD_1in100_binary.tif. Fri Jan 20 20:25:30 2023
---Resampled FD_1in100_binary.tif flood values onto raster that matches buildup. Fri Jan 20 20:27:08 2023
0.0
Loading for FD_1in10
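
Step 4 resamples with `gdalconst.GRA_Bilinear`. For a 0/1 flood mask, nearest-neighbour resampling (`gdalconst.GRA_NearestNeighbour` in GDAL) is a common alternative because it only copies existing cell values instead of interpolating between them. The NumPy sketch below illustrates the idea on a toy grid. It is not the notebook's method, and the factor of 3 is only an assumption based on the 90m flood and 30m built-up resolutions; `upsample_nearest` is a hypothetical helper.

```python
import numpy as np


def upsample_nearest(grid, factor):
    """Repeat every cell `factor` times along both axes (nearest neighbour)."""
    return np.repeat(np.repeat(grid, factor, axis=0), factor, axis=1)


# Toy binary flood mask: 1 = flooded, 0 = not flooded.
coarse = np.array([[0, 1],
                   [1, 0]], dtype=np.uint8)
fine = upsample_nearest(coarse, 3)
print(fine.shape)
print(np.unique(fine))
```

Every fine cell inherits the value of the coarse cell that contains it, so the output stays strictly binary.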

In [50]:
import subprocess

In [51]:
# define paths to raster and vector
inraster = os.path.join(ProjectFolder, 'Buildup', 'WSFE_reclass.tif')
inshape = os.path.join(ProjectFolder, 'Hazard','FD_1in0_binary.shp')

# make gdal_rasterize command - will burn value 1 into the raster where polygons intersect
cmd = 'gdal_rasterize -burn 1 '+ inshape + ' ' + inraster

# run command
subprocess.call(cmd, shell=True)

1
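
The `1` printed above is the exit status returned by `subprocess.call`; a non-zero status conventionally signals that the command failed. A minimal sketch of a more defensive pattern follows: pass the command as an argument list (which avoids quoting problems in paths) and raise if the exit status is non-zero. `build_rasterize_cmd` and `run_checked` are hypothetical helpers, the file names are placeholders, and the arguments simply mirror the ones used in the cell above.

```python
import subprocess
import sys


def build_rasterize_cmd(in_shape, in_raster, burn_value=1):
    """Build the gdal_rasterize call as a list, mirroring the notebook's arguments."""
    return ['gdal_rasterize', '-burn', str(burn_value), in_shape, in_raster]


def run_checked(cmd):
    """Run a command and raise RuntimeError with stderr if it exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError('Command failed (%d): %s'
                           % (result.returncode, result.stderr.strip()))
    return result.stdout


# Placeholder file names; substitute the real shapefile and raster paths.
cmd = build_rasterize_cmd('flood.shp', 'buildup.tif')
print(cmd)

# Demonstrate run_checked with a command that is always available.
print(run_checked([sys.executable, '-c', 'print("ok")']))
```

Calling `run_checked(cmd)` would then surface the tool's own error message rather than a bare status code.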

In [57]:
test1 = 'test.tif'
test2 = r'Hazard\test2.tif'

cmd = 'gdal_calc.py -A ' + test1 + ' -B ' + test2 + ' --outfile=test3.tif --overwrite --calc="A*B*5"'

subprocess.call(cmd, shell=True)

0

### 10.3 Retain only built area year values where cells were flooded.

In [4]:
WSFEPath = os.path.join(ProjectFolder, 'Buildup', 'WSFE_reclass.tif')

ResampledRasters = ['FD_1in20_upsample.tif', 'FD_1in100_upsample.tif', 'FD_1in1000_upsample.tif']
FloodedRasters = []


for Raster in ResampledRasters:
    InPath = os.path.join(ProjectFolder, 'Hazard', Raster)
    # Raster already ends in '.tif', so only the suffix is swapped here.
    OutPath = os.path.join(ProjectFolder, 'Buildup', Raster.replace('_upsample', '_WSFEintersect'))
    
    [xsize,ysize,geotransform,geoproj,Z] = readFile(InPath)
    [W_xsize, W_ysize, W_geotransform, W_geoproj, W_Z] = readFile(WSFEPath)
    Z = Z * W_Z
    
    writeFile(OutPath,geotransform,geoproj,Z)
    InPath = None
    OutPath = None
    FloodedRasters = FloodedRasters + [Raster.replace('_upsample', '_WSFEintersect')]
    
    print('Finished identifying flooded buildup for %s. %s' % (Raster, time.ctime()))

print('New flood set: %s' % FloodedRasters)

Finished identifying flooded buildup for FD_1in20_upsample.tif. Mon Jan 23 11:31:01 2023
Finished identifying flooded buildup for FD_1in100_upsample.tif. Mon Jan 23 11:31:55 2023
Finished identifying flooded buildup for FD_1in1000_upsample.tif. Mon Jan 23 11:32:51 2023
New flood set: ['FD_1in20_WSFEintersect.tif', 'FD_1in100_WSFEintersect.tif', 'FD_1in1000_WSFEintersect.tif']
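
Output names in this section are derived from the input names by swapping a suffix such as `_upsample`. A minimal sketch of a helper that does this while keeping exactly one file extension is shown below; `swap_suffix` is a hypothetical name, not a function from the notebook.

```python
import os


def swap_suffix(filename, old, new):
    """Replace a trailing `old` suffix on the file stem with `new`, keeping the extension."""
    stem, ext = os.path.splitext(filename)
    if not stem.endswith(old):
        raise ValueError('%r does not end with %r' % (stem, old))
    return stem[:-len(old)] + new + ext


name = swap_suffix('FD_1in20_upsample.tif', '_upsample', '_WSFEintersect')
print(name)
```

Using the same helper for both the output path and the tracking list keeps the two from drifting apart.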


In [8]:
WSFE = rasterio.open(os.path.join(ProjectFolder, 'Buildup', 'WSFE_reclass.tif'))
WSFEband = WSFE.read(1, masked=True)
meta = WSFE.meta.copy()
meta.update(compress='lzw')
WSFE.meta

print('Prepared buildup layer and metadata. %s' % time.ctime())

Prepared buildup layer and metadata. Fri Jan 20 21:02:11 2023


In [10]:
# ResampledRasters = ['FD_1in20_upsample.tif', 'FD_1in100_upsample.tif', 'FD_1in1000_upsample.tif']
FloodedRasters = []

for Raster in ResampledRasters:
    InPath = os.path.join(ProjectFolder, 'Hazard', Raster)
    OutPath = os.path.join(ProjectFolder, 'Buildup', ''.join(['WSFE_', Raster.replace('_upsample', '')]))
    
    FloodObject = rasterio.open(InPath)
    FloodBand = FloodObject.read(1, masked=True)
    print('Loaded %s as object. %s' % (Raster, time.ctime()))
    
    Flooded = WSFEband * FloodBand
    print('Calculated flooded built-up areas. %s' % time.ctime())
    
    with rasterio.open(OutPath, 'w+', **meta) as out:
        out.write(Flooded,1)
    
    FloodedRasters = FloodedRasters + [''.join(['WSFE_', Raster.replace('_upsample', '')])]
    out = None
    InPath = None
    OutPath = None
    print('Written to file. %s' % time.ctime())

Loaded FD_1in20_upsample.tif as object. Fri Jan 20 21:04:34 2023
Calculated flooded built-up areas. Fri Jan 20 21:04:37 2023
Written to file. Fri Jan 20 21:05:03 2023
Loaded FD_1in100_upsample.tif as object. Fri Jan 20 21:05:37 2023
Calculated flooded built-up areas. Fri Jan 20 21:05:44 2023
Written to file. Fri Jan 20 21:06:05 2023
Loaded FD_1in1000_upsample.tif as object. Fri Jan 20 21:06:37 2023
Calculated flooded built-up areas. Fri Jan 20 21:06:40 2023
Written to file. Fri Jan 20 21:07:00 2023
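
Multiplying the built-up year band by a 0/1 flood band keeps the year where the cell is flooded and yields 0 elsewhere. With masked arrays, the values hidden under the mask can still end up in the written file, depending on how the array is filled. The sketch below shows one way to make the fill explicit on toy data; `flooded_buildup` is a hypothetical helper, and the nodata value of 0 is an assumption taken from Step 5 of the resampling cell.

```python
import numpy as np


def flooded_buildup(year_band, flood_band, nodata=0):
    """Keep the built-up year where flood == 1; write `nodata` everywhere else."""
    years = np.ma.filled(year_band, nodata)   # masked cells become nodata
    flood = np.ma.filled(flood_band, 0)       # masked flood cells count as dry
    return np.where(flood == 1, years, nodata)


# Toy data: one built-up cell is masked and hides a -9999 placeholder.
years = np.ma.array([[1990, 2005],
                     [-9999, 2012]],
                    mask=[[False, False],
                          [True, False]])
flood = np.array([[1, 0],
                  [1, 1]])
result = flooded_buildup(years, flood)
print(result)
```

Filling before the comparison means no placeholder value can leak into the output raster.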


In [6]:
# Spot checks showed -9999 in some non-flooded cells. Reclassify so that only build-up years (1985-2015) remain and every other value becomes 0.

FloodedRasters = ['WSFE_FD_1in20.tif', 'WSFE_FD_1in100.tif', 'WSFE_FD_1in1000.tif']
FloodedRasters2 = []

for Raster in FloodedRasters:
    InPath = os.path.join(ProjectFolder, 'Buildup', Raster)
    OutPath = os.path.join(ProjectFolder, 'Buildup', Raster.replace('.tif', '_reclass.tif'))
    
    [xsize,ysize,geotransform,geoproj,Z] = readFile(InPath)
    Z[Z<1985] = 0
    Z[Z>2015] = 0
    
    writeFile(OutPath,geotransform,geoproj,Z)
    InPath = None
    
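    # Open in update mode so SetNoDataValue can modify the file (gdal.Open defaults to read-only).
    OutRaster = gdal.Open(OutPath, gdal.GA_Update)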
    OutBand = OutRaster.GetRasterBand(1)
    OutBand.SetNoDataValue(0)
    OutPath = None
    OutRaster = None
    OutBand = None

    FloodedRasters2 = FloodedRasters2 + [Raster.replace('.tif', '_reclass.tif')]
    print('Finished reclassifying flooded cells for %s. %s' % (Raster, time.ctime()))

Finished reclassifying flooded cells for WSFE_FD_1in20.tif. Fri Jan 20 21:29:14 2023
Finished reclassifying flooded cells for WSFE_FD_1in100.tif. Fri Jan 20 21:29:40 2023
Finished reclassifying flooded cells for WSFE_FD_1in1000.tif. Fri Jan 20 21:30:05 2023
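
The reclassification step zeroes every cell outside the 1985 to 2015 range used in the cell above. A minimal NumPy sketch of the same rule on a toy array, written as a single boolean mask and returning a copy rather than editing in place; `keep_valid_years` is a hypothetical helper name.

```python
import numpy as np


def keep_valid_years(Z, first_year=1985, last_year=2015, fill=0):
    """Return a copy of Z with every value outside [first_year, last_year] set to `fill`."""
    out = np.array(Z, copy=True)
    out[(out < first_year) | (out > last_year)] = fill
    return out


sample = np.array([[-9999, 1985],
                   [2015, 2016]])
cleaned = keep_valid_years(sample)
print(cleaned)
```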


In [None]:
FloodedRasters2

# SCRATCH

### 10.2 Polygonize flood data.

In [19]:
# https://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html?highlight=vectorize
# https://www.e-education.psu.edu/geog489/node/2215

# BinaryRasters = ['FD_1in20_binary.tif', 'FD_1in100_binary.tif', 'FD_1in1000_binary.tif']

for Raster in BinaryRasters:
    InObject = gdal.Open(os.path.join(ProjectFolder, 'Hazard', Raster))
    InBand = InObject.GetRasterBand(1)
    print('Loading %s. %s' % (Raster, time.ctime()))
    
    OutDriver = ogr.GetDriverByName("ESRI Shapefile")
    OutName = Raster.replace('.tif','')
    OutPath = os.path.join(ProjectFolder, 'Hazard', ''.join([OutName, '.shp']))

    SpatRef = osr.SpatialReference()
    Proj = InObject.GetProjectionRef()
    SpatRef.ImportFromWkt(Proj)
    
    if os.path.exists(OutPath):
        OutDriver.DeleteDataSource(OutPath)
    OutFile = OutDriver.CreateDataSource(OutPath)
    OutLayer = OutFile.CreateLayer(OutName, srs = SpatRef, geom_type=ogr.wkbPolygon)
    OutField = ogr.FieldDefn("Flooded", ogr.OFTInteger)
    OutLayer.CreateField(OutField)
    OutField = OutLayer.GetLayerDefn().GetFieldIndex("Flooded")
    
    print('Vectorizing. %s' % time.ctime())
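    # Write each polygon's pixel value into the "Flooded" field; an index of -1 would write no attribute.
    gdal.Polygonize(InBand, None, OutLayer, OutField, [], callback=None)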
    print('Completed polygons for %s. %s' % (Raster, time.ctime()))
    
    del InObject
    del InBand
    del OutFile
    del OutLayer
    del OutField

print('Finished. %s' % time.ctime())

Loading FD_1in20_binary.tif. Mon Jan 23 12:26:11 2023


AttributeError: 'NoneType' object has no attribute 'CreateLayer'
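
The traceback above means `OutDriver.CreateDataSource(OutPath)` returned `None`. Unless exceptions are enabled (for example with `gdal.UseExceptions()`), the GDAL/OGR Python bindings commonly signal failure by returning `None`, so the error only surfaces on the next attribute access. A minimal sketch of a guard that fails at the actual point of failure is below; `require` is a hypothetical helper, and the commented usage line assumes the variable names from the cell above.

```python
def require(obj, description):
    """Return obj unchanged, or raise a descriptive error if it is None."""
    if obj is None:
        raise RuntimeError('%s returned None' % description)
    return obj


# Intended usage inside the loop (not executed here, needs GDAL/OGR):
# OutFile = require(OutDriver.CreateDataSource(OutPath),
#                   'CreateDataSource(%s)' % OutPath)

# Stand-alone demonstration with plain Python values.
value = require('datasource', 'demo call')
print(value)
```

Including the path in the message makes it easier to spot an unexpected output location.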

In [44]:
InObject = gdal.Open(os.path.join(ProjectFolder, 'Hazard', 'FD_1in20_binary.tif'))
InBand = InObject.GetRasterBand(1)
print('Loading %s. %s' % (Raster, time.ctime()))

Loading FD_1in20_binary.tif. Mon Jan 23 12:34:50 2023


In [45]:
OutDriver = ogr.GetDriverByName("ESRI Shapefile")
OutName = Raster.replace('.tif','')
OutPath = os.path.join(ProjectFolder, 'Hazard', ''.join([OutName, '.shp']))
print(OutDriver, OutName, OutPath)

<osgeo.ogr.Driver; proxy of <Swig Object of type 'OGRDriverShadow *' at 0x000001FABD4BA9D0> > FD_1in20_binary Q:\GIS\povertyequity\urban_growth\Cameroon\Hazard\FD_1in20_binary.shp


In [46]:
SpatRef = osr.SpatialReference()
Proj = InObject.GetProjectionRef()
SpatRef.ImportFromWkt(Proj)
print(Proj, '\n\n', SpatRef)

GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]] 

 GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9122"]],
    AXIS["Latitude",NORTH],
    AXIS["Longitude",EAST],
    AUTHORITY["EPSG","4326"]]


In [47]:
OutFile = OutDriver.CreateDataSource(OutPath)
OutLayer = OutFile.CreateLayer(OutName, srs = SpatRef, geom_type=ogr.wkbPolygon)
OutField = ogr.FieldDefn("Flooded", ogr.OFTInteger)
OutLayer.CreateField(OutField)
OutField = OutLayer.GetLayerDefn().GetFieldIndex("Flooded")

In [48]:
print(OutFile, '\n', OutLayer, '\n', OutField)

<osgeo.ogr.DataSource; proxy of <Swig Object of type 'OGRDataSourceShadow *' at 0x000001FABC22D0B0> > 
 <osgeo.ogr.Layer; proxy of <Swig Object of type 'OGRLayerShadow *' at 0x000001FABD4FFF60> > 
 0


In [49]:
print('Vectorizing. %s' % time.ctime())
gdal.Polygonize(InBand, None, OutLayer, 0, [], callback=None)
print('Completed polygons for %s. %s' % (Raster, time.ctime()))

del InObject
del OutFile

print('Finished. %s' % time.ctime())

Vectorizing. Mon Jan 23 12:34:57 2023
Completed polygons for FD_1in20_binary.tif. Mon Jan 23 12:36:04 2023
Finished. Mon Jan 23 12:36:05 2023
