# Spatiotemporal Trends in Urbanization
## Overview
This repository investigates year-by-year change in cities and settlements in Central and Western Africa (CWA). The goal is to capture activity for every settlement locality in a country and produce indicators that are high-frequency, spatially granular, and timely. The Jupyter Notebook is the primary script used to construct each country's dataset. It tracks population, built-up area, and economic and climate indicators across a 16-year timeframe from 2000 to 2015.

The repository is split into three sections: methodology and notebooks, source data, and outputs. Outputs are organized by country and include growth tables ("urban panel datasets"), charts, and country briefs.


## Datasets
Datasets used to create each African country's urban panel data are as follows:
1. Most up-to-date administrative boundaries.
2. City names: **UCDB, Africapolis, and GeoNames.**
3. Settlement types: **GRID3 settlement extents.** Captured between 2009-2019.
4. Built-up area, yearly: **World Settlement Footprint Evolution.** Resolution: 30m.
5. Population, yearly: **WorldPop.** UN-adjusted, unconstrained. Resolution: 100m.
6. Nighttime lights, yearly: **Harmonization of DMSP and VIIRS.** Resolution: 1km and 500m.
7. Flood extents, by return period: **FATHOM.** Resolution: 90m.


## Accessing Data
Source data are publicly available from the providers listed in the previous section, with the exception of flood data. Please note that the source data files in this repository have been tailored to this project and may not cover your area of interest. Some sources are also not global: GRID3 settlement extents are only available for sub-Saharan Africa, and Africapolis names only for Africa.

Results from the analysis are currently available for Cameroon and are under development for Central African Republic and four Sahel countries: Burkina Faso, Chad, Mali, and Niger. Results are available in the outputs folder by country. Please contact the CWA Geospatial team to inquire about new locations.
<br>
> **Walker Kosmidou-Bradley**, wkosmidoubradley@worldbank.org
<br>
> **Grace Doherty**, gdoherty2@worldbank.org

## License
Materials under this repository are open-source under an MIT license. The community is invited to test, adapt, and re-purpose materials as needed.

---

## 1. PREPARE WORKSPACE

### 1.1 Off-script

##### Off-script: Create folders in your working directory (the folder where you are storing this script).
> *ADM
<br>Buildup
<br>PlaceName
<br>Population
<br>Settlement
<br>NTL*

##### Before starting: Download the datasets (as shapefile, GeoJSON, or tif where possible) and place or extract them into the corresponding folders. You can download the cleaned files from our [GitHub Repository](https://github.com/worldbank/Urban_Spatio_Temporal_Trends) or access the original sources here:
- ADM: *Varies by source.*
- Buildup: https://download.geoservice.dlr.de/WSF_EVO/files/
- PlaceName: 
    - GeoNames: (file: cities500.zip) https://download.geonames.org/export/dump/
    - Africapolis: https://africapolis.org/en/data
    - Urban Centres Database: https://ghsl.jrc.ec.europa.eu/ghs_stat_ucdb2015mt_r2019a.php
- Population: https://hub.worldpop.org/geodata/listing?id=69
- Settlement: https://data.grid3.org/datasets/GRID3::grid3-cameroon-settlement-extents-version-01-01-/explore
- Nighttime Lights: https://eogdata.mines.edu/products/dmsp/#v4 and https://eogdata.mines.edu/products/vnl/#annual_v2

##### Other off-script:
- Convert GeoNames from a .txt file to a shapefile (delimiter = tab, header rows = 0) and rename the fields.
- If necessary, mosaic WSFE rasters that cover the area of interest to create a single file.
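
The GeoNames conversion can also be scripted. The following is a minimal sketch using only the standard library, assuming the 19-column, tab-delimited, header-less layout described in the GeoNames export readme. The function name `read_geonames`, the underscore-style field names, and the sample row are illustrative and not part of this repository. The resulting latitude and longitude values could then be turned into point geometries, for example with `gpd.points_from_xy()`.

```python
import csv
import io

# Column order as documented in the GeoNames export readme (19 tab-separated fields).
# The underscore spellings are our own renaming choice.
GEONAMES_FIELDS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "feature_class", "feature_code", "country_code", "cc2", "admin1_code",
    "admin2_code", "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]

def read_geonames(handle, country_code=None):
    """Read a header-less, tab-delimited GeoNames dump into a list of dicts.

    Hypothetical helper: optionally keeps only rows for one ISO country code.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    places = []
    for row in reader:
        if len(row) != len(GEONAMES_FIELDS):
            continue  # Skip malformed lines rather than guessing at their layout.
        record = dict(zip(GEONAMES_FIELDS, row))
        if country_code and record["country_code"] != country_code:
            continue
        record["latitude"] = float(record["latitude"])
        record["longitude"] = float(record["longitude"])
        places.append(record)
    return places

# Made-up sample row, for illustration only (not real GeoNames data).
sample = "\t".join([
    "1", "Exampleville", "Exampleville", "", "12.5", "-1.5", "P", "PPL", "BF",
    "", "", "", "", "", "1000", "", "300", "Africa/Ouagadougou", "2020-01-01",
]) + "\n"

places = read_geonames(io.StringIO(sample), country_code="BF")
print(places[0]["name"], places[0]["latitude"], places[0]["longitude"])
```

For the real file, open `cities500.txt` with `encoding="utf-8"` and `newline=""` and pass the handle to the function.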

### 1.2 Load all packages.

In [1]:
# Built-in:
# dir(), print(), range(), format(), int(), len(), list(), max(), min(), zip(), sorted(), sum(), open(), del, = None, try except, with as, for in, if elif else
# Also: list.append(), string.zfill(), list.insert(), list.remove(), string.join(), count(), startswith(), endswith(), contains(), replace()

import os, sys, glob, re, time, subprocess # os.getcwd(), os.path.join(), os.listdir(), os.remove(), time.ctime(), glob.glob()
from os.path import exists # exists()
from functools import reduce # reduce()

import geopandas as gpd # read_file(), GeoDataFrame(), sjoin_nearest(), to_crs(), to_file(), .crs, buffer(), dissolve()
import pandas as pd # .dtypes, Series(), concat(), DataFrame(), read_table(), merge(), to_csv(), .loc[], head(), sample(), astype(), unique(), rename(), between(), drop(), fillna(), idxmax(), isna(), isin(), apply(), info(), sort_values(), notna(), groupby(), value_counts(), duplicated(), drop_duplicates()
from shapely.geometry import Point, LineString, Polygon, shape, MultiPoint
from shapely.ops import cascaded_union
from shapely.validation import make_valid  # in apply(make_valid)
import shapely.wkt

import numpy as np # median(), mean(), tolist(), .inf
import fiona, rioxarray # fiona.open()
import rasterio # open(), write_band(), .name, .count, .width, .height. nodatavals, .meta, update(), copy(), write()
from rasterio.plot import show
from rasterio import features # features.rasterize()
from rasterio.features import shapes
from rasterio import mask # rasterio.mask.mask()
from osgeo import gdal, osr, ogr, gdal_array, gdalconst # Open(), SpatialReference, WarpOptions(), Warp(), GetDataTypeName(), GetRasterBand(), GetNoDataValue(), Translate(), GetProjection(), GetAttrValue()

### 1.3 User-defined functions.

In [2]:
# From Stack Exchange @RutgerH
# https://gis.stackexchange.com/questions/163685/reclassify-a-raster-value-to-9999-and-set-it-to-the-nodata-value-using-python-a
def readRaster(filename):
    filehandle = gdal.Open(filename)
    band1 = filehandle.GetRasterBand(1)
    geotransform = filehandle.GetGeoTransform()
    geoproj = filehandle.GetProjection()
    Z = band1.ReadAsArray()
    xsize = filehandle.RasterXSize
    ysize = filehandle.RasterYSize
    return xsize,ysize,geotransform,geoproj,Z

In [3]:
# Default arguments can be changed here, or can be specified below when running the functions.
def writeRaster(filename,geotransform,geoprojection,data, NoDataVal=0):
    (x,y) = data.shape
    Dformat = "GTiff"
    driver = gdal.GetDriverByName(Dformat)
    # UInt32 stores only non-negative integers. Change the data type if you need to store negative values such as -9999.
    dst_datatype = gdal.GDT_UInt32
    dst_ds = driver.Create(filename,y,x,1,dst_datatype)
    dst_ds.GetRasterBand(1).WriteArray(data)
    dst_ds.SetGeoTransform(geotransform)
    dst_ds.SetProjection(geoprojection)
    dst_ds.GetRasterBand(1).SetNoDataValue(NoDataVal)
    dst_ds = None # Close the dataset so it is flushed to disk before returning.
    return 1

In [4]:
# Based on Stack Exchange @Kurt Schwehr:
# https://stackoverflow.com/questions/10454316/how-to-project-and-resample-a-grid-to-match-another-grid-with-gdal-python
def resampleRaster(InRaster_Path, MatchRaster_Path, OutFile_Path):
    print('Loading for %s. %s' % (InRaster_Path, time.ctime()))
    
    RasterObject = gdal.Open(InRaster_Path)
    In_proj = RasterObject.GetProjection()
    [Match_x, Match_y, Match_geo, Match_proj, Match_Z] = readRaster(MatchRaster_Path)
    print('---Specs to match to: \n', 
      Match_proj, '\n', Match_geo, '\n', Match_x, '\n', Match_y, '\n')
        
    OutFile = gdal.GetDriverByName('GTiff').Create(OutFile_Path, Match_x, Match_y, 1, gdalconst.GDT_UInt32)
    OutFile.SetGeoTransform(Match_geo)
    OutFile.SetProjection(Match_proj)
    print('---Created raster file for upsampled version. %s' % time.ctime())
    
    gdal.ReprojectImage(RasterObject, OutFile, In_proj, Match_proj, gdal.GRA_NearestNeighbour) # Nearest because categorical.
    print('---Resampled flood values onto an empty raster matching the dimensions of the buildup layer. %s \n\n' % time.ctime())
    
    OutFile.GetRasterBand(1).SetNoDataValue(0)
    
    RasterObject = OutFile = None # Close both datasets so the output is flushed to disk.
    return 1

In [5]:
def calcShell(A, B, OutFile, Calculation):
    """Raster math using gdal_calc.py.

    The osgeo Python API does not make raster calculations easy
    outside of the shell. This function plugs two raster file paths
    into a command string, which subprocess.call() then runs in the shell.
    It only works with two rasters but is easy to adapt for three or more.

        A : str
            File path to the first raster for the calculation.
        B : str
            File path to the second raster for the calculation.
        OutFile : str
            File path where to store the raster generated from the calculation.
        Calculation : str
            Algebra that uses A and B to create a new raster. Use double quotes.
    """
    print('Running for %s. %s' % (A, time.ctime()))
    cmd = 'gdal_calc.py -A ' + A + ' -B ' + B + ' --outfile=' + OutFile + ' --overwrite --calc=' + Calculation
    subprocess.call(cmd, shell=True)
    cmd = A = B = None
    print('Ran in shell. See OutFile folder to inspect results. %s' % time.ctime())
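
Because `calcShell()` builds one long string, a file path containing spaces would be split into several arguments by the shell. A possible alternative, sketched here under assumptions, is to pass the arguments as a list so that each path stays intact. The helper name `build_gdal_calc_cmd` and the example paths are hypothetical. Whether `gdal_calc.py` can be launched this way depends on how GDAL is installed; on some systems the script has to be run through the Python interpreter with its full path.

```python
import subprocess

def build_gdal_calc_cmd(a_path, b_path, out_path, calculation):
    """Return the gdal_calc.py call as an argument list (hypothetical helper).

    Passing a list to subprocess.run() avoids shell quoting problems
    with spaces in paths or parentheses in the calculation.
    """
    return [
        "gdal_calc.py",
        "-A", a_path,
        "-B", b_path,
        "--outfile=" + out_path,
        "--overwrite",
        "--calc=" + calculation,
    ]

# Illustrative paths only; the first one deliberately contains a space.
cmd = build_gdal_calc_cmd(
    "Settlement/GRID3 rasterized.tif",
    "ADM/ADM_rasterized.tif",
    "Settlement/GRID3_ADM.tif",
    "(A*1000)+B",
)
print(cmd)

# To execute (assumes GDAL's command-line utilities are on the PATH):
# subprocess.run(cmd, check=True)
```

With `check=True`, a failed calculation raises an exception instead of passing silently, which `subprocess.call()` does not do on its own.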

In [6]:
def RasterToShapefile(InRasterPath, OutFilePath = 'RastToShp.shp', Band=1, 
                      OutName='RastToShp', VariableName='value', Driver = 'ESRI Shapefile'):
    """Raster tiff to vector polygon shapefile.
    
    """
    Raster = gdal.Open(InRasterPath)
    RasterBand = Raster.GetRasterBand(Band)
    
    OutDriver = ogr.GetDriverByName(Driver)
    InProj = Raster.GetProjectionRef()
    SpatRef = osr.SpatialReference()
    SpatRef.ImportFromWkt(InProj)
    print(InProj, '\n\n', SpatRef)
    
    OutFile = OutDriver.CreateDataSource(OutFilePath)
    OutLayer = OutFile.CreateLayer(OutName, srs = SpatRef, geom_type = ogr.wkbPolygon)
    OutField = ogr.FieldDefn(VariableName, ogr.OFTInteger)
    OutLayer.CreateField(OutField)
    OutField = OutLayer.GetLayerDefn().GetFieldIndex(VariableName)
    print('\n', OutFile, '\n', OutLayer, '\n', OutField)
    
    print('Vectorizing. Input: %s. %s' % (InRasterPath, time.ctime()))
    gdal.Polygonize(RasterBand, None, OutLayer, 0, [], callback=None)
    print('Completed polygons. Stored as: %s. %s' % (OutFilePath, time.ctime()))

    del Raster, RasterBand, OutFile, OutLayer

In [7]:
def rioStats(InRasterPath, Band = 1):
    out = rasterio.open(InRasterPath)
    stats = []
    band = out.read(Band)
    stats.append({
        'raster': out.name,
        'bands': out.count,
        'data type': out.dtypes,
        'no data value': out.nodatavals,
        'width': out.width,
        'height': out.height,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})
    print("\n", stats)
    
    out = band = None

In [8]:
def ShapeToRaster(Shapefile, ValueVar, MetaRasterPath, OutFilePath = 'ShpToRast.tif', Band=1):
    """
    Polygon spatial object to raster tiff.
    """
    # Copy and update the metadata from another raster for the output
    MetaRaster = rasterio.open(MetaRasterPath)
    meta = MetaRaster.meta.copy()
    meta.update(compress='lzw')
    MetaRaster.close() # The metadata has been copied, so the reference raster can be closed.

    print("Rasterizing dataset. %s" % time.ctime())
    with rasterio.open(OutFilePath, 'w+', **meta) as out:
        out_arr = out.read(Band)

        # this is where we create a generator of geom, value pairs to use in rasterizing
        shapes = ((geom,value) for geom, value in zip(Shapefile.geometry, Shapefile[ValueVar]))

        burned = features.rasterize(shapes=shapes, fill=0, out=out_arr, transform=out.transform)
        out.write_band(1, burned)
    out = burned = shapes = None
    
    print("Finished rasterizing. Checking contents. %s" % time.ctime())
    rioStats(OutFilePath)

In [9]:
def ListFromRange(r1, r2):
    return [item for item in range(r1, r2+1)]

In [156]:
def MaskByZone(MaskPath, SourceFolder, DestFolder, MaskLayerName = None, dstSRS = 'ESRI:102022'):
    """
    Reduces the size of a raster's valid data cells to vector areas of interest.
    This is useful if the raster data needs to be vectorized later to save space.
    
    The script prepares the vector zones as a list of geometries in the desired
    spatial reference system, then warps each raster in the specified source
    folder to the same SRS. Masking in rasterio then reclassifies any raster cells
    falling outside of a mask polygon as NoData.
    """
    
    ProjSRS = osr.SpatialReference()
    ProjSRS.SetFromUserInput(dstSRS)
    ProjWarp = gdal.WarpOptions(dstSRS = dstSRS)
    
    AnnualizedSourceFiles = []
    AnnualizedSourceFiles = AnnualizedSourceFiles + [i for i in os.listdir(''.join([SourceFolder, r'/'])) if i.endswith('tif')]
    print(AnnualizedSourceFiles)
    
    
    ### 1. ASSIGN SPATIAL REFERENCE SYSTEM OF VECTOR MASK AND LOAD GEOMETRIES
    Vector = gpd.read_file(filename=MaskPath, layer=MaskLayerName)
    if Vector.crs != dstSRS:
        if MaskLayerName is None:
            MaskPath = MaskPath + '_temp'
        else:
            MaskLayerName = MaskLayerName + '_temp'
        Vector.to_crs(dstSRS).to_file(filename=MaskPath, layer=MaskLayerName)
    Vector = None # We're reloading the geometries with fiona
    
    with fiona.open(MaskPath, mode="r", layer=MaskLayerName) as Vector:
        MaskGeom = [feature["geometry"] for feature in Vector] # Identify the bounding areas of the mask.
    
    
    ### 2. PREPARE DESTINATION FILES
    for FileName in AnnualizedSourceFiles:
    
        InputRasterPath = os.path.join(ProjectFolder, SourceFolder, FileName)
        
        Sensor = re.search('[A-Z]+_', FileName)
        if Sensor is None:
            Sensor = ''
        else:
            Sensor = Sensor.group(0)

        Year = re.search(r'\d{4}', FileName).group(0)

        if FileName.endswith('avg.tif') == True:
            IndicType = '_avg'
        elif FileName.endswith('cfc.tif') == True:
            IndicType = '_cfc'
        else:
            IndicType = ''

        TempOutputName = 'Temp_' + Sensor + Year + IndicType + '.tif'
        TempOutputPath = os.path.join(ProjectFolder, DestFolder, TempOutputName)
        FinalOutputName = 'Msk_' + Sensor + Year + IndicType + '.tif'
        FinalOutputPath = os.path.join(ProjectFolder, DestFolder, FinalOutputName)

    ### 3. ASSIGN SPATIAL REFERENCE SYSTEM OF RASTER(S)
        InputRasterObject = gdal.Open(InputRasterPath)
        SourceSRS = osr.SpatialReference(wkt=InputRasterObject.GetProjection())
        print('Source projection: ', SourceSRS.GetAttrValue('projcs'))
        print('Destination projection: ', ProjSRS.GetAttrValue('projcs'))

        if SourceSRS.GetAttrValue('projcs') != ProjSRS.GetAttrValue('projcs'):
            Warp = gdal.Warp(TempOutputPath, # Where to store the warped raster
                         InputRasterObject, # Which raster to warp
                         format='GTiff', 
                         options=ProjWarp) # Reproject to Africa Albers Equal Area Conic
            print('Finished gdal.Warp() for %s. %s \n' % (FileName, time.ctime()))

            Warp = None # Close the files
        else:
            pass
        InputRasterObject = None
        
    ### 4. RECLASSIFY AS NODATA IF OUTSIDE OF SETTLEMENT BUFFER ZONE.
        if exists(TempOutputPath):
            NewInputPath = TempOutputPath 
            print("We warped the data, so we'll use that file for next step.")
        else:
            NewInputPath = InputRasterPath 
            print("We skipped the warp, so we continue to use the source file.")

        with rasterio.open(NewInputPath) as InputRasterObject:
            MaskedOutputRaster, OutTransform = rasterio.mask.mask(
                InputRasterObject, MaskGeom, crop=True) # Anything outside the mask is reclassed to the raster's NoData value.
            OutMetaData = InputRasterObject.meta.copy()
        print('Finished rasterio.mask.mask() for %s. %s \n' % (FileName, time.ctime()))

        OutMetaData.update({"driver": "GTiff",
                         "height": MaskedOutputRaster.shape[1],
                         "width": MaskedOutputRaster.shape[2],
                         "transform": OutTransform})

        with rasterio.open(FinalOutputPath, "w", **OutMetaData) as dest:
            dest.write(MaskedOutputRaster)
        print('Written to file. %s \n' % time.ctime())
        InputRasterObject = None

        if exists(TempOutputPath):
            try:  # Finally, remove the intermediate file from disk
                os.remove(TempOutputPath)
            except OSError:
                pass
            print('Removed intermediate file. %s \n' % time.ctime())
        else:
            pass


    print('\n \n Finished all years in list. %s' % time.ctime())
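
The output names in `MaskByZone()` depend entirely on what the regular expressions find in each input file name. The sketch below isolates that parsing step so it can be checked before a long batch run. The helper names `parse_raster_name` and `masked_name` and the example file names are hypothetical; the rules themselves mirror the function above.

```python
import re

def parse_raster_name(file_name):
    """Split a raster file name into (sensor, year, indicator type).

    Mirrors the rules in MaskByZone(): the sensor is the first run of
    capital letters followed by an underscore, the year is the first
    four-digit run, and the indicator type comes from the file ending.
    """
    sensor_match = re.search(r'[A-Z]+_', file_name)
    sensor = sensor_match.group(0) if sensor_match else ''
    year_match = re.search(r'\d{4}', file_name)
    if year_match is None:
        raise ValueError('No four-digit year found in %s' % file_name)
    year = year_match.group(0)
    if file_name.endswith('avg.tif'):
        indic_type = '_avg'
    elif file_name.endswith('cfc.tif'):
        indic_type = '_cfc'
    else:
        indic_type = ''
    return sensor, year, indic_type

def masked_name(file_name):
    """Build the 'Msk_' output name the same way MaskByZone() does."""
    sensor, year, indic_type = parse_raster_name(file_name)
    return 'Msk_' + sensor + year + indic_type + '.tif'

# Hypothetical input names, for illustration only.
for name in ['VIIRS_2014_avg.tif', 'DMSP_2005.tif', 'pop2010.tif']:
    print(name, '->', masked_name(name))
```

Note that the original function calls `.group(0)` directly on the year search, so a file name without a four-digit run stops the loop with an `AttributeError`; the sketch raises a more descriptive error instead.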

In [161]:
def BatchZonalStats(FolderName, CRS = 'ESRI:102022', 
                    Zones = Settlements,
                    JoinField = 'Sett_ID',
                    StatsWanted = ['count', 'sum', 'mean', 'max', 'min'],
                    SeriesStart = 1999, SeriesEnd = 2015):
    """
    Normally, we would use numpy to generate a point gdf from the raster's matrix. 
    However, I was running into a lot of memory errors with that method.
    This method uses some extra steps: tif to xyz to df to gdf. But it saves to file
    and deletes intermediate files along the way, circumventing memory issues.
    
    Run MaskByZone() beforehand to reduce the raster to only your area(s) of interest.
    
    """
    AnnualizedFiles = [i for i in os.listdir(FolderName) if i.endswith('.tif')]
    print(AnnualizedFiles)
    AllSummaries = pd.DataFrame(Zones).drop(columns='geometry')[[JoinField]]
    print(AllSummaries)
    
    for FileName in AnnualizedFiles:
    ### STEP 1: TIF TO XYZ ###
        print('Loading data for %s. %s \n' % (FileName, time.ctime()))
        
        Sensor = re.search('[A-Z]+_', FileName)
        if Sensor is None:
            Sensor = ''
        else:
            Sensor = Sensor.group(0)
        Year = re.search(r'\d{4}', FileName).group(0)
        
        InputRasterPath = os.path.join(ProjectFolder, FolderName, FileName)
        InputRasterObject = gdal.Open(InputRasterPath)
        XYZOutputPath = FolderName + r'/{}'.format(
            FileName.replace('.tif', '.xyz')) # New file path will be the same as original, but .tif is replaced with .xyz

        # Create an .xyz version of the .tif
        XYZ = gdal.Translate(XYZOutputPath, # Specify a destination path
                             InputRasterObject, # Input is the masked .tif file
                             format='XYZ', 
                             creationOptions=["ADD_HEADER_LINE=YES"])
        print('Finished gdal.Translate() for year %s. %s \n' % (Year, time.ctime()))

    #     Un-comment this block to remove the temporary masked tif file.
    #     try:  
    #         os.remove(InputRasterPath)
    #     except OSError:
    #         pass
    #     print('Removed (or skipped if error) intermediate tif file. %s \n' % time.ctime())

        InputRasterObject = None
        XYZ = None # Reload XYZ as a point geodataframe


    ### STEP 2: GENERATE GEODATAFRAME WITH JOIN FIELD ###
        InputXYZ = pd.read_table(XYZOutputPath, sep=r'\s+') # Whitespace-delimited; delim_whitespace is deprecated in recent pandas.
        InputXYZ = InputXYZ.loc[InputXYZ['Z'] > 0] # Subset to only the features that have a value.
        print('Loaded XYZ file as a pandas dataframe. %s \n' % time.ctime())
        ValObject = gpd.GeoDataFrame(InputXYZ,
                                     geometry = gpd.points_from_xy(InputXYZ['X'], InputXYZ['Y']),
                                     crs = CRS)
        print('Created geodataframe from non-NoData points. %s \n' % time.ctime())
        del InputXYZ

        # Sjoin_nearest: No need to group by ADM this time. 
        ValObject_withID = pd.DataFrame(gpd.sjoin_nearest(ValObject, 
                                        Zones, 
                                        how='left')).drop(columns='geometry')[['Z', JoinField]] # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.

        print('\nJoined zone ID onto vectorized raster cells. %s \n' % time.ctime())
        print(ValObject_withID.sample(10))
        del ValObject

        ValObject_withID.to_csv(''.join([FolderName, r'/', FileName.replace('.tif', '.csv')]))
        print('\nExported as table. %s \n' % time.ctime())

        # Remove the temporary xyz file.
        try:  
            os.remove(os.path.join(XYZOutputPath))
        except OSError:
            pass
        print('Removed (or skipped if error) intermediate xyz file. %s \n' % time.ctime())


    ### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###
        GroupedVals = ValObject_withID[ValObject_withID['Z'].notna()].groupby(JoinField, as_index=False)
        
        # Unless this is a cloud-free coverage variable, run the desired aggregation methods.
        if FileName.find('cfc') == -1:
            if 'count' in StatsWanted:
                VariableName = ''.join([FolderName[:3].upper(), 'ct', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.count().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'sum' in StatsWanted:
                VariableName = ''.join([FolderName[:3].upper(), 'sum', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.sum().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'mean' in StatsWanted:
                VariableName = ''.join([FolderName[:3].upper(), 'avg', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.mean().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'max' in StatsWanted:
                VariableName = ''.join([FolderName[:3].upper(), 'max', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.max().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'min' in StatsWanted:
                VariableName = ''.join([FolderName[:3].upper(), 'min', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.min().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            print('\nDesired aggregation methods applied to settlement level, year %s. %s \n' % (Year, time.ctime()))
            
            # Save in-progress results
            AllSummaries.to_csv(os.path.join(ResultsFolder, ''.join([FolderName, '%sto%s.csv' % (SeriesStart, SeriesEnd)])))
            print(AllSummaries.sample(10))

        else:
            VariableName = ''.join([FolderName[:3].upper(), 'cfc_', Sensor, Year])
            AllSummaries = AllSummaries.merge(GroupedVals.mean().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            print('\nCount of cloud-free observations averaged to settlement level, year %s. %s \n' % (Year, time.ctime()))
            
            # Save in-progress results
            AllSummaries.to_csv(os.path.join(ResultsFolder, ''.join([FolderName, '%sto%s.csv' % (SeriesStart, SeriesEnd)])))
            print(AllSummaries.sample(10))
    
    print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())
    print(AllSummaries.sample(10))
    AllSummaries.to_csv(os.path.join(ResultsFolder, ''.join([FolderName, '%sto%s.csv' % (SeriesStart, SeriesEnd)])))
    print('Saved to file. %s \n' % time.ctime())
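
Step 3 of `BatchZonalStats()` groups cell values by settlement and names each output column from the folder prefix, the statistic, the sensor, and the year. The sketch below reproduces that logic in plain Python as a stand-in for the pandas `groupby()` calls, so the naming scheme and the aggregation can be inspected on a handful of rows. The function name `summarize_by_zone` and the sample values are illustrative only.

```python
from collections import defaultdict
from statistics import mean

def summarize_by_zone(rows, folder_name, sensor, year,
                      stats_wanted=('count', 'sum', 'mean', 'max', 'min')):
    """Aggregate (z_value, zone_id) pairs per zone.

    Plain-Python stand-in for the groupby step in BatchZonalStats().
    Column names follow the same pattern: first three letters of the
    folder name in upper case, a statistic code, the sensor, and the year.
    """
    codes = {'count': 'ct', 'sum': 'sum', 'mean': 'avg', 'max': 'max', 'min': 'min'}
    funcs = {'count': len, 'sum': sum, 'mean': mean, 'max': max, 'min': min}

    grouped = defaultdict(list)
    for z_value, zone_id in rows:
        if z_value is not None:  # Same role as the notna() filter.
            grouped[zone_id].append(z_value)

    prefix = folder_name[:3].upper()
    summary = {}
    for zone_id, values in grouped.items():
        summary[zone_id] = {
            prefix + codes[stat] + sensor + year: funcs[stat](values)
            for stat in stats_wanted
        }
    return summary

# Made-up cell values: two cells in settlement 1, one valid cell in settlement 2.
rows = [(10, 1), (30, 1), (5, 2), (None, 2)]
result = summarize_by_zone(rows, 'NTL', 'VIIRS_', '2014')
for zone_id, stats in sorted(result.items()):
    print(zone_id, stats)
```

Settlements with no valid cells do not appear in the summary at all, which is why the function above merges with `how = 'left'`: those settlements keep a row with missing values.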

### 1.4 Set up workspace.

In [10]:
ProjectFolder = os.getcwd()
ResultsFolder = os.path.join(ProjectFolder, 'Results')
print(ProjectFolder)
print(ResultsFolder)

Q:\GIS\povertyequity\urban_growth\BurkinaFaso
Q:\GIS\povertyequity\urban_growth\BurkinaFaso\Results
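
`ResultsFolder` above is only a path; nothing in this cell creates the folder, and section 1.1 asks for the data folders to be created by hand. A minimal sketch of doing both in script follows. The function name `create_workspace` and its `extra_folders` parameter are hypothetical, and the demonstration runs in a temporary directory rather than the real project folder.

```python
import os
import tempfile

# Folder names taken from the list in section 1.1.
DATA_FOLDERS = ["ADM", "Buildup", "PlaceName", "Population", "Settlement", "NTL"]

def create_workspace(base_dir, extra_folders=("Results",)):
    """Create the data folders, plus optional extras, under base_dir.

    Hypothetical helper. exist_ok=True makes the call safe to repeat,
    so existing folders and their contents are left untouched.
    """
    created = []
    for name in list(DATA_FOLDERS) + list(extra_folders):
        path = os.path.join(base_dir, name)
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created

# Demonstration in a throwaway directory; in practice pass ProjectFolder.
with tempfile.TemporaryDirectory() as demo_dir:
    paths = create_workspace(demo_dir)
    print(sorted(os.path.basename(p) for p in paths))
```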


In [11]:
WSFEYears = ListFromRange(1985, 2015) # All years in the WSFE dataset.
AllStudyYears = ListFromRange(1999, 2015) # All years for which there will be growth stats in the present study.
print(WSFEYears, '\n\n', AllStudyYears)

[1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015] 

 [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]


---

## 2. PREPARE BUILDUP, SETTLEMENT, AND ADMIN DATASETS
Projection for all datasets: Africa Albers Equal Area Conic

### 2.1 Prepare raster locations for GRID3 and Admin areas

In [23]:
ADM_vec = gpd.read_file(glob.glob('ADM/*.shp')[0])[['geometry']].to_crs("ESRI:102022") # glob() returns every file in the ADM folder that ends in '.shp'; [0] takes the first one.
GRID3_vec = gpd.read_file(glob.glob('Settlement/*.shp')[0])[['type','geometry']].to_crs("ESRI:102022")
ADM_vec['ADM_ID'] = range(0,len(ADM_vec))
ADM_vec['ADM_ID'] = ADM_vec['ADM_ID'] + 1 # We have to add 1 if we want our rasterized version's NoData value to be 0. Otherwise the first feature won't be valid.
GRID3_vec['G3_ID'] = range(0,len(GRID3_vec))
GRID3_vec['G3_ID'] = GRID3_vec['G3_ID'] +1
ADM_vec.to_file(driver='GPKG', filename=r'ADM/ADM_equalarea.gpkg', layer='ADM')
GRID3_vec.to_file(driver='GPKG', filename=r'Settlement/Settlement_equalarea.gpkg', layer='GRID3')

In [11]:
ADM_vec = gpd.read_file(r'ADM/ADM_equalarea.gpkg', layer='ADM')
GRID3_vec = gpd.read_file(r'Settlement/Settlement_equalarea.gpkg', layer='GRID3')

# We need to know how many digits need to be allocated to each dataset in the "join" serial.
len_ADM = len(str(ADM_vec['ADM_ID'].max()))
len_G3 =  len(str(GRID3_vec['G3_ID'].max()))

print(ADM_vec.info(), "\n\n", 
      ADM_vec.sample(5),
      ADM_vec.crs, "\n\n", 
      len_ADM) 
print(GRID3_vec.info(), "\n\n",
      GRID3_vec.sample(5),
      GRID3_vec.crs, "\n\n", 
      len_G3)

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 351 entries, 0 to 350
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   ADM_ID    351 non-null    int64   
 1   geometry  351 non-null    geometry
dtypes: geometry(1), int64(1)
memory usage: 5.6 KB
None 

      ADM_ID                                           geometry
128     129  POLYGON ((-2814706.700 1366314.825, -2814686.8...
196     197  POLYGON ((-3121153.823 1262207.751, -3120940.4...
82       83  MULTIPOLYGON (((-3039069.415 1467544.605, -303...
197     198  POLYGON ((-2651759.240 1596949.597, -2651690.5...
308     309  POLYGON ((-2778815.854 1482839.071, -2778612.6... PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",

### 2.2 Reproject WSFE to project CRS, and ensure valid data range is only 1985-2015.

In [10]:
InPath = glob.glob('Buildup/*.tif')[0]
OutWGS = os.path.join(ProjectFolder, 'Buildup', 'WSFE_reclass.tif')

# Together, x and y define the data's "shape".
# geotransform contains the parameters detailing how the raster should be stretched and aligned.
# geoproj is the map projection
# Z are the values in the raster band.
[xsize,ysize,geotransform,geoproj,Z] = readRaster(InPath)
Z[Z<1985] = 0
Z[Z>2015] = 0

writeRaster(OutWGS,geotransform,geoproj,Z)
print('Wrote the reclassed raster to file. %s' % time.ctime())

Wrote the reclassed raster to file. Wed Jan 25 16:06:18 2023


In [11]:
OutEqArea = os.path.join(ProjectFolder, 'Buildup', 'WSFE_equalarea.tif')

# Whenever we want to work in a projected CRS, we'll use Africa Albers Equal Area Conic.
ProjEqArea = gdal.WarpOptions(dstSRS='ESRI:102022')
Warp = gdal.Warp(OutEqArea, # Where to store the warped raster
                 OutWGS, # Which raster to warp
                 format='GTiff', 
                 options=ProjEqArea)
print('Wrote the reclassed and reprojected raster to file. %s' % time.ctime())

Wrote the reclassed and reprojected raster to file. Wed Jan 25 16:07:25 2023


In [12]:
rioStats(OutWGS)
rioStats(OutEqArea)


 [{'raster': 'Q:/GIS/povertyequity/urban_growth/BurkinaFaso/Buildup/WSFE_reclass.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (0.0,), 'width': 29405, 'height': 21083, 'min': 0, 'mean': 6.654020696638043, 'median': 0.0, 'max': 2015}]

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/BurkinaFaso/Buildup/WSFE_equalarea.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (0.0,), 'width': 28597, 'height': 22808, 'min': 0, 'mean': 6.506819179191691, 'median': 0.0, 'max': 2015}]


In [13]:
InPath = Warp = OutWGS = OutEqArea = None

---

## 3. WSFE AND ADM; GRID3 AND ADM
RASTERIZE: Bring ADM and GRID3 into raster space.

RASTER MATH: "Join" ADM ID onto GRID3 and onto WSFE by creating unique concatenation string.

VECTORIZE: Bring joined data into vector space.

VECTOR MATH: Split unique ID from raster math step into separate columns.

### 3.1 Rasterize admin areas and GRID3 using WSFE specs.

In [12]:
# Copy and update the metadata from WSFE for the output
WSFE = os.path.join(ProjectFolder, 'Buildup', 'WSFE_equalarea.tif')
ADM_out = os.path.join(ProjectFolder, 'ADM', 'ADM_rasterized.tif')
GRID3_out = os.path.join(ProjectFolder, 'Settlement', 'GRID3_rasterized.tif')

ShapeToRaster(Shapefile=ADM_vec, ValueVar="ADM_ID", MetaRasterPath=WSFE, OutFilePath=ADM_out)
ShapeToRaster(GRID3_vec, "G3_ID", WSFE, GRID3_out)

Rasterizing dataset. Wed Jan 25 19:02:35 2023
Finished rasterizing. Checking contents. Wed Jan 25 19:03:24 2023

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/BurkinaFaso/ADM/ADM_rasterized.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (0.0,), 'width': 28597, 'height': 22808, 'min': 0, 'mean': 84.24833096349128, 'median': 0.0, 'max': 351}]
Rasterizing dataset. Wed Jan 25 19:03:42 2023
Finished rasterizing. Checking contents. Wed Jan 25 19:07:57 2023

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/BurkinaFaso/Settlement/GRID3_rasterized.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (0.0,), 'width': 28597, 'height': 22808, 'min': 0, 'mean': 12265.64792682353, 'median': 0.0, 'max': 615220}]


### 3.2 Raster math to "join" admin to GRID3 and to WSFE.
Processing is faster when "joining" (i.e., creating serial codes out of two datasets) in raster rather than vector space.
Here, we concatenate the ID fields of the two datasets into a single serial number, which we later split in vector space to recover the two ID fields.

In [13]:
# In paths
InG3 = os.path.join(ProjectFolder, 'Settlement', 'GRID3_rasterized.tif')
InWSFE = os.path.join(ProjectFolder, 'Buildup', 'WSFE_equalarea.tif')
InADM = os.path.join(ProjectFolder, 'ADM', 'ADM_rasterized.tif')

# Out paths
G3_ADM = os.path.join(ProjectFolder, 'Settlement', 'GRID3_ADM.tif')
WSFE_ADM = os.path.join(ProjectFolder, 'Buildup', 'WSFE_ADM.tif')

# Calculations
Calc = "(A*" + str(10**len_ADM) + ")+B" 
# The number of digits in the largest ADM index value (len_ADM) is 
# the number of zeroes we tack onto the first variable in the serial.

calcShell(A=InG3, B=InADM, OutFile=G3_ADM, Calculation=Calc)
calcShell(A=InWSFE, B=InADM, OutFile=WSFE_ADM, Calculation=Calc)

Running for Q:\GIS\povertyequity\urban_growth\BurkinaFaso\Settlement\GRID3_rasterized.tif. Wed Jan 25 19:08:11 2023
Ran in shell. See OutFile folder to inspect results. Wed Jan 25 19:09:52 2023
Running for Q:\GIS\povertyequity\urban_growth\BurkinaFaso\Buildup\WSFE_equalarea.tif. Wed Jan 25 19:09:52 2023
Ran in shell. See OutFile folder to inspect results. Wed Jan 25 19:11:36 2023


*Summing the scaled values creates the join IDs. This is in effect a concatenation of the two ID strings by way of arithmetic. The number of zeros in the multiplier equals the number of digits in the maximum value of the "B" dataset (e.g., Chad ADM codes run to 4 digits, so the calculation is calc=(A*10000)+B).*
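
The arithmetic behind the serial can be checked on plain integers before running it on full rasters. The sketch below uses the maximum IDs reported in the rasterization output above (351 admin areas, 615220 settlements); the function names are hypothetical. It also shows two things worth verifying for a new country: that too few digits make serials ambiguous, and that the largest serial still fits in a UInt32 cell.

```python
UINT32_MAX = 2**32 - 1  # Largest value a UInt32 raster cell can hold.

def make_serial(a_id, b_id, b_digits):
    """Concatenate two integer IDs arithmetically: (A * 10**b_digits) + B."""
    return a_id * 10**b_digits + b_id

def split_serial(serial, b_digits):
    """Recover (a_id, b_id) from a serial built by make_serial()."""
    return divmod(serial, 10**b_digits)

def fits_uint32(max_a, max_b):
    """Check that the largest possible serial still fits in a UInt32 cell."""
    b_digits = len(str(max_b))
    return make_serial(max_a, max_b, b_digits) <= UINT32_MAX

len_adm = len(str(351))  # Three digits, so the multiplier is 1000.

# Settlement 615220 in admin area 351, and the round trip back.
serial = make_serial(615220, 351, len_adm)
print(serial, split_serial(serial, len_adm))

# Build-up year 1985 in admin area 1.
print(make_serial(1985, 1, len_adm))

# With too few digits, two different ID pairs collapse into one serial.
print(make_serial(12, 351, 2), make_serial(15, 51, 2))

# Three-digit admin codes fit; four-digit codes with this many settlements would not.
print(fits_uint32(615220, 351), fits_uint32(615220, 9999))
```

If the last check fails for a given country, one option is to write the calculation output with a wider data type; that is a suggestion, not something the notebook does.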

In [14]:
rioStats(G3_ADM)
rioStats(WSFE_ADM)


 [{'raster': 'Q:/GIS/povertyequity/urban_growth/BurkinaFaso/Settlement/GRID3_ADM.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (4294967293.0,), 'width': 28597, 'height': 22808, 'min': 1188, 'mean': 4132712942.105214, 'median': 4294967293.0, 'max': 4294967293}]

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/BurkinaFaso/Buildup/WSFE_ADM.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (4294967293.0,), 'width': 28597, 'height': 22808, 'min': 1985001, 'mean': 4281025071.9005694, 'median': 4294967293.0, 'max': 4294967293}]


### 3.3 Vectorize serialized layers.

In [15]:
G3_in = os.path.join(ProjectFolder, 'Settlement', 'GRID3_ADM.tif')
G3_out = os.path.join(ProjectFolder, 'Settlement', 'GRID3_ADM.shp')
WSFE_in = os.path.join(ProjectFolder, 'Buildup', 'WSFE_ADM.tif')
WSFE_out = os.path.join(ProjectFolder, 'Buildup', 'WSFE_ADM.shp')

RasterToShapefile(G3_in, G3_out, OutName='GRID3_ADM', VariableName='gridcode', Driver = 'ESRI Shapefile')
RasterToShapefile(WSFE_in, WSFE_out, OutName='WSFE_ADM', VariableName='gridcode', Driver = 'ESRI Shapefile')

PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]] 


### 3.4 Vector math to split raster strings into admin area, GRID3, and WSFE year assignments.

In [17]:
# Load newly created vectorized datasets.
GRID3_ADM = gpd.read_file(r"Settlement/GRID3_ADM.shp")
WSFE_ADM = gpd.read_file(r"Buildup/WSFE_ADM.shp")
print(GRID3_ADM.info(), "\n\n", GRID3_ADM.sample(10), "\n\n", GRID3_ADM.crs, "\n\n", 
      WSFE_ADM.info(), "\n\n", WSFE_ADM.sample(10), "\n\n", WSFE_ADM.crs, "\n\n", 
      GRID3_ADM['gridcode'].max(), WSFE_ADM['gridcode'].max())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 651922 entries, 0 to 651921
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   gridcode  651922 non-null  int64   
 1   geometry  651922 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 9.9 MB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 922083 entries, 0 to 922082
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   gridcode  922083 non-null  int64   
 1   geometry  922083 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 14.1 MB
None 

           gridcode                                           geometry
496831   279505220  POLYGON ((-2616149.075 1326384.485, -2616061.6...
204663  2147483647  POLYGON ((-3010491.859 1478617.650, -3010433.5...
437005   198318009  POLYGON ((-2800537.196 1362096.018, -2800449.7...
544353   248887327  POLYGON ((-2679438.658 13

In [18]:
# Split serial back into separate dataset fields.
# For example, Burkina: WSFE and ADM: 4+3=7 digits. GRID3 and ADM: 6+3=9 digits.

G3_Fill = len_G3 + len_ADM
WSFE_Fill = 4 + len_ADM

GRID3_ADM['gridstring'] = GRID3_ADM['gridcode'].astype(str).str.zfill(G3_Fill)
WSFE_ADM['gridstring'] = WSFE_ADM['gridcode'].astype(str).str.zfill(WSFE_Fill)

GRID3_ADM['Sett_ID'] = GRID3_ADM['gridstring'].str[:-len_ADM].astype(int) # Drop the last len_ADM digits (3 for Burkina Faso) to get the GRID3 portion.
GRID3_ADM['ADM_ID'] = GRID3_ADM['gridstring'].str[-len_ADM:].astype(int) # Keep only the last len_ADM digits to get the ADM portion.
WSFE_ADM['year'] = WSFE_ADM['gridstring'].str[:-len_ADM].astype(int)
WSFE_ADM['ADM_ID'] = WSFE_ADM['gridstring'].str[-len_ADM:].astype(int)

print(GRID3_ADM.sample(10), WSFE_ADM.sample(10))

         gridcode                                           geometry  \
358880  300874343  POLYGON ((-2663434.060 1398565.510, -2663346.6...   
300619  125123293  POLYGON ((-3062820.188 1425443.905, -3062761.8...   
298126  151834064  POLYGON ((-2967200.735 1426784.910, -2967113.2...   
215761  538380031  POLYGON ((-2593118.780 1471796.018, -2593031.3...   
347848  301186343  POLYGON ((-2653872.115 1403550.549, -2653842.9...   
8714    500320166  POLYGON ((-2727918.886 1699475.263, -2727831.4...   
417300  291808144  POLYGON ((-2581166.349 1372503.379, -2581137.1...   
506974  106117116  POLYGON ((-2974663.717 1320524.878, -2974576.2...   
144825  544916038  POLYGON ((-2630113.013 1524066.042, -2630083.8...   
90412   578743270  POLYGON ((-2560847.215 1576831.654, -2560730.6...   

       gridstring  Sett_ID  ADM_ID  
358880  300874343   300874     343  
300619  125123293   125123     293  
298126  151834064   151834      64  
215761  538380031   538380      31  
347848  301186343   30
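
The split can be checked on plain integers. The sketch below is a hypothetical helper, not part of the notebook, that mirrors the zero-pad-and-slice logic and confirms it agrees with integer division. The field widths (6 GRID3 digits and 3 ADM digits) are the Burkina Faso values quoted in the cell comment.

```python
def split_serial_string(gridcode: int, len_a: int, len_b: int):
    """Zero-pad the serial, then slice off the trailing len_b digits."""
    gridstring = str(gridcode).zfill(len_a + len_b)
    return gridstring, int(gridstring[:-len_b]), int(gridstring[-len_b:])


for code in (300874343, 151834064, 1188):
    gridstring, sett_id, adm_id = split_serial_string(code, len_a=6, len_b=3)
    # The string split agrees with plain integer division.
    assert (sett_id, adm_id) == divmod(code, 10**3)
    print(gridstring, sett_id, adm_id)
```

Zero-padding is what keeps the slice safe for short serials: without it, a code such as 64 would leave an empty string for the leading portion, and converting that to an integer would fail.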

In [19]:
# Dissolve any features that have the same G3 and ADM values so that we have a single unique feature per settlement.
# Note: we do NOT want to dissolve the WSFE features. Distinct features for noncontiguous built-up areas of the same year are necessary to separate them in the Near join step.
print(time.ctime())
GRID3_ADM = GRID3_ADM.dissolve(by=['Sett_ID', 'ADM_ID'], as_index=False)
print(GRID3_ADM.info(), GRID3_ADM.head(), "\n\n", time.ctime())

Thu Jan 26 09:47:24 2023
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 620198 entries, 0 to 620197
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   Sett_ID     620198 non-null  int64   
 1   ADM_ID      620198 non-null  int64   
 2   geometry    620198 non-null  geometry
 3   gridcode    620198 non-null  int64   
 4   gridstring  620198 non-null  object  
dtypes: geometry(1), int64(3), object(1)
memory usage: 23.7+ MB
None    Sett_ID  ADM_ID                                           geometry  \
0        1     188  POLYGON ((-3057776.844 1150654.589, -3057718.5...   
1        2     188  POLYGON ((-3062499.513 1164531.070, -3062412.0...   
2        3     209  POLYGON ((-3116256.302 1193537.581, -3115993.9...   
3        4     284  POLYGON ((-3124885.375 1215635.003, -3124710.4...   
4        5      14  POLYGON ((-3102204.907 1222602.396, -3102088.2...   

   gridcode gridstring  
0      1188  000001188 

In [20]:
# Remove features where year, settlement, or admin area = 0.
# This should already have been handled by the gdal_calc NoDataValue parameter, so the filter acts as a safeguard.

print("Before: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))
# These fields were cast to integers, so comparing against 0 is enough; as strings, the test would need to be != '0000'.
# .copy() avoids a pandas SettingWithCopyWarning when ID columns are added in the next cell.
WSFE_ADM = WSFE_ADM.loc[(WSFE_ADM["year"] != 0) & (WSFE_ADM["ADM_ID"] != 0)].copy()
GRID3_ADM = GRID3_ADM.loc[(GRID3_ADM["Sett_ID"] != 0) & (GRID3_ADM["ADM_ID"] != 0)].copy()
print("After: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))

Before: WSFE (922083, 5) and GRID3 (620198, 5)

After: WSFE (922083, 5) and GRID3 (620198, 5)



In [21]:
# The Bounded_ID is our new unique settlement identifier for subsequent matching steps.
GRID3_ADM['Bounded_ID'] = GRID3_ADM.index
WSFE_ADM['WSFE_ID'] = WSFE_ADM.index
GRID3_ADM = GRID3_ADM[['Sett_ID', 'Bounded_ID', 'ADM_ID', 'geometry']]
WSFE_ADM = WSFE_ADM[['WSFE_ID', 'year', 'ADM_ID', 'geometry']]

In [22]:
# Validation: 
# The first two printed numbers should match: no two GRID3 rows should share the same (Sett_ID, ADM_ID) pair.
# The last two numbers should differ, with the first of them larger, because WSFE was never dissolved by any column.

print(len(GRID3_ADM[['Sett_ID', 'ADM_ID']]),
      len(GRID3_ADM[['Sett_ID', 'ADM_ID']].drop_duplicates()),
      len(WSFE_ADM[['year', 'ADM_ID']]),
      len(WSFE_ADM[['year', 'ADM_ID']].drop_duplicates()))

620198 620198 922083 10026


In [23]:
GRID3_ADM.to_file(
    driver='GPKG', filename='Settlement/GRID3_ADM.gpkg', layer='GRID3_ADM_cleaned')
WSFE_ADM.to_file(
    driver='GPKG', filename=r'Buildup/WSFE_ADM.gpkg', layer='WSFE_ADM_cleaned')

---

## 4. UNIQUE SETTLEMENTS FROM WSFE AND GRID3: TWO VERSIONS

We build two versions of the settlement layer so that we can compute a fragmentation index:
1. **Boundless, aka boundary-agnostic settlements**: Unique settlements are linked to GRID3 settlement IDs. Administrative areas do not influence the extents of the settlement.
2. **Bounded, aka politically-defined settlements**: Settlements in the Boundless dataset that spread across more than one administrative area are split into separate settlements in the Bounded dataset. The largest polygon after the split is considered the "principal" settlement, and polygons in other admin areas are considered "fragments." Dividing the fragment area(s) of the Bounded settlement by the area of the Boundless settlement gives a fragmentation index for each locality.
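
A minimal sketch of the fragmentation index follows, assuming the index is the summed area of all fragments divided by the total Boundless area. The exact formula is not spelled out in this notebook, so treat that as one reading of the description above. The areas, IDs, and the function name `fragmentation_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical areas (sq. km) of Bounded pieces, keyed by (Sett_ID, ADM_ID).
bounded_areas = {
    (101, 7): 12.0,  # largest piece of settlement 101: the "principal"
    (101, 8): 3.0,   # fragment in a neighbouring admin area
    (101, 9): 1.0,   # second fragment
    (202, 7): 5.0,   # settlement 202 lies within a single admin area
}


def fragmentation_index(areas):
    """Return {Sett_ID: fragment area / Boundless area}."""
    pieces = defaultdict(list)
    for (sett_id, _adm_id), area in areas.items():
        pieces[sett_id].append(area)
    index = {}
    for sett_id, piece_areas in pieces.items():
        total = sum(piece_areas)              # Boundless settlement area
        fragments = total - max(piece_areas)  # everything except the principal
        index[sett_id] = fragments / total
    return index


print(fragmentation_index(bounded_areas))  # {101: 0.25, 202: 0.0}
```

Under this reading, a settlement contained in one admin area scores 0, and the score rises as more of its area falls outside the principal polygon.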

### 4.1 BOUNDED SETTLEMENTS: Near Join by ADM group.

In [24]:
G3_ADMs = set(GRID3_ADM['ADM_ID'].unique())
WSFE_ADMs = set(WSFE_ADM['ADM_ID'].unique())
print("Number of admin areas with GRID3 features: %s" % len(G3_ADMs))
print("Number of admin areas with WSFE features: %s" % len(WSFE_ADMs))
# Symmetric difference: admin areas present in one dataset but not the other.
# (Subtracting the two counts would report 0 whenever the counts happen to match.)
print("Number of admin areas where one dataset is observed but the other is not: %s" % len(G3_ADMs ^ WSFE_ADMs))

Number of admin areas with GRID3 features: 352
Number of admin areas with WSFE features: 352
Number of admin areas where one dataset is observed but the other is not: 0


In [25]:
ADM_IDs = sorted(GRID3_ADM['ADM_ID'].unique().tolist())
ADM_IDs

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100,
 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140,
 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160,
 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180,
 181, 182, 183, 184, 185

In [26]:
# We're creating this field to help in removing duplicates from the sjoin_nearest, next section.
GRID3_ADM['G3_Area'] = GRID3_ADM['geometry'].area / 10**6

In [27]:
# Create an empty GeoDataFrame to append onto, based on the dataframe whose geometry we want to retain.
Bounded = GRID3_ADM[0:0].copy() # .copy() avoids a SettingWithCopyWarning when the "year" column is added.
Bounded["year"] = pd.Series(dtype='int')
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 0 entries
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   Sett_ID     0 non-null      int64   
 1   Bounded_ID  0 non-null      int64   
 2   ADM_ID      0 non-null      int64   
 3   geometry    0 non-null      geometry
 4   G3_Area     0 non-null      float64 
 5   year        0 non-null      int32   
dtypes: float64(1), geometry(1), int32(1), int64(3)
memory usage: 0.0 bytes


In [28]:
for ID in ADM_IDs:
    WSFE_shard = WSFE_ADM.loc[WSFE_ADM['ADM_ID'] == ID]
    GRID3_shard = GRID3_ADM.loc[GRID3_ADM['ADM_ID'] == ID]
    WSFE_GRID3_shard = gpd.sjoin_nearest(WSFE_shard, 
                                         GRID3_shard, 
                                         how='inner',
                                         max_distance=500)
    Bounded = pd.concat([Bounded, WSFE_GRID3_shard])
    print('Completed near join in admin area %s. %s \n' % (ID, time.ctime()))
print('Completed near join for all ADMs. %s \n' % time.ctime())

del WSFE_shard, GRID3_shard, WSFE_GRID3_shard

Completed near join in admin area 1. Thu Jan 26 10:05:30 2023 

Completed near join in admin area 2. Thu Jan 26 10:05:30 2023 

Completed near join in admin area 3. Thu Jan 26 10:05:30 2023 

Completed near join in admin area 4. Thu Jan 26 10:05:31 2023 

Completed near join in admin area 5. Thu Jan 26 10:05:31 2023 

Completed near join in admin area 6. Thu Jan 26 10:05:31 2023 

Completed near join in admin area 7. Thu Jan 26 10:05:31 2023 

Completed near join in admin area 8. Thu Jan 26 10:05:31 2023 

Completed near join in admin area 9. Thu Jan 26 10:05:31 2023 

Completed near join in admin area 10. Thu Jan 26 10:05:31 2023 

Completed near join in admin area 11. Thu Jan 26 10:05:31 2023 

Completed near join in admin area 12. Thu Jan 26 10:05:31 2023 

Completed near join in admin area 13. Thu Jan 26 10:05:32 2023 

Completed near join in admin area 14. Thu Jan 26 10:05:32 2023 

Completed near join in admin area 15. Thu Jan 26 10:05:32 2023 

...

Completed near join in admin area 127. Thu Jan 26 10:05:51 2023 

Completed near join in admin area 128. Thu Jan 26 10:05:51 2023 

Completed near join in admin area 129. Thu Jan 26 10:05:51 2023 

Completed near join in admin area 130. Thu Jan 26 10:05:51 2023 

Completed near join in admin area 131. Thu Jan 26 10:05:52 2023 

Completed near join in admin area 132. Thu Jan 26 10:05:52 2023 

Completed near join in admin area 133. Thu Jan 26 10:05:52 2023 

Completed near join in admin area 134. Thu Jan 26 10:05:52 2023 

Completed near join in admin area 135. Thu Jan 26 10:05:52 2023 

Completed near join in admin area 136. Thu Jan 26 10:05:52 2023 

Completed near join in admin area 137. Thu Jan 26 10:05:52 2023 

Completed near join in admin area 138. Thu Jan 26 10:05:52 2023 

Completed near join in admin area 139. Thu Jan 26 10:05:52 2023 

Completed near join in admin area 140. Thu Jan 26 10:05:52 2023 

Completed near join in admin area 141. Thu Jan 26 10:05:53 2023 

...

Completed near join in admin area 252. Thu Jan 26 10:06:25 2023 

Completed near join in admin area 253. Thu Jan 26 10:06:25 2023 

Completed near join in admin area 254. Thu Jan 26 10:06:26 2023 

Completed near join in admin area 255. Thu Jan 26 10:06:27 2023 

Completed near join in admin area 256. Thu Jan 26 10:06:27 2023 

Completed near join in admin area 257. Thu Jan 26 10:06:27 2023 

Completed near join in admin area 258. Thu Jan 26 10:06:27 2023 

Completed near join in admin area 259. Thu Jan 26 10:06:27 2023 

Completed near join in admin area 260. Thu Jan 26 10:06:27 2023 

Completed near join in admin area 261. Thu Jan 26 10:06:28 2023 

Completed near join in admin area 262. Thu Jan 26 10:06:28 2023 

Completed near join in admin area 263. Thu Jan 26 10:06:28 2023 

Completed near join in admin area 264. Thu Jan 26 10:06:28 2023 

Completed near join in admin area 265. Thu Jan 26 10:06:28 2023 

Completed near join in admin area 266. Thu Jan 26 10:06:28 2023 

...
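
The idea behind the per-ADM loop can be sketched without geopandas. The example below is a simplified stand-in, not the notebook's method: it uses points instead of polygons, keeps a single nearest match per feature, and the function name `near_join_by_group` and all sample values are hypothetical. Distances are in the units of the coordinates.

```python
import math
from collections import defaultdict


def near_join_by_group(left, right, max_distance):
    """Match each left point to its nearest right point in the same group.

    left, right: lists of (feature_id, group_id, x, y) tuples.
    Returns (left_id, right_id) pairs. Left features with no right feature
    within max_distance are dropped, as in an inner join.
    """
    right_by_group = defaultdict(list)
    for right_id, group, x, y in right:
        right_by_group[group].append((right_id, x, y))

    matches = []
    for left_id, group, x, y in left:
        best_id, best_dist = None, math.inf
        # Only candidates from the same group (admin area) are considered.
        for right_id, rx, ry in right_by_group.get(group, []):
            dist = math.hypot(x - rx, y - ry)
            if dist < best_dist:
                best_id, best_dist = right_id, dist
        if best_id is not None and best_dist <= max_distance:
            matches.append((left_id, best_id))
    return matches


# Hypothetical built-up features and settlements: (id, ADM_ID, x, y).
buildup = [("w1", 1, 0.0, 0.0), ("w2", 1, 900.0, 0.0),
           ("w3", 2, 10.0, 0.0), ("w4", 2, 4800.0, 0.0)]
settlements = [("s1", 1, 100.0, 0.0), ("s2", 2, 5000.0, 0.0),
               ("s3", 1, 2000.0, 0.0)]

print(near_join_by_group(buildup, settlements, max_distance=500))
# [('w1', 's1'), ('w4', 's2')]
```

Feature `w3` sits only 90 units from `s1` but is left unmatched because the two lie in different admin areas. This is the behaviour that makes the result "bounded."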

In [29]:
Bounded.sample(20)

| | Sett_ID | Bounded_ID | ADM_ID | geometry | G3_Area | year | WSFE_ID | ADM_ID_left | index_right | ADM_ID_right |
|---|---|---|---|---|---|---|---|---|---|---|
| 838100 | 36811 | 37034 | | POLYGON ((-3117713.916 1272802.609, -3117684.7... | 7.998835 | 2008 | 838100.0 | 216.0 | 37034.0 | 216.0 |
| 239362 | 533209 | 538156 | | POLYGON ((-2602301.746 1456724.293, -2602272.5... | 0.100283 | 2010 | 239362.0 | 31.0 | 538156.0 | 31.0 |
| 549551 | 130003 | 130871 | | POLYGON ((-2932713.597 1393347.254, -2932684.4... | 1.051271 | 2009 | 549551.0 | 228.0 | 130871.0 | 228.0 |
| 70095 | 457760 | 462410 | | POLYGON ((-2850096.058 1560448.077, -2850066.9... | 0.29235 | 2000 | 70095.0 | 225.0 | 462410.0 | 225.0 |
| 354093 | 227047 | 229233 | | POLYGON ((-2773746.258 1436113.637, -2773717.1... | 378.496507 | 1991 | 354093.0 | 218.0 | 229233.0 | 218.0 |
| 186503 | 388701 | 392517 | | POLYGON ((-2987286.651 1488966.706, -2987257.4... | 1.047871 | 1985 | 186503.0 | 215.0 | 392517.0 | 215.0 |
| 580041 | 247671 | 250117 | | POLYGON ((-2686406.051 1379674.838, -2686376.8... | 0.222662 | 2007 | 580041.0 | 148.0 | 250117.0 | 148.0 |
| 580723 | 117196 | 118008 | | POLYGON ((-3058884.631 1379383.315, -3058797.1... | 7.990336 | 1985 | 580723.0 | 159.0 | 118008.0 | 159.0 |
| 708101 | 51186 | 51489 | | POLYGON ((-3026175.782 1326530.246, -3026146.6... | 1.626622 | 2000 | 708101.0 | 269.0 | 51489.0 | 269.0 |
| 609055 | 247085 | 249500 | | POLYGON ((-2645942.697 1371220.679, -2645913.5... | 29.35909 | 2009 | 609055.0 | 296.0 | 249500.0 | 296.0 |


In [30]:
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 921485 entries, 211201 to 922082
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   Sett_ID       921485 non-null  int64   
 1   Bounded_ID    921485 non-null  int64   
 2   ADM_ID        0 non-null       float64 
 3   geometry      921485 non-null  geometry
 4   G3_Area       921485 non-null  float64 
 5   year          921485 non-null  int32   
 6   WSFE_ID       921485 non-null  float64 
 7   ADM_ID_left   921485 non-null  float64 
 8   index_right   921485 non-null  float64 
 9   ADM_ID_right  921485 non-null  float64 
dtypes: float64(6), geometry(1), int32(1), int64(2)
memory usage: 73.8 MB


In [31]:
# Remove WSFE features that did not match any GRID3 settlements.
# With how='inner', the near join already drops unmatched features, so this filter is a safeguard.
Bounded = Bounded.loc[~Bounded['Sett_ID'].isna()]
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 921485 entries, 211201 to 922082
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   Sett_ID       921485 non-null  int64   
 1   Bounded_ID    921485 non-null  int64   
 2   ADM_ID        0 non-null       float64 
 3   geometry      921485 non-null  geometry
 4   G3_Area       921485 non-null  float64 
 5   year          921485 non-null  int32   
 6   WSFE_ID       921485 non-null  float64 
 7   ADM_ID_left   921485 non-null  float64 
 8   index_right   921485 non-null  float64 
 9   ADM_ID_right  921485 non-null  float64 
dtypes: float64(6), geometry(1), int32(1), int64(2)
memory usage: 73.8 MB


In [32]:
del GRID3_ADM, ADM_IDs

### 4.2 Remove duplicates where built-up polygons intersect more than one GRID3 settlement extent.
This happens when a feature of the first dataset (WSFE) intersects (distance = 0) more than one feature of the second dataset (GRID3). It is more common for large cities. For example, Yaoundé, Cameroon, has a large contiguous 1985 WSFE polygon that overlaps several small GRID3 features that do not belong to Yaoundé.

In [33]:
# The first number should always be zero. 
# The second tells us whether/how many WSFE polygons were duplicated by the Near join.

print(len(WSFE_ADM[WSFE_ADM.duplicated('WSFE_ID')]), len(Bounded[Bounded.duplicated('WSFE_ID')]))

0 1731


In [34]:
# If there are duplicate WSFE_IDs, then we need to choose between them.
# We'll pick the one that joined with the largest GRID3 polygon.
# To do that, we can just sort the dataframe by GRID3 areas, then drop_duplicates. 
# It will retain the first row of each WSFE_ID group.
Bounded = Bounded.sort_values('G3_Area', ascending=False).drop_duplicates(['WSFE_ID'])
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 919754 entries, 922082 to 862313
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   Sett_ID       919754 non-null  int64   
 1   Bounded_ID    919754 non-null  int64   
 2   ADM_ID        0 non-null       float64 
 3   geometry      919754 non-null  geometry
 4   G3_Area       919754 non-null  float64 
 5   year          919754 non-null  int32   
 6   WSFE_ID       919754 non-null  float64 
 7   ADM_ID_left   919754 non-null  float64 
 8   index_right   919754 non-null  float64 
 9   ADM_ID_right  919754 non-null  float64 
dtypes: float64(6), geometry(1), int32(1), int64(2)
memory usage: 73.7 MB


In [35]:
print(len(Bounded[Bounded.duplicated('WSFE_ID')]))

0
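
The tie-breaking rule (keep the match with the largest GRID3 polygon) can be sketched in plain Python. The rows and the helper name `keep_largest_match` are hypothetical; the notebook itself does this with `sort_values` followed by `drop_duplicates`.

```python
# Hypothetical join results: WSFE feature 1 matched two GRID3 settlements.
rows = [
    {"WSFE_ID": 1, "Sett_ID": 10, "G3_Area": 0.4},
    {"WSFE_ID": 1, "Sett_ID": 11, "G3_Area": 378.5},
    {"WSFE_ID": 2, "Sett_ID": 12, "G3_Area": 1.0},
]


def keep_largest_match(rows):
    """Keep one row per WSFE_ID: the one joined to the largest GRID3 area."""
    ordered = sorted(rows, key=lambda r: r["G3_Area"], reverse=True)
    seen = set()
    kept = []
    for row in ordered:
        # The first row seen for each WSFE_ID has the largest area.
        if row["WSFE_ID"] not in seen:
            seen.add(row["WSFE_ID"])
            kept.append(row)
    return kept


print([(r["WSFE_ID"], r["Sett_ID"]) for r in keep_largest_match(rows)])
# [(1, 11), (2, 12)]
```

Sorting first and then keeping the first occurrence is what lets a generic "drop duplicates" step act as a "keep the largest" rule.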


In [36]:
# Each WSFE feature now carries an administratively split ID (Bounded_ID), so we can dissolve by year and Bounded_ID.
Bounded = Bounded.dissolve(by=['year', 'Bounded_ID'], as_index=False)
print(Bounded.info(), Bounded.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 180796 entries, 0 to 180795
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   year          180796 non-null  int64   
 1   Bounded_ID    180796 non-null  int64   
 2   geometry      180796 non-null  geometry
 3   Sett_ID       180796 non-null  int64   
 4   ADM_ID        0 non-null       float64 
 5   G3_Area       180796 non-null  float64 
 6   WSFE_ID       180796 non-null  float64 
 7   ADM_ID_left   180796 non-null  float64 
 8   index_right   180796 non-null  float64 
 9   ADM_ID_right  180796 non-null  float64 
dtypes: float64(6), geometry(1), int64(3)
memory usage: 13.8 MB
None         year  Bounded_ID                                           geometry  \
75702   2003      138845  POLYGON ((-2994079.130 1410430.485, -2994049.9...   
95142   2006      249989  MULTIPOLYGON (((-2681362.708 1353087.966, -268...   
2677    1985      138901  MULTIPOLY

In [37]:
# Clean up and save to file.
Bounded = Bounded[['ADM_ID_left', 'year', 'Bounded_ID', 'Sett_ID', 'geometry']].rename(columns={"ADM_ID_left": "ADM_ID"})
Bounded = Bounded.astype({"ADM_ID":'int', "Bounded_ID":'int', "Sett_ID":'int', "year":'int'})
print(Bounded.sample(10))
Bounded.to_file(
    driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

        ADM_ID  year  Bounded_ID  Sett_ID  \
131339      30  2011      167131   165921   
136676     272  2011      586607   581299   
73014      219  2002      461528   456905   
20446      329  1990      157793   156712   
38087       19  1995      463087   458422   
171714     151  2015      163923   162770   
14519       97  1988      250058   247615   
163285     324  2014      312964   309772   
163266      67  2014      312844   309661   
54262      320  1999      413412   409365   

                                                 geometry  
131339  POLYGON ((-2811002.861 1305453.153, -2810973.7...  
136676  POLYGON ((-2568893.242 1623737.660, -2568864.0...  
73014   MULTIPOLYGON (((-2863302.038 1572283.900, -286...  
20446   POLYGON ((-2925629.595 1255340.398, -2925600.4...  
38087   MULTIPOLYGON (((-2848638.445 1595459.956, -284...  
171714  POLYGON ((-2907526.034 1292043.108, -2907467.7...  
14519   MULTIPOLYGON (((-2671276.021 1370579.329, -267...  
163285  MULTIPOLYGON (((

In [38]:
del WSFE_ADM

### 4.3 BOUNDLESS SETTLEMENTS: Dissolve features that were split by an ADM boundary.

In [39]:
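# Fragments of any bounded settlement are combined into a single "boundless" settlement in this version.
# The grouping uses "Sett_ID", which is carried over directly from the GRID3 settlement features.
# Note: ADM_ID and Bounded_ID keep the first value found in each group (dissolve's default aggfunc), so they identify only one of the merged fragments.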
Boundless = Bounded.dissolve(by=['year', 'Sett_ID'], as_index=False)
print(Boundless.info(), Boundless.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 179618 entries, 0 to 179617
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   year        179618 non-null  int64   
 1   Sett_ID     179618 non-null  int64   
 2   geometry    179618 non-null  geometry
 3   ADM_ID      179618 non-null  int32   
 4   Bounded_ID  179618 non-null  int32   
dtypes: geometry(1), int32(2), int64(2)
memory usage: 5.5 MB
None         year  Sett_ID                                           geometry  \
308     1985      580  MULTIPOLYGON (((-2997344.185 1204732.054, -299...   
177785  2015   468152  MULTIPOLYGON (((-2825287.475 1597704.681, -282...   
90270   2005   457940  POLYGON ((-2863855.931 1567123.948, -2863826.7...   
130656  2011   180320  MULTIPOLYGON (((-2865342.697 1408273.217, -286...   
71275   2002   314016  POLYGON ((-2633698.743 1418097.533, -2633669.5...   
123679  2010   294894  POLYGON ((-2670547.215 1447249.805, 

In [40]:
# Save to file.
Boundless.to_file(driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

---

## 5. CUMULATIVE ANNUALIZED SETTLEMENT EXTENTS
DISSOLVE BY YEAR SETS: Create a separate feature layer for each cumulative year.

### 5.1 Define study years for each for loop.

In [43]:
Boundless = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

# All study years in descending order, excluding the latest year (2015), which serves as the base table for the merge.
ReversedStudyYears = [year for year in reversed(AllStudyYears) if year != 2015]
print('\n\n', ReversedStudyYears)



 [2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999]


### 5.2 Starting with main Boundless dataset, create a cumulative area feature layer for each year.

In [44]:
# For each year in the growth stats study, we are taking features from all years prior to and including that year, 
# dissolving those features, and exporting as its own file.

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    YearSet = Boundless[Boundless['year'].between(
        1985, item, inclusive='both')] # inclusive='both' keeps the endpoints 1985 and "item"; passing True is deprecated as of pandas 1.3.
    print('Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    YearDissolve = YearSet.dissolve(by='Sett_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min"}, # Though ADM_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    YearName = ''.join(['Cu', str(item), '_Boundless'])
    YearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=YearName)
    del YearSet, YearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1999. Thu Jan 26 11:10:46 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:10:46 2023

Write to file. Thu Jan 26 11:11:14 2023

Subsetting to cumulative area for year: 2000. Thu Jan 26 11:11:22 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:11:22 2023

Write to file. Thu Jan 26 11:11:54 2023

Subsetting to cumulative area for year: 2001. Thu Jan 26 11:12:03 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:12:03 2023

Write to file. Thu Jan 26 11:12:37 2023

Subsetting to cumulative area for year: 2002. Thu Jan 26 11:12:46 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:12:46 2023

Write to file. Thu Jan 26 11:13:20 2023

Subsetting to cumulative area for year: 2003. Thu Jan 26 11:13:29 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:13:29 2023

Write to file. Thu Jan 26 11:14:05 2023

Subsetting to cumulative area for year: 2004. Thu Jan 26 11:14:13 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:14:13 2023

Write to file. Thu Jan 26 11:14:52 2023

Subsetting to cumulative area for year: 2005. Thu Jan 26 11:15:01 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:15:01 2023

Write to file. Thu Jan 26 11:15:39 2023

Subsetting to cumulative area for year: 2006. Thu Jan 26 11:15:48 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:15:48 2023

Write to file. Thu Jan 26 11:16:30 2023

Subsetting to cumulative area for year: 2007. Thu Jan 26 11:16:39 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:16:39 2023

Write to file. Thu Jan 26 11:17:21 2023

Subsetting to cumulative area for year: 2008. Thu Jan 26 11:17:30 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:17:30 2023

Write to file. Thu Jan 26 11:18:15 2023

Subsetting to cumulative area for year: 2009. Thu Jan 26 11:18:24 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:18:24 2023

Write to file. Thu Jan 26 11:19:09 2023

Subsetting to cumulative area for year: 2010. Thu Jan 26 11:19:18 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:19:18 2023

Write to file. Thu Jan 26 11:20:06 2023

Subsetting to cumulative area for year: 2011. Thu Jan 26 11:20:16 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:20:16 2023

Write to file. Thu Jan 26 11:21:06 2023

Subsetting to cumulative area for year: 2012. Thu Jan 26 11:21:15 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:21:15 2023

Write to file. Thu Jan 26 11:22:06 2023

Subsetting to cumulative area for year: 2013. Thu Jan 26 11:22:15 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:22:15 2023

Write to file. Thu Jan 26 11:23:09 2023

Subsetting to cumulative area for year: 2014. Thu Jan 26 11:23:19 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:23:19 2023

Write to file. Thu Jan 26 11:24:14 2023

Subsetting to cumulative area for year: 2015. Thu Jan 26 11:24:24 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Thu Jan 26 11:24:24 2023

Write to file. Thu Jan 26 11:25:22 2023

Done with all years in set. Thu Jan 26 11:25:32 2023
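
The cumulative logic can be illustrated on plain numbers. The sketch below sums areas per settlement for every feature built from 1985 through each study year. Summation stands in for the geometric dissolve only under the assumption that yearly built-up polygons do not overlap. The records, the shortened year list, and the function name `cumulative_areas` are hypothetical.

```python
from collections import defaultdict

# Hypothetical non-cumulative records: (Sett_ID, year built, area in sq. km).
records = [
    (1, 1985, 2.0),
    (1, 2001, 0.5),
    (1, 2014, 0.25),
    (2, 2000, 1.0),
]
study_years = [1999, 2000, 2001, 2015]  # shortened list for illustration


def cumulative_areas(records, study_years, first_year=1985):
    """Return {study_year: {Sett_ID: area built from first_year through study_year}}."""
    result = {}
    for study_year in study_years:
        totals = defaultdict(float)
        for sett_id, year, area in records:
            if first_year <= year <= study_year:  # both endpoints included
                totals[sett_id] += area
        result[study_year] = dict(totals)
    return result


for year, totals in cumulative_areas(records, study_years).items():
    print(year, totals)
# 1999 {1: 2.0}
# 2000 {1: 2.0, 2: 1.0}
# 2001 {1: 2.5, 2: 1.0}
# 2015 {1: 2.75, 2: 1.0}
```

Settlement 2 is absent from the 1999 result because nothing had been built there yet, which is why earlier cumulative layers can contain fewer settlements than the latest one.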


##### Join area information from each cumulative layer onto the latest year dataset.

In [45]:
# The latest year in the study contains all settlements. Merge all other years' areas onto this dataset.
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=
                          ''.join(['Cu', str(2015), '_Boundless'])) 
SettAreas['Area2015'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry') # We have settlement IDs, so no need to join spatially!


for item in ReversedStudyYears:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Boundless']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['Area', str(item)])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Sett_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, 2015, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Sett_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))


print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s.csv' % (1999, 2015)))

Loading cumulative layer for year 2014. Thu Jan 26 11:25:34 2023

Adding area field and converting to non-spatial dataframe. Thu Jan 26 11:25:36 2023

Merging variables from 2014 onto our latest year (2015) via table join. Thu Jan 26 11:25:36 2023

Loading cumulative layer for year 2013. Thu Jan 26 11:25:36 2023

Adding area field and converting to non-spatial dataframe. Thu Jan 26 11:25:38 2023

Merging variables from 2013 onto our latest year (2015) via table join. Thu Jan 26 11:25:38 2023

Loading cumulative layer for year 2012. Thu Jan 26 11:25:38 2023

Adding area field and converting to non-spatial dataframe. Thu Jan 26 11:25:40 2023

Merging variables from 2012 onto our latest year (2015) via table join. Thu Jan 26 11:25:40 2023

Loading cumulative layer for year 2011. Thu Jan 26 11:25:40 2023

Adding area field and converting to non-spatial dataframe. Thu Jan 26 11:25:42 2023

Merging variables from 2011 onto our latest year (2015) via table join. Thu Jan 26 11:25:42 2023

Load

In [46]:
del SettAreas

### 5.3 Repeat for Bounded dataset.

In [47]:
# Bounded = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    YearSet = Bounded[Bounded['year'].between(1985, item, inclusive='both')] # inclusive='both' keeps the years 1985 and "item" themselves rather than only the years between them. The boolean form (inclusive=True) is deprecated.
    print('Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    YearDissolve = YearSet.dissolve(by='Bounded_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min", "Sett_ID":"min"}, # Though ADM_ID and Sett_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    YearName = ''.join(['Cu', str(item), '_Bounded'])
    YearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=YearName)
    del YearSet, YearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1999. Thu Jan 26 11:26:05 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:26:05 2023



Write to file. Thu Jan 26 11:26:33 2023

Subsetting to cumulative area for year: 2000. Thu Jan 26 11:26:42 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:26:42 2023



Write to file. Thu Jan 26 11:27:12 2023

Subsetting to cumulative area for year: 2001. Thu Jan 26 11:27:20 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:27:20 2023



Write to file. Thu Jan 26 11:27:52 2023

Subsetting to cumulative area for year: 2002. Thu Jan 26 11:28:01 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:28:01 2023



Write to file. Thu Jan 26 11:28:34 2023

Subsetting to cumulative area for year: 2003. Thu Jan 26 11:28:44 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:28:44 2023



Write to file. Thu Jan 26 11:29:19 2023

Subsetting to cumulative area for year: 2004. Thu Jan 26 11:29:29 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:29:29 2023



Write to file. Thu Jan 26 11:30:06 2023

Subsetting to cumulative area for year: 2005. Thu Jan 26 11:30:15 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:30:15 2023



Write to file. Thu Jan 26 11:30:54 2023

Subsetting to cumulative area for year: 2006. Thu Jan 26 11:31:04 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:31:04 2023



Write to file. Thu Jan 26 11:31:45 2023

Subsetting to cumulative area for year: 2007. Thu Jan 26 11:31:55 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:31:55 2023



Write to file. Thu Jan 26 11:32:38 2023

Subsetting to cumulative area for year: 2008. Thu Jan 26 11:32:47 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:32:47 2023



Write to file. Thu Jan 26 11:33:32 2023

Subsetting to cumulative area for year: 2009. Thu Jan 26 11:33:42 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:33:42 2023



Write to file. Thu Jan 26 11:34:29 2023

Subsetting to cumulative area for year: 2010. Thu Jan 26 11:34:40 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:34:40 2023



Write to file. Thu Jan 26 11:35:32 2023

Subsetting to cumulative area for year: 2011. Thu Jan 26 11:35:42 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:35:42 2023



Write to file. Thu Jan 26 11:36:31 2023

Subsetting to cumulative area for year: 2012. Thu Jan 26 11:36:41 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:36:41 2023



Write to file. Thu Jan 26 11:37:33 2023

Subsetting to cumulative area for year: 2013. Thu Jan 26 11:37:42 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:37:42 2023



Write to file. Thu Jan 26 11:38:36 2023

Subsetting to cumulative area for year: 2014. Thu Jan 26 11:38:46 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:38:46 2023



Write to file. Thu Jan 26 11:39:42 2023

Subsetting to cumulative area for year: 2015. Thu Jan 26 11:39:52 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Thu Jan 26 11:39:52 2023



Write to file. Thu Jan 26 11:40:51 2023

Done with all years in set. Thu Jan 26 11:41:02 2023
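
The loop above builds each year's cumulative footprint by keeping every feature detected from 1985 through that year. Below is a minimal sketch of that filter on a toy table. The `toy` frame, its values, and the helper `cumulative_subset` are invented for illustration and are not part of the notebook. It uses the string form `inclusive="both"`, which assumes pandas 1.3 or newer.

```python
import pandas as pd

# Hypothetical toy data: one row per built-up feature with its detection year.
toy = pd.DataFrame({
    "Sett_ID": [1, 1, 2, 2, 3],
    "year":    [1985, 2001, 1999, 2015, 2010],
})

def cumulative_subset(features, end_year, start_year=1985):
    """Keep features detected from start_year through end_year, both ends included."""
    mask = features["year"].between(start_year, end_year, inclusive="both")
    return features[mask]

subset_2001 = cumulative_subset(toy, 2001)
subset_2015 = cumulative_subset(toy, 2015)
print(subset_2001)
```

Because both endpoints are included, a feature first detected in the target year itself counts toward that year's cumulative area.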


In [48]:
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', 
                          layer=''.join(['Cu', str(2015), '_Bounded']))
SettAreas['Area2015'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry')


for item in ReversedStudyYears:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Bounded']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['Area', str(item)])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Bounded_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, 2015, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Bounded_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))

print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s_%s.csv' % (1999, 2015, 'Bounded')))

Loading cumulative layer for year 2014. Thu Jan 26 11:41:04 2023

Adding area field and converting to non-spatial dataframe. Thu Jan 26 11:41:06 2023

Merging variables from 2014 onto our latest year (2015) via table join. Thu Jan 26 11:41:06 2023

Loading cumulative layer for year 2013. Thu Jan 26 11:41:06 2023

Adding area field and converting to non-spatial dataframe. Thu Jan 26 11:41:08 2023

Merging variables from 2013 onto our latest year (2015) via table join. Thu Jan 26 11:41:08 2023

Loading cumulative layer for year 2012. Thu Jan 26 11:41:08 2023

Adding area field and converting to non-spatial dataframe. Thu Jan 26 11:41:10 2023

Merging variables from 2012 onto our latest year (2015) via table join. Thu Jan 26 11:41:10 2023

Loading cumulative layer for year 2011. Thu Jan 26 11:41:10 2023

Adding area field and converting to non-spatial dataframe. Thu Jan 26 11:41:13 2023

Merging variables from 2011 onto our latest year (2015) via table join. Thu Jan 26 11:41:13 2023

Load

In [49]:
del SettAreas

### 5.4 One settlement geofile to rule them all. ...and in the Sett_ID bind them.
The annualized values can be stored as separate non-spatial dataframes. Each one joins onto this geospatial layer by Sett_ID, alongside the place names, for the summary statistics.
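
As a rough illustration of that ID-based join, the sketch below merges a small annualized table onto a settlement ID table. Both frames and their values are hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd

# Hypothetical stand-ins: an ID table for the settlement layer and one annualized table.
settlements = pd.DataFrame({"Sett_ID": [10, 11, 12], "ADM_ID": [1, 1, 2]})
areas = pd.DataFrame({"Sett_ID": [10, 12], "Area2014": [0.5, 1.25]})

# A left join keeps every settlement; IDs missing from `areas` get NaN.
joined = settlements.merge(areas, how="left", on="Sett_ID")
print(joined)
```

A left join is the natural choice here because the latest-year layer contains every settlement, while earlier years may lack settlements that had not yet appeared.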

In [50]:
Settlements = gpd.read_file(r'Results/CumulativeSettlements.gpkg', 
                           layer=''.join(['Cu', str(2015), '_Boundless']))[['Sett_ID', 'ADM_ID', 'geometry']]
print(Settlements.info())
print(Settlements.crs)
Settlements.to_file(driver='GPKG', 
                       filename=r'Results/SETTLEMENTS.gpkg', 
                       layer='SETTLEMENTS_equalarea')

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 23490 entries, 0 to 23489
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   23490 non-null  int64   
 1   ADM_ID    23490 non-null  int64   
 2   geometry  23490 non-null  geometry
dtypes: geometry(1), int64(2)
memory usage: 550.7 KB
None
PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["ESRI","102022"]]


In [51]:
# Saving all the final products as WGS84.
Settlements_WGS = Settlements.to_crs(4326) 
print(Settlements_WGS.info())
print(Settlements_WGS.crs)
Settlements_WGS.to_file(driver='GPKG', 
                       filename=r'Results/SETTLEMENTS.gpkg', 
                       layer='SETTLEMENTS')

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 23490 entries, 0 to 23489
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   23490 non-null  int64   
 1   ADM_ID    23490 non-null  int64   
 2   geometry  23490 non-null  geometry
dtypes: geometry(1), int64(2)
memory usage: 550.7 KB
None
epsg:4326


### 5.5 Buffer the area of the Boundless dataset's latest year to mask raster data in later sections.
The Bounded dataset would also be fine for our purposes here. The buffer is dissolved to a single feature to be used for its total extents, which are identical between Bounded & Boundless datasets.

In [52]:
# Create buffer layer(s) to use as maximum distance for Near joins.

# Population buffer: 2km
Distance = 2000 # The Africa Albers projection is in meters. Saving in this projection to use in later sections.

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = Settlements[['Sett_ID', 'geometry']].copy() # Copy so the assignment below does not raise a SettingWithCopyWarning.
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid repairs any invalid geometries before buffering.
print('Finished buffer layer creation. %s' % time.ctime())
BufferFileName1 = ''.join(['Buff', str(Distance), 'm_', str(2015)])
BufferLayer.to_file(driver='GPKG', filename=r'Results/Catchment.gpkg', layer=BufferFileName1)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Thu Jan 26 11:41:57 2023


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Finished buffer layer creation. Thu Jan 26 11:55:50 2023
Saved to file. Thu Jan 26 11:55:59 2023


In [53]:
# Nighttime Lights buffer: 250m
Distance = 250

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = Settlements[['Sett_ID', 'geometry']].copy() # Copy so the assignment below does not raise a SettingWithCopyWarning.
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid repairs any invalid geometries before buffering.
print('Finished buffer layer creation. %s' % time.ctime())
BufferFileName2 = ''.join(['Buff', str(Distance), 'm_', str(2015)])
BufferLayer.to_file(driver='GPKG', filename=r'Results/Catchment.gpkg', layer=BufferFileName2)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Thu Jan 26 11:55:59 2023


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Finished buffer layer creation. Thu Jan 26 11:57:45 2023
Saved to file. Thu Jan 26 11:57:55 2023


---

## 6. PLACE NAMES
Join urban place names from UCDB, Africapolis, and GeoNames onto the settlement vectors.

### 6.1 Load placename datasets, filter, and project.

In [54]:
# Anytime we use a spatial join or work with area, 
# my preference is to keep it in a planar, equal area, meters projection. So we'll load as the Africa Albers.
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')
Settlements['Area2015'] = Settlements['geometry'].area / 10**6

# Load, pull name field, rename, and reproject to match the catchments CRS.
UCDB = gpd.read_file('PlaceName/GHS_STAT_UCDB2015MT_GLOBE_R2019A_V1_2.gpkg', 
                     layer=0)[['UC_NM_MN', 'geometry']].rename(
    columns={"UC_NM_MN": "UCDB_Name"}).to_crs("ESRI:102022")

Africapolis = gpd.read_file('PlaceName/AFRICAPOLIS2020.shp')[['agglosName', 'geometry']].rename(
    columns={"agglosName": "Afpl_Name"}).to_crs("ESRI:102022")

GeoNames = gpd.read_file('PlaceName/GeoNames.gpkg', 
                         layer=0)[['GeoName', 'geometry']].to_crs("ESRI:102022")

print(Settlements.info(), UCDB.info(), Africapolis.info(), GeoNames.info())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 23490 entries, 0 to 23489
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   23490 non-null  int64   
 1   ADM_ID    23490 non-null  int64   
 2   geometry  23490 non-null  geometry
 3   Area2015  23490 non-null  float64 
dtypes: float64(1), geometry(1), int64(2)
memory usage: 734.2 KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13135 entries, 0 to 13134
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   UCDB_Name  13135 non-null  object  
 1   geometry   13135 non-null  geometry
dtypes: geometry(1), object(1)
memory usage: 205.4+ KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 7720 entries, 0 to 7719
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   Afpl_Name  7720 non-null   object  
 1   ge

### 6.2 Join placenames onto settlements geodataframe.

In [55]:
# We wrap it in pd.DataFrame() since the sjoin() is the last time we need the geometry.

GeoNames = pd.DataFrame(gpd.sjoin_nearest(GeoNames, Settlements, 
                             how='left', distance_col="distGN", max_distance=250, 
                             lsuffix="G3", rsuffix="GN")).drop(columns='geometry')
Africapolis = pd.DataFrame(gpd.sjoin_nearest(Africapolis, Settlements, 
                             how='left', distance_col="distAF", max_distance=250,
                             lsuffix="G3", rsuffix="Af")).drop(columns='geometry')
UCDB = pd.DataFrame(gpd.sjoin_nearest(UCDB, Settlements, 
                             how='left', distance_col="distUC", max_distance=250,
                             lsuffix="G3", rsuffix="UC")).drop(columns='geometry')

In [56]:
print(GeoNames.info())
print(Africapolis.info())
print(UCDB.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 199390 entries, 0 to 199389
Data columns (total 6 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   GeoName   199390 non-null  object 
 1   index_GN  75 non-null      float64
 2   Sett_ID   75 non-null      float64
 3   ADM_ID    75 non-null      float64
 4   Area2015  75 non-null      float64
 5   distGN    75 non-null      float64
dtypes: float64(5), object(1)
memory usage: 10.6+ MB
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7824 entries, 0 to 7719
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Afpl_Name  7824 non-null   object 
 1   index_Af   208 non-null    float64
 2   Sett_ID    208 non-null    float64
 3   ADM_ID     208 non-null    float64
 4   Area2015   208 non-null    float64
 5   distAF     208 non-null    float64
dtypes: float64(5), object(1)
memory usage: 427.9+ KB
None
<class 'pandas.core.frame.D

In [57]:
alldatasets = [pd.DataFrame(Settlements).drop(columns='geometry'),
               Africapolis[['Sett_ID', 'Afpl_Name', 'distAF']], 
               GeoNames[['Sett_ID', 'GeoName', 'distGN']],
               UCDB[['Sett_ID', 'UCDB_Name', 'distUC']]]

SettlementsNamed = reduce(lambda left,right: pd.merge(left,right,on=['Sett_ID'], how='left'), alldatasets)
SettlementsNamed[['Afpl_Name', 'GeoName', 'UCDB_Name']] = SettlementsNamed[['Afpl_Name', 'GeoName', 'UCDB_Name']].fillna('UNK')

# Replace NaN values with a countable distance.
SettlementsNamed[['distAF', 'distGN', 'distUC']] = SettlementsNamed[['distAF', 'distGN', 'distUC']].fillna(-1)

In [58]:
print(SettlementsNamed.info())
print(SettlementsNamed.sample(10))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23492 entries, 0 to 23491
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    23492 non-null  int64  
 1   ADM_ID     23492 non-null  int64  
 2   Area2015   23492 non-null  float64
 3   Afpl_Name  23492 non-null  object 
 4   distAF     23492 non-null  float64
 5   GeoName    23492 non-null  object 
 6   distGN     23492 non-null  float64
 7   UCDB_Name  23492 non-null  object 
 8   distUC     23492 non-null  float64
dtypes: float64(4), int64(2), object(3)
memory usage: 1.8+ MB
None
       Sett_ID  ADM_ID  Area2015 Afpl_Name  distAF GeoName  distGN UCDB_Name  \
11524   266047      27  0.062039       UNK    -1.0     UNK    -1.0       UNK   
14745   400586     130  0.016997       UNK    -1.0     UNK    -1.0       UNK   
1108     29880     210  0.018697       UNK    -1.0     UNK    -1.0       UNK   
18407   454527      54  0.013598       UNK    -1.0     UNK    -1.

In [59]:
del UCDB, Africapolis, GeoNames

The near joins should prevent row duplication, but when a feature in one dataframe intersects (or is equally near to) two features in the other, the join returns a row for each match. Two of our place name sources are polygons, so duplicated Sett_IDs are possible.
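
A small sketch of how such duplicates can be spotted and collapsed, using an invented join result (the `named` frame is hypothetical):

```python
import pandas as pd

# Hypothetical join result: settlement 7 matched two place-name features.
named = pd.DataFrame({
    "Sett_ID": [5, 7, 7, 9],
    "GeoName": ["Alpha", "Beta", "Gamma", "UNK"],
})

# keep=False flags every member of a duplicated group, not only the repeats.
dupes = named[named.duplicated("Sett_ID", keep=False)]

# Keeping the first row per Sett_ID restores one row per settlement.
deduped = named.drop_duplicates(subset=["Sett_ID"], keep="first")
print(dupes)
print(deduped)
```

Note that `keep="first"` retains whichever match happens to come first in row order, so the surviving name is not chosen by any quality criterion.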

In [60]:
SettlementsNamed[SettlementsNamed.duplicated('Sett_ID', keep=False)]

| | Sett_ID | ADM_ID | Area2015 | Afpl_Name | distAF | GeoName | distGN | UCDB_Name | distUC |
|---|---|---|---|---|---|---|---|---|---|
| 9910 | 227047 | 147 | 346.334596 | Ouagadougou | 0.0 | Saaba | 0.0 | Ouagadougou | 0.0 |
| 9911 | 227047 | 147 | 346.334596 | Ouagadougou | 0.0 | Ouagadougou | 0.0 | Ouagadougou | 0.0 |
| 11757 | 277234 | 265 | 3.837945 | Cinkassi/Cinkanse [BFA] | 0.0 | UNK | -1.0 | Cinkassé | 0.0 |
| 11758 | 277234 | 265 | 3.837945 | Cinkassi/Cinkanse [TGO] | 0.0 | UNK | -1.0 | Cinkassé | 0.0 |


In [61]:
SettlementsNamed.drop_duplicates(subset=['Sett_ID'], inplace=True, keep='first')
SettlementsNamed.info() # Range of entries should be the same as original Settlements file.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23490 entries, 0 to 23491
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    23490 non-null  int64  
 1   ADM_ID     23490 non-null  int64  
 2   Area2015   23490 non-null  float64
 3   Afpl_Name  23490 non-null  object 
 4   distAF     23490 non-null  float64
 5   GeoName    23490 non-null  object 
 6   distGN     23490 non-null  float64
 7   UCDB_Name  23490 non-null  object 
 8   distUC     23490 non-null  float64
dtypes: float64(4), int64(2), object(3)
memory usage: 1.8+ MB


### 6.3 Reduce to single name column.

In [62]:
# Pick the name source for each settlement from the three distance columns.
# Unmatched sources were set to -1 earlier, so idxmax never selects them over a matched source.
# Note that idxmax returns the column with the largest distance among matched sources. In the event of a tie,
# i.e. when more than one source is 0.0 meters from the settlement, it takes the value from the first column.
SettlementsNamed['SettName'] = "UNK"
SettlementsNamed['closest'] = SettlementsNamed[['distAF', 'distGN', 'distUC']].idxmax(axis=1)
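
Because `idxmax` returns the column holding the largest value, it selects the farthest of the matched sources when more than one source matched. If the intent is the geometrically closest source, one option is to treat the -1 sentinel as infinitely far and take `idxmin` instead. The sketch below is an assumption about that intent, not the notebook's method, and the `dists` values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical distances in meters; -1 marks "no match", as in the notebook.
dists = pd.DataFrame({
    "distAF": [120.0, -1.0],
    "distGN": [10.0, 40.0],
    "distUC": [-1.0, 0.0],
})

# Treat the sentinel as infinitely far so it can never be the minimum.
closest = dists.replace(-1.0, np.inf).idxmin(axis=1)

# For comparison: idxmax on the raw values picks the largest distance.
farthest = dists.idxmax(axis=1)
print(closest.tolist(), farthest.tolist())
```

With a 250 m cutoff on the joins the two approaches can only disagree when several sources matched at different distances, so the practical difference is likely small.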

In [63]:
SettlementsNamed.sample(20)

| | Sett_ID | ADM_ID | Area2015 | Afpl_Name | distAF | GeoName | distGN | UCDB_Name | distUC | SettName | closest |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 14191 | 388852 | 75 | 0.05949 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 3088 | 66105 | 37 | 0.004249 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 9379 | 215219 | 119 | 0.015297 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 4938 | 117585 | 264 | 0.072238 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 15635 | 409516 | 320 | 0.011898 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 15632 | 409511 | 137 | 0.011898 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 2260 | 51309 | 127 | 0.028895 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 17761 | 446872 | 257 | 0.003399 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 10077 | 227336 | 290 | 0.007649 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 11461 | 265911 | 187 | 0.045042 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |


In [64]:
# Create a single name column where non-named settlements are "UNK" but all others use one of the three name sources.
SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distAF", 
    'SettName'] = SettlementsNamed['Afpl_Name']

SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distUC", 
    'SettName'] = SettlementsNamed['UCDB_Name']

SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distGN", 
    'SettName'] = SettlementsNamed['GeoName']

In [65]:
SettlementsNamed.sample(20)

| | Sett_ID | ADM_ID | Area2015 | Afpl_Name | distAF | GeoName | distGN | UCDB_Name | distUC | SettName | closest |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7582 | 185243 | 63 | 0.011048 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 20822 | 474698 | 134 | 0.014448 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 16942 | 417578 | 20 | 0.046742 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 6827 | 166088 | 276 | 0.098583 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 10261 | 227649 | 255 | 0.062889 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 20211 | 468259 | 306 | 0.003399 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 18770 | 457220 | 121 | 0.006799 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 2882 | 55834 | 37 | 0.00085 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 5911 | 138901 | 41 | 0.021246 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |
| 10896 | 247572 | 346 | 0.075637 | UNK | -1.0 | UNK | -1.0 | UNK | -1.0 | UNK | distAF |


### 6.4 Make sure place name is unique by stripping smaller localities of duplicated names.

In [66]:
Dupes = SettlementsNamed[ 
    (SettlementsNamed['SettName'] != 'UNK') & 
    (SettlementsNamed.duplicated('SettName', keep=False)) ] # keep=False is necessary to retain *all* duplicates, not just first or last in each group.

print("Number of named settlements: %s" % SettlementsNamed['SettName'].str.contains('UNK').value_counts()[False])
print("Number of named settlements where name is duplicated at least once: %s" % len(Dupes))

Number of named settlements: 274
Number of named settlements where name is duplicated at least once: 186


In [67]:
Largest = Dupes.loc[Dupes.groupby(["SettName"])["Area2015"].idxmax()]
print(Largest)

       Sett_ID  ADM_ID    Area2015                Afpl_Name  distAF  \
7            8      14   16.495685                  Banfora     0.0   
13929   388531      18    0.421528                      UNK    -1.0   
16061   415490      20    1.024925               Barsalogho     0.0   
2177     51196      37   88.690864           Bobo-Dioulasso     0.0   
22184   527798      38    1.874780                  Bogande     0.0   
5268    129734      43    3.492904                   Boromo     0.0   
15003   404331      52    2.207073                   Bousse     0.0   
11757   277234     265    3.837945  Cinkassi/Cinkanse [BFA]     0.0   
2170     51189      59    2.212172                    Dande     0.0   
5552    137698      64   12.117232                 Dedougou     0.0   
7008    180084      70    0.961186                    Didyr     0.0   
20499   474186      76    3.520099                    Djibo     0.0   
22880   580905      81    4.186386                     Dori     0.0   
22879 

In [68]:
# Filter to settlements which have a duplicated name and are not the largest of those with that name, then replace with UNK.
SettlementsNamed.loc[(~SettlementsNamed.Sett_ID.isin(Largest.Sett_ID)) 
                     & (SettlementsNamed.Sett_ID.isin(Dupes.Sett_ID)), 
                     'SettName'] = 'UNK'
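
The same "largest settlement keeps the name" rule can be sketched without pandas. The records and the helper `keep_largest_per_name` below are hypothetical and only illustrate the logic of the two cells above.

```python
# Hypothetical records: (Sett_ID, SettName, Area2015 in square km).
records = [
    (1, "Banfora", 16.5),
    (2, "Banfora", 0.4),
    (3, "Dori", 4.2),
    (4, "UNK", 0.1),
    (5, "UNK", 0.2),
]

def keep_largest_per_name(rows, unknown="UNK"):
    """Return {Sett_ID: name}, blanking duplicated names on all but the largest settlement."""
    largest = {}  # name -> (Sett_ID, area) of the largest settlement seen so far
    for sett_id, name, area in rows:
        if name != unknown and (name not in largest or area > largest[name][1]):
            largest[name] = (sett_id, area)
    winners = {sett_id for sett_id, _ in largest.values()}
    return {sett_id: (name if sett_id in winners else unknown)
            for sett_id, name, _ in rows}

unique_names = keep_largest_per_name(records)
print(unique_names)
```

Names that appear only once are unaffected, and "UNK" is never treated as a duplicated name.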

In [69]:
# Second number should now be zero.

print("Number of named settlements: %s" % SettlementsNamed['SettName'].str.contains('UNK').value_counts()[False])
print("Number of named settlements where name is duplicated at least once: %s" % len(SettlementsNamed[ 
    (SettlementsNamed['SettName'] != 'UNK') & 
    (SettlementsNamed.duplicated('SettName', keep=False)) ]))

Number of named settlements: 140
Number of named settlements where name is duplicated at least once: 0


In [70]:
print(SettlementsNamed.info(), SettlementsNamed[SettlementsNamed['SettName'] != "UNK"].sample(20))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23490 entries, 0 to 23491
Data columns (total 11 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    23490 non-null  int64  
 1   ADM_ID     23490 non-null  int64  
 2   Area2015   23490 non-null  float64
 3   Afpl_Name  23490 non-null  object 
 4   distAF     23490 non-null  float64
 5   GeoName    23490 non-null  object 
 6   distGN     23490 non-null  float64
 7   UCDB_Name  23490 non-null  object 
 8   distUC     23490 non-null  float64
 9   SettName   23490 non-null  object 
 10  closest    23490 non-null  object 
dtypes: float64(4), int64(2), object(5)
memory usage: 2.7+ MB
None        Sett_ID  ADM_ID  Area2015                         Afpl_Name  \
13626   351202     233  1.804242                          Nadiagou   
15141   404684     213  0.720677                               UNK   
8058    202966     136  1.035123                             Kindi   
17117   445141     150  

In [71]:
# Drop extra columns and save to file.
SettlementsNamed = SettlementsNamed[['Sett_ID', 'SettName']]
SettlementsNamed.to_csv(r'Results/PlaceNames.csv')

In [72]:
del SettlementsNamed

---

## 7. CREATE FRAGMENTATION INDEX
We determine what percentage of each settlement's area lies outside of its administrative zone each year.
The index ranges from 0 to 100, i.e. the percent of the settlement area that is fragmented.

For each Sett_ID:
((Area of Boundless settlement - Area of largest Bounded settlement feature) / Area of Boundless settlement) * 100
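
As a worked example of the formula, a settlement with a Boundless area of 10 km² whose largest Bounded piece covers 7.5 km² has an index of ((10 - 7.5) / 10) * 100 = 25. The helper below is a minimal sketch with a hypothetical function name. It assumes a zero or missing (None) area is reported as 0 and that the result is truncated to an integer.

```python
def fragmentation_index(boundless_area, largest_bounded_area):
    """Percent of a settlement's area outside its largest within-boundary piece."""
    if not boundless_area:  # zero or None: treat as unfragmented
        return 0
    share = (boundless_area - largest_bounded_area) / boundless_area
    return int(share * 100)

whole = fragmentation_index(10.0, 10.0)  # settlement lies in a single zone
split = fragmentation_index(10.0, 7.5)   # a quarter of the area lies elsewhere
empty = fragmentation_index(0.0, 0.0)    # settlement not yet built in this year
print(whole, split, empty)
```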

### 7.1 Load boundless and bounded cumulative settlements and clean.

In [73]:
BoundlessAreas = pd.read_csv(os.path.join(ResultsFolder, 'Areas1999to2015.csv'))
print('Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. %s' 
      % time.ctime())
print(BoundlessAreas.info())

BoundedAreas = pd.read_csv(os.path.join(ResultsFolder, 'Areas1999to2015_Bounded.csv'))
print('Loaded Bounded dataset, which will factor into the fragmentation calculation. %s' % time.ctime())
print(BoundedAreas.info())

Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. Thu Jan 26 13:47:43 2023
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23490 entries, 0 to 23489
Data columns (total 21 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  23490 non-null  int64  
 1   Sett_ID     23490 non-null  int64  
 2   year        23490 non-null  int64  
 3   ADM_ID      23490 non-null  int64  
 4   Area2015    23490 non-null  float64
 5   Area2014    22886 non-null  float64
 6   Area2013    22457 non-null  float64
 7   Area2012    21629 non-null  float64
 8   Area2011    21316 non-null  float64
 9   Area2010    20948 non-null  float64
 10  Area2009    20471 non-null  float64
 11  Area2008    20062 non-null  float64
 12  Area2007    19712 non-null  float64
 13  Area2006    19369 non-null  float64
 14  Area2005    19010 non-null  float64
 15  Area2004    18683 non-null  float64
 16  Area2003    18259

In [74]:
LargestFragments = BoundedAreas.loc[BoundedAreas.groupby(["Sett_ID"])["Area2015"].idxmax()] 
print(LargestFragments.info())
print("Filtered the Bounded dataset to only rows where latest year's area is largest for each Sett_ID. %s" % time.ctime())
LargestFragments.columns = LargestFragments.columns.str.replace('Area', 'Largest')
LargestFragments = LargestFragments.drop(columns=['year', 'ADM_ID'])
print("Renamed columns to avoid duplication during merge, and dropped unnecessary columns. %s" % time.ctime())
FragIndices = BoundlessAreas.merge(LargestFragments, how='left', on='Sett_ID')
print(FragIndices.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23490 entries, 0 to 23739
Data columns (total 22 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  23490 non-null  int64  
 1   Bounded_ID  23490 non-null  int64  
 2   year        23490 non-null  int64  
 3   ADM_ID      23490 non-null  int64  
 4   Sett_ID     23490 non-null  int64  
 5   Area2015    23490 non-null  float64
 6   Area2014    22886 non-null  float64
 7   Area2013    22456 non-null  float64
 8   Area2012    21626 non-null  float64
 9   Area2011    21315 non-null  float64
 10  Area2010    20947 non-null  float64
 11  Area2009    20469 non-null  float64
 12  Area2008    20059 non-null  float64
 13  Area2007    19708 non-null  float64
 14  Area2006    19364 non-null  float64
 15  Area2005    19005 non-null  float64
 16  Area2004    18678 non-null  float64
 17  Area2003    18254 non-null  float64
 18  Area2002    17879 non-null  float64
 19  Area2001    17587 non-nul

In [75]:
del BoundlessAreas, BoundedAreas, LargestFragments
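
The largest-fragment filter in cell In [74] relies on `groupby(...).idxmax()` followed by `.loc`. The toy table below (hypothetical IDs and areas) shows the pattern in isolation: `idxmax()` returns one row label per Sett_ID, and `.loc` pulls those rows.

```python
import pandas as pd

# Toy stand-in for the Bounded table: settlement 1 has two fragments, settlement 2 has one.
bounded = pd.DataFrame({
    "Bounded_ID": [10, 11, 20],
    "Sett_ID": [1, 1, 2],
    "Area2015": [900.0, 2700.0, 1800.0],
})

# For each Sett_ID, find the row label of the fragment with the largest 2015 area,
# then keep only those rows (one per settlement).
largest = bounded.loc[bounded.groupby("Sett_ID")["Area2015"].idxmax()]
print(largest)
```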

### 7.2 Merge and run fragmentation calculation.

In [76]:
for item in AllStudyYears:
    YY = str(item) # 4-digit year
    AreaYY = ''.join(["Area", YY]) # The Boundless area variable name
    LargestYY = ''.join(['Largest', YY]) # The Bounded largest area variable name
    FragYY = ''.join(["Frag", YY]) # Name for the fragmentation index variable
    print("Created names for Year %s's variables and temporary objects. %s" % (item, time.ctime()))
    
    FragIndices[FragYY] = ((FragIndices[AreaYY] - FragIndices[LargestYY]) / FragIndices[AreaYY]) * 100
    FragIndices[FragYY] = (FragIndices[FragYY].fillna(0).replace([np.inf, -np.inf], 0)).astype('int')
    print("Calculated fragmentation index for year %s. %s" % (item, time.ctime()))

# Remove unnecessary columns.
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Largest')]
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Area')]

print('Completed fragmentation index calculations for all years. %s' % time.ctime())
print(FragIndices.info())
print(FragIndices.sample(5))

Created names for Year 1999's variables and temporary objects. Thu Jan 26 13:47:48 2023
Calculated fragmentation index for year 1999. Thu Jan 26 13:47:48 2023
Created names for Year 2000's variables and temporary objects. Thu Jan 26 13:47:48 2023
Calculated fragmentation index for year 2000. Thu Jan 26 13:47:48 2023
Created names for Year 2001's variables and temporary objects. Thu Jan 26 13:47:48 2023
Calculated fragmentation index for year 2001. Thu Jan 26 13:47:48 2023
Created names for Year 2002's variables and temporary objects. Thu Jan 26 13:47:48 2023
Calculated fragmentation index for year 2002. Thu Jan 26 13:47:48 2023
Created names for Year 2003's variables and temporary objects. Thu Jan 26 13:47:48 2023
Calculated fragmentation index for year 2003. Thu Jan 26 13:47:48 2023
Created names for Year 2004's variables and temporary objects. Thu Jan 26 13:47:48 2023
Calculated fragmentation index for year 2004. Thu Jan 26 13:47:48 2023
Created names for Year 2005's variables and te

In [77]:
FragIndices = FragIndices.drop(columns=['Unnamed: 0_x', 'Unnamed: 0_y', 'year', 'ADM_ID'])
FragIndices.to_csv(os.path.join(ResultsFolder, 'FragIndex%sto%s.csv' % (1999, 2015)))
print('Saved to file. %s' % time.ctime())

Saved to file. Thu Jan 26 13:47:49 2023


In [78]:
del FragIndices

---

## 8. PREPARE YEARLY DATASETS: POPULATION
This section can serve as a template for other annualized rasters.

### 8.1 Reproject and reclassify with settlement buffer mask.
Reclassify so that we only need to work with cells within a set distance of settlements (here, the 2,000 m buffer layer Buff2000m_2015).

In [157]:
MaskByZone(MaskPath=r'Results/Catchment.gpkg', MaskLayerName = "Buff2000m_2015", 
           SourceFolder='Population/SourceFiles', DestFolder='Population')

['bfa_ppp_2000_UNadj.tif', 'bfa_ppp_2001_UNadj.tif', 'bfa_ppp_2002_UNadj.tif', 'bfa_ppp_2003_UNadj.tif', 'bfa_ppp_2004_UNadj.tif', 'bfa_ppp_2005_UNadj.tif', 'bfa_ppp_2006_UNadj.tif', 'bfa_ppp_2007_UNadj.tif', 'bfa_ppp_2008_UNadj.tif', 'bfa_ppp_2009_UNadj.tif', 'bfa_ppp_2010_UNadj.tif', 'bfa_ppp_2011_UNadj.tif', 'bfa_ppp_2012_UNadj.tif', 'bfa_ppp_2013_UNadj.tif', 'bfa_ppp_2014_UNadj.tif', 'bfa_ppp_2015_UNadj.tif']
Source projection:  None
Destination projection:  Africa_Albers_Equal_Area_Conic
Finished gdal.Warp() for Msk_D_1999_avg.tif. Sun Feb 19 10:43:33 2023 

We warped the data, so we'll use that file for next step.
Finished rasterio.mask.mask() for bfa_ppp_2000_UNadj.tif. Sun Feb 19 10:43:36 2023 

Written to file. Sun Feb 19 10:43:38 2023 

Removed intermediate file. Sun Feb 19 10:43:38 2023 

Source projection:  None
Destination projection:  Africa_Albers_Equal_Area_Conic
Finished gdal.Warp() for Msk_D_1999_avg.tif. Sun Feb 19 10:43:41 2023 

We warped the data, so we'll use tha

In [158]:
print(os.listdir('Population/'))

['Msk_2000.tif', 'Msk_2001.tif', 'Msk_2002.tif', 'Msk_2003.tif', 'Msk_2004.tif', 'Msk_2005.tif', 'Msk_2006.tif', 'Msk_2007.tif', 'Msk_2008.tif', 'Msk_2009.tif', 'Msk_2010.tif', 'Msk_2011.tif', 'Msk_2012.tif', 'Msk_2013.tif', 'Msk_2014.tif', 'Msk_2015.tif', 'PreDefinedFunction', 'SourceFiles']


### 8.2 Raster values summarized by settlement.
1. Convert each annualized raster to .xyz.
2. Bring the cells into vector space and assign each one its Sett_ID.
3. Aggregate the values to the settlement level as appropriate and save the table to file.

XYZ is a plain-text format similar to .csv: raster cell centers are stored as x and y, and each cell's value is stored as z.
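
The three steps can be sketched on a tiny in-memory example. BatchZonalStats is defined elsewhere in the notebook, so this is not its implementation: the XYZ text, the NoData value of -9999, and the rule used in place of the spatial join are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical XYZ content: one space-separated "x y z" record per raster cell center.
xyz_text = """\
0.5 0.5 12.0
1.5 0.5 8.0
0.5 1.5 -9999
5.5 5.5 3.0
"""
NODATA = -9999  # assumed NoData value for this toy raster

# Step 1 (already done here): the raster exists as XYZ text. Load it as a table.
cells = pd.read_csv(io.StringIO(xyz_text), sep=" ", header=None, names=["x", "y", "z"])
cells = cells[cells["z"] != NODATA]  # keep only cells that carry data

# Step 2: stand-in for the spatial join. Settlement 1 covers x < 3, settlement 2 the rest.
cells["Sett_ID"] = (cells["x"] >= 3).astype(int) + 1

# Step 3: aggregate cell values to the settlement level.
stats = cells.groupby("Sett_ID")["z"].agg(["count", "sum"])
print(stats)
```

In the real workflow the second step would be a point-in-polygon join against the settlement geometries rather than a coordinate rule.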

In [159]:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')[['Sett_ID', 'geometry']]

In [None]:
BatchZonalStats(FolderName='Population', Zones=Settlements, StatsWanted=['count', 'sum'], SeriesStart=2000, SeriesEnd=2015)

['Msk_2000.tif', 'Msk_2001.tif', 'Msk_2002.tif', 'Msk_2003.tif', 'Msk_2004.tif', 'Msk_2005.tif', 'Msk_2006.tif', 'Msk_2007.tif', 'Msk_2008.tif', 'Msk_2009.tif', 'Msk_2010.tif', 'Msk_2011.tif', 'Msk_2012.tif', 'Msk_2013.tif', 'Msk_2014.tif', 'Msk_2015.tif']
       Sett_ID
0            1
1            2
2            3
3            4
4            5
...        ...
23485   612616
23486   612723
23487   612934
23488   612944
23489   615066

[23490 rows x 1 columns]
Loading data for Msk_2000.tif. Sun Feb 19 10:46:44 2023 

Finished gdal.Translate() for year 2000. Sun Feb 19 10:48:15 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:49:59 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:50:03 2023 



---

## 9. PREPARE YEARLY DATASETS: NIGHTTIME LIGHTS

### 9.1 Reclassify with settlement buffer mask.
Reclassify so that we only need to work with cells within a set distance of settlements (here, the 250 m buffer layer Buff250m_2015). The two NTL sources have already been reprojected in a separate script and cropped to Central and Western Africa.

In [148]:
MaskByZone(MaskPath = r'Results/Catchment.gpkg', MaskLayerName = 'Buff250m_2015', 
          SourceFolder = r"NTL/SourceFiles", DestFolder = 'NTL')

['D_1999_avg.tif', 'D_1999_cfc.tif', 'D_2000_avg.tif', 'D_2000_cfc.tif', 'D_2001_avg.tif', 'D_2001_cfc.tif', 'D_2002_avg.tif', 'D_2002_cfc.tif', 'D_2003_avg.tif', 'D_2003_cfc.tif', 'D_2004_avg.tif', 'D_2004_cfc.tif', 'D_2005_avg.tif', 'D_2005_cfc.tif', 'D_2006_avg.tif', 'D_2006_cfc.tif', 'D_2007_avg.tif', 'D_2007_cfc.tif', 'D_2008_avg.tif', 'D_2008_cfc.tif', 'D_2009_avg.tif', 'D_2009_cfc.tif', 'D_2010_avg.tif', 'D_2010_cfc.tif', 'D_2011_avg.tif', 'D_2011_cfc.tif', 'D_2012_avg.tif', 'D_2012_cfc.tif', 'D_2013_avg.tif', 'D_2013_cfc.tif', 'V_2012_avg.tif', 'V_2012_cfc.tif', 'V_2013_avg.tif', 'V_2013_cfc.tif', 'V_2014_avg.tif', 'V_2014_cfc.tif', 'V_2015_avg.tif', 'V_2015_cfc.tif']
Source projection:  None
Destination projection:  Africa_Albers_Equal_Area_Conic
Finished gdal.Warp() for Msk_D_1999_avg.tif. Sun Feb 19 10:19:04 2023 

We warped the data, so we'll use that file for next step.
Finished rasterio.mask.mask() for D_1999_avg.tif. Sun Feb 19 10:19:07 2023 

Written to file. Sun Feb 19

Finished rasterio.mask.mask() for D_2008_cfc.tif. Sun Feb 19 10:20:23 2023 

Written to file. Sun Feb 19 10:20:23 2023 

Removed intermediate file. Sun Feb 19 10:20:23 2023 

Source projection:  None
Destination projection:  Africa_Albers_Equal_Area_Conic
Finished gdal.Warp() for Msk_D_1999_avg.tif. Sun Feb 19 10:20:24 2023 

We warped the data, so we'll use that file for next step.
Finished rasterio.mask.mask() for D_2009_avg.tif. Sun Feb 19 10:20:27 2023 

Written to file. Sun Feb 19 10:20:27 2023 

Removed intermediate file. Sun Feb 19 10:20:27 2023 

Source projection:  None
Destination projection:  Africa_Albers_Equal_Area_Conic
Finished gdal.Warp() for Msk_D_1999_avg.tif. Sun Feb 19 10:20:28 2023 

We warped the data, so we'll use that file for next step.
Finished rasterio.mask.mask() for D_2009_cfc.tif. Sun Feb 19 10:20:30 2023 

Written to file. Sun Feb 19 10:20:30 2023 

Removed intermediate file. Sun Feb 19 10:20:30 2023 

Source projection:  None
Destination projection:  Afr

In [149]:
print(os.listdir('NTL/'))

['Msk_D_1999_avg.tif', 'Msk_D_1999_cfc.tif', 'Msk_D_2000_avg.tif', 'Msk_D_2000_cfc.tif', 'Msk_D_2001_avg.tif', 'Msk_D_2001_cfc.tif', 'Msk_D_2002_avg.tif', 'Msk_D_2002_cfc.tif', 'Msk_D_2003_avg.tif', 'Msk_D_2003_cfc.tif', 'Msk_D_2004_avg.tif', 'Msk_D_2004_cfc.tif', 'Msk_D_2005_avg.tif', 'Msk_D_2005_cfc.tif', 'Msk_D_2006_avg.tif', 'Msk_D_2006_cfc.tif', 'Msk_D_2007_avg.tif', 'Msk_D_2007_cfc.tif', 'Msk_D_2008_avg.tif', 'Msk_D_2008_cfc.tif', 'Msk_D_2009_avg.tif', 'Msk_D_2009_cfc.tif', 'Msk_D_2010_avg.tif', 'Msk_D_2010_cfc.tif', 'Msk_D_2011_avg.tif', 'Msk_D_2011_cfc.tif', 'Msk_D_2012_avg.tif', 'Msk_D_2012_cfc.tif', 'Msk_D_2013_avg.tif', 'Msk_D_2013_cfc.tif', 'Msk_V_2012_avg.tif', 'Msk_V_2012_cfc.tif', 'Msk_V_2013_avg.tif', 'Msk_V_2013_cfc.tif', 'Msk_V_2014_avg.tif', 'Msk_V_2014_cfc.tif', 'Msk_V_2015_avg.tif', 'Msk_V_2015_cfc.tif', 'PreDefiningFunction', 'SourceFiles']


### 9.2 Raster values summarized by settlement.
1. Convert each annualized raster to .xyz.
2. Bring the cells into vector space and assign each one its Sett_ID.
3. Aggregate the values to the settlement level as appropriate and save the table to file.

XYZ is a plain-text format similar to .csv: raster cell centers are stored as x and y, and each cell's value is stored as z.

In [152]:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')[['Sett_ID', 'geometry']]

In [153]:
BatchZonalStats(FolderName = 'NTL', Zones=Settlements)

['Msk_D_1999_avg.tif', 'Msk_D_1999_cfc.tif', 'Msk_D_2000_avg.tif', 'Msk_D_2000_cfc.tif', 'Msk_D_2001_avg.tif', 'Msk_D_2001_cfc.tif', 'Msk_D_2002_avg.tif', 'Msk_D_2002_cfc.tif', 'Msk_D_2003_avg.tif', 'Msk_D_2003_cfc.tif', 'Msk_D_2004_avg.tif', 'Msk_D_2004_cfc.tif', 'Msk_D_2005_avg.tif', 'Msk_D_2005_cfc.tif', 'Msk_D_2006_avg.tif', 'Msk_D_2006_cfc.tif', 'Msk_D_2007_avg.tif', 'Msk_D_2007_cfc.tif', 'Msk_D_2008_avg.tif', 'Msk_D_2008_cfc.tif', 'Msk_D_2009_avg.tif', 'Msk_D_2009_cfc.tif', 'Msk_D_2010_avg.tif', 'Msk_D_2010_cfc.tif', 'Msk_D_2011_avg.tif', 'Msk_D_2011_cfc.tif', 'Msk_D_2012_avg.tif', 'Msk_D_2012_cfc.tif', 'Msk_D_2013_avg.tif', 'Msk_D_2013_cfc.tif', 'Msk_V_2012_avg.tif', 'Msk_V_2012_cfc.tif', 'Msk_V_2013_avg.tif', 'Msk_V_2013_cfc.tif', 'Msk_V_2014_avg.tif', 'Msk_V_2014_cfc.tif', 'Msk_V_2015_avg.tif', 'Msk_V_2015_cfc.tif']
       Sett_ID
0            1
1            2
2            3
3            4
4            5
...        ...
23485   612616
23486   612723
23487   612934
23488   61294


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:27:30 2023 

          Z  Sett_ID
298134  255   359197
568462  255   351448
92607   255   595231
239063   64   400517
370245  255   366201
457103  255   156058
84749   255   388932
109355   67   468224
262767  255   515632
526269   62    13849

Exported as table. Sun Feb 19 10:27:32 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:27:32 2023 


Count of cloud-free observations averaged to settlement level, year 2000. Sun Feb 19 10:27:32 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
14433   396424          3.0  6.805647e+38  2.268549e+38  3.402823e+38   
5805    138060         30.0  9.527906e+39  3.175969e+38  3.402823e+38   
14808   400664         10.0  3.402823e+39  3.402823e+38  3.402823e+38   
14124   388776         22.0  7.145929e+39  3.248150e+38  3.402823e+38   
14426   396414          2.0  3.402823e+38  1.701412e+38  3.402823e+38   
5191    125407          2.0  

Finished gdal.Translate() for year 2002. Sun Feb 19 10:28:39 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:28:40 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:28:40 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:29:09 2023 

                   Z  Sett_ID
612482  3.402823e+38   351346
102809  3.402823e+38   388946
592090  3.402823e+38     6273
26188   3.402823e+38   388946
643361  3.402823e+38     1701
38010   3.402823e+38   388946
409367  3.402823e+38   132479
505253  3.402823e+38   351346
6055    3.402823e+38   612944
458767  3.402823e+38    51179

Exported as table. Sun Feb 19 10:29:11 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:29:11 2023 


Desired aggregation methods applied to settlement level, year 2002. Sun Feb 19 10:29:11 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
19958   460349          3.0  6.805647e+38  2.268549e+38  3.402823e+38   
11480   265958        

Finished gdal.Translate() for year 2003. Sun Feb 19 10:29:45 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:29:46 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:29:46 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:30:14 2023 

                   Z  Sett_ID
25463   3.402823e+38   458340
225681  3.402823e+38   409860
299400  3.402823e+38   138148
74127   3.402823e+38   458333
253450  3.402823e+38   396608
389230  3.402823e+38   358868
236563  3.402823e+38   416042
593025  3.402823e+38       86
474444  3.402823e+38   236577
281697  3.402823e+38   508872

Exported as table. Sun Feb 19 10:30:16 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:30:16 2023 


Desired aggregation methods applied to settlement level, year 2003. Sun Feb 19 10:30:16 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
7047    180155         28.0  9.187623e+39  3.281294e+38  3.402823e+38   
10612   236718        

Finished gdal.Translate() for year 2004. Sun Feb 19 10:30:50 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:30:51 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:30:51 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:31:20 2023 

                   Z  Sett_ID
191203  3.402823e+38   388927
11851   3.402823e+38   388946
57435   3.402823e+38   581615
55622   3.402823e+38   597544
376151  3.402823e+38   199973
41293   3.402823e+38   597561
508371  2.796747e+00      773
653066  3.402823e+38   351211
217010  3.402823e+38   568745
77724   3.402823e+38   458333

Exported as table. Sun Feb 19 10:31:22 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:31:22 2023 


Desired aggregation methods applied to settlement level, year 2004. Sun Feb 19 10:31:22 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
8421    203559          1.0  3.402823e+38  3.402823e+38  3.402823e+38   
4467    102720        

Finished gdal.Translate() for year 2005. Sun Feb 19 10:31:58 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:31:59 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:31:59 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:32:29 2023 

                   Z  Sett_ID
174647  3.402823e+38   568874
506544  3.402823e+38    54050
367861  3.402823e+38   129963
45882   3.402823e+38   568902
213001  3.113179e+00   453764
149064  3.402823e+38   468355
543533  3.402823e+38    89563
434303  3.402823e+38    37031
463859  3.402823e+38   351448
479889  3.402823e+38   237822

Exported as table. Sun Feb 19 10:32:31 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:32:31 2023 


Desired aggregation methods applied to settlement level, year 2005. Sun Feb 19 10:32:31 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
11969   277690          2.0  3.402823e+38  1.701412e+38  3.402823e+38   
12208   294277        

Finished gdal.Translate() for year 2006. Sun Feb 19 10:33:07 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:33:08 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:33:08 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:33:37 2023 

                   Z  Sett_ID
417037  3.402823e+38   351319
169384  3.402823e+38   359342
41095   3.402823e+38   597335
591327  3.402823e+38    84361
197114  3.402823e+38   568787
311645  3.402823e+38   359103
88046   3.402823e+38   581592
456205  3.402823e+38   162616
74870   3.402823e+38   388946
453179  3.402823e+38   358868

Exported as table. Sun Feb 19 10:33:39 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:33:39 2023 


Desired aggregation methods applied to settlement level, year 2006. Sun Feb 19 10:33:39 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
22605   556871         30.0  9.187623e+39  3.062541e+38  3.402823e+38   
3899     92495        

Finished gdal.Translate() for year 2007. Sun Feb 19 10:34:16 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:34:17 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:34:18 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:34:46 2023 

                   Z  Sett_ID
109255  3.402823e+38   400843
590199  3.402823e+38     2714
341493  3.402823e+38    37253
436897  3.402823e+38   351312
289944  3.402823e+38   556915
282887  3.402823e+38   359265
99472   3.402823e+38   468653
133494  3.402823e+38   388946
374414  2.589366e+00   272300
589083  3.402823e+38   351211

Exported as table. Sun Feb 19 10:34:48 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:34:48 2023 


Desired aggregation methods applied to settlement level, year 2007. Sun Feb 19 10:34:48 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
8871    204411          1.0  2.866604e+00  2.866604e+00  2.866604e+00   
12346   294539        

Finished gdal.Translate() for year 2008. Sun Feb 19 10:35:23 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:35:24 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:35:24 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:35:25 2023 

               Z  Sett_ID
439120  2.958333   165769
214921  2.457143   445743
267272  2.891892   514917
386675  2.957747    51940
73620   2.712500   597211
310540  2.986667   310166
530816  2.813559      604
296899  3.454545   501561
537964  2.666667      214
358371  2.558442   318745

Exported as table. Sun Feb 19 10:35:25 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:35:25 2023 


Desired aggregation methods applied to settlement level, year 2008. Sun Feb 19 10:35:25 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
15587   409423         51.0  1.735440e+40  3.402823e+38  3.402823e+38   
336        390          1.0  4.290984e+00  4.290984e+00  4.290984e

Finished gdal.Translate() for year 2009. Sun Feb 19 10:35:32 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:35:33 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:35:33 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:35:34 2023 

                Z  Sett_ID
454977   2.633333   165870
465051   3.120000   236602
308056  10.913043   227047
216631   2.192308   388866
165428   3.138889   458144
345991   2.724138   294261
355987   3.166667   309658
307256   2.636364   295080
389409   3.060606   277878
219638   2.785714   445238

Exported as table. Sun Feb 19 10:35:34 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:35:34 2023 


Desired aggregation methods applied to settlement level, year 2009. Sun Feb 19 10:35:34 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
11552   266102         14.0  4.423671e+39  3.159765e+38  3.402823e+38   
10423   231847          1.0  3.402823e+38  3.402823e+38

Finished gdal.Translate() for year 2010. Sun Feb 19 10:35:41 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:35:42 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:35:42 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:35:43 2023 

               Z  Sett_ID
224933  3.830986   400594
170039  3.691176   445953
319539  4.060606   138190
186386  3.788732   509105
409262  3.926471   277738
375807  7.014084   247086
419904  4.687500   236740
306908  4.688525   138045
329688  5.478261   227036
308843  4.454545   203971

Exported as table. Sun Feb 19 10:35:43 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:35:43 2023 


Desired aggregation methods applied to settlement level, year 2010. Sun Feb 19 10:35:44 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
13537   319236        102.0  3.470880e+40  3.402823e+38  3.402823e+38   
8726    204048          1.0  3.402823e+38  3.402823e+38  3.402823e

Finished gdal.Translate() for year 2011. Sun Feb 19 10:35:51 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:35:52 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:35:52 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:35:53 2023 

        Z  Sett_ID
351815  5   215262
333048  3   294504
253702  3   415892
221010  4   400628
337387  4   203523
211075  4   400701
273298  4   396490
75389   3   597381
232647  3   388797
380386  3    51965

Exported as table. Sun Feb 19 10:35:53 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:35:53 2023 


Desired aggregation methods applied to settlement level, year 2011. Sun Feb 19 10:35:53 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
19619   458287          NaN           NaN           NaN           NaN   
1740     37034         24.0  7.826494e+39  3.261039e+38  3.402823e+38   
10680   240411         96.0  3.266711e+40  3.402823e+38  3.402823e+38 

Finished gdal.Translate() for year 2012. Sun Feb 19 10:36:00 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:36:01 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:36:01 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:36:02 2023 

                Z  Sett_ID
317738   3.940299   310073
302299  49.268658   227047
293065   5.150000   137699
237467   3.546667   416057
474220   4.176471   155989
500997   4.151515    30161
324593   3.842857   138055
295012   3.535211   204180
222931   3.442857   453911
281579   4.028986   501760

Exported as table. Sun Feb 19 10:36:02 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:36:02 2023 


Desired aggregation methods applied to settlement level, year 2012. Sun Feb 19 10:36:03 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
22161   525266          9.0  3.062541e+39  3.402823e+38  3.402823e+38   
8473    203649          NaN           NaN           NaN

Finished gdal.Translate() for year 2013. Sun Feb 19 10:36:10 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:36:11 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:36:11 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:36:12 2023 

                Z  Sett_ID
200397   3.727273   454292
400078   3.542857    37119
237411   3.680556   409642
296015   4.260870   227641
516322   3.878788    35188
452445  58.532467    51196
561553   4.338461    87580
378758   3.703125   180106
277828   4.071429   180519
442539   4.128205    51657

Exported as table. Sun Feb 19 10:36:12 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:36:12 2023 


Desired aggregation methods applied to settlement level, year 2013. Sun Feb 19 10:36:12 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
19777   458473          3.0  1.020847e+39  3.402823e+38  3.402823e+38   
14213   388878          4.0  1.361129e+39  3.402823e+38

Finished gdal.Translate() for year 2012. Sun Feb 19 10:36:23 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:36:26 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:36:26 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:36:28 2023 

                 Z  Sett_ID
1274228   3.616046   227047
1560696   0.201957   265667
867092    0.127327   456898
1283495   0.187434   309529
1810573   0.163434   155897
1209382   0.426780   227047
867096    0.058891   456898
1806685   8.331445    51196
922673    2.750182   400483
1231017  14.372094   227047

Exported as table. Sun Feb 19 10:36:28 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:36:28 2023 


Desired aggregation methods applied to settlement level, year 2012. Sun Feb 19 10:36:28 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
8440    203588          4.0  1.020847e+39  2.552118e+38  3.402823e+38   
22850   574874          2.0  6.805647e+38  3

Finished gdal.Translate() for year 2013. Sun Feb 19 10:36:47 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:36:50 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:36:50 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:36:52 2023 

                Z  Sett_ID
1270406  0.420260   180085
1241831  1.044975   227047
1240035  1.889953   227047
951404   0.069640   401297
1227401  6.547158   227047
1290435  1.172861   227047
1322624  0.492216   180694
1382734  0.224199   318632
1239991  6.156235   227047
1199942  0.254382   137698

Exported as table. Sun Feb 19 10:36:52 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:36:52 2023 


Desired aggregation methods applied to settlement level, year 2013. Sun Feb 19 10:36:52 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
1966     38637          6.0  1.701412e+39  2.835686e+38  3.402823e+38   
12979   310078          5.0  1.701412e+39  3.402823e+38

Finished gdal.Translate() for year 2014. Sun Feb 19 10:37:11 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:37:15 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:37:15 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:37:16 2023 

                 Z  Sett_ID
685094    4.053637   456905
1212988   3.282514   227047
692296    3.905006   456905
1274230   7.839927   234081
1829465   0.462932   351201
1812373   0.681269   155897
694099    1.974902   456905
1292191   0.250984   227277
1562495   2.550558   265667
1240019  21.193678   227047

Exported as table. Sun Feb 19 10:37:16 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:37:16 2023 


Desired aggregation methods applied to settlement level, year 2014. Sun Feb 19 10:37:16 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
23090   581358         12.0  4.083388e+39  3.402823e+38  3.402823e+38   
20867   474787         45.0  1.531271e+40  3

Finished gdal.Translate() for year 2015. Sun Feb 19 10:37:35 2023 

Loaded XYZ file as a pandas dataframe. Sun Feb 19 10:37:39 2023 

Created geodataframe from non-NoData points. Sun Feb 19 10:37:39 2023 


Joined zone ID onto vectorized raster cells. Sun Feb 19 10:37:40 2023 

                 Z  Sett_ID
1385940   0.250107   215150
1813090   0.869642   277234
1794048   0.959889    51640
1144605   0.587136   501405
1274244   2.391495   227047
1848094   0.383477    51464
1265228  21.848469   227047
1250831   2.345626   227047
2231581   0.377271        3
1819295   5.830219    51196

Exported as table. Sun Feb 19 10:37:40 2023 

Removed (or skipped if error) intermediate xyz file. Sun Feb 19 10:37:40 2023 


Desired aggregation methods applied to settlement level, year 2015. Sun Feb 19 10:37:40 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
8455    203615          4.0  1.020847e+39  2.552118e+38  3.402823e+38   
18347   454337          7.0  2.041694e+39  2

Saved to file. Sun Feb 19 10:37:58 2023 
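
Many Z values in the log above equal 3.402823e+38, which is the largest finite float32 value and is often used as a NoData marker. If that is what it represents in these rasters (an assumption; the source files' metadata would confirm it), those cells would need to be dropped before aggregating, because otherwise the sentinel dominates the sums, averages, and maxima. A minimal sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

FLOAT32_MAX = np.finfo(np.float32).max  # about 3.402823e+38

# Hypothetical joined cells: settlement 1 has one real value and one sentinel.
cells = pd.DataFrame({
    "Sett_ID": [1, 1, 2, 2],
    "Z": np.array([2.5, FLOAT32_MAX, 4.0, 6.0], dtype=np.float32),
})

# Drop sentinel cells before computing settlement-level statistics.
valid = cells[cells["Z"] < FLOAT32_MAX]
stats = valid.groupby("Sett_ID")["Z"].agg(["count", "sum", "mean", "max"])
print(stats)
```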



---

## 10. FLOOD EXPOSURE BY RETURN PERIOD

### 10.1 Reclassify and set NoData values of rasters.

##### Flood layers

In [None]:
InRasters = ['FD_1in20.tif', 'FD_1in100.tif', 'FD_1in1000.tif']
BinaryRasters = []

for Raster in InRasters:
    InPath = os.path.join(ProjectFolder, 'Hazard', Raster)
    OutPath = os.path.join(ProjectFolder, 'Hazard', Raster.replace('.tif', '_binary.tif'))
    
    [xsize, ysize, geotransform, geoproj, Z] = readRaster(InPath)
    
    Z[Z<0.15] = 0
    Z[Z>=0.15] = 1
    
    writeRaster(OutPath,geotransform,geoproj,Z)
    InPath = OutPath = None
    BinaryRasters = BinaryRasters + [Raster.replace('.tif', '_binary.tif')]
    
    print('Finished reclassifying %s. %s' % (Raster, time.ctime()))

print('\nNew flood set: %s' % BinaryRasters)
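
If readRaster returns raw cell values, the two in-place assignments in the cell above also turn any negative NoData sentinel into 0, so "no data" and "not flooded" become indistinguishable. Where that distinction matters, the reclassification can keep NoData separate. The sketch below reuses the 0.15 threshold from the cell above; the array and the -9999 sentinel are hypothetical.

```python
import numpy as np

NODATA = -9999     # assumed sentinel for this toy raster
THRESHOLD = 0.15   # same cutoff as in the notebook cell

depth = np.array([[0.00, 0.10, 0.15],
                  [0.80, NODATA, 2.30]])

# Remember where NoData sits before overwriting values.
is_nodata = depth == NODATA

# 1 where the value reaches the threshold, 0 elsewhere, then restore NoData.
binary = np.where(depth >= THRESHOLD, 1, 0).astype(np.int16)
binary[is_nodata] = NODATA
print(binary)
```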

### 10.2 Resample flood data to match buildup layer.

 - Align flood to WSFE: CRS, extent, origin, and resolution.

In [None]:
BinaryRasters = ['FD_1in20_binary.tif', 'FD_1in100_binary.tif', 'FD_1in1000_binary.tif']
ResampledRasters = []

WSFEPath = os.path.join(ProjectFolder, 'Buildup', 'WSFE_reclass.tif')

for Raster in BinaryRasters:
    RasterPath = os.path.join(ProjectFolder, 'Hazard', Raster)
    OutPath = os.path.join(ProjectFolder, 'Hazard', Raster.replace('_binary', '_upsample'))
    resampleRaster(RasterPath, WSFEPath, OutPath)
    ResampledRasters = ResampledRasters + [Raster.replace('_binary', '_upsample')]
    
print('Done. New flood set: %s. %s' % (ResampledRasters, time.ctime()))
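
resampleRaster is defined elsewhere in the notebook, so its internals are not shown here. Conceptually, bringing a 90 m binary flood grid onto a 30 m grid with nearest-neighbour resampling repeats each coarse cell as a 3 x 3 block, provided the two grids already share a CRS and origin. The NumPy sketch below illustrates only that idea on a hypothetical array; it is not the notebook's implementation.

```python
import numpy as np

FACTOR = 3  # 90 m flood cells -> 30 m WSFE cells

flood_90m = np.array([[1, 0],
                      [0, 1]], dtype=np.uint8)

# The Kronecker product with a block of ones repeats every cell
# FACTOR times along both axes, preserving the 0/1 values.
flood_30m = np.kron(flood_90m, np.ones((FACTOR, FACTOR), dtype=np.uint8))
print(flood_30m.shape)
print(flood_30m)
```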

### 10.3 Mask out built areas that were not flooded.

In [None]:
WSFEPath = ''.join([r'Buildup/', 'WSFE_reclass.tif'])
ResampledRasters = ['FD_1in20_upsample.tif', 'FD_1in100_upsample.tif', 'FD_1in1000_upsample.tif']
IntersectRasters = []

for Raster in ResampledRasters:
    InPath = ''.join([r'Hazard/', Raster])
    OutPath = ''.join([r'Hazard/', Raster.replace('_upsample', '_WSFE')])
    
    calcShell(WSFEPath, InPath, OutPath, "A*B")
    IntersectRasters = IntersectRasters + [Raster.replace('_upsample', '_WSFE')]
    
print('Done with list. New flood set: %s' % IntersectRasters)

Generate a spatial object from flooded-only buildup cells
 - WSFE_Flooded to numpy array to Pandas points dataframe, excluding 0 and NoData.

Calculate area at risk of flood per settlement per year (that year's new buildup only)
 - Create "area" field for each feature holding the cell area: 900 square meters (30 m x 30 m resolution)
 - Spatial join Settlement IDs onto WSFE_Flooded_dataframe
 - Calculate sum of area field for each [Sett_ID, year] group. (new dataframe created)

Calculate area at risk of flood per settlement per year (cumulative area in that year)
 - Iterating through the study years, subset the sum area dataframe to the buildup accumulated up to that year
 - Calculate sum of area field for each Sett_ID
 - Assign sum of area for that cumulative year to a Settlements dataframe by table join on Sett_ID

Calculate percent flooded
 - Area_flooded_year / Area_year = percent_flooded_year
 - Save to file as csv.
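
The plan above can be sketched in pandas as follows. All tables and values are hypothetical, the study years are shortened to 2000 to 2002, and each cell is assumed to contribute 30 m x 30 m = 900 square meters; the real inputs would come from the spatial join and the area tables produced earlier.

```python
import pandas as pd

CELL_AREA = 30 * 30  # square meters per 30 m cell (assumption stated above)

# Hypothetical flooded buildup cells after the spatial join: one row per cell,
# tagged with its settlement and the year the cell was first built up.
flooded_cells = pd.DataFrame({
    "Sett_ID": [1, 1, 1, 2],
    "year": [2000, 2000, 2002, 2001],
})
flooded_cells["area"] = CELL_AREA

# New flooded area per settlement per year (that year's new buildup only).
new_area = flooded_cells.groupby(["Sett_ID", "year"])["area"].sum().unstack(fill_value=0)

# Cumulative flooded area: ensure every study year has a column, then accumulate.
years = list(range(2000, 2003))
cum_area = new_area.reindex(columns=years, fill_value=0).cumsum(axis=1)

# Hypothetical total buildup area per settlement and year, in the same layout.
total_area = pd.DataFrame(
    {2000: [9000, 4500], 2001: [9000, 9000], 2002: [18000, 9000]}, index=[1, 2]
)

# Percent of each year's cumulative buildup that is flooded.
pct_flooded = cum_area / total_area * 100
print(pct_flooded)
```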

### 10.4 Rasterize settlement and create "join" serial.

In [None]:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')[['Sett_ID', 'geometry']]
Sett = os.path.join(ProjectFolder, 'Settlement', 'Settlements_rasterized.tif')
# Copy and update the metadata from WSFE for the output
WSFE = os.path.join(ProjectFolder, 'Buildup', 'WSFE_equalarea.tif')

ShapeToRaster(Shapefile=Settlements, ValueVar="Sett_ID", MetaRasterPath=WSFE, OutFilePath=Sett)

In [None]:
len_Sett = len(str(Settlements['Sett_ID'].max()))
SettRasters = []

for Raster in IntersectRasters:
    InFlood = os.path.join(ProjectFolder, 'Hazard', Raster)
    OutSettFlood = os.path.join(ProjectFolder, 'Hazard', Raster.replace('_WSFE', '_Settlements'))
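    # Serialize: flood-masked build-up year (A) shifted left by the number of Sett_ID digits, plus Sett_ID (B).
    Calc = "(A*" + str(10**len_Sett) + ")+B"

    # Use this loop's inputs and output (the previous call referenced InG3, InADM and G3_ADM from an earlier section).
    calcShell(A=InFlood, B=Sett, OutFile=OutSettFlood, Calculation=Calc)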
    
    SettRasters = SettRasters + [Raster.replace('_WSFE', '_Settlements')]
    rioStats(OutSettFlood)

print('Done with list. New flood set: %s. %s' % (SettRasters, time.ctime()))
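
The "join" serial built with `(A*10**len_Sett)+B` can be checked on a few values. This is a small sketch with hypothetical IDs and years; it assumes `A` holds the flooded build-up year and `B` the settlement ID, and it decodes with integer arithmetic rather than the string slicing used later in the notebook.

```python
import numpy as np

# Hypothetical per-cell values: build-up year (0 = no flooded build-up) and settlement ID.
year = np.array([1990, 2012, 0], dtype=np.int64)
sett_id = np.array([42, 105317, 7], dtype=np.int64)

# Number of digits reserved for the settlement ID, as in len_Sett.
len_sett = len(str(sett_id.max()))
factor = 10 ** len_sett

# Encode: the year occupies the leading digits, the settlement ID the trailing ones.
serial = year * factor + sett_id

# Decode: quotient is the year, remainder is the settlement ID.
year_back, sett_back = np.divmod(serial, factor)

# The largest serial must fit in the output raster's data type (uint32 assumed here).
fits_uint32 = serial.max() <= np.iinfo(np.uint32).max
print(serial, year_back, sett_back, fits_uint32)
```

The data-type check matters because a four-digit year combined with a long settlement ID can exceed what a 32-bit raster band can hold.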

### 10.5 Vectorize serialized layers.

In [None]:
FloodShps = []

for Raster in SettRasters:
    InPath = os.path.join(ProjectFolder, 'Hazard', Raster)
    OutName = Raster.replace('.tif', '')
    OutPath = os.path.join(ProjectFolder, 'Hazard', ''.join([OutName, '.shp']))  # str.join takes a single iterable
    
    RasterToShapefile(InPath, OutPath, OutName=OutName, VariableName='gridcode', Driver = 'ESRI Shapefile')
    
    FloodShps = FloodShps + [''.join([OutName, '.shp'])]

print('Done with list. New flood set: %s. %s' % (FloodShps, time.ctime()))

### 10.6 Vector math to split raster strings back out into df variables.

In [None]:
# Load newly created vectorized datasets.
GRID3_ADM = gpd.read_file(r"Settlement/GRID3_ADM.shp")
WSFE_ADM = gpd.read_file(r"Buildup/WSFE_ADM.shp")
print(GRID3_ADM.info(), "\n\n", GRID3_ADM.sample(10), "\n\n", GRID3_ADM.crs, "\n\n", 
      WSFE_ADM.info(), "\n\n", WSFE_ADM.sample(10), "\n\n", WSFE_ADM.crs, "\n\n", 
      GRID3_ADM['gridcode'].max(), WSFE_ADM['gridcode'].max())

In [None]:
# Split serial back into separate dataset fields.
# For example, Burkina: WSFE and ADM: 4+3=7 digits. GRID3 and ADM: 6+3=9 digits.

G3_Fill = len_G3 + len_ADM
WSFE_Fill = 4 + len_ADM

GRID3_ADM['gridstring'] = GRID3_ADM['gridcode'].astype(str).str.zfill(G3_Fill)
WSFE_ADM['gridstring'] = WSFE_ADM['gridcode'].astype(str).str.zfill(WSFE_Fill)

GRID3_ADM['Sett_ID'] = GRID3_ADM['gridstring'].str[:-len_ADM].astype(int) # Remove the last len_ADM digits (3 for Burkina) to get the GRID3 portion.
GRID3_ADM['ADM_ID'] = GRID3_ADM['gridstring'].str[-len_ADM:].astype(int) # Keep only the last len_ADM digits to get the ADM portion.
WSFE_ADM['year'] = WSFE_ADM['gridstring'].str[:-len_ADM].astype(int)
WSFE_ADM['ADM_ID'] = WSFE_ADM['gridstring'].str[-len_ADM:].astype(int)

print(GRID3_ADM.sample(10), WSFE_ADM.sample(10))

### 10.7 Generate vector spatial object from built-up cells and from flooded-only subset of built-up cells.

In [None]:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')[['Sett_ID', 'geometry']]

# Create a dataframe on which to merge results.
AllSummaries = pd.DataFrame(Settlements).drop(columns='geometry') 
print(AllSummaries.sample(10))

IntersectRasters = ['FD_1in20_WSFE.tif', 'FD_1in100_WSFE.tif', 'FD_1in1000_WSFE.tif']

print(WSFEYears, '\n', AllStudyYears)

##### All build-up

In [None]:
### STEP 1: TIF TO DATAFRAME OBJECT ###
Raster = 'WSFE_reclass.tif'
InPath = os.path.join(ProjectFolder, 'Buildup', Raster)
OutCSVPath = ''.join([r'Buildup/', Raster.replace('.tif', '.csv')])

if exists(OutCSVPath):
    print('Already produced points for %s. Moving on to aggregation. %s' % (Raster, time.ctime()))
    ValDF_withID = pd.read_csv(OutCSVPath)
else:
    print('Loading data for %s. %s \n' % (Raster, time.ctime()))
    ValDF = rioxarray.open_rasterio(InPath).to_dataframe("year").reset_index()
    ValDF = ValDF[ValDF.year >= 1985]  # Making sure we only retain cells with buildup years.
    ValDF = ValDF[ValDF.year <= 2015]

    print('---Created pandas dataframe. %s \n' % time.ctime())


### STEP 2: GENERATE GEODATAFRAME WITH SETT_ID FIELD ###
    ValDF = gpd.GeoDataFrame(ValDF,
                              geometry = gpd.points_from_xy(ValDF['x'], ValDF['y']),
                              crs = 'EPSG:4326')[['year', 'geometry']]
    print('---Created geodataframe from non-NoData points. %s \n' % time.ctime())

    # Remember to reproject our new raster-derived geodataframe into the same equal area projection as Settlements.
    ValDF = ValDF.to_crs("ESRI:102022")

    # Sjoin_nearest: No need to group by ADM this time. 
    ValDF_withID = gpd.sjoin_nearest(ValDF, 
                                    Settlements, 
                                    how='left') # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.

    print('---Joined settlement ID onto vectorized raster cells. %s \n' % time.ctime())
    try:
        print(ValDF_withID.sample(5))
    except ValueError:
        pass
    del ValDF

    # We no longer need the spatial information of the raster values because we have their unique settlement ID.
    ValDF_withID = pd.DataFrame(ValDF_withID).drop(columns='geometry')

    ValDF_withID.to_csv(OutCSVPath)
    print('---Exported as table. %s \n' % time.ctime())


### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###

# AllStudyYears are our variables (1999-2015), but we'll pull cells from the start of WSFEYears. (1985)
for item in AllStudyYears: 
    print('------Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
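    # inclusive='both' replaces the boolean form, which was deprecated in pandas 1.3 and removed in 2.0.
    YearSubset = ValDF_withID[ValDF_withID['year'].between(1985, item, inclusive='both')]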
    VariableName = ''.join(['WSFE', str(item)])

    ValAggregated = YearSubset.groupby(
        'Sett_ID', as_index=False)['year'].count().rename(columns={'year': VariableName})
    print('\n------Built-up cells by that year per settlement counted, year %s. %s \n' % (item, time.ctime()))

    try:
        print(ValAggregated.sample(5))
    except ValueError:
        pass
    AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
    print(AllSummaries.sample(5))

    del VariableName, ValAggregated, YearSubset
del ValDF_withID
print('\n\n')


print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())

AllSummaries.to_csv(os.path.join(ResultsFolder, 'WSFEcells%sto%s.csv' % (1999, 2015)))
print('Saved to file. %s \n' % time.ctime())

In [None]:
AllSummaries = pd.read_csv(os.path.join(ResultsFolder, 'WSFEcells1999to2015.csv'))

##### Flooded build-up

In [None]:
for Raster in IntersectRasters:
    
### STEP 1: TIF TO DATAFRAME OBJECT ###
    InPath = os.path.join(ProjectFolder, 'Hazard', Raster)
    OutCSVPath = ''.join([r'Hazard/', Raster.replace('.tif', '.csv')])
    
    if exists(OutCSVPath):
        print('Already produced points for %s. Moving on to aggregation. %s' % (Raster, time.ctime()))
        ValDF_withID = pd.read_csv(OutCSVPath)
    else:
        print('Loading data for %s. %s \n' % (Raster, time.ctime()))
        ValDF = rioxarray.open_rasterio(InPath, dtype='uint32').to_dataframe("year").reset_index()
#         ValDF = ValDF[ValDF.year >= 1985] # Making sure we only retain cells with buildup years.
#         ValDF = ValDF[ValDF.year <= 2015] 
        print('---Created pandas dataframe. %s \n' % time.ctime())


### STEP 2: GENERATE GEODATAFRAME WITH SETT_ID FIELD ###
        ValDF = gpd.GeoDataFrame(ValDF,
                                  geometry = gpd.points_from_xy(ValDF['x'], ValDF['y']),
                                  crs = 'EPSG:4326')[['year', 'geometry']]
        print('---Created geodataframe from non-NoData points. %s \n' % time.ctime())

        # Remember to reproject our new raster-derived geodataframe into the same equal area projection as Settlements.
        ValDF = ValDF.to_crs("ESRI:102022")

        # Sjoin_nearest: No need to group by ADM this time. 
        ValDF_withID = gpd.sjoin_nearest(ValDF, 
                                        Settlements, 
                                        how='left') # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.

        print('---Joined settlement ID onto vectorized raster cells. %s \n' % time.ctime())
        try:
            print(ValDF_withID.sample(5))
        except ValueError:
            pass
        del ValDF

        # We no longer need the spatial information of the raster values because we have their unique settlement ID.
        ValDF_withID = pd.DataFrame(ValDF_withID).drop(columns='geometry')

        ValDF_withID.to_csv(OutCSVPath)
        print('---Exported as table. %s \n' % time.ctime())

    
### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###

    # AllStudyYears are our variables (1999-2015), but we'll pull cells from the start of WSFEYears. (1985)
    for item in AllStudyYears: 
        print('------Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
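        # inclusive='both' replaces the boolean form, which was deprecated in pandas 1.3 and removed in 2.0.
        YearSubset = ValDF_withID[ValDF_withID['year'].between(1985, item, inclusive='both')]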
        FloodedVariable = ''.join([Raster.replace('FD_', 'FD').replace('WSFE.tif',''), str(item)])
        WSFEVariable = ''.join(['WSFE', str(item)])
        
        ValAggregated = YearSubset.groupby(
            'Sett_ID', as_index=False)['year'].count().rename(columns={'year': FloodedVariable})
        print('\n------Cells flooded by that year per settlement counted, year %s. %s \n' % (item, time.ctime()))
        
        try:
            print(ValAggregated.sample(5))
        except ValueError:
            pass
        AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
        AllSummaries[FloodedVariable] = AllSummaries[FloodedVariable] / AllSummaries[WSFEVariable] # Flooded cell count as a fraction of all built-up cells that year (the per-cell area cancels out).
        print(AllSummaries.sample(5))
        
        del FloodedVariable, WSFEVariable, ValAggregated, YearSubset
    del ValDF_withID
    print('\n\n')


print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())

AllSummaries = AllSummaries.loc[:, ~AllSummaries.columns.str.startswith('WSFE')] # Drop WSFE columns
AllSummaries.to_csv(os.path.join(ResultsFolder, 'Flood%sto%s.csv' % (1999, 2015)))
print('Saved to file. %s \n' % time.ctime())

In [None]:
AllSummaries.sort_values('FD1in20_2012', ascending=False).head(20)

In [None]:
ValDF_withID = ValDF = None

# SCRATCH

In [None]:
for YearFile in AnnualizedMaskedFiles:
    
### STEP 1: TIF TO DATAFRAME OBJECT ###
    InputRasterPath = os.path.join(ProjectFolder, "Population", YearFile)
    Year = str(re.sub(r'[^0-9]', '', YearFile))
    
    print('Loading data for %s. %s \n' % (YearFile, time.ctime()))
    
    ValDF = rioxarray.open_rasterio(InputRasterPath).to_dataframe("value").reset_index()
    ValDF = ValDF[ValDF.value < 100000] # Excluding NoData values by choosing a number far greater than the highest valid value.
    ValDF = ValDF[ValDF.value > 0] 

    print('Created pandas dataframe from %s. %s \n' % (YearFile, time.ctime()))
    
    
### STEP 2: GENERATE GEODATAFRAME WITH SETT_ID FIELD ###
    ValDF = gpd.GeoDataFrame(ValDF,
                              geometry = gpd.points_from_xy(ValDF['x'], ValDF['y']),
                              crs = 'EPSG:4326')[['value', 'geometry']]
    print('Created geodataframe from non-NoData points, %s. %s \n' % (YearFile, time.ctime()))

    # Remember to reproject our new raster-derived geodataframe into the same equal area projection as Settlements.
    ValDF = ValDF.to_crs("ESRI:102022")
    
    # Sjoin_nearest: No need to group by ADM this time. 
    ValDF_withID = gpd.sjoin_nearest(ValDF, 
                                    Settlements, 
                                    how='left') # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.
    
    print('\nJoined settlement ID onto vectorized raster cells for %s. %s \n' % (YearFile, time.ctime()))
    try:
        print(ValDF_withID.sample(10))
    except ValueError:
        pass
    del ValDF
    
    # We no longer need the spatial information of the raster values because we have their unique settlement ID.
    ValDF_withID = pd.DataFrame(ValDF_withID).drop(columns='geometry')

    ValDF_withID.to_csv(''.join([r'Population/', 'Masked_', Year, '.csv']))
    print('\nExported as table. %s. %s \n' % (YearFile, time.ctime()))
    

### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###
    VariableName = ''.join(['PopSum', Year])
    
    # The raster values were loaded into the "value" column in Step 1.
    ValAggregated = ValDF_withID.groupby('Sett_ID', 
                                      as_index=False)['value'].sum().rename(columns={"value": VariableName})
    print('\nValues aggregated to settlement level, year %s. %s \n' % (Year, time.ctime()))
    print(ValAggregated.sample(10))
    
    AllSummaries = AllSummaries.merge(ValAggregated, how='left', on='Sett_ID')
    print('\nMerged year %s onto latest year settlement feature layer. %s \n' % (Year, time.ctime()))
    print(AllSummaries.sample(10))
    
    del ValDF_withID, ValAggregated
    print('\n\n')
    

print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())

AllSummaries.to_csv(os.path.join(ResultsFolder, 'Pop%sto%s.csv' % (2000, 2015)))
print('Saved to file. %s \n' % time.ctime())

### 10.2 Polygonize flood data.

In [None]:
OutDriver = ogr.GetDriverByName("ESRI Shapefile")
OutName = Raster.replace('.tif','')
OutPath = os.path.join(ProjectFolder, 'Hazard', ''.join([OutName, '.shp']))
print(OutDriver, OutName, OutPath)

In [None]:
SpatRef = osr.SpatialReference()
Proj = InObject.GetProjectionRef()
SpatRef.ImportFromWkt(Proj)
print(Proj, '\n\n', SpatRef)

In [None]:
OutFile = OutDriver.CreateDataSource(OutPath)
OutLayer = OutFile.CreateLayer(OutName, srs = SpatRef, geom_type=ogr.wkbPolygon)
OutField = ogr.FieldDefn("Flooded", ogr.OFTInteger)
OutLayer.CreateField(OutField)
OutField = OutLayer.GetLayerDefn().GetFieldIndex("Flooded")

In [None]:
print(OutFile, '\n', OutLayer, '\n', OutField)

In [None]:
print('Vectorizing. %s' % time.ctime())
gdal.Polygonize(InBand, None, OutLayer, 0, [], callback=None)
print('Completed polygons for %s. %s' % (Raster, time.ctime()))

del InObject
del OutFile

print('Finished. %s' % time.ctime())