# Spatiotemporal Trends in Urbanization
## Overview
This repository investigates year-by-year change in cities and settlements in Central and Western Africa (CWA). The goal is to capture activity for every settlement locality in a country to produce indicators that are high frequency, spatially granular, and timely. The Jupyter Notebook is the primary script used to construct each country's dataset. It tracks population, built-area, and economic and climate indicators across a 16-year timeframe from 2000 to 2015. 

The repository is split into three sections: methodology and notebooks, source data, and outputs. Outputs are organized by country and include growth tables ("urban panel datasets"), charts, and country briefs.


## Datasets
Datasets used to create each African country's urban panel data are as follows:
1. Most up-to-date administrative boundaries.
2. City names: **UCDB, Africapolis, and GeoNames.**
3. Settlement types: **GRID3 settlement extents.** Captured between 2009-2019.
4. Built-up area, yearly: **World Settlement Footprint Evolution.** Resolution: 30m.
5. Population, yearly: **WorldPop.** UN-adjusted, unconstrained. Resolution: 100m.
6. Nighttime lights, yearly: **Harmonization of DMSP and VIIRS.** Resolution: 1km and 500m.
7. Flood extents, by return period: **FATHOM.** Resolution: 90m.


## Accessing Data
Source data are available to the public by providers listed in the previous section, with the exception of flood data. Please note that the source data files in this repository have been fit for purpose and may not cover your area of interest. Some sources are also not global; GRID3 settlement extents are only available for sub-Saharan Africa, and Africapolis names for Africa.

Results from the analysis are currently available for Cameroon and are under development for Central African Republic and four Sahel countries: Burkina Faso, Chad, Mali, and Niger. Results are available in the outputs folder by country. Please contact the CWA Geospatial team to inquire about new locations.
<br>
> **Walker Kosmidou-Bradley**, wkosmidoubradley@worldbank.org
<br>
> **Grace Doherty**, gdoherty2@worldbank.org

## License
Materials under this repository are open-source under an MIT license. The community is invited to test, adapt, and re-purpose materials as needed.

---

## 1. PREPARE WORKSPACE

### 1.1 Off-script

##### Off-script: Create folders in your working directory. (The folder where you are storing this script).
> *ADM
<br>Buildup
<br>PlaceName
<br>Population
<br>Settlement
<br>NTL*

##### Before starting: Download datasets (as shapefile, GeoJSON, or tif where possible) and place or extract into corresponding folder. You can download the cleaned files from our [GitHub Repository](https://github.com/worldbank/Urban_Spatio_Temporal_Trends) or access original sources here:
- ADM: *Varies by source.*
- Buildup: https://download.geoservice.dlr.de/WSF_EVO/files/
- PlaceName: 
    - GeoNames: (file: cities500.zip) https://download.geonames.org/export/dump/
    - Africapolis: https://africapolis.org/en/data
    - Urban Centres Database: https://ghsl.jrc.ec.europa.eu/ghs_stat_ucdb2015mt_r2019a.php
- Population: https://hub.worldpop.org/geodata/listing?id=69
- Settlement: https://data.grid3.org/datasets/GRID3::grid3-cameroon-settlement-extents-version-01-01-/explore
- Nighttime Lights: https://eogdata.mines.edu/products/dmsp/#v4 and https://eogdata.mines.edu/products/vnl/#annual_v2

##### Other off-script:
- Convert GeoNames from .txt file to shape (delimiter = tab, header rows = 0) and rename fields.
- If necessary, mosaic WSFE rasters that cover the area of interest to create a single file.

### 1.2 Load all packages.

In [2]:
# Built-in:
# dir(), print(), range(), format(), int(), len(), list(), max(), min(), zip(), sorted(), sum(), open(), del, = None, try except, with as, for in, if elif else
# Also: list.append(), list.insert(), list.remove(), count(), startswith(), endswith(), contains(), replace()

import os, sys, glob, re, time, subprocess, string # os.getcwd(), os.path.join(), os.listdir(), os.remove(), time.ctime(), glob.glob(), string.zfill(), string.join()
from os.path import exists # exists()
from functools import reduce # reduce()

import geopandas as gpd # read_file(), GeoDataFrame(), sjoin_nearest(), to_crs(), to_file(), .crs, buffer(), dissolve()
import pandas as pd # .dtypes, Series(), concat(), DataFrame(), read_table(), merge(), to_csv(), .loc[], head(), sample(), astype(), unique(), rename(), between(), drop(), fillna(), idxmax(), isna(), isin(), apply(), info(), sort_values(), notna(), groupby(), value_counts(), duplicated(), drop_duplicates()
from shapely.geometry import Point, LineString, Polygon, shape, MultiPoint
from shapely.ops import cascaded_union
from shapely.validation import make_valid  # in apply(make_valid)
import shapely.wkt

import numpy as np # median(), mean(), tolist(), .inf
import fiona, rioxarray # fiona.open()
import rasterio # open(), write_band(), .name, .count, .width, .height. nodatavals, .meta, update(), copy(), write()
from rasterio.plot import show
from rasterio import features # features.rasterize()
from rasterio.features import shapes
from rasterio import mask # rasterio.mask.mask()
from rasterio.enums import Resampling # rasterio.enums.Resampling()
from osgeo import gdal, osr, ogr, gdal_array, gdalconst # Open(), SpatialReference, WarpOptions(), Warp(), GetDataTypeName(), GetRasterBand(), GetNoDataValue(), Translate(), GetProjection(), GetAttrValue()

### 1.3 Set up workspace.

In [3]:
ProjectFolder = os.getcwd()
ResultsFolder = os.path.join(ProjectFolder, 'Results')
print(ProjectFolder)
print(ResultsFolder)

Q:\GIS\povertyequity\urban_growth\Chad
Q:\GIS\povertyequity\urban_growth\Chad\Results


In [4]:
def ListFromRange(r1, r2):
    return [item for item in range(r1, r2+1)]

In [5]:
WSFEYears = ListFromRange(1985, 2015) # All years in the WSFE dataset.
AllStudyYears = ListFromRange(1999, 2015) # All years for which there will be growth stats in the present study.
ReversedStudyYears = []
for i in AllStudyYears:
    ReversedStudyYears.insert(0,i)
print(WSFEYears, '\n\n', AllStudyYears, '\n\n', ReversedStudyYears)

[1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015] 

 [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015] 

 [2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999]


### 1.4 User-defined functions.

In [6]:
# From Stack Exchange @RutgerH
# https://gis.stackexchange.com/questions/163685/reclassify-a-raster-value-to-9999-and-set-it-to-the-nodata-value-using-python-a
def readRaster(filename):
    filehandle = gdal.Open(filename)
    band1 = filehandle.GetRasterBand(1)
    geotransform = filehandle.GetGeoTransform()
    geoproj = filehandle.GetProjection()
    Z = band1.ReadAsArray()
    xsize = filehandle.RasterXSize
    ysize = filehandle.RasterYSize
    return xsize,ysize,geotransform,geoproj,Z

In [7]:
# Default arguments can be changed here, or can be specified below when running the functions.
def writeRaster(filename,geotransform,geoprojection,data, NoDataVal=0, dst_datatype=gdal.GDT_UInt32):
    (x,y) = data.shape
    Dformat = "GTiff"
    driver = gdal.GetDriverByName(Dformat)
    # you can change the dataformat but be sure to be able to store negative values including -9999
    dst_ds = driver.Create(filename,y,x,1,dst_datatype)
    dst_ds.GetRasterBand(1).WriteArray(data)
    dst_ds.SetGeoTransform(geotransform)
    dst_ds.SetProjection(geoprojection)
    dst_ds.GetRasterBand(1).SetNoDataValue(NoDataVal)
    return 1
    dst_ds = None

In [8]:
# Based on Stack Exchange @Kurt Schwehr:
# https://stackoverflow.com/questions/10454316/how-to-project-and-resample-a-grid-to-match-another-grid-with-gdal-python
def resampleRaster(InRaster_Path, MatchRaster_Path, OutFile_Path, 
                   DataType = gdalconst.GDT_UInt32, 
                   ResampType = gdal.GRA_Bilinear, NoDataVal = 0):
    print('Loading for %s. %s' % (InRaster_Path, time.ctime()))
    
    RasterObject = gdal.Open(InRaster_Path)
    In_proj = RasterObject.GetProjection()
    [Match_x, Match_y, Match_geo, Match_proj, Match_Z] = readRaster(MatchRaster_Path)
    print('---Specs to match to: \n', 
      Match_proj, '\n', Match_geo, '\n', Match_x, '\n', Match_y, '\n')
        
    OutFile = gdal.GetDriverByName('GTiff').Create(OutFile_Path, Match_x, Match_y, 1, DataType)
    OutFile.SetGeoTransform(Match_geo)
    OutFile.SetProjection(Match_proj)
    print('---Created raster file for upsampled version. %s' % time.ctime())
    
    gdal.ReprojectImage(RasterObject, OutFile, In_proj, Match_proj, ResampType)
    print('---Resampled values onto an empty raster matching the dimensions of the buildup layer. %s \n\n' % time.ctime())
    
    OutFile.GetRasterBand(1).SetNoDataValue(NoDataVal)
    
    RasterObject = Outfile = None
    return 1

In [9]:
def calcShell(A, OutFile, Calculation, OutType = '', 
              B=None, C=None, D=None, E=None, F=None, G=None):
    """Raster math using gdal_calc.py.

    The OSgeo package for Python API does not make raster calculations
    easy outside of the shell. This function plugs up to 6 raster files
    into a string which subprocess.call() then commits to the terminal.

        A : str
            File path to the first raster for the calculation.
        B : str
            File path to the second raster for the calculation.
        OutFile : str
            File path where to store the raster generated from the calculation.
        Calculation : str
            Algebra that uses A and B to create a new raster. Use double quotes.
    """
    print('Running for %s. %s' % (A, time.ctime()))
    cmd = 'gdal_calc.py -A ' + A
    if B is not None:
        cmd = cmd + ' -B ' + B 
    if C is not None:
        cmd = cmd + ' -C ' + C 
    if D is not None:
        cmd = cmd + ' -D ' + D
    if E is not None:
        cmd = cmd + ' -E ' + E
    if F is not None:
        cmd = cmd + ' -F ' + F
    if G is not None:
        cmd = cmd + ' -G ' + G
    cmd = cmd + OutType + ' --outfile=' + OutFile + ' --overwrite --calc=' + Calculation
    subprocess.call(cmd, shell=True)
    cmd = A = B = C = D = E = F = G = None
    print('Ran in shell. See OutFile folder to inspect results. %s' % time.ctime())

In [10]:
def mosaicShell(A, B, OutFile, Band = 1, OutType = '',
                  C=None, D=None, E=None, F=None, G=None):
    print('Running for %s. %s' % (A, time.ctime()))
    
    StringFiles = ' '.join([A,B])
    
    for RasterName in [C,D,E,F,G]:
        if RasterName is not None:
            StringFiles = ' '.join([StringFiles, RasterName])
        else:
            pass
        
    cmd = 'gdal_merge.py -o ' + OutFile + OutType + ' -of gtiff ' + StringFiles
    
    subprocess.call(cmd, shell=True)
    print('Ran in shell. See OutFile folder to inspect results. %s' % time.ctime())

In [11]:
def RasterToShapefile(InRasterPath, OutFilePath = 'RastToShp.shp', Band=1, 
                      OutName='RastToShp', VariableName='value', Driver = 'ESRI Shapefile'):
    """Raster tiff to vector polygon shapefile.
    Can also be used for other file types like geopackage, but note that this code
    currently does not account for writing into an existing file. It will write over
    the file if specified as the file path.
    
    """
    Raster = gdal.Open(InRasterPath)
    RasterBand = Raster.GetRasterBand(Band)
    
    OutDriver = ogr.GetDriverByName(Driver)
    InProj = Raster.GetProjectionRef()
    SpatRef = osr.SpatialReference()
    SpatRef.ImportFromWkt(InProj)
    print(InProj, '\n\n', SpatRef)
    
    if exists(OutFilePath):
        OutFile = ogr.Open(OutFilePath)
    else:
        OutFile = OutDriver.CreateDataSource(OutFilePath)
    OutLayer = OutFile.CreateLayer(OutName, srs = SpatRef, geom_type = ogr.wkbPolygon)
    OutField = ogr.FieldDefn(VariableName, ogr.OFTInteger)
    OutLayer.CreateField(OutField)
    OutField = OutLayer.GetLayerDefn().GetFieldIndex(VariableName)
    print('\n', OutFile, '\n', OutLayer, '\n', OutField)
    
    print('Vectorizing. Input: %s. %s' % (InRasterPath, time.ctime()))
    gdal.Polygonize(RasterBand, None, OutLayer, 0, [], callback=None)
    print('Completed polygons. Stored as: %s. %s' % (OutFilePath, time.ctime()))

    del Raster, RasterBand, OutFile, OutLayer

In [12]:
def rioStats(InRasterPath, Band = 1):
    out = rasterio.open(InRasterPath)
    stats = []
    band = out.read(Band)
    stats.append({
        'raster': out.name,
        'bands': out.count,
        'data type': out.dtypes,
        'no data value': out.nodatavals,
        'width': out.width,
        'height': out.height,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})
    print("\n", stats)
    
    out = band = None

In [13]:
def ShapeToRaster(Shapefile, ValueVar, MetaRasterPath, OutFilePath = 'ShpToRast.tif', Band=1, NewDType=None):
    """
    Polygon spatial object to raster tiff.
    """
    # Copy and update the metadata from another raster for the output
    MetaRaster = rasterio.open(MetaRasterPath)
    meta = MetaRaster.meta.copy()
    meta.update(compress='lzw')
    if NewDType is not None:
        meta.update(dtype=NewDType)
    MetaRaster.meta

    print("Rasterizing dataset. %s" % time.ctime())
    with rasterio.open(OutFilePath, 'w+', **meta) as out:
        out_arr = out.read(Band)

        # this is where we create a generator of geom, value pairs to use in rasterizing
        shapes = ((geom,value) for geom, value in zip(Shapefile.geometry, Shapefile[ValueVar]))

        burned = features.rasterize(shapes=shapes, fill=0, out=out_arr, transform=out.transform)
        out.write_band(1, burned)
    out = burned = shapes = None
    
    print("Finished rasterizing. Checking contents. %s" % time.ctime())
    rioStats(OutFilePath)

In [14]:
def MaskByZone(MaskPath, SourceFolder, DestFolder, SourceList = None,
               MaskLayerName = None, dstSRS = 'ESRI:102022'):
    """
    Reduces the size of a raster's valid data cells to vector areas of interest.
    This is useful if the raster data needs to be vectorized later to save space.
    
    The script prepares the vector zones as a list of geometries in the desired
    spatial reference system, then warps each raster in the specified source
    folder to the same SRS. Masking in rasterio then reclassifies any raster cells
    falling outside of a mask polygon as NoData.
    """
    
    ProjSRS = osr.SpatialReference()
    ProjSRS.SetFromUserInput(dstSRS)
    ProjWarp = gdal.WarpOptions(dstSRS = dstSRS)
    
    if SourceList is not None:
        SourceFiles = SourceList
    else:
        SourceFiles = []
        SourceFiles = SourceFiles + [i for i in os.listdir(''.join([SourceFolder, r'/'])) if i.endswith('tif')]
        print(SourceFiles)

    
    ### 1. ASSIGN SPATIAL REFERENCE SYSTEM OF VECTOR MASK AND LOAD GEOMETRIES
    Vector = gpd.read_file(filename=MaskPath, layer=MaskLayerName)
    if Vector.crs != dstSRS:
        if MaskLayerName == None:
            MaskPath = MaskPath + '_temp'
        else:
            MaskLayerName = MaskLayerName + '_temp'
        Vector.to_crs(dstSRS).to_file(filename=MaskPath, layer=MaskLayerName)
    Vector = None # We're reloading the geometries with fiona
    
    with fiona.open(MaskPath, mode="r", layer=MaskLayerName) as Vector:
        MaskGeom = [feature["geometry"] for feature in Vector] # Identify the bounding areas of the mask.
    
    
    ### 2. PREPARE DESTINATION FILES
    for FileName in SourceFiles:
    
        InputRasterPath = os.path.join(ProjectFolder, SourceFolder, FileName)
        
        Sensor = re.search('[A-Z]+_', FileName)
        if Sensor is None:
            Sensor = ''
        else:
            Sensor = Sensor.group(0)

        Year = re.search('\d{4}', FileName)
        if Year is None:
            Year = ''
        else:
            Year = Year.group(0)

        if FileName.endswith('avg.tif') == True:
            IndicType = '_avg'
        elif FileName.endswith('cfc.tif') == True:
            IndicType = '_cfc'
        else:
            IndicType = ''

        TempOutputName = 'Temp_' + Sensor + Year + IndicType + '.tif'
        TempOutputPath = os.path.join(ProjectFolder, DestFolder, TempOutputName)
        FinalOutputName = 'Msk_' + Sensor + Year + IndicType + '.tif'
        FinalOutputPath = os.path.join(ProjectFolder, DestFolder, FinalOutputName)

    ### 3. ASSIGN SPATIAL REFERENCE SYSTEM OF RASTER(S)
        InputRasterObject = gdal.Open(InputRasterPath)
        SourceSRS = osr.SpatialReference(wkt=InputRasterObject.GetProjection())
        print('Source projection: ', SourceSRS.GetAttrValue('projcs'))
        print('Destination projection: ', ProjSRS.GetAttrValue('projcs'))

        if SourceSRS.GetAttrValue('projcs') != ProjSRS.GetAttrValue('projcs'):
            Warp = gdal.Warp(TempOutputPath, # Where to store the warped raster
                         InputRasterObject, # Which raster to warp
                         format='GTiff', 
                         options=ProjWarp) # Reproject to Africa Albers Equal Area Conic
            print('Finished gdal.Warp() for %s. %s \n' % (FileName, time.ctime()))

            Warp = None # Close the files
        else:
            pass
        InputRasterObject = None
        
    ### 4. RECLASSIFY AS NODATA IF OUTSIDE OF SETTLEMENT BUFFER ZONE.
        if exists(TempOutputPath):
            NewInputPath = TempOutputPath 
            print("We warped the data, so we'll use that file for next step.")
        else:
            NewInputPath = InputRasterPath 
            print("We skipped the warp, so we continue to use the source file.")

        with rasterio.open(NewInputPath) as InputRasterObject:
            MaskedOutputRaster, OutTransform = rasterio.mask.mask(
                InputRasterObject, MaskGeom, crop=True) # Anything outside the mask is reclassed to the raster's NoData value.
            OutMetaData = InputRasterObject.meta.copy()
        print('Finished rasterio.mask.mask() for %s. %s \n' % (FileName, time.ctime()))

        OutMetaData.update({"driver": "GTiff",
                         "height": MaskedOutputRaster.shape[1],
                         "width": MaskedOutputRaster.shape[2],
                         "transform": OutTransform})

        with rasterio.open(FinalOutputPath, "w", **OutMetaData) as dest:
            dest.write(MaskedOutputRaster)
        print('Written to file. %s \n' % time.ctime())
        InputRasterObject = None

        if exists(TempOutputPath):
            try:  # Finally, remove the intermediate file from disk
                os.remove(TempOutputPath)
            except OSError:
                pass
            print('Removed intermediate file. %s \n' % time.ctime())
        else:
            pass


    print('\n \n Finished all years in list. %s' % time.ctime())

In [15]:
def BatchZonalStats(FolderName, Zones, 
                    CRS = 'ESRI:102022', 
                    JoinField = 'Sett_ID',
                    StatsWanted = ['count', 'sum', 'mean', 'max', 'min'],
                    SeriesStart = 1999, SeriesEnd = 2015, 
                    AnnualizedFiles = None, VarPrefix = None):
    """
    Normally, we would use numpy to generate a point gdf from the raster's matrix. 
    However, I was running into a lot of memory errors with that method.
    This method uses some extra steps: tif to xyz to df to gdf. But it saves to file
    and deletes intermediate files along the way, circumventing memory issues.
    
    Run MaskByZone() prior to reduce the raster to only your area(s) of interest.
    
    """
    if AnnualizedFiles is None:
        AnnualizedFiles = [i for i in os.listdir(FolderName) if i.endswith('.tif')]
    print(AnnualizedFiles)
    AllSummaries = pd.DataFrame(Zones).drop(columns='geometry')[[JoinField]]
    print(AllSummaries)
    
    if VarPrefix is None:
        VarPrefix = FolderName[:3].upper()
    
    for FileName in AnnualizedFiles:
    ### STEP 1: TIF TO XYZ ###
        print('Loading data for %s. %s \n' % (FileName, time.ctime()))
        
        Sensor = re.search('[A-Z]+_', FileName)
        if Sensor is None:
            Sensor = ''
        else:
            Sensor = Sensor.group(0)
            
        Year = re.search('\d{4}', FileName)
        if Year is None:
            Year = ''
        else:
            Year = Year.group(0)
        
        InputRasterPath = os.path.join(ProjectFolder, FolderName, FileName)
        InputRasterObject = gdal.Open(InputRasterPath)
        XYZOutputPath = FolderName + r'/{}'.format(
            FileName.replace('.tif', '.xyz')) # New file path will be the same as original, but .tif is replaced with .xyz

        # Create an .xyz version of the .tif
        if exists(XYZOutputPath):
            print("Already created xyz file.")
        else:
            print("Creating XYZ (gdal.Translate()).")
            XYZ = gdal.Translate(XYZOutputPath, # Specify a destination path
                                 InputRasterObject, # Input is the masked .tif file
                                 format='XYZ', 
                                 creationOptions=["ADD_HEADER_LINE=YES"])
            print('Finished gdal.Translate() for year %s. %s \n' % (Year, time.ctime()))
            XYZ = None # Reload XYZ as a point geodataframe

        InputRasterObject = None


    ### STEP 2: GENERATE GEODATAFRAME WITH JOIN FIELD ###
        InputXYZ = pd.read_table(XYZOutputPath, delim_whitespace=True)
        InputXYZ = InputXYZ.loc[InputXYZ['Z'] > 0] # Subset to only the features that have a value.
        
        if re.search('WSFE', FileName) is not None: # Scale back up to years if working with flood/building data.
            InputXYZ['Z'] = InputXYZ['Z'] + 1900
            
        print('Loaded XYZ file as a pandas dataframe. %s \n' % time.ctime())
        ValObject = gpd.GeoDataFrame(InputXYZ,
                                     geometry = gpd.points_from_xy(InputXYZ['X'], InputXYZ['Y']),
                                     crs = CRS)
        print('Created geodataframe from non-NoData points. %s \n' % time.ctime())
        del InputXYZ

        # Sjoin_nearest: No need to group by ADM this time. 
        ValObject_withID = pd.DataFrame(gpd.sjoin_nearest(ValObject, 
                                        Zones, 
                                        how='left')).drop(columns='geometry')[['Z', JoinField]] # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.

        print('\nJoined zone ID onto vectorized raster cells. %s \n' % time.ctime())
        print(ValObject_withID.sample(10))
        del ValObject

        ValObject_withID.to_csv(''.join([FolderName, r'/', FileName.replace('.tif', '.csv')]))
        print('\nExported as table. %s \n' % time.ctime())

#         # Remove the temporary xyz file.
#         try:  
#             os.remove(os.path.join(XYZOutputPath))
#         except OSError:
#             pass
#         print('Removed (or skipped if error) intermediate xyz file. %s \n' % time.ctime())


    ### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###
        GroupedVals = ValObject_withID[ValObject_withID['Z'].notna()].groupby(JoinField, as_index=False)
        
        # Run this block if the variable is about cloud-free coverage.
        if re.search('cfc', FileName) is not None:
            VariableName = ''.join([VarPrefix, 'cfc_', Sensor, Year])
            AllSummaries = AllSummaries.merge(GroupedVals.mean().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            print('\nCount of cloud-free observations averaged to settlement level, year %s. %s \n' % (Year, time.ctime()))
            
            # Save in-progress results
            AllSummaries.to_csv(os.path.join(ResultsFolder, ''.join([VarPrefix, '%sto%s.csv' % (SeriesStart, SeriesEnd)])))
            print(AllSummaries.sample(10))
        
        # Run this block if we're working with the flooded buildings data.
        elif re.search('WSFE', FileName) is not None:
            for BuiltYear in AllStudyYears:
                Grouped_Subset = GroupedVals[GroupedVals['Z'].between(
                    1985, BuiltYear, inclusive=True)] # Inclusive parameter means we include the years 1985 and the named year.
                if 'count' in StatsWanted:
                    VariableName = ''.join([VarPrefix, 'ct', Sensor, BuiltYear])
                    AllSummaries = AllSummaries.merge(GroupedVals.count().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
                if 'sum' in StatsWanted:
                    VariableName = ''.join([VarPrefix, 'sum', Sensor, BuiltYear])
                    AllSummaries = AllSummaries.merge(GroupedVals.sum().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
                if 'mean' in StatsWanted:
                    VariableName = ''.join([VarPrefix, 'avg', Sensor, BuiltYear])
                    AllSummaries = AllSummaries.merge(GroupedVals.mean().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
                if 'max' in StatsWanted:
                    VariableName = ''.join([VarPrefix, 'max', Sensor, BuiltYear])
                    AllSummaries = AllSummaries.merge(GroupedVals.max().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
                if 'min' in StatsWanted:
                    VariableName = ''.join([VarPrefix, 'min', Sensor, BuiltYear])
                    AllSummaries = AllSummaries.merge(GroupedVals.min().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
                print('\nDesired aggregation methods applied to settlement level, year %s. %s \n' % (Year, time.ctime()))

                # Save in-progress results
                AllSummaries.to_csv(os.path.join(ResultsFolder, ''.join([VarPrefix, '%sto%s.csv' % (SeriesStart, SeriesEnd)])))
                print(AllSummaries.sample(10))
        
        # Anything else takes the standard aggregation method.
        else:
            if 'count' in StatsWanted:
                VariableName = ''.join([VarPrefix, 'ct', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.count().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'sum' in StatsWanted:
                VariableName = ''.join([VarPrefix, 'sum', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.sum().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'mean' in StatsWanted:
                VariableName = ''.join([VarPrefix, 'avg', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.mean().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'max' in StatsWanted:
                VariableName = ''.join([VarPrefix, 'max', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.max().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'min' in StatsWanted:
                VariableName = ''.join([VarPrefix, 'min', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.min().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            print('\nDesired aggregation methods applied to settlement level, year %s. %s \n' % (Year, time.ctime()))
            
            # Save in-progress results
            AllSummaries.to_csv(os.path.join(ResultsFolder, ''.join([VarPrefix, '%sto%s.csv' % (SeriesStart, SeriesEnd)])))
            print(AllSummaries.sample(10))

    
    print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())
    print(AllSummaries.sample(10))
    AllSummaries.to_csv(os.path.join(ResultsFolder, ''.join([VarPrefix, '%sto%s.csv' % (SeriesStart, SeriesEnd)])))
    print('Saved to file. %s \n' % time.ctime())

---

## 2. PREPARE BUILDUP, SETTLEMENT, AND ADMIN DATASETS
Projection for all datasets: Africa Albers Equal Area Conic

### 2.1 Prepare GRID3 and Admin area files

In [20]:
ADM_vec = gpd.read_file(glob.glob('ADM/*.shp')[0])[['geometry']].to_crs("ESRI:102022") # This glob() function pulls the first file ([0]) in the ADM folder which ended in '.shp'
GRID3_vec = gpd.read_file(glob.glob('Settlement/*.shp')[0])[['type','geometry']].to_crs("ESRI:102022")
ADM_vec['ADM_ID'] = range(0,len(ADM_vec))
ADM_vec['ADM_ID'] = ADM_vec['ADM_ID'] + 1 # We have to add 1 if we want our rasterized version's NoData value to be 0. Otherwise the first feature won't be valid.
GRID3_vec['G3_ID'] = range(0,len(GRID3_vec))
GRID3_vec['G3_ID'] = GRID3_vec['G3_ID'] +1
ADM_vec.to_file(driver='GPKG', filename=r'ADM/ADM_withID.gpkg', layer='ADM')
GRID3_vec.to_file(driver='GPKG', filename=r'Settlement/Settlement_withID.gpkg', layer='GRID3')

In [21]:
ADM_vec = gpd.read_file(r'ADM/ADM_withID.gpkg', layer='ADM')
GRID3_vec = gpd.read_file(r'Settlement/Settlement_withID.gpkg', layer='GRID3')

# We need to know how many digits need to be allocated to each dataset in the "join" serial.
len_ADM = len(str(ADM_vec['ADM_ID'].max()))
len_G3 =  len(str(GRID3_vec['G3_ID'].max()))

print(ADM_vec.info(), "\n\n", 
      ADM_vec.sample(5),
      ADM_vec.crs, "\n\n", 
      len_ADM) 
print(GRID3_vec.info(), "\n\n",
      GRID3_vec.sample(5),
      GRID3_vec.crs, "\n\n", 
      len_G3)

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 70 entries, 0 to 69
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   ADM_ID    70 non-null     int64   
 1   geometry  70 non-null     geometry
dtypes: geometry(1), int64(1)
memory usage: 1.2 KB
None 

     ADM_ID                                           geometry
58      59  POLYGON ((-847666.562 2643358.538, -847572.175...
19      20  POLYGON ((-931034.749 1058450.297, -926665.181...
60      61  POLYGON ((-285472.101 1944944.251, -285080.373...
4        5  POLYGON ((-891560.498 1444313.809, -887072.886...
51      52  POLYGON ((-940078.833 1672908.818, -935138.805... PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAM

### 2.2 Reproject WSFE to project CRS, and ensure valid data range is only 1985-2015.

In [20]:
# # If WSFE hasn't been mosaicked yet:
# A=os.path.join(ProjectFolder, 'Buildup', 'Pre_mosaic', 'WSFE1.tif')
# B=os.path.join(ProjectFolder, 'Buildup', 'Pre_mosaic', 'WSFE2.tif')
# C=os.path.join(ProjectFolder, 'Buildup', 'Pre_mosaic', 'WSFE3.tif')
# D=os.path.join(ProjectFolder, 'Buildup', 'Pre_mosaic', 'WSFE4.tif')

# Out=os.path.join(ProjectFolder, 'Buildup', 'WSFE.tif')

# mosaicShell(A=A, B=B, C=C, D=D, OutFile=Out, OutType = ' -ot UInt32 ')

# Out=None

Running for Q:\GIS\povertyequity\urban_growth\Mali\Buildup\Pre_mosaic\WSFE1.tif. Thu Mar  2 10:39:22 2023
Ran in shell. See OutFile folder to inspect results. Thu Mar  2 10:40:15 2023


In [17]:
InPath = glob.glob('Buildup/*.tif')[0]
OutWGS = os.path.join(ProjectFolder, 'Buildup', 'WSFE_reclass.tif')

# Together, x and y define the data's "shape".
# geotransform contains the parameters detailing how the raster should be stretched and aligned.
# geoproj is the map projection
# Z are the values in the raster band.
[xsize,ysize,geotransform,geoproj,Z] = readRaster(InPath)
Z[Z<1985] = 0
Z[Z>2015] = 0

writeRaster(OutWGS,geotransform,geoproj,Z, NoDataVal=0, dst_datatype=gdal.GDT_UInt32)
print('Wrote the reclassed raster to file. %s' % time.ctime())

Wrote the reclassed raster to file. Fri Mar  3 12:08:22 2023


In [18]:
OutEqArea = os.path.join(ProjectFolder, 'Buildup', 'WSFE_equalarea.tif')

# Whenever we want to work in a projected CRS, we'll use Africa Albers Equal Area Conic.
ProjEqArea = gdal.WarpOptions(dstSRS='ESRI:102022')
Warp = gdal.Warp(OutEqArea, # Where to store the warped raster
                 OutWGS, # Which raster to warp
                 format='GTiff', 
                 options=ProjEqArea)
print('Wrote the reclassed and reprojected raster to file. %s' % time.ctime())

Wrote the reclassed and reprojected raster to file. Fri Mar  3 12:09:24 2023


In [19]:
rioStats(OutWGS)
rioStats(OutEqArea)


 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Chad/Buildup/WSFE_reclass.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (0.0,), 'width': 39041, 'height': 60394, 'min': 0, 'mean': 0.521958189572685, 'median': 0.0, 'max': 2015}]

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Chad/Buildup/WSFE_equalarea.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (0.0,), 'width': 37348, 'height': 61455, 'min': 0, 'mean': 0.5220090568694347, 'median': 0.0, 'max': 2015}]


In [20]:
InPath = Warp = OutWGS = OutEqArea = None

---

## 3. WSFE AND ADM; GRID3 AND ADM
RASTERIZE: Bring ADM and GRID3 into raster space.

RASTER MATH: "Join" ADM ID onto GRID3 and onto WSFE by creating unique concatenation string.

VECTORIZE: Bring joined data into vector space.

VECTOR MATH: Split unique ID from raster math step into separate columns.

### 3.1 Rasterize admin areas and GRID3 using WSFE specs.

In [22]:
# Copy and update the metadata from WSFE for the output
WSFE = os.path.join(ProjectFolder, 'Buildup', 'WSFE_equalarea.tif')
ADM_out = os.path.join(ProjectFolder, 'ADM', 'ADM_rasterized.tif')
GRID3_out = os.path.join(ProjectFolder, 'Settlement', 'GRID3_rasterized.tif')

In [23]:
ShapeToRaster(Shapefile=ADM_vec, ValueVar="ADM_ID", MetaRasterPath=WSFE, OutFilePath=ADM_out, NewDType = 'uint16')
ShapeToRaster(GRID3_vec, "G3_ID", WSFE, GRID3_out, NewDType='uint32')

Rasterizing dataset. Fri Mar  3 13:27:53 2023
Finished rasterizing. Checking contents. Fri Mar  3 13:28:49 2023

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Chad/ADM/ADM_rasterized.tif', 'bands': 1, 'data type': ('uint16',), 'no data value': (0.0,), 'width': 37348, 'height': 61455, 'min': 0, 'mean': 25.36022486877017, 'median': 11.0, 'max': 70}]
Rasterizing dataset. Fri Mar  3 13:29:19 2023
Finished rasterizing. Checking contents. Fri Mar  3 13:32:34 2023

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Chad/Settlement/GRID3_rasterized.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (0.0,), 'width': 37348, 'height': 61455, 'min': 0, 'mean': 1171.613932064173, 'median': 0.0, 'max': 353534}]


### 3.2 Raster math to "join" admin to GRID3 and to WSFE.
Processing is more rapid when "joining," i.e. creating serial codes out of two datasets, in raster rather than vector space.
Here, we are concatenating the ID fields of the two datasets to create a serial number that we can then split in vector space later to create two ID fields.

In [30]:
# In paths
InG3 = os.path.join(ProjectFolder, 'Settlement', 'GRID3_rasterized.tif')
InWSFE = os.path.join(ProjectFolder, 'Buildup', 'WSFE_equalarea.tif')
InADM = os.path.join(ProjectFolder, 'ADM', 'ADM_rasterized.tif')

# Out paths
G3_ADM = os.path.join(ProjectFolder, 'Settlement', 'GRID3_ADM.tif')
WSFE_ADM = os.path.join(ProjectFolder, 'Buildup', 'WSFE_ADM.tif')

In [31]:
# Recalculate number of digits for each dataset if starting from Section 3
G3_rio = rasterio.open(InG3).read(1)
WSFE_rio = rasterio.open(InWSFE).read(1)
ADM_rio = rasterio.open(InADM).read(1)

len_G3 = len(str(G3_rio.max()))
len_WSFE = len(str(WSFE_rio.max()))
len_ADM = len(str(ADM_rio.max()))

G3_rio = WSFE_rio = ADM_rio = None
print('GRID3: ', len_G3, '\nWSFE: ', len_WSFE, '\nADM: ', len_ADM)

GRID3:  6 
WSFE:  4 
ADM:  2


In [33]:
# Calculations
# The number of digits in the largest ADM index value (len_ADM) is 
# the number of zeroes we tack onto the first variable in the serial.

Calc = "(A*" + str(10**len_ADM) + ")+B" 

calcShell(A=InG3, B=InADM, OutFile=G3_ADM, Calculation=Calc)
calcShell(A=InWSFE, B=InADM, OutFile=WSFE_ADM, Calculation=Calc)

Running for Q:\GIS\povertyequity\urban_growth\Chad\Buildup\WSFE_equalarea.tif. Fri Mar  3 16:07:32 2023
Ran in shell. See OutFile folder to inspect results. Fri Mar  3 16:10:11 2023


*Adding together the values to create join IDs. This is in effect a concatenation of their ID strings, by way of summation. The number of zeros in the calc multiplication corresponds with number of digits of the maximum value in the "B" dataset. (e.g. Chad ADM codes go up 4 digits, so it's calc=(A*10000)+B).*

In [34]:
rioStats(G3_ADM)
rioStats(WSFE_ADM)


 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Chad/Settlement/GRID3_ADM.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (4294967293.0,), 'width': 37348, 'height': 61455, 'min': 129, 'mean': 4260448031.470159, 'median': 4294967293.0, 'max': 4294967293}]

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Chad/Buildup/WSFE_ADM.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (4294967293.0,), 'width': 37348, 'height': 61455, 'min': 198501, 'mean': 4293842016.965927, 'median': 4294967293.0, 'max': 4294967293}]


### 3.3 Vectorize serialized layers.

In [35]:
G3_in = os.path.join(ProjectFolder, 'Settlement', 'GRID3_ADM.tif')
G3_out = os.path.join(ProjectFolder, 'Settlement', 'GRID3_ADM.shp')
WSFE_in = os.path.join(ProjectFolder, 'Buildup', 'WSFE_ADM.tif')
WSFE_out = os.path.join(ProjectFolder, 'Buildup', 'WSFE_ADM.shp')

In [None]:
RasterToShapefile(G3_in, G3_out, OutName='GRID3_ADM', VariableName='gridcode', Driver = 'ESRI Shapefile')

In [36]:
RasterToShapefile(WSFE_in, WSFE_out, OutName='WSFE_ADM', VariableName='gridcode', Driver = 'ESRI Shapefile')

PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]] 

 PROJCS["Africa_Albers_Equal_Area_Conic",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Albers_Conic_

### 3.4 Vector math to split raster strings into admin area, GRID3, and WSFE year assignments.

In [37]:
# Load newly created vectorized datasets.
GRID3_ADM = gpd.read_file(r"Settlement/GRID3_ADM.shp").to_crs("ESRI:102022")
WSFE_ADM = gpd.read_file(r"Buildup/WSFE_ADM.shp").to_crs("ESRI:102022")
print(GRID3_ADM.info(), "\n\n", GRID3_ADM.sample(10), "\n\n", GRID3_ADM.crs, "\n\n", 
      WSFE_ADM.info(), "\n\n", WSFE_ADM.sample(10), "\n\n", WSFE_ADM.crs, "\n\n", 
      GRID3_ADM['gridcode'].max(), WSFE_ADM['gridcode'].max())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 374089 entries, 0 to 374088
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   gridcode  374089 non-null  int64   
 1   geometry  374089 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 5.7 MB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 127329 entries, 0 to 127328
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   gridcode  127329 non-null  int64   
 1   geometry  127329 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 1.9 MB
None 

           gridcode                                           geometry
300502     9718140  POLYGON ((-609043.637 1167617.022, -608984.515...
360162     3549024  POLYGON ((-837579.703 1017476.719, -837491.020...
189222    15885513  POLYGON ((-915059.075 1487614.811, -914999.953...
60487     34373166  POLYGON ((-915088.636 1646

In [38]:
# Split serial back into separate dataset fields.
# For example, Burkina: WSFE and ADM: 4+3=7 digits. GRID3 and ADM: 6+3=9 digits.
G3_Fill = len_G3 + len_ADM
WSFE_Fill = len_WSFE + len_ADM

GRID3_ADM['gridstring'] = GRID3_ADM['gridcode'].astype(str).str.zfill(G3_Fill)
WSFE_ADM['gridstring'] = WSFE_ADM['gridcode'].astype(str).str.zfill(WSFE_Fill)

GRID3_ADM['Sett_ID'] = GRID3_ADM['gridstring'].str[:-len_ADM].astype(int) # Remove the last 3 digits to get the GRID3 portion.
GRID3_ADM['ADM_ID'] = GRID3_ADM['gridstring'].str[-len_ADM:].astype(int) # Keep only the last 3 digits to get the ADM portion.
WSFE_ADM['year'] = WSFE_ADM['gridstring'].str[:-len_ADM].astype(int)
WSFE_ADM['ADM_ID'] = WSFE_ADM['gridstring'].str[-len_ADM:].astype(int)

print(GRID3_ADM.sample(10), WSFE_ADM.sample(10))

        gridcode                                           geometry  \
2397    35224759  POLYGON ((-857710.741 2333000.194, -857622.058...   
269332  11012956  POLYGON ((-218454.188 1324290.304, -218365.505...   
273656   6974705  POLYGON ((-974092.385 1309480.245, -974003.702...   
350540   3719724  POLYGON ((-850143.126 1044436.348, -850054.443...   
124163  27054202  POLYGON ((-556099.892 1571627.163, -556040.770...   
153883  21528643  POLYGON ((-321326.456 1538371.042, -321237.773...   
334694   2901047  POLYGON ((-890493.887 1086708.574, -890434.765...   
50123   29579141  POLYGON ((-427154.824 1664478.254, -427066.141...   
181079  20591742  POLYGON ((-380803.182 1500473.844, -380684.938...   
106369  15088119  POLYGON ((-1028159.448 1589245.517, -1028041.2...   

       gridstring  Sett_ID  ADM_ID  
2397     35224759   352247      59  
269332   11012956   110129      56  
273656   06974705    69747       5  
350540   03719724    37197      24  
124163   27054202   270542       

In [39]:
# Dissolve any features that have the same G3 and ADM values so that we have a single unique feature per settlement.
# Note: we do NOT want to dissolve the WSFE features. Distinct features for noncontiguous builtup areas of the same year is necessary to separate them in the Near tool step.
print(time.ctime())
GRID3_ADM = GRID3_ADM.dissolve(by=['Sett_ID', 'ADM_ID'], as_index=False)
print(GRID3_ADM.info(), GRID3_ADM.head(), "\n\n", time.ctime())

Fri Mar  3 16:12:49 2023
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 354287 entries, 0 to 354286
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   Sett_ID     354287 non-null  int64   
 1   ADM_ID      354287 non-null  int64   
 2   geometry    354287 non-null  geometry
 3   gridcode    354287 non-null  int64   
 4   gridstring  354287 non-null  object  
dtypes: geometry(1), int64(3), object(1)
memory usage: 13.5+ MB
None    Sett_ID  ADM_ID                                           geometry  \
0        1      29  POLYGON ((-971431.896 905410.981, -971313.652 ...   
1        2      29  POLYGON ((-967766.332 914870.500, -967677.649 ...   
2        3      29  POLYGON ((-940008.556 930537.828, -939949.434 ...   
3        4      29  MULTIPOLYGON (((-958484.179 937366.418, -95848...   
4        5      29  POLYGON ((-903205.115 999740.121, -903086.871 ...   

   gridcode gridstring  
0       129   00000129 

In [40]:
# Remove features where year, settlement, or admin area = 0.
# This was supposed to be resolved earlier with the gdal_calc NoDataValue parameter.

print("Before: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))
WSFE_ADM = WSFE_ADM.loc[(WSFE_ADM["year"] != 0) & (WSFE_ADM["ADM_ID"] != 0)] # Since we change the datatype to integer, no need to include all digits. Otherwise, it would need to be: != '0000'
GRID3_ADM = GRID3_ADM.loc[(GRID3_ADM["Sett_ID"] != 0) & (GRID3_ADM["ADM_ID"] != 0)]
print("After: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))

Before: WSFE (127329, 5) and GRID3 (354287, 5)

After: WSFE (127329, 5) and GRID3 (354287, 5)



In [41]:
# The Bounded_ID is our new unique settlement identifier for subsequent matching steps.
GRID3_ADM['Bounded_ID'] = GRID3_ADM.index
WSFE_ADM['WSFE_ID'] = WSFE_ADM.index
GRID3_ADM = GRID3_ADM[['Sett_ID', 'Bounded_ID', 'ADM_ID', 'geometry']]
WSFE_ADM = WSFE_ADM[['WSFE_ID', 'year', 'ADM_ID', 'geometry']]

In [42]:
# Validation: 
# The first two printed numbers should be the same. There shouldn't be any GRID3 rows with matching Sett_ID and ADM_IDs.
# The latter two numbers should be different, and the first should be larger. We never dissolved WSFE by any column.

print(len(GRID3_ADM[['Sett_ID', 'ADM_ID']]),
      len(GRID3_ADM[['Sett_ID', 'ADM_ID']].drop_duplicates()),
      len(WSFE_ADM[['year', 'ADM_ID']]),
      len(WSFE_ADM[['year', 'ADM_ID']].drop_duplicates()))

354287 354287 127329 1698


In [43]:
GRID3_ADM.to_file(
    driver='GPKG', filename='Settlement/GRID3_ADM.gpkg', layer='GRID3_ADM_cleaned')
WSFE_ADM.to_file(
    driver='GPKG', filename=r'Buildup/WSFE_ADM.gpkg', layer='WSFE_ADM_cleaned')

---

## 4. UNIQUE SETTLEMENTS FROM WSFE AND GRID3: TWO VERSIONS

Note that there are 2 versions here, so that we can create a fragmentation index:
1. **Boundless, aka boundary-agnostic settlements**: Unique settlements are linked to GRID3 settlement IDs. Administrative areas do not influence the extents of the settlement.
2. **Bounded, aka politically-defined settlements**: Settlements in the Boundless dataset which spread across more than one administrative area are split into separate settlements in the Bounded dataset. The largest polygon after the split is considered the "principal" settlement, and polygons in other admin areas are considered "fragments." By dividing the fragment area(s) of the Bounded settlement by the area of the Boundless settlement, we can acquire a fragmentation index for each locality.

### 4.1 BOUNDED SETTLEMENTS: Near Join by ADM group.

In [44]:
print("Number of admin areas with GRID3 features: %s" % len(GRID3_ADM['ADM_ID'].unique().tolist()))
print("Number of admin areas with WSFE features: %s" % len(WSFE_ADM['ADM_ID'].unique().tolist()))
print("Number of admin areas where one dataset is observed but the other is not: %s" % (
    len(GRID3_ADM['ADM_ID'].unique().tolist()) - len(WSFE_ADM['ADM_ID'].unique().tolist())))

Number of admin areas with GRID3 features: 70
Number of admin areas with WSFE features: 68
Number of admin areas where one dataset is observed but the other is not: 2


In [45]:
ADM_IDs = sorted(GRID3_ADM['ADM_ID'].unique().tolist())
ADM_IDs

[1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50,
 51,
 52,
 53,
 54,
 55,
 56,
 57,
 58,
 59,
 60,
 61,
 62,
 63,
 64,
 65,
 66,
 67,
 68,
 69,
 70]

In [46]:
# We're creating this field to help in removing duplicates from the sjoin_nearest, next section.
GRID3_ADM['G3_Area'] = GRID3_ADM['geometry'].area / 10**6

In [47]:
# Create empty geodataframe to append onto using the dataframe whose geometry we want to retain.
Bounded = GRID3_ADM[0:0]
Bounded["year"] = pd.Series(dtype='int')
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 0 entries
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   Sett_ID     0 non-null      int64   
 1   Bounded_ID  0 non-null      int64   
 2   ADM_ID      0 non-null      int64   
 3   geometry    0 non-null      geometry
 4   G3_Area     0 non-null      float64 
 5   year        0 non-null      int32   
dtypes: float64(1), geometry(1), int32(1), int64(3)
memory usage: 0.0 bytes


In [48]:
for ID in ADM_IDs:
    WSFE_shard = WSFE_ADM.loc[WSFE_ADM['ADM_ID'] == ID]
    GRID3_shard = GRID3_ADM.loc[GRID3_ADM['ADM_ID'] == ID]
    WSFE_GRID3_shard = gpd.sjoin_nearest(WSFE_shard, 
                                         GRID3_shard, 
                                         how='inner',
                                         max_distance=500)
    Bounded = pd.concat([Bounded, WSFE_GRID3_shard])
    print('Completed near join in admin area %s. %s \n' % (ID, time.ctime()))
print('Completed near join for all ADMs. %s \n' % time.ctime())

del WSFE_shard, GRID3_shard, WSFE_GRID3_shard

Completed near join in admin area 1. Fri Mar  3 16:23:14 2023 

Completed near join in admin area 2. Fri Mar  3 16:23:14 2023 

Completed near join in admin area 3. Fri Mar  3 16:23:15 2023 

Completed near join in admin area 4. Fri Mar  3 16:23:15 2023 

Completed near join in admin area 5. Fri Mar  3 16:23:15 2023 

Completed near join in admin area 6. Fri Mar  3 16:23:15 2023 

Completed near join in admin area 7. Fri Mar  3 16:23:15 2023 

Completed near join in admin area 8. Fri Mar  3 16:23:16 2023 

Completed near join in admin area 9. Fri Mar  3 16:23:16 2023 

Completed near join in admin area 10. Fri Mar  3 16:23:16 2023 

Completed near join in admin area 11. Fri Mar  3 16:23:16 2023 

Completed near join in admin area 12. Fri Mar  3 16:23:16 2023 

Completed near join in admin area 13. Fri Mar  3 16:23:16 2023 

Completed near join in admin area 14. Fri Mar  3 16:23:17 2023 

Completed near join in admin area 15. Fri Mar  3 16:23:17 2023 

Completed near join in admin area 

In [49]:
Bounded.sample(20)

Unnamed: 0,Sett_ID,Bounded_ID,ADM_ID,geometry,G3_Area,year,WSFE_ID,ADM_ID_left,index_right,ADM_ID_right
87523,32801,32887,,"POLYGON ((-909117.315 1083456.864, -909087.754...",2.840895,1998,87523.0,64.0,32887.0,64.0
117747,424,423,,"POLYGON ((-929543.963 1004262.954, -929514.402...",4.680354,2015,117747.0,29.0,423.0,29.0
115952,4096,4110,,"POLYGON ((-928952.743 1007839.835, -928893.621...",11.559321,2004,115952.0,29.0,4110.0,29.0
114013,4096,4109,,"POLYGON ((-929189.231 1010293.397, -929159.670...",36.285852,1999,114013.0,20.0,4109.0,20.0
28970,131507,131590,,"POLYGON ((-1040190.774 1431271.551, -1040161.2...",209.346228,2008,28970.0,51.0,131590.0,51.0
78568,14202,14153,,"POLYGON ((-946482.414 1105213.758, -946452.853...",2.719429,1998,78568.0,63.0,14153.0,63.0
103516,4090,4103,,"POLYGON ((-950887.003 1028739.459, -950857.442...",7.994877,1985,103516.0,20.0,4103.0,20.0
120142,40546,40700,,"POLYGON ((-648862.300 993946.166, -648832.739 ...",5.709752,2011,120142.0,39.0,40700.0,39.0
110208,4096,4109,,"POLYGON ((-930342.110 1014579.742, -930312.549...",36.285852,2000,110208.0,20.0,4109.0,20.0
16110,237162,237555,,"POLYGON ((-1115453.071 1586112.052, -1115393.9...",4.473251,2002,16110.0,70.0,237555.0,70.0


In [50]:
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 131563 entries, 19660 to 17120
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   Sett_ID       131563 non-null  int64   
 1   Bounded_ID    131563 non-null  int64   
 2   ADM_ID        0 non-null       float64 
 3   geometry      131563 non-null  geometry
 4   G3_Area       131563 non-null  float64 
 5   year          131563 non-null  int32   
 6   WSFE_ID       131563 non-null  float64 
 7   ADM_ID_left   131563 non-null  float64 
 8   index_right   131563 non-null  float64 
 9   ADM_ID_right  131563 non-null  float64 
dtypes: float64(6), geometry(1), int32(1), int64(2)
memory usage: 10.5 MB


In [51]:
# Remove WSFE features that did not match any GRID3 settlements.
Bounded = Bounded.loc[~Bounded['Sett_ID'].isna()]
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 131563 entries, 19660 to 17120
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   Sett_ID       131563 non-null  int64   
 1   Bounded_ID    131563 non-null  int64   
 2   ADM_ID        0 non-null       float64 
 3   geometry      131563 non-null  geometry
 4   G3_Area       131563 non-null  float64 
 5   year          131563 non-null  int32   
 6   WSFE_ID       131563 non-null  float64 
 7   ADM_ID_left   131563 non-null  float64 
 8   index_right   131563 non-null  float64 
 9   ADM_ID_right  131563 non-null  float64 
dtypes: float64(6), geometry(1), int32(1), int64(2)
memory usage: 10.5 MB


In [52]:
del GRID3_ADM, ADM_IDs

### 4.2 Remove duplicates: where buildup polygons intersected with more than one GRID3 settlement extent.
This happens when the first dataset (WSFE) intersects (distance = 0) with more than one feature of the second dataset (GRID3). More common for large cities. For example, Yaoundé, CMN has a large contiguous 1985 WSFE polygon which overlaps several small GRID3 features that are not Yaoundé.

In [53]:
# The first number should always be zero. 
# The second tells us whether/how many WSFE polygons were duplicated by the Near join.

print(len(WSFE_ADM[WSFE_ADM.duplicated('WSFE_ID')]), len(Bounded[Bounded.duplicated('WSFE_ID')]))

0 5830


In [54]:
# If there are duplicate WSFE_IDs, then we need to choose between them.
# We'll pick the one that joined with the largest GRID3 polygon.
# To do that, we can just sort the dataframe by GRID3 areas, then drop_duplicates. 
# It will retain the first row of each WSFE_ID group.
Bounded = Bounded.sort_values('G3_Area', ascending=False).drop_duplicates(['WSFE_ID'])
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 125733 entries, 125952 to 57406
Data columns (total 10 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   Sett_ID       125733 non-null  int64   
 1   Bounded_ID    125733 non-null  int64   
 2   ADM_ID        0 non-null       float64 
 3   geometry      125733 non-null  geometry
 4   G3_Area       125733 non-null  float64 
 5   year          125733 non-null  int32   
 6   WSFE_ID       125733 non-null  float64 
 7   ADM_ID_left   125733 non-null  float64 
 8   index_right   125733 non-null  float64 
 9   ADM_ID_right  125733 non-null  float64 
dtypes: float64(6), geometry(1), int32(1), int64(2)
memory usage: 10.1 MB


In [55]:
print(len(Bounded[Bounded.duplicated('WSFE_ID')]))

0


In [56]:
# Now we can dissolve with the WSFE years, now that we can group them by their administratively split ID.
Bounded = Bounded.dissolve(by=['year', 'Bounded_ID'], as_index=False)
print(Bounded.info(), Bounded.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 15001 entries, 0 to 15000
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype   
---  ------        --------------  -----   
 0   year          15001 non-null  int64   
 1   Bounded_ID    15001 non-null  int64   
 2   geometry      15001 non-null  geometry
 3   Sett_ID       15001 non-null  int64   
 4   ADM_ID        0 non-null      float64 
 5   G3_Area       15001 non-null  float64 
 6   WSFE_ID       15001 non-null  float64 
 7   ADM_ID_left   15001 non-null  float64 
 8   index_right   15001 non-null  float64 
 9   ADM_ID_right  15001 non-null  float64 
dtypes: float64(6), geometry(1), int64(3)
memory usage: 1.1 MB
None        year  Bounded_ID                                           geometry  \
7080   2001       47723  POLYGON ((-793770.305 1040977.712, -793740.744...   
8839   2004      127777  MULTIPOLYGON (((-998184.598 1463759.087, -9981...   
7795   2002      146244  MULTIPOLYGON (((-997859.427 

In [57]:
# Clean up and save to file.
Bounded = Bounded[['ADM_ID_left', 'year', 'Bounded_ID', 'Sett_ID', 'geometry']].rename(columns={"ADM_ID_left": "ADM_ID"})
Bounded = Bounded.astype({"ADM_ID":'int', "Bounded_ID":'int', "Sett_ID":'int', "year":'int'})
print(Bounded.sample(10))
Bounded.to_file(
    driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

       ADM_ID  year  Bounded_ID  Sett_ID  \
7634       62  2002       42821    42655   
8687       64  2004       32861    32775   
10664      28  2008       21369    21358   
10655      63  2008       16991    17032   
896         5  1985       68183    68069   
3549       42  1995      205194   204889   
11650      27  2010       23460    23416   
744        62  1985       40848    40687   
3206       63  1995       14292    14341   
5039       64  1999       13998    14048   

                                                geometry  
7634   MULTIPOLYGON (((-670678.315 1055137.429, -6707...  
8687   MULTIPOLYGON (((-938264.457 1083427.303, -9382...  
10664  MULTIPOLYGON (((-871456.605 935770.125, -87145...  
10655  POLYGON ((-957449.544 1118161.474, -957419.983...  
896    MULTIPOLYGON (((-947546.610 1294699.746, -9475...  
3549   MULTIPOLYGON (((-385207.770 1503636.871, -3852...  
11650  MULTIPOLYGON (((-870333.287 1022117.796, -8703...  
744    MULTIPOLYGON (((-669732.363 1043786.

In [58]:
del WSFE_ADM

### 4.3 BOUNDLESS SETTLEMENTS: Dissolve features that were split by an ADM boundary.

In [59]:
# Fragments of any bounded settlement will be combined into a single "boundless" settlement in this version.
# It is based on their "Sett_ID", which is a direct loan from the GRID3 settlement features.
Boundless = Bounded.dissolve(by=['year', 'Sett_ID'], as_index=False)
print(Boundless.info(), Boundless.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 14857 entries, 0 to 14856
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   year        14857 non-null  int64   
 1   Sett_ID     14857 non-null  int64   
 2   geometry    14857 non-null  geometry
 3   ADM_ID      14857 non-null  int32   
 4   Bounded_ID  14857 non-null  int32   
dtypes: geometry(1), int32(2), int64(2)
memory usage: 464.4 KB
None        year  Sett_ID                                           geometry  \
13975  2014    87797  MULTIPOLYGON (((-717916.788 1338479.583, -7179...   
7428   2002    28342  POLYGON ((-895667.061 1092620.773, -895637.500...   
1681   1989    87635  POLYGON ((-736510.655 1301232.727, -736481.094...   
10432  2007   302298  POLYGON ((-395465.436 1700808.719, -395435.875...   
2735   1994    23416  MULTIPOLYGON (((-872047.825 1022354.284, -8720...   
12242  2011   127630  POLYGON ((-1067032.159 1486521.054, -1067002.5... 

In [60]:
# Clean up and save to file.
Boundless.to_file(driver='GPKG', filename=r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

---

## 5. CUMULATIVE ANNUALIZED SETTLEMENT EXTENTS
DISSOLVE BY YEAR SETS: Create separate feature layers of each cumulative year.

### 5.1 Define study years for each for loop.

In [61]:
Boundless = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Boundless')

In [62]:
ReversedStudyYears_to2014 = []
for i in AllStudyYears:
    ReversedStudyYears_to2014.insert(0,i)
ReversedStudyYears_to2014.remove(2015)
print('\n\n', ReversedStudyYears_to2014)



 [2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999]


### 5.2 Starting with main Boundless dataset, create a cumulative area feature layer for each year.

In [63]:
# For each year in the growth stats study, we are taking features from all years prior to and including that year, 
# dissolving those features, and exporting as its own file.

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    YearSet = Boundless[Boundless['year'].between(
        1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.
    print('Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    YearDissolve = YearSet.dissolve(by='Sett_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min"}, # Though ADM_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    YearName = ''.join(['Cu', str(item), '_Boundless'])
    YearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=YearName)
    del YearSet, YearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1999. Fri Mar  3 16:35:31 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:35:31 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:35:44 2023

Subsetting to cumulative area for year: 2000. Fri Mar  3 16:35:48 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:35:48 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:36:02 2023

Subsetting to cumulative area for year: 2001. Fri Mar  3 16:36:06 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:36:06 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:36:21 2023

Subsetting to cumulative area for year: 2002. Fri Mar  3 16:36:25 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:36:25 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:36:41 2023

Subsetting to cumulative area for year: 2003. Fri Mar  3 16:36:45 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:36:45 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:37:02 2023

Subsetting to cumulative area for year: 2004. Fri Mar  3 16:37:05 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:37:05 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:37:23 2023

Subsetting to cumulative area for year: 2005. Fri Mar  3 16:37:27 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:37:27 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:37:45 2023

Subsetting to cumulative area for year: 2006. Fri Mar  3 16:37:48 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:37:48 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:38:08 2023

Subsetting to cumulative area for year: 2007. Fri Mar  3 16:38:11 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:38:11 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:38:30 2023

Subsetting to cumulative area for year: 2008. Fri Mar  3 16:38:34 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:38:34 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:38:52 2023

Subsetting to cumulative area for year: 2009. Fri Mar  3 16:38:56 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:38:56 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:39:17 2023

Subsetting to cumulative area for year: 2010. Fri Mar  3 16:39:20 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:39:20 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:39:41 2023

Subsetting to cumulative area for year: 2011. Fri Mar  3 16:39:45 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:39:45 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:40:06 2023

Subsetting to cumulative area for year: 2012. Fri Mar  3 16:40:10 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:40:10 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:40:32 2023

Subsetting to cumulative area for year: 2013. Fri Mar  3 16:40:35 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:40:35 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:40:58 2023

Subsetting to cumulative area for year: 2014. Fri Mar  3 16:41:01 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:41:01 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:41:25 2023

Subsetting to cumulative area for year: 2015. Fri Mar  3 16:41:29 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Fri Mar  3 16:41:29 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Fri Mar  3 16:41:53 2023

Done with all years in set. Fri Mar  3 16:41:56 2023


##### Join area information from each cumulative layer onto the latest year dataset.

In [64]:
# The latest year in the study contains all settlements. Merge all other years' areas onto this dataset.
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=
                          ''.join(['Cu', str(2015), '_Boundless'])) 
SettAreas['AREA2015'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry') # We have settlement IDs, so no need to join spatially!


for item in ReversedStudyYears_to2014:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Boundless']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['AREA', str(item)])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Sett_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, 2015, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Sett_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))


print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s.csv' % (1999, 2015)))

Loading cumulative layer for year 2014. Fri Mar  3 16:41:57 2023

Adding area field and converting to non-spatial dataframe. Fri Mar  3 16:41:57 2023

Merging variables from 2014 onto our latest year (2015) via table join. Fri Mar  3 16:41:57 2023

Loading cumulative layer for year 2013. Fri Mar  3 16:41:57 2023

Adding area field and converting to non-spatial dataframe. Fri Mar  3 16:41:58 2023

Merging variables from 2013 onto our latest year (2015) via table join. Fri Mar  3 16:41:58 2023

Loading cumulative layer for year 2012. Fri Mar  3 16:41:58 2023

Adding area field and converting to non-spatial dataframe. Fri Mar  3 16:41:59 2023

Merging variables from 2012 onto our latest year (2015) via table join. Fri Mar  3 16:41:59 2023

Loading cumulative layer for year 2011. Fri Mar  3 16:41:59 2023

Adding area field and converting to non-spatial dataframe. Fri Mar  3 16:41:59 2023

Merging variables from 2011 onto our latest year (2015) via table join. Fri Mar  3 16:41:59 2023

Load

In [65]:
del SettAreas

### 5.3 Repeat for Bounded dataset.

In [66]:
# Bounded = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

for item in AllStudyYears:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.
    print('Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    YearDissolve = YearSet.dissolve(by='Bounded_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min", "Sett_ID":"min"}, # Though ADM_ID and Sett_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    YearName = ''.join(['Cu', str(item), '_Bounded'])
    YearDissolve.to_file(driver='GPKG', filename=r'Results/CumulativeSettlements.gpkg', layer=YearName)
    del YearSet, YearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1999. Fri Mar  3 16:42:08 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:42:08 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:42:21 2023

Subsetting to cumulative area for year: 2000. Fri Mar  3 16:42:25 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:42:25 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:42:39 2023

Subsetting to cumulative area for year: 2001. Fri Mar  3 16:42:42 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:42:42 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:42:57 2023

Subsetting to cumulative area for year: 2002. Fri Mar  3 16:43:00 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:43:00 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:43:16 2023

Subsetting to cumulative area for year: 2003. Fri Mar  3 16:43:20 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:43:20 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:43:36 2023

Subsetting to cumulative area for year: 2004. Fri Mar  3 16:43:40 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:43:40 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:43:58 2023

Subsetting to cumulative area for year: 2005. Fri Mar  3 16:44:02 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:44:02 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:44:21 2023

Subsetting to cumulative area for year: 2006. Fri Mar  3 16:44:25 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:44:25 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:44:44 2023

Subsetting to cumulative area for year: 2007. Fri Mar  3 16:44:47 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:44:47 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:45:08 2023

Subsetting to cumulative area for year: 2008. Fri Mar  3 16:45:11 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:45:11 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:45:32 2023

Subsetting to cumulative area for year: 2009. Fri Mar  3 16:45:35 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:45:35 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:45:56 2023

Subsetting to cumulative area for year: 2010. Fri Mar  3 16:46:00 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:46:00 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:46:21 2023

Subsetting to cumulative area for year: 2011. Fri Mar  3 16:46:25 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:46:25 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:46:47 2023

Subsetting to cumulative area for year: 2012. Fri Mar  3 16:46:50 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:46:50 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:47:13 2023

Subsetting to cumulative area for year: 2013. Fri Mar  3 16:47:17 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:47:17 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:47:40 2023

Subsetting to cumulative area for year: 2014. Fri Mar  3 16:47:43 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:47:43 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:48:07 2023

Subsetting to cumulative area for year: 2015. Fri Mar  3 16:48:10 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Fri Mar  3 16:48:10 2023



  YearSet = Bounded[Bounded['year'].between(1985, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Fri Mar  3 16:48:35 2023

Done with all years in set. Fri Mar  3 16:48:38 2023


In [67]:
SettAreas = gpd.read_file(r'Results/CumulativeSettlements.gpkg', 
                          layer=''.join(['Cu', str(2015), '_Bounded']))
SettAreas['AREA2015'] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry')


for item in ReversedStudyYears_to2014:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(r'Results/CumulativeSettlements.gpkg', layer=''.join(['Cu', str(item), '_Bounded']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['AREA', str(item)])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Bounded_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, 2015, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Bounded_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))

print(SettAreas.info())
SettAreas.to_csv(os.path.join(ResultsFolder, 'Areas%sto%s_%s.csv' % (1999, 2015, 'Bounded')))

Loading cumulative layer for year 2014. Fri Mar  3 16:48:39 2023

Adding area field and converting to non-spatial dataframe. Fri Mar  3 16:48:40 2023

Merging variables from 2014 onto our latest year (2015) via table join. Fri Mar  3 16:48:40 2023

Loading cumulative layer for year 2013. Fri Mar  3 16:48:40 2023

Adding area field and converting to non-spatial dataframe. Fri Mar  3 16:48:40 2023

Merging variables from 2013 onto our latest year (2015) via table join. Fri Mar  3 16:48:40 2023

Loading cumulative layer for year 2012. Fri Mar  3 16:48:40 2023

Adding area field and converting to non-spatial dataframe. Fri Mar  3 16:48:41 2023

Merging variables from 2012 onto our latest year (2015) via table join. Fri Mar  3 16:48:41 2023

Loading cumulative layer for year 2011. Fri Mar  3 16:48:41 2023

Adding area field and converting to non-spatial dataframe. Fri Mar  3 16:48:42 2023

Merging variables from 2011 onto our latest year (2015) via table join. Fri Mar  3 16:48:42 2023

Load

In [68]:
del SettAreas

### 5.4 One settlement geofile to rule them all. ...and in the Sett_ID bind them.
The annualized values can be stored as distinct non-spatial dataframes. Their Sett_IDs will be used to join onto this geoversion with place names for the summary stats.

In [69]:
Settlements = gpd.read_file(r'Results/CumulativeSettlements.gpkg', 
                           layer=''.join(['Cu', str(2015), '_Boundless']))[['Sett_ID', 'ADM_ID', 'geometry']]
print(Settlements.info())
print(Settlements.crs)
Settlements.to_file(driver='GPKG', 
                       filename=r'Results/SETTLEMENTS.gpkg', 
                       layer='SETTLEMENTS_equalarea')

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1939 entries, 0 to 1938
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   1939 non-null   int64   
 1   ADM_ID    1939 non-null   int64   
 2   geometry  1939 non-null   geometry
dtypes: geometry(1), int64(2)
memory usage: 45.6 KB
None
PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["ESRI","102022"]]


In [70]:
# Saving all the final products as WGS84.
Settlements_WGS = Settlements.to_crs(4326) 
print(Settlements_WGS.info())
print(Settlements_WGS.crs)
Settlements_WGS.to_file(driver='GPKG', 
                       filename=r'Results/SETTLEMENTS.gpkg', 
                       layer='SETTLEMENTS')

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1939 entries, 0 to 1938
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   1939 non-null   int64   
 1   ADM_ID    1939 non-null   int64   
 2   geometry  1939 non-null   geometry
dtypes: geometry(1), int64(2)
memory usage: 45.6 KB
None
epsg:4326


### 5.5 Buffer the area of the Boundless dataset's latest year to mask raster data in later sections.
The Bounded dataset would also be fine for our purposes here. The buffer is dissolved to a single feature to be used for its total extents, which are identical between Bounded & Boundless datasets.

In [71]:
# Create buffer layer(s) to use as maximum distance for Near joins.

# Population buffer: 2km
Distance = 2000 # The Africa Albers projection is in meters. Saving in this projection to use in later sections.

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = Settlements[['Sett_ID', 'geometry']]
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid is a workaround for any null geometries.
print('Finished buffer layer creation. %s' % time.ctime())
BufferFileName1 = ''.join(['Buff', str(Distance), 'm_', str(2015)])
BufferLayer.to_file(driver='GPKG', filename=r'Results/Catchment.gpkg', layer=BufferFileName1)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Fri Mar  3 16:48:59 2023


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Finished buffer layer creation. Fri Mar  3 16:52:45 2023
Saved to file. Fri Mar  3 16:52:47 2023


In [72]:
# Nighttime Lights buffer: 250m
Distance = 250

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = Settlements[['Sett_ID', 'geometry']]
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid is a workaround for any null geometries.
print('Finished buffer layer creation. %s' % time.ctime())
BufferFileName2 = ''.join(['Buff', str(Distance), 'm_', str(2015)])
BufferLayer.to_file(driver='GPKG', filename=r'Results/Catchment.gpkg', layer=BufferFileName2)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Fri Mar  3 16:52:47 2023


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Finished buffer layer creation. Fri Mar  3 16:53:25 2023
Saved to file. Fri Mar  3 16:53:28 2023


---

## 6. PLACE NAMES
Join urban place names from UCDB, Africapolis, and GeoNames onto the settlement vectors.

### 6.1 Load placename datasets, filter, and project.

In [73]:
# Anytime we use a spatial join or work with area, 
# my preference is to keep it in a planar, equal area, meters projection. So we'll load as the Africa Albers.
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')
Settlements['AREA2015'] = Settlements['geometry'].area / 10**6

# Load, pull name field, rename, and reproject to match the catchments CRS.
UCDB = gpd.read_file('PlaceName/GHS_STAT_UCDB2015MT_GLOBE_R2019A_V1_2.gpkg', 
                     layer=0)[['UC_NM_MN', 'geometry']].rename(
    columns={"UC_NM_MN": "UCDB_Name"}).to_crs("ESRI:102022")

Africapolis = gpd.read_file('PlaceName/AFRICAPOLIS2020.shp')[['agglosName', 'geometry']].rename(
    columns={"agglosName": "Afpl_Name"}).to_crs("ESRI:102022")

GeoNames = gpd.read_file('PlaceName/GeoNames.gpkg', 
                         layer=0)[['GeoName', 'geometry']].to_crs("ESRI:102022")

print(Settlements.info(), UCDB.info(), Africapolis.info(), GeoNames.info())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1939 entries, 0 to 1938
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   1939 non-null   int64   
 1   ADM_ID    1939 non-null   int64   
 2   geometry  1939 non-null   geometry
 3   AREA2015  1939 non-null   float64 
dtypes: float64(1), geometry(1), int64(2)
memory usage: 60.7 KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13135 entries, 0 to 13134
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   UCDB_Name  13135 non-null  object  
 1   geometry   13135 non-null  geometry
dtypes: geometry(1), object(1)
memory usage: 205.4+ KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 7720 entries, 0 to 7719
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   Afpl_Name  7720 non-null   object  
 1   geome

### 6.2 Join placenames onto settlements geodataframe.

In [74]:
# We wrap it in pd.DataFrame() since the sjoin() is the last time we need the geometry.

GeoNames = pd.DataFrame(gpd.sjoin_nearest(GeoNames, Settlements, 
                             how='left', distance_col="distGN", max_distance=250, 
                             lsuffix="G3", rsuffix="GN")).drop(columns='geometry')
Africapolis = pd.DataFrame(gpd.sjoin_nearest(Africapolis, Settlements, 
                             how='left', distance_col="distAF", max_distance=250,
                             lsuffix="G3", rsuffix="Af")).drop(columns='geometry')
UCDB = pd.DataFrame(gpd.sjoin_nearest(UCDB, Settlements, 
                             how='left', distance_col="distUC", max_distance=250,
                             lsuffix="G3", rsuffix="UC")).drop(columns='geometry')

In [75]:
print(GeoNames.info())
print(Africapolis.info())
print(UCDB.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 199390 entries, 0 to 199389
Data columns (total 6 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   GeoName   199390 non-null  object 
 1   index_GN  42 non-null      float64
 2   Sett_ID   42 non-null      float64
 3   ADM_ID    42 non-null      float64
 4   AREA2015  42 non-null      float64
 5   distGN    42 non-null      float64
dtypes: float64(5), object(1)
memory usage: 10.6+ MB
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7772 entries, 0 to 7719
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Afpl_Name  7772 non-null   object 
 1   index_Af   139 non-null    float64
 2   Sett_ID    139 non-null    float64
 3   ADM_ID     139 non-null    float64
 4   AREA2015   139 non-null    float64
 5   distAF     139 non-null    float64
dtypes: float64(5), object(1)
memory usage: 425.0+ KB
None
<class 'pandas.core.frame.D

In [76]:
alldatasets = [pd.DataFrame(Settlements).drop(columns='geometry'),
               Africapolis[['Sett_ID', 'Afpl_Name', 'distAF']], 
               GeoNames[['Sett_ID', 'GeoName', 'distGN']],
               UCDB[['Sett_ID', 'UCDB_Name', 'distUC']]]

SettlementsNamed = reduce(lambda left,right: pd.merge(left,right,on=['Sett_ID'], how='left'), alldatasets)
SettlementsNamed[['Afpl_Name', 'GeoName', 'UCDB_Name']] = SettlementsNamed[['Afpl_Name', 'GeoName', 'UCDB_Name']].fillna('UNK')

# Replace NaN values with a countable distance.
SettlementsNamed[['distAF', 'distGN', 'distUC']] = SettlementsNamed[['distAF', 'distGN', 'distUC']].fillna(-1)

In [77]:
print(SettlementsNamed.info())
print(SettlementsNamed.sample(10))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1942 entries, 0 to 1941
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    1942 non-null   int64  
 1   ADM_ID     1942 non-null   int64  
 2   AREA2015   1942 non-null   float64
 3   Afpl_Name  1942 non-null   object 
 4   distAF     1942 non-null   float64
 5   GeoName    1942 non-null   object 
 6   distGN     1942 non-null   float64
 7   UCDB_Name  1942 non-null   object 
 8   distUC     1942 non-null   float64
dtypes: float64(4), int64(2), object(3)
memory usage: 151.7+ KB
None
      Sett_ID  ADM_ID  AREA2015 Afpl_Name  distAF GeoName  distGN UCDB_Name  \
1445   131844       6  0.011360       UNK    -1.0     UNK    -1.0       UNK   
808     35135      24  0.041945       UNK    -1.0     UNK    -1.0       UNK   
141      4227      20  0.042819       UNK    -1.0     UNK    -1.0       UNK   
1059    51517      37  0.048936       UNK    -1.0     UNK    -1.0    

In [78]:
del UCDB, Africapolis, GeoNames

The near joins should have prevented duplication of rows, but if df1 intersects with two features in df2, it creates a new row. Two of our placenames sources are polygons, so there may be instances.

In [79]:
SettlementsNamed[SettlementsNamed.duplicated('Sett_ID', keep=False)]

Unnamed: 0,Sett_ID,ADM_ID,AREA2015,Afpl_Name,distAF,GeoName,distGN,UCDB_Name,distUC
1402,131506,6,12.969719,Koundoul II / Sara,0.0,UNK,-1.0,N'Djamena,0.0
1403,131506,6,12.969719,Ndjamena/Kousseri [CMR],0.0,UNK,-1.0,N'Djamena,0.0
1404,131507,6,148.711349,Koundoul II / Sara,0.0,N'Djamena,0.0,N'Djamena,0.0
1405,131507,6,148.711349,Ndjamena/Kousseri [CMR],0.0,N'Djamena,0.0,N'Djamena,0.0
1406,131507,6,148.711349,Ndjamena/Kousseri [TCD],0.0,N'Djamena,0.0,N'Djamena,0.0


In [80]:
SettlementsNamed.drop_duplicates(subset=['Sett_ID'], inplace=True, keep='first')
SettlementsNamed.info() # Range of entries should be the same as original Settlements file.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1939 entries, 0 to 1941
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    1939 non-null   int64  
 1   ADM_ID     1939 non-null   int64  
 2   AREA2015   1939 non-null   float64
 3   Afpl_Name  1939 non-null   object 
 4   distAF     1939 non-null   float64
 5   GeoName    1939 non-null   object 
 6   distGN     1939 non-null   float64
 7   UCDB_Name  1939 non-null   object 
 8   distUC     1939 non-null   float64
dtypes: float64(4), int64(2), object(3)
memory usage: 151.5+ KB


### 6.3 Reduce to single name column.

In [81]:
# Determine which source has a name geometrically closest to the settlement.
# Since we switched NaN values to -1 earlier, we also resolved what happens in the event of a tie, 
# i.e. when more than one source is 0.0 meters from the settlement. It will take the value from the first column.
SettlementsNamed['SettName'] = "UNK"
SettlementsNamed['closest'] = SettlementsNamed[['distAF', 'distGN', 'distUC']].idxmax(axis=1)

In [82]:
SettlementsNamed.sample(20)

Unnamed: 0,Sett_ID,ADM_ID,AREA2015,Afpl_Name,distAF,GeoName,distGN,UCDB_Name,distUC,SettName,closest
1547,146075,19,0.035828,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
966,42496,62,0.249922,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
1113,55011,38,0.003495,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
292,14309,63,0.2473,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
1516,145851,19,0.094376,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
1788,302537,48,0.001748,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
1611,183068,8,0.036702,UNK,-1.0,UNK,-1.0,Mongo,0.0,UNK,distUC
65,1881,29,0.001748,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
1473,131969,6,0.023594,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
1833,335702,58,0.117096,UNK,-1.0,Bardaï,0.0,UNK,-1.0,UNK,distGN


In [83]:
# Create a single name column where non-named settlements are "UNK" but all others use one of the three name sources.
SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distAF", 
    'SettName'] = SettlementsNamed['Afpl_Name']

SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distUC", 
    'SettName'] = SettlementsNamed['UCDB_Name']

SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distGN", 
    'SettName'] = SettlementsNamed['GeoName']

In [84]:
SettlementsNamed.sample(20)

Unnamed: 0,Sett_ID,ADM_ID,AREA2015,Afpl_Name,distAF,GeoName,distGN,UCDB_Name,distUC,SettName,closest
1015,47550,30,1.12727,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
579,24985,27,0.000874,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
1054,47903,30,0.005243,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
256,14156,63,0.036702,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
387,17085,63,0.007865,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
1897,338115,51,0.021846,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
457,21184,28,0.019225,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
421,21080,28,0.004369,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
162,4803,20,0.004369,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
1540,146043,19,0.113601,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF


In [85]:
SettlementsNamed[SettlementsNamed['SettName'] != 'UNK'].sample(20)

Unnamed: 0,Sett_ID,ADM_ID,AREA2015,Afpl_Name,distAF,GeoName,distGN,UCDB_Name,distUC,SettName,closest
1265,107503,56,0.034954,Koukou Angarana,0.0,UNK,-1.0,UNK,-1.0,Koukou Angarana,distAF
1869,337921,51,0.007865,UNK,-1.0,UNK,-1.0,N'Djamena,0.0,N'Djamena,distUC
1670,236319,18,0.005243,UNK,-1.0,UNK,-1.0,Bol,0.0,Bol,distUC
1868,337920,51,0.013982,UNK,-1.0,UNK,-1.0,N'Djamena,0.0,N'Djamena,distUC
1814,333078,65,2.237062,Faya,0.0,Faya-Largeau,0.0,Faya-Largeau,0.0,Faya,distAF
943,40544,39,3.29355,UNK,-1.0,UNK,-1.0,Danamadji,0.0,Danamadji,distUC
1228,91878,40,2.396977,Kyabe,0.0,Kyabé,17.699476,كيابي Kyabé,0.0,Kyabé,distGN
1391,130022,14,0.010486,Massaguet,0.0,UNK,-1.0,UNK,-1.0,Massaguet,distAF
1889,337993,51,0.004369,Ndjamena/Kousseri [TCD],0.0,UNK,-1.0,N'Djamena,0.0,Ndjamena/Kousseri [TCD],distAF
1833,335702,58,0.117096,UNK,-1.0,Bardaï,0.0,UNK,-1.0,Bardaï,distGN


### 6.4 Make sure place name is unique by stripping smaller localities of duplicated names.

In [86]:
Dupes = SettlementsNamed[ 
    (SettlementsNamed['SettName'] != 'UNK') & 
    (SettlementsNamed.duplicated('SettName', keep=False)) ] # keep=False is necessary to retain *all* duplicates, not just first or last in each group.

print("Number of named settlements: %s" % SettlementsNamed['SettName'].str.contains('UNK').value_counts()[False])
print("Number of named settlements where name is duplicated at least once: %s" % len(Dupes))

Number of named settlements: 185
Number of named settlements where name is duplicated at least once: 107


In [87]:
Largest = Dupes.loc[Dupes.groupby(["SettName"])["AREA2015"].idxmax()]
print(Largest)

       Sett_ID  ADM_ID    AREA2015                Afpl_Name  distAF  \
1763    278762      41    0.145060                   Abeche     0.0   
1748    278172      41   23.189424                   Abeche     0.0   
1257    102257      44    1.440983                 Am Timan     0.0   
418      21076      28    0.340802              Amboko Camp     0.0   
1614    186498       1    2.903812                      Ati     0.0   
560      23470      27    1.253105                      UNK    -1.0   
592      25275      23    8.207223                   Benoye     0.0   
1780    302299      48    0.865114                  Biltine     0.0   
1596    170082       9    0.959490                  Bitkine     0.0   
1636    234510      18    2.400473                      Bol     0.0   
792      35033      24   12.879712                     Doba     0.0   
419      21078      28    1.332625                    Donia     0.0   
1814    333078      65    2.237062                     Faya     0.0   
1824  

In [88]:
# Filter to settlements which have a duplicated name and are not the largest of those with that name, then replace with UNK.
SettlementsNamed.loc[(~SettlementsNamed.Sett_ID.isin(Largest.Sett_ID)) 
                     & (SettlementsNamed.Sett_ID.isin(Dupes.Sett_ID)), 
                     'SettName'] = 'UNK'

In [89]:
# Second number should now be zero.

print("Number of named settlements: %s" % SettlementsNamed['SettName'].str.contains('UNK').value_counts()[False])
print("Number of named settlements where name is duplicated at least once: %s" % len(SettlementsNamed[ 
    (SettlementsNamed['SettName'] != 'UNK') & 
    (SettlementsNamed.duplicated('SettName', keep=False)) ]))

Number of named settlements: 108
Number of named settlements where name is duplicated at least once: 0


In [90]:
print(SettlementsNamed.info(), SettlementsNamed[SettlementsNamed['SettName'] != "UNK"].sample(20))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1939 entries, 0 to 1941
Data columns (total 11 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    1939 non-null   int64  
 1   ADM_ID     1939 non-null   int64  
 2   AREA2015   1939 non-null   float64
 3   Afpl_Name  1939 non-null   object 
 4   distAF     1939 non-null   float64
 5   GeoName    1939 non-null   object 
 6   distGN     1939 non-null   float64
 7   UCDB_Name  1939 non-null   object 
 8   distUC     1939 non-null   float64
 9   SettName   1939 non-null   object 
 10  closest    1939 non-null   object 
dtypes: float64(4), int64(2), object(5)
memory usage: 246.3+ KB
None       Sett_ID  ADM_ID   AREA2015        Afpl_Name  distAF    GeoName  \
1777   302294      48   0.057674            Arada     0.0        UNK   
1267   107505      56   0.372261        Goz Beida     0.0  Goz-Beida   
1621   204889      42   0.027089          Abdi II     0.0        UNK   
1494   137572   

In [91]:
# Drop extra columns and save to file.
SettlementsNamed = SettlementsNamed[['Sett_ID', 'SettName']]
SettlementsNamed.to_csv(r'Results/PlaceNames.csv')

In [92]:
del SettlementsNamed

---

## 7. CREATE FRAGMENTATION INDEX
We are determining what percentage of a settlement's area lies outside of its administrative zone each year.
The index is a range of 0 to 100, i.e. the percent of the settlement area which is fragmented.

For each Sett_ID:
((Area of Boundless settlement - Area of largest Bounded settlement feature) / Area of Boundless settlement) * 100

### 7.1 Load boundless and bounded cumulative settlements and clean.

In [93]:
BoundlessAreas = pd.read_csv(os.path.join(ResultsFolder, 'Areas1999to2015.csv'))
print('Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. %s' 
      % time.ctime())
print(BoundlessAreas.info())

BoundedAreas = pd.read_csv(os.path.join(ResultsFolder, 'Areas1999to2015_Bounded.csv'))
print('Loaded Bounded dataset, which will factor into the fragmentation calculation. %s' % time.ctime())
print(BoundedAreas.info())

Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. Fri Mar  3 16:56:00 2023
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1939 entries, 0 to 1938
Data columns (total 21 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  1939 non-null   int64  
 1   Sett_ID     1939 non-null   int64  
 2   year        1939 non-null   int64  
 3   ADM_ID      1939 non-null   int64  
 4   AREA2015    1939 non-null   float64
 5   AREA2014    1917 non-null   float64
 6   AREA2013    1889 non-null   float64
 7   AREA2012    1873 non-null   float64
 8   AREA2011    1850 non-null   float64
 9   AREA2010    1838 non-null   float64
 10  AREA2009    1824 non-null   float64
 11  AREA2008    1809 non-null   float64
 12  AREA2007    1798 non-null   float64
 13  AREA2006    1791 non-null   float64
 14  AREA2005    1781 non-null   float64
 15  AREA2004    1763 non-null   float64
 16  AREA2003    1750 no

In [94]:
LargestFragments = BoundedAreas.loc[BoundedAreas.groupby(["Sett_ID"])["AREA2015"].idxmax()] 
print(LargestFragments.info())
print("Filtered the Bounded dataset to only rows where latest year's area is largest for each Sett_ID. %s" % time.ctime())
LargestFragments.columns = LargestFragments.columns.str.replace('AREA', 'Largest')
LargestFragments = LargestFragments.drop(columns=['year', 'ADM_ID'])
print("Renamed columns to avoid duplication during merge, and dropped unnecessary columns. %s" % time.ctime())
FragIndices = BoundlessAreas.merge(LargestFragments, how='left', on='Sett_ID')
print(FragIndices.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1939 entries, 0 to 1957
Data columns (total 22 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  1939 non-null   int64  
 1   Bounded_ID  1939 non-null   int64  
 2   year        1939 non-null   int64  
 3   ADM_ID      1939 non-null   int64  
 4   Sett_ID     1939 non-null   int64  
 5   AREA2015    1939 non-null   float64
 6   AREA2014    1917 non-null   float64
 7   AREA2013    1889 non-null   float64
 8   AREA2012    1873 non-null   float64
 9   AREA2011    1850 non-null   float64
 10  AREA2010    1838 non-null   float64
 11  AREA2009    1824 non-null   float64
 12  AREA2008    1809 non-null   float64
 13  AREA2007    1798 non-null   float64
 14  AREA2006    1791 non-null   float64
 15  AREA2005    1781 non-null   float64
 16  AREA2004    1763 non-null   float64
 17  AREA2003    1750 non-null   float64
 18  AREA2002    1737 non-null   float64
 19  AREA2001    1710 non-null  

In [95]:
del BoundlessAreas, BoundedAreas, LargestFragments

### 7.2 Merge and run fragmentation calculation.

In [96]:
for item in AllStudyYears:
    YY = str(item) # 4-digit year
    AreaYY = ''.join(["AREA", YY]) # The Boundless area variable name
    LargestYY = ''.join(['Largest', YY]) # The Bounded largest area variable name
    FragYY = ''.join(["Frag", YY]) # Name for the fragmentation index variable
    print("Created names for Year %s's variables and temporary objects. %s" % (item, time.ctime()))
    
    FragIndices[FragYY] = ((FragIndices[AreaYY] - FragIndices[LargestYY]) / FragIndices[AreaYY]) * 100
    FragIndices[FragYY] = (FragIndices[FragYY].fillna(0).replace([np.inf, -np.inf], 0)).astype('int')
    print("Calculated fragmentation index for year %s. %s" % (item, time.ctime()))

# Remove unnecessary columns.
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Largest')]
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('AREA')]

print('Completed fragmentation index calculations for all years. %s' % time.ctime())
print(FragIndices.info())
print(FragIndices.sample(5))

Created names for Year 1999's variables and temporary objects. Fri Mar  3 16:56:02 2023
Calculated fragmentation index for year 1999. Fri Mar  3 16:56:02 2023
Created names for Year 2000's variables and temporary objects. Fri Mar  3 16:56:02 2023
Calculated fragmentation index for year 2000. Fri Mar  3 16:56:02 2023
Created names for Year 2001's variables and temporary objects. Fri Mar  3 16:56:02 2023
Calculated fragmentation index for year 2001. Fri Mar  3 16:56:02 2023
Created names for Year 2002's variables and temporary objects. Fri Mar  3 16:56:02 2023
Calculated fragmentation index for year 2002. Fri Mar  3 16:56:02 2023
Created names for Year 2003's variables and temporary objects. Fri Mar  3 16:56:02 2023
Calculated fragmentation index for year 2003. Fri Mar  3 16:56:02 2023
Created names for Year 2004's variables and temporary objects. Fri Mar  3 16:56:02 2023
Calculated fragmentation index for year 2004. Fri Mar  3 16:56:02 2023
Created names for Year 2005's variables and te

In [97]:
FragIndices = FragIndices.drop(columns=['Unnamed: 0_x', 'Unnamed: 0_y', 'year', 'ADM_ID'])
FragIndices.to_csv(os.path.join(ResultsFolder, 'FragIndex%sto%s.csv' % (1999, 2015)))
print('Saved to file. %s' % time.ctime())

Saved to file. Fri Mar  3 16:56:02 2023


In [98]:
del FragIndices

---

## 8. PREPARE YEARLY DATASETS: POPULATION
Can use this as a template for other annualized rasters

### 8.1 Reproject and reclassify with settlement buffer mask.
Reclassify so that we only need to work with cells within X distance of settlements.

In [99]:
MaskByZone(MaskPath=r'Results/Catchment.gpkg', MaskLayerName = "Buff2000m_2015", 
           SourceFolder='Population/SourceFiles', DestFolder='Population')

['tcd_ppp_2000_UNadj.tif', 'tcd_ppp_2001_UNadj.tif', 'tcd_ppp_2002_UNadj.tif', 'tcd_ppp_2003_UNadj.tif', 'tcd_ppp_2004_UNadj.tif', 'tcd_ppp_2005_UNadj.tif', 'tcd_ppp_2006_UNadj.tif', 'tcd_ppp_2007_UNadj.tif', 'tcd_ppp_2008_UNadj.tif', 'tcd_ppp_2009_UNadj.tif', 'tcd_ppp_2010_UNadj.tif', 'tcd_ppp_2011_UNadj.tif', 'tcd_ppp_2012_UNadj.tif', 'tcd_ppp_2013_UNadj.tif', 'tcd_ppp_2014_UNadj.tif', 'tcd_ppp_2015_UNadj.tif']
Source projection:  None
Destination projection:  Africa_Albers_Equal_Area_Conic
Finished gdal.Warp() for tcd_ppp_2000_UNadj.tif. Fri Mar  3 16:56:43 2023 

We warped the data, so we'll use that file for next step.
Finished rasterio.mask.mask() for tcd_ppp_2000_UNadj.tif. Fri Mar  3 16:56:53 2023 

Written to file. Fri Mar  3 16:57:10 2023 

Removed intermediate file. Fri Mar  3 16:57:11 2023 

Source projection:  None
Destination projection:  Africa_Albers_Equal_Area_Conic
Finished gdal.Warp() for tcd_ppp_2001_UNadj.tif. Fri Mar  3 16:57:44 2023 

We warped the data, so we'll

In [100]:
print(os.listdir('Population/'))

['Msk_2000.tif', 'Msk_2001.tif', 'Msk_2002.tif', 'Msk_2003.tif', 'Msk_2004.tif', 'Msk_2005.tif', 'Msk_2006.tif', 'Msk_2007.tif', 'Msk_2008.tif', 'Msk_2009.tif', 'Msk_2010.tif', 'Msk_2011.tif', 'Msk_2012.tif', 'Msk_2013.tif', 'Msk_2014.tif', 'Msk_2015.tif', 'SourceFiles']


### 8.2 Raster values summarized by settlement.
1. Convert each annualized raster to .xyz, 
2. then bring them to vector space and assign their Sett_ID,
3. and finally, aggregate the value as appropriate to the settlement level and save table to file.

XYZ is similar to .csv. Raster cell centers are stored as x and y, and their value is stored as z.

In [15]:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')[['Sett_ID', 'geometry']]

In [17]:
BatchZonalStats(FolderName='Population', Zones=Settlements, StatsWanted=['count', 'sum'], SeriesStart=2000, SeriesEnd=2015)

['Msk_2000.tif', 'Msk_2001.tif', 'Msk_2002.tif', 'Msk_2003.tif', 'Msk_2004.tif', 'Msk_2005.tif', 'Msk_2006.tif', 'Msk_2007.tif', 'Msk_2008.tif', 'Msk_2009.tif', 'Msk_2010.tif', 'Msk_2011.tif', 'Msk_2012.tif', 'Msk_2013.tif', 'Msk_2014.tif', 'Msk_2015.tif']
       Sett_ID
0            1
1            2
2            3
3            5
4            6
...        ...
1934    349427
1935    350163
1936    350168
1937    350194
1938  21474836

[1939 rows x 1 columns]
Loading data for Msk_2000.tif. Sat Mar  4 10:46:03 2023 

Creating XYZ (gdal.Translate()).
Finished gdal.Translate() for year 2000. Sat Mar  4 10:51:51 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 10:56:19 2023 

Created geodataframe from non-NoData points. Sat Mar  4 10:56:20 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 10:58:40 2023 

                  Z   Sett_ID
154789375  0.366830     14108
153759932  0.714932      7236
157698170  1.283338     25393
168813180  0.277252     22200
158684949  0.85252

Finished gdal.Translate() for year 2005. Sat Mar  4 11:57:39 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 12:02:25 2023 

Created geodataframe from non-NoData points. Sat Mar  4 12:02:26 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 12:05:00 2023 

                  Z   Sett_ID
152296984  0.260291  21474836
155994426  0.980247     32773
155330276  0.436872     28329
107714095  0.125193    128614
151063309  0.068720     91945
149687322  0.933372     51723
120667565  0.134403     68400
166838139  0.628611      4108
132052207  0.029496     68069
89543624   0.097904    295995

Exported as table. Sat Mar  4 12:05:10 2023 


Desired aggregation methods applied to settlement level, year 2005. Sat Mar  4 12:05:10 2023 

      Sett_ID  POPct2000     POPsum2000  POPct2001     POPsum2001  POPct2002  \
885     38009     1676.0     387.261782     1676.0     380.783746     1676.0   
17         76      611.0      87.279992      611.0      84.341481      611.0   
439     

Finished gdal.Translate() for year 2008. Sat Mar  4 12:37:22 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 12:42:13 2023 

Created geodataframe from non-NoData points. Sat Mar  4 12:42:13 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 12:44:53 2023 

                  Z  Sett_ID
86625144   0.377989   302299
42436782   0.045092   333100
166933782  0.522417    38019
132903007  0.632899    65860
162072758  0.270004    38201
154581202  0.673421    14131
95593278   0.196823   249157
121176209  0.543752   131795
155475704  0.457571    28329
164003344  0.411325    23383

Exported as table. Sat Mar  4 12:45:02 2023 


Desired aggregation methods applied to settlement level, year 2008. Sat Mar  4 12:45:02 2023 

      Sett_ID  POPct2000   POPsum2000  POPct2001   POPsum2001  POPct2002  \
1282   127636     1344.0   521.428076     1344.0   722.995086     1344.0   
709     32830      997.0   475.385926      997.0   453.969880      997.0   
193      7367     1975.0   658.

Finished gdal.Translate() for year 2011. Sat Mar  4 13:17:25 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 13:22:03 2023 

Created geodataframe from non-NoData points. Sat Mar  4 13:22:03 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 13:24:37 2023 

                  Z  Sett_ID
99062231   0.666328   145866
168739745  0.378128    21166
119940271  0.504937   131856
130169730  0.293970   107500
101440404  0.306989   234648
137795424  0.361026    60768
155588044  0.266032     7679
175199578  0.542435        2
133422881  1.248143    65797
144123675  0.078337    81231

Exported as table. Sat Mar  4 13:24:46 2023 


Desired aggregation methods applied to settlement level, year 2011. Sat Mar  4 13:24:46 2023 

      Sett_ID  POPct2000   POPsum2000  POPct2001   POPsum2001  POPct2002  \
1320   127757      750.0   448.506403      750.0   603.177643      750.0   
561     23480     1378.0  1015.904689     1378.0  1231.836441     1378.0   
1151    60896     1110.0   440.


Exported as table. Sat Mar  4 13:51:14 2023 


Desired aggregation methods applied to settlement level, year 2013. Sat Mar  4 13:51:14 2023 

      Sett_ID  POPct2000   POPsum2000  POPct2001   POPsum2001  POPct2002  \
879     37980     1511.0   263.817148     1511.0   268.559635     1511.0   
406     18480      553.0   378.627814      553.0   412.019413      553.0   
701     32801     2858.0  2108.349583     2858.0  2283.998979     2858.0   
617     25445      973.0   491.827793      973.0   464.035705      973.0   
1228    91878     3329.0   833.307844     3329.0  1352.205955     3329.0   
1635   234525     1404.0   113.572933     1404.0    97.152483     1404.0   
1804   326068      937.0    47.740954      937.0    77.699560      937.0   
452     21154      761.0   187.308857      761.0   186.240750      761.0   
668     28617      506.0   113.052985      506.0   156.839133      506.0   
1431   131770     1263.0   434.327962     1263.0   530.202190     1263.0   

       POPsum2002  P

Saved to file. Sat Mar  4 14:17:41 2023 



---

## 9. PREPARE YEARLY DATASETS: NIGHTTIME LIGHTS

### 9.1 Reclassify with settlement buffer mask.
Reclassify so that we only need to work with cells within X distance of settlements. The two NTL sources have already been reprojected in a separate script, and cropped to Central & Western Africa.

In [18]:
MaskByZone(MaskPath = r'Results/Catchment.gpkg', MaskLayerName = 'Buff250m_2015', 
          SourceFolder = r"NTL/SourceFiles", DestFolder = 'NTL')

['D_1999_avg.tif', 'D_1999_cfc.tif', 'D_2000_avg.tif', 'D_2000_cfc.tif', 'D_2001_avg.tif', 'D_2001_cfc.tif', 'D_2002_avg.tif', 'D_2002_cfc.tif', 'D_2003_avg.tif', 'D_2003_cfc.tif', 'D_2004_avg.tif', 'D_2004_cfc.tif', 'D_2005_avg.tif', 'D_2005_cfc.tif', 'D_2006_avg.tif', 'D_2006_cfc.tif', 'D_2007_avg.tif', 'D_2007_cfc.tif', 'D_2008_avg.tif', 'D_2008_cfc.tif', 'D_2009_avg.tif', 'D_2009_cfc.tif', 'D_2010_avg.tif', 'D_2010_cfc.tif', 'D_2011_avg.tif', 'D_2011_cfc.tif', 'D_2012_avg.tif', 'D_2012_cfc.tif', 'D_2013_avg.tif', 'D_2013_cfc.tif', 'V_2012_avg.tif', 'V_2012_cfc.tif', 'V_2013_avg.tif', 'V_2013_cfc.tif', 'V_2014_avg.tif', 'V_2014_cfc.tif', 'V_2015_avg.tif', 'V_2015_cfc.tif']
Source projection:  None
Destination projection:  Africa_Albers_Equal_Area_Conic
Finished gdal.Warp() for D_1999_avg.tif. Sat Mar  4 14:17:44 2023 

We warped the data, so we'll use that file for next step.
Finished rasterio.mask.mask() for D_1999_avg.tif. Sat Mar  4 14:17:44 2023 

Written to file. Sat Mar  4 14:

Finished gdal.Warp() for D_2009_avg.tif. Sat Mar  4 14:18:28 2023 

We warped the data, so we'll use that file for next step.
Finished rasterio.mask.mask() for D_2009_avg.tif. Sat Mar  4 14:18:29 2023 

Written to file. Sat Mar  4 14:18:29 2023 

Removed intermediate file. Sat Mar  4 14:18:29 2023 

Source projection:  None
Destination projection:  Africa_Albers_Equal_Area_Conic
Finished gdal.Warp() for D_2009_cfc.tif. Sat Mar  4 14:18:30 2023 

We warped the data, so we'll use that file for next step.
Finished rasterio.mask.mask() for D_2009_cfc.tif. Sat Mar  4 14:18:30 2023 

Written to file. Sat Mar  4 14:18:30 2023 

Removed intermediate file. Sat Mar  4 14:18:30 2023 

Source projection:  None
Destination projection:  Africa_Albers_Equal_Area_Conic
Finished gdal.Warp() for D_2010_avg.tif. Sat Mar  4 14:18:32 2023 

We warped the data, so we'll use that file for next step.
Finished rasterio.mask.mask() for D_2010_avg.tif. Sat Mar  4 14:18:33 2023 

Written to file. Sat Mar  4 14:18

In [19]:
print(os.listdir('NTL/'))

['Msk_D_1999_avg.tif', 'Msk_D_1999_cfc.tif', 'Msk_D_2000_avg.tif', 'Msk_D_2000_cfc.tif', 'Msk_D_2001_avg.tif', 'Msk_D_2001_cfc.tif', 'Msk_D_2002_avg.tif', 'Msk_D_2002_cfc.tif', 'Msk_D_2003_avg.tif', 'Msk_D_2003_cfc.tif', 'Msk_D_2004_avg.tif', 'Msk_D_2004_cfc.tif', 'Msk_D_2005_avg.tif', 'Msk_D_2005_cfc.tif', 'Msk_D_2006_avg.tif', 'Msk_D_2006_cfc.tif', 'Msk_D_2007_avg.tif', 'Msk_D_2007_cfc.tif', 'Msk_D_2008_avg.tif', 'Msk_D_2008_cfc.tif', 'Msk_D_2009_avg.tif', 'Msk_D_2009_cfc.tif', 'Msk_D_2010_avg.tif', 'Msk_D_2010_cfc.tif', 'Msk_D_2011_avg.tif', 'Msk_D_2011_cfc.tif', 'Msk_D_2012_avg.tif', 'Msk_D_2012_cfc.tif', 'Msk_D_2013_avg.tif', 'Msk_D_2013_cfc.tif', 'Msk_V_2012_avg.tif', 'Msk_V_2012_cfc.tif', 'Msk_V_2013_avg.tif', 'Msk_V_2013_cfc.tif', 'Msk_V_2014_avg.tif', 'Msk_V_2014_cfc.tif', 'Msk_V_2015_avg.tif', 'Msk_V_2015_cfc.tif', 'SourceFiles']


### 9.2 Raster values summarized by settlement.
1. Convert each annualized raster to .xyz, 
2. then bring them to vector space and assign their Sett_ID,
3. and finally, aggregate the value as appropriate to the settlement level and save table to file.

XYZ is similar to .csv. Raster cell centers are stored as x and y, and their value is stored as z.

In [20]:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')[['Sett_ID', 'geometry']]

In [21]:
BatchZonalStats(FolderName = 'NTL', Zones=Settlements)

['Msk_D_1999_avg.tif', 'Msk_D_1999_cfc.tif', 'Msk_D_2000_avg.tif', 'Msk_D_2000_cfc.tif', 'Msk_D_2001_avg.tif', 'Msk_D_2001_cfc.tif', 'Msk_D_2002_avg.tif', 'Msk_D_2002_cfc.tif', 'Msk_D_2003_avg.tif', 'Msk_D_2003_cfc.tif', 'Msk_D_2004_avg.tif', 'Msk_D_2004_cfc.tif', 'Msk_D_2005_avg.tif', 'Msk_D_2005_cfc.tif', 'Msk_D_2006_avg.tif', 'Msk_D_2006_cfc.tif', 'Msk_D_2007_avg.tif', 'Msk_D_2007_cfc.tif', 'Msk_D_2008_avg.tif', 'Msk_D_2008_cfc.tif', 'Msk_D_2009_avg.tif', 'Msk_D_2009_cfc.tif', 'Msk_D_2010_avg.tif', 'Msk_D_2010_cfc.tif', 'Msk_D_2011_avg.tif', 'Msk_D_2011_cfc.tif', 'Msk_D_2012_avg.tif', 'Msk_D_2012_cfc.tif', 'Msk_D_2013_avg.tif', 'Msk_D_2013_cfc.tif', 'Msk_V_2012_avg.tif', 'Msk_V_2012_cfc.tif', 'Msk_V_2013_avg.tif', 'Msk_V_2013_cfc.tif', 'Msk_V_2014_avg.tif', 'Msk_V_2014_cfc.tif', 'Msk_V_2015_avg.tif', 'Msk_V_2015_cfc.tif']
       Sett_ID
0            1
1            2
2            3
3            5
4            6
...        ...
1934    349427
1935    350163
1936    350168
1937    35019


Exported as table. Sat Mar  4 14:29:19 2023 


Count of cloud-free observations averaged to settlement level, year 2000. Sat Mar  4 14:29:19 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
448     21148         37.0  1.259045e+40  3.402823e+38  3.402823e+38   
1203    80863        470.0  1.592521e+41  3.388343e+38  3.402823e+38   
1095    51759         48.0  1.633355e+40  3.402823e+38  3.402823e+38   
1197    68952         49.0  1.633355e+40  3.333378e+38  3.402823e+38   
1917   338611          8.0  2.381976e+39  2.977471e+38  3.402823e+38   
414     18504         24.0  8.166776e+39  3.402823e+38  3.402823e+38   
1264   107502       6326.0  2.152286e+42  3.402286e+38  3.402823e+38   
951     40631        103.0  3.504908e+40  3.402823e+38  3.402823e+38   
57        413         21.0  6.125082e+39  2.916706e+38  3.402823e+38   
530     23356        110.0  3.743106e+40  3.402823e+38  3.402823e+38   

      NTLminD_1999  NTLcfc_D_1999  NTLctD_2000  NTLsumD_20

Finished gdal.Translate() for year 2002. Sat Mar  4 14:34:04 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 14:34:08 2023 

Created geodataframe from non-NoData points. Sat Mar  4 14:34:08 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 14:36:18 2023 

                    Z  Sett_ID
1531116  3.402823e+38    28703
453857   3.402823e+38   333078
1718098  3.402823e+38     4185
756863   3.402823e+38   244062
1661071  3.402823e+38   100168
1846859  3.402823e+38    40548
313590   3.402823e+38   333092
345505   3.402823e+38   335702
1411657  3.402823e+38    87760
1483316  3.402823e+38    87636

Exported as table. Sat Mar  4 14:36:25 2023 


Desired aggregation methods applied to settlement level, year 2002. Sat Mar  4 14:36:25 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
1692   248792       1425.0  4.828606e+41  3.388496e+38  3.402823e+38   
359     14454         18.0  5.784800e+39  3.213778e+38  3.402823e+38   
1059    51517        

Finished gdal.Translate() for year 2003. Sat Mar  4 14:38:46 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 14:38:50 2023 

Created geodataframe from non-NoData points. Sat Mar  4 14:38:51 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 14:40:59 2023 

                    Z  Sett_ID
914653   3.402823e+38   263056
678343   3.402823e+38   333092
288488   3.402823e+38   333873
21352    3.402823e+38   333873
964273   3.402823e+38   244061
13719    3.402823e+38   333873
1151275  3.402823e+38   107509
837235   3.402823e+38   244248
1002351  3.402823e+38   263056
751687   3.402823e+38   244248

Exported as table. Sat Mar  4 14:41:06 2023 


Desired aggregation methods applied to settlement level, year 2003. Sat Mar  4 14:41:06 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
913     38126        119.0  4.015332e+40  3.374228e+38  3.402823e+38   
1580   152551       2642.0  8.986857e+41  3.401535e+38  3.402823e+38   
1443   131847        

Finished gdal.Translate() for year 2004. Sat Mar  4 14:43:29 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 14:43:34 2023 

Created geodataframe from non-NoData points. Sat Mar  4 14:43:34 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 14:45:40 2023 

                    Z  Sett_ID
966255   3.402823e+38   213078
100559   3.402823e+38   335702
1391686  3.402823e+38   107501
148676   3.402823e+38   335702
1166808  3.402823e+38   180711
1873505  3.402823e+38    40548
855475   3.402823e+38   244248
1353677  3.402823e+38    67973
1384886  3.402823e+38    87797
1337825  3.402823e+38    87797

Exported as table. Sat Mar  4 14:45:47 2023 


Desired aggregation methods applied to settlement level, year 2004. Sat Mar  4 14:45:47 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
416     18515        548.0  1.864747e+41  3.402823e+38  3.402823e+38   
660     28568          NaN           NaN           NaN           NaN   
1201    80861        

Finished gdal.Translate() for year 2005. Sat Mar  4 14:48:09 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 14:48:13 2023 

Created geodataframe from non-NoData points. Sat Mar  4 14:48:14 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 14:50:19 2023 

                    Z  Sett_ID
680945   3.402823e+38   350194
171073   3.402823e+38   335702
1180385  3.402823e+38   128050
1816116  3.402823e+38   100168
517494   3.402823e+38   244062
32264    3.402823e+38   335702
564201   3.402823e+38   333864
1596401  3.402823e+38    28669
204891   3.402823e+38   333873
1724993  3.402823e+38    91913

Exported as table. Sat Mar  4 14:50:26 2023 


Desired aggregation methods applied to settlement level, year 2005. Sat Mar  4 14:50:27 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
1841   337712          8.0  2.381976e+39  2.977471e+38  3.402823e+38   
1446   131854         26.0  8.847341e+39  3.402823e+38  3.402823e+38   
1355   128138        

Finished gdal.Translate() for year 2006. Sat Mar  4 14:52:51 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 14:52:55 2023 

Created geodataframe from non-NoData points. Sat Mar  4 14:52:55 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 14:55:02 2023 

                    Z  Sett_ID
1413978  3.402823e+38   102323
1435699  3.402823e+38    65834
18258    3.402823e+38   335702
905299   3.402823e+38   314094
1383124  3.402823e+38   107501
934163   3.402823e+38   313655
7557     3.402823e+38   335702
1419701  3.402823e+38    65864
588566   3.402823e+38   333092
1490818  3.402823e+38    91880

Exported as table. Sat Mar  4 14:55:09 2023 


Desired aggregation methods applied to settlement level, year 2006. Sat Mar  4 14:55:09 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
1196    68883       1182.0  4.022137e+41  3.402823e+38  3.402823e+38   
1037    47735         53.0  1.803496e+40  3.402823e+38  3.402823e+38   
374     15464        

Finished gdal.Translate() for year 2007. Sat Mar  4 14:57:30 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 14:57:34 2023 

Created geodataframe from non-NoData points. Sat Mar  4 14:57:34 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 14:59:39 2023 

                    Z  Sett_ID
1514452  3.402823e+38   102273
55230    3.402823e+38   333873
689296   3.402823e+38   330617
665284   3.402823e+38   333092
1671959  3.402823e+38     7317
53138    3.402823e+38   333873
698753   3.402823e+38   330617
381477   3.402823e+38   333864
929388   3.402823e+38   256109
820401   3.402823e+38   190157

Exported as table. Sat Mar  4 14:59:46 2023 


Desired aggregation methods applied to settlement level, year 2007. Sat Mar  4 14:59:46 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
839     37816        131.0  4.457699e+40  3.402823e+38  3.402823e+38   
970     42515        360.0  1.214808e+41  3.374467e+38  3.402823e+38   
797     35071        

Finished gdal.Translate() for year 2008. Sat Mar  4 15:02:11 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 15:02:15 2023 

Created geodataframe from non-NoData points. Sat Mar  4 15:02:15 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 15:02:15 2023 

                Z   Sett_ID
1039368  2.905406    255879
1629375  3.031746     13874
1359405  2.657143     98000
1211840  3.563380    193591
1606993  2.971429  21474836
1726701  2.867647      4092
1640092  2.791045     32785
1335320  2.794118    131593
1045338  3.121212    212734
1603706  2.825397     18456

Exported as table. Sat Mar  4 15:02:15 2023 


Desired aggregation methods applied to settlement level, year 2008. Sat Mar  4 15:02:15 2023 

       Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
824      37786         23.0  7.826494e+39  3.402823e+38  3.402823e+38   
12          53        107.0  3.606993e+40  3.371021e+38  3.402823e+38   
239      13968         46.0  1.565299e+40  3.402823

Finished gdal.Translate() for year 2009. Sat Mar  4 15:02:26 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 15:02:28 2023 

Created geodataframe from non-NoData points. Sat Mar  4 15:02:28 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 15:02:29 2023 

                Z  Sett_ID
1629379  3.176471    14446
1727767  2.742857     4092
1760001  3.129032    38088
1763075  3.000000    21078
1637893  4.379310    18121
1695678  3.482759     4091
1679691  3.290323    25273
1638997  4.000000    13874
1435805  2.807692    65807
1716285  3.900000    40544

Exported as table. Sat Mar  4 15:02:29 2023 


Desired aggregation methods applied to settlement level, year 2009. Sat Mar  4 15:02:29 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
1621   212430        264.0  8.949426e+40  3.389934e+38  3.402823e+38   
1686   239384        118.0  4.015332e+40  3.402823e+38  3.402823e+38   
1811   333078        152.0  4.934094e+40  3.246114e+38  3.402823e

Finished gdal.Translate() for year 2010. Sat Mar  4 15:02:39 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 15:02:42 2023 

Created geodataframe from non-NoData points. Sat Mar  4 15:02:42 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 15:02:42 2023 

                 Z  Sett_ID
994359    3.774648   249199
1604775   4.307693    18456
1592839   4.259740    51696
1735239   5.500000     4096
1196387   6.900000   127629
1622964   3.808219    14258
455996   10.658537   333078
1240177   9.442857   131507
1239097   7.242857   131507
1828318   4.375000    21108

Exported as table. Sat Mar  4 15:02:42 2023 


Desired aggregation methods applied to settlement level, year 2010. Sat Mar  4 15:02:42 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
1295   127679         19.0  6.125082e+39  3.223727e+38  3.402823e+38   
1312   127709         20.0  6.465365e+39  3.232682e+38  3.402823e+38   
58        415         27.0  8.847341e+39  3.276793e+38

Finished gdal.Translate() for year 2011. Sat Mar  4 15:02:53 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 15:02:56 2023 

Created geodataframe from non-NoData points. Sat Mar  4 15:02:56 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 15:02:56 2023 

          Z  Sett_ID
1828309   6    21075
1610162   4    14339
1241250   9   131507
1266181   3   170456
1262641   4   131889
1690515   7    47551
1095170  10   186498
1613342   4    14303
1721429   6    35033
1681812   4    25382

Exported as table. Sat Mar  4 15:02:56 2023 


Desired aggregation methods applied to settlement level, year 2011. Sat Mar  4 15:02:56 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
873     37964         32.0  1.020847e+40  3.190147e+38  3.402823e+38   
796     35064         69.0  2.313920e+40  3.353507e+38  3.402823e+38   
1045    47772         41.0  1.361129e+40  3.319828e+38  3.402823e+38   
512     22279         44.0  1.497242e+40  3.402823e+38  3.4

Finished gdal.Translate() for year 2012. Sat Mar  4 15:03:06 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 15:03:09 2023 

Created geodataframe from non-NoData points. Sat Mar  4 15:03:09 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 15:03:09 2023 

                 Z  Sett_ID
1243378  56.808220   131507
1262323   4.516129   107503
1245515  62.743244   131507
1655332  10.209678    42500
1610132   3.565217    17085
1249796  52.837837   131507
1240179  18.594202   131507
1741842   3.854839    38144
1247649  38.013699   131507
1035090  11.186440   255879

Exported as table. Sat Mar  4 15:03:09 2023 


Desired aggregation methods applied to settlement level, year 2012. Sat Mar  4 15:03:09 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
1883   337978          1.0  3.402823e+38  3.402823e+38  3.402823e+38   
545     23386         19.0  6.125082e+39  3.223727e+38  3.402823e+38   
1252    98000       8220.0  2.796440e+42  3.401996e+38

Finished gdal.Translate() for year 2013. Sat Mar  4 15:03:20 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 15:03:23 2023 

Created geodataframe from non-NoData points. Sat Mar  4 15:03:23 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 15:03:23 2023 

                 Z  Sett_ID
1064972   3.805970   252849
1082424   3.621212   262862
1740583  17.577465     4096
1857061   6.982456        2
1036029   5.493333   145226
1649876   3.706667    47835
1734170   4.851351     4096
1616582   4.027027    14426
1617664   3.840580    32849
1246594  41.191177   131507

Exported as table. Sat Mar  4 15:03:23 2023 


Desired aggregation methods applied to settlement level, year 2013. Sat Mar  4 15:03:23 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
1236    91950       1759.0  5.982164e+41  3.400889e+38  3.402823e+38   
580     25033          5.0  1.701412e+39  3.402823e+38  3.402823e+38   
107      4125         45.0  1.531271e+40  3.402823e+38

Finished gdal.Translate() for year 2012. Sat Mar  4 15:03:45 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 15:03:55 2023 

Created geodataframe from non-NoData points. Sat Mar  4 15:03:55 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 15:03:56 2023 

                 Z  Sett_ID
5002999   7.754341   131507
6896785   0.511719    35033
4921119   1.149692   107505
4042669   1.894869   278172
4981592   3.031257   131507
6960712   1.595811     4096
4998716  10.446504   131507
7315589   0.025682    21075
6903200   0.129424    35033
6952167   1.045245     4096

Exported as table. Sat Mar  4 15:03:56 2023 


Desired aggregation methods applied to settlement level, year 2012. Sat Mar  4 15:03:56 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
1715   249199       1041.0  3.538936e+41  3.399555e+38  3.402823e+38   
272     14225         38.0  1.259045e+40  3.313275e+38  3.402823e+38   
1571   152034       1682.0  5.709938e+41  3.394731e+38

Finished gdal.Translate() for year 2013. Sat Mar  4 15:04:36 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 15:04:47 2023 

Created geodataframe from non-NoData points. Sat Mar  4 15:04:47 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 15:04:47 2023 

                 Z  Sett_ID
4966659   1.410204   131507
4821345  60.206497   127794
7056953   0.385291    21632
4981614  19.205040   131507
5017946   0.871938   131506
4996588   0.719036   131507
4998706   5.409372   131507
4977349   3.330688   131507
2816508   2.330354   349427
7003571   1.387360    23769

Exported as table. Sat Mar  4 15:04:47 2023 


Desired aggregation methods applied to settlement level, year 2013. Sat Mar  4 15:04:47 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
1686   239384        118.0  4.015332e+40  3.402823e+38  3.402823e+38   
655     28558          4.0  1.361129e+39  3.402823e+38  3.402823e+38   
1318   127742         40.0  1.327101e+40  3.317753e+38

Finished gdal.Translate() for year 2014. Sat Mar  4 15:05:27 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 15:05:37 2023 

Created geodataframe from non-NoData points. Sat Mar  4 15:05:37 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 15:05:38 2023 

                Z  Sett_ID
4505219  0.219005   137572
4973076  3.301822   131507
4936733  0.912548   131507
4941004  0.239285   131507
6969262  2.005328     4096
4968779  1.543735   131507
4973070  5.075675   131507
4955966  1.227142   131507
5015811  1.217330   131506
4990150  2.050503   131507

Exported as table. Sat Mar  4 15:05:38 2023 


Desired aggregation methods applied to settlement level, year 2014. Sat Mar  4 15:05:38 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
832     37794         36.0  1.190988e+40  3.308301e+38  3.402823e+38   
1096    51765         77.0  2.586146e+40  3.358631e+38  3.402823e+38   
241     13998        227.0  7.724409e+40  3.402823e+38  3.402823e

Finished gdal.Translate() for year 2015. Sat Mar  4 15:06:18 2023 

Loaded XYZ file as a pandas dataframe. Sat Mar  4 15:06:29 2023 

Created geodataframe from non-NoData points. Sat Mar  4 15:06:29 2023 


Joined zone ID onto vectorized raster cells. Sat Mar  4 15:06:29 2023 

                 Z  Sett_ID
4938872   0.314350   131507
6967288   3.683770    25093
5998903   1.316735    60380
4981615  32.401726   131507
4977340  28.645947   131507
4034133   1.451429   278172
4046954   0.564598   278172
4964532   0.876384   131507
4962372   4.775517   131507
5000861   5.082416   131507

Exported as table. Sat Mar  4 15:06:29 2023 


Desired aggregation methods applied to settlement level, year 2015. Sat Mar  4 15:06:29 2023 

      Sett_ID  NTLctD_1999  NTLsumD_1999  NTLavgD_1999  NTLmaxD_1999  \
1019    47554       1671.0  5.675910e+41  3.396714e+38  3.402823e+38   
861     37906        187.0  6.363280e+40  3.402823e+38  3.402823e+38   
1369   128664         13.0  4.423671e+39  3.402823e+38

Saved to file. Sat Mar  4 15:06:54 2023 



---

## 10. FLOOD EXPOSURE BY RETURN PERIOD

### 10.1 Calculate Expected Annual Depth (EAD) using exceedance probabilities of every flood return period.

##### Flood layers

In [15]:
FloodFolder = os.path.join(ProjectFolder, 'Flood')
SourceFolder = os.path.join(FloodFolder, 'fluvial_undefended')

In [17]:
InRasters = os.listdir(SourceFolder)
InRasters

['FU_1in10.tif',
 'FU_1in100.tif',
 'FU_1in1000.tif',
 'FU_1in20.tif',
 'FU_1in200.tif',
 'FU_1in250.tif',
 'FU_1in5.tif',
 'FU_1in50.tif',
 'FU_1in500.tif',
 'FU_1in75.tif']

In [25]:
Exceedances = []
    
for Raster in InRasters:
    InPath = os.path.join(SourceFolder, Raster)
    RP = re.sub('\D', '', Raster)[1:] # Get the return period
    NewFileName = Raster.replace('.tif', '_EXC.tif')
    OutPath = os.path.join(FloodFolder, NewFileName)
    
    Calc = "(1/" + RP + ")*A"

    calcShell(A=InPath, OutFile=OutPath, Calculation = Calc)
    Exceedances = Exceedances + [NewFileName]
    
print('Done with list. New flood set: %s' % Exceedances)

Running for Q:\GIS\povertyequity\urban_growth\Chad\Flood\fluvial_undefended\FU_1in10.tif. Tue Mar 14 11:19:28 2023
Ran in shell. See OutFile folder to inspect results. Tue Mar 14 11:19:43 2023
Running for Q:\GIS\povertyequity\urban_growth\Chad\Flood\fluvial_undefended\FU_1in100.tif. Tue Mar 14 11:19:43 2023
Ran in shell. See OutFile folder to inspect results. Tue Mar 14 11:19:57 2023
Running for Q:\GIS\povertyequity\urban_growth\Chad\Flood\fluvial_undefended\FU_1in1000.tif. Tue Mar 14 11:19:57 2023
Ran in shell. See OutFile folder to inspect results. Tue Mar 14 11:20:11 2023
Running for Q:\GIS\povertyequity\urban_growth\Chad\Flood\fluvial_undefended\FU_1in20.tif. Tue Mar 14 11:20:11 2023
Ran in shell. See OutFile folder to inspect results. Tue Mar 14 11:20:25 2023
Running for Q:\GIS\povertyequity\urban_growth\Chad\Flood\fluvial_undefended\FU_1in200.tif. Tue Mar 14 11:20:25 2023
Ran in shell. See OutFile folder to inspect results. Tue Mar 14 11:20:38 2023
Running for Q:\GIS\povertyequit

In [36]:
# gdal_calc doesn't always take well to adding together a large number of files, so we'll do it in 2 batches.

Calc = 'A+B+C+D+E'
OutName = 'Batch1.tif'

A = os.path.join(FloodFolder, Exceedances[0])
B = os.path.join(FloodFolder, Exceedances[1])
C = os.path.join(FloodFolder, Exceedances[2])
D = os.path.join(FloodFolder, Exceedances[3])
E = os.path.join(FloodFolder, Exceedances[4])

calcShell(A=A, B=B, C=C, D=D, E=E,
          OutFile = os.path.join(FloodFolder, OutName), 
          Calculation = Calc)


Running for Q:\GIS\povertyequity\urban_growth\Chad\Flood\FU_1in10_EXC.tif. Tue Mar 14 11:37:07 2023
Ran in shell. See OutFile folder to inspect results. Tue Mar 14 11:40:27 2023


In [37]:
Calc = 'A+B+C+D+E+F'
OutName = 'FU_ExpectedAnnualDepth.tif'

A = os.path.join(FloodFolder, 'Batch1.tif')
B = os.path.join(FloodFolder, Exceedances[5])
C = os.path.join(FloodFolder, Exceedances[6])
D = os.path.join(FloodFolder, Exceedances[7])
E = os.path.join(FloodFolder, Exceedances[8])
F = os.path.join(FloodFolder, Exceedances[9])

calcShell(A=A, B=B, C=C, D=D, E=E, F=F,
          OutFile = os.path.join(FloodFolder, OutName), 
          Calculation = Calc)

Running for Q:\GIS\povertyequity\urban_growth\Chad\Flood\Batch1.tif. Tue Mar 14 11:52:24 2023
Ran in shell. See OutFile folder to inspect results. Tue Mar 14 11:55:19 2023


### 10.2 Reclassify and resample flood data and buildup data in preparation for the impact calculation.

##### Reclassify flood as a binary: flooded / not-flooded

In [40]:
InPath = os.path.join(FloodFolder, OutName)
OutPath = os.path.join(FloodFolder, 'FU_EAD_reclassed.tif')

[xsize, ysize, geotransform, geoproj, Z] = readRaster(InPath)

Z[Z<0.15] = 0 # Not-flooded category. This includes no data cells.
Z[Z>=0.15] = 1 # Flooded category. This includes permanent water bodies.

writeRaster(OutPath,geotransform,geoproj,Z)
InPath = OutPath = None

print('Finished reclassifying. %s' % time.ctime())

Finished reclassifying. Tue Mar 14 13:26:39 2023


##### Buildup

In [41]:
WSFE = 'WSFE_equalarea.tif'
WSFEPath = os.path.join(ProjectFolder, 'Buildup', WSFE)
OutPath = os.path.join(FloodFolder, WSFE.replace('equalarea.tif', 'simplified.tif'))

[xsize, ysize, geotransform, geoproj, Z] = readRaster(WSFEPath)

np.putmask(Z, Z>0, Z-1984) # All years now converted to at most 2 digits: 1-31. (All non-buildup = 0)

writeRaster(OutPath,geotransform,geoproj,Z)

print('\nSimplified buildup file: %s' % OutPath)


Simplified buildup file: Q:\GIS\povertyequity\urban_growth\Chad\Flood\WSFE_simplified.tif


##### Resample flood to match buildup

In [42]:
WSFEPath = os.path.join(FloodFolder, 'WSFE_simplified.tif') 

RasterPath = os.path.join(FloodFolder, 'FU_EAD_reclassed.tif')
OutPath = os.path.join(FloodFolder, 'FU_EAD_resampled.tif')
resampleRaster(RasterPath, WSFEPath, OutPath)
    
print('Resampled to match WSFE. %s' % time.ctime())

Loading for Q:\GIS\povertyequity\urban_growth\Chad\Flood\FU_EAD_reclassed.tif. Tue Mar 14 13:37:10 2023
---Specs to match to: 
 PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]] 
 (-1208067.6740575756, 29.56099669088672, 0.0, 2693319.1828761376, 0.0, -29.56099669088672) 
 37348 
 61455 

---Created raster file for upsampled version. Tue Mar 14 13:37:20 2023
---Resampled values onto an empty raster matching the dimensions of the buildup layer. Tue M

### 10.3 Mask out built areas that were not flooded.

In [45]:
# WSFEPath = os.path.join(FloodFolder, 'WSFE_simplified.tif') 
    
InPath = os.path.join(FloodFolder, 'FU_EAD_resampled.tif')
OutPath = os.path.join(FloodFolder, 'FU_EAD_WSFEimpact.tif')

calcShell(A=WSFEPath, B=InPath, OutFile=OutPath, Calculation="A*B", OutType=" --type=Byte")
    
print('Done. Only built-up cells that have been flooded remain.. %s' % time.ctime())

Running for Q:\GIS\povertyequity\urban_growth\Chad\Flood\WSFE_simplified.tif. Tue Mar 14 13:54:24 2023
Ran in shell. See OutFile folder to inspect results. Tue Mar 14 13:57:58 2023
Done. Only built-up cells that have been flooded remain.. Tue Mar 14 13:57:58 2023


### 10.4 Join with Settlements via concatenation
Using the serial method, combine settlement IDs with 1) WSFE year cells and 2) the flooded-only WSFE year cells under each scenario.

##### Rasterize the settlements we've created.

In [17]:
# WSFEPath = os.path.join(FloodFolder, 'WSFE_simplified.tif') 
OutSett = os.path.join(ResultsFolder, 'Settlements2015_rasterized.tif')
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')[['Sett_ID', 'geometry']]

len_WSFE = 2 # We already know that the WSFE years are reclassified to 1-31, i.e. a max of 2 digits.

In [None]:
ShapeToRaster(Shapefile=Settlements, ValueVar="Sett_ID", MetaRasterPath=WSFEPath, OutFilePath=OutSett, NewDType = 'uint32')

##### Calculations

In [31]:
Calc = "(A*" + str(10**len_WSFE) + ")+B" 

FloodImpactPath = os.path.join(FloodFolder, 'FU_EAD_WSFEimpact.tif')
FloodSerialPath = os.path.join(FloodFolder, 'FU_Settlements_serial.tif')

#WSFEPath = os.path.join(FloodFolder, 'WSFE_simplified.tif')
WSFESerialPath = os.path.join(FloodFolder, 'WSFE_Settlements_serial.tif')

In [59]:
calcShell(A=OutSett, B=FloodImpactPath, OutFile=FloodSerialPath, Calculation=Calc)
calcShell(A=OutSett, B=WSFEPath, OutFile=WSFESerialPath, Calculation=Calc)

Running for Q:\GIS\povertyequity\urban_growth\Chad\Results\Settlements2015_rasterized.tif. Tue Mar 14 15:14:01 2023
Ran in shell. See OutFile folder to inspect results. Tue Mar 14 15:18:13 2023
Running for Q:\GIS\povertyequity\urban_growth\Chad\Results\Settlements2015_rasterized.tif. Tue Mar 14 15:18:13 2023
Ran in shell. See OutFile folder to inspect results. Tue Mar 14 15:20:25 2023


### 10.5 Vector math to split raster strings into Settlement and WSFE year assignments.

##### Vectorize

In [16]:
FloodVec = 'FloodedBuildup.shp' # Was having write issues when putting both in the same gpkg, so we're settling for .shp.
FloodVecPath = os.path.join(FloodFolder, FloodVec)
BuildVec = 'AllBuildup.shp' 
BuildVecPath = os.path.join(FloodFolder, BuildVec)

In [33]:
RasterToShapefile(InRasterPath=FloodSerialPath, OutFilePath=FloodVecPath, 
                  OutName='', VariableName='gridcode')
RasterToShapefile(InRasterPath=WSFESerialPath, OutFilePath=BuildVecPath, 
                  OutName='', VariableName='gridcode')

PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]] 

 PROJCS["Africa_Albers_Equal_Area_Conic",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Albers_Conic_

##### Split string into separate fields

In [17]:
Sett_rio = rasterio.open(os.path.join(ResultsFolder, 'Settlements2015_rasterized.tif')).read(1)
len_Sett = len(str(Sett_rio.max()))
Sett_rio = None

Fill = len_Sett + 2 # Add the digits stored as len_WSFE # or just write +2 since we already know the length of reclassed WSFE.

OutPackage = os.path.join(FloodFolder, 'FloodedSettlements.gpkg')

In [20]:
# Load newly created vectorized datasets.
for File in [FloodVec, BuildVec]:
    InObject = gpd.read_file(os.path.join(FloodFolder, File)).to_crs("ESRI:102022")
    print(InObject.info(), '\n\n', InObject.sample(10), '\n\n', InObject.crs, '\n\n', InObject['gridcode'].max())
    
    InObject['gridstring'] = InObject['gridcode'].astype(str).str.zfill(Fill)

    InObject['Sett_ID'] = InObject['gridstring'].str[:-2].astype(int) # Remove the last digits to get the Sett ID portion.
    InObject['year'] = InObject['gridstring'].str[-2:].astype(int) # Keep only the last digits to get the year portion.
    InObject['year'] = np.where(InObject['year'] > 0, InObject['year'] + 1984, InObject['year']) # Reclass back to year value.
    
    print('%s Serial split by year of buildup and Sett ID.\n\n' % time.ctime(), InObject.sample(10))
    
    # Remove features where year or settlement = 0.
    print("%s Before: %s\n" % (File, InObject.shape))
    InObject = InObject.loc[(InObject["year"] >1984) & (InObject["year"] < 2016) & (InObject["Sett_ID"] != 0)] 
    print("%s After: %s\n" % (File, InObject.shape))

    # Save intermediate file.
    InObject.to_file(driver='GPKG', filename=OutPackage, layer=File.replace('.shp', ''))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 8946 entries, 0 to 8945
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   gridcode  8946 non-null   int64   
 1   geometry  8946 non-null   geometry
dtypes: geometry(1), int64(1)
memory usage: 139.9 KB
None 

       gridcode                                           geometry
3414  13150701  POLYGON ((-1043028.630 1424502.083, -1042969.5...
6425  13150601  POLYGON ((-1035963.551 1415633.784, -1035933.9...
3303  13150717  POLYGON ((-1040811.555 1424679.449, -1040781.9...
7052   8085929  POLYGON ((-864539.332 1212313.249, -864509.771...
6342  13150620  POLYGON ((-1035845.307 1415692.906, -1035815.7...
906   12762922  POLYGON ((-998155.037 1464645.916, -998125.476...
2536  13150730  POLYGON ((-1041254.970 1426660.036, -1041225.4...
7959   3283215  POLYGON ((-909767.656 1093537.164, -909678.973...
7756   2828611  POLYGON ((-906309.020 1106041.466, -906279.459...
785  

### 10.6 Group by settlement and count cells for each year.

##### Flooded buildings

In [33]:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')
AllSummaries = pd.DataFrame(Settlements).drop(columns='geometry')[['Sett_ID']]
Settlements = None

ValObject = pd.DataFrame(gpd.read_file(OutPackage, layer='FloodedBuildup'))[['Sett_ID', 'year']]

print(AllSummaries.info(), '\n', AllSummaries.sample(10), '\n', ValObject.info(), '\n', ValObject.sample(10))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1939 entries, 0 to 1938
Data columns (total 1 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Sett_ID  1939 non-null   int64
dtypes: int64(1)
memory usage: 15.3 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8689 entries, 0 to 8688
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Sett_ID  8689 non-null   int64
 1   year     8689 non-null   int64
dtypes: int64(2)
memory usage: 135.9 KB
None 
       Sett_ID
993     42781
1835   337702
300     14332
50        384
363     14957
1552   146131
251     14115
1633   234510
1286   127646
749     33670 
 None 
       Sett_ID  year
3914   131507  2000
220    127985  2000
5666   131506  2012
7531    28286  1999
2216   131507  2006
1800   131507  2003
6549   131507  2004
7496    28286  2001
6698   131629  2004
4773   337921  2004


In [34]:
for BuiltYear in AllStudyYears:
    GroupedVals = ValObject[
        ValObject['year']<=BuiltYear].groupby(
        'Sett_ID', as_index=False)
    
    VariableName = ''.join(['FLDct_', str(BuiltYear)])
    
    AllSummaries = AllSummaries.merge(GroupedVals.count().rename(columns={'year': VariableName}), how = 'left', on='Sett_ID')

    print('\nDesired aggregation methods applied to settlement level, year %s. %s \n' % (BuiltYear, time.ctime()))

    # Save in-progress results
    AllSummaries.to_csv(os.path.join(FloodFolder, 'FloodedCellCount1999to2015.csv'))
    print(AllSummaries.sort_values(by=AllSummaries.columns[1], ascending=False).head(10))


Desired aggregation methods applied to settlement level, year 1999. Tue Mar 14 19:45:40 2023 

       Sett_ID  FLDct_1999
1403    131507       833.0
1402    131506       441.0
1276    127629       170.0
96        4096        75.0
633      28286        71.0
1938  21474836        64.0
969      42500        61.0
592      25275        47.0
1835    337702        45.0
654      28555        44.0

Desired aggregation methods applied to settlement level, year 2000. Tue Mar 14 19:45:40 2023 

       Sett_ID  FLDct_1999  FLDct_2000
1403    131507       833.0       981.0
1402    131506       441.0       495.0
1276    127629       170.0       186.0
96        4096        75.0        89.0
633      28286        71.0       102.0
1938  21474836        64.0        66.0
969      42500        61.0        64.0
592      25275        47.0        59.0
1835    337702        45.0        54.0
654      28555        44.0        47.0

Desired aggregation methods applied to settlement level, year 2001. Tue Mar 14 19


Desired aggregation methods applied to settlement level, year 2011. Tue Mar 14 19:45:40 2023 

       Sett_ID  FLDct_1999  FLDct_2000  FLDct_2001  FLDct_2002  FLDct_2003  \
1403    131507       833.0       981.0      1072.0      1168.0      1311.0   
1402    131506       441.0       495.0       539.0       618.0       684.0   
1276    127629       170.0       186.0       187.0       204.0       242.0   
96        4096        75.0        89.0        97.0       117.0       131.0   
633      28286        71.0       102.0       125.0       127.0       128.0   
1938  21474836        64.0        66.0        75.0        90.0       103.0   
969      42500        61.0        64.0        75.0        81.0        81.0   
592      25275        47.0        59.0        60.0        62.0        65.0   
1835    337702        45.0        54.0        56.0        57.0        58.0   
654      28555        44.0        47.0        72.0        84.0        87.0   

      FLDct_2004  FLDct_2005  FLDct_2006  FLD

##### All buildings

In [35]:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')
AllSummaries = pd.DataFrame(Settlements).drop(columns='geometry')[['Sett_ID']]
Settlements = None
AllSummaries

ValObject = pd.DataFrame(gpd.read_file(OutPackage, layer='AllBuildup'))[['Sett_ID', 'year']]

In [36]:
for BuiltYear in AllStudyYears:
    GroupedVals = ValObject[
        ValObject['year']<=BuiltYear].groupby(
        'Sett_ID', as_index=False)
    
    VariableName = ''.join(['BLDct_', str(BuiltYear)])
    
    AllSummaries = AllSummaries.merge(GroupedVals.count().rename(columns={'year': VariableName}), how = 'left', on='Sett_ID')

    print('\nDesired aggregation methods applied to settlement level, year %s. %s \n' % (BuiltYear, time.ctime()))

    # Save in-progress results
    AllSummaries.to_csv(os.path.join(FloodFolder, 'BuiltCellCount1999to2015.csv'))
    print(AllSummaries.sort_values(by=AllSummaries.columns[1], ascending=False).head(10))


Desired aggregation methods applied to settlement level, year 1999. Tue Mar 14 19:46:08 2023 

      Sett_ID  BLDct_1999
1403   131507      5985.0
96       4096      1988.0
969     42500      1934.0
1745   278172      1476.0
592     25275       896.0
1402   131506       741.0
1016    47551       608.0
1731   262678       595.0
524     23321       561.0
233     13874       531.0

Desired aggregation methods applied to settlement level, year 2000. Tue Mar 14 19:46:08 2023 

      Sett_ID  BLDct_1999  BLDct_2000
1403   131507      5985.0      7188.0
96       4096      1988.0      2396.0
969     42500      1934.0      2072.0
1745   278172      1476.0      1547.0
592     25275       896.0       991.0
1402   131506       741.0       859.0
1016    47551       608.0       692.0
1731   262678       595.0       648.0
524     23321       561.0       665.0
233     13874       531.0       569.0

Desired aggregation methods applied to settlement level, year 2001. Tue Mar 14 19:46:08 2023 

      Se


Desired aggregation methods applied to settlement level, year 2009. Tue Mar 14 19:46:08 2023 

      Sett_ID  BLDct_1999  BLDct_2000  BLDct_2001  BLDct_2002  BLDct_2003  \
1403   131507      5985.0      7188.0      8185.0      9285.0     10675.0   
96       4096      1988.0      2396.0      2627.0      2894.0      3005.0   
969     42500      1934.0      2072.0      2248.0      2331.0      2361.0   
1745   278172      1476.0      1547.0      1574.0      1785.0      2212.0   
592     25275       896.0       991.0      1023.0      1038.0      1057.0   
1402   131506       741.0       859.0       957.0      1101.0      1280.0   
1016    47551       608.0       692.0       738.0       753.0       755.0   
1731   262678       595.0       648.0       662.0       687.0       701.0   
524     23321       561.0       665.0       689.0       716.0       727.0   
233     13874       531.0       569.0       590.0       628.0       743.0   

      BLDct_2004  BLDct_2005  BLDct_2006  BLDct_2007  BL


Desired aggregation methods applied to settlement level, year 2015. Tue Mar 14 19:46:08 2023 

      Sett_ID  BLDct_1999  BLDct_2000  BLDct_2001  BLDct_2002  BLDct_2003  \
1403   131507      5985.0      7188.0      8185.0      9285.0     10675.0   
96       4096      1988.0      2396.0      2627.0      2894.0      3005.0   
969     42500      1934.0      2072.0      2248.0      2331.0      2361.0   
1745   278172      1476.0      1547.0      1574.0      1785.0      2212.0   
592     25275       896.0       991.0      1023.0      1038.0      1057.0   
1402   131506       741.0       859.0       957.0      1101.0      1280.0   
1016    47551       608.0       692.0       738.0       753.0       755.0   
1731   262678       595.0       648.0       662.0       687.0       701.0   
524     23321       561.0       665.0       689.0       716.0       727.0   
233     13874       531.0       569.0       590.0       628.0       743.0   

      BLDct_2004  BLDct_2005  BLDct_2006  BLDct_2007  BL

### 10.7 Calculate area and percent flooded and save to file

In [29]:
BuiltArea = pd.read_csv(os.path.join(FloodFolder, 'BuiltCellCount1999to2015.csv'))
Flood = pd.read_csv(os.path.join(FloodFolder, 'FloodedCellCount1999to2015.csv'))
Areas = pd.read_csv(os.path.join(ResultsFolder, 'Areas1999to2015.csv'))

for Dataset in [BuiltArea, Flood, Areas]:
    if 'Unnamed: 0' in Dataset.columns:
        Dataset.drop(columns='Unnamed: 0', inplace=True)
    else:
        pass
    print(Dataset.info())

Stats = reduce(lambda  left,right: pd.merge(left,right,on=['Sett_ID'],
                                            how='outer'), [BuiltArea, Flood, Areas])
print(Stats.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1939 entries, 0 to 1938
Data columns (total 18 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Sett_ID     1939 non-null   int64  
 1   BLDct_1999  1662 non-null   float64
 2   BLDct_2000  1691 non-null   float64
 3   BLDct_2001  1710 non-null   float64
 4   BLDct_2002  1737 non-null   float64
 5   BLDct_2003  1750 non-null   float64
 6   BLDct_2004  1763 non-null   float64
 7   BLDct_2005  1781 non-null   float64
 8   BLDct_2006  1791 non-null   float64
 9   BLDct_2007  1798 non-null   float64
 10  BLDct_2008  1809 non-null   float64
 11  BLDct_2009  1824 non-null   float64
 12  BLDct_2010  1838 non-null   float64
 13  BLDct_2011  1850 non-null   float64
 14  BLDct_2012  1873 non-null   float64
 15  BLDct_2013  1889 non-null   float64
 16  BLDct_2014  1917 non-null   float64
 17  BLDct_2015  1939 non-null   int64  
dtypes: float64(16), int64(2)
memory usage: 272.8 KB
None
<class 'pandas.

In [30]:
# Quick spot-checking. Number of flood cells should always be less than or equal to number of built area cells.
Check1 = (Stats['FLDct_2007'] > Stats['BLDct_2007']).sum()
Check2 = (Stats['FLDct_2000'] > Stats['BLDct_2000']).sum()
Check3 = (Stats['FLDct_2011'] > Stats['BLDct_2011']).sum()
print(Check1, Check2, Check3) # All should be zero. 

2 3 0


##### Percent flooded

In [31]:
for year in AllStudyYears:
    RawVar = ''.join(['FLDct_', str(year)])
    DenomVar = ''.join(['BLDct_', str(year)])
    NewVar = ''.join(['FLDpc', str(year)])
    if ((RawVar in Stats.columns) and (DenomVar in Stats.columns)):
        Stats[NewVar] = Stats[RawVar] / Stats[DenomVar]
    else:
        pass
Stats.sort_values(by='FLDpc2005', ascending=False).head(10)

Unnamed: 0,Sett_ID,BLDct_1999,BLDct_2000,BLDct_2001,BLDct_2002,BLDct_2003,BLDct_2004,BLDct_2005,BLDct_2006,BLDct_2007,...,FLDpc2006,FLDpc2007,FLDpc2008,FLDpc2009,FLDpc2010,FLDpc2011,FLDpc2012,FLDpc2013,FLDpc2014,FLDpc2015
1390,129929,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,2.0,2.0,1.0,0.666667,0.666667,0.666667,0.75,0.571429,0.5,0.5
988,42736,5.0,5.0,5.0,6.0,7.0,7.0,7.0,7.0,7.0,...,1.142857,1.142857,1.142857,1.0,1.0,1.0,1.0,1.0,0.9,0.9
678,29307,,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,0.5,0.5,0.5,0.333333,0.333333,0.333333,0.333333,0.333333
1846,337719,2.0,2.0,2.0,2.0,2.0,2.0,2.0,3.0,3.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1845,337718,3.0,3.0,3.0,3.0,5.0,8.0,15.0,17.0,20.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
724,32873,1.0,1.0,1.0,3.0,3.0,3.0,3.0,4.0,4.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1842,337713,1.0,1.0,1.0,1.0,2.0,2.0,3.0,3.0,4.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
964,41701,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,...,0.5,0.5,0.5,0.5,0.5,0.333333,0.333333,0.25,0.25,0.166667
1833,337698,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1005,44861,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


##### Area flooded

In [32]:
for year in AllStudyYears:
    RawVar = ''.join(['FLDpc', str(year)])
    DenomVar = ''.join(['AREA', str(year)])
    NewVar = ''.join(['FLDarea', str(year)])
    if ((RawVar in Stats.columns) and (DenomVar in Stats.columns)):
        Stats[NewVar] = Stats[RawVar] * Stats[DenomVar]
    else:
        pass
Stats.sort_values(by='FLDarea2005', ascending=False).head(10)

Unnamed: 0,Sett_ID,BLDct_1999,BLDct_2000,BLDct_2001,BLDct_2002,BLDct_2003,BLDct_2004,BLDct_2005,BLDct_2006,BLDct_2007,...,FLDarea2006,FLDarea2007,FLDarea2008,FLDarea2009,FLDarea2010,FLDarea2011,FLDarea2012,FLDarea2013,FLDarea2014,FLDarea2015
1403,131507,5985.0,7188.0,8185.0,9285.0,10675.0,11951.0,13487.0,14606.0,15713.0,...,16.655587,16.970924,17.700706,17.972061,18.463154,19.555881,20.139341,21.477919,22.011602,22.385899
1402,131506,741.0,859.0,957.0,1101.0,1280.0,1510.0,1671.0,1818.0,1995.0,...,5.79469,6.013235,6.103088,6.161564,6.345808,6.433329,6.570747,6.693979,6.720239,6.77941
1276,127629,180.0,196.0,197.0,215.0,259.0,277.0,299.0,310.0,314.0,...,2.097866,2.104376,2.116532,2.11092,2.113387,2.10784,2.103588,2.102753,2.104835,2.104835
1938,21474836,288.0,309.0,359.0,448.0,502.0,526.0,552.0,616.0,648.0,...,1.411002,1.373302,1.344638,1.31777,1.291951,1.258767,1.239307,1.227914,1.191595,1.104724
96,4096,1988.0,2396.0,2627.0,2894.0,3005.0,3305.0,3382.0,3398.0,3409.0,...,1.236166,1.240035,1.248213,1.237548,1.24368,1.241821,1.240363,1.240796,1.272527,1.341342
1272,127625,43.0,59.0,68.0,69.0,81.0,103.0,116.0,129.0,133.0,...,0.688244,0.697663,0.719378,0.757705,0.790788,0.794843,0.801648,0.803049,0.809131,0.814482
969,42500,1934.0,2072.0,2248.0,2331.0,2361.0,2369.0,2406.0,2425.0,2522.0,...,0.650003,0.732983,0.740743,0.748107,0.780385,0.81323,0.8321,0.843429,0.839896,0.878839
592,25275,896.0,991.0,1023.0,1038.0,1057.0,1103.0,1162.0,1169.0,1175.0,...,0.482084,0.486844,0.483259,0.478688,0.475,0.46952,0.462475,0.459279,0.46422,0.460513
1835,337702,57.0,73.0,79.0,80.0,82.0,86.0,94.0,100.0,104.0,...,0.365655,0.366816,0.365156,0.369984,0.36229,0.359633,0.362051,0.3624,0.3624,0.3624
967,42498,80.0,98.0,137.0,165.0,171.0,172.0,176.0,185.0,200.0,...,0.35904,0.343599,0.354604,0.349997,0.361921,0.360633,0.341076,0.345971,0.339573,0.343065


In [33]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|FLDpc|FLDarea')]
Stats.columns

Index(['Sett_ID', 'FLDpc1999', 'FLDpc2000', 'FLDpc2001', 'FLDpc2002',
       'FLDpc2003', 'FLDpc2004', 'FLDpc2005', 'FLDpc2006', 'FLDpc2007',
       'FLDpc2008', 'FLDpc2009', 'FLDpc2010', 'FLDpc2011', 'FLDpc2012',
       'FLDpc2013', 'FLDpc2014', 'FLDpc2015', 'FLDarea1999', 'FLDarea2000',
       'FLDarea2001', 'FLDarea2002', 'FLDarea2003', 'FLDarea2004',
       'FLDarea2005', 'FLDarea2006', 'FLDarea2007', 'FLDarea2008',
       'FLDarea2009', 'FLDarea2010', 'FLDarea2011', 'FLDarea2012',
       'FLDarea2013', 'FLDarea2014', 'FLDarea2015'],
      dtype='object')

In [34]:
# Save to file
Stats.to_csv(os.path.join(ResultsFolder, 'Flood1999to2015.csv'))

## 11. GROWTH STATISTICS

### 11.1 Load and prep.

In [125]:
AllStudyYears = ListFromRange(1999, 2015)

In [126]:
PlaceNames = pd.read_csv(os.path.join(ResultsFolder, 'PlaceNames.csv'))
Areas = pd.read_csv(os.path.join(ResultsFolder, 'Areas1999to2015.csv'))
Population = pd.read_csv(os.path.join(ResultsFolder, 'POP2000to2015.csv'))
NTL = pd.read_csv(os.path.join(ResultsFolder, 'NTL1999to2015.csv'))
Flood = pd.read_csv(os.path.join(ResultsFolder, 'Flood1999to2015.csv'))

RawValues = [PlaceNames, Areas, Population, NTL, Flood]

for Dataset in RawValues:
    if 'Unnamed: 0' in Dataset.columns:
        Dataset.drop(columns='Unnamed: 0', inplace=True)
    else:
        pass
    if 'year' in Dataset.columns:
        Dataset.drop(columns='year', inplace=True)
    else:
        pass
    print(Dataset.info(verbose=True))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1939 entries, 0 to 1938
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Sett_ID   1939 non-null   int64 
 1   SettName  1939 non-null   object
dtypes: int64(1), object(1)
memory usage: 30.4+ KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1939 entries, 0 to 1938
Data columns (total 19 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Sett_ID   1939 non-null   int64  
 1   ADM_ID    1939 non-null   int64  
 2   AREA2015  1939 non-null   float64
 3   AREA2014  1917 non-null   float64
 4   AREA2013  1889 non-null   float64
 5   AREA2012  1873 non-null   float64
 6   AREA2011  1850 non-null   float64
 7   AREA2010  1838 non-null   float64
 8   AREA2009  1824 non-null   float64
 9   AREA2008  1809 non-null   float64
 10  AREA2007  1798 non-null   float64
 11  AREA2006  1791 non-null   float64
 12  AREA2005  1781 non-null   fl

In [127]:
AllStats = reduce(lambda  left,right: pd.merge(left,right,on=['Sett_ID'],
                                            how='outer'), RawValues)
AllStats.to_csv(os.path.join(ResultsFolder, 'AllStats.csv'))

AllStats[AllStats.SettName!='UNK'].sample(5)

Unnamed: 0,Sett_ID,SettName,ADM_ID,AREA2015,AREA2014,AREA2013,AREA2012,AREA2011,AREA2010,AREA2009,...,FLDarea2006,FLDarea2007,FLDarea2008,FLDarea2009,FLDarea2010,FLDarea2011,FLDarea2012,FLDarea2013,FLDarea2014,FLDarea2015
482,22184,Kokanti,25,0.332938,0.332938,0.332064,0.33119,0.327695,0.326821,0.326821,...,,,,,,,,,,
590,25273,Bebalem,23,0.581986,0.581986,0.581986,0.581986,0.581986,0.580238,0.574121,...,0.01019,0.01019,0.010131,0.010072,0.009961,0.009906,0.009906,0.009906,0.009906,0.009906
1191,68616,Massenya,5,0.602958,0.5715,0.512078,0.502465,0.49897,0.494601,0.488484,...,,,,,,,,,0.003211,0.006216
1615,193591,Mangalme,11,0.191374,0.183509,0.180014,0.16778,0.15205,0.150303,0.150303,...,,,,,,,,,,
233,13874,Kelo,63,13.736088,13.609379,13.486166,13.427618,13.340233,13.234496,13.183813,...,0.153455,0.15133,0.14896,0.146487,0.181677,0.183641,0.211165,0.214634,0.223655,0.212798


### 11.2 Change over time of raw variables
pch = percent change

#### Population change

In [128]:
Stats = PlaceNames.copy().merge(Population, how = 'outer', on='Sett_ID')
for year in AllStudyYears:
    RawVar = ''.join(['POPsum', str(year)])
    LagVar = ''.join(['POPsum', str(year-1)])
    NewVar = ''.join(['POPpch', str(year)])
    if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
        Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
    else:
        pass

In [129]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|pch')]
Stats.columns

Index(['Sett_ID', 'SettName', 'POPpch2001', 'POPpch2002', 'POPpch2003',
       'POPpch2004', 'POPpch2005', 'POPpch2006', 'POPpch2007', 'POPpch2008',
       'POPpch2009', 'POPpch2010', 'POPpch2011', 'POPpch2012', 'POPpch2013',
       'POPpch2014', 'POPpch2015'],
      dtype='object')

In [130]:
Stats.to_csv(os.path.join(ResultsFolder, 'PopChange.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,ADM_ID,AREA2015,AREA2014,AREA2013,AREA2012,AREA2011,AREA2010,AREA2009,...,POPpch2006,POPpch2007,POPpch2008,POPpch2009,POPpch2010,POPpch2011,POPpch2012,POPpch2013,POPpch2014,POPpch2015
1403,131507,Koundoul II / Sara,6,148.711349,147.762345,146.154456,143.143161,141.426914,138.523102,137.290096,...,0.053176,0.063984,0.037304,0.018646,0.009433,0.016592,0.023282,0.074007,0.010564,0.0416
96,4096,Moundou,20,26.606188,26.238296,26.044301,26.01022,25.977888,25.748065,25.50426,...,0.06977,-0.045907,0.156758,-0.038563,-0.071351,0.017607,0.013552,-0.02205,0.05495,-0.044796
1745,278172,Abéché,41,23.189424,22.910666,22.147792,21.331614,20.079383,19.196792,18.938132,...,0.269676,-0.155332,0.378849,0.035748,-0.017689,0.038792,-0.253827,0.075921,-0.124336,0.080708
969,42500,Sarh,62,18.003984,17.899121,17.775908,17.701631,17.613372,17.545211,17.498897,...,0.202189,-0.09819,0.143968,0.114475,-0.189461,0.082034,-0.040413,0.063462,-0.094357,0.053194
233,13874,Kelo,63,13.736088,13.609379,13.486166,13.427618,13.340233,13.234496,13.183813,...,0.108237,0.023285,0.04575,-0.053744,-0.009072,0.104343,-0.022251,0.091664,-0.072681,-6.9e-05
792,35033,Doba,24,12.879712,12.707563,12.585224,12.495217,12.39385,12.350158,12.333555,...,0.150725,-0.051698,0.128101,-0.011095,-0.144326,0.086058,0.120852,-0.021857,-0.001727,0.009177
592,25275,Benoye,23,8.207223,8.19062,8.16353,8.148675,8.120712,8.088379,8.077019,...,0.08549,0.025476,0.030602,-0.061035,0.030421,0.085674,-0.009384,0.083119,-0.063343,0.008611
1938,21474836,Lai,47,8.330436,8.06828,7.910987,7.903122,7.867294,7.784278,7.736216,...,0.023543,-0.062737,0.063371,0.014194,0.004281,0.090717,-0.007012,-0.016821,0.065494,-0.017465
524,23321,Bébédja,27,5.978899,5.945693,5.926468,5.92472,5.920351,5.906369,5.882775,...,0.091217,0.034232,0.054047,0.168066,-0.097511,-0.020662,0.073291,0.096089,-0.022835,-0.00081
1122,60380,Bongor,33,5.251854,5.005427,4.779973,4.598212,4.539664,4.500341,4.364893,...,0.082878,-0.113357,0.205501,-0.053385,-0.015855,0.106654,0.052697,-0.035693,0.084758,-0.001532


#### Area change

In [131]:
Stats = PlaceNames.copy().merge(Areas, how = 'outer', on='Sett_ID')
for year in AllStudyYears:
    RawVar = ''.join(['AREA', str(year)])
    LagVar = ''.join(['AREA', str(year-1)])
    NewVar = ''.join(['AREApch', str(year)])
    if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
        Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
    else:
        pass

In [132]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|pch')]
Stats.columns

Index(['Sett_ID', 'SettName', 'AREApch2000', 'AREApch2001', 'AREApch2002',
       'AREApch2003', 'AREApch2004', 'AREApch2005', 'AREApch2006',
       'AREApch2007', 'AREApch2008', 'AREApch2009', 'AREApch2010',
       'AREApch2011', 'AREApch2012', 'AREApch2013', 'AREApch2014',
       'AREApch2015'],
      dtype='object')

In [133]:
Stats.to_csv(os.path.join(ResultsFolder, 'AreaChange.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,ADM_ID,AREA2015,AREA2014,AREA2013,AREA2012,AREA2011,AREA2010,AREA2009,...,AREApch2006,AREApch2007,AREApch2008,AREApch2009,AREApch2010,AREApch2011,AREApch2012,AREApch2013,AREApch2014,AREApch2015
1403,131507,Koundoul II / Sara,6,148.711349,147.762345,146.154456,143.143161,141.426914,138.523102,137.290096,...,0.01504,0.017585,0.010913,0.008894,0.008981,0.020963,0.012135,0.021037,0.011001,0.006423
96,4096,Moundou,20,26.606188,26.238296,26.044301,26.01022,25.977888,25.748065,25.50426,...,0.000668,0.000457,0.001617,0.023998,0.009559,0.008926,0.001245,0.00131,0.007449,0.014021
1745,278172,Abéché,41,23.189424,22.910666,22.147792,21.331614,20.079383,19.196792,18.938132,...,0.022164,0.028295,0.015508,0.008986,0.013658,0.045976,0.062364,0.038261,0.034445,0.012167
969,42500,Sarh,62,18.003984,17.899121,17.775908,17.701631,17.613372,17.545211,17.498897,...,0.001162,0.006811,0.001704,0.001701,0.002647,0.003885,0.005011,0.004196,0.006931,0.005859
233,13874,Kelo,63,13.736088,13.609379,13.486166,13.427618,13.340233,13.234496,13.183813,...,0.001266,0.001131,0.001329,0.001195,0.003844,0.007989,0.006551,0.00436,0.009136,0.00931
792,35033,Doba,24,12.879712,12.707563,12.585224,12.495217,12.39385,12.350158,12.333555,...,0.005614,0.0058,0.010524,0.006777,0.001346,0.003538,0.008179,0.007203,0.009721,0.013547
592,25275,Benoye,23,8.207223,8.19062,8.16353,8.148675,8.120712,8.088379,8.077019,...,0.001087,0.00076,0.001085,0.001408,0.001406,0.003997,0.003443,0.001823,0.003318,0.002027
1938,21474836,Lai,47,8.330436,8.06828,7.910987,7.903122,7.867294,7.784278,7.736216,...,0.020348,0.006189,0.003303,0.005109,0.006213,0.010665,0.004554,0.000995,0.019883,0.032492
524,23321,Bébédja,27,5.978899,5.945693,5.926468,5.92472,5.920351,5.906369,5.882775,...,0.010937,0.001953,0.004799,0.004776,0.004011,0.002367,0.000738,0.000295,0.003244,0.005585
1122,60380,Bongor,33,5.251854,5.005427,4.779973,4.598212,4.539664,4.500341,4.364893,...,0.033891,0.017684,0.038567,0.01918,0.031031,0.008738,0.012897,0.039529,0.047166,0.049232


#### NTL change

In [134]:
Stats = PlaceNames.copy().merge(NTL, how = 'outer', on='Sett_ID')
Sensors = ['D_', 'V_']
Methods = ['sum', 'avg', 'max']

for year in AllStudyYears:
    for Sensor in Sensors:
        for agg in Methods:
            RawVar = ''.join(['NTL', agg, Sensor, str(year)])
            LagVar = ''.join(['NTL', agg, Sensor, str(year-1)])
            NewVar = ''.join(['NTL', agg, '_pch', Sensor, str(year)])
            if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
                Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
            else:
                pass

In [135]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|pch')]
Stats.columns

Index(['Sett_ID', 'SettName', 'NTLsum_pchD_2000', 'NTLavg_pchD_2000',
       'NTLmax_pchD_2000', 'NTLsum_pchD_2001', 'NTLavg_pchD_2001',
       'NTLmax_pchD_2001', 'NTLsum_pchD_2002', 'NTLavg_pchD_2002',
       'NTLmax_pchD_2002', 'NTLsum_pchD_2003', 'NTLavg_pchD_2003',
       'NTLmax_pchD_2003', 'NTLsum_pchD_2004', 'NTLavg_pchD_2004',
       'NTLmax_pchD_2004', 'NTLsum_pchD_2005', 'NTLavg_pchD_2005',
       'NTLmax_pchD_2005', 'NTLsum_pchD_2006', 'NTLavg_pchD_2006',
       'NTLmax_pchD_2006', 'NTLsum_pchD_2007', 'NTLavg_pchD_2007',
       'NTLmax_pchD_2007', 'NTLsum_pchD_2008', 'NTLavg_pchD_2008',
       'NTLmax_pchD_2008', 'NTLsum_pchD_2009', 'NTLavg_pchD_2009',
       'NTLmax_pchD_2009', 'NTLsum_pchD_2010', 'NTLavg_pchD_2010',
       'NTLmax_pchD_2010', 'NTLsum_pchD_2011', 'NTLavg_pchD_2011',
       'NTLmax_pchD_2011', 'NTLsum_pchD_2012', 'NTLavg_pchD_2012',
       'NTLmax_pchD_2012', 'NTLsum_pchD_2013', 'NTLavg_pchD_2013',
       'NTLmax_pchD_2013', 'NTLsum_pchV_2013', 'NTLavg_pchV

In [136]:
Stats.to_csv(os.path.join(ResultsFolder, 'NTLChange.csv'))

In [137]:
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,ADM_ID,AREA2015,AREA2014,AREA2013,AREA2012,AREA2011,AREA2010,AREA2009,...,NTLmax_pchD_2013,NTLsum_pchV_2013,NTLavg_pchV_2013,NTLmax_pchV_2013,NTLsum_pchV_2014,NTLavg_pchV_2014,NTLmax_pchV_2014,NTLsum_pchV_2015,NTLavg_pchV_2015,NTLmax_pchV_2015
1403,131507,Koundoul II / Sara,6,148.711349,147.762345,146.154456,143.143161,141.426914,138.523102,137.290096,...,0.0,-0.0074,-0.071669,-0.148804,0.132067,0.053732,-0.085548,0.064132,0.015402,-0.208743
96,4096,Moundou,20,26.606188,26.238296,26.044301,26.01022,25.977888,25.748065,25.50426,...,0.057762,0.012547,0.003818,0.244069,-0.086732,-0.094538,-0.11529,-0.081113,-0.022638,0.127446
1745,278172,Abéché,41,23.189424,22.910666,22.147792,21.331614,20.079383,19.196792,18.938132,...,0.031361,0.279186,0.228309,0.33973,-0.414856,-0.394204,-0.097331,-0.302666,-0.182436,-0.068708
969,42500,Sarh,62,18.003984,17.899121,17.775908,17.701631,17.613372,17.545211,17.498897,...,0.174422,0.235522,0.195343,0.07217,-0.791825,-0.644367,-0.629232,0.634063,0.527955,0.607947
233,13874,Kelo,63,13.736088,13.609379,13.486166,13.427618,13.340233,13.234496,13.183813,...,0.02162,0.388406,0.099154,0.367049,-0.008407,-0.300052,-0.002532,0.40532,0.225151,0.232395
792,35033,Doba,24,12.879712,12.707563,12.585224,12.495217,12.39385,12.350158,12.333555,...,-0.024918,0.35683,0.289826,0.011888,0.173822,0.011485,0.110814,0.22349,0.210612,0.212611
592,25275,Benoye,23,8.207223,8.19062,8.16353,8.148675,8.120712,8.088379,8.077019,...,-0.011494,,,,,,,,,
1938,21474836,Lai,47,8.330436,8.06828,7.910987,7.903122,7.867294,7.784278,7.736216,...,0.021505,,,,,,,,,
524,23321,Bébédja,27,5.978899,5.945693,5.926468,5.92472,5.920351,5.906369,5.882775,...,0.008922,,,,,,,,,
1122,60380,Bongor,33,5.251854,5.005427,4.779973,4.598212,4.539664,4.500341,4.364893,...,0.355541,15.617292,4.539097,5.776868,1.969506,0.719188,0.890444,-0.193381,-0.207289,-0.144159


#### Flood change: change in area and change in percent of total area

In [138]:
Stats = PlaceNames.copy().merge(Flood, how = 'outer', on='Sett_ID')
for year in AllStudyYears:
    RawVar = ''.join(['FLDarea', str(year)])
    LagVar = ''.join(['FLDarea', str(year-1)])
    NewVar = ''.join(['FLDareapch', str(year)])
    if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
        Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
    else:
        pass

In [139]:
for year in AllStudyYears:
    RawVar = ''.join(['FLDpc', str(year)])
    LagVar = ''.join(['FLDpc', str(year-1)])
    NewVar = ''.join(['FLDpcpch', str(year)])
    if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
        Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
    else:
        pass

In [140]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|FLDareapch|FLDpcpch')]
Stats.columns

Index(['Sett_ID', 'SettName', 'FLDareapch2000', 'FLDareapch2001',
       'FLDareapch2002', 'FLDareapch2003', 'FLDareapch2004', 'FLDareapch2005',
       'FLDareapch2006', 'FLDareapch2007', 'FLDareapch2008', 'FLDareapch2009',
       'FLDareapch2010', 'FLDareapch2011', 'FLDareapch2012', 'FLDareapch2013',
       'FLDareapch2014', 'FLDareapch2015', 'FLDpcpch2000', 'FLDpcpch2001',
       'FLDpcpch2002', 'FLDpcpch2003', 'FLDpcpch2004', 'FLDpcpch2005',
       'FLDpcpch2006', 'FLDpcpch2007', 'FLDpcpch2008', 'FLDpcpch2009',
       'FLDpcpch2010', 'FLDpcpch2011', 'FLDpcpch2012', 'FLDpcpch2013',
       'FLDpcpch2014', 'FLDpcpch2015'],
      dtype='object')

In [141]:
Stats.to_csv(os.path.join(ResultsFolder, 'FloodChange.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,ADM_ID,AREA2015,AREA2014,AREA2013,AREA2012,AREA2011,AREA2010,AREA2009,...,FLDpcpch2006,FLDpcpch2007,FLDpcpch2008,FLDpcpch2009,FLDpcpch2010,FLDpcpch2011,FLDpcpch2012,FLDpcpch2013,FLDpcpch2014,FLDpcpch2015
1403,131507,Koundoul II / Sara,6,148.711349,147.762345,146.154456,143.143161,141.426914,138.523102,137.290096,...,0.009578,0.001325,0.031743,0.00638,0.018181,0.037437,0.017488,0.044493,0.013696,0.010515
96,4096,Moundou,20,26.606188,26.238296,26.044301,26.01022,25.977888,25.748065,25.50426,...,0.025636,0.002671,0.004971,-0.03178,-0.004561,-0.010329,-0.002416,-0.00096,0.017991,0.039502
1745,278172,Abéché,41,23.189424,22.910666,22.147792,21.331614,20.079383,19.196792,18.938132,...,-0.050352,-0.089166,-0.050378,-0.004523,0.002528,-0.012544,0.121688,0.1678,0.091068,0.027881
969,42500,Sarh,62,18.003984,17.899121,17.775908,17.701631,17.613372,17.545211,17.498897,...,0.025989,0.120034,0.008867,0.008226,0.040393,0.038055,0.018102,0.009379,-0.011044,0.040271
233,13874,Kelo,63,13.736088,13.609379,13.486166,13.427618,13.340233,13.234496,13.183813,...,-0.014019,-0.01496,-0.016968,-0.017778,0.23548,0.002799,0.142395,0.012014,0.032594,-0.05732
792,35033,Doba,24,12.879712,12.707563,12.585224,12.495217,12.39385,12.350158,12.333555,...,-0.078303,-0.075415,-0.085517,-0.071703,-0.020075,-0.045509,-0.08443,0.387424,0.342742,-0.03102
592,25275,Benoye,23,8.207223,8.19062,8.16353,8.148675,8.120712,8.088379,8.077019,...,-0.005988,0.009106,-0.008439,-0.010851,-0.009098,-0.015472,-0.018385,-0.008716,0.007414,-0.009992
1938,21474836,Lai,47,8.330436,8.06828,7.910987,7.903122,7.867294,7.784278,7.736216,...,-0.079677,-0.032705,-0.024096,-0.024963,-0.025646,-0.035966,-0.019923,-0.010178,-0.048496,-0.102079
524,23321,Bébédja,27,5.978899,5.945693,5.926468,5.92472,5.920351,5.906369,5.882775,...,,,,,,,,,,
1122,60380,Bongor,33,5.251854,5.005427,4.779973,4.598212,4.539664,4.500341,4.364893,...,-0.081883,-0.045898,0.139804,0.231983,0.279851,0.030634,0.320159,-0.035917,0.131173,0.02659


#### Update parent spreadsheet

In [142]:
AllStats.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1939 entries, 0 to 1938
Data columns (total 314 columns):
 #    Column            Dtype  
---   ------            -----  
 0    Sett_ID           int64  
 1    SettName          object 
 2    ADM_ID            int64  
 3    AREA2015          float64
 4    AREA2014          float64
 5    AREA2013          float64
 6    AREA2012          float64
 7    AREA2011          float64
 8    AREA2010          float64
 9    AREA2009          float64
 10   AREA2008          float64
 11   AREA2007          float64
 12   AREA2006          float64
 13   AREA2005          float64
 14   AREA2004          float64
 15   AREA2003          float64
 16   AREA2002          float64
 17   AREA2001          float64
 18   AREA2000          float64
 19   AREA1999          float64
 20   POPct2000         float64
 21   POPsum2000        float64
 22   POPct2001         float64
 23   POPsum2001        float64
 24   POPct2002         float64
 25   POPsum2002        floa

In [143]:
AllStats.to_csv(os.path.join(ResultsFolder, 'AllStats.csv'))

### 11.3 Densities
POPden = people per square kilometer
<br>NTL...den = nighttime light luminosity per square kilometer
<br>NTL...pop = nighttime light luminosity per capita

#### Population Density

In [144]:
Stats = PlaceNames.copy().merge(Population, how = 'outer', on='Sett_ID')
Stats = Stats.merge(Areas, how='left', on='Sett_ID')

for year in AllStudyYears:
    RawVar = ''.join(['POPsum', str(year)])
    DenomVar = ''.join(['AREA', str(year)])
    NewVar = ''.join(['POPden', str(year)])
    if ((RawVar in Stats.columns) and (DenomVar in Stats.columns)):
        Stats[NewVar] = Stats[RawVar] / Stats[DenomVar]
    else:
        pass

In [145]:
# Change in density
for year in AllStudyYears:
    RawVar = ''.join(['POPden', str(year)])
    LagVar = ''.join(['POPden', str(year-1)])
    NewVar = ''.join(['POPdenpch', str(year)])
    if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
        Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
    else:
        pass

In [146]:
# Drop original variables.
Stats = Stats.loc[:, ~Stats.columns.str.contains('AREA|sum|ct|year|ADM')]
Stats.columns

Index(['Sett_ID', 'SettName', 'POPden2000', 'POPden2001', 'POPden2002',
       'POPden2003', 'POPden2004', 'POPden2005', 'POPden2006', 'POPden2007',
       'POPden2008', 'POPden2009', 'POPden2010', 'POPden2011', 'POPden2012',
       'POPden2013', 'POPden2014', 'POPden2015', 'POPdenpch2001',
       'POPdenpch2002', 'POPdenpch2003', 'POPdenpch2004', 'POPdenpch2005',
       'POPdenpch2006', 'POPdenpch2007', 'POPdenpch2008', 'POPdenpch2009',
       'POPdenpch2010', 'POPdenpch2011', 'POPdenpch2012', 'POPdenpch2013',
       'POPdenpch2014', 'POPdenpch2015'],
      dtype='object')

In [147]:
Stats.to_csv(os.path.join(ResultsFolder, 'PopDensity.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,ADM_ID,AREA2015,AREA2014,AREA2013,AREA2012,AREA2011,AREA2010,AREA2009,...,POPdenpch2006,POPdenpch2007,POPdenpch2008,POPdenpch2009,POPdenpch2010,POPdenpch2011,POPdenpch2012,POPdenpch2013,POPdenpch2014,POPdenpch2015
1403,131507,Koundoul II / Sara,6,148.711349,147.762345,146.154456,143.143161,141.426914,138.523102,137.290096,...,0.037571,0.045597,0.026106,0.009666,0.000448,-0.004281,0.011013,0.051878,-0.000433,0.034953
96,4096,Moundou,20,26.606188,26.238296,26.044301,26.01022,25.977888,25.748065,25.50426,...,0.069055,-0.046343,0.154891,-0.061095,-0.080145,0.008604,0.012292,-0.02333,0.04715,-0.058004
1745,278172,Abéché,41,23.189424,22.910666,22.147792,21.331614,20.079383,19.196792,18.938132,...,0.242145,-0.178574,0.357793,0.026524,-0.030925,-0.006868,-0.29763,0.036272,-0.153494,0.067716
969,42500,Sarh,62,18.003984,17.899121,17.775908,17.701631,17.613372,17.545211,17.498897,...,0.200794,-0.104291,0.142022,0.112583,-0.191601,0.077847,-0.045198,0.059018,-0.100591,0.04706
233,13874,Kelo,63,13.736088,13.609379,13.486166,13.427618,13.340233,13.234496,13.183813,...,0.106836,0.022129,0.044362,-0.054873,-0.012867,0.09559,-0.028614,0.086925,-0.081077,-0.009293
792,35033,Doba,24,12.879712,12.707563,12.585224,12.495217,12.39385,12.350158,12.333555,...,0.144301,-0.057166,0.116352,-0.017751,-0.145476,0.082229,0.111759,-0.028852,-0.011337,-0.004312
592,25275,Benoye,23,8.207223,8.19062,8.16353,8.148675,8.120712,8.088379,8.077019,...,0.084312,0.024698,0.029485,-0.062356,0.028974,0.081352,-0.012784,0.081148,-0.066441,0.00657
1938,21474836,Lai,47,8.330436,8.06828,7.910987,7.903122,7.867294,7.784278,7.736216,...,0.003131,-0.068503,0.05987,0.009038,-0.001919,0.079208,-0.011514,-0.017799,0.044722,-0.048385
524,23321,Bébédja,27,5.978899,5.945693,5.926468,5.92472,5.920351,5.906369,5.882775,...,0.079411,0.032215,0.049013,0.162514,-0.101117,-0.022975,0.0725,0.095765,-0.025995,-0.00636
1122,60380,Bongor,33,5.251854,5.005427,4.779973,4.598212,4.539664,4.500341,4.364893,...,0.047381,-0.128764,0.160735,-0.071199,-0.045474,0.097068,0.039293,-0.072361,0.035899,-0.048382


#### Nighttime Lights Density

In [148]:
Stats = PlaceNames.copy().merge(NTL, how = 'outer', on='Sett_ID')
Stats = Stats.merge(Areas, how='left', on='Sett_ID')
Sensors = ['D_', 'V_']
Methods = ['sum', 'avg', 'max']

for year in AllStudyYears:
    for Sensor in Sensors:
        for agg in Methods:
            RawVar = ''.join(['NTL', agg, Sensor, str(year)])
            DenomVar = ''.join(['AREA', str(year)])
            NewVar = ''.join(['NTL', agg, '_den', Sensor, str(year)])
            if ((RawVar in Stats.columns) and (DenomVar in Stats.columns)):
                Stats[NewVar] = Stats[RawVar] / Stats[DenomVar]
            else:
                pass

In [149]:
# Change in density
for year in AllStudyYears:
    for Sensor in Sensors:
        for agg in Methods:
            RawVar = ''.join(['NTL', agg, '_den', Sensor, str(year)])
            LagVar = ''.join(['NTL', agg, '_den', Sensor, str(year-1)])
            NewVar = ''.join(['NTL', agg, '_denpch', Sensor, str(year)])
            if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
                Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
            else:
                pass

  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]


In [150]:
list(Stats.columns)

['Sett_ID',
 'SettName',
 'NTLctD_1999',
 'NTLsumD_1999',
 'NTLavgD_1999',
 'NTLmaxD_1999',
 'NTLminD_1999',
 'NTLcfc_D_1999',
 'NTLctD_2000',
 'NTLsumD_2000',
 'NTLavgD_2000',
 'NTLmaxD_2000',
 'NTLminD_2000',
 'NTLcfc_D_2000',
 'NTLctD_2001',
 'NTLsumD_2001',
 'NTLavgD_2001',
 'NTLmaxD_2001',
 'NTLminD_2001',
 'NTLcfc_D_2001',
 'NTLctD_2002',
 'NTLsumD_2002',
 'NTLavgD_2002',
 'NTLmaxD_2002',
 'NTLminD_2002',
 'NTLcfc_D_2002',
 'NTLctD_2003',
 'NTLsumD_2003',
 'NTLavgD_2003',
 'NTLmaxD_2003',
 'NTLminD_2003',
 'NTLcfc_D_2003',
 'NTLctD_2004',
 'NTLsumD_2004',
 'NTLavgD_2004',
 'NTLmaxD_2004',
 'NTLminD_2004',
 'NTLcfc_D_2004',
 'NTLctD_2005',
 'NTLsumD_2005',
 'NTLavgD_2005',
 'NTLmaxD_2005',
 'NTLminD_2005',
 'NTLcfc_D_2005',
 'NTLctD_2006',
 'NTLsumD_2006',
 'NTLavgD_2006',
 'NTLmaxD_2006',
 'NTLminD_2006',
 'NTLcfc_D_2006',
 'NTLctD_2007',
 'NTLsumD_2007',
 'NTLavgD_2007',
 'NTLmaxD_2007',
 'NTLminD_2007',
 'NTLcfc_D_2007',
 'NTLctD_2008',
 'NTLsumD_2008',
 'NTLavgD_2008',
 'NTLma

In [151]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|den')]
list(Stats.columns)

['Sett_ID',
 'SettName',
 'NTLsum_denD_1999',
 'NTLavg_denD_1999',
 'NTLmax_denD_1999',
 'NTLsum_denD_2000',
 'NTLavg_denD_2000',
 'NTLmax_denD_2000',
 'NTLsum_denD_2001',
 'NTLavg_denD_2001',
 'NTLmax_denD_2001',
 'NTLsum_denD_2002',
 'NTLavg_denD_2002',
 'NTLmax_denD_2002',
 'NTLsum_denD_2003',
 'NTLavg_denD_2003',
 'NTLmax_denD_2003',
 'NTLsum_denD_2004',
 'NTLavg_denD_2004',
 'NTLmax_denD_2004',
 'NTLsum_denD_2005',
 'NTLavg_denD_2005',
 'NTLmax_denD_2005',
 'NTLsum_denD_2006',
 'NTLavg_denD_2006',
 'NTLmax_denD_2006',
 'NTLsum_denD_2007',
 'NTLavg_denD_2007',
 'NTLmax_denD_2007',
 'NTLsum_denD_2008',
 'NTLavg_denD_2008',
 'NTLmax_denD_2008',
 'NTLsum_denD_2009',
 'NTLavg_denD_2009',
 'NTLmax_denD_2009',
 'NTLsum_denD_2010',
 'NTLavg_denD_2010',
 'NTLmax_denD_2010',
 'NTLsum_denD_2011',
 'NTLavg_denD_2011',
 'NTLmax_denD_2011',
 'NTLsum_denD_2012',
 'NTLavg_denD_2012',
 'NTLmax_denD_2012',
 'NTLsum_denV_2012',
 'NTLavg_denV_2012',
 'NTLmax_denV_2012',
 'NTLsum_denD_2013',
 'NTLavg_

In [152]:
Stats.to_csv(os.path.join(ResultsFolder, 'NTLDensity.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,ADM_ID,AREA2015,AREA2014,AREA2013,AREA2012,AREA2011,AREA2010,AREA2009,...,NTLmax_denpchD_2013,NTLsum_denpchV_2013,NTLavg_denpchV_2013,NTLmax_denpchV_2013,NTLsum_denpchV_2014,NTLavg_denpchV_2014,NTLmax_denpchV_2014,NTLsum_denpchV_2015,NTLavg_denpchV_2015,NTLmax_denpchV_2015
1403,131507,Koundoul II / Sara,6,148.711349,147.762345,146.154456,143.143161,141.426914,138.523102,137.290096,...,-0.020604,-0.027851,-0.090796,-0.166342,0.119749,0.042266,-0.095499,0.057341,0.008922,-0.213792
96,4096,Moundou,20,26.606188,26.238296,26.044301,26.01022,25.977888,25.748065,25.50426,...,0.056378,0.011222,0.002504,0.242441,-0.093484,-0.101232,-0.121831,-0.093819,-0.036153,0.111856
1745,278172,Abéché,41,23.189424,22.910666,22.147792,21.331614,20.079383,19.196792,18.938132,...,-0.006646,0.232046,0.183044,0.290359,-0.43434,-0.414376,-0.127388,-0.311049,-0.192264,-0.079903
969,42500,Sarh,62,18.003984,17.899121,17.775908,17.701631,17.613372,17.545211,17.498897,...,0.169515,0.23036,0.190348,0.06769,-0.793258,-0.646815,-0.631784,0.624545,0.519055,0.598582
233,13874,Kelo,63,13.736088,13.609379,13.486166,13.427618,13.340233,13.234496,13.183813,...,0.017185,0.382378,0.094383,0.361114,-0.017385,-0.306389,-0.011563,0.392357,0.21385,0.221027
792,35033,Doba,24,12.879712,12.707563,12.585224,12.495217,12.39385,12.350158,12.333555,...,-0.031891,0.347127,0.280602,0.004652,0.162521,0.001747,0.10012,0.207137,0.194431,0.196404
592,25275,Benoye,23,8.207223,8.19062,8.16353,8.148675,8.120712,8.088379,8.077019,...,-0.013293,,,,,,,,,
1938,21474836,Lai,47,8.330436,8.06828,7.910987,7.903122,7.867294,7.784278,7.736216,...,0.020489,,,,,,,,,
524,23321,Bébédja,27,5.978899,5.945693,5.926468,5.92472,5.920351,5.906369,5.882775,...,0.008624,,,,,,,,,
1122,60380,Bongor,33,5.251854,5.005427,4.779973,4.598212,4.539664,4.500341,4.364893,...,0.303995,14.985409,4.32847,5.519173,1.835754,0.641752,0.805295,-0.231229,-0.244484,-0.184317


#### Nighttime Lights per Capita

In [153]:
Stats = PlaceNames.copy().merge(NTL, how = 'outer', on='Sett_ID')
Stats = Stats.merge(Population, how='left', on='Sett_ID')
Sensors = ['D_', 'V_']
Methods = ['sum', 'avg', 'max']

for year in AllStudyYears:
    for Sensor in Sensors:
        for agg in Methods:
            RawVar = ''.join(['NTL', agg, Sensor, str(year)])
            DenomVar = ''.join(['POPsum', str(year)])
            NewVar = ''.join(['NTL', agg, '_pop', Sensor, str(year)])
            if ((RawVar in Stats.columns) and (DenomVar in Stats.columns)):
                Stats[NewVar] = Stats[RawVar] / Stats[DenomVar]
            else:
                pass

In [154]:
# Change in density
for year in AllStudyYears:
    for Sensor in Sensors:
        for agg in Methods:
            RawVar = ''.join(['NTL', agg, '_pop', Sensor, str(year)])
            LagVar = ''.join(['NTL', agg, '_pop', Sensor, str(year-1)])
            NewVar = ''.join(['NTL', agg, '_poppch', Sensor, str(year)])
            if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
                Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
            else:
                pass

  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
  Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]


In [155]:
list(Stats.columns)

['Sett_ID',
 'SettName',
 'NTLctD_1999',
 'NTLsumD_1999',
 'NTLavgD_1999',
 'NTLmaxD_1999',
 'NTLminD_1999',
 'NTLcfc_D_1999',
 'NTLctD_2000',
 'NTLsumD_2000',
 'NTLavgD_2000',
 'NTLmaxD_2000',
 'NTLminD_2000',
 'NTLcfc_D_2000',
 'NTLctD_2001',
 'NTLsumD_2001',
 'NTLavgD_2001',
 'NTLmaxD_2001',
 'NTLminD_2001',
 'NTLcfc_D_2001',
 'NTLctD_2002',
 'NTLsumD_2002',
 'NTLavgD_2002',
 'NTLmaxD_2002',
 'NTLminD_2002',
 'NTLcfc_D_2002',
 'NTLctD_2003',
 'NTLsumD_2003',
 'NTLavgD_2003',
 'NTLmaxD_2003',
 'NTLminD_2003',
 'NTLcfc_D_2003',
 'NTLctD_2004',
 'NTLsumD_2004',
 'NTLavgD_2004',
 'NTLmaxD_2004',
 'NTLminD_2004',
 'NTLcfc_D_2004',
 'NTLctD_2005',
 'NTLsumD_2005',
 'NTLavgD_2005',
 'NTLmaxD_2005',
 'NTLminD_2005',
 'NTLcfc_D_2005',
 'NTLctD_2006',
 'NTLsumD_2006',
 'NTLavgD_2006',
 'NTLmaxD_2006',
 'NTLminD_2006',
 'NTLcfc_D_2006',
 'NTLctD_2007',
 'NTLsumD_2007',
 'NTLavgD_2007',
 'NTLmaxD_2007',
 'NTLminD_2007',
 'NTLcfc_D_2007',
 'NTLctD_2008',
 'NTLsumD_2008',
 'NTLavgD_2008',
 'NTLma

In [156]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|_pop')]
list(Stats.columns)

['Sett_ID',
 'SettName',
 'NTLsum_popD_2000',
 'NTLavg_popD_2000',
 'NTLmax_popD_2000',
 'NTLsum_popD_2001',
 'NTLavg_popD_2001',
 'NTLmax_popD_2001',
 'NTLsum_popD_2002',
 'NTLavg_popD_2002',
 'NTLmax_popD_2002',
 'NTLsum_popD_2003',
 'NTLavg_popD_2003',
 'NTLmax_popD_2003',
 'NTLsum_popD_2004',
 'NTLavg_popD_2004',
 'NTLmax_popD_2004',
 'NTLsum_popD_2005',
 'NTLavg_popD_2005',
 'NTLmax_popD_2005',
 'NTLsum_popD_2006',
 'NTLavg_popD_2006',
 'NTLmax_popD_2006',
 'NTLsum_popD_2007',
 'NTLavg_popD_2007',
 'NTLmax_popD_2007',
 'NTLsum_popD_2008',
 'NTLavg_popD_2008',
 'NTLmax_popD_2008',
 'NTLsum_popD_2009',
 'NTLavg_popD_2009',
 'NTLmax_popD_2009',
 'NTLsum_popD_2010',
 'NTLavg_popD_2010',
 'NTLmax_popD_2010',
 'NTLsum_popD_2011',
 'NTLavg_popD_2011',
 'NTLmax_popD_2011',
 'NTLsum_popD_2012',
 'NTLavg_popD_2012',
 'NTLmax_popD_2012',
 'NTLsum_popV_2012',
 'NTLavg_popV_2012',
 'NTLmax_popV_2012',
 'NTLsum_popD_2013',
 'NTLavg_popD_2013',
 'NTLmax_popD_2013',
 'NTLsum_popV_2013',
 'NTLavg_

In [157]:
Stats.to_csv(os.path.join(ResultsFolder, 'NTLperCapita.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,ADM_ID,AREA2015,AREA2014,AREA2013,AREA2012,AREA2011,AREA2010,AREA2009,...,NTLmax_poppchD_2013,NTLsum_poppchV_2013,NTLavg_poppchV_2013,NTLmax_poppchV_2013,NTLsum_poppchV_2014,NTLavg_poppchV_2014,NTLmax_poppchV_2014,NTLsum_poppchV_2015,NTLavg_poppchV_2015,NTLmax_poppchV_2015
1403,131507,Koundoul II / Sara,6,148.711349,147.762345,146.154456,143.143161,141.426914,138.523102,137.290096,...,-0.068907,-0.075797,-0.135637,-0.207457,0.120233,0.042717,-0.095107,0.021633,-0.025151,-0.240344
96,4096,Moundou,20,26.606188,26.238296,26.044301,26.01022,25.977888,25.748065,25.50426,...,0.081612,0.035377,0.026451,0.27212,-0.134302,-0.141701,-0.161372,-0.03802,0.023196,0.180319
1745,278172,Abéché,41,23.189424,22.910666,22.147792,21.331614,20.079383,19.196792,18.938132,...,-0.041416,0.188922,0.141635,0.245193,-0.331771,-0.308186,0.03084,-0.354743,-0.243492,-0.138257
969,42500,Sarh,62,18.003984,17.899121,17.775908,17.701631,17.613372,17.545211,17.498897,...,0.104339,0.161793,0.124011,0.008188,-0.770135,-0.607315,-0.590603,0.551531,0.450782,0.526734
233,13874,Kelo,63,13.736088,13.609379,13.486166,13.427618,13.340233,13.234496,13.183813,...,-0.064162,0.271825,0.006861,0.252262,0.069312,-0.245192,0.075647,0.405417,0.225236,0.232481
792,35033,Doba,24,12.879712,12.707563,12.585224,12.495217,12.39385,12.350158,12.333555,...,-0.00313,0.387149,0.318647,0.034499,0.175852,0.013234,0.112735,0.212365,0.199603,0.201584
592,25275,Benoye,23,8.207223,8.19062,8.16353,8.148675,8.120712,8.088379,8.077019,...,-0.087353,,,,,,,,,
1938,21474836,Lai,47,8.330436,8.06828,7.910987,7.903122,7.867294,7.784278,7.736216,...,0.038982,,,,,,,,,
524,23321,Bébédja,27,5.978899,5.945693,5.926468,5.92472,5.920351,5.906369,5.882775,...,-0.079525,,,,,,,,,
1122,60380,Bongor,33,5.251854,5.005427,4.779973,4.598212,4.539664,4.500341,4.364893,...,0.405715,16.232364,4.744121,6.027707,1.737482,0.584858,0.742733,-0.192144,-0.206072,-0.142846


#### Update parent spreadsheet

In [158]:
AllStats.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1939 entries, 0 to 1938
Data columns (total 555 columns):
 #    Column               Dtype  
---   ------               -----  
 0    Sett_ID              int64  
 1    SettName             object 
 2    ADM_ID               int64  
 3    AREA2015             float64
 4    AREA2014             float64
 5    AREA2013             float64
 6    AREA2012             float64
 7    AREA2011             float64
 8    AREA2010             float64
 9    AREA2009             float64
 10   AREA2008             float64
 11   AREA2007             float64
 12   AREA2006             float64
 13   AREA2005             float64
 14   AREA2004             float64
 15   AREA2003             float64
 16   AREA2002             float64
 17   AREA2001             float64
 18   AREA2000             float64
 19   AREA1999             float64
 20   POPct2000            float64
 21   POPsum2000           float64
 22   POPct2001            float64
 23   POPsum2001 

In [159]:
AllStats.to_csv(os.path.join(ResultsFolder, 'AllStats.csv'))

### 11.4 Urban Type

In [161]:
for year in AllStudyYears:
    PopVar = ''.join(['POPsum', str(year)])
    DenVar = ''.join(['POPden', str(year)])
    NewVar = ''.join(['UrbType', str(year)])
    if ((PopVar in AllStats.columns) and (DenVar in AllStats.columns)):
        AllStats[NewVar] = 'LD'
        AllStats.loc[(AllStats[PopVar] >= 5000) & (AllStats[DenVar] >= 300), NewVar] = 'SDurban'
        AllStats.loc[(AllStats[PopVar] >= 50000) & (AllStats[DenVar] >= 1500), NewVar] = 'HDurban'
    else:
        pass

In [162]:
AllStats.to_csv(os.path.join(ResultsFolder, 'AllStats.csv'))

In [163]:
Stats = AllStats.loc[:, AllStats.columns.str.contains('Sett|UrbType')]
list(Stats.columns)

['Sett_ID',
 'SettName',
 'UrbType2000',
 'UrbType2001',
 'UrbType2002',
 'UrbType2003',
 'UrbType2004',
 'UrbType2005',
 'UrbType2006',
 'UrbType2007',
 'UrbType2008',
 'UrbType2009',
 'UrbType2010',
 'UrbType2011',
 'UrbType2012',
 'UrbType2013',
 'UrbType2014',
 'UrbType2015']

In [164]:
Stats[Stats.SettName!='UNK'].sort_values(by=Stats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,UrbType2000,UrbType2001,UrbType2002,UrbType2003,UrbType2004,UrbType2005,UrbType2006,UrbType2007,UrbType2008,UrbType2009,UrbType2010,UrbType2011,UrbType2012,UrbType2013,UrbType2014,UrbType2015
1938,21474836,Lai,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
592,25275,Benoye,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
969,42500,Sarh,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
1745,278172,Abéché,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
1115,56147,Fianga,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
1122,60380,Bongor,LD,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
792,35033,Doba,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
687,32767,Béré,LD,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
1625,212434,Adre,LD,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
1061,51519,Lere,LD,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban


In [165]:
Stats.to_csv(os.path.join(ResultsFolder, 'UrbanType.csv'))