# Spatiotemporal Trends in Urban and Rural Settlements
## Overview
This repository investigates year-by-year change in cities and settlements in Central and Western Africa (CWA). The goal is to capture activity for every settlement locality in a country to produce indicators that are high frequency, spatially granular, and timely. The Jupyter Notebook is the primary script used to construct each country's dataset. It tracks population, built-area, and economic and climate indicators across a 16-year timeframe from 2000 to 2015. 

The repository is split into three sections: methodology and notebooks, source data, and outputs. Outputs are organized by country and include growth tables ("urban panel datasets"), charts, and country briefs.


## Datasets
Datasets used to create each African country's urban panel data are as follows:
1. Most up-to-date administrative boundaries.
2. City names: **UCDB, Africapolis, and GeoNames.**
3. Settlement types: **GRID3 settlement extents.** Captured between 2009-2019.
4. Built-up area, yearly: **World Settlement Footprint Evolution.** Resolution: 30m.
5. Population, yearly: **WorldPop.** UN-adjusted, unconstrained. Resolution: 100m.
6. Nighttime lights, yearly: **Harmonization of DMSP and VIIRS.** Resolution: 1km and 500m.
7. Flood extents, by return period: **FATHOM.** Resolution: 90m.


## Accessing Data
Source data are available to the public by providers listed in the previous section, with the exception of flood data. Please note that the source data files in this repository have been fit for purpose and may not cover your area of interest. Some sources are also not global; GRID3 settlement extents are only available for sub-Saharan Africa, and Africapolis names for Africa.

Results from the analysis are currently available for Cameroon and are under development for Central African Republic and four Sahel countries: Burkina Faso, Chad, Mali, and Niger. Results are available in the outputs folder by country. Please contact the CWA Geospatial team to inquire about new locations.
<br>
> **Walker Kosmidou-Bradley**, wkosmidoubradley@worldbank.org
<br>
> **Grace Doherty**, gdoherty2@worldbank.org

## License
Materials under this repository are open-source under an MIT license. The community is invited to test, adapt, and re-purpose materials as needed.

---

## 1. PREPARE WORKSPACE

### 1.1 Off-script

##### Off-script: Create folders in your working directory. (The folder where you are storing this script).
> *ADM
<br>Buildup
<br>PlaceName
<br>Population
<br>Settlement
<br>NTL*

##### Before starting: Download datasets (as shapefile, GeoJSON, or tif where possible) and place or extract into corresponding folder. You can download the cleaned files from our [GitHub Repository](https://github.com/worldbank/Urban_Spatio_Temporal_Trends) or access original sources here:
- ADM: *Varies by source.*
- Buildup: https://download.geoservice.dlr.de/WSF_EVO/files/
    - If more than one tif, name the tifs as follows: WSFE1.tif, WSFE2.tif... etc.
- PlaceName: 
    - GeoNames: (file: cities500.zip) https://download.geonames.org/export/dump/
    - Africapolis: https://africapolis.org/en/data
    - Urban Centres Database: https://ghsl.jrc.ec.europa.eu/ghs_stat_ucdb2015mt_r2019a.php
- Population: https://hub.worldpop.org/geodata/listing?id=69
- Settlement: https://data.grid3.org/datasets/GRID3::grid3-cameroon-settlement-extents-version-01-01-/explore
- Nighttime Lights: https://eogdata.mines.edu/products/dmsp/#v4 and https://eogdata.mines.edu/products/vnl/#annual_v2

##### Other off-script:
- Convert GeoNames from .txt file to shape (delimiter = tab, header rows = 0) and rename fields.
- If necessary, mosaic WSFE rasters that cover the area of interest to create a single file.

### 1.2 Load all packages.

In [63]:
# Built-in:
# dir(), print(), range(), format(), int(), len(), list(), max(), min(), zip(), sorted(), sum(), open(), del, = None, try except, with as, for in, if elif else
# Also: list.append(), list.insert(), list.remove(), count(), startswith(), endswith(), contains(), replace()

import os, sys, glob, re, time, subprocess, string # os.getcwd(), os.path.join(), os.listdir(), os.remove(), time.ctime(), glob.glob(), string.zfill(), string.join()
from os.path import exists # exists()
from functools import reduce # reduce()

import geopandas as gpd # read_file(), GeoDataFrame(), sjoin_nearest(), to_crs(), to_file(), .crs, buffer(), dissolve()
import pandas as pd # .dtypes, Series(), concat(), DataFrame(), read_table(), merge(), to_csv(), .loc[], head(), sample(), astype(), unique(), rename(), between(), drop(), fillna(), idxmax(), isna(), isin(), apply(), info(), sort_values(), notna(), groupby(), value_counts(), duplicated(), drop_duplicates()
from shapely.geometry import Point, LineString, Polygon, shape, MultiPoint
from shapely.ops import cascaded_union
from shapely.validation import make_valid  # in apply(make_valid)
import shapely.wkt

import numpy as np # median(), mean(), tolist(), .inf
import fiona, rioxarray # fiona.open()
import rasterio # open(), write_band(), .name, .count, .width, .height. nodatavals, .meta, update(), copy(), write()
from rasterio.plot import show
from rasterio import features # features.rasterize()
from rasterio.features import shapes
from rasterio import mask # rasterio.mask.mask()
from rasterio.enums import Resampling # rasterio.enums.Resampling()
from rasterstats import zonal_stats # zonal_stats()
from osgeo import gdal, osr, ogr, gdal_array, gdalconst # Open(), SpatialReference, WarpOptions(), Warp(), GetDataTypeName(), GetRasterBand(), GetNoDataValue(), Translate(), GetProjection(), GetAttrValue()

### 1.3 Set up workspace.

In [75]:
Workspace = os.getcwd()
Source = os.path.join(Workspace, 'Source')
Intermediate = os.path.join(Workspace, 'Intermediate')
Results = os.path.join(Workspace, 'Results')
NTL = os.path.join(os.path.dirname(Workspace), 'NighttimeLights_VIIRS_DMSP', 'Temp')

print('\n'.join([Workspace, Source, Intermediate, Results, NTL]))

Q:\GIS\povertyequity\urban_growth\Syria
Q:\GIS\povertyequity\urban_growth\Syria\Source
Q:\GIS\povertyequity\urban_growth\Syria\Intermediate
Q:\GIS\povertyequity\urban_growth\Syria\Results
Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp


In [3]:
def ListFromRange(r1, r2):
    return [item for item in range(r1, r2+1)]

In [4]:
# These are beyond the maximum expected timeframe. 
# For instance, some sources we use don't go past even 2015, and Earth Observation wouldn't have started until later than 1900.

# Why include this? To ensure that numerically assigned, non-year values (like NoData or a QC marker) are excluded.
# The next block of code will re-write these objects with the accurate timeframe.
Year_end = 2030
Year_start = 1900

print("Eligible date range:", Year_start, 'to', Year_end)

EligibleYears = ListFromRange(Year_start, Year_end)
print(EligibleYears)

Eligible date range: 1900 to 2030
[1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030]


### 1.4 User-defined functions.

In [5]:
# From Stack Exchange @RutgerH
# https://gis.stackexchange.com/questions/163685/reclassify-a-raster-value-to-9999-and-set-it-to-the-nodata-value-using-python-a
def readRaster(filename):
    filehandle = gdal.Open(filename)
    band1 = filehandle.GetRasterBand(1)
    geotransform = filehandle.GetGeoTransform()
    geoproj = filehandle.GetProjection()
    Z = band1.ReadAsArray()
    xsize = filehandle.RasterXSize
    ysize = filehandle.RasterYSize
    return xsize,ysize,geotransform,geoproj,Z

In [6]:
# Default arguments can be changed here, or can be specified below when running the functions.
def writeRaster(filename,geotransform,geoprojection,data, NoDataVal=0, dst_datatype=gdal.GDT_UInt32):
    (x,y) = data.shape
    Dformat = "GTiff"
    driver = gdal.GetDriverByName(Dformat)
    # you can change the dataformat but be sure to be able to store negative values including -9999
    dst_ds = driver.Create(filename,y,x,1,dst_datatype)
    dst_ds.GetRasterBand(1).WriteArray(data)
    dst_ds.SetGeoTransform(geotransform)
    dst_ds.SetProjection(geoprojection)
    dst_ds.GetRasterBand(1).SetNoDataValue(NoDataVal)
    return 1
    dst_ds = None

In [7]:
# Based on Stack Exchange @Kurt Schwehr:
# https://stackoverflow.com/questions/10454316/how-to-project-and-resample-a-grid-to-match-another-grid-with-gdal-python
def resampleRaster(InRaster_Path, MatchRaster_Path, OutFile_Path, 
                   DataType = gdalconst.GDT_UInt32, 
                   ResampType = gdal.GRA_Bilinear, NoDataVal = 0):
    print('Loading for %s. %s' % (InRaster_Path, time.ctime()))
    
    RasterObject = gdal.Open(InRaster_Path)
    In_proj = RasterObject.GetProjection()
    [Match_x, Match_y, Match_geo, Match_proj, Match_Z] = readRaster(MatchRaster_Path)
    print('---Specs to match to: \n', 
      Match_proj, '\n', Match_geo, '\n', Match_x, '\n', Match_y, '\n')
        
    OutFile = gdal.GetDriverByName('GTiff').Create(OutFile_Path, Match_x, Match_y, 1, DataType)
    OutFile.SetGeoTransform(Match_geo)
    OutFile.SetProjection(Match_proj)
    print('---Created raster file for upsampled version. %s' % time.ctime())
    
    gdal.ReprojectImage(RasterObject, OutFile, In_proj, Match_proj, ResampType)
    print('---Resampled values onto an empty raster matching the dimensions of the buildup layer. %s \n\n' % time.ctime())
    
    OutFile.GetRasterBand(1).SetNoDataValue(NoDataVal)
    
    RasterObject = Outfile = None
    return 1

In [8]:
def calcShell(A, OutFile, Calculation, OutType = '',
              B=None, C=None, D=None, E=None, F=None, G=None):
    """Raster math using gdal_calc.py.

    The OSgeo package for Python API does not make raster calculations
    easy outside of the shell. This function plugs up to 6 raster files
    into a string which subprocess.call() then commits to the terminal.

        A : str
            File path to the first raster for the calculation.
        B : str
            File path to the second raster for the calculation.
        NoDataVal : numeric
            Optional value to assign as No Data in the output raster.
        OutFile : str
            File path where to store the raster generated from the calculation.
        Calculation : str
            Algebra that uses A and B to create a new raster. Use double quotes.
    """
    print('Running for %s. %s' % (A, time.ctime()))
    cmd = 'gdal_calc.py -A ' + A
    if B is not None:
        cmd = cmd + ' -B ' + B 
    if C is not None:
        cmd = cmd + ' -C ' + C 
    if D is not None:
        cmd = cmd + ' -D ' + D
    if E is not None:
        cmd = cmd + ' -E ' + E
    if F is not None:
        cmd = cmd + ' -F ' + F
    if G is not None:
        cmd = cmd + ' -G ' + G
        
    cmd = cmd + OutType + ' --outfile=' + OutFile + ' --overwrite --calc=' + Calculation
    subprocess.call(cmd, shell=True)
    cmd = A = B = C = D = E = F = G = None
    print('Ran in shell. See OutFile folder to inspect results. %s' % time.ctime())

In [9]:
def mosaicShell(A, B, OutFile, Band = 1, OutType = '',
                  C=None, D=None, E=None, F=None, G=None):
    print('Running for %s. %s' % (A, time.ctime()))
    
    StringFiles = ' '.join([A,B])
    
    for RasterName in [C,D,E,F,G]:
        if RasterName is not None:
            StringFiles = ' '.join([StringFiles, RasterName])
        else:
            pass
        
    cmd = 'gdal_merge.py -o ' + OutFile + OutType + ' -of gtiff ' + StringFiles
    
    subprocess.call(cmd, shell=True)
    print('Ran in shell. See OutFile folder to inspect results. %s' % time.ctime())

In [10]:
def RasterToShapefile(InRasterPath, OutFilePath = 'RastToShp.shp', Band=1, 
                      OutName='RastToShp', VariableName='value', Driver = 'ESRI Shapefile'):
    """Raster tiff to vector polygon shapefile.
    Can also be used for other file types like geopackage, but note that this code
    currently does not account for writing into an existing file. It will write over
    the file if specified as the file path.
    
    """
    Raster = gdal.Open(InRasterPath)
    RasterBand = Raster.GetRasterBand(Band)
    
    OutDriver = ogr.GetDriverByName(Driver)
    InProj = Raster.GetProjectionRef()
    SpatRef = osr.SpatialReference()
    SpatRef.ImportFromWkt(InProj)
    print(InProj, '\n\n', SpatRef)
    
    if exists(OutFilePath):
        OutFile = ogr.Open(OutFilePath)
    else:
        OutFile = OutDriver.CreateDataSource(OutFilePath)
    OutLayer = OutFile.CreateLayer(OutName, srs = SpatRef, geom_type = ogr.wkbPolygon)
    OutField = ogr.FieldDefn(VariableName, ogr.OFTInteger)
    OutLayer.CreateField(OutField)
    OutField = OutLayer.GetLayerDefn().GetFieldIndex(VariableName)
    print('\n', OutFile, '\n', OutLayer, '\n', OutField)
    
    print('Vectorizing. Input: %s. %s' % (InRasterPath, time.ctime()))
    gdal.Polygonize(RasterBand, None, OutLayer, 0, [], callback=None)
    print('Completed polygons. Stored as: %s. %s' % (OutFilePath, time.ctime()))

    del Raster, RasterBand, OutFile, OutLayer

In [11]:
def rioStats(InRasterPath, Band = 1):
    out = rasterio.open(InRasterPath)
    stats = []
    band = out.read(Band)
    stats.append({
        'raster': out.name,
        'bands': out.count,
        'data type': out.dtypes,
        'no data value': out.nodatavals,
        'width': out.width,
        'height': out.height,
        'min': band.min(),
        'mean': band.mean(),
        'median': np.median(band),
        'max': band.max()})
    print("\n", stats)
    
    out = band = None

In [12]:
def ShapeToRaster(Shapefile, ValueVar, MetaRasterPath, OutFilePath = 'ShpToRast.tif', Band=1, NewDType=None):
    """
    Polygon spatial object to raster tiff.
    """
    # Copy and update the metadata from another raster for the output
    MetaRaster = rasterio.open(MetaRasterPath)
    meta = MetaRaster.meta.copy()
    meta.update(compress='lzw')
    if NewDType is not None:
        meta.update(dtype=NewDType)
    MetaRaster.meta

    print("Rasterizing dataset. %s" % time.ctime())
    with rasterio.open(OutFilePath, 'w+', **meta) as out:
        out_arr = out.read(Band)

        # this is where we create a generator of geom, value pairs to use in rasterizing
        shapes = ((geom,value) for geom, value in zip(Shapefile.geometry, Shapefile[ValueVar]))

        burned = features.rasterize(shapes=shapes, fill=0, out=out_arr, transform=out.transform)
        out.write_band(1, burned)
    out = burned = shapes = None
    
    print("Finished rasterizing. Checking contents. %s" % time.ctime())
    rioStats(OutFilePath)

In [13]:
def BatchZonal(RasterFileList, Zones, KeepFields = None, 
               RasterDirectory = os.getcwd(),
               OutPath = 'BatchZonal.csv',
               Statistics=['count', 'sum', 'mean'],
               NoDataVal = None,
               Prefix = '',
               Suffix = '', 
               DropStatName = False):
    '''
    Choose a single file for the raster source, or multiple. If multiple, then each stat field
    will receive a suffix with the raster's position in the RasterFileList.
    
        RasterFileList: list
                        List of file paths, e.g. tifs. Loaded with rasterio.
        Zones:          geodataframe object
        KeepFields:     list
                        List of field names (string format).
        OutPath:        string
                        File path for the results.
        Statistics:     list
                        Full Statistics options: 
                        ['max', 'min', 'mean', 'count', 'sum', 'median', 'std', 'majority', 'minority', 
                        'unique', 'range']
                        There's also a percentile option in zonal_stats(). I believe you 
                        write it after an underscore in string format, e.g.: 'percentile_25'.
        NoDataVal:      numeric
        Prefix:         string
        Suffix:         string
        DropStatName:   boolean
    
    '''
    if KeepFields is None:
        AllSummaries = pd.DataFrame(Zones)[[]]
    else:
        AllSummaries = pd.DataFrame(Zones)[KeepFields]
    print('Dataframe to merge with: \n', AllSummaries)
    
    
    for idx, RasterFile in enumerate(RasterFileList):
        print('Loading with rasterio. %s\nRaster: %s' % (time.ctime(), RasterFile))
        
        InRasterPath = os.path.join(RasterDirectory, RasterFile)
        
        Year = re.search('\d{4}', os.path.basename(RasterFile))
        if Year is None:
            Year = '_' + str(idx)
        else:
            Year = Year.group(0)

        with rasterio.open(InRasterPath) as src:
            transform = src.meta['transform']
            array = src.read(1)

        print(transform)
        print(array)
        
        print('Zonal statistics. %s' % time.ctime())
        zStats = zonal_stats(Zones, array, affine=transform, stats = Statistics, nodata=NoDataVal)
        #print(zStats)

        ValByZone = pd.DataFrame(Zones)[[]].join(pd.DataFrame(zStats))
        
        if DropStatName is False:
            for stat in Statistics:
                StatField = ''.join([Prefix, stat, Year, Suffix])
                ValByZone = ValByZone.rename(columns={stat:StatField})
            print('%s stat output field for %s: %s' % (stat, RasterFile, StatField))
        else:
            for stat in Statistics:
                StatField = ''.join([Prefix, Year, Suffix])
                ValByZone = ValByZone.rename(columns={stat:StatField})
            print('%s stat output field for %s: %s' % (stat, RasterFile, StatField))

        AllSummaries = AllSummaries.join(ValByZone)
        print(AllSummaries.sample(5))
        
    print('Final dataframe: \n', AllSummaries.sample(5))

    AllSummaries.to_csv(OutPath)

In [14]:
def MaskByZone(MaskPath, SourceFolder, DestFolder, SourceList = None,
               MaskLayerName = None, dstSRS = 'ESRI:102022'):
    """
    Reduces the size of a raster's valid data cells to vector areas of interest.
    This is useful if the raster data needs to be vectorized later to save space.
    
    The script prepares the vector zones as a list of geometries in the desired
    spatial reference system, then warps each raster in the specified source
    folder to the same SRS. Masking in rasterio then reclassifies any raster cells
    falling outside of a mask polygon as NoData.
    """
    
    ProjSRS = osr.SpatialReference()
    ProjSRS.SetFromUserInput(dstSRS)
    ProjWarp = gdal.WarpOptions(dstSRS = dstSRS)
    
    if SourceList is not None:
        SourceFiles = SourceList
    else:
        SourceFiles = []
        SourceFiles = SourceFiles + [i for i in os.listdir(''.join([SourceFolder, r'/'])) if i.endswith('tif')]
        print(SourceFiles)

    
    ### 1. ASSIGN SPATIAL REFERENCE SYSTEM OF VECTOR MASK AND LOAD GEOMETRIES
    Vector = gpd.read_file(filename=MaskPath, layer=MaskLayerName)
    if Vector.crs != dstSRS:
        if MaskLayerName == None:
            MaskPath = MaskPath + '_temp'
        else:
            MaskLayerName = MaskLayerName + '_temp'
        Vector.to_crs(dstSRS).to_file(filename=MaskPath, layer=MaskLayerName)
    Vector = None # We're reloading the geometries with fiona
    
    with fiona.open(MaskPath, mode="r", layer=MaskLayerName) as Vector:
        MaskGeom = [feature["geometry"] for feature in Vector] # Identify the bounding areas of the mask.
    
    
    ### 2. PREPARE DESTINATION FILES
    for FileName in SourceFiles:
    
        InputRasterPath = os.path.join(Workspace, SourceFolder, FileName)
        
        Sensor = re.search('[A-Z]+_', FileName)
        if Sensor is None:
            Sensor = ''
        else:
            Sensor = Sensor.group(0)

        Year = re.search('\d{4}', FileName)
        if Year is None:
            Year = ''
        else:
            Year = Year.group(0)

        if FileName.endswith('avg.tif') == True:
            IndicType = '_avg'
        elif FileName.endswith('cfc.tif') == True:
            IndicType = '_cfc'
        else:
            IndicType = ''

        TempOutputName = 'Temp_' + Sensor + Year + IndicType + '.tif'
        TempOutputPath = os.path.join(Workspace, DestFolder, TempOutputName)
        FinalOutputName = 'Msk_' + Sensor + Year + IndicType + '.tif'
        FinalOutputPath = os.path.join(Workspace, DestFolder, FinalOutputName)

    ### 3. ASSIGN SPATIAL REFERENCE SYSTEM OF RASTER(S)
        InputRasterObject = gdal.Open(InputRasterPath)
        SourceSRS = osr.SpatialReference(wkt=InputRasterObject.GetProjection())
        print('Source projection: ', SourceSRS.GetAttrValue('projcs'))
        print('Destination projection: ', ProjSRS.GetAttrValue('projcs'))

        if SourceSRS.GetAttrValue('projcs') != ProjSRS.GetAttrValue('projcs'):
            Warp = gdal.Warp(TempOutputPath, # Where to store the warped raster
                         InputRasterObject, # Which raster to warp
                         format='GTiff', 
                         options=ProjWarp) # Reproject to Africa Albers Equal Area Conic
            print('Finished gdal.Warp() for %s. %s \n' % (FileName, time.ctime()))

            Warp = None # Close the files
        else:
            pass
        InputRasterObject = None
        
    ### 4. RECLASSIFY AS NODATA IF OUTSIDE OF SETTLEMENT BUFFER ZONE.
        if exists(TempOutputPath):
            NewInputPath = TempOutputPath 
            print("We warped the data, so we'll use that file for next step.")
        else:
            NewInputPath = InputRasterPath 
            print("We skipped the warp, so we continue to use the source file.")

        with rasterio.open(NewInputPath) as InputRasterObject:
            MaskedOutputRaster, OutTransform = rasterio.mask.mask(
                InputRasterObject, MaskGeom, crop=True) # Anything outside the mask is reclassed to the raster's NoData value.
            OutMetaData = InputRasterObject.meta.copy()
        print('Finished rasterio.mask.mask() for %s. %s \n' % (FileName, time.ctime()))

        OutMetaData.update({"driver": "GTiff",
                         "height": MaskedOutputRaster.shape[1],
                         "width": MaskedOutputRaster.shape[2],
                         "transform": OutTransform})

        with rasterio.open(FinalOutputPath, "w", **OutMetaData) as dest:
            dest.write(MaskedOutputRaster)
        print('Written to file. %s \n' % time.ctime())
        InputRasterObject = None

        if exists(TempOutputPath):
            try:  # Finally, remove the intermediate file from disk
                os.remove(TempOutputPath)
            except OSError:
                pass
            print('Removed intermediate file. %s \n' % time.ctime())
        else:
            pass


    print('\n \n Finished all years in list. %s' % time.ctime())

In [15]:
def BatchZonalStats(FolderName, Zones, 
                    CRS = 'ESRI:102022', 
                    JoinField = 'Sett_ID',
                    StatsWanted = ['count', 'sum', 'mean', 'max', 'min'],
                    SeriesStart = 1999, SeriesEnd = 2021, 
                    AnnualizedFiles = None, VarPrefix = None):
    """
    Normally, we would use numpy to generate a point gdf from the raster's matrix. 
    However, I was running into a lot of memory errors with that method.
    This method uses some extra steps: tif to xyz to df to gdf. But it saves to file
    and deletes intermediate files along the way, circumventing memory issues.
    
    Run MaskByZone() prior to reduce the raster to only your area(s) of interest.
    
    """
    if AnnualizedFiles is None:
        AnnualizedFiles = [i for i in os.listdir(FolderName) if i.endswith('.tif')]
    print(AnnualizedFiles)
    AllSummaries = pd.DataFrame(Zones).drop(columns='geometry')[[JoinField]]
    print(AllSummaries)
    
    if VarPrefix is None:
        VarPrefix = FolderName[:3].upper()
    
    for FileName in AnnualizedFiles:
    ### STEP 1: TIF TO XYZ ###
        print('Loading data for %s. %s \n' % (FileName, time.ctime()))
        
        Sensor = re.search('[A-Z]+_', FileName)
        if Sensor is None:
            Sensor = ''
        else:
            Sensor = Sensor.group(0)
            
        Year = re.search('\d{4}', FileName)
        if Year is None:
            Year = ''
        else:
            Year = Year.group(0)
        
        InputRasterPath = os.path.join(Workspace, FolderName, FileName)
        InputRasterObject = gdal.Open(InputRasterPath)
        XYZOutputPath = FolderName + r'/{}'.format(
            FileName.replace('.tif', '.xyz')) # New file path will be the same as original, but .tif is replaced with .xyz

        # Create an .xyz version of the .tif
        if exists(XYZOutputPath):
            print("Already created xyz file.")
        else:
            print("Creating XYZ (gdal.Translate()).")
            XYZ = gdal.Translate(XYZOutputPath, # Specify a destination path
                                 InputRasterObject, # Input is the masked .tif file
                                 format='XYZ', 
                                 creationOptions=["ADD_HEADER_LINE=YES"])
            print('Finished gdal.Translate() for year %s. %s \n' % (Year, time.ctime()))
            XYZ = None # Reload XYZ as a point geodataframe

        InputRasterObject = None


    ### STEP 2: GENERATE GEODATAFRAME WITH JOIN FIELD ###
        InputXYZ = pd.read_table(XYZOutputPath, delim_whitespace=True)
        InputXYZ = InputXYZ.loc[InputXYZ['Z'] > 0] # Subset to only the features that have a value.
        
        if re.search('WSFE', FileName) is not None: # Scale back up to years if working with flood/building data.
            InputXYZ['Z'] = InputXYZ['Z'] + 1900
            
        print('Loaded XYZ file as a pandas dataframe. %s \n' % time.ctime())
        ValObject = gpd.GeoDataFrame(InputXYZ,
                                     geometry = gpd.points_from_xy(InputXYZ['X'], InputXYZ['Y']),
                                     crs = CRS)
        print('Created geodataframe from non-NoData points. %s \n' % time.ctime())
        del InputXYZ

        # Sjoin_nearest: No need to group by ADM this time. 
        ValObject_withID = pd.DataFrame(gpd.sjoin_nearest(ValObject, 
                                        Zones, 
                                        how='left')).drop(columns='geometry')[['Z', JoinField]] # No need for max_distance parameter this time. We've already narrowed down to nearby raster cells.

        print('\nJoined zone ID onto vectorized raster cells. %s \n' % time.ctime())
        print(ValObject_withID.sample(10))
        del ValObject

        ValObject_withID.to_csv(''.join([FolderName, r'/', FileName.replace('.tif', '.csv')]))
        print('\nExported as table. %s \n' % time.ctime())

#         # Remove the temporary xyz file.
#         try:  
#             os.remove(os.path.join(XYZOutputPath))
#         except OSError:
#             pass
#         print('Removed (or skipped if error) intermediate xyz file. %s \n' % time.ctime())


    ### STEP 3: AGGREGATE BY SETTLEMENT AND MERGE ONTO SUMMARIES TABLE ###
        GroupedVals = ValObject_withID[ValObject_withID['Z'].notna()].groupby(JoinField, as_index=False)
        
        # Run this block if the variable is about cloud-free coverage.
        if re.search('cfc', FileName) is not None:
            VariableName = ''.join([VarPrefix, 'cfc_', Sensor, Year])
            AllSummaries = AllSummaries.merge(GroupedVals.mean().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            print('\nCount of cloud-free observations averaged to settlement level, year %s. %s \n' % (Year, time.ctime()))
            
            # Save in-progress results
            AllSummaries.to_csv(os.path.join(Results, ''.join([VarPrefix, '%sto%s.csv' % (SeriesStart, SeriesEnd)])))
            print(AllSummaries.sample(10))
        
        # Run this block if we're working with the flooded buildings data.
        elif re.search('WSFE', FileName) is not None:
            for BuiltYear in EligibleYears:
                Grouped_Subset = GroupedVals[GroupedVals['Z'].between(
                    1985, BuiltYear, inclusive=True)] # Inclusive parameter means we include the years 1985 and the named year.
                if 'count' in StatsWanted:
                    VariableName = ''.join([VarPrefix, 'ct', Sensor, BuiltYear])
                    AllSummaries = AllSummaries.merge(GroupedVals.count().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
                if 'sum' in StatsWanted:
                    VariableName = ''.join([VarPrefix, 'sum', Sensor, BuiltYear])
                    AllSummaries = AllSummaries.merge(GroupedVals.sum().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
                if 'mean' in StatsWanted:
                    VariableName = ''.join([VarPrefix, 'avg', Sensor, BuiltYear])
                    AllSummaries = AllSummaries.merge(GroupedVals.mean().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
                if 'max' in StatsWanted:
                    VariableName = ''.join([VarPrefix, 'max', Sensor, BuiltYear])
                    AllSummaries = AllSummaries.merge(GroupedVals.max().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
                if 'min' in StatsWanted:
                    VariableName = ''.join([VarPrefix, 'min', Sensor, BuiltYear])
                    AllSummaries = AllSummaries.merge(GroupedVals.min().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
                print('\nDesired aggregation methods applied to settlement level, year %s. %s \n' % (Year, time.ctime()))

                # Save in-progress results
                AllSummaries.to_csv(os.path.join(Results, ''.join([VarPrefix, '%sto%s.csv' % (SeriesStart, SeriesEnd)])))
                print(AllSummaries.sample(10))
        
        # Anything else takes the standard aggregation method.
        else:
            if 'count' in StatsWanted:
                VariableName = ''.join([VarPrefix, 'ct', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.count().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'sum' in StatsWanted:
                VariableName = ''.join([VarPrefix, 'sum', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.sum().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'mean' in StatsWanted:
                VariableName = ''.join([VarPrefix, 'avg', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.mean().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'max' in StatsWanted:
                VariableName = ''.join([VarPrefix, 'max', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.max().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            if 'min' in StatsWanted:
                VariableName = ''.join([VarPrefix, 'min', Sensor, Year])
                AllSummaries = AllSummaries.merge(GroupedVals.min().rename(columns={'Z': VariableName}), how = 'left', on=JoinField)
            print('\nDesired aggregation methods applied to settlement level, year %s. %s \n' % (Year, time.ctime()))
            
            # Save in-progress results
            AllSummaries.to_csv(os.path.join(Results, ''.join([VarPrefix, '%sto%s.csv' % (SeriesStart, SeriesEnd)])))
            print(AllSummaries.sample(10))

    
    print('\n\nFinished. All years masked and assigned their nearest settlement. %s' % time.ctime())
    print(AllSummaries.sample(10))
    AllSummaries.to_csv(os.path.join(Results, ''.join([VarPrefix, '%sto%s.csv' % (SeriesStart, SeriesEnd)])))
    print('Saved to file. %s \n' % time.ctime())

---

## 2. PREPARE BUILDUP, SETTLEMENT, AND ADMIN DATASETS
Projection for all datasets: Africa Albers Equal Area Conic

### 2.1 Prepare GRID3 and Admin area files

#### Admin areas

In [16]:
# Pull the first file ([0]) which ends in '.shp' from the specified folder, drop all variables, and reproject.
ADM_vec = gpd.read_file(glob.glob('Source/ADM/*.shp')[0])[['geometry']].to_crs('ESRI:102022')

# Create a fresh unique ID
ADM_vec['ADM_ID'] = range(0, len(ADM_vec))
ADM_vec['ADM_ID'] = ADM_vec['ADM_ID'] + 1  # Plus one allows us to assign 0 as No Data when rasterizing, to match WSFE.
# We have to add 1 if we want our rasterized version's NoData value to be 0. Otherwise the first feature won't be valid.
ADM_vec.to_file(driver='GPKG', filename=os.path.join(Source, 'ADM', 'ADM_withID.gpkg'), layer='ADM')

# Now reload.
ADM_vec = gpd.read_file(os.path.join(Source, 'ADM', 'ADM_withID.gpkg'), layer='ADM')

In [17]:
# We need to know how many digits need to be allocated to each dataset in the "join" serial.
len_ADM = len(str(ADM_vec['ADM_ID'].max()))

print(ADM_vec.info(), "\n\n", 
      ADM_vec.sample(5),
      ADM_vec.crs, "\n\n", 
      'Number of digits: ', len_ADM) 

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 272 entries, 0 to 271
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   ADM_ID    272 non-null    int64   
 1   geometry  272 non-null    geometry
dtypes: geometry(1), int64(1)
memory usage: 4.4 KB
None 

      ADM_ID                                           geometry
50       51  POLYGON ((1184519.255 3881539.664, 1184522.540...
174     175  POLYGON ((1189990.139 3892919.557, 1190113.966...
95       96  POLYGON ((1190359.520 3725033.179, 1190481.824...
66       67  POLYGON ((1340425.094 3983571.296, 1339706.926...
150     151  POLYGON ((1226443.094 3854947.081, 1226283.464... PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",

#### Settlements

In [18]:
# For GRID3, we will still retain the 'type' field.
GRID3_vec = gpd.read_file(glob.glob('Source/Settlement/*.shp')[0])[['class_v1','geometry']].to_crs("ESRI:102022")

GRID3_vec['G3_ID'] = range(0,len(GRID3_vec))
GRID3_vec['G3_ID'] = GRID3_vec['G3_ID'] +1 # Plus one allows us to assign 0 as No Data when rasterizing, to match WSFE.

In [19]:
GRID3_vec.to_file(driver='GPKG', filename=os.path.join(Source, 'Settlement', 'Settlement_withID.gpkg'), layer='GRID3')
GRID3_vec = gpd.read_file(os.path.join(Source, 'Settlement', 'Settlement_withID.gpkg'), layer='GRID3')

In [20]:
len_G3 =  len(str(GRID3_vec['G3_ID'].max()))

print(GRID3_vec.info(), "\n\n",
      GRID3_vec.sample(5),
      GRID3_vec.crs, "\n\n", 
      'Number of digits: ', len_G3)

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 157080 entries, 0 to 157079
Data columns (total 3 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   class_v1  157080 non-null  object  
 1   G3_ID     157080 non-null  int64   
 2   geometry  157080 non-null  geometry
dtypes: geometry(1), int64(1), object(1)
memory usage: 3.6+ MB
None 

        class_v1   G3_ID                                           geometry
2385        SSA    2386  POLYGON ((1157156.777 3652077.065, 1157244.456...
146381   Hamlet  146382  POLYGON ((1653157.319 3977659.526, 1653420.703...
96226    Hamlet   96227  POLYGON ((1275960.840 3974436.382, 1276048.634...
111234   Hamlet  111235  POLYGON ((1347453.247 3979275.227, 1347541.043...
22091    Hamlet   22092  POLYGON ((1156671.016 3923360.306, 1156758.791... PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.017

In [21]:
GRID3_vec = GRID3_vec.rename(columns={'class_v1':'type'})

### 2.2 Reproject WSFE to project CRS, and ensure file contains only valid date range.

In [22]:
for filename in os.listdir(os.path.join(Source, 'Buildup')):
    if exists(filename.endswith('.tif')):
        SourceWSFE = glob.glob(os.path.join(Source, 'Buildup', '*.tif'))[0]
    else:
        # If WSFE hasn't been mosaicked yet:
        A=os.path.join(Source, 'Buildup', 'Premosaic', 'WSFE1.tif')
        B=os.path.join(Source, 'Buildup', 'Premosaic', 'WSFE2.tif')
        C=os.path.join(Source, 'Buildup', 'Premosaic', 'WSFE3.tif')
        D=os.path.join(Source, 'Buildup', 'Premosaic', 'WSFE4.tif')

        SourceWSFE = os.path.join(Intermediate, 'WSFE.tif')
        
        mosaicShell(A=A, B=B, C=C, D=D, OutFile=SourceWSFE, OutType = ' -ot UInt32 ')

In [23]:
SourceWSFE

'Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Buildup\\GAIA_WSFevolution_1985-2022.tif'

In [None]:
InPath = SourceWSFE
OutWGS = os.path.join(Intermediate, 'WSFE_reclass.tif')

# Together, x and y define the data's "shape".
# geotransform contains the parameters detailing how the raster should be stretched and aligned.
# geoproj is the map projection
# Z are the values in the raster band.
[xsize,ysize,geotransform,geoproj,Z] = readRaster(InPath)
Z[Z<Year_start] = 0
Z[Z>Year_end] = 0

writeRaster(OutWGS,geotransform,geoproj,Z, NoDataVal=0, dst_datatype=gdal.GDT_UInt32) # Unsigned 32 so that we can concatenate.
print('Wrote the reclassed raster to file. %s' % time.ctime())

In [77]:
# In case the raster doesn't go up to our preferred end year, let's make sure we've recorded the accurate one.
WSFE_start = int(Z[Z>0].min())
WSFE_end = int(Z[Z>0].max())
print('Source date range: ', WSFE_start, '-', WSFE_end)

Source date range:  1985 - 2021


In [78]:
WSFE_Years = ListFromRange(WSFE_start, WSFE_end)
Reversed_WSFE_Years = []
for i in WSFE_Years:
    Reversed_WSFE_Years.insert(0,i)
print(WSFE_Years, '\n\n', Reversed_WSFE_Years)

[1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021] 

 [2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985]


In [26]:
OutEqArea = os.path.join(Intermediate, 'WSFE_equalarea.tif')

# Whenever we want to work in a projected CRS, we'll use Africa Albers Equal Area Conic.
ProjEqArea = gdal.WarpOptions(dstSRS='ESRI:102022')
Warp = gdal.Warp(OutEqArea, # Where to store the warped raster
                 OutWGS, # Which raster to warp
                 format='GTiff', 
                 options=ProjEqArea)
print('Wrote the reclassed and reprojected raster to file. %s' % time.ctime())

Wrote the reclassed and reprojected raster to file. Wed Sep 13 11:25:25 2023


In [27]:
rioStats(OutWGS)
rioStats(OutEqArea)


 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Syria/Intermediate/WSFE_reclass.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (0.0,), 'width': 31912, 'height': 21152, 'min': 0, 'mean': 43.49248497291767, 'median': 0.0, 'max': 2021}]

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Syria/Intermediate/WSFE_equalarea.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (0.0,), 'width': 32854, 'height': 20036, 'min': 0, 'mean': 42.82576195744719, 'median': 0.0, 'max': 2021}]


In [28]:
InPath = Warp = OutWGS = OutEqArea = SourceWSFE = None

---

## 3. WSFE AND ADM; GRID3 AND ADM
RASTERIZE: Bring ADM and GRID3 into raster space.

RASTER MATH: "Join" ADM ID onto GRID3 and onto WSFE by creating unique concatenation string.

VECTORIZE: Bring joined data into vector space.

VECTOR MATH: Split unique ID from raster math step into separate columns.

### 3.1 Rasterize admin areas and GRID3 using WSFE specs.

In [29]:
G3_max = GRID3_vec.G3_ID.max()
G3_min = GRID3_vec.G3_ID.min()
ADM_max = ADM_vec.ADM_ID.max()
ADM_min = ADM_vec.ADM_ID.min()
print(G3_max, G3_min, ADM_max, ADM_min) # Min should not be zero.

157080 1 272 1


In [30]:
# Copy and update the metadata from WSFE for the output
WSFE = os.path.join(Intermediate, 'WSFE_equalarea.tif')
ADM_out = os.path.join(Intermediate, 'ADM_rasterized.tif')
GRID3_out = os.path.join(Intermediate, 'GRID3_rasterized.tif')

In [31]:
# ADM raster can be unsigned int 16. Consider 32 if there are thousands of admin areas.
ShapeToRaster(Shapefile=ADM_vec, ValueVar="ADM_ID", MetaRasterPath=WSFE, OutFilePath=ADM_out, NewDType = 'uint16')

# To be safe, we'll do unsigned 32 for settlements.
ShapeToRaster(GRID3_vec, "G3_ID", WSFE, GRID3_out, NewDType='uint32')

# Check printed stats here to make sure the NoData and min values are 0 
# and the max is the number of vector features in the input.

Rasterizing dataset. Wed Sep 13 11:25:35 2023
Finished rasterizing. Checking contents. Wed Sep 13 11:25:43 2023

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Syria/Intermediate/ADM_rasterized.tif', 'bands': 1, 'data type': ('uint16',), 'no data value': (0.0,), 'width': 32854, 'height': 20036, 'min': 0, 'mean': 58.275021362898215, 'median': 0.0, 'max': 272}]
Rasterizing dataset. Wed Sep 13 11:25:46 2023
Finished rasterizing. Checking contents. Wed Sep 13 11:26:16 2023

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Syria/Intermediate/GRID3_rasterized.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (0.0,), 'width': 32854, 'height': 20036, 'min': 0, 'mean': 1511.3145033375913, 'median': 0.0, 'max': 157080}]


### 3.2 Raster math to "join" admin to GRID3 and to WSFE.
Processing is more rapid when "joining," i.e. creating serial codes out of two datasets, in raster rather than vector space.
Here, we are concatenating the ID fields of the two datasets to create a serial number that we can then split in vector space later to create two ID fields.

In [32]:
# In paths
InG3 = os.path.join(Intermediate, 'GRID3_rasterized.tif')
InWSFE = os.path.join(Intermediate, 'WSFE_equalarea.tif')
InADM = os.path.join(Intermediate, 'ADM_rasterized.tif')

# Out paths
G3_ADM = os.path.join(Intermediate, 'GRID3_ADM.tif')
WSFE_ADM = os.path.join(Intermediate, 'WSFE_ADM.tif')

In [33]:
G3_rio = rasterio.open(InG3).read(1)
WSFE_rio = rasterio.open(InWSFE).read(1)
ADM_rio = rasterio.open(InADM).read(1)

In [34]:
print("G3 range accurate? ", G3_rio.max()==G3_max)
print("WSFE range accurate? ", WSFE_rio.max()==WSFE_end)
print("ADM range accurate? ", ADM_rio.max()==ADM_max)

G3 range accurate?  True
WSFE range accurate?  True
ADM range accurate?  True


In [35]:
# Recalculate number of digits for each dataset if starting from Section 3
len_G3 = len(str(G3_rio.max()))
len_WSFE = len(str(WSFE_rio.max()))
len_ADM = len(str(ADM_rio.max()))

In [36]:
G3_rio = WSFE_rio = ADM_rio = None
print('\nDIGITS', '\nGRID3: ', len_G3, '\nWSFE: ', len_WSFE, '\nADM: ', len_ADM)


DIGITS 
GRID3:  6 
WSFE:  4 
ADM:  3


In [40]:
# Calculations
# The number of digits in the largest ADM index value (len_ADM) is 
# the number of zeroes we tack onto the first variable in the serial.

Calc = "(A*" + str(10**len_ADM) + ")+B" 

calcShell(A=InG3, B=InADM, OutFile=G3_ADM, Calculation=Calc)
calcShell(A=InWSFE, B=InADM, OutFile=WSFE_ADM, Calculation=Calc)

Running for Q:\GIS\povertyequity\urban_growth\Syria\Intermediate\GRID3_rasterized.tif. Wed Sep 13 11:27:39 2023
Ran in shell. See OutFile folder to inspect results. Wed Sep 13 11:28:42 2023
Running for Q:\GIS\povertyequity\urban_growth\Syria\Intermediate\WSFE_equalarea.tif. Wed Sep 13 11:28:42 2023
Ran in shell. See OutFile folder to inspect results. Wed Sep 13 11:29:44 2023


*Adding together the values to create join IDs. This is in effect a concatenation of their ID strings, by way of summation. The number of zeros in the calc multiplication corresponds with number of digits of the maximum value in the "B" dataset. (e.g. Chad ADM codes go up 4 digits, so it's calc=(A*10000)+B).*

In [41]:
rioStats(G3_ADM)
rioStats(WSFE_ADM)


 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Syria/Intermediate/GRID3_ADM.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (4294967293.0,), 'width': 32854, 'height': 20036, 'min': 1063, 'mean': 4208011649.944821, 'median': 4294967293.0, 'max': 4294967293}]

 [{'raster': 'Q:/GIS/povertyequity/urban_growth/Syria/Intermediate/WSFE_ADM.tif', 'bands': 1, 'data type': ('uint32',), 'no data value': (4294967293.0,), 'width': 32854, 'height': 20036, 'min': 1985001, 'mean': 4271188891.221792, 'median': 4294967293.0, 'max': 4294967293}]


In [42]:
G3_ADM = WSFE_ADM = None

### 3.3 Vectorize serialized layers.

In [43]:
G3_in = os.path.join(Intermediate, 'GRID3_ADM.tif')
G3_out = os.path.join(Intermediate, 'GRID3_ADM.shp')
WSFE_in = os.path.join(Intermediate, 'WSFE_ADM.tif')
WSFE_out = os.path.join(Intermediate, 'WSFE_ADM.shp')

In [44]:
RasterToShapefile(G3_in, G3_out, OutName='GRID3_ADM', VariableName='gridcode', Driver = 'ESRI Shapefile')

PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]] 

 PROJCS["Africa_Albers_Equal_Area_Conic",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Albers_Conic_

In [45]:
RasterToShapefile(WSFE_in, WSFE_out, OutName='WSFE_ADM', VariableName='gridcode', Driver = 'ESRI Shapefile')

PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]] 

 PROJCS["Africa_Albers_Equal_Area_Conic",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Albers_Conic_

### 3.4 Vector math to split raster strings into admin area, GRID3, and WSFE year assignments.

In [46]:
# Load newly created vectorized datasets.
GRID3_ADM = gpd.read_file(G3_out).to_crs("ESRI:102022")
WSFE_ADM = gpd.read_file(WSFE_out).to_crs("ESRI:102022")
print(GRID3_ADM.info(), "\n\n", GRID3_ADM.sample(10), "\n\n", GRID3_ADM.crs, "\n\n", 
      WSFE_ADM.info(), "\n\n", WSFE_ADM.sample(10), "\n\n", WSFE_ADM.crs, "\n\n", 
      GRID3_ADM['gridcode'].max(), WSFE_ADM['gridcode'].max())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 197953 entries, 0 to 197952
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype   
---  ------    --------------   -----   
 0   gridcode  197953 non-null  int64   
 1   geometry  197953 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 3.0 MB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1375670 entries, 0 to 1375669
Data columns (total 2 columns):
 #   Column    Non-Null Count    Dtype   
---  ------    --------------    -----   
 0   gridcode  1375670 non-null  int64   
 1   geometry  1375670 non-null  geometry
dtypes: geometry(1), int64(1)
memory usage: 21.0 MB
None 

          gridcode                                           geometry
82893   110514166  POLYGON ((1399028.929 3952698.588, 1399111.913...
46466   100721076  POLYGON ((1313362.258 3991617.866, 1313528.225...
81225    77211030  POLYGON ((1213892.659 3954551.887, 1213975.642...
111583   67734246  POLYGON ((1234638.543 39

In [47]:
# Split serial back into separate dataset fields.
# For example, Burkina: WSFE and ADM: 4+3=7 digits. GRID3 and ADM: 6+3=9 digits.
G3_Fill = len_G3 + len_ADM
WSFE_Fill = len_WSFE + len_ADM

GRID3_ADM['gridstring'] = GRID3_ADM['gridcode'].astype(str).str.zfill(G3_Fill)
WSFE_ADM['gridstring'] = WSFE_ADM['gridcode'].astype(str).str.zfill(WSFE_Fill)

GRID3_ADM['Sett_ID'] = GRID3_ADM['gridstring'].str[:-len_ADM].astype(int) # Remove the last 3 digits to get the GRID3 portion.
GRID3_ADM['ADM_ID'] = GRID3_ADM['gridstring'].str[-len_ADM:].astype(int) # Keep only the last 3 digits to get the ADM portion.
WSFE_ADM['year'] = WSFE_ADM['gridstring'].str[:-len_ADM].astype(int)
WSFE_ADM['ADM_ID'] = WSFE_ADM['gridstring'].str[-len_ADM:].astype(int)

print(GRID3_ADM.sample(10), WSFE_ADM.sample(10))

          gridcode                                           geometry  \
51885     76723062  POLYGON ((1233781.046 3984619.588, 1233864.030...   
70140    139073152  POLYGON ((1596612.730 3965229.102, 1596806.358...   
181541    36864155  POLYGON ((1232646.938 3686708.691, 1232729.922...   
164896  2147483647  POLYGON ((1163217.379 3746069.581, 1163383.346...   
92625    109807012  POLYGON ((1361216.098 3940997.909, 1361299.081...   
149351    50530016  POLYGON ((1223657.055 3837351.472, 1223933.667...   
164443    38225070  POLYGON ((1255661.039 3747812.236, 1255771.684...   
17368    135560243  POLYGON ((1509839.612 4030315.856, 1510005.579...   
25205    115715177  POLYGON ((1377204.259 4022128.147, 1377287.242...   
65909    139683152  POLYGON ((1564940.680 3968825.055, 1565051.325...   

        gridstring  Sett_ID  ADM_ID  
51885    076723062    76723      62  
70140    139073152   139073     152  
181541   036864155    36864     155  
164896  2147483647  2147483     647  
92625 

In [48]:
# Dissolve any features that have the same G3 and ADM values so that we have a single unique feature per settlement.
# Note: we do NOT want to dissolve the WSFE features. Distinct features for noncontiguous builtup areas of the same year is necessary to separate them in the Near tool step.

print(time.ctime())
GRID3_ADM = GRID3_ADM.dissolve(by=['Sett_ID', 'ADM_ID'], as_index=False)
print(GRID3_ADM.info(), GRID3_ADM.head(), "\n\n", time.ctime())

Wed Sep 13 11:46:11 2023
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 158685 entries, 0 to 158684
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype   
---  ------      --------------   -----   
 0   Sett_ID     158685 non-null  int64   
 1   ADM_ID      158685 non-null  int64   
 2   geometry    158685 non-null  geometry
 3   gridcode    158685 non-null  int64   
 4   gridstring  158685 non-null  object  
dtypes: geometry(1), int64(3), object(1)
memory usage: 6.1+ MB
None    Sett_ID  ADM_ID                                           geometry  \
0        1      63  POLYGON ((1185042.049 3636946.231, 1185069.710...   
1        1     180  POLYGON ((1185069.710 3636946.231, 1185235.677...   
2        2      40  MULTIPOLYGON (((1143660.926 3656253.733, 11436...   
3        3      63  MULTIPOLYGON (((1167117.605 3647263.850, 11671...   
4        4      63  POLYGON ((1171598.716 3648314.975, 1171792.344...   

   gridcode gridstring  
0      1063  000001063  

In [49]:
# Remove features where year, settlement, or admin area = 0.
# This was supposed to be resolved earlier with the gdal_calc NoDataValue parameter. Being thorough.

print("Before: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))
WSFE_ADM = WSFE_ADM.loc[(WSFE_ADM["year"] != 0) & (WSFE_ADM["ADM_ID"] != 0)] # Since we change the datatype to integer, no need to include all digits. Otherwise, it would need to be: != '0000'
GRID3_ADM = GRID3_ADM.loc[(GRID3_ADM["Sett_ID"] != 0) & (GRID3_ADM["ADM_ID"] != 0)]
print("After: WSFE %s and GRID3 %s\n" % (WSFE_ADM.shape, GRID3_ADM.shape))

Before: WSFE (1375670, 5) and GRID3 (158685, 5)

After: WSFE (1375670, 5) and GRID3 (158685, 5)



In [50]:
# The Bounded_ID is our new unique settlement identifier for subsequent matching steps.
GRID3_ADM['Bounded_ID'] = GRID3_ADM.index
WSFE_ADM['WSFE_ID'] = WSFE_ADM.index
GRID3_ADM = GRID3_ADM[['Sett_ID', 'Bounded_ID', 'ADM_ID', 'geometry']]
WSFE_ADM = WSFE_ADM[['WSFE_ID', 'year', 'ADM_ID', 'geometry']]

In [51]:
# Validation: 
# The first two printed numbers should be the same. There shouldn't be any GRID3 rows with matching Sett_ID and ADM_IDs.
# The latter two numbers should be different, and the first should be larger. We never dissolved WSFE by any column.

print(len(GRID3_ADM[['Sett_ID', 'ADM_ID']]),
      len(GRID3_ADM[['Sett_ID', 'ADM_ID']].drop_duplicates()),
      len(WSFE_ADM[['year', 'ADM_ID']]),
      len(WSFE_ADM[['year', 'ADM_ID']].drop_duplicates()))

158685 158685 1375670 5725


In [52]:
GRID3_ADM.to_file(
    driver='GPKG', filename=os.path.join(Intermediate,'GRID3_ADM.gpkg'), layer='GRID3_ADM_cleaned')
WSFE_ADM.to_file(
    driver='GPKG', filename=os.path.join(Intermediate,' WSFE_ADM.gpkg'), layer='WSFE_ADM_cleaned')

---

## 4. UNIQUE SETTLEMENTS FROM WSFE AND GRID3: TWO VERSIONS

Note that there are 2 versions here, so that we can create a fragmentation index:
1. **Boundless, aka boundary-agnostic settlements**: Unique settlements are linked to GRID3 settlement IDs. Administrative areas do not influence the extents of the settlement.
2. **Bounded, aka politically-defined settlements**: Settlements in the Boundless dataset which spread across more than one administrative area are split into separate settlements in the Bounded dataset. The largest polygon after the split is considered the "principal" settlement, and polygons in other admin areas are considered "fragments." By dividing the fragment area(s) of the Bounded settlement by the area of the Boundless settlement, we can acquire a fragmentation index for each locality.

### 4.1 BOUNDED SETTLEMENTS: Near Join by ADM group.

In [53]:
print("Number of admin areas with GRID3 features: %s" % len(GRID3_ADM['ADM_ID'].unique().tolist()))
print("Number of admin areas with WSFE features: %s" % len(WSFE_ADM['ADM_ID'].unique().tolist()))
print("Number of admin areas where one dataset is observed but the other is not: %s" % (
    len(GRID3_ADM['ADM_ID'].unique().tolist()) - len(WSFE_ADM['ADM_ID'].unique().tolist())))

Number of admin areas with GRID3 features: 273
Number of admin areas with WSFE features: 272
Number of admin areas where one dataset is observed but the other is not: 1


In [54]:
ADM_IDs = sorted(GRID3_ADM['ADM_ID'].unique().tolist())
ADM_IDs

[1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50,
 51,
 52,
 53,
 54,
 55,
 56,
 57,
 58,
 59,
 60,
 61,
 62,
 63,
 64,
 65,
 66,
 67,
 68,
 69,
 70,
 71,
 72,
 73,
 74,
 75,
 76,
 77,
 78,
 79,
 80,
 81,
 82,
 83,
 84,
 85,
 86,
 87,
 88,
 89,
 90,
 91,
 92,
 93,
 94,
 95,
 96,
 97,
 98,
 99,
 100,
 101,
 102,
 103,
 104,
 105,
 106,
 107,
 108,
 109,
 110,
 111,
 112,
 113,
 114,
 115,
 116,
 117,
 118,
 119,
 120,
 121,
 122,
 123,
 124,
 125,
 126,
 127,
 128,
 129,
 130,
 131,
 132,
 133,
 134,
 135,
 136,
 137,
 138,
 139,
 140,
 141,
 142,
 143,
 144,
 145,
 146,
 147,
 148,
 149,
 150,
 151,
 152,
 153,
 154,
 155,
 156,
 157,
 158,
 159,
 160,
 161,
 162,
 163,
 164,
 165,
 166,
 167,
 168,
 169,
 170,
 171,
 172,
 173,
 174,
 175,
 176,
 177,
 178,
 179,
 180,
 181,
 182,
 183,
 184,
 185

In [55]:
# We're creating this field to help in removing duplicates from the sjoin_nearest, next section.
GRID3_ADM['G3_Area'] = GRID3_ADM['geometry'].area / 10**6

In [56]:
# Create empty geodataframe to append onto using the dataframe whose geometry we want to retain.
Bounded = GRID3_ADM[0:0]
Bounded["year"] = pd.Series(dtype='int')
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 0 entries
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   Sett_ID     0 non-null      int64   
 1   Bounded_ID  0 non-null      int64   
 2   ADM_ID      0 non-null      int64   
 3   geometry    0 non-null      geometry
 4   G3_Area     0 non-null      float64 
 5   year        0 non-null      int32   
dtypes: float64(1), geometry(1), int32(1), int64(3)
memory usage: 0.0 bytes


In [57]:
for ID in ADM_IDs:
    WSFE_shard = WSFE_ADM.loc[WSFE_ADM['ADM_ID'] == ID]
    GRID3_shard = GRID3_ADM.loc[GRID3_ADM['ADM_ID'] == ID]
    WSFE_GRID3_shard = gpd.sjoin_nearest(WSFE_shard, 
                                         GRID3_shard, 
                                         how='inner',
                                         max_distance=500)
    Bounded = pd.concat([Bounded, WSFE_GRID3_shard])
    print('Completed near join in admin area %s. %s \n' % (ID, time.ctime()))
print('Completed near join for all ADMs. %s \n' % time.ctime())

del WSFE_shard, GRID3_shard, WSFE_GRID3_shard

Completed near join in admin area 1. Wed Sep 13 11:52:44 2023 

Completed near join in admin area 2. Wed Sep 13 11:52:44 2023 

Completed near join in admin area 3. Wed Sep 13 11:52:45 2023 

Completed near join in admin area 4. Wed Sep 13 11:52:45 2023 

Completed near join in admin area 5. Wed Sep 13 11:52:45 2023 

Completed near join in admin area 6. Wed Sep 13 11:52:45 2023 

Completed near join in admin area 7. Wed Sep 13 11:52:45 2023 

Completed near join in admin area 8. Wed Sep 13 11:52:45 2023 

Completed near join in admin area 9. Wed Sep 13 11:52:45 2023 

Completed near join in admin area 10. Wed Sep 13 11:52:45 2023 

Completed near join in admin area 11. Wed Sep 13 11:52:46 2023 

Completed near join in admin area 12. Wed Sep 13 11:52:46 2023 

Completed near join in admin area 13. Wed Sep 13 11:52:46 2023 

Completed near join in admin area 14. Wed Sep 13 11:52:47 2023 

Completed near join in admin area 15. Wed Sep 13 11:52:47 2023 

Completed near join in admin area 

Completed near join in admin area 127. Wed Sep 13 11:53:21 2023 

Completed near join in admin area 128. Wed Sep 13 11:53:22 2023 

Completed near join in admin area 129. Wed Sep 13 11:53:26 2023 

Completed near join in admin area 130. Wed Sep 13 11:53:26 2023 

Completed near join in admin area 131. Wed Sep 13 11:53:26 2023 

Completed near join in admin area 132. Wed Sep 13 11:53:26 2023 

Completed near join in admin area 133. Wed Sep 13 11:53:26 2023 

Completed near join in admin area 134. Wed Sep 13 11:53:26 2023 

Completed near join in admin area 135. Wed Sep 13 11:53:26 2023 

Completed near join in admin area 136. Wed Sep 13 11:53:26 2023 

Completed near join in admin area 137. Wed Sep 13 11:53:26 2023 

Completed near join in admin area 138. Wed Sep 13 11:53:27 2023 

Completed near join in admin area 139. Wed Sep 13 11:53:27 2023 

Completed near join in admin area 140. Wed Sep 13 11:53:27 2023 

Completed near join in admin area 141. Wed Sep 13 11:53:27 2023 

Completed 

Completed near join in admin area 253. Wed Sep 13 11:53:55 2023 

Completed near join in admin area 254. Wed Sep 13 11:53:55 2023 

Completed near join in admin area 255. Wed Sep 13 11:53:55 2023 

Completed near join in admin area 256. Wed Sep 13 11:53:55 2023 

Completed near join in admin area 257. Wed Sep 13 11:53:56 2023 

Completed near join in admin area 258. Wed Sep 13 11:53:56 2023 

Completed near join in admin area 259. Wed Sep 13 11:53:56 2023 

Completed near join in admin area 260. Wed Sep 13 11:53:57 2023 

Completed near join in admin area 261. Wed Sep 13 11:53:57 2023 

Completed near join in admin area 262. Wed Sep 13 11:53:57 2023 

Completed near join in admin area 263. Wed Sep 13 11:53:57 2023 

Completed near join in admin area 264. Wed Sep 13 11:53:57 2023 

Completed near join in admin area 265. Wed Sep 13 11:53:58 2023 

Completed near join in admin area 266. Wed Sep 13 11:53:58 2023 

Completed near join in admin area 267. Wed Sep 13 11:53:58 2023 

Completed 

In [58]:
Bounded.sample(20)

Unnamed: 0,Sett_ID,Bounded_ID,ADM_ID,geometry,G3_Area,year,WSFE_ID,ADM_ID_left,index_right,ADM_ID_right
960567,10708,10886,,"POLYGON ((1163715.280 3732570.926, 1163742.941...",29.493118,1990,960567.0,197.0,10886.0,197.0
1274853,31915,32458,,"POLYGON ((1208166.795 3658826.223, 1208194.456...",3.485982,2021,1274853.0,176.0,32458.0,176.0
726468,50649,51510,,"POLYGON ((1225786.966 3842800.724, 1225814.627...",93.299741,2021,726468.0,115.0,51510.0,115.0
600096,21451,21921,,"POLYGON ((1154421.124 3868497.959, 1154448.785...",0.853897,2020,600096.0,260.0,21921.0,260.0
1030913,10549,10718,,"POLYGON ((1181529.079 3727730.220, 1181556.741...",0.009182,1990,1030913.0,61.0,10718.0,61.0
1127932,42501,43291,,"POLYGON ((1208747.679 3718823.320, 1208775.341...",0.013773,1999,1127932.0,93.0,43291.0,93.0
147717,147384,149113,,"POLYGON ((1657660.952 4015351.158, 1657688.613...",0.769732,2010,147717.0,11.0,149113.0,11.0
388644,137972,139735,,"POLYGON ((1561400.049 3930237.710, 1561455.372...",3.58698,2010,388644.0,248.0,139735.0,248.0
681933,52279,53143,,"POLYGON ((1224846.486 3848830.861, 1224901.808...",0.016068,2014,681933.0,151.0,53143.0,151.0
167853,103121,104770,,"POLYGON ((1317594.418 4004563.298, 1317622.080...",21.638182,2010,167853.0,18.0,104770.0,18.0


In [59]:
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1353979 entries, 124741 to 1375669
Data columns (total 10 columns):
 #   Column        Non-Null Count    Dtype   
---  ------        --------------    -----   
 0   Sett_ID       1353979 non-null  int64   
 1   Bounded_ID    1353979 non-null  int64   
 2   ADM_ID        0 non-null        float64 
 3   geometry      1353979 non-null  geometry
 4   G3_Area       1353979 non-null  float64 
 5   year          1353979 non-null  int32   
 6   WSFE_ID       1353979 non-null  float64 
 7   ADM_ID_left   1353979 non-null  float64 
 8   index_right   1353979 non-null  float64 
 9   ADM_ID_right  1353979 non-null  float64 
dtypes: float64(6), geometry(1), int32(1), int64(2)
memory usage: 108.5 MB


In [60]:
# Remove WSFE features that did not match any GRID3 settlements.
Bounded = Bounded.loc[~Bounded['Sett_ID'].isna()]
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1353979 entries, 124741 to 1375669
Data columns (total 10 columns):
 #   Column        Non-Null Count    Dtype   
---  ------        --------------    -----   
 0   Sett_ID       1353979 non-null  int64   
 1   Bounded_ID    1353979 non-null  int64   
 2   ADM_ID        0 non-null        float64 
 3   geometry      1353979 non-null  geometry
 4   G3_Area       1353979 non-null  float64 
 5   year          1353979 non-null  int32   
 6   WSFE_ID       1353979 non-null  float64 
 7   ADM_ID_left   1353979 non-null  float64 
 8   index_right   1353979 non-null  float64 
 9   ADM_ID_right  1353979 non-null  float64 
dtypes: float64(6), geometry(1), int32(1), int64(2)
memory usage: 108.5 MB


In [61]:
del GRID3_ADM, ADM_IDs

### 4.2 Remove duplicates: where buildup polygons intersected with more than one GRID3 settlement extent.
This happens when the first dataset (WSFE) intersects (distance = 0) with more than one feature of the second dataset (GRID3). More common for large cities. For example, Yaoundé, CMN has a large contiguous 1985 WSFE polygon which overlaps several small GRID3 features that are not Yaoundé.

In [62]:
# The first number should always be zero. 
# The second tells us whether/how many WSFE polygons were duplicated by the Near join.

print(len(WSFE_ADM[WSFE_ADM.duplicated('WSFE_ID')]), len(Bounded[Bounded.duplicated('WSFE_ID')]))

0 8498


In [63]:
# If there are duplicate WSFE_IDs, then we need to choose between them.
# We'll pick the one that joined with the largest GRID3 polygon.
# To do that, we can just sort the dataframe by GRID3 areas, then drop_duplicates. 
# It will retain the first row of each WSFE_ID group.
Bounded = Bounded.sort_values('G3_Area', ascending=False).drop_duplicates(['WSFE_ID'])
Bounded.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1345481 entries, 1375669 to 579652
Data columns (total 10 columns):
 #   Column        Non-Null Count    Dtype   
---  ------        --------------    -----   
 0   Sett_ID       1345481 non-null  int64   
 1   Bounded_ID    1345481 non-null  int64   
 2   ADM_ID        0 non-null        float64 
 3   geometry      1345481 non-null  geometry
 4   G3_Area       1345481 non-null  float64 
 5   year          1345481 non-null  int32   
 6   WSFE_ID       1345481 non-null  float64 
 7   ADM_ID_left   1345481 non-null  float64 
 8   index_right   1345481 non-null  float64 
 9   ADM_ID_right  1345481 non-null  float64 
dtypes: float64(6), geometry(1), int32(1), int64(2)
memory usage: 107.8 MB


In [64]:
print(len(Bounded[Bounded.duplicated('WSFE_ID')]))

0


In [65]:
# Now we can dissolve with the WSFE years, now that we can group them by their administratively split ID.
Bounded = Bounded.dissolve(by=['year', 'Bounded_ID'], as_index=False)
print(Bounded.info(), Bounded.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 97655 entries, 0 to 97654
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype   
---  ------        --------------  -----   
 0   year          97655 non-null  int64   
 1   Bounded_ID    97655 non-null  int64   
 2   geometry      97655 non-null  geometry
 3   Sett_ID       97655 non-null  int64   
 4   ADM_ID        0 non-null      float64 
 5   G3_Area       97655 non-null  float64 
 6   WSFE_ID       97655 non-null  float64 
 7   ADM_ID_left   97655 non-null  float64 
 8   index_right   97655 non-null  float64 
 9   ADM_ID_right  97655 non-null  float64 
dtypes: float64(6), geometry(1), int64(3)
memory usage: 7.5 MB
None        year  Bounded_ID                                           geometry  \
73876  2018      139736  MULTIPOLYGON (((1582533.190 3916573.088, 15825...   
11642  1997       43147  POLYGON ((1214030.965 3718629.692, 1214058.626...   
85183  2020       78755  POLYGON ((1215552.329 398824

In [66]:
# Clean up and save to file.
Bounded = Bounded[['ADM_ID_left', 'year', 'Bounded_ID', 'Sett_ID', 'geometry']].rename(columns={"ADM_ID_left": "ADM_ID"})
Bounded = Bounded.astype({"ADM_ID":'int', "Bounded_ID":'int', "Sett_ID":'int', "year":'int'})
print(Bounded.sample(10))

       ADM_ID  year  Bounded_ID  Sett_ID  \
68637      18  2017      108810   107158   
37793     193  2012      154246   152602   
4910      151  1989       47593    46745   
46280      40  2014        2043     2034   
67645     175  2017       58533    57518   
17424      70  2005       40448    39728   
33805     270  2012       45622    44792   
30835      13  2012        9580     9437   
29216      99  2011       59170    58135   
51072     239  2014       39993    39284   

                                                geometry  
68637  MULTIPOLYGON (((1330429.205 4007661.350, 13304...  
37793  POLYGON ((1739593.364 4063785.882, 1739648.686...  
4910   MULTIPOLYGON (((1222854.881 3845926.437, 12228...  
46280  POLYGON ((1150548.559 3657664.454, 1150576.220...  
67645  MULTIPOLYGON (((1194170.238 3877819.776, 11941...  
17424  POLYGON ((1231236.218 3735143.416, 1231263.879...  
33805  MULTIPOLYGON (((1222412.302 3770632.708, 12224...  
30835  POLYGON ((1143080.041 3675893.170, 1

In [67]:
Bounded.to_file(
    driver='GPKG', filename=os.path.join(Intermediate,'NonCumulativeSettlements.gpkg'), layer='Settlements_Bounded')

In [68]:
del WSFE_ADM

### 4.3 BOUNDLESS SETTLEMENTS: Dissolve features that were split by an ADM boundary.

In [69]:
# Fragments of any bounded settlement will be combined into a single "boundless" settlement in this version.
# It is based on their "Sett_ID", which is a direct loan from the GRID3 settlement features.
Boundless = Bounded.dissolve(by=['year', 'Sett_ID'], as_index=False)
print(Boundless.info(), Boundless.sample(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 93908 entries, 0 to 93907
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   year        93908 non-null  int64   
 1   Sett_ID     93908 non-null  int64   
 2   geometry    93908 non-null  geometry
 3   ADM_ID      93908 non-null  int32   
 4   Bounded_ID  93908 non-null  int32   
dtypes: geometry(1), int32(2), int64(2)
memory usage: 2.9 MB
None        year  Sett_ID                                           geometry  \
48085  2014    31001  MULTIPOLYGON (((1232840.566 3655894.138, 12328...   
24313  2010    59900  MULTIPOLYGON (((1258012.239 3898814.611, 12580...   
6940   1990   100777  POLYGON ((1308327.924 3997316.069, 1308383.246...   
72051  2019      363  MULTIPOLYGON (((1146205.754 3658217.677, 11461...   
21250  2010    26282  POLYGON ((1150437.914 3934276.243, 1150465.576...   
1730   1985   135003  MULTIPOLYGON (((1487074.462 3962048.066, 14870...   

In [70]:
# Clean up and save to file.
Boundless.to_file(driver='GPKG', 
                  filename=os.path.join(Intermediate,'NonCumulativeSettlements.gpkg'), layer='Settlements_Boundless')

---

## 5. CUMULATIVE ANNUALIZED SETTLEMENT EXTENTS
DISSOLVE BY YEAR SETS: Create separate feature layers of each cumulative year.

### 5.1 Define study years for each for loop.

In [95]:
Boundless = gpd.read_file(os.path.join(Intermediate,'NonCumulativeSettlements.gpkg'), layer='Settlements_Boundless')

In [81]:
Reversed_WSFE_Years.remove(WSFE_end) # We'll call the last year in the study, then use this list for subsequent sets.
print(Reversed_WSFE_Years)



 [2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985]


### 5.2 Starting with main Boundless dataset, create a cumulative area feature layer for each year.

In [100]:
# For each year in the growth stats study, we are taking features from all years prior to and including that year, 
# dissolving those features, and exporting as its own file.

for item in WSFE_Years:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    YearSet = Boundless[Boundless['year'].between(
        WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the start year and "item" rather than only between them.
    print('Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    YearDissolve = YearSet.dissolve(by='Sett_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min"}, # Though ADM_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    YearName = ''.join(['Cu', str(item), '_Boundless'])
    YearDissolve.to_file(driver='GPKG', filename=os.path.join(Results,'CumulativeSettlements.gpkg'), layer=YearName)
    del YearSet, YearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1985. Wed Sep 13 15:32:59 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:32:59 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:33:06 2023

Subsetting to cumulative area for year: 1986. Wed Sep 13 15:33:09 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:33:09 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:33:18 2023

Subsetting to cumulative area for year: 1987. Wed Sep 13 15:33:21 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:33:21 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:33:33 2023

Subsetting to cumulative area for year: 1988. Wed Sep 13 15:33:38 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:33:38 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:33:51 2023

Subsetting to cumulative area for year: 1989. Wed Sep 13 15:33:55 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:33:55 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:34:09 2023

Subsetting to cumulative area for year: 1990. Wed Sep 13 15:34:13 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:34:13 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:34:31 2023

Subsetting to cumulative area for year: 1991. Wed Sep 13 15:34:36 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:34:36 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:34:54 2023

Subsetting to cumulative area for year: 1992. Wed Sep 13 15:34:59 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:34:59 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:35:18 2023

Subsetting to cumulative area for year: 1993. Wed Sep 13 15:35:22 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:35:22 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:35:41 2023

Subsetting to cumulative area for year: 1994. Wed Sep 13 15:35:45 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:35:45 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:36:05 2023

Subsetting to cumulative area for year: 1995. Wed Sep 13 15:36:09 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:36:09 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:36:30 2023

Subsetting to cumulative area for year: 1996. Wed Sep 13 15:36:35 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:36:35 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:36:58 2023

Subsetting to cumulative area for year: 1997. Wed Sep 13 15:37:04 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:37:04 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:37:26 2023

Subsetting to cumulative area for year: 1998. Wed Sep 13 15:37:31 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:37:31 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:37:55 2023

Subsetting to cumulative area for year: 1999. Wed Sep 13 15:37:59 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:37:59 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:38:23 2023

Subsetting to cumulative area for year: 2000. Wed Sep 13 15:38:27 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:38:27 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:38:51 2023

Subsetting to cumulative area for year: 2001. Wed Sep 13 15:38:56 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:38:56 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:39:20 2023

Subsetting to cumulative area for year: 2002. Wed Sep 13 15:39:24 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:39:24 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:39:48 2023

Subsetting to cumulative area for year: 2003. Wed Sep 13 15:39:53 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:39:53 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:40:19 2023

Subsetting to cumulative area for year: 2004. Wed Sep 13 15:40:23 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:40:24 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:40:49 2023

Subsetting to cumulative area for year: 2005. Wed Sep 13 15:40:54 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:40:54 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:41:21 2023

Subsetting to cumulative area for year: 2006. Wed Sep 13 15:41:25 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:41:25 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:41:52 2023

Subsetting to cumulative area for year: 2007. Wed Sep 13 15:41:56 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:41:57 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:42:24 2023

Subsetting to cumulative area for year: 2008. Wed Sep 13 15:42:28 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:42:28 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:42:57 2023

Subsetting to cumulative area for year: 2009. Wed Sep 13 15:43:02 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:43:02 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:43:28 2023

Subsetting to cumulative area for year: 2010. Wed Sep 13 15:43:32 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:43:32 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:44:16 2023

Subsetting to cumulative area for year: 2011. Wed Sep 13 15:44:35 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:44:35 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:46:02 2023

Subsetting to cumulative area for year: 2012. Wed Sep 13 15:46:24 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:46:24 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:47:40 2023

Subsetting to cumulative area for year: 2013. Wed Sep 13 15:47:51 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:47:51 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:48:48 2023

Subsetting to cumulative area for year: 2014. Wed Sep 13 15:48:59 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:48:59 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:50:13 2023

Subsetting to cumulative area for year: 2015. Wed Sep 13 15:50:26 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:50:26 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:51:44 2023

Subsetting to cumulative area for year: 2016. Wed Sep 13 15:51:57 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:51:57 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:53:13 2023

Subsetting to cumulative area for year: 2017. Wed Sep 13 15:53:27 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:53:27 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:54:52 2023

Subsetting to cumulative area for year: 2018. Wed Sep 13 15:55:05 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:55:05 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:56:29 2023

Subsetting to cumulative area for year: 2019. Wed Sep 13 15:56:42 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:56:42 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:58:11 2023

Subsetting to cumulative area for year: 2020. Wed Sep 13 15:58:25 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 15:58:25 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 15:59:56 2023

Subsetting to cumulative area for year: 2021. Wed Sep 13 16:00:10 2023

Dissolving so that each unique settlement (Sett_ID) has a single cumulative WSFE feature. Wed Sep 13 16:00:10 2023



  YearSet = Boundless[Boundless['year'].between(


Write to file. Wed Sep 13 16:01:47 2023

Done with all years in set. Wed Sep 13 16:02:00 2023


##### Join area information from each cumulative layer onto the latest year dataset.

In [101]:
# The latest year in the study contains all settlements. Merge all other years' areas onto this dataset.
SettAreas = gpd.read_file(os.path.join(Results,'CumulativeSettlements.gpkg'), layer=
                          ''.join(['Cu', str(WSFE_end), '_Boundless'])) 
SettAreas['AREA'+str(WSFE_end)] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry') # We have settlement IDs, so no need to join spatially!

In [102]:
for item in Reversed_WSFE_Years:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(os.path.join(Results,'CumulativeSettlements.gpkg'), 
                              layer=''.join(['Cu', str(item), '_Boundless']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['AREA', str(item)])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Sett_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, WSFE_end, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Sett_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))


print(SettAreas.info())
SettAreas.to_csv(os.path.join(Results, 'Area.csv'))

Loading cumulative layer for year 2020. Wed Sep 13 16:02:03 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 16:02:06 2023

Merging variables from 2020 onto our latest year (2021) via table join. Wed Sep 13 16:02:06 2023

Loading cumulative layer for year 2019. Wed Sep 13 16:02:06 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 16:02:08 2023

Merging variables from 2019 onto our latest year (2021) via table join. Wed Sep 13 16:02:08 2023

Loading cumulative layer for year 2018. Wed Sep 13 16:02:08 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 16:02:11 2023

Merging variables from 2018 onto our latest year (2021) via table join. Wed Sep 13 16:02:11 2023

Loading cumulative layer for year 2017. Wed Sep 13 16:02:11 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 16:02:13 2023

Merging variables from 2017 onto our latest year (2021) via table join. Wed Sep 13 16:02:14 2023

Load

Adding area field and converting to non-spatial dataframe. Wed Sep 13 16:02:56 2023

Merging variables from 1987 onto our latest year (2021) via table join. Wed Sep 13 16:02:56 2023

Loading cumulative layer for year 1986. Wed Sep 13 16:02:56 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 16:02:57 2023

Merging variables from 1986 onto our latest year (2021) via table join. Wed Sep 13 16:02:57 2023

Loading cumulative layer for year 1985. Wed Sep 13 16:02:57 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 16:02:58 2023

Merging variables from 1985 onto our latest year (2021) via table join. Wed Sep 13 16:02:58 2023

Done merging annualized areas onto latest year geometries. Saving to file. Wed Sep 13 16:02:58 2023

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18552 entries, 0 to 18551
Data columns (total 40 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Sett_ID   18552 non-null  

In [103]:
del SettAreas

### 5.3 Repeat for Bounded dataset.

In [90]:
# Bounded = gpd.read_file(r'Results/NonCumulativeSettlements.gpkg', layer='Settlements_Bounded')

for item in WSFE_Years:
    print('Subsetting to cumulative area for year: %s. %s\n' % (item, time.ctime()))
    YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.
    print('Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. %s\n' % time.ctime())
    YearDissolve = YearSet.dissolve(by='Bounded_ID', 
                                        aggfunc={"year": "max", "ADM_ID":"min", "Sett_ID":"min"}, # Though ADM_ID and Sett_ID should be matching every time.
                                        as_index=False)
    print('Write to file. %s\n' % time.ctime())
    YearName = ''.join(['Cu', str(item), '_Bounded'])
    YearDissolve.to_file(driver='GPKG', filename=os.path.join(Results,'CumulativeSettlements.gpkg'), layer=YearName)
    del YearSet, YearDissolve
print("Done with all years in set. %s" % time.ctime())

Subsetting to cumulative area for year: 1985. Wed Sep 13 13:44:20 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:44:20 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:44:25 2023

Subsetting to cumulative area for year: 1986. Wed Sep 13 13:44:27 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:44:27 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:44:35 2023

Subsetting to cumulative area for year: 1987. Wed Sep 13 13:44:37 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:44:37 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:44:47 2023

Subsetting to cumulative area for year: 1988. Wed Sep 13 13:44:50 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:44:50 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:45:02 2023

Subsetting to cumulative area for year: 1989. Wed Sep 13 13:45:05 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:45:05 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:45:17 2023

Subsetting to cumulative area for year: 1990. Wed Sep 13 13:45:20 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:45:20 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:45:37 2023

Subsetting to cumulative area for year: 1991. Wed Sep 13 13:45:40 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:45:40 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:45:57 2023

Subsetting to cumulative area for year: 1992. Wed Sep 13 13:46:01 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:46:01 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:46:18 2023

Subsetting to cumulative area for year: 1993. Wed Sep 13 13:46:23 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:46:23 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:46:39 2023

Subsetting to cumulative area for year: 1994. Wed Sep 13 13:46:43 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:46:43 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:46:59 2023

Subsetting to cumulative area for year: 1995. Wed Sep 13 13:47:03 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:47:03 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:47:22 2023

Subsetting to cumulative area for year: 1996. Wed Sep 13 13:47:26 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:47:26 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:47:45 2023

Subsetting to cumulative area for year: 1997. Wed Sep 13 13:47:49 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:47:49 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:48:08 2023

Subsetting to cumulative area for year: 1998. Wed Sep 13 13:48:12 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:48:12 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:48:31 2023

Subsetting to cumulative area for year: 1999. Wed Sep 13 13:48:35 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:48:35 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:48:54 2023

Subsetting to cumulative area for year: 2000. Wed Sep 13 13:48:58 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:48:58 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:49:20 2023

Subsetting to cumulative area for year: 2001. Wed Sep 13 13:49:24 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:49:24 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:49:49 2023

Subsetting to cumulative area for year: 2002. Wed Sep 13 13:49:54 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:49:54 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:50:17 2023

Subsetting to cumulative area for year: 2003. Wed Sep 13 13:50:21 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:50:21 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:50:45 2023

Subsetting to cumulative area for year: 2004. Wed Sep 13 13:50:50 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:50:50 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:51:15 2023

Subsetting to cumulative area for year: 2005. Wed Sep 13 13:51:20 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:51:20 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:51:43 2023

Subsetting to cumulative area for year: 2006. Wed Sep 13 13:51:49 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:51:49 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:52:14 2023

Subsetting to cumulative area for year: 2007. Wed Sep 13 13:52:20 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:52:20 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:52:45 2023

Subsetting to cumulative area for year: 2008. Wed Sep 13 13:52:49 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:52:49 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:53:14 2023

Subsetting to cumulative area for year: 2009. Wed Sep 13 13:53:19 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:53:19 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:53:45 2023

Subsetting to cumulative area for year: 2010. Wed Sep 13 13:53:50 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:53:50 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:54:31 2023

Subsetting to cumulative area for year: 2011. Wed Sep 13 13:54:42 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:54:42 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:55:25 2023

Subsetting to cumulative area for year: 2012. Wed Sep 13 13:55:36 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:55:36 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:56:32 2023

Subsetting to cumulative area for year: 2013. Wed Sep 13 13:56:44 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:56:44 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:57:44 2023

Subsetting to cumulative area for year: 2014. Wed Sep 13 13:57:56 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:57:56 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 13:59:08 2023

Subsetting to cumulative area for year: 2015. Wed Sep 13 13:59:22 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 13:59:22 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 14:00:39 2023

Subsetting to cumulative area for year: 2016. Wed Sep 13 14:00:53 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 14:00:53 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 14:02:10 2023

Subsetting to cumulative area for year: 2017. Wed Sep 13 14:02:22 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 14:02:22 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 14:03:40 2023

Subsetting to cumulative area for year: 2018. Wed Sep 13 14:03:53 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 14:03:54 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 14:05:14 2023

Subsetting to cumulative area for year: 2019. Wed Sep 13 14:05:27 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 14:05:27 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 14:07:46 2023

Subsetting to cumulative area for year: 2020. Wed Sep 13 14:08:12 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 14:08:12 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 14:11:09 2023

Subsetting to cumulative area for year: 2021. Wed Sep 13 14:11:39 2023

Dissolving so that each unique settlement (Bounded_ID) has a single cumulative WSFE feature. Wed Sep 13 14:11:39 2023



  YearSet = Bounded[Bounded['year'].between(WSFE_start, item, inclusive=True)] # Inclusive parameter means we include the years 1985 and "item" rather than only between them.


Write to file. Wed Sep 13 14:14:59 2023

Done with all years in set. Wed Sep 13 14:15:24 2023


In [91]:
SettAreas = gpd.read_file(os.path.join(Results,'CumulativeSettlements.gpkg'), 
                          layer=''.join(['Cu', str(WSFE_end), '_Bounded']))
SettAreas['AREA'+str(WSFE_end)] = SettAreas['geometry'].area / 10**6
SettAreas = pd.DataFrame(SettAreas).drop(columns='geometry')

In [92]:
for item in Reversed_WSFE_Years:
    print("Loading cumulative layer for year %s. %s\n" % (item, time.ctime()))
    YearLayer = gpd.read_file(os.path.join(Results,'CumulativeSettlements.gpkg'), 
                              layer=''.join(['Cu', str(item), '_Bounded']))
    print("Adding area field and converting to non-spatial dataframe. %s\n" % (time.ctime()))
    AreaYearName = ''.join(['AREA', str(item)])
    YearLayer[AreaYearName] = YearLayer['geometry'].area/ 10**6 
    YearLayer = pd.DataFrame(YearLayer)[['Bounded_ID', AreaYearName]]
    print("Merging variables from %s onto our latest year (%s) via table join. %s\n" % (item, WSFE_end, time.ctime()))
    SettAreas = SettAreas.merge(YearLayer, how='left', on='Bounded_ID')
print("Done merging annualized areas onto latest year geometries. Saving to file. %s\n" % (time.ctime()))

print(SettAreas.info())
SettAreas.to_csv(os.path.join(Results, 'Area_Bounded.csv'))

Loading cumulative layer for year 2020. Wed Sep 13 14:15:28 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 14:15:34 2023

Merging variables from 2020 onto our latest year (2021) via table join. Wed Sep 13 14:15:34 2023

Loading cumulative layer for year 2019. Wed Sep 13 14:15:34 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 14:15:40 2023

Merging variables from 2019 onto our latest year (2021) via table join. Wed Sep 13 14:15:40 2023

Loading cumulative layer for year 2018. Wed Sep 13 14:15:40 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 14:15:45 2023

Merging variables from 2018 onto our latest year (2021) via table join. Wed Sep 13 14:15:45 2023

Loading cumulative layer for year 2017. Wed Sep 13 14:15:45 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 14:15:51 2023

Merging variables from 2017 onto our latest year (2021) via table join. Wed Sep 13 14:15:51 2023

Load

Adding area field and converting to non-spatial dataframe. Wed Sep 13 14:17:08 2023

Merging variables from 1987 onto our latest year (2021) via table join. Wed Sep 13 14:17:08 2023

Loading cumulative layer for year 1986. Wed Sep 13 14:17:08 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 14:17:10 2023

Merging variables from 1986 onto our latest year (2021) via table join. Wed Sep 13 14:17:10 2023

Loading cumulative layer for year 1985. Wed Sep 13 14:17:10 2023

Adding area field and converting to non-spatial dataframe. Wed Sep 13 14:17:11 2023

Merging variables from 1985 onto our latest year (2021) via table join. Wed Sep 13 14:17:11 2023

Done merging annualized areas onto latest year geometries. Saving to file. Wed Sep 13 14:17:11 2023

<class 'pandas.core.frame.DataFrame'>
Int64Index: 19095 entries, 0 to 19094
Data columns (total 41 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Bounded_ID  19095 non-

In [93]:
del SettAreas

### 5.4 One settlement geofile to rule them all. ...and in the Sett_ID bind them.
The annualized values can be stored as distinct non-spatial dataframes. Their Sett_IDs will be used to join onto this geoversion with place names for the summary stats.

In [104]:
Settlements = gpd.read_file(os.path.join(Results,'CumulativeSettlements.gpkg'), 
                           layer=''.join(['Cu', str(WSFE_end), '_Boundless']))[['Sett_ID', 'ADM_ID', 'geometry']]
print(Settlements.info())
print(Settlements.crs)
Settlements.to_file(driver='GPKG', 
                       filename=os.path.join(Results,'SETTLEMENTS.gpkg'), 
                       layer='SETTLEMENTS_equalarea')

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 18552 entries, 0 to 18551
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   18552 non-null  int64   
 1   ADM_ID    18552 non-null  int64   
 2   geometry  18552 non-null  geometry
dtypes: geometry(1), int64(2)
memory usage: 434.9 KB
None
PROJCS["Africa_Albers_Equal_Area_Conic",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",0],PARAMETER["longitude_of_center",25],PARAMETER["standard_parallel_1",20],PARAMETER["standard_parallel_2",-23],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["ESRI","102022"]]


In [105]:
# Saving all the final products as WGS84.
Settlements_WGS = Settlements.to_crs(4326) 
print(Settlements_WGS.info())
print(Settlements_WGS.crs)
Settlements_WGS.to_file(driver='GPKG', 
                       filename=os.path.join(Results,'SETTLEMENTS.gpkg'), 
                       layer='SETTLEMENTS')

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 18552 entries, 0 to 18551
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   18552 non-null  int64   
 1   ADM_ID    18552 non-null  int64   
 2   geometry  18552 non-null  geometry
dtypes: geometry(1), int64(2)
memory usage: 434.9 KB
None
epsg:4326


### 5.5 Buffer the area of the Boundless dataset's latest year to mask raster data in later sections.
The Bounded dataset would also be fine for our purposes here. The buffer is dissolved to a single feature to be used for its total extents, which are identical between Bounded & Boundless datasets.

In [106]:
# Create buffer layer(s) to use as maximum distance for Near joins.

# Population buffer: 2km
Distance = 2000 # The Africa Albers projection is in meters. Saving in this projection to use in later sections.

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = Settlements[['Sett_ID', 'geometry']]
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid is a workaround for any null geometries.
print('Finished buffer layer creation. %s' % time.ctime())

BufferFileName1 = ''.join(['Buff', str(Distance), 'm_', str(WSFE_end)])
BufferLayer.to_file(driver='GPKG', filename=os.path.join(Intermediate,'Catchment.gpkg'), layer=BufferFileName1)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Wed Sep 13 16:03:28 2023


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Finished buffer layer creation. Wed Sep 13 18:34:08 2023
Saved to file. Wed Sep 13 18:34:14 2023


In [107]:
# Nighttime Lights buffer: 250m
Distance = 250

print('Creating buffer layer. %s' % time.ctime())
BufferLayer = Settlements[['Sett_ID', 'geometry']]
BufferLayer['geometry'] = BufferLayer['geometry'].apply(
    make_valid).buffer(Distance) # make_valid is a workaround for any null geometries.
print('Finished buffer layer creation. %s' % time.ctime())

BufferFileName2 = ''.join(['Buff', str(Distance), 'm_', str(WSFE_end)])
BufferLayer.to_file(driver='GPKG', filename=os.path.join(Intermediate,'Catchment.gpkg'), layer=BufferFileName2)
print('Saved to file. %s' % time.ctime())

Creating buffer layer. Wed Sep 13 18:34:14 2023


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


Finished buffer layer creation. Wed Sep 13 18:39:20 2023
Saved to file. Wed Sep 13 18:39:28 2023


---

## 6. PLACE NAMES
Join urban place names from UCDB, Africapolis, and GeoNames onto the settlement vectors.

### 6.1 Load placename datasets, filter, and project.

In [69]:
# WSFE_end = 2021
# WSFE_start = 1985
# WSFE_Years = ListFromRange(WSFE_start, WSFE_end)

In [43]:
# Anytime we use a spatial join or work with area, 
# my preference is to keep it in a planar, equal area, meters projection. So we'll load as the Africa Albers.
Settlements = gpd.read_file(os.path.join(Results, 'SETTLEMENTS.gpkg'), layer='SETTLEMENTS_equalarea')
Settlements['AREA'+str(WSFE_end)] = Settlements['geometry'].area / 10**6

# Load, pull name field, rename, and reproject to match the catchments CRS.
UCDB = gpd.read_file(os.path.join(Source, 'PlaceName', 'GHS_STAT_UCDB2015MT_GLOBE_R2019A_V1_2.gpkg'), 
                     layer=0)[['UC_NM_MN', 'geometry']].rename(
    columns={"UC_NM_MN": "UCDB_Name"}).to_crs("ESRI:102022")

Africapolis = gpd.read_file(os.path.join(Source, 'PlaceName', 'AFRICAPOLIS2020.shp'))[['agglosName', 'geometry']].rename(
    columns={"agglosName": "Afpl_Name"}).to_crs("ESRI:102022")

GeoNames = gpd.read_file(os.path.join(Source, 'PlaceName', 'GeoNames.gpkg'), 
                         layer=0)[['GeoName', 'geometry']].to_crs("ESRI:102022")

print(Settlements.info(), UCDB.info(), Africapolis.info(), GeoNames.info())

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 18552 entries, 0 to 18551
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Sett_ID   18552 non-null  int64   
 1   ADM_ID    18552 non-null  int64   
 2   geometry  18552 non-null  geometry
 3   AREA2021  18552 non-null  float64 
dtypes: float64(1), geometry(1), int64(2)
memory usage: 579.9 KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13135 entries, 0 to 13134
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   UCDB_Name  13135 non-null  object  
 1   geometry   13135 non-null  geometry
dtypes: geometry(1), object(1)
memory usage: 205.4+ KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 7720 entries, 0 to 7719
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   Afpl_Name  7720 non-null   object  
 1   ge

### 6.2 Join placenames onto settlements geodataframe.

In [44]:
# We wrap it in pd.DataFrame() since the sjoin() is the last time we need the geometry.

GeoNames = pd.DataFrame(gpd.sjoin_nearest(GeoNames, Settlements, 
                             how='left', distance_col="distGN", max_distance=250, 
                             lsuffix="G3", rsuffix="GN")).drop(columns='geometry')
Africapolis = pd.DataFrame(gpd.sjoin_nearest(Africapolis, Settlements, 
                             how='left', distance_col="distAF", max_distance=250,
                             lsuffix="G3", rsuffix="Af")).drop(columns='geometry')
UCDB = pd.DataFrame(gpd.sjoin_nearest(UCDB, Settlements, 
                             how='left', distance_col="distUC", max_distance=250,
                             lsuffix="G3", rsuffix="UC")).drop(columns='geometry')

In [45]:
print(GeoNames.info())
print(Africapolis.info())
print(UCDB.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 199390 entries, 0 to 199389
Data columns (total 6 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   GeoName   199390 non-null  object 
 1   index_GN  277 non-null     float64
 2   Sett_ID   277 non-null     float64
 3   ADM_ID    277 non-null     float64
 4   AREA2021  277 non-null     float64
 5   distGN    277 non-null     float64
dtypes: float64(5), object(1)
memory usage: 10.6+ MB
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7720 entries, 0 to 7719
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Afpl_Name  7720 non-null   object 
 1   index_Af   0 non-null      float64
 2   Sett_ID    0 non-null      float64
 3   ADM_ID     0 non-null      float64
 4   AREA2021   0 non-null      float64
 5   distAF     0 non-null      float64
dtypes: float64(5), object(1)
memory usage: 422.2+ KB
None
<class 'pandas.core.frame.D

In [46]:
alldatasets = [pd.DataFrame(Settlements).drop(columns='geometry'),
               Africapolis[['Sett_ID', 'Afpl_Name', 'distAF']], 
               GeoNames[['Sett_ID', 'GeoName', 'distGN']],
               UCDB[['Sett_ID', 'UCDB_Name', 'distUC']]]

SettlementsNamed = reduce(lambda left,right: pd.merge(left,right,on=['Sett_ID'], how='left'), alldatasets)
SettlementsNamed[['Afpl_Name', 'GeoName', 'UCDB_Name']] = SettlementsNamed[['Afpl_Name', 'GeoName', 'UCDB_Name']].fillna('UNK')

# Replace NaN values with a countable distance.
SettlementsNamed[['distAF', 'distGN', 'distUC']] = SettlementsNamed[['distAF', 'distGN', 'distUC']].fillna(-1)

In [47]:
print(SettlementsNamed.info())
print(SettlementsNamed.sample(10))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18575 entries, 0 to 18574
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    18575 non-null  int64  
 1   ADM_ID     18575 non-null  int64  
 2   AREA2021   18575 non-null  float64
 3   Afpl_Name  18575 non-null  object 
 4   distAF     18575 non-null  float64
 5   GeoName    18575 non-null  object 
 6   distGN     18575 non-null  float64
 7   UCDB_Name  18575 non-null  object 
 8   distUC     18575 non-null  float64
dtypes: float64(4), int64(2), object(3)
memory usage: 1.4+ MB
None
       Sett_ID  ADM_ID  AREA2021 Afpl_Name  distAF GeoName  distGN UCDB_Name  \
15937   135240     141  0.034431       UNK    -1.0     UNK    -1.0       UNK   
15848   133939      26  0.032136       UNK    -1.0     UNK    -1.0       UNK   
5111     21703     212  0.000765       UNK    -1.0     UNK    -1.0       UNK   
6241     29106      46  0.002295       UNK    -1.0     UNK    -1.

In [48]:
del UCDB, Africapolis, GeoNames

The near joins should have prevented duplication of rows, but if df1 intersects with two features in df2, it creates a new row. Two of our placenames sources are polygons, so there may be instances.

In [49]:
SettlementsNamed[SettlementsNamed.duplicated('Sett_ID', keep=False)]

Unnamed: 0,Sett_ID,ADM_ID,AREA2021,Afpl_Name,distAF,GeoName,distGN,UCDB_Name,distUC
3205,11822,44,18.081043,UNK,-1.0,Maḑāyā,0.0,UNK,-1.0
3206,11822,44,18.081043,UNK,-1.0,Az Zabadānī,0.0,UNK,-1.0
4862,20308,50,9.228363,UNK,-1.0,UNK,-1.0,Baniyas,0.0
4863,20308,50,9.228363,UNK,-1.0,UNK,-1.0,Tartus,0.0
5042,21342,210,3.762963,UNK,-1.0,Şāfītā,0.0,UNK,-1.0
5043,21342,210,3.762963,UNK,-1.0,Ra’s al Khashūfah,37.97087,UNK,-1.0
5471,24050,104,27.997268,UNK,-1.0,Latakia,0.0,Latakia,0.0
5472,24050,104,27.997268,UNK,-1.0,Al Hinādī,14.481241,Latakia,0.0
5866,26727,50,11.317198,UNK,-1.0,Al Quţaylibīyah,60.441147,Jablah,0.0
5867,26727,50,11.317198,UNK,-1.0,Jablah,0.0,Jablah,0.0


In [50]:
SettlementsNamed.drop_duplicates(subset=['Sett_ID'], inplace=True, keep='first')
SettlementsNamed.info() # Range of entries should be the same as original Settlements file.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18552 entries, 0 to 18574
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    18552 non-null  int64  
 1   ADM_ID     18552 non-null  int64  
 2   AREA2021   18552 non-null  float64
 3   Afpl_Name  18552 non-null  object 
 4   distAF     18552 non-null  float64
 5   GeoName    18552 non-null  object 
 6   distGN     18552 non-null  float64
 7   UCDB_Name  18552 non-null  object 
 8   distUC     18552 non-null  float64
dtypes: float64(4), int64(2), object(3)
memory usage: 1.4+ MB


### 6.3 Reduce to single name column.

In [51]:
# Determine which source has a name geometrically closest to the settlement.
# Since we switched NaN values to -1 earlier, we also resolved what happens in the event of a tie, 
# i.e. when more than one source is 0.0 meters from the settlement. It will take the value from the first column.
SettlementsNamed['SettName'] = "UNK"
SettlementsNamed['closest'] = SettlementsNamed[['distAF', 'distGN', 'distUC']].idxmax(axis=1)

In [52]:
SettlementsNamed.sample(20)

Unnamed: 0,Sett_ID,ADM_ID,AREA2021,Afpl_Name,distAF,GeoName,distGN,UCDB_Name,distUC,SettName,closest
9793,52034,16,0.000765,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
3459,12905,197,0.045143,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
7436,36432,176,0.16068,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
12097,73868,62,0.00153,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
12953,83313,121,0.090287,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
5434,23785,204,0.000765,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
13892,100934,129,0.005356,UNK,-1.0,UNK,-1.0,Aleppo,0.0,UNK,distUC
4846,20219,268,0.003826,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
12049,73382,223,0.00153,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
8949,47325,115,0.009947,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF


In [53]:
# Create a single name column where non-named settlements are "UNK" but all others use one of the three name sources.
SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distAF", 
    'SettName'] = SettlementsNamed['Afpl_Name']

SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distUC", 
    'SettName'] = SettlementsNamed['UCDB_Name']

SettlementsNamed.loc[
    SettlementsNamed['closest'] == "distGN", 
    'SettName'] = SettlementsNamed['GeoName']

In [54]:
SettlementsNamed.sample(20)

Unnamed: 0,Sett_ID,ADM_ID,AREA2021,Afpl_Name,distAF,GeoName,distGN,UCDB_Name,distUC,SettName,closest
12161,74561,223,0.035962,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
12771,81223,121,0.005356,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
3940,15245,101,0.890624,UNK,-1.0,Al Karīmah,0.0,UNK,-1.0,Al Karīmah,distGN
1225,4613,125,0.00153,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
1032,3907,170,0.002295,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
6865,32880,39,0.16221,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
17772,155355,126,0.087226,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
12356,76657,225,0.002295,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
13519,96403,35,0.003061,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF
4386,17283,260,0.006121,UNK,-1.0,UNK,-1.0,UNK,-1.0,UNK,distAF


In [55]:
SettlementsNamed[SettlementsNamed['SettName'] != 'UNK'].sample(20)

Unnamed: 0,Sett_ID,ADM_ID,AREA2021,Afpl_Name,distAF,GeoName,distGN,UCDB_Name,distUC,SettName,closest
16802,147611,11,0.101764,UNK,-1.0,UNK,-1.0,Al-Hasakah,0.0,Al-Hasakah,distUC
15216,121705,177,0.003826,UNK,-1.0,UNK,-1.0,Manbij,0.0,Manbij,distUC
16080,138061,68,0.00153,UNK,-1.0,UNK,-1.0,Deir Ez Zor,0.0,Deir Ez Zor,distUC
16756,147274,11,0.229542,UNK,-1.0,UNK,-1.0,Al-Hasakah,0.0,Al-Hasakah,distUC
16595,143516,20,1.068902,UNK,-1.0,UNK,-1.0,Al Qurayya,0.0,Al Qurayya,distUC
2976,10766,61,0.37798,UNK,-1.0,UNK,-1.0,Damascus,0.0,Damascus,distUC
10262,54694,182,0.656491,UNK,-1.0,Tremseh,0.0,UNK,-1.0,Tremseh,distGN
1991,7271,213,4.597731,UNK,-1.0,Sa‘sa‘,0.0,UNK,-1.0,Sa‘sa‘,distGN
13405,94742,129,0.013007,UNK,-1.0,UNK,-1.0,Aleppo,0.0,Aleppo,distUC
16316,141741,209,0.002295,UNK,-1.0,UNK,-1.0,Ceylanpınar,0.0,Ceylanpınar,distUC


### 6.4 Make sure place name is unique by stripping smaller localities of duplicated names.

In [56]:
Dupes = SettlementsNamed[ 
    (SettlementsNamed['SettName'] != 'UNK') & 
    (SettlementsNamed.duplicated('SettName', keep=False)) ] # keep=False is necessary to retain *all* duplicates, not just first or last in each group.

print("Number of named settlements: %s" % SettlementsNamed['SettName'].str.contains('UNK').value_counts()[False])
print("Number of named settlements where name is duplicated at least once: %s" % len(Dupes))

Number of named settlements: 1004
Number of named settlements where name is duplicated at least once: 754


In [57]:
Largest = Dupes.loc[Dupes.groupby(["SettName"])["AREA"+str(WSFE_end)].idxmax()]
print(Largest)

       Sett_ID  ADM_ID   AREA2021 Afpl_Name  distAF    GeoName  distGN  \
15649   130670     264   0.306056       UNK    -1.0        UNK    -1.0   
16595   143516      20   1.068902       UNK    -1.0        UNK    -1.0   
16772   147384      11   2.609130       UNK    -1.0        UNK    -1.0   
15528   126749      17   1.182143       UNK    -1.0        UNK    -1.0   
13558    96879      43  16.290613       UNK    -1.0        UNK    -1.0   
15830   133699      26  24.873198       UNK    -1.0  Ar Raqqah     0.0   
13390    94530      35   0.074219       UNK    -1.0        UNK    -1.0   
7079     34022      39   0.078810       UNK    -1.0        UNK    -1.0   
4862     20308      50   9.228363       UNK    -1.0        UNK    -1.0   
14647   107695      18   0.035962       UNK    -1.0        UNK    -1.0   
16352   142065     209   0.115536       UNK    -1.0        UNK    -1.0   
2956     10657     201   1.179082       UNK    -1.0        UNK    -1.0   
16004   137704      68   0.342783     

In [58]:
# Filter to settlements which have a duplicated name and are not the largest of those with that name, then replace with UNK.
SettlementsNamed.loc[(~SettlementsNamed.Sett_ID.isin(Largest.Sett_ID)) 
                     & (SettlementsNamed.Sett_ID.isin(Dupes.Sett_ID)), 
                     'SettName'] = 'UNK'

In [59]:
# Second number should now be zero.

print("Number of named settlements: %s" % SettlementsNamed['SettName'].str.contains('UNK').value_counts()[False])
print("Number of named settlements where name is duplicated at least once: %s" % len(SettlementsNamed[ 
    (SettlementsNamed['SettName'] != 'UNK') & 
    (SettlementsNamed.duplicated('SettName', keep=False)) ]))

Number of named settlements: 275
Number of named settlements where name is duplicated at least once: 0


In [60]:
print(SettlementsNamed.info(), SettlementsNamed[SettlementsNamed['SettName'] != "UNK"].sample(20))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18552 entries, 0 to 18574
Data columns (total 11 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Sett_ID    18552 non-null  int64  
 1   ADM_ID     18552 non-null  int64  
 2   AREA2021   18552 non-null  float64
 3   Afpl_Name  18552 non-null  object 
 4   distAF     18552 non-null  float64
 5   GeoName    18552 non-null  object 
 6   distGN     18552 non-null  float64
 7   UCDB_Name  18552 non-null  object 
 8   distUC     18552 non-null  float64
 9   SettName   18552 non-null  object 
 10  closest    18552 non-null  object 
dtypes: float64(4), int64(2), object(5)
memory usage: 2.2+ MB
None        Sett_ID  ADM_ID   AREA2021 Afpl_Name  distAF          GeoName  \
1991      7271     213   4.597731       UNK    -1.0           Sa‘sa‘   
15290   123287     242   0.016068       UNK    -1.0       As Sukhnah   
16012   137724     181   0.008417       UNK    -1.0        Mūḩ Ḩasan   
974       3721  

In [61]:
# Drop extra columns and save to file.
# SettlementsNamed = SettlementsNamed[['Sett_ID', 'SettName']]
SettlementsNamed = SettlementsNamed[['Sett_ID', 'SettName', 'GeoName', 'UCDB_Name']]
SettlementsNamed.to_csv(r'Results/PlaceNames.csv')

In [62]:
del SettlementsNamed

---

## 7. CREATE FRAGMENTATION INDEX
We are determining what percentage of a settlement's area lies outside of its administrative zone each year.
The index is a range of 0 to 100, i.e. the percent of the settlement area which is fragmented.

For each Sett_ID:
((Area of Boundless settlement - Area of largest Bounded settlement feature) / Area of Boundless settlement) * 100

### 7.1 Load boundless and bounded cumulative settlements and clean.

In [64]:
BoundlessAreas = pd.read_csv(os.path.join(Results, 'Area.csv'))
print('Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. %s' 
      % time.ctime())
print(BoundlessAreas.info())

BoundedAreas = pd.read_csv(os.path.join(Results, 'Area_Bounded.csv'))
print('Loaded Bounded dataset, which will factor into the fragmentation calculation. %s' % time.ctime())
print(BoundedAreas.info())

Loaded Boundless dataset, whose settlements will be used as the index of the Fragmentation Index dataset. Thu Sep 14 16:57:36 2023
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18552 entries, 0 to 18551
Data columns (total 41 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  18552 non-null  int64  
 1   Sett_ID     18552 non-null  int64  
 2   year        18552 non-null  int64  
 3   ADM_ID      18552 non-null  int64  
 4   AREA2021    18552 non-null  float64
 5   AREA2020    17807 non-null  float64
 6   AREA2019    17460 non-null  float64
 7   AREA2018    17058 non-null  float64
 8   AREA2017    16845 non-null  float64
 9   AREA2016    16629 non-null  float64
 10  AREA2015    16479 non-null  float64
 11  AREA2014    15938 non-null  float64
 12  AREA2013    14137 non-null  float64
 13  AREA2012    13533 non-null  float64
 14  AREA2011    11442 non-null  float64
 15  AREA2010    11415 non-null  float64
 16  AREA2009    4755 

In [65]:
BoundlessAreas = BoundlessAreas.loc[:, ~BoundlessAreas.columns.str.contains('Unnamed')]
BoundedAreas = BoundedAreas.loc[:, ~BoundedAreas.columns.str.contains('Unnamed')]

In [66]:
LargestFragments = BoundedAreas.loc[BoundedAreas.groupby(["Sett_ID"])["AREA"+str(WSFE_end)].idxmax()] 
print(LargestFragments.info())
print("Filtered the Bounded dataset to only rows where latest year's area is largest for each Sett_ID. %s" % time.ctime())
LargestFragments.columns = LargestFragments.columns.str.replace('AREA', 'Largest')
LargestFragments = LargestFragments.drop(columns=['year', 'ADM_ID'])
print("Renamed columns to avoid duplication during merge, and dropped unnecessary columns. %s" % time.ctime())
FragIndices = BoundlessAreas.merge(LargestFragments, how='left', on='Sett_ID')
print(FragIndices.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18552 entries, 0 to 19094
Data columns (total 41 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Bounded_ID  18552 non-null  int64  
 1   year        18552 non-null  int64  
 2   ADM_ID      18552 non-null  int64  
 3   Sett_ID     18552 non-null  int64  
 4   AREA2021    18552 non-null  float64
 5   AREA2020    17806 non-null  float64
 6   AREA2019    17458 non-null  float64
 7   AREA2018    17055 non-null  float64
 8   AREA2017    16842 non-null  float64
 9   AREA2016    16626 non-null  float64
 10  AREA2015    16476 non-null  float64
 11  AREA2014    15935 non-null  float64
 12  AREA2013    14130 non-null  float64
 13  AREA2012    13528 non-null  float64
 14  AREA2011    11434 non-null  float64
 15  AREA2010    11407 non-null  float64
 16  AREA2009    4746 non-null   float64
 17  AREA2008    4735 non-null   float64
 18  AREA2007    4730 non-null   float64
 19  AREA2006    4721 non-null

In [67]:
del BoundlessAreas, BoundedAreas, LargestFragments

### 7.2 Merge and run fragmentation calculation.

In [70]:
for item in WSFE_Years:
    YY = str(item) # 4-digit year
    AreaYY = ''.join(["AREA", YY]) # The Boundless area variable name
    LargestYY = ''.join(['Largest', YY]) # The Bounded largest area variable name
    FragYY = ''.join(["Frag", YY]) # Name for the fragmentation index variable
    print("Created names for Year %s's variables and temporary objects. %s" % (item, time.ctime()))
    
    FragIndices[FragYY] = ((FragIndices[AreaYY] - FragIndices[LargestYY]) / FragIndices[AreaYY]) * 100
    FragIndices[FragYY] = (FragIndices[FragYY].fillna(0).replace([np.inf, -np.inf], 0)).astype('int')
    print("Calculated fragmentation index for year %s. %s" % (item, time.ctime()))

# Remove unnecessary columns.
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('Largest')]
FragIndices = FragIndices.loc[:, ~FragIndices.columns.str.startswith('AREA')]

print('Completed fragmentation index calculations for all years. %s' % time.ctime())
print(FragIndices.info())
print(FragIndices.sort_values(by='Frag'+str(WSFE_end), ascending=False).head(20))

Created names for Year 1985's variables and temporary objects. Thu Sep 14 16:59:22 2023
Calculated fragmentation index for year 1985. Thu Sep 14 16:59:22 2023
Created names for Year 1986's variables and temporary objects. Thu Sep 14 16:59:22 2023
Calculated fragmentation index for year 1986. Thu Sep 14 16:59:22 2023
Created names for Year 1987's variables and temporary objects. Thu Sep 14 16:59:22 2023
Calculated fragmentation index for year 1987. Thu Sep 14 16:59:22 2023
Created names for Year 1988's variables and temporary objects. Thu Sep 14 16:59:22 2023
Calculated fragmentation index for year 1988. Thu Sep 14 16:59:22 2023
Created names for Year 1989's variables and temporary objects. Thu Sep 14 16:59:22 2023
Calculated fragmentation index for year 1989. Thu Sep 14 16:59:22 2023
Created names for Year 1990's variables and temporary objects. Thu Sep 14 16:59:22 2023
Calculated fragmentation index for year 1990. Thu Sep 14 16:59:22 2023
Created names for Year 1991's variables and te

In [71]:
FragIndices = FragIndices.drop(columns=['year', 'ADM_ID'])
FragIndices.to_csv(os.path.join(Results, 'FragIndex.csv'))
print('Saved to file. %s' % time.ctime())

Saved to file. Thu Sep 14 16:59:33 2023


In [72]:
del FragIndices

---

## 8. PREPARE YEARLY DATASETS: POPULATION
Can use this as a template for other annualized rasters

### 8.1 Reproject and reclassify with settlement buffer mask.
Reclassify so that we only need to work with cells within X distance of settlements.

In [73]:
PopList = glob.glob(os.path.join(Source, 'Population', "*.tif"))
Settlements = gpd.read_file(os.path.join(Results, 'SETTLEMENTS.gpkg'), layer='SETTLEMENTS')
PopList

['Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Population\\syr_ppp_2000_UNadj.tif',
 'Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Population\\syr_ppp_2001_UNadj.tif',
 'Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Population\\syr_ppp_2002_UNadj.tif',
 'Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Population\\syr_ppp_2003_UNadj.tif',
 'Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Population\\syr_ppp_2004_UNadj.tif',
 'Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Population\\syr_ppp_2005_UNadj.tif',
 'Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Population\\syr_ppp_2006_UNadj.tif',
 'Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Population\\syr_ppp_2007_UNadj.tif',
 'Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Population\\syr_ppp_2008_UNadj.tif',
 'Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Population\\syr_ppp_2009_UNadj.tif',
 'Q:\\GIS\\povertyequity\\urban_growth\\Syria\\Source\\Population\\syr_ppp_2010_

In [74]:
BatchZonal(RasterFileList= PopList, Zones = Settlements, 
           KeepFields = ['Sett_ID'], 
           RasterDirectory = os.path.join(Source, 'Population'), 
           OutPath = os.path.join(Results, 'Population.csv'), 
           Statistics=['sum'], 
           NoDataVal = -99999, 
           Prefix = 'POP',
           Suffix = '', 
           DropStatName = True)

Dataframe to merge with: 
        Sett_ID
0            3
1            7
2           14
3           17
4           21
...        ...
18547   157067
18548   157070
18549   157072
18550   157073
18551   157079

[18552 rows x 1 columns]
Loading with rasterio. Thu Sep 14 17:00:20 2023
Raster: Q:\GIS\povertyequity\urban_growth\Syria\Source\Population\syr_ppp_2000_UNadj.tif
| 0.00, 0.00, 35.72|
| 0.00,-0.00, 37.32|
| 0.00, 0.00, 1.00|
[[-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 ...
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]]
Zonal statistics. Thu Sep 14 17:00:21 2023
sum stat output field for Q:\GIS\povertyequity\urban_growth\Syria\Source\Population\syr_ppp_2000_UNadj.tif: POP2000
       Sett_ID   POP2000
123        425  3.420516
4282     16700       Na

sum stat output field for Q:\GIS\povertyequity\urban_growth\Syria\Source\Population\syr_ppp_2007_UNadj.tif: POP2007
      Sett_ID    POP2000    POP2001    POP2002    POP2003    POP2004  \
3652    13848   0.684297   0.614603   0.726203   0.717191   0.727754   
2292     8322        NaN        NaN        NaN        NaN        NaN   
6454    30477  40.995060  48.956299  48.238609  53.317711  43.049911   
6476    30597   1.003719   0.940220   0.988451   1.104733   1.014861   
9491    50458   0.896038   0.708521   0.750977   0.714063   1.230495   

        POP2005    POP2006    POP2007  
3652   0.641799   0.598872   0.653812  
2292        NaN        NaN        NaN  
6454  56.415146  52.577904  42.857689  
6476   1.211061   1.256148   1.149581  
9491   1.209414   1.379240   1.259499  
Loading with rasterio. Thu Sep 14 17:15:16 2023
Raster: Q:\GIS\povertyequity\urban_growth\Syria\Source\Population\syr_ppp_2008_UNadj.tif
| 0.00, 0.00, 35.72|
| 0.00,-0.00, 37.32|
| 0.00, 0.00, 1.00|
[[-99999. -9

| 0.00, 0.00, 35.72|
| 0.00,-0.00, 37.32|
| 0.00, 0.00, 1.00|
[[-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 ...
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]
 [-99999. -99999. -99999. ... -99999. -99999. -99999.]]
Zonal statistics. Thu Sep 14 17:19:33 2023
sum stat output field for Q:\GIS\povertyequity\urban_growth\Syria\Source\Population\syr_ppp_2013_UNadj.tif: POP2013
       Sett_ID    POP2000    POP2001    POP2002    POP2003    POP2004  \
18338   156649   3.088691   2.833685   2.750787   4.517529   4.578218   
1249      4678        NaN        NaN        NaN        NaN        NaN   
15530   127101  21.688093  22.016289  20.894442  22.799839  22.740576   
17372   153758   0.146723   0.116561   0.105576   0.175090   0.170308   
13798   100027   9.406130  11.291383  38.007156  38.423813  28.804192   

    

sum stat output field for Q:\GIS\povertyequity\urban_growth\Syria\Source\Population\syr_ppp_2017_UNadj.tif: POP2017
       Sett_ID    POP2000    POP2001    POP2002    POP2003    POP2004  \
8635     45272  10.993009  19.153950  18.753742  18.560871  18.870796   
9933     52961   0.516831   0.421406   0.464261   0.482957   0.504045   
12476    78325   1.794587   1.524969   1.454607   1.667410   1.600494   
14760   109581   0.054352   0.054390   0.057397   0.047176   0.057171   
13102    85987  46.873096  70.289780  73.410362  75.025879  78.724762   

         POP2005    POP2006    POP2007    POP2008    POP2009    POP2010  \
8635   21.219223  22.664143  20.658342  20.691639  23.581322  26.268219   
9933    0.528743   0.558412   0.518287   0.507905   0.516004   0.494172   
12476   1.755992   1.717225   1.854907   1.904724   2.399550   1.987571   
14760   0.056860   0.055723   0.058462   0.065138   0.074405   0.071015   
13102  78.843262  76.738426  68.282387  67.158470  75.767212  70.66371

---

## 9. PREPARE YEARLY DATASETS: NIGHTTIME LIGHTS

### 9.1 Reclassify with settlement buffer mask.
Reclassify so that we only need to work with cells within X distance of settlements. The two NTL sources have already been reprojected in a separate script, and cropped to Central & Western Africa.

In [79]:
D_avg = glob.glob(os.path.join(NTL, "*D*avg.tif"))
V_avg = glob.glob(os.path.join(NTL, "*V*avg.tif"))
D_cfc = glob.glob(os.path.join(NTL, "*D*cfc.tif"))
V_cfc = glob.glob(os.path.join(NTL, "*V*cfc.tif"))
Settlements = gpd.read_file(os.path.join(Results, 'SETTLEMENTS.gpkg'), layer='SETTLEMENTS')

print(D_avg, V_avg, D_cfc, V_cfc)

['Q:\\GIS\\povertyequity\\urban_growth\\NighttimeLights_VIIRS_DMSP\\Temp\\D_1999_avg.tif', 'Q:\\GIS\\povertyequity\\urban_growth\\NighttimeLights_VIIRS_DMSP\\Temp\\D_2000_avg.tif', 'Q:\\GIS\\povertyequity\\urban_growth\\NighttimeLights_VIIRS_DMSP\\Temp\\D_2001_avg.tif', 'Q:\\GIS\\povertyequity\\urban_growth\\NighttimeLights_VIIRS_DMSP\\Temp\\D_2002_avg.tif', 'Q:\\GIS\\povertyequity\\urban_growth\\NighttimeLights_VIIRS_DMSP\\Temp\\D_2003_avg.tif', 'Q:\\GIS\\povertyequity\\urban_growth\\NighttimeLights_VIIRS_DMSP\\Temp\\D_2004_avg.tif', 'Q:\\GIS\\povertyequity\\urban_growth\\NighttimeLights_VIIRS_DMSP\\Temp\\D_2005_avg.tif', 'Q:\\GIS\\povertyequity\\urban_growth\\NighttimeLights_VIIRS_DMSP\\Temp\\D_2006_avg.tif', 'Q:\\GIS\\povertyequity\\urban_growth\\NighttimeLights_VIIRS_DMSP\\Temp\\D_2007_avg.tif', 'Q:\\GIS\\povertyequity\\urban_growth\\NighttimeLights_VIIRS_DMSP\\Temp\\D_2008_avg.tif', 'Q:\\GIS\\povertyequity\\urban_growth\\NighttimeLights_VIIRS_DMSP\\Temp\\D_2009_avg.tif', 'Q:\\GIS\

#### Nighttime lights from two different sources

In [80]:
BatchZonal(RasterFileList= D_avg, Zones = Settlements, 
           KeepFields = ['Sett_ID'], 
           RasterDirectory = NTL, 
           OutPath = os.path.join(Results, 'NTL_DMSP.csv'), 
           Statistics=['sum', 'max', 'mean', 'median', 'std'], 
           NoDataVal = -99999, # For NTL, the actual NoData is a "soft NaN": 3.40282e+38. Just using pop's here.
           Prefix = 'NTL_D',
           Suffix = '', 
           DropStatName = False)

BatchZonal(RasterFileList= V_avg, Zones = Settlements, 
           KeepFields = ['Sett_ID'], 
           RasterDirectory = NTL, 
           OutPath = os.path.join(Results, 'NTL_VIIRS.csv'), 
           Statistics=['sum', 'max', 'mean', 'median', 'std'], 
           NoDataVal = -99999,
           Prefix = 'NTL_V',
           Suffix = '', 
           DropStatName = False)

Dataframe to merge with: 
        Sett_ID
0            3
1            7
2           14
3           17
4           21
...        ...
18547   157067
18548   157070
18549   157072
18550   157073
18551   157079

[18552 rows x 1 columns]
Loading with rasterio. Thu Sep 14 17:48:02 2023
Raster: Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_1999_avg.tif
| 0.01, 0.00,-180.00|
| 0.00,-0.01, 75.00|
| 0.00, 0.00, 1.00|
[[0.        0.        0.        ... 0.        0.        0.       ]
 [0.        0.        0.        ... 0.        0.        0.       ]
 [0.        0.        0.        ... 0.        0.        0.       ]
 ...
 [6.270092  6.047826  6.1272793 ... 6.0887446 6.471717  6.471717 ]
 [6.413352  5.6817465 5.7555556 ... 6.1291924 6.332485  6.332485 ]
 [0.        0.        0.        ... 0.        0.        0.       ]]
Zonal statistics. Thu Sep 14 17:48:10 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_1999_avg.tif: NTL_Dstd

| 0.01, 0.00,-180.00|
| 0.00,-0.01, 75.00|
| 0.00, 0.00, 1.00|
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
Zonal statistics. Thu Sep 14 17:51:56 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_2003_avg.tif: NTL_Dstd2003
       Sett_ID  NTL_Dsum1999  NTL_Dmax1999  NTL_Dmean1999  NTL_Dmedian1999  \
4468     17790           NaN           NaN            NaN              NaN   
15267   123287           NaN           NaN            NaN              NaN   
15502   126642           NaN           NaN            NaN              NaN   
7265     35264           NaN           NaN            NaN              NaN   
3046     11070           NaN           NaN            NaN              NaN   

       NTL_Dstd1999  NTL_Dsum2000  NTL_Dmax2000  NTL_Dmean2000  \
4468            NaN           NaN           NaN            NaN   
15267           

| 0.01, 0.00,-180.00|
| 0.00,-0.01, 75.00|
| 0.00, 0.00, 1.00|
[[0.        0.        0.        ... 0.        0.        0.       ]
 [2.5714285 2.642857  2.7142856 ... 2.7142856 3.4       2.5714285]
 [4.5714283 4.8690476 4.6875    ... 4.8142853 5.5       4.5714283]
 ...
 [2.4       2.6       2.6       ... 2.375     2.275     2.4      ]
 [2.6       2.5       2.7       ... 2.5       2.5       2.6      ]
 [3.        2.8       3.        ... 2.6       2.9       3.       ]]
Zonal statistics. Thu Sep 14 17:54:42 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_2006_avg.tif: NTL_Dstd2006
       Sett_ID  NTL_Dsum1999  NTL_Dmax1999  NTL_Dmean1999  NTL_Dmedian1999  \
16896   149079           NaN           NaN            NaN              NaN   
13730    99136           NaN           NaN            NaN              NaN   
9630     51241      6.223214      6.223214       6.223214         6.223214   
6504     30813      8.808655      8.808655       8.80

| 0.01, 0.00,-180.00|
| 0.00,-0.01, 75.00|
| 0.00, 0.00, 1.00|
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
Zonal statistics. Thu Sep 14 17:57:23 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_2009_avg.tif: NTL_Dstd2009
       Sett_ID  NTL_Dsum1999  NTL_Dmax1999  NTL_Dmean1999  NTL_Dmedian1999  \
9137     48520           NaN           NaN            NaN              NaN   
11607    67013           NaN           NaN            NaN              NaN   
9539     50728           NaN           NaN            NaN              NaN   
6906     33079           NaN           NaN            NaN              NaN   
9378     49840           NaN           NaN            NaN              NaN   

       NTL_Dstd1999  NTL_Dsum2000  NTL_Dmax2000  NTL_Dmean2000  \
9137            NaN           NaN           NaN            NaN   
11607           

| 0.01, 0.00,-180.00|
| 0.00,-0.01, 75.00|
| 0.00, 0.00, 1.00|
[[5.5652175 5.9583335 6.5833335 ... 6.125     6.1666665 6.1666665]
 [5.        6.1666665 6.4583335 ... 5.7391305 6.130435  6.130435 ]
 [5.5       5.095238  5.        ... 5.826087  6.173913  6.173913 ]
 ...
 [6.        6.6666665 6.6666665 ... 4.6666665 4.6666665 4.6666665]
 [6.        6.6666665 6.6666665 ... 4.6666665 4.6666665 4.6666665]
 [0.        0.        0.        ... 0.        0.        0.       ]]
Zonal statistics. Thu Sep 14 18:00:56 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_2013_avg.tif: NTL_Dstd2013
      Sett_ID  NTL_Dsum1999  NTL_Dmax1999  NTL_Dmean1999  NTL_Dmedian1999  \
7158    34478           NaN           NaN            NaN              NaN   
8804    46657           NaN           NaN            NaN              NaN   
8838    46805           NaN           NaN            NaN              NaN   
2928    10544           NaN           NaN            NaN 

| 0.00, 0.00,-180.00|
| 0.00,-0.00, 75.00|
| 0.00, 0.00, 1.00|
[[1.1100459  1.1288927  1.127595   ... 1.1011702  1.1024839  1.087924  ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.77174467 0.795612   0.7914716 ]
 [0.         0.         0.         ... 0.         0.         0.        ]]
Zonal statistics. Thu Sep 14 18:04:32 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\V_2014_avg.tif: NTL_Vstd2014
       Sett_ID  NTL_Vsum2012  NTL_Vmax2012  NTL_Vmean2012  NTL_Vmedian2012  \
11023    59258      2.083235      2.083235       2.083235         2.083235   
833       3208           NaN           NaN            NaN              NaN   
7067     33976           NaN           NaN            NaN              NaN   
16317   141939   

#### Cloud-free coverage (confidence metric)

In [81]:
BatchZonal(RasterFileList= D_cfc, Zones = Settlements, 
           KeepFields = ['Sett_ID'], 
           RasterDirectory = NTL, 
           OutPath = os.path.join(Results, 'NTL_DMSPcfc.csv'), 
           Statistics=['sum', 'count', 'mean', 'median', 'std'], 
           NoDataVal = -99999, 
           Prefix = 'NTL_D',
           Suffix = '_cfc', 
           DropStatName = False)

BatchZonal(RasterFileList= V_cfc, Zones = Settlements, 
           KeepFields = ['Sett_ID'], 
           RasterDirectory = NTL, 
           OutPath = os.path.join(Results, 'NTL_VIIRScfc.csv'), 
           Statistics=['sum', 'count', 'mean', 'median', 'std'], 
           NoDataVal = -99999,
           Prefix = 'NTL_V',
           Suffix = '_cfc', 
           DropStatName = False)

Dataframe to merge with: 
        Sett_ID
0            3
1            7
2           14
3           17
4           21
...        ...
18547   157067
18548   157070
18549   157072
18550   157073
18551   157079

[18552 rows x 1 columns]
Loading with rasterio. Thu Sep 14 18:06:35 2023
Raster: Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_1999_cfc.tif
| 0.01, 0.00,-180.00|
| 0.00,-0.01, 75.00|
| 0.00, 0.00, 1.00|
[[ 0  0  0 ...  0  0  0]
 [ 0  0  0 ...  0  0  0]
 [ 0  0  0 ...  0  0  0]
 ...
 [44 44 44 ... 44 44 44]
 [43 43 44 ... 42 42 42]
 [ 0  0  0 ...  0  0  0]]
Zonal statistics. Thu Sep 14 18:06:39 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_1999_cfc.tif: NTL_Dstd1999_cfc
       Sett_ID  NTL_Dsum1999_cfc  NTL_Dcount1999_cfc  NTL_Dmean1999_cfc  \
14925   114356               NaN                   0                NaN   
4130     16046              42.0                   1               42.0   
14239   104291    

| 0.01, 0.00,-180.00|
| 0.00,-0.01, 75.00|
| 0.00, 0.00, 1.00|
[[1 1 1 ... 1 1 1]
 [1 1 1 ... 1 1 1]
 [1 1 1 ... 1 1 1]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Zonal statistics. Thu Sep 14 18:09:49 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_2003_cfc.tif: NTL_Dstd2003_cfc
       Sett_ID  NTL_Dsum1999_cfc  NTL_Dcount1999_cfc  NTL_Dmean1999_cfc  \
12937    83373               NaN                   0                NaN   
7200     34801               NaN                   0                NaN   
11924    72335               NaN                   0                NaN   
7219     34920               NaN                   0                NaN   
3640     13803               NaN                   0                NaN   

       NTL_Dmedian1999_cfc  NTL_Dstd1999_cfc  NTL_Dsum2000_cfc  \
12937                  NaN               NaN               NaN   
7200                   NaN               NaN               NaN   


| 0.01, 0.00,-180.00|
| 0.00,-0.01, 75.00|
| 0.00, 0.00, 1.00|
[[0 0 0 ... 0 0 0]
 [5 5 5 ... 5 5 5]
 [8 8 8 ... 8 8 8]
 ...
 [5 5 5 ... 5 5 5]
 [5 5 5 ... 5 5 5]
 [5 5 5 ... 5 5 5]]
Zonal statistics. Thu Sep 14 18:12:12 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_2006_cfc.tif: NTL_Dstd2006_cfc
       Sett_ID  NTL_Dsum1999_cfc  NTL_Dcount1999_cfc  NTL_Dmean1999_cfc  \
7741     38533               NaN                   0                NaN   
12276    76021               NaN                   0                NaN   
1720      6348               NaN                   0                NaN   
12165    74884               NaN                   0                NaN   
4169     16240               NaN                   0                NaN   

       NTL_Dmedian1999_cfc  NTL_Dstd1999_cfc  NTL_Dsum2000_cfc  \
7741                   NaN               NaN               NaN   
12276                  NaN               NaN               NaN   


| 0.01, 0.00,-180.00|
| 0.00,-0.01, 75.00|
| 0.00, 0.00, 1.00|
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Zonal statistics. Thu Sep 14 18:14:34 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_2009_cfc.tif: NTL_Dstd2009_cfc
       Sett_ID  NTL_Dsum1999_cfc  NTL_Dcount1999_cfc  NTL_Dmean1999_cfc  \
11985    72878               NaN                   0                NaN   
9296     49381               NaN                   0                NaN   
11384    61232               NaN                   0                NaN   
5239     22546               NaN                   0                NaN   
3028     10996               NaN                   0                NaN   

       NTL_Dmedian1999_cfc  NTL_Dstd1999_cfc  NTL_Dsum2000_cfc  \
11985                  NaN               NaN               NaN   
9296                   NaN               NaN               NaN   


| 0.01, 0.00,-180.00|
| 0.00,-0.01, 75.00|
| 0.00, 0.00, 1.00|
[[4 4 4 ... 4 4 4]
 [4 4 4 ... 4 4 4]
 [4 4 4 ... 4 4 4]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
Zonal statistics. Thu Sep 14 18:16:54 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\D_2012_cfc.tif: NTL_Dstd2012_cfc
       Sett_ID  NTL_Dsum1999_cfc  NTL_Dcount1999_cfc  NTL_Dmean1999_cfc  \
10484    56175               NaN                   0                NaN   
7811     38993               NaN                   0                NaN   
16925   149603               NaN                   0                NaN   
14870   112773               NaN                   0                NaN   
14760   109581               NaN                   0                NaN   

       NTL_Dmedian1999_cfc  NTL_Dstd1999_cfc  NTL_Dsum2000_cfc  \
10484                  NaN               NaN               NaN   
7811                   NaN               NaN               NaN   


Dataframe to merge with: 
        Sett_ID
0            3
1            7
2           14
3           17
4           21
...        ...
18547   157067
18548   157070
18549   157072
18550   157073
18551   157079

[18552 rows x 1 columns]
Loading with rasterio. Thu Sep 14 18:18:24 2023
Raster: Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\V_2012_cfc.tif
| 0.00, 0.00,-180.00|
| 0.00,-0.00, 75.00|
| 0.00, 0.00, 1.00|
[[42 40 40 ... 42 42 28]
 [40 40 40 ... 41 41 27]
 [40 40 40 ... 41 41 27]
 ...
 [58 61 63 ... 61 58 54]
 [59 64 66 ... 59 61 55]
 [ 0  0  0 ...  0  0  0]]
Zonal statistics. Thu Sep 14 18:18:40 2023
std stat output field for Q:\GIS\povertyequity\urban_growth\NighttimeLights_VIIRS_DMSP\Temp\V_2012_cfc.tif: NTL_Vstd2012_cfc
       Sett_ID  NTL_Vsum2012_cfc  NTL_Vcount2012_cfc  NTL_Vmean2012_cfc  \
9074     48169               NaN                   0                NaN   
12564    79308               NaN                   0                NaN   
9365     49718    

---

## 10. FLOOD EXPOSURE BY RETURN PERIOD

### 10.1 Calculate Expected Annual Depth (EAD) using exceedance probabilities of every flood return period.

##### Flood layers

In [None]:
FloodFolder = os.path.join(Source, 'Flood')

In [None]:
InRasters = os.listdir(FloodFolder)
InRasters

In [None]:
Exceedances = []
    
for Raster in InRasters:
    InPath = os.path.join(SourceFolder, Raster)
    RP = re.sub('\D', '', Raster)[1:] # Get the return period
    NewFileName = Raster.replace('.tif', '_EXC.tif')
    OutPath = os.path.join(FloodFolder, NewFileName)
    
    Calc = "(1/" + RP + ")*A"

    calcShell(A=InPath, OutFile=OutPath, Calculation = Calc)
    Exceedances = Exceedances + [NewFileName]
    
print('Done with list. New flood set: %s' % Exceedances)

In [None]:
# gdal_calc doesn't always take well to adding together a large number of files, so we'll do it in 2 batches.

Calc = 'A+B+C+D+E'
OutName = 'Batch1.tif'

A = os.path.join(FloodFolder, Exceedances[0])
B = os.path.join(FloodFolder, Exceedances[1])
C = os.path.join(FloodFolder, Exceedances[2])
D = os.path.join(FloodFolder, Exceedances[3])
E = os.path.join(FloodFolder, Exceedances[4])

calcShell(A=A, B=B, C=C, D=D, E=E,
          OutFile = os.path.join(FloodFolder, OutName), 
          Calculation = Calc)


In [None]:
Calc = 'A+B+C+D+E+F'
OutName = 'FU_ExpectedAnnualDepth.tif'

A = os.path.join(FloodFolder, 'Batch1.tif')
B = os.path.join(FloodFolder, Exceedances[5])
C = os.path.join(FloodFolder, Exceedances[6])
D = os.path.join(FloodFolder, Exceedances[7])
E = os.path.join(FloodFolder, Exceedances[8])
F = os.path.join(FloodFolder, Exceedances[9])

calcShell(A=A, B=B, C=C, D=D, E=E, F=F,
          OutFile = os.path.join(FloodFolder, OutName), 
          Calculation = Calc)

### 10.2 Reclassify and resample flood data and buildup data in preparation for the impact calculation.

##### Reclassify flood as a binary: flooded / not-flooded

In [None]:
InPath = os.path.join(FloodFolder, OutName)
OutPath = os.path.join(FloodFolder, 'FU_EAD_reclassed.tif')

[xsize, ysize, geotransform, geoproj, Z] = readRaster(InPath)

Z[Z<0.15] = 0 # Not-flooded category. This includes no data cells.
Z[Z>=0.15] = 1 # Flooded category. This includes permanent water bodies.

writeRaster(OutPath,geotransform,geoproj,Z)
InPath = OutPath = None

print('Finished reclassifying. %s' % time.ctime())

##### Buildup

In [None]:
WSFE = 'WSFE_equalarea.tif'
WSFEPath = os.path.join(Workspace, 'Buildup', WSFE)
OutPath = os.path.join(FloodFolder, WSFE.replace('equalarea.tif', 'simplified.tif'))

[xsize, ysize, geotransform, geoproj, Z] = readRaster(WSFEPath)

np.putmask(Z, Z>0, Z-1984) # All years now converted to at most 2 digits: 1-31. (All non-buildup = 0)

writeRaster(OutPath,geotransform,geoproj,Z)

print('\nSimplified buildup file: %s' % OutPath)

##### Resample flood to match buildup

In [None]:
WSFEPath = os.path.join(FloodFolder, 'WSFE_simplified.tif') 

RasterPath = os.path.join(FloodFolder, 'FU_EAD_reclassed.tif')
OutPath = os.path.join(FloodFolder, 'FU_EAD_resampled.tif')
resampleRaster(RasterPath, WSFEPath, OutPath)
    
print('Resampled to match WSFE. %s' % time.ctime())

### 10.3 Mask out built areas that were not flooded.

In [None]:
# WSFEPath = os.path.join(FloodFolder, 'WSFE_simplified.tif') 
    
InPath = os.path.join(FloodFolder, 'FU_EAD_resampled.tif')
OutPath = os.path.join(FloodFolder, 'FU_EAD_WSFEimpact.tif')

calcShell(A=WSFEPath, B=InPath, OutFile=OutPath, Calculation="A*B", OutType=" --type=Byte")
    
print('Done. Only built-up cells that have been flooded remain.. %s' % time.ctime())

### 10.4 Join with Settlements via concatenation
Using the serial method, combine settlement IDs with 1) WSFE year cells and 2) the flooded-only WSFE year cells under each scenario.

##### Rasterize the settlements we've created.

In [None]:
# WSFEPath = os.path.join(FloodFolder, 'WSFE_simplified.tif') 
OutSett = os.path.join(Results, ''.join(['Settlements', str(WSFE_end), '_rasterized.tif'])
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')[['Sett_ID', 'geometry']]

len_WSFE = 2 # We already know that the WSFE years are reclassified to 1-31, i.e. a max of 2 digits.

In [None]:
ShapeToRaster(Shapefile=Settlements, ValueVar="Sett_ID", MetaRasterPath=WSFEPath, OutFilePath=OutSett, NewDType = 'uint32')

##### Calculations

In [None]:
Calc = "(A*" + str(10**len_WSFE) + ")+B" 

FloodImpactPath = os.path.join(FloodFolder, 'FU_EAD_WSFEimpact.tif')
FloodSerialPath = os.path.join(FloodFolder, 'FU_Settlements_serial.tif')

#WSFEPath = os.path.join(FloodFolder, 'WSFE_simplified.tif')
WSFESerialPath = os.path.join(FloodFolder, 'WSFE_Settlements_serial.tif')

In [None]:
calcShell(A=OutSett, B=FloodImpactPath, OutFile=FloodSerialPath, Calculation=Calc)
calcShell(A=OutSett, B=WSFEPath, OutFile=WSFESerialPath, Calculation=Calc)

### 10.5 Vector math to split raster strings into Settlement and WSFE year assignments.

##### Vectorize

In [None]:
FloodVec = 'FloodedBuildup.shp' # Was having write issues when putting both in the same gpkg, so we're settling for .shp.
FloodVecPath = os.path.join(FloodFolder, FloodVec)
BuildVec = 'AllBuildup.shp' 
BuildVecPath = os.path.join(FloodFolder, BuildVec)

In [None]:
RasterToShapefile(InRasterPath=FloodSerialPath, OutFilePath=FloodVecPath, 
                  OutName='', VariableName='gridcode')
RasterToShapefile(InRasterPath=WSFESerialPath, OutFilePath=BuildVecPath, 
                  OutName='', VariableName='gridcode')

##### Split string into separate fields

In [None]:
Sett_rio = rasterio.open(os.path.join(Results, ''.join(['Settlements', str(WSFE_end), '_rasterized.tif']))).read(1)
len_Sett = len(str(Sett_rio.max()))
Sett_rio = None

Fill = len_Sett + 2 # Add the digits stored as len_WSFE # or just write +2 since we already know the length of reclassed WSFE.

OutPackage = os.path.join(FloodFolder, 'FloodedSettlements.gpkg')

In [None]:
# Load newly created vectorized datasets.
for File in [FloodVec, BuildVec]:
    InObject = gpd.read_file(os.path.join(FloodFolder, File)).to_crs("ESRI:102022")
    print(InObject.info(), '\n\n', InObject.sample(10), '\n\n', InObject.crs, '\n\n', InObject['gridcode'].max())
    
    InObject['gridstring'] = InObject['gridcode'].astype(str).str.zfill(Fill)

    InObject['Sett_ID'] = InObject['gridstring'].str[:-2].astype(int) # Remove the last digits to get the Sett ID portion.
    InObject['year'] = InObject['gridstring'].str[-2:].astype(int) # Keep only the last digits to get the year portion.
    InObject['year'] = np.where(InObject['year'] > 0, InObject['year'] + 1984, InObject['year']) # Reclass back to year value.
    
    print('%s Serial split by year of buildup and Sett ID.\n\n' % time.ctime(), InObject.sample(10))
    
    # Remove features where year or settlement = 0.
    print("%s Before: %s\n" % (File, InObject.shape))
    InObject = InObject.loc[(InObject["year"] >1984) & (InObject["year"] < 2016) & (InObject["Sett_ID"] != 0)] 
    print("%s After: %s\n" % (File, InObject.shape))

    # Save intermediate file.
    InObject.to_file(driver='GPKG', filename=OutPackage, layer=File.replace('.shp', ''))

### 10.6 Group by settlement and count cells for each year.

##### Flooded buildings

In [None]:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')
AllSummaries = pd.DataFrame(Settlements).drop(columns='geometry')[['Sett_ID']]
Settlements = None

ValObject = pd.DataFrame(gpd.read_file(OutPackage, layer='FloodedBuildup'))[['Sett_ID', 'year']]

print(AllSummaries.info(), '\n', AllSummaries.sample(10), '\n', ValObject.info(), '\n', ValObject.sample(10))

In [None]:
for BuiltYear in EligibleYears:
    GroupedVals = ValObject[
        ValObject['year']<=BuiltYear].groupby(
        'Sett_ID', as_index=False)
    
    VariableName = ''.join(['FLDct_', str(BuiltYear)])
    
    AllSummaries = AllSummaries.merge(GroupedVals.count().rename(columns={'year': VariableName}), how = 'left', on='Sett_ID')

    print('\nDesired aggregation methods applied to settlement level, year %s. %s \n' % (BuiltYear, time.ctime()))

    # Save in-progress results
    AllSummaries.to_csv(os.path.join(FloodFolder, 'FloodedCellCount.csv'))
    print(AllSummaries.sort_values(by=AllSummaries.columns[1], ascending=False).head(10))

##### All buildings

In [None]:
Settlements = gpd.read_file(r'Results/SETTLEMENTS.gpkg', layer='SETTLEMENTS_equalarea')
AllSummaries = pd.DataFrame(Settlements).drop(columns='geometry')[['Sett_ID']]
Settlements = None
AllSummaries

ValObject = pd.DataFrame(gpd.read_file(OutPackage, layer='AllBuildup'))[['Sett_ID', 'year']]

In [None]:
for BuiltYear in EligibleYears:
    GroupedVals = ValObject[
        ValObject['year']<=BuiltYear].groupby(
        'Sett_ID', as_index=False)
    
    VariableName = ''.join(['BLDct_', str(BuiltYear)])
    
    AllSummaries = AllSummaries.merge(GroupedVals.count().rename(columns={'year': VariableName}), how = 'left', on='Sett_ID')

    print('\nDesired aggregation methods applied to settlement level, year %s. %s \n' % (BuiltYear, time.ctime()))

    # Save in-progress results
    AllSummaries.to_csv(os.path.join(FloodFolder, 'BuiltCellCount.csv'))
    print(AllSummaries.sort_values(by=AllSummaries.columns[1], ascending=False).head(10))

### 10.7 Calculate area and percent flooded and save to file

In [None]:
BuiltArea = pd.read_csv(os.path.join(FloodFolder, 'BuiltCellCount.csv'))
Flood = pd.read_csv(os.path.join(FloodFolder, 'FloodedCellCount.csv'))
Areas = pd.read_csv(os.path.join(Results, 'Areas.csv'))

for Dataset in [BuiltArea, Flood, Areas]:
    if 'Unnamed: 0' in Dataset.columns:
        Dataset.drop(columns='Unnamed: 0', inplace=True)
    else:
        pass
    print(Dataset.info())

Stats = reduce(lambda  left,right: pd.merge(left,right,on=['Sett_ID'],
                                            how='outer'), [BuiltArea, Flood, Areas])
print(Stats.info())

In [None]:
# Quick spot-checking. Number of flood cells should always be less than or equal to number of built area cells.
Check1 = (Stats['FLDct_2007'] > Stats['BLDct_2007']).sum()
Check2 = (Stats['FLDct_2000'] > Stats['BLDct_2000']).sum()
Check3 = (Stats['FLDct_2011'] > Stats['BLDct_2011']).sum()
print(Check1, Check2, Check3) # All should be zero. 

##### Percent flooded

In [None]:
for year in EligibleYears:
    RawVar = ''.join(['FLDct_', str(year)])
    DenomVar = ''.join(['BLDct_', str(year)])
    NewVar = ''.join(['FLDpc', str(year)])
    if ((RawVar in Stats.columns) and (DenomVar in Stats.columns)):
        Stats[NewVar] = Stats[RawVar] / Stats[DenomVar]
    else:
        pass
Stats.sort_values(by='FLDpc2005', ascending=False).head(10)

##### Area flooded

In [None]:
for year in EligibleYears:
    RawVar = ''.join(['FLDpc', str(year)])
    DenomVar = ''.join(['AREA', str(year)])
    NewVar = ''.join(['FLDarea', str(year)])
    if ((RawVar in Stats.columns) and (DenomVar in Stats.columns)):
        Stats[NewVar] = Stats[RawVar] * Stats[DenomVar]
    else:
        pass
Stats.sort_values(by='FLDarea2005', ascending=False).head(10)

In [None]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|FLDpc|FLDarea')]
Stats.columns

In [None]:
# Save to file
Stats.to_csv(os.path.join(Results, 'Flood.csv'))

## 11. GROWTH STATISTICS

### 11.1 Load and prep.

In [None]:
WSFE_end = 2021
WSFE_start = 1985
WSFE_Years = ListFromRange(WSFE_start, WSFE_end)

In [82]:
PlaceNames = pd.read_csv(os.path.join(Results, 'PlaceNames.csv'))
Areas = pd.read_csv(os.path.join(Results, 'Area.csv'))
Population = pd.read_csv(os.path.join(Results, 'Population.csv'))
DMSP = pd.read_csv(os.path.join(Results, 'NTL_DMSP.csv'))
VNL = pd.read_csv(os.path.join(Results, 'NTL_VIIRS.csv'))
#Flood = pd.read_csv(os.path.join(Results, 'Flood.csv'))

RawValues = [PlaceNames, Areas, Population, DMSP, VNL]

for Dataset in RawValues:
    if 'Unnamed: 0' in Dataset.columns:
        Dataset.drop(columns='Unnamed: 0', inplace=True)
    else:
        pass
    if 'year' in Dataset.columns:
        Dataset.drop(columns='year', inplace=True)
    else:
        pass
    print(Dataset.info(verbose=True))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18552 entries, 0 to 18551
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Sett_ID    18552 non-null  int64 
 1   SettName   18552 non-null  object
 2   GeoName    18552 non-null  object
 3   UCDB_Name  18552 non-null  object
dtypes: int64(1), object(3)
memory usage: 579.9+ KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18552 entries, 0 to 18551
Data columns (total 39 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Sett_ID   18552 non-null  int64  
 1   ADM_ID    18552 non-null  int64  
 2   AREA2021  18552 non-null  float64
 3   AREA2020  17807 non-null  float64
 4   AREA2019  17460 non-null  float64
 5   AREA2018  17058 non-null  float64
 6   AREA2017  16845 non-null  float64
 7   AREA2016  16629 non-null  float64
 8   AREA2015  16479 non-null  float64
 9   AREA2014  15938 non-null  float64
 10  AREA2013  14137 non

In [83]:
AllStats = reduce(lambda  left,right: pd.merge(left,right,on=['Sett_ID'],
                                            how='outer'), RawValues)
AllStats.to_csv(os.path.join(Results, 'AllStats.csv'))

AllStats[AllStats.SettName!='UNK'].sample(5)

Unnamed: 0,Sett_ID,SettName,GeoName,UCDB_Name,ADM_ID,AREA2021,AREA2020,AREA2019,AREA2018,AREA2017,...,NTL_Vsum2014,NTL_Vmax2014,NTL_Vmean2014,NTL_Vmedian2014,NTL_Vstd2014,NTL_Vsum2015,NTL_Vmax2015,NTL_Vmean2015,NTL_Vmedian2015,NTL_Vstd2015
11049,59420,Talldaww,Talldaww,UNK,107,4.518157,4.20904,4.105746,3.9948,3.971846,...,50.489422,4.321935,2.019577,1.872939,0.912116,27.809113,2.145282,1.112365,1.028654,0.427772
1758,6457,‘Ein Qunīya,‘Ein Qunīya,UNK,170,0.27086,0.263208,0.257087,0.257087,0.253262,...,22.408588,13.588633,11.204294,11.204294,2.384338,26.076504,16.042847,13.038252,13.038252,3.004595
6388,30082,Şalkhad,Şalkhad,UNK,222,4.533459,4.449294,4.4294,4.388083,4.361303,...,80.896942,6.295742,3.370706,3.180894,1.623381,48.982742,3.665474,2.040948,2.092541,0.956662
5812,26407,Ḩarf al Musaytirah,Ḩarf al Musaytirah,UNK,108,0.508819,0.447607,0.439191,0.428479,0.423888,...,13.791924,5.109694,4.597308,4.344055,0.362319,7.653924,2.685563,2.551308,2.663573,0.174547
4720,19348,Al Qamşīyah,Al Qamşīyah,UNK,202,1.11175,0.941123,0.917404,0.890624,0.889094,...,14.041906,2.151807,1.404191,1.343488,0.344194,8.071981,1.105958,0.807198,0.803568,0.146231


### 11.2 Change over time of raw variables
pch = percent change

#### Population change

In [84]:
Stats = PlaceNames.copy().merge(Population, how = 'outer', on='Sett_ID')
for year in EligibleYears:
    RawVar = ''.join(['POP', str(year)])
    LagVar = ''.join(['POP', str(year-1)])
    NewVar = ''.join(['POPpch', str(year)])
    if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
        Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
    else:
        pass

In [85]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|pch')]
Stats.columns

Index(['Sett_ID', 'SettName', 'POPpch2001', 'POPpch2002', 'POPpch2003',
       'POPpch2004', 'POPpch2005', 'POPpch2006', 'POPpch2007', 'POPpch2008',
       'POPpch2009', 'POPpch2010', 'POPpch2011', 'POPpch2012', 'POPpch2013',
       'POPpch2014', 'POPpch2015', 'POPpch2016', 'POPpch2017', 'POPpch2018',
       'POPpch2019', 'POPpch2020'],
      dtype='object')

In [86]:
Stats.to_csv(os.path.join(Results, 'PopChange.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,GeoName,UCDB_Name,ADM_ID,AREA2021,AREA2020,AREA2019,AREA2018,AREA2017,...,POPpch2011,POPpch2012,POPpch2013,POPpch2014,POPpch2015,POPpch2016,POPpch2017,POPpch2018,POPpch2019,POPpch2020
7742,38534,Şaḩnāyā,Şaḩnāyā,Damascus,28,281.043874,275.274712,273.803346,270.629542,268.027298,...,-0.031286,0.023206,-0.064291,-0.037347,-0.035103,-0.043332,-0.027361,0.001813,-0.004629,0.040336
13495,96393,Ḩuraytān,Ḩuraytān,Aleppo,110,115.537029,113.038844,112.337975,111.384609,110.751838,...,-0.095489,0.155745,-0.130523,-0.025649,-0.056103,-0.074796,-0.040655,0.036965,-0.052175,0.101168
9524,50649,Homs,Homs,Homs,115,61.194432,58.507258,57.903562,57.279972,56.77804,...,-0.095587,0.095903,-0.127497,-0.023739,-0.05268,-0.036555,-0.075504,0.045738,-0.046187,0.078855
2966,10708,Qadsayyā,Qadsayyā,Damascus,61,55.778766,55.175835,54.813923,54.318112,53.143621,...,-0.006455,-0.021466,-0.008715,-0.037205,-0.05595,-0.005616,-0.062011,-0.004596,-0.005218,0.011233
10840,58135,Ḩamāh,Ḩamāh,Hama,99,48.053139,45.659013,44.727072,43.723972,43.055239,...,-0.109395,0.131973,-0.199106,-0.019397,-0.05438,-0.02881,-0.075182,0.039631,-0.043457,0.069917
12246,75655,Batabo,Batabo,UNK,43,47.433375,42.164615,39.378737,37.328925,36.593625,...,0.003911,0.076824,-0.015279,-0.047508,-0.052922,-0.094035,-0.03211,0.066806,-0.047084,0.073159
6744,32195,As-Suwayda,As-Suwayda,As Suwayda,39,30.747184,29.289591,28.877945,28.323218,27.970488,...,-0.142862,0.159841,-0.230049,-0.002613,-0.060484,-0.032212,-0.049829,0.050318,-0.067175,0.081853
5468,24050,Latakia,Latakia,Latakia,104,27.997268,27.450192,27.229066,27.001819,26.691937,...,-0.042622,0.078339,-0.103963,0.00385,-0.053288,-0.045033,-0.046994,0.005794,-0.026737,0.056819
15807,133699,Ar Raqqah,Ar Raqqah,Ar Raqqah,26,24.873198,23.289356,23.169994,22.967997,22.587722,...,-0.162722,0.234975,-0.228939,0.027231,-0.142488,-0.076009,-0.093134,0.121209,-0.141902,0.173449
3211,11851,Al Kiswah,Al Kiswah,UNK,153,24.496748,24.20523,24.108822,23.938196,23.769865,...,-0.055476,0.114704,-0.152174,-0.03634,-0.022059,-0.003674,-0.017776,0.001058,0.001656,0.068369


#### Area change

In [87]:
Stats = PlaceNames.copy().merge(Areas, how = 'outer', on='Sett_ID')
for year in EligibleYears:
    RawVar = ''.join(['AREA', str(year)])
    LagVar = ''.join(['AREA', str(year-1)])
    NewVar = ''.join(['AREApch', str(year)])
    if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
        Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
    else:
        pass

In [88]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|pch')]
Stats.columns

Index(['Sett_ID', 'SettName', 'AREApch1986', 'AREApch1987', 'AREApch1988',
       'AREApch1989', 'AREApch1990', 'AREApch1991', 'AREApch1992',
       'AREApch1993', 'AREApch1994', 'AREApch1995', 'AREApch1996',
       'AREApch1997', 'AREApch1998', 'AREApch1999', 'AREApch2000',
       'AREApch2001', 'AREApch2002', 'AREApch2003', 'AREApch2004',
       'AREApch2005', 'AREApch2006', 'AREApch2007', 'AREApch2008',
       'AREApch2009', 'AREApch2010', 'AREApch2011', 'AREApch2012',
       'AREApch2013', 'AREApch2014', 'AREApch2015', 'AREApch2016',
       'AREApch2017', 'AREApch2018', 'AREApch2019', 'AREApch2020',
       'AREApch2021'],
      dtype='object')

In [89]:
Stats.to_csv(os.path.join(Results, 'AreaChange.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,GeoName,UCDB_Name,ADM_ID,AREA2021,AREA2020,AREA2019,AREA2018,AREA2017,...,AREApch2012,AREApch2013,AREApch2014,AREApch2015,AREApch2016,AREApch2017,AREApch2018,AREApch2019,AREApch2020,AREApch2021
7742,38534,Şaḩnāyā,Şaḩnāyā,Damascus,28,281.043874,275.274712,273.803346,270.629542,268.027298,...,0.107653,0.035557,0.054777,0.027551,0.009488,0.010879,0.009709,0.011727,0.005374,0.020958
13495,96393,Ḩuraytān,Ḩuraytān,Aleppo,110,115.537029,113.038844,112.337975,111.384609,110.751838,...,0.052468,0.019745,0.040996,0.014849,0.003445,0.00604,0.005713,0.008559,0.006239,0.0221
9524,50649,Homs,Homs,Homs,115,61.194432,58.507258,57.903562,57.279972,56.77804,...,0.039025,0.020639,0.04908,0.01381,0.005801,0.006989,0.00884,0.010887,0.010426,0.045929
2966,10708,Qadsayyā,Qadsayyā,Damascus,61,55.778766,55.175835,54.813923,54.318112,53.143621,...,0.092324,0.047622,0.125536,0.039791,0.027261,0.019478,0.0221,0.009128,0.006603,0.010927
10840,58135,Ḩamāh,Ḩamāh,Hama,99,48.053139,45.659013,44.727072,43.723972,43.055239,...,0.0521,0.045541,0.09984,0.016797,0.010025,0.019236,0.015532,0.022942,0.020836,0.052435
12246,75655,Batabo,Batabo,UNK,43,47.433375,42.164615,39.378737,37.328925,36.593625,...,0.49425,0.199247,2.191315,0.037783,0.004694,0.024901,0.020094,0.054912,0.070746,0.124957
6744,32195,As-Suwayda,As-Suwayda,As Suwayda,39,30.747184,29.289591,28.877945,28.323218,27.970488,...,0.16449,0.072502,0.153695,0.01855,0.008731,0.007635,0.012611,0.019586,0.014255,0.049765
5468,24050,Latakia,Latakia,Latakia,104,27.997268,27.450192,27.229066,27.001819,26.691937,...,0.033893,0.018205,0.024678,0.089413,0.011478,0.009871,0.01161,0.008416,0.008121,0.01993
15807,133699,Ar Raqqah,Ar Raqqah,Ar Raqqah,26,24.873198,23.289356,23.169994,22.967997,22.587722,...,0.038875,0.031967,0.012719,0.021267,0.016878,0.022939,0.016835,0.008795,0.005152,0.068007
3211,11851,Al Kiswah,Al Kiswah,UNK,153,24.496748,24.20523,24.108822,23.938196,23.769865,...,0.066751,0.02789,0.047031,0.016678,0.007026,0.008113,0.007082,0.007128,0.003999,0.012044


#### NTL change

In [91]:
# Stats = PlaceNames.copy().merge(VNL, how = 'outer', on='Sett_ID')
# Stats = Stats.merge(DMSP, how = 'outer', on='Sett_ID')
# Sensors = ['D_', 'V_']
# Methods = ['sum', 'avg', 'max']

# for year in WSFE_Years:
#     for Sensor in Sensors:
#         for agg in Methods:
#             RawVar = ''.join(['NTL', agg, Sensor, str(year)])
#             LagVar = ''.join(['NTL', agg, Sensor, str(year-1)])
#             NewVar = ''.join(['NTL', agg, '_pch', Sensor, str(year)])
#             if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
#                 Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
#             else:
#                 pass

In [92]:
# Drop original variables.
# Stats = Stats.loc[:, Stats.columns.str.contains('Sett|pch')]
# Stats.columns

Index(['Sett_ID', 'SettName'], dtype='object')

In [None]:
# Stats.to_csv(os.path.join(Results, 'NTLChange.csv'))

In [None]:
# Stats.drop(columns='SettName', inplace=True)
# AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
# AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

#### Flood change: change in area and change in percent of total area

In [None]:
Stats = PlaceNames.copy().merge(Flood, how = 'outer', on='Sett_ID')
for year in EligibleYears:
    RawVar = ''.join(['FLDarea', str(year)])
    LagVar = ''.join(['FLDarea', str(year-1)])
    NewVar = ''.join(['FLDareapch', str(year)])
    if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
        Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
    else:
        pass

In [None]:
for year in EligibleYears:
    RawVar = ''.join(['FLDpc', str(year)])
    LagVar = ''.join(['FLDpc', str(year-1)])
    NewVar = ''.join(['FLDpcpch', str(year)])
    if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
        Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
    else:
        pass

In [None]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|FLDareapch|FLDpcpch')]
Stats.columns

In [None]:
Stats.to_csv(os.path.join(Results, 'FloodChange.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

#### Update parent spreadsheet

In [93]:
AllStats.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18552 entries, 0 to 18551
Data columns (total 214 columns):
 #    Column           Dtype  
---   ------           -----  
 0    Sett_ID          int64  
 1    SettName         object 
 2    GeoName          object 
 3    UCDB_Name        object 
 4    ADM_ID           int64  
 5    AREA2021         float64
 6    AREA2020         float64
 7    AREA2019         float64
 8    AREA2018         float64
 9    AREA2017         float64
 10   AREA2016         float64
 11   AREA2015         float64
 12   AREA2014         float64
 13   AREA2013         float64
 14   AREA2012         float64
 15   AREA2011         float64
 16   AREA2010         float64
 17   AREA2009         float64
 18   AREA2008         float64
 19   AREA2007         float64
 20   AREA2006         float64
 21   AREA2005         float64
 22   AREA2004         float64
 23   AREA2003         float64
 24   AREA2002         float64
 25   AREA2001         float64
 26   AREA2000        

In [94]:
AllStats.to_csv(os.path.join(Results, 'AllStats.csv'))

### 11.3 Densities
POPden = people per square kilometer
<br>NTL...den = nighttime light luminosity per square kilometer
<br>NTL...pop = nighttime light luminosity per capita

#### Population Density

In [95]:
Stats = PlaceNames.copy().merge(Population, how = 'outer', on='Sett_ID')
Stats = Stats.merge(Areas, how='left', on='Sett_ID')

for year in EligibleYears:
    RawVar = ''.join(['POP', str(year)])
    DenomVar = ''.join(['AREA', str(year)])
    NewVar = ''.join(['POPden', str(year)])
    if ((RawVar in Stats.columns) and (DenomVar in Stats.columns)):
        Stats[NewVar] = Stats[RawVar] / Stats[DenomVar]
    else:
        pass

In [96]:
# Change in density
for year in EligibleYears:
    RawVar = ''.join(['POPden', str(year)])
    LagVar = ''.join(['POPden', str(year-1)])
    NewVar = ''.join(['POPdenpch', str(year)])
    if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
        Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
    else:
        pass

In [97]:
# Drop original variables.
Stats = Stats.loc[:, ~Stats.columns.str.contains('AREA|POP1|POP2|ct|year|ADM')]
Stats.columns

Index(['Sett_ID', 'SettName', 'GeoName', 'UCDB_Name', 'POPden2000',
       'POPden2001', 'POPden2002', 'POPden2003', 'POPden2004', 'POPden2005',
       'POPden2006', 'POPden2007', 'POPden2008', 'POPden2009', 'POPden2010',
       'POPden2011', 'POPden2012', 'POPden2013', 'POPden2014', 'POPden2015',
       'POPden2016', 'POPden2017', 'POPden2018', 'POPden2019', 'POPden2020',
       'POPdenpch2001', 'POPdenpch2002', 'POPdenpch2003', 'POPdenpch2004',
       'POPdenpch2005', 'POPdenpch2006', 'POPdenpch2007', 'POPdenpch2008',
       'POPdenpch2009', 'POPdenpch2010', 'POPdenpch2011', 'POPdenpch2012',
       'POPdenpch2013', 'POPdenpch2014', 'POPdenpch2015', 'POPdenpch2016',
       'POPdenpch2017', 'POPdenpch2018', 'POPdenpch2019', 'POPdenpch2020'],
      dtype='object')

In [98]:
Stats.to_csv(os.path.join(Results, 'PopDensity.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,GeoName_x,UCDB_Name_x,ADM_ID,AREA2021,AREA2020,AREA2019,AREA2018,AREA2017,...,POPdenpch2011,POPdenpch2012,POPdenpch2013,POPdenpch2014,POPdenpch2015,POPdenpch2016,POPdenpch2017,POPdenpch2018,POPdenpch2019,POPdenpch2020
7742,38534,Şaḩnāyā,Şaḩnāyā,Damascus,28,281.043874,275.274712,273.803346,270.629542,268.027298,...,-0.038499,-0.07624,-0.09642,-0.08734,-0.060974,-0.052324,-0.037829,-0.00782,-0.016166,0.034776
13495,96393,Ḩuraytān,Ḩuraytān,Aleppo,110,115.537029,113.038844,112.337975,111.384609,110.751838,...,-0.09634,0.098128,-0.147359,-0.06402,-0.069914,-0.077973,-0.046414,0.031074,-0.060219,0.09434
9524,50649,Homs,Homs,Homs,115,61.194432,58.507258,57.903562,57.279972,56.77804,...,-0.100613,0.054742,-0.145141,-0.069413,-0.065584,-0.042112,-0.08192,0.036575,-0.056459,0.067723
2966,10708,Qadsayyā,Qadsayyā,Damascus,61,55.778766,55.175835,54.813923,54.318112,53.143621,...,-0.045077,-0.104173,-0.053776,-0.14459,-0.092078,-0.032005,-0.079932,-0.026119,-0.014216,0.0046
10840,58135,Ḩamāh,Ḩamāh,Hama,99,48.053139,45.659013,44.727072,43.723972,43.055239,...,-0.112722,0.075918,-0.233991,-0.108413,-0.070002,-0.03845,-0.092636,0.02373,-0.064909,0.048079
12246,75655,Batabo,Batabo,UNK,43,47.433375,42.164615,39.378737,37.328925,36.593625,...,0.003911,-0.279355,-0.178884,-0.701536,-0.087403,-0.098267,-0.055627,0.045792,-0.096687,0.002254
6744,32195,As-Suwayda,As-Suwayda,As Suwayda,39,30.747184,29.289591,28.877945,28.323218,27.970488,...,-0.144296,-0.003993,-0.282099,-0.135485,-0.077595,-0.040589,-0.057029,0.037237,-0.085094,0.066648
5468,24050,Latakia,Latakia,Latakia,104,27.997268,27.450192,27.229066,27.001819,26.691937,...,-0.054514,0.042989,-0.119983,-0.020326,-0.130989,-0.05587,-0.056309,-0.005749,-0.03486,0.048306
15807,133699,Ar Raqqah,Ar Raqqah,Ar Raqqah,26,24.873198,23.289356,23.169994,22.967997,22.587722,...,-0.166516,0.188762,-0.252824,0.01433,-0.160346,-0.091346,-0.11347,0.102645,-0.149383,0.167435
3211,11851,Al Kiswah,Al Kiswah,UNK,153,24.496748,24.20523,24.108822,23.938196,23.769865,...,-0.060087,0.044952,-0.175179,-0.079626,-0.038102,-0.010625,-0.02568,-0.005981,-0.005433,0.064113


#### Nighttime Lights Density

In [None]:
# Stats = PlaceNames.copy().merge(NTL, how = 'outer', on='Sett_ID')
# Stats = Stats.merge(Areas, how='left', on='Sett_ID')
# Sensors = ['D_', 'V_']
# Methods = ['sum', 'avg', 'max']

# for year in EligibleYears:
#     for Sensor in Sensors:
#         for agg in Methods:
#             RawVar = ''.join(['NTL', agg, Sensor, str(year)])
#             DenomVar = ''.join(['AREA', str(year)])
#             NewVar = ''.join(['NTL', agg, '_den', Sensor, str(year)])
#             if ((RawVar in Stats.columns) and (DenomVar in Stats.columns)):
#                 Stats[NewVar] = Stats[RawVar] / Stats[DenomVar]
#             else:
#                 pass

In [None]:
# # Change in density
# for year in EligibleYears:
#     for Sensor in Sensors:
#         for agg in Methods:
#             RawVar = ''.join(['NTL', agg, '_den', Sensor, str(year)])
#             LagVar = ''.join(['NTL', agg, '_den', Sensor, str(year-1)])
#             NewVar = ''.join(['NTL', agg, '_denpch', Sensor, str(year)])
#             if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
#                 Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
#             else:
#                 pass

In [99]:
list(Stats.columns)

['Sett_ID',
 'GeoName',
 'UCDB_Name',
 'POPden2000',
 'POPden2001',
 'POPden2002',
 'POPden2003',
 'POPden2004',
 'POPden2005',
 'POPden2006',
 'POPden2007',
 'POPden2008',
 'POPden2009',
 'POPden2010',
 'POPden2011',
 'POPden2012',
 'POPden2013',
 'POPden2014',
 'POPden2015',
 'POPden2016',
 'POPden2017',
 'POPden2018',
 'POPden2019',
 'POPden2020',
 'POPdenpch2001',
 'POPdenpch2002',
 'POPdenpch2003',
 'POPdenpch2004',
 'POPdenpch2005',
 'POPdenpch2006',
 'POPdenpch2007',
 'POPdenpch2008',
 'POPdenpch2009',
 'POPdenpch2010',
 'POPdenpch2011',
 'POPdenpch2012',
 'POPdenpch2013',
 'POPdenpch2014',
 'POPdenpch2015',
 'POPdenpch2016',
 'POPdenpch2017',
 'POPdenpch2018',
 'POPdenpch2019',
 'POPdenpch2020']

In [None]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|den')]
list(Stats.columns)

In [None]:
Stats.to_csv(os.path.join(Results, 'NTLDensity.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

#### Nighttime Lights per Capita

In [None]:
Stats = PlaceNames.copy().merge(NTL, how = 'outer', on='Sett_ID')
Stats = Stats.merge(Population, how='left', on='Sett_ID')
Sensors = ['D_', 'V_']
Methods = ['sum', 'avg', 'max']

for year in EligibleYears:
    for Sensor in Sensors:
        for agg in Methods:
            RawVar = ''.join(['NTL', agg, Sensor, str(year)])
            DenomVar = ''.join(['POP', str(year)])
            NewVar = ''.join(['NTL', agg, '_pop', Sensor, str(year)])
            if ((RawVar in Stats.columns) and (DenomVar in Stats.columns)):
                Stats[NewVar] = Stats[RawVar] / Stats[DenomVar]
            else:
                pass

In [None]:
# Change in density
for year in EligibleYears:
    for Sensor in Sensors:
        for agg in Methods:
            RawVar = ''.join(['NTL', agg, '_pop', Sensor, str(year)])
            LagVar = ''.join(['NTL', agg, '_pop', Sensor, str(year-1)])
            NewVar = ''.join(['NTL', agg, '_poppch', Sensor, str(year)])
            if ((RawVar in Stats.columns) and (LagVar in Stats.columns)):
                Stats[NewVar] = (Stats[RawVar] - Stats[LagVar]) / Stats[LagVar]
            else:
                pass

In [None]:
list(Stats.columns)

In [None]:
# Drop original variables.
Stats = Stats.loc[:, Stats.columns.str.contains('Sett|_pop')]
list(Stats.columns)

In [None]:
Stats.to_csv(os.path.join(Results, 'NTLperCapita.csv'))
Stats.drop(columns='SettName', inplace=True)
AllStats = AllStats.merge(Stats, how='left', on='Sett_ID')
AllStats[AllStats.SettName!='UNK'].sort_values(by=AllStats.columns[5], ascending=False).head(10)

#### Update parent spreadsheet

In [100]:
AllStats.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18552 entries, 0 to 18551
Data columns (total 257 columns):
 #    Column           Dtype  
---   ------           -----  
 0    Sett_ID          int64  
 1    SettName         object 
 2    GeoName_x        object 
 3    UCDB_Name_x      object 
 4    ADM_ID           int64  
 5    AREA2021         float64
 6    AREA2020         float64
 7    AREA2019         float64
 8    AREA2018         float64
 9    AREA2017         float64
 10   AREA2016         float64
 11   AREA2015         float64
 12   AREA2014         float64
 13   AREA2013         float64
 14   AREA2012         float64
 15   AREA2011         float64
 16   AREA2010         float64
 17   AREA2009         float64
 18   AREA2008         float64
 19   AREA2007         float64
 20   AREA2006         float64
 21   AREA2005         float64
 22   AREA2004         float64
 23   AREA2003         float64
 24   AREA2002         float64
 25   AREA2001         float64
 26   AREA2000        

In [101]:
AllStats.to_csv(os.path.join(Results, 'AllStats.csv'))

### 11.4 Urban Type

In [102]:
for year in EligibleYears:
    PopVar = ''.join(['POP', str(year)])
    DenVar = ''.join(['POPden', str(year)])
    NewVar = ''.join(['UrbType', str(year)])
    if ((PopVar in AllStats.columns) and (DenVar in AllStats.columns)):
        AllStats[NewVar] = 'LD'
        AllStats.loc[(AllStats[PopVar] >= 5000) & (AllStats[DenVar] >= 300), NewVar] = 'SDurban'
        AllStats.loc[(AllStats[PopVar] >= 50000) & (AllStats[DenVar] >= 1500), NewVar] = 'HDurban'
    else:
        pass

In [103]:
AllStats.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18552 entries, 0 to 18551
Data columns (total 278 columns):
 #    Column           Dtype  
---   ------           -----  
 0    Sett_ID          int64  
 1    SettName         object 
 2    GeoName_x        object 
 3    UCDB_Name_x      object 
 4    ADM_ID           int64  
 5    AREA2021         float64
 6    AREA2020         float64
 7    AREA2019         float64
 8    AREA2018         float64
 9    AREA2017         float64
 10   AREA2016         float64
 11   AREA2015         float64
 12   AREA2014         float64
 13   AREA2013         float64
 14   AREA2012         float64
 15   AREA2011         float64
 16   AREA2010         float64
 17   AREA2009         float64
 18   AREA2008         float64
 19   AREA2007         float64
 20   AREA2006         float64
 21   AREA2005         float64
 22   AREA2004         float64
 23   AREA2003         float64
 24   AREA2002         float64
 25   AREA2001         float64
 26   AREA2000        

In [104]:
AllStats.to_csv(os.path.join(Results, 'AllStats.csv'))

In [105]:
Stats = AllStats.loc[:, AllStats.columns.str.contains('Sett|UrbType')]
list(Stats.columns)

['Sett_ID',
 'SettName',
 'UrbType2000',
 'UrbType2001',
 'UrbType2002',
 'UrbType2003',
 'UrbType2004',
 'UrbType2005',
 'UrbType2006',
 'UrbType2007',
 'UrbType2008',
 'UrbType2009',
 'UrbType2010',
 'UrbType2011',
 'UrbType2012',
 'UrbType2013',
 'UrbType2014',
 'UrbType2015',
 'UrbType2016',
 'UrbType2017',
 'UrbType2018',
 'UrbType2019',
 'UrbType2020']

In [106]:
Stats[Stats.SettName!='UNK'].sort_values(by=Stats.columns[5], ascending=False).head(10)

Unnamed: 0,Sett_ID,SettName,UrbType2000,UrbType2001,UrbType2002,UrbType2003,UrbType2004,UrbType2005,UrbType2006,UrbType2007,...,UrbType2011,UrbType2012,UrbType2013,UrbType2014,UrbType2015,UrbType2016,UrbType2017,UrbType2018,UrbType2019,UrbType2020
14553,107002,Māri‘,LD,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,...,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,LD,SDurban,LD,SDurban
4953,20808,Khirbat al Ma‘azzah,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,...,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
15236,122595,Jarābulus,LD,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,...,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,LD,LD,LD,SDurban
12718,80810,‘Afrīn,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,...,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
15482,126346,Ath Thawrah,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,...,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
5040,21342,Şāfītā,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,...,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
9219,48944,Al Quşayr,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,...,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
15684,131596,‘Ayn al ‘Arab,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,...,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
4954,20811,Ţarţūs,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,...,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban
4861,20308,Baniyas,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,...,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban,SDurban


In [107]:
Stats.to_csv(os.path.join(Results, 'UrbanType.csv'))