---

## 0. PSEUDOCODE / OVERVIEW

##### Prep data
Merge: GRID3_BFA, GRID3_MLI, GRID3_NER, GRID3_TCD
<br> Drop the Built-up Area class and export the remaining GRID3 settlements (projection = Africa Albers) as GRID3_Sahel_rural
<br> Reproject ADM3.shp to Africa Albers. (retain ADM2 and ADM1 codes)
<br> Reproject LZ.shp to Africa Albers

##### Spatial join
Convert GRID3 geometries to centroids.
<br> Spatial join ADM3 onto GRID3.
<br> Spatial join LZ onto GRID3.
<br> Convert GRID3 to dataframe (no need for geometries starting here).

##### Largest population by group
GRID3: Group by ADM3 and LZ. Sum population field (reset index to create new dataframe).
<br> NewLZ field: Select the LZ value of the datapoint which has the largest population in its ADM group.
<br> Repeat for other ADMs.

##### Aggregate to ADM
Group by ADM3 and select the "first" (or "max") NewLZ value. Every row in an ADM3 group carries the same NewLZ, so either aggregation works.
<br> Repeat for other ADMs.
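
The grouping and aggregation steps above can be sketched in pandas as one helper applied once per ADM level. This is a minimal sketch, not the notebook's code: the helper `dominant_lz` and the toy codes and populations are hypothetical, and the column names follow those used later (`OECD_ZONE`, `pop_un_adj`).

```python
import pandas as pd


def dominant_lz(points: pd.DataFrame, adm_col: str,
                lz_col: str = "OECD_ZONE", pop_col: str = "pop_un_adj") -> pd.DataFrame:
    """Hypothetical helper: one row per ADM unit with the LZ holding the largest summed population."""
    # Sum settlement population for every ADM-LZ pair.
    sums = points.groupby([adm_col, lz_col], as_index=False)[pop_col].sum()
    # Within each ADM unit, keep the row (i.e. the LZ) with the largest population.
    best = sums.loc[sums.groupby(adm_col)[pop_col].idxmax()]
    return best.rename(columns={lz_col: "LZ"})[[adm_col, "LZ"]].reset_index(drop=True)


# Made-up settlement points after the spatial joins (geometry already dropped).
points = pd.DataFrame({
    "ADM2_CODE": ["A1", "A1", "A1", "B1"],
    "ADM3_CODE": ["A101", "A101", "A102", "B101"],
    "OECD_ZONE": ["Desert", "Rainfed Agriculture", "Rainfed Agriculture", "Desert"],
    "pop_un_adj": [10.0, 25.0, 5.0, 7.0],
})

# "Repeat for other ADMs": call the same helper once per level.
for level in ["ADM3_CODE", "ADM2_CODE"]:
    print(dominant_lz(points, level))
```

Because the helper already returns one row per ADM unit, the separate "first"/"max" aggregation step is folded into the `idxmax` selection.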

## 1. PREPARE WORKSPACE

### 1.1 Load all packages.

In [None]:
# Built-in:
# dir(), print(), range(), format(), int(), len(), list(), max(), min(), zip(), sorted(), sum(), open(), del, = None, try except, with as, for in, if elif else
# Also: list.append(), list.insert(), list.remove(), list.count(), str.startswith(), str.endswith(), str.replace(), and `in` for membership tests

import os, sys, glob, re, time, subprocess, string # os.getcwd(), os.path.join(), os.listdir(), os.remove(), time.ctime(), glob.glob(), str.zfill(), str.join()
from os.path import exists # exists()
from functools import reduce # reduce()

import geopandas as gpd # read_file(), GeoDataFrame(), sjoin_nearest(), to_crs(), to_file(), .crs, buffer(), dissolve()
import pandas as pd # .dtypes, Series(), concat(), DataFrame(), read_table(), merge(), to_csv(), .loc[], head(), sample(), astype(), unique(), rename(), between(), drop(), fillna(), idxmax(), isna(), isin(), apply(), info(), sort_values(), notna(), groupby(), value_counts(), duplicated(), drop_duplicates()
from shapely.geometry import Point, LineString, Polygon, shape, MultiPoint
from shapely.ops import unary_union # cascaded_union was deprecated in Shapely 1.8 and removed in 2.0
from shapely.validation import make_valid  # in apply(make_valid)
import shapely.wkt

import numpy as np # median(), mean(), tolist(), .inf
import fiona, rioxarray # fiona.open()
import rasterio # open(), write_band(), .name, .count, .width, .height, .nodatavals, .meta, update(), copy(), write()
from rasterio.plot import show
from rasterio import features # features.rasterize()
from rasterio.features import shapes
from rasterio import mask # rasterio.mask.mask()
from rasterio.enums import Resampling # rasterio.enums.Resampling()
from osgeo import gdal, osr, ogr, gdal_array, gdalconst # Open(), SpatialReference, WarpOptions(), Warp(), GetDataTypeName(), GetRasterBand(), GetNoDataValue(), Translate(), GetProjection(), GetAttrValue()

In [None]:
ProjectFolder = os.getcwd()
print(ProjectFolder)

G3Folder = os.path.join(ProjectFolder, 'GRID3')
print(G3Folder)

---

## 2. PREPARE DATA
The GRID3 files are stored in EPSG:4326 (see the CRS check below). Every layer is reprojected to Africa Albers Equal Area Conic (ESRI:102022) before the spatial joins.

### 2.1 Merge GRID3 Sahel datasets

In [4]:
G3_files = glob.glob(os.path.join(G3Folder, "*.shp"))
G3_list = [gpd.read_file(f) for f in G3_files]
G3_list

[           FID  Shape_Leng       country  iso bld_count dou_level1  \
 0            1    0.002927  Burkina Faso  BFA      1-50      Rural   
 1            2    0.004368  Burkina Faso  BFA      1-50      Rural   
 2            3    0.002892  Burkina Faso  BFA      1-50      Rural   
 3            4    0.003012  Burkina Faso  BFA      1-50      Rural   
 4            5    0.002961  Burkina Faso  BFA      1-50      Rural   
 ...        ...         ...           ...  ...       ...        ...   
 566987  566988    0.007608  Burkina Faso  BFA      1-50      Rural   
 566988  566989    0.006485  Burkina Faso  BFA      1-50      Rural   
 566989  566990    0.003440  Burkina Faso  BFA      1-50      Rural   
 566990  566991    0.002947  Burkina Faso  BFA      1-50      Rural   
 566991  566992    0.013777  Burkina Faso  BFA    51-100      Rural   
 
                     dou_level2    status  is_fp   prob_fp           mgrs  \
 0       Very Low Density Rural  existing    1.0  0.848630   30PUR119

In [5]:
print('Match all projections for concatenation.')
for i in range(1, len(G3_list)):
    print('Checking...')
    if G3_list[i].crs != G3_list[i - 1].crs:
        # to_crs() returns a new GeoDataFrame, so store the result back in the list.
        G3_list[i] = G3_list[i].to_crs(G3_list[i - 1].crs)
        print('Reprojecting to match previous.')
    else:
        print('Matches with previous.')
        
print('\n\nFinal CRS list:')
for f in G3_list:
    print(f.crs)

Match all projections for concatenation.
Checking...
Matches with previous.
Checking...
Matches with previous.
Checking...
Matches with previous.


Final CRS list:
epsg:4326
epsg:4326
epsg:4326
epsg:4326


In [6]:
G3 = pd.concat(G3_list, ignore_index=True)
G3.info()  # info() prints directly and returns None, so it is not wrapped in print()
print('\n\n', G3['type'].unique(), '\n\n', G3.head(10))

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1662499 entries, 0 to 1662498
Data columns (total 30 columns):
 #   Column      Non-Null Count    Dtype   
---  ------      --------------    -----   
 0   FID         566992 non-null   float64 
 1   Shape_Leng  566992 non-null   float64 
 2   country     1662499 non-null  object  
 3   iso         1662499 non-null  object  
 4   bld_count   566992 non-null   object  
 5   dou_level1  566992 non-null   object  
 6   dou_level2  566992 non-null   object  
 7   status      566992 non-null   object  
 8   is_fp       566992 non-null   float64 
 9   prob_fp     566992 non-null   float64 
 10  mgrs        566992 non-null   object  
 11  pcode       566992 non-null   object  
 12  date        566992 non-null   object  
 13  area_m2     566992 non-null   float64 
 14  SHAPE_Le_1  566992 non-null   float64 
 15  SHAPE_Area  566992 non-null   float64 
 16  geometry    1662499 non-null  geometry
 17  OBJECTID    1095507 non-null  float64 
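
The uneven non-null counts above come from `pd.concat` aligning on column names: a column present in only some input files is filled with NaN for rows from the others. The counts suggest that the 566,992 rows carrying `FID` and `bld_count` have no `type` or `pop_un_adj` value. A toy illustration with made-up frames (not GRID3 data):

```python
import pandas as pd

# Two hypothetical inputs with partly different schemas.
a = pd.DataFrame({"FID": [1, 2], "bld_count": ["1-50", "1-50"]})
b = pd.DataFrame({"OBJECTID": [1], "type": ["Hamlet"], "pop_un_adj": [8.8]})

merged = pd.concat([a, b], ignore_index=True)
print(merged)
# Columns missing from an input are NaN for that input's rows.
print(merged.notna().sum())
```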

In [7]:
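# We don't need most of those columns.
G3['G3_ID'] = G3.index
# .copy() makes G3 an independent frame, so later column assignments don't raise SettingWithCopyWarning.
G3 = G3[['G3_ID', 'type', 'pop_un_adj', 'geometry']].copy()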
G3.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1662499 entries, 0 to 1662498
Data columns (total 4 columns):
 #   Column      Non-Null Count    Dtype   
---  ------      --------------    -----   
 0   G3_ID       1662499 non-null  int64   
 1   type        1095507 non-null  object  
 2   pop_un_adj  1089988 non-null  float64 
 3   geometry    1662499 non-null  geometry
dtypes: float64(1), geometry(1), int64(1), object(1)
memory usage: 50.7+ MB


In [8]:
# Make sure unmeasured populations don't give us trouble.
G3[['pop_un_adj']] = G3[['pop_un_adj']].fillna(0)

In [9]:
# Save file of all settlement types.
G3.to_file(driver='GPKG', filename=r'GRID3/GRID3_Sahel.gpkg', layer='Sahel_allSettTypes')

### 2.2 Remove built-up areas, leaving only rural towns and villages.

In [10]:
G3['type'] = G3['type'].astype(str)
# Keep only 'Small Settlement Area' and 'Hamlet'; this removes 'Built-up Area' and rows with no type.
G3 = G3[G3['type'].str.startswith(('S', 'H'))].copy()
G3['type'].unique()

array(['Small Settlement Area', 'Hamlet'], dtype=object)
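
Two details in the filter above: in pandas 2.x, `astype(str)` turns missing values into the literal string `'nan'`, so rows without a `type` are dropped by the prefix test rather than raising an error; and `str.startswith` accepts a tuple of prefixes. A small sketch with made-up values; `isin` with the two full class names is a more explicit alternative when the class names are known.

```python
import numpy as np
import pandas as pd

types = pd.Series(["Built-up Area", "Small Settlement Area", "Hamlet", np.nan])

as_text = types.astype(str)  # in pandas 2.x, NaN becomes the string 'nan'
# na=False treats any missing value that survives the conversion as a non-match.
keep = as_text.str.startswith(("S", "H"), na=False)
print(as_text[keep].tolist())

# Explicit alternative: NaN is simply not a member of the list.
keep_explicit = types.isin(["Small Settlement Area", "Hamlet"])
print(types[keep_explicit].tolist())
```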

### 2.3 Save to file and reproject.

In [11]:
G3.to_file(driver='GPKG', filename=r'GRID3/GRID3_Sahel.gpkg', layer='Sahel_SSA_HA')

In [12]:
G3 = G3.to_crs('ESRI:102022')

### 2.4 Load and reproject ADM3.

In [13]:
fiona.listlayers('Sahel_AdminBoundaries.gpkg')

['adm0', 'adm1', 'adm2', 'adm3']

In [14]:
ADM3 = gpd.read_file(filename='Sahel_AdminBoundaries.gpkg', 
                     layer='adm3').to_crs('ESRI:102022')[['ADM1_CODE', 'ADM2_CODE', 'ADM3_CODE', 'geometry']]
ADM3.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1433 entries, 0 to 1432
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   ADM1_CODE  1433 non-null   object  
 1   ADM2_CODE  1433 non-null   object  
 2   ADM3_CODE  1433 non-null   object  
 3   geometry   1433 non-null   geometry
dtypes: geometry(1), object(3)
memory usage: 44.9+ KB


### 2.5 Load and reproject Livelihood Zones.

In [15]:
fiona.listlayers('Livelihood.gpkg')

['LZ_harmonized']

In [22]:
LZ = gpd.read_file(filename='Livelihood.gpkg').to_crs('ESRI:102022')[['OECD_ZONE', 'geometry']]
LZ.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 63 entries, 0 to 62
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   OECD_ZONE  63 non-null     object  
 1   geometry   63 non-null     geometry
dtypes: geometry(1), object(1)
memory usage: 1.1+ KB


## 3. SPATIAL JOIN

### 3.1 GRID3 to centroid

In [27]:
G3 = gpd.read_file(filename=r'GRID3/GRID3_Sahel.gpkg', layer='Sahel_SSA_HA').to_crs('ESRI:102022')
G3['centroid'] = G3.geometry.centroid
G3 = G3.set_geometry('centroid')
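
Centroids are taken after reprojecting to Africa Albers because a centroid computed in EPSG:4326 treats degrees as planar units (GeoPandas warns about this for geographic CRSs). For a simple polygon, the area-weighted centroid follows from the standard shoelace formula. The pure-Python sketch below only illustrates that formula; it is not GEOS's implementation, which the notebook uses through GeoPandas, and it assumes a single ring with non-zero area (no holes, no multipart geometries).

```python
def polygon_centroid(coords):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...] with no holes."""
    # Close the ring if the last vertex does not repeat the first.
    if coords[0] != coords[-1]:
        coords = coords + [coords[0]]
    a2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # C = sum((v_i + v_{i+1}) * cross_i) / (6A), and 6A = 3 * a2.
    return cx / (3.0 * a2), cy / (3.0 * a2)


# A 4 x 2 rectangle in projected metres has its centroid at (2, 1).
print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))
```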

### 3.2 Join GRID3 and ADMs

In [28]:
G3 = gpd.sjoin(G3, ADM3, how='left', predicate='intersects')
G3.info()
print(G3.sample(5))

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1094501 entries, 0 to 1094490
Data columns (total 9 columns):
 #   Column       Non-Null Count    Dtype   
---  ------       --------------    -----   
 0   G3_ID        1094501 non-null  int64   
 1   type         1094501 non-null  object  
 2   pop_un_adj   1094501 non-null  float64 
 3   geometry     1094501 non-null  geometry
 4   centroid     1094501 non-null  geometry
 5   index_right  1093102 non-null  float64 
 6   ADM1_CODE    1093102 non-null  object  
 7   ADM2_CODE    1093102 non-null  object  
 8   ADM3_CODE    1093102 non-null  object  
dtypes: float64(2), geometry(2), int64(1), object(4)
memory usage: 83.5+ MB
          G3_ID                   type  pop_un_adj  \
41847    608960                 Hamlet    8.792259   
599940  1167442                 Hamlet    4.241427   
380930   948268  Small Settlement Area  619.667704   
314790   882061                 Hamlet    6.533971   
999726  1567670                 Hamlet 
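
The join output has 1,094,501 rows, but its index labels run only from 0 to 1,094,490, so some settlement rows were repeated. With `how='left'`, a centroid that intersects two ADM3 polygons (for example, one lying exactly on a shared boundary) produces one row per match, and its population would be counted twice in section 4. A sketch of how such rows could be found and reduced to one match each, on a made-up frame; keeping the first match is an arbitrary assumption, not the notebook's method.

```python
import pandas as pd

# Hypothetical left-join output: settlement 2 matched two ADM3 polygons.
joined = pd.DataFrame(
    {"G3_ID": [1, 2, 2, 3],
     "ADM3_CODE": ["A101", "A101", "A102", "B101"],
     "pop_un_adj": [5.0, 8.0, 8.0, 3.0]},
    index=[0, 1, 1, 2],  # a left join keeps the left index, so the label repeats
)

# Flag every row whose settlement appears more than once.
dupes = joined[joined["G3_ID"].duplicated(keep=False)]
print(dupes)

# One possible fix: keep a single match per settlement.
deduped = joined.drop_duplicates(subset="G3_ID", keep="first")
print(deduped["pop_un_adj"].sum())  # population no longer double-counted
```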

### 3.3 Join GRID3 and Livelihood Zones

In [29]:
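# sjoin adds an 'index_right' column; drop the one left by the ADM3 join so the LZ join can create its own.
G3 = G3.drop(columns=['index_right'], errors='ignore')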

In [30]:
G3 = gpd.sjoin(G3, LZ, how='left', predicate='intersects')
G3.info()
print(G3.sample(5))

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1094501 entries, 0 to 1094490
Data columns (total 10 columns):
 #   Column       Non-Null Count    Dtype   
---  ------       --------------    -----   
 0   G3_ID        1094501 non-null  int64   
 1   type         1094501 non-null  object  
 2   pop_un_adj   1094501 non-null  float64 
 3   geometry     1094501 non-null  geometry
 4   centroid     1094501 non-null  geometry
 5   ADM1_CODE    1093102 non-null  object  
 6   ADM2_CODE    1093102 non-null  object  
 7   ADM3_CODE    1093102 non-null  object  
 8   index_right  1091843 non-null  float64 
 9   OECD_ZONE    1091843 non-null  object  
dtypes: float64(2), geometry(2), int64(1), object(5)
memory usage: 91.9+ MB
          G3_ID                   type  pop_un_adj  \
221368   788604                 Hamlet   10.512911   
657170  1224750                 Hamlet    3.476443   
599200  1166702                 Hamlet   12.166201   
279342   846602                 Hamlet    5.211

## 4. LARGEST POPULATION BY GROUP

### 4.1 Group-by ADM and LZ

##### For each unique ADM3-LZ area, sum up the population of non-built-up settlements.

In [36]:
ADM3_LZ = G3[['ADM3_CODE', 'OECD_ZONE', 'pop_un_adj']].groupby(['ADM3_CODE', 'OECD_ZONE'], as_index=False).sum()
ADM3_LZ

| | ADM3_CODE | OECD_ZONE | pop_un_adj |
|---|---|---|---|
| 0 | BF020203 | Cash Crops and Rice | 0.000000 |
| 1 | BF020203 | Rainfed Agriculture | 215.340489 |
| 2 | BF020206 | Cash Crops and Rice | 0.000000 |
| 3 | BF020206 | Rainfed Agriculture | 131.448241 |
| 4 | BF020304 | Rainfed Agriculture | 7.719418 |
| ... | ... | ... | ... |
| 1559 | TD230201 | Desert | 49238.903661 |
| 1560 | TD230201 | Nomadic/Transhumant Pastoralist | 93.590981 |
| 1561 | TD230301 | Desert | 51539.150118 |
| 1562 | TD230401 | Desert | 102196.619183 |


In [37]:
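# ADM3s present in the boundary layer but absent from ADM3_LZ, i.e. with no rural
# settlement assigned to both an ADM3 and a livelihood zone.
print('Number of ADM3s with no rural settlements in a livelihood zone: ', 
      len(ADM3.ADM3_CODE.unique()) - len(ADM3_LZ.ADM3_CODE.unique()))

Number of ADM3s with no rural settlements in a livelihood zone:  339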
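
The subtraction above compares the number of ADM3 codes in the boundary layer with the number in the grouped table, so it measures ADM3s missing from `ADM3_LZ`. To count ADM3s whose settlements span more than one livelihood zone, count the distinct zones per ADM3 instead. A sketch on a made-up table shaped like `ADM3_LZ`; no result for the real data is implied.

```python
import pandas as pd

# Made-up stand-in for ADM3_LZ (one row per ADM3-LZ pair).
adm3_lz = pd.DataFrame({
    "ADM3_CODE": ["A101", "A101", "A102", "B101", "B101", "B101"],
    "OECD_ZONE": ["Desert", "Rainfed Agriculture", "Desert",
                  "Desert", "Nomadic/Transhumant Pastoralist", "Rainfed Agriculture"],
    "pop_un_adj": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
all_adm3 = pd.Series(["A101", "A102", "B101", "C101"])  # codes from the boundary layer

# ADM3s whose settlements fall in more than one livelihood zone.
zones_per_adm3 = adm3_lz.groupby("ADM3_CODE")["OECD_ZONE"].nunique()
multi_lz = zones_per_adm3[zones_per_adm3 > 1]
print("ADM3s spanning more than one LZ:", len(multi_lz))

# ADM3s with no row at all in the grouped table.
missing = all_adm3[~all_adm3.isin(adm3_lz["ADM3_CODE"])]
print("ADM3s absent from the grouped table:", missing.tolist())
```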


In [38]:
# Within each ADM3 group, choose the row which has the highest population.
ADM3_LZ_Max = ADM3_LZ.loc[ADM3_LZ.groupby(['ADM3_CODE'])['pop_un_adj'].idxmax()]
ADM3_LZ_Max

| | ADM3_CODE | OECD_ZONE | pop_un_adj |
|---|---|---|---|
| 1 | BF020203 | Rainfed Agriculture | 215.340489 |
| 3 | BF020206 | Rainfed Agriculture | 131.448241 |
| 4 | BF020304 | Rainfed Agriculture | 7.719418 |
| 5 | BF020306 | Rainfed Agriculture | 0.000000 |
| 6 | BF020307 | Rainfed Agriculture | 0.000000 |
| ... | ... | ... | ... |
| 1558 | TD230101 | Desert | 14786.435122 |
| 1559 | TD230201 | Desert | 49238.903661 |
| 1561 | TD230301 | Desert | 51539.150118 |
| 1562 | TD230401 | Desert | 102196.619183 |
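
`idxmax` returns the label of the first maximum it meets. For ADM3s whose maximum is 0 (such as BF020306 and BF020307 above), every zone present is tied, so when such an ADM3 has more than one zone the chosen one is simply the first in its group rather than a population-based choice. A sketch on made-up values showing that behavior and one way to flag tied ADM3s:

```python
import pandas as pd

sums = pd.DataFrame({
    "ADM3_CODE": ["X1", "X1", "X2", "X2"],
    "OECD_ZONE": ["Cash Crops and Rice", "Rainfed Agriculture", "Desert", "Rainfed Agriculture"],
    "pop_un_adj": [0.0, 0.0, 3.0, 9.0],
})

best = sums.loc[sums.groupby("ADM3_CODE")["pop_un_adj"].idxmax()]
print(best)

# Flag ADM3s where more than one zone reaches the group maximum (e.g. all zeros).
is_max = sums["pop_un_adj"].eq(sums.groupby("ADM3_CODE")["pop_un_adj"].transform("max"))
n_at_max = is_max.groupby(sums["ADM3_CODE"]).sum()
tied = n_at_max[n_at_max > 1].index.tolist()
print("Tied ADM3s:", tied)
```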


In [43]:
# Join the livelihood zone with the highest rural population onto the ADM3 geodataframe as field 'LZ'. 
ADM3_LZ = ADM3.merge(ADM3_LZ_Max.rename(columns={'OECD_ZONE':'LZ'}), on='ADM3_CODE')
ADM3_LZ

| | ADM1_CODE | ADM2_CODE | ADM3_CODE | geometry | LZ | pop_un_adj |
|---|---|---|---|---|---|---|
| 0 | BF02 | BF0202 | BF020203 | MULTIPOLYGON (((-3056670.470 1419405.361, -305... | Rainfed Agriculture | 215.340489 |
| 1 | BF02 | BF0202 | BF020206 | MULTIPOLYGON (((-3058275.312 1474673.126, -305... | Rainfed Agriculture | 131.448241 |
| 2 | BF02 | BF0203 | BF020304 | MULTIPOLYGON (((-3025648.212 1542179.266, -302... | Rainfed Agriculture | 7.719418 |
| 3 | BF02 | BF0203 | BF020306 | MULTIPOLYGON (((-3029873.483 1509712.374, -302... | Rainfed Agriculture | 0.000000 |
| 4 | BF02 | BF0203 | BF020307 | MULTIPOLYGON (((-3020012.475 1560288.700, -302... | Rainfed Agriculture | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... |
| 1089 | TD23 | TD2301 | TD230101 | MULTIPOLYGON (((-104505.940 1975358.629, -1046... | Desert | 14786.435122 |
| 1090 | TD23 | TD2302 | TD230201 | MULTIPOLYGON (((-194790.797 1836382.290, -1948... | Desert | 49238.903661 |
| 1091 | TD23 | TD2303 | TD230301 | MULTIPOLYGON (((-104471.336 1900332.483, -1045... | Desert | 51539.150118 |
| 1092 | TD23 | TD2304 | TD230401 | MULTIPOLYGON (((-296278.729 1882859.274, -2891... | Desert | 102196.619183 |


In [44]:
ADM3_LZ.LZ.isna().sum() # 0: the default inner merge keeps only ADM3s that received an LZ, so nothing needs filling here.

0
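
This count is 0 partly by construction: `merge` defaults to `how='inner'`, so ADM3s with no entry in `ADM3_LZ_Max` are dropped from the result rather than kept with a missing `LZ` (the merged table above has 1,093 rows, against 1,433 ADM3 polygons). If those ADM3s should stay in the output layer, a left merge keeps them. A sketch with made-up frames, where `indicator=True` adds a `_merge` column showing which rows found a match:

```python
import pandas as pd

adm3 = pd.DataFrame({"ADM3_CODE": ["A101", "A102", "C101"]})
adm3_lz_max = pd.DataFrame({"ADM3_CODE": ["A101", "A102"],
                            "OECD_ZONE": ["Desert", "Rainfed Agriculture"]})
lz = adm3_lz_max.rename(columns={"OECD_ZONE": "LZ"})

inner = adm3.merge(lz, on="ADM3_CODE")  # default how='inner'
left = adm3.merge(lz, on="ADM3_CODE", how="left", indicator=True)
print(len(inner), len(left))

# ADM3s with no livelihood zone assigned.
print(left[left["_merge"] == "left_only"])
print(left["LZ"].isna().sum())
```

Once the unmatched ADM3s have been checked, the `_merge` column can be dropped before writing the layer to file.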

In [45]:
# Drop the population field to avoid confusion: it holds only the rural population of the selected LZ's portion of the ADM3, not the ADM3 total.
ADM3_LZ = ADM3_LZ.drop(['pop_un_adj'], axis=1)

In [46]:
ADM3_LZ.to_file(driver='GPKG', filename='Livelihood_ADM.gpkg', layer='LZ_ADM3')