# Network analysis in Senegal

### Objectives
    1)	Use measures of road-based accessibility to identify road segments that, if rehabilitated, would improve agricultural market activities in Senegal, including during flood conditions.
    2)	Gain a better understanding of the accessibility, connectivity, and criticality of roads in Senegal in relationship to agricultural origins, processing & transfer sites, and markets.

To this end, the team will develop an accessibility model that measures travel time from sites of agricultural production to the nearest populated areas, processing centers, and markets.

### Datasets for analysis
#### ORIGIN
    1) agriculture: MapSPAM 2017. Measuring value in international dollars.
    2) agriculture: UMD Land Cover 2019 30m. Assign MapSPAM value onto land cover cropland class for more precise origin information.
    3) population: WorldPop 2020, UN-adjusted.
    4) settlement extent: GRID3 2020.
#### DESTINATION
    5) markets: derived from WorldPop 2020 and GRID3 2020 urban clusters.
    6) agricultural processing hubs: to be acquired.
#### TRAVEL ROUTE
    7) roads: OpenStreetMap, July 2021.
    8) elevation: 
#### OBSTACLE
    9) flood: FATHOM. 1-in-10, 1-in-20, and 1-in-50 year flood return periods.
#### INTERVENTION
    10) upcoming road projects: AGEROUTE interventions separate from the World Bank-financed project
    11) targeted road projects: critical road segments identified by this accessibility model's baseline outputs


### Model design
#### Basic formula: 
    (a) Off-road driving time from origin to closest road node
    +
    (b) Driving time from road node in (a) to a destination (closeness measured by road segment speeds)
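
As a worked illustration of the formula, the sketch below adds the off-road leg (a) to the network leg (b). The function name `total_travel_time` and the off-road speed of 4 km/h are assumptions made for this example only; the notebook does not state the off-road speed at this point. Times are taken to be in seconds.

```python
def total_travel_time(nn_dist_m, network_time_s, offroad_speed_kmh=4.0):
    """Return (a) + (b) in seconds.

    nn_dist_m: straight-line distance from the origin to its nearest road node (metres).
    network_time_s: driving time from that node to the destination (seconds).
    offroad_speed_kmh: ASSUMED off-road speed; not specified in this notebook.
    """
    offroad_speed_ms = offroad_speed_kmh * 1000.0 / 3600.0
    offroad_time_s = nn_dist_m / offroad_speed_ms  # component (a)
    return offroad_time_s + network_time_s         # (a) + (b)


# An origin 2 km from its nearest node, followed by a 30-minute drive on the network.
example_s = total_travel_time(nn_dist_m=2000.0, network_time_s=1800.0)
print(round(example_s / 60.0, 1), "minutes")
```

With these inputs each leg contributes about 30 minutes, which shows how strongly the assumed off-road speed can weigh on remote origins.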

#### Model origin & destination (OD) sets:
    A)	Travel time from an area that has agricultural value/potential to the nearest processing hub (if provided).
    B)	Travel time from an area that has agricultural value/potential to the nearest larger settlement (“larger” to be defined using a case-appropriate population metric, still to be determined).
    C)	Travel time from an area that has agricultural value/potential to the nearest market.
    D)	Travel time from all settlements to the nearest market.
    E)	Travel time from larger settlements to the nearest market.

#### Before/after scenarios for each OD set:
    1)	Pre-project, baseline weather: No inclement weather. Road network status as of November 2021.
    2)	Pre-project, flood: 1-in-10, 1-in-20 and 1-in-50 year flood return period. Road network status as of November 2021.
    3)	Post-project, baseline weather: No inclement weather. Road network status if X number of critical road segments to high-value areas are protected (i.e., their travel times reduced).
    4)	Post-project, flood: 1-in-10 year flood return period. Road network status if X number of critical road segments to high-value areas are protected (i.e., their travel times reduced).

#### Notes:
    --Destinations are expected to be proximal to the road network, so no measure is taken between road and destination.
    --All travel times will be assigned to each model variation’s point of origin; aggregation up to administrative areas is possible if desired.
    --Obstacles & interventions modify the road segment speeds. Basic formula is then applied to the modified road network.
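
The last note can be sketched as a multiplier applied to the travel time of selected segments before the basic formula is run. This is a plain-Python stand-in for the road graph, not the GOSTnets workflow; the helper name `apply_speed_factor` and the factor of 4 are hypothetical values chosen for illustration.

```python
def apply_speed_factor(edge_times, affected_edges, factor):
    """Return a new edge -> time mapping with the affected edges rescaled.

    edge_times: dict mapping (u, v) node pairs to travel time in seconds.
    affected_edges: set of (u, v) pairs touched by a flood or an intervention.
    factor: multiplier on travel time (>1 slows traffic, <1 speeds it up).
    """
    return {
        edge: time * factor if edge in affected_edges else time
        for edge, time in edge_times.items()
    }


baseline = {("a", "b"): 60.0, ("b", "c"): 120.0, ("a", "c"): 300.0}

# ASSUMED factor: a flooded segment takes four times as long to traverse.
flooded = apply_speed_factor(baseline, {("b", "c")}, factor=4.0)
print(flooded)
```

Returning a new mapping keeps the baseline network untouched, so each before/after scenario can be derived from the same starting point.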


### Prep workspace

In [1]:
import os, sys
GISFolder = os.getcwd()
GISFolder

'C:\\Users\\wb527163\\GEO-Cdrive-Grace'

In [2]:
# Note: needed to reinstall rtree due to geopandas import error. Did so in the console. 
# conda install -c conda-forge rtree=0.9.3

In [3]:
# load and filter osm network (step 1)
import geopandas as gpd
from geopandas import GeoDataFrame
import pandas as pd
import time
sys.path.append(r"C:\Users\wb527163\.conda\envs\geo\GOSTnets-master")
import GOSTnets as gn

In [4]:
import networkx as nx
import osmnx as ox
import numpy as np
import rasterio as rt
import shapely
from shapely.geometry import Point, box, LineString, MultiLineString
from shapely.ops import unary_union
from shapely.wkt import loads
from shapely import wkt
import peartree

In [5]:
#### Might not use these
import fiona
from osgeo import gdal
import importlib
import matplotlib.pyplot as plt
import subprocess, glob

In [6]:
pth = os.path.join(GISFolder, "SEN-Cdrive") # Personal folder system for running model.
pth

'C:\\Users\\wb527163\\GEO-Cdrive-Grace\\SEN-Cdrive'

In [7]:
out_pth = os.path.join(GISFolder, "SEN-Cdrive", "outputs") # For storing intermediate outputs from the model.
out_pth

'C:\\Users\\wb527163\\GEO-Cdrive-Grace\\SEN-Cdrive\\outputs'

In [8]:
team_pth = 'R:\\SEN\\GEO' # This is where the unmodified input data is stored. Finalized outputs also housed here.
team_pth

'R:\\SEN\\GEO'

### Prepare and clean the data

In [9]:
# Notes:
# OSM road network is in WGS84. Projected each dataset to match.
# Starting as CSV (dataframe) and deriving geometry from there tends to avoid read errors.

#### Previously: joined MapSPAM agricultural values ("spam") to the much higher resolution but empty cropland class of UMD landcover. Divided the ag value across all cropland landcover points ("LC") located within a spam cell.

#### Now, dealing with spatial gaps so that all MapSPAM values have a point of origin for the graph.

#### Pseudocode:
    4. agriculture.shp = Spatial join SPAM["val"] and SPAM["SID"] onto lc.shp (full outer join)
    5. Where ["SID"] = NULL, ["val"] = 0
    6. Where ["lc_crop"] = NULL:
        SPAM_centroid.shp = SPAM.shp to centroid (retain ["val"] field)
        SPAM_cent_subset.shp = Select SPAM_centroid based on lc_crop = NULL and export
        Append agriculture.shp with SPAM_cent_subset.shp
    7. ["lc_sum"] = Sum ["lcID"] per ["SID"]
    8. Divide SPAM value by lc_sum and assign it to landcover["spam_V"]
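
The pseudocode can be sketched on toy tables with pandas. This is an attribute-only illustration, assuming each land cover point already carries the `SID` of the SPAM cell it falls in (the real workflow gets that from a spatial join), and reading "Sum" in step 7 as a count of points per cell. The toy values and the `unmatched` variable are made up for the example.

```python
import pandas as pd

# Toy land cover points, each tagged with the SPAM cell (SID) it falls in.
lc = pd.DataFrame({"lcID": [1, 2, 3, 4, 5], "SID": [10, 10, 10, 20, 20]})
# Toy SPAM cells; cell 30 contains no cropland points.
spam_cells = pd.DataFrame({"SID": [10, 20, 30], "val": [300.0, 50.0, 80.0]})

# Steps 4-5: attach the cell value to every land cover point, 0 where none exists.
ag = lc.merge(spam_cells, on="SID", how="left")
ag["val"] = ag["val"].fillna(0)

# Steps 7-8: count points per cell and split the cell value evenly among them.
ag["lc_sum"] = ag.groupby("SID")["lcID"].transform("count")
ag["spam_V"] = ag["val"] / ag["lc_sum"]

# Step 6: cells without any land cover point keep their full value at a single point.
unmatched = spam_cells[~spam_cells["SID"].isin(lc["SID"])].copy()
unmatched["spam_V"] = unmatched["val"]
ag = pd.concat([ag, unmatched], ignore_index=True)

print(ag[["SID", "lcID", "spam_V"]])
```

A useful check on the real data is the same one that holds here: the sum of `spam_V` over all origins should equal the sum of the original SPAM cell values.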

In [9]:
spam = gpd.read_file(r"C:\Users\wb527163\GEO-Cdrive-Grace\SEN-Cdrive\scratch.gdb", layer="spam_poly_clip")
spam.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 2263 entries, 0 to 2262
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype   
---  ------        --------------  -----   
 0   Id            2263 non-null   int64   
 1   gridcode      2263 non-null   int64   
 2   Shape_Length  2263 non-null   float64 
 3   Shape_Area    2263 non-null   float64 
 4   geometry      2263 non-null   geometry
dtypes: float64(2), geometry(1), int64(2)
memory usage: 88.5 KB


In [10]:
spam.rename(columns={'Id':'ID_spam', 'gridcode':'grid_val'}, inplace=True)
spam = spam.drop(columns=['Shape_Length', 'Shape_Area'])
spam

Unnamed: 0,ID_spam,grid_val,geometry
0,1387,50698,"MULTIPOLYGON (((-12.91734 13.75031, -13.00067 ..."
1,2448,320005,"MULTIPOLYGON (((-11.91734 12.08364, -12.00067 ..."
2,365,4503,"MULTIPOLYGON (((-13.50067 15.41696, -13.50067 ..."
3,2150,152650,"MULTIPOLYGON (((-12.83400 12.50031, -12.91734 ..."
4,1961,105340,"MULTIPOLYGON (((-16.50065 12.75031, -16.58399 ..."
...,...,...,...
2258,695,468582,"MULTIPOLYGON (((-12.33400 14.75030, -12.41734 ..."
2259,1431,362959,"MULTIPOLYGON (((-14.08400 13.66697, -14.16733 ..."
2260,1319,232790,"MULTIPOLYGON (((-14.00066 13.83364, -14.08400 ..."
2261,905,1133888,"MULTIPOLYGON (((-15.58399 14.41697, -15.66732 ..."


In [11]:
agriculture = gpd.read_file(r"C:\Users\wb527163\GEO-Cdrive-Grace\SEN-Cdrive\scratch.gdb", layer="agriculture")
agriculture.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13941001 entries, 0 to 13941000
Data columns (total 5 columns):
 #   Column    Dtype   
---  ------    -----   
 0   ID_spam   float64 
 1   grid_val  float64 
 2   ID_LC     float64 
 3   val       float64 
 4   geometry  geometry
dtypes: float64(4), geometry(1)
memory usage: 531.8 MB


In [12]:
spam = spam.to_crs("EPSG:31028")
agriculture = agriculture.to_crs("EPSG:31028")
spam.crs == agriculture.crs

True

In [13]:
print(agriculture['ID_spam'].isna().sum())

0


In [18]:
# Not relevant because land cover was already clipped by non-null spam in Arc. 
# agriculture['val'] = agriculture['val'].fillna(1) # Assign a value of one to non-matches.
# agriculture

Unnamed: 0,x,y,ID_lc,geometry,index__spam,ID_spam,val
0,-15.990057,16.693707,1,POINT (394264.774 1845856.132),,,1.0
1,-15.989787,16.693707,2,POINT (394293.510 1845855.989),,,1.0
2,-15.989518,16.693707,3,POINT (394322.246 1845855.846),,,1.0
3,-15.925378,16.693707,4,POINT (401161.317 1845823.013),,,1.0
4,-15.925109,16.693707,5,POINT (401190.052 1845822.879),,,1.0
...,...,...,...,...,...,...,...
7654994,-15.940200,15.171060,7654995,POINT (398822.078 1677384.290),458.0,459.0,670742.0
7654995,-15.939930,15.171060,7654996,POINT (398851.085 1677384.166),458.0,459.0,670742.0
7654996,-15.939660,15.171060,7654997,POINT (398880.092 1677384.041),458.0,459.0,670742.0
7654997,-15.939390,15.171060,7654998,POINT (398909.100 1677383.917),458.0,459.0,670742.0


#### Land cover that doesn't have a SPAM grid was assigned a value of 1. Now need to provide points for SPAM data where there is no land cover pixel.
#### Create xy fields from spam centroids now for later.  Need WGS84 points for all SPAM grids that don't have any land cover points to map to.

In [25]:
spam_xy = spam.to_crs("EPSG:31028") # Geographic CRSs can cause incorrect centroid measures. Use projected.
spam_xy['x'] = spam_xy.centroid.x
spam_xy['y'] = spam_xy.centroid.y
spam_xy.to_csv(os.path.join(out_pth, 'spam_xy.csv'))

In [14]:
# Create new point dataframe from centroids.
spam_xy = os.path.join(out_pth, "spam_xy.csv")
spam_xy = pd.read_csv(spam_xy)
geometry = [Point(xy) for xy in zip(spam_xy.x, spam_xy.y)]
crs = "EPSG:31028"
spam_xy = GeoDataFrame(spam_xy, crs=crs, geometry=geometry) 
spam_xy

# 2263 spam cells.

Unnamed: 0.1,Unnamed: 0,ID_spam,grid_val,geometry,x,y
0,0,1387,50698,POINT (720473.131 1525577.307),720473.130737,1.525577e+06
1,1,2448,320005,POINT (830855.283 1342176.354),830855.283210,1.342176e+06
2,2,365,4503,POINT (665153.151 1709568.014),665153.150572,1.709568e+06
3,3,2150,152650,POINT (730654.504 1387334.758),730654.503815,1.387335e+06
4,4,1961,105340,POINT (332415.367 1414550.077),332415.367100,1.414550e+06
...,...,...,...,...,...,...
2258,2258,695,468582,POINT (782318.406 1636890.692),782318.405503,1.636891e+06
2259,2259,1431,362959,POINT (594366.185 1515594.966),594366.185263,1.515595e+06
2260,2260,1319,232790,POINT (603304.222 1534063.640),603304.222164,1.534064e+06
2261,2261,905,1133888,POINT (432393.633 1598462.951),432393.633494,1.598463e+06


In [15]:
# But we need the xy fields to be in WGS84 for snapping to the graph and matching origins table, so we re-project again.
spam_xy = spam_xy.to_crs("EPSG:4326")
spam_xy["geom_WGS84"] = spam_xy["geometry"].astype('str')

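# Strip the WKT wrapper from the geometry string. Note that str.strip removes any of the
# listed characters from both ends (not a literal prefix). That is safe here because the
# coordinates never begin or end with one of those characters.
spam_xy["geom_WGS84"] = spam_xy["geom_WGS84"].str.strip('POINT ') 
spam_xy["geom_WGS84"] = spam_xy["geom_WGS84"].str.strip('()')

# Split the column into two based on the space between the coordinates.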
XY_spam = spam_xy["geom_WGS84"].str.split(" ", expand=True)
spam_xy["X"] = XY_spam[0]
spam_xy["Y"] = XY_spam[1]

spam_xy.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 2263 entries, 0 to 2262
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   Unnamed: 0  2263 non-null   int64   
 1   ID_spam     2263 non-null   int64   
 2   grid_val    2263 non-null   int64   
 3   geometry    2263 non-null   geometry
 4   x           2263 non-null   float64 
 5   y           2263 non-null   float64 
 6   geom_WGS84  2263 non-null   object  
 7   X           2263 non-null   object  
 8   Y           2263 non-null   object  
dtypes: float64(2), geometry(1), int64(3), object(3)
memory usage: 159.2+ KB
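
The `str.strip` calls in the cell above work because the coordinate text contains only digits, a minus sign, and a decimal point. A stricter way to pull the coordinates out of a WKT point string is sketched below with the standard library; the helper `parse_point_wkt` is hypothetical and handles plain decimal notation only.

```python
import re

# Matches "POINT (x y)" with optional sign and decimals; exponents are not handled.
_POINT_RE = re.compile(
    r"^POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$"
)


def parse_point_wkt(text):
    """Return (x, y) as floats from a WKT point string, or raise ValueError."""
    match = _POINT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {text!r}")
    return float(match.group(1)), float(match.group(2))


x, y = parse_point_wkt("POINT (-12.95900 13.79197)")
print(x, y)
```

For point geometries, geopandas also exposes the coordinates directly as floats through `spam_xy.geometry.x` and `spam_xy.geometry.y`, which avoids the string round trip and the later `astype(float)` step.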


In [16]:
spam_xy['X'] = spam_xy['X'].astype(float)
spam_xy['Y'] = spam_xy['Y'].astype(float)
spam_xy.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 2263 entries, 0 to 2262
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   Unnamed: 0  2263 non-null   int64   
 1   ID_spam     2263 non-null   int64   
 2   grid_val    2263 non-null   int64   
 3   geometry    2263 non-null   geometry
 4   x           2263 non-null   float64 
 5   y           2263 non-null   float64 
 6   geom_WGS84  2263 non-null   object  
 7   X           2263 non-null   float64 
 8   Y           2263 non-null   float64 
dtypes: float64(4), geometry(1), int64(3), object(1)
memory usage: 159.2+ KB


In [17]:
spam_xy

Unnamed: 0.1,Unnamed: 0,ID_spam,grid_val,geometry,x,y,geom_WGS84,X,Y
0,0,1387,50698,POINT (-12.95900 13.79197),720473.130737,1.525577e+06,-12.959001 13.791973,-12.959001,13.791973
1,1,2448,320005,POINT (-11.95901 12.12531),830855.283210,1.342176e+06,-11.959005 12.125313,-11.959005,12.125313
2,2,365,4503,POINT (-13.45900 15.45863),665153.150572,1.709568e+06,-13.458999 15.458633,-13.458999,15.458633
3,3,2150,152650,POINT (-12.87567 12.54198),730654.503815,1.387335e+06,-12.875668 12.541978,-12.875668,12.541978
4,4,1961,105340,POINT (-16.54232 12.79198),332415.367100,1.414550e+06,-16.542321 12.791977,-16.542321,12.791977
...,...,...,...,...,...,...,...,...,...
2258,2258,695,468582,POINT (-12.37567 14.79197),782318.405503,1.636891e+06,-12.37567 14.791969,-12.375670,14.791969
2259,2259,1431,362959,POINT (-14.12566 13.70864),594366.185263,1.515595e+06,-14.125663 13.70864,-14.125663,13.708640
2260,2260,1319,232790,POINT (-14.04233 13.87531),603304.222164,1.534064e+06,-14.04233 13.875306,-14.042330,13.875306
2261,2261,905,1133888,POINT (-15.62566 14.45864),432393.633494,1.598463e+06,-15.625658 14.458637,-15.625658,14.458637


#### This is how we still retain SPAM values where there was no land cover cell to represent it. 
#### There may be a more efficient method here. sjoin does not allow full outer joins.

In [18]:
# Find which SPAM cells were not joined to land cover, create a point for them, and append them to the origins (agriculture) dataset.
spam_nomatch = spam[~spam.ID_spam.isin(agriculture.ID_spam)]
spam_nomatch

# 123 cells without corresponding land cover points.

Unnamed: 0,ID_spam,grid_val,geometry
1,2448,320005,"MULTIPOLYGON (((835446.983 1337613.654, 826366..."
25,2384,886860,"MULTIPOLYGON (((744591.874 1345950.163, 735519..."
54,2353,1306428,"MULTIPOLYGON (((436289.577 1345005.377, 427223..."
62,2389,1011610,"MULTIPOLYGON (((789960.190 1346360.586, 780885..."
90,2497,2377126,"MULTIPOLYGON (((790129.495 1328999.489, 790050..."
...,...,...,...
2180,5,54822,"MULTIPOLYGON (((464226.016 1851883.675, 455343..."
2211,2383,290657,"MULTIPOLYGON (((726431.713 1347928.814, 726377..."
2234,2382,743827,"MULTIPOLYGON (((726431.713 1347928.814, 717340..."
2238,2345,94605,"MULTIPOLYGON (((354684.661 1345293.292, 345616..."
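
The `isin` filter above is an anti-join on `ID_spam`. An equivalent attribute-level alternative, sketched here on toy tables, is an outer `merge` with `indicator=True`, which labels each row by its source. This is not a spatial join, and the table contents are invented for the example.

```python
import pandas as pd

cells = pd.DataFrame({"ID_spam": [1, 2, 3, 4], "grid_val": [10, 20, 30, 40]})
points = pd.DataFrame({"ID_spam": [1, 1, 3], "ID_LC": [100, 101, 102]})

# Outer merge with an indicator column recording where each row came from.
merged = cells.merge(points, on="ID_spam", how="outer", indicator=True)

# Rows present only in `cells` are SPAM cells with no land cover point.
no_match = merged.loc[merged["_merge"] == "left_only", ["ID_spam", "grid_val"]]
print(no_match)
```

On a table with millions of points the `isin` filter is likely the lighter option, since the outer merge materializes every matched row before filtering.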


In [19]:
spam_nomatch = pd.merge(spam_nomatch, spam_xy, on='ID_spam',how='left') # Add the WGS84 xy fields created earlier only for SPAM grids that didn't have land cover points.
spam_nomatch
# Resulting dataset should have same number of rows as earlier spam_nomatch table.

Unnamed: 0.1,ID_spam,grid_val_x,geometry_x,Unnamed: 0,grid_val_y,geometry_y,x,y,geom_WGS84,X,Y
0,2448,320005,"MULTIPOLYGON (((835446.983 1337613.654, 826366...",1,320005,POINT (-11.95901 12.12531),830855.283210,1.342176e+06,-11.959005 12.125313,-11.959005,12.125313
1,2384,886860,"MULTIPOLYGON (((744591.874 1345950.163, 735519...",25,886860,POINT (-12.79234 12.20865),740018.245292,1.350524e+06,-12.792335 12.208646,-12.792335,12.208646
2,2353,1306428,"MULTIPOLYGON (((436289.577 1345005.377, 427223...",54,1306428,POINT (-15.62566 12.20865),431766.963707,1.349623e+06,-15.625658 12.208646,-15.625658,12.208646
3,2389,1011610,"MULTIPOLYGON (((789960.190 1346360.586, 780885...",62,1011610,POINT (-12.37567 12.20865),785378.175609,1.350928e+06,-12.37567 12.208646,-12.375670,12.208646
4,2497,2377126,"MULTIPOLYGON (((790129.495 1328999.489, 790050...",90,2377126,POINT (-12.29351 12.04963),794499.340290,1.333415e+06,-12.293506 12.049633,-12.293506,12.049633
...,...,...,...,...,...,...,...,...,...,...,...
118,5,54822,"MULTIPOLYGON (((464226.016 1851883.675, 455343...",2180,54822,POINT (-15.37566 16.79196),459793.687462,1.856501e+06,-15.375659 16.791961,-15.375659,16.791961
119,2383,290657,"MULTIPOLYGON (((726431.713 1347928.814, 726377...",2211,290657,POINT (-12.87370 12.21204),731159.466492,1.350829e+06,-12.873696 12.212042,-12.873696,12.212042
120,2382,743827,"MULTIPOLYGON (((726431.713 1347928.814, 717340...",2234,743827,POINT (-12.95561 12.22402),722233.491819,1.352085e+06,-12.955614 12.224018,-12.955614,12.224018
121,2345,94605,"MULTIPOLYGON (((354684.661 1345293.292, 345616...",2238,94605,POINT (-16.37565 12.20865),350173.846532,1.349925e+06,-16.375655 12.208646,-16.375655,12.208646


In [20]:
spam_nomatch.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 123 entries, 0 to 122
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   ID_spam     123 non-null    int64   
 1   grid_val_x  123 non-null    int64   
 2   geometry_x  123 non-null    geometry
 3   Unnamed: 0  123 non-null    int64   
 4   grid_val_y  123 non-null    int64   
 5   geometry_y  123 non-null    geometry
 6   x           123 non-null    float64 
 7   y           123 non-null    float64 
 8   geom_WGS84  123 non-null    object  
 9   X           123 non-null    float64 
 10  Y           123 non-null    float64 
dtypes: float64(4), geometry(2), int64(4), object(1)
memory usage: 11.5+ KB


In [21]:
# Clean up the no match dataset prior to appending to final origins file. Make sure XY fields are exact matches, including case.
spam_nomatch = spam_nomatch.drop(columns=['Unnamed: 0', 'grid_val_y', 'geom_WGS84', 'x', 'y'])
spam_nomatch.rename(columns={'grid_val_x':'grid_val', 'geometry_x':'geom_UTM', 'geometry_y':'geom_WGS84', 'X':'x', 'Y':'y'}, inplace=True)
spam_nomatch.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 123 entries, 0 to 122
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   ID_spam     123 non-null    int64   
 1   grid_val    123 non-null    int64   
 2   geom_UTM    123 non-null    geometry
 3   geom_WGS84  123 non-null    geometry
 4   x           123 non-null    float64 
 5   y           123 non-null    float64 
dtypes: float64(2), geometry(2), int64(2)
memory usage: 6.7 KB


In [22]:
# Just in case, I want to reload the no matches as a regular dataframe and make sure all data types are appropriate.
spam_nomatch.to_csv(os.path.join(out_pth, 'spam_nomatch.csv'))
spam_nomatch = os.path.join(out_pth, "spam_nomatch.csv")
spam_nomatch = pd.read_csv(spam_nomatch)
spam_nomatch.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 123 entries, 0 to 122
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  123 non-null    int64  
 1   ID_spam     123 non-null    int64  
 2   grid_val    123 non-null    int64  
 3   geom_UTM    123 non-null    object 
 4   geom_WGS84  123 non-null    object 
 5   x           123 non-null    float64
 6   y           123 non-null    float64
dtypes: float64(2), int64(3), object(2)
memory usage: 6.9+ KB


In [23]:
agriculture = agriculture.to_crs("EPSG:4326")

In [24]:
agriculture["geom_WGS84"] = agriculture["geometry"].astype('str')

# Strip the WKT wrapper from the geometry string. str.strip removes any of the listed
# characters from both ends (not a literal prefix), which is safe for numeric coordinates.
agriculture["geom_WGS84"] = agriculture["geom_WGS84"].str.strip('POINT ') 
agriculture["geom_WGS84"] = agriculture["geom_WGS84"].str.strip('()')

# Split the column into two based on the space between the coordinates.
XY_ag = agriculture["geom_WGS84"].str.split(" ", expand=True)
agriculture["x"] = XY_ag[0]
agriculture["y"] = XY_ag[1]

agriculture['x'] = agriculture['x'].astype(float)
agriculture['y'] = agriculture['y'].astype(float)
agriculture.info()
agriculture

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13941001 entries, 0 to 13941000
Data columns (total 8 columns):
 #   Column      Dtype   
---  ------      -----   
 0   ID_spam     float64 
 1   grid_val    float64 
 2   ID_LC       float64 
 3   val         float64 
 4   geometry    geometry
 5   geom_WGS84  object  
 6   x           float64 
 7   y           float64 
dtypes: float64(6), geometry(1), object(1)
memory usage: 850.9+ MB


Unnamed: 0,ID_spam,grid_val,ID_LC,val,geometry,geom_WGS84,x,y
0,9.0,2487.0,13787187.0,4.236797,POINT (-15.56250 16.67094),-15.562504 16.670935,-15.562504,16.670935
1,9.0,2487.0,13818608.0,4.236797,POINT (-15.55442 16.68225),-15.554419 16.682254,-15.554419,16.682254
2,9.0,2487.0,13787063.0,4.236797,POINT (-15.55981 16.67686),-15.559809 16.676864,-15.559809,16.676864
3,9.0,2487.0,13787188.0,4.236797,POINT (-15.56196 16.67094),-15.561965 16.670935,-15.561965,16.670935
4,9.0,2487.0,13787225.0,4.236797,POINT (-15.57113 16.66824),-15.571128 16.66824,-15.571128,16.668240
...,...,...,...,...,...,...,...,...
13940996,2342.0,455853.0,249666.0,45585.300000,POINT (-11.49260 12.33046),-11.492597 12.330455,-11.492597,12.330455
13940997,2342.0,455853.0,249665.0,45585.300000,POINT (-11.49314 12.33046),-11.493136 12.330455,-11.493136,12.330455
13940998,2342.0,455853.0,249664.0,45585.300000,POINT (-11.48882 12.33207),-11.488824 12.332072,-11.488824,12.332072
13940999,2342.0,455853.0,249661.0,45585.300000,POINT (-11.48936 12.33261),-11.489363 12.332611,-11.489363,12.332611


In [25]:
print(agriculture['x'].isna().sum())

0


In [26]:
agriculture = pd.concat([agriculture, spam_nomatch]) # DataFrame.append was removed in pandas 2.0; concat keeps each table's original index.
agriculture
# Make sure XY fields appended onto the agriculture XY fields.

Unnamed: 0.1,ID_spam,grid_val,ID_LC,val,geometry,geom_WGS84,x,y,Unnamed: 0,geom_UTM
0,9.0,2487.0,13787187.0,4.236797,POINT (-15.56250 16.67094),-15.562504 16.670935,-15.562504,16.670935,,
1,9.0,2487.0,13818608.0,4.236797,POINT (-15.55442 16.68225),-15.554419 16.682254,-15.554419,16.682254,,
2,9.0,2487.0,13787063.0,4.236797,POINT (-15.55981 16.67686),-15.559809 16.676864,-15.559809,16.676864,,
3,9.0,2487.0,13787188.0,4.236797,POINT (-15.56196 16.67094),-15.561965 16.670935,-15.561965,16.670935,,
4,9.0,2487.0,13787225.0,4.236797,POINT (-15.57113 16.66824),-15.571128 16.66824,-15.571128,16.668240,,
...,...,...,...,...,...,...,...,...,...,...
118,5.0,54822.0,,,,POINT (-15.375658562426967 16.791960774825146),-15.375659,16.791961,118.0,MULTIPOLYGON (((464226.0159186065 1851883.6752...
119,2383.0,290657.0,,,,POINT (-12.873695714743537 12.212042461981278),-12.873696,12.212042,119.0,MULTIPOLYGON (((726431.7126558168 1347928.8139...
120,2382.0,743827.0,,,,POINT (-12.955613789985883 12.224017589494675),-12.955614,12.224018,120.0,MULTIPOLYGON (((726431.7126558168 1347928.8139...
121,2345.0,94605.0,,,,POINT (-16.37565473031309 12.2086455435392),-16.375655,12.208646,121.0,MULTIPOLYGON (((354684.6609952422 1345293.2921...


In [27]:
agriculture.info()
# 13,941,124 rows in the result. Subtracting the 123 no-match rows gives 13,941,001, the number of rows in the original agriculture dataframe.

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 13941124 entries, 0 to 122
Data columns (total 10 columns):
 #   Column      Dtype   
---  ------      -----   
 0   ID_spam     float64 
 1   grid_val    float64 
 2   ID_LC       float64 
 3   val         float64 
 4   geometry    geometry
 5   geom_WGS84  object  
 6   x           float64 
 7   y           float64 
 8   Unnamed: 0  float64 
 9   geom_UTM    object  
dtypes: float64(7), geometry(1), object(2)
memory usage: 1.1+ GB


In [28]:
agriculture = agriculture.drop(columns=['geometry','geom_UTM', 'geom_WGS84'])
agriculture['x'] = agriculture['x'].astype(float) # Just in case.
agriculture['y'] = agriculture['y'].astype(float)
agriculture['ID_LC'] = agriculture['ID_LC'].astype(float)
agriculture['ID_spam'] = agriculture['ID_spam'].astype(float)
agriculture['val'] = agriculture['val'].astype(float)

In [29]:
# Give a dummy ID for SPAM locations that don't have land cover pixels, and vice versa.
agriculture['ID_LC'] = agriculture['ID_LC'].fillna(0.1)
agriculture['ID_spam'] = agriculture['ID_spam'].fillna(0.1)

In [30]:
agriculture.head(40)

Unnamed: 0.1,ID_spam,grid_val,ID_LC,val,x,y,Unnamed: 0
0,9.0,2487.0,13787187.0,4.236797,-15.562504,16.670935,
1,9.0,2487.0,13818608.0,4.236797,-15.554419,16.682254,
2,9.0,2487.0,13787063.0,4.236797,-15.559809,16.676864,
3,9.0,2487.0,13787188.0,4.236797,-15.561965,16.670935,
4,9.0,2487.0,13787225.0,4.236797,-15.571128,16.66824,
5,9.0,2487.0,13787138.0,4.236797,-15.561965,16.673091,
6,9.0,2487.0,13787238.0,4.236797,-15.581368,16.667162,
7,9.0,2487.0,13818609.0,4.236797,-15.55388,16.682254,
8,9.0,2487.0,13818411.0,4.236797,-15.544717,16.691956,
9,9.0,2487.0,13787224.0,4.236797,-15.571667,16.66824,


In [31]:
agriculture['val'] = agriculture['val'].fillna(0)
agriculture.head(30)

Unnamed: 0.1,ID_spam,grid_val,ID_LC,val,x,y,Unnamed: 0
0,9.0,2487.0,13787187.0,4.236797,-15.562504,16.670935,
1,9.0,2487.0,13818608.0,4.236797,-15.554419,16.682254,
2,9.0,2487.0,13787063.0,4.236797,-15.559809,16.676864,
3,9.0,2487.0,13787188.0,4.236797,-15.561965,16.670935,
4,9.0,2487.0,13787225.0,4.236797,-15.571128,16.66824,
5,9.0,2487.0,13787138.0,4.236797,-15.561965,16.673091,
6,9.0,2487.0,13787238.0,4.236797,-15.581368,16.667162,
7,9.0,2487.0,13818609.0,4.236797,-15.55388,16.682254,
8,9.0,2487.0,13818411.0,4.236797,-15.544717,16.691956,
9,9.0,2487.0,13787224.0,4.236797,-15.571667,16.66824,


In [32]:
agriculture.tail(30)

Unnamed: 0.1,ID_spam,grid_val,ID_LC,val,x,y,Unnamed: 0
93,2356.0,1880419.0,0.1,0.0,-15.379193,12.217335,93.0
94,398.0,18901.0,0.1,0.0,-13.542333,15.381391,94.0
95,1683.0,329033.0,0.1,0.0,-11.318076,13.362432,95.0
96,2407.0,854693.0,0.1,0.0,-15.792235,12.125412,96.0
97,2444.0,1651475.0,0.1,0.0,-12.292337,12.125313,97.0
98,2442.0,2134537.0,0.1,0.0,-12.459003,12.125313,98.0
99,2451.0,426432.0,0.1,0.0,-11.710119,12.131277,99.0
100,72.0,11654.0,0.1,0.0,-14.04233,16.541962,100.0
101,2001.0,9111.0,0.1,0.0,-12.125671,12.791977,101.0
102,2344.0,2008851.0,0.1,0.0,-16.458988,12.208646,102.0


In [35]:
# Transfer grid values from spam to point values column. Should have done this via name change before appending.
agriculture.reset_index(inplace=True)
agriculture.loc[agriculture['val'] == 0, 'val'] = agriculture['grid_val']
agriculture.tail(10)

Unnamed: 0.1,index,ID_spam,grid_val,ID_LC,val,x,y,Unnamed: 0
13941114,113,7.0,901.0,0.1,901.0,-14.875661,16.785873,113.0
13941115,114,2053.0,215844.0,0.1,215844.0,-11.209008,12.708644,114.0
13941116,115,2003.0,215284.0,0.1,215284.0,-11.292341,12.791977,115.0
13941117,116,2357.0,1024815.0,0.1,1024815.0,-15.299521,12.233041,116.0
13941118,117,2501.0,407669.0,0.1,407669.0,-11.96258,12.072311,117.0
13941119,118,5.0,54822.0,0.1,54822.0,-15.375659,16.791961,118.0
13941120,119,2383.0,290657.0,0.1,290657.0,-12.873696,12.212042,119.0
13941121,120,2382.0,743827.0,0.1,743827.0,-12.955614,12.224018,120.0
13941122,121,2345.0,94605.0,0.1,94605.0,-16.375655,12.208646,121.0
13941123,122,2443.0,1451706.0,0.1,1451706.0,-12.37567,12.125313,122.0


In [36]:
agriculture.head(20)

Unnamed: 0.1,index,ID_spam,grid_val,ID_LC,val,x,y,Unnamed: 0
0,0,9.0,2487.0,13787187.0,4.236797,-15.562504,16.670935,
1,1,9.0,2487.0,13818608.0,4.236797,-15.554419,16.682254,
2,2,9.0,2487.0,13787063.0,4.236797,-15.559809,16.676864,
3,3,9.0,2487.0,13787188.0,4.236797,-15.561965,16.670935,
4,4,9.0,2487.0,13787225.0,4.236797,-15.571128,16.66824,
5,5,9.0,2487.0,13787138.0,4.236797,-15.561965,16.673091,
6,6,9.0,2487.0,13787238.0,4.236797,-15.581368,16.667162,
7,7,9.0,2487.0,13818609.0,4.236797,-15.55388,16.682254,
8,8,9.0,2487.0,13818411.0,4.236797,-15.544717,16.691956,
9,9,9.0,2487.0,13787224.0,4.236797,-15.571667,16.66824,


In [37]:
agriculture.to_csv(os.path.join(out_pth, 'agriculture.csv')) # CSV circumvents some gdf issues later on.
# Note that it will take a few min to write to file.

### Origins and destinations

Measure distance from origin/destination to nearest node and save to file.

In [10]:
#%% If starting new session, reload graph from file
gTime = nx.read_gpickle("SEN-Cdrive/gTime.pickle")

In [11]:
gn.example_edge(gTime)

(358284990, 5217543379, {'osmid': 59618174, 'ref': 'D 523', 'name': 'D 523', 'highway': 'unclassified', 'oneway': False, 'length': 33.127, 'time': 2.3851440000000004, 'mode': 'drive'})


In [12]:
# pandana_snap_c was giving an Index Error. Resetting the index didn't work.
# Re-loading seemed to resolve whatever the issue was.
agriculture = os.path.join(out_pth, "agriculture.csv")
agriculture = pd.read_csv(agriculture)
crs = "EPSG:4326"
geometry = [Point(xy) for xy in zip(agriculture.x, agriculture.y)]
agriculture = GeoDataFrame(agriculture, crs=crs, geometry=geometry) 
agriculture.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13941124 entries, 0 to 13941123
Data columns (total 10 columns):
 #   Column        Dtype   
---  ------        -----   
 0   Unnamed: 0    int64   
 1   index         int64   
 2   ID_spam       float64 
 3   grid_val      float64 
 4   ID_LC         float64 
 5   val           float64 
 6   x             float64 
 7   y             float64 
 8   Unnamed: 0.1  float64 
 9   geometry      geometry
dtypes: float64(7), geometry(1), int64(2)
memory usage: 1.0 GB


In [13]:
agriculture.head()

Unnamed: 0.2,Unnamed: 0,index,ID_spam,grid_val,ID_LC,val,x,y,Unnamed: 0.1,geometry
0,0,0,9.0,2487.0,13787187.0,4.236797,-15.562504,16.670935,,POINT (-15.56250 16.67094)
1,1,1,9.0,2487.0,13818608.0,4.236797,-15.554419,16.682254,,POINT (-15.55442 16.68225)
2,2,2,9.0,2487.0,13787063.0,4.236797,-15.559809,16.676864,,POINT (-15.55981 16.67686)
3,3,3,9.0,2487.0,13787188.0,4.236797,-15.561965,16.670935,,POINT (-15.56197 16.67094)
4,4,4,9.0,2487.0,13787225.0,4.236797,-15.571128,16.66824,,POINT (-15.57113 16.66824)


In [14]:
agriculture.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [15]:
print('start: %s\n' % time.ctime())
ag_snap = gn.pandana_snap_c(gTime, agriculture, source_crs = 'epsg:4326', target_crs = 'epsg:31028', add_dist_to_node_col = True)
ag_snap.to_csv('SEN-Cdrive/outputs/ag_snap.csv', index=True)
print('\nend: %s' % time.ctime())
print('\n--- processing complete')
ag_snap

start: Thu Dec 23 12:03:59 2021


end: Thu Dec 23 12:27:21 2021

--- processing complete


Unnamed: 0.2,Unnamed: 0,index,ID_spam,grid_val,ID_LC,val,x,y,Unnamed: 0.1,geometry,NN,NN_dist
0,0,0,9.0,2487.0,13787187.0,4.236797e+00,-15.562504,16.670935,,POINT (-15.56250 16.67094),3656618126,5113.670395
1,1,1,9.0,2487.0,13818608.0,4.236797e+00,-15.554419,16.682254,,POINT (-15.55442 16.68225),3656617769,5632.260552
2,2,2,9.0,2487.0,13787063.0,4.236797e+00,-15.559809,16.676864,,POINT (-15.55981 16.67686),3656617751,5453.608825
3,3,3,9.0,2487.0,13787188.0,4.236797e+00,-15.561965,16.670935,,POINT (-15.56197 16.67094),3656618126,5075.881917
4,4,4,9.0,2487.0,13787225.0,4.236797e+00,-15.571128,16.668240,,POINT (-15.57113 16.66824),3656618126,5568.582130
...,...,...,...,...,...,...,...,...,...,...,...,...
13941119,13941119,118,5.0,54822.0,0.1,5.482200e+04,-15.375659,16.791961,118.0,POINT (-15.37566 16.79196),3651042393,4280.425149
13941120,13941120,119,2383.0,290657.0,0.1,2.906570e+05,-12.873696,12.212042,119.0,POINT (-12.87370 12.21204),4818951142,9019.155622
13941121,13941121,120,2382.0,743827.0,0.1,7.438270e+05,-12.955614,12.224018,120.0,POINT (-12.95561 12.22402),4618077921,10786.551522
13941122,13941122,121,2345.0,94605.0,0.1,9.460500e+04,-16.375655,12.208646,121.0,POINT (-16.37565 12.20865),3507831510,2881.578459
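
The `NN` and `NN_dist` columns hold the ID of the nearest road node and the distance to it. The brute-force sketch below only illustrates what those two values represent; it is not how `gn.pandana_snap_c` is implemented, and the function name and toy coordinates are hypothetical. Coordinates are assumed to be in a projected CRS so that distances come out in metres.

```python
import math


def snap_to_nearest_node(points, nodes):
    """Return {point_id: (node_id, distance)} using straight-line distance.

    points, nodes: dicts mapping an id to (x, y) in a PROJECTED CRS (metres).
    """
    result = {}
    for pid, (px, py) in points.items():
        best_id, best_dist = None, math.inf
        for nid, (node_x, node_y) in nodes.items():
            dist = math.hypot(px - node_x, py - node_y)
            if dist < best_dist:
                best_id, best_dist = nid, dist
        result[pid] = (best_id, best_dist)
    return result


nodes = {"n1": (0.0, 0.0), "n2": (1000.0, 0.0)}
points = {"p1": (300.0, 400.0), "p2": (900.0, 0.0)}
snapped = snap_to_nearest_node(points, nodes)
print(snapped)
```

Comparing every point with every node is far too slow for roughly 14 million origins, which is why a spatial index is the practical choice at that scale.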


### Create travel time values for the road nodes nearest to each service.

Using calculate_OD.

In [16]:
# If starting a new session, load from file.
HDurban_snap = os.path.join(out_pth, "HDurban_snap.csv")
HDurban_snap = pd.read_csv(HDurban_snap)

In [None]:
ag_snap = os.path.join(out_pth, "ag_snap.csv")
ag_snap = pd.read_csv(ag_snap)

In [44]:
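# Note: nx.read_gpickle was removed in NetworkX 3.0; on newer versions, load the file with pickle.load instead.
gTime = nx.read_gpickle("SEN-Cdrive/outputs/gTime.pickle")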

In [17]:
# We only need to find the origin-destination pairs for nodes closest to the origins and services,
# and some nodes will be the nearest for more than one service (and definitely for multiple origins).
origins = list(ag_snap.NN.unique())

In [18]:
list_HDurban = list(HDurban_snap.NN.unique()) 
dests = list_HDurban

# If more than one set of destinations, you can combine the sets to go into a single OD matrix using this code:
# list_ssa = list(ssa_snap.NN.unique()) 
# destslist = list_HDurban + list_ssa
# dests = list(set(destslist))

In [19]:
len(origins) # Number of unique nearest nodes among the origins; many origins share the same node.

625851

In [20]:
len(dests) # Number of unique nearest nodes among the destinations.

58

In [21]:
fail_value = 999999999 # If there is no shortest path, the OD pair will be assigned the fail value.

In [22]:
gn.example_edge(gTime, 11)

(358284990, 5217543379, {'osmid': 59618174, 'ref': 'D 523', 'name': 'D 523', 'highway': 'unclassified', 'oneway': False, 'length': 33.127, 'time': 2.3851440000000004, 'mode': 'drive'})
(358284990, 1888282175, {'osmid': 178482063, 'highway': 'tertiary', 'oneway': False, 'length': 12.832, 'time': 0.76992, 'mode': 'drive'})
(358284990, 5329792467, {'osmid': 178482063, 'highway': 'tertiary', 'oneway': False, 'length': 48.781, 'time': 2.92686, 'mode': 'drive'})
(358284993, 1888282575, {'osmid': 178470940, 'highway': 'tertiary', 'oneway': False, 'length': 126.577, 'time': 7.59462, 'mode': 'drive'})
(358284993, 1888198886, {'osmid': 178470940, 'highway': 'tertiary', 'oneway': False, 'length': 120.578, 'time': 7.23468, 'mode': 'drive'})
(358284994, 6424643487, {'osmid': 178470940, 'highway': 'tertiary', 'oneway': False, 'length': 46.582, 'time': 2.79492, 'mode': 'drive'})
(358284994, 1888282618, {'osmid': 178470940, 'highway': 'tertiary', 'oneway': False, 'length': 53.36, 'time': 3.2016, 'mode

In [23]:
OD = gn.calculate_OD(gTime, origins, dests, fail_value, weight = 'time')
# This takes a few minutes.
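
For readers unfamiliar with what an OD matrix holds, here is a minimal, dependency-free sketch of the idea. It is not the implementation behind `gn.calculate_OD`; the toy graph, node IDs, and the helper names `shortest_times` and `od_matrix` are invented for illustration. Each row is an origin node, each column a destination node, and unreachable pairs receive the fail value.

```python
import heapq

def shortest_times(adj, source):
    """Dijkstra: travel time (seconds) from source to every reachable node."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for nbr, edge_time in adj.get(node, {}).items():
            cand = t + edge_time
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                heapq.heappush(queue, (cand, nbr))
    return best

def od_matrix(adj, origins, dests, fail_value):
    """One row per origin, one column per destination."""
    rows = []
    for o in origins:
        times = shortest_times(adj, o)
        rows.append([times.get(d, fail_value) for d in dests])
    return rows

# Hypothetical toy network: edge weights play the role of the 'time' attribute.
toy_adj = {
    1: {2: 60.0, 3: 300.0},
    2: {1: 60.0, 3: 120.0},
    3: {1: 300.0, 2: 120.0},
    4: {},  # isolated node: no path to any destination
}
toy_od = od_matrix(toy_adj, origins=[1, 4], dests=[3, 2], fail_value=999999999)
```

In this toy case the route from node 1 to node 3 goes through node 2 because 60 + 120 seconds beats the direct 300-second edge, and every entry for the isolated node 4 is the fail value.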

In [24]:
OD_df = pd.DataFrame(OD, index = origins, columns = dests)

In [25]:
OD_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 625851 entries, 3656618126 to 3882487056
Data columns (total 58 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   6058226279  625851 non-null  float64
 1   6029307183  625851 non-null  float64
 2   4998093094  625851 non-null  float64
 3   2201506815  625851 non-null  float64
 4   3474499811  625851 non-null  float64
 5   1697006012  625851 non-null  float64
 6   1901689169  625851 non-null  float64
 7   6032060028  625851 non-null  float64
 8   6040927878  625851 non-null  float64
 9   3449495495  625851 non-null  float64
 10  3990543961  625851 non-null  float64
 11  8972391475  625851 non-null  float64
 12  3418418812  625851 non-null  float64
 13  1983641803  625851 non-null  float64
 14  6014451367  625851 non-null  float64
 15  6027163276  625851 non-null  float64
 16  2833577858  625851 non-null  float64
 17  4656728818  625851 non-null  float64
 18  6045659373  625851 non-null  fl

In [26]:
OD_df.tail()

Unnamed: 0,6058226279,6029307183,4998093094,2201506815,3474499811,1697006012,1901689169,6032060028,6040927878,3449495495,...,1968458114,1936967272,3496518021,6027615161,6027276892,6041228287,5536661253,7357630367,8178147277,6026834850
3691541254,32667.287962,24304.383961,33719.340799,28806.55557,25967.00787,35154.292086,36467.971926,35440.340053,31701.63606,31867.242606,...,37413.787711,40744.568312,16581.282252,16570.282542,33979.33388,36278.401638,36333.50628,36644.209122,32081.918197,32394.376391
5376874047,7908.832112,6934.580049,22409.25773,17496.4725,17645.623638,23844.209017,25157.888856,24130.256983,20391.55299,22413.58049,...,27627.202833,30790.094698,16615.685029,16637.134924,22669.250811,24968.318569,25023.423211,25334.126053,22280.968551,23002.540878
3651042393,36502.461371,35422.517484,22589.899624,24947.833976,26372.879713,20464.283865,22763.696706,19448.397677,22052.458466,20515.640119,...,15279.003582,11638.548276,35723.238333,35973.784863,22680.903888,22475.895024,22444.199192,22438.946802,20638.326724,20702.08153
4618077921,23418.745092,15052.180971,29383.37293,24470.587701,21631.040001,30818.324217,32132.004057,31104.372184,27365.668191,27531.274737,...,33077.819842,36408.600443,12253.070703,12274.520598,29643.366011,31942.433769,31997.538411,32308.241253,27745.950328,28058.408522
3882487056,30679.315465,22316.411464,31731.368302,26818.583073,23979.035373,33166.319589,34479.999429,33452.367556,29713.663563,29879.270109,...,35425.815214,38756.595815,14593.309755,14582.310045,31991.361383,34290.429141,34345.533783,34656.236625,30093.9457,30406.403894


In [27]:
# Mask failed OD pairs (no path) as NaN, convert seconds to minutes, and save to file.
OD_min = OD_df[OD_df < fail_value] / 60
OD_min.to_csv(os.path.join(out_pth, 'fromagriculture/OD.csv'))
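
As a small illustration of what the masking step does, the sketch below applies the same expression to a made-up two-by-two table. The frame `toy_seconds` and its labels are hypothetical; only the `fail_value` and the mask-then-divide pattern come from the notebook.

```python
import numpy as np
import pandas as pd

fail_value = 999999999

# Hypothetical travel times in seconds; one pair has no path.
toy_seconds = pd.DataFrame(
    {"dest_a": [600.0, fail_value], "dest_b": [90.0, 1800.0]},
    index=["origin_1", "origin_2"],
)

# Boolean-mask indexing keeps values below fail_value and turns the rest into NaN,
# so the fail value is never divided by 60 and mistaken for a real travel time.
toy_minutes = toy_seconds[toy_seconds < fail_value] / 60
```

Because failed pairs become NaN rather than a large number, later row-wise `min` calls skip them by default.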

In [28]:
OD_min.tail(20)

Unnamed: 0,6058226279,6029307183,4998093094,2201506815,3474499811,1697006012,1901689169,6032060028,6040927878,3449495495,...,1968458114,1936967272,3496518021,6027615161,6027276892,6041228287,5536661253,7357630367,8178147277,6026834850
5072506110,520.447595,381.065861,537.981809,456.102055,408.77626,561.897663,583.792327,566.665129,504.353396,507.113505,...,599.555924,655.068934,252.347499,252.164171,542.315027,580.632823,581.551233,586.729614,510.691432,515.899068
1860634594,712.079413,694.080348,480.203384,519.50229,543.253052,444.776454,483.100002,427.845018,471.246031,445.632392,...,358.355116,297.680861,699.092362,703.268138,481.720122,478.303307,477.775043,477.687503,447.677169,448.739749
5111800060,444.547341,305.165607,462.081555,380.201801,332.876006,485.99741,507.892074,490.764876,428.453142,431.213252,...,523.65567,579.16868,176.447246,176.263917,466.414773,504.732569,505.650979,510.82936,434.791178,439.998815
3507831609,83.755636,209.586049,357.743419,275.863666,278.349518,381.659274,403.553938,386.42674,324.115007,357.815465,...,444.709171,497.424036,370.937799,371.295297,362.076637,400.394433,401.312844,406.491225,355.605266,367.631472
3109029200,586.630365,447.248632,604.164579,522.284825,474.95903,628.080434,649.975098,632.8479,570.536167,573.296276,...,665.738694,721.251704,318.53027,318.346941,608.497797,646.815593,647.734004,652.912384,576.874202,582.081839
2535965098,520.499464,381.11773,538.033678,456.153924,408.828129,561.949532,583.844196,566.716998,504.405265,507.165374,...,599.607793,655.120803,252.399368,252.21604,542.366896,580.684692,581.603102,586.781483,510.743301,515.950937
1888510171,499.41474,360.033007,516.948954,435.069201,387.743406,540.864809,562.759473,545.632275,483.320542,486.080651,...,578.52307,634.03608,231.314645,231.131317,521.282172,559.599968,560.518379,565.69676,489.658578,494.866214
3166661162,557.505539,418.123806,575.039753,493.159999,445.834204,598.955608,620.850272,603.723074,541.411341,544.17145,...,636.613868,674.867498,289.405444,289.222115,579.372971,617.690767,618.609178,623.787558,547.749376,552.957013
3651042501,621.904659,603.905594,390.028629,429.327535,453.078298,354.6017,392.925247,337.670264,381.071277,355.457638,...,268.180362,207.506107,608.917608,613.093383,391.545367,388.128553,387.600289,387.512749,357.502414,358.564995
1888509577,509.488486,370.106753,527.0227,445.142946,397.817151,550.938555,572.833219,555.706021,493.394288,496.154397,...,588.596815,644.109825,241.388391,241.205062,531.355918,569.673714,570.592125,575.770505,499.732323,504.93996


### Filter 1st nearest

#### Check each file to make sure nearest neighbor column is named correctly. If not, rename.

In [29]:
# Reload from file even if already loaded. Quickest way to ensure NN is a column rather than only the index.
OD = os.path.join(out_pth, "fromagriculture/OD.csv")
OD = pd.read_csv(OD)

In [30]:
OD

Unnamed: 0.1,Unnamed: 0,6058226279,6029307183,4998093094,2201506815,3474499811,1697006012,1901689169,6032060028,6040927878,...,1968458114,1936967272,3496518021,6027615161,6027276892,6041228287,5536661253,7357630367,8178147277,6026834850
0,3656618126,577.901536,559.902472,346.025507,385.324413,409.075175,310.598578,348.922125,293.667142,337.068155,...,224.177240,163.502985,564.914486,569.090261,347.542245,344.125431,343.597167,343.509627,313.499292,314.561872
1,3656617769,578.309363,560.310299,346.433334,385.732240,409.483002,311.006405,349.329952,294.074969,337.475982,...,224.585067,163.910812,565.322313,569.498088,347.950072,344.533258,344.004994,343.917454,313.907119,314.969699
2,3656617751,577.966765,559.967701,346.090736,385.389642,409.140404,310.663807,348.987354,293.732371,337.133384,...,224.242469,163.568214,564.979715,569.155490,347.607474,344.190660,343.662396,343.574856,313.564521,314.627101
3,4303444021,571.873871,553.874806,339.997842,379.296748,403.047510,304.570912,342.894460,287.639476,331.040489,...,218.149574,157.475319,558.886820,563.062596,341.514579,338.097765,337.569501,337.481961,307.471627,308.534207
4,3656617804,578.333584,560.334520,346.457555,385.756461,409.507223,311.030626,349.354173,294.099190,337.500203,...,224.609288,163.935033,565.346534,569.522309,347.974293,344.557479,344.029215,343.941675,313.931340,314.993920
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
625846,3691541254,544.454799,405.073066,561.989013,480.109260,432.783464,585.904868,607.799532,590.672334,528.360601,...,623.563129,679.076139,276.354704,276.171376,566.322231,604.640027,605.558438,610.736819,534.698637,539.906273
625847,5376874047,131.813869,115.576334,373.487629,291.607875,294.093727,397.403484,419.298148,402.170950,339.859217,...,460.453381,513.168245,276.928084,277.285582,377.820847,416.138643,417.057054,422.235434,371.349476,383.375681
625848,3651042393,608.374356,590.375291,376.498327,415.797233,439.547995,341.071398,379.394945,324.139961,367.540974,...,254.650060,193.975805,595.387306,599.563081,378.015065,374.598250,374.069987,373.982447,343.972112,345.034692
625849,4618077921,390.312418,250.869683,489.722882,407.843128,360.517333,513.638737,535.533401,518.406203,456.094470,...,551.296997,606.810007,204.217845,204.575343,494.056100,532.373896,533.292307,538.470688,462.432505,467.640142


In [31]:
OD.rename(columns={'Unnamed: 0': 'NN'}, inplace=True) # Repeat for each OD set, if needed.

#### Find first, second, and third nearest destination for each origin node. 

In [32]:
fail_value = 999999999

In [33]:
# Since we only have one destination set, renaming the OD object for quick find/replace on this section of code.
OD_HDurban = OD

# Nearest
OD_HDurban["HDurban1"] = 0
sub = OD_HDurban.iloc[:,1:-1] # Keep only the destination columns: skip column 0 (node ID) and the last column (the newly created HDurban1).
OD_HDurban["HDurban1"] = sub.min(axis=1) # axis=1 takes the minimum of each row; the default, axis=0, would take the minimum of each column.
HDurban1 = OD_HDurban[['NN', 'HDurban1']] # Remove unnecessary OD values.
HDurban1.to_csv(os.path.join(out_pth, 'fromagriculture/ag-HDurban1.csv'))


# Second nearest
dupes = OD_HDurban.apply(pd.Series.duplicated, axis = 1, keep=False) # If a number is repeated within a row, value is True. If not, False.
# The first time this is done, there should be two True values per row, unless any POIs are equidistant.
dupes = OD_HDurban.where(~dupes, fail_value) # For any value that appears more than once in its row, it is replaced with the fail_value.

OD_HDurban["HDurban2"] = 0
Dsub = dupes.iloc[:,1:] # Drop the node ID column. HDurban1 can stay: it was replaced with fail_value above, so it can never be the row minimum.
OD_HDurban["HDurban2"] = Dsub.min(axis=1) 
HDurban2 = OD_HDurban.loc[:,['NN', 'HDurban2']] 
HDurban2.to_csv(os.path.join(out_pth, 'fromagriculture/ag-HDurban2.csv'))


# Third nearest
dupes = OD_HDurban.apply(pd.Series.duplicated, axis = 1, keep=False)
# Since this includes both first and second nearest columns, there should be four True values per row, unless POIs are equidistant.
dupes = OD_HDurban.where(~dupes, fail_value)
 
OD_HDurban["HDurban3"] = 0
Dsub = dupes.iloc[:,1:] # Filtering out the node ID column.
OD_HDurban["HDurban3"] = Dsub.min(axis=1)
HDurban3 = OD_HDurban.loc[:,['NN', 'HDurban3']]
HDurban3.to_csv(os.path.join(out_pth, 'fromagriculture/ag-HDurban3.csv'))


# Combine and write to file
HDurban_all = OD_HDurban.loc[:,['NN', 'HDurban1', 'HDurban2', 'HDurban3']]
HDurban_all.to_csv(os.path.join(out_pth, 'fromagriculture/ag-HDurban_all.csv'))
HDurban_all.head()

Unnamed: 0,NN,HDurban1,HDurban2,HDurban3
0,3656618126,163.502985,164.457268,170.140815
1,3656617769,163.910812,164.865095,170.548642
2,3656617751,163.568214,164.522497,170.206044
3,4303444021,157.475319,158.429603,164.11315
4,3656617804,163.935033,164.889316,170.572863
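
The row-wise `apply(pd.Series.duplicated, ...)` calls above can be slow on several hundred thousand rows. As a possible alternative, not the method used in this notebook, the three smallest travel times per row can be read off a row-wise sort in NumPy. The function name `k_nearest_times` and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def k_nearest_times(od, id_col="NN", k=3, prefix="HDurban"):
    """Return the k smallest travel times per row, skipping the node ID column."""
    dest_cols = [c for c in od.columns if c != id_col]
    values = od[dest_cols].to_numpy(dtype=float)
    # np.sort places NaN (failed OD pairs) at the end of each row.
    smallest = np.sort(values, axis=1)[:, :k]
    out = pd.DataFrame(
        smallest,
        columns=[f"{prefix}{i + 1}" for i in range(k)],
        index=od.index,
    )
    out.insert(0, id_col, od[id_col].to_numpy())
    return out

# Hypothetical OD table in minutes: one node ID column plus four destinations.
toy_od_min = pd.DataFrame({
    "NN": [10, 11],
    "a": [5.0, 9.0],
    "b": [3.0, np.nan],
    "c": [8.0, 1.0],
    "d": [4.0, 2.0],
})
toy_nearest = k_nearest_times(toy_od_min)
```

One behavioral difference to keep in mind: when two destinations are exactly equidistant, this sketch reports the tied value twice (as first and second nearest), whereas the duplicate-masking approach replaces both tied values with the fail value.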


### Join back to georeferenced _snap file.

In [72]:
ag_snap = os.path.join(out_pth, "ag_snap.csv")
ag_snap = pd.read_csv(ag_snap)
ag_snap



Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,x,y,ID_lc,geometry,index__spam,ID_spam,val,geom_UTM,geom_WGS84,lc_sum,spam_V,NN,NN_dist
0,0,0,-15.990057,16.693707,1.0,POINT (-15.9900569732419 16.6937073347074),,,0.0,,,,0.0,5343951929,14607.791436
1,1,1,-15.989787,16.693707,2.0,POINT (-15.9897874786566 16.6937073347074),,,0.0,,,,0.0,5343951929,14628.808767
2,2,2,-15.989518,16.693707,3.0,POINT (-15.9895179840714 16.6937073347074),,,0.0,,,,0.0,5343951929,14649.852284
3,3,3,-15.925378,16.693707,4.0,POINT (-15.9253782727816 16.6937073347074),,,0.0,,,,0.0,3639403839,14029.850649
4,4,4,-15.925109,16.693707,5.0,POINT (-15.9251087781964 16.6937073347074),,,0.0,,,,0.0,3639403839,14018.013147
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7657060,7657060,2061,-11.459007,12.041980,,POINT (-11.459007 12.04198),,2507.0,411859.0,MULTIPOLYGON (((890065.2102591199 1329046.5648...,POINT (-11.45900694020285 12.041979526954448),0.0,inf,2241013982,14274.586682
7657061,7657061,2062,-11.375674,12.041980,,POINT (-11.375674 12.04198),,2508.0,497200.0,MULTIPOLYGON (((899153.5281643758 1329166.2613...,POINT (-11.375673926651558 12.041979526573865),0.0,inf,6662833878,13597.325940
7657062,7657062,2063,-11.292341,12.041980,,POINT (-11.292341 12.04198),,2509.0,217938.0,MULTIPOLYGON (((908242.6248140114 1329288.7282...,POINT (-11.292340913098172 12.041979526183944),0.0,inf,6662833878,14994.531096
7657063,7657063,2064,-11.209008,12.041980,,POINT (-11.209008 12.04198),,2510.0,201222.0,MULTIPOLYGON (((917332.5179180377 1329413.9667...,POINT (-11.209007899992315 12.041979525784605),0.0,inf,2217989971,20260.745415


In [34]:
ag_snap.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 13941124 entries, 0 to 13941123
Data columns (total 12 columns):
 #   Column        Dtype   
---  ------        -----   
 0   Unnamed: 0    int64   
 1   index         int64   
 2   ID_spam       float64 
 3   grid_val      float64 
 4   ID_LC         float64 
 5   val           float64 
 6   x             float64 
 7   y             float64 
 8   Unnamed: 0.1  float64 
 9   geometry      geometry
 10  NN            int64   
 11  NN_dist       float64 
dtypes: float64(8), geometry(1), int64(3)
memory usage: 1.2 GB


In [35]:
ag_to_HDurban = pd.merge(ag_snap, HDurban_all, on='NN',how='left')
ag_to_HDurban

Unnamed: 0.2,Unnamed: 0,index,ID_spam,grid_val,ID_LC,val,x,y,Unnamed: 0.1,geometry,NN,NN_dist,HDurban1,HDurban2,HDurban3
0,0,0,9.0,2487.0,13787187.0,4.236797e+00,-15.562504,16.670935,,POINT (-15.56250 16.67094),3656618126,5113.670395,163.502985,164.457268,170.140815
1,1,1,9.0,2487.0,13818608.0,4.236797e+00,-15.554419,16.682254,,POINT (-15.55442 16.68225),3656617769,5632.260552,163.910812,164.865095,170.548642
2,2,2,9.0,2487.0,13787063.0,4.236797e+00,-15.559809,16.676864,,POINT (-15.55981 16.67686),3656617751,5453.608825,163.568214,164.522497,170.206044
3,3,3,9.0,2487.0,13787188.0,4.236797e+00,-15.561965,16.670935,,POINT (-15.56197 16.67094),3656618126,5075.881917,163.502985,164.457268,170.140815
4,4,4,9.0,2487.0,13787225.0,4.236797e+00,-15.571128,16.668240,,POINT (-15.57113 16.66824),3656618126,5568.582130,163.502985,164.457268,170.140815
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13941119,13941119,118,5.0,54822.0,0.1,5.482200e+04,-15.375659,16.791961,118.0,POINT (-15.37566 16.79196),3651042393,4280.425149,193.975805,194.930088,200.613635
13941120,13941120,119,2383.0,290657.0,0.1,2.906570e+05,-12.873696,12.212042,119.0,POINT (-12.87370 12.21204),4818951142,9019.155622,219.410253,220.599554,224.281252
13941121,13941121,120,2382.0,743827.0,0.1,7.438270e+05,-12.955614,12.224018,120.0,POINT (-12.95561 12.22402),4618077921,10786.551522,199.346845,200.536147,204.217845
13941122,13941122,121,2345.0,94605.0,0.1,9.460500e+04,-16.375655,12.208646,121.0,POINT (-16.37565 12.20865),3507831510,2881.578459,88.413319,90.114132,215.944545
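
A left join like this one silently multiplies rows if the right-hand table has repeated `NN` values, and silently leaves NaN where an origin's node is missing from it. The sketch below shows one way to guard against both using the `validate` and `indicator` options of `pd.merge`; the toy frames and the variable `n_unmatched` are made up for illustration.

```python
import pandas as pd

# Hypothetical origins (many rows per road node) and per-node travel times.
toy_snap = pd.DataFrame({"ID_ag": [0, 1, 2, 3], "NN": [100, 100, 200, 300]})
toy_times = pd.DataFrame({"NN": [100, 200], "HDurban1": [12.5, 40.0]})

toy_joined = pd.merge(
    toy_snap,
    toy_times,
    on="NN",
    how="left",
    validate="many_to_one",  # raises MergeError if NN repeats in toy_times
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Origins whose nearest node has no travel time record.
n_unmatched = int((toy_joined["_merge"] == "left_only").sum())
```

With `how="left"` and a many-to-one relationship, the joined table keeps exactly as many rows as the origin table, which is a quick sanity check to run against the row counts reported by `.info()`.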


In [36]:
ag_to_HDurban.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 13941124 entries, 0 to 13941123
Data columns (total 15 columns):
 #   Column        Dtype   
---  ------        -----   
 0   Unnamed: 0    int64   
 1   index         int64   
 2   ID_spam       float64 
 3   grid_val      float64 
 4   ID_LC         float64 
 5   val           float64 
 6   x             float64 
 7   y             float64 
 8   Unnamed: 0.1  float64 
 9   geometry      geometry
 10  NN            int64   
 11  NN_dist       float64 
 12  HDurban1      float64 
 13  HDurban2      float64 
 14  HDurban3      float64 
dtypes: float64(11), geometry(1), int64(3)
memory usage: 1.7 GB


In [38]:
ag_to_HDurban = ag_to_HDurban.drop(columns=['index','Unnamed: 0.1', 'HDurban2', 'HDurban3'])
ag_to_HDurban.rename(columns={'Unnamed: 0': 'ID_ag', 'HDurban1':'ag_HDurb'}, inplace=True)
ag_to_HDurban.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 13941124 entries, 0 to 13941123
Data columns (total 11 columns):
 #   Column    Dtype   
---  ------    -----   
 0   ID_ag     int64   
 1   ID_spam   float64 
 2   grid_val  float64 
 3   ID_LC     float64 
 4   val       float64 
 5   x         float64 
 6   y         float64 
 7   geometry  geometry
 8   NN        int64   
 9   NN_dist   float64 
 10  ag_HDurb  float64 
dtypes: float64(8), geometry(1), int64(2)
memory usage: 1.2 GB


In [39]:
ag_to_HDurban.to_csv(os.path.join(out_pth, 'fromagriculture/ag_to_HDurban.csv'))

### End of script. Load into QGIS or Arc and visualize at 10 min intervals. 
QML file for symbology in QGIS:
R:\GEOGlobal\Design\symb_traveltimes_10min.qml