# Data preparation and infrastructure exposure to flooding

This notebook forms the basis of "Hands-On 5" in the CCG course. It covers four activities:

1. Extract infrastructure data from OpenStreetMap
2. Extract flood hazard data from Aqueduct
3. Intersect floods with roads to calculate exposure
4. Open QGIS to look at the data

In [1]:
# The os and subprocess modules are built into Python
# see https://docs.python.org/3/library/os.html
import os

# see https://docs.python.org/3/library/subprocess.html
import subprocess

# see https://docs.python.org/3/library/time.html
import time

# see https://docs.python.org/3/library/pathlib.html
from pathlib import Path

## Activity 1: Extract infrastructure data

### Step 1) On your desktop, create a folder called `ghana_tutorial`

### Step 2) Create a variable to store the folder location

In the cell below, set the path to the folder you just created by changing `NAME` to your username as shown on your computer. Keep only the line that matches your operating system.

In [2]:
# edit this if using a Mac (otherwise delete)
data_folder = Path("/Users/NAME/Desktop/ghana_tutorial")

# edit this if using Windows (otherwise delete)
data_folder = Path("C:/Users/NAME/Desktop/ghana_tutorial")

# delete this line, otherwise it overrides the path set above
data_folder = Path("../data")
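
If you would rather not type your username by hand, the folder location can usually be derived from your home directory. This is a minimal sketch, assuming your desktop is a folder called `Desktop` directly inside your home folder (this may not hold if your desktop has been relocated or has a localised name). The name `tutorial_folder` is introduced here for illustration only.

```python
from pathlib import Path

# Path.home() returns the current user's home directory on macOS, Linux and Windows
# Assumption: the desktop folder is called "Desktop" and sits directly inside it
tutorial_folder = Path.home() / "Desktop" / "ghana_tutorial"

# Report whether the folder from Step 1 has been created yet
print(tutorial_folder, "exists:", tutorial_folder.is_dir())
```

If the printed path looks right and the folder exists, you could assign it to `data_folder` instead of editing the lines above.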

### Step 3) Load Python libraries

In [3]:
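# Pandas and GeoPandas are libraries for working with datasets
# see https://geopandas.org/
import geopandas as gpd

# Ask GeoPandas not to use PyGEOS for geometry operations
# (this relies on a private setting, which may change between versions)
gpd._compat.USE_PYGEOS = False

# see https://pandas.pydata.org/
import pandas as pd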

# This package interacts with a risk data extract service, also accessible at
# https://global.infrastructureresilience.org/downloads
import irv_autopkg_client

# We'll use snail to intersect roads with flooding
import snail.intersection
import snail.io

# snkit helps generate connected networks from lines and nodes
# see https://snkit.readthedocs.io/
import snkit
import snkit.network

# pyproj is a library for working with geographic projections
# see https://pyproj4.github.io/
from pyproj import Geod

### Step 4) and 5) Download and save data

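Download the `ghana-latest-free.shp.zip` dataset from http://download.geofabrik.de/africa/ghana.html, extract the zip archive, and save the extracted folder inside your new `ghana_tutorial` folder. The extracted folder should be named `ghana-latest-free.shp`, which is the name the next step expects.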

### Step 6) Load the road dataset you've just downloaded

In [4]:
roads = gpd.read_file(
    data_folder / "ghana-latest-free.shp" / "gis_osm_roads_free_1.shp"
)

### Step 7) To take a look at the data and the attribute table, fill in and run the next two cells

In [5]:
roads

Unnamed: 0,osm_id,code,fclass,name,ref,oneway,maxspeed,layer,bridge,tunnel,geometry
0,4790591,5121,unclassified,Airport Road,,B,0,0,F,F,"LINESTRING (-0.17184 5.60847, -0.17182 5.60849..."
1,4790592,5122,residential,Nortei Ababio Road,,B,0,0,F,F,"LINESTRING (-0.18282 5.61197, -0.18336 5.61198..."
2,4790594,5115,tertiary,Airport Road,,F,0,0,F,F,"LINESTRING (-0.17544 5.60550, -0.17418 5.60555..."
3,4790596,5121,unclassified,Airport Road,,F,0,0,F,F,"LINESTRING (-0.17207 5.60853, -0.17207 5.60844..."
4,4790597,5122,residential,Volta Road,,B,0,0,F,F,"LINESTRING (-0.18282 5.61197, -0.18280 5.61262..."
...,...,...,...,...,...,...,...,...,...,...,...
338073,1182192627,5141,service,,,B,0,0,F,F,"LINESTRING (-0.17508 5.71756, -0.17511 5.71756..."
338074,1182192628,5141,service,,,B,0,0,F,F,"LINESTRING (-0.17501 5.71759, -0.17508 5.71756)"
338075,1182192629,5141,service,,,B,0,0,F,F,"LINESTRING (-0.17506 5.71778, -0.17500 5.71764..."
338076,1182207852,5114,secondary,Education Ridge Road,R92,B,0,0,F,F,"LINESTRING (-0.97456 9.56428, -0.97542 9.56413..."


In [6]:
roads.fclass.unique()

array(['unclassified', 'residential', 'tertiary', 'tertiary_link',
       'secondary', 'trunk', 'service', 'primary', 'trunk_link',
       'primary_link', 'secondary_link', 'footway', 'path', 'track',
       'motorway', 'track_grade3', 'track_grade4', 'motorway_link',
       'steps', 'pedestrian', 'bridleway', 'cycleway', 'track_grade2',
       'track_grade5', 'track_grade1', 'living_street'], dtype=object)

### Step 8) Next we want to make a couple of changes to the data

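Keep only the columns we need, and filter out minor and residential roads, tracks and paths so that only the main road classes (motorway down to tertiary) remain.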

In [7]:
# Keep only the specified columns
roads = roads[["osm_id", "fclass", "name", "geometry"]]
# Keep only the roads whose "fclass" is in the list
roads = roads[
    roads.fclass.isin(
        [
            "motorway",
            "motorway_link",
            "trunk",
            "trunk_link",
            "primary",
            "primary_link",
            "secondary",
            "secondary_link",
            "tertiary",
            "tertiary_link",
        ]
    )
]
# Rename the "fclass" column to "road_type"
roads = roads.rename(
    columns={
        "fclass": "road_type",
    }
)

Create topological network information, which will let us find routes over the road network:
- add nodes at the start and end of each road segment
- split roads at junctions, so each segment goes from junction to junction
- add ids to each node and edge, and add `from_id` and `to_id` to each edge
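
To make the idea of `from_id` and `to_id` concrete, here is a small pure-Python sketch of the bookkeeping involved. It is not how snkit is implemented: it assumes edges are already split at junctions, it matches endpoints by exact coordinate equality, and the coordinates are made up for illustration.

```python
# Each edge is a list of (lon, lat) coordinates; these values are made up
edges = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(1.0, 0.0), (1.0, 1.0)],
    [(1.0, 0.0), (2.0, 0.0)],
]

node_ids = {}  # maps an endpoint coordinate to its node id
topology = []  # one record per edge: id, from_id, to_id


def node_id_for(point):
    """Return the id for an endpoint, creating a new node the first time it is seen."""
    if point not in node_ids:
        node_ids[point] = f"roadn_{len(node_ids)}"
    return node_ids[point]


for i, coords in enumerate(edges):
    topology.append(
        {
            "id": f"roade_{i}",
            "from_id": node_id_for(coords[0]),  # node at the start of the edge
            "to_id": node_id_for(coords[-1]),  # node at the end of the edge
        }
    )

for record in topology:
    print(record)
```

Edges that meet at the same point share a node id, which is what allows a routing algorithm to step from one edge to the next.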

In [8]:
road_network = snkit.Network(edges=roads)

In [9]:
with_endpoints = snkit.network.add_endpoints(road_network)
split_edges = snkit.network.split_edges_at_nodes(with_endpoints)
with_ids = snkit.network.add_ids(
    split_edges, id_col="id", edge_prefix="roade", node_prefix="roadn"
)
connected = snkit.network.add_topology(with_ids)
roads = connected.edges
road_nodes = connected.nodes

Calculate the length of each road segment in meters, measured along the WGS84 ellipsoid.

In [10]:
geod = Geod(ellps="WGS84")
roads["length_m"] = roads.geometry.apply(geod.geometry_length)
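
For intuition about what a geodesic length is, the sketch below sums great-circle distances between consecutive vertices of a line. It is an approximation only: it treats the Earth as a sphere with an assumed mean radius, whereas `Geod(ellps="WGS84")` works on the ellipsoid, so results will differ slightly. The function names are introduced here for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius; a sphere, not the WGS84 ellipsoid


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def line_length_m(coords):
    """Sum the distances between consecutive (lon, lat) vertices of a line."""
    return sum(haversine_m(*a, *b) for a, b in zip(coords, coords[1:]))


# One degree of longitude along the equator is about 111 km
one_degree = line_length_m([(0.0, 0.0), (1.0, 0.0)])
print(round(one_degree), "m")
```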

In [11]:
roads.tail()

Unnamed: 0,osm_id,road_type,name,geometry,id,from_id,to_id,length_m
15684,1181982913,secondary,Kumbungu-Zangbalung road,"LINESTRING (-0.95804 9.56291, -0.95811 9.56294...",roade_15684,roadn_12219,roadn_12220,1870.991174
15685,1182141809,secondary_link,,"LINESTRING (-1.59420 6.65761, -1.59426 6.65768...",roade_15685,roadn_12221,roadn_12216,47.244512
15686,1182207852,secondary,Education Ridge Road,"LINESTRING (-0.97456 9.56428, -0.97542 9.56413...",roade_15686,roadn_12220,roadn_8005,2242.279664
15687,1182207852,secondary,Education Ridge Road,"LINESTRING (-0.99028 9.57190, -0.99202 9.57587...",roade_15687,roadn_8005,roadn_12222,1069.950243
15688,1182207853,secondary,Bontanga - Dalung Road,"LINESTRING (-0.99413 9.58079, -0.99425 9.58107...",roade_15688,roadn_12222,roadn_8006,6604.650117


In [12]:
roads.set_crs(4326, inplace=True)
road_nodes.set_crs(4326, inplace=True)
road_nodes.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

### Step 9) Save the pre-processed dataset

In [13]:
roads.to_file(
    data_folder / "GHA_OSM_roads.gpkg",
    layer="edges",
    driver="GPKG",
)
road_nodes.to_file(
    data_folder / "GHA_OSM_roads.gpkg",
    layer="nodes",
    driver="GPKG",
)

## Activity 2: Extract hazard data

The full [Aqueduct dataset](https://www.wri.org/resources/data-sets/aqueduct-floods-hazard-maps) is openly available to download.

Country-level extracts are available through the [Global Systemic Risk Assessment Tool (G-SRAT)](https://global.infrastructureresilience.org/downloads/). This section uses that service to download an extract for Ghana.

In [14]:
country_iso = "gha"

Create a client to connect to the data API:

In [15]:
client = irv_autopkg_client.Client()

In [16]:
job_id = client.job_submit(country_iso, ["wri_aqueduct.version_2"])

In [17]:
while not client.job_complete(job_id):
    print("Processing...")
    time.sleep(1)

Processing...
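
The polling loop above keeps waiting for as long as the job is incomplete. A hedged variation is to give up after a time limit. The sketch below shows the pattern with a stand-in status check; `wait_until` and `fake_job_complete` are hypothetical names introduced here, not part of `irv_autopkg_client`.

```python
import time


def wait_until(is_done, interval_s=1.0, timeout_s=300.0):
    """Call is_done() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while not is_done():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Gave up after {timeout_s} seconds")
        print("Processing...")
        time.sleep(interval_s)


# Stand-in for a real status check: reports completion on the third call
calls = {"count": 0}


def fake_job_complete():
    calls["count"] += 1
    return calls["count"] >= 3


wait_until(fake_job_complete, interval_s=0.01, timeout_s=5.0)
print("Done after", calls["count"], "checks")
```

In this notebook the status check would be `lambda: client.job_complete(job_id)`, with an interval and timeout chosen to suit your connection.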


In [18]:
client.extract_download(
    country_iso,
    data_folder / "flood_layer",
    # there may be other datasets available, but only download the following
    dataset_filter=["wri_aqueduct.version_2"],
    overwrite=True,
)

### Alternative: download flood hazard data from Aqueduct

The full [Aqueduct dataset](https://www.wri.org/resources/data-sets/aqueduct-floods-hazard-maps) is available to download. There are some scripts and summary of the data you may find useful at [nismod/aqueduct](https://github.com/nismod/aqueduct).

There are almost 700 files in the full Aqueduct dataset, of up to around 100MB each, so we don't recommend downloading all of them unless you intend to do further analysis.

The next steps show how to clip a region out of the global dataset, in case you prefer to work from the original global Aqueduct files.

To work through the next steps, we suggest downloading [inunriver_historical_000000000WATCH_1980_rp00100.tif](http://wri-projects.s3.amazonaws.com/AqueductFloodTool/download/v2/inunriver_historical_000000000WATCH_1980_rp00100.tif). Save the downloaded file in a new folder called `flood_layer` inside your `data_folder`.

In [19]:
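# Bounding box used to clip the global raster (degrees longitude and latitude),
# stored as strings so the values can be passed straight to gdalwarp
xmin = "-3.262509"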
ymin = "4.737128"
xmax = "1.187968"
ymax = "11.162937"

for root, dirs, files in os.walk(os.path.join(data_folder, "flood_layer")):
    print("Looking in", root)
    for file_ in sorted(files):
        if file_.endswith(".tif") and not file_.endswith(
            f"-{country_iso}.tif"
        ):
            print("Found tif file", file_)
            stem = file_[:-4]
            input_file = os.path.join(root, file_)

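            # Clip file to bounds
            clip_file = os.path.join(
                root,
                "gha",
                "wri_aqueduct_version_2",
                f"{stem}-{country_iso}.tif",
            )
            # Make sure the output folder exists, otherwise gdalwarp fails
            os.makedirs(os.path.dirname(clip_file), exist_ok=True)
            # Remove any previous clip so gdalwarp writes a fresh file
            try:
                os.remove(clip_file)
            except FileNotFoundError:
                pass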
            cmd = [
                "gdalwarp",
                "-te",
                xmin,
                ymin,
                xmax,
                ymax,
                input_file,
                clip_file,
            ]
            print(cmd)
            p = subprocess.run(cmd, capture_output=True)
            print(p.stdout.decode("utf8"))
            print(p.stderr.decode("utf8"))
            print(clip_file)

Looking in ../data/flood_layer
Found tif file inunriver_historical_000000000WATCH_1980_rp00100.tif
['gdalwarp', '-te', '-3.262509', '4.737128', '1.187968', '11.162937', '../data/flood_layer/inunriver_historical_000000000WATCH_1980_rp00100.tif', '../data/flood_layer/gha/wri_aqueduct_version_2/inunriver_historical_000000000WATCH_1980_rp00100-gha.tif']
Creating output file that is 534P x 771L.
Processing ../data/flood_layer/inunriver_historical_000000000WATCH_1980_rp00100.tif [1/1] : 0Using internal nodata values (e.g. -9999) for image ../data/flood_layer/inunriver_historical_000000000WATCH_1980_rp00100.tif.
Copying nodata values from source ../data/flood_layer/inunriver_historical_000000000WATCH_1980_rp00100.tif to destination ../data/flood_layer/gha/wri_aqueduct_version_2/inunriver_historical_000000000WATCH_1980_rp00100-gha.tif.
...10...20...30...40...50...60...70...80...90...100 - done.


../data/flood_layer/gha/wri_aqueduct_version_2/inunriver_historical_000000000WATCH_1980_rp00100-gh
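
When calling an external tool such as `gdalwarp`, it can help to check that the tool is installed and to stop on a failed run rather than only printing its output. The following is a sketch of that pattern, not a replacement for the cell above; `build_clip_command` and `run_clip` are hypothetical helper names, and the file names are placeholders.

```python
import shutil
import subprocess


def build_clip_command(input_file, clip_file, xmin, ymin, xmax, ymax):
    """Build a gdalwarp command that clips input_file to the given extent (-te)."""
    return [
        "gdalwarp",
        "-te",
        str(xmin),
        str(ymin),
        str(xmax),
        str(ymax),
        str(input_file),
        str(clip_file),
    ]


def run_clip(cmd):
    """Run the command, raising an error if the tool is missing or exits with a failure."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on your PATH")
    # check=True raises subprocess.CalledProcessError on a non-zero exit code
    return subprocess.run(cmd, capture_output=True, check=True)


# Placeholder file names; the bounds are the ones used in the cell above
cmd = build_clip_command(
    "input.tif", "input-gha.tif", -3.262509, 4.737128, 1.187968, 11.162937
)
print(cmd)
```

Calling `run_clip(cmd)` would then execute the clip, provided GDAL is installed and the input file exists.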

## Activity 3: Intersect hazard 

Let us now intersect the hazard data with the roads, starting with a single hazard layer to save time.

### Step 1) Specify your input and output path as well as the name of the intersection

In [24]:
flood_path = Path(
    data_folder,
    "flood_layer",
    "gha",
    "wri_aqueduct_version_2",
    "inunriver_historical_000000000WATCH_1980_rp00100-gha.tif",
)

output_path = Path(
    data_folder,
    "results",
    "inunriver_historical_000000000WATCH_1980_rp00100__roads_exposure.gpkg",
)

Read in pre-processed road edges, as created earlier.

In [21]:
roads = gpd.read_file(data_folder / "GHA_OSM_roads.gpkg", layer="edges")

### Step 2) Run the intersection

In [27]:
grid, bands = snail.io.read_raster_metadata(flood_path)

prepared = snail.intersection.prepare_linestrings(roads)
flood_intersections = snail.intersection.split_linestrings(prepared, grid)
flood_intersections = snail.intersection.apply_indices(
    flood_intersections, grid
)
flood_data = snail.io.read_raster_band_data(flood_path)
flood_intersections[
    "inunriver__epoch_historical__rcp_baseline__rp_100"
] = snail.intersection.get_raster_values_for_splits(
    flood_intersections, flood_data
)
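
The intersection splits each road wherever it crosses a raster cell boundary, then records which cell each piece falls in so the flood depth can be looked up. The sketch below illustrates only the cell lookup, in plain Python. It is not snail's implementation, and it assumes a north-up raster with no rotation; the origin and cell size are made-up values.

```python
import math


def cell_index(x, y, x_origin, y_origin, cell_width, cell_height):
    """Return (column, row) of the raster cell containing the point (x, y).

    (x_origin, y_origin) is the top-left corner of the raster. Rows count
    downwards from the top, so the row index grows as y decreases.
    """
    col = math.floor((x - x_origin) / cell_width)
    row = math.floor((y_origin - y) / cell_height)
    return col, row


# Made-up raster: top-left corner at (-3.0, 11.0) with 0.5 degree cells
col, row = cell_index(
    -1.9, 9.6, x_origin=-3.0, y_origin=11.0, cell_width=0.5, cell_height=0.5
)
print(col, row)
```

The `index_i` and `index_j` columns added by `apply_indices` play a similar role: they identify the raster cell whose value is read for each split segment.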

Calculate the exposed length

In [28]:
geod = Geod(ellps="WGS84")
flood_intersections["flood_length_m"] = flood_intersections.geometry.apply(
    geod.geometry_length
)

In [29]:
flood_intersections.tail(2)

Unnamed: 0,osm_id,road_type,name,id,from_id,to_id,length_m,geometry,split,index_i,index_j,inunriver__epoch_historical__rcp_baseline__rp_100,flood_length_m
15688,1182207853,secondary,Bontanga - Dalung Road,roade_15688,roadn_12222,roadn_8006,6604.650117,"LINESTRING (-1.00963 9.62941, -1.01021 9.63122...",8,270,183,0.0,782.156843
15688,1182207853,secondary,Bontanga - Dalung Road,roade_15688,roadn_12222,roadn_8006,6604.650117,"LINESTRING (-1.01227 9.63597, -1.01230 9.63605...",9,269,183,0.0,135.659825


Calculate the proportion of roads in our dataset that are exposed to flood depths of 1m or more in this scenario: sum the length of the exposed segments, then divide by the total length of all roads.

In [30]:
exposed_1m = flood_intersections[
    flood_intersections.inunriver__epoch_historical__rcp_baseline__rp_100 >= 1
]
exposed_length_km = exposed_1m.flood_length_m.sum() * 1e-3
exposed_length_km

728.5879687723159

In [31]:
all_roads_in_dataset_length_km = roads.length_m.sum() * 1e-3
all_roads_in_dataset_length_km

29069.876011778793

In [32]:
proportion = exposed_length_km / all_roads_in_dataset_length_km
proportion

0.025063332519103282

In [33]:
f"{proportion:.1%} of roads in this dataset are exposed to flood depths of >= 1m in a historical 1-in-100 year flood"

'2.5% of roads in this dataset are exposed to flood depths of >= 1m in a historical 1-in-100 year flood'
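
The same calculation can be repeated for other depth thresholds. Here is a minimal sketch of the arithmetic using plain Python lists, with made-up depths and lengths rather than the notebook's results; `exposed_fraction` is a name introduced here for illustration.

```python
# (flood depth in m, segment length in m) for each split segment; made-up values
segments = [(0.0, 500.0), (0.6, 250.0), (1.2, 150.0), (2.5, 100.0)]


def exposed_fraction(segments, threshold_m):
    """Fraction of total length whose flood depth is at or above threshold_m."""
    total = sum(length for _, length in segments)
    exposed = sum(length for depth, length in segments if depth >= threshold_m)
    return exposed / total


for threshold in (0.5, 1.0, 2.0):
    print(f">= {threshold}m: {exposed_fraction(segments, threshold):.1%}")
```

With the notebook's data, the depths would come from the `inunriver__epoch_historical__rcp_baseline__rp_100` column and the lengths from `flood_length_m`.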

In [34]:
output_path.parent.mkdir(parents=True, exist_ok=True)

Save to file (with spatial data)

In [35]:
flood_intersections.to_file(output_path, driver="GPKG")

Save to CSV (without spatial data)

In [36]:
flood_intersections.drop(columns="geometry").to_csv(
    output_path.parent / output_path.name.replace(".gpkg", ".csv")
)
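
As an aside, `pathlib` can swap a file extension directly, which avoids string replacement on the file name. A small sketch, using the output file name from Step 1 under a relative `results` folder; `gpkg_path` and `csv_path` are names introduced here for illustration.

```python
from pathlib import Path

gpkg_path = (
    Path("results")
    / "inunriver_historical_000000000WATCH_1980_rp00100__roads_exposure.gpkg"
)

# with_suffix replaces only the final extension and keeps the folder unchanged
csv_path = gpkg_path.with_suffix(".csv")
print(csv_path.name)
```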