# Analyse a sample of 25 functional urban areas from each continent

This notebook downloads the boundaries of functional urban areas (FUAs) from GHSL, randomly samples 25 of them from each continent to cover a range of sizes and geographical contexts, downloads their street networks from OpenStreetMap, and measures the area and circular compactness of the polygons enclosed by street network geometry.

In [58]:
import os
import warnings
import unicodedata

import geopandas
import numpy
import pandas
import pooch
import pygeos
import osmnx as ox

from tqdm import tqdm

## Download data

Download FUA polygons. We are using _GHS functional urban areas, derived from GHS-UCDB R2019A (2015)_ from [GHSL - Global Human Settlement Layer](https://ghsl.jrc.ec.europa.eu/ghs_fua.php).

In [4]:
fua_path = "https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_FUA_UCDB2015_GLOBE_R2019A/V1-0/GHS_FUA_UCDB2015_GLOBE_R2019A_54009_1K_V1_0.zip"

fua_cache = pooch.retrieve(fua_path, known_hash="d54de59b82b8c4d64a710f90ccd554975a3be92233f14115ac154094c3549979")

Read polygons and continent geometry (built-in dataset in geopandas coming from Natural Earth).

In [13]:
fua = geopandas.read_file(f"{fua_cache}!GHS_FUA_UCDB2015_GLOBE_R2019A_54009_1K_V1_0.gpkg")
continents = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))

Attach continent information to the FUAs by matching ISO country codes.

In [16]:
fua = fua.merge(continents[["continent", "iso_a3"]], left_on="Cntry_ISO", right_on="iso_a3")
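The merge above uses pandas' default inner join, so any FUA whose country code has no match in the Natural Earth table is dropped without notice. A minimal sketch of how one might check for such rows, using toy tables with hypothetical values (the `name` column and the `XXX` code are made up for illustration):

```python
import pandas

# Toy stand-ins for the FUA table and the Natural Earth country table (hypothetical values).
fua_toy = pandas.DataFrame(
    {"name": ["A", "B", "C"], "Cntry_ISO": ["GBR", "CZE", "XXX"]}
)
countries_toy = pandas.DataFrame(
    {"continent": ["Europe", "Europe"], "iso_a3": ["GBR", "CZE"]}
)

# A left join with indicator=True keeps every FUA and flags those without a match.
checked = fua_toy.merge(
    countries_toy, left_on="Cntry_ISO", right_on="iso_a3", how="left", indicator=True
)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched["name"].tolist())
```

If the list is empty on the real data, the inner join loses nothing.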

Sample 25 FUAs from each continent.

In [19]:
sample = []
for continent in fua.continent.unique():
    sample.append(fua[fua.continent == continent].sample(25, random_state=42))
sample = pandas.concat(sample)
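The same per-continent sampling can probably be written more compactly with `DataFrameGroupBy.sample` (available in pandas 1.1 and later). The sketch below uses a toy table and a sample size of 2. It is not guaranteed to pick the same rows as the loop, because the random state is handled differently across groups.

```python
import pandas

# Toy table with two continents (hypothetical values).
toy = pandas.DataFrame(
    {"continent": ["Asia"] * 5 + ["Europe"] * 4, "eFUA_ID": range(9)}
)

# Draw a fixed number of rows from every group in one call.
sample_toy = toy.groupby("continent").sample(2, random_state=42)
counts = sample_toy["continent"].value_counts().to_dict()
print(counts)
```

Like the loop, this raises an error when a group has fewer rows than the requested sample size.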

Reproject the geometry to WGS84, which OSM requires, and repair invalid geometries if there are any.

In [20]:
sample = sample.to_crs(4326)
if not sample.is_valid.all():
    sample.geometry = sample.buffer(0)
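To illustrate what the `buffer(0)` repair does, here is a small sketch on a self-intersecting "bow-tie" polygon built directly with shapely (the coordinates are made up). The result is valid, but the repair may alter the shape, so it can be worth comparing areas before and after.

```python
from shapely.geometry import Polygon

# A self-intersecting ring: the two diagonals cross at (0.5, 0.5).
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])

# A zero-width buffer rebuilds the geometry and returns a valid result.
repaired = bowtie.buffer(0)

print(bowtie.is_valid, repaired.is_valid)
print(bowtie.area, repaired.area)
```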

Loop over the sampled FUAs and download their street network from OSM. This step may take some time.

In [56]:
# Filter warnings about GeoParquet implementation.
warnings.filterwarnings('ignore', message='.*initial implementation of Parquet.*')

# Define which combination of OSM tags should be used. This covers what is usually used in a morphological analysis.
type_filter = '["highway"~"living_street|motorway|motorway_link|pedestrian|primary|primary_link|residential|secondary|secondary_link|service|tertiary|tertiary_link|trunk|trunk_link"]'

# Loop over all samples
for ix, row in tqdm(sample.iterrows(), total=len(sample)):
    # Download OSM graph
    streets_graph = ox.graph_from_polygon(row.geometry, network_type='all_private', custom_filter=type_filter, retain_all=True)
    # Project graph to the local UTM zone (in meters with a relatively small error)
    streets_graph = ox.projection.project_graph(streets_graph)
    # Create an undirected graph to avoid duplicated geometry and convert it to a GeoDataFrame
    gdf = ox.graph_to_gdfs(ox.get_undirected(streets_graph), nodes=False, edges=True, node_geometry=False, fill_edge_geometry=True)
    # Ensure tags are strings and not another dtype (such as a list) so they can be saved
    gdf.highway = gdf.highway.astype(str)
    
    # Create a folder for the sample case
    os.makedirs(f"../data/{int(row.eFUA_ID)}", exist_ok=True)
    
    # Save the street network as a GeoParquet, keeping only the columns we need.
    path = f"../data/{int(row.eFUA_ID)}/roads_osm.parquet"
    gdf[['highway', 'geometry']].to_parquet(path)

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [4:50:11<00:00, 116.08s/it]
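Since the download loop runs for several hours, it may be useful to make it resumable, so that an interrupted run does not start from scratch. A minimal sketch, assuming a hypothetical helper `pending_ids` (not part of the original notebook) that lists the FUA IDs whose output file is still missing:

```python
import os
import tempfile


def pending_ids(fua_ids, data_dir, filename="roads_osm.parquet"):
    """Return the IDs whose output file does not exist yet under data_dir/<id>/."""
    return [
        i
        for i in fua_ids
        if not os.path.exists(os.path.join(data_dir, str(int(i)), filename))
    ]


# Demonstrate on a temporary folder where only FUA 1 has been downloaded.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "1"))
    open(os.path.join(tmp, "1", "roads_osm.parquet"), "w").close()
    todo = pending_ids([1.0, 2.0], tmp)

print(todo)
```

In the notebook, one could filter `sample` down to the rows whose `eFUA_ID` is in `pending_ids(sample.eFUA_ID, "../data")` before looping.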


Save sample boundaries containing names and continents for future reference.

In [57]:
sample.to_parquet("../data/sample.parquet")

## Measure shape characteristics

Polygonize the network to get polygons fully enclosed by street network geometry (sometimes called blocks, negative space, etc.) and measure their shape characteristics. For the 2-D scatter plot showing the _banana_ shape, we need polygon area and Reock (circular) compactness [Measure A1 in Altman (1998), cited for Frolov (1974), but earlier from Reock (1963)]. For the 1-D histogram, we derive a custom shape index from the polygon's area and the area of its minimum bounding circle.
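As a worked illustration of the two measures, the sketch below computes them for rectangles only, where the minimum bounding circle is known in closed form (its diameter is the rectangle's diagonal). The helper `rectangle_metrics` is hypothetical and exists only for this example. The notebook itself obtains the circle from the geometry. Reock compactness is the polygon area divided by the circle area, and the shape index is the polygon area divided by the square root of the circle area.

```python
import math


def rectangle_metrics(width, height):
    """Return (Reock compactness, shape index) of a width x height rectangle."""
    area = width * height
    # The minimum bounding circle of a rectangle has the diagonal as its diameter.
    radius = math.hypot(width, height) / 2
    circle_area = math.pi * radius ** 2
    reock = area / circle_area
    shape_index = area / math.sqrt(circle_area)
    return reock, shape_index


# Two blocks with the same area (10,000 square units) but different shapes.
square = rectangle_metrics(100, 100)
strip = rectangle_metrics(1000, 10)
print(square, strip)
```

The square reaches a Reock value of 2/π, while the elongated strip scores far lower on both measures despite having the same area. Note that Reock compactness is dimensionless, whereas the shape index carries units of length.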

In [59]:
# Filter warnings about GeoParquet implementation.
warnings.filterwarnings('ignore', message='.*initial implementation of Parquet.*')


# Loop over unique FUA IDs
for fua_id in tqdm(sample.eFUA_ID, total=len(sample)):
    # Read street network
    roads = geopandas.read_parquet(f"../data/{int(fua_id)}/roads_osm.parquet")

    # Polygonize street network
    polygons = pygeos.polygonize(roads.geometry.array.data)
    
    # Store geometries as a GeoDataFrame
    polygons = geopandas.GeoDataFrame(
        geometry=geopandas.GeoSeries(
            [polygons], crs=roads.crs
        ).explode(ignore_index=True)
    )

    # Extract the underlying PyGEOS geometry array (minimum_bounding_circle is not yet exposed in geopandas)
    ga = polygons.geometry.array.data

    # measure area
    polygons["area"] = polygons.area
    # generate minimum bounding circle
    polygons["bounding_circle"] = geopandas.GeoSeries(pygeos.minimum_bounding_circle(ga), crs=polygons.crs)
    # measure MBC's area
    polygons["circle_area"] = polygons["bounding_circle"].area
    # measure Reock (circular) compactness
    polygons["reock"] = polygons["area"] / polygons["circle_area"]
    # measure a direct shape index that captures the banana-like relationship between area and Reock compactness in one dimension
    polygons["shape_index"] = polygons["area"] / numpy.sqrt(polygons["circle_area"])
    
    # save polygons to a GeoParquet
    polygons.to_parquet(f"../data/{int(fua_id)}/polygons.parquet")

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [00:58<00:00,  2.55it/s]
