# Urban growth modeling in GRASS GIS: Parallel computing case study
The purpose of this notebook is to demonstrate several parallel computing principles and how they are implemented in GRASS GIS.
We use the FUTURES urban growth model, implemented as the GRASS GIS addon [r.futures](https://grass.osgeo.org/grass-stable/manuals/addons/r.futures.html).

This notebook uses a [prepared dataset](https://doi.org/10.5281/zenodo.6577922). This dataset is a GRASS GIS Location containing:
 * [NLCD 2001-2019](https://www.mrlc.gov/) (National Land Cover Database; land use and impervious surface descriptor)
 * [US county boundaries](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html)
 * [US-PAD protected areas](https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-data-overview)
 * [USGS DEM](https://www.usgs.gov/3d-elevation-program/about-3dep-products-services)
 
Population files were downloaded from [Zenodo](https://doi.org/10.5281/zenodo.6577903).

All the software is already installed and includes:
 * _GRASS GIS v8.3_ with the following addons: _r.futures, r.mapcalc.tiled, r.sample.category_
 * _R_ with the following packages: _lme4, optparse, MuMIn, snow_
 * _GNU Parallel_
 * _Python 3_ with the _pandas_ package
 

<div  class="alert alert-info">This notebook combines Python 3 and Bash cells. By default, a code cell runs Python.
We use IPython <a href="https://ipython.readthedocs.io/en/stable/interactive/magics.html#cell-magics">cell magics</a>, including <code>%%bash</code>, <code>%%python</code>, <code>%%time</code>, <code>%%timeit</code>, and <code>%%writefile</code>.</div>

## Setting up
The data is organized as follows:
```
~/data
├── grassdata
│   └── FUTURES_SE_USA (location)
│       ├── PERMANENT  (mapset)
│       └── tutorial   (mapset)
├── observed_population_SE_counties_2001-2019.csv
└── projected_population_SE_counties_2020-2100_SSP2.csv

```

Change the current directory to where we have input data:

In [None]:
import os

os.chdir(os.path.expanduser("~/data"))

Import Python packages and initialize GRASS GIS session:

In [None]:
import subprocess
import sys
import pathlib
import json
import pandas as pd
from IPython.display import Image

# Ask GRASS GIS where its Python packages are.
sys.path.append(
    subprocess.check_output(["grass", "--config", "python_path"], text=True).strip()
)

# Import GRASS packages
import grass.script as gs
import grass.jupyter as gj

# Start GRASS Session
session = gj.init("~/data/grassdata/", "FUTURES_SE_USA", "tutorial")

## Data preprocessing
List dataset layers:

In [None]:
%%bash
g.list type=raster,vector -p

### Process county boundaries
Extract counties in 3 Southeast US states (Georgia, North Carolina, South Carolina)
and convert the STATEFP and GEOID columns to integers, which will be useful for further processing.

In [None]:
%%bash
v.extract tl_2021_us_county output=counties where="STATEFP in ('13', '37', '45')" --q
v.db.addcolumn counties column="state integer" --q
v.db.addcolumn counties column="county integer" --q
v.db.update counties col=state qcol="CAST(STATEFP AS integer)" --q
v.db.update counties col=county qcol="CAST(GEOID AS integer)" --q
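
To make the conversion concrete, here is a small plain-Python sketch of what the two `CAST` calls produce, independent of GRASS. It assumes that county GEOIDs are five-character strings made of the two-digit state FIPS code followed by the three-digit county code; the sample values are illustrative inputs only.

```python
# Plain-Python sketch of the CAST(... AS integer) step (no GRASS required).
# The GEOID strings below are illustrative sample inputs.
geoids = ["13121", "37183", "45019"]

# Equivalent of CAST(GEOID AS integer): the whole code as one number.
county_ids = [int(g) for g in geoids]

# Equivalent of CAST(STATEFP AS integer): the first two digits are the state.
state_ids = [int(g[:2]) for g in geoids]

print(county_ids)
print(state_ids)
```

Integer identifiers matter later because the `county` column is written into raster cells, which hold numeric values rather than strings.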

In [None]:
m = gj.InteractiveMap()
m.add_vector(name="counties")
m.show()

Split and rasterize states for further parallelization steps:

In [None]:
states = [13, 37, 45]
gs.use_temp_region()
for state in states:
    gs.run_command(
        "v.extract",
        input="counties",
        where=f"state = {state}",
        output=f"state_{state}",
    )
    gs.run_command("g.region", vector=f"state_{state}", align="nlcd_2019")
    gs.run_command(
        "v.to.rast",
        input=f"state_{state}",
        output=f"state_{state}",
        use="attr",
        attribute_column="county",
    )
gs.del_temp_region()

In [None]:
m = gj.Map()
m.d_rast(map="state_37")
m.show()

Set the computational region to match the counties' extent and align it with the NLCD raster:

In [None]:
gs.run_command("g.region", vector="counties", align="nlcd_2019")
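
As a rough illustration of what aligning a region to a raster grid means, the following sketch snaps an extent outward to the nearest cell boundaries. The function `align_bounds`, the origin, and the 30-unit cell size are hypothetical and chosen for illustration; this is not the code `g.region` runs internally.

```python
import math


def align_bounds(west, south, east, north, origin_x, origin_y, res):
    """Snap an extent outward to a grid given by an origin and a cell size."""
    west_a = origin_x + math.floor((west - origin_x) / res) * res
    east_a = origin_x + math.ceil((east - origin_x) / res) * res
    south_a = origin_y + math.floor((south - origin_y) / res) * res
    north_a = origin_y + math.ceil((north - origin_y) / res) * res
    return west_a, south_a, east_a, north_a


# Hypothetical vector extent and a grid with 30-unit cells anchored at (0, 0).
aligned = align_bounds(1012, 2047, 1995, 2990, origin_x=0, origin_y=0, res=30)
print(aligned)
```

After snapping, every region edge falls on a cell boundary of the reference raster, so outputs computed in this region line up cell by cell with the NLCD data.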

### DEM to slope
Compute slope with [r.slope.aspect](https://grass.osgeo.org/grass-stable/manuals/r.slope.aspect.html), which uses OpenMP for parallelization. The `nprocs` option sets the number of threads:

In [None]:
gs.run_command(
    "r.slope.aspect", elevation="DEM", slope="slope", flags="e", nprocs=4
)

In [None]:
m = gj.Map()
m.d_rast(map="slope")
m.show()

### Protected land
Rasterize protected areas to later include them in a mask. We use GridModule to split the computation into tiles that are processed in parallel:

In [None]:
%%python
from grass.pygrass.modules.grid import GridModule

grid = GridModule(
    "v.to.rast",
    input="protected",
    output="protected",
    type="area",
    use="cat",
    processes=4,
    #patch_backend="r.patch",
    quiet=True,
)
grid.run()
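
The idea of splitting a region into tiles can be sketched in plain Python. The helper `split_into_tiles` and the tile sizes are hypothetical; GridModule chooses and patches its tiles with its own logic, which this sketch does not reproduce.

```python
def split_into_tiles(rows, cols, tile_rows, tile_cols):
    """Return (row_start, row_end, col_start, col_end) for each tile.

    End indices are exclusive; tiles on the last row or column may be smaller.
    """
    tiles = []
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            r1 = min(r0 + tile_rows, rows)
            c1 = min(c0 + tile_cols, cols)
            tiles.append((r0, r1, c0, c1))
    return tiles


# A 5 x 7 region cut into tiles of at most 3 x 4 cells.
tiles = split_into_tiles(rows=5, cols=7, tile_rows=3, tile_cols=4)
for tile in tiles:
    print(tile)
```

Each tile can be handed to a separate process, and the per-tile results are patched back together once all of them finish.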

In [None]:
m = gj.Map()
m.d_rast(map="protected")
m.show()

### Process NLCD data
We will derive most of our predictors from NLCD data (land cover type and impervious descriptor products). With r.reclass we create water, wetland, forest, road, and urban rasters.
Note that these rasters are virtual (they behave like regular rasters, but only point to the original NLCD raster),
so reclassification is very fast.

In [None]:
NLCD_years = [2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019]
NLCD_start_end_years = [2001, 2019]
# water (1 or no data)
gs.write_command(
    "r.reclass", input="nlcd_2019", output="water", rules="-", stdin="11 = 1"
)
# binary wetlands
gs.write_command(
    "r.reclass",
    input="nlcd_2019",
    output="wetlands",
    rules="-",
    stdin="90 95 = 1 \n * = 0",
)
for year in NLCD_years:
    gs.write_command(
        "r.reclass",
        input=f"nlcd_{year}",
        output=f"urban_{year}",
        rules="-",
        stdin="21 22 23 24 = 1\n* = 0",
    )
for year in NLCD_start_end_years:
    gs.write_command(
        "r.reclass",
        input=f"nlcd_{year}",
        output=f"forest_{year}",
        rules="-",
        stdin="41 42 43 = 1",
    )
    gs.write_command(
        "r.reclass",
        input=f"nlcd_descriptor_{year}",
        output=f"roads_{year}",
        rules="-",
        stdin="20 21 22 23 = 1",
    )
    gs.write_command(
        "r.reclass",
        input=f"nlcd_descriptor_{year}",
        output=f"urban_no_roads_{year}",
        rules="-",
        stdin="24 25 26 = 1\n* = 0",
    )
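
The reclassification rules above can be read as a lookup table. The following plain-Python sketch mimics two of the rule sets; `reclass` is a hypothetical helper, and `None` stands in for no data. It assumes, as the water comment in the cell indicates, that categories without a matching rule become no data unless a `*` rule supplies a default.

```python
def reclass(value, rules, default=None):
    """Map a category through explicit rules; unmatched values get the default."""
    return rules.get(value, default)


# Rules "21 22 23 24 = 1" followed by "* = 0": everything else becomes 0.
urban_rules = {21: 1, 22: 1, 23: 1, 24: 1}

# Rules "41 42 43 = 1" with no "*" rule: everything else becomes no data.
forest_rules = {41: 1, 42: 1, 43: 1}

cells = [11, 21, 41, 90]  # sample NLCD land cover categories
urban = [reclass(c, urban_rules, default=0) for c in cells]
forest = [reclass(c, forest_rules) for c in cells]

print(urban)
print(forest)
```

The difference matters downstream: the binary rasters (0/1) suit density calculations, while the 1/no-data rasters are the form used as input for the distance calculations.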

In Bash, we use background processing (appending `&` to a command) to compute the distance to water, forest, and roads in parallel, since these are independent computations. The `wait` command blocks until all background processes have finished.
Once the distances are computed, we use raster algebra to log-transform them.

In [None]:
%%bash
r.grow.distance input=water distance=dist_to_water -m --q &
r.grow.distance input=forest_2001 distance=dist_to_forest_2001 -m --q &
r.grow.distance input=forest_2019 distance=dist_to_forest_2019 -m --q &
r.grow.distance input=roads_2001 distance=dist_to_roads_2001 -m --q &
r.grow.distance input=roads_2019 distance=dist_to_roads_2019 -m --q &
wait
r.mapcalc "log_dist_to_water = log(dist_to_water + 1)" --q &
r.mapcalc "log_dist_to_forest_2001 = log(dist_to_forest_2001 + 1)" --q &
r.mapcalc "log_dist_to_forest_2019 = log(dist_to_forest_2019 + 1)" --q &
r.mapcalc "log_dist_to_roads_2001 = log(dist_to_roads_2001 + 1)" --q &
r.mapcalc "log_dist_to_roads_2019 = log(dist_to_roads_2019 + 1)" --q &
wait
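
The same start-everything-then-wait pattern can be sketched in Python with the standard `subprocess` module. The helper `run_in_background` is hypothetical, and the commands are short stand-ins so that the sketch runs without GRASS; in the notebook they would be independent GRASS commands such as the `r.grow.distance` calls.

```python
import subprocess
import sys


def run_in_background(commands):
    """Start all commands at once, then wait for every one to finish."""
    # Popen returns immediately, like appending '&' in Bash.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until the process exits, like the Bash 'wait' command.
    return [p.wait() for p in procs]


# Stand-in commands: three short sleeps executed concurrently.
commands = [
    [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    for seconds in (0.2, 0.1, 0.3)
]
return_codes = run_in_background(commands)
print(return_codes)
```

Because the processes run concurrently, the total time is close to that of the slowest command rather than the sum of all of them. The `grass.script` package offers `start_command`, which likewise returns a process object instead of waiting for the command to finish.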

In [None]:
m = gj.Map()
m.d_rast(map="log_dist_to_forest_2019")
m.show()

As another predictor, we compute wetland density (percentage of wetland in a 1 km squared circular neighborhood). The [r.neighbors](https://grass.osgeo.org/grass-stable/manuals/r.neighbors.html) module is internally parallelized using OpenMP, so we can use its `nprocs` option:

In [None]:
gs.run_command(
    "r.neighbors",
    input="wetlands",
    output="wetland_density",
    size=15,
    method="average",
    flags="c",
    nprocs=8,
)
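
To show what a circular moving-window average computes, here is a minimal pure-Python sketch on a tiny binary grid. The function `circular_mean` is hypothetical; the exact circle definition and edge handling in r.neighbors may differ, so treat this as an illustration of the idea only.

```python
def circular_mean(grid, size):
    """Average each cell's circular neighborhood, skipping cells off the grid."""
    radius = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = []
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    # Keep only offsets that fall inside the circle.
                    inside_circle = dr * dr + dc * dc <= radius * radius
                    rr, cc = r + dr, c + dc
                    if inside_circle and 0 <= rr < rows and 0 <= cc < cols:
                        values.append(grid[rr][cc])
            out[r][c] = sum(values) / len(values)
    return out


wetlands = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
density = circular_mean(wetlands, size=3)
for row in density:
    print(row)
```

Every output cell depends only on input cells near it, which is why this kind of operation parallelizes well across rows.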

FUTURES uses a special predictor called development pressure, computed with [r.futures.devpressure](https://grass.osgeo.org/grass-stable/manuals/addons/r.futures.devpressure.html), which is internally parallelized.
Since we need to compute it for 2 years, we use a hybrid approach: both commands run as background processes, and each of them also runs in parallel internally.
To do that, we split the available processes so that each r.futures.devpressure process gets half of them:

In [None]:
%%bash
r.futures.devpressure input=urban_no_roads_2001 output=devpressure_2001 size=15 gamma=0.5 nprocs=4 scaling_factor=0.1 &
r.futures.devpressure input=urban_no_roads_2019 output=devpressure_2019 size=15 gamma=0.5 nprocs=4 scaling_factor=0.1 &
wait
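
The split of the process budget can be written down as a one-line rule. The helper `procs_per_job` is hypothetical, and the budget of 8 processes is an assumption inferred from the two jobs with `nprocs=4` above.

```python
def procs_per_job(total_procs, n_jobs):
    """Give each concurrent job an equal share of processes, at least one."""
    return max(1, total_procs // n_jobs)


total = 8  # assumed process budget: 2 background jobs x nprocs=4
share = procs_per_job(total, n_jobs=2)
print(share)
```

Keeping the product of concurrent jobs and per-job processes at or below the number of available cores avoids oversubscription, where processes compete for the same cores and slow each other down.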

In [None]:
m = gj.Map()
m.d_rast(map="devpressure_2001")
m.show()

### Mask
Compute a mask that excludes water, protected areas, and everything outside the study area from the urban growth simulation. We use [r.mapcalc.tiled](https://grass.osgeo.org/grass-stable/manuals/addons/r.mapcalc.tiled.html), which evaluates the expression tile by tile in parallel:

In [None]:
gs.run_command(
    "r.mapcalc.tiled",
    expression="masking = if((isnull(protected) &&  isnull(water) && nlcd_2019), 1, null())",
    nprocs=12,
)
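
The per-cell logic of the mask expression can be sketched in plain Python, with `None` standing in for no data. The function `mask_cell` is hypothetical and reflects one reading of the expression: a cell is kept only when it is neither protected nor water and has a valid, nonzero NLCD value.

```python
def mask_cell(protected, water, nlcd):
    """Return 1 where growth is allowed, None (no data) elsewhere."""
    if nlcd is None:
        # No NLCD value: the cell lies outside the study area.
        return None
    allowed = protected is None and water is None and nlcd != 0
    return 1 if allowed else None


# (protected, water, nlcd) for a few sample cells
samples = [
    (None, None, 41),    # unprotected forest: allowed
    (7, None, 41),       # protected area: excluded
    (None, 1, 11),       # open water: excluded
    (None, None, None),  # outside the study area: excluded
]
mask_values = [mask_cell(*s) for s in samples]
print(mask_values)
```

Since each output cell depends only on the same cell in the inputs, the expression can be evaluated independently per tile.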

In [None]:
m = gj.Map()
m.d_rast(map="masking")
m.show()