# Downsampling a Copernicus 30m DEM

This notebook shows how to parameterize a notebook so that it can be run from the command line via `papermill` with custom inputs. It also exercises the basic features of an actual science algorithm notebook, in that it:

1. simulates a CPU- and memory-intensive algorithm by running `stress-ng` (for a random amount of time between `min_stress_time` and `max_stress_time` seconds)
1. creates a representative output dataset by using `gdalwarp` to downsample the input dataset
1. creates a browse image from the output dataset by using `convert` to further downsample the output and convert it to a PNG
1. creates a JSON metadata file extracted from the output product
1. aggregates all 3 of these output files into an output dataset directory representative of the output of this notebook

This first cell holds the input variables that we want to expose as papermill parameters. The cell needs to be tagged with `parameters` in order for papermill to recognize them.
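
As a rough sketch of what such a command-line call could look like, the snippet below assembles a `papermill` invocation that overrides the three parameters with `-p NAME VALUE` pairs. The notebook filenames and the helper `build_papermill_command` are hypothetical and chosen only for illustration; the parameter names are the ones defined in the cell that follows.

```python
import shlex


def build_papermill_command(notebook, output_notebook, parameters):
    """Build a papermill CLI invocation as an argument list."""
    cmd = ["papermill", notebook, output_notebook]
    for name, value in parameters.items():
        # Each -p NAME VALUE pair overrides one variable in the tagged cell.
        cmd += ["-p", name, str(value)]
    return cmd


cmd = build_papermill_command(
    "downsample.ipynb",         # hypothetical name for this notebook
    "downsample-output.ipynb",  # hypothetical name for the executed copy
    {
        "input_file": "inputs/Copernicus_DSM_COG_10_N21_00_W159_00_DEM.tif",
        "min_stress_time": 5,
        "max_stress_time": 10,
    },
)

# Print a shell-safe command line that could be pasted into a terminal.
print(shlex.join(cmd))
```

The list form can also be handed to `subprocess.run(cmd, check=True)` if the notebook should be launched from Python rather than from a shell.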

In [None]:
# this file should already exist (see `stage_in.ipynb`)
input_file = "inputs/Copernicus_DSM_COG_10_N21_00_W159_00_DEM.tif"

# stress-ng timeout bounds
min_stress_time = 15
max_stress_time = 30

This next cell determines the output product id and randomly picks how long to run `stress-ng` to simulate a CPU- and memory-heavy algorithm.

In [None]:
import os
import random

# create outputs directory (no error if it already exists)
outputs_dir = "outputs"
os.makedirs(outputs_dir, exist_ok=True)

# determine product id for output product
dataset_id = os.path.splitext(os.path.basename(input_file))[0] + "_downsampled"

# get amount of time to run stress-ng
stress_time = random.randint(int(min_stress_time), int(max_stress_time))
print(f"Running stress-ng for {stress_time} seconds...")

Here we run `stress-ng`. If you have access to the host it runs on, you can watch the resource utilization with `top` or `htop`.

In [None]:
%%bash -s "$stress_time"
stress-ng --cpu 32 --vm 32 --vm-bytes 80% --vm-method all --verify -t ${1}s -v

Finally we create the output product directory, downsample the input file to our output product file, generate the browse image from the output product file, and extract the metadata of our output product file to a JSON file.

In [None]:
%%bash -s "$dataset_id" "$input_file" "$outputs_dir"

# create output product directory (-p: no error if it already exists)
mkdir -p "$3/$1"

# create downsampled GeoTIFF (-overwrite lets the cell be re-run)
gdalwarp -overwrite -ts 1200 0 "$2" "$3/$1/$1.tif"

# create browse image
convert -resize 250x250 "$3/$1/$1.tif" "$3/$1/$1.browse.png"

# create metadata file
gdalinfo "$3/$1/$1.tif" -json > "$3/$1/$1.met.json"
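
As a small sanity check on the metadata file, the sketch below reads the raster dimensions from a parsed `gdalinfo -json` document, which stores them under the `size` key as `[width, height]`. The helper `raster_size` and the trimmed sample document are hypothetical stand-ins; in practice the input would be loaded from the `.met.json` file written above.

```python
import json


def raster_size(metadata):
    """Return (width, height) from a parsed `gdalinfo -json` document."""
    width, height = metadata["size"]
    return width, height


# Hypothetical, heavily trimmed stand-in for the contents of a .met.json file.
sample = json.loads('{"driverShortName": "GTiff", "size": [1200, 1200]}')

width, height = raster_size(sample)
print(f"{width} x {height} pixels")
```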

You should now have an output product directory:

In [None]:
print(f"Your output product directory is {outputs_dir}/{dataset_id}")

To match this output product directory, you can use either:

1. the glob pattern `outputs/*_downsampled`
2. the regex pattern `r'^outputs/\w+_downsampled'`
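
A minimal sketch of how the two patterns above behave, using only the Python standard library. The candidate path is built from the default `input_file`; the `outputs/scratch` path is a hypothetical non-matching example. Note that `fnmatch` lets `*` match `/` as well, unlike a shell glob, and that the regex has no trailing `$`, so it also accepts paths that continue past `_downsampled`.

```python
import fnmatch
import re

GLOB_PATTERN = "outputs/*_downsampled"
REGEX_PATTERN = r"^outputs/\w+_downsampled"

# Directory produced with the default input_file parameter.
candidate = "outputs/Copernicus_DSM_COG_10_N21_00_W159_00_DEM_downsampled"
# Hypothetical directory that should not be picked up.
other = "outputs/scratch"

# fnmatchcase compares case-sensitively on every platform.
glob_hit = fnmatch.fnmatchcase(candidate, GLOB_PATTERN)
regex_hit = re.match(REGEX_PATTERN, candidate) is not None

glob_miss = fnmatch.fnmatchcase(other, GLOB_PATTERN)
regex_miss = re.match(REGEX_PATTERN, other) is not None

print(f"candidate: glob={glob_hit}, regex={regex_hit}")
print(f"other:     glob={glob_miss}, regex={regex_miss}")
```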