This notebook creates training patches for the landcover transfer learning task. We assume the raw landcover rasters have already been downloaded using the `landcover.ipynb` notebook. The overall process is as follows:

1. Build a VRT from all the landcover tiles

2. For each input x (LE7 imagery) tile...

  a. Read a window of the y (landcover) VRT with the same lat / long bounds as the x tile
  
  b. Extract patches of size 512 x 512 from x, which is the dimension used for model training

3. For each patch of size 512 x 512 from x ...

  a. Determine the corresponding pixel coordinates in y. The alignment is not exact, because x and y have different spatial resolutions

  b. Linearly interpolate y onto the dimension of x
  
  c. Write both x and y to file
  
4. Shuffle the patches into training, development, and test datasets. These will be preprocessed and then used for training.
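
Steps 3a and 3b can be sketched with plain NumPy. This is only an illustration, not the code inside `gt.patch_tile`, whose internals are not shown in this notebook. The function names `bounds_to_window` and `resize_bilinear` are hypothetical, and the sketch assumes a north-up raster with no rotation.

```python
import numpy as np


def bounds_to_window(bounds, origin, pixel_size):
    """Map geographic bounds to (row_start, row_stop, col_start, col_stop).

    Assumes a north-up raster (hypothetical setup for this sketch):
    origin = (x, y) of the upper-left corner, pixel_size = (width, height)
    in map units, both positive.
    """
    left, bottom, right, top = bounds
    x0, y0 = origin
    dx, dy = pixel_size
    # round outward so the window fully covers the requested bounds
    col_start = int(np.floor((left - x0) / dx))
    col_stop = int(np.ceil((right - x0) / dx))
    row_start = int(np.floor((y0 - top) / dy))
    row_stop = int(np.ceil((y0 - bottom) / dy))
    return row_start, row_stop, col_start, col_stop


def resize_bilinear(y, out_shape):
    """Linearly interpolate a 2D array onto out_shape = (rows, cols)."""
    in_rows, in_cols = y.shape
    out_rows, out_cols = out_shape
    # fractional source coordinates for each output pixel
    r = np.linspace(0, in_rows - 1, out_rows)
    c = np.linspace(0, in_cols - 1, out_cols)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, in_rows - 1)
    c1 = np.minimum(c0 + 1, in_cols - 1)
    # interpolation weights along rows and columns
    wr = (r - r0)[:, None]
    wc = (c - c0)[None, :]
    upper = y[np.ix_(r0, c0)] * (1 - wc) + y[np.ix_(r0, c1)] * wc
    lower = y[np.ix_(r1, c0)] * (1 - wc) + y[np.ix_(r1, c1)] * wc
    return upper * (1 - wr) + lower * wr
```

Because landcover values are class codes, linearly interpolated values can fall between two classes. If that matters downstream, nearest-neighbor resampling is a common alternative.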

Note that we are not writing any special metadata to file. We assume that we do not need to filter patches, since most regions will contain at least a few interesting classes (e.g., forest, water).

In [None]:
import rasterio
import matplotlib.pyplot as plt
import glacier_mapping.data.transfer as gt
from pathlib import Path
import numpy as np
import shutil
%matplotlib inline

landcover_dir = Path("/datadrive/glaciers/landcover")
input_folder = Path("/datadrive/glaciers/unique_tiles/warped")
out_dir = Path("/datadrive/glaciers/landcover/patches/")

if out_dir.exists():
    shutil.rmtree(out_dir)
    
out_dir.mkdir()

In [None]:
import subprocess

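def vrt_from_dir(input_dir, output_path="./output.vrt"):
    """Build a VRT mosaic from every GeoTIFF in input_dir using gdalbuildvrt."""
    inputs = [str(f) for f in Path(input_dir).glob("*.tif*")]
    # gdalbuildvrt takes the output VRT path followed by the input files
    subprocess.run(["gdalbuildvrt", str(output_path)] + inputs, check=True)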

vrt_from_dir(landcover_dir, landcover_dir / "landcover.vrt")
landcover = rasterio.open(landcover_dir / "landcover.vrt")

In [None]:
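# sort so that the tile index i in the output file names is reproducible
input_paths = sorted(input_folder.glob("*.tif*"))

# set SUBSET to an integer to process only the first few tiles
SUBSET = None
if SUBSET is not None:
    input_paths = input_paths[:SUBSET]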

# loop over tiles and write each (x, y) training pair to disk
for i, p in enumerate(input_paths):
    # the context manager closes each tile once its patches are written
    with rasterio.open(p) as tilef:
        pairs = gt.patch_tile(tilef, landcover)
        for j, (x, y) in enumerate(pairs):
            np.save(out_dir / f"x_{i:03}-{j:03}.npy", x)
            np.save(out_dir / f"y_{i:03}-{j:03}.npy", y)
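
Step 4 of the outline (shuffling into training, development, and test sets) is not shown in this notebook. Below is a minimal sketch, assuming the `x_*.npy` / `y_*.npy` naming used above. The function name `split_pairs`, the 80/10/10 fractions, and the `train` / `dev` / `test` subfolder names are hypothetical choices for illustration.

```python
import random
import shutil
from pathlib import Path


def split_pairs(patch_dir, fractions=(0.8, 0.1, 0.1), seed=0):
    """Move x/y patch pairs into train/dev/test subfolders of patch_dir."""
    patch_dir = Path(patch_dir)
    x_paths = sorted(patch_dir.glob("x_*.npy"))
    # a fixed seed keeps the split reproducible
    random.Random(seed).shuffle(x_paths)

    n = len(x_paths)
    n_train = int(fractions[0] * n)
    n_dev = int(fractions[1] * n)
    groups = {
        "train": x_paths[:n_train],
        "dev": x_paths[n_train:n_train + n_dev],
        "test": x_paths[n_train + n_dev:],  # whatever remains
    }

    for name, paths in groups.items():
        split_dir = patch_dir / name
        split_dir.mkdir(exist_ok=True)
        for x_path in paths:
            # the y patch shares the x patch's index suffix
            y_path = x_path.with_name("y_" + x_path.name[2:])
            shutil.move(str(x_path), str(split_dir / x_path.name))
            shutil.move(str(y_path), str(split_dir / y_path.name))
    return {name: len(paths) for name, paths in groups.items()}
```

Calling `split_pairs(out_dir)` would move every pair and return the number of pairs per split. Moving x and y together keeps each input with its label.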