* Include the command line instructions for preparing the splits
  - Possibly include a parameter for testing (just a few tiffs)
    - Is this tiff failing to reproject? LE07_147038_20080701.tif
  - !bash scripts/geo/setup_data.sh
  - Include as an actual code block
* Arguments to the notebook: what are the train / dev / test folders?
  - Where do we save the checkpoints?
  - Where do we save the logs?
  - Where do we save the final accuracy?
* Plot the data right before training (a few random slices)
* Call the train script as a bash command
  - !python3 -m train
* Load the SummaryWriter logs that have been saved, look at some predictions
* Load some of the checkpoints
  - Visualize predictions from them
  - Visualize predictions on tiffs within train / test?


In [None]:
# Load the notebook parameters from geo.yaml
import yaml

with open("geo.yaml", "r") as f:
    params = yaml.safe_load(f)
params

In [None]:
!rm -rf /mnt/blobfuse/glaciers/expers/geographic/masks # remove any existing mask files
!bash ../geo/setup_data.sh test # create the geographic splits in test mode

Now that we've set up the geographic splits, let's look at some example training images.

In [None]:
import matplotlib.pyplot as plt
import pathlib
import numpy as np
import random
%matplotlib inline

params["split_dir"] = pathlib.Path("/mnt/blobfuse/glaciers/expers/geographic/splits/1")
train_ims = list((params["split_dir"] / "train").glob("*img*npy"))
test_ims = list((params["split_dir"] / "test").glob("*img*npy"))
dev_ims = list((params["split_dir"] / "dev").glob("*img*npy"))

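K = 5  # number of random training slices to show
_, ax = plt.subplots(2, K, figsize=(15, 15))
for i, im in enumerate(random.sample(train_ims, k=K)):
    x = np.load(im)  # image slice
    y = np.load(str(im).replace("img", "mask"))  # matching mask
    ax[0, i].imshow(x)  # top row: images
    ax[1, i].imshow(y)  # bottom row: masks
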
plt.subplots_adjust(wspace=.05, hspace=0)
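
The plotting cell finds each mask by replacing "img" with "mask" in the whole path string, which would also rewrite a parent directory that happens to contain "img". A minimal sketch of a stricter lookup follows. The helper name `mask_path_for` and the example file name are hypothetical, and it assumes masks sit next to their images with the same naming pattern.

```python
import pathlib


def mask_path_for(img_path):
    """Return the mask path paired with an image slice path.

    Only the file name is rewritten, so an "img" substring in a parent
    directory is left untouched.
    """
    img_path = pathlib.Path(img_path)
    return img_path.with_name(img_path.name.replace("img", "mask"))


# Hypothetical file name, used only to illustrate the mapping
print(mask_path_for("/data/img_runs/train/slice_0_img_000.npy"))
```

In the loop above, `np.load(mask_path_for(im))` could stand in for the string replacement.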

The code below visualizes which slices are included in the training, development, and test sets.

In [None]:
import geopandas as gpd
params["slices_geojson"] = "/mnt/blobfuse/glaciers/expers/geographic/slices/slices.geojson"
slices_meta = gpd.read_file(params["slices_geojson"])
slices_meta

slices_meta["train_type"] = ""
# Sets give constant-time membership checks
train_str = {s.stem for s in train_ims}
dev_str = {s.stem for s in dev_ims}
test_str = {s.stem for s in test_ims}

for idx, row in slices_meta.iterrows():
    # The third column is read as the slice path, by position
    cur_slice = pathlib.Path(row.iloc[2]).stem
    if cur_slice in train_str:
        slices_meta.loc[idx, "train_type"] = "train"
    elif cur_slice in dev_str:
        slices_meta.loc[idx, "train_type"] = "dev"
    elif cur_slice in test_str:
        slices_meta.loc[idx, "train_type"] = "test"

slices_meta.plot(column="train_type", legend=True)
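
The labelling step can also be written without a row-by-row loop. The sketch below is one possible alternative, not the notebook's own method: the hypothetical helper `assign_split` takes the slice stems and a dict of split members, and returns one label per stem. Splits listed earlier in the dict take precedence, matching the train / dev / test order of the cell above.

```python
def assign_split(stems, split_members):
    """Label each stem with the first split that contains it.

    Stems that belong to no split get an empty string, mirroring the
    default "train_type" value used in the notebook.
    """
    lookup = {}
    for split, members in split_members.items():
        for stem in members:
            # setdefault keeps the label from the earliest split
            lookup.setdefault(stem, split)
    return [lookup.get(stem, "") for stem in stems]


# Toy stems, for illustration only
labels = assign_split(
    ["a", "b", "c", "d"],
    {"train": ["a"], "dev": ["b", "a"], "test": ["c"]},
)
print(labels)
```

The returned list has one entry per input stem, in order, so it could be assigned directly to a column such as `slices_meta["train_type"]`.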

Now we run training and look at the resulting models.

In [None]:
!python3 ../train.py -d /mnt/blobfuse/glaciers/expers/geographic/splits/1/ -c ../../conf/train.yaml  -p /mnt/blobfuse/glaciers/expers/geographic/splits/1/postprocess.yaml -r geo
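
The training command repeats the split directory in two arguments, so it may be easier to build it from a single value. A minimal sketch, under stated assumptions: the helper name `build_train_command` and its parameter names are hypothetical, only the flags `-d`, `-c`, `-p`, and `-r` come from the command above, and `postprocess.yaml` is assumed to sit inside the split directory as it does there.

```python
import pathlib
import shlex


def build_train_command(split_dir, conf_path, run_name, script="../train.py"):
    """Assemble the train.py argument list from one split directory."""
    split_dir = pathlib.Path(split_dir)
    return [
        "python3", script,
        "-d", str(split_dir),
        "-c", str(conf_path),
        "-p", str(split_dir / "postprocess.yaml"),
        "-r", run_name,
    ]


cmd = build_train_command(
    "/mnt/blobfuse/glaciers/expers/geographic/splits/1",
    "../../conf/train.yaml",
    "geo",
)
# Print the command as a single shell-quoted string for inspection
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)`. Note that `pathlib` drops the trailing slash from the split directory, which the original command includes.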

In [None]:
%load_ext tensorboard
%tensorboard --logdir=/mnt/blobfuse/glaciers/expers/geographic/splits/1/runs/geo/logs/