# OpenMapFlow Tutorial

<img src="https://raw.githubusercontent.com/nasaharvest/openmapflow/main/assets/quick-map3.gif" width="80%"/>

### Sections
1. Installing OpenMapFlow
2. Exploring labeled earth observation data
3. Training a model
4. Running inference over a small region
5. Deploying the best model

### Prerequisites:
- GitHub account
- GitHub access token (obtained [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token))
- Forked OpenMapFlow repository
- Basic Python knowledge

### Editable Google Doc for Q&A:
https://docs.google.com/document/d/1Kp6MphER1G5tdLYeAzl4n19S10TweIxiYT64rXsjKm4/edit?usp=sharing

## 1. Clone GitHub repo and install OpenMapFlow

<img src="https://storage.googleapis.com/harvest-public-assets/openmapflow/title.png" width="70%"/>

In [None]:
!pip install "ipywidgets>=7,<8" -q # https://github.com/googlecolab/colabtools/issues/3020

In [None]:
from ipywidgets import HTML, Password, Text, Textarea, VBox
inputs = [
      Password(description="GitHub Token:"),
      Text(description="GitHub Email:"),
      Text(description="GitHub User:"),
]
VBox(inputs)

Cloning the OpenMapFlow repository gives access to the data that is already available.

Make sure you have created a fork of the repository before running the next cells.

In [None]:
token = inputs[0].value
email = inputs[1].value
username = inputs[2].value

github_url_input = Textarea(value=f'https://github.com/{username}/openmapflow.git')
VBox([HTML(value="<b>Github Clone URL</b>"), github_url_input])

In [None]:
from pathlib import Path

github_url = github_url_input.value
project_name = "crop-mask-example" # maize-example
country_name = "Togo" # Kenya

for input_value in [token, email, username, github_url]:
  if input_value.strip() == "":
    raise ValueError("Found input with blank value.")

path_to_project = f"{Path(github_url).stem}/{project_name}"

!git config --global user.email $email
!git config --global user.name $username
!git clone {github_url.replace("https://", f"https://{username}:{token}@")}

%cd {path_to_project}
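
The clone cell above relies on two string manipulations: `Path(github_url).stem` yields the folder name that `git clone` creates, and the `replace` call embeds the credentials in the HTTPS URL. A minimal sketch with placeholder values (the user name `octocat` and the token `ghp_example` are made up for illustration):

```python
from pathlib import Path

# Placeholder values for illustration only; substitute your own fork and token.
username = "octocat"
token = "ghp_example"
github_url = f"https://github.com/{username}/openmapflow.git"
project_name = "crop-mask-example"

# The stem of the last URL component is the folder that `git clone` creates.
repo_dir = Path(github_url).stem
path_to_project = f"{repo_dir}/{project_name}"

# Embed the credentials so that git can authenticate over HTTPS.
authenticated_url = github_url.replace("https://", f"https://{username}:{token}@")

print(path_to_project)
print(authenticated_url)
```

Git stores this URL, token included, as the remote of the cloned repository, so treat the cloned folder as containing a secret.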

In [None]:
!pip install openmapflow[all] -q
!pip install dvc[gs] cmocean -q

In [None]:
%%shell
# Install GDAL (the cell magic must be the first line of the cell)
GDAL_VERSION="3.6.4+dfsg-1~jammy0"
add-apt-repository -y ppa:ubuntugis/ubuntugis-unstable
apt-get -qq update
apt-get -qq install python3-gdal=$GDAL_VERSION gdal-bin=$GDAL_VERSION libgdal-dev=$GDAL_VERSION

In [None]:
# CLI
!openmapflow

## 2. Exploring labeled earth observation data 🛰️

<img src="https://storage.googleapis.com/harvest-public-assets/openmapflow/step1.png" width="70%"/>

In [None]:
# A Google Cloud Account is required to access the data
!gcloud auth application-default login

In [None]:
# Pull in data already available
!dvc pull

In [None]:
# See report of data already available
!openmapflow datasets

### Exploring labels

In [None]:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from datasets import datasets, label_col
from openmapflow.constants import LAT, LON, DATASET, SUBSET

In [None]:
# Load all datasets into a single dataframe
df = pd.concat([d.load_df(to_np=True) for d in datasets])
df.head()

In [None]:
# Plot map where labels should go
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.plot(facecolor="lightgray", figsize=(15, 15));

In [None]:
world

In [None]:
# Convert pandas dataframe to geopandas dataframe
gdf = gpd.GeoDataFrame(df)
gdf["geometry"] = [Point(xy) for xy in zip(gdf[LON], gdf[LAT])]

In [None]:
ax = world.plot(figsize=(20,20), facecolor="lightgray")
ax.set_title("Label Locations")
ax.axis('off')
gdf.plot(
    ax=ax,
    marker='o',
    categorical=True,
    markersize=1,
    column=DATASET,
    legend=True,
    legend_kwds={'loc': 'lower left'});

In [None]:
country = world[world["name"] == country_name]
ax = country.plot(figsize=(10,10), facecolor="lightgray")
ax.set_title("Label Locations by subset")
ax.axis('off')

points = gdf[gdf["country"] == country_name]
points.plot(
    ax=ax,
    marker='o',
    categorical=True,
    markersize=1,
    column="subset",
    legend=True,
    legend_kwds={'loc': 'lower left'});

### Exploring earth observation data

In [None]:
import matplotlib.pyplot as plt
from openmapflow.constants import MONTHS, EO_DATA
from openmapflow.bands import BANDS

In [None]:
# Get a label with positive class
positive_example = df[(df[label_col] == 1.0) & (df[SUBSET] == "validation")].iloc[0]
positive_example

In [None]:
# Load earth observation data for label
positive_example[EO_DATA].shape
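
The later cells index this array as `[timestep, band]`: `[10]` selects every band for one timestep, and `[:12, -1]` selects the last band for the first 12 timesteps. A small sketch of that indexing on a synthetic array; the shape `(24, 5)` and the band names are assumptions for illustration, not values read from the dataset:

```python
import numpy as np

# Hypothetical band names; the real list is openmapflow.bands.BANDS.
bands = ["B2", "B3", "B4", "B8", "NDVI"]
n_timesteps = 24

# Synthetic stand-in for positive_example[EO_DATA], shaped (timesteps, bands).
eo = np.arange(n_timesteps * len(bands), dtype=float).reshape(n_timesteps, len(bands))

one_timestep = eo[10]                  # all bands at timestep 10
last_band_year = eo[:12, -1]           # last band over the first 12 timesteps
b4_series = eo[:, bands.index("B4")]   # one named band across all timesteps

print(eo.shape, one_timestep.shape, last_band_year.shape, b4_series.shape)
```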

**Available earth observation bands**

<img src="https://storage.googleapis.com/harvest-public-assets/openmapflow/cropharvest_bands.png" width="80%"/>

In [None]:
fig, ax = plt.subplots(1,1, figsize=(15,5))
ax.bar(x=BANDS, height=positive_example[EO_DATA][10])
ax.set_title("Earth observation bands")
plt.xticks(rotation=45);

### ❗**Challenge**❗

Plot the NDVI (normalized difference vegetation index) for positive and negative example data over a one year period.
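
NDVI is defined as (NIR - Red) / (NIR + Red); for Sentinel-2 these are bands B8 (near-infrared) and B4 (red). The solution cell reads NDVI from the last band of the array rather than computing it, so the sketch here only illustrates the formula on made-up reflectance values:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index; 0 where both inputs are 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    # Avoid division by zero for pixels with no signal.
    return np.divide(nir - red, total, out=np.zeros_like(total), where=total != 0)

# Made-up reflectance values: dense vegetation, bare soil, no data.
nir = [0.50, 0.30, 0.0]
red = [0.10, 0.25, 0.0]
print(ndvi(nir, red))
```

Values close to 1 indicate dense green vegetation, while values near 0 are typical of bare soil, which is why the seasonal NDVI curve helps separate cropland from other land cover.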

In [None]:
fig, ax = plt.subplots(1,1, figsize=(10,5))
ax.set_title("NDVI")
plt.xticks(rotation=45)

positive_class_ndvi = positive_example[EO_DATA][:12, -1]
ax.plot(MONTHS, positive_class_ndvi, label="Positive class")

##########################################
negative_example = df[(df[label_col] == 0.0) & (df[SUBSET] == "validation")].iloc[0]
##########################################
negative_example_ndvi = negative_example[EO_DATA][:12, -1]
ax.plot(MONTHS, negative_example_ndvi, label="Negative class")

ax.legend()

gmap_url = "http://maps.google.com/maps?z=12&t=k&q=loc:"
print(f"Positive class: {gmap_url}{positive_example[LAT]}+{positive_example[LON]}")
print(f"Negative class: {gmap_url}{negative_example[LAT]}+{negative_example[LON]}")

## 3. Train a model 🏋️‍♂️

<img src="https://storage.googleapis.com/harvest-public-assets/openmapflow/step2.png" width="80%"/>

<img src="https://storage.googleapis.com/harvest-public-assets/openmapflow/train_model.png" width="80%" />

In [None]:
import os
os.environ["MODEL_NAME"] = input("MODEL_NAME=")

`train.py` can be opened in Colab directly using the sidebar.

In [None]:
!python train.py --model_name $MODEL_NAME --epoch 3

### ❗**Optional Challenge**❗

Try to improve the model by modifying `{project_name}/train.py` directly in Colab.

## 4. Inference over small region 🗺️

In [None]:
from openmapflow.train_utils import model_path_from_name
from openmapflow.config import PROJECT
from openmapflow.inference import Inference
from openmapflow.bands import DYNAMIC_BANDS
from tqdm.notebook import tqdm
from pathlib import Path
from datetime import date
import cmocean
import numpy as np
import rasterio as rio
import torch

In [None]:
tifs_dir = Path("/content/tifs")
preds_dir = Path("/content/preds")
tifs_dir.mkdir(exist_ok=True)
preds_dir.mkdir(exist_ok=True)

### Download example inference data

In [None]:
prefix = "gs://harvest-public-assets/openmapflow/Togo_2019_demo_2019-02-01_2020-02-01"
paths = [
  f"{prefix}/00000000000-0000000000.tif",
  f"{prefix}/00000000000-0000000256.tif",
  f"{prefix}/00000000256-0000000000.tif",
  f"{prefix}/00000000256-0000000256.tif"
]

for p in tqdm(paths):
  !gsutil -m cp {p} {tifs_dir}/{Path(p).name}

In [None]:
!gdalbuildvrt {tifs_dir}.vrt {tifs_dir}/*.tif
!gdal_translate -a_srs EPSG:4326 -of GTiff {tifs_dir}.vrt {tifs_dir}.tif

In [None]:
def normalize(array):
    array_min, array_max = array.min(), array.max()*0.5
    return ((array - array_min)/(array_max - array_min))

month = 2
rgb_indexes = [DYNAMIC_BANDS.index(b) for b in ["B4", "B3", "B2"]]
eo_data = rio.open(f"{tifs_dir}.tif")
# rasterio band indexes are 1-based, hence the + 1
colors = [eo_data.read(i + month*len(DYNAMIC_BANDS) + 1) for i in rgb_indexes]
normalized_colors = [normalize(c) for c in colors]
rgb = np.dstack(normalized_colors)
plt.figure(figsize=(10,10))
plt.title("Earth Observation data for one month")
plt.axis('off')
plt.imshow(rgb);
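
The `normalize` helper above is a min-max rescale whose upper bound is half the true maximum, which brightens the image; values above that bound exceed 1 and are clipped by `imshow` when it renders RGB floats. A sketch of the same idea with the brightening factor made explicit and the clipping done up front (the `brightness` parameter is an assumption for illustration, not part of the original helper):

```python
import numpy as np

def normalize(array, brightness=0.5):
    """Rescale to [0, 1], treating brightness * max as the white point."""
    array = np.asarray(array, dtype=float)
    lo, hi = array.min(), array.max() * brightness
    if hi <= lo:
        # Constant (or degenerate) input: return zeros instead of dividing by 0.
        return np.zeros_like(array)
    return np.clip((array - lo) / (hi - lo), 0.0, 1.0)

band = np.array([0.0, 1000.0, 2000.0, 4000.0])
print(normalize(band))
```

Lowering `brightness` makes dark scenes easier to see at the cost of saturating the brightest pixels.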

### Make predictions with model

In [None]:
model = torch.jit.load(model_path_from_name(os.environ["MODEL_NAME"]))
inference = Inference(model=model, normalizing_dict=None)
local_pred_paths = []
tifs = list(Path(tifs_dir).glob("*.tif"))
for local_tif_path in tqdm(tifs, desc="Making predictions"):
  local_pred_path = Path(f"{preds_dir}/pred_{local_tif_path.stem}.nc")
  inference.run(
      local_path=local_tif_path,
      dest_path=local_pred_path
  )
  local_pred_paths.append(local_pred_path)
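
Each input tile gets a prediction file named after it, which keeps the mapping between tiles and predictions explicit. A sketch of just the naming logic, without running the model (the helper name `pred_path_for` is hypothetical):

```python
from pathlib import Path

def pred_path_for(tif_path, preds_dir):
    """Return the NetCDF prediction path that corresponds to one input tile."""
    return Path(preds_dir) / f"pred_{Path(tif_path).stem}.nc"

tif = Path("/content/tifs/00000000000-0000000256.tif")
print(pred_path_for(tif, "/content/preds"))
```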

### Merge predictions into map

<img src="https://storage.googleapis.com/harvest-public-assets/openmapflow/merging_predictions.png" width="50%"/>

In [None]:
!gdalbuildvrt {preds_dir}.vrt {preds_dir}/*.nc
!gdal_translate -a_srs EPSG:4326 -of GTiff {preds_dir}.vrt {preds_dir}.tif

### Visualize predictions

In [None]:
# Visualize
predictions_map = rio.open(f"{preds_dir}.tif")
if "maize" in PROJECT:
  cmap = cmocean.cm.solar
elif "crop" in PROJECT:
  cmap = cmocean.cm.speed
else:
  cmap = cmocean.cm.thermal

plt.figure(figsize=(10,10))
plt.imshow(predictions_map.read(1).clip(0,1), cmap=cmap)
plt.title(f"Map Preview: {PROJECT}")
plt.colorbar(fraction=0.03, pad=0.04)
plt.axis("off");

## 5. [OPTIONAL] Deployment - Push to dvc and git

In [None]:
# Generate test metrics
!python evaluate.py --model_name $MODEL_NAME

In [None]:
# This will only work if you have been granted write bucket permissions.
!dvc commit -q
!dvc push

In [None]:
!git checkout -b "$MODEL_NAME"
!git add .
!git commit -m "$MODEL_NAME"
!git push --set-upstream origin "$MODEL_NAME"

Once the pull request is merged, the model will be deployed for map creation.

<img src="https://storage.googleapis.com/harvest-public-assets/openmapflow/step3.png" width="80%"/>