# SKAI is the limit 🚀
*Assessing Post-Disaster Damage 🏚️ from Satellite Imagery 🛰️ using Semi-Supervised Learning Techniques 📔*

*Contributors: Amine Baha (1), Joseph Xu (2), Jihyeon Lee (2), Tomer Shekel (2), Fiona Huang (1)*

*Co-developed by (1) WFP Innovation Accelerator and (2) Google Research AI, January 2023*

## Intro 🏹

WFP partnered with Google Research to set up **SKAI**, a humanitarian response mapping solution powered by artificial intelligence, i.e. an approach that combines statistical methods, data, and modern computing techniques to automate specific tasks. SKAI assesses damage to buildings by applying computer vision (algorithms that interpret information extracted from visual material) to, in this case, **satellite images of areas impacted by conflict, climate events, or other disasters**.

![Skai Logo](https://storage.googleapis.com/skai-public/skai_logo.png)

The type of machine learning used in SKAI learns from a small number of labeled images and a large number of unlabeled images of affected buildings. This ***semi-supervised learning technique*** reduces the required number of labeled examples by an order of magnitude: SKAI models typically *only need a couple hundred labeled examples* to achieve high accuracy, which significantly speeds up how quickly accurate results can be obtained.

In [June 2020](https://ai.googleblog.com/2020/06/machine-learning-based-damage.html), Google Research presented this novel application of semi-supervised learning (SSL), training damage-assessment models with a minimal amount of labeled data and a large amount of unlabeled data. Using state-of-the-art methods, including [MixMatch](https://arxiv.org/abs/1905.02249) and [FixMatch](https://arxiv.org/abs/2001.07685), they compared performance against a supervised baseline for the 2010 Haiti earthquake, the 2017 Santa Rosa wildfire, and the 2016 armed conflict in Syria.
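To make the two SSL ideas named above more concrete, here is a minimal NumPy sketch. It is not SKAI's training code. It shows MixMatch's temperature sharpening of guessed labels and FixMatch's confidence-thresholded pseudo-labeling loss on unlabeled images, assuming the inputs are raw classifier logits. The function names are hypothetical. The defaults T = 0.5 and τ = 0.95 are the values reported in the MixMatch and FixMatch papers.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def sharpen(probs: np.ndarray, temperature: float = 0.5) -> np.ndarray:
    """MixMatch-style sharpening: raise each probability to 1/T, then renormalize.

    Lower temperatures push the guessed label distribution toward one-hot.
    """
    p = probs ** (1.0 / temperature)
    return p / p.sum(axis=1, keepdims=True)


def fixmatch_unlabeled_loss(
    weak_logits: np.ndarray,
    strong_logits: np.ndarray,
    threshold: float = 0.95,
) -> float:
    """FixMatch-style consistency loss for one batch of unlabeled images.

    weak_logits and strong_logits are model outputs, shape (batch, num_classes),
    for a weakly and a strongly augmented view of the same images. Predictions
    on the weak view act as fixed targets (in a real training loop they would
    not receive gradients).
    """
    weak_probs = np.exp(log_softmax(weak_logits))
    pseudo_labels = weak_probs.argmax(axis=1)    # hard pseudo-labels
    mask = weak_probs.max(axis=1) >= threshold   # keep only confident ones
    strong_log_probs = log_softmax(strong_logits)
    rows = np.arange(len(pseudo_labels))
    cross_entropy = -strong_log_probs[rows, pseudo_labels]
    # Average over the full unlabeled batch; masked images contribute zero.
    return float(np.mean(cross_entropy * mask))


# Illustrative input: two unlabeled images, two classes (e.g. undamaged vs damaged).
weak = np.array([[4.0, 0.0], [0.2, 0.1]])
strong = np.array([[1.0, 0.5], [0.0, 3.0]])
print(fixmatch_unlabeled_loss(weak, strong))  # only the first image passes the 0.95 threshold
print(sharpen(np.array([[0.6, 0.4]])))
```

In FixMatch this term is added to the ordinary supervised cross-entropy on the labeled batch, weighted by a coefficient λ_u (set to 1 in the paper). That weighted sum is how the large unlabeled pool contributes to training alongside the small labeled set.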

![SSL Approach](https://storage.googleapis.com/skai-public/ssl_diagram.png)

The [paper](https://arxiv.org/abs/2011.14004) published by *Jihyeon Lee, Joseph Z. Xu, Kihyuk Sohn, Wenhan Lu, David Berthelot, Izzeddin Gur, Pranav Khaitan, Ke-Wei Huang, Kyriacos Koupparis, Bernhard Kowatsch* shows that models trained with SSL methods can reach fully supervised performance while using only a fraction of the labeled data.


## Notebook Setup 📓

**Before running this Colab notebook, we recommend initializing your kernel with the [Initialize SKAI XManager Colab Kernel](https://github.com/google-research/skai/blob/main/src/colab/Initialize_SKAI_XManager_Colab_Kernel.ipynb) notebook.**

In [None]:
#@title Please run this cell first!
import os
import datetime

#@markdown Specify the parameters to set up your Colab notebook. They should be the same as the ones used to initialize the Colab kernel.
#############################################
### CODE SETTING - ENVIRONMENT ACTIVATION ###
#############################################
#@markdown ---
#@markdown Please enter the path to the **git repository** and **colab workspace directory** to use:

#@markdown ---
SKAI_CODE_DIR = "/content/skai"  #@param {type:"string"}
SKAI_VENV_DIR = "/content/skai_env"  #@param {type:"string"}
SKAI_REPO = "https://github.com/google-research/skai.git"  #@param {type:"string"}
SKAI_BRANCH = "instadeep"  #@param {type:"string"}
SKAI_COMMIT = "" #@param {type:"string"}

# Derived paths.
root_filesys = os.path.dirname(SKAI_CODE_DIR)

pathsys_venv = SKAI_VENV_DIR
pathsys_actenv = os.path.join(pathsys_venv, 'bin/activate')  # virtualenv activation script

pathsys_skai = SKAI_CODE_DIR

# Start from a fresh clone of the requested branch.
%shell rm -rf {SKAI_CODE_DIR}
%shell git clone -b {SKAI_BRANCH} {SKAI_REPO} {SKAI_CODE_DIR}
# Optionally pin the checkout to a specific commit.
if SKAI_COMMIT:
  %shell cd {SKAI_CODE_DIR} ; git checkout {SKAI_COMMIT}

In [None]:
#@title Run XManager Train Job (with Vizier Hyperparameter Tuning) on Vertex AI

#@markdown Enter arguments for the training job
CONFIG_FILE = "skai_two_tower_config" #@param ["skai_config","skai_two_tower_config"]
DATASET_NAME = "skai_dataset" #@param {type:"string"}
DATASET_GCP_PATH = "gs://skai-data/hurricane_ian" #@param {type:"string"}
GCP_OUTPUT_DIR = "gs://skai-data/experiments/skai_train_vizier" #@param {type:"string"}
NUM_EPOCHS = 10 #@param {type:"integer"}
ACCELERATOR = "V100" #@param ["V100","T4"]
EXPERIMENT_NAME = "skai_train_vizier" #@param {type:"string"}

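# The bucket name is the first path component after the gs:// scheme
# (os.path.split would return the wrong value for nested paths).
GOOGLE_CLOUD_BUCKET_NAME = DATASET_GCP_PATH.replace("gs://", "", 1).split("/")[0]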


# Flags forwarded to the XManager launch script.
job_args = {
    'config': f"src/skai/model/configs/{CONFIG_FILE}.py",
    'config.data.tfds_dataset_name': DATASET_NAME,
    'config.data.tfds_data_dir': DATASET_GCP_PATH,
    'config.output_dir': GCP_OUTPUT_DIR,
    'config.training.num_epochs': NUM_EPOCHS,
    'accelerator': ACCELERATOR,
    'experiment_name': EXPERIMENT_NAME,
}

# Render the dict as "--key=value" command-line flags.
JOB_ARGS_STR = ' '.join(f"--{flag}={value}" for flag, value in job_args.items())

print(JOB_ARGS_STR)

sh = f"""
export GOOGLE_APPLICATION_CREDENTIALS=/root/service-account-private-key.json
export GOOGLE_CLOUD_BUCKET_NAME={GOOGLE_CLOUD_BUCKET_NAME}

cd {SKAI_CODE_DIR}
xmanager launch src/skai/model/xm_launch_single_model_vertex.py -- \
--xm_wrap_late_bindings \
--xm_upgrade_db=True \
--project_path={SKAI_CODE_DIR} \
--accelerator_count=1 {JOB_ARGS_STR}
"""

with open('script.sh', 'w') as file:
  file.write(sh)

%shell bash script.sh