# Introduction to Geospatial AI

# Intro
Welcome to this workshop about geospatial AI! In this workshop you will try to detect buildings in aerial images. This is done in three steps:

1. Creating training data.
2. Training machine learning models using data from step 1.
3. Evaluating the trained models and predicting where buildings are in images the models haven't seen before. 

We will be using Jupyter notebooks in Google Colab, but you don't need any prior experience with either to complete this workshop.

# Task 0 - Setup

But first, before we can do any of the fun stuff, we need to set up the environment properly. To do that, follow the steps below:

1. Create a Google account, for instance by creating a Gmail account. You can skip this if you already have one. Log into the account in your browser.
2. Head over to https://colab.research.google.com/ and click the `GitHub` tab. Search for the `kartAI` user and select the `kartAI/kartAI` repository. A notebook should appear. Click it and it should open in a new tab.
3. Save a copy of the notebook to your Drive. The shortcut is `Ctrl + s` on Windows or `Cmd + s` on Mac.
4. Change to a GPU runtime environment. In the top right corner choose `Change runtime type` and select `T4 GPU`.
5. Insert secrets. # TODO: Mayhaps
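
To confirm that step 4 took effect, one option is to check whether the NVIDIA tool `nvidia-smi` is present on the runtime and reports a GPU. This is only a sketch: it assumes that GPU runtimes ship `nvidia-smi` and CPU runtimes do not, and the helper name `gpu_runtime_available` is made up for this example.

```python
import shutil
import subprocess


def gpu_runtime_available() -> bool:
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(
        ["nvidia-smi", "--list-gpus"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Each detected device is printed on its own line starting with "GPU".
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU runtime detected:", gpu_runtime_available())
```

If this prints `False`, repeat step 4 and reconnect the runtime before continuing.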

Nice! What's left now is to clone the git repo we are working with, add it to the path, and install some dependencies. To do this, simply run the two cells below.

In [None]:
!git clone https://github.com/kartAI/kartAI.git

!pip install focal_loss
!pip install azure-storage-blob
!pip install rasterio
!pip install rasterstats

In [None]:
import sys
sys.path.insert(0,'/content/kartAI')

# Task 1 - Training Data
In the first task we create the training data. This is done by selecting an area you want to train on, then downloading aerial photos together with data about all the existing buildings in that area.

In the next cell, choose which area you want to train on. There are three areas to choose from. You can also define your own area: head over to https://geojson.io/ to find coordinates and train your model on your custom area. NB! We don't have a dataset covering all of Norway, so you might have missing data if you choose an area that's not covered.
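
A polygon drawn on geojson.io is expressed in longitude and latitude (WGS 84), while the `area` coordinates in this workshop appear to be in EPSG:25832, the CRS used by the visualization cell later on. The sketch below shows one way to convert a GeoJSON polygon to the `area` format. The helper names are hypothetical and the example polygon is made up.

```python
import json


def lonlat_corners(geojson_text):
    """Return the four (lon, lat) corners of the bounding box of the first feature."""
    feature = json.loads(geojson_text)["features"][0]
    ring = feature["geometry"]["coordinates"][0]  # outer ring of the polygon
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return [(lon, lat) for lon in (min(lons), max(lons)) for lat in (min(lats), max(lats))]


def to_area(corners, transformer):
    """Project the corners and return their bounding box in the `area` format."""
    projected = [transformer.transform(lon, lat) for lon, lat in corners]
    xs = [x for x, _ in projected]
    ys = [y for _, y in projected]
    return {"x_min": min(xs), "x_max": max(xs), "y_min": min(ys), "y_max": max(ys)}


def make_transformer():
    """Build a WGS 84 -> EPSG:25832 transformer (requires pyproj)."""
    from pyproj import Transformer
    # always_xy=True makes the transformer take (lon, lat) and return (x, y).
    return Transformer.from_crs("EPSG:4326", "EPSG:25832", always_xy=True)


# Made-up polygon, only to show the GeoJSON structure that geojson.io produces.
example = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[11.10, 60.12], [11.16, 60.12], [11.16, 60.14],
                             [11.10, 60.14], [11.10, 60.12]]],
        },
    }],
})
corners = lonlat_corners(example)
# In the notebook: area = to_area(corners, make_transformer())
```

All four corners are projected because a rectangle in longitude and latitude is slightly skewed in the projected CRS; taking the minimum and maximum afterwards keeps the whole polygon inside the area.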

In [None]:
# TODO: MALTE Set correct coordinates for the other areas

area = { "x_min": 618296.0, "x_max": 621495.0, "y_min": 6668145.0, "y_max": 6670133.0 }
# area = { "x_min": 618296.0, "x_max": 621495.0, "y_min": 6668145.0, "y_max": 6670133.0 }
# area = { "x_min": 618296.0, "x_max": 621495.0, "y_min": 6668145.0, "y_max": 6670133.0 }

# Task 1 - Training Data
When you have chosen the area, run the next cell to create the training data. The training data is downloaded as rasters (images) and split into a training, a validation and a test set. The model trains on the training set and, while training, is evaluated on the validation set. After training has finished it is evaluated on the test set, which is data the model has never seen before.
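
To make the three-way split concrete, here is a minimal sketch of how a list of rasters could be divided. It is not the kartAI implementation: the function name, the 80/10/10 ratios and the file names are assumptions chosen for illustration, and the actual split happens inside `create_training_data`.

```python
import random


def split_rasters(raster_names, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle the rasters and split them into training, validation and test lists."""
    shuffled = list(raster_names)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    validation = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    training = shuffled[n_val + n_test:]
    return training, validation, test


rasters = [f"raster_{i}.tif" for i in range(700)]  # hypothetical file names
training, validation, test = split_rasters(rasters)
print(len(training), len(validation), len(test))  # 560 70 70
```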

While downloading, the cell prints the total number of rasters it will download. The download is quite time consuming, so if you define a custom area, make sure it results in roughly 700 rasters.
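
To check a custom area before starting the download, you can estimate the number of rasters yourself. The sketch below assumes that each raster covers a 100 m by 100 m tile aligned to a 100 m grid, which is a guess based on the `100.0_100.0` part of the training data directory name used in the visualization cell. The function name is hypothetical.

```python
import math


def estimate_raster_count(area, tile_size=100.0):
    """Count the grid tiles that touch the area, assuming tiles aligned to multiples of tile_size."""
    first_col = math.floor(area["x_min"] / tile_size)
    last_col = math.ceil(area["x_max"] / tile_size)
    first_row = math.floor(area["y_min"] / tile_size)
    last_row = math.ceil(area["y_max"] / tile_size)
    return (last_col - first_col) * (last_row - first_row)


area = {"x_min": 618296.0, "x_max": 621495.0, "y_min": 6668145.0, "y_max": 6670133.0}
print(estimate_raster_count(area))  # 693, close to the recommended 700
```

If the estimate for your own area is far above 700, shrink the bounding box before running the download.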

In [None]:
from kartAI.kartai.tools.create_training_data import create_training_data

create_training_data(
    training_dataset_name="test", 
    config_file_path="kartAI/config/dataset/bygg.json", 
    eager_load=True,
    confidence_threshold=None, 
    eval_model_checkpoint=None,
    region=None, 
    x_min=area["x_min"], 
    x_max=area["x_max"], 
    y_min=area["y_min"], 
    y_max=area["y_max"],
    num_processes=None                 
)

# Task 1 - Training Data
After downloading the data you can visualize it in the next cell. Make sure the path to the training data is correct. Setting the correct coordinates for the starting view of the map also helps you find the data faster.

In [None]:
import folium
import rasterio
import os

from pyproj import CRS, Transformer

path_to_dir = "/content/training_data/OrtofotoWMS/25832_563000.0_6623000.0_100.0_100.0/512/" # TODO: Insert correct path to the training data.
files = os.listdir(path_to_dir)
files.sort()

crs_25832 = CRS.from_epsg(25832)
crs_4326 = CRS.from_epsg(4326)
transformer = Transformer.from_crs(crs_25832, crs_4326)

fig = folium.Figure(width=800, height=400)
m = folium.Map(
    location=transformer.transform(618200.0, 6669700), # TODO: Insert the correct coordinates for the starting view of the map.
    zoom_start=14
)

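# Overlay the first five rasters (or fewer, if fewer were downloaded) on the map.
for file_name in files[:5]:
    with rasterio.open(os.path.join(path_to_dir, file_name)) as src:
        img = src.read()
        # Convert the raster bounds from EPSG:25832 to lat/lon for folium.
        transformed_bottom_left = transformer.transform(src.bounds.left, src.bounds.bottom)
        transformed_top_right = transformer.transform(src.bounds.right, src.bounds.top)
    # rasterio returns (bands, rows, cols); folium expects (rows, cols, bands).
    m.add_child(folium.raster_layers.ImageOverlay(img.transpose(1, 2, 0), bounds=[transformed_bottom_left, transformed_top_right]))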

fig.add_child(m)

# Task 2 - Machine Learning Model
After creating and visualizing the training data we are ready to train our model! We won't go into too much detail about machine learning theory, but if you are familiar with it you can experiment a bit. The default configuration should get you a decent model though.

In the next cell you can tweak some hyperparameters, but make sure the training doesn't take too long. The default configuration should take about 15 minutes to run.
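
As a rough guide to how `batch_size` and `epochs` affect training time, the number of weight updates is the number of batches per epoch times the number of epochs. The sketch below uses a made-up training set size of 560 rasters. The function name is hypothetical, and the time per step depends on the model and the GPU.

```python
import math


def total_training_steps(num_training_rasters, batch_size, epochs):
    """Return (steps per epoch, total steps) for one full training run."""
    steps_per_epoch = math.ceil(num_training_rasters / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total_steps = total_training_steps(560, batch_size=8, epochs=20)
print(steps_per_epoch, total_steps)  # 70 1400
```

Doubling `epochs` doubles the total number of steps, so it is the first value to lower if training takes too long.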

While training, some statistics about the training are shown. They can be a little confusing. These are:

 - Loss: A measurement of how wrong the model is. The lower the loss is, the better. If the loss is 0, the model is "perfect". A model tries to minimize this value.
 - Binary Accuracy: The fraction of pixels the model classifies correctly, counting both building and non-building pixels. It's a number between 0 and 1, where higher is better. 1 means every pixel is classified correctly. But keep in mind that most pixels in an image are usually not buildings, so a model can get a high accuracy while still missing many buildings...
 - IoU: Intersection over Union. The area where the predicted buildings overlap with the actual buildings, divided by the total area covered by either of them. It's a number between 0 and 1, where higher is better. 1 means the predicted building pixels match the actual building pixels "perfectly".
 - IoU_fz: 
 - IoU_point_[5-9]: 
 - val_x: The validation equivalent of whatever x is. X could be loss, IoU, etc.
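
The difference between binary accuracy and IoU is easiest to see on a tiny example. The sketch below computes both in plain Python for a made-up 4 by 4 image, assuming the standard definitions and a prediction threshold of 0.5. It only illustrates the definitions and is not the code the training loop uses.

```python
def labelled_pairs(truth, pred, threshold=0.5):
    """Pair every label with its thresholded prediction (1 = building, 0 = background)."""
    return [
        (t, 1 if p > threshold else 0)
        for t_row, p_row in zip(truth, pred)
        for t, p in zip(t_row, p_row)
    ]


def binary_accuracy(truth, pred, threshold=0.5):
    """Fraction of pixels whose thresholded prediction equals the label."""
    pairs = labelled_pairs(truth, pred, threshold)
    return sum(t == m for t, m in pairs) / len(pairs)


def iou(truth, pred, threshold=0.5):
    """Overlap between predicted and actual building pixels divided by their union."""
    pairs = labelled_pairs(truth, pred, threshold)
    intersection = sum(t == 1 and m == 1 for t, m in pairs)
    union = sum(t == 1 or m == 1 for t, m in pairs)
    return intersection / union if union else 1.0


# One 2 by 2 building in the middle of the image.
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# The model is confident about the top half of the building only.
pred = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.3, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(binary_accuracy(truth, pred))  # 0.875
print(iou(truth, pred))  # 0.5
```

Only half of the building is found, yet 14 of the 16 pixels are classified correctly, so the accuracy is 0.875 while the IoU is only 0.5.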

In [None]:
from kartAI.kartai.tools.train import train

train_args = {
      "features": 32,
      "depth": 4,
      "optimizer": "RMSprop",
      "batch_size": 8,
      "model": "unet",
      "loss": "binary_crossentropy",
      "activation": "relu",
      "epochs": 20
}


train(
      checkpoint_name="some_checkpoint",
      dataset_name=["test"],
      input_generator_config_path="kartAI/config/ml_input_generator/ortofoto.json",
      save_model=False,
      train_args=train_args,
      checkpoint_to_finetune=False
)


# Task 3 - Evaluation and Inference
For the last part we will use our trained machine learning model and try to find buildings in a new set of images it hasn't seen so far. The next cell runs predictions on the test portion of the downloaded training data, which is data the model was not trained on. Some statistics from the predictions will be printed.



In [None]:
# TODO: Should we include this?

import os
import json
from kartAI.env import get_env_variable
from kartAI.kartai.tools.predict import predict_and_evaluate

created_datasets_dir = os.path.join(get_env_variable(
    'created_datasets_directory'), "test")

checkpoint_path = os.path.join(get_env_variable(
    'trained_models_directory'), 'some_checkpoint.h5')

with open("kartAI/config/ml_input_generator/ortofoto.json", encoding="utf8") as config:
    datagenerator_config = json.load(config)

predict_and_evaluate(
    created_datasets_dir,
    datagenerator_config,
    "some_checkpoint",
    True,
    True
)

# Task 3 - Evaluation and Inference
Next we use the trained model to find buildings in a new region. The next cell runs predictions on the region defined in `small_test_region.json` and turns the predicted rasters into vector building polygons.

In [None]:
from kartAI.kartai.dataset.create_building_dataset import produce_vector_buildings, run_ml_predictions
from kartAI.kartai.utils.config_utils import read_config
from kartAI.kartai.utils.crs_utils import get_projection_from_config_path
from kartAI.kartai.utils.geometry_utils import parse_region_arg
from kartAI.kartai.utils.prediction_utils import get_raster_predictions_dir, get_vector_predictions_dir

geom = parse_region_arg("kartAI/training_data/regions/small_test_region.json")

projection = get_projection_from_config_path("kartAI/config/dataset/bygg.json")

config = read_config("kartAI/config/dataset/bygg.json")

run_ml_predictions(
    "some_checkpoint", 
    "small_test_region", 
    projection,
    config=config, 
    geom=geom, 
    batch_size=200, 
    skip_data_fetching=False,
    save_to="local", 
    num_processes=1
)

vector_output_dir = get_vector_predictions_dir("small_test_region", "some_checkpoint")
raster_predictions_path = get_raster_predictions_dir("small_test_region", "some_checkpoint")

produce_vector_buildings(
    vector_output_dir, 
    raster_predictions_path, 
    config, 
    200, 
    "small_test_region_some_checkpoint", 
    save_to="local"
)

In [None]:
import folium
import geopandas as gp

polygon_25832 = gp.read_file("results/small_test_region/some_checkpoint/vector/raw_predictions_0.json")
polygon_4326 = polygon_25832.to_crs(4326)

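fig = folium.Figure(width=800, height=400)
# `transformer` was defined in the Task 1 visualization cell; run that cell first.
# The map is named `building_map` to avoid shadowing Python's built-in `map`.
building_map = folium.Map(location=transformer.transform(618200.0, 6669700), zoom_start=14)
folium.GeoJson(data=polygon_4326["geometry"]).add_to(building_map)
fig.add_child(building_map)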

In [None]:
from kartAI.kartai.dataset.create_building_dataset import run_ml_predictions
from kartAI.kartai.tools.predict import create_contour_result
from kartAI.kartai.utils.config_utils import read_config
from kartAI.kartai.utils.crs_utils import get_projection_from_config_path
from kartAI.kartai.utils.geometry_utils import parse_region_arg
from kartAI.kartai.utils.prediction_utils import get_contour_predictions_dir, get_raster_predictions_dir

geom = parse_region_arg("kartAI/training_data/regions/small_test_region.json")

projection = get_projection_from_config_path("kartAI/config/dataset/bygg.json")

config = read_config("kartAI/config/dataset/bygg.json")

run_ml_predictions(
    "some_checkpoint", 
    "small_test_region", 
    projection,
    config=config, 
    geom=geom, 
    batch_size=200, 
    skip_data_fetching=False,
    save_to="local", 
    num_processes=1
)

raster_output_dir = get_raster_predictions_dir("small_test_region", "some_checkpoint")
contour_output_dir = get_contour_predictions_dir("small_test_region", "some_checkpoint")

print("---> Creating contour dataset from rasters")

contour_levels = [0.3, 0.4, 0.5, 0.6, 0.8, 0.9, 1]
create_contour_result(raster_output_dir, contour_output_dir, projection, contour_levels)

print("==== Contour dataset created ====")


In [None]:
import folium
import geopandas as gp

# NB! Rendering more than about 5000 LineStrings might crash the map, so only the first 2500 are shown.
contour_25832 = gp.read_file("results/small_test_region/some_checkpoint/contour/complete_contour.json")
contour_4326 = contour_25832.to_crs(4326)[0:2500]

figure = folium.Figure(width=800, height=400)
# `transformer` was defined in the Task 1 visualization cell; run that cell first.
contour_map = folium.Map(location=transformer.transform(618200.0, 6669700), zoom_start=14)
folium.GeoJson(data=contour_4326["geometry"]).add_to(contour_map)
figure.add_child(contour_map)