<a href="https://colab.research.google.com/github/lgiesen/forest-height/blob/main/forest_height.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Forest Height Estimation
Objectives of this notebook:
- Train model
- Evaluate model
- Export model for prediction

Prerequisites:
- Training data is generated in [data_exploration.ipynb](https://github.com/lgiesen/forest-height/blob/main/data_exploration.ipynb)
- Prediction on the dataset is implemented in a Python file (to be created) and tested in a Jupyter notebook (to be created)

### To Dos

- Data augmentation (at minimum, rotate each image by 90°, 180° and 270°)
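
The rotation-based augmentation in the to-do above could look roughly like the sketch below. It assumes channel-first arrays (image `(C, H, W)`, mask `(1, H, W)`), as in the shapes printed later in this notebook. The function name `rotations` and the toy data are made up for illustration.

```python
import numpy as np

def rotations(image, mask):
    """Return the original (image, mask) pair plus its 90°, 180° and 270° rotations.

    Assumes channel-first arrays: image (C, H, W) and mask (1, H, W).
    """
    pairs = []
    for k in range(4):  # k quarter-turns; k=0 is the unrotated original
        # rotate in the spatial plane (axes 1 and 2), leaving the channel axis alone
        pairs.append((np.rot90(image, k, axes=(1, 2)),
                      np.rot90(mask, k, axes=(1, 2))))
    return pairs

# toy example with random data standing in for a real tile
rng = np.random.default_rng(0)
image = rng.random((10, 8, 8))
mask = rng.random((1, 8, 8))
augmented = rotations(image, mask)
print(len(augmented), augmented[1][0].shape)
```

Image and mask are always rotated together so that every pixel keeps its matching height label.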

### Tasks from Presentation
- Split the dataset into a training and validation set. Train a first regression model on the provided training dataset.
- Try out different architectures and hyperparameters.
- Now, you are supposed to apply your model to the test set. First, you have to implement the sliding window approach in combination with non-max suppression. Note: Instead of choosing the non-max suppression, you can choose a different approach or come up with your own.
- After you have found your best performing setup, apply your model to the unlabeled data set. You can check for plausibility by visually inspecting the output or choose to reuse the predictions for increasing the number of training observations.
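
Since the task allows an alternative to non-max suppression, one option is to average the predictions of overlapping windows. The sketch below does this under some assumptions: the window size and stride are arbitrary choices, and `dummy_model` is a hypothetical stand-in for a trained model that maps a `(C, window, window)` patch to a `(1, window, window)` height map.

```python
import numpy as np

def sliding_window_predict(image, predict_fn, window=256, stride=128):
    """Predict a (1, H, W) height map for a (C, H, W) image, window by window.

    Overlapping predictions are averaged. Assumes H and W are at least `window`
    and that (H - window) and (W - window) are multiples of `stride`.
    """
    _, height, width = image.shape
    total = np.zeros((1, height, width), dtype=np.float64)
    count = np.zeros((1, height, width), dtype=np.float64)
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            patch = image[:, top:top + window, left:left + window]
            total[:, top:top + window, left:left + window] += predict_fn(patch)
            count[:, top:top + window, left:left + window] += 1
    return total / count

# hypothetical stand-in for a trained model: predicts the mean over all bands
def dummy_model(patch):
    return patch.mean(axis=0, keepdims=True)

image = np.random.default_rng(0).random((10, 1024, 1024))
prediction = sliding_window_predict(image, dummy_model)
print(prediction.shape)
```

Averaging smooths the seams between neighbouring windows, at the cost of running the model several times per pixel.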

- Implement one ML model (Logistic Regression, Boosted Trees, Random Forest, ...) and one CNN model, then compare them in a poster (I'd use Figma as a tool)
- Submission: a script that automatically produces a binary NumPy output file (.npy) for every test image.


### Submission
- The npy-files should have the same height and width as the original satellite image. Your submission should therefore be a zip file containing multiple npy-files of size [1 × height × width]
- We will use the Mean Absolute Error (MAE) to measure your performance on the hidden test set. For this, we will provide further information at a later time.
- Deadline: 4th July, 11:59pm

- ZIP-file with all the predictions as described above
- ZIP-file with your source code (only the files that are used in your final product)
- A poster (A1-size) as pdf-file
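
As a rough illustration of the evaluation metric and the expected output format, the sketch below computes the Mean Absolute Error on toy arrays and writes one prediction as an npy-file. The file name `prediction_000.npy` and the temporary output directory are placeholders, not the naming scheme required for the submission.

```python
import os
import tempfile
import numpy as np

def mean_absolute_error(prediction, target):
    """Mean of the absolute pixel-wise differences."""
    return float(np.mean(np.abs(prediction - target)))

# toy arrays standing in for a predicted and a true height map
target = np.array([[[10.0, 20.0], [30.0, 40.0]]])      # shape (1, 2, 2)
prediction = np.array([[[12.0, 18.0], [30.0, 44.0]]])  # shape (1, 2, 2)
mae = mean_absolute_error(prediction, target)
print(mae)  # (2 + 2 + 0 + 4) / 4 = 2.0

# write the prediction as an npy-file; the file name is a placeholder
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'prediction_000.npy')
np.save(out_path, prediction)
restored = np.load(out_path)
print(restored.shape)
```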

In [1]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
root_path = 'drive/MyDrive/Colab Notebooks/data/'

Mounted at /content/drive


In [2]:
path_images = root_path + 'images/'
path_masks = root_path + 'masks/'

In [7]:
import os

def path_exists(path):
  """Return True if `path` (relative to root_path) exists."""
  return os.path.exists(root_path + path)

In [8]:
import numpy as np
# load exemplary data
sat_path = 'images/image_004.npy'
if path_exists(sat_path):
  satellite = np.load(root_path + sat_path)
  print('satellite:', satellite.shape)
mask_path = 'masks/mask_004.npy'
if path_exists(mask_path):
  mask = np.load(root_path + mask_path)
  # print the shape of the mask, not of the satellite image
  print('mask:', mask.shape)

satellite: (10, 1024, 1024)
mask: (1, 1024, 1024)


In [9]:
import matplotlib.pyplot as plt
def plot_img(img, is_satellite = True):
    #shape: satellite == (10, 1024, 1024), mask == (1, 1024, 1024)
    if is_satellite:
      # Extract Red, Green, and Blue bands
      red = img[2, :, :]
      green = img[1, :, :]
      blue = img[0, :, :]

      # Normalize the bands to [0, 1] range
      red_norm = (red - red.min()) / (red.max() - red.min())
      green_norm = (green - green.min()) / (green.max() - green.min())
      blue_norm = (blue - blue.min()) / (blue.max() - blue.min())
    
      # Stack the bands to create an RGB image
      scaled_img = np.stack((red_norm, green_norm, blue_norm), axis=-1)
    
    else:
      # scale the array passed in as `img` (not the global `mask`) to [0, 1]
      scaled_img = (img - np.min(img)) / (np.max(img) - np.min(img))
      # TODO: scale with total max and min of all masks for comparability
      scaled_img = np.squeeze(scaled_img) # remove redundant dimension

    # Plot the image
    plt.figure(figsize=(10, 10))
    plt.imshow(scaled_img)
    plt.axis('off')
    plt.show()

## Data Preparation

The individual npy files are combined into a single dataset. This only has to be done once; afterwards, the combined dataset is loaded directly.
Each 1024x1024 image might need to be split into smaller tiles of 256x256 or 512x512 pixels, which are then put back together at the end.
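
Splitting an image into tiles and putting them back together could be sketched as follows. The helper names `split_into_tiles` and `merge_tiles` are made up for this example, and the code assumes channel-first arrays whose height and width are multiples of the tile size.

```python
import numpy as np

def split_into_tiles(array, tile=256):
    """Split a (C, H, W) array into non-overlapping (C, tile, tile) tiles.

    Assumes H and W are multiples of `tile`. Tiles are returned row by row
    with shape (n_tiles, C, tile, tile).
    """
    channels, height, width = array.shape
    rows, cols = height // tile, width // tile
    tiles = array.reshape(channels, rows, tile, cols, tile)
    return tiles.transpose(1, 3, 0, 2, 4).reshape(rows * cols, channels, tile, tile)

def merge_tiles(tiles, height, width):
    """Inverse of split_into_tiles: rebuild the (C, H, W) array."""
    _, channels, tile, _ = tiles.shape
    rows, cols = height // tile, width // tile
    grid = tiles.reshape(rows, cols, channels, tile, tile)
    return grid.transpose(2, 0, 3, 1, 4).reshape(channels, height, width)

# random data standing in for one satellite image
example = np.random.default_rng(0).random((10, 1024, 1024))
tiles = split_into_tiles(example, tile=256)
rebuilt = merge_tiles(tiles, 1024, 1024)
print(tiles.shape, rebuilt.shape)
```

The same two helpers work for masks, since a mask is just the single-channel case `(1, H, W)`.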

In [10]:
# load dataset
path_train_sat = root_path + "train_satellite.npy"
path_train_masks = root_path + "train_masks.npy"

train_sat = np.load(path_train_sat, allow_pickle=True)
train_masks = np.load(path_train_masks, allow_pickle=True)

In [11]:
# remove drive connection as it is no longer needed
drive.flush_and_unmount()

## Training
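
Before any model is trained, the dataset is split into a training and a validation set. A minimal sketch of such a split is shown below. It assumes that `train_sat` and `train_masks` are regular arrays with the samples along the first axis; the toy arrays, the 80/20 ratio and the seed are illustrative choices only.

```python
import numpy as np

def train_val_split(images, masks, val_fraction=0.2, seed=42):
    """Shuffle the samples and split them along the first axis."""
    n_samples = len(images)
    order = np.random.default_rng(seed).permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return images[train_idx], masks[train_idx], images[val_idx], masks[val_idx]

# toy data standing in for train_sat / train_masks; shapes are assumptions
images = np.arange(20 * 10 * 4 * 4, dtype=np.float32).reshape(20, 10, 4, 4)
masks = np.arange(20 * 1 * 4 * 4, dtype=np.float32).reshape(20, 1, 4, 4)
x_train, y_train, x_val, y_val = train_val_split(images, masks)
print(x_train.shape, x_val.shape)
```

Images and masks are indexed with the same shuffled order, so each image stays paired with its mask.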

### Machine Learning Regressor



Suggestions based on [scikit-learn's advice](https://scikit-learn.org/stable/_static/ml_map.png) on choosing the right model:
- RidgeRegression
- SVR(kernel='linear')
- SVR(kernel='rbf')
- EnsembleRegressors
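
To make the first suggestion concrete, the sketch below fits a per-pixel ridge regression, where every pixel is one sample and its bands are the features. It is written directly in NumPy (closed-form solution) to keep the example free of extra dependencies; in practice scikit-learn's `Ridge` would be the more usual choice. The helper names, the toy data and the value of `alpha` are assumptions for illustration.

```python
import numpy as np

def to_pixel_table(images, masks):
    """Flatten (N, C, H, W) images and (N, 1, H, W) masks into per-pixel rows."""
    n_channels = images.shape[1]
    features = images.transpose(0, 2, 3, 1).reshape(-1, n_channels)  # (N*H*W, C)
    targets = masks.transpose(0, 2, 3, 1).reshape(-1)                # (N*H*W,)
    return features, targets

def fit_ridge(features, targets, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = features.mean(axis=0), targets.mean()
    x_centered = features - x_mean
    gram = x_centered.T @ x_centered + alpha * np.eye(features.shape[1])
    weights = np.linalg.solve(gram, x_centered.T @ (targets - y_mean))
    intercept = y_mean - x_mean @ weights
    return weights, intercept

# toy data: heights are an exact linear function of the bands, plus an offset of 3
rng = np.random.default_rng(0)
images = rng.random((4, 10, 8, 8))
true_weights = rng.random(10)
masks = np.einsum('c,nchw->nhw', true_weights, images)[:, None] + 3.0

features, targets = to_pixel_table(images, masks)
weights, intercept = fit_ridge(features, targets, alpha=1e-6)
predicted = features @ weights + intercept
print(np.mean(np.abs(predicted - targets)))  # MAE on the toy data
```

A per-pixel model ignores spatial context entirely, which is what makes it a useful baseline to compare against the CNN.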

### Convolutional Neural Network (CNN)

## Evaluation