# The preparation
After downloading the data, unpack it and move it to a destination of your choice. The code below reads only from examples/data/38-Cloud/38-Cloud_training/ (the train_gt folder and its train_red, train_green, train_blue and train_nir siblings), so the other files can be set aside. Let's start by building the CSV files that pair each image with its mask.

In [1]:
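from pathlib import Path
import os
import pandas as pd
# Walk up from the current directory until we reach the repository root.
cwd = Path.cwd()
while cwd.stem != "pyplatypus":
    if cwd == cwd.parent:
        # Reached the filesystem root without finding the repository.
        raise FileNotFoundError("No 'pyplatypus' directory found above the current one.")
    cwd = cwd.parent
os.chdir(cwd)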

In [2]:
cloud_path = "examples/data/38-Cloud/38-Cloud_training/train_gt/"
# Sort the listing so that the train/validation/test split below is reproducible.
mask_paths = [os.path.join(cloud_path, mp) for mp in sorted(os.listdir(cloud_path))]
# Band paths mirror the mask paths: every 'gt' in the path is replaced with the band name.
red_images_paths = [mp.replace('gt', 'red') for mp in mask_paths]
green_images_paths = [mp.replace('gt', 'green') for mp in mask_paths]
blue_images_paths = [mp.replace('gt', 'blue') for mp in mask_paths]
nir_images_paths = [mp.replace('gt', 'nir') for mp in mask_paths]
# Each row holds the four band paths joined with ';' and the matching mask path.
config_df = pd.DataFrame({
    'images': [';'.join(bands) for bands in zip(red_images_paths, green_images_paths, blue_images_paths, nir_images_paths)],
    'masks': mask_paths
})
config_df.iloc[0:5040, :].to_csv('examples/30_clouds_train.csv', index=False)
config_df.iloc[5040:6720, :].to_csv('examples/30_clouds_validation.csv', index=False)
config_df.iloc[6720:, :].to_csv('examples/30_clouds_test.csv', index=False)
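
The hard-coded boundaries 5040 and 6720 above can also be derived from split fractions. The sketch below is only an illustration: `split_bounds` is a hypothetical helper, not part of PyPlatypus, and the total of 8400 rows is an assumption, since the notebook never prints the length of `config_df`. Under that assumption the boundaries correspond to a 60/20/20 split.

```python
from __future__ import annotations


def split_bounds(n_rows: int, train_frac: float, val_frac: float) -> tuple[int, int]:
    """Return the row indices where the training and validation slices end."""
    train_end = round(n_rows * train_frac)
    val_end = train_end + round(n_rows * val_frac)
    return train_end, val_end


# 8400 rows is an assumed total, used here only for illustration.
train_end, val_end = split_bounds(8400, train_frac=0.6, val_frac=0.2)
print(train_end, val_end)
```

The resulting indices can then be used as `config_df.iloc[:train_end]`, `config_df.iloc[train_end:val_end]` and `config_df.iloc[val_end:]`.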

Let's now inspect the config that we will pass to PlatypusSolver in order to run it.

In [3]:
import yaml
import json
with open(r"examples/claud_38_config.yaml") as stream:
    config = yaml.safe_load(stream)
    print(json.dumps(config, indent=4, sort_keys=True))

{
    "object_detection": null,
    "semantic_segmentation": {
        "data": {
            "colormap": [
                [
                    0,
                    0,
                    0
                ],
                [
                    255,
                    255,
                    255
                ]
            ],
            "column_sep": ";",
            "mode": "config_file",
            "shuffle": false,
            "subdirs": [
                "images",
                "masks"
            ],
            "train_path": "examples/30_clouds_train.csv",
            "validation_path": "examples/30_clouds_validation.csv"
        },
        "models": [
            {
                "activation_layer": "relu",
                "augmentation": {
                    "Blur": {
                        "always_apply": false,
                        "blur_limit": 7,
                        "p": 0.5
                    },
                    "Flip": {
                        "

You may have noticed that the config is organized so that it can tell the Solver to train multiple models, each with its own augmentation pipeline and with loss functions chosen from the fairly large set available in the PyPlatypus framework.
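
To get a quick overview of what such a config asks for, one could walk the `models` list and collect the augmentation names per model. The sketch below uses a hand-written dictionary that mirrors only the keys visible in the printout above; the `Flip` parameters are cut off there, so they are left empty, and `summarize_models` is a hypothetical helper rather than a PyPlatypus function.

```python
from __future__ import annotations

# Hand-written excerpt mirroring the keys visible in the printed config.
config = {
    "object_detection": None,
    "semantic_segmentation": {
        "models": [
            {
                "activation_layer": "relu",
                "augmentation": {
                    "Blur": {"always_apply": False, "blur_limit": 7, "p": 0.5},
                    "Flip": {},  # parameters are truncated in the printout
                },
            },
        ],
    },
}


def summarize_models(cfg: dict) -> list[dict]:
    """List each segmentation model with its activation and augmentation names."""
    task = cfg.get("semantic_segmentation") or {}
    summary = []
    for index, model in enumerate(task.get("models", [])):
        summary.append({
            "index": index,
            "activation_layer": model.get("activation_layer"),
            "augmentations": sorted(model.get("augmentation") or {}),
        })
    return summary


for entry in summarize_models(config):
    print(entry)
```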

![U-Net architecture](attachment:68747470733a2f2f69322e77702e636f6d2f6e657074756e652e61692f77702d636f6e74656e742f75706c6f6164732f552d6e65742d6172636869746563747572652e706e673f73736c3d31.webp)

# The model

The models present in the PyPlatypus segmentation submodule are U-Net based.

U-Net was originally developed for biomedical data segmentation. As you can see in the picture above, the architecture is very similar to an autoencoder and looks like the letter U, hence the name. The model is composed of two parts, a contracting path and an expanding path, and each part has some number of convolutional blocks (3 in the image above). The number of blocks is a hyperparameter of our model.

To build a U-Net model in PyPlatypus, use the u_net function. You have to specify:

* Number of convolutional blocks,
* Input image height and width; these need not be of the form 2^N, since we added a generalizing layer.
* Indicator determining whether the input image will be loaded as grayscale or RGB.
* Number of classes - in our case we have only 2 (background and cloud).
* Additional arguments for CNN such as: number of filters, dropout rate etc.
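
Why the input size matters can be seen with a little arithmetic. The sketch below assumes the classic U-Net layout, where each encoder block halves the spatial size with 2x2 pooling (floor division) and each decoder block doubles it again; the actual PyPlatypus layers may differ. Both helpers are hypothetical and only report where the upsampled size no longer matches the skip connection, which is the case a generalizing layer has to handle.

```python
from __future__ import annotations


def encoder_sizes(size: int, blocks: int) -> list[int]:
    """Spatial size at the input and after each 2x2 pooling step."""
    sizes = [size]
    for _ in range(blocks):
        sizes.append(sizes[-1] // 2)
    return sizes


def skip_mismatches(size: int, blocks: int) -> list[tuple[int, int]]:
    """Pairs (upsampled size, skip-connection size) that do not match."""
    sizes = encoder_sizes(size, blocks)
    mismatches = []
    # Walk back up the decoder, from the deepest level to the input level.
    for level in range(blocks, 0, -1):
        upsampled = sizes[level] * 2
        skip = sizes[level - 1]
        if upsampled != skip:
            mismatches.append((upsampled, skip))
    return mismatches


print(encoder_sizes(256, 3), skip_mismatches(256, 3))
print(encoder_sizes(300, 3), skip_mismatches(300, 3))
```

With 3 blocks, a 256-pixel side halves cleanly at every level, while a 300-pixel side reaches 75 and then 37, so upsampling yields 74 where the skip connection has 75.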

From here on, building the models is rather straightforward.

In [4]:
from pyplatypus.solvers.platypus_cv_solver import PlatypusSolver


ps = PlatypusSolver(
    config_yaml_path=Path("examples/claud_38_config.yaml")
)
ps.train()

5040 images detected!
Set 'steps_per_epoch' to: 158
1680 images detected!
Set 'steps_per_epoch' to: 53


2022-10-12 21:24:22.953609: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:975] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-12 21:24:22.959592: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags

Epoch 1/100


2022-10-12 21:24:31.707394: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8101


Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
5040 images detected!
Set 'steps_per_epoch' to: 630
1680 images detected!
Set 'steps_per_epoch' to: 210
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100


Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100


Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100


Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
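
The `steps_per_epoch` values in the log above are consistent with the usual rule of dividing the number of images by the batch size and rounding up. The batch sizes 32 and 8 used below are inferred from the printed numbers, not read from the config, so treat them as an assumption; `steps_per_epoch` here is a plain helper, not the PyPlatypus implementation.

```python
import math


def steps_per_epoch(n_images: int, batch_size: int) -> int:
    """Batches needed to see every image once; the last batch may be smaller."""
    return math.ceil(n_images / batch_size)


# First model in the log (assumed batch size 32)
print(steps_per_epoch(5040, 32), steps_per_epoch(1680, 32))  # 158 53
# Second model in the log (assumed batch size 8)
print(steps_per_epoch(5040, 8), steps_per_epoch(1680, 8))    # 630 210
```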


In [None]:
x = ps.sample_generator('38c_u_net_plus_plus')
x[1][0, :, :, 1].max()

In [None]:
x[0].shape

In [None]:
import matplotlib.pyplot as plt
plt.matshow(x[0][0, :, :, 3])

In [None]:
x = ps.sample_generator('bulls_u_net', training_augmentation = True)
print(x.shape)
import matplotlib.pyplot as plt
plt.matshow(x[0,:,:,:])

In [None]:
ps.evaluate_models('data/config_file_train.csv')

In [None]:
from pyplatypus.segmentation.models.stacked_ensemble import stacked_ensembler
from tensorflow.keras.utils import plot_model
model1 = ps.cache["semantic_segmentation"]['bulls_u_net']['model']
model2 = ps.cache["semantic_segmentation"]['bulls_u_net_plus_plus']['model']
members = [model1, model2]
ens = stacked_ensembler(members, 2, 300, 300, 16, (3, 3), 2, True, 0.1, True, "he_normal", True, "relu").model
plot_model(ens, show_shapes=True, to_file='model_graph.png')
ens.summary()

In [None]:
ps.cache['semantic_segmentation']['bulls_u_net']['training_history']

# Predictions

Once the models are trained, we can easily produce predicted masks for the validation set or for any other data we would like to use; just make sure it is organized in the same way as the train/validation/test sets.

In [None]:
from glob import glob
from pathlib import Path
from random import sample

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image


def prepare_masks(masks, np_original_size, masks_frame):
    """Add every mask file onto masks_frame, rotating masks whose shape differs from the image."""
    for mask in masks:
        loaded_mask_ = plt.imread(mask)
        if loaded_mask_.shape != np_original_size:
            rotated_loaded_mask_ = np.rot90(loaded_mask_)
            masks_frame += rotated_loaded_mask_
        else:
            masks_frame += loaded_mask_
    return masks_frame


def sample_and_plot_predictions(data_path: Path, model_name: str, n=3):
    """Plot up to n random validation images next to their true and predicted masks."""
    validation_images = glob(str(data_path / "stage1_validation" / "*"))
    # Never sample more images than are available
    n = min(n, len(validation_images))
    validation_images = sample(validation_images, n)
    for img_path in validation_images:
        img_name = Path(img_path).name
        img = glob(f"{img_path}/images/*.png")[0]
        predictions = glob(f"{img_path}/predicted_masks/{model_name}_predicted_mask.png")[0]
        masks = glob(f"{img_path}/masks/*.png")
        # Load the image; PIL reports (width, height), NumPy expects (height, width)
        img_loaded = Image.open(img)
        original_size = img_loaded.size
        np_original_size = tuple(reversed(original_size))
        # Load masks and squeeze them into one frame
        masks_frame = np.zeros(np_original_size)
        masks_frame = prepare_masks(masks, np_original_size, masks_frame)
        # Load predictions and resize them to half of the original image size
        predictions_loaded = Image.open(predictions)
        original_size_scaled = tuple(int(side) // 2 for side in original_size)
        predictions_scaled = predictions_loaded.resize(original_size_scaled)
        # Plot image alongside true and predicted masks
        f, axarr = plt.subplots(1, 3)
        f.suptitle(f"Image and predictions: {img_name}")
        axarr[0].imshow(img_loaded)
        axarr[1].imshow(masks_frame)
        axarr[2].imshow(predictions_scaled)

In [None]:
# Clean the results of former runs
from glob import glob
from shutil import rmtree
masks = glob(str(data_path/"stage1_validation/**/predicted_*"))
for mask in masks:
    rmtree(mask)


In [None]:
# When custom_data_path is set to None, the validation data will be used.
# If that is not what you want, point the engine at any other data instead.
ps.produce_and_save_predicted_masks_for_model(model_name="bulls_u_net", custom_data_path=None)
ps.produce_and_save_predicted_masks_for_model(model_name="bulls_u_net", custom_data_path='data/config_file_test.csv')

In [None]:
# When custom_data_path is set to None, the validation data will be used.
# If that is not what you want, point the engine at any other data instead.
ps.produce_and_save_predicted_masks_for_model(model_name="bulls_u_net_plus_plus", custom_data_path=None)
ps.produce_and_save_predicted_masks_for_model(model_name="bulls_u_net_plus_plus", custom_data_path='data/config_file_test.csv')

In [None]:
sample_and_plot_predictions(data_path, model_name="bulls_u_net", n=10)