# Case Study Part 2

## Todos

- ~~Write Introduction~~
- ~~Write Data formats~~
- ~~Load data~~ 
- ~~Write function to print validation images~~
- ~~Write function to add found boundaries to images~~
- Write function to create predictions
<br>

**Ideas**
- Data Augmentation: Shifting, rotating and so on to increase amount of training data

## Introduction

For one of their projects an insurance company wants to get an estimate of the assets of their clients. For this they want to scan satellite images for four different objects: ponds, pools, photovoltaic systems and trampolines.

### Available data
To solve the task the following data are available.

**Training data**
<br>
The training data consist of five different groups. In the beginning the data themselves are not labels (as in labeled and already in a dataframe form). They are just .png and .jpg files in different subfolders. All training images are of size 256x256 pixels. The following five groups are available:
- Trampoline images: 140 images that contain a trampoline.
- Pond images: 9 images that contain a pond.
- Pool images: 20 images that contain a pool.
- Solar images: 37 images that contain a photovoltaic system.
- Background images: 3110 images that do not contain any of the above-mentioned items.

This sums up to 3316 labeled training patches. 

<br>

**Unlabeled training data**
<br>
In addition to the training patches, 20 validation images are available. They are of size 8000x8000 each and can be used for the manual validation of an approach.

<br>

**Validation data**
<br>
The validation data set contains 3 images of size 8000x8000. In addition there is a csv file that contains the coordinates of an item and a surrounding bounding box of various items and their respective labels.
Thus, one prediction should have the following shape (1,7) with column 1 as strings and the rest as integer values:
|label|y_target|x_target|y_upper_left|x_upper_left|y_lower_right|x_lower_right
|---|---|---|---|---|---|---|
|trampoline|268|278|140|150|396|406|
|...|

<br>

**Predictions** <br>
A prediction should represent a bounding box around a target entity. One prediction consist of the label and the four coordinates that describe the bounding box. <br>
<span style="color:red">The bounding box must be of size 256 x 256</span>. <br>
A prediction is considered correct if the predicted bounding box overlaps at least 50% with the ground truth bounding box. <br>
This overlap is computed as the area of the intersection of the boxes divided by the union of the areas of the boxes: IoU = intersection_area / union_area:

<img src="https://www.baeldung.com/wp-content/uploads/sites/4/2022/04/fig1.png" alt="Intersection over Union for Object Detection | Baeldung on Computer Science">

Our predictions thus must each be of shape (1,5):
|label|y_upper_left|x_upper_left|y_lower_right|x_lower_right
|---|---|---|---|---|
|trampoline|140|150|396|406|
|...|


## Data retrieval

This section loads the training patches into a numpy array and creates the corresponding label vector.
The result are X_train, X_val, y_train and y_val. 

The images are converted to RGB values, which is why there are 3 channels in the training data.

The training data sets are of dimension (number_of_instances x height x width x 3 channels). 
The label vectors only have one dimension (number_of_instances).

In [2]:
from PIL import Image
import numpy as np
import os
import PIL

In [3]:
def loadImagesToArray(path:str):
    '''
    Loads all .jpg and .png files from the specified directory.\n
    Each image will be converted into an array of size (height x width x channels).\n
    The return numpy array is of dimensions (numberOfImages x height x width x channels).\n
    '''
    imagesArray = []

    counter = 0
    for file in os.scandir(path):
        filepath = os.fsdecode(file)
        if(filepath.endswith(".jpg") or filepath.endswith(".png")):
                imgArray = np.array(Image.open(filepath))
                imagesArray.append(imgArray)
                counter += 1                  
    return np.array(imagesArray)

def loadTrainingDataAndLabels(path, subdirectories):
    '''
    Loads the training data as numpy arrays and creates the corresponding labels.\n
    For this to work, the images should be under the folder <path> in separate subdirectories, one for each class.\n
    The labels will be inferred from the names of the subdirectories. \n

    Returns the training data as a numpy array with the dimensions (number_of_images x height x width x channels).\n
    Returns the labels as a numpy array with the dimensions (number_of_images).
    '''

    training_data = []
    labels = []

    for directory in subdirectories:
        images_array = loadImagesToArray(os.path.join(path, directory))
        training_data.extend(images_array)

        labels.extend(np.full(len(images_array), directory))

    training_data_array = np.array(training_data)
    print("Shape of training_data: ", training_data_array.shape)
    labels_array = np.array(labels)
    print("Shape of labels: ", labels_array.shape)
    
    return training_data_array, labels_array
    

In [4]:
training_data, labels = loadTrainingDataAndLabels("training_patches/", ["background", "ponds", "pools", "solar", "trampoline"])


from sklearn import preprocessing

le = preprocessing.LabelEncoder()
le.fit(labels)
labels_categorical = le.transform(labels)

Shape of training_data:  (3316, 256, 256, 3)
Shape of labels:  (3316,)


In [5]:
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(training_data, labels_categorical, test_size=0.33, random_state=1, stratify=labels)

## Data Augmentation

Rotation

In [None]:
import PIL

def rotateDirectory(directory,folder):

    directory = directory + folder
    
    for file in os.scandir(directory):
        filepath = os.fsdecode(file)      	
        pathname, extension = os.path.splitext(filepath)
        
        if(filepath.endswith(".jpg") or filepath.endswith(".png")):
                img = Image.open(filepath)
                img_180 = img.rotate(180, expand = 0)
                img_90 = img.rotate(90, expand = 0)
                img_270 = img.rotate(270, expand = 0)
                pathname, extension = os.path.splitext(filepath)
                filename = str(pathname.split('/')[2])


                img_180.save("training_patches2/"+ folder +"/" + filename + "_180" +".png")
                img_90.save("training_patches2/"+ folder +"/" +filename + "_90" +".png")
                img_270.save("training_patches2/"+ folder +"/" +filename + "_270" +".png")


rotateDirectory("training_patches/", "solar")
rotateDirectory("training_patches/", "ponds")
rotateDirectory("training_patches/", "trampoline")
rotateDirectory("training_patches/", "pools")

## Model Selection

In [6]:
from tensorflow.keras.utils import to_categorical
y_train = to_categorical(y_train, dtype="int8")
np.unique(y_train, axis=0)


array([[0, 0, 0, 0, 1],
       [0, 0, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0]], dtype=int8)

In [7]:
from tensorflow.keras.layers import InputLayer, Dense, Flatten, Conv2D, MaxPool2D
from tensorflow import keras
model = keras.models.Sequential()
model.add(InputLayer(input_shape=(256,256,3)))
model.add(Conv2D(filters=10, kernel_size=(3,3), strides=1, padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(20, activation="relu"))
model.add(Dense(20, activation="relu"))
model.add(Dense(5, activation="softmax"))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 256, 256, 10)      280       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 128, 128, 10)      0         
_________________________________________________________________
flatten (Flatten)            (None, 163840)            0         
_________________________________________________________________
dense (Dense)                (None, 20)                3276820   
_________________________________________________________________
dense_1 (Dense)              (None, 20)                420       
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 105       
Total params: 3,277,625
Trainable params: 3,277,625
Non-trainable params: 0
______________________________________________

In [8]:
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])


history = model.fit(X_train, 
                    y_train, 
                    epochs=20,
                    batch_size=64,
                    validation_split=0.1,
                   )




Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [25]:
preds = model.predict(X_val)
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
preds_argmaxed = np.apply_along_axis(np.argmax, 1, preds)
f1_score(y_val,preds_argmaxed, average='macro'), accuracy_score(y_val, preds_argmaxed) 





(0.3066859066859067, 0.9598173515981735)

In [26]:


from sklearn.metrics import confusion_matrix

confusion_matrix(y_val, preds_argmaxed)



array([[1027,    0,    0,    0,    0],
       [   1,    0,    0,    0,    2],
       [   0,    0,    0,    0,    7],
       [   2,    1,    0,    0,    9],
       [  22,    0,    0,    0,   24]])

## Model Training

## Model Validation

This section contains code that is needed to create predictions, save them and to print the training / validation images - either with or without the predicted bounding boxes.

In [27]:
from tensorflow import keras
from PIL import Image
from keras.applications.inception_v3 import preprocess_input
from keras import Model
import time
import pandas as pd

In [28]:
def savePredictionToCsv(predictionDataframe: pd.DataFrame, filepath:str):
    '''
    Saves a dataframe containing the prediction for a single image to a CSV file.

    @predictionDataFrame - The dataframe that contains the predictions and should be saved.\n
    @path - The path under which the CSV file should be saved.
    @filename - The name under which the CSV file should be saved.
    '''
    #filepath =  os.path.splitext(filepath)[2]+"_prediction.csv"
    filepath = "./03_validation_results/" +str(filepath.split('/')[2]) +"_prediction.csv"
    predictionDataframe.to_csv(filepath, sep=",", index=False)



def makePredictions(path:str, convnet:keras.Model, stepSize:int, windowSize):
    '''
    Traverses a folder that contains images for which predictions should be made.\n
    Creates a separate prediction CSV file for each image.

    @path - The path containing the images for which predictions should be created.
    '''

    # For each image in path
        # Perform sliding Window approach
            # For each slide
            # Store x_upper_left, y_upper_left, x_lower_right, y_lower_right
            # Run image through convnet
            # Run classifier on output
            # If prediction != 'background'
            # Store prediction in temp array
        # Run non-max suppression to filter predictions
        # Store predictions in csv
    for file in os.scandir(path):
        filepath = os.fsdecode(file)
        
        if(("annotated" in filepath) or not (filepath.endswith(".jpg") or filepath.endswith(".png"))):
           continue
        
        createPredictionsForImage(filepath=filepath, convnet=convnet, stepSize=stepSize, windowSize=windowSize)


def createPredictionsForImage(filepath:str, convnet:keras.Model, stepSize:int, windowSize):
    '''
    Creates the prediction CSV for one image.
    '''

    print("\nCreating predictions for file: ", filepath)
    create_predictions_start_time = time.time()
    #image = Image.open(filepath)
    imgArray = np.array(Image.open(filepath))
    
    patch_coordinates= []  
    preprocessed_patches = []
    counter = 0
    patch_preprocessing_start_time = time.time()
    
    print("Starting sliding window to create patches of size: ", windowSize[0], "x", windowSize[1], ".")
    for(x,y,patch) in sliding_window(imageArray=imgArray, stepSize=stepSize, windowSize=windowSize):
        if counter > 0 and counter%10000 == 0:
            print("Still processing, reached patch", counter)
            print("Execution time for the last 10.000 patches: ", time.time()-patch_preprocessing_start_time, " seconds.")
            patch_preprocessing_start_time = time.time()
            print("Processing continues...")
        
        # Skip if the size of a patch doesn't match the specified windowSize
        if patch.shape[0] != windowSize[0] or patch.shape[1] != windowSize[1]:
            continue
    
        # Save coordinates which are needed for a prediction
        x_upper_left = x
        y_upper_left = y
        x_lower_right = x+windowSize[0]
        y_lower_right = y+windowSize[1]
        x_center = x+128
        y_center = y+128

        # Run the patch through the classification
        preprocessed_patch = preprocess_input(patch)
        preprocessed_patches.append(preprocessed_patch)
        patch_coordinates.append([y_upper_left, x_upper_left, y_lower_right, x_lower_right])
        counter +=1
    
    print("Finished preprocessing of the patches.")
    preprocessed_patches = np.array(preprocessed_patches)
    patch_coordinates = np.array(patch_coordinates)
    print("Shape of preprocessed patches: ", preprocessed_patches.shape)
    print("Shape of patch coordinates: ", patch_coordinates.shape, "\n")

    # Get all predictions
    print("Running patches through ConvNet and using classifier to predict labels...")
    prediction_start_time = time.time()
    predicted_labels_encoded = pd.DataFrame(convnet.predict(preprocessed_patches), columns=["background", "ponds", "pools", "solar", "trampoline"])
    predicted_labels= predicted_labels_encoded.idxmax(1)
    
    print("Finished predictions, execution time: ", time.time()-prediction_start_time, " seconds.\n")
    
    print("Shape of predicted_labels: ", labels.shape)
    print("Shape of patch_coordinates: ", patch_coordinates.shape)

    # Combining patch coordinates and predictions
    predictions_array=np.c_[predicted_labels, patch_coordinates]

    print("Shape of combined predictions array (unfiltered): ", predictions_array.shape)

    predictions_dataframe = pd.DataFrame(data=predictions_array, columns=["label","y_upper_left", "x_upper_left", "y_lower_right", "x_lower_right"])
    # Filter all predictions that contain the label "background"
    predictions_dataframe = predictions_dataframe[predictions_dataframe.label != "background"]
    print("Description of the predictions dataframe: ", predictions_dataframe.describe())

    # Save prediction to csv
    savePredictionToCsv(predictionDataframe=predictions_dataframe, filepath=filepath)
    print("Saved predictions for file: ", filepath, "\n")
    print("Elapsed time: ", time.time()-create_predictions_start_time, " seconds.\n")

    
def sliding_window(imageArray, stepSize:int, windowSize=(256,256)):
    for y in range(0, imageArray.shape[0], stepSize):
	    for x in range(0, imageArray.shape[1], stepSize):
			# yield the current window
		    yield (x, y, imageArray[y:y + windowSize[1], x:x + windowSize[0]])

In [32]:
#Execute Cell
makePredictions("./02_validation_data_images/", convnet=model, stepSize=64, windowSize=(256,256))


Creating predictions for file:  ./02_validation_data_images/DQIMQN.png
Starting sliding window to create patches of size:  256 x 256 .
Still processing, reached patch 10000
Execution time for the last 10.000 patches:  11.607695817947388  seconds.
Processing continues...
Still processing, reached patch 20000
Execution time for the last 10.000 patches:  17.799495935440063  seconds.
Processing continues...
Still processing, reached patch 30000
Execution time for the last 10.000 patches:  16.359992027282715  seconds.
Processing continues...
Still processing, reached patch 40000
Execution time for the last 10.000 patches:  36.129801988601685  seconds.
Processing continues...
Still processing, reached patch 50000
Execution time for the last 10.000 patches:  26.264587879180908  seconds.
Processing continues...
Finished preprocessing of the patches.


: 

: 

In [47]:
import tensorflow as tf
import numpy as np

def nonMaxSuppressBoundingBoxes(path:str, iou_threshold:float, score_threshold:float):
    '''
    Loads the csv files from the path and performs the non-max-suppression to reduce the amount of predictions to one per detected object.\n
    '''

    for file in os.scandir(path):
        filepath = os.fsdecode(file)
        
        if(not(filepath.endswith(".csv")) or ("suppressed" in filepath)):
           continue

        print("Performing non-max suppresion on file ", filepath)
        originalPredictions = np.genfromtxt(filepath, delimiter=',', skip_header=1)
        originalPredictions_df= pd.read_csv(filepath)
        print("OriginalPredictions shape: ", originalPredictions[0])

        labels = originalPredictions_df["label"]
        coordinates = originalPredictions[:, 2:6].astype(int)
        scores = originalPredictions[:,0]
        print("Shape of coordinates: ", coordinates.shape)
        print("Shape of labels: ", scores.shape)
       
        selectedBoxes_indices = tf.image.non_max_suppression(boxes=coordinates, scores=scores, max_output_size=200, iou_threshold=iou_threshold, score_threshold=score_threshold )
        print(selectedBoxes_indices.shape)

        selected_boxes = tf.gather(coordinates, selectedBoxes_indices).numpy()
        selected_labels = np.array([x.numpy().decode() for x in tf.gather(labels, selectedBoxes_indices)])
        selected_scores = tf.gather(scores, selectedBoxes_indices).numpy()
        print(selected_labels)
        print("Shape of selected labels: ", selected_labels.shape)
        print("Shape of selected boxes: ", selected_boxes.shape)

        predictions = pd.DataFrame(np.c_[selected_labels, selected_boxes], columns=["label", "y_upper_left", "x_upper_left", "y_lower_right", "x_lower_right"])
        print(predictions)
        new_filepath =  os.path.splitext(filepath)[0]+"_suppressed.csv"
        predictions.to_csv(new_filepath, sep=",", index=False)
        

nonMaxSuppressBoundingBoxes("03_validation_results/", iou_threshold=0.5, score_threshold=0.75)


Performing non-max suppresion on file  03_validation_results/DQIMQN_prediction.csv
OriginalPredictions shape:  [ nan 320. 384. 192. 256. 448. 512.]
Shape of coordinates:  (332, 4)
Shape of labels:  (332,)
(0,)
[]
Shape of selected labels:  (0,)
Shape of selected boxes:  (0, 4)
Empty DataFrame
Columns: [label, y_upper_left, x_upper_left, y_lower_right, x_lower_right]
Index: []


In [30]:
import gc
preprocessed_patches = None
del preprocessed_patches
patch_coordinates = None
del patch_coordinates
X_train = None
del X_train
X_val = None 
del X_val
y_train = None
del y_train
y_val = None
training_data = None
del training_data
X_train_preprocessed = None
del X_train_preprocessed
gc.collect()



1802

In [35]:
makePredictions("./02_validation_data_images/", convnet=model, stepSize=64, windowSize=(256,256))


Creating predictions for file:  ./02_validation_data_images/DQIMQN.png
Starting sliding window to create patches of size:  256 x 256 .
Still processing, reached patch 10000
Execution time for the last 10.000 patches:  6.959197044372559  seconds.
Processing continues...
Finished preprocessing of the patches.
Shape of preprocessed patches:  (14884, 256, 256, 3)
Shape of patch coordinates:  (14884, 6) 

Running patches through ConvNet and using classifier to predict labels...
Finished predictions, execution time:  182.3661322593689  seconds.

Shape of predicted_labels:  (3316,)
Shape of patch_coordinates:  (14884, 6)
Shape of combined predictions array (unfiltered):  (14884, 7)
Description of the predictions dataframe:               label  y_center  x_center  y_upper_left  x_upper_left  \
count          332       332       332           332           332   
unique           3        94        85            94            85   
top     trampoline      6528      1280          6400          

## Model Validation (Unlabeled Training Data)

This section prints the unlabeled training images along with bounding boxes that are retrieved from the corresponding CSV files.

In [31]:
import csv, json, glob, random, os
from shapely.geometry import Polygon

def calc_performance(gt_path, pred_path, image_name=None, verbose=0):
    ground_truth = []
    predictions = []

    # Create default performance values
    performances = {
        'file': image_name,
        'tp': 0,
        'fn': 0,
        'fp': 0,
        'f1': 0,
    }

    ## Load ground truth
    with open(gt_path) as f:
        reader = csv.DictReader(f)
        for row in reader:
            row = {k: int(row[k]) if k != 'label' else row[k] for k in row.keys()}
            ground_truth.append(row)

    ## load predictions if path exists
    if os.path.exists(pred_path):
        with open(pred_path) as f:
            reader = csv.DictReader(f)
            for row in reader:
                row = {k: int(row[k]) if k != 'label' else row[k] for k in row.keys()}
                predictions.append(row)

    # Number of false positives equals number of left predictions
    performances['fp'] = max(len(predictions) - len(ground_truth), 0)

    for j, gt in enumerate(ground_truth):
        gt_box = Polygon([(gt['y_upper_left'],  gt['x_upper_left']),
                          (gt['y_upper_left'],  gt['x_lower_right']),
                          (gt['y_lower_right'], gt['x_lower_right']),
                          (gt['y_lower_right'], gt['x_upper_left'])])

        if gt_box.area != (256. * 256.):
            print(f'### Warning {j}: false ground truth shape of {gt_box.area} detected in {image_name}!')
            print(gt['y_lower_right'] - gt['y_upper_left'], gt['x_lower_right'] - gt['x_upper_left'])

        best_found_iou = (None, 0.) # (idx, IoU)
        for i, pred in enumerate(predictions):
            if gt['label'] == pred['label']:
                pred_box = Polygon([(pred['y_upper_left'],  pred['x_upper_left']),
                                    (pred['y_upper_left'],  pred['x_lower_right']),
                                    (pred['y_lower_right'], pred['x_lower_right']),
                                    (pred['y_lower_right'], pred['x_upper_left'])])

                if pred_box.area != (256. * 256.):
                    print(f'### Warning {i}: false predicted shape of {pred_box.area} detected in {image_name}!')
                    print(pred['y_lower_right'] - pred['y_upper_left'], pred['x_lower_right'] - pred['x_upper_left'])

                ## Calculate IoU
                next_iou = (gt_box.intersection(pred_box).area + 1) / (gt_box.union(pred_box).area + 1)

                # If the next found IoU is larger than the previous found IoU -> override
                if next_iou > best_found_iou[1]:
                    best_found_iou = (i, next_iou)

        ## Append metric. If IoU is larger 0.5, then its a true positive, else false negative
        if best_found_iou[0] is not None and best_found_iou[1] >= 0.5:
            del predictions[best_found_iou[0]] # Remove prediction from list!
            performances['tp'] += 1 # Increase number of True Positives
            if verbose == 1:
                print(f'Found correct prediction with IoU of {round(best_found_iou[1], 3)} and label {gt["label"]}!')
        else:
            performances['fn'] += 1 # Increase number of False Negatives
            if verbose == 1:
                print(f'Found false prediction with IoU of {round(best_found_iou[1], 3)} and label {gt["label"]}!')

    ## Calculate F1-Score
    performances['f1'] = (performances['tp'] + 1e-8) / \
                                    (performances['tp'] + 0.5 * (performances['fp'] + performances['fn']) + 1e-8)
    return performances


path = '02_validation_data_images' # Change if needed

## Iterate over all validation images
for image_path in glob.glob(path + '/*.png'):
    image_name = image_path.split('/')[-1]
    gt_path = image_path[:-4] + '_true.csv' # Ground Truth path
    pred_path = image_path[:-4] + '_prediction.csv' # Prediction path
    performance = calc_performance(gt_path, pred_path, image_name)
    print(performance)


{'file': 'DQIMQN.png', 'tp': 0, 'fn': 30, 'fp': 0, 'f1': 6.666666662222222e-10}


## Printing the validation images

This section prints the public test data images along with their respective predictions / bounding boxes.