# Learning Distance


## Neural Network or Multilayer Perceptron
A multilayer perceptron (MLP) consists of an input layer, one or more hidden layers, and an output layer. The MLP is a feedforward network: in each layer, the inputs are combined with the weights in a weighted sum and passed through an activation function, just as in the simple perceptron. The difference is that the result of each layer is propagated to the next one. Each layer feeds the next with the result of its computation, through all hidden layers until the output layer is reached.

The advantage of a multilayer perceptron over a classical single-layer perceptron is that it can learn a nonlinear function between input and output. It can therefore learn significantly more complex relations between input and output data.
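
As a minimal sketch of why the hidden layer matters, the following pure-Python example runs the forward pass of a tiny two-layer MLP that computes XOR, a function no single-layer perceptron can represent. The weights are hand-picked for illustration (not trained), and the helper names `relu`, `linear`, and `mlp_forward` are hypothetical; this is not the `DistNet_MLP` architecture used later in this notebook.

```python
def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def linear(weights, bias, inputs):
    """Weighted sum plus bias for every neuron in one layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


# Hand-picked weights (assumption for illustration, not learned):
# hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
W1 = [[1.0, 1.0], [1.0, 1.0]]
B1 = [0.0, -1.0]
# output layer: y = h1 - 2 * h2
W2 = [[1.0, -2.0]]
B2 = [0.0]


def mlp_forward(x):
    """Feed the input through the hidden layer, then the output layer."""
    hidden = relu(linear(W1, B1, x))
    return linear(W2, B2, hidden)[0]


xor_outputs = {(a, b): mlp_forward([float(a), float(b)])
               for a in (0, 1) for b in (0, 1)}
print(xor_outputs)
```

Removing the `relu` call collapses both layers into a single linear map, which can no longer separate the XOR cases.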

This tutorial gives a brief overview of training and testing an MLP and highlights typical pitfalls that can easily occur.


# Coding: Getting Started
Before you get started, follow the instructions in the Git repository and make sure that all required packages are properly installed. It is recommended to work in a virtual environment. Start your virtual environment **before** you launch this Jupyter notebook. You may then need to change the kernel: *Kernel &rarr; Change kernel &rarr; venv*

In [None]:
import torch
import cv2
import os 
import glob
import argparse
import data_setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import mlp_utils as mlp_utils
import train_utils
from tqdm.auto import tqdm
from model import DistNet_MLP as mlp
from matplotlib.pyplot import cm

In [None]:
YOLO_MODEL = './model/yolo_model.pt'
FILE = '../../data/test/img/2023-6-28_17-06-35-343099.png'

PATH_TRAIN = '../../data/train/img'
PATH_VAL = '../../data/val/img'
PATH_TEST = '../../data/test/img'
PATH_CSV_TRAIN = '../../data/train/train.csv'
PATH_CSV_VAL = '../../data/val/val.csv'
PATH_CSV_TEST = '../../data/test/test.csv'

PATH_MODEL_SAVE = './model/mlp_model_trained.pth'
ROBOTS = ["ollie", "grace", "alan", "hermann", "kaethe"]
CAM_ROBOT = "ollie"
MAX_DIST = 5
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
FILE_EXTENSION = "*.png"
CONFIDENCE = 0.7
NUM_EPOCHS = 20
RELOAD_WEIGHTS = True
LR = 0.001

In [None]:
# Loading YOLOv5 Model
model_YOLO = torch.hub.load('ultralytics/yolov5', 'custom', path=YOLO_MODEL, verbose=False)
model_YOLO.conf = CONFIDENCE
model_YOLO.to(DEVICE)

## Part 1.1: Training a small MLP (multi-layer perceptron)

The provided code automatically detects small Lego robots in an image. For detection we use a fine-tuned YOLOv5 network that estimates bounding boxes, each of which captures the rough dimensions of the projected object.
Given the height of an object's bounding box, the camera's focal length, and the robot's actual height, we compute the absolute distance between "CAM_ROBOT" and all visible robots in the image.
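
The geometric relation between these three quantities is the pinhole camera model: an object of real height H at distance d appears with image height h = f * H / d, so d = f * H / h. The sketch below illustrates this under assumed values; the function name `pinhole_distance`, the focal length of 600 px, and the robot height of 0.12 m are hypothetical and not taken from this repository. The MLP in this notebook is fed a normalized bounding-box height and is expected to learn a comparable mapping from data.

```python
def pinhole_distance(focal_length_px, object_height_m, bbox_height_px):
    """Distance in metres from the pinhole model: d = f * H / h."""
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return focal_length_px * object_height_m / bbox_height_px


# Assumed example values (not measured on the real robots):
example_distance = pinhole_distance(600.0, 0.12, 72.0)
# Halving the bounding-box height doubles the estimated distance.
half_size_distance = pinhole_distance(600.0, 0.12, 36.0)
print(example_distance, half_size_distance)
```

Because distance is inversely proportional to bounding-box height, a one-pixel detection error matters much more for small (far away) boxes than for large (nearby) ones.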

In [None]:
# Load the training and validation image sets.
file_pattern = os.path.join(PATH_TRAIN, FILE_EXTENSION)
train_images = glob.glob(file_pattern)

file_pattern = os.path.join(PATH_VAL, FILE_EXTENSION)
valid_images = glob.glob(file_pattern)

# Pass images through yolo and get bounding boxes with their corresponding csv distance labels
train_data = mlp_utils.images2data(model_YOLO, train_images, PATH_CSV_TRAIN, verbose = False)
valid_data = mlp_utils.images2data(model_YOLO, valid_images, PATH_CSV_VAL, verbose = False)

In [None]:
# Load the MLP model
model_MLP = mlp.MLP()

# Define optimizer
optimizer = torch.optim.Adam(params=model_MLP.parameters(), lr=LR)
loss_fn = torch.nn.MSELoss()

# Pass the data through the dataloader
train_dataloader, valid_dataloader = data_setup.create_dataloader(train_data, valid_data, batch_size=5)
train_features_batch, train_labels_batch = next(iter(train_dataloader))

In [None]:
# Set training
best_metric = float('inf')  # Initialize with a large value for loss
train_loss_plotting = []
validation_loss_plotting = []

# Training Loop
for epoch in tqdm(range(NUM_EPOCHS)):
    train_loss = train_utils.train_step(model_MLP, train_dataloader, loss_fn, optimizer, DEVICE)
    validation_loss = train_utils.validation_step(model_MLP, valid_dataloader, loss_fn, DEVICE)

    # store train and validation loss for plotting
    train_loss_plotting.append(train_loss.item())
    validation_loss_plotting.append(validation_loss.item())
    
    ####################################################
    # TODO 1:
    # please fill in the code to save 
    # the currently "best" trained model. 
    # How to decide what the best model is?

    torch.save(model_MLP.state_dict(), PATH_MODEL_SAVE)

    ####################################################

#    if(epoch % 10 == 0):
#        print(f"Epoch: {epoch}\n------")
#        print(f"Train loss: {train_loss:.5f}\n")
        
# plotting stuff
%matplotlib inline
x_err = np.arange(1, NUM_EPOCHS + 1)  # one integer tick per epoch
plt.plot(x_err, train_loss_plotting, label='train loss')
plt.plot(x_err, validation_loss_plotting, label='val loss')

# Adding labels and title
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('MLP training and validation loss over epochs')

# Display the plot
plt.legend()

plt.show()

### Question: 
* Can you intentionally overfit to the training data? What will the validation curve look like? Please verify your assumption.

* Now train a well-performing model. Feel free to change parameters such as the learning rate, the number of epochs, or the number of layers. How do these parameters affect the result?

*Answer:*

## Part 1.2: Inference on a single image

In [None]:
# loading trained model
model = mlp.MLP()
model.load_state_dict(torch.load(PATH_MODEL_SAVE, map_location=torch.device(DEVICE)))
model.to(DEVICE)  # keep the MLP on the same device as the YOLO bounding boxes
model.eval()

# yolo bounding box detection
res = model_YOLO(FILE)

bounding_boxes = res.xyxy[0][res.xyxy[0][:, 0].sort()[1]]
image = cv2.imread(FILE)
color = iter(cm.rainbow(np.linspace(0, 1, 5)))

for box in bounding_boxes:
    
    BBH = (box[3] - box[1]) / 640
    
    input = torch.stack((torch.tensor(1).to(DEVICE), BBH))
    distance = model(input)
    
    c = next(color)
    cv2.rectangle(image, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), c[0:3]*255, 3)
    image_with_rectangle = cv2.rectangle(image, (int(box[0]), int(box[1])-30), (int(box[0]) + 115, int(box[1]-2)), (255,255,255), -1)
    img_drawn = cv2.putText(image, f"Dist: {(distance.item() * MAX_DIST):.3f}m", (int(box[0]), int(box[1] - 10)), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0,0,0), 2, cv2.LINE_AA)
    image_rgb = cv2.cvtColor(img_drawn, cv2.COLOR_BGR2RGB)


# visualize the frame with overlaid
# bounding boxes and distance estimates
%matplotlib inline

fig = plt.figure()
plt.imshow(image_rgb)
plt.show()

## Part 1.3: Visualization

... now let's run this code on several consecutive frames and visualize the relationship between an object's bounding-box height and its distance.

In [None]:
print("Loading Pictures...")

file_pattern = os.path.join(PATH_TEST, FILE_EXTENSION)
picture_files = glob.glob(file_pattern)
picture_files.sort()

print("Reading pictures...")

values = {}

first_image = cv2.imread(picture_files[0])
height, width, _ = first_image.shape

for i, file in enumerate(picture_files):

    res = model_YOLO(file)
    bounding_boxes = res.xyxy[0][res.xyxy[0][:, 0].sort()[1]]

    for box in bounding_boxes:
        BBH = (box[3] - box[1]) / 640

        input = torch.stack((torch.tensor(1).to(DEVICE), BBH))
        distance = model(input)
        
        values.setdefault(distance, []).append(box[3] - box[1])



In [None]:
x = []
y = []

for dist in values:
    for height in values[dist]:
        x.append(height.item())
        # the model predicts a normalized distance; rescale to metres
        y.append(dist.item() * MAX_DIST)

plt.scatter(x, y)

plt.xlabel('Height of the bounding box (in pixel)')
plt.ylabel('Distance (in m)')
plt.title('Plot of height and distance')

plt.show()

## Part 2.1: Evaluation

Ground-truth robot poses were captured with a motion capture system for all robots visible in this data. The poses were saved in the provided .csv file. In this part of the tutorial, let's analyse the observed error. How does it relate to the object's distance? What are possible causes of this error?

In [None]:
# loading trained model
model = mlp.MLP()
model.load_state_dict(torch.load(PATH_MODEL_SAVE, map_location=torch.device(DEVICE)))
model.to(DEVICE)  # keep the MLP on the same device as the YOLO bounding boxes
model.eval()

# get list of picture files in PATH
file_pattern = os.path.join(PATH_TEST, FILE_EXTENSION)
picture_files = glob.glob(file_pattern)
picture_files.sort()

# read csv file with ground truth robot poses
df = pd.read_csv(PATH_CSV_TEST, header=None)

# get column for each robot in csv file
row = df.iloc[0].to_numpy()
robot_col = {}
for rbo in ROBOTS:
    bool_array = row == rbo
    robot_col[rbo] = bool_array.argmax()
cam_robot_col = robot_col[CAM_ROBOT]

err = []
distance_x = []

for i, file in enumerate(picture_files):

    # -----------------------------------------------------------------
    # LOAD GROUNDTRUTH DISTANCES FOR EACH PICTURE
    # -----------------------------------------------------------------
    
    row_val = os.path.basename(file)
    row_matching_value = df[df.iloc[:, 0] == row_val]
    print("********")
    print(row_val)

    if row_matching_value.empty:
        print("Skipping, since no matching image name in csv")
        continue

    row = row_matching_value.index[0]

    # pd.isna avoids the missing `math` import and is False for strings
    if pd.isna(df.at[row,2]):
        print("Skipping, since no robots visible on image")
        continue
       
    # robots present in current frame
    robot_list = df.at[row,2].split()
    
    distance_gt = []
    
    for robot in robot_list:
        
        # pose of the camera robot
        camX = df.at[row,cam_robot_col+2]
        camY = df.at[row,cam_robot_col]

        # pose of neighboring robot
        robotX = df.at[row, robot_col[robot] + 2]
        robotY = df.at[row, robot_col[robot]]

        # GROUNDTRUTH DISTANCE
        distance = np.sqrt((float(robotX)-float(camX)) ** 2 + (float(robotY)-float(camY)) ** 2)
        distance = distance / 100
        
        print(f"distance_gt: {distance}")
        
        distance = torch.tensor(distance, dtype = torch.float32,)
        distance_gt.append(distance)
        
    # -----------------------------------------------------------------
    # COMPUTE DISTANCE FROM BOUNDING BOX HEIGHT
    # -----------------------------------------------------------------
    
    res = model_YOLO(file)
    bounding_boxes = res.xyxy[0][res.xyxy[0][:, 0].sort()[1]]

    distance_estim = []
    
    for box in bounding_boxes:
    
        BBH = (box[3] - box[1]) / 640

        input = torch.stack((torch.tensor(1).to(DEVICE), BBH))
        distance = model(input)
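        # rescale the normalized prediction to metres as a plain Python float,
        # so the NumPy error computation below also works on a CUDA device
        distance = distance[0].item() * MAX_DIST
        distance_estim.append(distance)
        
        print(f"distance_estim: {distance}")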
        
    # -----------------------------------------------------------------
    # COMPUTE ERROR
    # -----------------------------------------------------------------
    
    if(bounding_boxes.shape[0] == len(robot_list)):

        error = abs(np.array(distance_gt) - np.array(distance_estim))
        err.extend(error)
        distance_x.extend(distance_gt)

## Part 2.2: Visualization
...now let's visualize again.

In [None]:
plt.scatter(distance_x, err, label='Error')
mean_err = np.mean(err)
plt.axhline(y=mean_err, color='r', linestyle='--', label=f'Mean Error: {mean_err:.2f} m')

# Adding labels and title
plt.xlabel('Distance [m]')
plt.ylabel('Error [m]')

# Display the plot
plt.legend()
plt.show(block=True)
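
To relate the error to the distance more directly than a scatter plot allows, one option is to average the error within fixed distance bins. The following is a minimal sketch; the helper `mean_error_per_bin`, the bin width of 0.5 m, and the example numbers are hypothetical and not results from this data set. With the notebook's variables, it could be called as `mean_error_per_bin(distance_x, err)`.

```python
from collections import defaultdict


def mean_error_per_bin(distances, errors, bin_width=0.5):
    """Group errors by distance bin and return {bin_start: mean_error}."""
    grouped = defaultdict(list)
    for d, e in zip(distances, errors):
        # lower edge of the bin this distance falls into
        bin_start = int(float(d) // bin_width) * bin_width
        grouped[bin_start].append(float(e))
    return {start: sum(v) / len(v) for start, v in sorted(grouped.items())}


# Made-up example values, only to show the call:
example_distances = [0.4, 0.6, 0.9, 1.2, 1.4]
example_errors = [0.02, 0.04, 0.06, 0.10, 0.20]
binned = mean_error_per_bin(example_distances, example_errors)
for start, mean_err in binned.items():
    print(f"{start:.1f} m to {start + 0.5:.1f} m: mean error {mean_err:.3f} m")
```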

## Question 1:
Please describe the behavior of the error. What are possible causes for errors that you observe? (Please name at least three causes)

*Answer:*