---
# **Note(s):**
## 1) Metrics to be used for each:
* `Image`:
  * scene
    * estimated based on presumptions, i.e., `scenes`
* `Object`:
  * area:
    * object detected based on single shot detection (SSD) model, i.e., `object_detector`
    * calculated based on bound-boxes, i.e., `curr_pred_boxes`
  * depth:
    * (in progress) predicted based on depth-estimation model, i.e., `depth_estimator`
      * got the matrix
      * developing code to get distance of each object using the matrix
  * **angle**:
    * (to be done:) detected/estimated based on object angle-estimation model
      * have to find a proper model that does the job ...

---
# **Xtra(s)**
## 1) Future work(s):
* improve the image quality in such a way by image-ehancement so that when passed the image to Artificial Retina (AR) chip.
* consider minimum stimulation power by involving multi-objects
* edge information of primary object(s) (todo)

## 2) Question(s):
* What is the **frequency** of sending images to the chip?, i.e., The number of images transmitted per time.
* `image-size`:
  * model-input: `100 x 100`
  * model-output: `40 x 40`
* `tool` to simulate image to artificial retinal chip, i.e., `Matlab`

---

# **Variables**

In [1]:
# general
images_to_consider = 1000

# object detection
obj_det_probability_threshold = 0.360

# distance step factor (multiplicative)
obj_multiplicative_step = 1.00015

# ignore object(s)
obj_ignore_size  = 50 # i.e., 50x50
obj_ignore_area = obj_ignore_size*obj_ignore_size
obj_ignore_depth = 120 # far
obj_ignore_range = 50 # nearest-objects

In [2]:
depth_file_postfix            = "depth"
object_detection_file_postfix = "object-detection"

# **Libraries**

In [3]:
import os
import shutil
import numpy as np
import requests
import math
import random
import datetime
import pytz

import xml.etree.ElementTree as ET

from IPython.display import clear_output

import torch
import torchvision
from torchvision import transforms as T

from PIL import Image, ImageDraw
import cv2

In [112]:
!jupyter --version

Selected Jupyter core packages...
IPython          : 8.12.0
ipykernel        : 6.19.2
ipywidgets       : 8.0.4
jupyter_client   : 8.1.0
jupyter_core     : 5.3.0
jupyter_server   : 2.5.0
jupyterlab       : 3.6.3
nbclient         : 0.5.13
nbconvert        : 6.5.4
nbformat         : 5.7.0
notebook         : 6.5.4
qtconsole        : 5.4.2
traitlets        : 5.7.1


In [4]:
# comment in local only
# from torchsummary import summary

In [5]:
# install for colab only
# !pip install transformers==4.30.2
from transformers import pipeline

In [6]:
# comment in local only
# from google.colab.patches import cv2_imshow

# **Cuda**

In [11]:
device = "cpu"
gpus = None
if_multi_gpu_which_gpu = 0

In [12]:
if (torch.cuda.is_available()):
    device = "cuda:0"
    gpus = [torch.cuda.device(i) for i in range(torch.cuda.device_count())]
    gpus_props = [torch.cuda.get_device_properties(gpu) for gpu in gpus]
    for ind, gpu in enumerate(gpus_props):
        print ("{:}. {:}".format(ind, gpus[ind]))
        print (" - name:\t\t\t{:}\n - major/minor:\t\t\t{:}/{:}".format(
            gpu.name, gpu.major, gpu.minor,
        ))
        print (" - total_memory:\t\t{:} ({:.2f} GB)\n - Multi-processor count:\t{:}\n".format(
            gpu.total_memory, (gpu.total_memory/1000000000), gpu.multi_processor_count,
        ))
        if (if_multi_gpu_which_gpu == ind):
            device = "cuda:{:}".format(if_multi_gpu_which_gpu)
    print ("*** You are ready to use '{:}' device! ***".format(device))

0. <torch.cuda.device object at 0x7f04ac54eb00>
 - name:			NVIDIA GeForce RTX 4070 Ti
 - major/minor:			8/9
 - total_memory:		12878086144 (12.88 GB)
 - Multi-processor count:	60

*** You are ready to use 'cuda:0' device! ***


# **Fonts/Colors/Etc.**

In [13]:
font = cv2.FONT_HERSHEY_SIMPLEX
antialiasedline = cv2.LINE_AA

boxcolor = (0, 255, 0)
textcolor = (255, 255, 0)
centercolor = (255, 0, 0)

# **Functions**

In [14]:
def rgb2gray(rgb):
    r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return gray

In [15]:
def rectAreaFromCoordinates(x1, y1, x2, y2):
  length = x2-x1
  height = y2-y1
  return length * height

In [16]:
def getTagName(tagString):
  tmp = tagString.split("}")
  tmp_len = len(tmp)
  if (tmp_len == 1):
    return "non-tag"
  else:
    return tmp[tmp_len-1]

In [17]:
def getExtension(nameString):
  tmp = nameString.split(".")
  tmp_len = len(tmp)
  if (tmp_len == 1):
    return "non-ext"
  else:
    return tmp[tmp_len-1]

In [18]:
def getDepthAverage(depths, center, layer):
  return depths[center[0]][center[1]].item()

# **Get the model**

## **Depth estimation model** (vinvino02/glpn-*)

*   GLPN stands for Global-Local Path Networks
*   Karlsruhe Institute of Technology and Toyota Technological Institute
*   New York University

### **- Depth estimator model 1 (a, b)**

In [73]:
checkpoint_options = [
    "vinvino02/glpn-nyu",
    "vinvino02/glpn-kitti",
]

task = "depth-estimation"
checkpoint = checkpoint_options[0]

# for checking existence in locals() function
variable_to_check = "depth_estimator"
if variable_to_check in locals():
  print ("Variable '{:}' exists!, No need to re-initialize!".format(variable_to_check))
else:
  depth_estimator = pipeline(task, model=checkpoint)

Variable 'depth_estimator' exists!, No need to re-initialize!


In [74]:
depth_estimator

<transformers.pipelines.depth_estimation.DepthEstimationPipeline at 0x7f04abbebfd0>

### **- Depth estimator model 2**

In [75]:
# this will be used to convert the image to tensor. Then this tensor will be used as an input to the depth_estimator model
resnet_input_size = (512, 512)
preprocess = T.Compose([
    T.Resize(resnet_input_size),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # Adjust normalization values
])

In [76]:
# Download the MiDaS model weights
checkpoint_url = "https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_beit_large_512.pt"
checkpoint_path = "model_weights.pt"
if (checkpoint_path not in os.listdir()):
  torch.hub.download_url_to_file(checkpoint_url, checkpoint_path)

# for checking existence in locals() function
variable_to_check = "depth_estimator2"
if variable_to_check in locals():
  print ("Variable '{:}' exists!, No need to re-initialize!".format(variable_to_check))
else:
  # Create the MiDaS depth estimation model
  depth_estimator2 = torchvision.models.resnet50(pretrained=True)
  # depth_estimator2.load_state_dict(torch.load(checkpoint_path))
  depth_estimator2 = depth_estimator2.to(device)
  # Set the model to evaluation mode
  depth_estimator2.eval()
  clear_output(wait=True)

Variable 'depth_estimator2' exists!, No need to re-initialize!


In [77]:
depth_estimator2.conv1

Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

## **Object detection model** (ssd*)

* SSD stands for Single Shot Detector
* COCO stands for Common Object in Context

### **- Object detector model 1**

In [96]:
object_detector_options = [
    "ssd300_vgg16",
    "ssdlite320_mobilenet_v3_large",
]

object_detector = None
object_detector_option = object_detector_options[0]

if (object_detector_option == "ssd300_vgg16"):
  object_detector = torchvision.models.detection.ssd300_vgg16(pretrained = True)
elif(object_detector_option == "ssdlite320_mobilenet_v3_large"):
  object_detector = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained = True)



In [97]:
object_detector = object_detector.to(device)
clear_output(wait=True)

In [98]:
object_detector.eval()
clear_output(wait=True)

### loading coco labels

In [1]:
coco_names =  [
    "person" , "bicycle" , "car" , "motorcycle" , "airplane" , 
    "bus" , "train" , "truck" , "boat" , "traffic light" , # 10 
    "fire hydrant" , "street sign" , "stop sign" , "parking meter" , "bench" , 
    "bird" , "cat" , "dog" , "horse" , "sheep" , # 20 
    "cow" , "elephant" , "bear" , "zebra" , "giraffe" , 
    "hat" , "backpack" , "umbrella" , "shoe" , "eye glasses" , # 30 
    "handbag" , "tie" , "suitcase" , "frisbee" , "skis" , 
    "snowboard" , "sports ball" , "kite" , "baseball bat" , "baseball glove" , # 40 
    "skateboard" , "surfboard" , "tennis racket" , "bottle" , "plate" , 
    "wine glass" , "cup" , "fork" , "knife" , "spoon" , # 50 
    "bowl" , "banana" , "apple" , "sandwich" , "orange" , 
    "broccoli" , "carrot" , "hot dog" , "pizza" , "donut" , # 60 
    "cake" , "chair" , "couch" , "potted plant" , "bed" , 
    "mirror" , "dining table" , "window" , "desk" , "toilet" , # 70 
    "door" , "tv" , "laptop" , "mouse" , "remote" , 
    "keyboard" , "cell phone" , "microwave" , "oven" , "toaster" , # 80 
    "sink" , "refrigerator" , "blender" , "book" , "clock" , 
    "vase" , "scissors" , "teddy bear" , "hair drier" , "toothbrush" , # 90 
    "hair brush", # 91 
]

#### trying to get coco_labels from library (instead of hard coding)

In [100]:
# utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd_processing_utils')

In [101]:
# classes_to_labels = utils.get_coco_object_dictionary()

In [102]:
# print ("Number of classes (from utils) {:}".format(len(classes_to_labels)))
# print ("Number of classes (manually) {:}".format(len(coco_names)))

# **Setting up directories**

## clean directories if exist already

In [30]:
main_folder = ""
dataset_folder = ""

if os.getenv("COLAB_RELEASE_TAG"):
  print ("Running in Colab")
  main_folder = "/content/images"
  dataset_folder = "/content/dataset"
else:
  print ("Running in Local system")
  main_folder = "images"
  dataset_folder = "dataset"
print ("- images in '{:}'\n- dataset in '{:}'".format(main_folder, dataset_folder))

Running in Local system
- images in 'images'
- dataset in 'dataset'


In [31]:
if (os.path.exists(main_folder) == True):
  shutil.rmtree(main_folder)

In [32]:
if (os.path.exists(dataset_folder) == True):
  shutil.rmtree(dataset_folder)

## setting up the source of images (to be loaded)

In [33]:
image_src = "http://images.cocodataset.org"

In [34]:
response = requests.get(image_src)
xml_content = response.content

In [35]:
tree = ET.fromstring(xml_content)

In [36]:
root = tree

In [37]:
contents_counter = 0
others_counter = 0
zip_counter = 0
jpg_counter = 0
non_counter = 0
non_value = ""
images_dict = {}

stop_considering_jpgs = False
for content in root:
  tagName = getTagName(content.tag)
  # print ("***{:}***:".format(tagName))
  if (tagName == "Contents"):
    contents_counter += 1
    for content2 in content:
      tagName2 = getTagName(content2.tag)
      # print ("\t{:}: '{:}'".format(tagName2, content2.text))
      if (tagName2 == "Key"):
        fileExtension = getExtension(content2.text)
        # print ("{:}".format(fileExtension))
        if (fileExtension == "zip"):
          zip_counter += 1
        elif (fileExtension == "jpg"):
          jpg_counter += 1
          if (images_to_consider == jpg_counter):
            stop_considering_jpgs = True
          # print ("***{:}***:".format(tagName))
          # print ("\t{:}: '{:}'".format(tagName2, content2.text))
          # print ("{:}".format(fileExtension))
          images_dict[jpg_counter-1] = content2.text
        elif (fileExtension == "non-ext"):
          non_counter += 1
          non_value = fileExtension
    if (stop_considering_jpgs):
      break
  else:
    others_counter += 1
  # print ("")

print ("")
print ("# of 'Others' tags: {:}".format(others_counter))
print ("# of 'Content' tags: {:}".format(contents_counter))
print (" - # of 'zip' files: {:}".format(zip_counter))
print (" - # of 'jpg' files: {:}".format(jpg_counter))
print (" - # of '{:}' files: {:}".format(non_value, non_counter))


# of 'Others' tags: 5
# of 'Content' tags: 1000
 - # of 'zip' files: 11
 - # of 'jpg' files: 988
 - # of 'non-ext' files: 1


In [38]:
images = len(images_dict)
# images = 20
print ("'{:}/{:}' number of images will be considered.".format(images, images_to_consider))

'988/1000' number of images will be considered.


## create directories

In [39]:
print ("Setting up (image) directories:")

sub_folder = ""
for image in range(images):
  try:
    sub_folder = "{:}/{:}/".format(main_folder, image)
    os.makedirs(sub_folder)
    print ("- Folder '{:}' created.".format(sub_folder))
  except FileExistsError:
    print ("- Folder '{:}' already exists.".format(sub_folder))
  clear_output(wait=True)

- Folder 'images/987/' created.


In [40]:
img_folder_originals = "images_originals"
img_folder_depths    = "images_depths"
img_folder_objects   = "images_objects"
txt_folder_objects   = "texts_objects"

In [41]:
print ("Setting up (dataset) directories:")

try:
  sub_folder = "{:}".format(dataset_folder)
  os.makedirs(sub_folder)
  print ("- Folder '{:}' created.".format(sub_folder))
except FileExistsError:
  print ("- Folder '{:}' already exists.".format(sub_folder))

try:
  sub_folder = "{:}/{:}/".format(dataset_folder, img_folder_originals)
  os.makedirs(sub_folder)
  print ("- Folder '{:}' created.".format(sub_folder))
except FileExistsError:
  print ("- Folder '{:}' already exists.".format(sub_folder))

try:
  sub_folder = "{:}/{:}/".format(dataset_folder, img_folder_depths)
  os.makedirs(sub_folder)
  print ("- Folder '{:}' created.".format(sub_folder))
except FileExistsError:
  print ("- Folder '{:}' already exists.".format(sub_folder))

try:
  sub_folder = "{:}/{:}/".format(dataset_folder, img_folder_objects)
  os.makedirs(sub_folder)
  print ("- Folder '{:}' created.".format(sub_folder))
except FileExistsError:
  print ("- Folder '{:}' already exists.".format(sub_folder))

try:
  sub_folder = "{:}/{:}/".format(dataset_folder, txt_folder_objects)
  os.makedirs(sub_folder)
  print ("- Folder '{:}' created.".format(sub_folder))
except FileExistsError:
  print ("- Folder '{:}' already exists.".format(sub_folder))

Setting up (dataset) directories:
- Folder 'dataset' created.
- Folder 'dataset/images_originals/' created.
- Folder 'dataset/images_depths/' created.
- Folder 'dataset/images_objects/' created.
- Folder 'dataset/texts_objects/' created.


## loading images in the folder (**or manually upload them and don't run this block of code**)

In [42]:
for image in range(images):
  print (image, images_dict[image])
  target_img = "{:}/{:}/{:}.jpg".format(main_folder, image, 0)
  if (len(images_dict[image]) <77):
    src_img = "{:}/{:}".format(image_src, images_dict[image])
  else:
    src_img = "{:}".format(images_dict[image])
  print ("-", target_img)
  print ("-", src_img)
  # !wget -O "/content/images/0/0.jpg" "http://images.cocodataset.org/val2017/000000037777.jpg"
  !wget -O {target_img} {src_img}
  clear_output(wait=True)

459 test-stuff2017/000000013570.jpg
- images/459/0.jpg
- http://images.cocodataset.org/test-stuff2017/000000013570.jpg
459 test-stuff2017/000000013570.jpg
- images/459/0.jpg
- http://images.cocodataset.org/test-stuff2017/000000013570.jpg


In [43]:
shutil.make_archive(main_folder, 'zip', main_folder)

# **Presumptions**

In [44]:
showrgb = not True

In [103]:
# scenes = {
#     "traffic" : [
#         "truck", "car",
#     ], # 0
#     "cows-in-pasture" : [
#         "cow",
#     ], # 1
#     "elephant-in-zoo" : [
#         "elephant",
#     ], # 2
#     "self-picture" : [
#         "person", "tie",
#     ], # 3
#     "public-place": [
#         "person", "tie",
#     ], # 4
#     "buses-on-road" : [
#         "bus",
#     ], # 5
#     "clock-house" : [
#         "clock",
#     ], # 6
#     "eating-food" : [
#         "person", "donut",
#     ], # 7
#     "open seas and oceans" : [
#         "boat",
#     ], # 8
#     "workplace" : [
#         "person", "chair"
#     ], # 9

#     # "sports" : [
#     #     "person", "skateboard",
#     # ],
#     # "people-sitting-outside" : [
#     #     "people", "bench", "potted plant",
#     # ],
#     # "person-in-tv-lounge" : [
#     #     "person", "tv", "couch",
#     # ],
#     # "person-in-bedroom" : [
#     #     "person", "bed",
#     # ],
#     # "fruit-market" : [
#     #     "person", "banana",
#     # ],
#     # "person-dining" : [
#     #     "person", "dining table", "chair", "table", "cup", "bowl", "teddy bear",
#     # ],

#     # "kitchen" : [
#     #     "refrigerator", "oven", "dining table", "bowl",
#     # ],
# }
# default_scene = "no-scene"

# **Brain**

## Comments to this part:


* Scene estimation

## **Object detection & Depth-estimation**

In [108]:
transform = T.ToTensor()

output = {}

# Iterate images
for image in range(images):
  timestamp = 0
  print ("")
  print ("Image # {:}:".format(image))
  output[image] = {}
  target_img = "{}/{:}/{:}.jpg".format(main_folder, image, timestamp)
  curr_ig = Image.open(target_img)
  curr_img = transform(curr_ig).to(device)
  with torch.no_grad():
    ##########################
    # loading objects detected
    curr_pred = object_detector([curr_img])
    curr_pred_keys = np.array(list(curr_pred[0].keys()))
    curr_pred_num_keys = len(curr_pred_keys)
    curr_pred_boxes, curr_pred_scores, curr_pred_labels = curr_pred[0]["boxes"], curr_pred[0]["scores"], curr_pred[0]["labels"]
    curr_objects_num = torch.argwhere(curr_pred_scores >= obj_det_probability_threshold).shape[0]
    curr_objects = []
    curr_igg = cv2.imread(target_img)

    ###########################
    # loading depth predictions
    depth_predictions = depth_estimator(curr_ig)
    draw = ImageDraw.Draw(depth_predictions["depth"])

    #######################################
    # find minimum and maximum pixel values
    depth_prediction_min = float('inf')
    depth_prediction_max = float('-inf')
    depth_prediction_w, depth_prediction_h = depth_predictions["depth"].size
    # print (" - it has '{:}' rows and '{:}' columns.".format(depth_prediction_w, depth_prediction_h))
    for y in range(depth_prediction_h):
      for x in range(depth_prediction_w):
        depth_prediction_pv = depth_predictions["depth"].getpixel((x, y))
        if depth_prediction_pv < depth_prediction_min:
          depth_prediction_min = depth_prediction_pv
        if depth_prediction_pv > depth_prediction_max:
          depth_prediction_max = depth_prediction_pv
    # print (" - the min and max depths of this image is {:}. Diff: {:}".format(
    #     (depth_prediction_min, depth_prediction_max),
    #     depth_prediction_max-depth_prediction_min,
    # ))

    #####################################
    # making output statistics of objects
    output[image]["objects"] = {}
    output[image]["stats"] = {}
    output[image]["stats"]["depth"] = {}
    output[image]["stats"]["depth"]["min"] = depth_prediction_min
    output[image]["stats"]["depth"]["max"] = depth_prediction_max
    for obj in range(curr_objects_num):
      obj_x1, obj_y1, obj_x2, obj_y2 = curr_pred_boxes[obj].cpu().numpy().astype("int")
      obj_xc, obj_yc = math.ceil((obj_x1+obj_x2)/2), math.ceil((obj_y1+obj_y2)/2)
      dot_position = (obj_xc, obj_yc)
      depth_prediction_curr_obj = depth_predictions["depth"].getpixel(dot_position)
      curr_igg  = cv2.rectangle(curr_igg, (obj_x1, obj_y1), (obj_x2, obj_y2), boxcolor, 1)
      class_name = coco_names[curr_pred_labels.cpu().numpy()[obj]-1]
      curr_objects.append(class_name)
      curr_igg = cv2.putText(curr_igg, class_name, (obj_x1, obj_y1-6), font, 0.5, textcolor, 1, antialiasedline)
      curr_igg = cv2.putText(curr_igg, "x", (obj_xc, obj_yc), font, 0.2, centercolor, 1, antialiasedline)

      ################################
      # find actual size of the object
      obj_area = rectAreaFromCoordinates(obj_x1, obj_y1, obj_x2, obj_y2)
      obj_area_org = obj_area
      for step in range(depth_prediction_curr_obj, depth_prediction_min, -1):
        obj_area_org = obj_multiplicative_step*obj_area_org
      obj_area_org = math.ceil(obj_area_org)

      #####################
      # storing output json
      if class_name not in output[image]["objects"]:
        output[image]["objects"][class_name] = {}
      curr_obj_index = len(output[image]["objects"][class_name].keys())
      output[image]["objects"][class_name][curr_obj_index] = {}
      output[image]["objects"][class_name][curr_obj_index]["x1"]    = obj_x1
      output[image]["objects"][class_name][curr_obj_index]["y1"]    = obj_y1
      output[image]["objects"][class_name][curr_obj_index]["x2"]    = obj_x2
      output[image]["objects"][class_name][curr_obj_index]["y2"]    = obj_y2
      output[image]["objects"][class_name][curr_obj_index]["xc"]    = obj_xc
      output[image]["objects"][class_name][curr_obj_index]["yc"]    = obj_yc
      output[image]["objects"][class_name][curr_obj_index]["class"] = coco_names.index(class_name)
      output[image]["objects"][class_name][curr_obj_index]["area"]  = obj_area_org
      output[image]["objects"][class_name][curr_obj_index]["depth"] = depth_prediction_curr_obj
      output[image]["objects"][class_name][curr_obj_index]["score"] = round(curr_pred_scores[obj].item(), 2)

      output[image]["objects"][class_name][curr_obj_index]["primary"] = 1 # random.getrandbits(1)
      if (depth_prediction_curr_obj > obj_ignore_depth):
        output[image]["objects"][class_name][curr_obj_index]["primary"] = 0
        if (obj_area < obj_ignore_area):
          output[image]["objects"][class_name][curr_obj_index]["primary"] = 0
        else:
          pass
      else:
        if (obj_area < obj_ignore_area):
          output[image]["objects"][class_name][curr_obj_index]["primary"] = 0
        else:
          pass
      draw.point(dot_position, fill = "white")
    curr_objects = list(dict.fromkeys(curr_objects))

    # ####################################
    # # making output statistics of scenes
    # output[image]["scenes"] = {}
    # for scene in scenes:
    #   for obj in curr_objects:
    #     if (obj in scenes[scene]):
    #       if scene not in output[image]["scenes"]:
    #         output[image]["scenes"][scene] = {}
    #       output[image]["scenes"][scene][obj] = 0
    #       for obj_i in output[image]["objects"][obj]:
    #         output[image]["scenes"][scene][obj] += 1

  ##########################################
  # copying original-image to dataset_folder
  target_img_dest = "{:}/{:}/{:}-{:}.jpg".format(dataset_folder, img_folder_originals, image, timestamp)
  shutil.copyfile(target_img, target_img_dest)

  ###############################
  # saving object-detection-image
  curr_igg_image = Image.fromarray(curr_igg)
  target_img_object_detection = "{:}/{:}/{:}-{:}.jpg".format(main_folder, image, timestamp, object_detection_file_postfix)
  curr_igg_image.save(target_img_object_detection)
  # copying object-detection-image to dataset_folder
  target_img_object_detection_dest = "{:}/{:}/{:}-{:}.jpg".format(dataset_folder, img_folder_objects, image, timestamp)
  shutil.copyfile(target_img_object_detection, target_img_object_detection_dest)

  ####################
  # saving depth-image
  target_img_depth = "{:}/{:}/{:}-{:}.jpg".format(main_folder, image, timestamp, depth_file_postfix)
  depth_predictions["depth"].save(target_img_depth)
  # copying depth-image to dataset_folder
  target_img_depth_dest = "{:}/{:}/{:}-{:}.jpg".format(dataset_folder, img_folder_depths, image, 0)
  shutil.copyfile(target_img_depth, target_img_depth_dest)

  ######################################
  # saving object details in a text file
  txt_file_cols = {
      "x1"         : "x1",
      "y1"         : "x2",
      "x2"         : "y1",
      "y2"         : "y2",
      "xc"         : "xc",
      "yc"         : "yc",
      "class"      : "class",
      "area"       : "area",
      "depth"      : "depth",
      "score"      : "pred_score",
      "primary"    : "primary",
  }
  target_txt_object = "{:}/{:}/{:}.txt".format(main_folder, image, 0)
  with open(target_txt_object, "w") as txt_file:
    txt_file.write(
      ",".join([txt_file_cols[key] for key in txt_file_cols.keys()])+"\n"
    )
    for ind, obj in enumerate(output[image]["objects"]):
      occ_obj = len(output[image]["objects"][obj])
      print (" - {:}. {:} ({:}):".format(
        ind,
        obj,
        occ_obj,
      ))
      for obj_i in output[image]["objects"][obj]:
        txt_file.write(
          ",".join([str(output[image]["objects"][obj][obj_i][key]) for key in txt_file_cols.keys()])+"\n"
        )
        print ("    - #{:} - area:{:}, depth:{:}, score:{:.3f}, (x1, y1): {:}, (x2, y2): {:}, (xc, yc): {:} - PRIMARY: {:}".format(
            obj_i,
            output[image]["objects"][obj][obj_i]["area"],
            output[image]["objects"][obj][obj_i]["depth"],
            output[image]["objects"][obj][obj_i]["score"],
            "({:}, {:})".format(output[image]["objects"][obj][obj_i]["x1"], output[image]["objects"][obj][obj_i]["y1"]),
            "({:}, {:})".format(output[image]["objects"][obj][obj_i]["x2"], output[image]["objects"][obj][obj_i]["y2"]),
            "({:}, {:})".format(output[image]["objects"][obj][obj_i]["xc"], output[image]["objects"][obj][obj_i]["yc"]),
            output[image]["objects"][obj][obj_i]["primary"],
        ))
  target_txt_object_dest = "{:}/{:}/{:}-{:}.txt".format(dataset_folder, txt_folder_objects, image, 0)
  shutil.copyfile(target_txt_object, target_txt_object_dest)

  # ###################################################
  # # displaying object-detection-image and depth-image
  # if (showrgb == True):
  #   cv2_imshow(rgb2gray(curr_igg))
  # else:
  #   cv2_imshow(curr_igg)
  # depth_predictions["depth"].show()

  clear_output(wait=True)


Image # 987:
 - 0. dining table (1):
    - #0 - area:151564, depth:56, score:0.930, (x1, y1): (2, 85), (x2, y2): (367, 499), (xc, yc): (185, 292) - PRIMARY: 1
 - 1. chair (1):
    - #0 - area:10894, depth:152, score:0.890, (x1, y1): (3, 1), (x2, y2): (109, 102), (xc, yc): (56, 52) - PRIMARY: 0
 - 2. bowl (3):
    - #0 - area:17652, depth:45, score:0.770, (x1, y1): (216, 387), (x2, y2): (372, 500), (xc, yc): (294, 444) - PRIMARY: 1
    - #1 - area:3274, depth:55, score:0.610, (x1, y1): (5, 236), (x2, y2): (69, 287), (xc, yc): (37, 262) - PRIMARY: 1
    - #2 - area:13128, depth:44, score:0.410, (x1, y1): (47, 412), (x2, y2): (196, 500), (xc, yc): (122, 456) - PRIMARY: 1


## Annotation CSV

In [146]:
target_csv_file = "{:}/custom_annotations.txt".format(dataset_folder)
with open(target_csv_file, "w") as csv_file:
    csv_file.write("{:},{:},{:},{:},{:}\n".format(
        "image_id",
        "image_original",
        "image_object_detection",
        "image_depth",
        "text_object",
    ))
    for image in range(images):
        timestamp = 0
        image_original = "{:}/{:}/{:}-{:}.jpg".format(
            dataset_folder, img_folder_originals, image, timestamp
        )
        image_object_detection = "{:}/{:}/{:}-{:}.jpg".format(
            dataset_folder, img_folder_objects, image, timestamp
        )
        image_depth = "{:}/{:}/{:}-{:}.jpg".format(
            dataset_folder, img_folder_depths, image, timestamp
        )
        text_object = "{:}/{:}/{:}-{:}.txt".format(
            dataset_folder, txt_folder_objects, image, timestamp
        )
        csv_file.write("{:},{:},{:},{:},{:}\n".format(
            image,
            image_original,
            image_object_detection,
            image_depth,
            text_object,
        ))

## Zip 'dataset' and 'images' folders and download the zip(s)

In [142]:
current_time = datetime.datetime.now(pytz.timezone('Asia/Seoul'))
timestamp_string = current_time.strftime("%Y-%m-%d_%H-%M-%S")

In [148]:
# zip the dataset folder
shutil.make_archive(dataset_folder+"_"+timestamp_string, 'zip', dataset_folder)

# zip the images folder
shutil.make_archive(main_folder+"_"+timestamp_string, 'zip', main_folder)

'/home/malik/Artificial-Retina/code/images_2023-07-05_20-16-05.zip'

### depth to depth_prediction_max (mostly 255)

In [None]:
# curr_obj_area_org = output[image]["objects"][obj][obj_i]["area"]
# for step in range(output[image]["objects"][obj][obj_i]["depth"], depth_prediction_max, 1):
#   curr_obj_area_org = obj_multiplicative_step*curr_obj_area_org
#   print ("{:} ({:})".format(
#       step, curr_obj_area_org
#   ),
#   end=", ")
# print (curr_obj_area_org)

### depth to depth_prediction_min (mostly close to 0)

In [None]:
# curr_obj_area_org = output[image]["objects"][obj][obj_i]["area"]
# for step in range(output[image]["objects"][obj][obj_i]["depth"], depth_prediction_min, -1):
#   curr_obj_area_org = obj_multiplicative_step*curr_obj_area_org
#   print ("{:} ({:})".format(
#       step, curr_obj_area_org
#   ),
#   end=", ")
# print (curr_obj_area_org)

### find min and max depth values of an image

In [None]:
# depth_prediction_min = float('inf')
# depth_prediction_max = float('-inf')
# depth_prediction_w, depth_prediction_h = depth_predictions["depth"].size
# print (" - The image has '{:}' rows and '{:}' columns.".format(depth_prediction_w, depth_prediction_h))

In [None]:
# # Iterate over each pixel in the image to find out minimum and maximum value
# for y in range(depth_prediction_h):
#   for x in range(depth_prediction_w):
#     depth_prediction_pv = depth_predictions["depth"].getpixel((x, y))
#     if depth_prediction_pv < depth_prediction_min:
#       depth_prediction_min = depth_prediction_pv
#     if depth_prediction_pv > depth_prediction_max:
#       depth_prediction_max = depth_prediction_pv

# depth_prediction_minmax = depth_prediction_min, depth_prediction_max
# print (" - The min and max depths of this image is {:}. Diff: {:}".format(
#     depth_prediction_minmax,
#     depth_prediction_max-depth_prediction_min,
# ))

## **Displaying output**

In [None]:
# # iterating and displaying full output (scenes, objects)
# # scenes_in_all_images = []
# for image in output:
#   # print ("")
#   print ("Image # {:}:".format(image))

#   indent_for_object_scenes = "   "
#   indent_for_object_stats  = "\t"

#   # ###################
#   # # displaying scenes
#   # print ("\n- Scenes:")
#   # for ind, scene in enumerate(output[image]["scenes"]):
#   #   print ("{:}{:}. {:}:\t\t({:})".format(
#   #     indent_for_object_scenes,
#   #     ind,
#   #     scene,
#   #     # output[image]["scenes"][scene],
#   #     ', '.join('{}:{}'.format(key, value) for key, value in output[image]["scenes"][scene].items())
#   #   ))

#   ####################
#   # displaying objects
#   num_objs = len(output[image]["objects"])
#   print ("- Objects ({:}):".format(num_objs))
#   for ind, obj in enumerate(output[image]["objects"]):
#     occ_obj = len(output[image]["objects"][obj])
#     print ("{:}{:}. {:} ({:}):".format(
#       indent_for_object_scenes,
#       ind,
#       obj.capitalize(),
#       occ_obj,
#     ))
#     for obj_i in output[image]["objects"][obj]:
#       print ("{:} #{:} - area:{:}, depth:{:}, score:{:.3f}, class:{:}, (x1, y1): {:}, (x2, y2): {:}, (xc, yc): {:}".format(
#           indent_for_object_stats, obj_i,
#           output[image]["objects"][obj][obj_i]["area"],
#           output[image]["objects"][obj][obj_i]["depth"],
#           output[image]["objects"][obj][obj_i]["score"],
#           output[image]["objects"][obj][obj_i]["class"],
#           "({:}, {:})".format(output[image]["objects"][obj][obj_i]["x1"], output[image]["objects"][obj][obj_i]["y1"]),
#           "({:}, {:})".format(output[image]["objects"][obj][obj_i]["x2"], output[image]["objects"][obj][obj_i]["y2"]),
#           "({:}, {:})".format(output[image]["objects"][obj][obj_i]["xc"], output[image]["objects"][obj][obj_i]["yc"]),
#       ))

#   # scenes_in_curr_image = len(output[image]["scenes"])
#   # scenes_in_all_images.append(scenes_in_curr_image)

#   print ("\n******************\n")

# # ###########################################
# # # displaying number of scenes in each image
# # for image, num_of_scenes in enumerate(scenes_in_all_images):
# #   print ("Image # {:} has '{:}' scenes.".format(image, num_of_scenes), end = "")
# #   if (num_of_scenes == 0 or num_of_scenes > 1):
# #     print (" (have a look at this image number)")

# #   else:
# #     print ("")

In [None]:
# output[0]

In [None]:
# ###########################################
# # displaying number of scenes in each image
# for image, num_of_scenes in enumerate(scenes_in_all_images):
#   print ("\nImage # {:} has '{:}' scene(s).".format(image, num_of_scenes), end = "")
#   if (num_of_scenes == 0 or num_of_scenes > 1):
#     print (" (have a look at this image number)")
#     for ind, scene in enumerate(output[image]["scenes"]):
#       print ("{:}{:}. {:}({:} objects):\t\t({:})".format(
#         indent_for_object_scenes,
#         ind,
#         scene,
#         len(output[image]["scenes"][scene]),
#         # output[image]["scenes"][scene],
#         ", ".join("{:}:{:}".format(key, value) for key, value in output[image]["scenes"][scene].items())
#       ))
#       # for obj_in_scene in output[image]["scenes"][scene]:
#       #   print ("  - {:}{:}".format(indent_for_object_scenes, obj_in_scene))
#       # print ("")
#   else:
#     print (" (ok)")

## **Viewing one sample output**

In [None]:
# output_i = 0
# if (output_i >= images_to_consider):
#   output_i = 0
# print ("Sample output for Image # {:}".format(output_i))

In [None]:
# output[output_i]["scenes"]

In [None]:
# output[output_i]["objects"]

In [None]:
# # print ("\n- Scenes:")
# # for ind, scene in enumerate(output[output_i]["scenes"]):
# #   print ("{:}{:}. {:}:\t\t({:})".format(
# #     indent_for_object_scenes,
# #     ind,
# #     scene,
# #     # output[image]["scenes"][scene],
# #     ', '.join('{}:{}'.format(key, value) for key, value in output[output_i]["scenes"][scene].items())
# #   ))

# target_image_sample       = "{:}/{:}/{:}-{:}.jpg".format(main_folder, output_i, 0, object_detection_file_postfix)
# target_image_depth_sample = "{:}/{:}/{:}-{:}.jpg".format(main_folder, output_i, 0, depth_file_postfix)
# target_image_sample_output       = Image.open(target_image_sample)
# target_image_depth_sample_output = Image.open(target_image_depth_sample)
# target_image_sample_output.show()
# target_image_depth_sample_output.show()

## **PyTorch model structures**

### **depth_estimator** model 1 (a) structure
model:
* **glpn**
  * encoder
* **decoder**

In [None]:
depth_estimator.model.decoder

### **depth_estimator2** model 1 (b) structure
model:
* **conv2d**
  * conv2d-one
* **two**
  * two-one
* **three**
  * three-one
* **four**

In [None]:
depth_estimator2.conv1

### **object_detector** model 2 structure
model
* **backbone**
  * features
  * extra
* **anchor_generator**
  * aspect_ratios
* **head**
  * classification_head
* **transform**

In [None]:
object_detector.backbone.features

In [None]:
tmp_layers = [type(layer) for layer in object_detector.backbone.features]
tmp_layers_unq = set(tmp_layers)

In [None]:
print ("Types of layers used in 'object_detector' model:")
for tmp in tmp_layers_unq:
  print ("  -", tmp)

## **Data Types (continuous, ordinal, catgeorical)**

In [3]:
dataset_defintion = {
    "image": {
        "originals": {"values": {"min": 0, "max": 255}, "category": "image-continous"},
        "objects":   {"values": {"min": 0, "max": 255}, "category": "image-continous"},
        "depths":    {"values": {"min": 0, "max": 255}, "category": "image-continous"},
    },
    "objectspecs": {
        "x1": { "values": {"min": 0, "max": 255}, "d-type": "int", "category": "continous/numerical"},
        "x2": { "values": {"min": 0, "max": 255}, "d-type": "int", "category": "continous/numerical"},
        "y1": { "values": {"min": 0, "max": 255}, "d-type": "int", "category": "continous/numerical"},
        "y2": { "values": {"min": 0, "max": 255}, "d-type": "int", "category": "continous/numerical"},
        "xc": { "values": {"min": 0, "max": 255}, "d-type": "int", "category": "continous/numerical"},
        "yc": { "values": {"min": 0, "max": 255}, "d-type": "int", "category": "continous/numerical"},
        "class": { "values": {"min": 0, "max": 90}, "d-type": "int", "category": "categorical/nominal"},
        "area": { "values": {"min": 0, "max": 65536}, "d-type": "float", "category": "continous/numerical"},
        "depth": { "values": {"min": 0, "max": 255}, "d-type": "int", "category": "continous/numerical"},
        "pred_score": { "values": {"min": 0, "max": 1}, "d-type": "float", "category": "continous/numerical"},
        "primary": { "values": {"min": 0, "max": 1}, "d-type": "bool", "category": "categorical/nominal"},
    },
}

In [112]:
dataset_analysis_techniques = {
    "continous/numerical": {
        "DescriptivE-StatisticS to provide a summary of main features of the data": [
            "main", "median", "standard-deviation", "range", "percentiles", "central-tendency", "dispersion",
        ],
        "DatA-VisualizatioN to explore distribution, patterns, and relationships": [
            "histograms", "box-plots", "scatter-plots", "line-plots",
        ],
        "InferencE-StatisticS to make inferences about population based on sample data": [
            "hypothesis-testing(t-tests, ANOVA)",
        ],
        "CorrelatioN-AnalysiS to assess the strength and direction of relationship between continous variables" : [
            "correlation-coefficients", "pearson-correlation",
        ],
        "RegressioN-AnalysiS to model the relationship between a dependent variable and one or more independent variables": [
            "linear", "polynomial", "logistic", "poisson", "ridge", "lasso", "elastic-net",
            "stepwise", "robust", "non-linear", "time-series", "hierarchical", "bayesian",
        ],
        "TimE-SerieS-AnalysiS for the data collected over time": [
            "trends", "seasonality", "patterns",
        ],
        "ProbabilitY/NormaL-DistributionS to model underlying distribution of continuous data": [
            "likelihood of observing certain values",
        ],
        "Outlier Detection to ensure robustness of the analysis": [
            "indicates-data-quality-issues-or-interesting-phenomena",
        ],
        "Clustering Analysis to group data points with similar characteristics": [
            "distinct-patterns", "segments-within-data",
        ],
        "DimensionalitY-ReductioN to reduce dimensions of data and preserve important patterns": [
            "prinicpal-component-analysis(pca)", "t-distributed-stochastic-neighbor-embedding(t-SNE)",
        ],
    },
    "ordinal/categorical": {
        "DescriptivE-StatisticS to provide a summary of category distribution": [
            "frequency", "percentages", "central-tendency(median/mode)",
        ],
        "NoN-ParametriC-TestS to provide a summary of category distribution": [
            "Mann-Whitney-U", "Kruskal-Wallis", "Wilcoxon-signed-rank",
        ],
        "OrdinaL-LogistiC-RegressioN to model relationship b/w ordinal-outcomes and other ordianl/nominal outcomes": [
            "proportional-odds-model", "ordinal-outcome-level-changing",
        ],
        "Spearman'S-RanK-CorrelatioN to assess the strength and direction of relationship b/w two ordinal variables": [
            "non-parametric-alternative-to-Pearson-correlation", "suitable-for-monotonic-relationships",
        ],
        "ProportionaL-OddS-ModeL to analyze ordinal data with multiple predictor variables": [
            "estimates the odds of an ordinal outcome being in a higher category, given the predictor variables",
        ],
        "CrosS-TabulatioN-AnD-Chi-SquarE-TesT to identify associations b/w 2/more categorical variables, including ordinals": [
            "",
        ],
        "VisualizationS to display the distribution of ordinal data and revealing patterns between variables": [
            "bar-charts", "stacked-bar-charts", "mosaic-plots",
        ],
        "CumulativE-ProbabilitY-PlotS to visualize the cumulative distribution of ordinal data": [
            "insights-into-the-underlying-structure", "comparisons-between-groups",
        ],
    },
    "categorical/nominal": {
        "Descriptive Statistics": ["TBD"],
        "NoN-ParametriC-TestS": [
            "Mann-Whitney-U", "Kruskal-Wallis", "Wilcoxon-signed-rank",
        ],
        "Ordinal Logistic Regression": ["TBD"],
        "Spearman's Rank Correlation": ["TBD"],
        "Proportional Odds Model": ["TBD"],
        "Cross-Tabulation and Chi-Square Test": ["TBD"],
        "Visualizations": [
            "bar-charts", "stacked-bar-charts", "mosaic-plots",
        ],
        "Cumulative Probability Plots": ["TBD"],
    },
}

In [114]:
print ("Apply these statistical analysis if one of these:\n")
indentation1 = "  "
indentation2 = indentation1+indentation1
for i, analysis_tech in enumerate(dataset_analysis_techniques, 1):
    print ("{:}. {:}:".format(i, analysis_tech))
    for analysis_meth in dataset_analysis_techniques[analysis_tech]:
        print ("{:}- {:}:\n{:}- {:}".format(
            indentation1, analysis_meth,
            indentation2,
            ", ".join(dataset_analysis_techniques[analysis_tech][analysis_meth])[::-1].replace(",", "dna ", 1)[::-1],
        ))
    print ("")

Apply these statistical analysis if one of these:

1. continous/numerical:
  - DescriptivE-StatisticS to provide a summary of main features of the data:
    - main, median, standard-deviation, range, percentiles, central-tendency and dispersion
  - DatA-VisualizatioN to explore distribution, patterns, and relationships:
    - histograms, box-plots, scatter-plots and line-plots
  - InferencE-StatisticS to make inferences about population based on sample data:
    - hypothesis-testing(t-tests and ANOVA)
  - CorrelatioN-AnalysiS to assess the strength and direction of relationship between continous variables:
    - correlation-coefficients and pearson-correlation
  - RegressioN-AnalysiS to model the relationship between a dependent variable and one or more independent variables:
    - linear, polynomial, logistic, poisson, ridge, lasso, elastic-net, stepwise, robust, non-linear, time-series, hierarchical and bayesian
  - TimE-SerieS-AnalysiS for the data collected over time:
    - trends,

In [115]:
for definition in dataset_defintion:
    print (definition)
    for subdef in dataset_defintion[definition]:
        subdefcat = dataset_defintion[definition][subdef]["category"]
        subdefmethod = "not-sure"
        if (subdefcat == "continous"):
            subdefmethod = "statistical techniques"
        elif (subdefcat == "ordinal"):
            subdefmethod = "non-parametric statistical methods"
        elif (subdefcat == "categorical"):
            subdefmethod = "descriptive statistics (frequency-counts, bar-charts) and statistical tests (chi-square)"
        print ("- {:} ({:})".format(subdef, subdefcat), end="")
        print ("\tanalysis-method: {:}".format(subdefmethod))
    print ("")

image
- originals (image-continous)	analysis-method: not-sure
- objects (image-continous)	analysis-method: not-sure
- depths (image-continous)	analysis-method: not-sure

objectspecs
- x1 (continous)	analysis-method: statistical techniques
- x2 (continous)	analysis-method: statistical techniques
- y1 (continous)	analysis-method: statistical techniques
- y2 (continous)	analysis-method: statistical techniques
- xc (continous)	analysis-method: statistical techniques
- yc (continous)	analysis-method: statistical techniques
- class (category)	analysis-method: not-sure
- area (continous)	analysis-method: statistical techniques
- depth (continous)	analysis-method: statistical techniques
- pred_score (continous)	analysis-method: statistical techniques
- primary (category)	analysis-method: not-sure



## Generating part by JUNGBEOM

In [None]:
!mkdir dataset
!mkdir dataset/images
!mkdir dataset/dpe_images
!mkdir dataset/objects

import csv

csv_file = "dataset/annotations.csv"
original_folder = 'dataset/images'
depth_folder = 'dataset/dpe_images'
txt_folder = 'dataset/objects'

if os.path.exists(csv_file):
    "Previous annotation file has benn removed."
    os.remove(csv_file)

transform = T.ToTensor()
output = {}


# The score threshold needs to be decided
score_threshold = 0

for image in range(images):
    output[image] = {}
    target_img = "{}/{:}/{:}.jpg".format(main_folder, image, 0)
    print(f'Processing {target_img}')
    curr_ig = Image.open(target_img)
    # flow 1 complete
    curr_img = transform(curr_ig)

    # flow 2 complete
    with torch.no_grad():
        # print (curr_img.shape)
        curr_pred = object_detector([curr_img])
        curr_pred_keys = np.array(list(curr_pred[0].keys()))
        curr_pred_num_keys = len(curr_pred_keys)
        curr_pred_boxes, curr_pred_scores, curr_pred_labels = curr_pred[0]["boxes"], curr_pred[0]["scores"], curr_pred[0]["labels"]


        curr_objects_num = torch.sum(curr_pred_scores >= obj_det_probability_threshold)
        curr_objects = []
        curr_igg = cv2.imread(target_img)

        output[image]["objects"] = {}
        for obj in range(curr_objects_num):
            obj_x1, obj_y1, obj_x2, obj_y2 = curr_pred_boxes[obj].numpy().astype("int")
            curr_igg  = cv2.rectangle(curr_igg, (obj_x1, obj_y1), (obj_x2, obj_y2), boxcolor, 1)
            class_name = coco_names[curr_pred_labels.numpy()[obj]-1]
            curr_objects.append(class_name)
            curr_igg = cv2.putText(curr_igg, class_name, (obj_x1, obj_y1-6), font, 0.5, textcolor, 1, antialiasedline)
            obj_area = rectAreaFromCoordinates(obj_x1, obj_y1, obj_x2, obj_y2)
            # print ("'{:}' area: {:}".format(class_name, obj_area))
            #############################################################
            if class_name not in output[image]["objects"]:
                output[image]["objects"][class_name] = {}
            curr_obj_index = len(output[image]["objects"][class_name].keys())
            output[image]["objects"][class_name][curr_obj_index] = {}
            output[image]["objects"][class_name][curr_obj_index]["size"] = obj_area
            output[image]["objects"][class_name][curr_obj_index]["coordinates"] = [obj_x1, obj_y1, obj_x2, obj_y2]
            #############################################################
        curr_objects = list(dict.fromkeys(curr_objects))

        output[image]["scenes"] = {}
        for scene in scenes:
            # print ("checking scene '{:}'".format(scene))
            output[image]["scenes"][scene] = {}
            output[image]["scenes"][scene]["ans"] = not True
            output[image]["scenes"][scene]["count"] = 0
            for obj in curr_objects:
            # print ("- checking object '{:}'".format(obj))
                if (obj in scenes[scene]):
                    output[image]["scenes"][scene]["ans"] = True
                    output[image]["scenes"][scene]["count"] += 1
    # flow 3 complete
    depth_predictions = depth_estimator(curr_ig)
    for obj in output[image]["objects"]:
        for obj_i in output[image]["objects"][obj]:
            # (to be added here) build logic for depth estimation from the matrix 'depth_predictions'
            output[image]["objects"][obj][obj_i]["depth"] = 10 # static, for now

    curr_output = output[image]

    # flow 4 complete
    sorted_objects = []
    size_weight = 0.01
    depth_weight = 0.01
    # cls = class
    for cls in curr_output['objects']:
        # ob = object
        for index, ob in enumerate(curr_output['objects'][cls]):
            coords = curr_output['objects'][cls][ob]['coordinates']
            # The score needs to be formulated
            score = \
                size_weight * curr_output['objects'][cls][ob]['size'] + \
                depth_weight * curr_output['objects'][cls][ob]['depth']
            if score > score_threshold:
                sorted_objects.append([cls, index, score, coords])


    sorted_objects = sorted(sorted_objects, key=lambda x: x[2], reverse=True)


    ori_imagePath = os.path.join(original_folder, str(image) + '.jpg')
    dpt_imagePath = os.path.join(depth_folder, str(image) + '.jpg')
    txt_filePath = os.path.join(txt_folder, str(image) + '.txt')

    # Save original image (flow 5 complete)
    curr_ig.save(ori_imagePath)

    # Save depth estimation result (flow 6 complete)
    depth_predictions['depth'].save(fp=dpt_imagePath)

    # Save coordinates (flow 7 complete)

    len_SO = len(sorted_objects)
    with open(txt_filePath, 'w') as f:
        if len_SO >= 5:
            for o in sorted_objects:
                f.write(f'{o[3][0]},{o[3][1]},{o[3][2]},{o[3][3]}\n')
        else:
            # If less than 5 objects are detected
            for o in sorted_objects:
                f.write(f'{o[3][0]},{o[3][1]},{o[3][2]},{o[3][3]}\n')
            for _ in range(5-len_SO):
                f.write('0,0,0,0\n')

    # Write on csv file (flow 8 complete)
    csv_content = [
        # index, oriPath, dptPath, txtPath, label
        image, ori_imagePath, dpt_imagePath, txt_filePath, 0
    ]

    with open(csv_file, 'a', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerow(csv_content)

# **Xtras**

## **1) Dinov2 (it tracks objects in the video)**

In [None]:
dinov2_vits14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')

In [None]:
dinov2_vits14.eval()

## **2) Topic (one-liner description)**

In [None]:
# code here