<a align="left" href="https://ultralytics.com/yolov5" target="_blank">
<img src="https://user-images.githubusercontent.com/26833433/125273437-35b3fc00-e30d-11eb-9079-46f313325424.png"></a>

Based from the **official YOLOv5 🚀 notebook** authored by **Ultralytics**, and is freely available for redistribution under the [GPL-3.0 license](https://choosealicense.com/licenses/gpl-3.0/). 
For more information please visit https://github.com/ultralytics/yolov5 and https://ultralytics.com.

# Setup

Check PyTorch and GPU.

In [1]:
%cd ./yolov5

C:\Users\Nathan\Desktop\Test Detection\sign-detection\yolov5


In [13]:
import torch
from IPython.display import Image, clear_output  # to display images

print(f"Setup complete. Using torch {torch.__version__} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")

Setup complete. Using torch 1.11.0+cpu (CPU)


# Inference
Run cells in order

1.   Parameters
> Where user can adjust settings for processing and OCR
2.   Functions
> Functions used in other cells
3.   Detection
> YOLOv5 detection to create labels
4.   Process labels
> * Processes the labels in a multi-step process to allow smoother labeling in video
> * Refer to cell for more information
5.   OCR
> EasyOCR to read speeds from speed limit signs and validate stop signs
6. Generate snippet highlights (optional)
> Creates snippets of only detections
7.   Generate results 
> Creates a results CSV that is more readable than labels.csv (optional)
8.   Generate videos
> Create videos with labels (optional)



## Parameters

In [14]:
import os
# ===== ===== ===== ===== ===== PATHS ===== ===== ===== ===== =====

# Path to folder of source videos for detections
videos_path = "../videos"

# Path to folder for output csvs containing detection labels
# will automatically generate
csv_output_path = "../csv_raw_output"

# Path to folder where post processed videos are saved
# will automatically generate
video_output_path = "../videos_processed"

# Path to dataset .yaml file
dataset_yaml = "./data/lisa.yaml"

# Path to weights
weights_path = "../weights/yolov5l_200_epochs.pt"

# Path to results, set to None if you don't want to generate a new folder for all results
# will automatically generate
results_output_path = os.path.join(videos_path , "results")

# Path to video snippets of only detections
# will automatically generate
video_snippets_path = os.path.join(videos_path , "snippets")



# ===== ===== ===== ===== ===== DETECTION ===== ===== ===== ===== =====
# How much to increment each frame in the detection model, 1 = source fps, 2 = half of source fps, 3 = third of source fps, etc
frame_inc = 1



# ===== ===== ===== ===== ===== PROCESSING ===== ===== ===== ===== =====

# Range of frames that checks if there is an overlapping detection of the same class, if below threshold then outlier is removed
outlier_range = 50

# Max number of frames group together before not an outlier
outlier_count = 1

# Range of frames that allows a gap to be filled
# Has to be less than outlier_range or else won't work
gap_range = 25

# Percentage of overlap required to be considered same object
overlap_threshold = 0.001

# ID range, range that groups detections together
# Should be pretty big, since speed limit numbers don't change within a few seconds
id_range = 100



# ===== ===== ===== ===== ===== OCR ===== ===== ===== ===== =====   

# Percentage of total detections required to be valid
# If OCR makes detections at a percentage lower than total count in ID grouping, ignore grouping
ocr_detections_threshold = 0.05

# Confidence threshold for OCR
ocr_conf_threshold = 0.5



# ===== ===== ===== ===== ===== SNIPPETS ===== ===== ===== ===== =====  

# Seconds added to before and after snippet highlight as a buffer
snippet_buffer = 3

# Whether or not to automatically snippet corresponding vehicle interior videos as well
snippet_interior = True

# Creates folder for snippets sorted by classes
snippet_split_classes = True



# ===== ===== ===== ===== ===== RESULTS ===== ===== ===== ===== =====  

# Whether or not results will be dependent on snippets
# This means removing a video from the snippets folder will result in removing a row in results
# Adding a video won't add to the results though
results_snippets_sync = True

## Functions

In [15]:
# [x1, x2, y1, y2, class, conf]
def getOverlapArea(a, b):
  if a[4] != b[4]:
    return 0
  a_area = (a[1] - a[0]) * (a[3] - a[2])
  b_area = (b[1] - b[0]) * (b[3] - b[2])
  maxArea = max(a_area, b_area)
  dxa = a[1] - b[0]
  dya = a[3] - b[2]
  dxya = dxa * dya

  dxb = b[1] - a[0]
  dyb = b[3] - b[2]
  dxyb = dxb * dyb

  #dx = min(a[1], b[1]) - max(a[0], b[0])
  #dy = min(a[3], b[3]) - max(a[2], b[2])

  if dxya > 0 or dxyb > 0:
    return float(max(dxyb, dxya) / maxArea)

  #if (dx>=0) and (dy>=0):
  #  area = dx*dy
  #  return float(area / maxArea)
  return 0

# [class, x, y, width, height, confidence]
# to and from
# [x1, x2, y1, y2, class, conf]
def reformat(data, video_res, toCoordinates = True):
  if toCoordinates:
    x1 = max(int(video_res[0] * (float(data[1]) - ((float(data[3]) / 2)))), 0)
    x2 = min(int(video_res[0] * (float(data[1]) + ((float(data[3]) / 2)))), video_res[0])
    y1 = max(int(video_res[1] * (float(data[2]) - ((float(data[4]) / 2)))), 0)
    y2 = min(int(video_res[1] * (float(data[2]) + ((float(data[4]) / 2)))), video_res[1])
    classname = label_names[int(data[0])]
    conf = data[5]
    reformatted = [x1, x2, y1, y2, classname, conf]
    if len(data) > 6:
      reformatted.append(int(data[6]))
    return reformatted
  width = round((data[1] - data[0]) / video_res[0], 7)
  height = round((data[3] - data[2]) / video_res[1], 7)
  x = round((data[0] / video_res[0]) + (width / 2), 6)
  y = round((data[2] / video_res[1]) + (height / 2), 6)
  classname = label_names.index(data[4])
  conf = data[5]
  reformatted = f"{classname} {x} {y} {width} {height} {conf}"
  if len(data) > 6:
    reformatted = reformatted + f" {data[6]}"
  return reformatted
  

# Returns list of bounding boxes
def getBoundingBoxes(detections, video_res):
    reformatted_detections = []
    for detection in detections:
      data = detection.split(" ")
      if len(data) != 6 and len(data) != 7:
        continue
      reformatted_detections.append(reformat(data, video_res))
    return reformatted_detections

# Creates new bounding boxes between two bounding boxes
def averageBoxes(box1, box2, box_distance):
  new_boxes = []
  for i in range(1, box_distance + 1):
    new_box = []
    for k in range(0, 4):
      new_box.append(int((box2[k] - box1[k]) / box_distance * i) + box1[k])
    new_box.append(box1[4])
    new_box.append(min(box1[5], box2[5]))
    new_boxes.append(new_box)
  return new_boxes

# Preprocesses image for OCR
def preprocess(inp_image, otsu=1):
  processed = inp_image.copy()

  # Grayscale
  processed = cv2.cvtColor(processed, cv2.COLOR_BGR2GRAY)

  # Scaling up
  basewidth = 300
  scale = basewidth / processed.shape[1]
  processed = cv2.resize(processed, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)

  # Thresholding
  otsu_thresh = cv2.threshold(processed, 127, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[0]
  otsu_thresh *= otsu
  processed_thresh = cv2.threshold(processed, otsu_thresh, 255, cv2.THRESH_BINARY)[1]

  thresh_area = processed_thresh.shape[0] * processed_thresh.shape[1]
  #print(cv2.countNonZero(processed_thresh) / thresh_area)

  # Border
  bordersize = 10
  processed_thresh = cv2.copyMakeBorder(processed_thresh,top=bordersize,bottom=bordersize,left=bordersize,right=bordersize,borderType=cv2.BORDER_CONSTANT,value=[255, 255, 255])
  return processed_thresh


## Detection
<img src="https://user-images.githubusercontent.com/26833433/114307955-5c7e4e80-9ae2-11eb-9f50-a90e39bee53f.png" width="900"> 

Custom trained model

In [19]:
import time
import os
import shutil
start_time = time.time()

if (os.path.exists(csv_output_path) and os.path.isdir(csv_output_path)):
  shutil.rmtree(csv_output_path)
os.mkdir(csv_output_path)

cmd_weights_path = "\"" + weights_path + "\""
cmd_videos_path = "\"" + videos_path + "\""
cmd_csv_output_path = "\"" + csv_output_path + "\""

!python detect.py --weights {cmd_weights_path} --img 640 --conf 0.5 --source {cmd_videos_path} --save-csv {cmd_csv_output_path} --save-conf --nosave --frame-inc {frame_inc}
print("Done, time since start:", time.time() - start_time)



^C
Done, time since start: 865.3435003757477


## Processes the labels in a csv file
Three loops through the csv


1.   Removes outliers
2.   Fills in empty gaps
3.   Applies an ID to consecutively overlapping boxes

Writes to labels.csv in the videos_processed directory

0 = stop sign, 1 = speed limit

\[x1,x2,y1,y2,classname,confidence,id]

In [20]:
import csv
import os
import cv2
import shutil
import yaml
import time

# Open yaml file to get label class data
with open(dataset_yaml, "r") as yamlfile:
  label_names = yaml.safe_load(yamlfile)["names"]

# Create folder
if not os.path.exists(video_output_path):
  os.mkdir(video_output_path)

summary = {}
# Loop through all csvs
for source_video_name in os.listdir(videos_path):
  csv_name = ".".join(source_video_name.split(".")[:-1]) + ".csv"
  if not csv_name in os.listdir(csv_output_path):
    continue

  print("\n" + csv_name)
  summary[source_video_name] = {}

  # Get resolution and fps from video
  video_name = csv_name.split(".csv")[0] + ".mp4"
  vidcap = cv2.VideoCapture(os.path.join(videos_path, video_name))
  video_res = [int(vidcap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(vidcap.get(cv2.CAP_PROP_FRAME_HEIGHT)), int(vidcap.get(cv2.CAP_PROP_FPS)), int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))]
  vidcap.release()

  # Convert csv to array
  labels_array = []
  csv_path = os.path.join(csv_output_path, csv_name)
  with open(csv_path, "r") as csvfile:
    csv_reader = csv.reader(csvfile)
    labels_array = list(csv_reader)
  
  csvfile_array = []
  for frame_count in range(video_res[3]):
    line_append = [frame_count]
    if len(labels_array) > 0 and len(labels_array[0]) < 2:
      labels_array.pop(0)
    while len(labels_array) > 0 and len(labels_array[0]) >= 2 and frame_count == int(labels_array[0][0]):
      line_append.append(labels_array.pop(0)[1])
    csvfile_array.append(line_append)

  changes_done = [0,0]
  
  # Go through each frame
  # First loop to remove outliers
  # Only removes detections
  for i in range(len(csvfile_array)):
    if len(csvfile_array[i]) <= 1:
      continue

    # Create array of detections for current frame
    curr_detections = getBoundingBoxes(csvfile_array[i][1:], video_res)

    # Go through each detection and remove outliers
    detection_count = 1
    keep_detections = []
    for curr_box in curr_detections:
      nearby_count = 0

      # Checks in range around it
      for k in range(1, outlier_range + 1):

        # Checks next frames
        if i + k < len(csvfile_array):
          if(len(csvfile_array[i+k])) > 1:
            for next_box in getBoundingBoxes(csvfile_array[i+k][1:], video_res):
              if getOverlapArea(curr_box, next_box) >= overlap_threshold:
                nearby_count += 1
              #else:
              #  print(i, i+k, getOverlapArea(curr_box, next_box), 'no overlap')

        # Checks previous frames
        if i - k >= 0:
          if(len(csvfile_array[i-k])) > 1:
            for past_box in getBoundingBoxes(csvfile_array[i-k][1:], video_res):
              if getOverlapArea(curr_box, past_box) >= overlap_threshold:
                nearby_count += 1
              #else:
              #  print(i, i-k, 'no overlap')

      # Removes outliers
      if nearby_count <= outlier_count:
        print(f"{i} Removing outlier {curr_box} {nearby_count} nearby")
        csvfile_array[i].pop(detection_count)
        changes_done[0] += 1
      else:
        keep_detections.append(curr_box)
        detection_count += 1
  # End first iteration

  # Go through each frame
  # Second loop to fill gaps
  # Only adds detections
  for i in range(len(csvfile_array)):

    # Only does checks when there are detections on frame and a gap (empty next frame) is present
    if len(csvfile_array[i]) <= 1:
      continue

    # Formatting from YOLO format to 4 corner coordinates
    formatted_detections = []
    for detection in csvfile_array[i][1:]:
      if detection == "":
        csvfile_array[i].remove(detection)
        continue
      d = reformat(detection.split(" "), video_res)
      formatted_detections.append(f"{d[0]};{d[1]};{d[2]};{d[3]};{d[4]};{d[5]}")

    # Prevent out of bounds errors
    if i + 2 >= len(csvfile_array):
      continue
    # Create array of detections for current frame
    curr_detections = getBoundingBoxes(csvfile_array[i][1:], video_res)
    # Checks within a range for the next detection that overlaps
    prev_iteration = None
    for curr_box in curr_detections:
      #if (not prev_iteration is None) and (getOverlapArea(curr_box, prev_iteration) >= overlap_threshold):
      #  print("overlap within same frame")
      prev_iteration = curr_box
      gap_present = True
      # Checks for empty space directly in front
      for next_box in getBoundingBoxes(csvfile_array[i+1][1:], video_res):
        if getOverlapArea(curr_box, next_box) >= overlap_threshold:
          gap_present = False

      # Checks further in front to find if there is another overlapping frame to fill gap
      if gap_present:
        gap_done = False
        for k in range(2, gap_range + 2):
          if not gap_done and i + k < len(csvfile_array):
            if len(csvfile_array[i+k]) > 1:

              # Checks frame for a bounding box overlapping with current bounding box
              for next_box in getBoundingBoxes(csvfile_array[i+k][1:], video_res):
                if not gap_done:
                  if getOverlapArea(curr_box, next_box) >= overlap_threshold:
                    boxes_to_fill = averageBoxes(curr_box, next_box, k)

                    # Fills gap with bounding boxes calculated from averages
                    print(f"Filling from {i+1} to {i+len(boxes_to_fill)-1}")
                    for x in range(1, len(boxes_to_fill)):
                      box_to_fill = boxes_to_fill.pop()
                      box_to_fillog = box_to_fill.copy()
                      box_to_fill = reformat(box_to_fill, video_res, toCoordinates=False)
                      next_boxes = getBoundingBoxes(csvfile_array[i+x][1:], video_res)
                      future_overlap = False
                      for b in next_boxes:
                        if getOverlapArea(box_to_fillog, b) >= overlap_threshold:
                          future_overlap = True
                      if not future_overlap:
                        csvfile_array[i+x].append(box_to_fill)
                        print(f"{i+x} Filling in gap with {box_to_fillog}")
                        changes_done[1] += 1
                    gap_done = True
  # End second iteration

  # Go through each frame
  # Third iteration to group detections together
  # Only changes existing detections
  # ID exists for the OCR
  id = 1
  
  csv_array_3 = []
  for i in range(len(csvfile_array)):
    if len(csvfile_array[i]) <= 1:
      csv_array_3.append(csvfile_array[i])
      continue

    # Create array of detections for current frame
    curr_detections = getBoundingBoxes(csvfile_array[i][1:], video_res)
    reformatted_detections = []
    k = 1
    for curr_box in curr_detections:
      done = False
      # Assigns existing ID
      if i - 1 >= 0:
        # Checks past boxes in given range
        for x in range(1, id_range + 1):
          if i - x >= 0 and not done:
            for past_box in getBoundingBoxes(csvfile_array[i-x][1:], video_res):
                  # Assumes speed limit signs are the same when occuring at the same time
                  if (getOverlapArea(curr_box, past_box) >= overlap_threshold) or (past_box[4] == label_names[1] and curr_box[4] == label_names[1]):
                    if len(past_box) > 6:
                      reformatted_detections.append(f"{curr_box[0]};{curr_box[1]};{curr_box[2]};{curr_box[3]};{curr_box[4]};{curr_box[5]};{past_box[6]}")
                      csvfile_array[i][k] = csvfile_array[i][k] + f" {past_box[6]}"
                      done = True
                      break
      # Creates a new ID
      if not done:
        for next_box in getBoundingBoxes(csvfile_array[i+1][1:], video_res):
              if getOverlapArea(curr_box, next_box) >= overlap_threshold:
                reformatted_detections.append(f"{curr_box[0]};{curr_box[1]};{curr_box[2]};{curr_box[3]};{curr_box[4]};{curr_box[5]};{id}")
                csvfile_array[i][k] = csvfile_array[i][k] + f" {id}"
                done = True
                summary[source_video_name][id] = curr_box[4]
                id += 1
                break
      if not done:
        reformatted_detections.append(f"{curr_box[0]};{curr_box[1]};{curr_box[2]};{curr_box[3]};{curr_box[4]};{curr_box[5]};none")
      k += 1
      
    csv_array_3.append([csvfile_array[i][0]] + reformatted_detections)
    # End third iteration   
  print(f"{changes_done[0]} total outliers removed, {changes_done[1]} spaces filled in, {id-1} id's assigned")

  # Create directory for video
  video_dir_path = os.path.join(video_output_path, video_name)
  if not os.path.exists(video_dir_path):
    os.mkdir(video_dir_path)
  video_labels_path = os.path.join(video_dir_path, "labels.csv")
  if os.path.exists(video_labels_path):
    os.remove(video_labels_path)

  # Writes labels into a csv file
  with open(video_labels_path, "w", newline='') as new_csv:
    csv_writer = csv.writer(new_csv)
    csv_writer.writerow(["Frame Number", "Detections (x1 x2 y1 y2 class conf id)"])
    for line in csv_array_3:
      csv_writer.writerow(line)
#print("Done, time since start:", time.time() - start_time)

# Prints a summary of detections
for video in summary:
  print(video)
  for id in summary[video]:
    if not summary[video][id] == "urdbl":
      print(f"id: {id} \tlabel: {summary[video][id]}")
#print("Done, time since start:", time.time() - start_time)


20191222_171147_EF.csv
723 Removing outlier [1046, 1920, 0, 1037, 'stop', '0.648128\n'] 0 nearby
1801 Removing outlier [1555, 1588, 644, 680, 'stop', '0.582084\n'] 0 nearby
3209 Removing outlier [744, 803, 600, 655, 'stop', '0.519109\n'] 0 nearby
5183 Removing outlier [1284, 1316, 680, 712, 'stop', '0.604829\n'] 1 nearby
5191 Removing outlier [1359, 1397, 669, 704, 'stop', '0.516333\n'] 0 nearby
Filling from 632 to 636
632 Filling in gap with [1035, 1689, 252, 1033, 'stop', '0.526324\n']
633 Filling in gap with [1034, 1687, 254, 1032, 'stop', '0.526324\n']
634 Filling in gap with [1034, 1686, 255, 1032, 'stop', '0.526324\n']
635 Filling in gap with [1034, 1684, 257, 1032, 'stop', '0.526324\n']
636 Filling in gap with [1034, 1683, 258, 1031, 'stop', '0.526324\n']
Filling from 638 to 638
638 Filling in gap with [1038, 1695, 247, 1033, 'stop', '0.530543\n']
Filling from 640 to 642
640 Filling in gap with [1038, 1702, 238, 1033, 'stop', '0.530543\n']
641 Filling in gap with [1038, 1700, 2

## Using OCR to identify text on signs
Loops through csv to apply EasyOCR to signs and applies speed number to class to signs grouped by id

Any signs unable to be read will be overwritten as "urdbl"

Run the processing before this, as this overwrites label classes


In [27]:
import cv2
import os
import csv
import numpy as np
from PIL import Image
import easyocr
import time

easy_reader = easyocr.Reader(['en'])

summary = {}

# Only iterates through working video directory, not all processed videos

for video_name in os.listdir(video_output_path):
  if not video_name in os.listdir(videos_path):
    continue

  summary[video_name] = {}
  print("\n\n\n" + video_name)
  video_path = os.path.join(video_output_path, video_name)
  labels_path = os.path.join(video_path, "labels.csv")


  # Open labels csv from video
  csv_array = []
  with open(labels_path, "r") as labels_csv:
    csv_reader = csv.reader(labels_csv)
    csv_array = list(csv_reader)
    header = csv_array.pop(0)
  
  source_video_path = os.path.join(videos_path, video_name)
  reader = cv2.VideoCapture(source_video_path)
  total_frames = int(reader.get(cv2.CAP_PROP_FRAME_COUNT))

  stops_by_id = {}
  speeds_by_id = {}
  id_count = {}
  for row in csv_array:
    if len(row) > 1:
      for raw_data in row[1:]:
        d = raw_data.split(";")
        frame_number = int(row[0])
        if frame_number < total_frames:
          reader.set(cv2.CAP_PROP_POS_FRAMES, frame_number)
          success, img = reader.read()
          if d[6] not in id_count:
            id_count[d[6]] = 0
          id_count[d[6]] += 1

          # Block for speed limits, reads number and overwrites label
          if success and d[4] == "speedLimit":
            # Cropping only the sign
            cropped_img = img[int(d[2]):int(d[3]), int(d[0]):int(d[1]), :].copy()
            thresh = preprocess(cropped_img)

            pil_img = Image.fromarray(thresh)
            easy_detections = easy_reader.readtext(np.array(pil_img))
            #cv2_imshow(np.array(pil_img))

            # Gathers only the numbers over a confidence threshold
            numbers = [] # Array to store all numbers from a single frame of detections
            for detect in easy_detections:
              if detect[2] >= ocr_conf_threshold:
                for c in detect[1]:
                  if c.isdigit():
                    numbers.append(c)

            # Prunes the numbers since speed limits are generally at most 2 digits
            while len(numbers) > 2 or (len(numbers) > 0 and numbers[0] == "0"):
                numbers.pop(0)
            numbers = "".join(numbers) # Converts list of numbers to string
            if len(numbers) == 2 or len(numbers) == 1:
              if d[6] not in speeds_by_id:
                speeds_by_id[d[6]] = []
              speeds_by_id[d[6]].append(numbers) # Append to a dictionary of numbers categorized by group ID

          # Block for stop signs, reads to make sure stop signs are stop signs
          elif success and d[4] == "stop":
            cropped_img = img[int(d[2]):int(d[3]), int(d[0]):int(d[1]), :].copy()
            thresh = preprocess(cropped_img)
            pil_img = Image.fromarray(thresh)
            easy_detections = easy_reader.readtext(np.array(pil_img))
            #print(d[6], easy_detections)
            if not d[6] in stops_by_id.keys():
              stops_by_id[d[6]] = 0
            for detection in easy_detections:
              if "stop" in detection[1].lower():
                stops_by_id[d[6]] += 1

  # Speed limits
  # Out of the detections in one group, picks the most frequent for the entire group
  likely_speeds = {}
  print(speeds_by_id)
  for id in speeds_by_id:
    number_of_speeds = {}
    for speed in speeds_by_id[id]:

      # Speed limit numbers are almost always ending in 5 or 0 (ex. 15, 70, 60)
      if len(speed) == 2 and speed[-1] in ["0", "5"]:
        if speed not in number_of_speeds:
          number_of_speeds[speed] = 0
        number_of_speeds[speed] += 1

    # Sometimes misses the 2nd digit
    for speed in speeds_by_id[id]:
      if len(speed) == 1:
        for i in range(len(number_of_speeds)):
          if speed in list(number_of_speeds.keys())[i][0]:
            number_of_speeds[list(number_of_speeds.keys())[i]] += 1

    number_of_speeds["urdbl"] = 0
    print(id, number_of_speeds)
    detections_made = 0
    for speed_count in number_of_speeds:
      detections_made += number_of_speeds[speed_count]
    print(f"{detections_made} detections out of {id_count[id]} frames")

    # Ignore grouping if below detections threshold
    if detections_made / id_count[id] <= ocr_detections_threshold:
      likely_speeds[id] = "urdbl"
    else:
      likely_speeds[id] = max(number_of_speeds, key=number_of_speeds.get)

    for id in likely_speeds:
      if likely_speeds[id] != "urdbl":
        summary[video_name][id] = likely_speeds[id]
    
  # Save to CSV
  with open(labels_path, "w", newline='') as labels_csv:
    csv_writer = csv.writer(labels_csv)
    csv_writer.writerow(header)
    for row in csv_array:
      for i in range(len(row)):
        detection = row[i].split(";")
        if len(detection) == 7:

          # Overwrites for speed limits
          if detection[6] in likely_speeds:
            detection[4] = f"{detection[4]} ({likely_speeds[detection[6]]})"
            row[i] = f"{detection[0]};{detection[1]};{detection[2]};{detection[3]};{detection[4]};{detection[5]};{detection[6]}"
            break

          # Overwrites for stop signs
          elif detection[6] in stops_by_id.keys():
            if stops_by_id[detection[6]] >= id_count[detection[6]] * 0.05:
              if not detection[6] in summary[video_name]:
                summary[video_name][detection[6]] = "stop"
            else:
              row[i] = f"{detection[0]};{detection[1]};{detection[2]};{detection[3]};urdbl;{detection[5]};{detection[6]}"

      csv_writer.writerow(row)
  print(likely_speeds)
  print("===========================================================================")

  
  reader.release()

# Prints a summary of detections
for video in summary:
  print(video)
  for id in summary[video]:
    if not summary[video][id] == "urdbl":
      print(f"id: {id} \tlabel: {summary[video][id]}")
#print("Done, time since start:", time.time() - start_time)

CUDA not available - defaulting to CPU. Note: This module is much faster with a GPU.





20191222_171147_EF.mp4
{'3': ['6', '65', '65', '65', '65', '65', '65', '65', '65', '65', '65', '65', '65', '65', '65', '65', '65', '65', '65', '55', '51', '65', '51', '55', '65', '55', '65', '55', '65']}
3 {'65': 23, '55': 4, 'urdbl': 0}
27 detections out of 54 frames
{'3': '65'}
20191222_171147_EF.mp4
id: 3 	label: 65


##Create video clips highlighting only detections

In [None]:
import csv
import os
import cv2
import math

if not os.path.exists(video_snippets_path):
  os.mkdir(video_snippets_path)

for video_name in os.listdir(videos_path):
  if not video_name in os.listdir(video_output_path):
    continue
  
  specific_video_folder_path = os.path.join(video_output_path, video_name)
  labels_path = os.path.join(specific_video_folder_path, "labels.csv")
  source_video_path = os.path.join(videos_path, video_name)

  print(source_video_path)

  # Gets video resolution and fps from source
  reader = cv2.VideoCapture(source_video_path)
  reader2 = False
  if snippet_interior:
    snippet_interior_name = ".".join(video_name.split(".")[:-1])
    if snippet_interior_name[-1] == "F":
      snippet_interior_name = snippet_interior_name[:-1] + "R"
      #print(snippet_interior_name + ".mp4")
      #print(os.listdir(videos_path))
      if snippet_interior_name + ".mp4" in os.listdir(videos_path):
        interior_source_path = os.path.join(videos_path, snippet_interior_name + ".mp4")
        reader2 = cv2.VideoCapture(interior_source_path)



  video_res = [int(reader.get(cv2.CAP_PROP_FRAME_WIDTH)), int(reader.get(cv2.CAP_PROP_FRAME_HEIGHT)), math.ceil(reader.get(cv2.CAP_PROP_FPS)), int(reader.get(cv2.CAP_PROP_FRAME_COUNT))]
  if snippet_interior and reader2:
    video_res2 = (int(reader2.get(cv2.CAP_PROP_FRAME_WIDTH)), int(reader2.get(cv2.CAP_PROP_FRAME_HEIGHT)))
  fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
  

  # Reads from processed CSV
  csv_array = []
  with open(labels_path, "r") as labels_csv:
    csv_reader = csv.reader(labels_csv)
    csv_array = list(csv_reader)
    header = csv_array.pop(0)

   # Groups detections by ID
  labels_by_id = {}
  # {id: [label_class, start_time, end_time]}
  for line in csv_array:
    if len(line) > 1:
      frame = int(line[0])
      for detection in line:
        d = detection.split(";")
        if len(d) == 7:
          labelclass = d[4]
          id = d[6]

          #if "urdbl" in labelclass or labelclass == "speedLimit":
          #  continue

          # Create list for id if not already existing
          if not id in labels_by_id:
            labels_by_id[id] = [labelclass, frame, frame] # [labelclass, earliest_frame, latest_frame]
          # Finds the max and min for frame count per ID
          labels_by_id[id][1] = min(labels_by_id[id][1], frame)
          labels_by_id[id][2] = max(labels_by_id[id][2], frame)

  print(labels_by_id)
  for id in labels_by_id:
    video_name_cut = ".".join(video_name.split(".")[:-1])
    
    snippets_directory = video_snippets_path
    if snippet_split_classes:
      if "stop" in labels_by_id[id][0]:
        snippets_directory = os.path.join(snippets_directory, "stop")
      else:
        snippets_directory = os.path.join(snippets_directory, "speed_limit")
      if not (os.path.exists(snippets_directory) and os.path.isdir(snippets_directory)):
        os.mkdir(snippets_directory)
    specific_snippet_path = os.path.join(snippets_directory, f"{video_name_cut}_{labels_by_id[id][0]}_{id}.mp4")
    print(specific_snippet_path)
    print(video_res)

    # Adding buffer to snippet
    earliest_frame = max(labels_by_id[id][1] - (snippet_buffer * video_res[2]), 0)
    latest_frame = min(labels_by_id[id][2] + (snippet_buffer * video_res[2]), video_res[3] - 1)

    success = True

    # start frame
    frame_count = earliest_frame

    video_writer = cv2.VideoWriter(specific_snippet_path, fourcc, video_res[2], (video_res[0], video_res[1]))

    if snippet_interior and reader2:
      snippet_interior_path = os.path.join(snippets_directory, f"{snippet_interior_name}_{labels_by_id[id][0]}_{id}.mp4")
      print("writing", snippet_interior_path)
      video_writer2 = cv2.VideoWriter(snippet_interior_path, fourcc, video_res[2], video_res2)
      reader2.set(cv2.CAP_PROP_POS_FRAMES, frame_count)


    reader.set(cv2.CAP_PROP_POS_FRAMES, frame_count)

    while success and frame_count <= latest_frame:
      success, frame = reader.read()

      if snippet_interior and reader2:
        success2, frame2 = reader2.read()
        if success2:
          video_writer2.write(frame2)

      for detection in csv_array[frame_count][1:]:
        detection = detection.split(";")

        # Cases where bounding box isn't drawn
        if len(detection) != 6 and len(detection) != 7:
          continue
        #if detection[4] == "speedLimit":
        #  continue

        if id != detection[6]:
          continue

        
        # Draw boxes around detections
        tl = round(0.002 * (frame.shape[0] + frame.shape[1]) / 2) + 1
        c1, c2 = (int(detection[0]), int(detection[2])), (int(detection[1]), int(detection[3]))
        cv2.rectangle(frame, c1, c2, (128, 128, 128), thickness=tl, lineType=cv2.LINE_AA)
        if detection[4]:
          tf = max(tl - 1, 1)
          label_text = detection[4]
          if len(detection) > 6:
            label_text += f" id: {detection[6]}"
          t_size = cv2.getTextSize(label_text, 0, fontScale=tl / 3, thickness=tf)[0]
          c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
          cv2.rectangle(frame, c1, c2, (128, 128, 128), -1, cv2.LINE_AA)
          cv2.putText(frame, label_text, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)

      frame_count += 1
      video_writer.write(frame)
    video_writer.release()
    if snippet_interior and reader2:
      video_writer2.release()
  reader.release()

## Create results CSV
Creates a readable results CSV
>label class, start time, end time

In [None]:
import os
import csv
import cv2
import datetime
import time
import shutil

if results_output_path:
  if not os.path.exists(results_output_path):
    os.mkdir(results_output_path)

for video_name in os.listdir(videos_path):
  video_path = os.path.join(videos_path, video_name)

  if not (os.path.exists(video_path) and not os.path.isdir(video_path) and video_name.endswith(".mp4")):
    continue
  print(video_name)
  
  video_processed_path = os.path.join(video_output_path, video_name)
  results_path = os.path.join(video_processed_path, "results.csv")
  # Writes to processed folder
  if os.path.exists(video_processed_path) and os.path.isdir(video_processed_path):
    with open(results_path, "w", newline='') as results_csv:
      csv_writer = csv.writer(results_csv)
      csv_writer.writerow(["Label class", "ID", "Start time", "End time", "Mean Confidence", "Start datetime", "End datetime"])

  # Writes to results folder if it exists
  if results_output_path and "F.mp4" in video_name:
    results_output_video_path = os.path.join(results_output_path, video_name + ".csv")
    with open(results_output_video_path, "w", newline='') as results_csv:
      csv_writer = csv.writer(results_csv)
      csv_writer.writerow(["Label class", "ID", "Start time", "End time", "Mean Confidence", "Start datetime", "End datetime"])
      

# Creates a list of all snippet files
if results_snippets_sync and os.path.exists(video_snippets_path):
  all_items = os.listdir(video_snippets_path)

  stop_snippets = os.path.join(video_snippets_path, "stop")
  if os.path.exists(stop_snippets) and os.path.isdir(stop_snippets):
      all_items += os.listdir(stop_snippets)

  speed_snippets = os.path.join(video_snippets_path, "speed_limit")
  if os.path.exists(speed_snippets) and os.path.isdir(speed_snippets):
      all_items += os.listdir(speed_snippets)

  # Acquires name and ID from snippets and adds to dictionary
  video_id = {}
  for item in all_items:
    if not os.path.isdir(item):
      if item.endswith(".mp4"):
        video_name = "_".join(item.split("_")[:-2]) + ".mp4"
        id = item.split("_")[-1].split(".")[0]
        if not video_name in video_id:
          video_id[video_name] = []
        if id.isnumeric():
          video_id[video_name].append(id)
  dir = video_id
elif os.path.exists(videos_path):
  dir = [video for video in os.listdir(videos_path) if (not os.path.isdir(os.path.join(videos_path, video)) and video.endswith("F.mp4"))]
else:
  dir = []
print(dir)

# iterates through list of videos
for video_name in dir:
  #print(video_name)
  if not video_name in os.listdir(video_output_path):
    continue
  
  source_video_path = os.path.join(videos_path, video_name)

  # Get FPS from source video
  reader = cv2.VideoCapture(source_video_path)
  source_fps = reader.get(cv2.CAP_PROP_FPS)
  reader.release()

  curr_video_path = os.path.join(video_output_path, video_name)

  csv_array = []
  
  # Copies csv to a list
  labels_path = os.path.join(curr_video_path, "labels.csv")
  with open(labels_path, "r") as labels_csv:
    csv_reader = csv.reader(labels_csv)
    csv_array = list(csv_reader)
    header = csv_array.pop(0)
  
  # Groups detections by ID
  labels_by_id = {}
  # {id: [label_class, start_time, end_time]}
  confidence_by_id = {}
  # {id: [confidence_frame1, confidence_frame2, ...]}
  for line in csv_array:
    if len(line) > 1:
      frame = int(line[0])
      for detection in line:
        d = detection.split(";")
        if len(d) == 7:
          labelclass = d[4]
          conf = float(d[5])
          id = d[6]

          #if "urdbl" in labelclass or labelclass == "speedLimit":
          #  continue

          # Create list for id if not already existing
          if not id in labels_by_id:
            labels_by_id[id] = [labelclass, frame, frame] # [labelclass, earliest_frame, latest_frame]
          # Finds the max and min for frame count per ID
          labels_by_id[id][1] = min(labels_by_id[id][1], frame)
          labels_by_id[id][2] = max(labels_by_id[id][2], frame)

          if not id in confidence_by_id:
            confidence_by_id[id] = []
          confidence_by_id[id].append(conf)

  print("\n" + video_name)
  print(labels_by_id)


  # Writes detections by ID
  results_path = os.path.join(curr_video_path, "results.csv")
  #print(results_path)
  with open(results_path, "w", newline='') as results_csv:
    csv_writer = csv.writer(results_csv)
    csv_writer.writerow(["Label class", "ID", "Start time", "End time", "Mean Confidence", "Start datetime", "End datetime"])
    #print(labels_by_id)
    for id in labels_by_id:
      d = labels_by_id[id]
      label_class = d[0]
      start_seconds = int(d[1] / source_fps)
      end_seconds = int(d[2] / source_fps)

      name_split = video_name.split("_")
      video_date = name_split[0]
      video_time = name_split[1]

      year = int(video_date[0:4])
      month = int(video_date[4:6])
      day = int(video_date[6:8])

      hour = int(video_time[0:2])
      minute = int(video_time[2:4])
      second = int(video_time[4:6])

      csv_start_time = str(datetime.timedelta(seconds=start_seconds))
      csv_start_time2 = datetime.datetime(year, month, day, hour=hour, minute=minute, second=second) + datetime.timedelta(seconds=start_seconds)

      csv_end_time = str(datetime.timedelta(seconds=end_seconds))
      csv_end_time2 = datetime.datetime(year, month, day, hour=hour, minute=minute, second=second) + datetime.timedelta(seconds=end_seconds)

      mean_confidence = sum(confidence_by_id[id]) / len(confidence_by_id[id])
      new_row = [label_class, id, csv_start_time, csv_end_time, mean_confidence, csv_start_time2, csv_end_time2]

      if results_snippets_sync:
        if id in dir[video_name]:
          print(new_row)
          csv_writer.writerow(new_row)
      else:
        print(new_row)
        csv_writer.writerow(new_row)
  if results_output_path:
    #print(results_path, os.path.join(results_output_path, video_name + ".csv"))

    shutil.copy(results_path, os.path.join(results_output_path, video_name + ".csv"))

##Show classes

In [None]:
import cv2
import os
import csv
import numpy as np
from PIL import Image
import time
from google.colab.patches import cv2_imshow

summary = {}

# Only iterates through working video directory, not all processed videos

for video_name in os.listdir(video_output_path):
  if not video_name in os.listdir(videos_path):
    continue

  summary[video_name] = {}
  print("\n\n\n" + video_name)
  video_path = os.path.join(video_output_path, video_name)
  labels_path = os.path.join(video_path, "labels.csv")


  # Open labels csv from video
  csv_array = []
  with open(labels_path, "r") as labels_csv:
    csv_reader = csv.reader(labels_csv)
    csv_array = list(csv_reader)
    header = csv_array.pop(0)
  
  source_video_path = os.path.join(videos_path, video_name)
  reader = cv2.VideoCapture(source_video_path)
  total_frames = int(reader.get(cv2.CAP_PROP_FRAME_COUNT))

  

# Prints a summary of detections
for video in summary:
  print(video)
  for id in summary[video]:
    if not summary[video][id] == "urdbl":
      print(f"id: {id} \tlabel: {summary[video][id]}")
#print("Done, time since start:", time.time() - start_time)

## Create a video with bounding boxes based on labels.csv

In [None]:
import csv
import os
import cv2

for video_name in os.listdir(videos_path):
  if not video_name in os.listdir(video_output_path):
    continue
  print(video_name)
  labels_path = f"{video_output_path}{video_name}/labels.csv"
  videos_write_path = f"{video_output_path}{video_name}/{video_name}"

  # Preserves original video format 
  filetypes = [".mp4", ".MP4", ".avi"]
  if not any(filetype in video_name for filetype in filetypes):
    videos_write_path += ".mp4"

  # Reads from processed CSV
  csv_array = []
  with open(labels_path, "r") as labels_csv:
    csv_reader = csv.reader(labels_csv)
    header = next(csv_reader, None)
    for line in csv_reader:
      csv_array.append(line)

  # Gets video resolution and fps from source
  reader = cv2.VideoCapture(videos_path + video_name)
  video_res = [int(reader.get(cv2.CAP_PROP_FRAME_WIDTH)), int(reader.get(cv2.CAP_PROP_FRAME_HEIGHT)), reader.get(cv2.CAP_PROP_FPS)]
  fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
  writer = cv2.VideoWriter(videos_write_path, fourcc, video_res[2], (video_res[0], video_res[1]))
  
  success = True
  frame_count = 0
  while success:
    success, frame = reader.read()
    if success:
      if(frame_count < len(csv_array) and len(csv_array[frame_count]) > 1):
        for detection in csv_array[frame_count][1:]:
          detection = detection.split(";")

          # Cases where bounding box isn't drawn
          if len(detection) != 6 and len(detection) != 7:
            continue
          if detection[4] == "speedLimit":
            continue
          if "urdbl" in detection[4]:
            continue

          # Draw boxes around detections
          tl = round(0.002 * (frame.shape[0] + frame.shape[1]) / 2) + 1
          c1, c2 = (int(detection[0]), int(detection[2])), (int(detection[1]), int(detection[3]))
          cv2.rectangle(frame, c1, c2, (128, 128, 128), thickness=tl, lineType=cv2.LINE_AA)
          if detection[4]:
            tf = max(tl - 1, 1)
            label_text = detection[4]
            if len(detection) > 6:
              label_text += f" id: {detection[6]}"
            t_size = cv2.getTextSize(label_text, 0, fontScale=tl / 3, thickness=tf)[0]
            c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
            cv2.rectangle(frame, c1, c2, (128, 128, 128), -1, cv2.LINE_AA)
            cv2.putText(frame, label_text, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)

      frame_count += 1
      writer.write(frame)
  writer.release()
  reader.release()

# Train



Train a YOLOv5s model starting from pretrained `--weights yolov5s.pt`, or from randomly initialized `--weights '' --cfg yolov5s.yaml`. Models are downloaded automatically from the [latest YOLOv5 release](https://github.com/ultralytics/yolov5/releases), and **COCO, COCO128, and VOC datasets are downloaded automatically** on first use.

All training results are saved to `runs/train/` with incrementing run directories, i.e. `runs/train/exp2`, `runs/train/exp3` etc.


In [None]:
# Tensorboard  (optional)
%load_ext tensorboard
%tensorboard --logdir runs/train

In [None]:
%reload_ext tensorboard

In [None]:
# hyperparameter evolution
!python train.py --img 640 --batch 16 --epochs 10 --data lisa.yaml --weights "./yolov5x.pt" --evolve

In [None]:

!python train.py --img 640 --batch 16 --epochs 200 --data lisa.yaml --weights "./yolov5l.pt"

In [None]:
!python train.py --resume

In [None]:
pwd

# Misc.

Prints out class names of all proccessed videos

In [None]:
import os
import csv

for video_name in os.listdir(video_output_path):
  labels_path = f"{video_output_path}{video_name}/labels.csv"
  csv_array = []
  with open(labels_path, "r") as labels_csv:
    csv_reader = csv.reader(labels_csv)
    header = next(csv_reader, None)
    for line in csv_reader:
      csv_array.append(line)
  types_by_id = {}
  for line in csv_array:
    for detection in line:
      d = detection.split(";")
      if len(d) < 5:
        continue
      classname = d[4]
      if classname == "urdbl" in classname:
        continue
      id = d[6]
      if not id in types_by_id:
        types_by_id[id] = classname
  print(video_name)
  classes = ""
  for id in types_by_id:
    classes += " " + types_by_id[id]
  print(classes, "\n")

Copy only videos from processed videos to specified folder

In [None]:
import os
import shutil

folder_to_copy_to = "../sample 2/"

folder_found = True
if os.path.exists(folder_to_copy_to) and os.path.isdir(folder_to_copy_to):
  print("Starting copy")
else:
  print("Folder not found")
  folder_found = False

if folder_found:
  for video_name in os.listdir(video_output_path):
    for file in os.listdir(video_output_path + video_name):
      if file[-3:].lower() == "mp4":
        print("Copying", file)
        shutil.copyfile(f"{video_output_path}{video_name}/{file}", f"{folder_to_copy_to}{file}")

## Local Logging

All results are logged by default to `runs/train`, with a new experiment directory created for each new training as `runs/train/exp2`, `runs/train/exp3`, etc. View train and val jpgs to see mosaics, labels, predictions and augmentation effects. Note a **Mosaic Dataloader** is used for training (shown below), a new concept developed by Ultralytics and first featured in [YOLOv4](https://arxiv.org/abs/2004.10934).

In [None]:
Image(filename='runs/train/exp/train_batch0.jpg', width=800)  # train batch 0 mosaics and labels
Image(filename='runs/train/exp/test_batch0_labels.jpg', width=800)  # val batch 0 labels
Image(filename='runs/train/exp/test_batch0_pred.jpg', width=800)  # val batch 0 predictions

> <img src="https://user-images.githubusercontent.com/26833433/124931219-48bf8700-e002-11eb-84f0-e05d95b118dd.jpg" width="700">  
`train_batch0.jpg` shows train batch 0 mosaics and labels

> <img src="https://user-images.githubusercontent.com/26833433/124931217-4826f080-e002-11eb-87b9-ae0925a8c94b.jpg" width="700">  
`test_batch0_labels.jpg` shows val batch 0 labels

> <img src="https://user-images.githubusercontent.com/26833433/124931209-46f5c380-e002-11eb-9bd5-7a3de2be9851.jpg" width="700">  
`test_batch0_pred.jpg` shows val batch 0 _predictions_

Training results are automatically logged to [Tensorboard](https://www.tensorflow.org/tensorboard) and `runs/train/exp/results.txt`, which is plotted as `results.png` (below) after training completes. You can also plot any `results.txt` file manually:

In [None]:
from utils.plots import plot_results 
plot_results(save_dir='runs/train/exp')  # plot all results*.txt files in 'runs/train/exp'
Image(filename='runs/train/exp/results.png', width=800)

<p align="left"><img width="800" alt="COCO128 Training Results" src="https://user-images.githubusercontent.com/26833433/125273596-6300aa00-e30d-11eb-8dc4-70a960c53013.png"></p>

# Status

![CI CPU testing](https://github.com/ultralytics/yolov5/workflows/CI%20CPU%20testing/badge.svg)

If this badge is green, all [YOLOv5 GitHub Actions](https://github.com/ultralytics/yolov5/actions) Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training ([train.py](https://github.com/ultralytics/yolov5/blob/master/train.py)), testing ([val.py](https://github.com/ultralytics/yolov5/blob/master/val.py)), inference ([detect.py](https://github.com/ultralytics/yolov5/blob/master/detect.py)) and export ([export.py](https://github.com/ultralytics/yolov5/blob/master/export.py)) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
