# Model Training for Manuela Visual Inspection


In this notebook we are going to train a custom YOLOv4 model for detecting anomalies in images. The data set for this demonstrator is based on the Metal Nut Data Set from mvtec.com.

**Metal Nut Data Set**
- Credits to https://www.mvtec.com/company/research/datasets
- See also: https://www.mvtec.com/company/research/datasets/mvtec-ad

**ATTRIBUTION**

Paul Bergmann, Michael Fauser, David Sattlegger, Carsten Steger. MVTec AD - A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection; in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

**LICENSE**

The data is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). For using the data in a way that falls under the commercial use clause of the license, please contact MVTec via the form on the data set page linked above.


## Introduction

We are going to train and validate a YOLO neural network. You only look once (YOLO) is a state-of-the-art, real-time object detection system. The tool that we use is called Darknet. Darknet is an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation. Later on, the trained model will be converted into a TensorFlow model, but this is not part of this notebook.
For a good YOLOv4 introduction with Darknet, please watch [YOLOv4 in the CLOUD: Install and Run Object Detector](https://www.youtube.com/watch?v=mKAEGSxwOAY).

**High Level Overview**
- Install Darknet
- Download and inspect the training data
- Run the model training
- Test the trained model



## Install Darknet

Darknet is installed by cloning the GitHub repo and compiling the source code. For details, see https://github.com/AlexeyAB/darknet#how-to-use-on-the-command-line




In [2]:
!git clone https://github.com/AlexeyAB/darknet.git

Cloning into 'darknet'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 14751 (delta 0), reused 1 (delta 0), pack-reused 14748
Receiving objects: 100% (14751/14751), 13.31 MiB | 24.12 MiB/s, done.
Resolving deltas: 100% (10031/10031), done.
Checking out files: 100% (2023/2023), done.


In [1]:
# Configure the Makefile
%cd darknet
!sed -i 's/GPU=0/GPU=1/' Makefile \
    && sed -i 's/CUDNN=0/CUDNN=1/' Makefile \
    && sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
# !sed -i 's/OPENCV=0/OPENCV=1/' Makefile # OpenCV not installed

/opt/app-root/src/darknet


In [3]:
# Build Darknet ... ignore the warnings 
!make

chmod +x *.sh
g++ -std=c++11 -std=c++11 -Iinclude/ -I3rdparty/stb/include -DGPU -I/usr/local/cuda/include/ -DCUDNN -DCUDNN_HALF -Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -Ofast -DGPU -DCUDNN -I/usr/local/cudnn/include -DCUDNN_HALF -c ./src/image_opencv.cpp -o obj/image_opencv.o
g++ -std=c++11 -std=c++11 -Iinclude/ -I3rdparty/stb/include -DGPU -I/usr/local/cuda/include/ -DCUDNN -DCUDNN_HALF -Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -Ofast -DGPU -DCUDNN -I/usr/local/cudnn/include -DCUDNN_HALF -c ./src/http_stream.cpp -o obj/http_stream.o
./src/http_stream.cpp: In member function ‘bool JSON_sender::write(const char*)’:
                 int n = _write(client, outputbuf, outlen);
                     ^
./src/http_stream.cpp: In function ‘void set_track_id(detection*, int, float, float, float, int, int, int)’:
         for (int i = 0; i < v.size(); ++i) {

In [4]:
# Verify CUDA
!/usr/local/cuda/bin/nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243


## Download Training Data

Before we can start the model training, we need to download the training data, a base YOLO model and the Darknet YOLO configuration. The base model shortens the training time because we don't start learning from scratch.


First install opencv and define helper functions:

In [1]:
!pip install opencv-python-headless

Collecting opencv-python-headless
  Downloading opencv_python_headless-4.5.1.48-cp36-cp36m-manylinux2014_x86_64.whl (37.6 MB)
     |████████████████████████████████| 37.6 MB 17.6 MB/s
Installing collected packages: opencv-python-headless
Successfully installed opencv-python-headless-4.5.1.48


In [4]:
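# define helper functions
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

def imShow(path):
  """Display the image at `path` inline, enlarged by a factor of 3."""
  image = cv2.imread(path)
  if image is None:
    # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError(f"Could not read image: {path}")

  height, width = image.shape[:2]
  resized_image = cv2.resize(image, (3 * width, 3 * height), interpolation=cv2.INTER_CUBIC)

  fig = plt.gcf()
  fig.set_size_inches(18, 10)
  plt.axis("off")
  # OpenCV loads images as BGR; matplotlib expects RGB
  plt.imshow(cv2.cvtColor(resized_image, cv2.COLOR_BGR2RGB))
  plt.show()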

The data.zip package contains images, YOLO annotation files and configuration files for the model training.

Let's download and unpack the file.

TODO: **Switch to Manuela Repo**

In [5]:
!curl -O https://raw.githubusercontent.com/sa-mw-dach/manuela-visual-inspection/main/ml/darknet/data.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 43.7M  100 43.7M    0     0  14.2M      0  0:00:03  0:00:03 --:--:-- 14.2M


In [6]:
!unzip -o data.zip && rm data.zip

Archive:  data.zip
 extracting: data/metal_yolo/classes.txt  
  inflating: data/metal_yolo/bent-000.png  
  inflating: data/metal_yolo/bent-000.txt  
  inflating: data/metal_yolo/bent-001.png  
  inflating: data/metal_yolo/bent-001.txt  
  inflating: data/metal_yolo/bent-002.png  
  inflating: data/metal_yolo/bent-002.txt  
  inflating: data/metal_yolo/bent-003.png  
  inflating: data/metal_yolo/bent-003.txt  
  inflating: data/metal_yolo/bent-004.png  
  inflating: data/metal_yolo/bent-004.txt  
  inflating: data/metal_yolo/bent-005.png  
  inflating: data/metal_yolo/bent-005.txt  
  inflating: data/metal_yolo/bent-006.png  
  inflating: data/metal_yolo/bent-006.txt  
  inflating: data/metal_yolo/bent-007.png  
  inflating: data/metal_yolo/bent-007.txt  
  inflating: data/metal_yolo/bent-008.png  
  inflating: data/metal_yolo/bent-008.txt  
  inflating: data/metal_yolo/bent-009.png  
  inflating: data/metal_yolo/bent-009.txt  
  inflating: data/metal_yolo/bent-010.png  
  inflating: d

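First we can have a look at an image and the related YOLO annotation.

YOLO annotation files are .txt files, one per image, with the same base name as the image file. Each .txt file contains the annotations for the corresponding image: the object class, followed by the center coordinates, width and height of the object. In the Darknet YOLO format, coordinates and sizes are given relative to the image width and height, as values between 0 and 1.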

```
<object-class> <x> <y> <width> <height>
```
Each object is a new line.

The classes are defined in a class file, i.e. data/metal_yolo/classes.txt. The `<object-class>` value in an annotation is the zero-based line index in this file.
We have only two classes: scratch and bent.


In [9]:
!cat data/metal_yolo/classes.txt

scratch
bent


In [None]:
# Show an example image
imShow('data/metal_yolo/bent-000.png')

In [10]:
# Show the related YOLO annotation: <object-class> <x> <y> <width> <height>
!cat data/metal_yolo/bent-000.txt

1 0.8842857142857142 0.425 0.13714285714285715 0.33285714285714285
1 0.46 0.11857142857142858 0.22 0.13142857142857142
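
To make the relative coordinates more tangible, the first annotation line above can be converted into a pixel bounding box. This is a minimal sketch: the helper `yolo_to_pixel_box` is a hypothetical name, and the image size of 700 x 700 pixels is an assumption about the data set images, not something read from the file.

```python
def yolo_to_pixel_box(line, img_width, img_height):
    """Convert one YOLO annotation line to (class_id, left, top, right, bottom) in pixels."""
    class_id, x_center, y_center, width, height = line.split()
    # YOLO stores the box center and size relative to the image dimensions.
    x_center = float(x_center) * img_width
    y_center = float(y_center) * img_height
    width = float(width) * img_width
    height = float(height) * img_height
    left = round(x_center - width / 2)
    top = round(y_center - height / 2)
    right = round(x_center + width / 2)
    bottom = round(y_center + height / 2)
    return int(class_id), left, top, right, bottom


CLASS_NAMES = ["scratch", "bent"]  # same order as data/metal_yolo/classes.txt
IMG_SIZE = 700  # assumption: square 700 x 700 pixel images

annotation = "1 0.8842857142857142 0.425 0.13714285714285715 0.33285714285714285"
class_id, left, top, right, bottom = yolo_to_pixel_box(annotation, IMG_SIZE, IMG_SIZE)
print(CLASS_NAMES[class_id], (left, top, right, bottom))
```

The class id `1` maps to `bent` because the index is zero-based.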


The following two files, metal-data.data and yolov4-custom-metal.cfg, define the location of the training data and the YOLO network:

In [8]:
# Inspect darknet/data/metal-data.data
!cat data/metal-data.data

classes = 2
train = data/train.txt
valid = data/test.txt
names = data/metal_yolo/classes.txt
backup = backup
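
The .data file is a plain list of `key = value` pairs: `classes` is the number of object classes, `train` and `valid` point to text files listing the training and validation images, `names` points to the class names file, and `backup` is the directory where Darknet stores the weights. If you want to read these settings from Python, a minimal sketch could look like this. The function `parse_darknet_data` is a hypothetical helper, not part of Darknet, and the sample text is copied from the output above.

```python
def parse_darknet_data(text):
    """Parse the 'key = value' lines of a Darknet .data file into a dict of strings."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


sample = """\
classes = 2
train = data/train.txt
valid = data/test.txt
names = data/metal_yolo/classes.txt
backup = backup
"""

options = parse_darknet_data(sample)
print(int(options["classes"]), options["backup"])
```

In the notebook you would pass the content of `data/metal-data.data` instead of the inline sample.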

In [9]:
# Inspect darknet/data/yolov4-custom-metal.cfg
!cat  data/yolov4-custom-metal.cfg

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=32
width=416
height=416
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
#burn_in=100
max_batches = 6000
#max_batches = 600
policy=steps
steps=4800,5400
#steps=480,540


scales=.1,.1

#cutmix=1
mosaic=0

#:104x104 54:52x52 85:26x26 104:13x13 for 416

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=mish

# Downsample

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=
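
The numbers in the `[net]` section are related to each other. The following sketch recomputes them under stated assumptions: the rules of thumb (`max_batches` of classes * 2000 but at least 6000, `steps` at 80% and 90% of `max_batches`, `filters` of (classes + 5) * 3 in front of each `[yolo]` layer) are taken from the Darknet README for custom training, and the README's additional lower bound on `max_batches` (the number of training images) is left out here. The learning rate function assumes Darknet's default burn-in exponent of 4. All Python names are hypothetical.

```python
# Values copied from the [net] section shown above.
batch, subdivisions = 64, 32
learning_rate, burn_in = 0.001, 1000
max_batches, steps, scales = 6000, (4800, 5400), (0.1, 0.1)
num_classes = 2  # from metal-data.data

# Number of images sent through the GPU at once; smaller values need less GPU memory.
mini_batch = batch // subdivisions

# Rules of thumb from the Darknet README (assumption, see the text above).
suggested_max_batches = max(num_classes * 2000, 6000)
suggested_steps = (suggested_max_batches * 8 // 10, suggested_max_batches * 9 // 10)
yolo_filters = (num_classes + 5) * 3


def learning_rate_at(iteration, power=4.0):
    """Approximate learning rate for policy=steps with burn-in (power=4 is an assumption)."""
    if iteration < burn_in:
        # Warm-up phase: the rate ramps up from 0 to learning_rate.
        return learning_rate * (iteration / burn_in) ** power
    rate = learning_rate
    for step, scale in zip(steps, scales):
        if iteration >= step:
            rate *= scale  # multiply by the scale once each step is reached
    return rate


print("mini batch:", mini_batch)
print("suggested max_batches / steps:", suggested_max_batches, suggested_steps)
print("filters before each [yolo] layer:", yolo_filters)
for it in (500, 1000, 4800, 5400):
    print(it, learning_rate_at(it))
```

For two classes, the suggested values match the `max_batches` and `steps` settings in the configuration above.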

Finally, let's download a pretrained model so that we don't start from scratch. This approach is also called transfer learning.

To start training on YOLOv4, we typically download pretrained weights. These weights have been pretrained on the COCO dataset, which includes common objects like people, bikes, and cars. It is generally a good idea to start from pretrained weights, especially if you believe your objects are similar to the objects in COCO.

Scratch and bent are not in the COCO dataset, but we can give it a try.

In [None]:
!curl -L -O https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.conv.137

# Model Training

Starting the model training with Darknet is a 'simple' CLI call. We discussed the required configuration in the previous section.


## Run the training

stderr is redirected to /tmp/dn.log, and some stdout lines are filtered out so that the cell output does not overflow the notebook. Even with GPUs, the training runs for several hours. Checkpoints with weights are stored in the backup directory.

In [None]:
!./darknet detector train data/metal-data.data data/yolov4-custom-metal.cfg  yolov4.conv.137 -dont_show -map  2> /tmp/dn.log | grep -v 'next mAP calculation a'
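
Because the training takes hours, it can be useful to check from a second cell or notebook which weights files have been written so far. The following is a minimal sketch; `list_checkpoints` is a hypothetical helper, and only the directory name `backup` comes from metal-data.data.

```python
from pathlib import Path


def list_checkpoints(backup_dir="backup"):
    """Return the .weights files in backup_dir, newest first."""
    files = Path(backup_dir).glob("*.weights")
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


for checkpoint in list_checkpoints():
    size_mb = checkpoint.stat().st_size / 1e6
    print(f"{checkpoint.name}  {size_mb:.1f} MB")
```

The first entry is the most recently written file, which is usually the one to evaluate if the training is interrupted.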

Visualized as a chart, the learning progress looks like the following example. (Run the next cell if the chart is not visible.)

- The blue line shows the loss. The loss is the model error and should end up as a small number below 1.
- The red line is the [mAP](https://jonathan-hui.medium.com/map-mean-average-precision-for-object-detection-45c121a31173) (mean average precision) over time in percent. The average precision (AP) of a class summarizes its precision-recall curve as the average precision over recall values from 0 to 1; the mAP is the mean of the AP values of all classes. A value above 50% is rather good in object detection.
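
To illustrate how an AP value is derived from a precision-recall curve, here is a minimal sketch that uses the common all-point interpolation (area under the curve). It is an illustration of the metric, not Darknet's own implementation, and the recall and precision values are made up for the example.

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation).

    recalls must be sorted in ascending order, with one precision value per recall value.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangle areas between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Made-up example curves: a perfect detector and a weaker one.
ap_scratch = average_precision([1.0], [1.0])
ap_bent = average_precision([0.5, 1.0], [1.0, 0.5])
print(ap_scratch, ap_bent)
print(mean_average_precision({"scratch": ap_scratch, "bent": ap_bent}))
```

A detector that finds every object without false positives reaches an AP of 1.0, which corresponds to 100% in the chart.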



![darknet-yolo-learning-progress](https://github.com/sa-mw-dach/manuela-visual-inspection/raw/main/images/darknet-yolo-learning-progress.gif)

## Test the trained model
Now let's check the trained YOLO model.

- Check Model Mean Average Precision (mAP)
- Predict and check results using two images

### Download the trained model (optional)
In case you could not run the training to the end due to time or resource constraints, you can download a trained model to perform the remaining steps.

In [12]:
# Download and unpack
!curl -LO https://github.com/sa-mw-dach/manuela-visual-inspection/releases/download/v0.1-alpha-darknet/model.tar && tar xvf model.tar -C backup && rm model.tar

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   617  100   617    0     0   5735      0 --:--:-- --:--:-- --:--:--  5766
100  244M  100  244M    0     0  93.5M      0  0:00:02  0:00:02 --:--:--  108M
yolov4-custom-metal_final.weights
yolov4-custom-metal-test.cfg
classes.txt


### Check Model Mean Average Precision (mAP)

Inspect the mAP of your model after the training. Run the following command on any of the weights files saved during the training to see the mAP value for that specific weights file.


In [13]:
!./darknet detector map data/metal-data.data data/yolov4-custom-metal.cfg backup/yolov4-custom-metal_final.weights

 CUDA-version: 10010 (11000), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV isn't used - data augmentation will be slow 
 0 : compute_capability = 350, cudnn_half = 0, GPU: Tesla K20Xm 
net.optimized_memory = 0 
mini_batch = 1, batch = 32, time_steps = 1, train = 0 
   layer   filters  size/strd(dil)      input                output
   0 Create CUDA-stream - 0 
 Create cudnn-handle 0 
conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF
   2 conv     64       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  64 0.354 BF
   3 route  1 		                           ->  208 x 208 x  64 
   4 conv     64       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  64 0.354 BF
   5 conv     32       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  32 0.177 BF
   6 conv     64       3 x 3/ 1    208 x 208 x  32 ->  208 x 208 x  64 1.595 BF
   7 Shortcut Layer: 4,  wt = 0, wn = 0, outputs: 208

**Expected output:**
```
...
class_id = 0, name = scratch, ap = 100.00%   	 (TP = 6, FP = 0) 
class_id = 1, name = bent, ap = 100.00%   	 (TP = 8, FP = 0) 

 for conf_thresh = 0.25, precision = 1.00, recall = 1.00, F1-score = 1.00 
 for conf_thresh = 0.25, TP = 14, FP = 0, FN = 0, average IoU = 93.24 % 

 IoU threshold = 50 %, used Area-Under-Curve for each unique Recall 
 mean average precision (mAP@0.50) = 1.000000, or 100.00 % 
...
```

- scratch: True Positives = 6, False Positives = 0
- bent: True Positives = 8, False Positives = 0
- F1-score = 1.00  (yeah!)
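
The precision, recall and F1-score in the expected output follow directly from the TP, FP and FN counts, and a detection counts as a true positive when its IoU (intersection over union) with a ground-truth box reaches the IoU threshold of 50%. The following sketch shows both calculations. The function names and the example boxes are hypothetical; the counts are taken from the expected output above.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1-score from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


# Counts from the expected output: TP = 14, FP = 0, FN = 0
print(precision_recall_f1(14, 0, 0))

# Two made-up boxes that overlap by half of their width
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

With no false positives and no false negatives, all three scores are 1.0, which matches the reported values.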


### Predict and check results

In [16]:
# run your custom detector with this command
!./darknet detector test data/metal-data.data data/yolov4-custom-metal.cfg  backup/yolov4-custom-metal_final.weights data/metal_yolo/bent-000.png -thresh 0.3


 CUDA-version: 10010 (11000), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV isn't used - data augmentation will be slow 
 0 : compute_capability = 350, cudnn_half = 0, GPU: Tesla K20Xm 
net.optimized_memory = 0 
mini_batch = 1, batch = 32, time_steps = 1, train = 0 
   layer   filters  size/strd(dil)      input                output
   0 Create CUDA-stream - 0 
 Create cudnn-handle 0 
conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF
   2 conv     64       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  64 0.354 BF
   3 route  1 		                           ->  208 x 208 x  64 
   4 conv     64       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  64 0.354 BF
   5 conv     32       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  32 0.177 BF
   6 conv     64       3 x 3/ 1    208 x 208 x  32 ->  208 x 208 x  64 1.595 BF
   7 Shortcut Layer: 4,  wt = 0, wn = 0, outputs: 208

In [None]:
imShow('predictions.jpg')

In [18]:
# run your custom detector
!./darknet detector test data/metal-data.data data/yolov4-custom-metal.cfg  backup/yolov4-custom-metal_final.weights data/metal_yolo/scratch-000.png -thresh 0.3


 CUDA-version: 10010 (11000), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV isn't used - data augmentation will be slow 
 0 : compute_capability = 350, cudnn_half = 0, GPU: Tesla K20Xm 
net.optimized_memory = 0 
mini_batch = 1, batch = 32, time_steps = 1, train = 0 
   layer   filters  size/strd(dil)      input                output
   0 Create CUDA-stream - 0 
 Create cudnn-handle 0 
conv     32       3 x 3/ 1    416 x 416 x   3 ->  416 x 416 x  32 0.299 BF
   1 conv     64       3 x 3/ 2    416 x 416 x  32 ->  208 x 208 x  64 1.595 BF
   2 conv     64       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  64 0.354 BF
   3 route  1 		                           ->  208 x 208 x  64 
   4 conv     64       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  64 0.354 BF
   5 conv     32       1 x 1/ 1    208 x 208 x  64 ->  208 x 208 x  32 0.177 BF
   6 conv     64       3 x 3/ 1    208 x 208 x  32 ->  208 x 208 x  64 1.595 BF
   7 Shortcut Layer: 4,  wt = 0, wn = 0, outputs: 208

In [None]:
# Show the prediction result for the scratch image
imShow('predictions.jpg')

Now we could download and version the trained model so that we can use it in the application for detecting anomalies.