In [None]:
import os
import numpy as np 
import pandas as pd 
import cv2
import matplotlib.pyplot as plt
import re
from collections import defaultdict

In [None]:
#Constants
CONFIDENCE_THRESHOLD = 0.1 # Filter predicted bboxes

#Seeds
SEED = 42
np.random.seed(seed=SEED)

## Global Wheat Detection. Predict with pre-trained YOLO v4 and Darknet.

This notebook makes predictions for the Global Wheat Detection competition. I have seen several YOLO implementations based on PyTorch or TensorFlow, but here I will show how to do it with Darknet alone, without additional frameworks.

### Make darknet

**DEPRECATED**

**This approach will not let you use the GPU in Kaggle Kernels, but it works fine on CPU.**

Usually you can just ```! git clone``` the files from a repository, but this competition prohibits internet access in submission kernels. If you want to use the internet, check the 4th version of this notebook.
We will use additional files from ```global_wheat_detection_models```:
* ```darknet``` - copy of [AlexeyAB's Darknet repository](https://github.com/AlexeyAB/darknet)
* ```competition_files``` - some files from [my repository for this competition](https://github.com/Gooogr/Kaggle_Global_Wheat_Detection). We need the custom .cfg file and the script that parses the resulting log.txt file. You can also find the Google Colab training notebook there.
* ```yolov4.weights``` - weights pre-trained on the COCO dataset, from AlexeyAB's repository. We will use them to check our darknet build.
* ```yolov4_naive.weights``` - current version of my pre-trained weights from Google Colab.



**RUNNING WITH GPU** 

This method was discovered by Mark Peng; here is the link to his notebook: https://www.kaggle.com/markpeng/darknet-gpu-on-kaggle<br>
We will use pre-built Darknet binaries with additional files (libdarknet.so and the minimal example files for the ```dog.jpg``` prediction).

In [None]:
## Use version from github
# ! git clone https://github.com/AlexeyAB/darknet.git 

# Use Darknet with CPU and make it from source
# ! cp -a /kaggle/input/global-wheat-detection-models/darknet/darknet/. /kaggle/darknet/

## Use pre-built Darknet binaries with GPU support
! cp -a /kaggle/input/global-wheat-detection-models/darknet_gpu_prebuilt/darknet_gpu_prebuilt/. /kaggle/darknet/

In [None]:
# %cd /kaggle/darknet

# ## Uncomment if you want to use Darknet with GPU.

# !sed -i 's/OPENCV=0/OPENCV=1/' Makefile
# # !sed -i 's/GPU=0/GPU=1/' Makefile
# # !sed -i 's/CUDNN=0/CUDNN=1/' Makefile
# # !sed -i 's/OPENMP=0/OPENMP=1/' Makefile

# !head Makefile

# %%capture 
# #Use %%capture to hide huge terminal output
# ! make clean
# ! make --silent

Test our darknet build.<br>
We have everything we need by default, except the weights. Copy them from the additional files (you can also find them in the darknet folder). These are the weights from AlexeyAB's GitHub, pre-trained on the COCO dataset.

In [None]:
! mkdir /kaggle/darknet/weights
! cp -a /kaggle/input/global-wheat-detection-models/yolov4.weights /kaggle/darknet/weights
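
Before running the test, it can help to confirm that the files the `detect` command expects are actually in place. The sketch below is only an illustration: the helper name `missing_files` is hypothetical, and the list of relative paths is an assumption taken from the command used in the next cell.

```python
import os

def missing_files(base_dir, relative_paths):
    """Return the entries of relative_paths that do not exist under base_dir."""
    return [rel for rel in relative_paths
            if not os.path.exists(os.path.join(base_dir, rel))]

# Paths assumed from the detect command: binary, config, weights, sample image
REQUIRED = ["darknet", "cfg/yolov4.cfg", "weights/yolov4.weights", "data/dog.jpg"]

# An empty list means everything is where the command expects it
print(missing_files("/kaggle/darknet", REQUIRED))
```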

In [None]:
%%capture 
%cd /kaggle/darknet
! chmod 777 ./darknet
! ./darknet detect cfg/yolov4.cfg weights/yolov4.weights data/dog.jpg -dont_show

In [None]:
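# OpenCV loads images as BGR; convert to RGB so matplotlib shows true colors
sample_preds = cv2.cvtColor(cv2.imread('predictions.jpg'), cv2.COLOR_BGR2RGB)
fig, ax = plt.subplots(figsize=(7, 7))
ax.imshow(sample_preds)
fig.show()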
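
OpenCV's `imread` returns pixels in BGR order, while matplotlib's `imshow` interprets a three-channel array as RGB, so an image plotted without conversion can appear with red and blue swapped. A minimal sketch of the channel reversal on a synthetic one-pixel image (the array and the `bgr_to_rgb` helper are illustrative and not part of the notebook):

```python
import numpy as np

def bgr_to_rgb(image):
    """Reverse the last (channel) axis of an H x W x 3 array."""
    return image[..., ::-1]

# One pure-blue pixel as OpenCV would store it: (B, G, R) = (255, 0, 0)
blue_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
blue_rgb = bgr_to_rgb(blue_bgr)

print(blue_rgb[0, 0].tolist())  # [0, 0, 255]
```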

Good, now let's move to our custom model predictions.

### Setting up custom config files and weights


Create a ```my_files``` folder inside the darknet directory for the necessary files. Copy the prepared files and add the pre-trained weights.

In [None]:
! ls /kaggle/input/global-wheat-detection-models/competition_files/competition_files

yolov4.cfg file, txt2json script and weights

In [None]:
! mkdir /kaggle/darknet/my_files
# cfg file and txt2json
! cp -a /kaggle/input/global-wheat-detection-models/competition_files/competition_files/. /kaggle/darknet/my_files
# yolo weights (CHANGE THE PATH TO YOUR WEIGHTS HERE IF NEEDED)
! cp -a /kaggle/input/global-wheat-detection-models/yolov4_naive.weights /kaggle/darknet/weights

In [None]:
!mv /kaggle/darknet/my_files/yolov4-custom.cfg /kaggle/darknet/my_files/yolov4.cfg 

In [None]:
%cd /kaggle/darknet/my_files

obj.names

In [None]:
%%writefile obj.names
Wheat head

yolo.data

In [None]:
%%writefile yolo.data
#classes = 1
names = /kaggle/darknet/my_files/obj.names

predict.txt

In [None]:
def create_path_file(files_dir, save_dir):
    """Write the absolute path of every file under files_dir to save_dir/predict.txt."""
    # from https://stackoverflow.com/questions/9816816/get-absolute-paths-of-all-files-in-a-directory
    # The context manager closes the file even if walking the directory fails
    with open(os.path.join(save_dir, "predict.txt"), "w") as file:
        for root, dirs, files in os.walk(os.path.abspath(files_dir)):
            for item in files:
                file.write(os.path.join(root, item) + '\n')
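
The function above lists every file it finds, whatever its type. If the input folder might contain non-image files, a variant that keeps only image extensions could look like the sketch below. The name `list_image_paths` and the default extension tuple are assumptions made for illustration, and the demo runs on a temporary directory rather than the competition data.

```python
import os
import tempfile

def list_image_paths(files_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Return sorted absolute paths of files under files_dir with an image extension."""
    paths = []
    for root, _dirs, files in os.walk(os.path.abspath(files_dir)):
        for name in files:
            # str.endswith accepts a tuple of suffixes
            if name.lower().endswith(extensions):
                paths.append(os.path.join(root, name))
    return sorted(paths)

# Demo on a throwaway directory with two images and one unrelated file
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("b.jpg", "a.jpg", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_names = [os.path.basename(p) for p in list_image_paths(demo_dir)]

print(demo_names)  # ['a.jpg', 'b.jpg']
```

Sorting the paths also makes the order of `predict.txt` reproducible between runs.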

In [None]:
create_path_file(files_dir='/kaggle/input/global-wheat-detection/test', 
                 save_dir='/kaggle/darknet/my_files/')

In [None]:
! head /kaggle/darknet/my_files/predict.txt 

### Predict

Let's test our config on a single image from the test set.

In [None]:
! ls /kaggle/darknet

In [None]:
%%capture 
%cd /kaggle/darknet

! ./darknet detector test \
my_files/yolo.data \
my_files/yolov4.cfg \
weights/yolov4_naive.weights \
/kaggle/input/global-wheat-detection/test/2fd875eaa.jpg -dont_show

In [None]:
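# OpenCV loads images as BGR; convert to RGB so matplotlib shows true colors
sample_preds = cv2.cvtColor(cv2.imread('predictions.jpg'), cv2.COLOR_BGR2RGB)
fig, ax = plt.subplots(figsize=(7, 7))
ax.imshow(sample_preds)
fig.show()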

Generate a log file with bounding boxes for the full test dataset.

In [None]:
%%capture 
%cd /kaggle/darknet

! ./darknet detector test \
my_files/yolo.data \
my_files/yolov4.cfg \
weights/yolov4_naive.weights \
-dont_show -ext_output < my_files/predict.txt > log.txt

In [None]:
# # Uncomment to see result log file
# ! cat log.txt

### Make submission

Now we have log.txt with all the bboxes in it. Time to parse this data into a pandas DataFrame.
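
To make the parsing step concrete, here is a sketch that extracts the fields from one detection line with a named-group regular expression. The sample line is an assumption about what `-ext_output` prints (label, confidence in percent, then `left_x`, `top_y`, `width`, `height`), written by hand for illustration rather than copied from a real log. `parse_bbox_line` is a hypothetical helper.

```python
import re

# Hand-written sample in the assumed -ext_output layout (not real output)
SAMPLE_LINE = "Wheat head: 87%\t(left_x:  512   top_y:   -3   width:   64   height:   71)"

BBOX_PATTERN = re.compile(
    r"(?P<label>.+?):\s*(?P<proba>\d+)%\s*"
    r"\(left_x:\s*(?P<left_x>-?\d+)\s+top_y:\s*(?P<top_y>-?\d+)\s+"
    r"width:\s*(?P<width>-?\d+)\s+height:\s*(?P<height>-?\d+)\)"
)

def parse_bbox_line(line):
    """Return a dict of parsed fields, or None if the line holds no detection."""
    match = BBOX_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    label = fields.pop("label")
    parsed = {name: int(value) for name, value in fields.items()}
    parsed["label"] = label
    return parsed

print(parse_bbox_line(SAMPLE_LINE))
```

Named groups keep the class label separate from the numbers, so a label that itself contains digits would not shift the coordinates.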

In [None]:
def txt2json(file_path):
    """Parse a Darknet log into {image name: [bbox dict, ...]}."""
    with open(file_path, 'r') as log_file:
        file_lines = log_file.read()
    # defaultdict(list) avoids a KeyError if a bbox line precedes any image line
    table_dict = defaultdict(list)
    current_jpg_name = ''

    jpg_delimiters = " ", "/", ":"
    jpg_regexPattern = '|'.join(map(re.escape, jpg_delimiters))

    for line in file_lines.splitlines():
        if '.jpg' in line:
            for item in re.split(jpg_regexPattern, line):
                if '.jpg' in item:
                    current_jpg_name = item
                    table_dict[item] = []
        if '%' in line:
            # Integers in order: confidence, left_x, top_y, width, height
            split_string = re.findall(r'-?\d+', line)
            int_string = list(map(int, split_string))
            sub_dict_keys = ['proba_%', 'left_x', 'top_y', 'width', 'height']
            table_dict[current_jpg_name].append(dict(zip(sub_dict_keys, int_string)))
    return table_dict

In [None]:
data = txt2json('/kaggle/darknet/log.txt')

In [None]:
# data['empty_sample'] = list()  #ONLY FOR NEGATIVE TEST, DON'T UNCOMMENT

In [None]:
columns = ['img', 'proba_%', 'left_x', 'top_y', 'width', 'height']
rows = []
for key, bboxes in data.items():
    if bboxes:
        # one row per detected bbox
        rows.extend({'img': key, **bbox} for bbox in bboxes)
    else:
        # no detections: keep the image, the missing fields become NaN
        rows.append({'img': key})

result_df = pd.DataFrame(rows, columns=columns)
result_df.head()

Convert our submission to the final form

In [None]:
sample_submission = pd.read_csv('/kaggle/input/global-wheat-detection/sample_submission.csv')
sample_submission.head().T

In [None]:
result_df['proba_ratio'] = result_df['proba_%'] / 100

In [None]:
def format_list(confidence, x, y, width, height):
    temp_list =  [confidence, x, y, width, height]
    if not np.isnan(confidence):
        return ' '.join(str(item) for item in temp_list)
    else:
        return np.nan

In [None]:
result_df['sub_list'] = result_df.apply(lambda x: format_list(x.proba_ratio, 
                                                              x.left_x, 
                                                              x.top_y, 
                                                              x.width, 
                                                              x.height), axis = 1)

In [None]:
filter_condition = (result_df['proba_ratio'] > CONFIDENCE_THRESHOLD) | (result_df['proba_ratio'].isna())
# .copy() makes the filtered frame independent, so later edits do not touch a slice
result_df = result_df[filter_condition].copy()

result_df = result_df.fillna('')

img_pred_list = []
for img_name in result_df['img'].unique():
    img_pred_list.append(' '.join(str(item) for item in result_df[result_df['img']==img_name].sub_list))

img_names = [item.split('.')[0] for item in result_df['img'].unique()]

submission = pd.DataFrame(zip(img_names, img_pred_list), 
                          columns = ['image_id', 'PredictionString'])
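
Note that the loop above only emits images that still have at least one row after filtering, so an image whose detections all fall below the threshold would be missing from the submission. The sketch below is one possible alternative, not the notebook's method: it keeps every image and leaves the prediction string empty when nothing passes. The name `build_submission`, the demo frame, and the integer casting of coordinates are assumptions made for illustration.

```python
import pandas as pd

def build_submission(detections, threshold):
    """Group 'confidence x y w h' tokens per image; images with no kept boxes get ''."""
    kept = detections[detections["proba_ratio"] > threshold]  # NaN rows compare False
    tokens = {}
    for row in kept.itertuples(index=False):
        image_id = row.img.split(".")[0]
        token = (f"{float(row.proba_ratio)} {int(row.left_x)} {int(row.top_y)} "
                 f"{int(row.width)} {int(row.height)}")
        tokens.setdefault(image_id, []).append(token)
    # Every image seen in the log, in first-seen order
    all_ids = [name.split(".")[0] for name in detections["img"].unique()]
    return pd.DataFrame({
        "image_id": all_ids,
        "PredictionString": [" ".join(tokens.get(i, [])) for i in all_ids],
    })

nan = float("nan")
demo = pd.DataFrame({
    "img": ["a.jpg", "a.jpg", "b.jpg", "c.jpg"],
    "proba_ratio": [0.9, 0.05, nan, 0.04],
    "left_x": [10, 20, nan, 5],
    "top_y": [30, 40, nan, 6],
    "width": [50, 60, nan, 7],
    "height": [60, 70, nan, 8],
})
demo_submission = build_submission(demo, threshold=0.1)
print(demo_submission)
```

In the demo, `a` keeps one box, while `b` (no detections) and `c` (only a low-confidence box) both appear with an empty `PredictionString`.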

In [None]:
submission.head()

In [None]:
submission.to_csv('/kaggle/working/submission.csv', index=False)

### References

* [Yolov4-darknet-Inference](https://www.kaggle.com/pabloberhauser/yolov4-darknet-inference)
* [darknet-gpu-on-kaggle](https://www.kaggle.com/markpeng/darknet-gpu-on-kaggle)
* [Darknet](https://github.com/AlexeyAB/darknet/)