* This notebook creates YOLO-format label `.txt` files from the given CSV file.

* **NOTE:** This notebook removes very small and very large bounding boxes from the dataset, because they might affect the accuracy of the model.

* If you want the data for all bounding boxes, check our [GitHub repo](https://github.com/DhruvMakwana/Global-Wheat-Detection).

* The idea of removing very large and very small boxes comes from [aleksandradeis](https://www.kaggle.com/aleksandradeis)'s EDA [notebook](https://www.kaggle.com/aleksandradeis/globalwheatdetection-eda).

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from tqdm import tqdm
import ast
import cv2

In [None]:
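# exist_ok=True lets this cell be re-run without failing when the folder already exists
os.makedirs('labels', exist_ok=True)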

In [None]:
train_csv_path = '../input/global-wheat-detection/train.csv'
output_path = 'labels/'
img_path = '../input/global-wheat-detection/train/'

# Data Processing 

In [None]:
train_csv = pd.read_csv(train_csv_path)
train_csv.head()

In [None]:
train_csv.info()

In [None]:
type(train_csv['bbox'][0])

The values in **train_csv['bbox']** are strings, so convert each one into a list.

In [None]:
train_csv['bbox'] = train_csv['bbox'].apply(ast.literal_eval)

In [None]:
type(train_csv['bbox'][0])
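
To make the conversion concrete, here is a minimal sketch of what `ast.literal_eval` does to a single value. The sample string is illustrative only and assumes each CSV cell holds a Python-style list literal of four numbers.

```python
import ast

# Hypothetical sample value, written the way a bbox cell is assumed to look in the CSV
raw_bbox = "[834.0, 222.0, 56.0, 36.0]"

# literal_eval parses Python literals only, so unlike eval() it cannot run arbitrary code
parsed_bbox = ast.literal_eval(raw_bbox)

print(type(raw_bbox).__name__, "->", type(parsed_bbox).__name__)
print(parsed_bbox)
```

After the conversion, each box can be unpacked directly as `xmin, ymin, w, h = parsed_bbox`.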

In [None]:
# Group the bounding boxes by image, so each row holds all boxes of one image
df = train_csv.groupby('image_id')['bbox'].apply(list).reset_index()
df.head()
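
The grouping step can be easier to follow on a tiny table. The sketch below uses made-up image ids and boxes (not values from the dataset) to show how one row per box becomes one row per image.

```python
import pandas as pd

# Toy data (hypothetical ids and boxes): one row per bounding box
toy = pd.DataFrame({
    'image_id': ['img_a', 'img_a', 'img_b'],
    'bbox': [[0, 0, 10, 10], [5, 5, 20, 20], [1, 2, 3, 4]],
})

# Collect every box that belongs to the same image into one list
toy_grouped = toy.groupby('image_id')['bbox'].apply(list).reset_index()
print(toy_grouped)
```

Here `img_a` ends up with a list of two boxes and `img_b` with a list of one.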

# Converting into YOLO format and saving

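Loop over the bounding boxes of each image, convert them from the CSV layout (`[xmin, ymin, width, height]` in pixels) into YOLO format (`class x_center y_center width height`, normalized by the image size), and save one label file per image.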

In [None]:
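IMG_SIZE = 1024.0  # width and height of the training images in pixels
removed_rows = []  # boxes dropped by the area filter
for i, row in tqdm(df.iterrows(), total=len(df)):
    image_name = row['image_id']
    bboxes = row['bbox']
    yolo_labels = []
    for bb in bboxes:
        # CSV boxes are [xmin, ymin, width, height] in pixels
        xmin, ymin, w, h = bb

        # Remove very small and very large bounding boxes.
        # These area thresholds (in square pixels) were chosen by trial and error.
        if w*h > 120000 or w*h < 700.0:
            removed_rows.append({'img_id': image_name, 'bb': bb})
            continue

        # Convert to YOLO format: class id, box center and box size, normalized to [0, 1]
        x_center = (xmin + w/2)/IMG_SIZE
        y_center = (ymin + h/2)/IMG_SIZE
        w = w/IMG_SIZE
        h = h/IMG_SIZE
        yolo_labels.append([0, x_center, y_center, w, h])
    # reshape keeps a (0, 5) shape when every box of an image was removed,
    # so np.savetxt still works and writes an empty label file
    yolo_labels = np.array(yolo_labels).reshape(-1, 5)

    # Save the labels for this image.
    np.savetxt(output_path+image_name+'.txt', yolo_labels, fmt=['%d', '%f', '%f', '%f', '%f'])

# Build the table of removed boxes once, after the loop
removed = pd.DataFrame(removed_rows, columns=['img_id', 'bb'])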
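
The coordinate conversion inside the loop can be checked on its own. The sketch below wraps it in a hypothetical helper, `to_yolo`, and assumes 1024x1024 images as the loop above does; the example box is made up.

```python
def to_yolo(bbox, img_w=1024, img_h=1024):
    """Convert [xmin, ymin, width, height] in pixels to normalized YOLO values."""
    xmin, ymin, w, h = bbox
    x_center = (xmin + w / 2) / img_w
    y_center = (ymin + h / 2) / img_h
    return x_center, y_center, w / img_w, h / img_h

# A 512x256 box whose top-left corner is at (256, 512)
example = to_yolo([256, 512, 512, 256])
print(example)  # (0.5, 0.625, 0.5, 0.25)
```

The box center is at pixel (512, 640), which gives 0.5 and 0.625 after dividing by 1024.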
        

In [None]:
print('Total Removed Bounding Boxes: {}'.format(len(removed)))
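
Because the area thresholds were found by trial and error, it can help to isolate the filter so other values are easy to try. The helper below, `split_by_area`, is a hypothetical sketch that applies the same rule as the loop above; the sample boxes are made up.

```python
def split_by_area(bboxes, min_area=700.0, max_area=120000.0):
    """Split [xmin, ymin, width, height] boxes into (kept, removed) by pixel area."""
    kept, removed_boxes = [], []
    for bb in bboxes:
        area = bb[2] * bb[3]
        if area < min_area or area > max_area:
            removed_boxes.append(bb)
        else:
            kept.append(bb)
    return kept, removed_boxes

# Hypothetical boxes with areas of 600, 2000 and 160000 square pixels
sample = [[0, 0, 20, 30], [10, 10, 40, 50], [100, 100, 400, 400]]
kept, dropped = split_by_area(sample)
print(len(kept), 'kept,', len(dropped), 'removed')
```

Calling it with different `min_area` and `max_area` values shows how many boxes each choice would discard.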

# Visualizing removed labels

Most of the small boxes are near the corners and edges of the image, so look at the images carefully.

In [None]:
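# Group the removed boxes by image (a new name keeps this cell safe to re-run)
removed_by_img = removed.groupby('img_id')['bb'].apply(list).reset_index()
plt.figure(figsize=(24,24))
# Show 16 of the affected images (rows 116 to 131) in a 4x4 grid
for i,img_id in enumerate(range(116,132)):
    # Copy so that cv2 draws on a writable array
    img = plt.imread(img_path + removed_by_img.loc[img_id,'img_id']+'.jpg').copy()

    for box in removed_by_img.loc[img_id,'bb']:
        x,y,w,h = box
        # Draw the removed box in red (the image is RGB), 5 px thick
        cv2.rectangle(img,(int(x),int(y)),(int(x+w),int(y+h)),(255,0,0),5)
    plt.subplot(4,4,i+1)
    plt.axis('off')
    plt.imshow(img)
plt.show()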

# Check some labels

In [None]:
print('Total labels: {}'.format(len(os.listdir('labels/'))))
print('Total images: {}'.format(len(os.listdir(img_path))))

**NOTE:** 49 images have no label file, so you have to remove those 49 images from your dataset.
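
One way to list the images that have no label file is to compare file names in the two folders. The sketch below is a hypothetical helper, not part of the original workflow; it assumes images end in `.jpg` and labels in `.txt`, and it runs its demo on throwaway folders with made-up file names.

```python
import tempfile
from pathlib import Path

def images_without_labels(img_dir, label_dir):
    """Return sorted image ids (file stems) that have a .jpg but no matching .txt."""
    image_ids = {p.stem for p in Path(img_dir).glob('*.jpg')}
    label_ids = {p.stem for p in Path(label_dir).glob('*.txt')}
    return sorted(image_ids - label_ids)

# Demo on temporary folders with hypothetical file names
with tempfile.TemporaryDirectory() as tmp:
    img_dir = Path(tmp, 'train')
    label_dir = Path(tmp, 'labels')
    img_dir.mkdir()
    label_dir.mkdir()
    for name in ('a.jpg', 'b.jpg', 'c.jpg'):
        (img_dir / name).touch()
    (label_dir / 'a.txt').touch()
    missing = images_without_labels(img_dir, label_dir)

print(missing)  # ['b', 'c']
```

In this notebook the call would be `images_without_labels(img_path, output_path)`.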

In [None]:
with open('labels/'+os.listdir('labels/')[1],'r') as f:
    temp_file = f.read()

In [None]:
temp_file.split('\n')
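
To check a label file beyond reading it by eye, its lines can be parsed back into numbers. The sketch below uses a hypothetical helper, `parse_yolo_labels`, and made-up file content in the same five-column layout that `np.savetxt` writes above.

```python
def parse_yolo_labels(text):
    """Parse YOLO label text into (class_id, x_center, y_center, width, height) tuples."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        class_id, *coords = line.split()
        boxes.append((int(class_id), *map(float, coords)))
    return boxes

# Hypothetical file content: one box per line
sample_text = (
    "0 0.500000 0.625000 0.500000 0.250000\n"
    "0 0.100000 0.200000 0.050000 0.040000\n"
)
parsed = parse_yolo_labels(sample_text)

# Every coordinate of a valid label should lie in [0, 1]
assert all(0.0 <= v <= 1.0 for box in parsed for v in box[1:])
print(parsed)
```

Passing `temp_file` to the helper would run the same check on a real label file.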