# Kaggle Global-Wheat-Detection Dataset <a class="anchor" id="top"></a>

https://www.kaggle.com/c/global-wheat-detection

## Table of Contents
* [CNN Models](#cnn_models)
* [Image Helpers](#image_helpers)
* [Data Helpers](#data_helpers)
* [Prepare Data](#prepare_data)
* [Visualize Data](#visualize_data)
* [Augment Data](#augment_data)
* [Generate Model](#generate_model)
* [Train Model](#train_model)
* [Show Results](#show_results)

### Useful Notebooks (kudos!)
- Augmentations, Data Cleaning and Bounding Boxes  
  https://www.kaggle.com/reighns/augmentations-data-cleaning-and-bounding-boxes
- EfficientDet  
  https://www.kaggle.com/shonenkov/training-efficientdet


In [None]:
# Standard imports
import numpy as np
import os
import pandas as pd
import warnings

# User Variables <a class="anchor" id="user_variables"></a> 
[go to top](#top)

In [None]:
# User Variables
data_root = os.path.join(os.environ["HOME"], 'workspace', 'plai', 'res', 'data') # Leave empty for upload on kaggle
model_root = os.path.join(os.environ["HOME"], 'workspace', 'plai', 'res', 'trained') 

RUN_PREPROCESSING = False
RUN_VISUALIZATION = False

In [None]:
# Relative paths
data_path = os.path.join(data_root, 'kaggle', 'input', 'global-wheat-detection')
train_image_folder = os.path.join(data_root, 'kaggle', 'input', 'global-wheat-detection', 'train')
train_csv = pd.read_csv( os.path.join(data_path, 'train.csv') )

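The comment on `data_root` says to leave it empty on Kaggle, but `os.path.join('', 'kaggle', 'input', ...)` yields a *relative* path. Below is a minimal sketch of one way to cover both cases, assuming the usual Kaggle layout `/kaggle/input/<dataset>`. The helper name `dataset_path` is hypothetical and not part of this notebook.

```python
import os

def dataset_path(data_root: str, dataset: str = "global-wheat-detection") -> str:
    """Return the dataset folder below data_root (hypothetical helper).

    An empty data_root is mapped to the filesystem root, so the result
    becomes /kaggle/input/<dataset> as found on Kaggle kernels.
    """
    root = data_root if data_root else os.sep
    return os.path.join(root, "kaggle", "input", dataset)

# Local layout as used in this notebook; expanduser avoids a KeyError if HOME is unset
local_root = os.path.join(os.path.expanduser("~"), "workspace", "plai", "res", "data")
print(dataset_path(local_root))
print(dataset_path(""))
```
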
# CNN Models <a class="anchor" id="cnn_models"></a> 
[go to top](#top)

## Darknet
Darknet-19 Architecture [3] | Darknet-53 Architecture [2]
:-:|:-:
<img src="darknet_19.png" alt="Darknet-19 Architecture [3]" width="300"/>|<img src="darknet_53.png" alt="Darknet-53 Architecture [2]" width="300"/>

### Darknet-19
Backbone (classification model) of YOLO v2 / YOLO9000 [3]. YOLO v1 uses its own network with 24 convolutional layers instead [1].

### Darknet-53
Backbone (classification model) of YOLO v3 [2].

## YOLO

### YOLO v1
YOLO Overview [4] | YOLO Architecture [1]
:-:|:-:
<img src="yolo_overview.png" alt="YOLO Overview [4]" width="400"/>|<img src="yolo.png" alt="YOLO Architecture [1]" width="400"/>

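YOLO v1 [1] divides the image into an S × S grid, and the grid cell that contains an object's center is responsible for predicting that object. Each cell predicts B boxes with five values each plus C class scores. Below is a minimal sketch of this bookkeeping, using `S = 7` and `B = 2` from the paper and assuming a single wheat class and 1024-pixel images. The function names are illustrative only.

```python
def responsible_cell(x_center: float, y_center: float, image_size: int = 1024, S: int = 7):
    """Return (row, col) of the grid cell containing the box center,
    plus the center offsets measured in cell units relative to that cell."""
    cell_size = image_size / S
    # Clamp so that a center lying exactly on the image border stays in the last cell
    col = min(int(x_center // cell_size), S - 1)
    row = min(int(y_center // cell_size), S - 1)
    x_offset = x_center / cell_size - col
    y_offset = y_center / cell_size - row
    return row, col, x_offset, y_offset

def output_depth(B: int = 2, C: int = 1) -> int:
    """Per-cell prediction length: B boxes * (x, y, w, h, confidence) + C class scores."""
    return B * 5 + C

row, col, dx, dy = responsible_cell(x_center=512.0, y_center=100.0)
print(row, col, dx, dy)          # grid cell responsible for this box and its offsets
print(7, 7, output_depth())      # output tensor shape for a single wheat class
```
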
### YOLO v3

### References

[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788, 2016.

[2] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, 2018.

[3] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263-7271, 2017.

[4] https://lilianweng.github.io/lil-log/2018/12/27/object-detection-part-4.html

In [None]:
"""@package cnn_models

  @brief Selection of CNN networks in keras (e.g. Yolo)
      Papers:
          YOLO: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf
          YOLO9000: https://pjreddie.com/media/files/papers/YOLO9000.pdf
          YOLOv3: https://arxiv.org/pdf/1804.02767.pdf
      
  
  @author Maximilian Harr <maximilian.harr@gmail.com>
  @date 03.06.2020

  @bug
  @warning
  @todo
 
"""
## IMPORTS #######################################################################################

## CLASSES #######################################################################################

class Yolo():
    """ 
    CNN that detects objects (classification + bounding box) 
        https://pjreddie.com/darknet/yolo/
    """
    pass
## FUNCTIONS #####################################################################################


In [None]:
"""@package test_cnn_models

  @brief Unittest for CNN models
  
  @author Maximilian Harr <maximilian.harr@gmail.com>
  @date 03.06.2020

  @bug
  @warning
  @todo
 
"""

# IMPORTS ########################################################################################
import unittest

# Local

# CLASSES ########################################################################################

class TestCnnModels(unittest.TestCase):
    
    def test_yolo(self):
        
        self.assertEqual(True, True)
    
    
# FUNCTIONS ######################################################################################


# MAIN ###########################################################################################
if __name__ == '__main__':
    #unittest.main()
    unittest.main(argv=[''], verbosity=2, exit=False)


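`unittest.main(argv=[''], exit=False)` collects every `TestCase` defined in the notebook's `__main__` namespace, so each later test cell runs the earlier test classes again. Below is a minimal sketch of running a single test class instead, using only the standard `unittest` loader and runner. `TestExample` and `run_test_class` are placeholder names.

```python
import unittest

class TestExample(unittest.TestCase):
    """Placeholder test case standing in for e.g. TestCnnModels."""

    def test_truth(self):
        self.assertTrue(True)

def run_test_class(test_class) -> unittest.TestResult:
    """Run only the tests of one TestCase class and return the result."""
    suite = unittest.TestLoader().loadTestsFromTestCase(test_class)
    runner = unittest.TextTestRunner(verbosity=2)
    return runner.run(suite)

result = run_test_class(TestExample)
print(result.testsRun, result.wasSuccessful())
```
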
# Image Helpers <a class="anchor" id="image_helpers"></a> 
[go to top](#top)


In [None]:
"""@package image_helpers

  @brief Helper functions to visualize images (e.g. bounding boxes)
  
  @author Maximilian Harr <maximilian.harr@gmail.com>
  @date 01.06.2020

  @bug
  @warning
  @todo
 
"""
## IMPORTS #######################################################################################
import cv2
import glob
import math
import numpy as np
import scipy.io
from skimage import io, exposure
from tqdm.notebook import tqdm 
import matplotlib.pyplot as plt

## CLASSES #######################################################################################

class ImageHelper():
    """ Class for visualization / checking images. """
    
    def check_all_image_sizes(self, image_folder_path: str, width: int, height: int) -> bool:
        """
        check_all_image_sizes Check if all images in folder have a certain size
            @param image_folder_path: path to folder
            @param width, height: image dimension
            @return: boolean
        """
        
        # Check input parameters
        if not type(image_folder_path) == str or not type(width) == int or not type(height) == int:
            raise TypeError('Wrong datatype provided')
        if not os.path.isdir(image_folder_path):
            raise ValueError('Directory does not exist')
        
        # Check size of all images
        total_img_list = glob.glob(os.path.join(image_folder_path,"*"))
        counter = 0
        
        for image in tqdm(total_img_list, desc="Checking images"):
            img = cv2.imread(image)
            if img is None:
                # cv2.imread returns None for files it cannot decode; skip them
                continue
            # shape is (height, width, channels)
            img_height, img_width = img.shape[0], img.shape[1]
            if not (img_width == width and img_height == height):
                counter = counter + 1
        return counter == 0
    
    def check_bbox(self, bbox: pd.DataFrame) -> bool:
        """
        check_bbox Checks if the bounding-box pandas DataFrame bbox has all necessary columns
            @param bbox pandas DataFrame
            @return boolean
        """
        
        # Check input parameters
        if not type(bbox) == pd.DataFrame:
            raise TypeError('Wrong datatype provided')
        
        # Check if columns exist
        if set(['image_name', 'width', 'height', 'x_min', 'x_max', 'y_min', 'y_max', 'class']).issubset(bbox.columns):
            return True
        else:
            return False
    
    def check_image_and_bbox(self, image_folder_path: str, image_bbox_dataframe: pd.DataFrame) -> bool:
        """ 
        check_image_and_bbox Check image and bbox for consistency
            @param image_folder_path path to images
            @param image_bbox_dataframe pandas dataframe with images
            @return boolean
        """
    
        # Check input parameters
        if not type(image_folder_path) == str or not type(image_bbox_dataframe) == pd.DataFrame:
            raise TypeError('Wrong datatype provided')
        if not os.path.isdir(image_folder_path):
            raise ValueError('Directory does not exist')
        if not self.check_bbox(image_bbox_dataframe):
            raise ValueError('Dataframe is not a valid bbox dataframe')

        # Check if all images in image_bbox_dataframe are existent
        missing_images = []
        for image_name in image_bbox_dataframe['image_name'].unique():
            image_path = os.path.join(image_folder_path, image_name)
            if not os.path.isfile(image_path):
                missing_images.append(image_name)
        
        if missing_images:
            warnings.warn("Images specified in Bbox are missing", UserWarning)
            print(missing_images)
            return False
        
        return True
    
    def plot_multiple_img(self, img_matrix_list: list, 
                          title_list: np.ndarray, 
                          ncols: int, 
                          main_title: str = ""):
        """
        plot_multiple_img Plots multiple images
            @param img_matrix_list list of images (cv2.imread)
            @param title_list 
            @param ncols number of plot columns
            @param main_title Title of plot
            @return
        """
        
        # Check input parameters
        if not type(img_matrix_list) == list \
            or not type(title_list) == np.ndarray \
            or not type(ncols) == int \
            or not type(main_title) == str:
            raise TypeError('Wrong datatype provided')
        
        
        fig, myaxes = plt.subplots(figsize=(20, 10), nrows=math.ceil(len(img_matrix_list)/ncols), 
                                   ncols=ncols, squeeze=False)
        fig.suptitle(main_title, fontsize = 30)
        fig.subplots_adjust(wspace=0.3)
        fig.subplots_adjust(hspace=0.3)
        for i, (img, title) in enumerate(zip(img_matrix_list, title_list)):
            myaxes[i // ncols][i % ncols].imshow(img)
            myaxes[i // ncols][i % ncols].set_title(title, fontsize=15)
        plt.show()
        
    def plot_random_images(self, image_folder_path: str, 
                           ncols: int = 2,
                           nimgs: int = 12) -> None:
        """
        plot_random_images Plots random images
            @param image_folder_path
            @param ncols number of columns
            @param nimgs number of images
            @return
        """
        
        # Check input parameters
        if not type(image_folder_path) == str \
            or not type(ncols) == int \
            or not type(nimgs) == int:
            raise TypeError('Wrong datatype provided')
        if not os.path.isdir(image_folder_path):
            raise ValueError('Directory does not exist')
        
        # Randomly choose nimgs images to plot (sampled with replacement)
        file_list = os.listdir(image_folder_path)
        
        img_files_list = np.random.choice(file_list, nimgs)
        print("The images' names are {}".format(img_files_list))
        img_matrix_list = []

        for img_file in img_files_list:
            image_file_path = os.path.join(image_folder_path, img_file)
            img = cv2.imread(image_file_path)[:,:,::-1]  
            img_matrix_list.append(img)

        return self.plot_multiple_img(img_matrix_list, title_list = img_files_list, ncols = ncols, main_title="Wheat Images")

    def plot_random_images_bbox(self, image_folder_path: str, 
                                image_bbox_dataframe: pd.DataFrame, 
                                ncols: int = 2,
                                nimgs: int = 12) -> None:
        """
        plot_random_images_bbox Plots random images with bounding boxes
            @param image_folder_path
            @param image_bbox_dataframe
            @param ncols number of columns
            @param nimgs number of images
            @return
        """
        
        # Check input parameters
        if not type(image_folder_path) == str \
            or not type(image_bbox_dataframe) == pd.DataFrame \
            or not type(ncols) == int \
            or not type(nimgs) == int:
            raise TypeError('Wrong datatype provided')
        if not os.path.isdir(image_folder_path):
            raise ValueError('Directory does not exist')
        if not self.check_bbox(image_bbox_dataframe):
            raise ValueError('Dataframe is not a valid bbox dataframe')
            
        # Randomly choose nimgs images (sampled with replacement from the bbox rows)
        img_files_list = np.random.choice(list(image_bbox_dataframe['image_name']), nimgs)
        print("The images' names are {}".format(img_files_list))

        bbox_list = []
        img_matrix_list = []
        random_image_matrix_list = []
        
        # Save images and bounding boxes in new list
        for img_file in img_files_list:
            
            bbox_list.append( image_bbox_dataframe[image_bbox_dataframe['image_name'] == img_file] )
            
            image_file_path = os.path.join(image_folder_path, img_file)
            img = cv2.imread(image_file_path)[:,:,::-1]  
            img_matrix_list.append(img)
        
        # Draw all bounding boxes into a copy of each image
        for bboxes, img in zip(bbox_list, img_matrix_list):
            
            random_image = img.copy()
            
            for bbox in bboxes[['x_min','y_min', 'x_max', 'y_max']].values.astype(int).reshape(-1, 4):
                # Convert to plain Python ints, which cv2.rectangle accepts in all OpenCV versions
                start_point = (int(bbox[0]), int(bbox[1]))
                end_point = (int(bbox[2]), int(bbox[3]))
                color = (255, 0, 0)
                thickness = 2
                random_image = cv2.rectangle(random_image, start_point, end_point, color, thickness)
            
            random_image_matrix_list.append(random_image)
            
        self.plot_multiple_img(random_image_matrix_list, 
                               title_list = img_files_list, 
                               ncols=ncols, 
                               main_title="Bounding Box Wheat Images")   
    
## FUNCTIONS #####################################################################################


In [None]:
"""@package test_image_helpers

  @brief Unittest for image helpers
  
  @author Maximilian Harr <maximilian.harr@gmail.com>
  @date 01.06.2020

  @bug
  @warning
  @todo
 
"""

# IMPORTS ########################################################################################
import unittest
import math
import numpy as np

# Local
import plai.workspace.init

# CLASSES ########################################################################################

class TestImageHelpers(unittest.TestCase):
    
    def test_check_all_image_sizes(self):
        
        image_helper = ImageHelper()
        
        plai_image_folder = os.path.join( plai.workspace.init.get_ws_path(), 'common', 'plai', 'test', 'res', 'imgs')
        
        self.assertEqual(True, image_helper.check_all_image_sizes(plai_image_folder, width=1024, height=1024) )
    
    def test_check_bbox(self):
        
        image_helper = ImageHelper()
        
        image_bbox = pd.DataFrame(
            [['*.jpg', 0, 0, 0, 0, 0, 0, 0],
            ['*.jpg', 0, 0, 0, 0, 0, 0, 0]],
            columns=['image_name', 'width', 'height', 'x_min', 'x_max', 'y_min', 'y_max', 'class'])
                
        self.assertEqual(True, image_helper.check_bbox(image_bbox))
        
    def test_check_image_and_bbox(self):
        
        image_helper = ImageHelper()
    
        plai_image_folder = os.path.join( plai.workspace.init.get_ws_path(), 'common', 'plai', 'test', 'res', 'imgs')
        image_bbox = pd.DataFrame(
            [['b53afdf5c.jpg', 1024, 1024, 0, 0, 0, 0, 0],
            ['b6ab77fd7.jpg', 1024, 1024, 0, 0, 0, 0, 0]],
            columns=['image_name', 'width', 'height', 'x_min', 'x_max', 'y_min', 'y_max', 'class'])
        
        self.assertEqual( True, image_helper.check_image_and_bbox(plai_image_folder, image_bbox))
        
        image_bbox = pd.DataFrame(
            [['missing_image.jpg', 1024, 1024, 0, 0, 0, 0, 0]],
            columns=['image_name', 'width', 'height', 'x_min', 'x_max', 'y_min', 'y_max', 'class'])
        
        self.assertEqual( False, image_helper.check_image_and_bbox(plai_image_folder, image_bbox))
    
    def test_plot_random_images(self):
        
        image_helper = ImageHelper()

        plai_image_folder = os.path.join( plai.workspace.init.get_ws_path(), 'common', 'plai', 'test', 'res', 'imgs')

        image_helper.plot_random_images(plai_image_folder, nimgs=6, ncols=3)
    
    def test_plot_random_images_bbox(self):

        image_helper = ImageHelper()

        plai_image_folder = os.path.join( plai.workspace.init.get_ws_path(), 'common', 'plai', 'test', 'res', 'imgs')

        # Read Bbox annotation file
        image_bbox_csv = os.path.join( plai.workspace.init.get_ws_path(), 'common', 'plai', 'test', 'res', 'train.csv')
        image_bbox = pd.read_csv(image_bbox_csv)

        image_helper.plot_random_images_bbox(plai_image_folder, image_bbox, nimgs=6, ncols=3)
    
# FUNCTIONS ######################################################################################


# MAIN ###########################################################################################
if __name__ == '__main__':
    #unittest.main()
    unittest.main(argv=[''], verbosity=2, exit=False)


# Data Helpers <a class="anchor" id="data_helpers"></a>
[go to top](#top)

In [None]:
"""@package data_helpers

  @brief Helper functions to convert text data (e.g. csv etc)
  
  @author Maximilian Harr <maximilian.harr@gmail.com>
  @date 29.05.2020

  @bug
  @warning
  @todo
 
"""

## IMPORTS #######################################################################################
from collections import Counter
import numpy as np
import scipy.io
from skimage import io, exposure
import matplotlib.pyplot as plt

## CLASSES #######################################################################################

class BboxHelper():
    """ Class for Bounding box data processing. """
    pass

class FileHelper():
    """ Class for checking folders and files. """
    def folder_filetypes_equal(self, path: str, ignore_folder_type: bool) -> bool :
        """
        folder_filetypes_equal Check if all file types in folder are equal
            @param path: path to folder
            @param ignore_folder_type: Ignore folder file type
            @return: boolean
        """

        # Check input parameter
        if not type(path) == str or not type(ignore_folder_type) == bool:
            raise TypeError('Wrong datatype provided')
        if not os.path.isdir(path):
            raise ValueError('Directory does not exist')

        # Check if all file types are equal
        extension_type = []
        file_list = os.listdir(path)

        for file in file_list:
            # Skip folders
            if os.path.isdir(os.path.join(path, file)) and ignore_folder_type is True:
                continue
            # splitext also handles names without a dot (empty extension)
            extension_type.append(os.path.splitext(file)[1].lower())

        # print(Counter(extension_type).keys())
        # print(Counter(extension_type).values())

        return len(Counter(extension_type).keys()) == 1


## FUNCTIONS #####################################################################################


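The same check can be sketched compactly with `pathlib` from the standard library. This is only an illustrative variant and not the notebook's `FileHelper`; the name `folder_extensions` is hypothetical. Returning the set of extensions, rather than a boolean, also shows which file types are mixed.

```python
import tempfile
from pathlib import Path

def folder_extensions(path: str, ignore_folders: bool = True) -> set:
    """Return the set of lower-cased file extensions found directly in path."""
    extensions = set()
    for entry in Path(path).iterdir():
        if entry.is_dir() and ignore_folders:
            continue
        # suffix is '' for names without a dot, so no IndexError can occur
        extensions.add(entry.suffix.lower())
    return extensions

# Usage on a temporary folder with two JPEG files and one sub-folder
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.jpg").touch()
    (Path(tmp) / "b.JPG").touch()
    (Path(tmp) / "sub").mkdir()
    found = folder_extensions(tmp)
    print(found, len(found) == 1)
```
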
In [None]:
"""@package test_data_helpers

  @brief Unittest for data helpers
  
  @author Maximilian Harr <maximilian.harr@gmail.com>
  @date 29.05.2020

  @bug
  @warning
  @todo
 
"""

# IMPORTS ########################################################################################
import unittest
import math
import numpy as np
import os

# Local
import plai.workspace.init

# CLASSES ########################################################################################

class TestDataHelpers(unittest.TestCase):

    def test_folder_file_types_equal(self):
        plai_dir = os.path.join( plai.workspace.init.get_ws_path(), 'common', 'plai')
        plai_img_dir = os.path.join( plai.workspace.init.get_ws_path(), 'common', 'plai', 'test', 'res', 'imgs')
        
        file_helper = FileHelper()
        self.assertEqual( False, file_helper.folder_filetypes_equal(plai_dir, ignore_folder_type=True) )
        self.assertEqual( True, file_helper.folder_filetypes_equal(plai_img_dir, ignore_folder_type=True) )

# FUNCTIONS ######################################################################################


# MAIN ###########################################################################################
if __name__ == '__main__':
    #unittest.main()
    unittest.main(argv=[''], verbosity=2, exit=False)


# Prepare Data <a class="anchor" id="prepare_data"></a>
[go to top](#top)

In [None]:
# Read data
train_csv.head()

In [None]:
# Get bboxes
from pandas import read_csv

train_bbox = pd.DataFrame()

if RUN_PREPROCESSING:
    train_bbox = train_csv # Alias, not a copy: the columns added below also appear in train_csv
    train_bbox["image_name"] = train_csv["image_id"].apply(lambda x: str(x) + ".jpg")

    # Add columns [x_min, y_min, width, height]
    bboxes = np.stack(train_bbox['bbox'].apply(lambda x: np.fromstring(x[1:-1], sep=',')))
    for i, column in enumerate(['x_min', 'y_min', 'width', 'height']):
        train_bbox[column] = bboxes[:,i]

    # Add columns [x_max, y_max, x_center, y_center] (vectorized column arithmetic)
    train_bbox["x_max"] = train_bbox["x_min"] + train_bbox["width"]
    train_bbox["y_max"] = train_bbox["y_min"] + train_bbox["height"]
    train_bbox["x_center"] = train_bbox["x_min"] + train_bbox["width"] / 2
    train_bbox["y_center"] = train_bbox["y_min"] + train_bbox["height"] / 2
    train_bbox.drop(columns=['bbox'], inplace=True)

    # Remove columns
    del train_bbox['source']
    del train_bbox['image_id']
    
    # Add class label
    train_bbox["class"] = '1'

    # Store as *.csv
    train_bbox.to_csv( os.path.join(data_path, "train_bbox.csv") )
    train_bbox.head()
else:
    train_bbox = pd.read_csv( os.path.join(data_path, "train_bbox.csv") )

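The `bbox` column of `train.csv` stores each box as a string of the form `[x_min, y_min, width, height]`. The conversion done in the preprocessing cell can be sketched for a single row in plain Python. `parse_bbox` is an illustrative name and the sample values are made up.

```python
import json

def parse_bbox(bbox_string: str) -> dict:
    """Convert '[x_min, y_min, width, height]' into corner and center coordinates."""
    x_min, y_min, width, height = json.loads(bbox_string)
    return {
        "x_min": x_min, "y_min": y_min,
        "width": width, "height": height,
        "x_max": x_min + width, "y_max": y_min + height,
        "x_center": x_min + width / 2, "y_center": y_min + height / 2,
    }

box = parse_bbox("[834.0, 222.0, 56.0, 36.0]")
print(box["x_max"], box["y_max"], box["x_center"], box["y_center"])
```
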
In [None]:
# Sanity checks
file_helper = FileHelper()
image_helper = ImageHelper()

if RUN_PREPROCESSING:
    # Coordinates outside [0, 1024] indicate that normalization is required
    if ((train_bbox["x_max"] > 1024).any() or
        (train_bbox["y_max"] > 1024).any() or
        (train_bbox["x_min"] < 0).any() or
        (train_bbox["y_min"] < 0).any()):
        warnings.warn("Image normalization required", UserWarning)

    #if image_helper.check_all_image_sizes(train_image_folder, width=1024, height=1024) == False:
    #    warnings.warn("Actual image size not equal", UserWarning)

    # Bounding boxes must lie inside the 1024 x 1024 image
    if ((train_bbox["x_min"] < 0).any() or
        (train_bbox["y_min"] < 0).any() or
        (train_bbox["x_max"] > 1024).any() or
        (train_bbox["y_max"] > 1024).any()):
        warnings.warn("Bounding box exceeds image", UserWarning)

    if not file_helper.folder_filetypes_equal(data_path, ignore_folder_type=True):
        warnings.warn("File types in folder differ", UserWarning)

    if not image_helper.check_bbox(train_bbox):
        warnings.warn("Error in bounding boxes", UserWarning)

# Visualize Data <a class="anchor" id="visualize_data"></a>
[go to top](#top)

In [None]:
# Plot random wheat images with bounding boxes
image_helper = ImageHelper()
if RUN_VISUALIZATION:
    image_helper.plot_random_images_bbox(train_image_folder, train_bbox, nimgs=6, ncols=3)

In [None]:
# Plot wheat heads of certain image
def plot_heads_of_image(image_name: str, image_folder: str, train_bbox: pd.DataFrame) -> None:
    
    train_bbox_sample = train_bbox[train_bbox['image_name'] == image_name ]
    image_path_full = os.path.join(image_folder, image_name)
    # Read the image once and convert BGR (OpenCV) to RGB (matplotlib)
    image_sample = cv2.imread(  image_path_full)[:,:,::-1]

    img_matrix_list = []
    img_files_list = []

    # Crop each wheat head out of the image
    for index, bbox in train_bbox_sample.iterrows():
        img = image_sample[ int(bbox.y_min) : int(bbox.y_max) , int(bbox.x_min) : int(bbox.x_max) ,::]
        img_matrix_list.append(img)
        img_files_list.append("head_" + str(index))

    image_helper.plot_multiple_img(img_matrix_list, title_list = np.asarray(img_files_list, dtype=str),
                                   ncols = 10, main_title="Wheat Heads")

#plot_heads_of_image('b53afdf5c.jpg', train_image_folder, train_bbox)

In [None]:
# Save all wheatheads as separate jpg images
train_image_folder_heads = os.path.join(data_root, 'kaggle', 'input', 'global-wheat-detection', 'train_heads')

if RUN_PREPROCESSING:
    # Create folder for head images
    os.makedirs(train_image_folder_heads, exist_ok=True)

    # Crop and save wheat heads
    for index, bbox in tqdm(train_bbox.iterrows(), total=len(train_bbox), unit="images"):
        # Keep OpenCV's BGR channel order, because cv2.imwrite expects BGR
        img = cv2.imread( os.path.join(train_image_folder, bbox.image_name) )
        img = img[ int(bbox.y_min) : int(bbox.y_max) , int(bbox.x_min) : int(bbox.x_max) ,::]

        img_destination = os.path.join(train_image_folder_heads, str(index) + ".jpg")
        cv2.imwrite(img_destination, img)

    image_helper.plot_random_images(train_image_folder_heads, nimgs=20, ncols=5)

In [None]:
# Check which images contain wheatheads
file_list = os.listdir(train_image_folder)

A = pd.DataFrame(train_bbox['image_name'].unique(), columns=["image_name"])
B = pd.DataFrame(file_list, columns=["image_name"])

# Inner merge = images present in both A and B (an intersection, despite the variable name);
# needless, but nice example (docu)
A_B_union = pd.merge(A, B, how='inner', on='image_name')
# Rows that occur in only one of A and B. Assuming every annotated image exists in the folder,
# these are the images without any bounding box
A_not_in_B = pd.concat([A, B]).drop_duplicates(keep=False)

print("Total number of images: %d" %len(B))
print("Number of images with bbox: %d" %len(A_B_union))
print("Number of images without bbox: %d" % len(A_not_in_B))

A_not_in_B.head()

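The merge and concat logic in the cell above corresponds to plain set operations. Below is a small sketch with Python sets and made-up file names; it only illustrates the idea and does not replace the pandas code.

```python
annotated = {"a.jpg", "b.jpg"}            # image names that occur in the bbox table
in_folder = {"a.jpg", "b.jpg", "c.jpg"}   # image names found in the train folder

with_bbox = annotated & in_folder         # intersection (what the inner merge computes)
without_bbox = in_folder - annotated      # images in the folder without annotation
only_in_one = annotated ^ in_folder       # symmetric difference (concat + drop_duplicates(keep=False))

print(sorted(with_bbox), sorted(without_bbox), sorted(only_in_one))
```

As long as every annotated image is present in the folder, the symmetric difference equals the set of images without bounding boxes.
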
## Train Model that detects if wheat is in image

In [None]:
# @brief Read images from disk, resize them and return them as one float32 array
from PIL import Image
from tqdm.notebook import tqdm 

def read_images(path: str, image_names: pd.DataFrame, height: int, width: int) -> np.ndarray:
    
    # Prepare images (resize)
    # https://pillow.readthedocs.io/en/3.1.x/reference/Image.html
    images = []
    
    for index, image_name in tqdm(image_names.iterrows(), total=len(image_names), unit="images"):
        try:
            
            image = Image.open(os.path.join(path, image_name.image_name))
            image = image.resize( (width, height), Image.LANCZOS)
            image = image.convert("RGB")

            image = np.asarray(image)
            images.append(image)
        except OSError:
            # Skip files that cannot be read as images; the result may then be shorter than image_names
            pass
    
    # Convert to float
    images = np.array(images)
    images = images.astype(np.float32)
    
    return images

In [None]:
# https://stackoverflow.com/questions/53698035/failed-to-get-convolution-algorithm-this-is-probably-because-cudnn-failed-to-in
import keras
import tensorflow as tf

# tf.compat.v1.disable_v2_behavior()
def tf_configure_gpu( gpu_fraction: float = 0.3) -> None:
    if tf.__version__[0] == "2":
        print("Tensorflow V2")

        config = tf.compat.v1.ConfigProto()
        config.gpu_options.per_process_gpu_memory_fraction = gpu_fraction
        config.gpu_options.allow_growth = True
        session = tf.compat.v1.Session(config=config)
        tf.compat.v1.keras.backend.set_session(session)

    else:
        print("Tensorflow V1")
        
        config = tf.ConfigProto()
        config.gpu_options.per_process_gpu_memory_fraction = gpu_fraction
        config.gpu_options.allow_growth = True
        session = tf.Session(config=config)
        keras.backend.tensorflow_backend.set_session( session )

In [None]:
# Setup GPU
USE_GPU = True

if USE_GPU:
    tf_configure_gpu( gpu_fraction=0.4 )

In [None]:
# Read nowheat and wheat data into RAM
TRAIN_HAS_WHEAT_MODEL = True

if TRAIN_HAS_WHEAT_MODEL:
    nowheat_image_names = pd.DataFrame(A_not_in_B)
    wheat_image_names = A_B_union

    nowheat_images = read_images( train_image_folder, nowheat_image_names, height=224, width=224 )
    wheat_images = read_images( train_image_folder, wheat_image_names, height=224, width=224 )

In [None]:
# Append nowheat and wheat images
if TRAIN_HAS_WHEAT_MODEL:
    x = np.concatenate([nowheat_images, wheat_images])
    print(x.shape)

In [None]:
# Add labels
if TRAIN_HAS_WHEAT_MODEL:
    y_nowheat = np.zeros(len(nowheat_images))
    y_wheat = np.ones(len(wheat_images))

    y = np.concatenate([y_nowheat, y_wheat])
    y = y.reshape(-1, 1)

In [None]:
# Create CNN based on VGG16 without last / top layer (output)
# For details: vgg16_model.summary()
import tensorflow as tf
import keras.applications.vgg16 as vgg16

#with tf.device('cpu:0'):
vgg16_model = vgg16.VGG16(include_top=False, input_shape=(224, 224, 3))


In [None]:
# Run all x data through VGG16 net now (to save time during training)
# (To force CPU, enable the commented tf.device line below)
import tensorflow as tf

# Use Transfer Learning on pre-trained VGG16 model

if TRAIN_HAS_WHEAT_MODEL:
    #with tf.device('cpu:0'):
    x_vgg16 = vgg16.preprocess_input(x)
    x_after_vgg = vgg16_model.predict(x_vgg16, verbose=True, batch_size=12)

In [None]:
# Create new keras CNN without VGG16 but using its processed images
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam

if TRAIN_HAS_WHEAT_MODEL:
    model_has_wheat = Sequential()
    # model2.add(vgg16_model)

    model_has_wheat.add(Flatten( input_shape=(7, 7, 512)))
    model_has_wheat.add(Dense(4096, activation="relu"))
    model_has_wheat.add(Dense(1024, activation="relu"))
    model_has_wheat.add(Dense(1, activation="sigmoid"))

    model_has_wheat.summary()

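The input shape `(7, 7, 512)` follows from VGG16 without its top layers: five 2×2 max-pooling stages reduce 224 to 224 / 2^5 = 7, and the last convolutional block has 512 channels. Below is a quick sketch of that arithmetic, together with the parameter counts of the dense head defined above (weights plus biases per `Dense` layer). The helper `dense_params` is illustrative.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Number of trainable parameters of a fully connected layer: weights + biases."""
    return n_in * n_out + n_out

side = 224 // 2 ** 5            # spatial size after five 2x2 poolings
flat = side * side * 512        # length of the flattened feature vector

layers = [(flat, 4096), (4096, 1024), (1024, 1)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
total = sum(counts)
print(side, flat, counts, total)
```

Almost all parameters sit in the first `Dense` layer, which is worth keeping in mind when GPU memory is tight.
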
In [None]:
# Shuffle data (which is sorted by label: no-wheat images first, then wheat images)
# This is necessary as we use validation_split in next step
# ... which uses last 20% of data
from sklearn.utils import shuffle
    
if TRAIN_HAS_WHEAT_MODEL:
    x_after_vgg, y = shuffle(x_after_vgg, y)

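Keras takes the `validation_split` fraction from the *end* of the arrays, before any shuffling done during training. With labels sorted by class, the validation set would therefore contain a single class only. Below is a minimal sketch of this effect with plain lists; the sample counts are made up and `split_last_fraction` is a hypothetical helper.

```python
import random

def split_last_fraction(labels: list, fraction: float = 0.2):
    """Mimic validation_split: the last `fraction` of the samples is held out."""
    n_val = int(len(labels) * fraction)
    return labels[:len(labels) - n_val], labels[len(labels) - n_val:]

labels = [0] * 20 + [1] * 80          # sorted: no-wheat first, then wheat

_, val_sorted = split_last_fraction(labels)
print(set(val_sorted))                # only one class ends up in validation

shuffled = labels[:]                  # shuffle a copy with a fixed seed
random.Random(0).shuffle(shuffled)
_, val_shuffled = split_last_fraction(shuffled)
print(sorted(set(val_shuffled)))
```
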
In [None]:
# Train model using x_after_vgg
if TRAIN_HAS_WHEAT_MODEL:
    #with tf.device('cpu:0'):
    model_has_wheat.compile(optimizer=Adam(lr=0.0001), loss="binary_crossentropy", metrics=["acc"])
    model_has_wheat.fit(x_after_vgg, y, epochs=10, batch_size=12, validation_split=0.2)

In [None]:
import keras
keras.Sequential.fit?

In [None]:
# Save model
import keras.models as models

if TRAIN_HAS_WHEAT_MODEL:
    model_has_wheat.save( os.path.join(model_root, 'build', 'model_has_wheat.h5') )
else:
    model_has_wheat = models.load_model( os.path.join(model_root, 'build', 'model_has_wheat.h5') )

In [None]:
#test_image_names = pd.DataFrame(['wheat01.jpg'], columns=["image_name"])
#test_image_names = pd.DataFrame(['2fd875eaa.jpg'], columns=["image_name"])

test_image_names = pd.DataFrame( os.listdir(os.path.join(data_path, 'test_www')), columns=["image_name"])
test_images = read_images( os.path.join(data_path, 'test_www'), test_image_names, width=224, height=224 )

x_test = vgg16.preprocess_input(test_images)

with tf.device('cpu:0'):
    # x_test is already preprocessed above; applying preprocess_input twice would distort the input
    x_test_after_vgg = vgg16_model.predict(x_test, verbose=True, batch_size=12)
    res = model_has_wheat.predict(x_test_after_vgg)
    
    print(res)
    print(test_image_names)

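`model_has_wheat.predict` returns one sigmoid output per image, i.e. an array of shape `(n, 1)` with values between 0 and 1, where 1 corresponds to the wheat label used for training. Below is a minimal sketch of turning such scores into labels. The 0.5 threshold, the label names and the sample values are assumptions.

```python
def label_scores(image_names: list, scores: list, threshold: float = 0.5) -> dict:
    """Map each image name to 'wheat' or 'no wheat' based on its sigmoid score."""
    labels = {}
    for name, score in zip(image_names, scores):
        labels[name] = "wheat" if score >= threshold else "no wheat"
    return labels

# Made-up example values; with the real model use res[:, 0].tolist() as scores
names = ["field.jpg", "street.jpg"]
scores = [0.93, 0.08]
print(label_scores(names, scores))
```
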
In [None]:
# debug
#config = tf.compat.v1.ConfigProto()
#config.gpu_options.allow_growth = True
#session = tf.compat.v1.InteractiveSession(config=config)
import tensorflow as tf
import keras.models as models
import keras.applications.vgg16 as vgg16

# Load models
vgg16_model = vgg16.VGG16(include_top=False, input_shape=(224, 224, 3))
model_has_wheat = models.load_model( os.path.join(model_root, 'build', 'model_has_wheat.h5') )

In [None]:
# x_test was already preprocessed when it was created, so it is fed to VGG16 directly
x_test_after_vgg = vgg16_model.predict(x_test, verbose=True, batch_size=12)
res = model_has_wheat.predict(x_test_after_vgg)

In [None]:
# Get variables and their size
import sys

# These are the usual ipython objects, including this one you are creating
ipython_vars = ['In', 'Out', 'exit', 'quit', 'get_ipython', 'ipython_vars']

# Get a sorted list of the objects and their sizes
sorted([(x, sys.getsizeof(globals().get(x))) for x in dir() if not x.startswith('_') and x not in sys.modules and x not in ipython_vars], key=lambda x: x[1], reverse=True)

# Augment Data <a class="anchor" id="augment_data"></a>
[go to top](#top)

In [None]:
tqdm?

# Generate Model <a class="anchor" id="generate_model"></a>
[go to top](#top)

You may also start from here, but run the following sections first:

* [Image Helpers](#image_helpers)  
* [Data Helpers](#data_helpers)  

# Train Model <a class="anchor" id="train_model"></a>
[go to top](#top)

# Show Results <a class="anchor" id="show_results"></a>
[go to top](#top)

In [None]:
array = "1"
array = wheat_helpers.get_array_from_string("[1, 2, 3]")
print(array[1])

In [None]:
kaggledata_root = "../../res/data" # Leave empty for upload on kaggle

In [None]:
#read csv and show head of csv
data_box = pd.read_csv(kaggledata_root + "/kaggle/input/global-wheat-detection/train.csv")
data_box.head()

wheat_helpers.plot_boundingbox(data_box, kaggledata_root + '/kaggle/input/global-wheat-detection/train/b6ab77fd7.jpg')

In [None]:
# Display sample image
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
img = mpimg.imread(kaggledata_root + "/kaggle/input/global-wheat-detection/train/b6ab77fd7.jpg")
plt.imshow(img)