### Version information

**Version 4** -- The original public version of the notebook.

**Version 5** -- Two major changes were made:
   1. As [Roman](https://www.kaggle.com/nroman) pointed out, the hair images were originally designed for 256x256 inputs, so they need to be rescaled before they are applied to images of other sizes. This version introduces a scaling factor for the dimensions of the hair images, so they should now work with any input size.
   2. [Helgi](https://www.kaggle.com/helgith) pointed out that the TensorFlow code from **Version 4** threw an error when used in graph mode. In this version the TensorFlow code was adjusted so that it runs in graph mode, and an example was added to illustrate how to fetch a training batch and print it to the screen.
   
**Version 6** -- Fixed some minor typos (thank you [Franko Sikic](https://www.kaggle.com/frankosikic)!). If you want to see how this augmentation can be included in your training pipeline, take a look at **Version 20** of the following public notebook of mine:

[EfficientNet BN+Tabular Features TF CV5 512x512](https://www.kaggle.com/graf10a/efficientnet-bn-tabular-features-tf-cv5-512x512).

**Version 9** -- Modified the part that shows how to incorporate this data augmentation into the `tf.data.Dataset` API so that it now handles both images and tabular data. Please note that since I am running this notebook on CPU, I had to decrease the size of the shuffling buffer from 2048 to 512 in the definition of the `get_training_dataset` function below.

Also, the images shown in this last part might look a bit strange to you. This is because they were pre-processed with the Shades of Gray algorithm. This is entirely optional; I decided to use these images just for fun. You can read more about the Shades of Gray algorithm in this discussion topic: [Shades of Gray prepossessed data](https://www.kaggle.com/c/siim-isic-melanoma-classification/discussion/161719).

**Version 11** -- Updated tfrecords (fixed color artifacts after Shades of Gray pre-processing).

**Version 13** -- Added the batch version of [Chris Deotte's affine augmentation](https://www.kaggle.com/cdeotte/rotation-augmentation-gpu-tpu-0-96) using the approach from [this great kernel](https://www.kaggle.com/yihdarshieh/make-chris-deotte-s-data-augmentation-faster) by [Yih-Dar SHIEH](https://www.kaggle.com/yihdarshieh). We switched back to the original images (no color constancy pre-processing). For more information, see the following discussion topic: [Batch form of affine augmentations in Tensor Flow](https://www.kaggle.com/c/siim-isic-melanoma-classification/discussion/169504).

### Motivation and acknowledgement

This notebook is based on the idea suggested by [Roman](https://www.kaggle.com/nroman) in the following discussion topic: [Advanced hair augmentation](https://www.kaggle.com/c/siim-isic-melanoma-classification/discussion/159176). We first reproduce his result using the OpenCV library and illustrate it with some sample images. After that we rewrite the OpenCV code in TensorFlow. The TensorFlow implementation makes it possible to use this augmentation with the `tf.data` API, which is very well suited for tfrecords and TPU.

### Libraries

In [None]:
import os
import gc
import cv2
import math
import random
import numpy as np
import pandas as pd
from glob import glob
import tensorflow as tf
from pathlib import Path
import matplotlib.pyplot as plt
import tensorflow.keras.backend as K
from kaggle_datasets import KaggleDatasets

print("Tensorflow version " + tf.__version__)
AUTO = tf.data.experimental.AUTOTUNE

### Image paths

In [None]:
n_max=6     # the maximum number of hairs to augment
im_size=512  # all images are resized to this size

hair_images=glob('/kaggle/input/melanoma-hairs/*.png')
skin_images = glob("/kaggle/input/sample-skin/cancer/*.png")
# train_images=glob('/kaggle/input/siim-isic-melanoma-classification/jpeg/train/*.jpg')
# test_images=glob('/kaggle/input/siim-isic-melanoma-classification/jpeg/test/*.jpg')

len(hair_images), len(skin_images)

### Augmenting hair with OpenCV

In [None]:
def hair_aug_ocv(input_img):
    
    img=input_img.copy()
    # Randomly choose the number of hairs to augment (up to n_max)
    n_hairs = random.randint(0, n_max)

    # If the number of hairs is zero then do nothing
    if not n_hairs:
        return img, n_hairs

    # The image height and width (ignore the number of color channels)
    im_height, im_width, _ = img.shape 

    for _ in range(n_hairs):

        # Read a random hair image
        hair = cv2.imread(random.choice(hair_images)) 
        
        # Rescale the hair image: the hair images were designed for 256x256 inputs,
        # so their dimensions are scaled by im_size/256
        scale = im_size / 256
        hair = cv2.resize(hair, (int(scale*hair.shape[1]), int(scale*hair.shape[0])), 
                          interpolation=cv2.INTER_AREA)       

        # Flip it
        # flipcode = 0: flip vertically
        # flipcode > 0: flip horizontally
        # flipcode < 0: flip vertically and horizontally    
        hair = cv2.flip(hair, flipCode=random.choice([-1, 0, 1]))

        # Rotate it
        hair = cv2.rotate(hair, rotateCode=random.choice([cv2.ROTATE_90_CLOCKWISE,
                                                          cv2.ROTATE_90_COUNTERCLOCKWISE,
                                                          cv2.ROTATE_180
                                                         ])
                         )
        
        
        # The hair image height and width (ignore the number of color channels)
        h_height, h_width, _ = hair.shape

        # The top left coord's of the region of interest (roi)  
        # where the augmentation will be performed
        roi_h0 = random.randint(0, im_height - h_height)
        roi_w0 = random.randint(0, im_width - h_width)

        # The region of interest
        roi = img[roi_h0:(roi_h0 + h_height), roi_w0:(roi_w0 + h_width)]

        # Convert the hair image to grayscale
        hair2gray = cv2.cvtColor(hair, cv2.COLOR_BGR2GRAY)

        # Threshold the grayscale hair image: pixels with a value greater than 10
        # are set to the maximum value (255, white), all other pixels are set to 0 (black).
        # ret -- the threshold value that was used (10 in this case)
        # mask -- the thresholded image
        # The input must be a grayscale image
        # https://docs.opencv.org/master/d7/d4d/tutorial_py_thresholding.html
        ret, mask = cv2.threshold(hair2gray, 10, 255, cv2.THRESH_BINARY)

        # Invert the mask
        mask_inv = cv2.bitwise_not(mask)

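        # Pixels where the mask is 0 come out black, so img_bg keeps the skin
        # outside the hair and hair_fg keeps only the hair itself
        img_bg = cv2.bitwise_and(roi, roi, mask=mask_inv)
        hair_fg = cv2.bitwise_and(hair, hair, mask=mask)
        # cv2.imread returns BGR, so convert the hair to RGB to match the RGB input image
        hair_fg = cv2.cvtColor(hair_fg, cv2.COLOR_BGR2RGB)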
        # Overlapping the image with the hair in the region of interest
        dst = cv2.add(img_bg, hair_fg)
        # Inserting the result in the original image
        img[roi_h0:roi_h0 + h_height, roi_w0:roi_w0 + h_width] = dst
        
    return img, n_hairs
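
The masking step inside `hair_aug_ocv` can be easier to follow in isolation. Below is a minimal NumPy-only sketch of the same idea, not the code used in this notebook. `composite_hair` is a hypothetical helper, the grayscale conversion is approximated by a plain channel mean instead of OpenCV's weighted `COLOR_BGR2GRAY`, and the threshold of 10 is taken from the function above.

```python
import numpy as np

def composite_hair(roi, hair, threshold=10):
    """Paste the non-black pixels of `hair` onto `roi` (both H x W x 3, uint8).

    Hypothetical helper for illustration only; it approximates the grayscale
    conversion with a channel mean instead of cv2.cvtColor.
    """
    # Approximate grayscale intensity of the hair image
    gray = hair.astype(np.float32).mean(axis=-1)
    # True where the hair image is brighter than the threshold, i.e. where there is hair
    mask = gray > threshold
    # Take the hair pixels where the mask is set and keep the skin pixels elsewhere
    return np.where(mask[..., None], hair, roi)

# Tiny example: a uniform "skin" patch and a hair image with a single non-black pixel
roi = np.full((2, 2, 3), 200, dtype=np.uint8)
hair = np.zeros((2, 2, 3), dtype=np.uint8)
hair[0, 1] = (90, 60, 30)
result = composite_hair(roi, hair)
```

Because the two masked images never overlap, selecting pixels with `np.where` gives the same result as adding `img_bg` and `hair_fg`. A selection of this kind can also be expressed with `tf.where`, which is one way to carry the idea over to a `tf.data` pipeline.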

### Examples of hair augmentation with OpenCV

In [None]:
def aug_examples(paths):

    for img_path in paths:
        # Read the image
        img=cv2.imread(img_path)
        # Fixing colors
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        # Resize to the desired size
        img = cv2.resize(img , (im_size, im_size), interpolation = cv2.INTER_AREA )
        # Creating an augmented image
        img_aug, n_hairs = hair_aug_ocv(img)
        
        _, (ax1,ax2) = plt.subplots(1, 2)
        
        im_name=img_path.split('/')[-1].split('.')[0]    
        ax1.set_title(f"{im_name}")            
        ax2.set_title(f"{im_name} with {n_hairs} {'hair' if n_hairs==1 else 'hairs'}")
        
        ax1.imshow(img)
        ax2.imshow(img_aug)
        
        plt.tight_layout()
        plt.show()

In [None]:
img_path = skin_images[0]

In [None]:
img=cv2.imread(img_path)
# Fixing colors
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

In [None]:
skin_images

In [None]:
img = cv2.resize(img , (im_size, im_size), interpolation = cv2.INTER_AREA )
# Creating an augmented image
img_aug, n_hairs = hair_aug_ocv(img)

_, (ax1,ax2) = plt.subplots(1, 2)

im_name=img_path.split('/')[-1].split('.')[0]    
ax1.set_title(f"{im_name}")            
ax2.set_title(f"{im_name} with {n_hairs} {'hair' if n_hairs==1 else 'hairs'}")

ax1.imshow(img)
ax2.imshow(img_aug)

plt.tight_layout()
plt.show()

In [None]:
aug_examples(skin_images)

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
train_augmenter=ImageDataGenerator(
    rescale=1./255, 
    # Only rotation and brightness jitter are enabled; the other options are left commented out
    #samplewise_center=True, 
    #samplewise_std_normalization=True, 
    #horizontal_flip = True, 
    #vertical_flip = True, 
    #height_shift_range= 0.05, 
    #width_shift_range=0.1, 
    rotation_range=10, 
    #shear_range = 0.1,
    #fill_mode = 'nearest',
    #zoom_range=0.10,
    brightness_range=(0.8,1.2)
    #preprocessing_function=function_name,
    )
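
The commented-out `preprocessing_function` argument above is one place where the hair augmentation could be plugged into `ImageDataGenerator`. The sketch below shows a possible adapter, under stated assumptions: `make_preprocessing_fn` and `dummy_aug` are hypothetical names, and the adapter assumes that Keras hands the function a float image with pixel values still in the 0..255 range (that is, before `rescale` is applied), which is worth verifying for your Keras version.

```python
import numpy as np

def make_preprocessing_fn(augment_fn):
    """Wrap a uint8 image augmentation so it can serve as a `preprocessing_function`.

    `augment_fn` is assumed to take an (H, W, 3) uint8 array and return a tuple
    whose first element is the augmented uint8 image, like `hair_aug_ocv`.
    """
    def preprocessing_fn(x):
        # Assumption: x is a float array with values in the 0..255 range
        img = np.clip(x, 0, 255).astype(np.uint8)
        img_aug = augment_fn(img)[0]
        # Return an array with the same shape and dtype as the input
        return img_aug.astype(x.dtype)
    return preprocessing_fn

# Stand-in augmentation for this example: blacks out the top-left pixel
def dummy_aug(img):
    out = img.copy()
    out[0, 0] = 0
    return out, 1

fn = make_preprocessing_fn(dummy_aug)
x = np.full((4, 4, 3), 128.0, dtype=np.float32)
y = fn(x)
```

With the real augmentation this would be passed as `preprocessing_function=make_preprocessing_fn(hair_aug_ocv)`. Keep in mind that `hair_aug_ocv` scales the hairs by the global `im_size`, so the `target_size` used in `flow_from_directory` should match it. Otherwise a hair image can end up larger than the input image and the random placement will fail.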

In [None]:
# With the default settings the images are resized to 256x256 and served in batches of 32
data = train_augmenter.flow_from_directory("/kaggle/input/sample-skin/")

In [None]:
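import matplotlib.pyplot as plt

def plotImages(images_arr):
    # Show up to five images side by side
    _, axes = plt.subplots(1, 5, figsize=(20, 20))
    axes = axes.flatten()
    for img, ax in zip(images_arr, axes):
        ax.imshow(img)
    plt.tight_layout()
    plt.show()


# data[0] is the first batch as (images, labels). Each access re-applies the random
# transforms, so this collects five different augmentations of the second image in the batch.
augmented_images = [data[0][0][1] for _ in range(5)]
plotImages(augmented_images)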

In [None]:
data[0][0].shape

In [None]:
plt.imshow(data[0][0][0])