# Filter Application and Data Visualization

## Goals
This notebook covers image preprocessing that can benefit whichever model you use, whether a typical CNN, a U-Net, or anything else.
1. Analyze and get a feel for the data
2. Filtration and Image Normalization:
    1. Application of image zero-padding using numpy.pad
    2. Application of Contrast Limited Adaptive Histogram Equalization (referred to as CLAHE from now on)<sup>1</sup>
3. Apply scientifically described and trained models to identify and separate medical devices from organs <sup>2</sup>
4. Discuss image filtration techniques to apply before convolutions

<sup>1</sup> http://www.cs.unc.edu/techreports/90-035.pdf

<sup>2</sup> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6113157/

If you are interested in other applications of CLAHE in Python:

This one describes a model of CLAHE, and some of the limits of traditional models: https://towardsdatascience.com/increase-your-face-recognition-models-accuracy-by-improving-face-contrast-a3e71bb6b9fb

This one describes the OpenCV application of adaptive histogram equalization on grayscale images: https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_histograms/py_histogram_equalization/py_histogram_equalization.html

Same as above, but in "layman's terms" (is anything AI ever in layman's terms?): https://www.geeksforgeeks.org/clahe-histogram-eqalization-opencv/

The scikit-image implementation of CLAHE (not used below): https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist

## Imports

In [None]:
# Basics
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os # File/directory scanning and editing

# Image Processing
from PIL import Image 
import cv2 as cv

# Image Displaying
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

## Directory Structure

In [None]:
# File Folders:
for dirname, _, filenames in os.walk('/kaggle/input'):
    print(dirname)
#     for filename in filenames:
#         print(os.path.join(dirname, filename))

In [None]:
# Files
display_folders = "n" # y = display, anything else = no
if display_folders == 'y':
    for dirname, _, filenames in os.walk('/kaggle/input'):
        print("-"*30, "\n"*10,"-"*30, sep = "\n")
        for filename in filenames:
            print(os.path.join(dirname, filename))

## Image Padding and Contrast Limited Adaptive Histogram Equalization (CLAHE)

### Image Padding:

1. We have to figure out the maximum dimensions of the images
2. We convert images to a numpy array
3. We apply the numpy.pad function, and voilà, you have your padded image
4. Optional: Convert numpy array to image type, or so that it is accessible by a library (such as PIL or OpenCV)
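
The steps above can be sketched on synthetic arrays before touching the real data. This is a minimal illustration, not the notebook's final code: the helper name `pad_to_shape`, the `fill` parameter, and the tiny array sizes are assumptions made for the example. It pads on the bottom and right only, which is also what the cells further down do.

```python
import numpy as np

def pad_to_shape(img, target_height, target_width, fill=0):
    """Pad a 2-D grayscale array on the bottom and right up to the target shape."""
    height, width = img.shape
    if height > target_height or width > target_width:
        raise ValueError("image is larger than the target shape")
    pad_rows = (0, target_height - height)  # (before, after) along axis 0
    pad_cols = (0, target_width - width)    # (before, after) along axis 1
    return np.pad(img, (pad_rows, pad_cols), mode="constant", constant_values=fill)

# Synthetic stand-ins for real images (hypothetical sizes)
images = [
    np.full((3, 5), 200, dtype=np.uint8),
    np.full((4, 2), 100, dtype=np.uint8),
]

# Step 1: maximum dimensions across all images
max_height = max(img.shape[0] for img in images)
max_width = max(img.shape[1] for img in images)

# Step 3: pad every image to the common shape
padded = [pad_to_shape(img, max_height, max_width) for img in images]
print([p.shape for p in padded])
```

Every padded array ends up with the same shape, so the images can be stacked into a single batch afterwards.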

### A "quick" discussion on image resizing/padding:


* Should I resize my images, pad my images, or both? Each approach has pros and cons:
1. Resizing:
    * Shrinking:
        * Pros:
            * Speeds up training process
            * Reduces RAM requirement
            * Allows for larger batch sizes
            * Lets the network focus on larger structures in the image
        * Cons:
            * Reduces image quality
            * Removes finer details of the image
    * Enlarging:
        * Pros:
            * Keeps all details of image
            * Allows for more pooling layers/convolutional layers, allowing for more sophisticated networks
            * Allows network to train on finer details of image
        * Cons:
            * Increases training time
            * Uses a large amount of RAM
            * Smaller batch sizes often required
            * Images are often pixelated and/or stretched
                * **Note:** Images are not necessarily stretched in either scenario: one can shrink or enlarge while keeping the same aspect ratio, and then pad the remaining pixels.
                    * Example: Say I have an image 100 pixels wide and 200 pixels tall, and I want to shrink it to 50 by 50. I have two options:
                        1. No Padding:
                            * Reduce the width by a factor of 2, and the height by a factor of 4.
                        2. Padding:
                            * Reduce both the width and the height by a factor of 4, giving a 25 by 50 image
                            * Pad the remaining 25 by 50 region of the photo as desired
2. Padding:
    * Pros:
        * Keeps image aspect ratios
        * Retains all fine details
    * Cons:
        * Photos must be padded to the size of the largest photograph
            * ***Common Mistake:*** This must be the largest photograph across both the training and test sets
            * Depending on the sets, the padded images can take up a lot of RAM, or very little
3. Combination of Resizing and Padding:
    * Pros and Cons:
        * Depend on situation/circumstances
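
The combined approach (shrink while keeping the aspect ratio, then pad) can be sketched for the 100 by 200 example from the note above. This is only an illustration: the function name `shrink_and_pad` is hypothetical, and strided subsampling is used as a crude stand-in for a proper resize such as OpenCV's `cv.resize`, so that the example needs nothing beyond numpy.

```python
import numpy as np

def shrink_and_pad(img, target_size, fill=0):
    """Shrink by one integer factor on both axes, then pad to a square."""
    height, width = img.shape
    # Smallest integer factor that makes the longer side fit (ceiling division)
    factor = max(1, -(-max(height, width) // target_size))
    # Keep every factor-th pixel; a real pipeline would use a proper resize here
    small = img[::factor, ::factor]
    pad_rows = target_size - small.shape[0]
    pad_cols = target_size - small.shape[1]
    return np.pad(small, ((0, pad_rows), (0, pad_cols)), constant_values=fill)

# 100 pixels wide and 200 pixels tall, so the array shape is (200, 100)
image = np.full((200, 100), 255, dtype=np.uint8)
result = shrink_and_pad(image, 50)
print(result.shape)
```

Both sides shrink by the same factor of 4, so nothing is stretched, and the remaining 25 columns on the right are filled with the padding value.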

In [None]:
## Step 1:

picture_height = 3567 # Manually set
picture_width = 3827 # Manually set

## Note: The commented-out code below finds the largest dimensions.
##   It takes about 5-10 minutes to run, so to save valuable kernel time,
##   I manually set the values above to the result it produced.

## Code to identify the largest photo
## (train_path and test_path must first be set to the train and test image folders)
# for dirname, _, filenames in os.walk(train_path):
#     for filename in filenames:
#         temp_img = mpimg.imread(os.path.join(dirname, filename))
#         (temp_height, temp_width) = temp_img.shape
#         if temp_height > picture_height:
#             picture_height = temp_height
#         if temp_width > picture_width:
#             picture_width = temp_width

# for dirname, _, filenames in os.walk(test_path):
#     for filename in filenames:
#         temp_img = mpimg.imread(os.path.join(dirname, filename))
#         (temp_height, temp_width) = temp_img.shape
#         if temp_height > picture_height:
#             picture_height = temp_height
#         if temp_width > picture_width:
#             picture_width = temp_width
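
A possibly faster variant of the scan above is sketched here. It relies on the fact that PIL's `Image.open` only reads the file header and does not decode pixel data until it is needed, so `img.size` is available cheaply. The function name `largest_dimensions` is hypothetical, and I have not timed it on this dataset, so treat the speed-up as an expectation rather than a measurement.

```python
import os
from PIL import Image

def largest_dimensions(*folders):
    """Return (max_height, max_width) over every readable image under the folders."""
    max_height = max_width = 0
    for folder in folders:
        for dirname, _, filenames in os.walk(folder):
            for filename in filenames:
                path = os.path.join(dirname, filename)
                try:
                    # Only the header is read here; the pixels are not decoded
                    with Image.open(path) as img:
                        width, height = img.size  # PIL reports (width, height)
                except OSError:
                    continue  # skip files that PIL cannot identify as images
                max_height = max(max_height, height)
                max_width = max(max_width, width)
    return max_height, max_width

# Hypothetical usage, assuming train_path and test_path are defined:
# picture_height, picture_width = largest_dimensions(train_path, test_path)
```

Passing both folders in one call guards against the common mistake mentioned earlier, since the maximum is taken over the training and test sets together.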

In [None]:
# Step 2:

# We will use two example images, labeled img_1 and img_2.
# We will read them, and then convert them to numpy arrays.

# img_1 = "/kaggle/input/ranzcr-clip-catheter-line-classification/train/1.2.826.0.1.3680043.8.498.17952552645001544825751321016030941058.jpg"
# img_2 = '/kaggle/input/ranzcr-clip-catheter-line-classification/train/1.2.826.0.1.3680043.8.498.10370758874574386468962321364924311754.jpg'


# For this purpose, I will use the PIL (Pillow) library

img_1 = Image.open("/kaggle/input/ranzcr-clip-catheter-line-classification/train/1.2.826.0.1.3680043.8.498.17952552645001544825751321016030941058.jpg")
img_2 = Image.open('/kaggle/input/ranzcr-clip-catheter-line-classification/train/1.2.826.0.1.3680043.8.498.10370758874574386468962321364924311754.jpg')

# print(img_1.mode) # Prints the mode of the images (RGB, HSV, L, P, ...)
# print(img_2.mode)
# # The mode of the photos is "L", which are grayscale images with 8-bit pixels

img_1_np = np.array(img_1)
img_2_np = np.array(img_2)

In [None]:
# Step 3

img_1_np = np.pad(img_1_np, ((0, picture_height - img_1_np.shape[0]),(0, picture_width - img_1_np.shape[1])))
img_2_np = np.pad(img_2_np, ((0, picture_height - img_2_np.shape[0]),(0, picture_width - img_2_np.shape[1])))
print(img_1_np, "\n\n\n")
print(img_2_np)

In [None]:
# Optional Step 4:
# Now we can convert back to a PIL Image and display it using matplotlib, along with the original image:

new_img_1 = Image.fromarray(img_1_np)
new_img_2 = Image.fromarray(img_2_np)

# You can see the padding below by comparing the two images.
# The white space is space that is not part of the photo,
# and the black is padding

fig, ((ax1, ax2),(ax3, ax4)) = plt.subplots(nrows = 2, ncols = 2, sharex = True, sharey = True, figsize = (15, 15), dpi = 150, num = 1)
ax1.imshow(img_1, cmap = "gray") # Images are gray scale, ensuring that matplotlib displays them as such
ax2.imshow(new_img_1, cmap = "gray")
ax3.imshow(img_2, cmap = "gray")
ax4.imshow(new_img_2, cmap = "gray")

### Applying Contrast Limited Adaptive Histogram Equalization (CLAHE)

In [None]:
clahe = cv.createCLAHE(clipLimit=15.0, tileGridSize=(8,8))

clahe_img_1 = clahe.apply(img_1_np)
clahe_img_2 = clahe.apply(img_2_np)

fig, ((new_ax1,  new_ax2),(new_ax3, new_ax4)) = plt.subplots(nrows = 2, ncols = 2, sharex = True, sharey = True, figsize = (20, 20), dpi = 150, num = 1)
new_ax1.imshow(clahe_img_1, cmap = "gray") # Images are gray scale, ensuring that matplotlib displays them as such
new_ax2.imshow(new_img_1, cmap = "gray")
new_ax3.imshow(clahe_img_2, cmap = "gray")
new_ax4.imshow(new_img_2, cmap = "gray")

# Images on the left have CLAHE applied; images on the right are the padded images without it
# I recommend clipLimit values from 2 to 20
# Notice the difference in how well it works. In the top image,
# you get black spots for seemingly no reason, yet in the bottom
# image the spine becomes clearer.
# Most importantly, the catheter/tubing lines become clearer in both images.
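
To get a feel for what the clip limit does, here is a simplified sketch of the contrast-limiting step on a single tile, written with numpy only. It is not OpenCV's implementation: real CLAHE works on a grid of tiles and interpolates between them, and the function name `clipped_equalize` and the synthetic tile are assumptions made for the example.

```python
import numpy as np

def clipped_equalize(tile, clip_limit=2.0):
    """Contrast-limited histogram equalization of a single uint8 tile."""
    hist = np.bincount(tile.ravel(), minlength=256).astype(np.float64)
    # Cap each bin at clip_limit times the average bin count
    ceiling = clip_limit * tile.size / 256
    excess = np.sum(np.maximum(hist - ceiling, 0))
    hist = np.minimum(hist, ceiling)
    # Spread the clipped counts evenly over all 256 bins
    hist += excess / 256
    # Map intensities through the normalized cumulative histogram
    cdf = np.cumsum(hist)
    lookup = np.round(255 * cdf / cdf[-1]).astype(np.uint8)
    return lookup[tile]

# Low-contrast synthetic tile: every pixel lies between 100 and 110
rng = np.random.default_rng(0)
tile = rng.integers(100, 111, size=(64, 64), dtype=np.uint8)

mild = clipped_equalize(tile, clip_limit=1.0)
strong = clipped_equalize(tile, clip_limit=15.0)
print(int(mild.max()) - int(mild.min()), int(strong.max()) - int(strong.min()))
```

A low clip limit flattens the histogram before equalization, so the mapping stays close to the identity. A higher limit lets dominant intensities stretch further, which increases contrast but can also amplify noise.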

## Medically Valid Models

You can apply some of the following models for further image preprocessing:
* https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6113157/
* https://arxiv.org/pdf/2011.07394.pdf

In addition, one can apply further image filtration:
* https://humanhealth.iaea.org/HHW/MedicalPhysics/TheMedicalPhysicist/Studentscorner/HandbookforTeachersandStudents/Chapter_17.pdf