This kernel introduces preprocessing procedures for cervical images, including ROI (region of interest) cropping and specular reflection removal. The main ideas come from two papers; please read them for more detailed information:

- Ref. 1 [Automatic Detection of Anatomical Landmarks in Uterine Cervix Images](https://www.researchgate.net/profile/Sameer_Antani/publication/24041301_Automatic_Detection_of_Anatomical_Landmarks_in_Uterine_Cervix_Images/links/0deec51cdce9261312000000/Automatic-Detection-of-Anatomical-Landmarks-in-Uterine-Cervix-Images.pdf)

- Ref. 2 [Automatic Detection of Specular Reflections in Uterine Cervix Images](http://paloma.eng.tau.ac.il/research/laboratories/mip_lab/publications/cervix_sr_spie06.pdf)

## ROI Cropping ##

In [None]:
# import packages
import pandas as pd
import numpy as np
import skimage
from skimage import io, transform, morphology, segmentation, measure, color  # color is needed for rgb2lab
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import MinMaxScaler
import glob
import os
import matplotlib.pyplot as plt
%matplotlib inline

DIR_BASE = '../input'
imgs = glob.glob(os.path.join(DIR_BASE, 'train/Type_1', '*.jpg'))
num_imgs = len(imgs)
print('Number of Type_1 images in the training set: {}'.format(num_imgs))

Let's first define a function that reads an image into a NumPy array and resizes it so that its shorter edge is *SHORTER_EDGE* pixels long, then show a sample image.

In [None]:
SHORTER_EDGE = 400  # image's shorter edge after resizing
def read_img(img_name):
    img = skimage.io.imread(img_name)
    h0, w0, _ = img.shape
    if h0 >= w0:
        w = SHORTER_EDGE
        h = int(h0/w0*w)
    else:
        h = SHORTER_EDGE
        w = int(w0/h0*h)
    img = transform.resize(img, (h,w))
    img_lab = skimage.color.rgb2lab(img) # Later the LAB color space will be used.

    return img, img_lab

img_name = imgs[60]
img, img_lab = read_img(img_name)
plt.imshow(img)
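
The target size computed inside *read_img* keeps the aspect ratio while fixing the shorter edge. As a minimal sketch, the same arithmetic can be checked on its own without loading any image. The helper name `resized_shape` is hypothetical and is not part of the kernel.

```python
def resized_shape(h0, w0, shorter_edge=400):
    """Return (h, w) after resizing so that the shorter edge equals shorter_edge."""
    if h0 >= w0:
        # Portrait or square image: the width is the shorter edge.
        w = shorter_edge
        h = int(h0 / w0 * w)
    else:
        # Landscape image: the height is the shorter edge.
        h = shorter_edge
        w = int(w0 / h0 * h)
    return h, w

print(resized_shape(3000, 2000))  # -> (600, 400)
print(resized_shape(2000, 3000))  # -> (400, 600)
```

Because of the `int()` truncation, the longer edge may come out one pixel shorter than the exact ratio would give.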

As *Ref. 1* suggests:

> The cervix region is a relatively pink region located near the image center.  We use two features: the *A* color channel of the *LAB* color space (the higher the value of *A*, the “redder” the pixel color) and the distance *R* of a pixel from the image center. The *R* feature provides spatial information and supports the extraction of continuous regions within the image plane. The image is separated next into two clusters in the 2-D (*A*-*R*) feature space; we use Gaussian mixture modeling, initialized by a K-means procedure, as a statistical clustering methodology. The cluster that has the highest *A*-mean and the lowest *R*-mean is selected. The ROI is chosen as the largest connected component within the pixels associated with this cluster. 

Let's implement this idea in Python. The following function *find_roi_by_gsmix* clusters the image's pixels into two classes in the (*R*, *A*) feature space and returns a mask that represents the ROI. The distance *R* is measured from the image center or, if a mask is provided, from the centroid of that mask.

In [None]:
# Find the ROI using Gaussian mixture modeling
def find_roi_by_gsmix(img, mask=None):
    h,w,_ = img.shape
    x_coor = np.repeat(range(h), w)  # row index of every pixel, for calculating R
    y_coor = np.tile(range(w), h)  # column index of every pixel
    if mask is None:
        center = [h/2, w/2]
    else:
        mask = mask.reshape(-1)
        center = [np.mean(x_coor[mask==1]), np.mean(y_coor[mask==1])]
    R = np.sqrt((x_coor-center[0])**2 + (y_coor-center[1])**2)  # R
    # Compute the A channel from the image passed in rather than from a global variable
    A = skimage.color.rgb2lab(img)[:,:,1].reshape(-1)  # A
    Ra = np.vstack([R, A]).T  # concat R and A

    scaler = MinMaxScaler()
    Ra = scaler.fit_transform(Ra)
    gs_mix = GaussianMixture(n_components=2, random_state=42, init_params='kmeans') # Gaussian mixture model
    gs_mix.fit(Ra)
    labels = gs_mix.predict(Ra)
    
    # The cluster with the lowest R-mean is chosen as the ROI and relabeled as 1
    means = gs_mix.means_
    if means[0,0] < means[1,0]:
        labels = 1 - labels
    mask = labels.reshape(h, w)

    return mask

roi_mask = find_roi_by_gsmix(img)
plt.imshow(roi_mask, 'gray')
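
The label flip at the end of *find_roi_by_gsmix* is easy to misread, so here is a minimal sketch of the same selection rule on hand-made values. The helper name `roi_labels_from_clusters` and the numbers in `means` are hypothetical and are not fitted from any image. The columns of `means` are assumed to be ordered (*R*, *A*), as in the function above.

```python
import numpy as np

def roi_labels_from_clusters(labels, means):
    """Return a 0/1 array where 1 marks the cluster with the smaller R-mean."""
    labels = np.asarray(labels)
    means = np.asarray(means)
    roi_cluster = int(np.argmin(means[:, 0]))  # cluster closest to the center
    return (labels == roi_cluster).astype(int)

# Hypothetical cluster means: cluster 0 is central (low R) and redder (high A).
means = np.array([[0.2, 0.7],
                  [0.8, 0.4]])
labels = np.array([0, 0, 1, 1, 0])
roi = roi_labels_from_clusters(labels, means)
print(roi)  # -> [1 1 0 0 1]
```

For two clusters this gives the same result as `labels = 1 - labels` when cluster 0 has the lower *R*-mean. Like the kernel's code, it looks only at *R*, while the paper also requires the highest *A*-mean.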

The result looks fairly reasonable, but it is not enough: some postprocessing is needed. First, we remove the small isolated regions in the mask shown above so that only the largest connected region is kept. The following function *postprocess_mask* does this: it erodes the mask with a small disk, keeps the largest connected component, and dilates the result back. If you also want an additional morphological opening (erosion followed by dilation with a larger disk), set the *morp* parameter to *True* when you call the function.

In [None]:
def postprocess_mask(mask, morp=False):    
    
    selem = skimage.morphology.disk(5)
    mask = skimage.morphology.binary_erosion(mask,selem)
    
    labels_mask = measure.label(mask)
    regions = measure.regionprops(labels_mask)
    regions.sort(key=lambda x: x.area, reverse=True)
    if len(regions) > 1:
        for rg in regions[1:]:
            labels_mask[rg.coords[:,0], rg.coords[:,1]] = 0
    labels_mask[labels_mask!=0] = 1
    mask = labels_mask
    
    selem = skimage.morphology.disk(5)
    mask = skimage.morphology.binary_dilation(mask,selem)

    if morp:
        selem = skimage.morphology.disk(10)
        mask = skimage.morphology.binary_erosion(mask,selem)
        mask = skimage.morphology.binary_dilation(mask,selem)

    return mask

mask = postprocess_mask(roi_mask, True)
plt.imshow(mask, 'gray')
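
The "keep only the largest region" step can also be written more compactly. The sketch below is an alternative to the `regionprops` loop in *postprocess_mask*, not the kernel's own code. It counts the pixels per label with `np.bincount`, and the helper name `keep_largest_component` is hypothetical.

```python
import numpy as np
from skimage import measure

def keep_largest_component(mask):
    """Return a boolean mask containing only the largest connected region."""
    labels = measure.label(mask)           # label 0 is the background
    if labels.max() == 0:                  # empty mask: nothing to keep
        return np.zeros(mask.shape, dtype=bool)
    sizes = np.bincount(labels.ravel())    # pixel count per label
    sizes[0] = 0                           # ignore the background count
    return labels == sizes.argmax()

toy = np.zeros((6, 6), dtype=int)
toy[0:2, 0:2] = 1   # small blob, 4 pixels
toy[3:6, 2:6] = 1   # large blob, 12 pixels
largest = keep_largest_component(toy)
print(largest.sum())  # -> 12
```

Unlike *postprocess_mask*, this sketch does not erode or dilate, so thin bridges between regions are not broken before labeling.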

Now we can crop the ROI out of the image. First, we need a function that finds the rectangle used for cropping: the smallest axis-aligned rectangle that encloses the mask.

In [None]:
def find_rectangle_border(mask):
    # Rows and columns that contain at least one mask pixel.
    # Using np.any avoids dropping rows/columns where the mask touches the image border.
    rows = np.flatnonzero(np.any(mask, axis=1))
    cols = np.flatnonzero(np.any(mask, axis=0))
    up, down = rows[0], rows[-1] + 1  # 'down' and 'right' are exclusive slice ends
    left, right = cols[0], cols[-1] + 1

    return up, down, left, right

def segment_img(img, rectangle):
    up, down, left, right = rectangle
    img_out = img[up:down, left:right]

    return img_out

rect = find_rectangle_border(mask)
img_out = segment_img(img, rect)
plt.imshow(img_out)
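
Note that *segment_img* crops the resized image returned by *read_img*. If the crop is wanted at the original resolution instead, one option is to scale the rectangle back up. The sketch below assumes the resize preserved the aspect ratio. The helper name `scale_rectangle` and the example shapes are hypothetical.

```python
import math

def scale_rectangle(rect, small_shape, full_shape):
    """Map (up, down, left, right) from the resized image to the full-size one."""
    up, down, left, right = rect
    sy = full_shape[0] / small_shape[0]  # vertical scale factor
    sx = full_shape[1] / small_shape[1]  # horizontal scale factor
    # Round outwards so the scaled rectangle still covers the ROI, then clamp.
    up_f = max(0, math.floor(up * sy))
    down_f = min(full_shape[0], math.ceil(down * sy))
    left_f = max(0, math.floor(left * sx))
    right_f = min(full_shape[1], math.ceil(right * sx))
    return up_f, down_f, left_f, right_f

rect_full = scale_rectangle((10, 390, 20, 580), (400, 600), (2000, 3000))
print(rect_full)  # -> (50, 1950, 100, 2900)
```

The returned tuple has the same layout as the output of *find_rectangle_border*, so it can be passed to *segment_img* together with the full-size image.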

Yes, we did it! You may notice some bright, shiny spots in the image. These are specular reflections, which interfere with many computer vision tasks. It is better to identify and remove them.

## Specular Reflection Removal ##