In this notebook, I've tried to break down Stephen Bailey's fantastic notebook (https://www.kaggle.com/stkbailey/teaching-notebook-for-total-imaging-newbies) to make the concepts simpler to understand.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

# Any results you write to the current directory are saved as output.

In [None]:
#Importing the other necessary libraries
import pathlib
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

**Reading the image**

In [None]:
#Importing OpenCV - the computer vision library
import cv2

We will use the same example image as the original notebook. A different picture can be tried in future work.

In [None]:
# Glob the training data and load a single image path
training_paths = pathlib.Path('../input/stage1_train').glob('*/images/*.png')
training_sorted = sorted(training_paths)
im_path = training_sorted[45]

In [None]:
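#To read the image
bgrimg = cv2.imread(str(im_path))
#OpenCV returns BGR while matplotlib expects RGB, so the channels are converted for display only
plt.imshow(cv2.cvtColor(bgrimg, cv2.COLOR_BGR2RGB))
plt.xticks([]) #To get rid of the x-ticks and y-ticks on the image axis
plt.yticks([])
print('Original Image Shape',bgrimg.shape)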

In [None]:
#To see the structure of the image let's display one row of the image matrix
print('The first row of the image matrix contains',len(bgrimg[0]),'pixels') #Index 0 is the first row
print(bgrimg[0])

The image has been read in the BGR colorspace, which is the default colorspace in which OpenCV reads images. We have a third dimension because every pixel is represented by its B, G and R components. A particular BGR/RGB color space is defined by the three chromaticities of the red, green, and blue additive primaries, and can produce any chromaticity that lies within the triangle defined by those primary colors. In simpler terms, an RGB color space can be understood as all possible colors that can be made from three colored lights for red, green, and blue. For more information: https://en.wikipedia.org/wiki/RGB_color_space
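
As a small illustration of the channel order, here is a sketch on a made-up 1x2 image (the array below is hypothetical, not taken from the dataset). Reversing the last axis turns a BGR array into RGB, which is the order matplotlib expects:

```python
import numpy as np

# Hypothetical 1x2 image in OpenCV's BGR order: one pure-blue and one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB order
rgb = bgr[..., ::-1]

print(bgr.shape)   # (rows, columns, channels)
print(rgb[0, 0])   # the blue pixel, now written as R, G, B
print(rgb[0, 1])   # the red pixel, now written as R, G, B
```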

## Basic Solution

In [None]:
#To transform the colorspace from BGR to grayscale so as to make things simpler
grayimg = cv2.cvtColor(bgrimg,cv2.COLOR_BGR2GRAY)

In [None]:
#To plot the image
plt.imshow(grayimg,cmap='gray') #cmap='gray' is needed because matplotlib otherwise applies its default (non-gray) colormap to single-channel images
plt.xticks([]) #To get rid of the x-ticks and y-ticks on the image axis
plt.yticks([])
print('New Image Shape',grayimg.shape)

It is important to understand the structure of the image here. We lost a dimension when we transformed from the BGR colorspace to grayscale. Why did this happen? Grayscale is a range of monochromatic shades from black to white, so a grayscale image contains only shades of gray and no color. Transforming the colorspace removes all color information, leaving only the luminance of each pixel. Since digital images are displayed using a combination of red, green, and blue (RGB) colors, each pixel has three separate channel values, and these three values must be combined into a single value when removing color from an image. Luminance can also be described as brightness or intensity, which can be measured on a scale from black (zero intensity) to white (full intensity).
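
To make "combining three values into one" concrete, here is a minimal NumPy sketch. It assumes the commonly used luminance weights 0.299 (R), 0.587 (G) and 0.114 (B); the toy array is made up, and the result may differ from `cv2.cvtColor` in rounding details.

```python
import numpy as np

# Hypothetical 1x3 BGR image: a pure-blue, a pure-green and a white pixel
bgr = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=np.uint8)

# Assumed luminance weights, listed in B, G, R order to match the array layout
weights = np.array([0.114, 0.587, 0.299])

# The weighted sum over the channel axis collapses (rows, cols, 3) to (rows, cols)
gray = np.rint(bgr @ weights).astype(np.uint8)

print(gray.shape)
print(gray)
```

Green contributes most to the result and blue least, which is why the pure-green pixel ends up much brighter than the pure-blue one.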

In [None]:
#To understand this further, let's display one entire row of the image matrix
print('The first row of the image matrix contains',len(grayimg[0]),'pixels') #Index 0 is the first row
print(grayimg[0])

This displays one entire row of the image matrix, with the luminance (intensity) of every pixel.

**Removing the background**

In [None]:
#Okay let's look at the distribution of the intensity values of all the pixels
#sns.histplot is used because sns.distplot is deprecated in recent seaborn releases
plt.figure(figsize=(10,5))

plt.subplot(1,2,1)
sns.histplot(grayimg.flatten(),bins=50) #flatten() puts the intensity values of all the pixels in one single row vector
plt.title('Distribution of intensity values')

#To zoom in on the distribution and see if there is more than one prominent peak
plt.subplot(1,2,2)
sns.histplot(grayimg.flatten(),bins=50)
plt.ylim(0,30000)
plt.title('Distribution of intensity values (Zoomed In)')

We can see that there are 2 prominent peaks. The count of pixels with intensity values around 0 is extremely high (around 250000). We would expect this, as the nuclei cover a smaller portion of the picture than the background, which is primarily black. Our job here is to separate the two, that is, separate the nuclei from the background. The optimal separation value is somewhere around 20, but rather than reading it off the histogram by eye, we should take a more formal approach such as Otsu's method.
Otsu's method, named after Nobuyuki Otsu, is used to automatically perform clustering-based image thresholding, that is, the reduction of a graylevel image to a binary image. The algorithm assumes that the image contains two classes of pixels following a bimodal histogram (foreground pixels and background pixels). It then calculates the optimum threshold separating the two classes so that their combined spread (intra-class variance) is minimal, or equivalently, so that their inter-class variance is maximal. Otsu's method performs relatively well if the histogram can be assumed to be bimodal with a deep and sharp valley between the two peaks (source: https://en.wikipedia.org/wiki/Otsu%27s_method)
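
The idea can be sketched in a few lines of NumPy. This is a simplified illustration on synthetic data, not the `skimage` implementation; the function name `otsu_threshold` and the toy intensities are made up for the example. For every candidate threshold it computes the between-class variance and keeps the candidate where it is largest.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Return the bin centre that maximises the between-class variance."""
    counts, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2

    # Size and mean intensity of the class at or below each candidate bin
    w0 = np.cumsum(counts)
    sum0 = np.cumsum(counts * centers)
    m0 = sum0 / np.maximum(w0, 1)

    # Size and mean intensity of the class above each candidate bin
    w1 = w0[-1] - w0
    m1 = (sum0[-1] - sum0) / np.maximum(w1, 1)

    # Between-class variance (up to a constant factor) for every candidate
    between = w0 * w1 * (m0 - m1) ** 2
    return centers[np.argmax(between)]

# Synthetic "image": many dark background pixels and fewer bright nucleus pixels
rng = np.random.default_rng(0)
dark = rng.normal(10, 3, 9000)
bright = rng.normal(80, 10, 1000)
values = np.concatenate([dark, bright])

t = otsu_threshold(values)
print('Threshold found by the sketch:', t)
```

On data like this, with two well-separated peaks, almost every dark value falls below the threshold and almost every bright value above it.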

In [None]:
from skimage.filters import threshold_otsu
thresh_val = threshold_otsu(grayimg)
print('The optimal separation value is',thresh_val)

Now we'll use the np.where function to encode all pixels with an intensity value greater than the threshold value as 1 and all other pixels as 0. The result will be stored in a variable called mask.

In [None]:
mask=np.where(grayimg>thresh_val,1,0)

In [None]:
#To plot the original image and mask side by side
plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.imshow(grayimg,cmap='gray')
plt.title('Original Image')

plt.subplot(1,2,2)
maskimg = mask.copy()
plt.imshow(maskimg, cmap='viridis')
plt.title('Mask')


We see that the mask has done a decent job. If these two images were to appear in a newspaper column titled 'Spot the difference' (ignoring the obvious colour difference), they would have people scratching their heads in frustration. However, a more careful look suggests that the mask hasn't found all the nuclei, especially the two in the top right corner. Around the (500,400) mark, three nuclei have been combined into one cluster. The darker coloured nuclei are causing a problem, as the pixels that represent these nuclei have intensity values lower than Otsu's threshold value.

In [None]:
#Let's see if K-Means does a good job on this data
from sklearn.cluster import KMeans
kmeans=KMeans(n_clusters=2) #2 as we're still trying to separate the lighter coloured nuclei from the darker coloured background
kmeans.fit(grayimg.reshape(-1,1)) #One row per pixel, one column holding its intensity

plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.imshow(kmeans.labels_.reshape(grayimg.shape),cmap='magma') #Back to the image shape instead of hard-coding (520,696)
plt.title('K-Means')

plt.subplot(1,2,2)
plt.imshow(maskimg, cmap='viridis')
plt.title('Mask with Otsu Separation')

It's extremely hard to tell if there's a difference. Let's check by comparing the labels of Otsu and K-Means at a pixel level, summing over the booleans and dividing by the total number of pixels in the image. If the result is 1, there is no difference at all.

In [None]:
#To check if there's any difference (K-Means numbers its clusters arbitrarily, so swapped labels are allowed for)
agreement = np.mean(kmeans.labels_.reshape(mask.shape)==mask)
max(agreement, 1-agreement)

There is no difference at all. For a deeper explanation of why this could have happened, one may read D Liu's paper (http://ieeexplore.ieee.org/document/5254345/?reload=true), which compares K-Means with Otsu's method.
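
One intuition for the agreement (offered as a sketch, not as a summary of the paper): with two clusters on one-dimensional intensities, K-Means assigns each pixel to the nearer centroid, which amounts to thresholding at the midpoint between the two centroids. The helper `two_means_threshold` and the toy values below are made up for illustration and are not scikit-learn's implementation.

```python
import numpy as np

def two_means_threshold(values, n_iter=100):
    """Two-cluster K-Means on 1-D data, expressed as a threshold."""
    values = np.asarray(values, dtype=float)
    threshold = values.mean()  # starting guess; assumes the values are not all equal
    for _ in range(n_iter):
        low = values[values <= threshold]   # pixels nearer the darker centroid
        high = values[values > threshold]   # pixels nearer the brighter centroid
        new_threshold = (low.mean() + high.mean()) / 2
        if new_threshold == threshold:      # assignments have stopped changing
            break
        threshold = new_threshold
    return threshold

toy = [0, 1, 2, 10, 11, 12]
print(two_means_threshold(toy))
```

Both methods therefore end up choosing a single cut on the intensity axis, so on an image with two well-separated peaks it is not surprising that the cuts coincide.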

**Object identification**

To get a count of the total number of nuclei, we can use the ndimage.label function which labels features (pixels) in an array based on their interconnectedness. So for example if [1 1 1 0 0 1 1] was our row vector, using ndimage.label on this would give us [1 1 1 0 0 2 2] signifying the fact that there are 2 distinct objects in the row vector. The function returns the labeled array and the number of distinct objects it found in the array.
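
The row-vector example can be run directly. The second part of this sketch uses a made-up 2x2 grid to show that, with the default settings, pixels that only touch diagonally are treated as separate objects.

```python
import numpy as np
from scipy import ndimage

# The row vector from the text
row = np.array([1, 1, 1, 0, 0, 1, 1])
labeled, n_objects = ndimage.label(row)
print(labeled, n_objects)

# In 2-D the default structuring element only connects pixels that share an edge,
# so these two diagonal pixels are counted as two objects
grid = np.array([[1, 0],
                 [0, 1]])
labeled_grid, n_grid = ndimage.label(grid)
print(n_grid)
```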

In [None]:
from scipy import ndimage
labels,nlabels=ndimage.label(mask)
print('There are',nlabels,'distinct nuclei in the mask.')

Now, there could be more nuclei than that, as some nuclei have been combined into one and our mask hasn't been able to identify all the nuclei, especially the ones in the top right corner. However, the 2 separate spots in the top right corner get labelled as 2 different objects.

All in all, the two major problems in this image are:
- Insignificant spots/dots being labelled as nuclei. These spots should have their labels (KMeans, Otsu) set to 0 if their sizes are too small. This problem is caused by nuclei in which only some pixels have intensity values above Otsu's threshold value, so only those pixels have their label encoded as 1.
- Nuclei that are close to one another get clustered into one nucleus. So we need to separate them using some edge detection algorithm (like convolution with a Sobel filter or the Canny edge detector, as suggested by Ramsu)
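
For the first problem, one possible approach is to measure the size of every labelled object and drop the ones below a minimum size. The sketch below is only an illustration: the helper `drop_small_objects`, the toy mask and the `min_size` value are made up, and a sensible size cut-off for real nuclei would have to be chosen by inspecting the data.

```python
import numpy as np
from scipy import ndimage

def drop_small_objects(mask, min_size):
    """Set objects with fewer than min_size pixels to 0."""
    labels, nlabels = ndimage.label(mask)
    sizes = np.bincount(labels.ravel())  # sizes[i] is the pixel count of label i
    keep = sizes >= min_size
    keep[0] = False                      # label 0 is the background
    return keep[labels].astype(mask.dtype)

# Toy mask: one 4-pixel object and one isolated pixel
toy = np.array([[1, 1, 0, 0, 0],
                [1, 1, 0, 0, 1],
                [0, 0, 0, 0, 0]])
cleaned = drop_small_objects(toy, min_size=2)
print(cleaned)
```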

Now for this competition we need to have a separate mask for every nucleus. In the file named 'stage1_train_labels.csv.zip', we have the image IDs in one column and the Run Length Encoded (RLE) vector for one such mask (i.e. for one nucleus) in the other column.

In [None]:
#Since we need to create a separate mask for every nucleus, let's store the masks in an iterable like a list
label_array=[]
#We need to iterate from 1 as ndimage.label encodes every object starting from number 1
for i in range(1,nlabels+1):
    label_mask = np.where(labels==i,1,0)
    label_array.append(label_mask)
#To see one such mask
label_array[68]

The 1s represent 1 such object (nucleus) in the entire picture.

**Run Length Encoding**

Every mask for every nucleus requires an RLE vector. This is the format required by the competition.

What is RLE?

RLE or Run Length Encoding converts a matrix into a vector and, for every series of 1s (an object), returns the position of the first pixel of the series followed by a count of how many consecutive 1s there are from that pixel onwards. In the ndimage.label example of [1 1 1 0 0 1 1], counting positions from zero, RLE would give us 0 3 5 2, which means a series of 3 pixels starting at the zeroth pixel (inclusive) and a series of 2 pixels starting at the 5th pixel. Note that the rle function below counts pixel positions from 1 (it stores b+1) and reads the image column by column, so it would encode the same row vector as 1 3 6 2.
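
The row-vector example can be checked with a short NumPy sketch. The helper `run_lengths` and its `start_index` parameter are made up for illustration. It shows both the zero-based positions and the one-based positions produced by the encoder used in this notebook (which stores `b+1`).

```python
import numpy as np

def run_lengths(vec, start_index=0):
    """Return [start, length, start, length, ...] for every run of 1s in a 1-D vector."""
    # Padding with zeros guarantees every run has a detectable start and end
    padded = np.concatenate([[0], np.asarray(vec), [0]])
    changes = np.flatnonzero(np.diff(padded))  # positions where a run starts or ends
    starts, ends = changes[::2], changes[1::2]
    result = []
    for s, e in zip(starts, ends):
        result.extend([int(s) + start_index, int(e - s)])
    return result

row = [1, 1, 1, 0, 0, 1, 1]
zero_based = run_lengths(row)                 # positions counted from 0
one_based = run_lengths(row, start_index=1)   # positions counted from 1
print(zero_based)
print(one_based)
```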

In [None]:
#Function for rle encoding
def rle(x):
    '''
    x: numpy array of shape (height, width), 1 - mask, 0 - background
    Returns the run lengths as a space-separated string (pixel positions are one-indexed)
    '''
    dots = np.where(x.T.flatten()==1)[0] # .T sets Fortran order down-then-right
    run_lengths = []
    prev = -2
    for b in dots:
        if (b>prev+1): run_lengths.extend((b+1, 0))
        run_lengths[-1] += 1
        prev = b
    return " ".join([str(i) for i in run_lengths])

Credit to Kaggle user rakhlin (https://www.kaggle.com/rakhlin/fast-run-length-encoding-python) for developing this function, which has been used by many Kagglers in this competition.

In [None]:
#Running RLE on the last label_mask in label_array gives us 
rle(label_mask)

**Putting everything together**

In [None]:
im_path.parts

In [None]:
#Now defining a function that is applicable to all images
def basic(im_path):
    #Reading the image
    im_id=im_path.parts[-3] #To extract the image ID
    bgr = cv2.imread(str(im_path)) #Reading it in OpenCV
    gray = cv2.cvtColor(bgr,cv2.COLOR_BGR2GRAY) #Converting everything to grayscale from BGR

    #To remove the background
    thresh_val = threshold_otsu(gray) #Using Otsu's method to separate the foreground objects from the background
    mask = np.where(gray > thresh_val, 1, 0) #Coding objects with intensity values higher than background as 1
    
    #Extracting connected objects
    rows=[] #One dictionary per nucleus (DataFrame.append was removed in pandas 2.0, so rows are collected in a list)
    labels, nlabels = ndimage.label(mask) #labels gives us the label of the different objects in every image starting from 1 and nlabels gives us the total number of objects in every image
    for i in range(1,nlabels+1): #Iterating through every object/label
        label_mask = np.where(labels==i,1,0) #Individual masks for every nucleus
        RLE = rle(label_mask) #RLE for every mask
        rows.append({'ImageId': im_id, 'EncodedPixels': RLE})
    
    #Return the dataframe
    return pd.DataFrame(rows, columns=['ImageId', 'EncodedPixels'])
        

In [None]:
#Defining a function that takes a list of image paths (pathlib.Path objects), analyzes each and returns a submission ready DataFrame
def list_of_images(im_path_list):
    all_dfs = []
    for im_path in im_path_list: #We'll use this for the test images
        all_dfs.append(basic(im_path)) #Creating one dataframe for every image
    
    #Returning the submission ready dataframe (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    return pd.concat(all_dfs, ignore_index=True)

In [None]:
#Final submission
test_images = pathlib.Path('../input/stage1_test/').glob('*/images/*.png')
basic_solution = list_of_images(list(test_images))
basic_solution.to_csv('basic_solution.csv', index=False)

The submission scored 0.201, which gives us our baseline score. Any layer of complexity that we add on top of this should improve on that score; otherwise it is not worth adding.

Some important questions to ask are:

- Will we achieve a satisfactory performance by converting all pictures to grayscale? What are the different types of pictures in the dataset?
- What are the different ways to separate the background from objects of interest? Otsu's method requires computing a graylevel histogram to find the optimum separation value. In that respect KMeans may work better on images that aren't particularly grayscale, or on images where there is no sharp contrast in intensity values between the objects of interest and the background
- What are some useful edge detection algorithms to create boundaries between nuclei that are extremely close to one another?
- How do we as humans identify objects in an image? We indeed take it for granted but if we think of objects as anything that has a fixed shape and size and is prominent with respect to the background, what is the technical (or computer) definition of these terms?