<a href="https://colab.research.google.com/github/constantinpape/dl-teaching-resources/blob/main/exercises/image_analysis/image_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image Manipulation Tutorial

## Downloading the data

We are going to work with the Kaggle 2018 Data Science Bowl data.
To start, go to the [data webpage](https://www.kaggle.com/c/data-science-bowl-2018) and read the data description.

Now let's download the data. To make it easier, we're going to work with a subset of it. 

In [None]:
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1O66UElt2ZfhLXUKKX_nTxmIXh6fMA2rT' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1O66UElt2ZfhLXUKKX_nTxmIXh6fMA2rT" -O kaggle_data.zip && rm -rf /tmp/cookies.txt

Remember that you can execute any bash command from the Notebook if you precede the command name with '!'.

And please check whether the downloaded archive is around 80M (the value after the progress bar [ <=> ]). If the value is much smaller, rerun the previous cell - probably something failed. 
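
If you prefer to check the size from Python rather than reading the wget output, a minimal sketch could look like the following. The helper name `file_size_mb` is made up for this example, and the file name is the one passed to `-O` in the download command.

```python
import os

def file_size_mb(path):
    """Return the size of the file at `path` in mebibytes (1 MiB = 1024**2 bytes)."""
    return os.path.getsize(path) / 1024**2

archive = "kaggle_data.zip"  # file name used by the wget command above
if os.path.exists(archive):
    print(f"{archive}: {file_size_mb(archive):.1f} MiB")
else:
    print(f"{archive} not found, did the download cell run?")
```

Run this before unzipping, because the unzip cell deletes the archive.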

Those of you who like bash can play around with unzipping the data into nice folders. The rest of you can just run the following:


In [None]:
!unzip -qq kaggle_data.zip && rm kaggle_data.zip && rm stage1_test.zip
!mkdir nuclei_data && unzip -qq stage1_train.zip -d nuclei_data/ && rm stage1_train.zip

Don't forget that you can always check what is happening in your directory using `ls` :

In [None]:
!ls

In [None]:
!ls nuclei_data

Wow, that was a loooot of folders. Hint: you can clear the output of the cell by clicking the 'clear output' button below the 'run cell'.

Now let's check what they contain by taking one random folder name:




In [None]:
!ls nuclei_data/eb1df8ed879d04b36980b0958a0e8fc446ad08c0bdcf3b5f42e3db023187c7e5

In [None]:
!ls nuclei_data/eb1df8ed879d04b36980b0958a0e8fc446ad08c0bdcf3b5f42e3db023187c7e5/images

Okay, this one contains a png image. __TASK:__ Check other random folder names to make sure the structure is the same.
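
Once you have looked at a few folders by hand, the same check can be automated. The sketch below assumes that "the same structure" means every sample folder has an `images` subfolder with at least one `.png` file; the function name `folders_without_png` is hypothetical.

```python
import os

def folders_without_png(root):
    """Return the entries under `root` that lack an `images` subfolder containing a .png file."""
    bad = []
    for name in sorted(os.listdir(root)):
        images_dir = os.path.join(root, name, "images")
        has_png = os.path.isdir(images_dir) and any(
            f.lower().endswith(".png") for f in os.listdir(images_dir)
        )
        if not has_png:
            bad.append(name)
    return bad

if os.path.isdir("nuclei_data"):
    # an empty list means every folder follows the expected layout
    print(folders_without_png("nuclei_data"))
```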

## Displaying images

Now we want to load some pictures and look at them. For this we will need the following libraries:

In [None]:
# we want to show images directly in the notebook
%matplotlib inline
import os       # to list folders content
import numpy as np    # scientific computing 
import matplotlib.pyplot as plt   # plotting and visualisation
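import scipy
import scipy.ndimage    # morphological operations, imported explicitly so scipy.ndimage.* is available
import skimage    # image processing library, works with images as numpy arrays
import skimage.color    # colour space conversions (used for rgb2gray below)
import skimage.feature
import skimage.filters
import skimage.io
import skimage.measure    # connected component labelling (used for label below)
import skimage.morphology
import skimage.transform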
plt.rcParams['figure.figsize'] = [12, 8]  # set default figure size for matplotlib plots

In [None]:
# Let's load a random image 
image_path = 'nuclei_data/eb1df8ed879d04b36980b0958a0e8fc446ad08c0bdcf3b5f42e3db023187c7e5/images/eb1df8ed879d04b36980b0958a0e8fc446ad08c0bdcf3b5f42e3db023187c7e5.png'
image = skimage.io.imread(image_path)

In [None]:
# Since images are read as numpy arrays, we can check their shape and data type numpy-style
print(image.shape) # note that this is an RGBA image (has 4 channels)
print(image.dtype)

In [None]:
# Now let's visualize the image
plt.imshow(image)

If we want to have a better overview of what is happening in the folder, loading the images one by one is not the best approach. What we will do now is list all the folders we have, and write a function that will load an image file from a random folder.

In [None]:
folders_list = os.listdir('nuclei_data')  # get the list of all the folders inside nuclei_data
print(folders_list[0])   # let's see what the folder names look like

In [None]:
def show_random_image(dir_list):
  rand_idx = np.random.randint(0, len(dir_list))   # get a random index
  img_path = # TASK: what would be the image path here?
  random_image = skimage.io.imread(img_path)
  plt.imshow(random_image)

In [None]:
# Now test your function and see what the images in your folder look like
show_random_image(folders_list)

## Operations on images

Now let's look more into the operations that you can perform on your image. Firstly, let's write a function that visualises two images at the same time to examine the transformations visually.

In [None]:
def show_two_images(image1, image2):
    f, axarr = plt.subplots(1, 2)   # we need two images in a row
    axarr[0].imshow(image1)
    axarr[1].imshow(image2)
    _ = [ax.axis('off') for ax in axarr]   # remove the axis ticks
    plt.show()

In [None]:
# Now let's crop the upper left corner of the image and visualize it next to the original image
show_two_images(image, image[:100, :100])

Looks fine. Now let's go through some common image transformations.

## Converting to greyscale
If you have looked through enough random images from the folder, you might have noticed that some of them are coloured and some are greyscale. If we want to build a generic pipeline that processes all of them, we need to convert the coloured ones to greyscale as well. We will also invert the image so that the background is 0 and the signal is 1.
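
To make the conversion less of a black box, here is a NumPy-only sketch of the two steps on a tiny synthetic RGBA array. It assumes the luminance weights given in the documentation of `skimage.color.rgb2gray` (0.2125, 0.7154, 0.0721) and it simply drops the alpha channel; the names `WEIGHTS` and `to_inverted_grey` are made up for this illustration.

```python
import numpy as np

# Luminance weights for (R, G, B) as documented for skimage.color.rgb2gray
WEIGHTS = np.array([0.2125, 0.7154, 0.0721])

def to_inverted_grey(rgba_uint8):
    """Drop alpha, scale to [0, 1], take the weighted channel sum, then invert."""
    rgb = rgba_uint8[..., :3].astype(np.float64) / 255.0
    grey = rgb @ WEIGHTS   # shape (H, W): the channel axis is gone
    return 1.0 - grey      # white becomes 0, black becomes 1

# one white and one black pixel, both fully opaque
toy = np.array([[[255, 255, 255, 255], [0, 0, 0, 255]]], dtype=np.uint8)
result = to_inverted_grey(toy)
print(result.shape)
print(result)
```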

In [None]:
# rgb2gray expects exactly 3 channels, so we drop the alpha channel of the RGBA image first
grey_image = skimage.color.rgb2gray(image[..., :3])
print(grey_image.shape) # the channel axis is gone now
grey_image = 1 - grey_image
show_two_images(image, grey_image)

As you can see from the second image, matplotlib uses the 'viridis' colormap by default to display greyscale images.

**TASK**: change the `cmap` parameter of the `imshow` function to display the second image in greyscale (use the 'Greys' colormap)

## Rescaling image

In [None]:
rescaled_image = skimage.transform.rescale(grey_image, scale=2, order=0)
print(rescaled_image.shape) # the image is now twice as large along each axis
plt.imshow(rescaled_image, cmap='Greys')
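
With `order=0` the interpolation is nearest-neighbour, so no new intensity values are created. For an integer scale factor this amounts to repeating pixels. The NumPy-only sketch below illustrates that idea on a 2x2 array; it is an illustration of the concept, not a description of what skimage does internally.

```python
import numpy as np

small = np.array([[0.0, 1.0],
                  [1.0, 0.5]])

# Repeat every pixel twice along the rows and twice along the columns
upscaled = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
print(upscaled.shape)
print(upscaled)
```

Higher interpolation orders blend neighbouring pixels instead, which is why they produce intermediate values.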

**TASK** : try different interpolation orders. Which orders produce smoother images? 

Hint: for visualization crop small regions of the image to see more subtle differences (e.g. `show_two_images(img1[:100, :100], img2[:100, :100])`) 

**TASK** : find an skimage function that does the same, but accepts the output shape as a parameter instead of a scale factor

Advanced **TASK**: modify `show_two_images` to show n images - given a list of any number of images the function should plot all of them in a row. Visualise multiple interpolation orders at the same time.

## Flipping image

In [None]:
flipped_image = np.flip(image, axis=0)
show_two_images(image, flipped_image)
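
The `axis` argument decides the flip direction. A small array makes this easy to verify by eye: axis 0 reverses the order of the rows (a vertical flip), axis 1 reverses the order of the columns (a horizontal flip).

```python
import numpy as np

toy = np.arange(6).reshape(2, 3)
flipped_rows = np.flip(toy, axis=0)  # vertical flip: rows in reverse order
flipped_cols = np.flip(toy, axis=1)  # horizontal flip: columns in reverse order

print(toy)           # [[0 1 2], [3 4 5]]
print(flipped_rows)  # [[3 4 5], [0 1 2]]
print(flipped_cols)  # [[2 1 0], [5 4 3]]
```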

## Rotating image 

**TASK**: use `skimage.transform.rotate` to rotate your image and visualise it


**TASK**: write a function that given an image and a number N will rotate the image by a random angle in range (0, N) degrees

In [None]:
# Your code here
rotated_image = #  Your code here
show_two_images(image, rotated_image)


## Thresholding image
In simple cases you can get an object mask (segment your object) using plain thresholding followed by connected component labelling. Let's see how well it works for our images. Note that here we will work on the greyscale version.

In [None]:
threshold = 0.5
thresholded_image = grey_image > threshold 
cells_labels = skimage.measure.label(thresholded_image)
plt.imshow(cells_labels, cmap='nipy_spectral')
plt.show()
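
To see what the two steps do, it can help to run them on a tiny hand-made array first. In the sketch below the values of `toy` are invented; thresholding yields a binary mask, and `skimage.measure.label` gives every connected group of foreground pixels its own integer id (0 is background).

```python
import numpy as np
import skimage.measure

toy = np.array([[0.9, 0.8, 0.1, 0.1],
                [0.7, 0.1, 0.1, 0.6],
                [0.1, 0.1, 0.1, 0.9]])

mask = toy > 0.5   # binary mask: True where the pixel is above the threshold
labels, num_objects = skimage.measure.label(mask, return_num=True)

print(num_objects)  # 2 separate objects
print(labels)
```

If two nuclei touch, their pixels form a single connected group and receive the same label, which is exactly the failure mode to watch for in the real image.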

**TASK** : try different thresholds. Do you manage to get the objects properly separated? 

Let's plot a histogram of image values to see whether there is a good cutoff between background and objects

In [None]:
plt.hist(grey_image.flatten(), bins=100)
plt.show()

It doesn't look like we can define a clear threshold here. Let's try more sophisticated methods.

**TASK** : Smooth the image with a Gaussian filter (`skimage.filters.gaussian`) and try different thresholds again. Does it look better?

In [None]:
gaussian_smoothed_image = #TODO

The object boundaries look smoother, but there are still many objects we are not able to split fully. 
**TASK** : take a look at other thresholding methods offered by skimage. Hint: many of them can be visualized with `skimage.filters.try_all_threshold`



## Morphological Transformations
The masks we got out of thresholding look suboptimal. There are small spurious objects, and some masks are merged together. We can try to alleviate these problems with a morphological transformation called opening: erosion followed by dilation (a good explanation [here](https://www.youtube.com/watch?v=uUweXBmm978)).

Let's see what it looks like.

In [None]:
# we need to set a structuring element (https://www.youtube.com/watch?v=9lqH5XLI-V4)
kernel = np.ones((3,3), np.uint8)
opened_image = scipy.ndimage.binary_opening(thresholded_image, structure=kernel, iterations=4)
show_two_images(thresholded_image, opened_image)


As we can see, many small objects disappeared and some objects got split, but not all.

**TASK**: try different numbers of iterations. How does it affect the segmentation masks? 

**TASK**: Try closing (`scipy.ndimage.binary_closing`) to see what it does.
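
A toy mask makes the effect easier to follow. The sketch below uses a hand-made 7x7 mask (an assumption for illustration) with one 3x3 object and one isolated pixel, and also checks that opening gives the same result as an erosion followed by a dilation with the same structuring element.

```python
import numpy as np
import scipy.ndimage as ndi

mask = np.zeros((7, 7), dtype=bool)
mask[1:4, 1:4] = True   # a 3x3 object
mask[5, 5] = True       # an isolated "noise" pixel

structure = np.ones((3, 3), dtype=bool)
opened = ndi.binary_opening(mask, structure=structure)

# opening written out by hand: erosion first, then dilation
eroded = ndi.binary_erosion(mask, structure=structure)
manual = ndi.binary_dilation(eroded, structure=structure)

print(mask.sum(), opened.sum())         # 10 9: only the isolated pixel is removed
print(np.array_equal(opened, manual))   # True
```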

## Finding edges 
We can also try to use image gradients to detect object edges. Let's try to use the [Canny algorithm](https://scikit-image.org/docs/dev/auto_examples/applications/plot_coins_segmentation.html#edge-based-segmentation). 

In [None]:
canny_edges = skimage.feature.canny(grey_image, sigma=1) # we get the object boundaries - 'edges'
filled_edges = scipy.ndimage.binary_fill_holes(canny_edges) # we fill the inner part to get full objects
show_two_images(canny_edges, filled_edges)

While the boundaries seem relatively good, the resulting segmentation is far from perfect, because of the small holes in the boundaries. 
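
The sensitivity to gaps can be reproduced on a toy outline. In this sketch (hand-made arrays, not data from the tutorial) a closed square outline is filled completely, while the same outline with a single missing pixel is not filled at all, because its interior is connected to the outside.

```python
import numpy as np
import scipy.ndimage as ndi

closed = np.zeros((7, 7), dtype=bool)
closed[1:6, 1:6] = True
closed[2:5, 2:5] = False   # hollow square: a 16-pixel outline

broken = closed.copy()
broken[1, 3] = False       # remove one pixel from the top edge

filled_closed = ndi.binary_fill_holes(closed)
filled_broken = ndi.binary_fill_holes(broken)

print(filled_closed.sum())  # 25: the 3x3 interior was filled
print(filled_broken.sum())  # 15: nothing was filled
```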

**TASK** : for the Canny detection try different `sigma`, `low_threshold` and `high_threshold`. Does any combination give good object segmentation?

**Advanced TASK**: segment the image with the watershed algorithm as described in the "Segment nuclei" part of the [Segment human cells tutorial](https://scikit-image.org/docs/dev/auto_examples/applications/plot_human_mitosis.html#segment-nuclei).