# Day 4-Part 3: Images

Images are everywhere in geosciences: satellite images, aerial photos, map images, well core photos, and thin section photos are all examples. An image is basically a set of layers of gridded data, with an intensity value at each point (or pixel). The layers are called channels (e.g., the red, green and blue channels), and the combination of these channels produces a specific color at each pixel.
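To make the idea of channels concrete, here is a minimal sketch that builds a tiny RGB image by hand as a `numpy` array. The array `tiny` and its pixel values are made up for illustration; they are not part of the thin section example.

```python
import numpy as np

# A hypothetical 2 x 3 pixel RGB image: shape is (rows, columns, channels)
tiny = np.zeros((2, 3, 3), dtype=np.uint8)

tiny[0, 0] = [255, 0, 0]      # top-left pixel: pure red
tiny[0, 1] = [0, 255, 0]      # top-middle pixel: pure green
tiny[0, 2] = [0, 0, 255]      # top-right pixel: pure blue
tiny[1, :] = [255, 255, 255]  # bottom row: white (all channels at maximum)

red_channel = tiny[:, :, 0]   # each channel is a 2D grid of intensities

print(tiny.shape)    # (2, 3, 3)
print(red_channel)
```

Slicing the last axis, as in `tiny[:, :, 0]`, is how a single channel is pulled out of the stack.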

In Python, we can use the [pillow](https://python-pillow.org) library to load, display and manipulate images. I believe `pillow` is installed in the Anaconda distribution, but if not, you will need to install it as follows:

In [None]:
# run this cell if pillow is not installed
import sys
!{sys.executable} -m pip install pillow

Once we load the image with `pillow`, it is often easier to work with it as an array. `numpy.array` (or `numpy.asarray`) converts an image to an array, and `pillow`'s `Image.fromarray` converts an array back to an image.
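As a quick sketch of this round trip, the code below converts an array to an image and back. The array `arr` is a made-up red gradient, not real data. One thing to watch for: `pillow` reports the size as (width, height), while `numpy` reports the shape as (rows, columns, channels).

```python
import numpy as np
from PIL import Image

# hypothetical 4 x 6 pixel RGB array with a horizontal red gradient
arr = np.zeros((4, 6, 3), dtype=np.uint8)
arr[:, :, 0] = np.linspace(0, 255, 6).astype(np.uint8)

img = Image.fromarray(arr)   # array -> image (mode is inferred as "RGB")
back = np.array(img)         # image -> array

print(img.size, img.mode)    # size is (width, height)
print(back.shape)            # shape is (height, width, channels)
print(np.array_equal(arr, back))
```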

The following example illustrates the loading, manipulation, and saving of images.

## Determining visual porosity from a sandstone thin section

This example is based on the following [repository](https://github.com/Philliec459/Create-Thin-Section-Image-Labels-for-Semantic-Segmentation-Training). The image `ss_thin_section.png` in the data folder is an epoxy thin section of a sandstone. We start by loading the image using the `Image.open` function, and printing the image format, size and mode:

In [None]:
from PIL import Image
import os

im = Image.open(os.path.join("..", "data", "ss_thin_section.png")) # open image

print(im.format, im.size, im.mode) # print image information

im

The image is in *portable network graphics* (PNG) format, and it is 537 by 443 pixels (width by height), with three channels: red (R), green (G) and blue (B). Now, let's convert the image to an array using the `numpy.array` function, and print the shape and the minimum and maximum values of the array:

In [None]:
import numpy as np

data = np.array(im) # convert image to array, "asarray" also works here

print(data.shape) # print shape of array

print("minimum value =", np.amin(data), "maximum value =", np.amax(data)) # print max and min array values

The array is 3D: it holds 3 (`data.shape[2]`) stacked 2D grids, each 443 rows by 537 columns, corresponding to the R, G and B channels. Notice that `numpy` reports the shape as (rows, columns, channels), whereas `im.size` reports (width, height). The minimum and maximum values in the grid are 0 and 255. These limits arise because each channel of this image is stored as an 8-bit unsigned integer, which allows $2^8=256$ different intensity levels. Now, let's plot the image:

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 6)) # create figure

ax.imshow(data); # plot image
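The 0 to 255 range noted above matters when doing arithmetic on pixel values. The sketch below uses a made-up pixel value to show the limits of the `uint8` type and what happens when a result exceeds them.

```python
import numpy as np

info = np.iinfo(np.uint8)            # limits of an 8-bit unsigned integer
print(info.min, info.max, 2**8)      # 0 255 256

pixel = np.array([250], dtype=np.uint8)  # hypothetical bright pixel

wrapped = pixel + np.uint8(10)       # uint8 array arithmetic wraps around modulo 256
safe = pixel.astype(np.int32) + 10   # casting to a wider type first avoids the wrap

print(wrapped[0], safe[0])           # 4 260
```

If you plan to add, subtract or scale pixel values, it is usually safer to convert the array to a wider integer or a float type first.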

Now let's plot the histogram of the image using the `numpy.histogram` function. Notice that we set the number of bins equal to the number of possible intensity values:

In [None]:
histogram, bin_edges = np.histogram(data, bins=256, range=(0, 256)) # make a histogram of image

fig, ax = plt.subplots() # create figure

ax.plot(bin_edges[0:-1], histogram) # plot counts against the left edge of each bin

# figure title and axes labels
ax.set_title("Histogram Original Thin Section Image")
ax.set_xlabel("value")
ax.set_ylabel("pixels"); 
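To see why the plot uses `bin_edges[0:-1]`, here is a small sketch with a handful of made-up pixel values. With 256 bins over the range 0 to 256, `numpy.histogram` returns 256 counts but 257 edges, so the last edge is dropped to pair each count with the left edge of its bin.

```python
import numpy as np

values = np.array([0, 0, 1, 255, 255, 255], dtype=np.uint8)  # hypothetical pixel values

counts, edges = np.histogram(values, bins=256, range=(0, 256))

print(len(counts), len(edges))            # 256 257
print(edges[0], edges[-1])                # first and last bin edges
print(counts[0], counts[1], counts[255])  # 2 1 3
```

Each bin is one intensity unit wide, so `counts[k]` is simply the number of pixels with value `k`.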

However, we said that the image has three channels: R, G and B. What does the histogram above represent? To understand this, let's plot the histograms of the separate channels:

In [None]:
colors = ["red", "green", "blue"] # list of colors
channel_ids = [0, 1, 2] # list of channels ids

fig, ax = plt.subplots() # create figure

# plot histograms
for channel_id, c in zip(channel_ids, colors): # iterate channel ids and colors
    histogram, bin_edges = np.histogram(data[:, :, channel_id], bins=256, range=(0, 256)) # compute histogram
    ax.plot(bin_edges[0:-1], histogram, color=c, label=c + " channel") # plot histogram

# figure title, axes labels and legend
ax.set_title("Histogram Original Thin Section Image")
ax.set_xlabel("value")
ax.set_ylabel("pixels")
ax.legend(loc = "upper left");

So, as you can see, the histogram of the image is just the sum of the histograms of the individual channels.
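We can check this claim numerically. The sketch below uses a random, made-up RGB array (not the thin section) and compares the histogram of the whole array with the sum of the three channel histograms.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator so the example is repeatable
rgb = rng.integers(0, 256, size=(20, 30, 3), dtype=np.uint8)  # hypothetical image

# histogram of all values, ignoring which channel they belong to
hist_all, _ = np.histogram(rgb, bins=256, range=(0, 256))

# sum of the histograms of the individual channels
hist_sum = sum(
    np.histogram(rgb[:, :, c], bins=256, range=(0, 256))[0] for c in range(3)
)

print(np.array_equal(hist_all, hist_sum))  # True
print(hist_all.sum(), rgb.size)            # both equal rows * columns * channels
```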

Now, let's do something more interesting. Let's process the image to better highlight the main components of the thin section. To do this, we use the [scikit-image](https://scikit-image.org) library, which is a collection of algorithms for image processing. I believe `scikit-image` is installed in the Anaconda distribution, but if not, you will need to install it as follows:

In [None]:
# run this cell if scikit-image is not installed
import sys
!{sys.executable} -m pip install scikit-image

Let's convert the image to grayscale using the `skimage.color.rgb2gray` function (more about how this function works [here](https://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_rgb_to_gray.html)), and then smooth the image a little using a Gaussian filter ([skimage.filters.gaussian](https://scikit-image.org/docs/stable/api/skimage.filters.html#skimage.filters.gaussian)). We also print the shape of the filtered image and its minimum and maximum values, and plot the image:

In [None]:
from skimage import color, filters

data_gray = color.rgb2gray(data) # converts RGB image to single grayscale channel

data_grad = filters.gaussian(data_gray) # apply Gaussian filter to grayscale image

# print size and minimum and maximum values
print(data_grad.shape) 
print("minimum value = {:.3f}, maximum value = {:.3f}".format(np.amin(data_grad), np.amax(data_grad)) ) 

# plot filtered image, notice that colormap is the default "viridis"
fig, ax = plt.subplots(figsize=(8, 6))
plt.imshow(data_grad)
plt.colorbar();
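For intuition about the grayscale conversion used above, here is a `numpy`-only sketch of a weighted sum of the channels. The weights are the luminance weights given in the scikit-image documentation for `rgb2gray`, as far as I understand it; the three test pixels are made up, and this is an illustration rather than the library's actual implementation.

```python
import numpy as np

# luminance weights for R, G and B (assumed from the scikit-image docs)
weights = np.array([0.2125, 0.7154, 0.0721])

# hypothetical 1 x 3 pixel image: white, pure red, black
rgb = np.array([[[255, 255, 255], [255, 0, 0], [0, 0, 0]]], dtype=np.uint8)

rgb_float = rgb / 255.0      # scale 8-bit values to the range [0, 1]
gray = rgb_float @ weights   # weighted sum over the channel axis

print(gray.shape)  # (1, 3): the channel axis is gone
print(gray)        # white -> 1.0, red -> 0.2125, black -> 0.0
```

Because the weights add up to 1 and the input is scaled to [0, 1], the grayscale values also fall between 0 and 1.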

So, the image is just one single channel (grayscale), and the intensity values go from 0.0 to 1.0. Let's plot the histogram of this image. Similarly to the original image, we use 256 bins:

In [None]:
histogram, bin_edges = np.histogram(data_grad, bins=256, range=(0.0, 1.0)) # compute histogram of filtered image

fig, ax = plt.subplots() # create figure

# plot histogram
ax.plot(bin_edges[0:-1], histogram) 
# set figure grid, title, and labels
ax.grid(axis = "x")
ax.set_title("Histogram Gaussian-Filtered Grayscale Image")
ax.set_xlabel("value")
ax.set_ylabel("pixels"); 

As you can see, this histogram is smoother than the histogram of the original image, yet one can still distinguish groups of values that roughly correspond to the different components in the thin section. Let's do a simple classification, or labeling, of the image based on its intensity values. Choosing the thresholds involves a little trial and error. We also plot the labeled image:

In [None]:
label = np.zeros(data_grad.shape) # initialize label array to zeros

# fill label array with classes 1 to 5
# these classes depend on the intensity values
# later assignments overwrite earlier ones, so the order matters
label[data_grad < 0.25] = 1 #black grains 
label[data_grad >= 0.25] = 2 #darker grains (>= so that no pixel is left unlabeled)
label[data_grad > 0.4]  = 3 #blue-dye epoxy 
label[data_grad > 0.6]  = 4 #darker grains
label[data_grad > 0.75] = 5 #bright quartz grains

# plot labeled image
fig, ax = plt.subplots(figsize=(8, 6))
plt.imshow(label, interpolation="none")
plt.colorbar(ticks=np.arange(1,6,1));
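The thresholding above can also be written in one step with `numpy.digitize`. This is an alternative sketch, not the method used in this notebook, and the intensities in `gray` are made up. One small difference: with the default settings, a value exactly equal to a threshold goes to the upper class.

```python
import numpy as np

gray = np.array([0.10, 0.30, 0.50, 0.70, 0.90])  # hypothetical intensities
thresholds = [0.25, 0.4, 0.6, 0.75]              # same thresholds as in the cell above

# digitize returns the bin index 0 to 4, so add 1 to get classes 1 to 5
classes = np.digitize(gray, thresholds) + 1

print(classes)  # [1 2 3 4 5]
```

Keeping the thresholds in a single list makes the trial and error easier, since there is only one place to edit.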

Let's look at the histogram of the labeled image:

In [None]:
histogram, bin_edges = np.histogram(label, bins=256, range=(0.0, 6.0)) # compute histogram of labeled image

fig, ax = plt.subplots() # create figure

# plot histogram
ax.plot(bin_edges[0:-1], histogram)
ax.set_title("Histogram of Labeled Image")
ax.set_xlabel("value")
ax.set_ylabel("pixels"); 

So now our image is divided into five classes. Let's find out the number of pixels in each class, plot them in a pie diagram, and compute the visual porosity (number of pixels in class 3 divided by the total number of pixels in all classes):

In [None]:
black_grains = np.sum(label == 1) # number of pixels on black grains
dark_grains = np.sum(label == 2) + np.sum(label == 4) # number of pixels on dark grains
epoxy = np.sum(label == 3) # number of pixels on epoxy
bright_grains = np.sum(label == 5) # number of pixels on bright, quartz grains
all_comp = black_grains + dark_grains + epoxy + bright_grains # all components

fig, ax = plt.subplots() # create figure

ax.pie([black_grains, dark_grains, epoxy, bright_grains], 
      labels=["black grains", "dark grains", "epoxy", "bright grains"]); # plot pie diagram

print("visual porosity = {:.2f}".format(epoxy/all_comp) ) # visual porosity
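As a cross-check of the pixel counting, the sketch below gets the count of every class at once with `numpy.unique`. The small `labels` array is made up for illustration, so the resulting porosity has nothing to do with the thin section.

```python
import numpy as np

# hypothetical labeled image with 8 pixels
labels = np.array([[1, 2, 3, 3],
                   [3, 4, 5, 5]])

classes, counts = np.unique(labels, return_counts=True)  # pixels per class
fractions = counts / labels.size                         # area fraction per class

porosity = fractions[classes == 3][0]  # class 3 is the epoxy (pore space)
porosity_alt = np.mean(labels == 3)    # same result: fraction of pixels in class 3

print(dict(zip(classes.tolist(), counts.tolist())))
print(porosity, porosity_alt)          # 0.375 0.375
```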

Labeled images like these are used for the training of machine learning algorithms that can predict the segmentation of the image (the different components in the thin section) from the original image ([see this repository](https://github.com/Philliec459/Semantic-Segmentation-of-Petrographic-Thin-Sections-using-Keras)). We are not going to do this here, but let's save the labeled image:

In [None]:
# make image from array
im = Image.fromarray(label)

# convert image to mode "L": a single channel of 8-bit grayscale pixels
im = im.convert("L")

# save image
im.save("ss_thin_section_labeled.png")

In an image viewer the saved image looks black, because it has a single 8-bit channel whose pixel values (1 to 5) are all very close to 0 on the 0 to 255 scale. To see the classes again, you need to load the image, convert it to an array, and plot it:

In [None]:
im = Image.open("ss_thin_section_labeled.png") # open image

data = np.array(im) # convert image to array

# plot image
fig, ax = plt.subplots(figsize=(8, 6)) 
plt.imshow(data, interpolation="none")
plt.colorbar(ticks=np.arange(1,6,1));
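If you would rather have a file in which the classes are visible in any image viewer, one option is to stretch the labels over the 8-bit range before saving. This is an optional sketch with a made-up `labels` array, not part of the workflow above.

```python
import numpy as np

labels = np.array([[1, 2, 3],
                   [4, 5, 5]], dtype=np.uint8)  # hypothetical labeled image

# stretch classes 1..5 to 51..255; use a wider type to avoid uint8 overflow
visible = (labels.astype(np.uint16) * 255 // 5).astype(np.uint8)

print(visible)

# the original classes can be recovered by dividing by 51
recovered = visible // 51
print(np.array_equal(recovered, labels))  # True
```

The trade-off is that the stretched file no longer stores the class numbers directly, which is usually what a machine learning workflow expects.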

This was a brief introduction to images, and there is more to learn. There are lots of online repositories on image processing with Python, but not many of them are relevant to geosciences. I recommend the [following repository](https://github.com/joferkington/geo_image_processing_tutorial). Check, for example, [notebook 3](https://github.com/joferkington/geo_image_processing_tutorial/blob/master/03%20-%20Orientation%20Analysis.ipynb), on thin section grain analysis, for a very cool example of measuring the preferred orientation of mineral grains in a thin section.

To practice, try exercise 3 in `day4/lab/lab4.pdf`.