# Creating Histograms

As it pertains to images, a *histogram* is a graphical representation showing how frequently various color values occur in the image. We saw in the Image Basics episode that we could use a histogram to visualize the differences between uncompressed and compressed image formats. If your project involves detecting color changes between images, histograms will prove very useful. They are also quite handy as a preparatory step before performing Thresholding or Edge Detection.

## Grayscale Histograms

We will start with grayscale images and their histograms, and then move on to color images. The next several code cells contain Python code to load an image in grayscale instead of full color, and then create and display the corresponding histogram. The first few lines are:

In [None]:
# imports for image i/o and viewing
import skimage.color
import skimage.io

from matplotlib import pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 150

# for numeric manipulation of images
import numpy as np

# read image as grayscale, and display it
image = skimage.io.imread('https://i.imgur.com/EqJBCJZ.jpg', as_gray=True)
skimage.io.imshow(image)
plt.show()

All of the code in the previous cell should be familiar to you from previous lessons. Although it is not visible in the displayed image, it is important to know that loading a color image as grayscale returns a two-dimensional NumPy array of floating-point numbers in the range $\left[0, 1\right]$, where $0$ is pure black and $1$ is pure white. If we need to convert back to integers in the range $\left[0, 255\right]$, we can use the `skimage.util.img_as_ubyte()` function.
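
As a minimal sketch of that conversion, the snippet below builds a small synthetic array (an assumption standing in for a real grayscale image) and passes it to `skimage.util.img_as_ubyte()`. The variable names `gray` and `gray_ubyte` are illustrative only.

```python
import numpy as np
import skimage.util

# synthetic stand-in for a grayscale image: floats in [0, 1]
gray = np.array([[0.0, 0.25], [0.75, 1.0]])

# rescale to 8-bit unsigned integers in [0, 255]
gray_ubyte = skimage.util.img_as_ubyte(gray)

print(gray_ubyte.dtype)                    # uint8
print(gray_ubyte.min(), gray_ubyte.max())  # 0 255
```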

`skimage` does not provide a special function to compute histograms, but we can use the `np.histogram()` function, like this:

In [None]:
# create the histogram
histogram, bin_edges = np.histogram(image, bins=256, range=(0, 1))

The parameter `bins` determines the histogram size, or the number of "bins" to use for the histogram. We pass in 256 because we want to see the pixel count for each of the 256 possible values in the grayscale image.

The parameter `range` is the range of values each of the pixels in the image can have. Here, we pass 0 and 1, which is the value range of our input image after transforming it to grayscale.

The first output of the `np.histogram()` function is a one-dimensional NumPy array with 256 elements, one per bin, each holding the number of pixels whose value falls in that bin. Because the bins evenly divide the range $\left[0, 1\right]$, the first number in the array counts the darkest pixels (color value 0 on the 8-bit scale), and the final number counts the brightest pixels (color value 255 on the 8-bit scale). The second output of `np.histogram()` is a one-dimensional array of bin edges with 257 elements, one more than the histogram itself. There are no gaps between the bins, which means that the end of the first bin is the start of the second, and so on. Because the array must also contain the right edge of the last bin, it has one more element than the histogram.
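
To make the shapes of the two outputs concrete, here is a small sketch that runs `np.histogram()` on a hypothetical six-pixel array instead of the seedling image. The array `tiny` and its values are assumptions chosen so that the counts are easy to check by hand.

```python
import numpy as np

# hypothetical 2x3 "image" with float values in [0, 1]
tiny = np.array([[0.0, 0.0, 0.5],
                 [0.5, 0.5, 1.0]])

histogram, bin_edges = np.histogram(tiny, bins=256, range=(0, 1))

print(histogram.shape)  # (256,)
print(bin_edges.shape)  # (257,)
print(histogram.sum())  # 6, every pixel lands in exactly one bin

# the two black pixels fall in the first bin, the white pixel in the last
print(histogram[0], histogram[128], histogram[255])  # 2 3 1
```

Note that the last bin includes its right edge, which is why the pixel with value 1.0 is counted in bin 255.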

Next, we turn our attention to displaying the histogram, by taking advantage of the plotting facilities of the `matplotlib` library.

In [None]:
# configure and draw the histogram figure
plt.figure()
plt.title("Grayscale Histogram")
plt.xlabel("grayscale value")
plt.ylabel("pixels")
plt.xlim([0.0, 1.0])  

plt.plot(bin_edges[0:-1], histogram) 
plt.show()

We create the plot with `plt.figure()`, then label the figure and the coordinate axes with `plt.title()`, `plt.xlabel()`, and `plt.ylabel()` functions. The last step in the preparation of the figure is to set the limits on the values on the x-axis with the `plt.xlim([0.0, 1.0])` function call.

Finally, we create the histogram plot itself with `plt.plot(bin_edges[0:-1], histogram)`. We use the left bin edges as x-positions for the histogram values by indexing the `bin_edges` array to ignore the last value (the right edge of the last bin). Then we make it appear with `plt.show()`. 
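
The reason for dropping the last edge is that `plt.plot()` needs the same number of x-positions as y-values. The sketch below uses only four bins and made-up values (both assumptions for readability) to show how the lengths line up.

```python
import numpy as np

# four made-up pixel values, binned into four equal-width bins
values = np.array([0.1, 0.1, 0.6, 0.9])
histogram, bin_edges = np.histogram(values, bins=4, range=(0, 1))

# drop the right edge of the last bin to get one x-position per bin
left_edges = bin_edges[0:-1]

print(len(bin_edges))                    # 5
print(len(left_edges), len(histogram))   # 4 4
print(histogram)                         # [2 0 1 1]
```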

---
> **Using a mask for a histogram**
> 
> Looking at the histogram above, you will notice that there is a large 
> number of very dark pixels, as indicated in the chart by the spike 
> around the grayscale value 0.12. That is not so surprising, since the 
> original image is mostly black background. What if we want to focus 
> more closely on the leaf of the seedling? That is where a mask enters 
> the picture!
> 
> The cell below contains a copy of the code above, with comments 
> showing where to make changes. 
> 
> First, use an image editing program to determine the 
> $\left(x, y\right)$ coordinates of a *bounding box* around the leaf
> of the seedling. Then, using techniques from the Drawing and Bitwise
> Operations lesson, create a mask with a white rectangle covering
> that bounding box.
> 
> After you have created the mask, apply it to the input image before
> passing it to the `np.histogram()` function. Then, run the code on
> the masked image and observe the resulting histogram.
---

In [None]:
# import for drawing
import skimage.draw

# read image as grayscale
image = skimage.io.imread('https://i.imgur.com/EqJBCJZ.jpg', as_gray=True)

# TODO: create mask here, using np.zeros() and skimage.draw.rectangle()

# TODO: mask the image and create the new histogram

# configure and draw the histogram figure
plt.figure()

plt.title("Grayscale Histogram")
plt.xlabel("grayscale value")
plt.ylabel("pixel count")
plt.xlim([0.0, 1.0])
plt.plot(bin_edges[0:-1], histogram)

plt.show()
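
If it is unclear how a mask restricts which pixels are counted, the following sketch shows the idea on a synthetic image. It is not the solution for the seedling image: the image contents, the bounding box coordinates, and the names `demo_image` and `demo_mask` are all assumptions, and the rectangle is filled with plain NumPy slicing rather than `skimage.draw.rectangle()`.

```python
import numpy as np

# synthetic grayscale image: dark background with one brighter region
demo_image = np.full((100, 100), 0.1)
demo_image[40:60, 30:70] = 0.8

# boolean mask that is True inside a hypothetical bounding box
demo_mask = np.zeros(demo_image.shape, dtype=bool)
demo_mask[35:65, 25:75] = True

# indexing with the mask keeps only the selected pixels
masked_histogram, masked_edges = np.histogram(
    demo_image[demo_mask], bins=256, range=(0, 1)
)

print(demo_mask.sum())         # 1500 pixels inside the box
print(masked_histogram.sum())  # 1500, only masked pixels are counted
```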

## Color Histograms

We can also create histograms for full color images, in addition to grayscale histograms. We have seen color histograms before, in the Image Basics episode. A program to create color histograms starts in a familiar way:

In [None]:
# read image in color
image = skimage.io.imread('https://i.imgur.com/EqJBCJZ.jpg')

# display for pedagogical purposes
skimage.io.imshow(image)
plt.show()

Next, we create the histogram by calling the `np.histogram()` function three times, once for each of the channels. We obtain the individual channels by slicing the image along the last axis. For example, we can obtain the red color channel with `r_chan = image[:, :, 0]`.

In [None]:
# tuple to select colors of each channel line
colors = ("r", "g", "b")
channel_ids = (0, 1, 2)

# create the histogram plot, with three lines, one for
# each color
plt.xlim([0, 256])
for channel_id, c in zip(channel_ids, colors):
    histogram, bin_edges = np.histogram(
        image[:, :, channel_id], bins=256, range=(0, 256)
    )
    plt.plot(bin_edges[0:-1], histogram, color=c)

plt.xlabel("Color value")
plt.ylabel("Pixels")

plt.show()
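
The channel slicing used in the loop can be tried in isolation. This sketch assumes a tiny synthetic RGB array named `rgb` rather than the seedling image, so the expected counts are easy to verify.

```python
import numpy as np

# synthetic 2x2 RGB image with axes (rows, columns, channels)
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[:, :, 0] = 255  # every pixel has full red

# slicing along the last axis yields one two-dimensional channel
r_chan = rgb[:, :, 0]
print(r_chan.shape)  # (2, 2)

# with range=(0, 256) and 256 bins, each integer value gets its own bin
r_histogram, r_edges = np.histogram(r_chan, bins=256, range=(0, 256))
print(r_histogram[255])  # 4
```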

We will draw the histogram line for each channel in a different color, and so we create a tuple of the colors to use for the three lines with the

```
colors = ("r", "g", "b")
```

line of code. Then, we limit the range of the x-axis with the `plt.xlim()` function call.

Next, we use the `for` control structure to iterate through the three channels, plotting an appropriately-colored histogram line for each. This may be new Python syntax for you, so we will take a moment to discuss what is happening in the for statement.

The Python built-in `zip()` function takes a series of one or more lists and returns an iterator of tuples, where the first tuple contains the first element of each of the lists, the second contains the second element of each of the lists, and so on.

> **Iterators, tuples, and `zip()`**
>
> In Python, an *iterator*, or an *iterable object*, is, basically, 
> something that can be iterated over with the `for` control structure. 
> A *tuple* is a sequence of objects, just like a list. However, a 
> tuple cannot be changed, and a tuple is indicated by parentheses 
> instead of square brackets. The `zip()` function takes one or more 
> iterable objects, and returns an iterator of tuples consisting of the 
> corresponding ordinal objects from each parameter.
> 
> For example, consider this small Python program:
>
> ```
> list1 = (1, 2, 3, 4, 5)
> list2 = ("a", "b", "c", "d", "e")
> 
> for x in zip(list1, list2):
>   print(x)
> ```
>
> Executing this program would produce the following output:
> 
> ```
> (1, 'a')
> (2, 'b')
> (3, 'c')
> (4, 'd')
> (5, 'e')
> ```

In our color histogram program, we are using a tuple, `(channel_id, c)`, as the `for` variable. The first time through the loop, the `channel_id` variable takes the value 0, referring to the position of the red color channel, and the `c` variable contains the string `"r"`. The second time through the loop the values are the green channel's position and `"g"`, and the third time they are the blue channel's position and `"b"`.
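
The unpacking can be seen on its own with a short sketch that only prints the loop variables. The list `pairs` is a hypothetical helper added here to record what each iteration receives.

```python
channel_ids = (0, 1, 2)
colors = ("r", "g", "b")

pairs = []
for channel_id, c in zip(channel_ids, colors):
    # each tuple from zip() is unpacked into the two loop variables
    pairs.append((channel_id, c))
    print(channel_id, c)

# prints:
# 0 r
# 1 g
# 2 b
```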

Inside the `for` loop, our code looks much like it did for the grayscale example. We calculate the histogram for the current channel with the

```
histogram, bin_edges = np.histogram(image[:, :, channel_id], bins=256, range=(0, 256))
```

function call, and then add a histogram line of the correct color to the plot with the

```
plt.plot(bin_edges[0:-1], histogram, color=c)
```

function call. Note the use of our loop variables, `channel_id` and `c`.

Finally we label our axes and display the histogram as before. 

---
> **Color histogram with a mask**
>
> We can use a mask with color histograms, in the same way we did for 
> grayscale histograms. Consider this image of a well plate, where 
> various chemical sensors have been applied to water and various 
> concentrations of hydrochloric acid and sodium hydroxide:
>
> <img src="https://datacarpentry.org/image-processing/fig/09-well-plate.jpg" alt="96-well plate image" style="float: left; margin-right:10px;"/>
>
> Suppose we are interested in the color histogram of one of the 
> sensors in the well plate image, specifically, the seventh well from 
> the left in the topmost row, which shows Erythrosin B reacting with 
> water.
> 
> Use an image processing program to find the center of that well and 
> the radius (in pixels) of the well. Then, edit the code in the cells
> below to create a circular mask that selects only the desired well, 
> and use that mask to apply the color histogram operation to that well. 
> When you execute the program on the `plate-01.tif` image, your 
> program should display `masked_image`, which will look like this:
> 
> <img src="https://datacarpentry.org/image-processing/fig/04-masked-well-plate.jpg" alt="masked 96-well plate image" style="float: left; margin-right:10px;"/>
>
> The code should produce a color histogram that looks like this:
>
> <img src="https://datacarpentry.org/image-processing/fig/04-well-plate-histogram.png" alt="color histogram" style="float: left; margin-right:10px;"/>
---

In [None]:
# load image
image = skimage.io.imread('https://i.imgur.com/ronDSZP.png')

# display the original image for pedagogical purposes
skimage.io.imshow(image)
plt.show()


In [None]:
# create a circular mask to select the 7th well in the first row
# TODO: WRITE YOUR CODE HERE

# just for pedagogical purposes:
# make a copy of the image, call it masked_image, and
# use np.logical_not() and indexing to apply the mask to it
# TODO: WRITE YOUR CODE HERE

# create a new window and display masked_image, to verify the
# validity of your mask
# TODO: WRITE YOUR CODE HERE


In [None]:
# tuple to select colors of each channel line
colors = ("r", "g", "b")
channel_ids = (0, 1, 2)

# create the histogram plot, with three lines, one for
# each color
plt.xlim([0, 256])
for (channel_id, c) in zip(channel_ids, colors):
    # change this to use your circular mask to apply the histogram
    # operation to the 7th well of the first row
    # MODIFY CODE HERE
    histogram, bin_edges = np.histogram(
        image[:, :, channel_id][mask], bins=256, range=(0, 256)
    )

    plt.plot(bin_edges[0:-1], histogram, color=c)

plt.xlabel("color value")
plt.ylabel("pixel count")

plt.show()
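
For readers who want to see the masking mechanics before working on the well plate, here is a sketch on a synthetic image. Everything in it is an assumption made for illustration: the image size, the center and radius, the fill color, and the names `demo`, `circle_mask`, `masked_demo`, and `peaks`. The circle is built with `np.ogrid` and the circle equation, which is a substitute for the drawing functions in `skimage.draw`.

```python
import numpy as np

# synthetic RGB image: black background, to be given one colored disk
height, width = 60, 80
demo = np.zeros((height, width, 3), dtype=np.uint8)

# hypothetical center (row, column) and radius, in pixels
center_row, center_col, radius = 30, 40, 10

# coordinate grids that broadcast to the image shape
rows, cols = np.ogrid[:height, :width]
circle_mask = (rows - center_row) ** 2 + (cols - center_col) ** 2 <= radius ** 2

# paint the disk so the histogram has something to find
demo[circle_mask] = (200, 30, 60)

# masked copy for visual checking: everything outside the circle is black
masked_demo = demo.copy()
masked_demo[np.logical_not(circle_mask)] = 0

# per-channel histograms over the masked pixels only
peaks = {}
for channel_id, name in zip((0, 1, 2), ("red", "green", "blue")):
    histogram, bin_edges = np.histogram(
        demo[:, :, channel_id][circle_mask], bins=256, range=(0, 256)
    )
    peaks[name] = int(histogram.argmax())

print(peaks)  # {'red': 200, 'green': 30, 'blue': 60}
```

Because only the pixels inside the circle are passed to `np.histogram()`, the black background does not contribute to any of the three histograms.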

---
> **Histograms for the morphometrics challenge**
> 
> Using the grayscale and color histogram code we developed in this
> episode, create histograms for the bacteria colony images in the 
> following URLs. Save the histograms for later use.
>
> Colony image 1: <a href='https://i.imgur.com/uM0Rt9r.png'>https://i.imgur.com/uM0Rt9r.png</a>
> 
> Colony image 2: <a href='https://i.imgur.com/MAWoq9A.png'>https://i.imgur.com/MAWoq9A.png</a>
> 
> Colony image 3: <a href='https://i.imgur.com/SrG8kTQ.png'>https://i.imgur.com/SrG8kTQ.png</a>
---

In [None]:
# TODO: Execute grayscale and color histograms on the
# bacterial colony images
