### Import packages

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import image
from PIL import Image
import cv2

### Loading data train.csv

In [None]:
df_train = pd.read_csv('../input/tensorflow-great-barrier-reef/train.csv')

### Dataframe description
Metadata for each image in the training set indexed by the unique image ids, comprising both sequence and bounding box information.
* video_id - ID number of the video the image was part of. The video ids are not meaningfully ordered.
* video_frame - The frame number of the image within the video. Expect to see occasional gaps in the frame number from when the diver surfaced.
* sequence - ID of a gap-free subset of a given video. The sequence ids are not meaningfully ordered.
* sequence_frame - The frame number within a given sequence.
* image_id - ID code for the image, in the format '{video_id}-{video_frame}'
* annotations - The bounding boxes of any starfish detections in a string format that can be evaluated directly with Python. Does not use the same format as the predictions you will submit. Not available in test.csv. A bounding box is described by the pixel coordinate (x_min, y_min) of its upper left corner within the image together with its width and height in pixels.
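
Because `annotations` is stored as a string, it has to be parsed before the boxes can be used. The following is a minimal sketch, assuming each entry is a Python-literal list of dicts with the keys `x`, `y`, `width` and `height`. The helper name `parse_boxes` is hypothetical, and the sample string is typed in by hand rather than read from `train.csv`.

```python
import ast

def parse_boxes(annotation_str):
    """Parse an annotation string into (x_min, y_min, x_max, y_max) tuples."""
    # literal_eval only accepts Python literals, so it is safer than eval().
    boxes = ast.literal_eval(annotation_str)
    return [
        (b["x"], b["y"], b["x"] + b["width"], b["y"] + b["height"])
        for b in boxes
    ]

# Hand-written sample in the assumed annotation format.
sample = "[{'x': 559, 'y': 213, 'width': 50, 'height': 32}]"
corners = parse_boxes(sample)
print(corners)  # [(559, 213, 609, 245)]

# A frame without detections is assumed to hold an empty list.
no_boxes = parse_boxes("[]")
print(no_boxes)  # []
```

Converting to corner coordinates is convenient because drawing functions such as `cv2.rectangle` take two opposite corners rather than a width and a height.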

A slice of the data (rows 15 to 19).

In [None]:
df_train[15:20]

The last five rows of the data.

In [None]:
df_train.tail()

Showing the first image.

In [None]:
img_0 = plt.imread("../input/tensorflow-great-barrier-reef/train_images/video_0/0.jpg")
plt.imshow(img_0)
plt.show()

Color mode

In [None]:
image_0 = Image.open("../input/tensorflow-great-barrier-reef/train_images/video_0/0.jpg")
print(image_0.mode)

Data type and dimensions (height, width, channels)

In [None]:
print(type(img_0))
print(img_0.dtype)
print(img_0.shape)

### The structure of an image

The smallest unit of an image is the pixel. In an RGB image, each pixel is represented by three 8-bit numbers, one for each of the Red, Green and Blue channels, and each number ranges from 0 to 255. The color of a pixel is the combination of these three channel intensities.
Within a channel, 0 means no intensity and 255 means full intensity. A pixel with all three channels at 0 is black, and a pixel with all three at 255 is white.

As seen in the cell above, the image is stored as a NumPy array (<class 'numpy.ndarray'>), so we can inspect its values directly.

In [None]:
print(img_0)

Each innermost row of the array printed above holds 3 values. Each of these rows represents one pixel, and the values are the intensities of its red, green and blue channels.
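
To make the layout concrete, here is a small sketch with a made-up 2x2 RGB image (the array `tiny` is hypothetical and not part of the dataset). It shows how pixels and channels are indexed in an array of shape (height, width, channels).

```python
import numpy as np

# A hypothetical 2x2 RGB image: shape is (height, width, channels).
tiny = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

print(tiny.shape)   # (2, 2, 3)
print(tiny[0, 0])   # top-left pixel: full red, no green, no blue
print(tiny[1, 1])   # bottom-right pixel: all channels at 255, i.e. white

# Slicing the last axis extracts a single channel as a 2-D array.
red_channel = tiny[:, :, 0]
print(red_channel)
```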

### Histogram equalization

A common preprocessing step is histogram equalization, which redistributes pixel intensities to improve contrast. Before equalizing, it helps to inspect the histogram of the image.

The histogram below counts how many values fall at each intensity from 0 to 255, pooling all three color channels.

In [None]:
plt.hist(img_0.ravel(), 256, [0, 256])
plt.show()

Now, let's see how the histograms are divided into the blue, green and red channels. With this separation we can better understand the composition and distribution of colors.

In [None]:
# plt.imread returns the channels in RGB order (cv2.imread would return BGR)
red, green, blue = cv2.split(img_0)

In [None]:
plt.figure(figsize=(20,5))
plt.subplot(131)
plt.hist(blue.ravel(), 256, [0, 256])
plt.title('Blue histogram')

plt.subplot(132)
plt.hist(green.ravel(), 256, [0, 256])
plt.title('Green histogram')

plt.subplot(133)
plt.hist(red.ravel(), 256, [0, 256])
plt.title('Red histogram')

plt.show()

### Converting color scale

For some image processing algorithms, converting the image to grayscale is one of the first steps. Below we can see the original image, its grayscale version, and the equalized grayscale image.

In [None]:
image_0_gray = cv2.imread("../input/tensorflow-great-barrier-reef/train_images/video_0/0.jpg", cv2.IMREAD_GRAYSCALE)
image_0_eq_hist = cv2.equalizeHist(image_0_gray)
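
For intuition, grayscale conversion is commonly defined as a weighted sum of the three channels. The sketch below uses the luma weights 0.299, 0.587 and 0.114 for R, G and B. The helper `rgb_to_gray` is hypothetical, and decoding a JPEG directly as grayscale may give slightly different values than this formula.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an (H, W, 3) uint8 RGB array to an (H, W) uint8 grayscale array."""
    weights = np.array([0.299, 0.587, 0.114])  # R, G, B luma weights
    gray = rgb.astype(np.float64) @ weights    # weighted sum over the channel axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# One row with a white, a black and a pure red pixel.
pixels = np.array([[[255, 255, 255], [0, 0, 0], [255, 0, 0]]], dtype=np.uint8)
gray = rgb_to_gray(pixels)
print(gray)  # white stays 255, black stays 0, pure red becomes 76
```

Green gets the largest weight because the eye is most sensitive to it, so a pure green pixel ends up brighter in grayscale than a pure red or pure blue one.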

In [None]:
plt.figure(figsize=(20,5))

plt.subplot(131)
plt.imshow(img_0)
plt.title('Original image')

plt.subplot(132)
plt.imshow(image_0_gray, cmap=plt.cm.gray)
plt.title('Grayscale image')

plt.subplot(133)
plt.imshow(image_0_eq_hist, cmap=plt.cm.gray)
plt.title('Image with the equalizer function')

plt.show()

The histograms below show the distribution of dark, medium and light gray tones before and after equalization.

In [None]:
plt.figure(figsize=(20, 5))

plt.subplot(121)
plt.hist(image_0_gray.ravel(), 256, [0, 256])
plt.title('Grayscale histogram')

plt.subplot(122)
plt.hist(image_0_eq_hist.ravel(), 256, [0, 256])
plt.title('Equalized image histogram')

plt.show()
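
The idea behind equalization can be sketched in plain NumPy: build the cumulative histogram and use it as a lookup table, so that the occupied gray levels spread over the full 0 to 255 range. This is the textbook formulation, written as a hypothetical helper `equalize_hist`. It is meant as an illustration and is not guaranteed to match `cv2.equalizeHist` value for value.

```python
import numpy as np

def equalize_hist(gray):
    """Textbook histogram equalization for a 2-D uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]      # CDF value at the darkest level present
    total = gray.size
    if total == cdf_min:           # constant image: nothing to spread out
        return gray.copy()
    # Rescale the CDF to 0..255 and use it as a lookup table.
    lut = np.rint((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]

# A made-up low-contrast image whose values all lie between 100 and 103.
low_contrast = np.array([[100, 100, 101], [101, 102, 103]], dtype=np.uint8)
equalized = equalize_hist(low_contrast)
print(equalized.min(), equalized.max())  # 0 255
```

The ordering of the gray levels is preserved. Only the spacing between them changes, which is why the equalized histogram looks stretched rather than reshuffled.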

### Image treatment

#### Blur smoothing filter

One of the most common treatments when working with images is smoothing, which reduces unwanted graininess (noise) in the image.
Noise is very common in images captured in low-light environments, for example.

In [None]:
filtered_image = cv2.blur(img_0, (3, 3))

The OpenCV function above applies an averaging filter to the original image. The argument (3, 3) is the kernel size: each pixel is replaced by the mean of the 3x3 neighborhood around it. Below we can see the original image, which has a certain level of noise, and the smoothed version. With such a small kernel the noise is reduced at the cost of only a slight loss of sharpness and detail.
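
To illustrate what the averaging does, here is a minimal NumPy sketch of a mean filter on a single-channel image. The helper `box_blur` is hypothetical. It pads the borders by repeating edge values, which is an assumption made for simplicity, since `cv2.blur` uses a reflected border by default.

```python
import numpy as np

def box_blur(gray, k=3):
    """Mean filter: replace each pixel by the average of its k x k neighborhood."""
    pad = k // 2
    # Assumption: borders are padded by repeating the edge values.
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    height, width = gray.shape
    out = np.zeros((height, width), dtype=np.float64)
    # Sum the k*k shifted copies of the image, then divide by the window size.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (k * k)

# A single bright "noise" pixel on a dark background.
noisy = np.zeros((5, 5))
noisy[2, 2] = 9.0
smoothed = box_blur(noisy)
print(smoothed[2, 2])  # 1.0: the spike is spread over its 3x3 neighborhood
```

The isolated spike drops from 9 to 1, which is the noise reduction we want. A sharp edge would be flattened in the same way, which is why larger kernels cost more detail.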

In [None]:
plt.figure(figsize=(20,5))

plt.subplot(121)
plt.imshow(img_0)
plt.title('Original image')

plt.subplot(122)
plt.imshow(filtered_image)
plt.title('Filtered image')

plt.show()

#### Gaussian filter

Another widely used filter is the Gaussian.

In [None]:
filtered_image_gaus = cv2.GaussianBlur(img_0, (5, 5), 2)

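In the case of the Gaussian filter, we pass a third parameter, sigmaX, which is the standard deviation of the Gaussian along the x axis. When sigmaY is not given, it is set equal to sigmaX. Unlike the averaging filter, the Gaussian filter gives more weight to pixels close to the center of the kernel.
Below we can see the original image and the filtered image. The result is blurrier than with the blur function above, because a larger kernel (5x5 instead of 3x3) and a standard deviation of 2 were used.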
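
The weights of a Gaussian filter can be sketched as follows. This is an illustrative construction of a normalized 5x5 kernel with a standard deviation of 2, using a hypothetical helper `gaussian_kernel`. It is not taken from the OpenCV source.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=2.0):
    """Build a normalized size x size Gaussian kernel."""
    # Offsets from the kernel center, e.g. [-2, -1, 0, 1, 2] for size 5.
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()              # 1-D weights sum to 1
    return np.outer(g, g)     # separable: the 2-D kernel is the outer product

kernel = gaussian_kernel(5, 2.0)
print(np.round(kernel, 4))
print(kernel.sum())  # approximately 1, so overall brightness is preserved
```

A larger sigma flattens the weights toward a plain average, and a smaller sigma concentrates them on the center pixel and blurs less.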

In [None]:
plt.figure(figsize=(20,5))

plt.subplot(121)
plt.imshow(img_0)
plt.title('Original image')

plt.subplot(122)
plt.imshow(filtered_image_gaus)
plt.title('Filtered image')

plt.show()

### Image Rotation

Rotating the image is also a common technique, often used as data augmentation when training image recognition models.
Here the image is rotated by 10 degrees.

In [None]:
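row, column, canal = img_0.shape  # shape is (height, width, channels)

# The rotation center is given as (x, y), that is (column, row)
rotation_matrix = cv2.getRotationMatrix2D((column/2, row/2), 10, 1)

# The output size is also given as (width, height)
rotated_image = cv2.warpAffine(img_0, rotation_matrix, (column, row))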

fig = plt.figure(figsize=(10,8))
plt.imshow(rotated_image)
plt.show()
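
The 2x3 matrix used for the rotation can be written out by hand. The sketch below follows the formula documented for `cv2.getRotationMatrix2D`, with a hypothetical helper `rotation_matrix` and an assumed frame size of 1280x720. It also checks that the center of rotation is mapped onto itself.

```python
import math
import numpy as np

def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 affine matrix rotating by angle_deg around center = (x, y)."""
    cx, cy = center
    theta = math.radians(angle_deg)
    alpha = scale * math.cos(theta)
    beta = scale * math.sin(theta)
    return np.array([
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])

height, width = 720, 1280            # assumed frame size, for illustration only
center = (width / 2, height / 2)     # (x, y) order: column first, then row
M = rotation_matrix(center, 10)

# Apply the matrix to the center in homogeneous coordinates (x, y, 1).
mapped_center = M @ np.array([center[0], center[1], 1.0])
print(mapped_center)  # the center stays where it was
```

Passing the center as (row/2, column/2) instead would rotate the image around a point that is off-center for any non-square frame.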

Drawing a bounding box on a frame.

In [None]:
# Copy the array so it is writable and can be drawn on in place
image_16 = plt.imread("../input/tensorflow-great-barrier-reef/train_images/video_0/1011.jpg").copy()

In [None]:
# Bounding box: 'x': 559, 'y': 213, 'width': 50, 'height': 32
red = (255, 0, 0)  # the image is in RGB order, so red is (255, 0, 0)
cv2.rectangle(image_16, (559, 213), (559 + 50, 213 + 32), red, 2)
fig = plt.figure(figsize=(10,8))
plt.imshow(image_16)
plt.show()