Dataset -> Load the image -> Preprocess (grayscale, resize, normalize) -> Feature extraction -> Dataset matrix -> Train the model -> Predict -> Evaluate

## 1. Load the image

In [1]:
from PIL import Image
import numpy as np

img = Image.open("img1.jpg")
img.show()

## 2. Preprocess the image

### 2.1 Convert to grayscale
- A color image is made of pixels, and each pixel stores three channels: red, green, and blue (RGB), i.e. [R amount, G amount, B amount].
- A grayscale image stores a single channel per pixel: its brightness, an integer from 0 (black) to 255 (white).
- Converting to grayscale therefore cuts the data per pixel to one third, which makes preprocessing and training faster. The trade-off is that color information is discarded.
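
To make the channel reduction concrete, here is a minimal NumPy sketch on a hypothetical 2x2 RGB array. It assumes the ITU-R 601-2 luma weights that Pillow documents for `convert("L")`; the array values and variable names are illustrative, and Pillow's own rounding may differ slightly from this floating-point version.

```python
import numpy as np

# Hypothetical 2x2 RGB image: shape (height, width, channels)
rgb = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.float32,
)

# Luma weights for R, G, B (ITU-R 601-2); assumed to match convert("L")
weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)

# Weighted sum over the channel axis leaves one value per pixel
gray = rgb @ weights

print(rgb.shape)   # (2, 2, 3)
print(gray.shape)  # (2, 2)
```

The grayscale array holds one third as many numbers as the RGB array, which is where the speed-up comes from.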

In [2]:
img = img.convert("L")
img.show()

### 2.2 Resize the image
- Resize every image to 128x128 pixels so that each one produces a feature vector of the same length.

In [3]:
img = img.resize((128, 128))
img.show()

### 2.3 Convert the image into grid/matrix

- Once the image is resized, we convert it to a NumPy array so that numerical operations can be run on its brightness values.
- The array has one row per horizontal line of pixels and one column per vertical line of pixels.
- Each cell holds the brightness value of the pixel at that position.
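
The row and column layout can be sketched with a small hypothetical array (`demo` and its values are made up for illustration; the real `img_array` has shape `(128, 128)`):

```python
import numpy as np

# Hypothetical 3x4 grayscale image (3 rows, 4 columns)
demo = np.array(
    [[10, 20, 30, 40],
     [50, 60, 70, 80],
     [90, 100, 110, 120]],
    dtype=np.uint8,
)

print(demo.shape)   # (3, 4): (rows, columns) = (height, width)
print(demo[0])      # first row: [10 20 30 40]
print(demo[:, 0])   # first column: [10 50 90]
print(demo[1, 2])   # row 1, column 2: 70
```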

In [4]:
img_array = np.array(img)
print(img_array)

[[120 123 128 ... 121 120 127]
 [135 135 134 ... 132 143 152]
 [150 150 149 ... 155 163 156]
 ...
 [ 18  58  62 ...  58  23   5]
 [ 16  62  64 ...  58  27   3]
 [ 15  59  65 ...  56  30   4]]


### 2.4 Normalize the array
- Cast the integer array to float32 and divide by 255 so that every value lies between 0 and 1.

In [5]:
img_array = img_array.astype("float32")/255.0
print(img_array)

[[0.47058824 0.48235294 0.5019608  ... 0.4745098  0.47058824 0.49803922]
 [0.5294118  0.5294118  0.5254902  ... 0.5176471  0.56078434 0.59607846]
 [0.5882353  0.5882353  0.58431375 ... 0.60784316 0.6392157  0.6117647 ]
 ...
 [0.07058824 0.22745098 0.24313726 ... 0.22745098 0.09019608 0.01960784]
 [0.0627451  0.24313726 0.2509804  ... 0.22745098 0.10588235 0.01176471]
 [0.05882353 0.23137255 0.25490198 ... 0.21960784 0.11764706 0.01568628]]
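
As a quick check of the arithmetic, the sketch below scales the first three pixel values from the integer matrix printed earlier (120, 123, 128). The variable names are illustrative only.

```python
import numpy as np

# First three pixel values from the integer matrix printed earlier
pixels = np.array([120, 123, 128], dtype=np.uint8)

# Cast first, then divide, so the result stays float32
scaled = pixels.astype("float32") / 255.0

print(scaled)        # approximately [0.4706 0.4824 0.5020]
print(scaled.dtype)  # float32
```

For example, 120 / 255 ≈ 0.4706, which matches the first entry of the normalized output.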


## 3. Feature Extraction

### 3.1 Flatten the 2D img_array into a 1D array
- We flatten the 2D array into a 1D array so that each image is represented by a single feature vector, to which summary statistics (mean, standard deviation, minimum and maximum brightness) can then be appended.
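
A minimal sketch of what flattening does, using a hypothetical 2x3 array in place of the real image:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# flatten() returns a 1D copy, reading the rows one after another
flat = grid.flatten()

print(flat)        # [1 2 3 4 5 6]
print(flat.shape)  # (6,)

# A 128x128 image therefore yields 128 * 128 values
print(128 * 128)   # 16384
```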

In [6]:
features = img_array.flatten()
print(features)

[0.47058824 0.48235294 0.5019608  ... 0.21960784 0.11764706 0.01568628]


### 3.2 Compute summary statistics and append them to the feature vector

In [9]:
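# Summary statistics of the pixel values
mean = features.mean()
std = features.std()
min_val = features.min()
max_val = features.max()
# Append the four statistics to the end of the pixel vector
features = np.concatenate([features, [mean, std, min_val, max_val]])
print(features)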

[0.47058824 0.48235294 0.5019608  ... 0.21838112 0.         0.9882353 ]
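
Steps 3.1 and 3.2 could be wrapped in one function so they can be reused for every image. The sketch below is one possible way to do that; `extract_features` is a hypothetical helper name, and a random array stands in for a real normalized image.

```python
import numpy as np

def extract_features(img_array):
    """Flatten a 2D image and append mean, std, min and max (hypothetical helper)."""
    pixels = img_array.flatten()
    stats = np.array(
        [pixels.mean(), pixels.std(), pixels.min(), pixels.max()],
        dtype=pixels.dtype,
    )
    return np.concatenate([pixels, stats])

# Random stand-in for a normalized 128x128 image
rng = np.random.default_rng(0)
demo_img = rng.random((128, 128), dtype=np.float32)

vec = extract_features(demo_img)
print(vec.shape)  # (16388,) = 128 * 128 pixels + 4 statistics
```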


### 3.3 Reshape the 1D array to 2D, because the classifier expects [number of images, number of features]

In [10]:
X = features.reshape(1, -1)  # shape (1, number of features): one row per image
Y = [1]  # class label for this single image
print(X)
print(Y)

[[0.47058824 0.48235294 0.5019608  ... 0.21838112 0.         0.9882353 ]]
[1]
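
With more than one image, the same idea extends to the dataset matrix from the pipeline: one row per image, one column per feature. The sketch below assumes three images and uses random vectors and made-up labels as stand-ins; `X_all`, `y_all`, and `feature_vectors` are illustrative names.

```python
import numpy as np

# Stand-ins for three per-image feature vectors of equal length
n_features = 128 * 128 + 4
rng = np.random.default_rng(0)
feature_vectors = [rng.random(n_features, dtype=np.float32) for _ in range(3)]

# Hypothetical labels, one per image
labels = [1, 0, 1]

X_all = np.vstack(feature_vectors)  # one row per image
y_all = np.array(labels)

print(X_all.shape)  # (3, 16388)
print(y_all.shape)  # (3,)
```

Stacking only works because every image was resized to the same dimensions, so all feature vectors have the same length.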
