Our workflow is as follows:
- Preprocessing:
    1. Read the image.
    2. Convert the image to grayscale.
    3. Resize the image to 28x28.
    4. Extract HOG descriptors.
    5. Flatten the image.
    6. Append the image to a Python list.
- Data Modeling:
    1. Create labels.
    2. Separate data into training and testing.
    3. Train model.
    4. Test model. 
    
    
    In case you want to look like a pro:
    3. Benchmark three algorithms.
    4. Select the best model.
    
# Now let's perform each step on our hypothetical problem

In [103]:
# Import statements

from skimage.io import imread_collection, imread
from skimage.color import rgb2gray
from skimage.transform import rescale, resize, downscale_local_mean
from skimage.feature import hog
from matplotlib import pyplot as plt
from skimage import exposure
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report


### Preprocessing

In [39]:
# For balloons (first class, label 0)
col_dir = 'ballons/*.jpg'

# Create a collection from the images available in our folder
col = imread_collection(col_dir)
# col holds all our images; we can access them by index, e.g. col[0]
ballon_set = list()
# Apply all the steps to each image in col
for image in col:
    img = rgb2gray(image) # convert to grayscale
    img = resize(img, (28,28)) # resize
    # HOG: the input is already grayscale, so no channel argument is needed
    fd, hog_image = hog(img, orientations=8, pixels_per_cell=(16, 16),
                    cells_per_block=(1, 1), visualize=True)
    hog_image_rescaled = exposure.rescale_intensity(hog_image, in_range=(0, 100))
    flat_image = hog_image_rescaled.flatten() # flatten the HOG image
    ballon_set.append(flat_image)

In [40]:
# For not-balloons (second class, label 1)
col_dir = 'notballons/*.png'

# Create a collection from the images available in our folder
col = imread_collection(col_dir)
# col holds all our images; we can access them by index, e.g. col[0]
notballon_set = list()
# Apply all the steps to each image in col
for image in col:
    img = rgb2gray(image) # convert to grayscale
    img = resize(img, (28,28)) # resize
    # HOG: the input is already grayscale, so no channel argument is needed
    fd, hog_image = hog(img, orientations=8, pixels_per_cell=(16, 16),
                    cells_per_block=(1, 1), visualize=True)
    hog_image_rescaled = exposure.rescale_intensity(hog_image, in_range=(0, 100))
    flat_image = hog_image_rescaled.flatten() # flatten the HOG image
    notballon_set.append(flat_image)
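
The two preprocessing cells repeat the same steps, so they could be folded into one helper. The sketch below is one way to do that, not the notebook's own method. `extract_features` is a hypothetical name, and random arrays stand in for the image files. It also makes two different choices, both assumptions: it keeps the descriptor vector `fd` returned by `hog` instead of flattening the visualization image, and it uses 7x7-pixel cells so that a 28x28 image is split into a 4x4 grid of cells.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize


def extract_features(images, size=(28, 28)):
    """Return one HOG descriptor per image as a 2-D array (hypothetical helper)."""
    features = []
    for image in images:
        # Convert RGB input to grayscale; leave 2-D images untouched
        gray = rgb2gray(image) if image.ndim == 3 else image
        gray = resize(gray, size)
        # Without visualize=True, hog returns only the descriptor vector
        fd = hog(gray, orientations=8, pixels_per_cell=(7, 7),
                 cells_per_block=(1, 1))
        features.append(fd)
    return np.array(features)


# Stand-in data: two random RGB images instead of files on disk
rng = np.random.default_rng(0)
fake_images = [rng.random((64, 64, 3)) for _ in range(2)]
features = extract_features(fake_images)
print(features.shape)
```

With real files, the same function could be called on a collection, for example `extract_features(imread_collection('ballons/*.jpg'))`.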

# Data modeling

Now we have two Python lists that hold our flattened images. We need to create a label for each class so we know which images are balloons and which are not. We will label them as follows:
1. class balloon = 0
2. class not-balloon = 1

## Creating labels

In [48]:
# we create two vectors, one for each class; the size of each vector = number of images
ballon = np.zeros(10) # because I have 10 balloon images
notballon = np.ones(10)

In [57]:
# this is how it looks, but I need to turn it into a column
ballon

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [68]:
# let's make them into one long vector
label = np.hstack((ballon, notballon))
label

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1.])

In [70]:
# let's make this into a vertical (column) vector
label = label.reshape((-1, 1))
label

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.]])
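
The label vector can also be built from the lengths of the two lists, so the counts do not have to be typed by hand. In this sketch, `ballon_set` and `notballon_set` are placeholder lists standing in for the ones built during preprocessing. scikit-learn estimators accept a 1-D label array, so the column reshape is optional.

```python
import numpy as np

# Placeholders for the feature lists built during preprocessing
ballon_set = [np.zeros(784) for _ in range(10)]
notballon_set = [np.ones(784) for _ in range(10)]

# One label per image: 0 for ballon, 1 for not-ballon
label = np.concatenate((
    np.zeros(len(ballon_set), dtype=int),
    np.ones(len(notballon_set), dtype=int),
))

print(label.shape)  # (20,)
print(label)
```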

In [71]:
# now let's get our images together as well

images_set = np.vstack((ballon_set, notballon_set))

In [73]:
# sanity check
# we expect 20 images and 20 labels

print('the number of images = ', len(images_set))
print('the number of labels = ', len(label))

the number of images =  20
the number of labels =  20


## Creating train and test sets

In [86]:

xtrain, xtest, ytrain, ytest = train_test_split(images_set, label, test_size=0.3, random_state=42, shuffle=True)


In [94]:
ytrain = ytrain.flatten()
ytest = ytest.flatten()
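
With only 20 images, a plain random split can leave the classes unevenly represented in the test set. Passing `stratify` to `train_test_split` keeps the class proportions the same in both parts. The sketch below uses random placeholder features rather than the real images, so only the class counts are meaningful.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 20 random feature rows, 10 per class
rng = np.random.default_rng(42)
images_set = rng.random((20, 784))
label = np.array([0] * 10 + [1] * 10)

xtrain, xtest, ytrain, ytest = train_test_split(
    images_set, label, test_size=0.3, random_state=42, stratify=label)

print(np.bincount(ytrain))  # class counts in the training set
print(np.bincount(ytest))   # class counts in the test set
```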

# Train a model

We will use a linear SVM for this dummy project.

Read about the LinearSVC classifier: https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html

In [95]:
# initialize the classifier (Support Vector Classifier)
clf = LinearSVC(random_state=0, tol=1e-5)

# Training
clf.fit(xtrain, ytrain)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=0, tol=1e-05, verbose=0)

In [98]:
ypredtrain = clf.predict(xtrain)
ypredtest = clf.predict(xtest)

In [100]:
training_accuracy = accuracy_score(ytrain, ypredtrain)
testing_accuracy = accuracy_score(ytest, ypredtest)

In [102]:
print('Training accuracy = {0:.2f}%'.format(training_accuracy*100))
print('Testing accuracy = {0:.2f}%'.format(testing_accuracy*100))

Training accuracy = 57.14%
Testing accuracy = 33.33%
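
Accuracy alone does not show which class the model gets wrong; a confusion matrix does. The arrays below are hypothetical stand-ins rather than the real predictions: six test labels with class 1 predicted every time, which is one pattern that produces 33.33% accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical stand-ins: 4 test images of class 0, 2 of class 1,
# and a classifier that predicts class 1 for every image
ytest = np.array([0, 0, 0, 0, 1, 1])
ypredtest = np.ones(6, dtype=int)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(ytest, ypredtest)
print(cm)
print('Accuracy = {0:.2f}%'.format(accuracy_score(ytest, ypredtest) * 100))
```

A first column of zeros means no image was ever predicted as class 0.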


# OK, I know it sucks, but these images are not even real. I hope you can now see the workflow.

In [109]:

training_report = classification_report(ytrain, ypredtrain)
testing_report = classification_report(ytest, ypredtest)

print("Training classification Report\n\n", training_report)
print("\n\nTesting classification Report\n\n", testing_report)

Training classification Report

               precision    recall  f1-score   support

         0.0       0.00      0.00      0.00         6
         1.0       0.57      1.00      0.73         8

   micro avg       0.57      0.57      0.57        14
   macro avg       0.29      0.50      0.36        14
weighted avg       0.33      0.57      0.42        14



Testing classification Report

               precision    recall  f1-score   support

         0.0       0.00      0.00      0.00         4
         1.0       0.33      1.00      0.50         2

   micro avg       0.33      0.33      0.33         6
   macro avg       0.17      0.50      0.25         6
weighted avg       0.11      0.33      0.17         6
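
The optional steps from the workflow, benchmarking three algorithms and selecting the best one, are not carried out in this notebook. A minimal sketch of how they might look is given here, assuming synthetic data from `make_classification` in place of the image features. The three models are an arbitrary choice for illustration, and cross-validated accuracy is only one possible selection criterion.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for the flattened image features and labels
X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# Three candidate models (an arbitrary choice for illustration)
models = {
    'LinearSVC': LinearSVC(random_state=0, max_iter=10000),
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'KNeighbors': KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy per model
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}

for name, score in scores.items():
    print('{0}: {1:.2f}'.format(name, score))

# Pick the model with the highest mean accuracy
best_name = max(scores, key=scores.get)
print('Best model:', best_name)
```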

