# ENES 100 ML Lesson
Start this lesson by making a copy of this notebook (just like you would a Google Document) so you can make edits. Click File, then "Save a copy in Drive".

#**<font color='red'>IMPORTANT REQUIRED FIRST STEP - CHANGE THE INSTANCE TYPE</font>**
**Efficient machine learning needs a GPU for computations. To enable the GPU, click Runtime / Change runtime type, then click "T4 GPU", then "Save".**

## Thumbs Up / Down Detection
In this lesson, you will train a machine learning model to determine whether a given image has a thumbs up or a thumbs down present in the image. You will gather and label a *training dataset* (aka, pictures of thumbs up and of thumbs down) and ask the model to infer what exactly makes up a thumbs up and thumbs down.

The goal of this lesson is not to figure out how machine learning works, but to get a sense of the problems you could apply machine learning to in this course.

In [1]:
import torch
use_cuda = torch.cuda.is_available()

In [2]:
# This Google Colab notebook requires many libraries, aka other pieces of code.
# Some are builtin to python, some are used for image and data manipulation,
# some are specific to machine learning, and some are used to craft this Google
# colab notebook.

# builtin to python
import io
import time
import os
import glob
import subprocess
import uuid
from base64 import b64decode, b64encode

# image and data manipulation
import numpy as np
import cv2
import PIL

# machine learning tools
import torchvision
import torch.utils.data
import torchvision.transforms as transforms
import torch.nn.functional as F

# notebook display and widgets
from IPython.display import display, Javascript, Image
from IPython import core
import ipywidgets
import traitlets

# Categories
Use this cell to define the categories of your model, i.e., the possible categories an image can be placed in. In this example, we are trying to see whether an image shows a thumbs up or a thumbs down, so the categories are "thumbs_up" and "thumbs_down".

When you change these, make sure to rerun all cells. (Runtime -> Run All or Ctrl+F9)
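
As a small illustration of how these names are used later: the dataset class identifies each category by its position in the list, so `thumbs_up` becomes label 0 and `thumbs_down` becomes label 1. The sketch below mirrors that lookup; `label_for` is a hypothetical helper for illustration, not part of this notebook.

```python
CATEGORIES = ['thumbs_up', 'thumbs_down']

def label_for(category, categories=CATEGORIES):
    """Return the integer label for a category name (its position in the list)."""
    return categories.index(category)

print(label_for('thumbs_up'))    # position 0 in the list
print(label_for('thumbs_down'))  # position 1 in the list
```

A saved model's outputs follow the order used during training, so keep the order of the list unchanged if you plan to reload that model.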

In [3]:
CATEGORIES = ['thumbs_up','thumbs_down']

# Dataset Definition

Your model will be trained using a dataset, images that this program will use as examples to train your model. Here we define a *class* (if you have programmed before you might be familiar with this concept) that represents a dataset object.
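
The class below expects one sub-folder per category inside the dataset directory, each holding `.jpg` files. The following is a minimal sketch of that folder scan using only the standard library; `list_annotations` is a hypothetical stand-alone version of the `refresh` method, and the temporary folder and file names are made up for the demonstration.

```python
import glob
import os
import tempfile

def list_annotations(directory, categories):
    """Collect one record per .jpg file found under directory/<category>/."""
    annotations = []
    for category_index, category in enumerate(categories):
        pattern = os.path.join(directory, category, '*.jpg')
        for image_path in sorted(glob.glob(pattern)):
            annotations.append({
                'image_path': image_path,
                'category_index': category_index,
                'category': category,
            })
    return annotations

# Build a throwaway dataset folder with empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for category, count in [('thumbs_up', 2), ('thumbs_down', 1)]:
        os.makedirs(os.path.join(root, category))
        for n in range(count):
            open(os.path.join(root, category, f'img{n}.jpg'), 'w').close()

    demo_annotations = list_annotations(root, ['thumbs_up', 'thumbs_down'])
    demo_labels = [a['category_index'] for a in demo_annotations]
    print(demo_labels)
```

Each record pairs an image path with the integer label the model will be trained to predict.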

In [4]:
class ImageClassificationDataset(torch.utils.data.Dataset):
    def __init__(self, directory, categories, transform=None):
        self.categories = categories
        self.directory = directory
        self.transform = transform
        self.refresh()

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        # cv2 loads pixels in BGR order; predict() feeds BGR frames too, so training and prediction match.
        image = cv2.imread(ann['image_path'], cv2.IMREAD_COLOR)
        image = PIL.Image.fromarray(image)
        if self.transform is not None:
            image = self.transform(image)
        return image, ann['category_index']

    def refresh(self):
        self.annotations = []
        for category in self.categories:
            category_index = self.categories.index(category)
            for image_path in glob.glob(os.path.join(self.directory, category, '*.jpg')):
                self.annotations += [{
                    'image_path': image_path,
                    'category_index': category_index,
                    'category': category
                }]

    def save_entry(self, image, category):
        """Saves an image in BGR8 format to dataset for category"""
        if category not in self.categories:
            raise KeyError('There is no category named %s in this dataset.' % category)

        filename = str(uuid.uuid1()) + '.jpg'
        category_directory = os.path.join(self.directory, category)

        # Create the category folder if it does not exist yet.
        os.makedirs(category_directory, exist_ok=True)

        image_path = os.path.join(category_directory, filename)
        #print(image.type())
        cv2.imwrite(image_path, image)
        self.refresh()
        return image_path

    def get_count(self, category):
        i = 0
        for a in self.annotations:
            if a['category'] == category:
                i += 1
        return i

# Configuration Options
Here the code sets up some important configuration options. Take some time to understand these options, because you will need to configure them when applying this model to your ENES 100 project.

## Transforms
An image straight from a camera isn't optimal for training / evaluating your model. Before training, we want to resize your image to a smaller size, normalize the pixels [(why?)](https://discuss.pytorch.org/t/what-does-it-mean-to-normalize-images-for-resnet/96160), and also randomly change the hue, saturation, or value across the whole image to make your model more robust.


The TRAIN_TRANSFORMS variable lists some edits that are made to each image in the training data. You can research each one in detail, but in short the transforms:

1.   Randomly apply a color jitter.
2.   Resize the image to 224 by 224.
3.   Convert the image to a tensor (the array format the model works with).
4.   Normalize the colors of the image.

When evaluating your model, you don't want to apply any randomness to the input image - just evaluate it as is.
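
To make the normalization step concrete: `ToTensor` scales each color channel to the range 0 to 1, and `Normalize` then subtracts a per-channel mean and divides by a per-channel standard deviation. The sketch below repeats that arithmetic in plain Python for a single pixel, using the mean and standard deviation values from the code cell below; `normalize_pixel` is a hypothetical helper for illustration.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(pixel_0_to_255):
    """Scale an 8-bit pixel to 0..1, then apply (value - mean) / std per channel."""
    scaled = [channel / 255.0 for channel in pixel_0_to_255]
    return [(value - m) / s for value, m, s in zip(scaled, MEAN, STD)]

mid_gray = normalize_pixel([128, 128, 128])
print([round(v, 3) for v in mid_gray])
```

After this step, typical pixel values are centered near zero, which matches the inputs the pretrained resnet18 weights were trained on.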

In [13]:
TRAIN_TRANSFORMS = transforms.Compose([
    transforms.ColorJitter(0.2, 0.2, 0.2, 0.2),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

EVALUATE_TRANSFORMS = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Name of the folder where your images are stored. No spaces!
activity = "thumbs"

dataset = ImageClassificationDataset(activity, CATEGORIES, TRAIN_TRANSFORMS)

# UI Code
This code creates the user interface, which is displayed after the last cell. It lets you click buttons instead of writing code!

Notably, this creates your model based on [resnet18](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet18.html), a common image classification network. (Here, classifying means labeling an image as thumbs up or thumbs down.) **Take a look at the line that sets 'model.fc'.** This sets the last layer of your model to have the same number of outputs as categories in your dataset. Each output is a raw score for one category; the prediction code later converts these scores into probabilities - how likely the model thinks it is that a given image fits each category.
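
The final layer produces one raw score per category, and the `predict` function later in this notebook converts those scores to probabilities with softmax. Here is a minimal plain-Python sketch of that conversion; the scores are made-up numbers, and `softmax` here is a hand-written stand-in for `F.softmax`.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

CATEGORIES = ['thumbs_up', 'thumbs_down']
example_scores = [2.0, 0.5]  # hypothetical model outputs, one per category
probabilities = softmax(example_scores)
best = CATEGORIES[probabilities.index(max(probabilities))]
print(probabilities, best)
```

The category with the largest score always ends up with the largest probability, so the predicted label is the same either way; softmax just makes the numbers easier to read.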

In [14]:
# Data Collection
category_widget = ipywidgets.Dropdown(options=dataset.categories, description='Category')
count_widget = ipywidgets.IntText(description='Count', disabled=True)
clear_training_data_button = ipywidgets.Button(description='Delete all images', tooltip="Removes all images from all categories. Does not affect model.")
count_widget.value = dataset.get_count(category_widget.value)

def update_counts(change):
    count_widget.value = dataset.get_count(change['new'])

def clear_training_data(_):
    import shutil
    # Delete the whole dataset folder; do nothing if it does not exist.
    shutil.rmtree(activity, ignore_errors=True)
    dataset.refresh()
    update_counts({'new': category_widget.value})

category_widget.observe(update_counts, names='value')
clear_training_data_button.on_click(clear_training_data)

data_collection_widget = ipywidgets.VBox([
   clear_training_data_button, category_widget, count_widget
])


device = torch.device("cuda" if use_cuda else "cpu")

model = torchvision.models.resnet18(weights='IMAGENET1K_V1')
model.fc = torch.nn.Linear(512, len(dataset.categories))
model = model.to(device)

model_save_button = ipywidgets.Button(description='save model')
model_load_button = ipywidgets.Button(description='load model')
model_path_widget = ipywidgets.Text(description='model path', value='my_model.pth')

def load_model(c):
    if not os.path.isfile(model_path_widget.value):
        print(f"File {model_path_widget.value} could not be found. Please check the directory.")
        return
    # map_location lets a model saved on a GPU load on a CPU-only runtime too.
    model.load_state_dict(torch.load(model_path_widget.value, map_location=device))
    print('loaded model')

def save_model(c):
    torch.save(model.state_dict(), model_path_widget.value)
    print('saved model')

model_load_button.on_click(load_model)
model_save_button.on_click(save_model)

model_widget = ipywidgets.VBox([
    model_path_widget,
    ipywidgets.HBox([model_load_button, model_save_button])
])

prediction_widget = ipywidgets.HTML("prediction: ")
score_widgets = []
for category in dataset.categories:
    score_widget = ipywidgets.FloatSlider(min=0.0, max=1.0, description=category, orientation='vertical', disabled=True)
    score_widgets.append(score_widget)

live_execution_widget = ipywidgets.VBox([
    ipywidgets.HBox(score_widgets),
    prediction_widget,
])

epochs_widget = ipywidgets.IntText(description='epochs/runs', value=10)
train_button = ipywidgets.Button(description='train')
loss_widget = ipywidgets.FloatText(description='loss')
accuracy_widget = ipywidgets.FloatText(description='accuracy')
progress_widget = ipywidgets.FloatProgress(min=0.0, max=1.0, description='progress')

# Training Loop

Here is where the magic happens! This section of code is commented heavily, so read it! The actual training is solely contained in these lines:


```python
if is_training:
    optimizer.zero_grad()
# TRAINING STEP 1: Run the model with the given images
outputs = model(images)
# TRAINING STEP 2: Compare our results with the correct labels. loss is an indication of how well we did
loss = F.cross_entropy(outputs, labels)
# TRAINING STEP 3: Minimize loss! Here's where the torch magic happens - in the background torch is calculating gradients (yes, calc 3 gradients) of massive matrices.
# PhD required for this stuff.
if is_training:
    loss.backward()
    optimizer.step()
```
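
For a sense of what the loss in step 2 measures, the sketch below computes cross-entropy by hand for a single image: apply softmax to the outputs, then take the negative log of the probability assigned to the correct label. The numbers are made up, and `cross_entropy` is a simplified single-example stand-in for `F.cross_entropy`, which by default averages this value over the batch.

```python
import math

def cross_entropy(scores, correct_label):
    """Negative log of the softmax probability given to the correct label."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    probability = exps[correct_label] / sum(exps)
    return -math.log(probability)

confident_and_right = cross_entropy([4.0, 0.0], correct_label=0)
confident_and_wrong = cross_entropy([4.0, 0.0], correct_label=1)
print(confident_and_right, confident_and_wrong)
```

A low loss means the model put most of its probability on the right answer; training adjusts the model's weights to push this number down.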


In [7]:
optimizer = torch.optim.Adam(model.parameters())

def train_model():
    global model  # reassigned at the end of this function; the other names are only read
    start_epochs = epochs_widget.value
    try:
        if len(dataset) == 0:
            raise Exception('Training failed: You must generate some training data before training!')

        # Train on 16 images at a time, randomly shuffled from the dataset.
        train_loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)
        train_button.disabled = True

        model.train(True) # Put the model in training mode (layers such as batch norm behave differently while training).
        while epochs_widget.value > 0: # One epoch iterates over the entire dataset once. Several epochs allow the model to learn more.
            i = 0
            sum_loss = 0.0
            error_count = 0.0
            for images, labels in iter(train_loader): # Iterate over the entire dataset, in batches of 16 images.
                # The images variable is a batch of image tensors. The labels variable is a list of numbers indicating whether each image is a thumbs up or a thumbs down.
                images = images.to(device)
                labels = labels.to(device)
                optimizer.zero_grad()
                # TRAINING STEP 1: Run the model with the given images
                outputs = model(images)
                # TRAINING STEP 2: Compare our results with the correct labels. loss is an indication of how well we did
                loss = F.cross_entropy(outputs, labels)
                # TRAINING STEP 3: Minimize loss! Here's where the torch magic happens - in the background torch is calculating gradients (yes, calc 3 gradients) of massive matrices.
                # PhD required for this stuff.
                loss.backward()
                optimizer.step()
                # How did we do? Of the 16 images, did we get anything wrong?
                error_count += len(torch.nonzero(outputs.argmax(1) - labels).flatten())
                count = len(labels.flatten())
                i += count
                # loss is already averaged over the batch, so weight it by the batch size before summing.
                sum_loss += float(loss) * count
                # Update the widgets to show training!
                progress_widget.value = i / len(dataset)
                loss_widget.value = sum_loss / i
                accuracy_widget.value = 1.0 - error_count / i

            epochs_widget.value = epochs_widget.value - 1
    except Exception as e:
        print('Exception while running train_model:')
        print(e)
    model = model.eval()
    epochs_widget.value = start_epochs
    train_button.disabled = False

train_button.on_click(lambda c: train_model())

train_eval_widget = ipywidgets.VBox([
    epochs_widget,
    progress_widget,
    loss_widget,
    accuracy_widget,
    train_button
])

# Webcam Streaming
To allow webcam streaming from the browser to the Google Colab server, we use the following JavaScript. It isn't relevant to the machine learning; it just lets Google Colab access your webcam.

If you know JS, feel free to check it out.

In [11]:
video_stream_html = '''
        <div style="width: min-content">
          <div style='display: flex; flex-direction: row; justify-content: center; gap: 10px; margin: 10px'>
            <button id="togglepredictions" class="lm-Widget p-Widget jupyter-widgets jupyter-button">Enable Live Predictions</button>
            <button id="capture" class="lm-Widget p-Widget jupyter-widgets jupyter-button">Capture Image</button>
          </div>
          <video id='webcam' style="display:block"></video>
        </div>
        <script>
          async function init() {
            const video = document.querySelector('#webcam');
            const button = document.querySelector('#togglepredictions');
            let interval = undefined;
            const stream = await navigator.mediaDevices.getUserMedia({video: true});
            video.srcObject = stream;
            await video.play();
            button.onclick = (e) => {
              e.preventDefault();
              if(interval) {
               clearInterval(interval);
               interval = undefined;
               button.textContent = 'Enable Live Predictions'
               return
              }
              interval = setInterval(predict, 500);
              button.textContent = 'Disable Live Predictions'
            };
            document.querySelector('#capture').onclick = () => {
                get_ipython().kernel.exec(capture( [getFrame()]))
            }
          }

          function predict() {
            get_ipython().kernel.exec(predict( [getFrame()]))
          }

          function getFrame() {
            // Returns the string representation of a frame
            const video = document.querySelector('#webcam');
            const canvas = document.createElement('canvas');
            canvas.width = video.videoWidth;
            canvas.height = video.videoHeight;
            canvas.getContext('2d').drawImage(video, 0, 0);
            return canvas.toDataURL('image/jpeg', .8);
          }
          init()
        </script>
        '''

# Predict / use the model
Here is the code required to evaluate the model on a certain image.
We start by getting the image into the same format as the training data. (Very important!) We then call the model on the image and display the probabilities that the model thinks the image fits into each category.
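
The browser sends each webcam frame as a data URL: a short header, a comma, then the JPEG bytes encoded as base64 text. Below is a minimal sketch of the first decoding step, using only the standard library; the payload is a few placeholder bytes rather than a real JPEG, and `data_url_to_bytes` is a hypothetical helper name.

```python
from base64 import b64decode, b64encode

def data_url_to_bytes(data_url):
    """Drop the 'data:...;base64,' header and decode the remaining text."""
    header, encoded = data_url.split(',', 1)
    return b64decode(encoded)

# Build a fake data URL from placeholder bytes (not a real image).
placeholder = b'\xff\xd8\xff\xe0 not a real jpeg'
fake_url = 'data:image/jpeg;base64,' + b64encode(placeholder).decode('ascii')

decoded = data_url_to_bytes(fake_url)
print(decoded == placeholder)
```

In the notebook, the decoded bytes are then handed to `cv2.imdecode` to produce an image array.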

In [9]:
def jpg_str_to_cv2(jpg_str):
    # decode base64 string
    image_bytes = b64decode(jpg_str.split(',')[1])
    # convert bytes to numpy array
    jpg_as_np = np.frombuffer(image_bytes, dtype=np.uint8)
    # decode numpy array into OpenCV BGR image
    return cv2.imdecode(jpg_as_np, flags=cv2.IMREAD_COLOR)

def predict(img):
    img = jpg_str_to_cv2(img)
    image = PIL.Image.fromarray(img)
    transformed = EVALUATE_TRANSFORMS(image)
    transformed = transformed.to(device)
    transformed = torch.unsqueeze(transformed, 0)

    # Gradients are only needed for training, so skip tracking them here.
    with torch.no_grad():
        output = model(transformed)

    output = F.softmax(output, dim=1).cpu().numpy().flatten()
    category_index = output.argmax()
    prediction_widget.value = dataset.categories[category_index]
    for i, score in enumerate(list(output)):
        score_widgets[i].value = score


def capture(img):
    img = jpg_str_to_cv2(img)
    dataset.save_entry(img, category_widget.value)
    count_widget.value = dataset.get_count(category_widget.value)


# Output

In [12]:
all_widget = ipywidgets.HBox([ipywidgets.VBox([
    ipywidgets.HTML("<h2>Data Collection</h2>"),
    data_collection_widget,
    ipywidgets.HTML("<h2>Train Model</h2>"),
    train_eval_widget,
    ipywidgets.HTML("<h2>Save / Load Model</h2>"),
    model_widget,
    ipywidgets.HTML("<div style='line-height: 14px;'>To download the model for safekeeping, <ol><li>Click the 'save model' button</li><li>Open the files sidebar (on the left)</li><li>Click the refresh button</li><li>Click the three dots by your model name, then download.</li><li>Wait! It will take a minute or so.</li></ol></div>")
], layout=ipywidgets.Layout(overflow='visible', padding='0 20px 0 0')),  ipywidgets.VBox([
    ipywidgets.HTML("<h2>Live Prediction Output</h2><div style='line-height: 14px;'>Click the 'Enable Live Predictions' button (above the camera stream) to test your model on the camera stream.</div>"),
    live_execution_widget,
]), ipywidgets.HTML(video_stream_html)])

display(all_widget)
