<a href="https://colab.research.google.com/github/thesteve0/impatient-computer-vision/blob/main/2_classify_embed.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classification and Embedding

We are going to start with our housekeeping steps, which will take a little while to run. While they are running, we will go back to the slides and I will introduce the topics.

### Housekeeping
Before we do anything else, we need to change our machine type to one that has a GPU. Computer vision tasks on a CPU are extremely slow, except with a few specific models. One of the reasons we are using Colab is that you get free access to a GPU for the workshop.

Please:
1. Go up to the top right of the browser
2. Select "Connect"
3. Then "Change Runtime Type"
![change_runtime](assets/2_pick_GPU1.png)

4. Pick T4 GPU
5. Click Save
![pick GPU](assets/2_pick_GPU2.png)

6. When the runtime connects, it should look like this
![running GPU](assets/2_pick_GPU3.png)


Time to run our long-running tasks:
1. Load the dependencies
2. Map the drive
3. Load the data

In [None]:
!pip install fiftyone==1.4.1 torch torchvision umap-learn
from google.colab import drive
drive.mount('/content/drive')

import fiftyone as fo

name = "our-photos"
dataset_dir = "/content/drive/MyDrive/impatient-cv/flickr-labeled"  # avoids shadowing the built-in dir()

dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.FiftyOneDataset,
    name=name
)

print(dataset)

## Classification

As we discussed in the slides, classification is the computer vision task where you try to assign an image to a single class out of a list of classes. We are going to use a classification model that is the foundation for many other models and is still quite powerful: ResNet. We are going to use the simplest version, ResNet18, because:

1. It doesn't require much GPU resources
2. It is fast to compute

There are many variations to ResNet where a number is appended to the name. This number usually represents the number of layers in the neural network.
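
As a rough illustration of where those numbers come from, here is a small sketch in plain Python. It assumes the usual counting convention: one stem convolution, the convolutions inside the residual blocks, and the final fully connected layer. Shortcut (projection) convolutions and batch normalization layers are not counted. The dictionary and function names are made up for this example.

```python
# Residual blocks per stage for the common ResNet variants.
# "Basic" blocks contain 2 convolutions; "bottleneck" blocks contain 3.
RESNET_CONFIGS = {
    "ResNet18": ([2, 2, 2, 2], 2),
    "ResNet34": ([3, 4, 6, 3], 2),
    "ResNet50": ([3, 4, 6, 3], 3),
    "ResNet101": ([3, 4, 23, 3], 3),
    "ResNet152": ([3, 8, 36, 3], 3),
}


def count_layers(blocks_per_stage, convs_per_block):
    """Count weighted layers: stem conv + block convs + final fully connected layer."""
    return 1 + convs_per_block * sum(blocks_per_stage) + 1


depths = {name: count_layers(*cfg) for name, cfg in RESNET_CONFIGS.items()}
for name, depth in depths.items():
    print(f"{name}: {depth} layers")
```

Fewer layers generally means fewer computations per image, which is why ResNet18 is a comfortable fit for a free Colab GPU.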

### Training data

While ResNet18 has a specific architecture, the model needs to be trained on data before it can make predictions. There are many foundational datasets in computer vision, but a particularly common one is [ImageNet](https://www.image-net.org/index.php). This dataset has 1k classes and more than a million annotated images.

Please open the list of the [ImageNet classes](https://deeplearning.cms.waikato.ac.nz/user-guide/class-maps/IMAGENET/) in another browser tab. We will be referring to it later in the notebook.

FiftyOne has a [dataset zoo](https://docs.voxel51.com/dataset_zoo/datasets.html) where many important computer vision datasets have been converted into FiftyOne format and are easy to download and view.

Let's go ahead and download and view a small subset of the ImageNet data, the [ImageNet Sample Data](https://docs.voxel51.com/dataset_zoo/datasets.html#imagenet-sample).

In [None]:
import fiftyone.zoo as foz

imagenet_samples = foz.load_zoo_dataset("imagenet-sample")

session = fo.launch_app(imagenet_samples, auto=False)

session.url


### FiftyOne Model Zoo

The computer vision platform we have been using, FiftyOne, also has a set of models already converted into a format that works with the rest of the FiftyOne platform. Typically, you would have to use library-specific code, such as PyTorch, along with other code that specifies the architecture, to run a computer vision model. With FiftyOne, we can load the model in one line of code and then run it for classification (inference) with another line of code. Two lines of code and you are in business.

#### ResNet18 in the model zoo

We are going to load the PyTorch version of the [ResNet18 model](https://docs.voxel51.com/model_zoo/models.html#resnet18-imagenet-torch) that was trained on ImageNet.

In [None]:
resnet18_imagenet_model = foz.load_zoo_model("resnet18-imagenet-torch")


### Predictions of our Photos

We have loaded our Flickr dataset and our classification model. Time to have the model predict the classifications for our images.

In [None]:
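# store_logits=True keeps the raw model outputs on each prediction; we inspect them further down
dataset.apply_model(resnet18_imagenet_model, label_field="rn18_in_predictions", store_logits=True, num_workers=12, progress_bar=True)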

# Now let's look at the results
session.dataset = dataset

#### Deep dive on the horse

I want us to dig in on one particular sample.


In [None]:
horse_valley = dataset["6773012fa08cade6ec7e44f2"]

session.sample_id = horse_valley["id"]

Now let's see what the generated predictions tell us

In [None]:
import torch.nn.functional as TF
import torch

model_classes = resnet18_imagenet_model.classes
logits = torch.from_numpy(horse_valley["rn18_in_predictions"]["logits"])

print("There are " + str(len(logits))+ " logits")

print("\nHere are the first 25 logits")
print(str(logits[:25]))

# Softmax turns the raw logits into confidence scores that sum to 1
confidences = TF.softmax(logits, dim=0)
print("\nHere are the first 25 confidence scores")
print(str(confidences[:25]))

# Get top 5 values and their indices
top_values, top_indices = torch.topk(confidences, k=5)

print("Top 5 confidence values:", top_values)
print("Their indices:", top_indices)

print("\nPredictions in descending confidence:\n")
for idx, value in zip(top_indices.tolist(), top_values.tolist()):
    print("Prediction: " + model_classes[idx] + " \tConfidence: " + str(value))

### Discussing the results

1. What are some of the main things you noticed about the predictions?
2. Were the predicted classes surprising to you? Were they useful for our problem?
3. Take home bonus - What did changing the number of workers do?

Here are the important ideas I wanted you to take away

1. The model can only predict classes it was trained on
2. The model assigns each image to whichever of its training classes the image most resembles, even when none of them is a good description
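
To make these takeaways concrete, here is a minimal sketch in plain Python. The class names and logit values are made up for illustration and are not the model's actual output for the horse photo. It shows that softmax always spreads a total confidence of 1 across the classes the model knows, so some class always comes out on top.

```python
import math


def softmax(logits):
    """Convert raw scores into confidences that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical class list and logits, for illustration only
classes = ["sorrel", "alp", "valley", "ox"]
logits = [2.0, 1.0, 0.5, -1.0]

confidences = softmax(logits)

# Rank classes by confidence, highest first (a plain-Python stand-in for torch.topk)
ranked = sorted(zip(classes, confidences), key=lambda pair: pair[1], reverse=True)
for label, conf in ranked[:3]:
    print(f"Prediction: {label}\tConfidence: {conf:.3f}")

# Shifting every logit by the same amount leaves the confidences unchanged:
# softmax only reports relative preference among the known classes.
shifted = softmax([x - 10.0 for x in logits])
```

There is no "none of the above" option here. If the right answer is not in the class list, the confidence still gets divided among the wrong answers.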


## Another ResNet Model

To demonstrate the importance of training data, we are going to run another ResNet18 model, except I trained this model on [Pokemon images](https://huggingface.co/datasets/TheSteve0/pokemon).

I put the model weights file in our shared drive.

To use this model we are going to:
1. Load the model into pytorch
2. Run the model against our Flickr images
3. Associate the classification labels back to our FiftyOne dataset
4. View the results

In [None]:
import torchvision.transforms as transforms
from PIL import Image
from tqdm.notebook import tqdm
import os

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

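# Assumes the file holds the full pickled model (not only a state_dict), since we call
# model.eval() on the result. PyTorch 2.6+ defaults to weights_only=True, which rejects that.
model = torch.load("pokemon-classification-model.pt", map_location=device, weights_only=False)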
model.eval()

# Standard ResNet transformations
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Optional: Define class mapping for Pokemon species (if available)
# class_names = {0: "Pikachu", 1: "Charizard", ...}
class_names = None  # Set to None if unavailable

# Process images in batches
batch_size = 16  # Adjust based on your available memory
total_samples = len(dataset)

with torch.no_grad():
    for i in tqdm(range(0, total_samples, batch_size)):
        batch_samples = dataset[i:min(i+batch_size, total_samples)]
        batch_tensors = []
        valid_samples = []  # samples whose images loaded successfully

        # Load and preprocess each image, skipping any that fail to open
        for sample in batch_samples:
            try:
                image = Image.open(sample.filepath).convert('RGB')
                batch_tensors.append(transform(image))
                valid_samples.append(sample)
            except Exception as e:
                print(f"Error processing {sample.filepath}: {e}")

        if not batch_tensors:
            continue

        # Run inference
        batch = torch.stack(batch_tensors).to(device)
        outputs = model(batch)

        # Get predictions
        probs = torch.nn.functional.softmax(outputs, dim=1)
        confidences, predictions = torch.max(probs, dim=1)

        # Update samples. FiftyOne views cannot be indexed by integer position,
        # so we iterate over the sample objects collected above.
        for j, sample in enumerate(valid_samples):
            pred_idx = predictions[j].item()
            confidence = confidences[j].item()

            # Get class name if mapping exists
            if class_names is not None:
                pred_label = class_names.get(pred_idx, f"Unknown({pred_idx})")
            else:
                pred_label = pred_idx

            # Add prediction to sample with requested field name
            sample["rn18_pm_predictions"] = pred_label
            sample["rn18_pm_confidence"] = float(confidence)
            sample.save()

# Save dataset
dataset.save()

