In this assignment, you will use a pre-trained convnet to produce features for a classifier that can detect a single object type. This notebook has some code to help you get started. 

In [1]:
import pandas as pd
import os
from os import listdir
from os.path import isfile, join
import os.path as osp
from tqdm.notebook import tqdm

### Gather positive examples

Pick a word. For example, "red" or "santa" or "horse". 

Now you will need to find "positive" image examples of that word. For example, if you chose "red" as your word, you will need to find images of red things. You are free to use Google Image search or something similar. File types shouldn't matter, but try to stick with .png and .jpg files.

You'll need at least 100 positive example images. Put them in the folder called `pos`. 

### Gather negative examples

Now you need to think about negative examples; i.e., things that are *not* examples of your word. You can either just find random images, or look for specific negative examples. For example, if you chose the word "red" then it might work best if you find negative examples that are other colors, especially colors close to red. 

You'll need at least 200 negative example images. Put them in the folder called `neg`. 
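
Before extracting features, it can help to confirm that both folders hold enough usable images. The following is a minimal sketch and not part of the provided starter code: the helper names `count_images` and `check_folders` and the extension list are assumptions, and the check looks only at file extensions rather than opening each file.

```python
import os

# Assumed set of extensions; the assignment suggests sticking to .png and .jpg.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")

def count_images(folder):
    """Count files in `folder` whose names end in a known image extension."""
    return sum(
        1
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
        and name.lower().endswith(IMAGE_EXTENSIONS)
    )

def check_folders(minimums):
    """Return {folder: (count, enough)} for each folder in `minimums`."""
    report = {}
    for folder, minimum in minimums.items():
        # A missing folder is treated as containing zero images.
        n = count_images(folder) if os.path.isdir(folder) else 0
        report[folder] = (n, n >= minimum)
    return report

# Thresholds taken from the assignment text: 100 positives, 200 negatives.
for folder, (n, enough) in check_folders({"pos": 100, "neg": 200}).items():
    print(f"{folder}: {n} images ({'ok' if enough else 'need more'})")
```

Filtering by extension also skips hidden files such as `.DS_Store`, which would otherwise be handed to the image loader.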

## 1.) Run the following cell

* This imports needed Keras libraries
* Then, it gets the trained VGG19 imagenet model
* Then, it prints out the names of all the layers in that model

In [2]:
import numpy as np
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model

base_model = VGG19(weights='imagenet',include_top=True)
xs,ys=224,224

for layer in base_model.layers:
    print(layer.name)



input_layer
block1_conv1
block1_conv2
block1_pool
block2_conv1
block2_conv2
block2_pool
block3_conv1
block3_conv2
block3_conv3
block3_conv4
block3_pool
block4_conv1
block4_conv2
block4_conv3
block4_conv4
block4_pool
block5_conv1
block5_conv2
block5_conv3
block5_conv4
block5_pool
flatten
fc1
fc2
predictions


## 2.) Determine your output layer

- try `predictions` first
- note the layers printed out above; you can use any of those layers
- pay attention to the output shape of each layer! `predictions` is a vector of size 1000, for example
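
To make the shape note concrete, the sketch below lists the output shapes of a few VGG19 layers for a 224x224 input, along with the length of the feature vector each would give once flattened. The shapes are written out by hand here as an assumption to be checked against `base_model.summary()`; the dictionary and helper names are illustrative only.

```python
from math import prod

# Output shapes (without the batch dimension) for a 224x224x3 input.
# Hand-written for illustration; confirm against base_model.summary().
VGG19_OUTPUT_SHAPES = {
    "block4_pool": (14, 14, 512),
    "block5_pool": (7, 7, 512),
    "fc1": (4096,),
    "fc2": (4096,),
    "predictions": (1000,),
}

def flattened_size(shape):
    """Number of values in a tensor of this shape once flattened."""
    return prod(shape)

for name, shape in VGG19_OUTPUT_SHAPES.items():
    print(f"{name:12s} {str(shape):15s} -> {flattened_size(shape)} features")
```

Since the helper functions in this notebook flatten each layer's output, an earlier layer such as `block5_pool` would give much longer feature vectors than `predictions`, which affects the input size of whatever classifier is trained on top.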

In [3]:
layer = 'predictions'

model = Model(inputs=base_model.input, outputs=base_model.get_layer(layer).output)

### Run the following cell

- These functions are to help you perform transfer learning

In [4]:
def get_image(img_path, xs,ys):
    x = image.load_img(img_path, target_size=(xs, ys))
    x = image.img_to_array(x)
    x = np.expand_dims(x, axis=0)
    return x

def get_img_features(model, img):
    img = preprocess_input(img)
    yhat = model.predict(img)
    return yhat

def get_image_features(word):
    files = [f for f in listdir(word)] # grab all of the images in the folder
    image_vectors = []
    for f in tqdm(files):
        img = get_image(osp.join(word, f), xs, ys) 
        x_feats = get_img_features(model, img).flatten() # get features for each image
        image_vectors.append(x_feats) 
    return np.array(image_vectors)
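
For intuition about what `preprocess_input` does before an image reaches the network, here is a NumPy-only sketch. It assumes the Caffe-style preprocessing that Keras uses for VGG models (RGB to BGR, then subtracting per-channel ImageNet means, with no rescaling); the function name `vgg_style_preprocess` is hypothetical, and the real `preprocess_input` should be used in practice.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed from Keras' "caffe" mode).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg_style_preprocess(batch_rgb):
    """Approximate VGG preprocessing for an (n, h, w, 3) RGB batch in [0, 255]."""
    batch = np.asarray(batch_rgb, dtype=np.float32)
    batch_bgr = batch[..., ::-1]          # reverse channel order: RGB -> BGR
    return batch_bgr - IMAGENET_BGR_MEAN  # zero-center each channel

# A single pure-red pixel, shaped like a 1x1 image in a batch of one.
red = np.array([[[[255.0, 0.0, 0.0]]]])
print(vgg_style_preprocess(red))
```

This is why `get_image` returns raw pixel values in the 0 to 255 range: the centering is left to `preprocess_input` inside `get_img_features`.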

## 3.) Evaluate a classifier for your `word`

* Using the positive and negative feature vectors produced by `model`, train a classifier. It can be any scikit-learn classifier (a linear one is fine), but I would recommend the Keras Dense network we built for the previous assignment.
* You'll need to split your data into train and test sets. I would recommend using half of the data for training and half for testing; you may opt to download more positive and negative examples.
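
As one way to follow the scikit-learn route, the sketch below trains a logistic regression on feature vectors with a stratified 50/50 split. The random arrays stand in for `pos_images` and `neg_images` (they are placeholders, not real features), and the choice of `LogisticRegression` and of stratifying the split are assumptions rather than requirements of the assignment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder features: 100 "positive" and 200 "negative" vectors of length 32.
# In the notebook these would be the arrays returned by get_image_features.
pos_feats = rng.normal(loc=1.0, size=(100, 32))
neg_feats = rng.normal(loc=-1.0, size=(200, 32))

X = np.vstack([pos_feats, neg_feats])
y = np.concatenate([np.ones(len(pos_feats), dtype=int),
                    np.zeros(len(neg_feats), dtype=int)])

# stratify=y keeps the positive/negative ratio the same in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

A linear model like this takes integer labels directly, so no one-hot encoding is needed for this variant.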

In [5]:
pos_images = get_image_features('pos') # get positive image vectors
neg_images = get_image_features('neg') # get negative image vectors



  0%|          | 0/102 [00:00<?, ?it/s]


  0%|          | 0/206 [00:00<?, ?it/s]

### Prepare the data. Split to train/test sets

In [18]:
from sklearn import preprocessing

# label and prepare data
pos_df = pd.DataFrame({'vectors': [v for v in pos_images], 'label': 1})
neg_df = pd.DataFrame({'vectors': [v for v in neg_images], 'label': 0})
data = pd.concat([pos_df, neg_df], ignore_index=True)

# split into train/test sets
test = data.sample(frac=0.5,random_state=200)
train = data.drop(test.index)

le = preprocessing.LabelEncoder() # convert to numerical categories
le.fit(data.label)

ohe = preprocessing.OneHotEncoder()
y = le.transform(train.label).reshape(-1, 1) # basically go from shape (n, ) to (n, 1)
ohe.fit(y)
y = ohe.transform(y).toarray() # dense ndarray rather than np.matrix
X = np.array([x for x in train.vectors])

X.shape, y.shape

((154, 1000), (154, 2))
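
For two integer classes, the label-encode-then-one-hot step above produces the same kind of array as indexing an identity matrix. The sketch below shows that equivalence with NumPy only; the `labels` array is a made-up example, and this is an alternative shown for illustration, not the code that produced the shapes above.

```python
import numpy as np

# Made-up labels in the same 0/1 convention as the `label` column.
labels = np.array([1, 0, 0, 1, 0])

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(2, dtype=np.float32)[labels]

print(one_hot.shape)
print(one_hot)

# argmax along the class axis recovers the original integer labels.
recovered = one_hot.argmax(axis=1)
```

The second column of the one-hot array therefore corresponds to the positive class (label 1), which is worth keeping in mind when reading the softmax outputs.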

### Define model, train

In [19]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.optimizers import Adam

In [20]:
# create keras nn

keras_model = Sequential()
#model.add(Flatten())
keras_model.add(Dense(96, activation='relu'))
keras_model.add(Dense(2, activation='softmax'))  

# one-hot targets with a 2-unit softmax output pair with categorical cross-entropy
keras_model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.0001), metrics=['accuracy'])

keras_model.fit(X, y, epochs=300, batch_size=10, verbose=0)

<keras.src.callbacks.history.History at 0x37fe68f70>

### Evaluate

In [21]:
_, train_accuracy = keras_model.evaluate(X, y)
print('Accuracy: %.2f' % (train_accuracy*100))

5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.7866 - loss: 0.4131
Accuracy: 85.71


In [22]:
y_test = le.transform(test.label).reshape(-1, 1)
y_test = ohe.transform(y_test).toarray() # dense ndarray rather than np.matrix
X_test = np.array([x for x in test.vectors])

X_test.shape, y_test.shape

((154, 1000), (154, 2))

In [23]:
_, test_accuracy = keras_model.evaluate(X_test, y_test)
print('Accuracy: %.2f' % (test_accuracy*100))

5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7637 - loss: 0.5588
Accuracy: 75.97
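
Because there are about twice as many negatives as positives, raw accuracy is easier to read next to a majority-class baseline and a confusion matrix. The following sketch computes both from one-hot targets and softmax-style predictions. The arrays are small made-up examples; in the notebook, `probs` could come from `keras_model.predict(X_test)`, which is an assumption about how one might wire it up.

```python
import numpy as np

def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common class."""
    counts = np.bincount(y_true, minlength=2)
    return counts.max() / counts.sum()

def confusion_counts(y_true, y_pred):
    """Return a 2x2 array: rows are true classes, columns are predicted."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up example: 6 images, one-hot targets and predicted probabilities.
y_onehot = np.array([[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1]])
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7],
                  [0.8, 0.2], [0.2, 0.8], [0.7, 0.3]])

# np.asarray also handles targets stored as np.matrix.
y_true = np.asarray(y_onehot).argmax(axis=1)
y_pred = np.asarray(probs).argmax(axis=1)

cm = confusion_counts(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()
print("baseline:", majority_baseline(y_true))
print("accuracy:", accuracy)
print(cm)
```

With the 102 positive and 206 negative images loaded above, always predicting "negative" would already score about 66.9% over the full set, so the accuracies in this section are best judged against that figure.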


## 4.) Use CLIP

* Repeat steps 2 and 3 above, only this time using the [CLIP](https://github.com/openai/CLIP) model
  
  To get image features, first preprocess the image as in the following example, then pass the result to `model.encode_image`: `image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)`

(see also the last code section of the README for the CLIP github repo on training a classifier using CLIP features)
  
  
* (Answer in a markdown cell): Which model+layer works the best for this data? Why do you think that is?
* What makes for good positive examples? What makes for good negative examples? Why does the choice of negative examples matter?

In [73]:
import clip
import torch
from PIL import Image
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load('ViT-B/32', device)

def c_get_image(img_path):
    image = preprocess(Image.open(img_path)).unsqueeze(0).to(device)
    return image

def c_get_img_features(img_tensor):
    with torch.no_grad():
        image_features = model.encode_image(img_tensor)
    return image_features.cpu().numpy().flatten() # move to CPU before converting to NumPy

def c_get_image_features(word):
    files = [f for f in listdir(word)] # grab all of the images in the folder
    image_vectors = []
    for f in tqdm(files):
        img_tensor = c_get_image(osp.join(word, f))
        img_feats = c_get_img_features(img_tensor)
        image_vectors.append(img_feats)
    return np.array(image_vectors)

In [74]:
c_pos_images = c_get_image_features('pos') # get positive image vectors
c_neg_images = c_get_image_features('neg') # get negative image vectors



  0%|          | 0/102 [00:00<?, ?it/s]

  0%|          | 0/206 [00:00<?, ?it/s]

In [75]:
# label and prepare data
pos_df = pd.DataFrame({'vectors': [v for v in c_pos_images], 'label': 1})
neg_df = pd.DataFrame({'vectors': [v for v in c_neg_images], 'label': 0})
data = pd.concat([pos_df, neg_df], ignore_index=True)

# split into train/test sets
test = data.sample(frac=0.5,random_state=200)
train = data.drop(test.index)

le = preprocessing.LabelEncoder() # convert to numerical categories
le.fit(data.label)

ohe = preprocessing.OneHotEncoder()
c_y = le.transform(train.label).reshape(-1, 1) # basically go from shape (n, ) to (n, 1)
ohe.fit(c_y)
c_y = ohe.transform(c_y).toarray() # dense ndarray rather than np.matrix
c_X = np.array([x for x in train.vectors])

c_X.shape, c_y.shape

((154, 512), (154, 2))

In [76]:
# create keras nn

keras_model = Sequential()
#model.add(Flatten())
keras_model.add(Dense(96, activation='relu'))
keras_model.add(Dense(2, activation='softmax'))  

# one-hot targets with a 2-unit softmax output pair with categorical cross-entropy
keras_model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.0001), metrics=['accuracy'])

keras_model.fit(c_X, c_y, epochs=300, batch_size=10, verbose=0)

<keras.src.callbacks.history.History at 0x3ba90ce50>

In [77]:
_, train_accuracy = keras_model.evaluate(c_X, c_y)
print('Accuracy: %.2f' % (train_accuracy*100))

5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 1.0000 - loss: 2.0780e-04
Accuracy: 100.00


In [78]:
c_y_test = le.transform(test.label).reshape(-1, 1)
c_y_test = ohe.transform(c_y_test).toarray() # dense ndarray rather than np.matrix
c_X_test = np.array([x for x in test.vectors])

c_X_test.shape, c_y_test.shape

((154, 512), (154, 2))

In [79]:
_, test_accuracy = keras_model.evaluate(c_X_test, c_y_test)
print('Accuracy: %.2f' % (test_accuracy*100))

5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9965 - loss: 0.0134
Accuracy: 99.35
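
With only 154 test images, each accuracy figure corresponds to a small number of misclassified images. The sketch below converts the reported percentages back to approximate error counts; the helper name `approx_errors` is illustrative, and the counts are rounded estimates derived from the printed accuracies rather than values read from the model.

```python
def approx_errors(accuracy_percent, n_examples):
    """Estimate how many of n_examples were misclassified."""
    return round(n_examples * (1 - accuracy_percent / 100))

n_test = 154  # size of each half of the split, per the shapes printed above

for name, acc in [("VGG19 predictions", 75.97), ("CLIP ViT-B/32", 99.35)]:
    wrong = approx_errors(acc, n_test)
    print(f"{name}: about {wrong} of {n_test} test images wrong")
```

Read this way, the gap between the two feature extractors on this test split is roughly 37 errors versus 1.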


### Q&A
1) Which model+layer works the best for this data? Why do you think that is?<br>
The combination that worked best for me was CLIP (ViT-B/32) image features fed into a Keras network with a single ReLU dense layer: it reached 99.35% test accuracy, versus 75.97% for the VGG19 `predictions` layer with the same classifier. I think this is because CLIP was trained on around 400M image-text pairs, while VGG19 was trained on only about 1.3 million ImageNet images. CLIP had far more data to learn from and also uses a different architecture (a Vision Transformer rather than a convnet), so its features likely transfer better to my images.
2) What makes for good positive examples? What makes for good negative examples? Why does the choice of negative examples matter?<br>
Good positive examples show the subject in a variety of lighting conditions and settings, at reasonably high resolution, and without distracting backgrounds. Good negative examples are images that resemble the positives. For example, I trained my classifier on positive images of Santa, and used negative images of Krampus, Dumbledore, and random people. The choice of negative examples matters because the classifier only learns to separate the positives from whatever negatives it sees, so the negatives should include images that aren't extremely different from the positives. For example, training a Santa classifier against negatives that are only images of the color blue would be a poor choice: the model could tell the two apart without learning the features that actually make up Santa.