# Using ConvNeXt to recognize Tom and Jerry

We will use [ConvNeXt](https://arxiv.org/abs/2201.03545) and fine-tune it to correctly classify a dataset with Tom and Jerry pictures.

ConvNeXt is a family of CNN models, including some that are quite small, fast, and lightweight.

--------------------

Load the necessary packages and libraries

In [None]:
import keras
import numpy as np
import os
from IPython.display import Image
import matplotlib.pyplot as plt

from keras.applications.convnext import decode_predictions
from keras.utils import get_file, load_img, img_to_array
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

## Load ConvNeXt

In [None]:
convnext = keras.applications.ConvNeXtTiny()

In [None]:
def prepare_image(file):
    img = load_img(file, target_size=(224, 224))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    return img
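
Keras models expect a batch of images rather than a single image, which is why `prepare_image` adds a leading axis. A minimal sketch of that step, using an array of zeros as a stand-in for a real picture (the dummy array is an assumption made purely for illustration):

```python
import numpy as np

# Dummy stand-in for a loaded 224x224 RGB image: (height, width, channels).
dummy_image = np.zeros((224, 224, 3), dtype="float32")

# Add a leading batch axis, as prepare_image does.
batch = np.expand_dims(dummy_image, axis=0)

print(dummy_image.shape)  # (224, 224, 3)
print(batch.shape)        # (1, 224, 224, 3)
```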

## Testing ConvNeXt on dog images

Let's try the network on images of different dog breeds

In [None]:
Image(data='https://upload.wikimedia.org/wikipedia/commons/4/4f/German-shepherd-4040871920._%282%29.jpg', width = 300)

In [None]:
preprocessed_image = prepare_image(get_file('German-shepperd.jpg',origin='https://upload.wikimedia.org/wikipedia/commons/4/4f/German-shepherd-4040871920._%282%29.jpg'))
predictions = convnext.predict(preprocessed_image)
predictions

Decode the labels of predictions

In [None]:
results = decode_predictions(predictions)
results
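
Conceptually, decoding means picking the highest-scoring entries of the probability vector and attaching readable names to them. The sketch below illustrates that idea with NumPy only; `top_k`, the toy labels, and the toy scores are hypothetical and are not how `decode_predictions` is actually implemented.

```python
import numpy as np

def top_k(probabilities, class_names, k=3):
    """Return the k most probable (name, score) pairs, best first."""
    probabilities = np.asarray(probabilities)
    # argsort is ascending, so reverse it and keep the first k indices.
    best = np.argsort(probabilities)[::-1][:k]
    return [(class_names[i], float(probabilities[i])) for i in best]

# Hypothetical labels and scores, for illustration only.
toy_names = ["cat", "dog", "mouse", "bird"]
toy_probs = [0.10, 0.60, 0.25, 0.05]

print(top_k(toy_probs, toy_names, k=2))  # [('dog', 0.6), ('mouse', 0.25)]
```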

Another image

In [None]:
Image(data='https://upload.wikimedia.org/wikipedia/commons/d/d4/Labrador_Retriever_-_Yellow.JPG', width = 300)

In [None]:
preprocessed_image = prepare_image(get_file('Labrador.jpg',origin='https://upload.wikimedia.org/wikipedia/commons/d/d4/Labrador_Retriever_-_Yellow.JPG'))
predictions = convnext.predict(preprocessed_image)
results = decode_predictions(predictions)
results

It works pretty well. If you're curious, you can try some other pictures here.

## TODO - test on Tom and Jerry

Now let's test the network on some images of Tom and Jerry.
Use the code above as a template: find some images from the cartoon and test the network on them.

In [None]:
preprocessed_image = prepare_image() # TODO

## Get our custom dataset - Tom & Jerry

Let's now replace ConvNeXt's top layers and employ transfer learning. To do this, we need to train the new layers on some images: pictures of Tom, of Jerry, of both, and of neither. We will download the pictures from Kaggle: https://www.kaggle.com/datasets/balabaskar/tom-and-jerry-image-classification

Download the dataset

In [None]:
DATASET = 'balabaskar/tom-and-jerry-image-classification'
ZIP_PATH = './tom-and-jerry-image-classification.zip'
IMAGES_PATH = './tom_and_jerry/tom_and_jerry'

In [None]:

# Use your own Kaggle credentials; never commit a real API key to a notebook.
os.environ['KAGGLE_USERNAME'] = '<your-kaggle-username>'
os.environ['KAGGLE_KEY'] = '<your-kaggle-api-key>'

!kaggle datasets download -d {DATASET} -p ./

In [None]:
import zipfile

with zipfile.ZipFile(ZIP_PATH, 'r') as zip_ref:
    zip_ref.extractall('./')
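
After extraction it can be worth checking that every class folder actually contains images. The sketch below counts files per subfolder; `count_images_per_class` is a hypothetical helper, and the demo runs on a throwaway directory so it works without the dataset. In the notebook you would call it with `IMAGES_PATH` instead.

```python
import tempfile
from pathlib import Path

def count_images_per_class(root):
    """Map each subfolder name of root to the number of files it contains."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts

# Demo on a temporary directory tree with made-up class names.
with tempfile.TemporaryDirectory() as tmp:
    for name, n_files in [("class_a", 2), ("class_b", 3)]:
        folder = Path(tmp) / name
        folder.mkdir()
        for i in range(n_files):
            (folder / f"{i}.jpg").touch()
    demo_counts = count_images_per_class(tmp)

print(demo_counts)  # {'class_a': 2, 'class_b': 3}
```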

Load the dataset in a format suitable for training and validation

In [None]:
batch_size = 32
img_height = # TODO
img_width = # TODO

In [None]:
train_ds = keras.utils.image_dataset_from_directory(
    IMAGES_PATH,
    validation_split=0.2,
    subset="training",
    seed=42,
    label_mode='categorical',
    image_size=(img_height, img_width),
    batch_size=batch_size)

val_ds = keras.utils.image_dataset_from_directory(
    IMAGES_PATH,
    validation_split=0.2,
    subset="validation",
    seed=42,
    label_mode='categorical',
    image_size=(img_height, img_width),
    batch_size=batch_size)

Have a look at the pictures

In [None]:
label_to_text = {
    0: 'Jerry',
    1: 'Tom',
    2: 'none',
    3: 'both'
}
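
The indices above depend on how `image_dataset_from_directory` orders the classes: when no explicit class list is given, it uses the alphanumerical order of the subfolder names. The sketch below reproduces that ordering. The folder names are an assumption about the extracted dataset, so verify them against your own copy.

```python
# Assumed subfolder names of IMAGES_PATH; check them against your extracted data.
assumed_folders = ["tom_jerry_1", "tom", "jerry", "tom_jerry_0"]

# Class indices follow the alphanumerically sorted folder names.
index_to_folder = dict(enumerate(sorted(assumed_folders)))

print(index_to_folder)
# {0: 'jerry', 1: 'tom', 2: 'tom_jerry_0', 3: 'tom_jerry_1'}
```

In the notebook itself, `train_ds.class_names` reports the actual order, which is the safest thing to compare `label_to_text` against.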

In [None]:
plt.figure(figsize=(15, 5))
for i, (images, labels) in enumerate(train_ds.take(1)):
    for j in range(10):
        ax = plt.subplot(2, 5, j + 1)
        plt.imshow(images[j].numpy().astype("uint8"))
        plt.title(f"Label: {label_to_text[np.argmax(labels[j])]}")
        plt.axis("off")
plt.show()

## Get the model

In [None]:
base_model = keras.applications.ConvNeXtTiny(include_top=False)

# add an average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and another layer to distinguish our classes
predictions = Dense(4, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

# first: train only the top layers (which were randomly initialized)
# i.e. freeze all ConvNeXt layers
for layer in base_model.layers:
    layer.trainable = False

# compile the model
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
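
The final `Dense` layer uses a softmax activation, and the model is compiled with a categorical cross-entropy loss. As a rough illustration of what those two pieces compute for a single image, here is a NumPy sketch. The scores are made up, and the helper functions are written for this example rather than taken from Keras.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(one_hot_label, probabilities):
    """Negative log-probability assigned to the true class."""
    return float(-np.sum(one_hot_label * np.log(probabilities)))

# Made-up scores for the four classes; the true class is index 1.
logits = np.array([1.0, 3.0, 0.5, 0.2])
label = np.array([0.0, 1.0, 0.0, 0.0])

probs = softmax(logits)
loss = categorical_crossentropy(label, probs)
print(probs.round(3), round(loss, 3))
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1, which is what training the new top layers tries to achieve.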

In [None]:
model.summary()

## Train the model

In [None]:
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=3
)

### Predicting one image from the validation dataset

In [None]:
plt.figure(figsize=(6, 3))
for images, labels in val_ds.take(1):
    sample_image = images[0]
    true_label = labels[0]

    sample_image = np.expand_dims(sample_image, axis=0)

    predictions = model.predict(sample_image)

    predicted_class_index = np.argmax(predictions, axis=1)[0]
    predicted_class = label_to_text[predicted_class_index]

    plt.imshow(sample_image[0].astype("uint8"))
    print(np.argmax(true_label))
    plt.title(f"True label: {label_to_text[np.argmax(true_label)]}\nPredicted label: {predicted_class}")
    plt.axis('off')

plt.show()

## Predict a downloaded image

In [None]:
preprocessed_image = prepare_image() # TODO
print("Predicted label:", )

Predicted label: Tom
