# Junk Food Multi-label Classification with EfficientNetV2

This notebook fine-tunes an **EfficientNetV2** CNN for multi-label image classification on a **COCO JSON dataset**.

## Before you start

Make sure you have access to a GPU. If you run into problems, navigate to `Edit` -> `Notebook settings` -> `Hardware accelerator`, set it to `GPU`, click `Save`, and try again.
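
The same check can be scripted. Below is a minimal sketch that uses only the Python standard library. The helper name `gpu_visible` is hypothetical, and it assumes that `nvidia-smi -L` (which lists the detected GPUs) is a suitable probe for your runtime.

```python
import shutil
import subprocess


def gpu_visible(timeout_s: float = 10.0) -> bool:
    """Return True if `nvidia-smi -L` runs and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # NVIDIA driver tools are not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Each detected device is printed on a line starting with "GPU <index>:"
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` on Colab, the runtime is most likely still set to CPU.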

In [None]:
!nvidia-smi

Sun Jan 18 06:41:51 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:00:04.0 Off |                    0 |
| N/A   34C    P0             68W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                

In [None]:
import os
HOME = os.getcwd()
print("HOME:", HOME)

HOME: /content


In [None]:
!mkdir -p {HOME}/datasets
%cd {HOME}/datasets


/content/datasets


## Install packages using pip

In [None]:
!pip install roboflow==1.2.11 tensorflow==2.19.0



## Download dataset from Roboflow

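Before running the cell below, store your Roboflow credentials as Colab secrets named `ROBOFLOW_API_KEY`, `ROBOFLOW_WORKSPACE_ID`, `ROBOFLOW_PROJECT_ID`, and `ROBOFLOW_DATASET_VERSION`. The cell reads them with `userdata.get`.

Even though the dataset is labeled for object detection, we treat it as a whole-image multi-label classification problem. Each image is labeled with every class that has at least one bounding box in it, so an image can carry zero, one, or several labels.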

In [None]:
from roboflow import Roboflow
from google.colab import userdata

rf = Roboflow(api_key=userdata.get('ROBOFLOW_API_KEY'))
project = rf.workspace(userdata.get('ROBOFLOW_WORKSPACE_ID')).project(userdata.get('ROBOFLOW_PROJECT_ID'))
version = project.version(userdata.get('ROBOFLOW_DATASET_VERSION'))
dataset = version.download("coco")

loading Roboflow workspace...
loading Roboflow project...


In [None]:
%cd {HOME}

/content


## Convert COCO detection dataset to EfficientNetV2 multi-label classification

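For labeling, we use the 7 classes that remain in the COCO JSON dataset after the `junk-food` category is filtered out. Each image gets a multi-hot vector with one entry per class.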
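
As a minimal sketch of this conversion, the example below collapses the boxes of each image into one multi-hot vector, with a 1 for every class that appears at least once. The toy COCO dictionary, its IDs, and the helper name `multi_hot_labels` are illustrative assumptions and not part of the real dataset.

```python
import numpy as np

# Toy COCO-style data; every ID and name here is a made-up example.
toy_coco = {
    "categories": [
        {"id": 0, "name": "junk-food"},  # dropped, mirroring the filter used in this notebook
        {"id": 1, "name": "hamburger"},
        {"id": 2, "name": "pizza"},
        {"id": 3, "name": "soda"},
    ],
    "images": [{"id": 10}, {"id": 11}, {"id": 12}],
    "annotations": [
        {"image_id": 10, "category_id": 1},
        {"image_id": 10, "category_id": 3},
        {"image_id": 10, "category_id": 3},  # repeated boxes still yield a single 1
        {"image_id": 11, "category_id": 2},
        {"image_id": 12, "category_id": 0},  # only the dropped category
    ],
}


def multi_hot_labels(coco, skip_name="junk-food"):
    """Return (labels, class_names) with one multi-hot row per image."""
    kept = sorted(
        (c for c in coco["categories"] if c["name"] != skip_name),
        key=lambda c: c["id"],
    )
    cat_id_to_idx = {c["id"]: i for i, c in enumerate(kept)}
    img_id_to_row = {img["id"]: r for r, img in enumerate(coco["images"])}

    labels = np.zeros((len(coco["images"]), len(kept)), dtype=np.float32)
    for ann in coco["annotations"]:
        col = cat_id_to_idx.get(ann["category_id"])
        if col is not None:  # skip annotations of the dropped category
            labels[img_id_to_row[ann["image_id"]], col] = 1.0
    return labels, [c["name"] for c in kept]


toy_labels, toy_class_names = multi_hot_labels(toy_coco)
print(toy_class_names)  # ['hamburger', 'pizza', 'soda']
print(toy_labels)
```

The third toy image ends up with an all-zero row, which a multi-label model reads as "none of the classes".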

In [None]:
import json
import os
import numpy as np
from PIL import Image
from pathlib import Path
from typing import Tuple, List, Dict
import tensorflow as tf


def load_coco_annotations(json_path: str) -> Tuple[Dict, List, Dict]:
    with open(json_path, 'r') as f:
        coco_data = json.load(f)

    # Create mappings
    images_dict = {img['id']: img for img in coco_data['images']}

    # Filter out "junk-food" category
    categories = [cat for cat in coco_data['categories'] if cat['name'] != 'junk-food']

    # Get IDs of categories to keep
    valid_category_ids = {cat['id'] for cat in categories}

    # Group annotations by image_id, filtering out junk-food annotations
    annotations_by_image = {}
    for ann in coco_data['annotations']:
        # Skip if this annotation is for junk-food
        if ann['category_id'] not in valid_category_ids:
            continue

        image_id = ann['image_id']
        if image_id not in annotations_by_image:
            annotations_by_image[image_id] = []
        annotations_by_image[image_id].append(ann['category_id'])

    return annotations_by_image, categories, images_dict


def create_label_mapping(categories: List[Dict]) -> Tuple[Dict, Dict, int]:
    """
    Create category ID to index mapping for multi-label classification.
    """
    # Sort categories by ID for consistency
    sorted_categories = sorted(categories, key=lambda x: x['id'])

    cat_id_to_idx = {cat['id']: idx for idx, cat in enumerate(sorted_categories)}
    idx_to_cat_id = {idx: cat['id'] for idx, cat in enumerate(sorted_categories)}
    num_classes = len(categories)

    return cat_id_to_idx, idx_to_cat_id, num_classes


def transform_coco_to_multilabel(
    dataset_location: str,
    image_size: Tuple[int, int],
    subset: str = 'train',
) -> Tuple[np.ndarray, np.ndarray, Dict]:
    """
    Transform COCO JSON dataset into format for EfficientNetV2 multi-label classification.
    """
    # Construct paths
    subset_path = os.path.join(dataset_location, subset)
    json_path = os.path.join(subset_path, '_annotations.coco.json')

    if not os.path.exists(json_path):
        raise FileNotFoundError(f"Annotations file not found at {json_path}")

    # Load COCO annotations
    annotations_by_image, categories, images_dict = load_coco_annotations(json_path)

    # Create label mappings
    cat_id_to_idx, idx_to_cat_id, num_classes = create_label_mapping(categories)

    # Prepare lists for data
    image_paths = []
    labels_list = []

    # Process each image
    for image_id, image_info in images_dict.items():
        # Get image path
        image_filename = image_info['file_name']
        image_path = os.path.join(subset_path, image_filename)

        # Check if image exists
        if not os.path.exists(image_path):
            print(f"Warning: Image not found: {image_path}")
            continue

        # Create multi-hot encoded label
        label_vector = np.zeros(num_classes, dtype=np.float32)

        # Get annotations for this image
        if image_id in annotations_by_image:
            category_ids = annotations_by_image[image_id]
            for cat_id in category_ids:
                if cat_id in cat_id_to_idx:
                    idx = cat_id_to_idx[cat_id]
                    label_vector[idx] = 1.0

        image_paths.append(image_path)
        labels_list.append(label_vector)

    # Convert to numpy arrays
    image_paths = np.array(image_paths)
    labels = np.array(labels_list)

    # Create metadata dictionary
    metadata = {
        'num_classes': num_classes,
        'cat_id_to_idx': cat_id_to_idx,
        'idx_to_cat_id': idx_to_cat_id,
        'categories': categories,
        'image_size': image_size,
        'subset': subset,
        'num_samples': len(image_paths)
    }

    print(f"Loaded {subset} set: {len(image_paths)} images, {num_classes} classes")
    print(f"Labels shape: {labels.shape}")

    return image_paths, labels, metadata


def create_tf_dataset(
    image_paths: np.ndarray,
    labels: np.ndarray,
    metadata: Dict,
    batch_size: int = 32
) -> tf.data.Dataset:
    """
    Create a TensorFlow dataset from image paths and labels for EfficientNetV2.
    """
    image_size = metadata['image_size']

    def load_and_preprocess_image(image_path, label):
        # Read image
        image = tf.io.read_file(image_path)
        image = tf.image.decode_jpeg(image, channels=3)

        # Resize
        image = tf.image.resize(image, image_size)

        # preprocess_input is a pass-through for EfficientNetV2: the Keras model
        # rescales pixels internally, so the image stays in the [0, 255] range
        image = tf.keras.applications.efficientnet_v2.preprocess_input(image)

        return image, label

    # Create dataset
    dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))
    dataset = dataset.map(load_and_preprocess_image, num_parallel_calls=tf.data.AUTOTUNE)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(tf.data.AUTOTUNE)

    return dataset

train_image_paths, train_labels, train_metadata = transform_coco_to_multilabel(
    dataset.location,
    subset='train',
    image_size=(640, 640)
)

train_dataset = create_tf_dataset(
    train_image_paths,
    train_labels,
    train_metadata,
)

Loaded train set: 4614 images, 7 classes
Labels shape: (4614, 7)


## Train multi-label classification EfficientNetV2 model with dataset

We fine-tune EfficientNetV2B0 with early stopping and a model checkpoint that saves the model with the lowest validation loss. During training we track micro F1, macro F1, AUC, and subset accuracy.
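
The training cell reports micro F1, macro F1, and subset accuracy. To make the difference between them concrete, here is a small NumPy sketch on made-up predictions. The arrays and the helper name `f1_from_counts` are illustrative assumptions, not outputs or code from this notebook.

```python
import numpy as np

# Made-up ground truth and thresholded predictions: 4 images, 3 classes.
toy_true = np.array([[1, 0, 0],
                     [1, 1, 0],
                     [0, 1, 0],
                     [1, 0, 1]])
toy_pred = np.array([[1, 0, 0],
                     [1, 0, 0],
                     [0, 1, 0],
                     [1, 0, 0]])


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), taken as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


# Per-class true positives, false positives, and false negatives.
tp = np.sum((toy_true == 1) & (toy_pred == 1), axis=0)
fp = np.sum((toy_true == 0) & (toy_pred == 1), axis=0)
fn = np.sum((toy_true == 1) & (toy_pred == 0), axis=0)

# Micro F1 pools the counts over all classes before computing F1.
toy_micro_f1 = f1_from_counts(tp.sum(), fp.sum(), fn.sum())
# Macro F1 computes F1 per class, then takes the unweighted mean.
toy_macro_f1 = float(np.mean([f1_from_counts(t, p, n) for t, p, n in zip(tp, fp, fn)]))
# Subset accuracy counts an image only if every class matches.
toy_subset_accuracy = float(np.mean(np.all(toy_true == toy_pred, axis=1)))

print(f"micro F1:        {toy_micro_f1:.4f}")         # 0.8000
print(f"macro F1:        {toy_macro_f1:.4f}")         # 0.5556
print(f"subset accuracy: {toy_subset_accuracy:.4f}")  # 0.5000
```

Here the third class is never predicted, which pulls macro F1 well below micro F1, because macro F1 weights every class equally regardless of how often it occurs.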

In [None]:
from tensorflow.keras import layers, Model
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
import tensorflow as tf

valid_image_paths, valid_labels_train, valid_metadata = transform_coco_to_multilabel(
    dataset.location,
    subset='valid',
    image_size=(640, 640)
)

valid_dataset = create_tf_dataset(
    valid_image_paths,
    valid_labels_train,
    valid_metadata,
)

# Custom F1 Score metric (this is Micro F1)
class MicroF1Score(tf.keras.metrics.Metric):
    def __init__(self, name='micro_f1', **kwargs):
        super().__init__(name=name, **kwargs)
        self.precision = tf.keras.metrics.Precision()
        self.recall = tf.keras.metrics.Recall()

    def update_state(self, y_true, y_pred, sample_weight=None):
        self.precision.update_state(y_true, y_pred, sample_weight)
        self.recall.update_state(y_true, y_pred, sample_weight)

    def result(self):
        p = self.precision.result()
        r = self.recall.result()
        return 2 * ((p * r) / (p + r + tf.keras.backend.epsilon()))

    def reset_state(self):
        self.precision.reset_state()
        self.recall.reset_state()

# Custom Macro F1 Score metric
class MacroF1Score(tf.keras.metrics.Metric):
    def __init__(self, num_classes, name='macro_f1', **kwargs):
        super().__init__(name=name, **kwargs)
        self.num_classes = num_classes
        self.precisions = [tf.keras.metrics.Precision() for _ in range(num_classes)]
        self.recalls = [tf.keras.metrics.Recall() for _ in range(num_classes)]

    def update_state(self, y_true, y_pred, sample_weight=None):
        for i in range(self.num_classes):
            self.precisions[i].update_state(y_true[:, i], y_pred[:, i], sample_weight)
            self.recalls[i].update_state(y_true[:, i], y_pred[:, i], sample_weight)

    def result(self):
        f1_scores = []
        for i in range(self.num_classes):
            p = self.precisions[i].result()
            r = self.recalls[i].result()
            f1 = 2 * ((p * r) / (p + r + tf.keras.backend.epsilon()))
            f1_scores.append(f1)
        return tf.reduce_mean(f1_scores)

    def reset_state(self):
        for i in range(self.num_classes):
            self.precisions[i].reset_state()
            self.recalls[i].reset_state()

# Custom Subset Accuracy metric
class SubsetAccuracy(tf.keras.metrics.Metric):
    def __init__(self, name='subset_accuracy', threshold=0.5, **kwargs):
        super().__init__(name=name, **kwargs)
        self.threshold = threshold
        self.correct = self.add_weight(name='correct', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred_binary = tf.cast(y_pred >= self.threshold, tf.float32)
        exact_matches = tf.reduce_all(tf.equal(y_true, y_pred_binary), axis=1)
        self.correct.assign_add(tf.reduce_sum(tf.cast(exact_matches, tf.float32)))
        self.total.assign_add(tf.cast(tf.shape(y_true)[0], tf.float32))

    def result(self):
        return self.correct / (self.total + tf.keras.backend.epsilon())

    def reset_state(self):
        self.correct.assign(0.0)
        self.total.assign(0.0)

# Build EfficientNetV2 multi-label classification model
base_model = tf.keras.applications.EfficientNetV2B0(
    include_top=False,
    weights='imagenet',
    input_shape=(640, 640, 3),
    pooling='avg'
)

# Unfreeze base model for fine-tuning
base_model.trainable = True

# Build model
inputs = tf.keras.Input(shape=(640, 640, 3))
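# Let Keras pass the training flag through, so BatchNorm and dropout run in
# inference mode during evaluate() and predict(); hard-coding training=True
# would keep them in training mode at inference time.
x = base_model(inputs)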
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(train_metadata['num_classes'], activation='sigmoid')(x)

model = Model(inputs=inputs, outputs=outputs)

# Compile with all requested metrics
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss='binary_crossentropy',
    metrics=[
        MicroF1Score(name='micro_f1'),
        MacroF1Score(num_classes=train_metadata['num_classes'], name='macro_f1'),
        tf.keras.metrics.AUC(name='auc', multi_label=True),
        SubsetAccuracy(name='subset_accuracy', threshold=0.5)
    ]
)

# Callbacks
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True,
    verbose=1
)

model_checkpoint = ModelCheckpoint(
    filepath='best_model.keras',
    monitor='val_loss',
    save_best_only=True,
    verbose=1
)

# Train the model
history = model.fit(
    train_dataset,
    validation_data=valid_dataset,
    epochs=50,
    callbacks=[early_stopping, model_checkpoint],
    verbose=1
)

# Save the model
model.save('efficientnet_multilabel_model.keras')

Loaded valid set: 440 images, 7 classes
Labels shape: (440, 7)
Epoch 1/50
145/145 ━━━━━━━━━━━━━━━━━━━━ 0s 668ms/step - auc: 0.6179 - loss: 0.4467 - macro_f1: 0.1540 - micro_f1: 0.2965 - subset_accuracy: 0.3247
Epoch 1: val_loss improved from inf to 0.24937, saving model to best_model.keras
145/145 ━━━━━━━━━━━━━━━━━━━━ 270s 942ms/step - auc: 0.6186 - loss: 0.4460 - macro_f1: 0.1541 - micro_f1: 0.2970 - subset_accuracy: 0.3253 - val_auc: 0.8804 - val_loss: 0.2494 - val_macro_f1: 0.3485 - val_micro_f1: 0.5643 - val_subset_accuracy: 0.5227
Epoch 2/50
144/145 ━━━━━━━━━━━━━━━━━━━━ 0s 119ms/step - auc: 0.9190 - loss: 0.2171 - macro_f1: 0.4171 - micro_f1: 0.6089 - subset_accuracy: 0.5425
Epoch 2: val_loss improved from 0.24937 to 0.16299, saving model to best_model.keras
145/145 ━━━━━━━━━━━━━━━━━━━━ 19s 132ms/step - auc: 0.9193 - loss: 0.2167 - macro_f1: 0.418
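
The `val_loss improved` lines come from the checkpoint callback, and the early-stopping callback watches the same quantity. The patience rule can be sketched in plain Python as follows. This is a simplified illustration, not the Keras implementation; the function name `early_stopping_epoch` and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Simulate a patience rule on validation loss.

    Returns (stop_epoch, best_epoch), both 1-indexed. stop_epoch is None
    when training runs through every epoch without triggering the rule.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Made-up validation losses: the best value occurs at epoch 3.
toy_val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.36, 0.34]

print(early_stopping_epoch(toy_val_losses, patience=5))  # (8, 3)
print(early_stopping_epoch(toy_val_losses, patience=6))  # (None, 9)
```

With `patience=5` the run stops at epoch 8 and would restore the epoch 3 weights, so the later improvement at epoch 9 is never seen. A larger patience trades extra training time for a chance to catch such late improvements.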

## Run predictions on test set

In [None]:
import numpy as np
from sklearn.metrics import f1_score, accuracy_score

# Load the test dataset
test_image_paths, test_labels, test_metadata = transform_coco_to_multilabel(
    dataset.location,
    subset='test',
    image_size=(640, 640)
)

test_dataset = create_tf_dataset(
    test_image_paths,
    test_labels,
    test_metadata,
)

# Load the best model (no custom objects needed)
best_model = tf.keras.models.load_model('best_model.keras', compile=False)

# Generate predictions
print("Generating predictions...")
y_pred_probs = best_model.predict(test_dataset, verbose=1)
y_pred = (y_pred_probs > 0.5).astype(int)

# Get true labels
y_true = np.concatenate([y for x, y in test_dataset], axis=0)

# Calculate metrics
print("\n" + "=" * 50)
print("TEST SET METRICS")
print("=" * 50)

subset_accuracy = accuracy_score(y_true, y_pred)
print(f"Subset Accuracy: {subset_accuracy:.4f}")
micro_f1 = f1_score(y_true, y_pred, average='micro', zero_division=0)
print(f"Micro F1:        {micro_f1:.4f}")
macro_f1 = f1_score(y_true, y_pred, average='macro', zero_division=0)
print(f"Macro F1:        {macro_f1:.4f}")

# F1 score per class
print("\n" + "=" * 50)
print("F1 SCORE PER CLASS")
print("=" * 50)
class_names = [cat['name'] for cat in sorted(test_metadata['categories'], key=lambda x: x['id'])]
f1_per_class = f1_score(y_true, y_pred, average=None, zero_division=0)
for class_name, f1 in zip(class_names, f1_per_class):
    print(f"{class_name}: {f1:.4f}")

Loaded test set: 218 images, 7 classes
Labels shape: (218, 7)
Generating predictions...
7/7 ━━━━━━━━━━━━━━━━━━━━ 35s 4s/step

TEST SET METRICS
Subset Accuracy: 0.7844
Micro F1:        0.8548
Macro F1:        0.8490

F1 SCORE PER CLASS
french_fries: 0.7429
fried_chicken: 0.9268
hamburger: 0.9655
ice_cream: 0.8780
junk_food_logo: 0.8696
pizza: 0.8571
soda: 0.7027


## Real images test

Let's run the trained CNN on up to five random test images that have at least one label, and compare the predicted labels with the ground truth.
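
Turning the sigmoid outputs into labels is a per-class threshold test. Below is a minimal sketch, assuming a made-up probability vector and a hypothetical helper `probs_to_labels`; the notebook itself inlines this logic with a fixed 0.5 threshold.

```python
# Class names in index order, as printed in the per-class F1 report.
CLASS_NAMES = [
    "french_fries", "fried_chicken", "hamburger", "ice_cream",
    "junk_food_logo", "pizza", "soda",
]


def probs_to_labels(probs, names, threshold=0.5):
    """Keep every class whose probability is strictly above the threshold."""
    return [name for name, p in zip(names, probs) if p > threshold]


# Made-up sigmoid outputs for a single image.
toy_probs = [0.91, 0.03, 0.62, 0.10, 0.50, 0.02, 0.49]

print(probs_to_labels(toy_probs, CLASS_NAMES))
# ['french_fries', 'hamburger']
print(probs_to_labels(toy_probs, CLASS_NAMES, threshold=0.4))
# ['french_fries', 'hamburger', 'junk_food_logo', 'soda']
```

Unlike softmax outputs, these probabilities are independent and need not sum to 1, so zero, one, or several labels can pass the threshold.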

In [None]:
import random
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from PIL import Image as PILImage

# Get indices of images that have at least one label (interesting images)
positive_indices = [
    i for i, labels_vector in enumerate(test_labels)
    if sum(labels_vector) > 0  # Has at least one label
]

# Pick 5 random test images with labels
test_indices = random.sample(positive_indices, min(5, len(positive_indices)))

# Define colors for each class
class_colors = {
    "french_fries": "#F39C12",
    "fried_chicken": "#E67E22",
    "hamburger": "#8B4513",
    "ice_cream": "#96CEB4",
    "junk_food_logo": "#FFEAA7",
    "pizza": "#FD79A8",
    "soda": "#A29BFE"
}

for idx, random_idx in enumerate(test_indices, 1):
    random_image_path = test_image_paths[random_idx]
    true_labels_vector = test_labels[random_idx]
    true_labels = [class_names[i] for i, val in enumerate(true_labels_vector) if val == 1]

    print(f"{'='*60}")
    print(f"Image {idx}/{len(test_indices)}: {random_image_path}")
    print('='*60)

    # Load and preprocess the image
    image = PILImage.open(random_image_path).convert("RGB")
    image_resized = image.resize((640, 640))
    image_array = np.array(image_resized, dtype=np.float32)
    # Apply the same preprocess_input call as in training (a pass-through for
    # EfficientNetV2; the model rescales pixels internally)
    image_array = tf.keras.applications.efficientnet_v2.preprocess_input(image_array)
    image_batch = np.expand_dims(image_array, axis=0)

    # Run inference
    pred_probs = best_model.predict(image_batch, verbose=0)[0]
    pred_labels_vector = (pred_probs > 0.5).astype(int)
    predicted_labels = [class_names[i] for i, val in enumerate(pred_labels_vector) if val == 1]

    # Display the image with predictions
    fig, ax = plt.subplots(1, 1, figsize=(12, 10))
    ax.imshow(image)
    ax.axis("off")

    # Create label badges at the bottom
    num_labels = len(predicted_labels) if predicted_labels else 1
    badge_width = 0.18
    badge_spacing = 0.02
    total_width = num_labels * badge_width + (num_labels - 1) * badge_spacing
    start_x = 0.5 - total_width / 2

    if predicted_labels:
        for i, label in enumerate(predicted_labels):
            x_pos = start_x + i * (badge_width + badge_spacing)
            color = class_colors.get(label, "#95A5A6")

            badge = mpatches.FancyBboxPatch(
                (x_pos, -0.08), badge_width, 0.05,
                boxstyle="round,pad=0.01",
                facecolor=color,
                edgecolor="white",
                linewidth=2,
                transform=ax.transAxes,
                clip_on=False
            )
            ax.add_patch(badge)
            ax.text(
                x_pos + badge_width / 2, -0.055,
                label.upper(),
                transform=ax.transAxes,
                fontsize=9,
                fontweight="bold",
                color="white" if label != "junk_food_logo" else "black",
                ha="center",
                va="center"
            )

    plt.tight_layout()
    plt.show()

    print(f"Model: EfficientNetV2B0 (CNN)")
    print(f"Predicted labels: {predicted_labels if predicted_labels else '(none)'}")
    print(f"True labels: {true_labels if true_labels else '(none)'}")

    print(f"All class probabilities:")
    sorted_probs = sorted(zip(class_names, pred_probs), key=lambda x: x[1], reverse=True)
    for cls, prob in sorted_probs:
        marker = ">" if prob > 0.5 else " "
        print(f"  {marker} {cls}: {prob:.3f}")

    correct_preds = set(predicted_labels) & set(true_labels)
    false_positives = set(predicted_labels) - set(true_labels)
    false_negatives = set(true_labels) - set(predicted_labels)
    print(f"Correct predictions: {list(correct_preds) if correct_preds else '(none)'}")
    print(f"False positives: {list(false_positives) if false_positives else '(none)'}")
    print(f"False negatives: {list(false_negatives) if false_negatives else '(none)'}")

