# Junk Food Multi-label Classification with KNN

This notebook implements a **CNN** model for image classification from a **COCO JSON dataset**.

## Before you start

Make sure you have access to GPU. In case of any problems, navigate to `Edit` -> `Notebook settings` -> `Hardware accelerator`, set it to `GPU`, click `Save` and try again.

In [1]:
!nvidia-smi

Mon Jan 12 20:22:19 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:00:04.0 Off |                    0 |
| N/A   33C    P0             43W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                

In [2]:
import os
HOME = os.getcwd()
print("HOME:", HOME)

HOME: /content


In [3]:
!mkdir -p {HOME}/datasets
%cd {HOME}/datasets


/content/datasets


## Install packages using pip

In [4]:
!pip install roboflow==1.2.11 tensorflow==2.19.0

Collecting roboflow==1.2.11
  Downloading roboflow-1.2.11-py3-none-any.whl.metadata (9.7 kB)
Collecting idna==3.7 (from roboflow==1.2.11)
  Downloading idna-3.7-py3-none-any.whl.metadata (9.9 kB)
Collecting opencv-python-headless==4.10.0.84 (from roboflow==1.2.11)
  Downloading opencv_python_headless-4.10.0.84-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)
Collecting pi-heif<2 (from roboflow==1.2.11)
  Downloading pi_heif-1.1.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (6.5 kB)
Collecting pillow-avif-plugin<2 (from roboflow==1.2.11)
  Downloading pillow_avif_plugin-1.5.2-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (2.1 kB)
Collecting filetype (from roboflow==1.2.11)
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Downloading roboflow-1.2.11-py3-none-any.whl (89 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m89.9/89.9 kB[0m [31m3.9 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading idna-3.7-py

## Download dataset from Roboflow

Don't forget to change the `API_KEY` with your dataset key.

We replicate your original dataset setup. Even though the dataset is labeled for object detection, we’ll use the full image classification approach with KNN. Labels will be derived from the most frequent class per image.

In [5]:
from roboflow import Roboflow
from google.colab import userdata

rf = Roboflow(api_key=userdata.get('ROBOFLOW_API_KEY'))
project = rf.workspace(userdata.get('ROBOFLOW_WORKSPACE_ID')).project(userdata.get('ROBOFLOW_PROJECT_ID'))
version = project.version(userdata.get('ROBOFLOW_DATASET_VERSION'))
dataset = version.download("coco")

loading Roboflow workspace...
loading Roboflow project...


Downloading Dataset Version Zip in Junk-Food-Detection-10 to coco:: 100%|██████████| 293482/293482 [00:03<00:00, 92789.16it/s]





Extracting Dataset Version Zip to Junk-Food-Detection-10 in coco:: 100%|██████████| 5280/5280 [00:00<00:00, 5775.64it/s]


In [6]:
%cd {HOME}

/content


## Convert COCO detection dataset to EfficientNetV2 multi-label classification

For labeling, we use the 7 classes from the COCO JSON dataset.

In [7]:
import json
import os
import numpy as np
from PIL import Image
from pathlib import Path
from typing import Tuple, List, Dict
import tensorflow as tf


def load_coco_annotations(json_path: str) -> Tuple[Dict, List, Dict]:
    with open(json_path, 'r') as f:
        coco_data = json.load(f)

    # Create mappings
    images_dict = {img['id']: img for img in coco_data['images']}

    # Filter out "junk-food" category
    categories = [cat for cat in coco_data['categories'] if cat['name'] != 'junk-food']

    # Get IDs of categories to keep
    valid_category_ids = {cat['id'] for cat in categories}

    # Group annotations by image_id, filtering out junk-food annotations
    annotations_by_image = {}
    for ann in coco_data['annotations']:
        # Skip if this annotation is for junk-food
        if ann['category_id'] not in valid_category_ids:
            continue

        image_id = ann['image_id']
        if image_id not in annotations_by_image:
            annotations_by_image[image_id] = []
        annotations_by_image[image_id].append(ann['category_id'])

    return annotations_by_image, categories, images_dict


def create_label_mapping(categories: List[Dict]) -> Tuple[Dict, Dict, int]:
    """
    Create category ID to index mapping for multi-label classification.
    """
    # Sort categories by ID for consistency
    sorted_categories = sorted(categories, key=lambda x: x['id'])

    cat_id_to_idx = {cat['id']: idx for idx, cat in enumerate(sorted_categories)}
    idx_to_cat_id = {idx: cat['id'] for idx, cat in enumerate(sorted_categories)}
    num_classes = len(categories)

    return cat_id_to_idx, idx_to_cat_id, num_classes


def transform_coco_to_multilabel(
    dataset_location: str,
    image_size: Tuple[int, int],
    subset: str = 'train',
) -> Tuple[np.ndarray, np.ndarray, Dict]:
    """
    Transform COCO JSON dataset into format for EfficientNetV2 multi-label classification.
    """
    # Construct paths
    subset_path = os.path.join(dataset_location, subset)
    json_path = os.path.join(subset_path, '_annotations.coco.json')

    if not os.path.exists(json_path):
        raise FileNotFoundError(f"Annotations file not found at {json_path}")

    # Load COCO annotations
    annotations_by_image, categories, images_dict = load_coco_annotations(json_path)

    # Create label mappings
    cat_id_to_idx, idx_to_cat_id, num_classes = create_label_mapping(categories)

    # Prepare lists for data
    image_paths = []
    labels_list = []

    # Process each image
    for image_id, image_info in images_dict.items():
        # Get image path
        image_filename = image_info['file_name']
        image_path = os.path.join(subset_path, image_filename)

        # Check if image exists
        if not os.path.exists(image_path):
            print(f"Warning: Image not found: {image_path}")
            continue

        # Create multi-hot encoded label
        label_vector = np.zeros(num_classes, dtype=np.float32)

        # Get annotations for this image
        if image_id in annotations_by_image:
            category_ids = annotations_by_image[image_id]
            for cat_id in category_ids:
                if cat_id in cat_id_to_idx:
                    idx = cat_id_to_idx[cat_id]
                    label_vector[idx] = 1.0

        image_paths.append(image_path)
        labels_list.append(label_vector)

    # Convert to numpy arrays
    image_paths = np.array(image_paths)
    labels = np.array(labels_list)

    # Create metadata dictionary
    metadata = {
        'num_classes': num_classes,
        'cat_id_to_idx': cat_id_to_idx,
        'idx_to_cat_id': idx_to_cat_id,
        'categories': categories,
        'image_size': image_size,
        'subset': subset,
        'num_samples': len(image_paths)
    }

    print(f"Loaded {subset} set: {len(image_paths)} images, {num_classes} classes")
    print(f"Labels shape: {labels.shape}")

    return image_paths, labels, metadata


def create_tf_dataset(
    image_paths: np.ndarray,
    labels: np.ndarray,
    metadata: Dict,
    batch_size: int = 32
) -> tf.data.Dataset:
    """
    Create a TensorFlow dataset from image paths and labels for EfficientNetV2.
    """
    image_size = metadata['image_size']

    def load_and_preprocess_image(image_path, label):
        # Read image
        image = tf.io.read_file(image_path)
        image = tf.image.decode_jpeg(image, channels=3)

        # Resize
        image = tf.image.resize(image, image_size)

        # Preprocess for EfficientNet (scales to [-1, 1])
        image = tf.keras.applications.efficientnet_v2.preprocess_input(image)

        return image, label

    # Create dataset
    dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))
    dataset = dataset.map(load_and_preprocess_image, num_parallel_calls=tf.data.AUTOTUNE)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(tf.data.AUTOTUNE)

    return dataset

train_image_paths, train_labels, train_metadata = transform_coco_to_multilabel(
    dataset.location,
    subset='train',
    image_size=(640, 640)
)

train_dataset = create_tf_dataset(
    train_image_paths,
    train_labels,
    train_metadata,
)

Loaded train set: 4614 images, 7 classes
Labels shape: (4614, 7)


## Train multi-label classification EfficientNetV2 model with dataset

We train the EfficientNetV2 model with early stopping, a model checkpoint (to save the best resultant model), and display the required metrics for our evaluation.

In [8]:
from tensorflow.keras import layers, Model
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
import tensorflow as tf

valid_image_paths, valid_labels_train, valid_metadata = transform_coco_to_multilabel(
    dataset.location,
    subset='valid',
    image_size=(640, 640)
)

valid_dataset = create_tf_dataset(
    valid_image_paths,
    valid_labels_train,
    valid_metadata,
)

# Custom F1 Score metric (this is Micro F1)
class MicroF1Score(tf.keras.metrics.Metric):
    def __init__(self, name='micro_f1', **kwargs):
        super().__init__(name=name, **kwargs)
        self.precision = tf.keras.metrics.Precision()
        self.recall = tf.keras.metrics.Recall()

    def update_state(self, y_true, y_pred, sample_weight=None):
        self.precision.update_state(y_true, y_pred, sample_weight)
        self.recall.update_state(y_true, y_pred, sample_weight)

    def result(self):
        p = self.precision.result()
        r = self.recall.result()
        return 2 * ((p * r) / (p + r + tf.keras.backend.epsilon()))

    def reset_state(self):
        self.precision.reset_state()
        self.recall.reset_state()

# Custom Macro F1 Score metric
class MacroF1Score(tf.keras.metrics.Metric):
    def __init__(self, num_classes, name='macro_f1', **kwargs):
        super().__init__(name=name, **kwargs)
        self.num_classes = num_classes
        self.precisions = [tf.keras.metrics.Precision() for _ in range(num_classes)]
        self.recalls = [tf.keras.metrics.Recall() for _ in range(num_classes)]

    def update_state(self, y_true, y_pred, sample_weight=None):
        for i in range(self.num_classes):
            self.precisions[i].update_state(y_true[:, i], y_pred[:, i], sample_weight)
            self.recalls[i].update_state(y_true[:, i], y_pred[:, i], sample_weight)

    def result(self):
        f1_scores = []
        for i in range(self.num_classes):
            p = self.precisions[i].result()
            r = self.recalls[i].result()
            f1 = 2 * ((p * r) / (p + r + tf.keras.backend.epsilon()))
            f1_scores.append(f1)
        return tf.reduce_mean(f1_scores)

    def reset_state(self):
        for i in range(self.num_classes):
            self.precisions[i].reset_state()
            self.recalls[i].reset_state()

# Custom Subset Accuracy metric
class SubsetAccuracy(tf.keras.metrics.Metric):
    def __init__(self, name='subset_accuracy', threshold=0.5, **kwargs):
        super().__init__(name=name, **kwargs)
        self.threshold = threshold
        self.correct = self.add_weight(name='correct', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred_binary = tf.cast(y_pred >= self.threshold, tf.float32)
        exact_matches = tf.reduce_all(tf.equal(y_true, y_pred_binary), axis=1)
        self.correct.assign_add(tf.reduce_sum(tf.cast(exact_matches, tf.float32)))
        self.total.assign_add(tf.cast(tf.shape(y_true)[0], tf.float32))

    def result(self):
        return self.correct / (self.total + tf.keras.backend.epsilon())

    def reset_state(self):
        self.correct.assign(0.0)
        self.total.assign(0.0)

# Build EfficientNetV2 multi-label classification model
base_model = tf.keras.applications.EfficientNetV2B0(
    include_top=False,
    weights='imagenet',
    input_shape=(640, 640, 3),
    pooling='avg'
)

# Unfreeze base model for fine-tuning
base_model.trainable = True

# Build model
inputs = tf.keras.Input(shape=(640, 640, 3))
x = base_model(inputs, training=True)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(train_metadata['num_classes'], activation='sigmoid')(x)

model = Model(inputs=inputs, outputs=outputs)

# Compile with all requested metrics
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss='binary_crossentropy',
    metrics=[
        MicroF1Score(name='micro_f1'),
        MacroF1Score(num_classes=train_metadata['num_classes'], name='macro_f1'),
        tf.keras.metrics.AUC(name='auc', multi_label=True),
        SubsetAccuracy(name='subset_accuracy', threshold=0.5)
    ]
)

# Callbacks
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True,
    verbose=1
)

model_checkpoint = ModelCheckpoint(
    filepath='best_model.keras',
    monitor='val_loss',
    save_best_only=True,
    verbose=1
)

# Train the model
history = model.fit(
    train_dataset,
    validation_data=valid_dataset,
    epochs=50,
    callbacks=[early_stopping, model_checkpoint],
    verbose=1
)

# Save the model
model.save('efficientnet_multilabel_model.keras')

Loaded valid set: 440 images, 7 classes
Labels shape: (440, 7)
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/efficientnet_v2/efficientnetv2-b0_notop.h5
[1m24274472/24274472[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step
Epoch 1/50
[1m145/145[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 635ms/step - auc: 0.6123 - loss: 0.4518 - macro_f1: 0.1558 - micro_f1: 0.2770 - subset_accuracy: 0.3100
Epoch 1: val_loss improved from inf to 0.25082, saving model to best_model.keras
[1m145/145[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m263s[0m 908ms/step - auc: 0.6130 - loss: 0.4510 - macro_f1: 0.1558 - micro_f1: 0.2775 - subset_accuracy: 0.3106 - val_auc: 0.8747 - val_loss: 0.2508 - val_macro_f1: 0.3638 - val_micro_f1: 0.5559 - val_subset_accuracy: 0.5068
Epoch 2/50
[1m144/145[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 120ms/step - auc: 0.9215 - loss: 0.2172 - macro_f1: 0.4420 - micro_f1: 0.6147 - subset_accuracy: 0.548

## Run predictions on test set

In [9]:
import numpy as np
from sklearn.metrics import f1_score, accuracy_score

# Load the test dataset
test_image_paths, test_labels, test_metadata = transform_coco_to_multilabel(
    dataset.location,
    subset='test',
    image_size=(640, 640)
)

test_dataset = create_tf_dataset(
    test_image_paths,
    test_labels,
    test_metadata,
)

# Load the best model (no custom objects needed)
best_model = tf.keras.models.load_model('best_model.keras', compile=False)

# Generate predictions
print("Generating predictions...")
y_pred_probs = best_model.predict(test_dataset, verbose=1)
y_pred = (y_pred_probs > 0.5).astype(int)

# Get true labels
y_true = np.concatenate([y for x, y in test_dataset], axis=0)

# Calculate metrics
print("\n" + "=" * 50)
print("TEST SET METRICS")
print("=" * 50)

subset_accuracy = accuracy_score(y_true, y_pred)
print(f"Subset Accuracy: {subset_accuracy:.4f}")
micro_f1 = f1_score(y_true, y_pred, average='micro', zero_division=0)
print(f"Micro F1:        {micro_f1:.4f}")
macro_f1 = f1_score(y_true, y_pred, average='macro', zero_division=0)
print(f"Macro F1:        {macro_f1:.4f}")

# F1 score per class
print("\n" + "=" * 50)
print("F1 SCORE PER CLASS")
print("=" * 50)
class_names = [cat['name'] for cat in sorted(test_metadata['categories'], key=lambda x: x['id'])]
f1_per_class = f1_score(y_true, y_pred, average=None, zero_division=0)
for class_name, f1 in zip(class_names, f1_per_class):
    print(f"{class_name}: {f1:.4f}")

Loaded test set: 218 images, 7 classes
Labels shape: (218, 7)
Generating predictions...
[1m7/7[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 4s/step

TEST SET METRICS
Subset Accuracy: 0.7890
Micro F1:        0.8516
Macro F1:        0.8348

F1 SCORE PER CLASS
french_fries: 0.7333
fried_chicken: 0.9091
hamburger: 0.9655
ice_cream: 0.7368
junk_food_logo: 0.8875
pizza: 0.8889
soda: 0.7222
