**Food Recognition Using Transfer Learning with EfficientNetB0**

**I. Abstract**

This study examines whether fine-tuning a pre-trained EfficientNetB0 model improves classification of food images from a 10-category subset of the UECFOOD256 dataset. The subset contains 1,913 images, split into 80% for training and 20% for validation. The model was first trained with its ImageNet-pretrained base layers frozen, reaching about 88.7% validation accuracy. The upper layers of the base model were then unfrozen and trained further at a lower learning rate (fine-tuning) so the network could learn food-specific features. After fine-tuning, validation accuracy fell to about 86.4% while validation loss rose, which suggests overfitting rather than improvement. Example predictions before and after fine-tuning show that the model identifies most food images correctly, although some errors remain. Overall, transfer learning works well on this small dataset, but fine-tuning only helps when the layers to train and the learning rate are chosen carefully.

**II. Task Definition and Dataset**

Task: To classify food images into 10 different classes.

Dataset: A 10-class subset of the UECFOOD-256 dataset was used. Each class folder holds food images of one category (e.g., rice, ramen noodle).

Preprocessing included resizing images to 224×224, normalization, shuffling, batching, and basic data augmentation. The dataset was split into 80% training and 20% validation; the validation set also served as the held-out test set.
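The image counts reported by the notebook's loader (1,913 files, 1,531 for training and 382 held out) and the 48 steps per epoch in the training log can be checked with a little arithmetic. The sketch below assumes the hold-out size is truncated to an integer and that the final partial batch is kept; `split_counts` is a hypothetical helper, not part of the notebook.

```python
import math

def split_counts(num_images, validation_split, batch_size):
    """Return (train, val, train_batches, val_batches) for a fractional hold-out."""
    # Assumption: the hold-out size is truncated to an integer.
    num_val = int(validation_split * num_images)
    num_train = num_images - num_val
    return (
        num_train,
        num_val,
        math.ceil(num_train / batch_size),  # last partial batch is kept
        math.ceil(num_val / batch_size),
    )

train, val, train_batches, val_batches = split_counts(1913, 0.2, 32)
print(train, val, train_batches, val_batches)  # 1531 382 48 12
```

With only 382 held-out images, a single image is worth roughly 0.26 percentage points of accuracy, so small differences between runs should be read with caution.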

**III. Model and Methodology**

I used EfficientNetB0 as the base model for transfer learning. At first, the base model was frozen, and only the custom classification layers were trained. After that, I fine-tuned the top layers of the base model to try to improve performance. The model was built using TensorFlow and Keras, and trained on Google Colab with a T4 GPU, which sped up training.
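One variation that may be worth trying, and that was not run for this report, is to keep the batch-normalization layers frozen when unfreezing the top of the base model. Keras guidance for fine-tuning recommends this because updating batch-normalization statistics on a small dataset can sharply reduce accuracy right after unfreezing, which would be consistent with the drop in training accuracy at the first fine-tuning epoch in the log. The helper name `unfreeze_top_layers` and the toy model are hypothetical and only illustrate the idea.

```python
import tensorflow as tf

def unfreeze_top_layers(base_model, fine_tune_at):
    """Unfreeze layers from index `fine_tune_at` onward, keeping BatchNormalization frozen."""
    base_model.trainable = True
    for index, layer in enumerate(base_model.layers):
        is_batch_norm = isinstance(layer, tf.keras.layers.BatchNormalization)
        layer.trainable = index >= fine_tune_at and not is_batch_norm

# Tiny stand-in for a pretrained base (hypothetical, not EfficientNetB0).
toy_base = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Conv2D(4, 3),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(4, 3),
    tf.keras.layers.BatchNormalization(),
])
unfreeze_top_layers(toy_base, fine_tune_at=2)
trainable_flags = [layer.trainable for layer in toy_base.layers]
print(trainable_flags)
```

Applied to the notebook's model, the call would be `unfreeze_top_layers(base_model, fine_tune_at=100)`, followed by recompiling with the lower learning rate.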

*Training configuration*
*  Batch size: 32
*  Initial learning rate: 0.001 (before fine-tuning), then reduced to 0.00001
*  Epochs: 10 (5 before fine-tuning + 5 fine-tuning)
*  Loss function: Sparse Categorical Crossentropy
*  Optimizer: Adam

With the base model frozen, only the classification head was trainable, so the number of trainable parameters was small. Unfreezing the base-model layers from index 100 onward for fine-tuning increased the trainable parameter count.
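As a rough illustration of how small the trainable head is, the count for the final dense layer can be worked out by hand. The sketch assumes that EfficientNetB0 without its top produces 1280 feature channels and that the head is a single 10-unit dense layer after global average pooling and dropout, neither of which adds parameters; `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(num_inputs, num_units):
    """Weights plus biases of a fully connected layer."""
    return num_inputs * num_units + num_units

# Assumption: EfficientNetB0 without its top emits 1280 feature channels,
# which global average pooling reduces to a 1280-dimensional vector.
FEATURE_DIM = 1280
NUM_CLASSES = 10

head_params = dense_param_count(FEATURE_DIM, NUM_CLASSES)
print(head_params)  # 12810
```

In Keras, `model.summary()` reports the exact trainable and non-trainable totals before and after unfreezing.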

**IV. Qualitative Analysis**

Before fine-tuning:

True: Sushi | Predicted: Sushi

True: Hamburger | Predicted: Hamburger

True: Hamburger | Predicted: Hamburger

After fine-tuning:

True: Sashimi | Predicted: Sashimi

True: Ramen | Predicted: Udon

True: Hamburger | Predicted: Hamburger

Before fine-tuning, predictions were more accurate. After fine-tuning, accuracy dropped slightly and some misclassifications appeared, especially between similar-looking foods (Ramen and Udon). This suggests that the pretrained features were already strong and that fine-tuning may have caused slight overfitting on the small dataset.
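Three sample predictions give only a narrow view of which classes get confused. A minimal sketch of tallying the most frequent confusions over a whole validation set is shown here; the label lists are hypothetical and exist only to illustrate the idea, they are not results from this study.

```python
from collections import Counter

def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

def most_confused_pairs(true_labels, predicted_labels, top_n=3):
    """Return the most frequent misclassified (true, predicted) pairs."""
    counts = confusion_counts(true_labels, predicted_labels)
    errors = Counter({pair: n for pair, n in counts.items() if pair[0] != pair[1]})
    return errors.most_common(top_n)

# Hypothetical labels for illustration only, not results from this study.
y_true = ["ramen", "ramen", "ramen", "udon", "udon", "rice", "rice"]
y_pred = ["udon", "udon", "ramen", "udon", "ramen", "rice", "omelet"]
confused = most_confused_pairs(y_true, y_pred)
print(confused[0])  # (('ramen', 'udon'), 2)
```

Running the same tally on predictions from the frozen and the fine-tuned model would show whether fine-tuning introduced new confusions or only shifted existing ones.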

**V. Conclusion**

I learned that transfer learning with a pretrained CNN such as EfficientNetB0 performs well even with limited data. The frozen base model already gave high accuracy because its features were learned from a large dataset. After fine-tuning, accuracy dropped slightly, likely because of overfitting on the limited training samples. The fine-tuned model did not outperform the base model; the base model gave better validation accuracy and more stable predictions. A good takeaway is that for small datasets, a frozen pretrained model with good augmentations may be more effective than fine-tuning.
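The comparison between the two phases can be read directly from the per-epoch validation metrics in the training log. The sketch below copies those values and picks the best epoch by each metric; `best_epoch` is a hypothetical helper written for this illustration.

```python
# Validation metrics copied from the training log
# (epochs 1-5: frozen base, epochs 6-10: fine-tuning).
val_accuracy = [0.8063, 0.8586, 0.8770, 0.8874, 0.8874,
                0.8848, 0.8848, 0.8743, 0.8665, 0.8639]
val_loss = [0.8385, 0.5991, 0.5190, 0.4731, 0.4500,
            0.4615, 0.5099, 0.5577, 0.5813, 0.5891]

def best_epoch(values, mode):
    """Return the 1-based epoch with the best value ('min' or 'max')."""
    pick = min if mode == "min" else max
    best_index = pick(range(len(values)), key=values.__getitem__)
    return best_index + 1

print(best_epoch(val_loss, "min"))      # 5
print(best_epoch(val_accuracy, "max"))  # 4
```

Both metrics peak before fine-tuning begins. In Keras, the same selection is usually automated with the `EarlyStopping` or `ModelCheckpoint` callbacks monitoring `val_loss`.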

Source: http://foodcam.mobi/dataset256.html


In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
!cp "/content/drive/My Drive/dataset/dataset256.zip" /content/

In [5]:
import zipfile
import os

zip_path = "/content/dataset256.zip"
extract_path = "/content/food_data"

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_path)

In [6]:
import tensorflow as tf
from tensorflow.keras.utils import image_dataset_from_directory

img_size = (224, 224)
batch_size = 32

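# NOTE: this points at the parent of UECFOOD256/, so the loader below treats
# UECFOOD256 as the only class. The 10-class subset is loaded in a later cell.
extract_path = "/content/food_data"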
train_ds_raw = image_dataset_from_directory(
    extract_path,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=img_size,
    batch_size=batch_size
)

val_ds_raw = image_dataset_from_directory(
    extract_path,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=img_size,
    batch_size=batch_size
)

class_names = train_ds_raw.class_names

AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds_raw.prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds_raw.prefetch(buffer_size=AUTOTUNE)


Found 31395 files belonging to 1 classes.
Using 25116 files for training.
Found 31395 files belonging to 1 classes.
Using 6279 files for validation.


In [7]:
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0

base_model = EfficientNetB0(input_shape=(224, 224, 3),
                            include_top=False,
                            weights='imagenet')

base_model.trainable = False

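model = models.Sequential([
    # No Rescaling layer: Keras EfficientNet models normalize inputs
    # internally and expect raw pixel values in [0, 255].
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(len(train_ds_raw.class_names), activation='softmax')
])

model.compile(
    optimizer='adam',
    # CategoricalCrossentropy expects one-hot labels, so the datasets would
    # need label_mode='categorical'. This model is rebuilt before training.
    loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1),
    metrics=['accuracy']
)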

Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb0_notop.h5
16705208/16705208 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [8]:
import os

root_dir = "/content/food_data"

for dirpath, dirnames, filenames in os.walk(root_dir):
    level = dirpath.replace(root_dir, '').count(os.sep)
    indent = ' ' * 2 * level
    print(f"{indent}{os.path.basename(dirpath)}/")
    subindent = ' ' * 2 * (level + 1)
    for f in filenames[:5]:
        print(f"{subindent}{f}")

food_data/
  UECFOOD256/
    README.txt
    category.txt
    151/
      103527.jpg
      92678.jpg
      95977.jpg
      91343.jpg
      92208.jpg
    170/
      57983.jpg
      58902.jpg
      158189.jpg
      157986.jpg
      59198.jpg
    105/
      712.jpg
      67556.jpg
      167174.jpg
      182641.jpg
      68259.jpg
    85/
      9013.jpg
      8974.jpg
      8943.jpg
      10910.jpg
      8984.jpg
    44/
      4355.jpg
      4350.jpg
      4333.jpg
      4312.jpg
      4313.jpg
    234/
      232021.jpg
      317835.jpg
      318283.jpg
      317805.jpg
      234481.jpg
    46/
      4548.jpg
      7305.jpg
      14173.jpg
      15364.jpg
      4581.jpg
    110/
      71194.jpg
      5540.jpg
      3161.jpg
      4266.jpg
      94013.jpg
    115/
      5678.jpg
      75633.jpg
      6713.jpg
      9366.jpg
      75138.jpg
    107/
      8079.jpg
      182957.jpg
      183435.jpg
      10353.jpg
      183931.jpg
    162/
      107909.jpg
      153788.jpg
      153673.jpg
    

In [15]:
category_path = '/content/food_data/UECFOOD256/category.txt'

class_id_to_name = {}
with open(category_path, 'r') as file:
    for line in file:
        parts = line.strip().split('\t')
        if len(parts) >= 2:
            class_id = parts[0]
            class_name = parts[1].strip().replace(' ', '_')
            class_id_to_name[class_id] = class_name

In [10]:
import os
import shutil

original_path = '/content/food_data/UECFOOD256'
new_dataset_path = '/content/food_data/UECFOOD256_10_classes'

selected_class_ids = ['1', '10', '15', '23', '32', '45', '67', '88', '120', '150']

os.makedirs(new_dataset_path, exist_ok=True)

for class_id in selected_class_ids:
    src = os.path.join(original_path, class_id)
    dst = os.path.join(new_dataset_path, class_id)

    if os.path.exists(src):
        if os.path.exists(dst):
            shutil.rmtree(dst)
        shutil.copytree(src, dst)
    else:
        print(f"no folder for class {class_id}: {src}")

In [16]:
from tensorflow.keras.utils import image_dataset_from_directory

train_ds = image_dataset_from_directory(
    '/content/food_data/UECFOOD256_10_classes',
    validation_split=0.2,
    subset='training',
    seed=123,
    image_size=(224, 224),
    batch_size=32
)

val_ds = image_dataset_from_directory(
    '/content/food_data/UECFOOD256_10_classes',
    validation_split=0.2,
    subset='validation',
    seed=123,
    image_size=(224, 224),
    batch_size=32
)

class_names = train_ds.class_names

category_path = '/content/food_data/UECFOOD256/category.txt'
class_id_to_name = {}
with open(category_path, 'r') as file:
    for line in file:
        parts = line.strip().split('\t')
        if len(parts) >= 2:
            class_id = parts[0]
            class_name = parts[1].strip().replace(' ', '_')
            class_id_to_name[class_id] = class_name
label_map = {i: class_id_to_name[name] for i, name in enumerate(class_names)}
print(label_map)

Found 1913 files belonging to 10 classes.
Using 1531 files for training.
Found 1913 files belonging to 10 classes.
Using 382 files for validation.
{0: 'rice', 1: 'tempura_bowl', 2: 'shortcake', 3: 'raisin_bread', 4: 'meat_loaf', 5: 'ramen_noodle', 6: 'croquette', 7: 'fried_fish', 8: 'omelet', 9: 'macaroni_salad'}


In [18]:
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

In [19]:
base_model = tf.keras.applications.EfficientNetB0(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet'
)

base_model.trainable = False

global_avg = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
dropout = tf.keras.layers.Dropout(0.2)(global_avg)
output = tf.keras.layers.Dense(10, activation='softmax')(dropout)

model = tf.keras.Model(inputs=base_model.input, outputs=output)

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy']
)

In [20]:
initial_epochs = 5
history_before = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=initial_epochs
)

import numpy as np

def show_sample_predictions(dataset, model, label_map, num_samples=3, label=""):
    """Print true vs. predicted labels for the first few images of one batch."""
    for images, labels in dataset.take(1):
        preds = model.predict(images)
        print(f"\n{label} predictions:")
        # Guard against a batch that holds fewer than num_samples images.
        for i in range(min(num_samples, len(labels))):
            true_index = int(labels[i].numpy())
            pred_index = int(np.argmax(preds[i]))
            true_label = label_map[true_index]
            pred_label = label_map[pred_index]
            print(f"True: {true_label} | Predicted: {pred_label}")

show_sample_predictions(val_ds, model, label_map, label="before fine tuning")

Epoch 1/5
48/48 ━━━━━━━━━━━━━━━━━━━━ 47s 495ms/step - accuracy: 0.3667 - loss: 1.8202 - val_accuracy: 0.8063 - val_loss: 0.8385
Epoch 2/5
48/48 ━━━━━━━━━━━━━━━━━━━━ 2s 39ms/step - accuracy: 0.8227 - loss: 0.7396 - val_accuracy: 0.8586 - val_loss: 0.5991
Epoch 3/5
48/48 ━━━━━━━━━━━━━━━━━━━━ 2s 38ms/step - accuracy: 0.8613 - loss: 0.5372 - val_accuracy: 0.8770 - val_loss: 0.5190
Epoch 4/5
48/48 ━━━━━━━━━━━━━━━━━━━━ 3s 38ms/step - accuracy: 0.8993 - loss: 0.4261 - val_accuracy: 0.8874 - val_loss: 0.4731
Epoch 5/5
48/48 ━━━━━━━━━━━━━━━━━━━━ 3s 45ms/step - accuracy: 0.9065 - loss: 0.3609 - val_accuracy: 0.8874 - val_loss: 0.4500
1/1 ━━━━━━━━━━━━━━━━━━━━ 6s 6s/step

before fine tuning predictions:
True: tempura_bowl | Predicted: tempura_bowl
True: fried_fish | Predicted: rice
True: shortcake

In [21]:
base_model.trainable = True

fine_tune_at = 100

for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy']
)

fine_tune_epochs = 5
total_epochs = initial_epochs + fine_tune_epochs

history_after = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=total_epochs,
    initial_epoch=history_before.epoch[-1] + 1
)

show_sample_predictions(val_ds, model, label_map, label="after fine tuning")

Epoch 6/10
48/48 ━━━━━━━━━━━━━━━━━━━━ 89s 795ms/step - accuracy: 0.6882 - loss: 1.1510 - val_accuracy: 0.8848 - val_loss: 0.4615
Epoch 7/10
48/48 ━━━━━━━━━━━━━━━━━━━━ 22s 74ms/step - accuracy: 0.7752 - loss: 0.9491 - val_accuracy: 0.8848 - val_loss: 0.5099
Epoch 8/10
48/48 ━━━━━━━━━━━━━━━━━━━━ 4s 74ms/step - accuracy: 0.8107 - loss: 0.8209 - val_accuracy: 0.8743 - val_loss: 0.5577
Epoch 9/10
48/48 ━━━━━━━━━━━━━━━━━━━━ 5s 75ms/step - accuracy: 0.8611 - loss: 0.7152 - val_accuracy: 0.8665 - val_loss: 0.5813
Epoch 10/10
48/48 ━━━━━━━━━━━━━━━━━━━━ 5s 75ms/step - accuracy: 0.8583 - loss: 0.6446 - val_accuracy: 0.8639 - val_loss: 0.5891
1/1 ━━━━━━━━━━━━━━━━━━━━ 6s 6s/step

after fine tuning predictions:
True: tempura_bowl | Predicted: tempura_bowl
True: fried_fish | Predicted: rice
True: sho