### EfficientNet model transfer learning 

EfficientNetB0 is the baseline model of the EfficientNet family, which consists of a group of convolutional neural networks (CNNs) that were designed to scale more efficiently with respect to the available computational resources. EfficientNetB0 was introduced in a research paper by Mingxing Tan and Quoc V. Le in 2019.

EfficientNet models are based on a principle called "compound scaling," in which the depth, width, and input resolution of the network are scaled together in a balanced way. The authors found that scaling up any single dimension (depth, width, or resolution) improves accuracy, but only up to a point. Beyond that point, further scaling of one dimension alone yields diminishing returns.

EfficientNetB0 serves as the baseline for the other EfficientNets (B1-B7), which are scaled-up versions of B0 produced with the compound scaling method. The method scales depth, width, and resolution by fixed per-dimension constants raised to a shared compound coefficient; the constants are determined by a small grid search on the baseline B0 model.
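
To make the compound scaling rule concrete, the sketch below computes the depth, width, and resolution multipliers for a few values of the compound coefficient phi. It assumes the constants reported in the EfficientNet paper (alpha = 1.2, beta = 1.1, gamma = 1.15, chosen so that alpha * beta^2 * gamma^2 is roughly 2). The function name `compound_scale` is illustrative only, and the released B1-B7 configurations use rounded values that do not follow this formula exactly.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases (values from the paper)
BASE_RESOLUTION = 224                # EfficientNetB0's native input resolution


def compound_scale(phi):
    """Return (depth_multiplier, width_multiplier, resolution) for coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = round(BASE_RESOLUTION * GAMMA ** phi)
    return depth, width, resolution


for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r}px")
```

Because the three constants are tied together by the constraint above, each increment of phi roughly doubles the compute cost of the network.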

Key characteristics of EfficientNetB0 include:
- It uses a mobile inverted bottleneck convolution (MBConv), similar to MobileNetV2 and MnasNet, which are also designed to be efficient.
- It employs squeeze-and-excitation blocks, which let the network adaptively reweight its feature channels based on globally pooled information.
- It's optimized to work well across a wide range of input resolutions, making it flexible for different applications.
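
The squeeze-and-excitation step listed above can be sketched in a few lines of NumPy. This is a simplified illustration, not the Keras implementation: biases are omitted, the weights are random, and the names `squeeze_excite`, `w1`, and `w2` are hypothetical. It uses a swish activation in the bottleneck, as EfficientNet does (the original squeeze-and-excitation paper used ReLU).

```python
import numpy as np


def squeeze_excite(x, w1, w2):
    """Apply squeeze-and-excitation to a feature map x of shape (H, W, C)."""
    squeezed = x.mean(axis=(0, 1))                # squeeze: global average pool -> (C,)
    hidden = squeezed @ w1                        # bottleneck projection -> (C // r,)
    hidden = hidden / (1.0 + np.exp(-hidden))     # swish: h * sigmoid(h)
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # per-channel gates in (0, 1) -> (C,)
    return x * gates                              # excite: rescale every channel


rng = np.random.default_rng(0)
C, r = 8, 4                                       # channels and reduction ratio
x = rng.normal(size=(5, 5, C))
w1 = rng.normal(size=(C, C // r))
w2 = rng.normal(size=(C // r, C))
y = squeeze_excite(x, w1, w2)
print(y.shape)  # (5, 5, 8)
```

The output has the same shape as the input; only the relative strength of each channel changes.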

EfficientNets, including B0, achieved state-of-the-art accuracy on ImageNet and other benchmarks at the time of their introduction, while using significantly less compute – hence the name "EfficientNet".

In [4]:
import os
import cv2
import csv
import numpy as np
import pandas as pd
import random
import gc
import sys
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPooling2D, Input, concatenate, GlobalAveragePooling2D, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.inception_v3 import InceptionV3
import keras_tuner as kt
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.mixed_precision import set_global_policy
from tensorflow.keras.callbacks import LearningRateScheduler
from sklearn.model_selection import train_test_split
from tensorflow.keras.regularizers import l2

In [5]:
# Set the global policy to mixed_float16
set_global_policy('mixed_float16')
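
The `mixed_float16` policy runs most computations in float16 while keeping variables in float32. The NumPy sketch below is only an illustration of float16's limited range, which is the reason loss scaling and a float32 output layer are usually recommended with this policy; it does not touch TensorFlow itself.

```python
import numpy as np

info = np.finfo(np.float16)
print("largest float16:", float(info.max))   # 65504.0

with np.errstate(over="ignore"):
    overflowed = np.float16(70000.0)         # beyond the representable range
underflowed = np.float16(1e-8)               # below the smallest float16 subnormal

print("70000 as float16:", overflowed)       # inf
print("1e-8 as float16:", underflowed)       # 0.0
```

Small gradients can underflow to zero in the same way, which is why Keras applies dynamic loss scaling automatically when a model is compiled and trained with `fit` under this policy.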

In [6]:
# Ensure the script uses the GPU if available and set memory growth
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set at program startup
        print(e)

### Load data 

In [7]:
# Load your preprocessed data
X_train = np.load('X_train-299.npy')
print('X_train loaded')
X_val = np.load('X_val-299.npy')
print('X_val loaded')
y_train = np.load('y_train-299.npy')
print('y_train loaded')
y_val = np.load('y_val-299.npy')
print('y_val loaded')

X_train loaded
X_val loaded
y_train loaded
y_val loaded
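
Before a multi-hour tuning run it can help to confirm that the loaded arrays have the layout the model expects. The sketch below is a minimal check under two assumptions inferred from the model code rather than stated in the notebook: images are 299x299 RGB, and labels are one-hot encoded over 26 classes. The helper `check_split` and the synthetic demo arrays are hypothetical.

```python
import numpy as np


def check_split(X, y, image_size=299, num_classes=26):
    """Raise AssertionError if an image/label split does not match the expected layout."""
    assert X.ndim == 4 and X.shape[1:] == (image_size, image_size, 3), X.shape
    assert y.shape == (X.shape[0], num_classes), y.shape
    assert np.all(y.sum(axis=1) == 1), "labels are not one-hot"
    return X.shape[0]


# Tiny synthetic stand-in for the real arrays: two black images with classes 0 and 3
X_demo = np.zeros((2, 299, 299, 3), dtype=np.uint8)
y_demo = np.eye(26, dtype=np.float32)[[0, 3]]
print(check_split(X_demo, y_demo))  # 2
```

In the notebook the same function could be called as `check_split(X_train, y_train)` and `check_split(X_val, y_val)`.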


### Model training

In [8]:
def build_efficientnet_model(hp):
    # Load EfficientNetB0 as base model
    base_model = EfficientNetB0(include_top=False, input_tensor=Input(shape=(299, 299, 3)), weights='imagenet')
    print("Initial number of layers in the base model:", len(base_model.layers))
    
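    # Freeze all but the last `unfreeze_layers` layers of the base model
    n_unfreeze = hp.Int('unfreeze_layers', min_value=0, max_value=len(base_model.layers), step=5)
    n_frozen = len(base_model.layers) - n_unfreeze  # explicit index: layers[:-0] would select nothing
    for idx, layer in enumerate(base_model.layers):
        # Keep BatchNormalization layers frozen even in the unfrozen part, as is often advised
        if idx < n_frozen or isinstance(layer, BatchNormalization):
            layer.trainable = False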

    # Add custom layers on top of EfficientNetB0
    x = base_model.output
    
    # Additional convolutional layers with L2 regularization before the global pooling
    for i in range(hp.Int('num_additional_conv_blocks', 1, 3)):
        x = Conv2D(filters=hp.Int(f'conv_filters_{i}', min_value=32, max_value=128, step=32),
                   kernel_size=hp.Choice(f'conv_kernel_size_{i}', values=[3, 5]),
                   activation='relu', padding='same',
                   kernel_regularizer=l2(hp.Float(f'conv_l2_reg_{i}', min_value=1e-5, max_value=1e-2, step=1e-5)))(x)
        x = BatchNormalization()(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(hp.Float(f'conv_dropout_rate_{i}', min_value=0.1, max_value=0.5, step=0.1))(x)

    # Global pooling layer added after convolutional layers
    x = GlobalAveragePooling2D()(x)
    
    # Dense layer with L2 regularization
    x = Dense(hp.Int('dense_units', min_value=32, max_value=256, step=32), activation='relu',
              kernel_regularizer=l2(hp.Float('dense_l2_reg', min_value=1e-5, max_value=1e-2, step=1e-5)))(x)
    x = Dropout(hp.Float('dense_dropout_rate', min_value=0.1, max_value=0.5, step=0.1))(x)
    
    # Output layer; kept in float32 for a numerically stable softmax under the mixed_float16 policy
    predictions = Dense(26, activation='softmax', dtype='float32')(x)

    # Assemble and compile the model
    model = Model(inputs=base_model.input, outputs=predictions)
    print("Total number of layers in the model:", len(model.layers))
    model.compile(optimizer=Adam(hp.Choice('learning_rate', values=[1e-3, 1e-4])),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    return model
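
The search space allows at most three additional convolutional blocks, and the arithmetic below suggests why more would not fit. The sketch assumes that EfficientNetB0 halves the spatial size five times (overall stride 32, rounding up at each step) and that `MaxPooling2D` with its default `'valid'` padding rounds down; the helper names are hypothetical.

```python
import math


def backbone_output_size(input_size, n_halvings=5):
    """Spatial size of the backbone's final feature map for a square input."""
    size = input_size
    for _ in range(n_halvings):
        size = math.ceil(size / 2)
    return size


def size_after_pooling(size, n_blocks):
    """Spatial size after n_blocks of 2x2 max pooling with 'valid' padding."""
    for _ in range(n_blocks):
        size //= 2
    return size


feature_size = backbone_output_size(299)
print(feature_size)  # 10
print([size_after_pooling(feature_size, k) for k in (1, 2, 3)])  # [5, 2, 1]
```

With a 299x299 input the backbone emits a 10x10 feature map, so a third pooling block already reduces it to 1x1 and a fourth would leave nothing to pool.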


In [9]:
# Early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3, verbose=1, mode='min')

# Set up the tuner for hyperparameter tuning using Hyperband
tuner = kt.Hyperband(
    build_efficientnet_model, 
    objective='val_accuracy',
    max_epochs=10,
    factor=3,
    hyperband_iterations=2,  # Number of times to iterate over the full Hyperband algorithm
    directory='efficientnet-model-tuning', 
    project_name='efficientnet-tuning'  
)

Reloading Tuner from efficientnet-model-tuning/efficientnet-tuning/tuner0.json
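
Hyperband spends its budget through successive halving: many configurations are trained briefly, and only the best fraction is promoted to a longer run. The sketch below illustrates that principle with a hypothetical helper, `successive_halving_schedule`. Keras Tuner computes its own bracket sizes and epoch counts internally, so the numbers here are not the ones used by the tuner above.

```python
def successive_halving_schedule(n_configs, start_epochs, max_epochs, factor=3):
    """Return a list of (configs_trained, epochs_each) rounds for one bracket."""
    schedule = []
    n, epochs = n_configs, start_epochs
    while True:
        schedule.append((n, min(epochs, max_epochs)))
        if epochs >= max_epochs or n <= 1:
            break
        n = max(1, n // factor)  # keep roughly the top 1/factor configurations
        epochs *= factor         # give the survivors `factor` times more epochs
    return schedule


print(successive_halving_schedule(n_configs=9, start_epochs=1, max_epochs=10))
# [(9, 1), (3, 3), (1, 9)]
```

Most trials therefore run for only a few epochs, which is why a search with many trials can still finish in a bounded time.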


In [9]:
# Search for the best hyperparameters
tuner.search(X_train, y_train, epochs=10, validation_data=(X_val, y_val), callbacks=[early_stopping])

Trial 60 Complete [00h 09m 19s]
val_accuracy: 0.9654948115348816

Best val_accuracy So Far: 0.9934895634651184
Total elapsed time: 05h 03m 08s


In [10]:
# Get the best hyperparameters
best_hp = tuner.get_best_hyperparameters()[0]

# Print each hyperparameter and its corresponding best value
for hp in best_hp.space:
    print(f"{hp.name}: {best_hp.get(hp.name)}")

num_additional_conv_blocks: 1
conv_filters_0: 96
conv_kernel_size_0: 3
conv_l2_reg_0: 0.0015400000000000001
conv_dropout_rate_0: 0.1
dense_units: 160
dense_l2_reg: 0.00616
dense_dropout_rate: 0.2
learning_rate: 0.0001
conv_filters_1: 32
conv_kernel_size_1: 5
conv_l2_reg_1: 0.00037000000000000005
conv_dropout_rate_1: 0.30000000000000004
conv_filters_2: 128
conv_kernel_size_2: 3
conv_l2_reg_2: 0.00337
conv_dropout_rate_2: 0.5
unfreeze_layers: 215
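
Freezing "all but the last n layers" is often written with a negative slice such as `layers[:-n]`. The sketch below uses a plain list as a stand-in for `base_model.layers` (238 entries, the layer count reported in this notebook's training output) to show how many layers such a slice selects for the tuned value of 215, and why n = 0 needs care. The helper names are hypothetical.

```python
layers = list(range(238))  # stand-in for base_model.layers


def frozen_by_negative_slice(layers, n_unfreeze):
    """Select the layers to freeze with a negative slice bound."""
    return layers[:-n_unfreeze]


def frozen_by_explicit_index(layers, n_unfreeze):
    """Select the layers to freeze with an explicit end index."""
    return layers[:len(layers) - n_unfreeze]


print(len(frozen_by_negative_slice(layers, 215)))  # 23
print(len(frozen_by_negative_slice(layers, 0)))    # 0, nothing is frozen
print(len(frozen_by_explicit_index(layers, 0)))    # 238, everything is frozen
```

Since `-0` equals `0`, the negative slice selects an empty list when n is 0, so a trial meant to keep the whole backbone frozen would instead train all of it. Computing the end index explicitly avoids this.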


In [11]:
# Retrieve all completed trials
trials = [t for t in tuner.oracle.trials.values() if t.status == 'COMPLETED']

# Prepare data for CSV
data_to_save = [["Trial Number", "Hyperparameters", "Validation Accuracy"]]

# Add data from each trial
for i, trial in enumerate(trials):
    trial_hyperparams = trial.hyperparameters.values
    val_accuracy = trial.score  
    data_to_save.append([f"Trial {i+1}", trial_hyperparams, val_accuracy])

# Write to CSV
with open('efficientnet_hyperparameter_trials.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data_to_save)
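
Writing the whole hyperparameter dictionary into a single CSV cell makes the file awkward to filter or sort. One alternative, sketched below with the standard library only, is to give every hyperparameter its own column. The trial tuples are made-up placeholders, not results from this search; in the notebook they could be built from `trial.trial_id`, `trial.hyperparameters.values`, and `trial.score`.

```python
import csv
import io

# Hypothetical (trial_id, hyperparameters, score) triples for illustration only
trials = [
    ("0001", {"dense_units": 160, "learning_rate": 1e-4}, 0.99),
    ("0002", {"dense_units": 64, "learning_rate": 1e-3, "conv_filters_1": 32}, 0.97),
]


def trials_to_csv(trials, file):
    """Write one row per trial with one column per hyperparameter."""
    hp_names = sorted({name for _, hps, _ in trials for name in hps})
    fieldnames = ["trial_id", *hp_names, "val_accuracy"]
    writer = csv.DictWriter(file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for trial_id, hps, score in trials:
        writer.writerow({"trial_id": trial_id, **hps, "val_accuracy": score})


buffer = io.StringIO()
trials_to_csv(trials, buffer)
print(buffer.getvalue())
```

Hyperparameters that are inactive in a given trial are left as empty cells through `restval`.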

In [12]:
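# Learning rate scheduler: hold the rate for the first 10 epochs, then decay it by exp(-0.1) per epoch
def scheduler(epoch, lr):
    if epoch < 10:
        return lr
    else:
        # Return a plain Python float, which LearningRateScheduler accepts across Keras versions
        return float(lr * np.exp(-0.1))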

# Early stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=5)

# Model checkpoint
model_checkpoint = ModelCheckpoint(
    'efficientnet-model.h5',  # Path where to save the model
    save_best_only=True, 
    monitor='val_loss', 
    mode='min'
)

# Combine all callbacks
callbacks_list = [
    LearningRateScheduler(scheduler),
    early_stopping,
    model_checkpoint
]

# Build the model with the best hyperparameters
model = build_efficientnet_model(best_hp)

# Fit the model
history = model.fit(
    X_train, y_train,
    epochs=50,
    validation_data=(X_val, y_val),
    callbacks=callbacks_list, 
    verbose=1
)


Initial number of layers in the base model: 238
Total number of layers in the model: 246
Epoch 1/50


2023-11-15 12:20:58.968674: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8900


  1/195 [..............................] - ETA: 2:14:41 - loss: 4.7920 - accuracy: 0.0312

2023-11-15 12:21:10.021753: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f791800cfa0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-11-15 12:21:10.021799: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2023-11-15 12:21:10.021807: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): Tesla T4, Compute Capability 7.5
2023-11-15 12:21:10.021813: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): Tesla T4, Compute Capability 7.5
2023-11-15 12:21:10.021819: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): Tesla T4, Compute Capability 7.5
2023-11-15 12:21:10.302956: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-11-15 12:21:11.040047: I ./tensorflow/compiler/jit/device





Epoch 2/50
...
Epoch 50/50
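
The learning rate rule used for this run can be traced without training anything. The sketch below replays the same schedule in plain Python over 50 epochs, assuming the starting rate is the tuned `learning_rate` of 1e-4 reported earlier.

```python
import math


def scheduler(epoch, lr):
    """Hold the rate for the first 10 epochs, then decay by exp(-0.1) per epoch."""
    return lr if epoch < 10 else lr * math.exp(-0.1)


lr = 1e-4  # assumed starting rate
rates = []
for epoch in range(50):  # Keras passes zero-based epoch indices
    lr = scheduler(epoch, lr)
    rates.append(lr)

print(f"epoch 1:  {rates[0]:.2e}")
print(f"epoch 10: {rates[9]:.2e}")
print(f"epoch 50: {rates[49]:.2e}")
```

Forty decay steps shrink the rate by a factor of exp(-4), so under this assumption the last epoch trains at roughly 1.8e-6.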


### Save model metrics and output

In [None]:
metrics_df = pd.DataFrame({
    'Epoch': range(1, len(history.history['loss']) + 1),
    'Loss': history.history['loss'],
    'Accuracy': history.history['accuracy'],
    'Val_Loss': history.history['val_loss'],
    'Val_Accuracy': history.history['val_accuracy']
})

# Save the metrics to a CSV file
metrics_df.to_csv('efficientnet-metrics.csv', index=False)

# Save full model 
model.save('efficientnet-fullmodel-full.h5')
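
Two model files are written in this notebook: `ModelCheckpoint` with `save_best_only=True` keeps the weights from the epoch with the lowest validation loss, while the final `model.save` call stores the weights from the last epoch, since `EarlyStopping` does not restore the best weights by default. The sketch below shows one way to find the best epoch from a loss history; the values are made-up placeholders standing in for `history.history['val_loss']`.

```python
# Hypothetical per-epoch validation losses, for illustration only
val_loss = [0.92, 0.41, 0.35, 0.37, 0.36]

# Index of the smallest validation loss, converted to a 1-based epoch number
best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
print(f"best epoch: {best_index + 1}, val_loss: {val_loss[best_index]}")
# best epoch: 3, val_loss: 0.35
```

If the best epoch is not the last one, the checkpoint file and the final saved model will contain different weights.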
