# Supernet

In machine learning, a Supernet is a single neural network that is trained so that it can represent many related network architectures at once. It is mostly used in neural architecture search (NAS), a family of techniques for automatically finding a well-performing architecture for a given task.

Traditionally, NAS trains and evaluates each candidate architecture separately, which is computationally expensive and slow. A Supernet is more efficient: one network encompasses a diverse set of architectural choices, so the candidates share weights instead of being trained independently.

The Supernet combines architectural components such as convolutional layers, pooling operations, and skip connections, and each component has several candidate options or configurations. During training, the candidates share one set of network weights, and in differentiable variants the Supernet also learns weights or probabilities over the architectural choices themselves. In this way a single model represents a wide range of network architectures.
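
To make the idea of weighting architectural choices concrete, here is a minimal sketch of one Supernet "slot" in the style of differentiable NAS. It uses plain Python rather than TensorFlow to stay self-contained. The candidate operations (`identity`, `avg_pool3`, `zero`), the helper names `softmax`, `mixed_op`, and `strongest_choice`, and the logit values are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Turn raw architecture logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations for one slot, acting on a 1-D signal.
def identity(x):
    return list(x)

def avg_pool3(x):
    """Average over a window of 3 with edge padding, keeping the length."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(x))]

def zero(x):
    return [0.0] * len(x)

CANDIDATES = [identity, avg_pool3, zero]

def mixed_op(x, arch_logits):
    """Weighted sum of all candidate outputs; the weights are softmax(arch_logits)."""
    probs = softmax(arch_logits)
    outputs = [op(x) for op in CANDIDATES]
    return [sum(p * out[i] for p, out in zip(probs, outputs)) for i in range(len(x))]

def strongest_choice(arch_logits):
    """After training, keep only the candidate with the largest weight."""
    probs = softmax(arch_logits)
    return CANDIDATES[probs.index(max(probs))].__name__

signal = [0.0, 3.0, 6.0]
uniform = mixed_op(signal, [0.0, 0.0, 0.0])   # every candidate weighted 1/3
chosen = strongest_choice([0.1, 2.0, -1.0])   # made-up logits, not trained values
```

In a real Supernet the logits would be trained by gradient descent together with the layer weights. Here they are fixed by hand so that only the mixing and the final selection are shown.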

Once the Supernet is trained, specific network architectures are selected from it, a step often called architecture sampling. A search algorithm, such as reinforcement learning or an evolutionary algorithm, looks for the combination of architectural choices that performs best on the given task or dataset.
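
The selection step can be sketched as a small evolutionary search over a toy search space. Everything in this example is assumed for illustration: the `SEARCH_SPACE` dictionary, the mutation scheme, and especially `proxy_score`, which is a made-up stand-in for evaluating a candidate on validation data with weights inherited from the trained Supernet.

```python
import random

# Hypothetical search space: one choice per slot.
SEARCH_SPACE = {
    "kernel_size": [3, 5, 7],
    "filters": [16, 32, 64],
    "pooling": ["max", "avg", "none"],
}

def random_architecture(rng):
    """Draw one option for every slot."""
    return {slot: rng.choice(options) for slot, options in SEARCH_SPACE.items()}

def mutate(arch, rng):
    """Copy the architecture and re-draw the choice in one random slot."""
    child = dict(arch)
    slot = rng.choice(list(SEARCH_SPACE))
    child[slot] = rng.choice(SEARCH_SPACE[slot])
    return child

def proxy_score(arch):
    """Made-up stand-in for validation accuracy with inherited Supernet weights."""
    score = 0.90
    score += {3: 0.03, 5: 0.02, 7: 0.00}[arch["kernel_size"]]
    score += {16: 0.00, 32: 0.02, 64: 0.03}[arch["filters"]]
    score += {"max": 0.02, "avg": 0.01, "none": 0.00}[arch["pooling"]]
    return score

def evolutionary_search(generations=20, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [random_architecture(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Keep the better half, then refill with mutated copies of the survivors.
        population.sort(key=proxy_score, reverse=True)
        survivors = population[: population_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return max(population, key=proxy_score)

best = evolutionary_search()
print(best, round(proxy_score(best), 3))
```

Because the survivors are carried over unchanged, the best score in the population never decreases from one generation to the next.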

Using a Supernet speeds up the search considerably because candidate architectures no longer have to be trained from scratch; they can be evaluated with weights inherited from the Supernet. The Supernet thus acts as a compact representation of a large search space, which makes exploring that space faster.
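
A back-of-the-envelope calculation shows where the savings come from. All numbers below (slots, options per slot, epoch counts, and the cost of one evaluation pass relative to a training epoch) are assumed purely for illustration and are not measurements.

```python
from math import prod

# Assumed toy search space: number of candidate options in each slot.
options_per_slot = [3, 3, 3, 3]
num_architectures = prod(options_per_slot)      # 3 * 3 * 3 * 3

# Assumed costs, all expressed in units of one training epoch.
epochs_per_training_run = 10     # train one network on its own
supernet_training_epochs = 30    # train the shared Supernet once
eval_cost_in_epochs = 0.1        # one validation pass with inherited weights

# Option A: train every candidate from scratch.
standalone_cost = num_architectures * epochs_per_training_run

# Option B: train the Supernet once, then only evaluate each candidate.
supernet_cost = supernet_training_epochs + num_architectures * eval_cost_in_epochs

print(f"{num_architectures} architectures")
print(f"from scratch:  {standalone_cost} epochs")
print(f"with Supernet: {supernet_cost:.1f} epochs")
```

The gap widens quickly as slots are added, because the number of architectures grows multiplicatively while the Supernet is still trained only once.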

Overall, a Supernet makes neural architecture search more efficient by training one network that represents many architectures, instead of training every candidate separately.

In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Input

# Load the MNIST dataset
(input_train, target_train), (input_test, target_test) = mnist.load_data()
input_shape = input_train.shape[1:]       # (28, 28)
num_classes = len(set(target_train))      # 10 digit classes

# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
input_train = input_train.reshape(input_train.shape[0], *input_shape, 1)
input_test = input_test.reshape(input_test.shape[0], *input_shape, 1)
# One-hot encode the labels for categorical cross-entropy
target_train = tf.keras.utils.to_categorical(target_train, num_classes)
target_test = tf.keras.utils.to_categorical(target_test, num_classes)

# Regular model without Supernet
regular_model = tf.keras.Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape+(1,)),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(num_classes, activation='softmax')
])

regular_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

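# Note: the test split doubles as validation data here; that is acceptable for a
# demo, but model selection should use a separate validation split.
regular_model.fit(input_train, target_train, batch_size=64, epochs=10, validation_data=(input_test, target_test))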

regular_model_score = regular_model.evaluate(input_test, target_test, verbose=0)
print("Regular Model Test Accuracy:", regular_model_score[-1])

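# Simplified Supernet stand-in: one shared backbone feeding several output heads.
# The heads below are structurally identical, so this illustrates weight sharing
# across branches rather than a search over different architectural choices.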
def build_supernet(input_shape, num_classes, num_architectural_choices):
    inputs = Input(shape=input_shape + (1,))
    backbone = Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
    backbone = MaxPooling2D(pool_size=(2, 2))(backbone)
    backbone = Flatten()(backbone)

    # Attach one classification head per "choice"; all heads share the backbone.
    choices = []
    for _ in range(num_architectural_choices):
        choice = Dense(64, activation='relu')(backbone)
        choice = Dense(num_classes, activation='softmax')(choice)
        choices.append(choice)

    supernet_model = Model(inputs=inputs, outputs=choices)
    return supernet_model

num_architectural_choices = 3
supernet_model = build_supernet(input_shape, num_classes, num_architectural_choices)

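# The same loss and metric are applied to every head, and every head gets the same targets.
supernet_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
supernet_model.fit(
    input_train, [target_train] * num_architectural_choices,
    batch_size=64, epochs=10,
    validation_data=(input_test, [target_test] * num_architectural_choices),
)

# For this multi-output model, evaluate() returns the total loss followed by the
# per-head losses and per-head accuracies, so [-1] is the accuracy of the last head only.
supernet_model_score = supernet_model.evaluate(input_test, [target_test] * num_architectural_choices, verbose=0)
print("Supernet Model Test Accuracy:", supernet_model_score[-1])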

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Regular Model Test Accuracy: 0.9764999747276306
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Supernet Model Test Accuracy: 0.9789000153541565
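
The Supernet figure above most likely reflects only one of the model's output heads, since the script prints the last entry of a multi-output evaluation result. Here is a minimal sketch of how the heads could be compared instead, assuming the metrics are available as a name-to-value dictionary (for example via `evaluate(..., return_dict=True)` in TF 2.x Keras). The helper `best_head`, the head names, and all values below are invented for illustration and are not results from this run.

```python
def best_head(metrics, suffix="_accuracy"):
    """Return (head_name, accuracy) for the head with the highest accuracy."""
    per_head = {
        name[: -len(suffix)]: value
        for name, value in metrics.items()
        if name.endswith(suffix)
    }
    if not per_head:
        raise ValueError("no per-head accuracy metrics found")
    winner = max(per_head, key=per_head.get)
    return winner, per_head[winner]

# Hypothetical metric names and values, NOT output from the run above.
example_metrics = {
    "loss": 0.21,
    "head_0_loss": 0.07,
    "head_1_loss": 0.06,
    "head_2_loss": 0.08,
    "head_0_accuracy": 0.978,
    "head_1_accuracy": 0.981,
    "head_2_accuracy": 0.977,
}

name, accuracy = best_head(example_metrics)
print(f"best head: {name} ({accuracy:.3f})")
```

Picking the strongest head this way mirrors the architecture selection step described earlier, although with structurally identical heads the differences come only from random initialization.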
