# Task 1

Check out the official repository with many examples of Keras implementations of various kinds of
deep neural networks. We recommend cloning this repository and trying to get some of these
examples running on your system (or Colab/DeepNote). In particular, experiment with the `mnist_mlp.py`
and `mnist_cnn.py` scripts, which show how to build simple neural networks for the MNIST dataset
(useful for the next task).

Next, take two well-known datasets: Fashion MNIST (introduced in Ch 10, p. 298) and CIFAR-10.
The first contains 2D (grayscale) images of size 28x28, split into 10 categories, with 60,000 images
for training and 10,000 for testing; the second contains 32x32x3 RGB images (50,000/10,000
train/test). Apply two reference networks to the Fashion MNIST dataset.

## (a) MLP

Experiment with different initializations, activations, optimizers (and
their hyperparameters), and regularizations (L1, L2, Dropout, no Dropout). You may also experiment
with changing the architecture of both networks: adding/removing layers, number of convolutional
filters, their sizes, etc.
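
One way to keep such experiments organized is to write the search space down explicitly and enumerate it. The sketch below is only an illustration: the dictionary keys, the candidate values, and the helper name `enumerate_configs` are assumptions, loosely based on the settings tried later in this notebook.

```python
from itertools import product

# Candidate values; these particular choices are illustrative assumptions.
search_space = {
    "kernel_initializer": ["orthogonal", "glorot_uniform"],
    "activation": ["relu6", "leaky_relu"],
    "optimizer": ["sgd", "adamw", "nadam"],
    "l2": [0.0, 1e-4],
    "dropout": [0.0, 0.1],
}

def enumerate_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

configs = list(enumerate_configs(search_space))
print(len(configs))  # 2 * 2 * 3 * 2 * 2 = 48 combinations
print(configs[0])
```

Even this small grid has 48 combinations, which is why the models below vary only a few settings at a time.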

In [None]:
import tensorflow as tf
seed = 9+10
tf.random.set_seed(seed)


In [None]:
# Check number of available GPUs
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  1


In [None]:
# Use TensorFlow's bundled Keras to ensure compatibility with GPUs
from tensorflow import keras
import os
from sklearn.preprocessing import StandardScaler

random_state = 900
keras.utils.set_random_seed(random_state)

# Try to enable memory growth for all GPUs so TF doesn't reserve all GPU memory upfront
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print('Enabled memory growth for GPUs:', gpus)
    except Exception as e:
        print('Could not set memory growth:', e)
else:
    print('No GPU devices found by TensorFlow')

Enabled memory growth for GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


### Model 1

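Below is our first attempt at building a neural network: an MLP with four hidden layers (400, 200, 100, and 50 units), ReLU6 activations, orthogonal weight initialization, light dropout, and L2 regularization on the first hidden layer. We train it with the SGD optimizer (learning rate 5e-3) and early stopping.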
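
To make the optimizer choice concrete, here is a minimal sketch of the update that SGD without momentum applies to each weight, `w <- w - lr * grad`. The quadratic toy loss, the starting point, the step count, and the helper name `sgd_minimize` are illustrative assumptions; only the learning rate of 5e-3 is taken from this notebook.

```python
def sgd_minimize(grad_fn, w0, learning_rate=5e-3, steps=1000):
    """Apply the plain SGD update w <- w - lr * grad for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad_fn(w)
    return w

# Toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = sgd_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

On this toy problem each step shrinks the distance to the minimum by a factor of 1 - 2 * lr = 0.99, so a small learning rate means many small, steady steps.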

In [None]:
# Import FashionMNIST data
fashion_mnist= keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
print(X_train_full.shape, y_train_full.shape)

# Preprocess data
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

X_test = X_test / 255.0
print(X_train.shape, y_train.shape)
print(X_valid.shape, y_valid.shape)


# Choose hyperparameters
seed = 900
activation_function = 'relu6'
kernel_ini = keras.initializers.Orthogonal(gain=1.0, seed=seed)
kernel_reg = keras.regularizers.l2(0.0001)
bias_ini = keras.initializers.Zeros()
shape = [28, 28]

# Build model; an explicit Input layer replaces input_shape on the first layer
model1 = keras.models.Sequential([
    keras.layers.Input(shape=shape),
    keras.layers.Flatten(),
    keras.layers.Dense(400, kernel_initializer=kernel_ini,
                       activation=activation_function,
                       kernel_regularizer=kernel_reg),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(200, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(100, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(50, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(10, kernel_initializer=kernel_ini,
                       activation="softmax")
])

# Stop once val_loss has not improved for 6 epochs and restore the best weights
early_stopping = keras.callbacks.EarlyStopping(
    patience=6,
    restore_best_weights=True
)

model1.summary()


model1.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.SGD(learning_rate=5e-3),
    metrics=["accuracy"]
)
model1.fit(
    X_train, y_train,
    epochs=50,
    validation_data=(X_valid, y_valid),
    callbacks=[early_stopping]
)

#evaluate the model on the test set
test_loss, test_acc = model1.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
print('Test loss:', test_loss)

Epoch 1/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 13s 6ms/step - accuracy: 0.5685 - loss: 1.3064 - val_accuracy: 0.7602 - val_loss: 0.7455
Epoch 2/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.7486 - loss: 0.7435 - val_accuracy: 0.8166 - val_loss: 0.5876
Epoch 3/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.7927 - loss: 0.6289 - val_accuracy: 0.8400 - val_loss: 0.5215
Epoch 4/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.8143 - loss: 0.5720 - val_accuracy: 0.8474 - val_loss: 0.4865
Epoch 5/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.8276 - loss: 0.5335 - val_accuracy: 0.8574 - val_loss: 0.4570
Epoch 6/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.8374 - loss: 0.5023 - val_accuracy: 0.8620 - val_loss: 0.4373
Epoch 7/50

In [9]:
#evaluate the model on the test set
test_loss, test_acc = model1.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
print('Test loss:', test_loss)
#0.8834999799728394

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8618 - loss: 0.4676
Test accuracy: 0.8618000149726868
Test loss: 0.46763134002685547
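
The step counts shown in the progress bars can be reproduced by hand. The cells call `fit` and `evaluate` without a `batch_size` argument, so Keras falls back to its default of 32 samples per batch. The helper name `steps_per_epoch` below is hypothetical and only illustrates the arithmetic; the sample counts are the split sizes used in this notebook.

```python
import math

def steps_per_epoch(num_samples, batch_size=32):
    """Number of mini-batches per epoch; the last batch may be smaller."""
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(55_000))  # Fashion MNIST training split -> 1719
print(steps_per_epoch(45_000))  # CIFAR-10 training split      -> 1407
print(steps_per_epoch(10_000))  # either test set              -> 313
```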


#### Application to CIFAR10
We apply the same architecture and hyperparameters to the CIFAR-10 dataset; only the input shape changes, from 28x28 to 32x32x3.
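
Because the images are flattened before the first dense layer, the larger CIFAR-10 input mostly affects the size of that first layer. The following sketch counts the trainable parameters of the dense stack for both datasets; the helper name `dense_param_count` is hypothetical, and `Flatten` and `Dropout` are left out because they have no parameters.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected stack, given [inputs, hidden..., outputs]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

hidden_and_output = [400, 200, 100, 50, 10]  # layer widths used in this notebook

fashion_params = dense_param_count([28 * 28] + hidden_and_output)
cifar_params = dense_param_count([32 * 32 * 3] + hidden_and_output)

print(fashion_params)  # 419860
print(cifar_params)    # 1335060
```

The CIFAR-10 variant has roughly three times as many parameters, almost all of them in the first layer (3072 x 400 weights instead of 784 x 400).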

In [None]:
# Import CIFAR10 dataset
CIFAR10 = tf.keras.datasets.cifar10


# Preprocess data
(X_train_full, y_train_full), (X_test, y_test) = CIFAR10.load_data()
print(X_train_full.shape, y_train_full.shape)

X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

X_test = X_test / 255.0
print(X_train.shape, y_train.shape)
print(X_valid.shape, y_valid.shape)


# Choose hyperparameters
seed = 900
activation_function = 'relu6'
kernel_ini = keras.initializers.Orthogonal(gain=1.0, seed=seed)
kernel_reg = keras.regularizers.l2(0.0001)
bias_ini = keras.initializers.Zeros()
shape = [32, 32, 3]


# Build model; an explicit Input layer replaces input_shape on the first layer
model1 = keras.models.Sequential([
    keras.layers.Input(shape=shape),
    keras.layers.Flatten(),
    keras.layers.Dense(400, kernel_initializer=kernel_ini,
                       activation=activation_function,
                       kernel_regularizer=kernel_reg),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(200, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(100, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(50, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(10, kernel_initializer=kernel_ini,
                       activation="softmax")
])

# Stop once val_loss has not improved for 6 epochs and restore the best weights
early_stopping = keras.callbacks.EarlyStopping(
    patience=6,
    restore_best_weights=True
)

model1.summary()


model1.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.SGD(learning_rate=5e-3),
    metrics=["accuracy"]
)
model1.fit(
    X_train, y_train,
    epochs=50,
    validation_data=(X_valid, y_valid),
    callbacks=[early_stopping]
)

#evaluate the model on the test set
test_loss, test_acc = model1.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
print('Test loss:', test_loss)

Epoch 1/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.2290 - loss: 2.1300 - val_accuracy: 0.3240 - val_loss: 1.9446
Epoch 2/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.3124 - loss: 1.9376 - val_accuracy: 0.3514 - val_loss: 1.8714
Epoch 3/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.3457 - loss: 1.8571 - val_accuracy: 0.3704 - val_loss: 1.7985
Epoch 4/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.3659 - loss: 1.7963 - val_accuracy: 0.3734 - val_loss: 1.7994
Epoch 5/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.3854 - loss: 1.7520 - val_accuracy: 0.3932 - val_loss: 1.7423
Epoch 6/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.4014 - loss: 1.7138 - val_accuracy: 0.4008 - val_loss: 1.7085
Epoch 7/50

### Model 2
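
Model 2 swaps the `'relu6'` activation of Model 1 for `'leaky_relu'` (see `activation_function` in the cell below). As a rough illustration of the difference, the sketch below implements both functions for scalars in plain Python. The slope of 0.2 for negative inputs is an assumption made for this example, not a value taken from the notebook.

```python
def relu6(x):
    """ReLU capped at 6: min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)

def leaky_relu(x, negative_slope=0.2):
    """Identity for x >= 0, a small linear slope for x < 0 (slope value is an assumption)."""
    return x if x >= 0 else negative_slope * x

for x in (-2.0, 0.5, 8.0):
    print(x, relu6(x), leaky_relu(x))
```

ReLU6 clips both negative inputs and inputs above 6, while leaky ReLU is unbounded above and keeps a small gradient for negative inputs.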

In [15]:
# Import FashionMNIST data
fashion_mnist= keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
print(X_train_full.shape, y_train_full.shape)


# Preprocess data
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

X_test = X_test / 255.0
print(X_train.shape, y_train.shape)
print(X_valid.shape, y_valid.shape)


# Choose hyperparameters
seed = 900
activation_function = 'leaky_relu'

kernel_ini = "glorot_uniform"  # Model 1 used keras.initializers.Orthogonal(gain=1.0, seed=seed)
kernel_reg = keras.regularizers.l2(0.0001)  # defined here so the cell does not depend on Model 1's cell
bias_ini = keras.initializers.Zeros()
shape = [28, 28]
learning_rate = 0.001


# Build model; an explicit Input layer replaces input_shape on the first layer
model1 = keras.models.Sequential([
    keras.layers.Input(shape=shape),
    keras.layers.Flatten(),
    keras.layers.Dense(400, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(200, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(100, kernel_initializer=kernel_ini,
                       activation=activation_function,
                       kernel_regularizer=kernel_reg),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(50, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(10, kernel_initializer=kernel_ini,
                       activation="softmax")
])

# Stop once val_loss has not improved for 10 epochs and restore the best weights
early_stopping = keras.callbacks.EarlyStopping(
    patience=10,
    restore_best_weights=True
)

model1.summary()


model1.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.AdamW(learning_rate=learning_rate),
    metrics=["accuracy"]
)
model1.fit(
    X_train, y_train,
    epochs=50,
    validation_data=(X_valid, y_valid),
    callbacks=[early_stopping]
)

# Evaluate the model on the test set
test_loss, test_acc = model1.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
print('Test loss:', test_loss)

(60000, 28, 28) (60000,)
(55000, 28, 28) (55000,)
(5000, 28, 28) (5000,)


Epoch 1/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 14s 6ms/step - accuracy: 0.8036 - loss: 0.5515 - val_accuracy: 0.8560 - val_loss: 0.4168
Epoch 2/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8500 - loss: 0.4216 - val_accuracy: 0.8598 - val_loss: 0.4036
Epoch 3/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8620 - loss: 0.3881 - val_accuracy: 0.8678 - val_loss: 0.3862
Epoch 4/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.8709 - loss: 0.3642 - val_accuracy: 0.8750 - val_loss: 0.3611
Epoch 5/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.8767 - loss: 0.3486 - val_accuracy: 0.8774 - val_loss: 0.3427
Epoch 6/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.8811 - loss: 0.3333 - val_accuracy: 0.8800 - val_loss: 0.3449
Epoch 7/50

#### Application to CIFAR10

In [None]:
# Import CIFAR10 dataset
CIFAR10 = tf.keras.datasets.cifar10


# Preprocess data
(X_train_full, y_train_full), (X_test, y_test) = CIFAR10.load_data()
print(X_train_full.shape, y_train_full.shape)

X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

X_test = X_test / 255.0
print(X_train.shape, y_train.shape)
print(X_valid.shape, y_valid.shape)


# Choose hyperparameters
seed = 900
activation_function = 'leaky_relu'

kernel_ini = "glorot_uniform"
kernel_reg = keras.regularizers.l2(0.0001)  # defined here so the cell does not depend on Model 1's cell
bias_ini = keras.initializers.Zeros()
shape = [32, 32, 3]
learning_rate = 0.001


# Build model; an explicit Input layer replaces input_shape on the first layer
model1 = keras.models.Sequential([
    keras.layers.Input(shape=shape),
    keras.layers.Flatten(),
    keras.layers.Dense(400, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(200, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(100, kernel_initializer=kernel_ini,
                       activation=activation_function,
                       kernel_regularizer=kernel_reg),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(50, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(10, kernel_initializer=kernel_ini,
                       activation="softmax")
])

# Stop once val_loss has not improved for 10 epochs and restore the best weights
early_stopping = keras.callbacks.EarlyStopping(
    patience=10,
    restore_best_weights=True
)

model1.summary()


model1.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.AdamW(learning_rate=learning_rate),
    metrics=["accuracy"]
)
model1.fit(
    X_train, y_train,
    epochs=50,
    validation_data=(X_valid, y_valid),
    callbacks=[early_stopping]
)

# Evaluate the model on the test set
test_loss, test_acc = model1.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
print('Test loss:', test_loss)

(50000, 32, 32, 3) (50000, 1)
(45000, 32, 32, 3) (45000, 1)
(5000, 32, 32, 3) (5000, 1)


Epoch 1/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 15s 7ms/step - accuracy: 0.2809 - loss: 1.9849 - val_accuracy: 0.3258 - val_loss: 1.8354
Epoch 2/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.3456 - loss: 1.8256 - val_accuracy: 0.3820 - val_loss: 1.7249
Epoch 3/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.3766 - loss: 1.7497 - val_accuracy: 0.3930 - val_loss: 1.6801
Epoch 4/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.4007 - loss: 1.6823 - val_accuracy: 0.4160 - val_loss: 1.6257
Epoch 5/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.4142 - loss: 1.6360 - val_accuracy: 0.4292 - val_loss: 1.5956
Epoch 6/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.4305 - loss: 1.5990 - val_accuracy: 0.4280 - val_loss: 1.5926
Epoch 7/50


### Model 3
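
All models in this notebook train with `EarlyStopping(patience=..., restore_best_weights=True)`, which by default monitors `val_loss`. The sketch below mimics that stopping rule in plain Python to show which epoch ends training and which epoch's weights would be kept. The function name `early_stopping_epoch` and the validation losses are made up for illustration; this is not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss; the weights from `best_epoch` would then be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses)

# Made-up validation losses, purely for illustration.
losses = [0.50, 0.42, 0.40, 0.41, 0.43, 0.405, 0.44]
print(early_stopping_epoch(losses, patience=3))  # (3, 6)
```

A larger `patience`, such as the 10 epochs used for Models 2 and 3, tolerates longer plateaus at the cost of extra training time.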

In [19]:
# Import FashionMNIST data
fashion_mnist= keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
print(X_train_full.shape, y_train_full.shape)


# Preprocess data
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

X_test = X_test / 255.0
print(X_train.shape, y_train.shape)
print(X_valid.shape, y_valid.shape)


# Choose hyperparameters
seed = 900
activation_function = 'relu6'

kernel_ini = keras.initializers.Orthogonal(gain=1.0, seed=seed)
kernel_reg = keras.regularizers.l2(0.0001)  # defined here so the cell does not depend on Model 1's cell
bias_ini = keras.initializers.Zeros()
shape = [28, 28]
learning_rate = 0.001


# Build model; an explicit Input layer replaces input_shape on the first layer
model1 = keras.models.Sequential([
    keras.layers.Input(shape=shape),
    keras.layers.Flatten(),
    keras.layers.Dense(400, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(200, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(100, kernel_initializer=kernel_ini,
                       activation=activation_function,
                       kernel_regularizer=kernel_reg),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(50, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(10, kernel_initializer=kernel_ini,
                       activation="softmax")
])

# Stop once val_loss has not improved for 10 epochs and restore the best weights
early_stopping = keras.callbacks.EarlyStopping(
    patience=10,
    restore_best_weights=True
)

model1.summary()


model1.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.Nadam(learning_rate=learning_rate),
    metrics=["accuracy"]
)
model1.fit(
    X_train, y_train,
    epochs=50,
    validation_data=(X_valid, y_valid),
    callbacks=[early_stopping]
)

# Evaluate the model on the test set
test_loss, test_acc = model1.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
print('Test loss:', test_loss)

#0.8765 

(60000, 28, 28) (60000,)
(55000, 28, 28) (55000,)
(5000, 28, 28) (5000,)


Epoch 1/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 15s 6ms/step - accuracy: 0.8080 - loss: 0.5386 - val_accuracy: 0.8648 - val_loss: 0.4066
Epoch 2/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8587 - loss: 0.4022 - val_accuracy: 0.8662 - val_loss: 0.3674
Epoch 3/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8690 - loss: 0.3675 - val_accuracy: 0.8796 - val_loss: 0.3437
Epoch 4/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8765 - loss: 0.3457 - val_accuracy: 0.8838 - val_loss: 0.3410
Epoch 5/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8826 - loss: 0.3263 - val_accuracy: 0.8810 - val_loss: 0.3339
Epoch 6/50
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8882 - loss: 0.3127 - val_accuracy: 0.8880 - val_loss: 0.3220
Epoch 7/50

#### Application to CIFAR10

In [21]:
# Import CIFAR10 dataset
CIFAR10 = tf.keras.datasets.cifar10


# Preprocess data
(X_train_full, y_train_full), (X_test, y_test) = CIFAR10.load_data()
print(X_train_full.shape, y_train_full.shape)

X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

X_test = X_test / 255.0
print(X_train.shape, y_train.shape)
print(X_valid.shape, y_valid.shape)

# Choose hyperparameters
seed = 900
activation_function = 'relu6'

kernel_ini = keras.initializers.Orthogonal(gain=1.0, seed=seed)
kernel_reg = keras.regularizers.l2(0.0001)  # defined here so the cell does not depend on Model 1's cell
bias_ini = keras.initializers.Zeros()
shape = [32, 32, 3]
learning_rate = 0.001


# Build model; an explicit Input layer replaces input_shape on the first layer
model1 = keras.models.Sequential([
    keras.layers.Input(shape=shape),
    keras.layers.Flatten(),
    keras.layers.Dense(400, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(200, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(100, kernel_initializer=kernel_ini,
                       activation=activation_function,
                       kernel_regularizer=kernel_reg),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(50, kernel_initializer=kernel_ini,
                       activation=activation_function),
    keras.layers.Dropout(0.05),
    keras.layers.Dense(10, kernel_initializer=kernel_ini,
                       activation="softmax")
])

# Stop once val_loss has not improved for 10 epochs and restore the best weights
early_stopping = keras.callbacks.EarlyStopping(
    patience=10,
    restore_best_weights=True
)

model1.summary()


model1.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.Nadam(learning_rate=learning_rate),
    metrics=["accuracy"]
)
model1.fit(
    X_train, y_train,
    epochs=50,
    validation_data=(X_valid, y_valid),
    callbacks=[early_stopping]
)

# Evaluate the model on the test set
test_loss, test_acc = model1.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
print('Test loss:', test_loss)

(50000, 32, 32, 3) (50000, 1)
(45000, 32, 32, 3) (45000, 1)
(5000, 32, 32, 3) (5000, 1)


Epoch 1/50


I0000 00:00:1762619831.257845 3515359 cuda_executor.cc:508] failed to allocate 2.00GiB (2147483648 bytes) from device: RESOURCE_EXHAUSTED: : CUDA_ERROR_OUT_OF_MEMORY: out of memory
I0000 00:00:1762619831.257972 3515359 cuda_executor.cc:508] failed to allocate 1.80GiB (1932735232 bytes) from device: RESOURCE_EXHAUSTED: : CUDA_ERROR_OUT_OF_MEMORY: out of memory
I0000 00:00:1762619831.258068 3515359 cuda_executor.cc:508] failed to allocate 1.62GiB (1739461632 bytes) from device: RESOURCE_EXHAUSTED: : CUDA_ERROR_OUT_OF_MEMORY: out of memory
I0000 00:00:1762619831.258155 3515359 cuda_executor.cc:508] failed to allocate 1.46GiB (1565515520 bytes) from device: RESOURCE_EXHAUSTED: : CUDA_ERROR_OUT_OF_MEMORY: out of memory


1407/1407 ━━━━━━━━━━━━━━━━━━━━ 15s 7ms/step - accuracy: 0.2784 - loss: 1.9594 - val_accuracy: 0.3164 - val_loss: 1.8771
Epoch 2/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.3278 - loss: 1.8510 - val_accuracy: 0.3426 - val_loss: 1.8234
Epoch 3/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.3409 - loss: 1.8167 - val_accuracy: 0.3470 - val_loss: 1.7752
Epoch 4/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.3494 - loss: 1.7898 - val_accuracy: 0.3516 - val_loss: 1.7768
Epoch 5/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.3608 - loss: 1.7676 - val_accuracy: 0.3702 - val_loss: 1.7580
Epoch 6/50
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.3704 - loss: 1.7484 - val_accuracy: 0.3806 - val_loss: 1.7152
Epoch 7/50