# AlexNet - TensorFlow

In [1]:
from d2l import tensorflow as d2l
import tensorflow as tf
tf.__version__

'2.4.1'

In [2]:
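def net():
    return tf.keras.models.Sequential([
        # Large 11x11 window with stride 4: quickly shrinks height and width
        tf.keras.layers.Conv2D(filters=96, kernel_size=11, strides=4,
                               activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=3, strides=2),
        # Smaller 5x5 window; 'same' padding keeps height and width unchanged
        # while the number of output channels increases
        tf.keras.layers.Conv2D(filters=256, kernel_size=5, padding='same',
                               activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=3, strides=2),
        # Three successive 3x3 convolutions with no pooling in between;
        # only the last one reduces the channel count
        tf.keras.layers.Conv2D(filters=384, kernel_size=3, padding='same',
                               activation='relu'),
        tf.keras.layers.Conv2D(filters=384, kernel_size=3, padding='same',
                               activation='relu'),
        tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same',
                               activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=3, strides=2),
        tf.keras.layers.Flatten(),
        # Two wide fully connected layers, each followed by dropout
        # to mitigate overfitting
        tf.keras.layers.Dense(4096, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(4096, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        # Output layer: 10 classes for Fashion-MNIST (returns raw logits)
        tf.keras.layers.Dense(10)
    ])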

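We construct a single-channel data example with height and width both equal to 224 (shape `(1, 224, 224, 1)` in batch, height, width, channel order) and pass it through the network to observe the output shape of each layer.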

In [3]:
X = tf.random.uniform((1, 224, 224, 1))
for layer in net().layers:
    X = layer(X)
    print(layer.__class__.__name__, 'Output shape:\t', X.shape)

Conv2D Output shape:	 (1, 54, 54, 96)
MaxPooling2D Output shape:	 (1, 26, 26, 96)
Conv2D Output shape:	 (1, 26, 26, 256)
MaxPooling2D Output shape:	 (1, 12, 12, 256)
Conv2D Output shape:	 (1, 12, 12, 384)
Conv2D Output shape:	 (1, 12, 12, 384)
Conv2D Output shape:	 (1, 12, 12, 256)
MaxPooling2D Output shape:	 (1, 5, 5, 256)
Flatten Output shape:	 (1, 6400)
Dense Output shape:	 (1, 4096)
Dropout Output shape:	 (1, 4096)
Dense Output shape:	 (1, 4096)
Dropout Output shape:	 (1, 4096)
Dense Output shape:	 (1, 10)
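
The spatial sizes printed above can be checked by hand. The sketch below is plain Python with no TensorFlow dependency and assumes the Keras padding conventions: with `'valid'` padding (the default) the output size is `(n - k) // s + 1`, and with `'same'` padding it is `ceil(n / s)`. The helper `out_size` and the `layers` table are names introduced here for illustration only.

```python
def out_size(size, kernel, stride=1, padding='valid'):
    """Output size along one spatial axis of a Keras-style conv/pool layer."""
    if padding == 'same':
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1  # 'valid': no padding


# (name, kernel size, stride, padding), mirroring the layers in net()
layers = [
    ('conv1', 11, 4, 'valid'),
    ('pool1', 3, 2, 'valid'),
    ('conv2', 5, 1, 'same'),
    ('pool2', 3, 2, 'valid'),
    ('conv3', 3, 1, 'same'),
    ('conv4', 3, 1, 'same'),
    ('conv5', 3, 1, 'same'),
    ('pool3', 3, 2, 'valid'),
]

size = 224  # input height and width
sizes = []
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)
    print(f'{name}: {size} x {size}')

# Flatten multiplies the final height, width, and channel count (256)
flat = size * size * 256
print('flatten:', flat)
```

The sequence 54, 26, 26, 12, 12, 12, 12, 5 and the flattened length 6400 agree with the shapes reported by the layers themselves.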


## Reading the Dataset


Although AlexNet is trained on ImageNet in the paper, we use Fashion-MNIST here
since training an ImageNet model to convergence could take hours or days
even on a modern GPU.
One of the problems with applying AlexNet directly to Fashion-MNIST
is that its images have lower resolution ($28 \times 28$ pixels)
than ImageNet images.
To make things work, we upsample them to $224 \times 224$
(generally not a smart practice,
but we do it here to be faithful to the AlexNet architecture).
We perform this resizing with the `resize` argument in the `d2l.load_data_fashion_mnist` function.
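
To see what this upsampling means in practice, here is a minimal NumPy sketch. It uses nearest-neighbor repetition on a random stand-in image, which is an assumption made for illustration and not necessarily the interpolation that `d2l.load_data_fashion_mnist` applies internally. The names `image`, `factor`, and `upsampled` are hypothetical.

```python
import numpy as np

# Random stand-in for one 28x28 grayscale Fashion-MNIST image
rng = np.random.default_rng(0)
image = rng.random((28, 28), dtype=np.float32)

# 224 / 28 = 8, so each pixel becomes an 8x8 block
factor = 224 // 28
upsampled = np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

print(image.shape, '->', upsampled.shape)
print('pixel count grows by a factor of', upsampled.size // image.size)
```

Every input pixel is simply copied into a block, so the network processes 64 times as many pixels without receiving any new information. This is why upsampling is usually a poor trade and is done here only to match the input size AlexNet was designed for.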

In [4]:
batch_size = 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)

## Training


Now we can start training AlexNet.
Compared with LeNet, the main changes here are a smaller learning rate
and much slower training, owing to the deeper and wider network,
the higher image resolution, and the more costly convolutions.
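
To put "deeper and wider" into numbers, the following sketch counts the trainable parameters of the network defined in `net()` by hand, assuming the single-channel $224 \times 224$ input used in this notebook (which gives the 6400-element flattened vector). The helpers `conv_params` and `dense_params` are names introduced here for illustration.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a square-kernel convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


counts = {
    'conv1': conv_params(11, 1, 96),
    'conv2': conv_params(5, 96, 256),
    'conv3': conv_params(3, 256, 384),
    'conv4': conv_params(3, 384, 384),
    'conv5': conv_params(3, 384, 256),
    'dense1': dense_params(5 * 5 * 256, 4096),
    'dense2': dense_params(4096, 4096),
    'dense3': dense_params(4096, 10),
}

for name, n in counts.items():
    print(f'{name}: {n:,}')

total = sum(counts.values())
dense_total = counts['dense1'] + counts['dense2'] + counts['dense3']
print(f'total: {total:,}')
print(f'share in dense layers: {dense_total / total:.1%}')
```

Under these assumptions the model has roughly 46.8 million parameters, with over 90% of them in the fully connected layers. Pooling and dropout layers contribute none.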

In [None]:
lr, num_epochs = 0.01, 10
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())

loss 0.327, train acc 0.881, test acc 0.884
4397.7 examples/sec on /GPU:0


<tensorflow.python.keras.engine.sequential.Sequential at 0x7f1d15acd310>

[Q&A for this notebook](https://discuss.d2l.ai/t/276)
