Notes written from: https://thegrigorian.medium.com/understanding-vgg-neural-networks-architecture-and-implementation-400d99a9e9ba
# Understanding VGG Neural Networks
- VGG is a CNN architecture proposed in 2014 by the Visual Geometry Group (VGG) at the University of Oxford.
- A main challenge in image recognition is capturing intricate features and patterns in images.
  - a primary motivation for **deep architectures** was the realization that visual data contains features at various levels of abstraction
- VGG tries to solve this problem with depth: by stacking many layers, it builds a hierarchy of features, from simple patterns in early layers to object-level features in later ones
- Shallower architectures like LeNet and AlexNet capture simpler features and hit bottlenecks when confronted with complex images

## Explaining VGG
- The primary focus of the VGG architecture is on increasing the depth of the network while using simple and uniform convolutional layers.
- there are several VGG variants, such as VGG11, VGG13, VGG16 and VGG19 -> number = layers with weights (convolutional + fully-connected; pooling layers are not counted)
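
As a quick check of the naming scheme, the sketch below counts the weight layers of each variant. The per-block convolution counts follow the configurations in the original VGG paper; the names `VGG_CONV_BLOCKS`, `NUM_FC_LAYERS` and `weight_layers` are illustrative and do not come from the article.

```python
# Convolutional layers in each of the five blocks, per variant.
VGG_CONV_BLOCKS = {
    "VGG11": [1, 1, 2, 2, 2],
    "VGG13": [2, 2, 2, 2, 2],
    "VGG16": [2, 2, 3, 3, 3],
    "VGG19": [2, 2, 4, 4, 4],
}

# The original networks end with three fully-connected layers.
NUM_FC_LAYERS = 3


def weight_layers(name):
    """Count layers with weights: convolutional plus fully-connected."""
    return sum(VGG_CONV_BLOCKS[name]) + NUM_FC_LAYERS


for name in VGG_CONV_BLOCKS:
    print(name, weight_layers(name))
```

In every variant, the count matches the number in the name.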

1. Convolutional Layers
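  - use small 3 x 3 filters throughout
  - stacking them grows the receptive field (two 3 x 3 layers cover 5 x 5, three cover 7 x 7) with fewer parameters and more non-linearities than one large filter, enabling richer representations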
2. Pooling Layers
  - After each group of two to three convolutional layers (up to four in VGG19), a max-pooling layer is used for spatial down-sampling
  - Max-pooling with a 2 x 2 window and stride 2 halves the height and width while retaining the strongest activations
3. Fully-Connected Layers
  - consolidate the learned features and produce class predictions
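
To make the case for small filters concrete, here is a minimal sketch that compares three stacked 3 x 3 convolutions with a single 7 x 7 convolution covering the same receptive field. The channel count `C = 256` and the helper names are assumptions chosen for illustration; biases are ignored to keep the arithmetic simple.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of kernel weights in one 2D convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


C = 256  # hypothetical number of input and output channels

three_3x3 = 3 * conv_weights(3, C, C)  # three stacked 3 x 3 layers
one_7x7 = conv_weights(7, C, C)        # one 7 x 7 layer

print(stacked_receptive_field(3, 3))   # 7, same coverage as one 7 x 7 filter
print(three_3x3, one_7x7)              # 1769472 3211264
```

Under these assumptions the stacked version needs roughly 45% fewer weights and also applies three ReLU non-linearities instead of one.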

- use ReLU as activation functions
- networks are trained using stochastic gradient descent with momentum
- dropout is used to reduce overfitting
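
The momentum update mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the classical update rule on a toy problem, not the training code used for VGG; the function name `sgd_momentum_step` and the hyperparameter values are assumptions.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update over lists of scalar parameters.

    velocity <- momentum * velocity - lr * grad
    weight   <- weight + velocity
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy problem: minimize f(w) = w^2, whose gradient is 2w.
w, v = [1.0], [0.0]
for _ in range(200):
    g = [2.0 * w[0]]
    w, v = sgd_momentum_step(w, g, v, lr=0.1, momentum=0.9)

print(w[0])  # very close to the minimum at 0
```

The velocity term accumulates past gradients, which smooths noisy updates and speeds up progress along directions where successive gradients agree.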

## VGG Blocks
- blocks of convolutional and pooling layers
- stack these blocks to discern high-level features that define the identity of objects in images
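
To see what stacking blocks does to the tensor shape, the sketch below traces the output shape after each of five blocks. It assumes a 224 x 224 x 3 input, 'same'-padded convolutions (which keep height and width) and 2 x 2 max-pooling with stride 2; the helper `block_output_shape` is hypothetical.

```python
def block_output_shape(height, width, filters, pool_size=2, stride=2):
    """Shape after one block: 'same' convolutions keep H and W,
    then unpadded max-pooling shrinks them."""
    out_h = (height - pool_size) // stride + 1
    out_w = (width - pool_size) // stride + 1
    return out_h, out_w, filters


shape = (224, 224, 3)  # assumed input size
for filters in [64, 128, 256, 512, 512]:
    shape = block_output_shape(shape[0], shape[1], filters)
    print(shape)

flattened = shape[0] * shape[1] * shape[2]
print(flattened)  # number of features passed to the fully-connected layers
```

Each block halves the spatial resolution, while the number of channels grows until it is capped at 512, so later blocks describe larger image regions with richer features.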



# Implementation using TensorFlow

In [None]:
from tensorflow import keras

In [None]:
class Block(keras.Model):
  def __init__(self, filters, kernel_size, repetitions, strides=2, pool_size=2):
    super().__init__()
    self.filters = filters
    self.kernel_size = kernel_size
    self.repetitions = repetitions

    # create `repetitions` convolutional layers; keeping them in a list
    # attribute (instead of writing to vars(self)) lets Keras track their weights
    self.convs = [
        keras.layers.Conv2D(filters, kernel_size, activation='relu', padding='same')
        for _ in range(repetitions)
    ]

    # create max pooling layer after the convolutional layers
    self.max_pool = keras.layers.MaxPool2D(pool_size=pool_size, strides=strides)

  # takes x (input) and processes it through the block
  def call(self, x):
    # apply the convolutional layers in order, then down-sample
    for conv in self.convs:
      x = conv(x)
    return self.max_pool(x)


In [None]:
class MyVGG(keras.Model):
  # input_shape is optional and unused here: Keras infers it on the first call
  def __init__(self, input_shape=None, num_classes=10):
    super().__init__()
    # five blocks with the VGG16 layout: 2, 2, 3, 3, 3 convolutional layers
    self.block_a = Block(filters=64, kernel_size=3, repetitions=2)
    self.block_b = Block(filters=128, kernel_size=3, repetitions=2)
    self.block_c = Block(filters=256, kernel_size=3, repetitions=3)
    self.block_d = Block(filters=512, kernel_size=3, repetitions=3)
    self.block_e = Block(filters=512, kernel_size=3, repetitions=3)
    # classification head: flatten, one hidden dense layer, softmax output
    self.flatten = keras.layers.Flatten()
    self.dense_1 = keras.layers.Dense(256, activation='relu')
    self.dense_2 = keras.layers.Dense(num_classes, activation='softmax')

  def call(self, inputs):
    x = self.block_a(inputs)
    x = self.block_b(x)
    x = self.block_c(x)
    x = self.block_d(x)
    x = self.block_e(x)
    x = self.flatten(x)
    x = self.dense_1(x)
    x = self.dense_2(x)
    return x

In [None]:
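# build and compile the model for a 2-class problem (training would then call model.fit)
# input_shape=(224, 224, 3) is an assumed image size, not specified in the notes
model = MyVGG(input_shape=(224, 224, 3), num_classes=2)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])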