# Perl/MXNet code development for notebook #3
## Group 5

## Members: Almeida Edison, Laje Adrian, Mejia Leonardo, and Willian Medina
## 7.3. Network in Network (NiN)
LeNet, AlexNet, and VGG all share a common design pattern: extract features exploiting spatial structure via a sequence of convolution and pooling layers, and then post-process the representations via fully-connected layers. The improvements upon LeNet by AlexNet and VGG mainly lie in how these later networks widen and deepen these two modules. Alternatively, one could imagine using fully-connected layers earlier in the process. However, a careless use of dense layers might give up the spatial structure of the representation entirely. Network in network (NiN) blocks offer an alternative. They were proposed based on a very simple insight: use an MLP on the channels for each pixel separately.

## 7.3.1. NiN Blocks
Recall that the inputs and outputs of convolutional layers consist of four-dimensional tensors with axes corresponding to the example, channel, height, and width. Also recall that the inputs and outputs of fully-connected layers are typically two-dimensional tensors corresponding to the example and feature. The idea behind NiN is to apply a fully-connected layer at each pixel location (for each height and width). If we tie the weights across each spatial location, we can think of this as a 1x1 convolutional layer (as described in Section 6.4) or as a fully-connected layer acting independently on each pixel location. Another way to view this is to think of each element in the spatial dimension (height and width) as equivalent to an example and a channel as equivalent to a feature.
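The equivalence described above can be checked on a tiny input. The following is a minimal pure-Python sketch, not the notebook's MXNet code: the helper names `dense_per_pixel` and `conv1x1` are made up for illustration, and the batch axis is omitted so the input layout is (channels, height, width).

```python
def dense_per_pixel(x, weight, bias):
    """Treat every pixel as one example whose features are its channel values."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    c_out = len(weight)
    y = [[[0.0] * w for _ in range(h)] for _ in range(c_out)]
    for i in range(h):
        for j in range(w):
            v = [x[c][i][j] for c in range(c_in)]  # channel vector at pixel (i, j)
            for o in range(c_out):
                y[o][i][j] = sum(weight[o][c] * v[c] for c in range(c_in)) + bias[o]
    return y


def conv1x1(x, weight, bias):
    """1x1 convolution: each output map is a weighted sum of the input maps."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(weight[o][c] * x[c][i][j] for c in range(c_in)) + bias[o]
          for j in range(w)]
         for i in range(h)]
        for o in range(len(weight))
    ]


# Toy input: 2 channels on a 2x2 grid.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.5, -1.0], [2.0, 0.0]]]
weight = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # 3 output channels, 2 input channels
bias = [0.1, 0.2, 0.3]

# The same weights are shared by every pixel, so both views give the same result.
assert dense_per_pixel(x, weight, bias) == conv1x1(x, weight, bias)
```

Both functions use one weight matrix for all pixel locations, which is exactly the weight tying that turns a per-pixel dense layer into a 1x1 convolution.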

![nin.svg](attachment:nin.svg)

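Fig. 7.3.1 illustrates the main structural differences between VGG and NiN, and their blocks. The NiN block consists of one convolutional layer followed by two 1x1 convolutional layers that act as per-pixel fully-connected layers with ReLU activations. The window shape of the first layer is set by the caller; the two subsequent window shapes are fixed to 1x1.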

In [1]:
# Python (d2l) reference code, commented out:
# from mxnet import np, npx
# from mxnet.gluon import nn
# from d2l import mxnet as d2l

# npx.set_np()

# def nin_block(num_channels, kernel_size, strides, padding):
#     blk = nn.Sequential()
#     blk.add(nn.Conv2D(num_channels, kernel_size, strides, padding,
#                       activation='relu'),
#             nn.Conv2D(num_channels, kernel_size=1, activation='relu'),
#             nn.Conv2D(num_channels, kernel_size=1, activation='relu'))
#     return blk


In [1]:
use strict;
use warnings;
use Data::Dump qw(dump);
use AI::MXNet qw(mx);
use AI::MXNet::Gluon qw(gluon);

In [4]:
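# Perl port of the Python nin_block. This assumes the AI::MXNet Gluon port
# accepts the same named arguments as Python's nn.Conv2D.
sub nin_block {
    my ($num_channels, $kernel_size, $strides, $padding) = @_;
    my $blk = gluon->nn->Sequential();
    $blk->add(
        # First convolution: window shape, stride, and padding set by the caller
        gluon->nn->Conv2D(channels => $num_channels, kernel_size => $kernel_size,
                          strides => $strides, padding => $padding,
                          activation => 'relu'),
        # Two 1x1 convolutions act as per-pixel fully-connected layers
        gluon->nn->Conv2D(channels => $num_channels, kernel_size => 1, activation => 'relu'),
        gluon->nn->Conv2D(channels => $num_channels, kernel_size => 1, activation => 'relu'),
    );
    return $blk;
}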


## 7.3.2. NiN Model
The original NiN network was proposed shortly after AlexNet and clearly draws some inspiration from it. NiN uses convolutional layers with window shapes of 11x11, 5x5, and 3x3, and the corresponding numbers of output channels are the same as in AlexNet. Each NiN block is followed by a maximum pooling layer with a stride of 2 and a window shape of 3x3.

One significant difference between NiN and AlexNet is that NiN avoids fully-connected layers altogether. Instead, NiN uses an NiN block with a number of output channels equal to the number of label classes, followed by a global average pooling layer, yielding a vector of logits. One advantage of NiN’s design is that it significantly reduces the number of required model parameters. However, in practice, this design sometimes requires increased model training time.
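To make the last step concrete, here is a small pure-Python sketch of global average pooling turning one feature map per class into a vector of logits. The function name `global_avg_pool` and the toy values are illustrative assumptions; the example uses 3 channels on a 2x2 grid instead of the 10 channels of the real network.

```python
def global_avg_pool(feature_maps):
    """Average each channel's height-by-width map down to a single number."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


# Toy output of the last NiN block: one channel per class.
feature_maps = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [2.0, 2.0]],
    [[-1.0, 1.0], [-1.0, 1.0]],
]

logits = global_avg_pool(feature_maps)
print(logits)  # [4.0, 1.0, 0.0]
```

The pooling step has no trainable weights, which is one reason this head needs far fewer parameters than a stack of dense layers.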

In [2]:
# net = nn.Sequential()
# net.add(nin_block(96, kernel_size=11, strides=4, padding=0),
#         nn.MaxPool2D(pool_size=3, strides=2),
#         nin_block(256, kernel_size=5, strides=1, padding=2),
#         nn.MaxPool2D(pool_size=3, strides=2),
#         nin_block(384, kernel_size=3, strides=1, padding=1),
#         nn.MaxPool2D(pool_size=3, strides=2),
#         nn.Dropout(0.5),
#         # There are 10 label classes
#         nin_block(10, kernel_size=3, strides=1, padding=1),
#         # The global average pooling layer automatically sets the window shape
#         # to the height and width of the input
#         nn.GlobalAvgPool2D(),
#         # Transform the four-dimensional output into two-dimensional output
#         # with a shape of (batch size, 10)
#         nn.Flatten())


We create a data example to see the output shape of each block.

In [3]:
# X = np.random.uniform(size=(1, 1, 224, 224))
# net.initialize()
# for layer in net:
#     X = layer(X)
#     print(layer.name, 'output shape:\t', X.shape)
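The shapes this loop would report can also be worked out by hand. The sketch below is pure Python and does not run the network. It assumes the usual floor rule for convolution and pooling output sizes, floor((n + 2p - k) / s) + 1, and copies the layer settings from the reference code; the helper `out_size` is a name chosen here for illustration.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling window (floor rule)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, output channels or None to keep the current count, kernel, stride, padding)
# The 1x1 layers inside each NiN block and the dropout layer leave the shape
# unchanged, so they are omitted here.
layers = [
    ("nin_block(96)", 96, 11, 4, 0),
    ("max_pool", None, 3, 2, 0),
    ("nin_block(256)", 256, 5, 1, 2),
    ("max_pool", None, 3, 2, 0),
    ("nin_block(384)", 384, 3, 1, 1),
    ("max_pool", None, 3, 2, 0),
    ("nin_block(10)", 10, 3, 1, 1),
]

batch, channels, size = 1, 1, 224  # same input shape as X in the reference code
shapes = []
for name, out_channels, kernel, stride, padding in layers:
    if out_channels is not None:
        channels = out_channels
    size = out_size(size, kernel, stride, padding)
    shapes.append((name, (batch, channels, size, size)))

# Global average pooling collapses height and width; flatten drops them.
shapes.append(("global_avg_pool", (batch, channels, 1, 1)))
shapes.append(("flatten", (batch, channels)))

for name, shape in shapes:
    print(f"{name:16s} output shape: {shape}")
```

Under these assumptions the spatial size shrinks from 224 to 54, 26, 12, and finally 5 before global average pooling, and the final output has shape (1, 10).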


## 7.3.3. Training
As before, we use Fashion-MNIST to train the model. Training NiN is similar to training AlexNet and VGG.

In [4]:
# lr, num_epochs, batch_size = 0.1, 10, 128
# train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
# d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())

## 7.3.4. Summary
NiN uses blocks consisting of a convolutional layer followed by multiple 1x1 convolutional layers. These blocks can be used within the convolutional stack to allow for more per-pixel nonlinearity.

NiN removes the fully-connected layers and replaces them with global average pooling (i.e., averaging over all spatial locations) after reducing the number of channels to the desired number of outputs (e.g., 10 for Fashion-MNIST).

Removing the fully-connected layers reduces overfitting. NiN has dramatically fewer parameters.
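A rough count gives a sense of scale for the parameter savings. The sketch below compares the final NiN block with a hypothetical dense head. The dense head (two hidden layers of 4096 units) is an AlexNet-style assumption made here for illustration and is not part of this notebook; the 384-channel, 5x5 input is what the layer settings in the reference code would produce for a 224x224 image under the floor rule for output sizes.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus biases of a square-kernel convolution."""
    return c_in * c_out * kernel * kernel + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out


# NiN head: nin_block(10, kernel_size=3) applied to 384 input channels,
# i.e. one 3x3 convolution followed by two 1x1 convolutions.
nin_head = conv_params(384, 10, 3) + 2 * conv_params(10, 10, 1)

# Hypothetical dense head (assumption): flatten 384 x 5 x 5 features,
# then 4096 -> 4096 -> 10 units.
flat = 384 * 5 * 5
dense_head = (dense_params(flat, 4096)
              + dense_params(4096, 4096)
              + dense_params(4096, 10))

print(nin_head)    # 34790
print(dense_head)  # 56147978
```

Under these assumptions the dense head has more than a thousand times as many parameters as the NiN head.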

The NiN design influenced many subsequent CNN designs.