<a href="https://colab.research.google.com/github/hongkvu/Senior-Project-CMPE195A-B/blob/main/CoAtNet3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!nvidia-smi

Sun Nov 20 19:02:04 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P8    12W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
!pip install tensorflow-gpu
!pip install einops
!pip install torch==1.12.1+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorflow-gpu
  Downloading tensorflow_gpu-2.11.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (588.3 MB)
Collecting tensorflow-estimator<2.12,>=2.11.0
  Downloading tensorflow_estimator-2.11.0-py2.py3-none-any.whl (439 kB)
Collecting keras<2.12,>=2.11.0
  Downloading keras-2.11.0-py2.py3-none-any.whl (1.7 MB)
Collecting tensorboard<2.12,>=2.11
  Downloading tensorboard-2.11.0-py3-none-any.whl (6.0 MB)
Collecting flatbuffers>=2.0
  Downloading flatbuffers-22.10.26-py2.py3-none-any.whl (26 kB)
Installing collected packages: tensorflow-estimator, tensorboard, keras, flatbuffers, tensorflow-gpu
  Attempting uninstall: te

## What is Einops?
Einops (Einstein-Inspired Notation for operations) is an open-source Python library for writing readable and reliable tensor-manipulation code. It adds a compact pattern notation and a small set of operations (`rearrange`, `reduce`, `repeat`) behind a minimal API.

Supports numpy, pytorch, tensorflow, jax, and others.

Source: https://analyticsindiamag.com/reinventing-deep-learning-operation-via-einops/
## Example:
`from einops import rearrange, reduce, repeat`
### Rearrange elements according to the pattern
- `output_tensor = rearrange(input_tensor, 't b c -> b c t')`

### Combine rearrangement and reduction
- `output_tensor = reduce(input_tensor, 'b c (h h2) (w w2) -> b h w c', 'mean', h2=2, w2=2)`

### Copy along a new axis
- `output_tensor = repeat(input_tensor, 'h w -> h w c', c=3)`
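
To make the pattern strings concrete, here is a shape-only sketch in plain Python. It is not einops itself: the helper `output_shape` is hypothetical, only handles patterns made of simple axis names (no parenthesized groups), and tracks shapes rather than moving any data.

```python
def output_shape(input_shape, pattern, **new_axes):
    """Return the output shape for a simple einops-style pattern.

    Hypothetical helper for illustration only: it supports plain axis names
    (no parenthesized groups) and tracks shapes, not tensor data.
    """
    left, right = pattern.split('->')
    src_names, dst_names = left.split(), right.split()
    if len(src_names) != len(input_shape):
        raise ValueError('pattern does not match the number of input axes')
    sizes = dict(zip(src_names, input_shape))  # axis name -> length
    sizes.update(new_axes)                     # lengths of axes that only appear on the right
    return tuple(sizes[name] for name in dst_names)


# rearrange: 't b c -> b c t' only permutes the axes
print(output_shape((10, 32, 3), 't b c -> b c t'))   # (32, 3, 10)
# repeat: 'h w -> h w c' adds a new axis of length c
print(output_shape((28, 28), 'h w -> h w c', c=3))   # (28, 28, 3)
```

The `reduce` pattern is not covered by this helper. There, `(h h2)` splits the height axis into blocks of `h2` rows, so with `h2=2, w2=2` and `'mean'` the call averages every 2x2 block and halves the height and width.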

### Example given for PyTorch, but code in other frameworks is almost identical
````
from torch.nn import Sequential, Conv2d, MaxPool2d, Linear, ReLU
from einops.layers.torch import Rearrange

model = Sequential(
    Conv2d(3, 6, kernel_size=5),
    MaxPool2d(kernel_size=2),
    Conv2d(6, 16, kernel_size=5),
    MaxPool2d(kernel_size=2),
    # flattening
    Rearrange('b c h w -> b (c h w)'),  
    Linear(16*5*5, 120), 
    ReLU(),
    Linear(120, 10), 
)
````
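
The `Linear(16*5*5, 120)` layer only fits one input resolution. Assuming 32x32 input images (an assumption, since the snippet does not state the input size), the usual output-size formula for convolution and pooling reproduces the 5x5 feature map. The helper `out_size` is introduced here for illustration.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                   # assumed input height and width
size = out_size(size, kernel=5)             # Conv2d(3, 6, kernel_size=5)   -> 28
size = out_size(size, kernel=2, stride=2)   # MaxPool2d(kernel_size=2)      -> 14
size = out_size(size, kernel=5)             # Conv2d(6, 16, kernel_size=5)  -> 10
size = out_size(size, kernel=2, stride=2)   # MaxPool2d(kernel_size=2)      -> 5

flat_features = 16 * size * size            # channels * height * width
print(size, flat_features)                  # 5 400
```

With a different input size the flattened length changes, and the first `Linear` layer would need a matching `in_features`.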



CoAtNet source code


In [4]:
import torch
import torch.nn as nn

from einops import rearrange
from einops.layers.torch import Rearrange

# 	nn.Conv2d : Applies a 2D convolution over an input signal composed of several input planes.

def conv_3x3_bn(inp, oup, image_size, downsample=False):
    stride = 2 if downsample else 1
    return nn.Sequential(                               # chain the layers in order
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),  # slide a 3x3 filter over the 2D input
        # Channel - one 2D plane of the input. An RGB image has three channels (red, green, blue); deeper in the network each channel is a learned feature map.
        # inp = number of channels in the input.
        # oup = number of channels produced by the convolution.
        # 3 = kernel size (3x3 in this case). The kernel is the small matrix that slides over the 2D data to extract features. (With several input channels, a filter is a collection of kernels, one per channel.)
        # stride = how many pixels the kernel shifts at each step.
        # 1 = padding: one pixel of zeros is added on every side.
        # bias - like the b parameter in y = mx + b. It is disabled here because the BatchNorm2d that follows has its own learnable shift.
        nn.BatchNorm2d(oup),  # Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension)
        # Batch normalization normalizes each channel's activations over the mini-batch to a mean of 0 and a variance of 1, then applies a learnable scale and shift.
        nn.GELU()  # The Gaussian Error Linear Unit is an activation function.
        # GELU(x) = x * Φ(x), where Φ is the standard normal cumulative distribution function.
        # In the experiments of the original GELU paper it matched or outperformed ReLU and ELU,
        # which makes it a common alternative to ReLU as a neural network's activation function.
    )


class PreNorm(nn.Module):
    def __init__(self, dim, fn, norm):
        super().__init__()    # run nn.Module's initializer so parameters and submodules are registered
        # __init__(): initializes the attributes of the instance
        # super(): lets us call the parent class without naming it explicitly
        self.norm = norm(dim)
        # self is the instance of the class; attributes assigned to it are stored on that instance.
        # norm is a normalization layer class (nn.BatchNorm2d or nn.LayerNorm in this notebook), not a vector or matrix norm.
        # dim is the number of features (channels) the normalization layer expects.
        # norm(dim) therefore builds the layer, e.g. nn.LayerNorm(dim).
        self.fn = fn
        # fn is the wrapped module; it receives the normalized input.

    def forward(self, x, **kwargs):
        return self.fn(self.norm(x), **kwargs)
        # **kwargs forwards any extra keyword arguments to the wrapped module.


# SE block (Squeeze-and-Excitation): a small module that can be added to most deep CNNs.
# Main idea: squeeze each channel to a single value with global average pooling,
# then learn a weight between 0 and 1 for each channel (excitation) and rescale the channel by it.
class SE(nn.Module):
    def __init__(self, inp, oup, expansion=0.25):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)                 # sets the output size of the pooling layer to 1x1
        self.fc = nn.Sequential(
            nn.Linear(oup, int(inp * expansion), bias=False),   # linear layer that reduces the channel count
            nn.GELU(),        # applies the Gaussian Error Linear Unit function
            nn.Linear(int(inp * expansion), oup, bias=False),   # linear layer that restores the channel count
            nn.Sigmoid()      # maps each value to the range (0, 1); used here as a per-channel gate
        )

    def forward(self, x):               # define the flow of the forward pass
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c) # Squeeze - global average pooling, one value per channel
        y = self.fc(y).view(b, c, 1, 1) # Excitation - per-channel weights, shaped for broadcasting
        return x * y                    # rescale every channel by its weight


class FeedForward(nn.Module):
    def __init__(self, dim, hidden_dim, dropout=0.):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim), # linear transformation that expands dim to hidden_dim
            nn.GELU(),       # applies the Gaussian Error Linear Unit function
            nn.Dropout(dropout),  # randomly zeroes activations with probability `dropout` during training
            # Dropout is a regularizer: it helps prevent overfitting, which can improve accuracy on unseen data.
            nn.Linear(hidden_dim, dim),   # linear transformation that projects back to dim
            nn.Dropout(dropout) # second dropout, same rate
        )
    # define the flow of forward process 
    def forward(self, x):             # x in the forward() method is the input vector.
        return self.net(x)

# An MBConv is an inverted residual block (linear bottleneck) built around a depthwise separable convolution.
class MBConv(nn.Module):
    def __init__(self, inp, oup, image_size, downsample=False, expansion=4):
        super().__init__()
        self.downsample = downsample    # whether this block halves the spatial resolution
        # Downsampling shrinks the feature map (height and width) as we move deeper into the network,
        # which reduces computation in the later stages.
        stride = 2 if self.downsample else 1
        # Stride is how far the filter moves at each step. With stride 1 the filter moves one pixel at a time;
        # with stride 2 it moves two pixels, which halves the output height and width.
        hidden_dim = int(inp * expansion)   # number of channels inside the block after expansion
        # The block first expands inp channels to hidden_dim channels, applies the depthwise convolution,
        # then projects back down to oup channels.

        if self.downsample:
            self.pool = nn.MaxPool2d(3, 2, 1)   # Applies a 2D max pooling over an input signal composed of several input planes.
            # kernel_size=3, stride=2, padding=1 (dilation keeps its default of 1)
            # kernel_size – the size of the window to take a max over
            # stride – the stride of the window. Default value is kernel_size
            # padding – implicit padding added on both sides (negative infinity for max pooling)
            # dilation – a parameter that controls the stride of elements in the window
            self.proj = nn.Conv2d(inp, oup, 1, 1, 0, bias=False)
            # 1x1 convolution that changes the channel count of the shortcut from inp to oup

        if expansion == 1:
            self.conv = nn.Sequential(
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride,    # build neural networks that slide a matrix or filter over 2D data
                          1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),                     # Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) 
                nn.GELU(),                                      #applies the Gaussian Error Linear Units function
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False), # build neural networks that slide a matrix or filter over 2D data
                nn.BatchNorm2d(oup),                             # Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) 
            )
        else:
            self.conv = nn.Sequential(
                # pw
                # down-sample in the first conv
                nn.Conv2d(inp, hidden_dim, 1, stride, 0, bias=False),    # build neural networks that slide a matrix or filter over 2D data
                nn.BatchNorm2d(hidden_dim),                              # Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension)
                nn.GELU(),
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, 1, 1,
                          groups=hidden_dim, bias=False),                # build neural networks that slide a matrix or filter over 2D data
                nn.BatchNorm2d(hidden_dim),                              # Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension)
                nn.GELU(),                                               #applies the Gaussian Error Linear Units function
                SE(inp, hidden_dim),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),         # build neural networks that slide a matrix or filter over 2D data
                nn.BatchNorm2d(oup),                                     # Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension)
            ) 
        
        self.conv = PreNorm(inp, self.conv, nn.BatchNorm2d)

    def forward(self, x):           # define the flow of forward process
        if self.downsample:
            return self.proj(self.pool(x)) + self.conv(x)
        else:
            return x + self.conv(x)

# Multi-head attention: every position computes a weighted average over the features of all positions,
# and several heads do this in parallel so the model can jointly attend to information
# from different representation subspaces.
# The weights come from a softmax over query-key similarity, so the network can focus on
# the most relevant parts of the input.
class Attention(nn.Module):
    def __init__(self, inp, oup, image_size, heads=8, dim_head=32, dropout=0.):
        # dim_head – dimension of each attention head; the combined width is inner_dim = dim_head * heads.
        # heads – number of parallel attention heads.
        # dropout – dropout probability applied after the output projection. Default: 0.0 (no dropout).
        super().__init__()
        inner_dim = dim_head * heads
        project_out = not (heads == 1 and dim_head == inp)

        self.ih, self.iw = image_size

        self.heads = heads
        self.scale = dim_head ** -0.5

        # parameter table of relative position bias
        self.relative_bias_table = nn.Parameter(
            torch.zeros((2 * self.ih - 1) * (2 * self.iw - 1), heads))

        coords = torch.meshgrid((torch.arange(self.ih), torch.arange(self.iw)))
        coords = torch.flatten(torch.stack(coords), 1)
        relative_coords = coords[:, :, None] - coords[:, None, :]

        relative_coords[0] += self.ih - 1
        relative_coords[1] += self.iw - 1
        relative_coords[0] *= 2 * self.iw - 1
        relative_coords = rearrange(relative_coords, 'c h w -> h w c')
        relative_index = relative_coords.sum(-1).flatten().unsqueeze(1)
        self.register_buffer("relative_index", relative_index)

        self.attend = nn.Softmax(dim=-1)
        self.to_qkv = nn.Linear(inp, inner_dim * 3, bias=False)

        self.to_out = nn.Sequential(
            nn.Linear(inner_dim, oup),
            nn.Dropout(dropout)
        ) if project_out else nn.Identity()

    def forward(self, x):
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = map(lambda t: rearrange(
            t, 'b n (h d) -> b h n d', h=self.heads), qkv)

        dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale

        # Use "gather" for more efficiency on GPUs
        relative_bias = self.relative_bias_table.gather(
            0, self.relative_index.repeat(1, self.heads))
        relative_bias = rearrange(
            relative_bias, '(h w) c -> 1 c h w', h=self.ih*self.iw, w=self.ih*self.iw)
        dots = dots + relative_bias

        attn = self.attend(dots)
        out = torch.matmul(attn, v)
        out = rearrange(out, 'b h n d -> b n (h d)')
        out = self.to_out(out)
        return out

# Transformer block: self-attention followed by a feed-forward network, each with pre-normalization and a residual connection.
# Self-attention weights the significance of every position of the input relative to every other position.
class Transformer(nn.Module):
    def __init__(self, inp, oup, image_size, heads=8, dim_head=32, downsample=False, dropout=0.):
        super().__init__()
        hidden_dim = int(inp * 4)

        self.ih, self.iw = image_size
        self.downsample = downsample

        if self.downsample:
            self.pool1 = nn.MaxPool2d(3, 2, 1)
            self.pool2 = nn.MaxPool2d(3, 2, 1)
            self.proj = nn.Conv2d(inp, oup, 1, 1, 0, bias=False)

        self.attn = Attention(inp, oup, image_size, heads, dim_head, dropout)     # self-attention with relative position bias
        # inp = number of channels in the input feature map
        # oup = number of channels in the output feature map
        # dim_head = dimension per head
        self.ff = FeedForward(oup, hidden_dim, dropout)
        # feed-forward network: applied to every position independently after attention

        self.attn = nn.Sequential(
            Rearrange('b c ih iw -> b (ih iw) c'),
            PreNorm(inp, self.attn, nn.LayerNorm),
            Rearrange('b (ih iw) c -> b c ih iw', ih=self.ih, iw=self.iw)
        )

        self.ff = nn.Sequential(
            Rearrange('b c ih iw -> b (ih iw) c'),  # flatten the 2D feature map into a sequence of ih*iw tokens
            # einops rearrange is a reader-friendly element reordering for multidimensional tensors.
            # It covers transpose (axes permutation), reshape (view), squeeze, unsqueeze, stack, concatenate and other operations.
            PreNorm(oup, self.ff, nn.LayerNorm),
            # nn.LayerNorm - applies Layer Normalization over the channel dimension of each token
            Rearrange('b (ih iw) c -> b c ih iw', ih=self.ih, iw=self.iw)   # restore the token sequence to a 2D feature map
            # ih, iw = image_size
        )

    def forward(self, x):
        if self.downsample:
            x = self.proj(self.pool1(x)) + self.attn(self.pool2(x))
        else:
            x = x + self.attn(x)
        x = x + self.ff(x)
        return x


class CoAtNet(nn.Module):
    def __init__(self, image_size, in_channels, num_blocks, channels, num_classes=1000, block_types=['C', 'C', 'T', 'T']):
        super().__init__()
        ih, iw = image_size
        block = {'C': MBConv, 'T': Transformer}

        self.s0 = self._make_layer(
            conv_3x3_bn, in_channels, channels[0], num_blocks[0], (ih // 2, iw // 2))
        self.s1 = self._make_layer(
            block[block_types[0]], channels[0], channels[1], num_blocks[1], (ih // 4, iw // 4))
        self.s2 = self._make_layer(
            block[block_types[1]], channels[1], channels[2], num_blocks[2], (ih // 8, iw // 8))
        self.s3 = self._make_layer(
            block[block_types[2]], channels[2], channels[3], num_blocks[3], (ih // 16, iw // 16))
        self.s4 = self._make_layer(
            block[block_types[3]], channels[3], channels[4], num_blocks[4], (ih // 32, iw // 32))

        self.pool = nn.AvgPool2d(ih // 32, 1)
        self.fc = nn.Linear(channels[-1], num_classes, bias=False)

    def forward(self, x):
        x = self.s0(x)
        x = self.s1(x)
        x = self.s2(x)
        x = self.s3(x)
        x = self.s4(x)

        x = self.pool(x).view(-1, x.shape[1])
        x = self.fc(x)
        return x

    def _make_layer(self, block, inp, oup, depth, image_size):
        layers = nn.ModuleList([])
        for i in range(depth):
            if i == 0:
                layers.append(block(inp, oup, image_size, downsample=True))
            else:
                layers.append(block(oup, oup, image_size))
        return nn.Sequential(*layers)


def coatnet_0():
    num_blocks = [2, 2, 3, 5, 2]            # L
    channels = [64, 96, 192, 384, 768]      # D
    return CoAtNet((224, 224), 3, num_blocks, channels, num_classes=1000)


def coatnet_1():
    num_blocks = [2, 2, 6, 14, 2]           # L
    channels = [64, 96, 192, 384, 768]      # D
    return CoAtNet((224, 224), 3, num_blocks, channels, num_classes=1000)


def coatnet_2():
    num_blocks = [2, 2, 6, 14, 2]           # L
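    channels = [128, 128, 256, 512, 1026]   # D (the CoAtNet paper lists 1024 for the last stage; 1026 is kept because the parameter count printed below was produced with it)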
    return CoAtNet((224, 224), 3, num_blocks, channels, num_classes=1000)


def coatnet_3():
    num_blocks = [2, 2, 6, 14, 2]           # L
    channels = [192, 192, 384, 768, 1536]   # D
    return CoAtNet((224, 224), 3, num_blocks, channels, num_classes=1000)


def coatnet_4():
    num_blocks = [2, 2, 12, 28, 2]          # L
    channels = [192, 192, 384, 768, 1536]   # D
    return CoAtNet((224, 224), 3, num_blocks, channels, num_classes=1000)



#PyTorch doesn't have a function to calculate the total number of parameters as Keras does, 
#but it's possible to sum the number of elements for every parameter group
#trainable parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


if __name__ == '__main__':
    img = torch.randn(1, 3, 224, 224)

    net = coatnet_0()
    out = net(img)
    print(out.shape, count_parameters(net))

    net = coatnet_1()
    out = net(img)
    print(out.shape, count_parameters(net))

    net = coatnet_2()
    out = net(img)
    print(out.shape, count_parameters(net))

    net = coatnet_3()
    out = net(img)
    print(out.shape, count_parameters(net))

    net = coatnet_4()
    out = net(img)
    print(out.shape, count_parameters(net))

  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]


torch.Size([1, 1000]) 17789624
torch.Size([1, 1000]) 33170624
torch.Size([1, 1000]) 55767564
torch.Size([1, 1000]) 117724480
torch.Size([1, 1000]) 203960368
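
The relative position bias in `Attention` is the least obvious part of the cell above. The following plain-Python sketch mirrors the index arithmetic of the `relative_coords` lines for a small grid; the helper name `relative_index` and the nested-list layout are choices made here for illustration (the notebook stores the same numbers as a flattened tensor buffer).

```python
def relative_index(ih, iw):
    """Index into the relative bias table for every (query, key) position pair."""
    positions = [(r, c) for r in range(ih) for c in range(iw)]  # row-major grid
    index = []
    for (ri, ci) in positions:
        row = []
        for (rj, cj) in positions:
            dr = ri - rj + ih - 1      # row offset shifted to start at 0
            dc = ci - cj + iw - 1      # column offset shifted to start at 0
            row.append(dr * (2 * iw - 1) + dc)
        index.append(row)
    return index


idx = relative_index(2, 2)
for row in idx:
    print(row)

# For a 7x7 feature map every one of the (2*7-1)*(2*7-1) = 169 table rows is used.
big = relative_index(7, 7)
print(len({i for row in big for i in row}))   # 169
```

Pairs with the same relative offset share one table entry, which is why the table has `(2 * ih - 1) * (2 * iw - 1)` rows per head instead of one entry per pair of positions.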


PyTorch

PyTorch is an open source machine learning framework that accelerates the path from research prototyping to production deployment.

- torch.nn: The nn package defines a set of Modules, which you can think of as neural network layers that produce output from input and may have some trainable weights.
  - nn documents: https://pytorch.org/docs/stable/nn.html
  - Container 
    - nn.Module: Base class for all neural network modules.
      - Example: 
            class Model(nn.Module):
                def __init__(self):
                    super().__init__()
                    self.conv1 = nn.Conv2d(1, 20, 5)
                    self.conv2 = nn.Conv2d(20, 20, 5)
      - Modules can also contain other Modules, allowing them to be nested in a tree structure. Assigning a submodule as a regular attribute (as with `conv1` and `conv2` in the example) registers it as a child module.
    - nn.Sequential: A sequential container.
      - Constructor: torch.nn.Sequential(*args)
      - Modules will be added to it in the order they are passed in the constructor. It accepts any input and forwards it to the first module it contains. It then “chains” outputs to inputs sequentially for each subsequent module, finally returning the output of the last module.
      - Example: 
            model = nn.Sequential(
                      nn.Conv2d(1,20,5),
                      nn.ReLU(),
                      nn.Conv2d(20,64,5),
                      nn.ReLU()
                    )
      -> Run `model`: input will first be passed to `Conv2d(1,20,5)` -> output of `Conv2d(1,20,5)` is used as input to the first `ReLU` -> output of the first `ReLU` becomes input for `Conv2d(20,64,5)` -> output of `Conv2d(20,64,5)` is used as input to the second `ReLU`.
  - Convolution Layers (https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md)
    - nn.Conv2d: Applies a 2D convolution over an input signal composed of several input planes.
    - Constructor: torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
      - stride: stride for the cross-correlation, a single number or a tuple.
      - padding: amount of padding applied to the input. 
      - dilation: spacing between the kernel points.
      - groups: controls the connections between input and output channels. Both in_channels and out_channels must be divisible by groups.
        - groups=1: all inputs are convolved to all outputs.
        - groups=2: the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
        - groups=in_channels: each input channel is convolved with its own set of filters (a depthwise convolution).
  - Non-linear Activations
    - nn.ReLU: Applies the rectified linear unit function element-wise: 
            ReLU(x) = (x)^+ = max(0, x)
      - Constructor: torch.nn.ReLU(inplace=False)
        - inplace – can optionally do the operation in-place. Default: False
  - Normalization Layers
    - nn.BatchNorm2d: Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension).
    - Constructor: torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
      - num_features – C from an expected input of size (N, C, H, W)
      - eps – a value added to the denominator for numerical stability.
        - Default: 1e-5
      - momentum – the value used for the running_mean and running_var computation. 
        - None for cumulative moving average 
        - Default: 0.1
      - affine – a boolean value. 
        - True: this module has learnable affine parameters. Default: True
      - track_running_stats – a boolean value. Default: True
        - True: this module tracks the running mean and variance,
        - False: this module does not track such statistics 
    - read: https://arxiv.org/abs/1502.03167
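
The effect of the `groups` argument listed under nn.Conv2d can be checked with simple arithmetic. This is a sketch in plain Python, not a PyTorch call: the helper `conv2d_weight_count` is hypothetical and counts the weights of a bias-free convolution, assuming a square kernel.

```python
def conv2d_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a 2D convolution without bias (square kernel assumed)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError('in_channels and out_channels must be divisible by groups')
    # Each output channel only sees in_channels // groups input channels.
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


for groups in (1, 2, 8):
    print(groups, conv2d_weight_count(8, 8, 3, groups))
# 1 576
# 2 288
# 8 72
```

The last case (`groups` equal to `in_channels`) is the depthwise convolution used in the `# dw` layers of `MBConv`, which is why those layers are cheap compared with a full 3x3 convolution.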


- torch.randn(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
  - Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 (standard normal distribution).
  - Parameters
    - size - sequence of integers defining the size of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
    - out (Tensor, optional) – the output tensor.
    - dtype (torch.dtype, optional) – the desired data type of returned tensor. 
      - Default: None - uses a global default.
    - layout (torch.layout, optional) – the desired layout of returned Tensor. 
      - Default: torch.strided.
    - device (torch.device, optional) 
      - the desired device of returned tensor. 
      - Default: None - uses the current device for the default tensor type. device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
    - requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.
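
As a rough mental model of what the `size` argument does, the following standard-library sketch builds nested lists of standard-normal samples. It is not how PyTorch implements `torch.randn`; the function below only imitates the shape behaviour and the mean 0, variance 1 property.

```python
import math
import random


def randn(*size, rng=random):
    """Nested lists of standard-normal samples with the given shape."""
    if len(size) == 1:
        return [rng.gauss(0.0, 1.0) for _ in range(size[0])]
    return [randn(*size[1:], rng=rng) for _ in range(size[0])]


random.seed(0)
x = randn(2, 4)
print(len(x), len(x[0]))                 # 2 4

samples = randn(10000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))     # close to 0 and 1

# A tensor of size (1, 3, 224, 224) holds the product of its dimensions:
print(math.prod((1, 3, 224, 224)))       # 150528
```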

In [5]:
import torch
x=torch.randn(1,3,224,224)
print(x)
x.shape

tensor([[[[-0.6031, -0.0715,  0.4560,  ...,  0.6588,  0.3715,  0.0946],
          [-0.8579,  0.5750, -1.5404,  ..., -0.0470,  0.2369,  1.0303],
          [ 1.9761,  2.4017, -0.3617,  ..., -1.0917, -0.7284,  1.9849],
          ...,
          [ 0.0400, -0.0350,  0.1565,  ..., -1.5333, -0.3082, -0.1381],
          [ 0.0855, -0.2970, -0.0382,  ...,  0.5885, -0.1948,  0.6803],
          [ 0.0684, -0.5336,  1.0513,  ...,  0.8326, -0.9815,  0.0669]],

         [[ 0.4844, -0.4957, -0.9447,  ...,  2.4915,  0.2813, -0.3474],
          [ 1.2042, -0.9704, -0.0366,  ..., -0.4432, -1.4183,  1.5103],
          [ 0.3570,  0.5656, -1.5836,  ..., -0.6628,  0.7694, -0.0245],
          ...,
          [ 1.1936,  0.1818,  0.4765,  ...,  1.7471,  1.3221, -0.7113],
          [ 0.5116, -1.1576,  0.4889,  ..., -2.1407,  1.2090, -0.9247],
          [ 0.2846,  0.4019,  0.7350,  ...,  0.8465,  1.1917,  0.3084]],

         [[-1.7332, -0.3648, -0.0946,  ..., -1.3945, -0.3625,  0.2694],
          [-2.3250, -0.0787,  

torch.Size([1, 3, 224, 224])

In [6]:
import torch
x=torch.randn(2,4)
print(x)
x.shape

tensor([[ 0.1443, -1.7385, -0.6709, -0.2790],
        [-0.7603, -0.5989, -1.0556, -0.8986]])


torch.Size([2, 4])

In [7]:
import torch
x=torch.randn(2,4,5)
print(x)
x.shape

tensor([[[-0.7910, -0.6144, -0.3438, -2.4132,  0.0048],
         [-1.0813, -1.2518,  0.2687, -0.4000, -0.9335],
         [-1.0828, -0.2219, -0.4348, -0.5639, -0.0440],
         [-0.7927, -0.7757, -0.4294, -1.0962,  0.0494]],

        [[-0.6210, -1.3430,  0.4268,  1.4845,  0.7278],
         [ 0.6526, -0.4841, -0.1213,  0.7235,  0.8378],
         [ 2.1596, -0.9474, -1.5652, -1.3992, -0.4536],
         [ 0.6750,  0.7097, -0.3789, -0.4911, -0.9581]]])


torch.Size([2, 4, 5])



```
def conv_3x3_bn(inp, oup, image_size, downsample=False):
    stride = 2 if downsample else 1
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.GELU()
    )
```



**nn.Sequential()**

Class that allows us to build neural networks on the fly without having to define an explicit class.

  **nn.Conv2d(inp, oup, 3, stride, 1, bias=False)**

  2D Convolution - We slide a matrix or filter over 2D data (an image turned into a grid of numbers) and perform element-wise multiplication with the data, then sum up the multiplication result to produce an output. We move the filter in strides until we get the final output matrix of the 2D convolution operation.

  Channel - One 2D plane of the input. A color image usually has three channels (red, green, and blue), and each pixel's color is a mix of the three. Deeper in the network, each channel is a learned feature map.

  inp = number of channels in the input image. 

  oup = number of channels produced by the convolution

  3 = kernel size (3x3 in this case). Kernel is the filter used to extract the features from the images. The matrix that slides over the 2D data. (In 3D, a filter is a collection of kernels)

  stride = how many pixels the kernel will shift over the image.
  
  bias - is like the *b* parameter in *y = mx + b* -- Helps model fit the training set properly
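
The sliding-window description above can be sketched in plain Python for a single channel. This is a naive illustration, assuming a square kernel and zero padding; `conv2d_single` is a name introduced here and is far slower than `nn.Conv2d`.

```python
def conv2d_single(image, kernel, stride=1, padding=0):
    """Naive single-channel 2D convolution (cross-correlation, as in nn.Conv2d)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    # Surround the image with `padding` rows and columns of zeros.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the window with the kernel and sum the result.
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    acc += padded[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


image = [[1.0] * 4 for _ in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]
same = conv2d_single(image, kernel, stride=1, padding=1)   # 4x4 output
half = conv2d_single(image, kernel, stride=2, padding=1)   # 2x2 output
print(len(same), len(same[0]), same[0][0], same[1][1])     # 4 4 4.0 9.0
print(len(half), len(half[0]))                             # 2 2
```

With kernel size 3 and padding 1, stride 1 keeps the spatial size and stride 2 halves it, which is the behaviour `conv_3x3_bn` relies on when `downsample=True`.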

  **nn.BatchNorm2d(oup)** 
  is needed to normalize the output so the data will be on the same scale. If one feature is too large, this feature will drown out the smaller feature. During gradient descent, the neural network will have to make a large update to one weight compared to the other weight. It can cause the gradient descent trajectory to oscillate back and forth, thus taking more steps to reach the minimum.
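
  A minimal sketch of the normalization step in plain Python, for the values of one channel across a mini-batch. It leaves out the learnable scale and shift and the running statistics that `nn.BatchNorm2d` also keeps; `batch_norm` is a name introduced here for illustration.

```python
import math


def batch_norm(values, eps=1e-5):
    """Normalize a list of values to mean 0 and (approximately) variance 1."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


large = batch_norm([1000.0, 2000.0, 3000.0])   # a feature on a large scale
small = batch_norm([0.1, 0.2, 0.3])            # a feature on a small scale
print([round(v, 2) for v in large])
print([round(v, 2) for v in small])
```

  After normalization both features land on nearly the same scale, so neither dominates the gradient updates.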

  **nn.GELU**

  Gaussian Error Linear Unit. An activation function, like sigmoid and ReLU.
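
  The exact form GELU(x) = x * Φ(x) can be written with the standard library's error function. The sketch below compares it with ReLU at a few points; the function names are chosen here for illustration.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x):
    return max(0.0, x)


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, round(gelu(x), 4), relu(x))
```

  Unlike ReLU, GELU is smooth and lets small negative inputs through with a small negative output instead of cutting them to zero.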



---

```
class SE(nn.Module):
    def __init__(self, inp, oup, expansion=0.25):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(oup, int(inp * expansion), bias=False),
            nn.GELU(),
            nn.Linear(int(inp * expansion), oup, bias=False),
            nn.Sigmoid()
        )
```

**nn.AdaptiveAvgPool2d(1)**

Sets the output size of the pooling layer to the specified size. (1x1 in this case) The stride and kernel-size are automatically selected to get this output size.

Pooling compresses and generalizes the features in the feature map.

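**nn.Linear()**

Applies a linear transformation to the input: y = xA^T + b (here without the bias term, since bias=False).

**nn.Sigmoid()**

An activation function that maps any value to the range between 0 and 1. In the SE block these outputs are per-channel weights that rescale the feature map, rather than class probabilities.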
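
Putting the pieces together, the squeeze, excitation, and rescaling steps can be traced by hand on a tiny feature map. This is a plain-Python sketch with hypothetical, hand-picked weights `w1` and `w2`; in the notebook these weights are learned by the two `nn.Linear` layers.

```python
import math


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(vec, weight):
    """Bias-free linear layer; weight has one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def se_forward(feature_map, w1, w2):
    """feature_map: list of channels, each a list of rows (H x W)."""
    # Squeeze: global average pooling gives one number per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: two linear layers with GELU, then a sigmoid gate per channel.
    hidden = [gelu(h) for h in linear(squeezed, w1)]
    gates = [sigmoid(z) for z in linear(hidden, w2)]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates


feature_map = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0
               [[3.0, 3.0], [3.0, 3.0]]]   # channel 1
w1 = [[1.0, 1.0]]        # hypothetical weights: 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]     # hypothetical weights: 1 hidden unit -> 2 gates
scaled, gates = se_forward(feature_map, w1, w2)
print([round(g, 3) for g in gates])   # [0.982, 0.018]
```

With these weights channel 0 is kept almost unchanged while channel 1 is suppressed, which is the kind of per-channel reweighting the SE block learns.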
  




Import libraries

In [8]:
#os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [9]:
import torch.optim as optim #optimizer, will hold the current state and will update the parameters based on the computed gradients
import torch
import torch.nn as nn #neural network framework
import torch.nn.parallel 
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms #common image transformation
from torch.autograd import Variable #legacy autograd wrapper; since PyTorch 0.4 plain tensors support automatic differentiation directly

Set global parameters

In [10]:
modellr = 1e-4
BATCH_SIZE = 16 #number of samples in each batch
EPOCHS = 100 #number of times the entire dataset is passed through the neural network
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu') #use the GPU when one is available, otherwise the CPU

Data Preprocessing

In [11]:
#modifying images (resize, convert, normalize)  
transform = transforms.Compose([ #chain the transformation together
    transforms.Resize((224, 224)), #resize image
    transforms.ToTensor(), #convert image to a tensor (matrix)
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]) #normalize with (mean, standard deviation)

])
transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
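
As a small worked example of what `Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])` does to each channel: `ToTensor()` scales pixel values to the range 0 to 1, and normalization then computes `(value - mean) / std`. The helper name `normalize` below is illustrative only.

```python
def normalize(pixel: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-channel normalization as applied by transforms.Normalize."""
    return (pixel - mean) / std

# Pixel values after ToTensor() lie in [0, 1]; after normalization they lie in [-1, 1].
for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

With mean 0.5 and standard deviation 0.5, the input range is re-centred around zero.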

Fetch the data

In [12]:
# coding:utf8
import os
from PIL import Image
from torch.utils import data
from torchvision import transforms as T
from sklearn.model_selection import train_test_split


Labels = {
     'daisy': 0, 'dandelion': 1, 'rose': 2, 'sunflower': 3, 'tulip':4
}
 
class FlowerData (data.Dataset):
 
    def __init__(self, root, transforms=None, train=True, test=False):
        """
        Main objective: collect the paths of all images and split the data into training, validation and test sets
        """
        self.test = test
        self.transforms = transforms
 
        if self.test:
            imgs = [os.path.join(root, img) for img in os.listdir(root)]
            self.imgs = imgs
        else:
            imgs_labels = [os.path.join(root, img) for img in os.listdir(root)] #one sub-folder per class
            imgs = []
            for imglabel in imgs_labels:
                for imgname in os.listdir(imglabel):
                    imgpath = os.path.join(imglabel, imgname)
                    imgs.append(imgpath)
            trainval_files, val_files = train_test_split(imgs, test_size=0.3, random_state=42) #keep 70% for training, 30% for validation
            if train:
                self.imgs = trainval_files
            else:
                self.imgs = val_files
 
    def __getitem__(self, index):
        """
        Returns the data of one picture at a time
        """
        img_path = self.imgs[index]
        img_path=img_path.replace("\\",'/')
        if self.test:
            label = -1
        else:
            labelname = img_path.split('/')[-2]
            label = Labels[labelname]
        data = Image.open(img_path).convert('RGB')
        data = self.transforms(data)
        return data, label
 
    def __len__(self):
        return len(self.imgs)
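
The label lookup in `__getitem__` relies on the folder layout: the class name is the second-to-last component of the image path. A minimal sketch of that step in isolation, with hypothetical file names and a helper name (`label_from_path`) that is not part of the notebook:

```python
Labels = {'daisy': 0, 'dandelion': 1, 'rose': 2, 'sunflower': 3, 'tulip': 4}

def label_from_path(img_path: str) -> int:
    """Derive the class index from the parent folder name of an image path."""
    img_path = img_path.replace("\\", "/")   # tolerate Windows-style separators
    labelname = img_path.split("/")[-2]      # parent folder = class name
    return Labels[labelname]

example_paths = [
    "flowers/train/daisy/img_001.jpg",       # hypothetical file names
    "flowers\\train\\tulip\\img_002.jpg",
]
labels = [label_from_path(p) for p in example_paths]
print(labels)
```

Because `train_test_split` runs inside `__init__`, each `FlowerData` instance keeps only part of the files under `root`: 70% when `train=True` and 30% when `train=False`.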

In [13]:
# split-data
!pip install split-folders
import splitfolders
import pathlib
data = "drive/My Drive/195A+BSeniorProjectGroupWorks/Datasets/Flower_Dataset/flowers"
data_splitted = "drive/My Drive/195A+BSeniorProjectGroupWorks/Datasets/Flower_Dataset/flowers_train_test_set"
# splitfolders.ratio(data, data_splitted, seed=1337, ratio=(.8, .1, .1)) 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting split-folders
  Downloading split_folders-0.5.1-py3-none-any.whl (8.4 kB)
Installing collected packages: split-folders
Successfully installed split-folders-0.5.1


In [14]:
# Read data
# Suggestion: automate image look-up/download from the web
dataset_train = FlowerData(pathlib.Path(data_splitted)/"train", transforms=transform, train=True)
dataset_test = FlowerData(pathlib.Path(data_splitted)/"val", transforms=transform_test, train=False)

In [15]:
# Wrap the datasets in DataLoaders for batching
train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=BATCH_SIZE, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset_test, batch_size=BATCH_SIZE, shuffle=False)

Set model

In [16]:
# Instantiate the model and move to the GPU
criterion = nn.CrossEntropyLoss()

model_ft = coatnet_3()
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 200) #2nd arg is the number of output classes; it must be at least the number of Labels (5 here)
#TODO: look into why changing this arg fixes the error
model_ft.to(DEVICE)#move the model to the GPU if one is available
# Use the Adam optimizer; the cosine schedule below anneals the learning rate
optimizer = optim.Adam(model_ft.parameters(), lr=modellr)
cosine_schedule = optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer,T_max=20,eta_min=1e-9)
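
To see what the schedule does to the learning rate, here is a minimal sketch of the closed-form cosine annealing curve with the values used above (`lr=1e-4`, `T_max=20`, `eta_min=1e-9`). The helper `cosine_lr` is illustrative and only approximates what `CosineAnnealingLR` computes internally.

```python
import math

def cosine_lr(epoch: int, base_lr: float = 1e-4,
              eta_min: float = 1e-9, t_max: int = 20) -> float:
    """Closed-form cosine annealing: starts at base_lr, reaches eta_min at t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

# The curve is periodic with period 2 * t_max.
for epoch in (0, 10, 20, 30, 40):
    print(epoch, f"{cosine_lr(epoch):.3e}")
```

Since `EPOCHS = 100` is larger than `T_max = 20`, this formula suggests the learning rate falls to its minimum around epoch 20 and then rises again, repeating every 40 epochs. Setting `T_max` to the number of epochs would give a single decay instead.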

In [17]:
# Define training process

def train(model, device, train_loader, optimizer, epoch):
    model.train() #switch the model to training mode; this call does not run any training itself
    sum_loss = 0
    total_num = len(train_loader.dataset)
    print(total_num, len(train_loader))
    for batch_idx, (data, target) in enumerate(train_loader): #enumerate supplies the batch counter for the loop
        data, target = Variable(data).to(device), Variable(target).to(device) #move the batch to the device
        output = model(data)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print_loss = loss.data.item()
        sum_loss += print_loss
        if (batch_idx + 1) % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                       100. * (batch_idx + 1) / len(train_loader), loss.item()))
    ave_loss = sum_loss / len(train_loader)
    print('epoch:{},loss:{}'.format(epoch, ave_loss))


# Validation process
def val(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    total_num = len(test_loader.dataset)
    print(total_num, len(test_loader))
    with torch.no_grad():
        for data, target in test_loader:
            data, target = Variable(data).to(device), Variable(target).to(device)
            output = model(data)
            loss = criterion(output, target)
            _, pred = torch.max(output.data, 1)
            correct += torch.sum(pred == target)
            print_loss = loss.data.item()
            test_loss += print_loss
        correct = correct.data.item()
        acc = correct / total_num
        avgloss = test_loss / len(test_loader)
        print('\nVal set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            avgloss, correct, len(test_loader.dataset), 100 * acc))


# train

for epoch in range(1, EPOCHS + 1):
    train(model_ft, DEVICE, train_loader, optimizer, epoch)
    cosine_schedule.step()
    val(model_ft, DEVICE, test_loader)


2444 153
epoch:1,loss:1.6840852779500626
131 9

Val set: Average loss: 1.0671, Accuracy: 83/131 (63%)

2444 153
epoch:2,loss:0.9739675545224956
131 9

Val set: Average loss: 1.2341, Accuracy: 86/131 (66%)

2444 153
epoch:3,loss:0.8791875508096483
131 9

Val set: Average loss: 0.9557, Accuracy: 92/131 (70%)

2444 153
epoch:4,loss:0.769726320565526
131 9

Val set: Average loss: 0.9836, Accuracy: 96/131 (73%)

2444 153
epoch:5,loss:0.6089936103793531
131 9

Val set: Average loss: 1.0189, Accuracy: 91/131 (69%)

2444 153
epoch:6,loss:0.5428837967513044
131 9

Val set: Average loss: 0.9593, Accuracy: 88/131 (67%)

2444 153
epoch:7,loss:0.3786980456922179
131 9

Val set: Average loss: 1.0851, Accuracy: 94/131 (72%)

2444 153
epoch:8,loss:0.29092467305999176
131 9

Val set: Average loss: 1.3862, Accuracy: 96/131 (73%)

2444 153
epoch:9,loss:0.22106133495134855
131 9

Val set: Average loss: 0.9964, Accuracy: 99/131 (76%)

2444 153
epoch:10,loss:0.11575196952759928
131 9

Val set: Average loss:
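
The accuracy figures in the log come from comparing the index of the largest logit with the target label. A minimal sketch of that step on a hypothetical batch of three samples and three classes (the numbers are made up for illustration):

```python
import torch

# Hypothetical logits for 3 samples and 3 classes, plus their true labels.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0],
                       [1.5, 1.6, 0.0]])
targets = torch.tensor([0, 2, 0])

_, pred = torch.max(logits, 1)            # index of the largest logit per row
correct = (pred == targets).sum().item()  # number of matching predictions
accuracy = correct / len(targets)
print(pred.tolist(), correct, accuracy)
```

`torch.max(..., 1)` returns both the maximum values and their indices; only the indices are needed for classification.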

In [18]:
torch.save(model_ft, 'model.pth')
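
Saving the whole model object pickles it together with a reference to its class, so loading it requires the same class definitions to be importable. A common alternative is to save only the weights through `state_dict()`. The sketch below uses a small `nn.Linear` as a hypothetical stand-in for `model_ft` and a temporary file name chosen for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 5)  # hypothetical stand-in for model_ft

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.pth")   # hypothetical file name
    torch.save(model.state_dict(), path)      # save the weights only

    restored = nn.Linear(4, 5)                # rebuild the architecture first
    restored.load_state_dict(torch.load(path, map_location="cpu"))
    restored.eval()

# Check that every restored tensor matches the original.
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```

For the notebook's model this would mean calling `coatnet_3()`, replacing `fc` as in the training cell, and then loading the saved weights.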

Test

In [19]:
classes = (
     'daisy', 'dandelion', 'rose', 'sunflower', 'tulip'
)
     
transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

In [20]:
#load the saved model and move it to DEVICE
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.load("model.pth")
model.eval()
model.to(DEVICE)

CoAtNet(
  (s0): Sequential(
    (0): Sequential(
      (0): Conv2d(3, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): GELU(approximate=none)
    )
    (1): Sequential(
      (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): GELU(approximate=none)
    )
  )
  (s1): Sequential(
    (0): MBConv(
      (pool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (proj): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (conv): PreNorm(
        (norm): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fn): Sequential(
          (0): Conv2d(192, 768, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(768, eps=1e-05, momentum=

In [21]:
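#Read each picture and predict its category
path = 'drive/MyDrive/195A+BSeniorProjectGroupWorks/Datasets/test(christy)/'
testList = os.listdir(path)
for file in testList:
    img = Image.open(path + file).convert('RGB') #force 3 channels so Normalize always gets RGB input
    img = transform_test(img)
    img.unsqueeze_(0) #add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
    img = Variable(img).to(DEVICE)
    with torch.no_grad(): #gradients are not needed for inference
        out = model(img)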
    # Predict
    _, pred = torch.max(out.data, 1)
    print('Image Name: {}, predict: {}'.format(file, classes[pred.data.item()]))

Image Name: daisy.jpg, predict: daisy
Image Name: rose.jpg, predict: rose
Image Name: sunflower.jpg, predict: sunflower
Image Name: tulip.jpg, predict: tulip
Image Name: daisy2.jpg, predict: daisy
Image Name: daisy3.jpg, predict: daisy
Image Name: rose2.jpg, predict: rose
Image Name: rose3.jpg, predict: rose
Image Name: sunflower2.jpg, predict: sunflower
Image Name: sunflower3.jpg, predict: sunflower
Image Name: tulip2.jpg, predict: tulip
Image Name: tulip3.jpg, predict: tulip
Image Name: dandelion.jpg, predict: dandelion
Image Name: dandelion2.jpg, predict: dandelion
Image Name: dandelion3.jpg, predict: dandelion
Image Name: sunflower_dl.jpeg, predict: sunflower


In [22]:
#testing using the test set (daisy)
path = 'drive/MyDrive/195A+BSeniorProjectGroupWorks/Datasets/Flower_Dataset/flowers_train_test_set/test/daisy/'
testList = os.listdir(path)
for file in testList:
    img = Image.open(path + file)
    img = transform_test(img)
    img.unsqueeze_(0)
    img = Variable(img).to(DEVICE)
    out = model(img)
    # Predict
    _, pred = torch.max(out.data, 1)
    print('Image Name: {}, predict: {}'.format(file, classes[pred.data.item()]))

Image Name: 13583238844_573df2de8e_m.jpg, predict: daisy
Image Name: 1299501272_59d9da5510_n.jpg, predict: daisy
Image Name: 1286274236_1d7ac84efb_n.jpg, predict: tulip
Image Name: 10466290366_cc72e33532.jpg, predict: dandelion
Image Name: 14114116486_0bb6649bc1_m.jpg, predict: daisy
Image Name: 14087947408_9779257411_n.jpg, predict: daisy
Image Name: 10559679065_50d2b16f6d.jpg, predict: daisy
Image Name: 105806915_a9c13e2106_n.jpg, predict: daisy
Image Name: 14219214466_3ca6104eae_m.jpg, predict: daisy
Image Name: 34543119581_1fb7e0bd7f_n.jpg, predict: sunflower
Image Name: 2346726545_2ebce2b2a6.jpg, predict: daisy
Image Name: 34531542152_c8ba2e0fea_n.jpg, predict: daisy
Image Name: 34287492780_6dab677857_n.jpg, predict: dandelion
Image Name: 519880292_7a3a6c6b69.jpg, predict: tulip
Image Name: 147068564_32bb4350cc.jpg, predict: daisy
Image Name: 521762040_f26f2e08dd.jpg, predict: daisy
Image Name: 15760153042_a2a90e9da5_m.jpg, predict: daisy
Image Name: 34637970155_a2b917077c_n.jpg, 

In [23]:
#testing using the test set (tulip)
path = 'drive/MyDrive/195A+BSeniorProjectGroupWorks/Datasets/Flower_Dataset/flowers_train_test_set/test/tulip/'
testList = os.listdir(path)
for file in testList:
    img = Image.open(path + file)
    img = transform_test(img)
    img.unsqueeze_(0)
    img = Variable(img).to(DEVICE)
    out = model(img)
    # Predict
    _, pred = torch.max(out.data, 1)
    print('Image Name: {}, predict: {}'.format(file, classes[pred.data.item()]))

Image Name: 113291410_1bdc718ed8_n.jpg, predict: tulip
Image Name: 14069914777_b10856aaf7_n.jpg, predict: dandelion
Image Name: 12916441224_2ed63596f8_n.jpg, predict: tulip
Image Name: 16677542612_a78a8ca8b3_m.jpg, predict: tulip
Image Name: 14861513337_4ef0bfa40d.jpg, predict: tulip
Image Name: 13530786873_0d34880300_n.jpg, predict: tulip
Image Name: 14009216519_b608321cf2_n.jpg, predict: tulip
Image Name: 155097272_70feb13184.jpg, predict: tulip
Image Name: 13514131694_d91da4f4fc.jpg, predict: dandelion
Image Name: 17324469461_2b318aff8d_m.jpg, predict: tulip
Image Name: 142235017_07816937c6.jpg, predict: tulip
Image Name: 19317019453_8b24740faf_n.jpg, predict: tulip
Image Name: 17189456156_6fc1067831.jpg, predict: tulip
Image Name: 112951086_150a59d499_n.jpg, predict: tulip
Image Name: 133692329_c1150ed811_n.jpg, predict: tulip
Image Name: 3502632842_791dd4be18_n.jpg, predict: tulip
Image Name: 17291908295_dc7d45ae9b_n.jpg, predict: tulip
Image Name: 112428665_d8f3632f36_n.jpg, pred

In [24]:
#testing using the test set (rose)
path = 'drive/MyDrive/195A+BSeniorProjectGroupWorks/Datasets/Flower_Dataset/flowers_train_test_set/test/rose/'
testList = os.listdir(path)
for file in testList:
    img = Image.open(path + file)
    img = transform_test(img)
    img.unsqueeze_(0)
    img = Variable(img).to(DEVICE)
    out = model(img)
    # Predict
    _, pred = torch.max(out.data, 1)
    print('Image Name: {}, predict: {}'.format(file, classes[pred.data.item()]))

Image Name: 14154164774_3b39d36778.jpg, predict: rose
Image Name: 15818051327_9a44bf6244_n.jpg, predict: rose
Image Name: 18464075576_4e496e7d42_n.jpg, predict: tulip
Image Name: 22506717337_0fd63e53e9.jpg, predict: rose
Image Name: 14870567200_80cda4362e_n.jpg, predict: tulip
Image Name: 15965652160_de91389965_m.jpg, predict: rose
Image Name: 18490508225_0fc630e963_n.jpg, predict: dandelion
Image Name: 1775233884_12ff5a124f.jpg, predict: rose
Image Name: 18486124712_17ebe7559b_n.jpg, predict: rose
Image Name: 123128873_546b8b7355_n.jpg, predict: rose
Image Name: 20622485918_90fc000c86_n.jpg, predict: dandelion
Image Name: 14414117598_cf70df30de.jpg, predict: rose
Image Name: 14172324538_2147808483_n.jpg, predict: rose
Image Name: 12238827553_cf427bfd51_n.jpg, predict: rose
Image Name: 15699509054_d3e125286f_n.jpg, predict: rose
Image Name: 17305246720_1866d6303b.jpg, predict: rose
Image Name: 14687731322_5613f76353.jpg, predict: rose
Image Name: 1461381091_aaaa663bbe_n.jpg, predict: r

In [25]:
#testing using the test set (dandelion)
path = 'drive/MyDrive/195A+BSeniorProjectGroupWorks/Datasets/Flower_Dataset/flowers_train_test_set/test/dandelion/'
testList = os.listdir(path)
for file in testList:
    img = Image.open(path + file)
    img = transform_test(img)
    img.unsqueeze_(0)
    img = Variable(img).to(DEVICE)
    out = model(img)
    # Predict
    _, pred = torch.max(out.data, 1)
    print('Image Name: {}, predict: {}'.format(file, classes[pred.data.item()]))

Image Name: 14362539701_cf19e588ca.jpg, predict: dandelion
Image Name: 14396023703_11c5dd35a9.jpg, predict: sunflower
Image Name: 14021281124_89cc388eac_n.jpg, predict: dandelion
Image Name: 17570530696_6a497298ee_n.jpg, predict: dandelion
Image Name: 18996965033_1d92e5c99e.jpg, predict: dandelion
Image Name: 17276354745_2e312a72b5_n.jpg, predict: dandelion
Image Name: 11768468623_9399b5111b_n.jpg, predict: dandelion
Image Name: 142813254_20a7fd5fb6_n.jpg, predict: dandelion
Image Name: 14457225751_645a3784fd_n.jpg, predict: dandelion
Image Name: 18010259565_d6aae33ca7_n.jpg, predict: dandelion
Image Name: 14171812905_8b81d50eb9_n.jpg, predict: dandelion
Image Name: 18479635994_83f93f4120.jpg, predict: dandelion
Image Name: 15819121091_26a5243340_n.jpg, predict: dandelion
Image Name: 14761980161_2d6dbaa4bb_m.jpg, predict: dandelion
Image Name: 1297972485_33266a18d9.jpg, predict: sunflower
Image Name: 14368895004_c486a29c1e_n.jpg, predict: dandelion
Image Name: 138132145_782763b84f_m.jp

In [26]:
#testing using the test set (sunflower)
path = 'drive/MyDrive/195A+BSeniorProjectGroupWorks/Datasets/Flower_Dataset/flowers_train_test_set/test/sunflower/'
testList = os.listdir(path)
for file in testList:
    img = Image.open(path + file)
    img = transform_test(img)
    img.unsqueeze_(0)
    img = Variable(img).to(DEVICE)
    out = model(img)
    # Predict
    _, pred = torch.max(out.data, 1)
    print('Image Name: {}, predict: {}'.format(file, classes[pred.data.item()]))

Image Name: 12323859023_447387dbf0_n.jpg, predict: dandelion
Image Name: 10541580714_ff6b171abd_n.jpg, predict: sunflower
Image Name: 12282924083_fb80aa17d4_n.jpg, predict: sunflower
Image Name: 3946535195_9382dcb951_n.jpg, predict: dandelion
Image Name: 2807106374_f422b5f00c.jpg, predict: rose
Image Name: 3146795631_d062f233c1.jpg, predict: sunflower
Image Name: 20704967595_a9c9b8d431.jpg, predict: sunflower
Image Name: 14646282112_447cc7d1f9.jpg, predict: sunflower
Image Name: 1484598527_579a272f53.jpg, predict: dandelion
Image Name: 3001536784_3bfd101b23_n.jpg, predict: sunflower
Image Name: 14858674096_ed0fc1a130.jpg, predict: sunflower
Image Name: 20621698991_dcb323911d.jpg, predict: sunflower
Image Name: 14921668662_3ffc5b9db3_n.jpg, predict: sunflower
Image Name: 15218421476_9d5f38e732_m.jpg, predict: sunflower
Image Name: 22183521655_56221bf2a4_n.jpg, predict: sunflower
Image Name: 4664767140_fe01231322_n.jpg, predict: sunflower
Image Name: 23356825566_f5885875f2.jpg, predict: 
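
The per-folder listings above are easier to compare once they are reduced to one accuracy figure per class. A minimal sketch of that tally follows; the helper `per_class_accuracy` and the sample pairs are hypothetical and are not taken from the runs above.

```python
from collections import Counter

def per_class_accuracy(pairs):
    """Return {true label: fraction predicted correctly} for (true, predicted) pairs."""
    total = Counter()
    correct = Counter()
    for true_label, predicted in pairs:
        total[true_label] += 1
        if predicted == true_label:
            correct[true_label] += 1
    return {label: correct[label] / total[label] for label in total}

# Hypothetical (true label, predicted label) pairs for illustration only.
pairs = [
    ("daisy", "daisy"), ("daisy", "tulip"), ("daisy", "daisy"), ("daisy", "daisy"),
    ("rose", "rose"), ("rose", "dandelion"),
]
acc = per_class_accuracy(pairs)
print(acc)
```

In the notebook, the true label would be the test folder name and the predicted label would be `classes[pred.data.item()]`, collected inside each prediction loop.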

Note: the larger dataset yields worse results.
Why?
*Figure this out and make it a novelty point*

Priority: expand the dataset

In [27]:
# # Show input and output of a directly uploaded picture
# from google.colab import files
# uploaded = files.upload()
# # Show the image
# from matplotlib import pyplot as plt
# input = plt.imread("")
# img = plt.imshow(input)
# # Resize the image
# from skimage.transform import resize
# resized_img = resize(input, (32,32,3))
# img = plt.imshow(resized_img)
# # Get model predictions
# img = transform_test(img)
# img.unsqueeze_(0)
# out = model(img)
# _, pred = torch.max(out.data, 1)
# print('Image Name: {}, predict: {}'.format(file, classes[pred.data.item()]))
