### This notebook is optionally accelerated with a GPU runtime.
### If you would like to use this acceleration, please select the menu option "Runtime" -> "Change runtime type", select "Hardware Accelerator" -> "GPU" and click "SAVE"

----------------------------------------------------------------------

# Inception_v3

*Author: PyTorch Team*

**Also called GoogLeNet v3, a well-known ConvNet from 2015 trained on ImageNet**

<img src="https://pytorch.org/assets/images/inception_v3.png" alt="alt" width="50%"/>

In [40]:
import torch
import torch.nn as nn
from torchsummary import summary

def calculate_flops_hook(module, input, output):
    """Forward hook: add an approximate per-image FLOP count to module.total_flops.

    Container modules (BasicConv2d, InceptionA, ...) match no branch and add 0,
    so each operation is counted once, on the leaf module that performs it.
    """
    x = input[0]
    batch_size = x.shape[0]
    flops = 0

    # Check specific layer types first. A generic hasattr(module, 'weight')
    # test would also match BatchNorm2d and hide the branches below it.
    if isinstance(module, nn.Conv2d):
        # weight.numel() = out_channels * (in_channels // groups) * kH * kW,
        # which also covers grouped and non-square (1x7, 7x1) kernels.
        output_positions = output.shape[2] * output.shape[3]
        flops = module.weight.numel() * output_positions * 2  # 1 multiply-add = 2 FLOPs

    elif isinstance(module, nn.Linear):
        # Fully connected layer: one multiply-add per weight
        flops = module.weight.numel() * 2

    elif isinstance(module, nn.BatchNorm2d):
        # Counted as two operations per element (normalize, then scale/shift)
        flops = x.numel() // batch_size * 2

    elif isinstance(module, (nn.ReLU, nn.ReLU6)):
        flops = x.numel() // batch_size  # one comparison per element

    elif isinstance(module, nn.MaxPool2d):
        k = module.kernel_size
        kernel_h, kernel_w = k if isinstance(k, tuple) else (k, k)
        # One comparison per kernel element for every output element
        flops = output.numel() // batch_size * kernel_h * kernel_w

    elif isinstance(module, nn.AdaptiveAvgPool2d):
        # Coarse estimate: one operation per output element
        flops = output.numel() // batch_size

    # Accumulate exactly once per forward call
    module.total_flops += flops

    

def add_flops_counter_to_model(model):
    for module in model.modules():
        module.register_buffer('total_flops', torch.zeros(1, dtype=torch.float64))
        module.register_forward_hook(calculate_flops_hook)

model = torch.hub.load('pytorch/vision:v0.10.0', 'inception_v3', pretrained=True)
model.eval()  # inference mode: use stored batch-norm statistics and disable dropout

add_flops_counter_to_model(model)
      
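total_flops = 0
# The model is still on the CPU here, so have torchsummary build its dummy input there too
summary(model, (3, 299, 299), device='cpu')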

for module in model.modules():
    if hasattr(module, 'total_flops'):
        total_flops += module.total_flops.item()
        print(f'{module.__class__.__name__}: {module.total_flops.item() / 1e9:.3f} GFLOPs')

print(f'Total FLOPs: {total_flops  / 1e9:.3f} GFLOPs')


Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.10.0


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 32, 149, 149]             864
       BatchNorm2d-2         [-1, 32, 149, 149]              64
       BasicConv2d-3         [-1, 32, 149, 149]               0
            Conv2d-4         [-1, 32, 147, 147]           9,216
       BatchNorm2d-5         [-1, 32, 147, 147]              64
       BasicConv2d-6         [-1, 32, 147, 147]               0
            Conv2d-7         [-1, 64, 147, 147]          18,432
       BatchNorm2d-8         [-1, 64, 147, 147]             128
       BasicConv2d-9         [-1, 64, 147, 147]               0
        MaxPool2d-10           [-1, 64, 73, 73]               0
           Conv2d-11           [-1, 80, 73, 73]           5,120
      BatchNorm2d-12           [-1, 80, 73, 73]             160
      BasicConv2d-13           [-1, 80, 73, 73]               0
           Conv2d-14          [-1, 192,
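
As a rough sanity check on the hook's convolution estimate, the figures in the summary table above can be plugged in by hand. The sketch below assumes the common convention of two FLOPs per multiply-add and ignores bias terms (the parameter counts in the table suggest these convolutions have none). `conv2d_flops` is a hypothetical helper, not part of torchsummary or PyTorch.

```python
def conv2d_flops(weight_numel, out_h, out_w):
    """Approximate per-image FLOPs of a bias-free convolution (2 FLOPs per multiply-add)."""
    return 2 * weight_numel * out_h * out_w

# (layer name, parameter count, output height, output width), copied from the table above
layers = [
    ("Conv2d-1", 864, 149, 149),
    ("Conv2d-4", 9216, 147, 147),
    ("Conv2d-7", 18432, 147, 147),
    ("Conv2d-11", 5120, 73, 73),
]

for name, params, h, w in layers:
    print(f"{name}: {conv2d_flops(params, h, w) / 1e9:.3f} GFLOPs")
```

If the hook reports noticeably different per-layer numbers for these layers, that usually points at batch-size or double-counting differences rather than at the model itself.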

All pre-trained models expect input images normalized in the same way,
i.e. mini-batches of 3-channel RGB images of shape `(3 x H x W)`, where `H` and `W` are expected to be at least `299`.
The images have to be loaded into a range of `[0, 1]` and then normalized using `mean = [0.485, 0.456, 0.406]`
and `std = [0.229, 0.224, 0.225]`.
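
The normalization described above is a per-channel `(value - mean) / std`. A minimal pure-Python sketch for a single pixel follows; `normalize_pixel` and the constant names are hypothetical helpers introduced only for illustration, and it assumes the pixel has already been scaled to `[0, 1]`.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return [
        (value - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    ]

# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel([0.485, 0.456, 0.406]))
# A pure white pixel lands a bit more than two standard deviations above the mean.
print(normalize_pixel([1.0, 1.0, 1.0]))
```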

Here's a sample execution.

In [None]:
# Download an example image from the PyTorch Hub GitHub repository
import urllib.request
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)

In [None]:
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
preprocess = transforms.Compose([
    transforms.Resize(299),
    transforms.CenterCrop(299),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)
# Tensor of shape 1000, with confidence scores over ImageNet's 1000 classes
print(output[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
probabilities = torch.nn.functional.softmax(output[0], dim=0)
print(probabilities)
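
The softmax step above turns raw scores into positive values that sum to one. A minimal pure-Python sketch of the same computation, on a made-up three-class score vector (the numbers are illustrative, not model output):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

The ordering of the scores is preserved, so the top-scoring class is the same before and after the softmax.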

In [None]:
# Download ImageNet labels
!wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt

In [None]:
# Read the categories
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]
# Show top categories per image
top5_prob, top5_catid = torch.topk(probabilities, 5)
for i in range(top5_prob.size(0)):
    print(categories[top5_catid[i]], top5_prob[i].item())

### Model Description

Inception v3 is based on exploring ways to scale up networks that use the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization. The authors benchmark their methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single-frame evaluation, using a network with a computational cost of 5 billion multiply-adds per inference and fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, they report 3.5% top-5 error on the validation set (3.6% error on the test set) and 17.3% top-1 error on the validation set.
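
One way to read "suitably factorized convolutions" is through weight counts. The referenced paper discusses replacing a 5x5 convolution with two stacked 3x3 convolutions, and an n x n convolution with a 1 x n convolution followed by an n x 1 convolution. The sketch below only counts kernel weights per input/output channel pair under that reading; `weights_per_channel_pair` is a hypothetical helper, and the comparison ignores biases and any change in channel widths between the stacked layers.

```python
def weights_per_channel_pair(kernel_shapes):
    """Total kernel weights for a stack of convolutions, per input/output channel pair."""
    return sum(kh * kw for kh, kw in kernel_shapes)

single_5x5 = weights_per_channel_pair([(5, 5)])
two_3x3 = weights_per_channel_pair([(3, 3), (3, 3)])
single_7x7 = weights_per_channel_pair([(7, 7)])
asymmetric_7 = weights_per_channel_pair([(1, 7), (7, 1)])

print(f"5x5: {single_5x5} weights, two 3x3: {two_3x3} weights")
print(f"7x7: {single_7x7} weights, 1x7 then 7x1: {asymmetric_7} weights")
```

Both factorizations keep the same receptive field as the larger kernel while using fewer weights, which is the kind of saving the description refers to.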

The 1-crop error rates on the ImageNet dataset with the pretrained model are listed below.

| Model structure | Top-1 error (%) | Top-5 error (%) |
| --------------- | --------------- | --------------- |
| inception_v3    | 22.55           | 6.44            |

### References

 - [Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/abs/1512.00567).