# torchvision.io.read_image

Reads a JPEG or PNG image into a 3-dimensional tensor of shape `(channel, height, width)`; an RGB image has 3 channels. The values of the output tensor are uint8 in [0, 255].

For example, take an input image that is 2px wide and 4px high.

`print(input.shape)` output: (3, 4, 2)

`print(input)` output
```
[[[248,   0],
  [  0, 246],
  [244,   0],
  [240,   1]],

 [[  0, 248],
  [  0, 244],
  [  1, 242],
  [239,   3]],

 [[  0,   1],
  [246,   1],
  [243, 243],
  [237,   2]]
]
```

`print(input)` expanded output
```
[
  [
    [248,   0],
    [  0, 246],
    [244,   0],
    [240,   1]
  ],

  [
    [  0, 248],
    [  0, 244],
    [  1, 242],
    [239,   3]
  ],

  [
    [  0,   1],
    [246,   1],
    [243, 243],
    [237,   2]
  ]
]
```

```python
from torchvision.io import read_image

input = read_image("1-rgb2x4.jpg")

print("input.shape")
print(input.shape)
print("input")
print(input)
```

Output:
```
input.shape
torch.Size([3, 4, 2])
input
tensor([[[248,   0],
         [  0, 246],
         [244,   0],
         [240,   1]],

        [[  0, 248],
         [  0, 244],
         [  1, 242],
         [239,   3]],

        [[  0,   1],
         [246,   1],
         [243, 243],
         [237,   2]]], dtype=torch.uint8)
```


# What is the difference between torchvision.io.read_image and cv2.imread?

`torchvision.io.read_image` returns a tensor of shape `(channel, height, width)` with channels in RGB order. `cv2.imread` returns a NumPy array of shape `(height, width, channel)` with channels in BGR order.

`torchvision.io.read_image`
1. `read_image_output[0]` returns the `image_red` channel
2. `read_image_output[1]` returns the `image_green` channel
3. `read_image_output[2]` returns the `image_blue` channel

`cv2.imread`
1. `imread_output[:,:,0]` returns the `image_blue` channel
2. `imread_output[:,:,1]` returns the `image_green` channel
3. `imread_output[:,:,2]` returns the `image_red` channel
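
The two layouts can be converted into each other by reordering the axes and reversing the channel order. The following is a minimal sketch that uses plain nested lists so it runs without `torch` or `cv2`; the helper name `bgr_hwc_to_rgb_chw` and the 2x2 sample pixels are made up for illustration.

```python
def bgr_hwc_to_rgb_chw(image):
    """Convert a nested list shaped (height, width, 3) in BGR order
    into a nested list shaped (3, height, width) in RGB order."""
    height = len(image)
    width = len(image[0])
    # In BGR order, index 2 is red, 1 is green, and 0 is blue,
    # so iterating over (2, 1, 0) yields the channels as R, G, B.
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in (2, 1, 0)
    ]


# Hypothetical 2x2 image in the cv2.imread layout: each pixel is [blue, green, red].
bgr_image = [
    [[0, 0, 248], [248, 0, 0]],
    [[0, 246, 0], [1, 244, 246]],
]

rgb_chw = bgr_hwc_to_rgb_chw(bgr_image)
print("red channel:  ", rgb_chw[0])
print("green channel:", rgb_chw[1])
print("blue channel: ", rgb_chw[2])
```

After the conversion, indexing the first axis gives the red, green, and blue channels, the same way `read_image_output[0]`, `[1]`, and `[2]` do.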

# What is torch.nn.Conv2d?

Applying a 2D convolution with `out_channels=4`, `kernel_size=(3,2)`, and `stride=1` to an input 4px wide and 5px high produces an output of shape `(4, 3, 3)`, ordered as `(channel, height, width)`. With `stride=1` the kernel moves 1 pixel at a time. The kernel is 3px high and 2px wide, so it fits in 5 - 3 + 1 = 3 vertical positions and 4 - 2 + 1 = 3 horizontal positions. An image 4px wide and 5px high therefore produces a new image 3px wide and 3px high for each output channel.
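
The output size can be checked with the formula given in the `torch.nn.Conv2d` documentation. The sketch below is plain Python; the helper name `conv_output_size` is made up for illustration, and the default `padding=0` and `dilation=1` match the `Conv2d` defaults.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one axis (height or width) of a convolution."""
    return (input_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Input is 5px high and 4px wide; kernel_size=(3, 2) means height 3, width 2.
out_height = conv_output_size(5, 3)
out_width = conv_output_size(4, 2)

# With out_channels=4 the output shape is (channel, height, width).
print((4, out_height, out_width))
```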

`print(input[0])` output (the red channel)
```
[[254, 255,   0, 255],
 [255,   0, 254, 251],
 [252, 254,   0, 255],
 [254,   1, 254, 255],
 [255, 255, 253, 255]]
```

`print(m.weight[0][0])` output (the first filter's kernel for the red channel). The weights are randomly initialized, so the values differ from run to run.
```
[[ 0.1127, -0.0228],
 [-0.2330, -0.1881],
 [ 0.2010,  0.0741]]
```

In this case, the `torch.nn.Conv2d` layer has 4 filters. Each filter has 3 kernels, one per input channel.

So how do filters of shape `(4, 3, 3, 2)` become an output of shape `(4, 3, 3)`?

Each output element is computed by multiplying the highlighted input region (blue) elementwise with the filter (red), summing the products over all 3 channels, and then adding the bias.
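
The computation for a single filter can be sketched in plain Python, assuming `stride=1` and no padding. This illustrates the arithmetic and is not how PyTorch implements it; the function name `conv2d_single_filter` and the toy values are made up for illustration.

```python
def conv2d_single_filter(image, kernels, bias=0.0):
    """Apply one filter to an image.

    image:   nested list shaped (channels, height, width)
    kernels: nested list shaped (channels, kernel_height, kernel_width)
    Returns one output channel shaped (out_height, out_width).
    """
    channels = len(image)
    in_h, in_w = len(image[0]), len(image[0][0])
    k_h, k_w = len(kernels[0]), len(kernels[0][0])
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1

    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = bias  # start from the bias, then add every product
            for c in range(channels):
                for i in range(k_h):
                    for j in range(k_w):
                        total += image[c][row + i][col + j] * kernels[c][i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Toy input: 3 channels, 5px high, 4px wide; channel c is filled with c + 1.
image = [[[c + 1] * 4 for _ in range(5)] for c in range(3)]
# One filter: 3 kernels (one per channel), each 3 high and 2 wide, all ones.
kernels = [[[1, 1] for _ in range(3)] for _ in range(3)]

output = conv2d_single_filter(image, kernels, bias=0.5)
print(len(output), len(output[0]))  # output height and width
print(output[0][0])
```

Each filter collapses all 3 input channels into one output channel. Running 4 filters and stacking their results gives the `(4, 3, 3)` output.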

## Convolution demo

References:
1. [Stanford University CS231n](https://cs231n.github.io/convolutional-networks/)
2. [University of Massachusetts Amherst](https://compsci682.github.io/notes/convolutional-networks/)

![Convolution Demo](./3-convolution-demo.gif)

```python
import torch.nn as nn
from torchvision.io import read_image

# Define the Conv2d layer: 3 input channels (RGB), 4 filters, kernel 3 high and 2 wide
m = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=(3,2))

# Read a JPEG image into a (channel, height, width) uint8 tensor with values in [0, 255]
input = read_image("2-rgb4x5.jpg")
print("input[0] with [0, 255] range")
print(input[0])
# Normalize the values to the [0, 1] range
input = input.float() / 255.0

# Perform the convolution on the unbatched (channel, height, width) input
output = m(input)

# Print the weights, the input shape, and the output
print("m.weight.shape or kernel")
print(m.weight.shape)
print("m.weight[0][0]")
print(m.weight[0][0])
print("input.shape")
print(input.shape)
print("output.shape")
print(output.shape)
print(output)
```

Output:
```
input[0] with [0, 255] range
tensor([[254, 255,   0, 255],
        [255,   0, 254, 251],
        [252, 254,   0, 255],
        [254,   1, 254, 255],
        [255, 255, 253, 255]], dtype=torch.uint8)
m.weight.shape or kernel
torch.Size([4, 3, 3, 2])
m.weight[0][0]
tensor([[-0.0153, -0.0690],
        [-0.1429,  0.0850],
        [-0.0061,  0.0010]], grad_fn=<SelectBackward0>)
input.shape
torch.Size([3, 5, 4])
output.shape
torch.Size([4, 3, 3])
tensor([[[-0.4844, -0.0885, -0.3485],
         [ 0.4091,  0.0825,  0.1730],
         [-0.4535,  0.1701, -0.0270]],

        [[ 0.3509,  0.6484,  0.4705],
         [ 0.0720,  0.5867,  0.5814],
         [ 0.2734,  0.3807,  0.4345]],

        [[-0.4542, -0.8518, -0.6748],
         [-0.6224, -0.2140, -0.8990],
         [-0.6336, -0.2801, -0.5722]],

        [[-0.1772, -0.1944,  0.1835],
         [-0.4923, -0.1451,  0.0016],
         [-0.0023, -0.3033, -0.0992]]], grad_fn=<SqueezeBackward1>)
```
