<span style="font-size:3em;">Ch4: Real-world data
 representation
 using tensors</span>
 
 Each section in this chapter will describe a data type, and each will come with its
own dataset. We’ll be using a lot of image and volumetric data through the rest of the book,
since those are common data types and they reproduce well in book format. We’ll also
cover tabular data, time series, and text, as those will also be of interest to a number of
our readers. 

In every section, we will stop where a deep learning researcher would start: right
before feeding the data to a model.


In [4]:
import torch

# 4.1 Working with Images

## 4.1.1 Images as an array
 An image is represented as a collection of scalars arranged in a regular grid with a
height and a width (in pixels). We might have a single scalar per grid point (the
pixel), which would be represented as a grayscale image; or multiple scalars per grid
point, which would typically represent different colors, as we saw in the previous chapter, or different features like depth from a depth camera.

Scalars representing values at individual pixels are often encoded using 8-bit integers, as in consumer cameras. In medical, scientific, and industrial applications, it is
not unusual to find higher numerical precision, such as 12-bit or 16-bit. This allows a
wider range or increased sensitivity in cases where the pixel encodes information
about a physical property, like bone density, temperature, or depth.
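
To make these bit depths concrete, the following sketch computes the largest value each encoding can hold and rescales a single 12-bit reading to the unit interval. The sample reading and the variable names are illustrative assumptions, not values from any dataset in this chapter.

```python
import numpy as np

# Largest unsigned value representable at each bit depth.
max_8bit = 2**8 - 1      # typical consumer camera
max_12bit = 2**12 - 1    # higher-precision sensors
max_16bit = int(np.iinfo(np.uint16).max)

# NumPy has no 12-bit dtype, so 12-bit data is held in a 16-bit container here.
# The reading below is a made-up value for illustration.
reading = np.array([2048], dtype=np.uint16)
scaled = reading.astype(np.float32) / max_12bit  # fraction of the full 12-bit range
```

Dividing by the maximum of the actual bit depth, rather than of the container type, keeps the full sensor range mapped onto 0 to 1.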

## 4.1.2 Loading an Image file
Images come in several different file formats, but luckily there are plenty of ways to
load images in Python. Let’s start by loading an image file using the <b>imageio</b> module.

<b>NOTE:</b> We’ll use imageio throughout the chapter because it handles different
data types with a uniform API. For many purposes, using TorchVision is a
great default choice to deal with image and video data. We go with imageio
here for somewhat lighter exploration.

In [1]:
import imageio

In [2]:
img_arr = imageio.imread('../dlwpt-code/data/p1ch4/image-dog/bobby.jpg')
img_arr.shape

(720, 1280, 3)

At this point, img_arr is a NumPy array-like object with three dimensions: two spatial
dimensions, height and width; and a third dimension corresponding to the red,
green, and blue channels. Any library that outputs a NumPy array will suffice to obtain
a PyTorch tensor. 

## 4.1.3 Changing the Layout

The only thing to watch out for is the layout of the dimensions.
PyTorch modules dealing with image data require tensors to be laid out as <b>C × H × W </b>:
channels, height, and width, respectively. (TensorFlow uses H × W × C. This does not make a huge difference; we just need to make sure we change the layout accordingly.)


Changing the layout using <b>permute</b> does not make a copy: the result uses the same storage as the input, and only the size and stride information changes. This means the operation is extremely cheap.

We can use the tensor’s permute method, listing the old dimension that should occupy each new position, to get to an appropriate layout. Given an input tensor H × W × C as obtained previously, we get a proper layout by putting dimension 2 (the channels) first, followed by dimensions 0 and 1:


In [5]:
img = torch.from_numpy(img_arr)
out = img.permute(2,0,1)

# Note: changing a pixel in img will also change out, since out is not a copy but a new view of the same storage
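
The view behavior of permute can be checked on a small synthetic tensor. This is a minimal sketch: the 4 × 5 × 3 tensor stands in for a loaded image and is an assumption made so the example runs without any file on disk.

```python
import torch

# Synthetic stand-in for an H x W x C image (4 rows, 5 columns, 3 channels).
hwc = torch.zeros(4, 5, 3, dtype=torch.uint8)
chw = hwc.permute(2, 0, 1)  # reorder to C x H x W

# Both tensors start at the same memory address; only the strides differ.
same_storage = chw.data_ptr() == hwc.data_ptr()
hwc_stride = hwc.stride()
chw_stride = chw.stride()

# Writing through the original tensor is visible through the permuted view.
hwc[0, 0, 2] = 255
value_seen_in_view = chw[2, 0, 0].item()

# When an independent, densely laid-out copy is needed, request one explicitly.
chw_copy = chw.contiguous()
```

Because the permuted tensor is not contiguous, calling contiguous() on it allocates new storage, which is the point at which the cost of a copy is actually paid.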

To create a dataset of multiple images to use as an input for our
neural networks, we store the images in a batch along the first dimension to obtain an
N × C × H × W tensor. A slightly more efficient alternative to using torch.stack is to <b>preallocate a tensor of the appropriate size and shape and fill it with images loaded from the directory</b>.

In [6]:
batch_size = 3
batch = torch.zeros(batch_size, 3, 256, 256, dtype=torch.uint8)

#We can now load all PNG images from an input directory and store them in the tensor:
import os

data_dir = '../dlwpt-code/data/p1ch4/image-cats/'
filenames = [name for name in os.listdir(data_dir) if os.path.splitext(name)[-1] == '.png']

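for i, filename in enumerate(filenames):
    img_arr = imageio.imread(os.path.join(data_dir, filename))
    img_t = torch.from_numpy(img_arr)
    img_t = img_t.permute(2,0,1)  # H x W x C -> C x H x W
    img_t = img_t[:3]  # keep only the first three (RGB) channels, dropping any alpha channel
    batch[i] = img_t  # each image must match the preallocated 3 x 256 x 256 slot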
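
For comparison, the stack-based approach can be sketched as follows. The random 3 × 8 × 8 tensors are hypothetical stand-ins for decoded images; with real data, each element would come from a loading loop like the one above.

```python
import torch

# Hypothetical stand-ins for three decoded C x H x W images.
images = [torch.randint(0, 256, (3, 8, 8), dtype=torch.uint8) for _ in range(3)]

# torch.stack adds a new leading dimension, giving an N x C x H x W tensor.
stacked = torch.stack(images)

# Preallocating and filling slot by slot produces the same tensor.
prealloc = torch.zeros(len(images), 3, 8, 8, dtype=torch.uint8)
for i, image in enumerate(images):
    prealloc[i] = image
```

Stacking requires holding every image in a list before the batch is built, whereas the preallocated version writes each image directly into its slot.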

## 4.1.4 Normalizing the Data
Neural networks exhibit the best training performance when the input
data ranges roughly from 0 to 1, or from -1 to 1 (this is an effect of how their building
blocks are defined).
 So a typical thing we’ll want to do is cast a tensor to floating-point and normalize
the values of the pixels. Casting to floating-point is easy, but normalization is trickier,
as it depends on what range of the input we decide should lie between 0 and 1 (or -1
and 1).

In [14]:
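batch = batch.float()  # cast from uint8 to 32-bit floating point
batch /= 255.0  # scale pixel values to the range [0, 1]

# Standardize each channel to zero mean and unit standard deviation across the batch
n_channels = batch.shape[1]
for c in range(n_channels):
    mean = torch.mean(batch[:,c])
    std = torch.std(batch[:,c])
    batch[:,c] = (batch[:,c] - mean) / std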

<b>NOTE:</b> Here, we normalize just a single batch of images because we do not
know yet how to operate on an entire dataset. In working with images, it is good
practice to compute the mean and standard deviation on all the training data
in advance and then subtract and divide by these fixed, precomputed quantities.
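
The practice described in this note can be sketched as below, assuming the whole training set fits in a single N × C × H × W float tensor. The tensor train_data and the helper normalize are hypothetical names introduced for illustration, and the data is random.

```python
import torch

torch.manual_seed(0)
# Hypothetical training set: 16 RGB images of 8 x 8 pixels, already scaled to [0, 1].
train_data = torch.rand(16, 3, 8, 8)

# Per-channel statistics, reduced over the batch and both spatial dimensions.
channel_mean = train_data.mean(dim=(0, 2, 3))
channel_std = train_data.std(dim=(0, 2, 3))

def normalize(batch):
    """Apply the fixed, precomputed statistics to any N x C x H x W batch."""
    return (batch - channel_mean[None, :, None, None]) / channel_std[None, :, None, None]

train_normalized = normalize(train_data)
# Later batches reuse the training statistics instead of computing their own.
new_batch_normalized = normalize(torch.rand(4, 3, 8, 8))
```

Reusing the same statistics for every batch keeps the mapping from pixel values to network inputs identical during training and at inference time.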

We can perform several other operations on inputs, such as geometric transformations like rotations, scaling, and cropping. These may help with training or may be
required to make an arbitrary input conform to the input requirements of a network,
like the size of the image.
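
A few of these transformations can be expressed directly with tensor operations. The sketch below shows one possible way to do it, not necessarily the approach used later in the book; the input batch is random and the sizes are arbitrary choices.

```python
import torch
import torch.nn.functional as F

# Hypothetical float batch in N x C x H x W layout (64 rows, 48 columns).
images = torch.rand(2, 3, 64, 48)

# Center crop to 32 x 32 using plain slicing.
top = (images.shape[2] - 32) // 2
left = (images.shape[3] - 32) // 2
cropped = images[:, :, top:top + 32, left:left + 32]

# Rescale to a fixed input size with bilinear interpolation.
resized = F.interpolate(images, size=(224, 224), mode='bilinear', align_corners=False)

# Rotate by 90 degrees in the image plane; height and width swap.
rotated = torch.rot90(images, k=1, dims=(2, 3))
```

Cropping by slicing returns a view of the original batch, while interpolation always allocates a new tensor.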

# 4.2 3D Images: Volumetric Data