# Real world data representation using Tensors

### Goal of the chapter

How to prepare data pipelines for different scenarios, such as:
- Image
- Tabular
- Text

## Working with images
- An image is a collection of numbers arranged in a grid of height * width
- Each number represents the value of a pixel
- Consumer cameras typically store each pixel value as an 8-bit integer; medical imaging may use 12- or 16-bit integers.
- Color images use formats such as RGB, where each pixel holds one value per color channel.
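
To make the points above concrete, here is a minimal sketch of a tiny RGB image built by hand. The array name `tiny_img` and its 2 * 3 size are hypothetical, chosen only for illustration.

```python
import numpy as np

# A hypothetical 2 x 3 RGB image: height 2, width 3, 3 color channels.
# uint8 gives each channel value the 8-bit range 0..255.
tiny_img = np.zeros((2, 3, 3), dtype=np.uint8)
tiny_img[0, 0] = [255, 0, 0]   # top-left pixel is pure red
tiny_img[1, 2] = [0, 0, 255]   # bottom-right pixel is pure blue

height, width, channels = tiny_img.shape
```

Every pixel not set explicitly stays black, since all three of its channel values are 0.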

### Loading an Image

In [1]:
import imageio

In [3]:
img = imageio.imread('imgs/bobby.jpeg')

In [6]:
img.shape

(720, 1280, 3)

The image is laid out as H*W*C, whereas PyTorch expects C*H*W.
- Let's convert the layout to PyTorch style
- Let's also look at how to create a batch of images
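
As a minimal sketch of the layout change for a single image, the snippet below uses a random array as a stand-in for a loaded picture. The names `hwc`, `chw`, and `batch` are hypothetical; the shape matches the one printed above.

```python
import numpy as np
import torch

# Stand-in for a loaded image: random uint8 values in H x W x C layout.
hwc = torch.from_numpy(
    np.random.randint(0, 256, size=(720, 1280, 3), dtype=np.uint8)
)

# permute reorders the dimensions without copying data: (H, W, C) -> (C, H, W).
chw = hwc.permute(2, 0, 1)

# unsqueeze adds a leading batch dimension: (C, H, W) -> (N, C, H, W), N = 1.
batch = chw.unsqueeze(0)
```

Because `permute` returns a view, `chw[c, h, w]` and `hwc[h, w, c]` refer to the same underlying value.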

In [55]:
from pathlib import Path
import torch
import torch.nn.functional as F
from PIL import Image
import numpy as np

In [18]:
# `file` is a path to an image file, such as the ones iterated over in the loop further down
im_array = imageio.imread(file)

In [81]:
path = Path('/home/.fastai/data/oxford-iiit-pet/images')

In [87]:
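images = torch.zeros(4,3,224,224,dtype=torch.uint8)

for i, file in enumerate(list(path.iterdir())[:4]):
    # Force 3 channels so grayscale or RGBA files still fit the batch
    img = Image.open(file).convert('RGB')
    img = img.resize((224,224))
    # H x W x C NumPy array -> tensor
    t_img = torch.from_numpy(np.array(img))
    # Reorder to C x H x W before storing in the batch
    images[i] = t_img.permute(2,0,1)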

In [88]:
images.shape

torch.Size([4, 3, 224, 224])

The book uses a different library, but illustrates the same concept.

There are two common ways to normalize the data:
- Divide by 255 to scale pixel values into the range [0, 1]
- Subtract the mean and divide by the standard deviation of each channel

In [89]:
images = images.float()
images /= 255


In [90]:
images.min(), images.max()

(tensor(0.), tensor(0.9843))

In [91]:
for ch in range(3):
    mean = images[:,ch].mean()
    std = images[:,ch].std()
    images[:,ch] = (images[:,ch]-mean)/std
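
The per-channel loop can also be written without a loop. This is a minimal sketch on a random stand-in batch; the names `batch` and `normalized` are hypothetical, and the statistics are computed over the batch, height, and width dimensions, just as the loop does.

```python
import torch

# Hypothetical batch in N x C x H x W layout with values in [0, 1).
batch = torch.rand(4, 3, 224, 224)

# Reduce over batch, height, and width, keeping one statistic per channel.
mean = batch.mean(dim=(0, 2, 3), keepdim=True)   # shape (1, 3, 1, 1)
std = batch.std(dim=(0, 2, 3), keepdim=True)     # shape (1, 3, 1, 1)

# Broadcasting applies each channel's mean and std to all of its pixels.
normalized = (batch - mean) / std
```

After this step each channel has approximately zero mean and unit standard deviation.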
    

## Representing Tabular Data

In [98]:
import pandas as pd

In [96]:
wineq_numpy = np.loadtxt('winequality-white.csv', dtype=np.float32, delimiter=";",
skiprows=1)

In [97]:
wineq_numpy

array([[ 7.  ,  0.27,  0.36, ...,  0.45,  8.8 ,  6.  ],
       [ 6.3 ,  0.3 ,  0.34, ...,  0.49,  9.5 ,  6.  ],
       [ 8.1 ,  0.28,  0.4 , ...,  0.44, 10.1 ,  6.  ],
       ...,
       [ 6.5 ,  0.24,  0.19, ...,  0.46,  9.4 ,  6.  ],
       [ 5.5 ,  0.29,  0.3 , ...,  0.38, 12.8 ,  7.  ],
       [ 6.  ,  0.21,  0.38, ...,  0.32, 11.8 ,  6.  ]], dtype=float32)

In [101]:
df = pd.read_csv('winequality-white.csv',delimiter=';')

In [102]:
df.head()

|   | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
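
To turn a table like this into tensors, one possible approach is sketched below. It is an assumption, not something shown in the cells above: the two-row `df_small` is a hypothetical stand-in with a subset of the columns, and treating the last column (`quality`) as the target is a modelling choice.

```python
import pandas as pd
import torch

# Hypothetical two-row stand-in for the wine DataFrame (subset of columns).
df_small = pd.DataFrame({
    "fixed acidity": [7.0, 6.3],
    "alcohol": [8.8, 9.5],
    "quality": [6, 6],
})

# Convert all columns to one float32 tensor of shape (rows, columns).
wine_t = torch.from_numpy(df_small.to_numpy(dtype="float32"))

# Treat the last column as the target and the rest as input features.
features = wine_t[:, :-1]
target = wine_t[:, -1].long()
```

The same slicing would apply to a tensor built from `wineq_numpy`, since both hold the quality score in the last column.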

