# Broadcasting

The term **broadcasting** describes how arrays with different shapes are treated during arithmetic operations.

From the [NumPy documentation](https://docs.scipy.org/doc/numpy-1.10.0/user/basics.broadcasting.html):

    The term broadcasting describes how numpy treats arrays with 
    different shapes during arithmetic operations. Subject to certain 
    constraints, the smaller array is “broadcast” across the larger 
    array so that they have compatible shapes. Broadcasting provides a 
    means of vectorizing array operations so that looping occurs in C
    instead of Python. It does this without making needless copies of 
    data and usually leads to efficient algorithm implementations.
    
Besides being efficient, broadcasting lets developers write less code, which typically leads to fewer errors.
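To make "looping occurs in C instead of Python" concrete, here is a small sketch (the matrix and scale values are arbitrary, chosen only for illustration) that computes the same per-column scaling twice: once with explicit Python loops and once with a single broadcast multiplication.

```python
import numpy as np

# Arbitrary example data: a 4 x 3 matrix and one scale factor per column.
x = np.arange(12, dtype=float).reshape(4, 3)
scale = np.array([1.0, 10.0, 100.0])

# Explicit Python loops: one interpreted iteration per element.
looped = np.empty_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        looped[i, j] = x[i, j] * scale[j]

# Broadcasting: `scale` (shape (3,)) is stretched across all 4 rows,
# and the element-wise loop runs inside NumPy's compiled code.
broadcast = x * scale

assert np.array_equal(looped, broadcast)
```

Both versions produce identical results; the broadcast version is one line and avoids the interpreted inner loop.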

*This section was adapted from [Chapter 4](http://nbviewer.jupyter.org/github/fastai/numerical-linear-algebra/blob/master/nbs/4.%20Compressed%20Sensing%20of%20CT%20Scans%20with%20Robust%20Regression.ipynb#4.-Compressed-Sensing-of-CT-Scans-with-Robust-Regression) of the fast.ai [Computational Linear Algebra](https://github.com/fastai/numerical-linear-algebra) course.*


## Broadcasting Rules

When operating on two arrays/tensors, NumPy/PyTorch compares their shapes dimension by dimension, starting with the **trailing dimensions** and working its way forward. Two dimensions are **compatible** when

- they are equal, or
- one of them is 1, in which case that dimension is broadcast (stretched) to match the other

Arrays do not need to have the same number of dimensions. For example, if you have a `256 x 256 x 3` array of RGB values and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules shows that they are compatible:

    Image  (3d array): 256 x 256 x 3
    Scale  (1d array):             3
    Result (3d array): 256 x 256 x 3
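A minimal sketch of the RGB example, assuming a randomly generated image in place of real pixel data and arbitrary per-channel scale factors:

```python
import numpy as np

# Hypothetical image: random RGB values with shape (256, 256, 3).
rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))

# One (arbitrary) scale factor per color channel, shape (3,).
scale = np.array([0.5, 1.0, 2.0])

# (256, 256, 3) * (3,) -> (256, 256, 3): each channel gets its own factor.
scaled = image * scale
print(scaled.shape)  # (256, 256, 3)
```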

The [NumPy documentation](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html#general-broadcasting-rules) includes several examples of which dimensions can and cannot be broadcast together.
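The rules above are simple enough to write out by hand. The sketch below uses a hypothetical helper, `broadcast_shape`, that pads the shorter shape with leading 1s and then compares dimensions from the trailing end. It is an illustration of the rules, not NumPy's implementation; the comparison line uses `np.broadcast_shapes`, which assumes NumPy 1.20 or newer.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: result shape of broadcasting two shapes.

    Raises ValueError when the shapes are incompatible.
    """
    n = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s, so missing leading dims count as 1.
    a = (1,) * (n - len(shape_a)) + tuple(shape_a)
    b = (1,) * (n - len(shape_b)) + tuple(shape_b)
    result = []
    # Compare dimensions starting from the trailing ones.
    for da, db in zip(reversed(a), reversed(b)):
        if da == db or db == 1:
            result.append(da)   # equal, or b's dim is stretched to match a's
        elif da == 1:
            result.append(db)   # a's dim is stretched to match b's
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(reversed(result))

print(broadcast_shape((256, 256, 3), (3,)))      # (256, 256, 3)
print(np.broadcast_shapes((256, 256, 3), (3,)))  # (256, 256, 3)
```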

# NumPy Broadcasting

https://numpy.org/doc/stable/user/basics.broadcasting.html

In [12]:
import numpy as np
from torch import tensor

In [18]:
a = np.array([1.,2.,3])
a, a.shape

(array([1., 2., 3.]), (3,))

In [20]:
b = np.array(([10.],[20.],[30.]))
b, b.shape

(array([[10.],
        [20.],
        [30.]]),
 (3, 1))

In [22]:
a * b

array([[10., 20., 30.],
       [20., 40., 60.],
       [30., 60., 90.]])
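One way to see what happened in `a * b` is to ask NumPy for the stretched operands explicitly. This sketch uses `np.broadcast_to` to build the two conceptual `(3, 3)` versions of `a` and `b`; multiplying them element-wise gives the same result as the broadcast product.

```python
import numpy as np

a = np.array([1., 2., 3.])           # shape (3,)
b = np.array([[10.], [20.], [30.]])  # shape (3, 1)

# Both operands are conceptually stretched to the common shape (3, 3).
a_big = np.broadcast_to(a, (3, 3))   # every row is a copy of a
b_big = np.broadcast_to(b, (3, 3))   # every column is a copy of b's column
print(a_big)
print(b_big)

# Element-wise multiplication of the stretched views matches a * b.
assert np.array_equal(a_big * b_big, a * b)
```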

In [23]:
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
a * b

array([2., 4., 6.])

In [24]:
a = np.array([1.0, 2.0, 3.0])
b = 2.0
a * b

array([2., 4., 6.])

In [29]:
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 3.0])
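# a + b  # ValueError: trailing dimensions 3 and 2 differ and neither is 1
a[None,:].shape, b.shape  # a leading axis does not help: (1, 3) vs (2,)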

((1, 3), (2,))
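A short sketch of both sides of this case: the failing `a + b`, caught as a `ValueError`, and a fix that inserts a *trailing* axis instead, so that `(3, 1)` and `(2,)` broadcast to `(3, 2)`.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])  # shape (3,)
b = np.array([2.0, 3.0])       # shape (2,)

# Trailing dimensions 3 and 2 differ and neither is 1, so this fails.
try:
    a + b
except ValueError as err:
    print(err)

# a[:, None] has shape (3, 1), which broadcasts with (2,) to (3, 2).
outer_sum = a[:, None] + b
print(outer_sum)
# [[3. 4.]
#  [4. 5.]
#  [5. 6.]]
```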

In [31]:
a[:,None]

array([[1.],
       [2.],
       [3.]])

In [39]:
'''
For example, if 
a.shape is (5,1), 
b.shape is (1,6), 
c.shape is (6,) and 
d.shape is () so that d is a scalar, 
then a, b, c, and d are all broadcastable to dimension (5,6); and

a acts like a (5,6) array where a[:,0] is broadcast to the other columns,

b acts like a (5,6) array where b[0,:] is broadcast to the other rows,

c acts like a (1,6) array and therefore like a (5,6) array where c[:] is broadcast to every row, and finally,

d acts like a (5,6) array where the single value is repeated.

'''

a = np.array([[1],[2],[3],[4],[5]])
b = np.array([[1,2,3,4,5,6]])
c = np.array([1,2,3,4,5,6])
d = np.float32(10.0)
a.shape, b.shape, c.shape, d.shape

((5, 1), (1, 6), (6,), ())

In [42]:
# -> (5, 6)
a.shape, b.shape, (a + b).shape

((5, 1), (1, 6), (5, 6))

In [45]:
# (1, 6) + (6,) -> (1, 6): c is treated as (1, 6)
b.shape, c.shape, (b + c)

((1, 6), (6,), array([[ 2,  4,  6,  8, 10, 12]]))

In [47]:
c.shape, d.shape, (c + d).shape

((6,), (), (6,))
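The docstring cell above claims that `a`, `b`, `c`, and `d` are all broadcastable to `(5, 6)`. A sketch that checks this directly, using `np.broadcast` to get the common shape without computing anything, then adding all four operands at once:

```python
import numpy as np

a = np.array([[1], [2], [3], [4], [5]])  # (5, 1)
b = np.array([[1, 2, 3, 4, 5, 6]])      # (1, 6)
c = np.array([1, 2, 3, 4, 5, 6])        # (6,)
d = np.float32(10.0)                     # (), a scalar

# Common broadcast shape of all four operands, computed without arithmetic.
print(np.broadcast(a, b, c, d).shape)   # (5, 6)

# Row i of the sum is a[i] + b[0] + c + d.
total = a + b + c + d
print(total.shape)                       # (5, 6)
```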

Further examples of compatible shapes:

    A      (2d array):  5 x 4
    B      (1d array):      4
    Result (2d array):  5 x 4

    A      (3d array):  15 x 3 x 5
    B      (3d array):  15 x 1 x 5
    Result (3d array):  15 x 3 x 5

    A      (3d array):  15 x 3 x 5
    B      (2d array):       3 x 5
    Result (3d array):  15 x 3 x 5

    A      (3d array):  15 x 3 x 5
    B      (2d array):       3 x 1
    Result (3d array):  15 x 3 x 5
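The shape pairs listed above can be checked without allocating any arrays. This sketch assumes NumPy 1.20 or newer, which provides `np.broadcast_shapes`.

```python
import numpy as np

# (shape of A, shape of B, expected result shape)
cases = [
    ((5, 4), (4,), (5, 4)),
    ((15, 3, 5), (15, 1, 5), (15, 3, 5)),
    ((15, 3, 5), (3, 5), (15, 3, 5)),
    ((15, 3, 5), (3, 1), (15, 3, 5)),
]

for shape_a, shape_b, expected in cases:
    # np.broadcast_shapes applies the broadcasting rules to shapes only.
    result = np.broadcast_shapes(shape_a, shape_b)
    assert result == expected, (shape_a, shape_b, result)
    print(f"{shape_a} with {shape_b} -> {result}")
```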

In [56]:
# A      (2d array):  5 x 4
# B      (1d array):      1
# Result (2d array):  5 x 4
A = np.random.rand(5,4)
B = np.random.rand(1)

A.shape, B.shape, (A*B).shape

((5, 4), (1,), (5, 4))

In [57]:
a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0])
a + b

array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])
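The NumPy quote at the top says broadcasting works "without making needless copies of data". A sketch that makes this visible: `np.broadcast_to` returns a read-only view of `b` with shape `(4, 3)` whose row stride is 0, so every "row" reads the same three values in memory.

```python
import numpy as np

a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0])

# A view of b with a's shape; no new data buffer is allocated.
b_view = np.broadcast_to(b, a.shape)
print(b_view.shape)                 # (4, 3)
print(b_view.strides)               # (0, 8): stepping to the next row moves 0 bytes
print(np.shares_memory(b_view, b))  # True
```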

In [58]:
b = np.array([1.0, 2.0, 3.0, 4.0])
a + b  # trailing dimensions 3 and 4 differ and neither is 1

ValueError: operands could not be broadcast together with shapes (4,3) (4,) 

In [59]:
a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])
a[:, np.newaxis] + b

array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])
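`np.newaxis` is simply an alias for `None`, and the pattern `a[:, np.newaxis] + b` computes an "outer" sum. NumPy exposes the same operation as the ufunc method `np.add.outer`; this sketch confirms the two agree.

```python
import numpy as np

a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])

# np.newaxis is None, so a[:, np.newaxis] and a[:, None] are identical.
assert np.newaxis is None
via_newaxis = a[:, np.newaxis] + b  # (4, 1) + (3,) -> (4, 3)

# The same outer combination via the ufunc method.
via_outer = np.add.outer(a, b)

assert np.array_equal(via_newaxis, via_outer)
print(via_outer.shape)  # (4, 3)
```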

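## Vector Quantization

The basic operation in vector quantization (VQ) finds the closest point in a set of points, called *codes*, to a given point, called the *observation*. Broadcasting subtracts the observation from every code in a single expression.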

In [60]:
from numpy import array, argmin, sqrt, sum  # note: shadows Python's built-in sum
observation = array([111.0, 188.0])
codes = array([[102.0, 203.0],
               [132.0, 193.0],
               [45.0, 155.0],
               [57.0, 173.0]])

In [62]:
diff = codes - observation    # the broadcast happens here
diff

array([[ -9.,  15.],
       [ 21.,   5.],
       [-66., -33.],
       [-54., -15.]])

In [67]:
diff**2

array([[  81.,  225.],
       [ 441.,   25.],
       [4356., 1089.],
       [2916.,  225.]])

In [66]:
sum(diff**2,axis=-1)

array([ 306.,  466., 5445., 3141.])

In [68]:
sqrt(sum(diff**2, axis=-1))

array([17.49285568, 21.58703314, 73.79024326, 56.04462508])

In [69]:
argmin(sqrt(sum(diff**2, axis=-1)))

0
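The same idea extends to many observations at once. The sketch below uses a hypothetical helper, `nearest_code`, and a second made-up observation `[50.0, 160.0]`; inserting axes so that `(n, 1, d) - (1, k, d)` broadcasts to `(n, k, d)` yields every observation-minus-code difference in one step.

```python
import numpy as np

codes = np.array([[102.0, 203.0],
                  [132.0, 193.0],
                  [45.0, 155.0],
                  [57.0, 173.0]])  # (k, d) = (4, 2)

def nearest_code(observations, codes):
    """Hypothetical helper: index of the closest code for each observation.

    observations: (n, d), codes: (k, d) -> (n,) integer indices.
    """
    # (n, 1, d) - (1, k, d) broadcasts to (n, k, d).
    diff = observations[:, np.newaxis, :] - codes[np.newaxis, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))  # (n, k) Euclidean distances
    return np.argmin(dist, axis=-1)             # (n,)

# The first row is the observation used above; the second is made up.
observations = np.array([[111.0, 188.0],
                         [50.0, 160.0]])
print(nearest_code(observations, codes))  # [0 2]
```

The first index matches the single-observation result above.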

# PyTorch Broadcasting Semantics

https://pytorch.org/docs/stable/notes/broadcasting.html

In [3]:
import torch
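PyTorch follows the same broadcasting rules as NumPy. One difference worth knowing, per the PyTorch broadcasting notes, is that an in-place operation may broadcast the other operand but may not change the shape of the tensor being modified. A sketch, assuming PyTorch 1.8 or newer for `torch.broadcast_shapes`:

```python
import torch

x = torch.ones(5, 1)
y = torch.arange(6.0)  # shape (6,)

# Same rules as NumPy: (5, 1) and (6,) -> (5, 6).
z = x + y
print(z.shape)                               # torch.Size([5, 6])
print(torch.broadcast_shapes((5, 1), (6,)))  # torch.Size([5, 6])

# In-place: y may be broadcast to w's shape (5, 6)...
w = torch.zeros(5, 6)
w.add_(y)

# ...but x (5, 1) cannot grow to hold a (5, 6) result.
try:
    x.add_(y)
except RuntimeError as err:
    print("RuntimeError:", err)
```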