# Broadcasting rules

This reading will introduce you to numpy's broadcasting rules and show how you can use broadcasting with TensorFlow Tensors and Variables.

In [1]:
import tensorflow as tf

In [2]:
import numpy as np

## Operations on arrays of different sizes in numpy

Numpy operations can be applied to arrays that are not of the same shape, but only if the shapes satisfy certain conditions.

As a demonstration of this, let us add together two arrays of different shapes:

In [3]:
# Add two arrays with different shapes

a = np.array([[1.],
              [2.],
              [3.],
              [4.]])  # shape (4, 1)

b = np.array([0., 1., 2.])  # shape (3,) 

a + b

array([[1., 2., 3.],
       [2., 3., 4.],
       [3., 4., 5.],
       [4., 5., 6.]])

This is the addition

    [ [1.],    +  [0., 1., 2.]  
      [2.],  
      [3.],  
      [4.] ]

To execute it, numpy:
1. Aligned the shapes of `a` and `b` on the last axis and prepended 1s to the shape with fewer axes:
        a: 4 x 1     --->    a: 4 x 1
        b:     3     --->    b: 1 x 3


2. Checked that, in each axis, the two sizes either matched or one of them was equal to 1:
        a: 4 x 1  
        b: 1 x 3
`a` and `b` satisfied this criterion. 


3. Stretched both arrays on their 1-valued axes so that their shapes matched, then added them together.  
`a` was replicated 3 times in the second axis, while `b` was replicated 4 times in the first axis.

This meant that the addition in the final step was

    [ [1., 1., 1.],    +  [ [0., 1., 2.],  
      [2., 2., 2.],         [0., 1., 2.],  
      [3., 3., 3.],         [0., 1., 2.],  
      [4., 4., 4.] ]        [0., 1., 2.] ]
      
Addition was then carried out element-by-element, as you can verify by referring back to the output of the code cell above.  
This resulted in an output with shape 4 x 3.
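
The three steps above can be reproduced by hand. The following is a minimal sketch that uses `np.broadcast_to` to do the stretching explicitly and then checks the result against `a + b`; the names `b_2d`, `a_stretched` and `b_stretched` are illustrative and do not appear elsewhere in this reading.

```python
import numpy as np

a = np.array([[1.], [2.], [3.], [4.]])  # shape (4, 1)
b = np.array([0., 1., 2.])              # shape (3,)

# Step 1: prepend a 1 to the shape with fewer axes
b_2d = b.reshape(1, 3)                  # shape (1, 3)

# Steps 2 and 3: stretch each array along its size-1 axis to the common shape
a_stretched = np.broadcast_to(a, (4, 3))     # the column of a repeated 3 times
b_stretched = np.broadcast_to(b_2d, (4, 3))  # the row of b repeated 4 times

explicit_sum = a_stretched + b_stretched
print(np.array_equal(explicit_sum, a + b))   # compare with the broadcast addition
```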


## Numpy's broadcasting rule

Broadcasting rules describe how values should be transmitted when the shapes of the inputs to an operation do not match.  
In numpy, the broadcasting rule is very simple:
> Prepend 1s to the shape with fewer axes,   
check that, in each axis, the sizes of the two arrays are equal or one of them is 1,  
then stretch the arrays in their size-1 axes.

A crucial aspect of this rule is that it does not require that the input arrays have the same number of axes.  
Another consequence of it is that a broadcast output has, in each axis, the largest size found among its inputs.  
Take the following multiplication as an example:

        a: 3 x 7 x 1  
        b:     1 x 5  
    a * b: 3 x 7 x 5

You can see that the output shape is the maximum of the sizes in each axis.
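
To make the rule concrete, here is a small sketch that applies it to two shapes in plain Python and compares the prediction with the shape numpy actually produces. The helper `broadcast_shape` is hypothetical, written only for this illustration; it is not a numpy function.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the broadcasting rule to two shapes (illustrative helper, not part of numpy)."""
    ndim = max(len(shape_a), len(shape_b))
    # Prepend 1s to the shape with fewer axes
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for size_a, size_b in zip(padded_a, padded_b):
        # In each axis the sizes must be equal, or one of them must be 1
        if size_a != size_b and size_a != 1 and size_b != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} cannot be broadcast together")
        # A size-1 axis is stretched to the size of the other array
        result.append(size_b if size_a == 1 else size_a)
    return tuple(result)

predicted = broadcast_shape((3, 7, 1), (1, 5))
actual = (np.ones((3, 7, 1)) * np.ones((1, 5))).shape
print(predicted, actual)
```

Shapes that fail the check, such as `(3,)` and `(4,)`, make the helper raise a `ValueError`, which mirrors what numpy does when two arrays cannot be broadcast together.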

Numpy's broadcasting rule also does not require that one of the arrays has to be bigger in all axes.  
This is seen in the following example, where `a` is smaller than `b` in its third axis but is bigger in its second axis.

In [4]:
# Multiply two arrays with different shapes

a = np.array([[[0.01], [0.1]],
              [[1.00], [10.]]])  # shape (2, 2, 1)
b = np.array([[[2., 2.]],
              [[3., 3.]]])       # shape (2, 1, 2)

a * b # shape (2, 2, 2)

array([[[2.e-02, 2.e-02],
        [2.e-01, 2.e-01]],

       [[3.e+00, 3.e+00],
        [3.e+01, 3.e+01]]])

Broadcasting behaviour also points to an efficient way to compute an outer product in numpy:

In [5]:
# Use broadcasting to compute an outer product

a = np.array([-1., 0., 1.])
b = np.array([0., 1., 2., 3.])

a[:, np.newaxis] * b  # outer product ab^T, where a and b are column vectors

array([[-0., -1., -2., -3.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  1.,  2.,  3.]])
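
As a quick check, the broadcast product can be compared with numpy's built-in `np.outer`. This sketch only confirms that the two approaches agree on this example; the names `via_broadcasting` and `via_outer` are illustrative.

```python
import numpy as np

a = np.array([-1., 0., 1.])
b = np.array([0., 1., 2., 3.])

via_broadcasting = a[:, np.newaxis] * b  # shapes (3, 1) and (4,) broadcast to (3, 4)
via_outer = np.outer(a, b)               # numpy's built-in outer product

print(np.array_equal(via_broadcasting, via_outer))
```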

The idea of numpy stretching the arrays in their size-1 axes is a useful mental model and gives the correct result. But this is not what numpy literally does behind the scenes, since copying the data would be an inefficient use of memory. Instead, numpy carries out the operation by reusing the single element along each singleton (size-1) dimension, without replicating it in memory.
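
One way to see that stretching need not copy data is to inspect a broadcast view. The sketch below assumes that `np.broadcast_to` is a fair stand-in for what happens inside an arithmetic operation: it returns a read-only view of the original array rather than a copy. The name `stretched` is illustrative.

```python
import numpy as np

a = np.array([[1.], [2.], [3.], [4.]])  # shape (4, 1)
stretched = np.broadcast_to(a, (4, 3))  # read-only view with shape (4, 3)

print(a.strides)          # byte steps between elements along each axis of a
print(stretched.strides)  # the step along the stretched axis is 0
print(np.shares_memory(a, stretched))  # the view reuses the memory of a
```

A stride of 0 means that moving along the stretched axis returns to the same element in memory, so the 4 x 3 view takes up no more storage than the original 4 x 1 array.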

To give you some practice with broadcasting, try predicting the output shapes for the following operations:

In [6]:
# Define three arrays with different shapes

a = [[1.], [2.], [3.]]  # nested list; numpy converts it to an array when it is combined with b or c
b = np.zeros(shape=[10, 1, 1])
c = np.ones(shape=[4])

In [7]:
# Predict the shape before executing this cell

(a + b).shape

(10, 3, 1)

In [8]:
# Predict the shape before executing this cell

(a*c).shape

(3, 4)

In [9]:
# Predict the shape before executing this cell

(a*b + c).shape

(10, 3, 4)
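
Predictions like these can also be checked without doing any arithmetic. The sketch below uses `np.broadcast_shapes`, which assumes NumPy 1.20 or later, on the shapes of `a`, `b` and `c` from the exercise; the variable names are illustrative. It also shows that shapes which break the rule raise a `ValueError`.

```python
import numpy as np

# Shapes of a, b and c from the exercise above
shape_a, shape_b, shape_c = (3, 1), (10, 1, 1), (4,)

shape_ab = np.broadcast_shapes(shape_a, shape_b)            # shape of a + b
shape_ac = np.broadcast_shapes(shape_a, shape_c)            # shape of a * c
shape_abc = np.broadcast_shapes(shape_a, shape_b, shape_c)  # shape of a*b + c
print(shape_ab, shape_ac, shape_abc)

# Shapes that do not satisfy the rule cannot be broadcast
try:
    np.broadcast_shapes((3,), (4,))
    incompatible = False
except ValueError:
    incompatible = True
print(incompatible)
```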

## Broadcasting with TensorFlow Tensors and Variables

The broadcasting rule for TensorFlow is the same as that for numpy, and broadcasting can be used in operations on Tensors and Variables.

In [11]:
# Add two Tensors with different shapes

a = tf.constant([[1.],
                 [2.],
                 [3.],
                 [4.]])  # shape (4, 1)

b = tf.constant([0., 1., 2.])  # shape (3,) 

a + b

<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [2., 3., 4.],
       [3., 4., 5.],
       [4., 5., 6.]], dtype=float32)>

In [12]:
# Multiply two Tensors with different shapes

a = tf.constant([[[0.01], [0.1]],
                 [[1.00], [10.]]])  # shape (2, 2, 1)
b = tf.constant([[[2., 2.]],
                 [[3., 3.]]])       # shape (2, 1, 2)

a * b 

<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[2.e-02, 2.e-02],
        [2.e-01, 2.e-01]],

       [[3.e+00, 3.e+00],
        [3.e+01, 3.e+01]]], dtype=float32)>

In [13]:
# Use broadcasting to compute an outer product

a = tf.Variable([-1., 0., 1.])
b = tf.Variable([0., 1., 2., 3.])

a[:, tf.newaxis] * b

<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[-0., -1., -2., -3.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  1.,  2.,  3.]], dtype=float32)>
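
The stretching can also be written out explicitly in TensorFlow. The following minimal sketch uses `tf.broadcast_to` to expand both Tensors to the common shape before adding them, and checks that the result agrees with the broadcast addition; the names `a_stretched`, `b_stretched` and `matches` are illustrative.

```python
import tensorflow as tf

a = tf.constant([[1.], [2.], [3.], [4.]])  # shape (4, 1)
b = tf.constant([0., 1., 2.])              # shape (3,)

# Stretch both Tensors to the common shape by hand
a_stretched = tf.broadcast_to(a, [4, 3])
b_stretched = tf.broadcast_to(b, [4, 3])

explicit_sum = a_stretched + b_stretched
matches = bool(tf.reduce_all(tf.equal(explicit_sum, a + b)))
print(explicit_sum.shape, matches)
```

In everyday code the explicit call is rarely needed, since `a + b` broadcasts on its own.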

### Further reading and resources
* Numpy documentation on broadcasting: https://numpy.org/devdocs/user/theory.broadcasting.html
* XLA documentation on broadcasting: https://www.tensorflow.org/xla/broadcasting