<a href="https://colab.research.google.com/github/fallcat/python-tutorial/blob/main/slicing_broadcasting_and_vectorization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Slicing, broadcasting, and vectorization

This tutorial uses `numpy`, but the same ideas apply to other array and tensor libraries such as `pytorch`.

In [1]:
import numpy as np

## Slicing

We can use slicing to extract part of a numpy array.

In [2]:
x = np.array([[1, 2, 3],
              [1.1, 2.1, 3.1],
              [1.2, 2.2, 3.2]])

In [3]:
x

array([[1. , 2. , 3. ],
       [1.1, 2.1, 3.1],
       [1.2, 2.2, 3.2]])

In [4]:
# A single index selects along the 0th dimension (here, row 1)
x[1]

array([1.1, 2.1, 3.1])

In [5]:
# You can also specify a range that you want to slice
x[0:2]

array([[1. , 2. , 3. ],
       [1.1, 2.1, 3.1]])

In [6]:
# You can also pick rows in any order, with repeats, by indexing with a list
idx = [0,0,2]
x[idx]

array([[1. , 2. , 3. ],
       [1. , 2. , 3. ],
       [1.2, 2.2, 3.2]])
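One difference between range slicing and list indexing is easy to miss: a range slice returns a view that shares memory with the original array, while indexing with a list returns an independent copy. The sketch below illustrates this with a small made-up array; the names `a`, `view`, and `copy` are only for illustration.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)   # hypothetical example array: [[0, 1], [2, 3], [4, 5]]

view = a[0:2]        # range slice: shares memory with `a`
copy = a[[0, 1]]     # list index: independent copy of the same rows

view[0, 0] = 100     # writes through to `a`
copy[1, 1] = -1      # does not touch `a`

print(a[0, 0])                      # 100
print(a[1, 1])                      # 3
print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, copy))    # False
```

If you need to modify a range slice without changing the original, calling `.copy()` on the slice is one way to do it.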

In [7]:
# You can slice not only in the 0th dimension, but also in the 1st dimension
x[:,1]

array([2. , 2.1, 2.2])

In [8]:
# Note that slicing with a range keeps the number of dimensions, while using a
# single index removes one dimension
x[:,1:2]

array([[2. ],
       [2.1],
       [2.2]])

In [9]:
print("x[:,1].shape", x[:,1].shape)
print("x[:,1:2].shape", x[:,1:2].shape)

x[:,1].shape (3,)
x[:,1:2].shape (3, 1)
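If you have a single index but still want to keep the dimension, there are a couple of other ways to get the `(3, 1)` shape. This is a small sketch that reuses the same values as `x` above; the names `col_flat`, `col_list`, and `col_new` are made up for illustration.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [1.1, 2.1, 3.1],
              [1.2, 2.2, 3.2]])

col_flat = x[:, 1]                  # single index: the axis is dropped
col_list = x[:, [1]]                # one-element list: the axis is kept
col_new = x[:, 1][:, np.newaxis]    # drop the axis, then add it back explicitly

print(col_flat.shape)   # (3,)
print(col_list.shape)   # (3, 1)
print(col_new.shape)    # (3, 1)
print(np.array_equal(col_list, x[:, 1:2]))  # True
```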


In [10]:
# A list of indices also works in the 1st dimension, here picking columns 0, 0, 2
idx = [0,0,2]
x[:,idx]

array([[1. , 1. , 3. ],
       [1.1, 1.1, 3.1],
       [1.2, 1.2, 3.2]])

In [11]:
# You can also pick individual elements by giving a list of indices for each axis
# For example, for the 0th row, I want element 0;
#              for the 1st row, I want element 0;
#              for the 2nd row, I want element 1.
idx_axis0 = list(range(x.shape[0]))
print("idx_axis0", idx_axis0)
idx_axis1 = [0, 0, 1]
print("idx_axis1", idx_axis1)
x[idx_axis0, idx_axis1]

idx_axis0 [0, 1, 2]
idx_axis1 [0, 0, 1]


array([1. , 1.1, 2.2])

In [12]:
# To write it more compactly, we can write
x[range(x.shape[0]), [0, 0, 1]]

array([1. , 1.1, 2.2])
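A common use of this pattern is picking one value per row, where the column to pick is itself stored in an array. The sketch below assumes a made-up `scores` matrix and a made-up `labels` array (neither comes from the tutorial) and checks the result against an explicit loop.

```python
import numpy as np

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.2, 0.6]])   # hypothetical per-row scores
labels = np.array([1, 0, 2])           # hypothetical column to pick in each row

rows = np.arange(scores.shape[0])      # [0, 1, 2]
picked = scores[rows, labels]          # pairs up (0, 1), (1, 0), (2, 2)
print(picked)                          # [0.7 0.5 0.6]

# The same selection written as an explicit loop
looped = np.array([scores[i, labels[i]] for i in range(len(labels))])
print(np.array_equal(picked, looped))  # True
```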

## Broadcasting
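When you want to operate on two arrays element-wise and their shapes differ, for example when one array has one fewer dimension than the other, you can use broadcasting. Shapes are compared starting from the last dimension, and two dimensions are compatible when they are equal or one of them is 1.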

In [13]:
x = np.array([[1, 2, 3],
              [1.1, 2.1, 3.1]])
y = np.array([3, 6, 9])

In [14]:
print("x.shape", x.shape)
print("y.shape", y.shape)

x.shape (2, 3)
y.shape (3,)


In [15]:
# When you want to add to each element in each row of x the corresponding
# element in y, you might think of using a for-loop
z = np.zeros((2, 3))
for i in range(z.shape[0]):
    for j in range(z.shape[1]):
        z[i, j] = x[i, j] + y[j]

In [16]:
z

array([[ 4. ,  8. , 12. ],
       [ 4.1,  8.1, 12.1]])

In [17]:
# Instead, you can compute the same thing with just one line of code
x + y

array([[ 4. ,  8. , 12. ],
       [ 4.1,  8.1, 12.1]])
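The example above adds one value per column. Adding one value per row needs an extra step, because shapes are matched from the last dimension. The sketch below uses made-up arrays `a`, `row`, and `col` to show the failing case and one way to fix it with `np.newaxis`.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [1.1, 2.1, 3.1]])    # shape (2, 3)
row = np.array([3, 6, 9])          # shape (3,): one value per column
col = np.array([10, 20])           # shape (2,): one value per row

# (2, 3) with (3,): the last dimensions match, so this broadcasts
print(np.broadcast(a, row).shape)  # (2, 3)

# (2, 3) with (2,): the last dimensions are 3 and 2, so this fails
try:
    a + col
except ValueError as err:
    print("cannot broadcast:", err)

# Adding a trailing axis of length 1 gives shapes (2, 3) and (2, 1), which broadcast
per_row = a + col[:, np.newaxis]
print(per_row.shape)               # (2, 3)
```

Here the first row of `a` is shifted by 10 and the second row by 20.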

In [18]:
# Let's look at the time used by each operation when the matrix is huge

import time

x = np.random.rand(1000, 2000)
y = np.random.rand(2000)

start = time.time()

z = np.zeros((1000, 2000))
for i in range(z.shape[0]):
    for j in range(z.shape[1]):
        z[i, j] = x[i, j] + y[j]
print("Time using a loop", time.time() - start)

start = time.time()
x + y
print("Time using vectorization", time.time() - start)

Time using a loop 1.4766991138458252
Time using vectorization 0.006775617599487305
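A single `time.time()` measurement can be noisy, and it does not check that both versions compute the same thing. The sketch below is one possible way to do both with the standard `timeit` module. The function names `add_with_loop` and `add_with_broadcast` are made up for illustration, and the arrays are smaller than the ones above so the loop finishes quickly; actual timings depend on your machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((200, 300))    # smaller than above so the loop stays quick
b = rng.random(300)

def add_with_loop(a, b):
    # Element-by-element version, as in the for-loop above
    out = np.zeros_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = a[i, j] + b[j]
    return out

def add_with_broadcast(a, b):
    # Vectorized version using broadcasting
    return a + b

# Both versions should agree before we compare their speed
assert np.allclose(add_with_loop(a, b), add_with_broadcast(a, b))

# Take the best of a few runs to reduce noise
loop_time = min(timeit.repeat(lambda: add_with_loop(a, b), number=1, repeat=3))
vec_time = min(timeit.repeat(lambda: add_with_broadcast(a, b), number=1, repeat=3))
print(f"loop: {loop_time:.4f} s, broadcast: {vec_time:.6f} s")
```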
