<img src="../everything data.png" width="1728" height="864"/>

<h1><center>Vectorizing and Broadcasting in NumPy</center></h1>

<h3><center>Tristan Sim</center></h3>
<h4><center>26/6/2023</center></h4>

I recently completed DataCamp's <a href="">Introduction to NumPy course</a>, through which I discovered an interesting topic within the subject: Vectorizing operations and Broadcasting. This mini notebook aims to solidify my understanding of the topic and, in doing so, hopefully explain to readers what Vectorizing and Broadcasting are and why they are so powerful.

## Vectorizing

- Vectorization refers to performing operations on entire arrays or matrices at once, rather than looping through individual elements as plain Python code would.

<center><img src="images/vectorization.png" width=300 height=500/></center>

- Vectorized operations are fast because NumPy carries out the element-by-element work in its optimized, compiled C implementation instead of in the Python interpreter, resulting in faster and more concise code.


The code below will illustrate the concept of vectorization:

```python
import numpy as np
```

### The problem: 

Assume we want to add each element of an array to its corresponding element in another array. Using a plain Python loop, it would look like this:

```python
# We want to add each element in array_one to its corresponding element in array_two
# (1 + 5, 2 + 6, 3 + 7, 4 + 8)
array_one = np.array([1, 2, 3, 4])
array_two = np.array([5, 6, 7, 8])

# First, create a copy of array_one so the original array is left unchanged.
array_sum = np.copy(array_one)

# Then, loop over the indices of array_sum and use each index to access array_two.
for i in range(len(array_sum)):
    # Add the element of array_two at index i to array_sum at the same index.
    array_sum[i] += array_two[i]

print(array_sum)
```

```
[ 6  8 10 12]
```


- For a relatively simple task, it took several lines of code.
- Furthermore, looping through each element and performing the additions one at a time is slow, because every iteration runs in the Python interpreter rather than in NumPy's compiled code.

### The solution: 

To solve this issue, we leverage the optimized low-level implementations in NumPy, typically written in C, which can perform the addition operation on entire arrays in a highly efficient manner.

```python
# In a single line, we reproduce the result of the loop above.
print(array_one + array_two)
```

```
[ 6  8 10 12]
```


The vectorized version is clearly more readable than the loop. It is also faster, because the element-by-element work runs in NumPy's compiled code rather than in the Python interpreter, which is another strong reason to use NumPy.
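To put the efficiency claim to a quick test, here is a minimal timing sketch that compares the loop from the first example against the one-line vectorized addition. The array size `N`, the random data, and the helper names `add_with_loop` and `add_vectorized` are illustrative assumptions; the exact timings will vary from machine to machine, so treat the printed numbers as indicative only.

```python
import time

import numpy as np

# Hypothetical array size, chosen so the timing difference is visible but the loop stays quick.
N = 100_000
rng = np.random.default_rng(0)
x = rng.integers(0, 100, size=N)
y = rng.integers(0, 100, size=N)


def add_with_loop(a, b):
    """Element-wise addition with an explicit Python loop (the approach from the first example)."""
    result = np.copy(a)
    for i in range(len(result)):
        result[i] += b[i]
    return result


def add_vectorized(a, b):
    """Element-wise addition delegated to NumPy's compiled implementation."""
    return a + b


start = time.perf_counter()
loop_result = add_with_loop(x, y)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vectorized_result = add_vectorized(x, y)
vectorized_seconds = time.perf_counter() - start

# Both approaches must produce identical arrays; only the speed differs.
print("Results match:", np.array_equal(loop_result, vectorized_result))
print(f"Loop:       {loop_seconds:.4f} s")
print(f"Vectorized: {vectorized_seconds:.4f} s")
```

For a more careful benchmark, Python's `timeit` module (or `%timeit` in a notebook) repeats each measurement and reduces noise.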

## Broadcasting

As shown in the Vectorizing section, arithmetic operations on arrays are applied to corresponding elements:

<h3><center>$ 
[A_1, A_2, A_3] + [B_1, B_2, B_3] = [A_1 + B_1,\ A_2 + B_2,\ A_3 + B_3]
$</center></h3>

What if these arrays have different shapes? That's where Broadcasting comes in.

### How it works:

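In NumPy, broadcasting allows us to perform operations on arrays of different shapes without explicitly reshaping or copying them, provided their shapes are compatible: comparing the two shapes from the last dimension backwards, each pair of dimensions must either be equal or have one of them equal to 1 (a missing dimension counts as 1).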

<img src="images/broadcasting.png" height=400 width=400/>
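A short sketch of the compatibility rule in action. It uses `np.broadcast_shapes`, which (assuming NumPy 1.20 or later) computes the broadcast result shape from the shapes alone, plus two small arrays; the names `values`, `column`, and `row` are illustrative only.

```python
import numpy as np

# np.broadcast_shapes applies the broadcasting rule to shapes only, without creating arrays.
print(np.broadcast_shapes((2, 3), (3,)))   # (2, 3): (3,) is treated as (1, 3), then stretched
print(np.broadcast_shapes((3, 1), (3,)))   # (3, 3): both operands are stretched

# The simplest case: a scalar is broadcast against every element of the array.
values = np.array([1.0, 2.0, 3.0])
print(values * 10)                          # [10. 20. 30.]

# A column of shape (3, 1) combined with a row of shape (3,) produces a full (3, 3) grid.
column = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])
print(column + row)
```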

Let's imagine we have two arrays of different shapes: a 2D array A and a 1D array B.

```python
# A is a 2D array with shape (2, 3)
A = [[1, 2, 3],
     [4, 5, 6]]

# B is a 1D array with shape (3,)
B = [7, 8, 9]

# Convert A and B to NumPy arrays
A = np.array(A)
B = np.array(B)

# Shapes of arrays A and B
A.shape, B.shape
```

```
((2, 3), (3,))
```

- Element-wise arithmetic pairs up elements position by position, so normally we wouldn't be able to perform it on these arrays, because their shapes don't match.
- However, broadcasting allows us to treat them as if they were the same shape.
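Broadcasting has limits, though. The sketch below pairs A with a hypothetical array `C` of shape (2,), whose trailing dimension (2) matches neither A's trailing dimension (3) nor 1, so NumPy raises a `ValueError`. Reshaping `C` into a (2, 1) column makes the shapes compatible again.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
C = np.array([10, 20])      # shape (2,): hypothetical array used only for this illustration

caught_error = None
try:
    _ = A + C
except ValueError as err:
    # Trailing dimensions are 3 and 2: neither is 1, so the shapes cannot be broadcast.
    caught_error = err
    print("ValueError:", err)

# As a (2, 1) column, C is compatible: (2, 3) with (2, 1) broadcasts to (2, 3).
print(A + C[:, np.newaxis])
```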


```python
# Despite A and B having different shapes, NumPy still returns a result.
# Why is this so?
A + B
```

```
array([[ 8, 10, 12],
       [11, 13, 15]])
```

To explain what Broadcasting actually does, let us visualize what happens when an array is broadcast.

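At the moment, this is what arrays A and B look like in matrix form. <b>Array A</b> has a shape of (2, 3), while <b>Array B</b> has a shape of (3,). When broadcasting, NumPy first pads B's shape with a leading 1, so B is treated as a single row of shape (1, 3).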

<img src="images/AnB.png" height=400 width=300>
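Strictly speaking, B's shape is (3,), and broadcasting begins by treating it as (1, 3). As a sketch of that first step, the code below adds the leading axis explicitly with `np.newaxis` (the name `B_row` is illustrative) and checks that the result of the addition is unchanged.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([7, 8, 9])

# Prepending a length-1 axis turns B's shape from (3,) into (1, 3).
B_row = B[np.newaxis, :]      # equivalent to B.reshape(1, 3)
print(B.shape, B_row.shape)   # (3,) (1, 3)

# Broadcasting B directly gives the same result as broadcasting the explicit (1, 3) row.
print(np.array_equal(A + B, A + B_row))   # True
```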

When we perform an arithmetic operation involving A and B, such as addition, broadcasting conceptually replicates Array B along the first dimension, turning it into a 2x3 matrix that matches the shape of Array A. NumPy does this without actually copying B's data in memory.

<img src="images/Brepli.png" height=300 width=200/>

After broadcasting, we can think of <b>Array B</b> as having a matching shape of (2, 3).
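NumPy can show this "stretched" version of B directly. The sketch below uses `np.broadcast_to`, which returns a read-only view of B with A's shape. A stride of 0 along the first axis means every row of the view points at the same memory, so the replication is virtual rather than a real copy.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([7, 8, 9])

# A read-only view of B "stretched" to A's shape (2, 3).
B_stretched = np.broadcast_to(B, A.shape)
print(B_stretched)
# [[7 8 9]
#  [7 8 9]]

# Stride 0 on the first axis: moving to the next row does not move in memory.
print(B_stretched.strides)
# The view shares B's memory; no data was duplicated.
print(np.shares_memory(B, B_stretched))   # True
```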

This is why we can perform arithmetic operations as usual even though A and B have different shapes: once B has been broadcast to (2, 3), NumPy simply adds the two arrays element by element.

<img src="aplusb.png" height=400 width=400/>
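To confirm that broadcasting behaves like explicit replication, this sketch builds the replicated matrix by hand with `np.tile` (the names `B_tiled`, `manual_sum`, and `broadcast_sum` are illustrative) and compares the two sums. The manual version allocates a real (2, 3) copy of B; the broadcast version does not.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([7, 8, 9])

# Manual approach: physically copy B into a (2, 3) array, then add element by element.
B_tiled = np.tile(B, (A.shape[0], 1))
manual_sum = A + B_tiled

# Broadcasting approach: NumPy handles the stretching without building B_tiled.
broadcast_sum = A + B

print(manual_sum)
# [[ 8 10 12]
#  [11 13 15]]
print(np.array_equal(manual_sum, broadcast_sum))   # True
```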

## References

1. <a href="https://datascience.blog.wzb.eu/2018/02/02/vectorization-and-parallelization-in-python-with-numpy-and-pandas/">Vectorization and parallelization in Python with NumPy and Pandas | WZB Data Science Blog</a>
2. <a href="https://towardsdatascience.com/broadcasting-in-numpy-58856f926d73">Broadcasting in NumPy | by Lex Maximov</a>