### Vectorization

#### Introduction to Vectorization

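In layman’s terms, vectorization speeds up code by replacing explicit Python loops, indexing, and similar element-by-element steps with operations on whole arrays. In data science, we usually do this with NumPy, the de facto library for scientific computing in Python. Technically, the loops still run when we use NumPy's vectorized functions. The difference is that they are executed in optimized, compiled code rather than by the Python interpreter.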
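
As a minimal sketch of the idea, here is the same element-wise operation written once with an explicit loop and once in vectorized form. The list size and the names `values`, `squared_loop`, and `squared_vec` are illustrative assumptions, not part of the original notebook:

```python
import numpy as np

values = list(range(5))

# Loop version: Python visits and squares each element one at a time
squared_loop = []
for v in values:
    squared_loop.append(v * v)

# Vectorized version: a single array expression; the looping happens inside NumPy
arr = np.array(values)
squared_vec = arr * arr

print(squared_loop)          # [0, 1, 4, 9, 16]
print(squared_vec.tolist())  # [0, 1, 4, 9, 16]
```

Both produce the same numbers; the vectorized form simply hands the whole array to NumPy in one call.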

#### Outer Product

The outer product of two vectors is a matrix. For a vector $\mathbf{a}$ with $m$ elements and a vector $\mathbf{b}$ with $n$ elements, the outer product is the $m \times n$ matrix whose entry in row $i$, column $j$ is $a_i b_j$:

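$$
\mathbf{a} \otimes \mathbf{b} = \mathbf{a}\mathbf{b}^\top =
\begin{bmatrix}
a_1 b_1 & a_1 b_2 & \cdots & a_1 b_n \\
a_2 b_1 & a_2 b_2 & \cdots & a_2 b_n \\
\vdots & \vdots & \ddots & \vdots \\
a_m b_1 & a_m b_2 & \cdots & a_m b_n
\end{bmatrix}
$$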

We shall now look at how this works in Python code.

import numpy as np
import time
a = np.arange(10000)
b = np.arange(10000)
# Loop-based outer product: explicit Python loops over every (i, j) pair.
# This runs 10,000 x 10,000 = 100 million Python-level multiplications, so it is slow.
tic = time.process_time()
outer_product = np.zeros((len(a), len(b)), dtype=a.dtype)
for i in range(len(a)):
    for j in range(len(b)):
        outer_product[i, j] = a[i] * b[j]
toc = time.process_time()
print(f"python_outer_product = {outer_product}")
print(f"Time = {1000 * (toc - tic)} ms\n")
# NumPy outer product: the same products computed in compiled code
n_tic = time.process_time()
outer_product = np.outer(a, b)
n_toc = time.process_time()
print(f"numpy_outer_product = {outer_product}")
print(f"Time = {1000 * (n_toc - n_tic)} ms")
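
A small worked example may make the shape of the result easier to see. The sketch below uses made-up 2- and 3-element vectors (`x` and `y` are illustrative names) and checks `np.outer` against NumPy broadcasting, where `x[:, np.newaxis]` turns `x` into a column so that multiplying it by the row `y` produces every product $x_i y_j$:

```python
import numpy as np

x = np.array([1, 2])     # m = 2
y = np.array([3, 4, 5])  # n = 3

# np.outer builds the m x n matrix of all products x[i] * y[j]
outer = np.outer(x, y)

# Equivalent broadcasting form: a (2, 1) column times a (3,) row gives a (2, 3) matrix
outer_broadcast = x[:, np.newaxis] * y

print(outer.shape)  # (2, 3)
print(outer)
# [[ 3  4  5]
#  [ 6  8 10]]
print(np.array_equal(outer, outer_broadcast))  # True
```

The broadcasting form is handy when we want an "outer" version of an operation other than multiplication, for example `x[:, np.newaxis] + y`.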


#### Dot Product

The dot product, also known as the inner product, takes two sequences of numbers of equal length and returns a single number (a scalar): the sum of the products of corresponding elements, as shown below:

$$
\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n
$$

We shall now look at how this works in Python code.

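import numpy as np
import time
# float64 avoids silent integer overflow: the exact sum of i * i for
# i < 10,000,000 (about 3.3e20) exceeds the int64 maximum (about 9.2e18)
a = np.arange(10000000, dtype=np.float64)
b = np.arange(10000000, dtype=np.float64)
# Loop-based dot product: an explicit Python loop over every index
tic = time.process_time()
dot_product = 0.0
for i in range(len(a)):
    dot_product += a[i] * b[i]
toc = time.process_time()
print(f"python_dot_product = {dot_product}")
print(f"Time = {1000 * (toc - tic)} ms\n")
# NumPy dot product: the same sum computed in compiled code
# (the last digits may differ slightly because floating-point summation order differs)
n_tic = time.process_time()
dot_product = np.dot(a, b)
n_toc = time.process_time()
print(f"numpy_dot_product = {dot_product}")
print(f"Time = {1000 * (n_toc - n_tic)} ms")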
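
To check the formula on numbers small enough to verify by hand, here is a sketch with two made-up 3-element vectors (`x`, `y`, and the result names are illustrative). It compares an explicit loop with three vectorized NumPy forms of the same dot product, $1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32$:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Explicit loop: accumulate the sum of element-wise products
loop_dot = 0
for xi, yi in zip(x, y):
    loop_dot += int(xi) * int(yi)

# Vectorized forms that compute the same scalar
dot_fn = np.dot(x, y)
dot_op = x @ y           # for 1-D arrays, the @ operator gives the inner product
dot_sum = np.sum(x * y)  # element-wise multiply, then sum

print(loop_dot, dot_fn, dot_op, dot_sum)  # 32 32 32 32
```

`np.dot` and `@` are the usual choices; `np.sum(x * y)` spells out the definition but allocates a temporary array for the products.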

#### Conclusion

Vectorization is an important technique for numerical work in day-to-day data analytics and big data analysis, and it is also used in text processing. It gives us the following advantages:
1. It makes our code faster, because the loops run in NumPy's compiled code instead of the Python interpreter, and easier to read.
2. It reduces the amount of code we must write, which usually results in fewer bugs.
3. It makes the code more Pythonic, since single array expressions replace slow and hard-to-read explicit `for` loops.