This article comes from [this Medium post](https://medium.com/codex/say-goodbye-to-loops-in-python-and-welcome-vectorization-e8b0172b9581).

# Introduction

Loops come to us naturally; we learn about them in almost every programming language. So, by default, we reach for a loop whenever there is a repetitive operation. But when we work with a large number of iterations (millions or billions of rows), using loops is a crime. You might be stuck for hours, only to realize later that the approach won't work. This is where vectorization in Python becomes crucial.

# What is Vectorization?

*Vectorization is the technique of expressing an operation on a whole (NumPy) array at once. In the background, the operation is applied to all the elements of an array or Series by optimized, compiled code (unlike a 'for' loop, which handles one element or row at a time in the Python interpreter).*
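
To make the definition concrete, here is a minimal sketch contrasting an element-by-element loop with a single array expression. The array `values` and the doubling operation are illustrative choices, not taken from the original post.

```python
import numpy as np

# Illustrative data; any numeric array behaves the same way
values = np.array([1, 2, 3, 4, 5])

# Loop version: the Python interpreter handles one element per iteration
doubled_loop = []
for v in values:
    doubled_loop.append(int(v) * 2)

# Vectorized version: one expression, applied to every element by NumPy
doubled_vec = values * 2

print(doubled_loop)
print(doubled_vec.tolist())
```

Both versions produce the same values; the vectorized one simply moves the loop out of Python and into NumPy.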

In this blog, we will look at some of the use cases where we can easily replace Python loops with Vectorization. This will help you save time and become more skillful in coding.

# USE CASE 1: Finding the Sum of Numbers

In [1]:
import time

In [2]:
import numpy as np

In [3]:
start = time.time()

# iterative sum
total = 0
# iterating through 1.5 Million numbers
for item in range(0, 1500000):
    total = total + item

print("sum is:" + str(total))
end = time.time()
print(end - start)

sum is:1124999250000
0.1659984588623047


In [4]:
start = time.time()
# vectorized sum - using numpy for vectorization
# np.arange creates the sequence of numbers from 0 to 1499999
print(
    np.sum(np.arange(1500000, dtype=np.int64))
)  # dtype=np.int64 was added by me (not in the original post) to prevent overflow where NumPy's default integer is 32-bit
end = time.time()
print(end - start)

1124999250000
0.005997896194458008
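
The `dtype=np.int64` argument matters because the result, about 1.1 trillion, does not fit in a 32-bit integer. The sketch below is an illustration under that assumption: it forces a 32-bit accumulator to show the silent wraparound and checks the 64-bit result against the closed form n(n-1)/2. The names `expected`, `sum64`, and `sum32` are introduced here for the example.

```python
import numpy as np

n = 1_500_000
# Closed form for 0 + 1 + ... + (n - 1)
expected = n * (n - 1) // 2

# 64-bit accumulation holds the full result
sum64 = int(np.sum(np.arange(n, dtype=np.int64)))

# Forcing a 32-bit accumulator makes the result wrap around silently
sum32 = int(np.sum(np.arange(n, dtype=np.int32), dtype=np.int32))

print("expected:", expected)
print("int64 sum:", sum64)
print("int32 sum:", sum32)
```

NumPy raises no error when an integer reduction overflows, so comparing against a known value like this is a cheap way to catch the problem.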


# USE CASE 2: Mathematical Operations (on DataFrame)

In [5]:
import pandas as pd

In [6]:
df = pd.DataFrame(
    np.random.randint(1, 50, size=(5000000, 4)), columns=("a", "b", "c", "d")
)

In [7]:
df.shape

(5000000, 4)

In [8]:
df.head()

|   | a  | b  | c  | d  |
|---|----|----|----|----|
| 0 | 15 | 17 | 28 | 43 |
| 1 | 22 | 45 | 9  | 41 |
| 2 | 39 | 41 | 29 | 6  |
| 3 | 42 | 48 | 7  | 26 |
| 4 | 8  | 41 | 31 | 47 |


In [9]:
start = time.time()

# Iterating through DataFrame using iterrows
for idx, row in df.iterrows():
    # creating a new column
    df.at[idx, "ratio"] = 100 * (row["d"] / row["c"])
end = time.time()
print(end - start)

663.7030336856842


In [10]:
start = time.time()
df["ratio"] = 100 * (df["d"] / df["c"])

end = time.time()
print(end - start)

0.0639948844909668
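
Before trusting a speed-up of this size, it is worth confirming that both approaches give the same numbers. The following sketch does that on a small frame; the frame `small`, its 1,000-row size, and the seed are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with the same column layout as df
rng = np.random.default_rng(0)
small = pd.DataFrame(
    rng.integers(1, 50, size=(1000, 4)), columns=["a", "b", "c", "d"]
)

# Row-by-row version, collected in a list
ratio_loop = [100 * (row["d"] / row["c"]) for _, row in small.iterrows()]

# Vectorized version: one expression over whole columns
ratio_vec = 100 * (small["d"] / small["c"])

print(np.allclose(ratio_loop, ratio_vec.to_numpy()))
```

`np.allclose` is used instead of exact equality because the results are floating-point values.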


# USE CASE 3: If-else Statements (on DataFrame)

In [11]:
start = time.time()

# Iterating through DataFrame using iterrows
for idx, row in df.iterrows():
    if row.a == 0:
        df.at[idx, "e"] = row.d
    elif 0 < row.a <= 25:
        df.at[idx, "e"] = (row.b) - (row.c)
    else:
        df.at[idx, "e"] = row.b + row.c
end = time.time()
print(end - start)

732.77907538414


In [13]:
start = time.time()
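# default (the "else" branch): applied to every row first
df["e"] = df["b"] + df["c"]
# rows with a <= 25 overwrite the default
df.loc[df["a"] <= 25, "e"] = df["b"] - df["c"]
# rows with a == 0 are assigned last so they take precedence
df.loc[df["a"] == 0, "e"] = df["d"]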
end = time.time()
print(end - start)

0.20899748802185059
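
The same if/elif/else logic can also be written with `np.select`, which takes a list of conditions and a matching list of values and uses the first condition that holds for each row. This is an alternative technique, not the one used in the original post; the frame `small` and the helper `rule` are hypothetical names introduced for the sketch.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; values start at 0 so the a == 0 branch is exercised
rng = np.random.default_rng(0)
small = pd.DataFrame(
    rng.integers(0, 50, size=(1000, 4)), columns=["a", "b", "c", "d"]
)

# Work on plain NumPy arrays for the vectorized version
a, b, c, d = (small[col].to_numpy() for col in "abcd")

# Conditions are checked in order, so a == 0 must come before a <= 25
small["e"] = np.select([a == 0, a <= 25], [d, b - c], default=b + c)

# Reference implementation: the row-by-row rule from the loop version
def rule(row):
    if row["a"] == 0:
        return row["d"]
    elif 0 < row["a"] <= 25:
        return row["b"] - row["c"]
    return row["b"] + row["c"]

expected = small.apply(rule, axis=1)
print((small["e"] == expected).all())
```

Compared with chained `df.loc` assignments, the conditions here read in the same order as the original `if`/`elif`/`else`, which can make longer rule sets easier to follow.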


# USE CASE 4 (Advanced): Solving Machine Learning/Deep Learning Networks

In [14]:
# setting initial values of m
m = np.random.rand(1, 5)

# input values for 5 million rows
x = np.random.rand(5000000, 5)

In [15]:
m

array([[0.47318244, 0.96073868, 0.19688667, 0.74153901, 0.19356319]])

In [16]:
x

array([[0.81624195, 0.55019938, 0.77697437, 0.5745752 , 0.50867909],
       [0.98930764, 0.9059751 , 0.07546262, 0.42033559, 0.56390302],
       [0.26157056, 0.36823057, 0.85534947, 0.1291754 , 0.4805555 ],
       ...,
       [0.53397642, 0.67850903, 0.20830354, 0.8302266 , 0.47389778],
       [0.59569115, 0.36704131, 0.56475781, 0.406436  , 0.31412645],
       [0.0029223 , 0.36978418, 0.87348796, 0.59194774, 0.01800727]])

In [18]:
tic = time.process_time()
zer = []  # one weighted sum per row of x
for i in range(0, 5000000):
    total = 0
    for j in range(0, 5):
        total = total + x[i][j] * m[0][j]

    zer.append(total)
toc = time.process_time()
print("Computation time = " + str((toc - tic)) + " seconds")

Computation time = 17.359375 seconds


In [19]:
tic = time.process_time()

# dot product: computes the weighted sum for every row of x in one call
np.dot(x, m.T)
toc = time.process_time()
print("Computation time = " + str((toc - tic)) + " seconds")

Computation time = 0.203125 seconds
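
To see that the dot product really computes the same weighted sums as the nested loop, the sketch below compares the two on a smaller input. The 1,000-row size, the seed, and the names `z_loop` and `z_vec` are assumptions made for the example; `x @ m.T` is equivalent to `np.dot(x, m.T)` for 2-D arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.random((1, 5))      # one row of 5 weights
x = rng.random((1000, 5))   # 1,000 input rows instead of 5 million

# Nested-loop version: one weighted sum per row
z_loop = []
for i in range(x.shape[0]):
    total = 0.0
    for j in range(x.shape[1]):
        total += x[i, j] * m[0, j]
    z_loop.append(total)

# Vectorized version: matrix product of (1000, 5) with (5, 1)
z_vec = x @ m.T

print(z_vec.shape)
print(np.allclose(z_loop, z_vec.ravel()))
```

The vectorized result has shape `(1000, 1)`, so `ravel()` flattens it before comparing with the list built by the loop.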
