<center><h1> Welcome to Vectorization in Python </h1></center>

In [1]:
import time
import numpy as np
import pandas as pd

## Use Vectorization — a super fast alternative to loops in Python

<b>What is Vectorization?</b>

Vectorization is the technique of expressing a computation as (NumPy) array operations on a dataset. Behind the scenes, the operation is applied to all the elements of an array or Series in optimized, compiled code, rather than in a Python 'for' loop that processes one element or row at a time.
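
As a minimal sketch of the idea, the snippet below doubles every element of a small array twice: once with an explicit loop and once with a single array expression. The array name `prices` and its values are made up for illustration.

```python
import numpy as np

# Hypothetical sample data, used only for illustration
prices = np.array([10.0, 20.0, 30.0, 40.0])

# Loop version: Python handles one element per iteration
doubled_loop = []
for p in prices:
    doubled_loop.append(float(p * 2))

# Vectorized version: one expression applied to the whole array
doubled_vec = prices * 2

print(doubled_loop)
print(doubled_vec)
```

Both versions produce the same values; the vectorized one simply hands the per-element work to NumPy instead of the Python interpreter.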

### Using Loops

In [2]:
start = time.time()

 
# iterative sum
total = 0
# iterating through 1.5 Million numbers
for item in range(0, 1500000):
    total = total + item


print('sum is:' + str(total))
end = time.time()

print(end - start)

sum is:1124999250000
0.12664484977722168


### Using vectorization

In [3]:
start = time.time()

# vectorized sum - using numpy for vectorization
# np.arange creates the sequence of numbers from 0 to 1499999
print(np.sum(np.arange(1500000)))

end = time.time()

print(end - start)

1124999250000
0.005387783050537109


In this run, vectorization took roughly 23x less time than the loop over `range` (about 0.127 s versus 0.0054 s).
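
A single `time.time()` measurement can be noisy, so the ratio will vary from run to run and machine to machine. One possible way to get a steadier comparison is the standard-library `timeit` module, sketched below. The helper names `loop_sum` and `vectorized_sum` are illustrative, and `dtype=np.int64` is set explicitly on the assumption that some platforms default to 32-bit integers, which would overflow here.

```python
import timeit
import numpy as np

N = 1_500_000

def loop_sum(n):
    """Sum 0..n-1 with a plain Python loop."""
    total = 0
    for item in range(n):
        total += item
    return total

def vectorized_sum(n):
    """Sum 0..n-1 with NumPy; int64 avoids overflow on 32-bit defaults."""
    return int(np.arange(n, dtype=np.int64).sum())

# Both approaches should agree on the result
loop_result = loop_sum(N)
vec_result = vectorized_sum(N)

# Take the best of three runs for each approach to reduce noise
loop_time = min(timeit.repeat(lambda: loop_sum(N), number=1, repeat=3))
vec_time = min(timeit.repeat(lambda: vectorized_sum(N), number=1, repeat=3))

print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s, "
      f"speedup: {loop_time / vec_time:.1f}x")
```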

### Using Loops in Pandas

In [7]:
df = pd.DataFrame(np.random.randint(1, 50, size=(5000000, 4)), columns=('a','b','c','d'))
df.shape
# (5000000, 4)
df.head()

,a,b,c,d
0,48,21,42,32
1,11,28,45,41
2,26,5,8,9
3,16,38,9,18
4,38,19,41,8


In [8]:
start = time.time()

# Iterating through DataFrame using iterrows
for idx, row in df.iterrows():
    # creating a new column 
    df.at[idx,'ratio'] = 100 * (row["d"] / row["c"])  
end = time.time()
print(end - start)

83.0325939655304


### Using vectorization in Pandas

In [9]:
start = time.time()
df["ratio"] = 100 * (df["d"] / df["c"])

end = time.time()
print(end - start)

0.042204856872558594


### Using Loops in If-else Statements 

In [11]:
start = time.time()

# Iterating through DataFrame using iterrows
for idx, row in df.iterrows():
    if row.a == 0:
        df.at[idx,'e'] = row.d
    elif 0 < row.a <= 25:
        df.at[idx,'e'] = row.b - row.c
    else:
        df.at[idx,'e'] = row.b + row.c

end = time.time()

print(end - start)

135.10734510421753


### Using vectorization in If-else Statements

In [13]:
start = time.time()
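# default (else) branch
df['e'] = df['b'] + df['c']
# 0 < a <= 25, matching the elif branch of the loop version
df.loc[(df['a'] > 0) & (df['a'] <= 25), 'e'] = df['b'] - df['c']
# a == 0
df.loc[df['a'] == 0, 'e'] = df['d']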
end = time.time()
print(end - start)

0.13132715225219727
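
The same if/elif/else logic can also be written as a single expression with `np.select`, which takes a list of conditions and a matching list of choices. This is an alternative sketch rather than the approach used in the cell above. The small DataFrame is made-up data, chosen so that every branch (including `a == 0`) is exercised.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering all three branches
df = pd.DataFrame({
    "a": [0, 10, 25, 30],
    "b": [5, 6, 7, 8],
    "c": [1, 2, 3, 4],
    "d": [9, 9, 9, 9],
})

# Conditions are checked in order; the first match wins
conditions = [
    df["a"] == 0,
    (df["a"] > 0) & (df["a"] <= 25),
]
choices = [
    df["d"],            # if a == 0
    df["b"] - df["c"],  # elif 0 < a <= 25
]

# default plays the role of the else branch
df["e"] = np.select(conditions, choices, default=df["b"] + df["c"])
print(df)
```

Listing the conditions in the same order as the if/elif chain keeps the vectorized version easy to compare with the loop.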


## Use Vectorization: a super fast alternative to loops for Machine Learning/Deep Learning computations in Python

In [24]:
# setting initial values of m 
m = np.random.rand(1,5)

# input values for 5 million rows
x = np.random.rand(5000000,5)

In [25]:
## Using loops

In [26]:
# store results separately so the input x is not overwritten
result = np.zeros(5000000)
tic = time.process_time()

for i in range(0,5000000):
    total = 0
    for j in range(0,5):
        total = total + x[i][j]*m[0][j]

    result[i] = total

toc = time.process_time()
print ("Computation time = " + str((toc - tic)) + " seconds")

Computation time = 10.212377000000004 seconds


In [27]:
## Using vectorization

In [29]:
tic = time.process_time()

# dot product: (5000000, 5) x (5, 1) -> (5000000, 1)
result_vectorized = np.dot(x, m.T)

toc = time.process_time()
print ("Computation time = " + str((toc - tic)) + " seconds")

Computation time = 0.04974600000002738 seconds
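
Before trusting a timing comparison, it helps to confirm that both approaches compute the same values. The sketch below does this on a much smaller input (1,000 rows, an arbitrary size chosen to keep it fast) and uses a seeded generator so the run is repeatable. The names `loop_result`, `vec_result` and `matmul_result` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for repeatability
m = rng.random((1, 5))          # weights, shape (1, 5)
x = rng.random((1000, 5))       # inputs, shape (1000, 5)

# Nested-loop weighted sum, one row at a time
loop_result = np.zeros(x.shape[0])
for i in range(x.shape[0]):
    total = 0.0
    for j in range(x.shape[1]):
        total += x[i, j] * m[0, j]
    loop_result[i] = total

# Vectorized: (1000, 5) dot (5, 1) -> (1000, 1)
vec_result = np.dot(x, m.T)

# The @ operator performs the same matrix product for 2-D arrays
matmul_result = x @ m.T

# Compare with a tolerance, since floating-point summation order may differ
same = np.allclose(loop_result, vec_result.ravel())
print("loop and np.dot agree:", same)
```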
