# About

A reference document with example code showing how to implement vectorized functions in NumPy.

In [1]:
from __future__ import division, print_function

%matplotlib inline
# Toggle on/off
# %matplotlib notebook

import os
import numpy as np
import pandas as pd
import scipy.io as sio
import time

# Vectorization

To process large data arrays, our goal should be to apply functions to whole arrays *as directly as possible*, rather than applying them to each array entry one by one. NumPy runs whole-array operations in compiled code, whereas an explicit Python loop pays interpreter overhead on every entry. Overall, try to **avoid loops**.

## Example 1: Boolean arrays

As a simple example, suppose we have a random matrix **A**, and we want an array of the same shape whose entries are **true** where $a_{ij} > 0$ and **false** otherwise. For comparison, we also time each approach. In the run recorded below, with a $2500 \times 2500$ array, the looped method takes about 2.6 s while the vectorized method takes about 0.004 s.

In [2]:
N = 2500
A = np.random.normal(size=(N,N))

### Vectorized method:

In [3]:
start = time.time()
A_bool1 = (A > 0)
end = time.time()
print(f'Time taken = {end - start}')

Time taken = 0.003989458084106445


### Loop method:

In [4]:
start = time.time()
A_bool2 = np.ones(A.shape, dtype=bool)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        A_bool2[i,j] = (A[i,j] > 0)
        
end = time.time()
print(f'Time taken = {end - start}')

Time taken = 2.6080269813537598


### Both methods produce the same Boolean array:

In [5]:
print(np.array_equal(A_bool1, A_bool2))

True
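The timings above come from a single pair of `time.time()` calls, which can be noisy and, on some platforms, too coarse for very fast operations. As a minimal sketch, the standard-library `timeit` module can repeat each measurement instead. The helper names `vectorized` and `looped`, the array `A_small`, and the size `n = 200` are assumptions introduced here for illustration; they are not part of the original notebook.

```python
import timeit

import numpy as np

# Smaller array than in the example above so the loop finishes quickly
# (n = 200 is an illustrative choice, not taken from the notebook).
n = 200
rng = np.random.default_rng(0)
A_small = rng.normal(size=(n, n))


def vectorized(arr):
    """Compare every entry against zero in one array operation."""
    return arr > 0


def looped(arr):
    """Perform the same comparison one entry at a time."""
    out = np.empty(arr.shape, dtype=bool)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            out[i, j] = arr[i, j] > 0
    return out


# Take the best of several repeats to reduce noise from other processes.
t_vec = min(timeit.repeat(lambda: vectorized(A_small), number=10, repeat=3)) / 10
t_loop = min(timeit.repeat(lambda: looped(A_small), number=1, repeat=3))

print(f'Vectorized: {t_vec:.6f} s per call')
print(f'Looped:     {t_loop:.6f} s per call')
print(np.array_equal(vectorized(A_small), looped(A_small)))
```

The exact numbers depend on the machine, so none are quoted here; the point is only that repeated measurements give a steadier comparison than a single run.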


## Example 2: Multiplying arrays

Here, we go over how multiplying arrays works in NumPy. The $*$ operator is *elementwise multiplication*, meaning that if $A$ has entries $a_{ij}$ and $B$ has entries $b_{ij}$, and both have the same shape, then $A*B$ will return an array with entries $a_{ij}b_{ij}$.

This is different from MATLAB, where the $*$ operator is *matrix multiplication*. That is, $A*B$ will return an array with the $ij$th entry being $\sum_{k=1}^N a_{ik}b_{kj}$, assuming that the number of columns in $A$ and the number of rows in $B$ are both $N$.

In NumPy, if we want to multiply arrays $A$ and $B$ in the matrix sense (as in MATLAB), we use the function `numpy.matmul(A,B)` or, equivalently, the `@` operator (`A @ B`).
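To make the distinction concrete, here is a small sketch on two $2 \times 2$ arrays. The names `A_demo` and `B_demo` and their values are illustrative assumptions, chosen so the results can be checked by hand; the `@` operator used in the last line is the infix form of `numpy.matmul`.

```python
import numpy as np

# Illustrative 2x2 arrays (values chosen for easy hand-checking).
A_demo = np.array([[1, 2],
                   [3, 4]])
B_demo = np.array([[5, 6],
                   [7, 8]])

elementwise = A_demo * B_demo            # entries a_ij * b_ij
matrix_prod = np.matmul(A_demo, B_demo)  # entries sum_k a_ik * b_kj

print(elementwise)
# [[ 5 12]
#  [21 32]]
print(matrix_prod)
# [[19 22]
#  [43 50]]

# The @ operator gives the same result as np.matmul.
print(np.array_equal(matrix_prod, A_demo @ B_demo))  # True
```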

Elementwise multiplication is straightforward to write in vectorized form, as we show below:

In [6]:
M, N = 2500, 1000
A = np.random.randint(low=0, high=10, size=(M,N))
B = np.random.randint(low=0, high=10, size=(M,N))

### Vectorized method:

In [7]:
start = time.time()
C1 = A * B
end = time.time()
print(f'Time taken = {end - start}')

Time taken = 0.004018545150756836


### Looped method:

In [8]:
start = time.time()
C2 = np.zeros(A.shape)
for i in range(C2.shape[0]):
    for j in range(C2.shape[1]):
        C2[i,j] = A[i,j] * B[i,j]
end = time.time()
print(f'Time taken = {end - start}')

Time taken = 1.1000587940216064


### Both methods produce the same product array:

In [9]:
print(np.array_equal(C1, C2))

True


## Example 3: Take all differences of a vector

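Suppose we have a vector $\vec{v}$ and we want to compute a square matrix $M$ with entries $M_{ij} = v_j - v_i$. We can use broadcasting to take all the differences at once: combining a column version of the vector, of shape $(N, 1)$, with the original vector, of shape $(N,)$, produces an $(N, N)$ array of pairwise differences.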
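The index order matters here, so it helps to check the broadcasting on a vector small enough to verify by hand. The following is a minimal sketch; the names `v`, `row`, `col`, and `M_demo` and the values in `v` are assumptions introduced for illustration.

```python
import numpy as np

v = np.array([1, 4, 6])

row = v[np.newaxis, :]   # shape (1, 3): v_j varies along the columns
col = v[:, np.newaxis]   # shape (3, 1): v_i varies along the rows

# Broadcasting stretches both operands to shape (3, 3),
# so M_demo[i, j] = v[j] - v[i].
M_demo = row - col

print(M_demo)
# [[ 0  3  5]
#  [-3  0  2]
#  [-5 -2  0]]
```

Swapping the operands (`col - row`) gives the negated matrix, with entries $v_i - v_j$.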

In [10]:
N = 60
vecV = np.random.randint(low=1, high=10, size=(N,))

### Vectorized method:

In [11]:
start = time.time()
# Row minus column: vecV has shape (N,) and vecV[:,np.newaxis] has shape (N,1),
# so broadcasting gives M1[i,j] = vecV[j] - vecV[i]
M1 = vecV - vecV[:,np.newaxis]
end = time.time()
print(f'Time taken = {end - start}')

Time taken = 0.0


### Looped method:

In [12]:
start = time.time()
M2 = np.zeros((N,N))
for i in range(N):
    for j in range(N):
        M2[i,j] = vecV[j] - vecV[i]
end = time.time()
print(f'Time taken = {end - start}')

Time taken = 0.0009920597076416016


### Both methods produce the same matrix:

In [13]:
print(np.array_equal(M1,M2))

True


# Useful functions

Here, we list some useful NumPy functions that can be applied to array processing in a vectorized manner.

## Reshape arrays

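The function `numpy.reshape(A, shape)` returns an array containing the same data as `A`, arranged in the new shape `shape`. Where possible, the result is a view of `A` rather than a copy, so writing to it can also change `A`. If `shape` is set to `-1`, a flattened 1-dimensional array is returned.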

In [14]:
A = np.random.random(size=(2,4))
a_vec = np.reshape(A, -1)
print(f'Number of dimensions of a_vec: {a_vec.ndim}')

Number of dimensions of a_vec: 1
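As a minimal sketch of two details of `numpy.reshape`: a single `-1` inside a shape tuple is inferred from the remaining dimensions, and for a contiguous array the result shares memory with the original. The names `A_demo`, `flat`, and `tall` are illustrative assumptions.

```python
import numpy as np

A_demo = np.arange(8).reshape(2, 4)   # entries 0..7 in two rows

flat = np.reshape(A_demo, -1)         # shape (8,)
tall = np.reshape(A_demo, (4, -1))    # -1 is inferred as 2, giving shape (4, 2)

print(flat.shape, tall.shape)         # (8,) (4, 2)

# For a contiguous array, reshape returns a view, not a copy.
print(np.shares_memory(A_demo, flat))  # True

# Writing through the view therefore changes the original array.
flat[0] = 100
print(A_demo[0, 0])                   # 100
```

If an independent array is needed, calling `.copy()` on the reshaped result avoids modifying the original.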


## Transposing arrays

Given a 2-dimensional array `A`, taking `A.T` gives the transpose of `A`. However, if `A` is a 1-dimensional array, `A.T` returns it unchanged, since there is only one axis to reverse. In that case, we can write `A[:,numpy.newaxis]` to obtain a 2-dimensional column array of `A`.

In [15]:
A = np.random.random(size=(10,))
a_trans = A[:,np.newaxis]
print(f'(rows, columns) of a_trans: {a_trans.shape}')

(rows, columns) of a_trans: (10, 1)
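The behaviour on 1-dimensional arrays can be checked directly. This is a small sketch; the names `a`, `col`, `row`, and `col_alt` are assumptions introduced for illustration.

```python
import numpy as np

a = np.arange(3)             # 1-D array, shape (3,)

print(a.T.shape)             # (3,): .T leaves a 1-D array unchanged

col = a[:, np.newaxis]       # column array, shape (3, 1)
row = a[np.newaxis, :]       # row array, shape (1, 3)
col_alt = a.reshape(-1, 1)   # another way to build the same column

print(col.shape, row.shape)  # (3, 1) (1, 3)

# Once the array is 2-D, .T behaves as expected.
print(np.array_equal(col, row.T))    # True
print(np.array_equal(col, col_alt))  # True
```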


# TEST

In [28]:
w = np.random.randint(low=1,high=5,size=(5,))
X = w * np.ones((5,5))
# Y = np.zeros((5,))
# np.fill_diagonal(X, Y)

In [27]:
N = 68
num_trials = N**2
start = time.time()
for i in range(num_trials):
    X = np.random.random(size=(N,N))
    v = np.random.random(size=(N,))
    sol = np.linalg.solve(X,v)
end = time.time()
print(f'Time taken = {end - start}')

Time taken = 0.508638858795166
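The loop above solves many independent linear systems one at a time. As a hedged sketch of how this could be vectorized, `numpy.linalg.solve` also accepts a stack of matrices and solves them in a single call. The sizes `n` and `num_systems`, the names `X_stack` and `v_stack`, and the added multiple of the identity (used only to keep the demo matrices well away from singular) are assumptions, not part of the test cell above.

```python
import numpy as np

n = 8             # size of each system (illustrative)
num_systems = 50  # number of independent systems (illustrative)

rng = np.random.default_rng(0)

# Assumption for the demo: adding n * I keeps every random matrix
# comfortably non-singular. The loop above uses plain random matrices.
X_stack = rng.random(size=(num_systems, n, n)) + n * np.eye(n)
v_stack = rng.random(size=(num_systems, n))

# A trailing axis of length 1 makes each right-hand side an explicit
# column, so the call is unambiguous; [..., 0] drops that axis again.
sol_stack = np.linalg.solve(X_stack, v_stack[..., np.newaxis])[..., 0]

# Reference: solve the same systems one at a time.
sol_loop = np.array([np.linalg.solve(X_stack[k], v_stack[k])
                     for k in range(num_systems)])

print(sol_stack.shape)                   # (50, 8)
print(np.allclose(sol_stack, sol_loop))  # True
```

Unlike the loop above, this sketch generates all the matrices up front, so its timing would not be directly comparable without moving the random generation out of the loop as well.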

