<img src="https://www.mines.edu/webcentral/wp-content/uploads/sites/267/2019/02/horizontallightbackground.jpg" width="100%"> 
### CSCI250 Python Computing: Building a Sensor System
<hr style="height:5px" width="100%" align="left">

# `numpy`: 1D vectorization

# Objectives
* introduce fast vectorized `numpy` array operations
* evaluate computational speed-up relative to loops
* discuss aggregations and masking applied to `numpy` arrays 

# Resources
* [numpy.org](http://www.numpy.org)
* [`numpy` user guide](https://docs.scipy.org/doc/numpy/user)
* [`numpy` reference](https://docs.scipy.org/doc/numpy/reference)
* [`numpy` ufuncs](https://numpy.org/doc/stable/reference/ufuncs.html)

# Definition

**Vectorization**: a computational style in which a single operation is applied to a whole array at once, instead of many smaller operations executed one element at a time in a loop.

Vectorization has multiple advantages:
* **compact appearance**: the code resembles math
* **error reduction**: the code is shorter, less complex
* **execution performance**: the code runs much faster

# Vectorized calculations

Computational speed-up can be achieved using two main methods:
1. universal functions 
2. fast array selection 

# 1. Universal functions (ufuncs)

* operate element-by-element on `numpy` arrays
* are often implemented in compiled C code

Some are invoked automatically by the corresponding infix operator:
* `np.add(a,b)` is called for `a + b`
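As a minimal sketch of this equivalence (the small arrays `a` and `b` below are illustrative values, not part of the original notebook), the infix form and the explicit ufunc call give the same result:

```python
import numpy as np

# illustrative inputs (assumed values)
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

c_infix = a + b          # infix notation
c_ufunc = np.add(a, b)   # explicit ufunc call

print(c_infix)                            # [11. 22. 33.]
print(np.array_equal(c_infix, c_ufunc))   # True
print(isinstance(np.add, np.ufunc))       # True: np.add is a ufunc object
```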

In [None]:
import numpy as np
import math,time

## arithmetic ufuncs

* `np.add(),np.multiply(),np.floor_divide(),...`

or

* `+, *, //, ...`

In [None]:
n = int(1e6)

a = np.linspace(0,1,n, dtype=float)
b = np.linspace(1,0,n, dtype=float)
c = np.empty(       n, dtype=float)

**non-vectorized code** (uses loops)

In [None]:
tick = time.time()

for i in range(n):
    c[i] = a[i] * b[i]
    
tock = time.time()
dLOOP = int((tock-tick)*1e6)

print( int(c.sum()), dLOOP,'us' )

**vectorized code** (does not use loops)

In [None]:
tick = time.time()

c = a * b
    
tock = time.time()
dVECT = int((tock-tick)*1e6)

print( int(c.sum()), dVECT,'us' )

The **execution time** ratio is

In [None]:
int(dLOOP/max(dVECT,1))  # guard against a 0 us reading
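A single `time.time()` measurement can be noisy, and for very fast operations it may even read as zero. One possible alternative, sketched here under the assumption that a best-of-several-runs estimate is acceptable, uses `time.perf_counter()`. The helper names `best_time`, `loop_multiply`, and `vect_multiply` are hypothetical and introduced only for this illustration.

```python
import time
import numpy as np

def best_time(func, repeats=5):
    """Return the shortest wall-clock time (seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        tick = time.perf_counter()
        func()
        tock = time.perf_counter()
        best = min(best, tock - tick)
    return best

n = int(1e5)  # smaller than in the cells above, to keep repeated runs short
a = np.linspace(0, 1, n)
b = np.linspace(1, 0, n)

def loop_multiply():
    # non-vectorized: one multiplication per loop iteration
    c = np.empty(n)
    for i in range(n):
        c[i] = a[i] * b[i]
    return c

def vect_multiply():
    # vectorized: one ufunc call for the whole array
    return a * b

t_loop = best_time(loop_multiply)
t_vect = best_time(vect_multiply)

print(f"loop: {t_loop*1e6:.0f} us, vectorized: {t_vect*1e6:.0f} us")
print(f"speed-up: {t_loop / t_vect:.0f}x")
```

The measured ratio depends on the machine and on the array size, so no particular value should be expected.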

## comparison ufuncs
* `np.less(),np.greater(),np.not_equal(),...`

or

* `<, >, !=, ...`

In [None]:
n = 11
a = np.linspace(0,1,n, dtype=float)
b = np.linspace(1,0,n, dtype=float)
print(a)
print(b)

In [None]:
print( np.greater(a,b) )  # ufunc

In [None]:
print( a > b )            # infix

## bitwise ufuncs
* `np.bitwise_and(),np.left_shift(),...`

or

* `&, <<, ...`

In [None]:
n = 11
a = np.linspace(0,n,n, dtype=int)
b = np.linspace(n,0,n, dtype=int)
print(a)
print(b)

In [None]:
print( np.bitwise_and(a,b) ) # ufunc

In [None]:
print( a & b )               # infix

## logical ufuncs

`np.logical_and(),np.logical_or(),...`

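**N.B.**: no infix equivalents, since Python's `and`, `or`, and `not` do not operate element-wise on arrays (for boolean arrays, the bitwise `&`, `|`, `~` give the same result).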

In [None]:
n = 11
a = np.linspace(0,1,n, dtype=float) > 0.25
b = np.linspace(0,1,n, dtype=float) > 0.75
print(a)
print(b)

In [None]:
print( np.logical_and(a,b)) # ufunc

In [None]:
print( a and b )            # raises ValueError: `and` is not element-wise
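The behavior above can be summarized in a small sketch. The five-element array `x` and the thresholds are illustrative assumptions; the `try`/`except` simply captures the error that `and` raises on arrays.

```python
import numpy as np

x = np.linspace(0, 1, 5)      # [0.   0.25 0.5  0.75 1.  ]
a = x > 0.2                   # boolean array
b = x < 0.8                   # boolean array

both = np.logical_and(a, b)   # element-wise logical AND
print(both)                   # [False  True  True  True False]

# for boolean arrays, the bitwise operator gives the same result
print(np.array_equal(both, a & b))   # True

# Python's `and` asks for a single truth value, which is ambiguous for arrays
try:
    a and b
except ValueError as err:
    print("ValueError:", err)
```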

## trigonometric ufuncs
`np.sin(),np.arcsin(),np.sinh(),...`

## exp/log ufuncs
`np.exp(),np.log(),np.log10(),...`

`...`
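As a short illustration of these ufuncs (the sample grid `x` is an assumed input), the sketch below compares an element-by-element call to `math.sin` with a single call to `np.sin`, and checks that `np.log` inverts `np.exp`:

```python
import math
import numpy as np

x = np.linspace(0, 2 * np.pi, 1000)   # illustrative sample points

# non-vectorized: call math.sin once per element
y_loop = np.array([math.sin(v) for v in x])

# vectorized: one ufunc call over the whole array
y_vect = np.sin(x)

print(np.allclose(y_loop, y_vect))          # True
print(np.allclose(np.log(np.exp(x)), x))    # True
```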

## aggregation ufuncs

`np.sum()`, `np.prod()` (reductions built on the ufuncs `np.add` and `np.multiply`)

In [None]:
n = int(1e6)
a = np.linspace(0,1,n, dtype=float) > 0.25

In [None]:
tick = time.time()

s = sum(a)      # Python built-in: iterates element by element

tock = time.time()
print( int(s), int((tock-tick)*1e6),'us' )

In [None]:
tick = time.time()

s = np.sum(a)   # numpy aggregation: runs in compiled code

tock = time.time()
print( int(s), int((tock-tick)*1e6), 'us' )

## more aggregation ufuncs

`np.min()`, `np.max()`

`np.all()`,`np.any()`

`np.mean()`, `np.median()`, `np.var()`, `np.std()`

`np.argmin()`, `np.argmax()`, `np.nanmin()`, `np.nanmax()`

`...`
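A few of these aggregations are sketched below on small arrays chosen for illustration (the values are assumptions, not data from the notebook). The last two lines show how the `nan`-aware variants differ from the plain ones.

```python
import numpy as np

a = np.array([3.0, 1.0, 4.0, 1.0, 5.0])   # illustrative values

print(np.min(a), np.max(a))          # 1.0 5.0
print(np.argmin(a), np.argmax(a))    # 1 4  (index of the first occurrence)
print(np.mean(a), np.median(a))      # 2.8 3.0
print(np.any(a > 4), np.all(a > 0))  # True True

b = np.array([3.0, np.nan, 4.0])
print(np.max(b))      # nan: NaN propagates through np.max
print(np.nanmax(b))   # 4.0: NaN values are ignored
```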

<img src="http://www.dropbox.com/s/fcucolyuzdjl80k/todo.jpg?raw=1" width="10%" align="right">

Add examples to experiment with [`numpy` ufuncs](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs). 

Compare the execution time of vectorized and non-vectorized code.

**N.B.**: use large arrays for a meaningful speed-up comparison.

# 2. Fast array selection

Methods to efficiently slice `numpy` arrays:
* fancy indexing
* array masking

## fancy indexing
Array indexing using arrays of integers.
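Before the timing comparison, a minimal sketch on a small array may help (the arrays `a` and `k` are illustrative assumptions). An integer index array can list positions in any order and can repeat them. For regularly spaced positions, a slice selects the same elements.

```python
import numpy as np

a = np.arange(10, 20)          # [10 11 12 13 14 15 16 17 18 19]
k = np.array([0, 3, 3, 9])     # integer indexes; repeats are allowed

print(a[k])                    # [10 13 13 19]

# a regularly spaced selection can also be written as a slice
k3 = np.arange(0, a.size, 3)   # [0 3 6 9]
print(np.array_equal(a[k3], a[::3]))   # True
```

Fancy indexing returns a copy of the selected elements, whereas a slice returns a view of the original array.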

In [None]:
nin = int(1e6)       # n before decimation
jmp = 100            #   decimation factor
nou = nin//jmp       # n  after decimation

In [None]:
a = np.linspace(0,1,nin, dtype=float)
print(a.size)

In [None]:
c = np.empty(nou, dtype=float)
print(c.size)

**non-vectorized code** (uses loops)

In [None]:
tick = time.time()

for i in range(nou):
    c[i] = a[ i*jmp ]
    
tock = time.time()
dLOOP = int((tock-tick)*1e6)

print( int(c.sum()), dLOOP,'us' )

**vectorized code** (does not use loops)

In [None]:
# form array of indexes without a Python-level loop
k = np.arange(0,nin,jmp)
print(k.size)

In [None]:
tick = time.time()

c = a[k]
    
tock = time.time()
dVECT = int((tock-tick)*1e6)

print( int(c.sum()), dVECT,'us' )

In [None]:
int(dLOOP/max(dVECT,1))  # guard against a 0 us reading

## array masking
Array selection using vectorized logical operations.
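A minimal sketch of the idea on a small array (the values and thresholds are illustrative assumptions): a comparison produces a boolean mask, and indexing with that mask keeps only the elements where it is `True`.

```python
import numpy as np

a = np.array([0.1, 0.6, 0.9, 0.7, 0.3])   # illustrative values

# parentheses are required: | binds more tightly than < and >
mask = (a < 0.5) | (a > 0.75)

print(mask)         # [ True False  True False  True]
print(a[mask])      # [0.1 0.9 0.3]
print(mask.sum())   # 3, the number of selected elements
```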

In [None]:
n = int(1e6)

a = np.linspace(0,1,n, dtype=float)

aLo = 0.50
aHi = 0.75

**non-vectorized code** (uses loops)

In [None]:
c = np.zeros(n, dtype=float) 

In [None]:
tick = time.time()

j = 0
for i in range(n):
    if (a[i]<aLo) | (a[i]>aHi):
        c[j] = a[i]
        j += 1
    
tock = time.time()
dLOOP = int((tock-tick)*1e6)

print( int(c.sum()), dLOOP,'us' )

print(c.size) # output keeps the input size; only the first j entries are filled (inefficient)

**vectorized code** (does not use loops)

In [None]:
tick = time.time()

c = a[ (a<aLo) | (a>aHi) ]
    
tock = time.time()
dVECT = int((tock-tick)*1e6)

print( int(c.sum()), dVECT,'us' )

print(c.size) # output is sized automatically

In [None]:
int(dLOOP/max(dVECT,1))  # guard against a 0 us reading

<img src="http://www.dropbox.com/s/fcucolyuzdjl80k/todo.jpg?raw=1" width="10%" align="right">

Add examples to experiment with 
* fancy indexing 
* fast array masking.

Compare the execution time of vectorized and non-vectorized code.

**N.B.**: use large arrays for a meaningful speed-up comparison.

<img src="https://www.dropbox.com/s/wj23ce93pa9j8pe/demo.png?raw=1" width="10%" align="left">

# Exercise

A series for computing $\pi$ (exact in the limit $n \to \infty$) is:

$$\dfrac{\pi}{\sqrt{12}} = \sum\limits_{i=0}^{n} \dfrac{(-1)^{i}}{3^i(2i+1)}$$

* Use `numpy` vectorization to compute $\pi$ for $n$ terms.
* Evaluate the speed-up of the vectorized implementation.

**N.B.**: do not use loops.