# Lab 3 - Computational Efficiency in numpy
- **Author:** Emily Aiken ([emilyaiken@berkeley.edu](mailto:emilyaiken@berkeley.edu)) (based on Qutub Khan Vajihi and Dimitris Papadimitriou's Labs)
- **Date:** February 9, 2022
- **Course:** INFO 251: Applied Machine Learning

### Learning Objectives:
By the end of the lab, you will be able to:

* Use key numpy functions for matrix creation and manipulation
* Use vectorization for defining complex matrix operations
* Understand the trade-offs between 'for' loops and vectorized computation

### References:
* https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html
* https://numpy.org/doc/stable/user/basics.broadcasting.html

## I. Introduction to numpy

In [None]:
import numpy as np
import pandas as pd
import time

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
%matplotlib inline

#### Tuples vs. lists vs. arrays vs. matrices

In [None]:
# Tuple
tup = (1, 2, 3)
tup

In [None]:
# List
lst = [1, 2, 3]
lst

In [None]:
# Array
arr = np.array(tup)
arr

In [None]:
# 2D Matrix
mat = np.array([[1, 2, 3], [4, 5, 6]])
mat

#### Dimensions

In [None]:
mat.shape

In [None]:
mat.ndim

#### Useful numpy functions - matrix creation

In [None]:
# Arange - https://numpy.org/doc/stable/reference/generated/numpy.arange.html
np.arange(0, 10, 1)

In [None]:
# Linspace - https://numpy.org/doc/stable/reference/generated/numpy.linspace.html
np.linspace(0, 10, 10, endpoint=False)

In [None]:
# Logspace - https://numpy.org/doc/stable/reference/generated/numpy.logspace.html
np.logspace(0, 10, 10, endpoint=False, base=2)

In [None]:
# Zeros - https://numpy.org/doc/stable/reference/generated/numpy.zeros.html
np.zeros(5)

In [None]:
# Ones - https://numpy.org/doc/stable/reference/generated/numpy.ones.html
np.ones(5)

In [None]:
# Full - https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.full.html
np.full(5, 12)

In [None]:
# Meshgrid - https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html
xvalues = np.array([0, 1, 2, 3, 4])
yvalues = np.array([0, 1, 2, 3, 4])
print(np.meshgrid(xvalues, yvalues))
xx, yy = np.meshgrid(xvalues, yvalues)
plt.plot(xx, yy, marker='.', color='k', linestyle='none');

#### Useful numpy functions - matrix manipulation

In [None]:
# Vstack - https://numpy.org/doc/stable/reference/generated/numpy.vstack.html
matrix1 = np.array([[1, 2, 3], [1, 2, 3]])
matrix2 = np.array([[5, 6, 7], [5, 6, 7]])
print(matrix1)
print(matrix2)

np.vstack([matrix1, matrix2]) # 2D arrays must have the same 2nd dimension (number of columns)

In [None]:
# Hstack - https://numpy.org/doc/stable/reference/generated/numpy.hstack.html
matrix1 = np.array([[1, 2, 3], [1, 2, 3]])
matrix2 = np.array([[5, 6, 7], [5, 6, 7]])
print(matrix1)
print(matrix2)

np.hstack([matrix1, matrix2]) # 2D arrays must have the same 1st dimension (number of rows)

In [None]:
# Concatenate - https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html
matrix1 = np.array([[1, 1, 1]])
matrix2 = np.array([[2, 2, 2], [2, 2, 2], [2, 2, 2]])
matrix3 = np.array([[3, 3, 3], [3, 3, 3]])

# Arrays must have the same dimensions except for the concatenation axis
np.concatenate([matrix1, matrix2, matrix3], axis=0) 

In [None]:
# Transpose - https://numpy.org/doc/stable/reference/generated/numpy.transpose.html
mat = np.array([[1, 2, 3], [4, 5, 6]])
print(mat)
print('----')
print(mat.T)

## II. Matrix Operations and Broadcasting

#### Matrix addition

In [None]:
# Addition and subtraction -- adding a constant
arr = np.array([[1, 1, 1], [2, 2, 2]])
print(arr)
print('----')
print(arr + 1)

In [None]:
# Addition and subtraction -- element-wise
arr1 = np.array([[1, 1, 1], [1, 2, 3]])
arr2 = np.array([[2, 2, 2], [3, 4, 5]])
arr1 + arr2

In [None]:
# Addition and subtraction - broadcasting
arr1 = np.array([[1, 1, 1], [1, 2, 3]])
arr2 = np.array([1, 1, 2])
arr1 + arr2

<img src="Images/Br1.png" width=700 height=700 />

In [None]:
# Key rule for broadcasting: the size of the "trailing axes" must be the same (or one must be 1)
arr1 = np.array([0, 10, 20, 30]).reshape(4, 1)
arr2 = np.array([0, 1, 2]).reshape(1, 3)
arr1 + arr2
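
The broadcasting rule can also be checked on shapes alone, without building the arrays. The following is a minimal sketch, assuming NumPy 1.20 or later (where `np.broadcast_shapes` is available). Shapes are compared starting from the trailing axis, and each pair of sizes must either match or contain a 1. The variable names are illustrative only.

```python
import numpy as np

# Shapes are aligned from the trailing (rightmost) axis.
# (4, 1) and (1, 3): every axis pair contains a 1, so they broadcast.
result_shape = np.broadcast_shapes((4, 1), (1, 3))
print(result_shape)

# (2, 3) and (3,): the trailing axes are both 3, so they broadcast.
row_shape = np.broadcast_shapes((2, 3), (3,))
print(row_shape)

# (2, 3) and (2,): the trailing axes are 3 and 2, so broadcasting fails.
try:
    np.broadcast_shapes((2, 3), (2,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)
```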

#### Matrix multiplication

In [None]:
# Multiplication by a constant
arr = np.array([[10, 20, 30], [50, 60, 70]])
arr*2

<img src="Images/Hadamard.png" width=700 height=700/>

In [None]:
# Element-wise multiplication (Hadamard product)
arr1 = np.array([[3, 5, 7], [4, 9, 8]])
arr2 = np.array([[1, 6, 3], [0, 2, 9]])
arr1*arr2

<img src="Images/DotProduct.svg" width=700 height=700/>

In [None]:
# Dot product (matrix multiplication)
arr1 = np.array([[1, 2, 3], [4, 5, 6]])       # shape (2, 3)
arr2 = np.array([[7, 8], [9, 10], [11, 12]])  # shape (3, 2)

# Key for dot product: the second dimension (columns) of matrix 1
# must equal the first dimension (rows) of matrix 2
np.dot(arr1, arr2)
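
As a hedged sketch of the shape rule for matrix products: for 2D arrays, `np.dot(A, B)` requires the number of columns of `A` to equal the number of rows of `B`, and the result has one row per row of `A` and one column per column of `B`. The arrays `A` and `B` below are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
B = np.array([[7, 8], [9, 10], [11, 12]])   # shape (3, 2)

# Inner dimensions match (3 and 3), so the product has shape (2, 2).
product = np.dot(A, B)

# For 2D arrays, the @ operator computes the same matrix product.
same = A @ B

print(product.shape)
print(product)
```

Swapping in two arrays of shape (2, 3) would raise a `ValueError`, because the inner dimensions (3 and 2) do not match.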

## III. Vectorized Computation

#### Example 1: Taking the sum of all integers from 0 to 9,999 using a for loop

In [None]:
t_start = time.time()
total = 0
for i in np.arange(10000):
    total = i + total

t1 = time.time() - t_start
print('The result is {} computed in {} seconds'.format(total, t1))

Not too bad, right? Let's try the same operation using NumPy!

In [None]:
t_start = time.time()
# TODO: Write the same function as above, but this time with numpy
total = np.sum(np.arange(10000))
t2 = time.time() - t_start
print('The result is {} computed in {} seconds'.format(total, t2))

In [None]:
print("{:.1f} times faster".format(t1 / t2))

#### Example 2: Element-wise multiplication

In [None]:
x = np.arange(0, 1000000, 1)
y = np.arange(0, 1000000, 1)

In [None]:
def sum_product(x, y):
    """Return the sum of x[i] * y[i] for all indices."""
    
    # using for loop here
    s = 0
    for i in range(len(x)):
        s += x[i] * y[i]
    return s

t_start = time.time()
r = sum_product(x, y)
t_end = time.time()
t1 = t_end - t_start
print('The result is {} computed in {} seconds'.format(r, t1))

In [None]:
t_start = time.time()
# TODO: Write the same function as above, but this time with numpy
r = np.sum(x * y)
t_end = time.time()
t2 = t_end - t_start
print('The result is {} computed in {} seconds'.format(r, t2))

In [None]:
print("{:.1f} times faster".format(t1 / t2))
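
A single measurement with `time.time()` can be noisy, so the speed-up ratio may vary from run to run. One possible way to get steadier numbers is the standard-library `timeit` module, which repeats each measurement. The sketch below is illustrative only: the array size, the repeat counts, and the helper names `loop_sum_product` and `numpy_sum_product` are assumptions, not part of the lab.

```python
import timeit
import numpy as np

# Smaller arrays than in the lab so the loop version finishes quickly.
# int64 is requested explicitly so the products cannot overflow.
x = np.arange(100_000, dtype=np.int64)
y = np.arange(100_000, dtype=np.int64)

def loop_sum_product(x, y):
    """Sum of x[i] * y[i] using an explicit Python loop."""
    s = 0
    for i in range(len(x)):
        s += x[i] * y[i]
    return s

def numpy_sum_product(x, y):
    """Sum of x[i] * y[i] using vectorized NumPy operations."""
    return np.sum(x * y)

# Take the best of several repeats to reduce the effect of noise.
loop_time = min(timeit.repeat(lambda: loop_sum_product(x, y), number=1, repeat=3))
numpy_time = min(timeit.repeat(lambda: numpy_sum_product(x, y), number=10, repeat=3)) / 10

print("loop: {:.5f}s, numpy: {:.5f}s".format(loop_time, numpy_time))
```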

#### Example 3: Maximizing a complex function

We want to maximize a function $f$ of two variables $(x,y)$: $f(x, y) = \frac{\cos(x^2 + y^2)}{1 + x^2 + y^2}$

To maximize it, we’re going to use a naive grid search:
1. Evaluate $f$ for all $(x,y)$ in a grid on the square $[-5, 5) \times [-5, 5)$
2. Return the maximum of observed values

In [None]:
# The function
def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

In [None]:
# Make a mesh grid for our grid search
X = np.arange(-5, 5, 0.1)
Y = np.arange(-5, 5, 0.1)
X, Y = np.meshgrid(X, Y)

# Get the values of z at each point in the mesh grid
Z = f(X, Y)

In [None]:
# Create a figure
fig = plt.figure(figsize=(10,7))
# fig.gca(projection='3d') was removed in recent matplotlib versions
ax = fig.add_subplot(111, projection='3d')

# Plot the surface.
surf = ax.plot_surface(X, Y, Z, facecolors=cm.jet(Z),
                       linewidth=0, antialiased=False, shade=False)

# Customize the z axis.
ax.set_zlim(-0.4, 1.01)
ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
plt.show()

Maximizing the function using for loops:

In [None]:
X = np.arange(-5, 5, 0.1)
Y = np.arange(-5, 5, 0.1)
m = -np.inf
t_start = time.time()
for x in X:
    for y in Y:
        z = f(x, y)
        if z > m:
            m = z
print('The maximum value observed is:',m)
t_end = time.time()
t1 = t_end - t_start
print("Time: {:.5f}s".format(t1))

Maximizing the function using numpy:

In [None]:
t_start = time.time()
# TODO: Write the same function using numpy
x, y = np.meshgrid(X, Y)
Z = f(x, y)
print('The maximum value observed is:',np.max(Z))
t_end = time.time()
t2 = t_end - t_start
print("Time: {:.5f}s".format(t2))

In [None]:
print("{:.0f} times faster".format(t1 / t2))

#### Experimental comparison between numpy and built-in Python functions

<img src="Images/Comparison.png" width=700 height=700 />

## Appendix: Additional numpy functions

### Unary Functions
A mathematical function that accepts exactly one operand (i.e. argument): f(x)

<img src="Images/Unary.png" width=500 height=500 />
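
A minimal sketch of unary functions: each takes a single array and is applied element-wise, returning an array of the same shape. The input values are arbitrary.

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0, 9.0])

roots = np.sqrt(x)        # element-wise square root
exps = np.exp(x)          # element-wise exponential
negated = np.negative(x)  # element-wise negation

print(roots)
print(exps.shape)
print(negated)
```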

### Binary Functions
A mathematical function that accepts exactly two operands: f(x,y).
There are two cases to consider when working with binary functions on NumPy arrays:

* When both operands of the function are arrays (of the same shape).
* When one operand of the function is a scalar (i.e. a single number) and the other is an array.

<img src="Images/Binary.png" width=700 height=700 />
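
The two cases can be illustrated with a short sketch. The arrays `a` and `b` are arbitrary examples.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Case 1: both operands are arrays of the same shape.
# The function is applied to corresponding pairs of elements.
both_arrays = np.add(a, b)

# Case 2: one operand is a scalar.
# The scalar is paired with every element of the array.
with_scalar = np.multiply(a, 2)
clipped_below = np.maximum(a, 2)

print(both_arrays)
print(with_scalar)
print(clipped_below)
```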

### Sequential Functions
A sequential function expects a variable-length sequence of numbers as an input, and produces a single number as an output.

<img src="Images/Seq.png" width=700 height=700 />
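
A minimal sketch of sequential functions. Applied to a whole array they return a single number. With the `axis` argument they instead reduce along one dimension and return one number per row or column. The matrix `m` is an arbitrary example.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

total = np.sum(m)               # one number for the whole array
largest = np.max(m)             # largest element overall
col_sums = np.sum(m, axis=0)    # reduce over rows: one sum per column
row_means = np.mean(m, axis=1)  # reduce over columns: one mean per row

print(total, largest)
print(col_sums)
print(row_means)
```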