## Section II: Python and Vectorization

#### 1. Vectorization

##### (1) Example of calculating *z* in logistic regression
$z = w^Tx + b, \mathrm{where:} w = \left[\begin{array}{c}
w_1\\
w_2\\
\vdots\\
w_{n_x}
\end{array}\right], x = \left[\begin{array}{c}
x_1\\
x_2\\
\vdots\\
x_{n_x}
\end{array}\right]; w \in \textbf{R}^{n_{x}}, x \in \textbf{R}^{n_{x}}$

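- **For the non-vectorized (i.e. explicit *for* loop) case**:
    - 1. $z = 0$
    - 2. for *i* in range($n_x$):
        - 3-1. $z\ +=\ w_ix_i$
    - 4. $z\ +=\ b$
- **For the vectorized case**:
    - 1. $z = \mathrm{np.dot}(w^T,x)+b$ (for column vectors of shape $(n_x,1)$ this is `np.dot(w.T, x) + b`; for 1-D arrays, `np.dot(w, x) + b` computes the same inner product)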

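The two versions above can be checked against each other with a short NumPy sketch. It assumes *w* and *x* are stored as column vectors of shape $(n_x, 1)$, as in the definition above; the size `n_x = 5`, the random values, and the bias `b = 0.5` are arbitrary illustration choices. Because `np.dot(w.T, x)` returns a $1\times 1$ array here, `.item()` extracts the scalar.

```python
import numpy as np

n_x = 5                               # illustrative number of features
rng = np.random.default_rng(0)        # fixed seed so the run is reproducible
w = rng.random((n_x, 1))              # column vector, shape (n_x, 1)
x = rng.random((n_x, 1))              # column vector, shape (n_x, 1)
b = 0.5                               # illustrative bias

# Non-vectorized: accumulate w_i * x_i one feature at a time
z_loop = 0.0
for i in range(n_x):
    z_loop += w[i, 0] * x[i, 0]
z_loop += b

# Vectorized: a single inner product; np.dot returns a (1, 1) array here
z_vec = np.dot(w.T, x).item() + b

print(z_loop, z_vec)                  # equal up to floating-point rounding
```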
##### (2) How vectorization saves time (Python)

```python
import numpy as np
import time

a = np.random.rand(1000000)
b = np.random.rand(1000000)
```

```python
# Get the computation time with the vectorized method np.dot
tic = time.time()
c = np.dot(a, b)
toc = time.time()
print('c = ' + str(c) + ', Vectorized version:' + str(1000*(toc-tic)) + 'ms')
```

```
c = 249869.8724567009, Vectorized version:0.9975433349609375ms
```

```python
# Get the computation time with the non-vectorized for loop
tic = time.time()
c = 0
for ai, bi in zip(a, b):
    c += ai*bi
toc = time.time()
print('c = ' + str(c) + ', Non-vectorized version:' + str(1000*(toc-tic)) + 'ms')
```

```
c = 249869.87245669932, Non-vectorized version:328.11975479125977ms
```


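- Comparing the two runs: the vectorized version took about 1 ms versus about 328 ms for the loop, a speed-up of more than **300** times. The exact ratio depends on the hardware and the array size.
- Built-in functions such as *np.dot* run the loop in compiled code instead of the Python interpreter and can use SIMD (single instruction, multiple data) parallel instructions. Both CPUs and GPUs provide this kind of parallelism, so avoiding explicit loops pays off on either; NumPy itself runs on the CPU.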
- **Rule of thumb**: whenever possible, avoid explicit *for* loops.

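The timing above measures a single run with `time.time()`, which can be noisy. A slightly more careful sketch of the same comparison uses the standard-library `timeit` module to repeat each computation and keep the best result. The helper name `loop_dot`, the repeat counts, and the array size are illustrative choices, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

def loop_dot(a, b):
    """Dot product with an explicit Python for loop (hypothetical helper)."""
    c = 0.0
    for ai, bi in zip(a, b):
        c += ai * bi
    return c

# Best of 3 repeats; np.dot is run 10 times per repeat, so divide by 10
t_vec = min(timeit.repeat(lambda: np.dot(a, b), repeat=3, number=10)) / 10
# The loop is slow, so run it only once per repeat
t_loop = min(timeit.repeat(lambda: loop_dot(a, b), repeat=3, number=1))

print(f"vectorized: {1000 * t_vec:.3f} ms, loop: {1000 * t_loop:.1f} ms")
print(f"speed-up: about {t_loop / t_vec:.0f}x")
```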
##### (3) Other vectorization examples
$\boxed{\textbf{u} = \textbf{A}\textbf{v},\quad \textbf{A}\in\textbf{R}^{m\times n},\ \textbf{v}\in\textbf{R}^{n},\ \textbf{u}\in\textbf{R}^{m}}$

- **For the non-vectorized (i.e. explicit *for* loop) case**:
    - 1. $\textbf{u} = \mathrm{np.zeros}((m,1))$
    - 2. *for* $i\ \mathrm{in}\ \mathrm{range}(m)$:
        - 3. *for* $j\ \mathrm{in}\ \mathrm{range}(n)$:
            - 4. $u_i\ +=\ A_{ij}v_j$
- **For the vectorized case**:
    - 1. $\textbf{u} = \mathrm{np.dot}(\textbf{A},\textbf{v})$

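As a sketch of the matrix-vector case, the nested loops and `np.dot` can be compared directly. The sizes `m = 3`, `n = 4` and the random entries are arbitrary; note that **u** has *m* entries, so it is initialized with shape $(m, 1)$.

```python
import numpy as np

m, n = 3, 4                           # illustrative sizes
rng = np.random.default_rng(1)
A = rng.random((m, n))                # matrix, shape (m, n)
v = rng.random((n, 1))                # column vector, shape (n, 1)

# Non-vectorized: two nested loops; u has m entries
u_loop = np.zeros((m, 1))
for i in range(m):
    for j in range(n):
        u_loop[i, 0] += A[i, j] * v[j, 0]

# Vectorized: one matrix-vector product (A @ v is equivalent)
u_vec = np.dot(A, v)                  # shape (m, 1)

print(np.allclose(u_loop, u_vec))     # True
```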
$\boxed{\textbf{v} =  \left[\begin{array}{c}
v_1\\
v_2\\
\vdots\\
v_n
\end{array}\right] \Longrightarrow \textbf{u} = \left[\begin{array}{c}
e^{v_1}\\
e^{v_2}\\
\vdots\\
e^{v_n}
\end{array}\right]}$
- **For the non-vectorized (i.e. explicit *for* loop) case**:
    - 1. $\textbf{u} = \mathrm{np.zeros}((n,1))$
    - 2. *for* $i\ \mathrm{in}\ \mathrm{range}(n)$:
        - 3. $u_i = e^{v_i}$
- **For the vectorized case**:
    - 1. $\textbf{u} = \mathrm{np.exp}(\textbf{v})$

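A minimal sketch of the element-wise exponential, with an arbitrary three-element **v**: the loop calls `math.exp` once per entry, while `np.exp` processes the whole array at once.

```python
import math
import numpy as np

v = np.array([0.0, 1.0, 2.0])         # illustrative input

# Non-vectorized: exponentiate one element at a time
u_loop = np.zeros(len(v))
for i in range(len(v)):
    u_loop[i] = math.exp(v[i])

# Vectorized: np.exp works element-wise on the whole array
u_vec = np.exp(v)

print(np.allclose(u_loop, u_vec))     # True
```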
- **Other vectorized (element-wise) functions**:
    - `np.log(v)` - computes the natural logarithm of every element of *v*
    - `np.abs(v)` - computes the absolute value of every element of *v*
    - `np.maximum(u,v)` - compares *u* and *v* element by element and returns the larger value of each pair
    - `v**2` - computes the square of every element of *v*
    - `1/v` - computes the reciprocal of every element of *v*

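The functions in the list above can be tried on small, arbitrary arrays. This sketch prints results via `.tolist()` so the values are easy to read; `np.log` gets a separate all-positive array, because the logarithm of a negative entry would give `nan` together with a runtime warning.

```python
import numpy as np

u = np.array([1.0, -2.0, 3.0])
v = np.array([2.0, -1.0, 4.0])

print(np.abs(v).tolist())         # [2.0, 1.0, 4.0]
print(np.maximum(u, v).tolist())  # [2.0, -1.0, 4.0]
print((v**2).tolist())            # [4.0, 1.0, 16.0]
print((1/v).tolist())             # [0.5, -1.0, 0.25]

# np.log needs positive inputs
p = np.array([1.0, np.e, np.e**2])
print(np.log(p))                  # approximately 0, 1, 2
```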
#### 2. Vectorizing Logistic Regression

-   **Original code** (from ***Week2-1**: 7. Logistic Regression Gradient Descent*)
>    - 1. Set: $J = 0, \mathrm{d}w_1 = 0, \mathrm{d}w_2 = 0, \mathrm{d}b = 0$
>    - 2. *for* $i = 1\ to\ m$:
>        - 3-1. $z^{(i)} = w^Tx^{(i)}+b$
>        - 3-2. $a^{(i)} = \sigma (z^{(i)})$
>        - 3-3. $J\ +=\ -[y^{(i)}\log a^{(i)} + (1-y^{(i)})\log(1-a^{(i)})]$
>        - 3-4. $\mathrm{d}z^{(i)} = a^{(i)} - y^{(i)}$
>        - 3-5. $\mathrm{d}w_1\ +=\ x_1^{(i)}\mathrm{d}z^{(i)}$
>        - 3-6. $\mathrm{d}w_2\ +=\ x_2^{(i)}\mathrm{d}z^{(i)}$ - with more than two features, this becomes another *for* loop over the features
>        - 3-7. $\mathrm{d}b\ +=\ \mathrm{d}z^{(i)}$
>    - 4-1. $J := J/m$ - key step: averaging over the *m* examples gives the cost *J*, which measures how well the current parameters fit the training set
>    - 4-2. $\mathrm{d}w_1 := \mathrm{d}w_1/m$
>    - 4-3. $\mathrm{d}w_2 := \mathrm{d}w_2/m$
>    - 4-4. $\mathrm{d}b := \mathrm{d}b/m$
>    - 5-1. $w_1 := w_1 - \alpha\mathrm{d}w_1$
>    - 5-2. $w_2 := w_2 - \alpha\mathrm{d}w_2$
>    - 5-3. $b := b - \alpha\mathrm{d}b$

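The pseudocode above can be turned into a runnable sketch of one gradient-descent step. The tiny data set (m = 4 examples, 2 features), the zero initialization, and the learning rate `alpha = 0.1` are made-up values for illustration only, and `sigmoid` is a small helper defined here. The comments refer to the step numbers in the pseudocode.

```python
import math

def sigmoid(z):
    """Logistic function for a single number (helper defined for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: m examples with 2 features each, labels in {0, 1}
X = [(0.5, 1.0), (1.5, -0.5), (-1.0, 2.0), (2.0, 0.0)]
Y = [1, 0, 1, 0]
m = len(X)

w1, w2, b = 0.0, 0.0, 0.0
alpha = 0.1                                   # learning rate

J, dw1, dw2, db = 0.0, 0.0, 0.0, 0.0          # step 1
for (x1, x2), y in zip(X, Y):                 # step 2: loop over examples
    z = w1 * x1 + w2 * x2 + b                 # 3-1
    a = sigmoid(z)                            # 3-2
    J += -(y * math.log(a) + (1 - y) * math.log(1 - a))  # 3-3
    dz = a - y                                # 3-4
    dw1 += x1 * dz                            # 3-5
    dw2 += x2 * dz                            # 3-6
    db += dz                                  # 3-7
J /= m                                        # 4-1
dw1 /= m                                      # 4-2
dw2 /= m                                      # 4-3
db /= m                                       # 4-4
w1 -= alpha * dw1                             # 5-1
w2 -= alpha * dw2                             # 5-2
b -= alpha * db                               # 5-3

print(J, w1, w2, b)
```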
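- **Vectorizing the inner *for* loop over features**: 
    - Combine $w_1, w_2, ..., w_n$ into $\textbf{w}$ and $x_1, x_2, ..., x_n$ into $\textbf{x}$. Then initialize $\mathrm{d}\textbf{w} = \mathrm{np.zeros}((n,1))$, replace steps 3-5 and 3-6 with $\mathrm{d}\textbf{w}\ +=\ \textbf{x}^{(i)}\mathrm{d}z^{(i)}$, and replace steps 4-2 and 4-3 with $\mathrm{d}\textbf{w}\ /=\ m$. The loop over the *m* examples still remains.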
    
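A sketch of this partially vectorized version, assuming the examples are stored as the columns of an $n\times m$ NumPy array `X` (a made-up 2-feature, 4-example data set here) and using an illustrative learning rate `alpha = 0.1`. The per-feature updates collapse into one vector update of `dw`, while the loop over examples is still explicit.

```python
import numpy as np

def sigmoid(z):
    """Logistic function (helper defined for this sketch)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: each column of X is one example x^(i), shape (n, m)
X = np.array([[0.5, 1.5, -1.0, 2.0],
              [1.0, -0.5, 2.0, 0.0]])
Y = np.array([1, 0, 1, 0])
n, m = X.shape

w = np.zeros((n, 1))
b = 0.0
alpha = 0.1                            # learning rate

J, db = 0.0, 0.0
dw = np.zeros((n, 1))                  # replaces dw1, dw2, ..., dwn
for i in range(m):                     # the loop over examples remains
    x_i = X[:, i:i+1]                  # column vector x^(i), shape (n, 1)
    z = (np.dot(w.T, x_i) + b).item()
    a = sigmoid(z)
    J += -(Y[i] * np.log(a) + (1 - Y[i]) * np.log(1 - a))
    dz = a - Y[i]
    dw += x_i * dz                     # one vector update, no loop over features
    db += dz
J /= m
dw /= m
db /= m

w -= alpha * dw
b -= alpha * db
print(J, w.ravel(), b)
```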
$\boxed{z^{(i)}\ =\ w^Tx^{(i)}+b,\ a^{(i)}\ =\ \sigma(z^{(i)})}$, *for* $i\ \mathrm{in\ range}(m)$
- **Vectorizing logistic regression prediction *a***
    - $\textbf{z} = [z^{(1)}\ z^{(2)}\ \cdots\ z^{(m)}],\ \textbf{X} = [x^{(1)}\ x^{(2)}\ \cdots\ x^{(m)}] \Rightarrow \textbf{z} = w^T\textbf{X}+\textbf{b},\ \mathrm{where}\ \textbf{b}=[b\ b\ \cdots\ b]\ (1\times m)\ \mathrm{and}\ \textbf{X}\in\textbf{R}^{n\times m}$
    - $[z^{(1)}\ z^{(2)}\ \cdots\ z^{(m)}] = [w_1\ w_2\ \cdots\ w_n]\left[\begin{array}{cccc}
x_1^{(1)} & x_1^{(2)} & \cdots & x_1^{(m)}\\
x_2^{(1)} & x_2^{(2)} & \cdots & x_2^{(m)}\\
\vdots & \vdots & \ddots & \vdots\\
x_n^{(1)} & x_n^{(2)} & \cdots & x_n^{(m)}
\end{array}\right] + [b\ b\ \cdots\ b] = [\textbf{w}^T\textbf{x}^{(1)}+b\ \ \textbf{w}^T\textbf{x}^{(2)}+b\ \ \cdots\ \ \textbf{w}^T\textbf{x}^{(m)}+b]$
        - **Python code**: `Z = np.dot(w.T, X) + b`. Here *b* is a real number; Python (NumPy) automatically expands it into a $1\times m$ row vector before adding it element-wise. This is called **broadcasting**.
    - $\textbf{a} = [a^{(1)}\ a^{(2)}\ \cdots\ a^{(m)}] = \sigma(\textbf{z})$, with $\sigma$ applied element-wise

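The fully vectorized forward pass can be sketched in a few lines and checked against a per-example loop. The sizes, the random `X` and `w`, and the bias `b = 0.3` are arbitrary, and `sigmoid` is a helper defined for this sketch. The last check confirms that adding the scalar *b* gives the same result as adding an explicit row vector $[b\ b\ \cdots\ b]$.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function (helper defined for this sketch)."""
    return 1.0 / (1.0 + np.exp(-z))

n, m = 3, 5                            # illustrative sizes
rng = np.random.default_rng(2)
X = rng.standard_normal((n, m))        # each column is one example x^(i)
w = rng.standard_normal((n, 1))        # weights, column vector
b = 0.3                                # scalar bias

# Vectorized: one matrix product; the scalar b is broadcast to shape (1, m)
Z = np.dot(w.T, X) + b                 # shape (1, m)
A = sigmoid(Z)                         # shape (1, m)

# Loop version for comparison: one example at a time
A_loop = np.zeros((1, m))
for i in range(m):
    z_i = (np.dot(w.T, X[:, i:i+1]) + b).item()
    A_loop[0, i] = sigmoid(z_i)

print(A.shape, np.allclose(A, A_loop))            # (1, 5) True

b_row = b * np.ones((1, m))            # explicit [b b ... b] row vector
print(np.array_equal(Z, np.dot(w.T, X) + b_row))  # True
```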

#### 3. Broadcasting in Python