* [Vectorization](#vect)
* [Vectorizing Logistic Regression](#vectlogreg)
* [Broadcasting](#broadcasting)
* [Python Numpy](#numpy)
* [Explanation of Logistic Regression Cost function](#explanation-of-logistic-regression-cost-function)

<img id="vect" src="https://i.imgur.com/TxXjiYe.png" style="width:650px;height:370px; float: left;">

In [1]:
import numpy as np

a = np.array([1,2,3,4])
print(a)
a = a.reshape(4,1)
print(a)

[1 2 3 4]
[[1]
 [2]
 [3]
 [4]]


In [2]:
import time

a = np.random.rand(1000000)
b = np.random.rand(1000000)

tic = time.time()
c = np.dot(a,b)
toc = time.time()

print(c)
print("Vectorized version:" + str(1000*(toc-tic)) + "ms")

c = 0
tic = time.time()
for i in range(1000000):
    c += a[i]*b[i]
toc = time.time()

print(c)
print("For loop version:" + str(1000*(toc-tic)) + "ms")

249600.7661425908
Vectorized version:16.780853271484375ms
249600.76614258674
For loop version:362.78390884399414ms


<img src="https://i.imgur.com/dazXdzS.png" style="width:650px;height:380px; float: left;">

<img src="https://i.imgur.com/TAYpdTB.png" style="width:650px;height:380px; float: left;">

<img src="https://i.imgur.com/kWcoJQh.png" style="width:650px;height:380px; float: left;">

<img id="vectlogreg" src="https://i.imgur.com/H42svVk.png" style="width:650px;height:380px; float: left;">

$$Z = [z^{(1)}, \dots, z^{(m)}] = [w^{T}x^{(1)}+b, \dots, w^{T}x^{(m)}+b] = w^{T}X + b = \text{np.dot}(w.T, X) + b$$

$$A = \sigma(Z) = [\sigma(w^{T}x^{(1)}+b), \dots, \sigma(w^{T}x^{(m)}+b)] = [a^{(1)}, \dots , a^{(m)}]$$
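
The vectorized forward pass above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the sizes `n` and `m`, the random data, and the helper name `sigmoid` are assumptions made for the example. It compares the single-line vectorized computation against an explicit loop over the examples.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy sizes (assumptions): n features, m examples.
n, m = 3, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m))   # each column is one example x^(i)
w = rng.standard_normal((n, 1))   # weights as an (n, 1) column vector
b = 0.5                           # scalar bias, broadcast across all m columns

# Vectorized: one matrix product handles all m examples at once.
Z = np.dot(w.T, X) + b            # shape (1, m)
A = sigmoid(Z)                    # shape (1, m)

# Explicit loop over the examples, for comparison only.
A_loop = np.zeros((1, m))
for i in range(m):
    A_loop[0, i] = sigmoid(np.dot(w[:, 0], X[:, i]) + b)

print(np.allclose(A, A_loop))
```

Both versions should agree up to floating-point rounding; the vectorized form simply avoids the Python-level loop.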

<img src="https://i.imgur.com/Xng06L5.png" style="width:650px;height:380px; float: left;">

<img src="https://i.imgur.com/SAyvy3n.png" style="width:650px;height:380px; float: left;">

Number of features in a single image input $= 64 \times 64 \times 3 = 12288$

We assume $ m = m_{train}$ and $n = n_{x}.$<br>
For $i=1,2, \dots , m$
$$ x^{(i)} = \begin{bmatrix}
    x^{(i)}_{1} \\
    x^{(i)}_{2} \\
    \vdots \\
    x^{(i)}_{n}
\end{bmatrix}$$ <br>

Stacking the $m$ examples as columns gives the $n \times m$ input matrix:
$$ X = \begin{bmatrix}
    \vdots & &\vdots \\
    x^{(1)} & \dots & x^{(m)} \\
    \vdots & & \vdots
\end{bmatrix} $$

$$ A = \begin{bmatrix}a^{(1)} & \dots & a^{(m)} \end{bmatrix} = 
\begin{bmatrix} \sigma (w^Tx^{(1)} + b) & \dots & \sigma (w^Tx^{(m)} + b) \end{bmatrix},\quad \text{where } \hat{y}^{(i)} = a^{(i)}$$ 

Loss function:
$$ L(a^{(i)}, y^{(i)}) = -[y^{(i)}\log{a^{(i)}} + (1-y^{(i)})\log{(1-a^{(i)}})]$$
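
As a rough illustration of the loss formula, the sketch below evaluates it for a confident prediction that is right and for the same prediction when it is wrong. The function name `logistic_loss` and the sample values are assumptions chosen for the example.

```python
import numpy as np

def logistic_loss(a, y):
    """Loss L(a, y) = -[y*log(a) + (1 - y)*log(1 - a)] for a in (0, 1)."""
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Hypothetical prediction a = 0.9 (an assumption for illustration).
loss_when_right = logistic_loss(0.9, 1)   # equals -log(0.9), a small loss
loss_when_wrong = logistic_loss(0.9, 0)   # equals -log(0.1), a large loss

print(loss_when_right, loss_when_wrong)
```

The loss is small when the predicted probability agrees with the label and grows without bound as the prediction approaches the wrong extreme.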



Assume $n = 2,\ m = 1$:<br>
$$ z^{(i)} = w_1x_1 + w_2x_2 + b$$<br>
$$ dz^{(i)} = \frac{dL}{dz^{(i)}} = \frac{dL}{da^{(i)}} \frac{da^{(i)}}{dz^{(i)}}
 = [-\frac{y^{(i)}}{a^{(i)}} + \frac{1-y^{(i)}}{1-a^{(i)}}][a^{(i)}(1-a^{(i)})] = a^{(i)} - y^{(i)} $$<br>

$$ dw_1 = \frac{dL}{dw_{1}} = \frac{dL}{dz^{(i)}}\frac{dz^{(i)}}{dw_{1}} = \frac{dL}{dz^{(i)}}x_{1} $$<br>
$$ dw_2 = \frac{dL}{dw_{2}} = \frac{dL}{dz^{(i)}}x_{2} $$<br>
$$ db = \frac{dL}{db} = \frac{dL}{dz^{(i)}}\frac{dz^{(i)}}{db} = \frac{dL}{dz^{(i)}} $$<br>
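
One way to sanity-check these single-example gradients is to compare them against central finite differences. The sketch below does that for $n = 2$; the parameter values, the step size `eps`, and the helper names are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w1, w2, b, x1, x2, y):
    """Logistic loss for one example with two features."""
    a = sigmoid(w1 * x1 + w2 * x2 + b)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Arbitrary illustrative values (assumptions, not taken from the notes).
w1, w2, b = 0.3, -0.2, 0.1
x1, x2, y = 1.5, -0.7, 1.0

# Analytic gradients from the derivation: dz = a - y, dw_j = x_j * dz, db = dz.
a = sigmoid(w1 * x1 + w2 * x2 + b)
dz = a - y
dw1, dw2, db = x1 * dz, x2 * dz, dz

# Central finite differences as an independent numerical check.
eps = 1e-6
num_dw1 = (loss(w1 + eps, w2, b, x1, x2, y) - loss(w1 - eps, w2, b, x1, x2, y)) / (2 * eps)
num_dw2 = (loss(w1, w2 + eps, b, x1, x2, y) - loss(w1, w2 - eps, b, x1, x2, y)) / (2 * eps)
num_db = (loss(w1, w2, b + eps, x1, x2, y) - loss(w1, w2, b - eps, x1, x2, y)) / (2 * eps)

print(dw1, num_dw1)
print(dw2, num_dw2)
print(db, num_db)
```

Each analytic value should match its numerical counterpart to several decimal places.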

For the general case with $n$ features and $m$ examples:<br>
$$ z^{(i)} = w^Tx^{(i)} + b = \begin{bmatrix}w_1 & \dots & w_n \end{bmatrix} \begin{bmatrix}
    x^{(i)}_{1} \\
    x^{(i)}_{2} \\
    \vdots \\
    x^{(i)}_{n}
\end{bmatrix} + b = w_1x^{(i)}_1 + w_2x^{(i)}_2 + \dots + w_nx^{(i)}_n + b $$

$$ dz^{(i)} = \frac{dL}{dz^{(i)}} = a^{(i)} - y^{(i)} $$
$$ dw_1 = \frac{dL}{dw_1} = dz^{(i)}x^{(i)}_{1} $$
$$ dw_2 = \frac{dL}{dw_2} = dz^{(i)}x^{(i)}_{2} $$
$$\dots$$
$$ dw_n = \frac{dL}{dw_n} = dz^{(i)}x^{(i)}_{n} $$

$\frac{dL}{dW}$ has the same shape as $w$ (an $n \times 1$ column vector) and can be expressed as
$$ \frac{dL}{dW} = \begin{bmatrix} dz^{(i)}x^{(i)}_{1} \\ dz^{(i)}x^{(i)}_{2} \\ \vdots \\ dz^{(i)}x^{(i)}_{n} \end{bmatrix} = \begin{bmatrix} x^{(i)}_{1} \\ x^{(i)}_{2} \\ \vdots \\ x^{(i)}_{n} \end{bmatrix} dz^{(i)}
= x^{(i)}dz^{(i)}
$$

$$ \frac{dL}{db} = \frac{dL}{dz^{(i)}} = dz^{(i)} $$<br>

$$ dZ = \begin{bmatrix} dz^{(1)} & dz^{(2)} & \dots & dz^{(m)} \end{bmatrix} $$

$$ J(w,b) = \frac{1}{m} \sum_{i=1}^{m} L(a^{(i)},y^{(i)})$$

$$ dW = \frac{dJ}{dW} = \frac{1}{m}\sum_{i=1}^{m} \frac{dL(a^{(i)},y^{(i)})}{dW} 
= \frac{1}{m}\sum_{i=1}^{m}x^{(i)}dz^{(i)} = \frac{1}{m}[x^{(1)}dz^{(1)} + x^{(2)}dz^{(2)} + \dots + x^{(m)}dz^{(m)}] 
= \frac{1}{m}\begin{bmatrix}
    \vdots & &\vdots \\
    x^{(1)} & \dots & x^{(m)} \\
    \vdots & & \vdots
\end{bmatrix}
\begin{bmatrix} 
dz^{(1)} \\
dz^{(2)} \\
\vdots \\
dz^{(m)} \end{bmatrix}
= \frac{1}{m}X\cdot(dZ)^T$$

$$ db = \frac{dJ}{db} = \frac{1}{m}\sum_{i=1}^{m} \frac{dL(a^{(i)},y^{(i)})}{db} 
= \frac{1}{m}\sum_{i=1}^{m} dz^{(i)} = \frac{1}{m}\text{np.sum}(dZ)$$
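
The vectorized gradients derived above can be sketched as follows. The toy sizes, the random data, and the zero initialization are assumptions made for the example; the loop version is included only to check that the matrix product $\frac{1}{m}X(dZ)^T$ really accumulates $x^{(i)}dz^{(i)}$ over all examples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy problem (assumptions): n features, m examples, random labels.
n, m = 3, 6
rng = np.random.default_rng(1)
X = rng.standard_normal((n, m))                      # shape (n, m)
Y = rng.integers(0, 2, size=(1, m)).astype(float)    # shape (1, m), labels in {0, 1}
w = np.zeros((n, 1))
b = 0.0

# Forward pass, then vectorized gradients of the cost J.
A = sigmoid(np.dot(w.T, X) + b)    # shape (1, m)
dZ = A - Y                         # shape (1, m)
dW = np.dot(X, dZ.T) / m           # shape (n, 1), same shape as w
db = np.sum(dZ) / m                # scalar

# Loop version: sum x^(i) * dz^(i) over the examples, then average.
dW_loop = np.zeros((n, 1))
db_loop = 0.0
for i in range(m):
    dz_i = A[0, i] - Y[0, i]
    dW_loop += X[:, [i]] * dz_i    # X[:, [i]] keeps the (n, 1) column shape
    db_loop += dz_i
dW_loop /= m
db_loop /= m

print(np.allclose(dW, dW_loop), np.isclose(db, db_loop))
```

Indexing with `X[:, [i]]` rather than `X[:, i]` keeps each example as an `(n, 1)` column, so the accumulated gradient has the same shape as `w`.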

<img id="broadcasting" src="https://i.imgur.com/yKUl3eP.png" style="width:650px;height:400px; float: left;">

In [3]:
import numpy as np

A = np.array([[56.0, 0.42, 1.0, 68.0],
             [1.2,104.0,52.0,8.0],
             [1.8,135.0,99.0,0.9]])

print(A.sum(axis=0))
print(A.sum(axis=1))

[ 59.   239.42 152.    76.9 ]
[125.42 165.2  236.7 ]


In [4]:
cal = A.sum(axis=0)
print(cal)

percentage = A/cal.reshape(1,4)
print(percentage)

[ 59.   239.42 152.    76.9 ]
[[0.94915254 0.00175424 0.00657895 0.88426528]
 [0.02033898 0.43438309 0.34210526 0.10403121]
 [0.03050847 0.56386267 0.65131579 0.01170351]]
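
A slightly different way to write the same broadcasting step is to ask `sum` to keep the reduced axis, so the shapes are explicit without a `reshape`. This is a sketch under that assumption; the variable names `col_totals`, `row_totals`, and `fraction_of_*` are made up for the example.

```python
import numpy as np

A = np.array([[56.0, 0.42, 1.0, 68.0],
              [1.2, 104.0, 52.0, 8.0],
              [1.8, 135.0, 99.0, 0.9]])

# keepdims=True keeps the summed axis with length 1.
col_totals = A.sum(axis=0, keepdims=True)   # shape (1, 4)
row_totals = A.sum(axis=1, keepdims=True)   # shape (3, 1)

# (3, 4) / (1, 4): the totals row is broadcast down the 3 rows.
fraction_of_column = A / col_totals
# (3, 4) / (3, 1): the totals column is broadcast across the 4 columns.
fraction_of_row = A / row_totals

print(col_totals.shape, row_totals.shape)
print(fraction_of_column.sum(axis=0))   # each column sums to 1 (up to rounding)
print(fraction_of_row.sum(axis=1))      # each row sums to 1 (up to rounding)
```

Keeping the reduced axis makes it harder to divide along the wrong dimension by accident, since the shapes `(1, 4)` and `(3, 1)` state the intent.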


<img src="https://i.imgur.com/W3neiIl.png" style="width:650px;height:400px; float: left;">

<img src="https://i.imgur.com/6TyoPUd.png" style="width:650px;height:400px; float: left;">

<img id="numpy" src="https://i.imgur.com/ydP5Lq3.png" style="width:620px;height:340px; float: left;">

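Do not use `a = np.random.randn(5)`: it creates a rank-1 array of shape `(5,)`, which is neither a row vector nor a column vector.
Use `a = np.random.randn(5,1)` instead, which creates a proper column vector.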

In [5]:
import numpy as np

a = np.random.randn(5)
print(a)

[ 0.12435934  0.55887327 -0.74688549  0.75457606  2.42248591]


In [6]:
# Avoid rank-1 arrays: shape (5,) is neither a row vector nor a column vector
print(a.shape)

(5,)


In [7]:
# Transposing a rank-1 array has no effect
print(a.T)

[ 0.12435934  0.55887327 -0.74688549  0.75457606  2.42248591]


In [8]:
# For a rank-1 array this is the inner product (a scalar), not an outer product
print(np.dot(a,a.T))

7.323465529687509


In [9]:
# column vector
a = np.random.randn(5,1)
print(a)

[[ 1.25004253]
 [ 0.78163914]
 [-0.27280061]
 [-0.06877438]
 [-0.84532257]]


In [10]:
print(a.T)

[[ 1.25004253  0.78163914 -0.27280061 -0.06877438 -0.84532257]]


In [11]:
# (5,1) times (1,5) gives the 5x5 outer product
print(np.dot(a,a.T))

[[ 1.56260631  0.97708217 -0.34101236 -0.0859709  -1.05668916]
 [ 0.97708217  0.61095975 -0.21323164 -0.05375675 -0.66073721]
 [-0.34101236 -0.21323164  0.07442017  0.01876169  0.23060451]
 [-0.0859709  -0.05375675  0.01876169  0.00472992  0.05813654]
 [-1.05668916 -0.66073721  0.23060451  0.05813654  0.71457024]]


In [12]:
# Cheap shape check; assert is a statement, so no parentheses are needed
assert a.shape == (5, 1)
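
If a rank-1 array turns up anyway, it can be reshaped into an explicit row or column vector. The sketch below, with made-up names `col` and `row`, also shows one reason the distinction matters: adding a rank-1 array to its transpose stays 1-D, while adding a column vector to its transpose broadcasts to a full matrix.

```python
import numpy as np

a = np.random.randn(5)        # rank-1 array, shape (5,)
col = a.reshape(5, 1)         # explicit column vector, shape (5, 1)
row = a.reshape(1, 5)         # explicit row vector, shape (1, 5)

inner = np.dot(row, col)      # shape (1, 1): inner product as a 1x1 matrix
outer = np.dot(col, row)      # shape (5, 5): outer product

same_shape = a + a.T          # shape (5,): transpose of rank-1 is a no-op
broadcast = col + col.T       # shape (5, 5): (5, 1) + (1, 5) broadcasts

print(inner.shape, outer.shape, same_shape.shape, broadcast.shape)
```

With explicit `(5, 1)` and `(1, 5)` shapes, the result of each product or sum follows directly from the usual matrix and broadcasting rules.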

# Explanation of Logistic Regression Cost function
$$\hat{y}^{(i)}= \sigma(w^T x^{(i)} + b) = \sigma(z^{(i)}) = \frac{1}{1+e^{-z^{(i)}}},\quad \text{where } z^{(i)}= w^T x^{(i)} + b$$

$$\hat{y}= P(y=1|x)$$

$$\text{If } y=1:\ P(y|x) = \hat{y}$$
$$\text{If } y=0:\ P(y|x) = 1 - \hat{y}$$

$$P(y|x) = \hat{y}^{y}\cdot(1 - \hat{y})^{(1-y)}$$

$$\log{P(y|x)} = \log\left(\hat{y}^{y}\cdot(1 - \hat{y})^{(1-y)}\right) = y\cdot\log{\hat{y}} + (1 - y)\cdot\log(1-\hat{y}) = -L(\hat{y}, y)$$

# Cost on m examples
Use maximum likelihood estimation: choose the parameters that maximize the probability of the labels in the training set. Assuming the examples are independent, their probabilities multiply:
$$\log{p(\text{labels in training set})} = \log{\prod_{i=1}^{m}p(y^{(i)}|x^{(i)})} = \sum_{i=1}^{m}\log{p(y^{(i)}|x^{(i)})}$$
$$ = -\sum_{i=1}^{m}L(\hat{y}^{(i)}, y^{(i)})$$

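Maximizing this log-likelihood is the same as minimizing the sum of the losses. The cost drops the minus sign and adds a $\frac{1}{m}$ scaling factor:
$$\text{Cost} = J(w,b) = \frac{1}{m}\sum_{i=1}^{m}L(\hat{y}^{(i)}, y^{(i)})$$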
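
The link between likelihood and cost can be checked numerically. The following sketch uses made-up predictions `y_hat` and labels `y` (assumptions for illustration) and verifies that the log-likelihood of the labels equals $-m \cdot J$, where $J$ is the average per-example loss.

```python
import numpy as np

# Hypothetical predictions and labels (assumptions, not from the notes).
y_hat = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1, 0, 1, 0])
m = len(y)

# Per-example probability of the observed label: y_hat^y * (1 - y_hat)^(1 - y).
p = y_hat**y * (1 - y_hat)**(1 - y)

# Log-likelihood of all labels, assuming independent examples.
log_likelihood = np.sum(np.log(p))

# Per-example loss and the cost J as their average.
losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
J = np.mean(losses)

print(log_likelihood, -m * J)
```

Because the two printed values agree, lowering `J` raises the likelihood of the training labels, which is the sense in which minimizing the cost performs maximum likelihood estimation.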