**Mathematical expression of the algorithm**:

For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}\tag{2}$$
$$ \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\left[ y^{(i)} \log(\hat{y}^{(i)}) + (1-y^{(i)}) \log(1-\hat{y}^{(i)}) \right]\tag{3}$$

The cost is then computed by averaging the loss over all $m$ training examples:
$$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)}, y^{(i)})\tag{6}$$

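A quick worked check of (3) and (6): with zero-initialized parameters ($w = 0$, $b = 0$), every example has $z^{(i)} = 0$ and $\hat{y}^{(i)} = \sigma(0) = 0.5$, so the loss is the same for either label (natural logarithm throughout):

$$\mathcal{L}(0.5, y^{(i)}) = -\left[ y^{(i)} \log 0.5 + (1-y^{(i)}) \log 0.5 \right] = \log 2 \approx 0.6931, \qquad J = \frac{1}{m}\sum_{i=1}^m \log 2 = \log 2$$

So at initialization the cost is $\log 2 \approx 0.6931$ regardless of how many examples are averaged.
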
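Gradient computation, where $X$ holds one example per column (shape $(n_x, m)$) and $\hat{Y}$, $Y$ are the $(1, m)$ rows of predictions and labels:
- $$ \frac{\partial J}{\partial w} = \frac{1}{m}X(\hat{Y}-Y)^T\tag{7}$$
- $$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (\hat{y}^{(i)}-y^{(i)})\tag{8}$$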

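To check these formulas outside TensorFlow, here is a minimal NumPy sketch of equations (1) to (3) and (6) to (8) in vectorized form. It is an illustration only, not the notebook's TensorFlow code: `cost_and_grads` is a hypothetical helper, it assumes $X$ has one example per column, and the sample `X`, `Y` reuse the 5-example data that appears later in the notebook.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Equation (2): element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def cost_and_grads(w: np.ndarray, b: float, X: np.ndarray, Y: np.ndarray):
    """Hypothetical helper: vectorized equations (1)-(3) and (6)-(8).

    w: (n_x, 1) weights, b: scalar bias,
    X: (n_x, m) inputs with one example per column, Y: (1, m) labels in {0, 1}.
    """
    m = X.shape[1]                                        # number of examples
    Z = w.T @ X + b                                       # (1): shape (1, m)
    A = sigmoid(Z)                                        # (2): predictions y_hat, shape (1, m)
    losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # (3): one loss per example
    J = float(np.sum(losses) / m)                         # (6): average loss
    dw = X @ (A - Y).T / m                                # (7): shape (n_x, 1)
    db = float(np.sum(A - Y) / m)                         # (8): scalar
    return J, dw, db


# 5 examples with 3 features each, evaluated at zero-initialized parameters
X = np.array([[2, 3, 4, 5, 6],
              [7, 2, 3, 4, 8],
              [8, 8, 2, 4, 8]], dtype=np.float64)
Y = np.array([[1, 1, 0, 0, 1]], dtype=np.float64)
J, dw, db = cost_and_grads(np.zeros((3, 1)), 0.0, X, Y)
print(J, dw.ravel(), db)  # J = ln 2 ≈ 0.6931; dw ≈ [-0.2, -1.0, -1.8]; db ≈ -0.1
```

Comparing `dw` and `db` at a random `w`, `b` against central finite differences of `J` is a cheap way to confirm (7) and (8) numerically.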

In [82]:
import numpy as np
import tensorflow as tf

## Weight and bias initializer

In [83]:
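def initializer(input_dim: int) -> tuple:
    # w: zero column vector of shape (input_dim, 1); b: scalar bias.
    # Zeros are fine for logistic regression: there are no hidden units whose symmetry must be broken.
    w = tf.zeros([input_dim, 1], dtype=tf.float64)
    b = 0.0
    return w, b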

In [84]:
w, b = initializer(3)
w

<tf.Tensor: shape=(3, 1), dtype=float64, numpy=
array([[0.],
       [0.],
       [0.]])>

In [85]:
tf.transpose(w)

<tf.Tensor: shape=(1, 3), dtype=float64, numpy=array([[0., 0., 0.]])>

## Calculate $z$ for all $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
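Stacking the $m$ examples as the columns of $X$ (shape $(n_x, m)$) computes every $z^{(i)}$ in one matrix product, with the scalar $b$ added to each column:

$$Z = w^T X + b = \left[\, w^T x^{(1)} + b,\; w^T x^{(2)} + b,\; \dots,\; w^T x^{(m)} + b \,\right] = \left[\, z^{(1)},\, z^{(2)},\, \dots,\, z^{(m)} \,\right]$$

For a 3-feature, 5-example $X$, $w^T$ has shape $(1, 3)$ and $Z$ has shape $(1, 5)$.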

In [86]:
def forward(W: tf.Tensor, b: float, X: tf.Tensor) -> tf.Tensor:
    # Z = W^T X + b
    wT = tf.transpose(W)                 # shape (1, n_x)
    # contract the feature axis: shape (1,) for a single example of shape (n_x,),
    # shape (1, m) for a batch X of shape (n_x, m)
    Z = tf.tensordot(wT, X, axes=1) + b
    return Z

In [87]:
X = tf.constant([2, 3, 4], dtype=tf.float64)  # one example with 3 features
Y = tf.constant([1], dtype=tf.float64)        # its label

In [88]:
z = forward(w, b, X)
z

<tf.Tensor: shape=(1,), dtype=float64, numpy=array([0.])>

## Sigmoid Function
Compute $\sigma(z) = \frac{1}{1 + e^{-z}}$ for $z = w^T x + b$ to turn each score into a probability between 0 and 1. Either `np.exp()` or `tf.exp()` works; the cell below uses `tf.exp()` so the result stays a `tf.Tensor`.

In [89]:
def sigmoid(Z: tf.Tensor) -> tf.Tensor:
    # element-wise logistic function 1 / (1 + e^(-Z))
    a = 1 / (1 + tf.exp(-Z))
    return a

In [90]:
yhat = sigmoid(z)
yhat

<tf.Tensor: shape=(1,), dtype=float64, numpy=array([0.5])>

## Calculate the Cost:
$$J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(\hat{y}^{(i)})+(1-y^{(i)})\log(1-\hat{y}^{(i)})\right)$$

In [91]:
def compute_cost(Y: tf.Tensor, Yhat: tf.Tensor) -> tf.Tensor:
    # m = number of examples = last axis of Yhat: shape (1,) for one example, (1, m) for a batch
    m = Yhat.shape[-1]
    loss = tf.reduce_sum(Y * tf.math.log(Yhat) + (1 - Y) * tf.math.log(1 - Yhat))
    c = -loss / m  # average cross-entropy over the m examples
    return c


In [92]:
compute_cost(Y, yhat)

<tf.Tensor: shape=(), dtype=float64, numpy=0.6931471805599453>

## Forward Propagation:
- You get $X$
- You compute $\hat{Y} = \sigma(w^T X + b)$
- You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(\hat{y}^{(i)})+(1-y^{(i)})\log(1-\hat{y}^{(i)})\right)$

In [93]:
W, b = initializer(3)  # one row per feature of the X defined below
def forward_prop(W: tf.Tensor, b: float, X: tf.Tensor, Y: tf.Tensor):
    Z = forward(W, b, X)          # linear scores, shape (1, m)
    Yhat = sigmoid(Z)             # predictions, shape (1, m)
    cost = compute_cost(Y, Yhat)  # scalar cost J
    return Yhat, tf.squeeze(cost)

In [96]:
X = tf.constant(
    [
        [2, 3, 4, 5, 6],
        [7, 2, 3, 4, 8],
        [8, 8, 2, 4, 8]
    ], dtype=tf.float64
)  # shape (3, 5): 3 features, one example per column
Y = tf.constant([[1, 1, 0, 0, 1]], dtype=tf.float64)  # shape (1, 5)
tf.transpose(X)

<tf.Tensor: shape=(5, 3), dtype=float64, numpy=
array([[2., 7., 8.],
       [3., 2., 8.],
       [4., 3., 2.],
       [5., 4., 4.],
       [6., 8., 8.]])>

In [97]:
forward_prop(W, b, X, Y)

(<tf.Tensor: shape=(1, 5), dtype=float64, numpy=array([[0.5, 0.5, 0.5, 0.5, 0.5]])>,
 <tf.Tensor: shape=(), dtype=float64, numpy=0.6931471805599453>)

## Back Propagation:

- $$ \frac{\partial J}{\partial w} = \frac{1}{m}X(\hat{Y}-Y)^T\tag{7}$$
- $$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (\hat{y}^{(i)}-y^{(i)})\tag{8}$$

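The gradients above follow from the chain rule applied to the loss (3). For a single example, write $a = \hat{y}^{(i)}$, $y = y^{(i)}$ and $z = z^{(i)}$:

$$\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}, \qquad \frac{\partial a}{\partial z} = a(1-a)$$

$$\frac{\partial \mathcal{L}}{\partial z} = \frac{\partial \mathcal{L}}{\partial a}\,\frac{\partial a}{\partial z} = -y(1-a) + (1-y)\,a = a - y$$

$$\frac{\partial \mathcal{L}}{\partial w} = (a - y)\,x^{(i)}, \qquad \frac{\partial \mathcal{L}}{\partial b} = a - y$$

Averaging over the $m$ examples gives the bias gradient directly, and stacking the examples as the columns of $X$ (with $\hat{Y}$, $Y$ the $(1, m)$ rows of predictions and labels) turns $\sum_{i=1}^m x^{(i)}(\hat{y}^{(i)} - y^{(i)})$ into $X(\hat{Y} - Y)^T$.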
# Optimizer

# Model
- Initialize $w, b$
- Forward Propagation:
    - You get $X$
    - You compute $\hat{Y} = \sigma(w^T X + b)$
    - You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(\hat{y}^{(i)})+(1-y^{(i)})\log(1-\hat{y}^{(i)})\right)$
- Back Propagation:
    - $$ \frac{\partial J}{\partial w} = \frac{1}{m}X(\hat{Y}-Y)^T\tag{7}$$
    - $$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (\hat{y}^{(i)}-y^{(i)})\tag{8}$$
- Update the parameters with learning rate $\alpha$:
    - $$ w := w - \alpha \frac{\partial J}{\partial w} $$
    - $$ b := b - \alpha \frac{\partial J}{\partial b} $$
- Repeat forward propagation, back propagation, and the update for a fixed number of iterations.

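The model loop can be sketched end to end in NumPy. This is a minimal illustration, not the notebook's TensorFlow implementation: `train` is a hypothetical helper, and `alpha=0.01` and `num_iterations=1000` are assumed values, chosen small enough that plain gradient descent stays stable on the unscaled 5-example data used in this notebook.

```python
import numpy as np


def train(X: np.ndarray, Y: np.ndarray, alpha: float = 0.01, num_iterations: int = 1000):
    """Hypothetical helper: batch gradient descent for logistic regression.

    X: (n_x, m) inputs, one example per column; Y: (1, m) labels in {0, 1}.
    Returns the learned w of shape (n_x, 1), b (float), and the cost at every iteration.
    """
    n_x, m = X.shape
    w = np.zeros((n_x, 1))  # initialize w and b to zero
    b = 0.0
    costs = []
    for _ in range(num_iterations):
        # Forward propagation: equations (1), (2), (3), (6)
        A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))   # predictions, shape (1, m)
        A_safe = np.clip(A, 1e-12, 1 - 1e-12)      # keep log() finite when recording the cost
        costs.append(float(-np.mean(Y * np.log(A_safe) + (1 - Y) * np.log(1 - A_safe))))
        # Back propagation: equations (7) and (8), using the unclipped predictions
        dw = X @ (A - Y).T / m                      # shape (n_x, 1)
        db = float(np.mean(A - Y))
        # Update step with learning rate alpha
        w = w - alpha * dw
        b = b - alpha * db
    return w, b, costs


X = np.array([[2, 3, 4, 5, 6],
              [7, 2, 3, 4, 8],
              [8, 8, 2, 4, 8]], dtype=np.float64)
Y = np.array([[1, 1, 0, 0, 1]], dtype=np.float64)
w, b, costs = train(X, Y)
print(f"initial cost {costs[0]:.4f}, final cost {costs[-1]:.4f}")  # initial cost is ln 2 ≈ 0.6931
```

Because these features are not normalized, a larger `alpha` may make the cost oscillate instead of decreasing; scaling each feature first is a common remedy.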
# Test Model