# Vectorizing the computations

## Computing the activation function (sigmoid)

$X = \begin{bmatrix} 
x_0^{(1)} & x_1^{(1)} & \dots & x_{n-1}^{(1)} \\
x_0^{(2)} & x_1^{(2)} & \dots & x_{n-1}^{(2)} \\
\dots & \dots & \dots & \dots \\
x_0^{(m)} & x_1^{(m)} & \dots & x_{n-1}^{(m)} \\
\end{bmatrix}
\qquad
\theta = \begin{bmatrix} 
\theta_0 \\
\theta_1 \\
\dots \\
\theta_{n-1} \\
\end{bmatrix}$

$h_\theta(X) = \frac{1}{1+e^{-X \cdot \theta}}$ is a vector of shape (m x 1), with one entry per training example.

In [66]:
import numpy as np

x = np.array([[1, 3, 4, 5], [1, 2, 5, 3]])
print("x =\n", x)
theta = np.array([[0.2], [0.1], [0.4], [0.3]])
print("theta=\n", theta)

x =
 [[1 3 4 5]
 [1 2 5 3]]
theta=
 [[0.2]
 [0.1]
 [0.4]
 [0.3]]


In [67]:
# loop-based version
import math

h = np.zeros((x.shape[0], 1))
# accumulate the dot product x[i] . theta for every example i
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        h[i, 0] += x[i, j] * theta[j, 0]
# apply the sigmoid 1 / (1 + exp(-z)) to every entry
for i in range(x.shape[0]):
    h[i, 0] = 1.0 / (1.0 + math.exp(-h[i, 0]))
h

array([[0.97340301],
       [0.96442881]])

In [68]:
# vectorized version
h = 1.0 / (1.0 + np.exp(-np.dot(x, theta)))
print("h=\n", h)

h=
 [[0.97340301]
 [0.96442881]]
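
As a quick self-check, the sketch below compares a loop-based and a vectorized evaluation of $h_\theta(X)$ on the same `x` and `theta`. The helper name `sigmoid` and the variables `h_loop` and `h_vec` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z)) (hypothetical helper)."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[1, 3, 4, 5], [1, 2, 5, 3]])
theta = np.array([[0.2], [0.1], [0.4], [0.3]])

# Loop-based reference: one explicit dot product per example.
h_loop = np.zeros((x.shape[0], 1))
for i in range(x.shape[0]):
    z = 0.0
    for j in range(x.shape[1]):
        z += x[i, j] * theta[j, 0]
    h_loop[i, 0] = sigmoid(z)

# Vectorized: a single matrix-vector product followed by the sigmoid.
h_vec = sigmoid(x @ theta)

print(h_vec.shape)                 # (2, 1)
print(np.allclose(h_loop, h_vec))  # True
```

Both versions produce one value per row of `x`, each strictly between 0 and 1, as expected from a sigmoid.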


## Computing the updates

$\theta_i = \theta_i - \alpha \frac{1}{m} \sum_{j=1}^{m} (h_\theta(x^{(j)}) - y^{(j)}) \cdot x_i^{(j)}$

As seen above, $h_\theta(X)$ is a column vector of shape (m x 1), and so is $(h_\theta(X) - Y)$. The updates for $\theta_i$ are performed simultaneously for all $i = 0 \dots n-1$:

$(h_\theta(X) - Y) \odot X =
\begin{bmatrix} 
h_\theta(x^{(1)}) - y^{(1)} \\
h_\theta(x^{(2)}) - y^{(2)} \\
\dots \\
h_\theta(x^{(m)}) - y^{(m)} \\
\end{bmatrix}
\odot
\begin{bmatrix} 
x_0^{(1)} & x_1^{(1)} & \dots & x_{n-1}^{(1)} \\
x_0^{(2)} & x_1^{(2)} & \dots & x_{n-1}^{(2)} \\
\dots & \dots & \dots & \dots \\
x_0^{(m)} & x_1^{(m)} & \dots & x_{n-1}^{(m)} \\
\end{bmatrix}$

The single element in each row of the first vector is multiplied by every element in the corresponding row of $X$. Denoting the resulting per-example terms by $\delta_i^{(j)}$:

$\begin{bmatrix} 
x_0^{(1)} \cdot (h_\theta(x^{(1)}) - y^{(1)}) & x_1^{(1)} \cdot (h_\theta(x^{(1)}) - y^{(1)}) & \dots & x_{n-1}^{(1)} \cdot (h_\theta(x^{(1)}) - y^{(1)}) \\
x_0^{(2)} \cdot (h_\theta(x^{(2)}) - y^{(2)}) & x_1^{(2)} \cdot (h_\theta(x^{(2)}) - y^{(2)}) & \dots & x_{n-1}^{(2)} \cdot (h_\theta(x^{(2)}) - y^{(2)}) \\
\dots & \dots & \dots & \dots \\
x_0^{(m)} \cdot (h_\theta(x^{(m)}) - y^{(m)}) & x_1^{(m)} \cdot (h_\theta(x^{(m)}) - y^{(m)}) & \dots & x_{n-1}^{(m)} \cdot (h_\theta(x^{(m)}) - y^{(m)}) \\
\end{bmatrix} = 
\begin{bmatrix} 
\delta_0^{(1)} & \delta_1^{(1)} & \dots & \delta_{n-1}^{(1)} \\
\delta_0^{(2)} & \delta_1^{(2)} & \dots & \delta_{n-1}^{(2)} \\
\dots & \dots & \dots & \dots \\
\delta_0^{(m)} & \delta_1^{(m)} & \dots & \delta_{n-1}^{(m)} \\
\end{bmatrix}$

The $\odot$ (Hadamard product) notation is a slight abuse here, because the (m x 1) column is in fact broadcast across the n columns of $X$.

Next, the rows are averaged (axis = 0, so the columns remain), which yields the vector $[ \delta_0, \delta_1, \dots, \delta_{n-1}]$, where $\delta_i = \frac{1}{m} \sum_{j=1}^{m} \delta_i^{(j)}$. Each parameter is then updated as $\theta_i = \theta_i - \alpha \, \delta_i$.



In [69]:
y = np.array([[0], [1]])
print("h =\n", h)
print("y =\n", y)

h =
 [[0.97340301]
 [0.96442881]]
y =
 [[0]
 [1]]


In [70]:
# loop-based version
m, n = x.shape
# per-example terms: residual of example i times feature j
upd = np.zeros(x.shape)
for i in range(m):
    for j in range(n):
        upd[i, j] = (h[i, 0] - y[i, 0]) * x[i, j]

# average over the m examples, one value per feature
delta = np.zeros((1, n))
for j in range(n):
    for i in range(m):
        delta[0, j] += upd[i, j]
    delta[0, j] /= m

delta

array([[0.46891591, 1.42453332, 1.85787804, 2.38015073]])

In [71]:
# vectorized version
np.average((h - y) * x, axis = 0)

array([0.46891591, 1.42453332, 1.85787804, 2.38015073])
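
The broadcast-and-average step can also be written as a matrix product, $\frac{1}{m} X^T (h_\theta(X) - Y)$, which then plugs directly into the update rule. The sketch below checks that both forms agree and performs one update; the learning rate `alpha = 0.1` and the names `grad_broadcast`, `grad_matmul` and `theta_new` are assumptions chosen for illustration only.

```python
import numpy as np

x = np.array([[1, 3, 4, 5], [1, 2, 5, 3]])
y = np.array([[0], [1]])
theta = np.array([[0.2], [0.1], [0.4], [0.3]])
alpha = 0.1  # hypothetical learning rate, not taken from the notebook

m = x.shape[0]
h = 1.0 / (1.0 + np.exp(-(x @ theta)))  # sigmoid of X . theta, shape (m, 1)

# Form 1: broadcast the (m, 1) residual over the columns of x, then average rows.
grad_broadcast = np.average((h - y) * x, axis=0)  # shape (n,)

# Form 2: the same sums expressed as a matrix product.
grad_matmul = (x.T @ (h - y)) / m                 # shape (n, 1)

print(np.allclose(grad_broadcast, grad_matmul.ravel()))  # True

# One simultaneous update of all parameters.
theta_new = theta - alpha * grad_matmul
print(theta_new.shape)  # (4, 1)
```

The matrix-product form keeps the result as an (n x 1) column, so it can be subtracted from `theta` without reshaping.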

## Notes: how to work with 3D tensors

In [72]:
a = np.array([[1], [2], [3]])
b = np.array([[1, 2], [3, 4], [5, 6]])
np.sum(a * b, axis = 1)

array([ 3, 14, 33])

In [73]:
a = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]])
print(a)
b = np.array([[1, 2], [3, 4], [5, 6]])
print(b)
a = a.reshape(3, 1, 4)
b = b.reshape(3, 2, 1)
r = a * b
r

[[1 2 3 4]
 [2 3 4 5]
 [3 4 5 6]]
[[1 2]
 [3 4]
 [5 6]]


array([[[ 1,  2,  3,  4],
        [ 2,  4,  6,  8]],

       [[ 6,  9, 12, 15],
        [ 8, 12, 16, 20]],

       [[15, 20, 25, 30],
        [18, 24, 30, 36]]])

In [74]:
np.sum(r, axis = 0)

array([[22, 31, 40, 49],
       [28, 40, 52, 64]])

In [76]:
r[:,:,0]

array([[ 1,  2],
       [ 6,  8],
       [15, 18]])
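
The reshape, broadcast and sum over axis 0 shown above computes $\sum_i b_{ij} \, a_{ik}$, which is the matrix product $b^T a$. The sketch below verifies this with `np.einsum` and `@`; the variable names `via_broadcast`, `via_matmul` and `via_einsum` are illustrative assumptions.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]])  # shape (3, 4)
b = np.array([[1, 2], [3, 4], [5, 6]])                    # shape (3, 2)

# 3D route: (3, 1, 4) * (3, 2, 1) broadcasts to (3, 2, 4), then sum over axis 0.
via_broadcast = np.sum(a.reshape(3, 1, 4) * b.reshape(3, 2, 1), axis=0)

# Same contraction without materializing the 3D intermediate.
via_matmul = b.T @ a
via_einsum = np.einsum("ij,ik->jk", b, a)

print(via_broadcast)
# [[22 31 40 49]
#  [28 40 52 64]]
print(np.array_equal(via_broadcast, via_matmul))  # True
print(np.array_equal(via_broadcast, via_einsum))  # True
```

For large inputs the matrix-product form is usually preferable, since it avoids allocating the full (3, 2, 4)-shaped intermediate tensor.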

In [77]:
# computing the product using broadcasting
import numpy as np
a = np.array([[1, 2], [3, 4]])
print(a, np.shape(a))
b = np.array([[1], [2]])
print(b, np.shape(b))
a * b

[[1 2]
 [3 4]] (2, 2)
[[1]
 [2]] (2, 1)


array([[1, 2],
       [6, 8]])
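
To make the broadcast explicit, the sketch below expands `b` to the shape of `a` with `np.broadcast_to` and confirms that the element-wise product is unchanged. The name `b_expanded` is an assumption introduced for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])  # shape (2, 2)
b = np.array([[1], [2]])        # shape (2, 1)

# The (2, 1) column is virtually repeated along the second axis.
b_expanded = np.broadcast_to(b, a.shape)
print(b_expanded)
# [[1 1]
#  [2 2]]

# Multiplying by the expanded view gives the same result as a * b.
print(np.array_equal(a * b, a * b_expanded))  # True
```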