In [0]:
import numpy as np

#Solutions to Exercise

1.   Write a program to multiply two matrices of size $(100, 100)$ in two ways: (a) by using `np.dot(mat_1, mat_2)` and (b) by using for-loops. Compare the execution time of the two approaches. Check the documentation of `np.dot` if you are not familiar with it.

2.   Write a program to execute the steps below using numpy:
    $$z_{ij} = \sum_{k=1}^{n}w_{ik}x_{kj}$$
    $$\sigma_{ij}(z_{ij}) = \frac{1}{1+e^{-z_{ij}}}$$ where $\textbf{w}$ and $\textbf{x}$ are matrices of random numbers having  dimensions $(m,n)$ and $(n,k)$, respectively, $\sigma(z)$ is a function which performs above defined operation on elements of $\textbf{z}$.

3.   Consider a matrix $\textbf{M}$ of size $(n, n)$. Flatten this into a 1-dimensional array and 
> 1. compute **mean** and **standard deviation** of $\textbf{M}$ in *two* ways. <br>
> 2. Apply the element-wise operation defined below: $$z_i = \frac{x_i - \mu}{\sigma}$$ <br> where $x_i, \ \mu,\ \sigma$ are the elements, mean and standard deviation of the flattened matrix $\textbf{M}$ respectively, and $z$ is the output vector.
> 3. Compute the **mean** and **standard deviation** of $z$ and compare them with the **mean** and **standard deviation** of $\textbf{M}$.
> 4. Reason about the above comparison.

4.   Consider an $n$-dimensional vector $\vec{V}$ (having $n$ elements) and calculate:
> 1. $|\vec{V}|$ (magnitude of the vector)
> 2. $\sum_{i=1}^{n}v_i^3$ in three different ways (here $n$ is the total number of elements in $\vec{V}$ and $v_i$ is the $i^{th}$ element of $\vec{V}$).

5.   Create two vectors $y$ and $\hat{y}$ having the **same** dimensions, where $\hat{y}$ should consist of random numbers in $[0, 1]$ and $y$ should contain only 0s and 1s, for example $y = [0, 1, 1, 0, 1, 0, 0, 1, ..., 1]$. Compute the given expression: $$O = -\frac{1}{n}\sum_{i=1}^{n}[y_i\log_2(\hat{y_i}) + (1-y_i)\log_2(1-\hat{y_i})]$$
where $n$ is the total number of elements in $y$ and $\hat{y}$.

##Solution 1

In [0]:
mat_1 = np.random.randn(100, 100)
mat_2 = np.random.randn(100, 100)
out = np.zeros([100, 100])

In [0]:
%%time
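# Triple loop: out[i, j] is the dot product of row i of mat_1 and column j of mat_2
for i in range(100):
    for j in range(100):
        total = 0  # named `total` so the built-in `sum` is not shadowed
        for k in range(100):
            total = total + mat_1[i, k] * mat_2[k, j]
        out[i, j] = total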

CPU times: user 713 ms, sys: 0 ns, total: 713 ms
Wall time: 716 ms


In [0]:
print(out.sum())

-95.28442862045685


In [0]:
%%time
out = np.dot(mat_1, mat_2)

CPU times: user 0 ns, sys: 22.3 ms, total: 22.3 ms
Wall time: 9.37 ms


In [0]:
print(out.sum())

-95.28442862045642


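>  1. The timings show a clear gap between explicit for-loops and NumPy. <br> NumPy takes: $9.37ms$ <br> Loops take: $716ms$ <br> That gives $\frac{716}{9.37} \approx 76.4$, so in this run NumPy is roughly $76$ times faster than the loops. <br> Most of the gain comes from `np.dot` running the whole computation in compiled, optimized linear-algebra routines instead of executing about a million interpreted Python operations; depending on the installation, those routines may also use several CPU cores. <br> `%%time` measures a single run, so the exact ratio will vary between runs and machines. The two printed sums differ only in the last few digits, which is floating-point rounding caused by a different order of additions.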
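
Because a single `%%time` run is noisy, the comparison can also be repeated with the standard-library `timeit` module. The sketch below is one possible way to do that, not the notebook's own method: the helper name `matmul_loops`, the seed, and the repeat counts are assumptions chosen for illustration, and the printed timings depend on the machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
a = rng.standard_normal((100, 100))
b = rng.standard_normal((100, 100))


def matmul_loops(a, b):
    """Multiply two 2-D arrays with explicit Python loops (hypothetical helper)."""
    n_rows, n_inner = a.shape
    n_cols = b.shape[1]
    result = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            total = 0.0
            for k in range(n_inner):
                total += a[i, k] * b[k, j]
            result[i, j] = total
    return result


# Take the best of several repeats; it is less sensitive to background load.
loop_time = min(timeit.repeat(lambda: matmul_loops(a, b), number=1, repeat=3))
# np.dot is fast, so time 100 calls per repeat and divide to get one call.
dot_time = min(timeit.repeat(lambda: np.dot(a, b), number=100, repeat=3)) / 100

print(f"loops  : {loop_time * 1e3:.2f} ms")
print(f"np.dot : {dot_time * 1e3:.4f} ms")
print(f"speedup: {loop_time / dot_time:.0f}x")
```

Both approaches should agree up to rounding, which `np.allclose(matmul_loops(a, b), np.dot(a, b))` can confirm.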

##Solution 2

In [0]:
w = np.random.randn(10, 20)
x = np.random.randn(20, 30)

In [0]:
def fun(z):
    return (1/(1+np.exp(-z)))

In [0]:
z = np.dot(w,x)
sigma_z = fun(z)

In [0]:
print(z.shape)
print(sigma_z.shape)

(10, 30)
(10, 30)


>  1. To perform an element-wise operation, you do not need loops: in NumPy you can write the expression on a whole vector/matrix as if it were a single variable, and NumPy applies it to every element in compiled code, which is far more efficient.
>  2. The operation that `fun(z)` performs is the sigmoid (logistic) function, a common activation function in neural networks in machine learning.
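
As a quick check of the element-wise behaviour described above, the vectorized result can be compared with an explicit double loop. This is a minimal sketch under stated assumptions: the function is renamed `sigmoid` here for readability (the notebook calls it `fun`), and the seed is arbitrary.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
w = rng.standard_normal((10, 20))
x = rng.standard_normal((20, 30))

z = w @ x       # for 2-D arrays this is the same product as np.dot(w, x)
s = sigmoid(z)  # applied to all 10 * 30 entries at once

# Reference: apply the same formula one element at a time.
s_loop = np.empty_like(z)
for i in range(z.shape[0]):
    for j in range(z.shape[1]):
        s_loop[i, j] = 1.0 / (1.0 + np.exp(-z[i, j]))

print(s.shape)                 # (10, 30)
print(np.allclose(s, s_loop))  # True
print(sigmoid(0.0))            # 0.5
```

The output has the same shape as `z`, and every entry lies between 0 and 1, which is why this function is often used to produce probability-like values.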

##Solution 3

In [0]:
n = 100
M = np.random.rand(n, n)
M_flat = M.reshape(-1)
print(M.shape)
print(M_flat.shape)

(100, 100)
(10000,)


###1.

In [0]:
M_flat_mean_1 = M_flat.mean()
M_flat_mean_2 = M_flat.sum()/M_flat.shape[0]

M_flat_std_1 = M_flat.std()
M_flat_std_2 = np.sqrt(((M_flat - M_flat.mean())**2).mean())

print('Mean 1 : ', M_flat_mean_1)
print('Mean 2 : ', M_flat_mean_2)
print('StD 1 : ', M_flat_std_1)
print('StD 2 : ', M_flat_std_2)

Mean 1 :  0.49997980050764784
Mean 2 :  0.49997980050764784
StD 1 :  0.28778459890709784
StD 2 :  0.28778459890709784


###2.

In [0]:
def fun(M_flat):
    mean = M_flat.mean()
    std = M_flat.std()
    return (M_flat - mean)/std

In [0]:
z = fun(M_flat)

In [0]:
print(z.shape)

(10000,)


###3.

In [0]:
z_mean = z.mean()
z_std = z.std()
print("M_flat's mean : ", M_flat_mean_1, '\t', "z's mean : ", z_mean)
print("M_flat's StD : ", M_flat_std_1, '\t', "z's StD : ", z_std)

M_flat's mean :  0.49997980050764784 	 z's mean :  -4.593658786689048e-16
M_flat's StD :  0.28778459890709784 	 z's StD :  1.0000000000000002


###4.

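>  1. There are several equivalent ways to compute the **mean** and **standard deviation**: the built-in methods `mean()` and `std()`, or the defining formulas written out by hand. Both give identical values here, so you can use whichever suits your code.
>  2. The function `fun(M_flat)` computes the Z-score of the given data, that is, it standardizes the data so that the **mean** and **standard deviation** become $0$ and $1$ respectively. Subtracting $\mu$ shifts the mean to $0$, and dividing by $\sigma$ rescales the spread to $1$. This holds for any data with any **mean** and any non-zero **standard deviation**. The printed values $-4.59 \times 10^{-16}$ and $1.0000000000000002$ differ from $0$ and $1$ only by floating-point rounding.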
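
To see that the Z-score behaves the same way for data that does not start near mean $0.5$, it can be applied to samples with a very different mean and spread. The sketch below is illustrative only: the name `z_score`, the distribution parameters, and the explicit zero-deviation check are assumptions added here, not part of the original solution.

```python
import numpy as np


def z_score(values):
    """Standardize values to mean 0 and standard deviation 1."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # All values are identical, so dividing by std would be undefined.
        raise ValueError("z-score is undefined when the standard deviation is zero")
    return (values - values.mean()) / std


rng = np.random.default_rng(0)  # arbitrary seed
data = rng.normal(loc=50.0, scale=12.0, size=10_000)  # mean ~50, std ~12

z = z_score(data)
print(np.isclose(z.mean(), 0.0))  # True
print(np.isclose(z.std(), 1.0))   # True
```

A constant array such as `np.ones(5)` has zero standard deviation, so the helper raises an error instead of silently returning `nan` values.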

##Solution 4

In [0]:
n = 100
V = np.random.randn(n)
print(V.shape)

(100,)


###1.

In [0]:
V_magnitude = np.sqrt((V**2).sum())
print(V_magnitude)

10.801818956796815


###2.

In [0]:
V_cube_1 = np.sum(V**3)
V_cube_2 = np.power(V, 3).sum()
V_cube_3 = np.dot(V * V, V)

print('Method 1 : ', V_cube_1)
print('Method 2 : ', V_cube_2)
print('Method 3 : ', V_cube_3)

Method 1 :  38.837920268109535
Method 2 :  38.837920268109535
Method 3 :  38.83792026810953


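>  1. The same expression can be computed in several ways, and you can pick whichever fits the structure and requirements of your code. The last digit of Method 3 differs slightly from the other two because `np.dot` adds the terms in a different order, which changes the floating-point rounding.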
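
Two further options, shown here only as a hedged sketch and not as part of the original solution, are `np.linalg.norm` for the magnitude and `np.einsum` for the sum of cubes. The variable names and the seed below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
V = rng.standard_normal(100)

# Magnitude: the manual formula versus the library routine (Euclidean norm).
magnitude_manual = np.sqrt((V ** 2).sum())
magnitude_norm = np.linalg.norm(V)

# Sum of cubes: vectorized power versus an Einstein-summation contraction
# that multiplies V by itself three times and sums over the shared index.
cube_power = np.sum(V ** 3)
cube_einsum = np.einsum("i,i,i->", V, V, V)

print(np.isclose(magnitude_manual, magnitude_norm))  # True
print(np.isclose(cube_power, cube_einsum))           # True
```

Comparing with `np.isclose` rather than `==` is deliberate, since the methods can differ in the last few digits.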

##Solution 5

In [0]:
n = 100
y = np.random.randint(0, 2, n)  # labels: n random 0s and 1s
y_hat = np.random.rand(n)  # predictions: n random values in [0, 1)
print(y.shape)
print(y_hat.shape)

(100,)
(100,)


In [0]:
print(y)

[1 1 1 0 0 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 0 1 0 0 0 1 0 0 1 0 1 1 1 1
 0 0 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 0 0 1 1 0 1 1 1 0 1 1 1 0 0 0 0 1 1
 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 1 0 1 0 0 1 0]


In [0]:
print(y_hat)

[0.25612118 0.4912104  0.93020117 0.76205976 0.36045303 0.77546698
 0.91353578 0.51588536 0.80272176 0.02663985 0.28432269 0.72178369
 0.19473212 0.02443659 0.57980128 0.20094259 0.59687895 0.11891254
 0.31246041 0.24471559 0.08344092 0.25351067 0.57329488 0.11792268
 0.51104999 0.02657275 0.32516925 0.44392086 0.41664157 0.19109244
 0.46427263 0.0563892  0.75932663 0.94446739 0.41036822 0.15943956
 0.09320842 0.25502296 0.28795989 0.06396985 0.64934354 0.64751866
 0.57869279 0.33165741 0.73051829 0.97543399 0.63389005 0.63509362
 0.16463051 0.80836355 0.80997904 0.5538259  0.72620615 0.63967916
 0.74429846 0.02259538 0.91365091 0.63760522 0.6831082  0.44186189
 0.84808089 0.17069693 0.78691484 0.16176848 0.33733591 0.76854976
 0.5248389  0.94190914 0.31641235 0.49568981 0.19323546 0.04382101
 0.94176413 0.38928018 0.83514275 0.70192133 0.62755608 0.73434056
 0.05441822 0.84939361 0.08494846 0.80653503 0.08702091 0.70579736
 0.30692918 0.11436404 0.66218451 0.69092003 0.04431528 0.8146

In [0]:
def fun(y, y_hat):
    temp_sum = (y * np.log2(y_hat) + (1 - y) * np.log2(1 - y_hat))
    return -temp_sum.mean()

In [0]:
O = fun(y, y_hat)
print(O)

1.287177011021336


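>  1. The expression $O = -\frac{1}{n}\sum_{i=1}^{n}[y_i\log_2(\hat{y_i}) + (1-y_i)\log_2(1-\hat{y_i})]$ that you have computed is the binary **Cross-Entropy** loss, which is used in machine learning for classification tasks. It measures how well the predictions $\hat{y}$ match the labels $y$: a large $O$ means the model is performing poorly, and a value close to $0$ means the predictions are close to the labels. Here $\hat{y}$ is random, so the value only illustrates the computation. Because the logarithm is base 2, the loss is measured in bits; machine learning libraries usually use the natural logarithm, which changes the value only by a constant factor.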
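
One practical caveat: if any $\hat{y_i}$ is exactly $0$ or $1$, the logarithm becomes $-\infty$ and the result can turn into `inf` or `nan`. A common workaround, sketched below as an assumption rather than as the notebook's method, is to clip the predictions into a slightly smaller interval first. The name `binary_cross_entropy`, the `eps` value, and the toy vectors are all hypothetical.

```python
import numpy as np


def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Base-2 binary cross-entropy with predictions clipped away from 0 and 1."""
    y = np.asarray(y, dtype=float)
    # Clipping keeps log2 finite; eps is an illustrative choice, not a standard.
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1.0 - eps)
    return -np.mean(y * np.log2(y_hat) + (1.0 - y) * np.log2(1.0 - y_hat))


y = np.array([1, 0, 1, 0])
good = np.array([0.9, 0.1, 0.8, 0.2])     # predictions close to the labels
bad = np.array([0.1, 0.9, 0.2, 0.8])      # predictions far from the labels
extreme = np.array([1.0, 0.0, 0.0, 1.0])  # contains exact 0s and 1s

print(binary_cross_entropy(y, good) < binary_cross_entropy(y, bad))  # True
print(np.isfinite(binary_cross_entropy(y, extreme)))                 # True
```

The first comparison shows the interpretation given above: predictions that agree with the labels give a smaller loss than predictions that contradict them.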