In [1]:
import numpy as np

# Solutions to Exercise

1.   Write a program to multiply two matrices of size $(100, 100)$ in two ways: (a) using `np.dot(mat_1, mat_2)` and (b) using for-loops. Compare the execution time in both cases. Check the documentation of `np.dot` if it is not familiar to you.

2.   Write a program to execute the steps below using numpy:
    $$z_{ij} = \sum_{k=1}^{n}w_{ik}x_{kj}$$
    $$\sigma_{ij}(z_{ij}) = \frac{1}{1+e^{-z_{ij}}}$$ where $\textbf{w}$ and $\textbf{x}$ are matrices of random numbers with dimensions $(m,n)$ and $(n,p)$, respectively, and $\sigma(z)$ is a function that applies the operation defined above to each element of $\textbf{z}$.

3.   Consider a matrix $\textbf{M}$ of size $(n, n)$. Flatten this into a 1-dimensional array and 
> 1. Compute the **mean** and **standard deviation** of $\textbf{M}$ in *two* ways. <br>
> 2. Apply the element wise operation as defined below: $$z_i = \frac{x_i - \mu}{\sigma}$$ <br> where $x_i, \ \mu,\ \sigma$ are elements, mean and standard deviation of flattened matrix $\textbf{M}$ respectively. And $z$ is the output vector.
> 3. Compute the **mean** and **standard deviation** of $z$ and compare them with the **mean** and **standard deviation** of $\textbf{M}$.
> 4. Reason about the above comparison.

4.   Consider an $n$-dimensional vector $\vec{V}$ (having $n$ elements) and calculate:
> 1. $|\vec{V}|$ (magnitude of the vector)
> 2. $\sum_{i=1}^{n}v_i^3$ in three different ways (here $n$ is the total number of elements in $\vec{V}$ and $v_i$ is the $i^{th}$ element of $\vec{V}$).

5.   Create two vectors $y$ and $\hat{y}$ with the **same** dimensions, where $\hat{y}$ consists of random numbers in $[0, 1]$ and $y$ contains $0$s and $1$s, for example $y = [0, 1, 1, 0, 1, 0, 0, 1, ..., 1]$. Compute the given expression: $$O = -\frac{1}{n}\sum_{i=1}^{n}[y_i\log_2(\hat{y_i}) + (1-y_i)\log_2(1-\hat{y_i})]$$
where $n$ is the total number of elements in $y$ and $\hat{y}$.

## Solution 1

In [2]:
mat_1 = np.random.randn(100, 100)
mat_2 = np.random.randn(100, 100)
out = np.zeros([100, 100])

In [3]:
%%time
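# Triple loop: out[i, j] is the dot product of row i of mat_1 and column j of mat_2.
for i in range(100):
    for j in range(100):
        total = 0.0  # named `total` to avoid shadowing the built-in sum()
        for k in range(100):
            total += mat_1[i, k] * mat_2[k, j]
        out[i, j] = total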

Wall time: 1 s


In [4]:
print(out.sum())

-1042.1203201720687


In [5]:
%%time
out = np.dot(mat_1, mat_2)

Wall time: 4 ms


In [6]:
print(out.sum())

-1042.1203201720691


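>  1. We can clearly see the difference between using for-loops and NumPy. How much faster is NumPy than the loops? <br> In the run shown above, `np.dot` takes about $4\,ms$ <br> and the loop takes about $1\,s$. <br> That implies $\frac{1000}{4} = 250$, so NumPy is roughly $250$ times faster than the loops here; the exact ratio varies between runs and machines. <br> NumPy is faster mainly because `np.dot` runs in optimized compiled code (typically a BLAS library, which may also use multiple CPU cores) instead of executing every multiplication in the Python interpreter.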
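
A single `%%time` measurement is noisy, so it helps to also confirm that both methods produce the same result. The following is a minimal sketch that runs outside a notebook using `time.perf_counter`. The helper name `matmul_loops` and the smaller matrix size (`size = 50`, chosen only to keep the loop quick) are assumptions for illustration, not part of the original solution.

```python
import time
import numpy as np

def matmul_loops(a, b):
    """Multiply two 2-D arrays with explicit Python loops (hypothetical helper)."""
    rows, inner = a.shape
    cols = b.shape[1]
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for k in range(inner):
                total += a[i, k] * b[k, j]
            out[i, j] = total
    return out

size = 50  # assumed smaller size so the loop version finishes quickly
rng = np.random.default_rng(0)
a = rng.standard_normal((size, size))
b = rng.standard_normal((size, size))

start = time.perf_counter()
out_loops = matmul_loops(a, b)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
out_dot = np.dot(a, b)
dot_seconds = time.perf_counter() - start

# Both methods should agree up to floating-point rounding.
print(np.allclose(out_loops, out_dot))
print(f"loops: {loop_seconds:.4f} s, np.dot: {dot_seconds:.6f} s")
```

The tiny difference in the last digits of `out.sum()` between the two methods above is ordinary floating-point rounding, which is why `np.allclose` is used instead of an exact comparison.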

## Solution 2

In [7]:
w = np.random.randn(10, 20)
x = np.random.randn(20, 30)

In [8]:
def fun(z):
    return (1/(1+np.exp(-z)))

In [9]:
z = np.dot(w,x)
sigma_z = fun(z)

In [10]:
print(z.shape)
print(sigma_z.shape)

(10, 30)
(10, 30)


>  1. To perform an element-wise operation you do not need loops: NumPy lets you treat a vector/matrix like an ordinary variable, and its backend applies the operation to every element efficiently.
>  2. The operation that `fun(z)` performs is the sigmoid (logistic) function, an activation function used in neural networks in machine learning.
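
For very negative inputs, `np.exp(-z)` in `fun(z)` can overflow and emit a runtime warning, even though the final result is still correct. One common way around this is sketched below; the name `stable_sigmoid` is hypothetical and the approach is only one of several possible formulations.

```python
import numpy as np

def stable_sigmoid(z):
    """Element-wise sigmoid that avoids overflow for large |z| (illustrative helper)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in [0, 1], so it never overflows
    # For z >= 0 use 1 / (1 + e^{-z}); for z < 0 use e^{z} / (1 + e^{z}).
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

w = np.random.randn(10, 20)
x = np.random.randn(20, 30)
z = np.dot(w, x)
sigma_z = stable_sigmoid(z)

print(sigma_z.shape)
print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))
```

For moderate inputs such as the random matrices used here, this gives the same values as `1/(1+np.exp(-z))`.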

## Solution 3

In [11]:
n = 100
M = np.random.rand(n, n)
M_flat = M.reshape(-1)
print(M.shape)
print(M_flat.shape)

(100, 100)
(10000,)


### 1.

In [12]:
M_flat_mean_1 = M_flat.mean()
M_flat_mean_2 = M_flat.sum()/M_flat.shape[0]

M_flat_std_1 = M_flat.std()
M_flat_std_2 = np.sqrt(((M_flat - M_flat.mean())**2).mean())

print('Mean 1 : ', M_flat_mean_1)
print('Mean 2 : ', M_flat_mean_2)
print('StD 1 : ', M_flat_std_1)
print('StD 2 : ', M_flat_std_2)

Mean 1 :  0.49785554710096946
Mean 2 :  0.49785554710096946
StD 1 :  0.2881737492723571
StD 2 :  0.2881737492723571
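
The two standard deviations match because `ndarray.std()` divides by $N$ by default (the population standard deviation), which is exactly what the manual formula does. As a small sketch, assuming the sample standard deviation (dividing by $N-1$) is wanted instead, the `ddof` argument can be used; the variable names below are illustrative.

```python
import numpy as np

data = np.random.rand(10000)
n_items = data.size
squared_deviations = (data - data.mean()) ** 2

# Population standard deviation (default, ddof=0): divide by N.
std_population = data.std()
std_population_manual = np.sqrt(squared_deviations.sum() / n_items)

# Sample standard deviation (ddof=1): divide by N - 1.
std_sample = data.std(ddof=1)
std_sample_manual = np.sqrt(squared_deviations.sum() / (n_items - 1))

print(std_population, std_population_manual)
print(std_sample, std_sample_manual)
```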


### 2.

In [13]:
def fun(M_flat):
    mean = M_flat.mean()
    std = M_flat.std()
    return (M_flat - mean)/std

In [14]:
z = fun(M_flat)

In [15]:
print(z.shape)

(10000,)


### 3.

In [16]:
z_mean = z.mean()
z_std = z.std()
print("M_flat's mean : ", M_flat_mean_1, '\t', "z's mean : ", z_mean)
print("M_flat's StD : ", M_flat_std_1, '\t', "z's StD : ", z_std)

M_flat's mean :  0.49785554710096946 	 z's mean :  1.4637180356658064e-16
M_flat's StD :  0.2881737492723571 	 z's StD :  1.0


### 4.

>  1. We can observe that there are multiple ways to compute the **mean** and **standard deviation**, and we can use whichever method suits our needs.
>  2. The function `fun(M_flat)` computes the Z-score of the given data, i.e. it standardizes the data so that the **mean** and **standard deviation** become $0$ and $1$ respectively (the mean of $z$ printed above, about $10^{-16}$, is zero up to floating-point rounding). This holds for any data, whatever its original **mean** and **standard deviation**, as long as the standard deviation is not zero.
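
The reason is short: subtracting $\mu$ shifts the mean to $0$, and dividing by $\sigma$ rescales the spread to $1$. The sketch below checks this on data with a very different mean and standard deviation; the helper name `standardize` and the chosen values (`loc=50`, `scale=7`) are assumptions for illustration only.

```python
import numpy as np

def standardize(values):
    """Return the z-scores of a 1-D array (assumes a non-zero standard deviation)."""
    return (values - values.mean()) / values.std()

rng = np.random.default_rng(42)
# Hypothetical data with a mean near 50 and a standard deviation near 7.
data = rng.normal(loc=50.0, scale=7.0, size=10000)
z_scores = standardize(data)

print(data.mean(), data.std())
print(z_scores.mean(), z_scores.std())  # mean is ~0 up to rounding error, std is ~1
```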

## Solution 4

In [17]:
n = 100
V = np.random.randn(n)
print(V.shape)

(100,)


### 1.

In [18]:
V_magnitude = np.sqrt((V**2).sum())
print(V_magnitude)

10.194007315197377
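
The magnitude can also be obtained in other equivalent ways. The sketch below compares the manual formula with a dot-product version and with `np.linalg.norm`, which computes the Euclidean norm by default for a 1-D array; the variable names are illustrative.

```python
import numpy as np

V = np.random.randn(100)

magnitude_manual = np.sqrt((V ** 2).sum())
magnitude_dot = np.sqrt(np.dot(V, V))  # |V|^2 is the dot product of V with itself
magnitude_norm = np.linalg.norm(V)     # Euclidean (L2) norm for 1-D input

print(magnitude_manual, magnitude_dot, magnitude_norm)
```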


### 2.

In [19]:
V_cube_1 = np.sum(V**3)
V_cube_2 = np.power(V, 3).sum()
V_cube_3 = np.dot(V * V, V)

print('Method 1 : ', V_cube_1)
print('Method 2 : ', V_cube_2)
print('Method 3 : ', V_cube_3)

Method 1 :  40.82110502136339
Method 2 :  40.82110502136339
Method 3 :  40.82110502136339


>  1. Here you can see that the same expression can be computed in several ways; which one to use depends on the structure and requirements of the code.
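
Two further options, shown here only as a sketch, are a plain Python loop (handy as a reference when checking vectorized code) and `np.einsum`; the variable names are illustrative.

```python
import numpy as np

V = np.random.randn(100)

# Plain Python loop, useful as a reference implementation.
cube_loop = 0.0
for v in V:
    cube_loop += v ** 3

# Einstein summation: multiply V with itself three times element-wise, then sum over i.
cube_einsum = np.einsum('i,i,i->', V, V, V)

cube_vectorized = np.sum(V ** 3)

print(np.isclose(cube_loop, cube_vectorized))
print(np.isclose(cube_einsum, cube_vectorized))
```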

## Solution 5

In [20]:
n = 100
y = np.random.randint(0, 2, n)  # n random labels, each 0 or 1
y_hat = np.random.rand(n)       # n random values in [0, 1)
print(y.shape)
print(y_hat.shape)

(100,)
(100,)


In [21]:
print(y)

[0 1 0 1 0 1 1 1 0 0 1 1 1 1 0 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0
 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 1 1 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 1
 1 0 1 0 0 1 1 1 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 0 0]


In [22]:
print(y_hat)

[0.0975388  0.99138467 0.06783311 0.31008912 0.563486   0.29613608
 0.60289557 0.96627302 0.35323302 0.56594944 0.51452884 0.64644754
 0.11533922 0.22329437 0.72364853 0.05877563 0.39212241 0.06208748
 0.78894875 0.11264302 0.61302517 0.87776869 0.09244854 0.15286224
 0.15549143 0.58757423 0.19377507 0.09362083 0.6781954  0.51922075
 0.09702949 0.61955976 0.73474488 0.83476661 0.30993891 0.37311421
 0.85506182 0.88929375 0.18820801 0.42641858 0.28531915 0.51497087
 0.11608383 0.06608688 0.28101125 0.44129965 0.98951914 0.06306567
 0.41873635 0.3831703  0.25671829 0.50024612 0.86597273 0.49226563
 0.85232061 0.1417448  0.81885629 0.70585206 0.94014002 0.88517348
 0.47681167 0.3428925  0.52069256 0.38017874 0.39779235 0.98891536
 0.78748062 0.61763301 0.13256265 0.59900663 0.95792629 0.20873551
 0.01059827 0.80690482 0.80864516 0.72199713 0.30729243 0.74127026
 0.74796587 0.42039743 0.2236434  0.54850434 0.9327779  0.98321271
 0.36071702 0.72061992 0.21382565 0.51616316 0.47839518 0.2275

In [23]:
def fun(y, y_hat):
    temp_sum = (y * np.log2(y_hat) + (1 - y) * np.log2(1 - y_hat))
    return -temp_sum.mean()

In [24]:
O = fun(y, y_hat)
print(O)

1.375046152254271


>  1. The expression $O = -\frac{1}{n}\sum_{i=1}^{n}[y_i\log_2(\hat{y_i}) + (1-y_i)\log_2(1-\hat{y_i})]$ that you have computed is the (binary) **Cross-Entropy** loss function, which is used in machine learning for classification tasks. It tells us how well or badly a model is performing: the larger $O$ is, the worse the model performs, and vice versa.
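
If any $\hat{y_i}$ is exactly $0$ or $1$, `np.log2` returns `-inf` and the loss can become `inf` or `nan`. A common precaution, sketched below, is to clip the predictions slightly away from the boundaries. The function name `binary_cross_entropy` and the constant `eps=1e-12` are assumptions for illustration, and the example vectors are made up.

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy in bits (log base 2), with clipping for safety."""
    # eps is an assumed small constant; clipping keeps log2 away from 0.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    terms = y * np.log2(y_hat) + (1 - y) * np.log2(1 - y_hat)
    return -terms.mean()

y = np.array([0, 1, 1, 0])
good = np.array([0.1, 0.9, 0.8, 0.2])    # predictions close to the labels
bad = np.array([0.9, 0.1, 0.2, 0.8])     # predictions far from the labels
exact = np.array([0.0, 1.0, 1.0, 0.0])   # boundary values, finite thanks to clipping

loss_good = binary_cross_entropy(y, good)
loss_bad = binary_cross_entropy(y, bad)
loss_exact = binary_cross_entropy(y, exact)

print(loss_good, loss_bad, loss_exact)
```

Predictions close to the labels give a small loss and predictions far from them give a large one, which matches the interpretation above.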