$${\color{yellow}{\text{Applied Linear Algebra: Vectors and Matrices}}}$$



In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


---

Restart the session after executing the following cell

---

In [None]:
!pip install gensim
!pip install numpy

---

Load essential libraries

---

In [None]:
import numpy as np
import torch
import matplotlib.pyplot as plt
plt.style.use('dark_background')
%matplotlib inline
import sys
from sklearn.preprocessing import StandardScaler, OneHotEncoder, MinMaxScaler
import nltk
import gensim.downloader
from nltk.tokenize import word_tokenize

---

Mount Google Drive folder if running Google Colab

---

In [None]:
## Mount Google drive folder if running in Colab
if('google.colab' in sys.modules):
    from google.colab import drive
    drive.mount('/content/drive', force_remount = True)
    DIR = '/content/drive/MyDrive/Colab Notebooks/MAHE/MSIS Coursework/OddSem2025MAHE'
    DATA_DIR = DIR+'/Data/'
else:
    DATA_DIR = 'Data/'

-------------------------------------

Understanding pen and paper representation versus code representation

---------------------------------------

In [None]:
# pen-paper: 3-vector, code: rank-1 tensor
a_vector=torch.tensor([1.0,2.0,3.0], dtype=torch.float64)
print(a_vector)
print(a_vector.shape)
print('-------------------------------')
# pen-paper: 1x3 matrix, code: rank-2 tensor
a_matrix_v1=torch.tensor([[1,2.0,3.0]], dtype=torch.float64)
print(a_matrix_v1)
print(a_matrix_v1.shape)
print('-------------------------------')
# pen-paper: 3x1 matrix, code: rank-2 tensor
a_matrix_v2=torch.tensor([[1],[2.0],[3.0]], dtype=torch.float64)
print(a_matrix_v2)
print(a_matrix_v2.shape)

tensor([1., 2., 3.], dtype=torch.float64)
torch.Size([3])
-------------------------------
tensor([[1., 2., 3.]], dtype=torch.float64)
torch.Size([1, 3])
-------------------------------
tensor([[1.],
        [2.],
        [3.]], dtype=torch.float64)
torch.Size([3, 1])
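
Moving between the three representations above does not require retyping the data. The following is a minimal sketch, assuming only the standard `unsqueeze`, `reshape` and `squeeze` tensor methods; the variable names are illustrative.

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)  # rank-1, shape (3,)

row = v.unsqueeze(0)        # shape (1, 3): a 1x3 matrix
col = v.unsqueeze(1)        # shape (3, 1): a 3x1 matrix
col_alt = v.reshape(-1, 1)  # the same 3x1 matrix via reshape
back = col.squeeze(1)       # shape (3,) again: a rank-1 tensor

print(row.shape, col.shape, back.shape)
print(torch.equal(col, col_alt))   # both routes give the same 3x1 matrix
print(torch.equal(row.T, col))     # transposing the 1x3 matrix gives the 3x1 matrix
```

The values never change; only the shape, and therefore the pen-and-paper interpretation, does.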


---

**We will now use PyTorch to create tensors**

The patient data matrix:

![patient data matrix](https://1drv.ms/i/s!AjTcbXuSD3I3hsxIkL4V93-CGq8RkQ?embed=1&width=1000)

**Notation**:

Zeroth patient vector $\mathbf{x}^{(0)}= \begin{bmatrix}72\\120\\37.3\\104\\32.5\end{bmatrix}$ and zeroth feature (heart rate vector) $\mathbf{x}_0 = \begin{bmatrix}72\\85\\68\\90\\84\\78\end{bmatrix}.$

---



In [None]:
## Create a patient data matrix as a constant tensor
X = torch.tensor([[72, 120, 37.3, 104, 32.5],
                  [85, 130, 37.0, 110, 14],
                  [68, 110, 38.5, 125, 34],
                  [90, 140, 38.0, 130, 26],
                  [84, 132, 38.3, 146, 30],
                  [78, 128, 37.2, 102, 12]])
print(X)
print(X.shape)
print(type(X))
print(X[0]) # this is patient-0 information which is a rank-1 tensor
print(X[0, :]) # patient-0 all features
print('------------')
print(X[0, 2]) # feature-2 of patient-0, temperature of patient-0
print(X[:, 2]) # feature-2 of all patients, temperature of all patients

tensor([[ 72.0000, 120.0000,  37.3000, 104.0000,  32.5000],
        [ 85.0000, 130.0000,  37.0000, 110.0000,  14.0000],
        [ 68.0000, 110.0000,  38.5000, 125.0000,  34.0000],
        [ 90.0000, 140.0000,  38.0000, 130.0000,  26.0000],
        [ 84.0000, 132.0000,  38.3000, 146.0000,  30.0000],
        [ 78.0000, 128.0000,  37.2000, 102.0000,  12.0000]])
torch.Size([6, 5])
<class 'torch.Tensor'>
tensor([ 72.0000, 120.0000,  37.3000, 104.0000,  32.5000])
tensor([ 72.0000, 120.0000,  37.3000, 104.0000,  32.5000])
------------
tensor(37.3000)
tensor([37.3000, 37.0000, 38.5000, 38.0000, 38.3000, 37.2000])


---

**Convert a PyTorch object into a numpy array**

---

In [None]:
print(X.numpy())
print(type(X.numpy()))
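
One detail worth knowing about this conversion: for a CPU tensor, `.numpy()` returns an array that shares memory with the tensor rather than a copy. The sketch below illustrates this on a small made-up tensor (the values and variable names are only for illustration).

```python
import numpy as np
import torch

t = torch.tensor([[72.0, 120.0], [85.0, 130.0]])
arr = t.numpy()            # NumPy view of the same memory (CPU tensors only)
arr[0, 0] = -1.0           # modifying the array ...
print(t[0, 0])             # ... also changes the tensor

independent = t.clone().numpy()   # clone first to get an independent array
independent[0, 1] = 0.0
print(t[0, 1])             # unchanged: still 120

t_back = torch.from_numpy(arr)    # back to a tensor, again sharing memory
print(type(arr), type(t_back), arr.dtype, t_back.dtype)
```

If the tensor must stay untouched, cloning before the conversion is the safer habit.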

---

**Addition and subtraction of vectors, scalar multiplication (apply operation componentwise)**

![vector addition](https://1drv.ms/i/c/37720f927b6ddc34/IQQ03G17kg9yIIA3NokBAAAAAZLAaAoWwhtn8Vk26NotALo?width=256)

![vector subtraction](https://1drv.ms/i/c/37720f927b6ddc34/IQQ03G17kg9yIIA3M4kBAAAAAU_n_mAEv006QFZm_sUj2Dc?width=256)

![vector multiplication](https://1drv.ms/i/c/37720f927b6ddc34/IQQ03G17kg9yIIA3NIkBAAAAAa_qL04bLT4kWoNeHcrR9LQ?width=256)

![vector geometry1](https://1drv.ms/i/c/37720f927b6ddc34/IQSGNMr5z3SSRry7LSKL7LybAcGYuzgw5smabV8-6DudXIs?width=230)

![vector geometry2](https://1drv.ms/i/c/37720f927b6ddc34/IQQ03G17kg9yIIA3WokBAAAAAQi8FPV9YCebl5WnyEKJ3vg?width=213&height=192)


---

In [None]:
# Vector addition
print(X[1, :] + X[2, :])

# Vector subtraction
print(X[1, :] - X[2, :])

# Scalar-vector multiplication
print(X[:, 2])
print((9/5)*X[:, 2]+32) # Celsius to Fahrenheit; adding a scalar to a vector is not defined in
# pen & paper, but in computation it is referred to as broadcasting

# Average patient
x_avg = (1/6)*(X[0, :] + X[1, :] + X[2, :] + X[3, :] + X[4, :] + X[5, :])
x_avg = torch.mean(X, dim = 0) # dim = 0 means top-to-bottom or along dim-0

# Another broadcasting example
print(X)
print(x_avg)
print(X - x_avg)
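
The two ways of computing the average patient in the cell above should agree, and subtracting the average should leave every feature with zero mean. A minimal check on a reduced 3-patient, 3-feature matrix (a made-up subset of the data, chosen only to keep the example short):

```python
import torch

X_small = torch.tensor([[72, 120, 37.3],
                        [85, 130, 37.0],
                        [68, 110, 38.5]], dtype=torch.float64)

x_avg_manual = (1 / 3) * (X_small[0] + X_small[1] + X_small[2])
x_avg = torch.mean(X_small, dim=0)
print(torch.allclose(x_avg_manual, x_avg))

# x_avg has shape (3,); it is treated as shape (1, 3) and reused for every row
X_centered = X_small - x_avg
print(X_centered.shape)
print(torch.allclose(X_centered.mean(dim=0), torch.zeros(3, dtype=torch.float64)))
```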

---

Application of vector subtraction in natural language processing (NLP): download the word embedding model trained on Wikipedia articles.

---

In [None]:
model = gensim.downloader.load('glove-wiki-gigaword-50')

---

Now we will see what embedding vector comes as a result of applying the model for the words *cricket* and *football*.

Next, we will do an *intuitive* subtraction of word embeddings as in

1. Cricket without Tendulkar
2. Football without Messi

Note that each embedding vector has 50 components, as the model name '**glove-wiki-gigaword-50**' suggests.

---

In [None]:
# Cricket without Tendulkar
a = model['cricket'] - model['tendulkar']

# Football without Messi
b = model['football'] - model['messi']
print(a)
print(b)

# How different is cricket-without-tendulkar from
# football-without-messi?
print(a-b)

[-0.7716      0.41267997 -1.725968   -0.10445005 -1.1475699  -0.854661
 -1.089      -0.08342999  0.62349    -1.67822    -0.2488078  -0.49199998
  0.18756002 -1.67098     0.6117872   0.42784432  1.05656     0.91583097
 -0.03299999 -0.04422501  0.200326   -0.33737004  0.31068     1.37842
 -1.13689    -0.57445    -0.70685995  0.41552    -0.28937     0.54485
  1.0492998   0.62732    -0.8105     -1.27723    -0.02612001  0.53963
 -0.14065999 -0.738244   -0.30487    -1.18129     0.05651999 -0.993618
 -0.911399   -0.09289992  0.535432    0.26259995 -0.63031     0.64473
  0.77843     0.15099996]
[-2.06898     0.66804904 -1.077512    0.79964995 -0.27109998 -0.26289004
 -0.881       0.377503   -0.10869002 -2.47329    -0.23453003 -0.58438
  0.10404003 -0.52671003 -0.03030002  0.237764    0.19168997  1.60344
 -0.42980003  0.59058     0.59800005 -0.67075     0.45888     1.4538
 -1.15642    -1.63534    -1.1248189  -0.20879    -0.00812     0.25545004
  1.92044     0.30049008  0.19949001 -0.675167   -0

---

A tensor of rank 3 corresponding to 4 time stamps (hourly), 3 samples (patients), 2 features (HR and BP). Assume that admission time is 9AM.

---

In [None]:
# A rank-3 patient tensor with shape (4, 3, 2)
# with meaning for
# dim-0 as 4 hourly timestamps,
# dim-1 as 3 patients, and
# dim-2 as 2 features (HR and BP)
# T = torch.tensor([[[HR, BP], [HR, BP], [HR, BP]],
#                   [[HR, BP], [HR, BP], [HR, BP]],
#                   [[HR, BP], [HR, BP], [HR, BP]],
#                   [[HR, BP], [HR, BP], [HR, BP]]])
T = torch.tensor([[[74., 128], [79, 116], [71, 116]],
                 [[78, 118], [82, 124], [72, 128]],
                 [[84, 138], [84, 130], [74, 120]],
                 [[82, 126], [76, 156], [82, 132]]])
print(T)

tensor([[[ 74., 128.],
         [ 79., 116.],
         [ 71., 116.]],

        [[ 78., 118.],
         [ 82., 124.],
         [ 72., 128.]],

        [[ 84., 138.],
         [ 84., 130.],
         [ 74., 120.]],

        [[ 82., 126.],
         [ 76., 156.],
         [ 82., 132.]]])


---

**Accessing elements of a tensor**

---

In [None]:
## Accessing elements of a tensor
# Rank-3 tensor T has axes order (timestamps, patients, features)

# Element of T at position 3 w.r.t. dim-0, position 2 w.r.t. dim-1,
# position 1 w.r.t. dim-2
print(T[3, 2, 1]) # BP of patient-2 at noon


# Element-0 of object T which is also the info for all patients at
# admission time 9AM
print(T[0]) # patients' info at admission time
print(T[-1]) # first element of T from the tail -> patients' info at noon


# Patient-2 info at noon
print(T[-1, 2])

tensor(132.)
tensor([[ 74., 128.],
        [ 79., 116.],
        [ 71., 116.]])
tensor([[ 82., 126.],
        [ 76., 156.],
        [ 82., 132.]])
tensor([ 82., 132.])


---

**Understanding shapes**

---

In [None]:
a = torch.tensor([[[1.0, 2.0, 3.0]]]) # a rank-3 tensor with shape (1, 1, 3)
print(a.shape)

torch.Size([1, 1, 3])


---

**Broadcasting**

---

In [None]:
# A simple broadcasting example

a = torch.tensor([1.0, 2.0, 3.0])
print(a.shape)
a = torch.tensor([[1.0, 2.0, 3.0]])
# print(a.shape)
b = torch.tensor([4.0])
# a is now a rank-2 tensor with shape (1, 3); b is a rank-1 tensor with shape (1,)
print(a.shape)
print(b.shape)
print(a-b)

torch.Size([3])
torch.Size([1, 3])
torch.Size([1])
tensor([[-3., -2., -1.]])
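
Whether two shapes can be broadcast can be checked without building any tensors. The sketch below uses `torch.broadcast_shapes`; the shapes are the ones that appear in this notebook, and the result names are illustrative.

```python
import torch

# Shapes are compared from the rightmost dimension; two sizes are compatible
# when they are equal or one of them is 1 (missing leading dims count as 1).
s_ab = torch.broadcast_shapes((1, 3), (1,))        # a - b from the cell above
s_xavg = torch.broadcast_shapes((6, 5), (5,))      # data matrix minus average patient
s_t = torch.broadcast_shapes((4, 3, 2), (4, 1, 2))
print(s_ab, s_xavg, s_t)

try:
    torch.broadcast_shapes((4, 3, 2), (4, 2))      # 3 vs 4 in dim-1: incompatible
    failed = False
except RuntimeError as err:
    failed = True
    print("not broadcastable:", err)
```

The last case fails because, aligned from the right, the sizes 3 and 4 meet in the same position and neither is 1.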


In [None]:
# How to add a new axis to a tensor using the unsqueeze() method
# print(T)
print(T.shape)
T_patient0=T[:,0, :] #get all details of patient 0
print(T_patient0)
print(T_patient0.shape)
print('--------------')
T_patient0_new=torch.unsqueeze(T_patient0, 1) # insert a new axis of size 1 at dim-1: shape (4, 2) becomes (4, 1, 2)
print(T_patient0_new)
print(T_patient0_new.shape)

torch.Size([4, 3, 2])
tensor([[ 74., 128.],
        [ 78., 118.],
        [ 84., 138.],
        [ 82., 126.]])
torch.Size([4, 2])
--------------
tensor([[[ 74., 128.]],

        [[ 78., 118.]],

        [[ 84., 138.]],

        [[ 82., 126.]]])
torch.Size([4, 1, 2])


In [None]:
# How different are the patients from patient-0?
# T - T_patient0 does not broadcast: shapes (4, 3, 2) and (4, 2) are aligned from
# the right, so the size of tensor a (3) must match the size of tensor b (4)
# at non-singleton dimension 1
# T - T_patient0

# With the extra axis, shape (4, 1, 2) broadcasts against (4, 3, 2):
# difference of every patient from patient-0 at each timestamp
T - T_patient0_new

tensor([[[  0.,   0.],
         [  5., -12.],
         [ -3., -12.]],

        [[  0.,   0.],
         [  4.,   6.],
         [ -6.,  10.]],

        [[  0.,   0.],
         [  0.,  -8.],
         [-10., -18.]],

        [[  0.,   0.],
         [ -6.,  30.],
         [  0.,   6.]]])
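
Two related variations, sketched under the same axis order (timestamps, patients, features): slicing with `0:1` keeps the patient axis so that no `unsqueeze` is needed, and subtracting `T[0]` compares every timestamp with the admission-time readings. The variable names are illustrative.

```python
import torch

T = torch.tensor([[[74., 128], [79, 116], [71, 116]],
                  [[78, 118], [82, 124], [72, 128]],
                  [[84, 138], [84, 130], [74, 120]],
                  [[82, 126], [76, 156], [82, 132]]])

# Slicing with 0:1 instead of indexing with 0 keeps the patient axis
patient0 = T[:, 0:1, :]
print(patient0.shape)                  # torch.Size([4, 1, 2])
diff_from_patient0 = T - patient0

# Change relative to admission: T[0] has shape (3, 2), which lines up with
# the trailing dims of T's shape (4, 3, 2), so it broadcasts directly
diff_from_admission = T - T[0]
print(diff_from_admission[-1])         # change from 9AM to noon for each patient
```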

---

**Exercise**: interpret $\texttt{T[:, -1, :]}$

---

In [None]:
# Last patient's info at all timestamps
print(T[:,-1,:])

tensor([[ 71., 116.],
        [ 72., 128.],
        [ 74., 120.],
        [ 82., 132.]])


------

**Broadcasting exercise**

------

In [None]:
p=torch.randint(-5,6,(4,5,3)) # random integers from -5 to 5 (the upper bound 6 is excluded); -5, 6 and the shape (4,5,3) are the arguments
print(p)
v=torch.tensor([[1.0,2.0,3.0]])
print(v)
print(v.shape)

tensor([[[ 4, -2, -2],
         [ 1,  2, -1],
         [-3, -2, -5],
         [-3, -1,  4],
         [-2,  4,  1]],

        [[ 2, -5,  3],
         [ 2, -2,  1],
         [ 4, -5, -1],
         [-2, -2,  4],
         [ 0, -1, -1]],

        [[-2,  3,  2],
         [ 2, -2, -4],
         [ 4,  5,  2],
         [-4, -1,  3],
         [ 4,  4, -4]],

        [[-5, -3, -5],
         [ 3,  0,  1],
         [ 5, -4, -3],
         [-5, -4, -3],
         [-4, -1, -1]]])
tensor([[1., 2., 3.]])
torch.Size([1, 3])
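
One possible way to work through the exercise is to add `v` to `p` and inspect the result. This is a sketch, not the intended solution; the seed is arbitrary and only makes the random tensor repeatable.

```python
import torch

torch.manual_seed(0)                    # arbitrary seed, only for repeatability
p = torch.randint(-5, 6, (4, 5, 3))     # int64, shape (4, 5, 3)
v = torch.tensor([[1.0, 2.0, 3.0]])     # float32, shape (1, 3)

r = p + v   # v is treated as shape (1, 1, 3) and reused along dim-0 and dim-1
print(r.shape)    # torch.Size([4, 5, 3])
print(r.dtype)    # torch.float32: integer + float promotes to float

# Every length-3 slice along the last axis was shifted by the same v
print(torch.equal(r[2, 4], p[2, 4] + v[0]))
```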


---

$l_2$ norm or the geometric length of a vector denoted as $\lVert \mathbf{a}\rVert$ tells us how long a vector is. In 2-dimensions, $$\lVert \mathbf{a}\rVert_2 = \sqrt{a_1^2+a_2^2}$$ and in $n$-dimensions, $$\lVert \mathbf{a}\rVert_2 = \sqrt{a_1^2+a_2^2+\cdots+a_n^2}.$$

![vector norm](https://1drv.ms/i/c/37720f927b6ddc34/IQT817WmpQjlRqZ1R0d5Cfv6AUW6c4robL-gk06i9wmCaFU?width=500)

---

In [None]:
## l2 norm of a vector
x=torch.tensor([76.0,124.0],dtype=torch.float64)
# [76.0, 124.0] is a positional argument; dtype=torch.float64 is a keyword argument
print(x)
torch.norm(x) #accepts only floating point values

tensor([ 76., 124.], dtype=torch.float64)


tensor(145.4373, dtype=torch.float64)
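
The formula above can be evaluated step by step and compared with the library call. A minimal sketch (the variable names are illustrative; `torch.linalg.norm` is used here as an alternative to `torch.norm`):

```python
import torch

x = torch.tensor([76.0, 124.0], dtype=torch.float64)

manual = torch.sqrt(torch.sum(x ** 2))   # square, add, take the root
via_dot = torch.sqrt(torch.dot(x, x))    # ||x||^2 equals x . x
via_linalg = torch.linalg.norm(x)        # defaults to the 2-norm for vectors

print(manual, via_dot, via_linalg)
```

All three agree with the value 145.4373 printed above, since $76^2 + 124^2 = 21152$.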


---

**Dot Product of Vectors**

A scalar resulting from an elementwise multiplication and addition: $$\mathbf{a}{\color{cyan}\cdot}\mathbf{b} = {\color{red}{a_1b_1}}+{\color{green}{a_2b_2}}+\cdots+{\color{magenta}{a_nb_n}}$$

The <font color="cyan">dot</font> ${\color{cyan}\cdot}$ represents the computation of the dot product.


---

In [None]:
## Dot product of vectors
a=torch.tensor([1.0,2.0,3.0],dtype=torch.float64)
b=torch.tensor([4.0,5.0,6.0],dtype=torch.float64)
torch.dot(a,b)

tensor(32., dtype=torch.float64)

---

The dot product is a measure of similarity between vectors (or, how aligned they are geometrically).

![dot product](https://1drv.ms/i/c/37720f927b6ddc34/IQTbcGSjdbhSTJ7J39d5BCWAAWS6-y5U6J87vHuDWeAqGwM?width=6000)

---

In [None]:
a = torch.tensor([1.0, 2.0])
b = torch.tensor([2.0, 4.0])
c = torch.tensor([-2.0, 1.0])
d = torch.tensor([-1.0, -2.0])
print(torch.dot(a, b))
print(torch.dot(a, c))
print(torch.dot(a, d))

tensor(10.)
tensor(0.)
tensor(-5.)


---

Cauchy-Schwarz inequality $-1\leq\frac{\mathbf{x}\cdot{\mathbf{y}}}{\lVert\mathbf{x}\rVert_2\lVert\mathbf{y}\rVert_2}\leq1.$

This is a normalized measure of similarity (or extent of alignment) between vectors.

Angle between vectors $\mathbf{x}$ and $\mathbf{y} = \cos^{-1}\left(\frac{\mathbf{x}\cdot{\mathbf{y}}}{\lVert\mathbf{x}\rVert_2\lVert\mathbf{y}\rVert_2}\right).$

![angle](https://1drv.ms/i/c/37720f927b6ddc34/IQQ03G17kg9yIIA3WokBAAAAAQi8FPV9YCebl5WnyEKJ3vg?width=213&height=400)
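
The angle formula can be wrapped in a small reusable function. The sketch below is one way to do it; `angle_between` is a hypothetical helper name, and the `clamp` step is an added safeguard (an assumption, not part of the formula) against rounding that pushes the ratio slightly outside $[-1, 1]$.

```python
import torch

def angle_between(x, y):
    """Angle in radians between two rank-1 tensors (hypothetical helper)."""
    cos_theta = torch.dot(x, y) / (torch.linalg.norm(x) * torch.linalg.norm(y))
    # Rounding can push the ratio slightly outside [-1, 1], where acos is undefined
    return torch.acos(torch.clamp(cos_theta, -1.0, 1.0))

a = torch.tensor([1.0, 2.0])
deg_parallel = torch.rad2deg(angle_between(a, torch.tensor([2.0, 4.0])))
deg_perpendicular = torch.rad2deg(angle_between(a, torch.tensor([-2.0, 1.0])))
deg_opposite = torch.rad2deg(angle_between(a, torch.tensor([-1.0, -2.0])))
print(deg_parallel, deg_perpendicular, deg_opposite)
```

The three test vectors are the ones used in the dot-product cell earlier, so the angles should come out close to 0, 90 and 180 degrees, up to floating-point rounding.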


---

In [None]:
x = torch.tensor([1.0, 2.0])
y = torch.tensor([2.0, 1.0])
# linear difference between x and y
print(torch.norm(x - y))
# Angle between x and y in radians
dot_xy=torch.dot(x,y)
den_1=torch.norm(x)
den_2=torch.norm(y)
cos_theta=dot_xy/(den_1*den_2) # cosine of the angle, not the angle itself
print(cos_theta)
angle_rad=torch.acos(cos_theta)
print(f"Angle difference in radian: {angle_rad}")
# Angle between x and y in degrees
angle_deg=torch.rad2deg(angle_rad)
print(f"Angle difference in degree: {angle_deg}")

# print(torch.acos(torch.dot(x,y)/ (torch.norm(x) * torch.norm(y))))
# print((180.0/torch.pi)*torch.acos(torch.dot(x,y)/ (torch.norm(x) * torch.norm(y))))

tensor(1.4142)
tensor(0.8000)
Angle difference in radian: 0.6435011029243469
Angle difference in degree: 36.869895935058594


---

Application of the Cauchy-Schwarz inequality: is "Cricket without Tendulkar" the same as "Football without Messi"?

---

In [None]:
a = torch.tensor(model['cricket'] - model['tendulkar'],dtype=torch.float64)
b = torch.tensor(model['football'] - model['messi'],dtype=torch.float64)
# linear difference between a and b
print(torch.norm(a-b))
# Angle between a and b in radians
print(torch.acos(torch.dot(a,b)/ (torch.norm(a) * torch.norm(b))))
# Angle between a and b in degrees
print((180.0/torch.pi)*torch.acos(torch.dot(a,b)/ (torch.norm(a) * torch.norm(b))))

tensor(4.2349, dtype=torch.float64)
tensor(0.7420, dtype=torch.float64)
tensor(42.5126, dtype=torch.float64)


In [None]:
c = torch.tensor(model['soup'] - model['salt'],dtype=torch.float64)
# c = torch.tensor(model['tennis'] - model['federer'],dtype=torch.float64)
# linear difference between a and c
print(torch.norm(a-c))
# Angle between a and c in radians
print(torch.acos(torch.dot(a,c)/ (torch.norm(a) * torch.norm(c))))
# Angle between a and c in degrees
print((180.0/torch.pi)*torch.acos(torch.dot(a,c)/ (torch.norm(a) * torch.norm(c))))

# cricket without tendulkar has approximately the same similarity w.r.t.
# football without messi and tennis without federer

tensor(8.2602, dtype=torch.float64)
tensor(1.8068, dtype=torch.float64)
tensor(103.5210, dtype=torch.float64)


In [None]:
a = model['cricket'] - model['tendulkar']
b = model['football'] - model['messi']
# print(a)
# print(b)
a=torch.from_numpy(a)
b=torch.from_numpy(b)
print("------------------")
# print(a)
# print(b)
dot_num=torch.dot(a,b)
den_1=torch.norm(a)
den_2=torch.norm(b)
cos_theta=dot_num/(den_1*den_2) # cosine of the angle between a and b
angle_rad=torch.acos(cos_theta)
print("------------------")
print(f"Angle in radian: {angle_rad}")
angle_deg=torch.rad2deg(angle_rad)
print(f"Angle in degree: {angle_deg}")

------------------
------------------
Angle in radian: 0.7419853806495667
Angle in degree: 42.512630462646484



---

**Hadamard Product of Vectors**

A vector resulting from an elementwise multiplication: $$\mathbf{a}{\color{cyan}\otimes}\mathbf{b} = \begin{bmatrix}{\color{red}{a_1\times b_1}}\\{\color{green}{a_2\times b_2}}\\\vdots\\{\color{magenta}{a_n\times b_n}}\end{bmatrix}.$$

The <font color="cyan">$\otimes$</font> represents the computation of the Hadamard product.

---

In [None]:
## Hadamard product
a = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
b = torch.tensor([4.0, 5.0, 6.0], dtype=torch.float64)

# Element-wise multiplication (Hadamard product)
print(a*b)  # the * operator is overloaded to multiply elementwise
print(torch.mul(a,b))

tensor([ 4., 10., 18.], dtype=torch.float64)
tensor([ 4., 10., 18.], dtype=torch.float64)
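
The Hadamard product and the dot product are closely related: summing the components of the Hadamard product gives the dot product. A short check using the same two vectors:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
b = torch.tensor([4.0, 5.0, 6.0], dtype=torch.float64)

hadamard = a * b                     # tensor([ 4., 10., 18.])
total = torch.sum(hadamard)          # 4 + 10 + 18 = 32
print(total)
print(torch.equal(total, torch.dot(a, b)))
```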


---

A matrix-vector product is simply a sequence of dot products of the rows of the matrix (seen as vectors) with the vector

![matvec product](https://1drv.ms/i/c/37720f927b6ddc34/IQQ1cQ8fZdFmS4cnGkBlsZbAAaL2zMtzWdjHe-HCMt4UTA0?width=700)

---

In [None]:
## Matrix-vector product
A = torch.tensor([[1.0, 2.0, 4.0],
                  [2.0, -1.0, 3.0]])
x = torch.tensor([4.0, 2.0, -2.0])

# Matrix-vector multiplication
print(torch.matmul(A,x))  # output: a 2-vector (rank-1 tensor)

tensor([0., 0.])
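
The "sequence of dot products" description can be made explicit by looping over the rows of the matrix. A minimal sketch with the same `A` and `x`:

```python
import torch

A = torch.tensor([[1.0, 2.0, 4.0],
                  [2.0, -1.0, 3.0]])
x = torch.tensor([4.0, 2.0, -2.0])

# One dot product per row of A, collected into a rank-1 tensor
row_dots = torch.stack([torch.dot(row, x) for row in A])
print(row_dots)
print(torch.allclose(row_dots, torch.matmul(A, x)))
print(torch.allclose(A @ x, torch.matmul(A, x)))   # @ is shorthand for matmul
```

Both rows of `A` happen to be perpendicular to `x`, which is why every component of the product is zero.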


---

Here we create a simple sentence in English and tokenize it

---

In [None]:
sentence = 'i swam quickly across the river to get to the other bank'
nltk.download('punkt_tab')
tokens = word_tokenize(sentence)
print(tokens)


---

Generate the word embeddings for the tokens and store them in a matrix $\mathbf{X}$ such that each row of the matrix corresponds to a token.
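
The stacking step can be sketched without downloading anything. In the example below, `toy_model` is a hypothetical stand-in for the GloVe model (random 4-component vectors instead of real 50-component embeddings), and a whitespace split stands in for `word_tokenize`; with the real model the lookup would be `model[token]`.

```python
import torch

sentence = 'i swam quickly across the river to get to the other bank'
tokens = sentence.split()   # whitespace split as a stand-in for word_tokenize

# Hypothetical stand-in for the GloVe model: one random 4-vector per distinct
# token. With the real model, the lookup would be model[token] instead.
torch.manual_seed(0)
toy_model = {tok: torch.randn(4) for tok in set(tokens)}

X_word = torch.stack([toy_model[tok] for tok in tokens])   # one row per token
print(X_word.shape)                         # torch.Size([12, 4])
print(torch.equal(X_word[6], X_word[8]))    # both rows belong to the token 'to'
```

Repeated tokens such as "to" and "the" produce identical rows, because the lookup depends only on the word and not on its position in the sentence.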

---

---

A matrix-matrix product is simply a sequence of matrix-vector products.

![matmatprod](https://1drv.ms/i/c/37720f927b6ddc34/IQQ-B3z7tbWHQqBrW9k2ElDVAUc5fWzM24txLkgBK7f8Yac?width=550)
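
This statement can be checked directly: each column of $\mathbf{AB}$ is the matrix-vector product of $\mathbf{A}$ with the matching column of $\mathbf{B}$. The matrices below are illustrative values, not data from this notebook.

```python
import torch

A = torch.tensor([[1.0, 2.0, 4.0],
                  [2.0, -1.0, 3.0]])     # shape (2, 3)
B = torch.tensor([[4.0, 1.0],
                  [2.0, 0.0],
                  [-2.0, 3.0]])          # shape (3, 2), illustrative values

# Column j of AB is the matrix-vector product of A with column j of B
columns = [torch.matmul(A, B[:, j]) for j in range(B.shape[1])]
AB = torch.stack(columns, dim=1)
print(AB)
print(torch.allclose(AB, torch.matmul(A, B)))
```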


---

In [None]:
D=torch.tensor([[1,-2,1,0,0,0],[0,1,-2,1,0,0],[0,0,1,-2,1,0],[0,0,0,1,-2,1]],dtype=torch.float64)
print(D)
x=torch.tensor([10,15,25,35,40,50],dtype=torch.float64) # x represents an increasing signal
print(torch.matmul(D,x))
print(torch.norm(torch.matmul(D,x)))
print('--------------')
D=torch.tensor([[1,-2,1,0,0,0],[0,1,-2,1,0,0],[0,0,1,-2,1,0],[0,0,0,1,-2,1]],dtype=torch.float64)
y=torch.tensor([0,0,0,0,0,0],dtype=torch.float64) # y represents a constant signal
print(torch.matmul(D,y))
print(torch.norm(torch.matmul(D,y)))
print('--------------')
D=torch.tensor([[1,-2,1,0,0,0],[0,1,-2,1,0,0],[0,0,1,-2,1,0],[0,0,0,1,-2,1]],dtype=torch.float64)
z=torch.tensor([10,-5,15,-10,10,-15],dtype=torch.float64) # z represents an oscillatory signal
print(torch.matmul(D,z))
print(torch.norm(torch.matmul(D,z)))

# For which signal above is ||Dx|| the highest? -> the oscillatory signal z
# The norm of Dx is a single number that tells us how wiggly the signal is

tensor([[ 1., -2.,  1.,  0.,  0.,  0.],
        [ 0.,  1., -2.,  1.,  0.,  0.],
        [ 0.,  0.,  1., -2.,  1.,  0.],
        [ 0.,  0.,  0.,  1., -2.,  1.]], dtype=torch.float64)
tensor([ 5.,  0., -5.,  5.], dtype=torch.float64)
tensor(8.6603, dtype=torch.float64)
--------------
tensor([0., 0., 0., 0.], dtype=torch.float64)
tensor(0., dtype=torch.float64)
--------------
tensor([ 35., -45.,  45., -45.], dtype=torch.float64)
tensor(85.4400, dtype=torch.float64)
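
Each row of `D` computes a second difference $x_i - 2x_{i+1} + x_{i+2}$, so the matrix can be generated for any signal length instead of being typed by hand. The sketch below uses a hypothetical helper `second_difference_matrix` and compares the result with `torch.diff`.

```python
import torch

def second_difference_matrix(n):
    """Build the (n-2) x n matrix with rows [1, -2, 1] (hypothetical helper)."""
    D = torch.zeros(n - 2, n, dtype=torch.float64)
    for i in range(n - 2):
        D[i, i:i + 3] = torch.tensor([1.0, -2.0, 1.0], dtype=torch.float64)
    return D

x = torch.tensor([10, 15, 25, 35, 40, 50], dtype=torch.float64)
D = second_difference_matrix(len(x))
Dx = torch.matmul(D, x)
print(Dx)
print(torch.diff(x, n=2))     # same second differences without forming D
```

For long signals, `torch.diff` avoids storing a mostly-zero matrix; the explicit `D` is useful mainly to see the operation as a matrix-vector product.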


In [None]:
## Matrix-matrix product


---

Matrix-matrix product using patient data matrix and a weights matrix:

![patient dataset](https://1drv.ms/i/s!AjTcbXuSD3I3hspfrgklysOtJMOjaA?embed=1&width=800)

$$\mathbf{Z} = \mathbf{XW}.$$

---

In [None]:
# Patients data matrix
X = torch.tensor([[72, 120, 37.3, 104, 32.5],
                 [85, 130, 37.0, 110, 14],
                 [68, 110, 38.5, 125, 34],
                 [90, 140, 38.0, 130, 26],
                 [84, 132, 38.3, 146, 30],
                 [78, 128, 37.2, 102, 12]],dtype=torch.float64)
print(f'Patient data matrix X:\n {X}')

# Weights matrix
W = torch.tensor([[-0.1, 0.5, 0.3],
                  [0.9, 0.3, 0.5],
                  [-1.5, 0.4, 0.1],
                  [0.1, 0.1, -1.0],
                  [-1.2, 0.5, -0.8]],dtype=torch.float64)
print(f'Weights matrix:\n {W}')

# Raw scores matrix (matrix-matrix multiplication)
Z = torch.matmul(X,W)
print(f'Raw scores matrix:\n {Z}')
# The raw scores are also referred to as the logits

Patient data matrix X:
 tensor([[ 72.0000, 120.0000,  37.3000, 104.0000,  32.5000],
        [ 85.0000, 130.0000,  37.0000, 110.0000,  14.0000],
        [ 68.0000, 110.0000,  38.5000, 125.0000,  34.0000],
        [ 90.0000, 140.0000,  38.0000, 130.0000,  26.0000],
        [ 84.0000, 132.0000,  38.3000, 146.0000,  30.0000],
        [ 78.0000, 128.0000,  37.2000, 102.0000,  12.0000]],
       dtype=torch.float64)
Weights matrix:
 tensor([[-0.1000,  0.5000,  0.3000],
        [ 0.9000,  0.3000,  0.5000],
        [-1.5000,  0.4000,  0.1000],
        [ 0.1000,  0.1000, -1.0000],
        [-1.2000,  0.5000, -0.8000]], dtype=torch.float64)
Raw scores matrix:
 tensor([[ 16.2500, 113.5700, -44.6700],
        [ 47.2000, 114.3000, -27.0000],
        [  6.1500, 111.9000, -72.9500],
        [ 41.8000, 128.2000, -50.0000],
        [ 31.5500, 126.5200, -74.9700],
        [ 47.4000, 108.4800, -20.4800]], dtype=torch.float64)


---

**Version-1** view of the matrix-matrix product $\mathbf{Z} = \mathbf{XW}$:

*What a particular neuron understands about a particular patient.*

![matrix-matrix product version-1](https://1drv.ms/i/c/37720f927b6ddc34/IQQdAOCwtndURKA-h4yvpTqlAYjBjlcweRSeMYkPvf7dwmQ?width=660)

$$\begin{align*}[\mathbf{Z}]_{i,j} &= (i,j)\text{-th element of }\mathbf{Z}\\&=\text{what the }j\text{th neuron learns about the } i\text{th patient}\\&=\mathbf{x}^{(i)}\cdot\mathbf{w}_j\\& = {\mathbf{x}^{(i)}}^\mathrm{T}\mathbf{w}_j\\\Rightarrow \underbrace{[\mathbf{Z}]_{{\color{yellow}0},{\color{cyan}2}}}_{{\color{yellow}0}\text{th patient},\,{\color{cyan}2}\text{nd neuron}} &= \mathbf{x}^{({\color{yellow}0})}\cdot\mathbf{w}_{{\color{cyan}2}}\\ &= \begin{bmatrix}72\\120\\37.3\\104\\32.5\end{bmatrix}\cdot\begin{bmatrix}0.3\\0.5\\0.1\\-1.0\\-0.8\end{bmatrix}\\ &= -44.67.\end{align*}$$

---

In [None]:
## The (0, 2)-th element of the matrix-matrix product XW
x=X[0,:]  # all features of patient-0 -> X[0] or X[0,:]
w=W[:,2]  # all weights in column 2 -> neuron-2
print(torch.dot(x,w)) # what neuron-2 understands about patient-0
print(torch.matmul(x,w)) # matmul of two rank-1 tensors also returns the dot product, but torch.dot states the intent more clearly


tensor(-44.6700, dtype=torch.float64)
tensor(-44.6700, dtype=torch.float64)


---

**Version-2** view of the matrix-matrix product $\mathbf{Z} = \mathbf{XW}$:

*What a particular neuron understands about all the patients.*

![matrix-matrix product version-2](https://1drv.ms/i/c/37720f927b6ddc34/IQRm1-w-6TG0R4C4J4BizyzyAWIbcHzbEjgmx-0JFREdHsE?width=660)

$$\begin{align*}\mathbf{z}_j &= \mathbf{X}\mathbf{w}_j\\&=\text{what the } j\text{th neuron learns about all the patients}\\&=w_{j,0}\times\textbf{HR}+w_{j,1}\times\textbf{BP}+w_{j,2}\times\textbf{Temp}+w_{j,3}\times\textbf{Sugar}+w_{j,4}\times\textbf{Vitamin D}\\&= w_{j,0}\mathbf{x}_0+w_{j,1}\mathbf{x}_1+w_{j,2}\mathbf{x}_2+w_{j,3}\mathbf{x}_3+w_{j,4}\mathbf{x}_4\\\Rightarrow\underbrace{\mathbf{z}_{{\color{cyan}0}}}_{{\color{cyan}0}\text{th neuron understanding}} &= \underbrace{\mathbf{X}}_{\color{yellow}{\text{all patients}}}\ \underbrace{\mathbf{w}_{{\color{cyan}0}}}_{{\color{cyan}0}\text{th neuron weights}}\\&= {\color{cyan}{-0.1}}\times\begin{bmatrix}{\color{yellow}{72}}\\{\color{yellow}{85}}\\{\color{yellow}{68}}\\{\color{yellow}{90}}\\{\color{yellow}{84}}\\{\color{yellow}{78}}\end{bmatrix}+{\color{cyan}{0.9}}\times\begin{bmatrix}{\color{yellow}{120}}\\{\color{yellow}{130}}\\{\color{yellow}{110}}\\{\color{yellow}{140}}\\{\color{yellow}{132}}\\{\color{yellow}{128}}\end{bmatrix}+({\color{cyan}{-1.5}})\times\begin{bmatrix}{\color{yellow}{37.3}}\\{\color{yellow}{37.0}}\\{\color{yellow}{38.5}}\\{\color{yellow}{38.0}}\\{\color{yellow}{38.3}}\\{\color{yellow}{37.2}}\end{bmatrix}+{\color{cyan}{0.1}}\times\begin{bmatrix}{\color{yellow}{104}}\\{\color{yellow}{110}}\\{\color{yellow}{125}}\\{\color{yellow}{130}}\\{\color{yellow}{146}}\\{\color{yellow}{102}}\end{bmatrix}+({\color{cyan}{-1.2}})\times\begin{bmatrix}{\color{yellow}{32.5}}\\{\color{yellow}{14}}\\{\color{yellow}{34}}\\{\color{yellow}{26}}\\{\color{yellow}{30}}\\{\color{yellow}{12}}\end{bmatrix}\\&=\begin{bmatrix}16.25\\47.20\\6.15\\41.80\\31.55\\47.40\end{bmatrix}.\end{align*}$$



---

In [None]:
## The 0-th column of the matrix-matrix product XW
torch.matmul(X,W[:,0])
# torch.dot(X,W[:,0])

tensor([16.2500, 47.2000,  6.1500, 41.8000, 31.5500, 47.4000],
       dtype=torch.float64)

---

**Version-3** view of the matrix-matrix product $\mathbf{Z} = \mathbf{XW}$:

*What all neurons understand about a particular patient.*

![matrix-matrix product version-3](https://1drv.ms/i/c/37720f927b6ddc34/IQRfO-qEJQ9mQYLH_f-lyjeQAaWV4FrDjTjaEHJpPB1PmCg?width=660)

$$\begin{align*}{\mathbf{z}^{(i)}}^\mathrm{T}&={\mathbf{x}^{(i)}}^\mathrm{T}\mathbf{W}\\&= \text{what is learned about the }i\text{th patient by all the neurons}\\&=i\text{th HR }\times{\mathbf{w}^{(0)}}^\mathrm{T}+i\text{th BP }\times{\mathbf{w}^{(1)}}^\mathrm{T}+i\text{th Temp }\times{\mathbf{w}^{(2)}}^\mathrm{T}+i\text{th Sugar }\times{\mathbf{w}^{(3)}}^\mathrm{T}+i\text{th Vitamin D }\times{\mathbf{w}^{(4)}}^\mathrm{T}\\&=x^{(i)}_0\times{\mathbf{w}^{(0)}}^\mathrm{T}+x^{(i)}_1\times{\mathbf{w}^{(1)}}^\mathrm{T}+x^{(i)}_2\times{\mathbf{w}^{(2)}}^\mathrm{T}+x^{(i)}_3\times{\mathbf{w}^{(3)}}^\mathrm{T}+x^{(i)}_4\times{\mathbf{w}^{(4)}}^\mathrm{T}\\\underbrace{\Rightarrow{{\mathbf{z}^{({\color{yellow}0})}}^\mathrm{T}}}_{{\color{yellow}{0}}\text{th patient understanding}}&=\underbrace{{{\mathbf{x}^{({\color{yellow}0})}}^\mathrm{T}}}_{{\color{yellow}{0}}\text{th patient}}\ \underbrace{\mathbf{W}}_{{\color{cyan}{\text{all neurons}}}}\\ &= {\color{yellow}{72}}\times\begin{bmatrix}{\color{cyan}{-0.1}} & {\color{cyan}{0.5}} & {\color{cyan}{0.3}}\end{bmatrix} \\&+ {\color{yellow}{120}}\times\begin{bmatrix}{\color{cyan}{0.9}} & {\color{cyan}{0.3}} & {\color{cyan}{0.5}}\end{bmatrix}\\&+{\color{yellow}{37.3}}\times\begin{bmatrix}{\color{cyan}{-1.5}} & {\color{cyan}{0.4}} & {\color{cyan}{0.1}}\end{bmatrix}\\&+{\color{yellow}{104}}\times\begin{bmatrix}{\color{cyan}{0.1}} & {\color{cyan}{0.1}} & {\color{cyan}{-1.0}}\end{bmatrix}\\&+{\color{yellow}{32.5}}\times\begin{bmatrix}{\color{cyan}{-1.2}} & {\color{cyan}{0.5}} & {\color{cyan}{-0.8}}\end{bmatrix}\\&=\begin{bmatrix}16.25 & 113.57 & -44.67\end{bmatrix}.\end{align*}$$


---

In [None]:
## The 0-th row of the matrix-matrix product XW
torch.matmul(X[0,:],W)

tensor([ 16.2500, 113.5700, -44.6700], dtype=torch.float64)
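
The three views describe the same matrix $\mathbf{Z}$, which can be confirmed in one place. The sketch below recomputes the element, column and row from the cells above and compares each with the corresponding slice of $\mathbf{Z}$; `z0_weighted` is an illustrative name for the weighted sum of feature columns in version-2.

```python
import torch

X = torch.tensor([[72, 120, 37.3, 104, 32.5],
                  [85, 130, 37.0, 110, 14],
                  [68, 110, 38.5, 125, 34],
                  [90, 140, 38.0, 130, 26],
                  [84, 132, 38.3, 146, 30],
                  [78, 128, 37.2, 102, 12]], dtype=torch.float64)
W = torch.tensor([[-0.1, 0.5, 0.3],
                  [0.9, 0.3, 0.5],
                  [-1.5, 0.4, 0.1],
                  [0.1, 0.1, -1.0],
                  [-1.2, 0.5, -0.8]], dtype=torch.float64)
Z = torch.matmul(X, W)

print(torch.allclose(Z[0, 2], torch.dot(X[0, :], W[:, 2])))   # version-1: one element
print(torch.allclose(Z[:, 0], torch.matmul(X, W[:, 0])))      # version-2: one column
print(torch.allclose(Z[0, :], torch.matmul(X[0, :], W)))      # version-3: one row

# Version-2 written out as a weighted sum of the feature columns of X
z0_weighted = sum(W[k, 0] * X[:, k] for k in range(X.shape[1]))
print(torch.allclose(z0_weighted, Z[:, 0]))
```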

---

The similarity between each pair of words represented in the word embeddings matrix $\mathbf{X}_\mathrm{word}$ is the matrix-matrix product $\mathbf{X}_\mathrm{word}\mathbf{X}_\mathrm{word}^\mathrm{T}.$
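
A small sketch of this product, assuming a hypothetical 3-word embedding matrix with made-up values (the real $\mathbf{X}_\mathrm{word}$ would have one 50-component row per token). Normalizing the rows first is an optional extra step that turns the dot products into cosines.

```python
import torch

# Hypothetical 3-word embedding matrix, one row per word (values made up)
X_word = torch.tensor([[1.0, 0.0, 2.0],
                       [0.5, 1.0, 1.0],
                       [-1.0, 2.0, 0.0]])

S = torch.matmul(X_word, X_word.T)    # S[i, j] = dot product of word i and word j
print(S)
print(torch.allclose(S, S.T))         # similarity is symmetric
print(torch.allclose(S[0, 1], torch.dot(X_word[0], X_word[1])))

# Dividing each row by its length first turns the entries into cosines
X_unit = X_word / torch.linalg.norm(X_word, dim=1, keepdim=True)
C = torch.matmul(X_unit, X_unit.T)
print(torch.diagonal(C))              # each word has cosine 1 with itself
```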

---

---

The softmax function: takes a $k$-vector $\mathbf{z}$ as input and returns a vector $\mathbf{a}$ of the same shape as the output which is referred to as the softmax-activated scores.

$\begin{align*}\mathbf{a}&=\text{softmax}(\mathbf{z})=\begin{bmatrix}\dfrac{e^{z_1}}{e^{z_1}+e^{z_2}+\cdots+e^{z_k}}\\\dfrac{e^{z_2}}{e^{z_1}+e^{z_2}+\cdots+e^{z_k}}\\\vdots\\\dfrac{e^{z_k}}{e^{z_1}+e^{z_2}+\cdots+e^{z_k}}\end{bmatrix}.\end{align*}$

In the following example, we consider a raw scores vector $\mathbf{z}$ with 3 components, which leads to the softmax-activated scores vector $\mathbf{a}$; its components can be interpreted as the predicted probabilities that the sample belongs to each of the output classes:

![softmax](https://1drv.ms/i/s!AjTcbXuSD3I3hscmdol7J2G4GDo5WQ?embed=1&width=660)


---

In [None]:
# Raw scores matrix (matrix-matrix multiplication)
Z = torch.matmul(X,W)
print(f'Raw scores matrix:\n {Z}')

# calculate the softmax scores
softmax=torch.nn.Softmax(dim=1)
A=softmax(Z)
print(A)

Raw scores matrix:
 tensor([[ 16.2500, 113.5700, -44.6700],
        [ 47.2000, 114.3000, -27.0000],
        [  6.1500, 111.9000, -72.9500],
        [ 41.8000, 128.2000, -50.0000],
        [ 31.5500, 126.5200, -74.9700],
        [ 47.4000, 108.4800, -20.4800]], dtype=torch.float64)
tensor([[5.4258e-43, 1.0000e+00, 1.8934e-69],
        [7.2250e-30, 1.0000e+00, 4.3071e-62],
        [1.1840e-46, 1.0000e+00, 5.2561e-81],
        [2.9989e-38, 1.0000e+00, 4.0618e-78],
        [5.6892e-42, 1.0000e+00, 3.1189e-88],
        [2.9737e-27, 1.0000e+00, 9.8488e-57]], dtype=torch.float64)


In [None]:
z=torch.tensor([1,2,3],dtype=torch.float64)
print(z)
softmax=torch.nn.Softmax(dim=0)
A=softmax(z)
print(A)
print(torch.sum(A))

tensor([1., 2., 3.], dtype=torch.float64)
tensor([0.0900, 0.2447, 0.6652], dtype=torch.float64)
tensor(1.0000, dtype=torch.float64)
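
The softmax formula can also be written out by hand. The sketch below subtracts the largest score before exponentiating, a common safeguard against overflow that leaves the result unchanged because the shift cancels in the ratio; `softmax_stable` is a hypothetical helper name.

```python
import torch

def softmax_stable(z):
    """Softmax of a rank-1 tensor, shifted by max(z) to avoid overflow in exp."""
    shifted = z - torch.max(z)     # the shift cancels in the ratio
    e = torch.exp(shifted)
    return e / torch.sum(e)

z = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
a_manual = softmax_stable(z)
print(a_manual)
print(torch.allclose(a_manual, torch.softmax(z, dim=0)))

big = torch.tensor([1000.0, 1001.0, 1002.0], dtype=torch.float64)
print(torch.exp(big))              # inf: exp overflows without the shift
a_big = softmax_stable(big)
print(a_big)                       # same probabilities as for [1, 2, 3]
```

Only the differences between scores matter to softmax, which is also why the very large gaps in the raw scores matrix above give probabilities that are essentially 0 or 1.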


------------------------------

Standardization of data to get rid of the effects of the units

--------------------------------

In [None]:
# Heart rate vector (values as recorded, in beats per minute)
a=X[:,0]
print(f"Heart rate vector: {a}")

# Bp vector
# b=X[:,1]
# print(f"BP vector: {b}")

# average heart rate
print(f"Average Heart rate: {torch.mean(a)}")

# average bp
# print(f"Average BP: {torch.mean(b)}")

# mean centered heart rate vector or the de-meaned heart rate vector or deviation vector
a_mc=a-torch.mean(a)
print(f"Mean centered Heart rate vector: {a_mc}")

# average of mean centered feature values will always be zero
print(f"Average of the components of Mean centered Heart rate vector: {torch.mean(a_mc)}")

# Squared deviation vector: a_mc**2 or torch.pow(a_mc,2)
print(f"Squared deviation vector: {a_mc**2}")

# average of squared deviations a.k.a. variance in the heart rate
v=torch.mean(a_mc**2)
print(f"Average of Squared deviation Heart rate vector: {v}")

# square root of the average of the squared deviations vector
# which is the same as the square root of the variance a.k.a. standard deviation in the heart rate values
s=torch.sqrt(v)
print(f"Standard deviation of heart rate: {s}")

# standardized heart rate vector a.k.a z-score of the heart rate
# subtracting the mean from the vector and dividing by the standard deviation
z=a_mc/s
print(f"Standardized heart rate vector: {z}")

Heart rate vector: tensor([72., 85., 68., 90., 84., 78.], dtype=torch.float64)
Average Heart rate: 79.5
Mean centered Heart rate vector: tensor([ -7.5000,   5.5000, -11.5000,  10.5000,   4.5000,  -1.5000],
       dtype=torch.float64)
Average of the components of Mean centered Heart rate vector: 0.0
Squared deviation vector: tensor([ 56.2500,  30.2500, 132.2500, 110.2500,  20.2500,   2.2500],
       dtype=torch.float64)
Average of Squared deviation Heart rate vector: 58.583333333333336
Standard deviation of heart rate: 7.65397500213669
Standardized heart rate vector: tensor([-0.9799,  0.7186, -1.5025,  1.3718,  0.5879, -0.1960],
       dtype=torch.float64)


In [None]:
# Heart rate vector after a change of units: every value is multiplied by 60 (beats per minute -> beats per hour)
a=X[:,0]*60
print(f"Heart rate vector: {a}")

# Bp vector
# b=X[:,1]
# print(f"BP vector: {b}")

# average heart rate
print(f"Average Heart rate: {torch.mean(a)}")

# average bp
# print(f"Average BP: {torch.mean(b)}")

# mean centered heart rate vector or the de-meaned heart rate vector or deviation vector
a_mc=a-torch.mean(a)
print(f"Mean centered Heart rate vector: {a_mc}")

# average of mean centered feature values will always be zero
print(f"Average of the components of Mean centered Heart rate vector: {torch.mean(a_mc)}")

# Squared deviation vector: a_mc**2 or torch.pow(a_mc,2)
print(f"Squared deviation vector: {a_mc**2}")

# average of squared deviations a.k.a. variance in the heart rate
v=torch.mean(a_mc**2)
print(f"Average of Squared deviation Heart rate vector: {v}")

# square root of the average of the squared deviations vector
# which is the same as the square root of the variance a.k.a. standard deviation in the heart rate values
s=torch.sqrt(v)
print(f"Standard deviation of heart rate: {s}")

# standard score a.k.a. z-score of the heart rate:
# subtract the mean from the vector and divide by the standard deviation
z=a_mc/s
print(f"Standardized heart rate vector: {z}")

Heart rate vector: tensor([4320., 5100., 4080., 5400., 5040., 4680.], dtype=torch.float64)
Average Heart rate: 4770.0
Mean centered Heart rate vector: tensor([-450.,  330., -690.,  630.,  270.,  -90.], dtype=torch.float64)
Average of the components of Mean centered Heart rate vector: 0.0
Squared deviation vector: tensor([202500., 108900., 476100., 396900.,  72900.,   8100.],
       dtype=torch.float64)
Average of Squared deviation Heart rate vector: 210900.0
Standard deviation of heart rate: 459.23850012820134
Standardized heart rate vector: tensor([-0.9799,  0.7186, -1.5025,  1.3718,  0.5879, -0.1960],
       dtype=torch.float64)
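
The two cells above give the same standardized vector even though the second one scales the heart rates by 60. A minimal sketch of why, assuming the heart rate values printed above and a hypothetical helper named `z_score` (not part of the notebook): scaling the data by a positive constant scales both the deviations and the standard deviation by that constant, so the ratio is unchanged.

```python
import torch

# Heart rates in beats per minute, copied from the output above
bpm = torch.tensor([72., 85., 68., 90., 84., 78.], dtype=torch.float64)

def z_score(x):
    """Hypothetical helper: mean-center x, then divide by the population standard deviation."""
    centered = x - torch.mean(x)
    std = torch.sqrt(torch.mean(centered ** 2))  # divides by n, not n - 1
    return centered / std

z_bpm = z_score(bpm)        # z-scores of the original values
z_bph = z_score(bpm * 60)   # z-scores of the same data in beats per hour

print(z_bpm)
print(torch.allclose(z_bpm, z_bph))  # the change of unit does not affect the z-scores
```

The helper divides by n, which matches the `torch.mean(a_mc**2)` step used in the cells above.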


--------------------
One Hot Encoder

-------------------


![patient dataset](https://1drv.ms/i/s!AjTcbXuSD3I3hspfrgklysOtJMOjaA?embed=1&width=800)

In [None]:
# y=torch.tensor(['diabetic','non-diabetic']) will not work: PyTorch tensors cannot hold string values

# create a 1-D numpy array of output labels (the numpy counterpart of a rank-1 tensor
# in PyTorch, which is itself equivalent to a vector in pen & paper)
y=np.array(['non-diabetic',
            'diabetic',
            'non-diabetic',
            'pre-diabetic',
            'diabetic',
            'pre-diabetic'])
print(y)
print(y.shape)
print(type(y))
print('-----------------')
# OneHotEncoder expects a 2-D array, so reshape the labels into a single column of shape (6,1)
y=y.reshape(-1,1)
print(y)
print(y.shape)
print(type(y))
# creating a one hot encoder object
ohe=OneHotEncoder(sparse_output=False)
# create the one hot encoded true output labels matrix
Y=torch.tensor(ohe.fit_transform(y))
print(Y)


['non-diabetic' 'diabetic' 'non-diabetic' 'pre-diabetic' 'diabetic'
 'pre-diabetic']
(6,)
<class 'numpy.ndarray'>
-----------------
[['non-diabetic']
 ['diabetic']
 ['non-diabetic']
 ['pre-diabetic']
 ['diabetic']
 ['pre-diabetic']]
(6, 1)
<class 'numpy.ndarray'>
tensor([[0., 1., 0.],
        [1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [1., 0., 0.],
        [0., 0., 1.]], dtype=torch.float64)
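
The column order of the one hot matrix above follows the sorted class names. The sketch below rebuilds the same matrix without scikit-learn, as an illustration rather than the notebook's method: it assumes the six labels shown above, and the names `labels`, `classes`, `class_idx` and `Y_manual` are introduced here only for the example.

```python
import numpy as np
import torch

labels = np.array(['non-diabetic', 'diabetic', 'non-diabetic',
                   'pre-diabetic', 'diabetic', 'pre-diabetic'])

# np.unique returns the sorted class names and, for each label, the index of its class
classes, class_idx = np.unique(labels, return_inverse=True)
print(classes)
print(class_idx)

# one_hot needs integer (long) class indices; cast the result to float64 to match Y
idx = torch.as_tensor(class_idx, dtype=torch.long)
Y_manual = torch.nn.functional.one_hot(idx, num_classes=len(classes)).to(torch.float64)
print(Y_manual)
```

Column 0 corresponds to 'diabetic', column 1 to 'non-diabetic' and column 2 to 'pre-diabetic', which is why the first sample ('non-diabetic') has its 1 in the middle column.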


---------------------------
Forward propagation

---------------------------

In [None]:
# Standardize the data
sc= StandardScaler()
X_std=torch.tensor(sc.fit_transform(X))
print(f"The standardized data matrix:\n{X_std}")

# the one hot encoded true output labels matrix
print(f"The one hot encoded true output labels matrix:\n{Y}")

# calculate the raw scores using the standardized data matrix
# Weights matrix: one row per feature (5), one column per class (3)
W = torch.tensor([[-0.1, 0.5, 0.3],
                  [0.9, 0.3, 0.5],
                  [-1.5, 0.4, 0.1],
                  [0.1, 0.1, -1.0],
                  [-1.2, 0.5, -0.8]],dtype=torch.float64)
print(f'Weights matrix:\n{W}')

# Raw scores matrix: (6 samples x 5 features) @ (5 features x 3 classes) = 6 x 3
Z = torch.matmul(X_std,W)
print(f'Raw scores matrix:\n{Z}')

# calculate the softmax activated scores matrix (dim=1: each row sums to 1)
softmax=torch.nn.Softmax(dim=1)
A=softmax(Z)
print(f"Softmax activated raw score matrix:\n{A}")

The standardized data matrix:
tensor([[-0.9799, -0.7019, -0.7238, -0.9871,  0.8920],
        [ 0.7186,  0.3509, -1.2449, -0.6050, -1.2374],
        [-1.5025, -1.7547,  1.3607,  0.3503,  1.0647],
        [ 1.3718,  1.4037,  0.4922,  0.6687,  0.1439],
        [ 0.5879,  0.5615,  1.0133,  1.6876,  0.6043],
        [-0.1960,  0.1404, -0.8975, -1.1144, -1.4676]], dtype=torch.float64)
The one hot encoded true output labels matrix:
[[0. 1. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 0. 1.]]
Weights matrix:
tensor([[-0.1000,  0.5000,  0.3000],
        [ 0.9000,  0.3000,  0.5000],
        [-1.5000,  0.4000,  0.1000],
        [ 0.1000,  0.1000, -1.0000],
        [-1.2000,  0.5000, -0.8000]], dtype=torch.float64)
Raw scores matrix:
tensor([[-0.6171, -0.6427, -0.4438],
        [ 3.5357, -0.7126,  1.8614],
        [-4.7127, -0.1660, -2.3940],
        [ 0.2821,  1.4427,  0.3789],
        [-1.6298,  1.3386, -1.6126],
        [ 3.1418, -1.2601,  2.2101]], dtype=torch.float64)
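Softmax activated raw score matrix:
tensor([[0.3161, 0.3081, 0.3759],
        [0.8321, 0.0119, 0.1560],
        [0.0095, 0.8942, 0.0963],
        [0.1889, 0.6030, 0.2081],
        [0.0466, 0.9061, 0.0474],
        [0.7112, 0.0087, 0.2801]], dtype=torch.float64)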
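
To see what `torch.nn.Softmax(dim=1)` does to each row of raw scores, here is a minimal sketch that computes the softmax by hand. It assumes the raw scores matrix printed above (rounded to four decimals), and the names `Z_shift`, `expZ` and `A_manual` are introduced only for this example. Subtracting the row maximum before exponentiating is a common numerical-stability step and does not change the result.

```python
import torch

# Raw scores copied from the output above (rounded to four decimals)
Z = torch.tensor([[-0.6171, -0.6427, -0.4438],
                  [ 3.5357, -0.7126,  1.8614],
                  [-4.7127, -0.1660, -2.3940],
                  [ 0.2821,  1.4427,  0.3789],
                  [-1.6298,  1.3386, -1.6126],
                  [ 3.1418, -1.2601,  2.2101]], dtype=torch.float64)

# Shift each row so that its largest score becomes 0 (guards against overflow in exp)
Z_shift = Z - Z.max(dim=1, keepdim=True).values

# Exponentiate and normalize each row by its own sum
expZ = torch.exp(Z_shift)
A_manual = expZ / expZ.sum(dim=1, keepdim=True)

print(A_manual)
print(A_manual.sum(dim=1))                                 # every row sums to 1
print(torch.allclose(A_manual, torch.softmax(Z, dim=1)))   # agrees with the built-in softmax
```

Each row of `A_manual` can be read as the predicted class probabilities for one sample.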

In [None]:
print(Y)
print(A)
print(Y*A) # Hadamard (element-wise) product
print(torch.sum(Y*A,dim=1)) # predicted probabilities corresponding to true output labels
print(-torch.log(torch.sum(Y*A,dim=1))) # negative log of those probabilities: the per-sample cross-entropy loss
print(torch.mean(-torch.log(torch.sum(Y*A,dim=1)))) # average cross-entropy loss

tensor([[0., 1., 0.],
        [1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [1., 0., 0.],
        [0., 0., 1.]], dtype=torch.float64)
tensor([[0.3161, 0.3081, 0.3759],
        [0.8321, 0.0119, 0.1560],
        [0.0095, 0.8942, 0.0963],
        [0.1889, 0.6030, 0.2081],
        [0.0466, 0.9061, 0.0474],
        [0.7112, 0.0087, 0.2801]], dtype=torch.float64)
tensor([[0.0000, 0.3081, 0.0000],
        [0.8321, 0.0000, 0.0000],
        [0.0000, 0.8942, 0.0000],
        [0.0000, 0.0000, 0.2081],
        [0.0466, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.2801]], dtype=torch.float64)
tensor([0.3081, 0.8321, 0.8942, 0.2081, 0.0466, 0.2801], dtype=torch.float64)
tensor([1.1774, 0.1838, 0.1118, 1.5697, 3.0671, 1.2726], dtype=torch.float64)
tensor(1.2304, dtype=torch.float64)
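
The average loss computed step by step above is the categorical cross-entropy. As a cross-check, the sketch below compares it with PyTorch's built-in `torch.nn.functional.cross_entropy`, which takes the raw scores (not the softmax output) and integer class indices. It assumes the rounded raw scores and the one hot matrix printed earlier, so agreement with 1.2304 is only expected to about three decimals; `class_idx`, `loss_builtin` and `loss_manual` are names introduced for the example.

```python
import torch
import torch.nn.functional as F

# Raw scores and one hot labels copied from the outputs above
Z = torch.tensor([[-0.6171, -0.6427, -0.4438],
                  [ 3.5357, -0.7126,  1.8614],
                  [-4.7127, -0.1660, -2.3940],
                  [ 0.2821,  1.4427,  0.3789],
                  [-1.6298,  1.3386, -1.6126],
                  [ 3.1418, -1.2601,  2.2101]], dtype=torch.float64)
Y = torch.tensor([[0., 1., 0.],
                  [1., 0., 0.],
                  [0., 1., 0.],
                  [0., 0., 1.],
                  [1., 0., 0.],
                  [0., 0., 1.]], dtype=torch.float64)

# Built-in: applies log-softmax internally and averages over the samples
class_idx = Y.argmax(dim=1)                 # integer class index for each sample
loss_builtin = F.cross_entropy(Z, class_idx)

# Manual: same steps as the cell above
A = torch.softmax(Z, dim=1)
loss_manual = torch.mean(-torch.log(torch.sum(Y * A, dim=1)))

print(loss_builtin, loss_manual)
```

The fifth sample contributes the largest loss because the model assigns only about 0.05 probability to its true class.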
