# Linear Algebra

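```python
import tensorflow.compat.v1 as tf  # the cells below use the TF1 graph/session API
tf.disable_v2_behavior()           # on TensorFlow 2.x, switch back to graph mode so tf.Session works
```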

## 2.1 Scalars, Vectors, Matrices, and Tensors

### Scalars
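A scalar is a single number. Scalars are typically written in italics with lowercase variable names. When we introduce one, we specify what kind of number it is (e.g. $\it{x} \in \mathbb{R}$ for a real number, $\it{x} \in \mathbb{N}$ for a natural number, or $\it{x} \in \mathbb{Z}$ for an integer).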


```python
x = tf.constant(35, name='x')     # create a constant called x, with a numerical value of 35
y = tf.Variable(x + 5, name='y')  # create a variable called y, initialized to the value of x + 5

model = tf.global_variables_initializer()  # op that initializes all variables

with tf.Session() as session:  # create a session
    session.run(model)  # run the initializer op
    print("The value of y is:", session.run(y))  # evaluate just y and print its current value
```

```
The value of y is: 40
```


### Vectors
A vector is an array of numbers, denoted by a lowercase variable name written in bold. Suppose a vector $\bf{v}$ contains $\it{n}$ elements, each in $\mathbb{R}$. Then the vector lies in the set formed by taking the Cartesian product of $\mathbb{R}$ $\it{n}$ times, denoted $\mathbb{R}^n$. When we need to identify the elements of a vector explicitly, we write them as a column enclosed in square brackets.

For example,

$$
{\bf v} = \begin{bmatrix}
    v_{1} \\
    v_{2} \\
    \vdots \\
    v_{n}
\end{bmatrix} \in \mathbb{R}^n
$$

is the $\it{n}$-dimensional vector $\bf{v}$ in Cartesian space.

```python
v = tf.Variable([1, 2, 3, 4], dtype=tf.int32, name="v")  # a rank-1 tensor (i.e. a vector) created from a list

model = tf.global_variables_initializer()  # initialize variables

with tf.Session() as session:  # create a session
    session.run(model)  # run the initializer op
    print("A vector, v:", session.run(v))
```

```
A vector, v: [1 2 3 4]
```


### Matrices

A matrix is a 2-D array of numbers, denoted by an uppercase letter in bold typeface. Matrix shapes are written rows first: a matrix $\bf{A} \in \mathbb{R}^{m \times n}$ has $\it{m}$ rows and $\it{n}$ columns. The element in the $\it{i}$th row and $\it{j}$th column is denoted $A_{i, j}$.

For example,

$$
{\bf A} = \begin{bmatrix}
    A_{1,1} & A_{1,2} & A_{1,3} & \dots  & A_{1,n} \\
    A_{2,1} & A_{2,2} & A_{2,3} & \dots  & A_{2,n} \\
    \vdots  & \vdots  & \vdots  & \ddots & \vdots  \\
    A_{m,1} & A_{m,2} & A_{m,3} & \dots  & A_{m,n}
\end{bmatrix} \in \mathbb{R}^{m \times n}
$$

is the real-valued $\it{m \times n}$ matrix $\bf{A}$.

```python
m = tf.Variable([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=tf.int32, name="m")  # pass in a 2-dimensional array

model = tf.global_variables_initializer()  # initialize variables

with tf.Session() as session:  # create a session
    session.run(model)  # run the initializer op
    print("A matrix, M:\n", session.run(m))
```

```
A matrix, M:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
```


### Tensors

In some cases we will need an array with more than two axes. In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. We identify the element of a tensor $\bf{T}$ at coordinates ($\it{i}$, $\it{j}$, $\it{k}$) by writing $T_{i, j, k}$.

```python
t = tf.ones([3, 4, 5])  # 3x4x5 tensor populated with ones
t_mat = tf.reshape(t, [6, 10])  # reshape t by passing it to tf.reshape with the desired dimensions

model = tf.global_variables_initializer()  # initialize variables

with tf.Session() as session:  # create a session
    session.run(model)  # run the initializer op
    print("A tensor, T:\n", session.run(t), "\n")  # print t
    print("T reshaped into a 6x10 matrix:\n", session.run(t_mat))  # print t_mat
```

```
A tensor, T:
 [[[ 1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.]]

 [[ 1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.]]

 [[ 1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.]
  [ 1.  1.  1.  1.  1.]]]

T reshaped into a 6x10 matrix:
 [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
```
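Because every entry above is 1, the output does not reveal how `tf.reshape` rearranges elements. `tf.reshape` keeps the elements in row-major order, so the last axis varies fastest. The plain-Python sketch below (no TensorFlow) fills a 3x4x5 nested list with each entry's own coordinates to make that order visible. It also shows that Python indexing is 0-based, while the text's math uses 1-based subscripts. The helper names `make_tensor` and `reshape_2d` are hypothetical.

```python
def make_tensor(d0, d1, d2):
    """Nested-list tensor whose entry at [i][j][k] is the tuple (i, j, k)."""
    return [[[(i, j, k) for k in range(d2)] for j in range(d1)] for i in range(d0)]

def reshape_2d(t, rows, cols):
    """Flatten t in row-major order (last axis fastest), then regroup into rows x cols."""
    flat = [elem for plane in t for row in plane for elem in row]
    if len(flat) != rows * cols:
        raise ValueError(f"cannot reshape {len(flat)} elements into {rows}x{cols}")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

T = make_tensor(3, 4, 5)
print(T[1][2][3])  # the element T_{1,2,3} with 0-based indices: (1, 2, 3)

M = reshape_2d(T, 6, 10)
print(M[1][0])     # flat position 10 = 0*20 + 2*5 + 0, i.e. (0, 2, 0)
```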


### Transposes

One important operation on matrices is the transpose. The transpose of a matrix is its mirror image across the **main diagonal**, the diagonal line that starts in the upper-left corner and runs down and to the right. We denote the transpose of a matrix $\bf{A}$ as $\bf{A}^\top$. It is defined by $({\bf{A}^\top})_{i, j} = A_{j, i}$, so an $\it{m \times n}$ matrix becomes an $\it{n \times m}$ matrix.
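As a minimal plain-Python sketch (no TensorFlow), we can store a matrix as a list of rows. The transpose then amounts to regrouping the $j$th entry of every row into a new row $j$. The helper name `transpose` is hypothetical. The non-square example shows the shape swapping from 2x3 to 3x2.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of equal-length rows."""
    # zip(*a) collects the j-th entry of every row into tuple j, so out[j][i] == a[i][j]
    return [list(col) for col in zip(*a)]

A = [[1, 2, 3],
     [4, 5, 6]]
At = transpose(A)
print(At)  # [[1, 4], [2, 5], [3, 6]]
```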



```python
# recall: m = tf.Variable([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=tf.int32, name="m")
m_t = tf.transpose(m)  # swap rows and columns

model = tf.global_variables_initializer()  # initialize variables

with tf.Session() as session:  # create a session
    session.run(model)  # run the initializer op
    print("A matrix, M:\n", session.run(m), "\n")
    print("The transpose of M:\n", session.run(m_t))
```

```
A matrix, M:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

The transpose of M:
 [[1 4 7]
 [2 5 8]
 [3 6 9]]
```


### Other Operations

**Matrix Addition**: We can add two matrices as long as they have the same shape, by adding corresponding elements: $\bf{C} = \bf{A} + \bf{B}$, where $C_{i,j} = A_{i,j} + B_{i,j}$.
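A minimal plain-Python sketch of element-wise addition, assuming matrices are lists of rows. The helper name `mat_add` is hypothetical. The sketch refuses operands of different shapes instead of guessing how to align them.

```python
def mat_add(a, b):
    """Element-wise sum C[i][j] = A[i][j] + B[i][j]; the shapes must match exactly."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrix addition requires operands of the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ones = [[1] * 3 for _ in range(3)]
print(mat_add(m, ones))  # [[2, 3, 4], [5, 6, 7], [8, 9, 10]]
```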

```python
# recall: m = tf.Variable([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=tf.int32, name="m")
n = tf.ones([3, 3], tf.int32, name="n")
p = tf.add(m, n, name="p")

model = tf.global_variables_initializer()  # initialize variables

with tf.Session() as session:  # create a session
    session.run(model)  # run the initializer op
    print("A matrix, m:\n", session.run(m), "\n")
    print("A matrix, n:\n", session.run(n), "\n")
    print("A matrix p, i.e. m + n:\n", session.run(p), "\n")
```

```
A matrix, m:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

A matrix, n:
 [[1 1 1]
 [1 1 1]
 [1 1 1]]

A matrix p, i.e. m + n:
 [[ 2  3  4]
 [ 5  6  7]
 [ 8  9 10]]
```



**Matrix and Scalar Addition/Multiplication**: We can add a scalar to a matrix, or multiply a matrix by a scalar. Here, each element of the matrix is operated on by the given scalar:
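Written as a formula, scaling and shifting by scalars gives ${\bf D} = a \cdot {\bf B} + c$, where $D_{i,j} = a \cdot B_{i,j} + c$. A minimal plain-Python sketch follows, with a hypothetical helper name `scale_and_shift`. It reproduces the two TensorFlow results in the next cell.

```python
def scale_and_shift(b, a=1, c=0):
    """Return D with D[i][j] = a * B[i][j] + c (multiply by scalar a, then add scalar c)."""
    return [[a * x + c for x in row] for row in b]

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(scale_and_shift(m, c=2))  # m + 2 -> [[3, 4, 5], [6, 7, 8], [9, 10, 11]]
print(scale_and_shift(m, a=2))  # m * 2 -> [[2, 4, 6], [8, 10, 12], [14, 16, 18]]
```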

```python
# recall: m = tf.Variable([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=tf.int32, name="m")
x = tf.constant(2, name='x')
y = tf.add(m, x, name="y")       # add the scalar to every element
z = tf.multiply(m, x, name="z")  # multiply every element by the scalar

model = tf.global_variables_initializer()  # initialize variables

with tf.Session() as session:  # create a session
    session.run(model)  # run the initializer op
    print("A matrix, m:\n", session.run(m), "\n")
    print("A scalar, x:", session.run(x), "\n")
    print("A matrix y, i.e. m + x:\n", session.run(y), "\n")
    print("A matrix z, i.e. m * x:\n", session.run(z))
```

```
A matrix, m:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

A scalar, x: 2

A matrix y, i.e. m + x:
 [[ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

A matrix z, i.e. m * x:
 [[ 2  4  6]
 [ 8 10 12]
 [14 16 18]]
```


**Matrix and Vector Addition**: In the context of deep learning, we also allow the addition of a matrix and a vector, yielding another matrix: $\bf{C} = \bf{A} + \bf{b}$, where $C_{i,j} = A_{i,j} + b_{j}$. In other words, the vector $\bf{b}$ is added to each row of the matrix, so its length must equal the number of columns. This implicit copying of $\bf{b}$ to every row is called **broadcasting**.
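A minimal plain-Python sketch of this row-wise (broadcast) addition $C_{i,j} = A_{i,j} + b_j$, with a hypothetical helper name `add_row_vector`. It yields the same result as the TensorFlow cell that follows.

```python
def add_row_vector(a, b):
    """C[i][j] = A[i][j] + b[j]: the vector b is added to every row of A."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("vector length must equal the number of columns")
    return [[x + y for x, y in zip(row, b)] for row in a]

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
v = [1, 2, 1]
print(add_row_vector(m, v))  # [[2, 4, 4], [5, 7, 7], [8, 10, 10]]
```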

```python
# recall: m = tf.Variable([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=tf.int32, name="m")
v = tf.Variable([1, 2, 1], dtype=tf.int32, name="v")
n = tf.add(m, v, name="n")  # v (shape [3]) is broadcast across the rows of m (shape [3, 3])

model = tf.global_variables_initializer()  # initialize variables

with tf.Session() as session:  # create a session
    session.run(model)  # run the initializer op
    print("A matrix, m:\n", session.run(m), "\n")
    print("A vector, v:", session.run(v), "\n")
    print("A matrix n, i.e. m + v:\n", session.run(n), "\n")
```

```
A matrix, m:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

A vector, v: [1 2 1]

A matrix n, i.e. m + v:
 [[ 2  4  4]
 [ 5  7  7]
 [ 8 10 10]]
```



## 2.2 Multiplying Matrices and Vectors

### The Dot Product

The dot product between two vectors $\bf{x}$ and $\bf{y}$ of the same dimensionality (i.e. the same number of elements) is the matrix product $\bf{x}^\top\bf{y}$. The result is a scalar, the sum of the products of corresponding elements: ${\bf x} \cdot {\bf y} = {\bf x}^\top {\bf y} = \sum\limits_{i} x_{i}y_{i}$.
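A minimal plain-Python sketch of $\sum_i x_i y_i$, with a hypothetical helper name `dot`. The vectors are the same ones used in the TensorFlow cell below.

```python
def dot(x, y):
    """Return x^T y = sum_i x_i * y_i for two vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("dot product requires vectors of the same dimensionality")
    return sum(xi * yi for xi, yi in zip(x, y))

print(dot([1, 2, 1], [3, 2, 1]))  # 1*3 + 2*2 + 1*1 = 8
```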

```python
# recall: v = tf.Variable([1, 2, 1], dtype=tf.int32, name="v")
w = tf.Variable([3, 2, 1], dtype=tf.int32, name="w")
x = tf.tensordot(v, w, 1, name="x")  # axes=1 contracts the single shared axis: sum_i v_i * w_i

model = tf.global_variables_initializer()  # initialize variables

with tf.Session() as session:  # create a session
    session.run(model)  # run the initializer op
    print("A vector, v:", session.run(v), "\n")
    print("A vector, w:", session.run(w), "\n")
    print("The dot product of v and w:", session.run(x))
```

```
A vector, v: [1 2 1]

A vector, w: [3 2 1]

The dot product of v and w: 8
```


### Matrix Multiplication

One of the most important operations involving matrices is the multiplication of two matrices. The matrix product of matrices $\bf{A}$ and $\bf{B}$ is a third matrix, $\bf{C}$. In order for this product to be defined, $\bf{A}$ must have the same number of columns as $\bf{B}$ has rows. If $\bf{A}$ is of shape $\it{m \times n}$ and $\bf{B}$ is of shape $\it{n \times p}$ then $\bf{C}$ is of shape $\it{m \times p}$.

We can write the matrix product by just placing two or more matrices together, for example, $ \bf{C} = \bf{AB} $.

The product operation is defined by $C_{i, j} = \sum\limits_{k} A_{i, k}B_{k, j} $.

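**Remark**: The standard product of two matrices is NOT simply the product of corresponding elements. That element-wise operation also exists; it is called the element-wise (or Hadamard) product and is written $\bf{A} \odot \bf{B}$.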
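A minimal plain-Python sketch of $C_{i,j} = \sum_k A_{i,k}B_{k,j}$, with a hypothetical helper name `matmul`. It checks the shape rule ($\it{m \times n}$ times $\it{n \times p}$) and reproduces the product computed by the next TensorFlow cell. It also computes the element-wise (Hadamard) product of the same two matrices, which gives a different result.

```python
def matmul(a, b):
    """C = AB with C[i][j] = sum_k A[i][k] * B[k][j]; A is m x n and B is n x p."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("A must have as many columns as B has rows")
    p = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(a))]

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
n = [[1, 2, 1], [1, 0, 2], [2, 1, 0]]
print(matmul(m, n))  # [[9, 5, 5], [21, 14, 14], [33, 23, 23]]

# Shapes: a 2 x 3 matrix times a 3 x 1 matrix gives a 2 x 1 matrix.
print(matmul([[1, 2, 3], [4, 5, 6]], [[1], [0], [1]]))  # [[4], [10]]

# For contrast, the element-wise (Hadamard) product of the same m and n:
hadamard = [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(m, n)]
print(hadamard)  # [[1, 4, 3], [4, 0, 12], [14, 8, 0]]
```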

```python
# recall: m = tf.Variable([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=tf.int32, name="m")
n = tf.Variable([[1, 2, 1], [1, 0, 2], [2, 1, 0]], dtype=tf.int32, name="n")
p = tf.matmul(m, n, name="p")  # matrix product, not the element-wise product

model = tf.global_variables_initializer()  # initialize variables

with tf.Session() as session:  # create a session
    session.run(model)  # run the initializer op
    print("A matrix, m:\n", session.run(m), "\n")
    print("A matrix n:\n", session.run(n), "\n")
    print("A matrix p, i.e. the matrix product mn:\n", session.run(p))
```

```
A matrix, m:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

A matrix n:
 [[1 2 1]
 [1 0 2]
 [2 1 0]]

A matrix p, i.e. the matrix product mn:
 [[ 9  5  5]
 [21 14 14]
 [33 23 23]]
```


### Properties of Matrix Multiplication
1. Distributive, i.e. $\bf{A}(\bf{B}+\bf{C}) = \bf{AB} + \bf{AC}$
2. Associative, i.e. $\bf{A}(\bf{B}\bf{C}) = (\bf{AB})\bf{C}$
3. NOT Commutative, i.e. $\bf{AB} = \bf{BA}$ does NOT always hold.
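The three properties can be checked numerically on small integer matrices. This is a plain-Python sketch with hypothetical helper names. The matrices are arbitrary illustrative choices, and checking a single example only illustrates the properties; it does not prove them.

```python
def matmul(a, b):
    # C[i][j] = sum_k A[i][k] * B[k][j]; zip(*b) iterates over the columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def mat_add(a, b):
    # element-wise sum of two same-shaped matrices
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 3]]

print(matmul(A, mat_add(B, C)) == mat_add(matmul(A, B), matmul(A, C)))  # distributive: True
print(matmul(A, matmul(B, C)) == matmul(matmul(A, B), C))              # associative: True
print(matmul(A, B), matmul(B, A))  # [[2, 1], [4, 3]] [[3, 4], [1, 2]]: AB != BA
```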

While matrix multiplication is not commutative, the dot product of vectors is, i.e. $\bf{x}^\top\bf{y} = \bf{y}^\top \bf{x}$. 

The transpose of a matrix product has the simple form $(\bf{AB})^\top = \bf{B}^\top\bf{A}^\top$.

This allows us to demonstrate the commutativity of the dot product by exploiting the fact that the value of such a product is a scalar and is therefore equal to its own transpose, i.e. $\bf{x}^\top\bf{y} = (\bf{x}^\top\bf{y})^\top = \bf{y}^\top \bf{x} $.
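A plain-Python sketch of both identities follows; the helper names are hypothetical. It uses a 2x3 matrix and a 3x2 matrix, so the order of the transposed factors matters even for the shapes: ${\bf B}^\top {\bf A}^\top$ is 2x2 like $({\bf AB})^\top$, whereas ${\bf A}^\top {\bf B}^\top$ would be 3x3. Vectors are represented as 3x1 column matrices, so that ${\bf x}^\top {\bf y}$ is an ordinary matrix product yielding a 1x1 result.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # C[i][j] = sum_k A[i][k] * B[k][j]
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 2, 3], [4, 5, 6]]    # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]  # 3 x 2
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True

x = [[1], [2], [1]]           # column vectors as 3 x 1 matrices
y = [[3], [2], [1]]
print(matmul(transpose(x), y), matmul(transpose(y), x))  # [[8]] [[8]]
```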

### Systems of Linear Equations

We now know enough linear algebra to write down a system of linear equations, $\bf{Ax} = \bf{b}$, where $\bf{A} \in \mathbb{R}^{m \times n} $ is a known matrix, $\bf{b} \in \mathbb{R}^m$ is a known vector, and $\bf{x} \in \mathbb{R}^n$ is a vector of unknown variables we'd like to solve for.

We can rewrite the above equation as:

$$
\begin{aligned}
A_{1,1}x_{1} + A_{1,2}x_{2} + \dots + A_{1,n}x_{n} &= b_{1} \\
A_{2,1}x_{1} + A_{2,2}x_{2} + \dots + A_{2,n}x_{n} &= b_{2} \\
&\;\;\vdots \\
A_{m,1}x_{1} + A_{m,2}x_{2} + \dots + A_{m,n}x_{n} &= b_{m}
\end{aligned}
$$
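Each row of ${\bf Ax}$ is the left-hand side of one of these equations, $\sum_j A_{i,j}x_j$. In the plain-Python sketch below, the helper name `mat_vec` and the small 2x2 system are illustrative assumptions. It computes ${\bf Ax}$ row by row and checks whether a candidate ${\bf x}$ satisfies every equation.

```python
def mat_vec(a, x):
    """Return Ax; entry i is the left-hand side of equation i, sum_j A[i][j] * x[j]."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

A = [[2, 1],
     [1, 3]]
x = [1, 2]
b = [4, 7]
print(mat_vec(A, x) == b)  # True: 2*1 + 1*2 = 4 and 1*1 + 3*2 = 7
```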

## 2.3 Identity and Inverse Matrices

Linear algebra offers a powerful tool called **matrix inversion** that enables us to analytically solve systems of linear equations for many values of $\bf{A}$.

To describe matrix inversion, we first need to define the concept of an **identity matrix**. An identity matrix has ones on the main diagonal and zeros everywhere else, and it does not change any vector that we multiply by it. We denote the identity matrix that preserves $\it{n}$-dimensional vectors as $\bf{I}_\it{n}$, where $\bf{I}_\it{n} \in \mathbb{R}^{n \times n}$ and $\forall {\bf x} \in \mathbb{R}^{n}, {\bf I}_{n}{\bf x} = {\bf x}$.
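A minimal plain-Python sketch of ${\bf I}_n$ and of the property ${\bf I}_n{\bf x} = {\bf x}$, with hypothetical helper names `identity` and `mat_vec`.

```python
def identity(n):
    """I_n: ones on the main diagonal, zeros everywhere else."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_vec(a, x):
    # entry i of Ax is sum_j A[i][j] * x[j]
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

I3 = identity(3)
print(I3)                        # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(mat_vec(I3, [5, -2, 7]))   # [5, -2, 7], unchanged
```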

The **matrix inverse** of $\bf{A}$, denoted $\bf{A}^{-1}$, is defined as the matrix such that ${\bf A}^{-1}{\bf A} = {\bf I}_{n}$.

We can now solve the equation $\bf{Ax} = \bf{b}$ with the following steps:

$$
\begin{aligned}
{\bf A}^{-1}{\bf A}{\bf x} &= {\bf A}^{-1}{\bf b} \\
{\bf I}_{n}{\bf x} &= {\bf A}^{-1}{\bf b} \\
{\bf x} &= {\bf A}^{-1}{\bf b}
\end{aligned}
$$
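For a 2x2 matrix the inverse has a well-known closed form: $\begin{bmatrix} p & q \\ r & s \end{bmatrix}^{-1} = \frac{1}{ps - qr}\begin{bmatrix} s & -q \\ -r & p \end{bmatrix}$, provided $ps - qr \neq 0$. The plain-Python sketch below uses that formula to carry out ${\bf x} = {\bf A}^{-1}{\bf b}$ on a small illustrative system. The helper names are hypothetical, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def inverse_2x2(a):
    """Closed-form inverse of [[p, q], [r, s]] = (1 / (ps - qr)) * [[s, -q], [-r, p]]."""
    (p, q), (r, s) = a
    det = Fraction(p * s - q * r)
    if det == 0:
        raise ValueError("matrix is singular, so its inverse does not exist")
    return [[s / det, -q / det], [-r / det, p / det]]

def mat_vec(a, x):
    # entry i of Ax is sum_j A[i][j] * x[j]
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

A = [[2, 1], [1, 3]]
b = [4, 7]
A_inv = inverse_2x2(A)
x = mat_vec(A_inv, b)   # x = A^{-1} b
print(x == [1, 2])      # True
```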

Of course this process relies on it being possible to find $\bf{A}^{-1}$. The conditions for the existence of $\bf{A}^{-1}$ are discussed in the following section. 

When $\bf{A}^{-1}$ exists, several different algorithms can find it in closed form. In theory, the same inverse matrix can then be used to solve the equation many times for different values of $\bf{b}$. However, while $\bf{A}^{-1}$ is useful as a theoretical tool, it should not actually be used in practice for most software applications. Because $\bf{A}^{-1}$ can be represented with only limited precision on a digital computer, algorithms that make use of the value of $\bf{b}$ directly can usually obtain more accurate estimates of $\bf{x}$.
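One common way to use ${\bf b}$ directly is Gaussian elimination with partial pivoting. It is swapped in here as an illustration; the text does not name a specific algorithm. The method reduces the augmented matrix $[{\bf A} \mid {\bf b}]$ to upper-triangular form and back-substitutes, never forming ${\bf A}^{-1}$. The sketch below is plain Python with floats. The helper name `solve` and the singularity tolerance `1e-12` are hypothetical choices. In real code one would normally call a library routine such as `numpy.linalg.solve` instead.

```python
def solve(a, b):
    """Solve Ax = b for square A by Gaussian elimination with partial pivoting.

    A^{-1} is never formed; b is carried along in the augmented matrix [A | b].
    """
    n = len(a)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move up the row with the largest |entry| in this column,
        # which limits the growth of rounding errors.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # hypothetical tolerance
            raise ValueError("matrix is (numerically) singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rhs = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = rhs / m[i][i]
    return x

A = [[0, 2, 1],
     [1, 1, 1],
     [2, 1, 0]]  # A[0][0] == 0, so elimination without a row swap would divide by zero
b = [7, 6, 4]
print(solve(A, b))  # approximately [1.0, 2.0, 3.0]
```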

## 2.4 Linear Dependence and Span

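For $\bf{A}^{-1}$ to exist, the equation ${\bf Ax} = {\bf b}$ must have exactly one solution for every value of ${\bf b}$. It is also possible for the system of equations to have **no solutions** or **infinitely many solutions** for some values of ${\bf b}$. It is not possible, however, to have more than one but fewer than infinitely many solutions for a particular ${\bf b}$. If both ${\bf x}$ and ${\bf y}$ are solutions, then ${\bf z} = \alpha{\bf x} + (1-\alpha){\bf y}$ is also a solution for any real $\alpha$, because ${\bf Az} = \alpha{\bf Ax} + (1-\alpha){\bf Ay} = \alpha{\bf b} + (1-\alpha){\bf b} = {\bf b}$. When ${\bf x} \neq {\bf y}$, each value of $\alpha$ gives a different ${\bf z}$, so there are infinitely many solutions.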
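A plain-Python sketch of the argument, using an illustrative singular matrix whose second row is twice the first. It has two distinct solutions ${\bf x}$ and ${\bf y}$, and every mixture $\alpha{\bf x} + (1-\alpha){\bf y}$ tried also solves the system. The helper name `mat_vec` is hypothetical, and `Fraction` keeps $\alpha = 1/2$ exact.

```python
from fractions import Fraction

def mat_vec(a, x):
    # entry i of Ax is sum_j A[i][j] * x[j]
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

A = [[1, 2],
     [2, 4]]  # second row is twice the first, so A has no inverse
b = [3, 6]
x = [3, 0]
y = [1, 1]
print(mat_vec(A, x) == b, mat_vec(A, y) == b)  # True True: two distinct solutions

for alpha in (Fraction(1, 2), 2, -3):
    z = [alpha * xi + (1 - alpha) * yi for xi, yi in zip(x, y)]
    print(alpha, z, mat_vec(A, z) == b)  # each z also satisfies Az = b
```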

To analyze how many solutions the equation has, think of the columns of $\bf{A}$ as specifying the different directions we can travel in from the **origin** (the point specified by the vector of all zeros in $\mathbb{R}^m$). Then determine how many ways there are of reaching ${\bf b}$. In this view, each element of ${\bf x}$ specifies how far we should travel in each of these directions, with $x_{i}$ specifying how far to move in the direction of column $\it{i}$.
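This "travel along the columns" view says that ${\bf Ax} = \sum_i x_i {\bf A}_{:,i}$, a weighted sum of the columns of ${\bf A}$. The plain-Python sketch below computes ${\bf Ax}$ both ways and checks that they agree. The helper names and the small matrix are illustrative assumptions.

```python
def mat_vec(a, x):
    # row view: entry r of Ax is sum_i A[r][i] * x[i]
    return [sum(a_ri * x_i for a_ri, x_i in zip(row, x)) for row in a]

def column_combination(a, x):
    """Column view: start at the origin and move x[i] units along column i of A."""
    result = [0] * len(a)
    for i, xi in enumerate(x):
        for r in range(len(a)):
            result[r] += xi * a[r][i]
    return result

A = [[1, 0, 2],
     [0, 1, 1]]  # three directions (columns) in R^2
x = [3, -1, 2]
print(column_combination(A, x), mat_vec(A, x))  # [7, 1] [7, 1]
```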