# Linear Algebra 

Linear algebra is a branch of mathematics that is widely used throughout science and engineering. However, because linear algebra is a form of continuous rather than discrete mathematics, many computer scientists have little experience with it. A good understanding of linear algebra is essential for understanding and working with many machine learning algorithms, especially deep learning algorithms. We therefore precede our introduction to deep learning with a focused presentation of the key linear algebra prerequisites.

## Scalars, Vectors, Matrices and Tensors
The study of linear algebra involves several types of mathematical objects:

### Scalars: 
A scalar is just a single number, in contrast to most of the other objects studied in linear algebra, which are usually arrays of multiple numbers.

### Vectors: 
A vector is an array of numbers. The numbers are arranged in order. We can identify each individual number by its index in that ordering.

<table style="width:100%">
  <tr>
    <th><img src="photos/vector.png" alt="Drawing" style="width:200px;"/></th>
  </tr>
</table>

### Matrices: 
A matrix is a 2-D array of numbers, so each element is identified
by two indices instead of just one. We usually give matrices upper-case
variable names with bold typeface, such as A. If a real-valued matrix A has
a height of m and a width of n, then we say that $A \in R^{m×n}$.

<table style="width:100%">
  <tr>
    <th><img src="photos/matrix.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

### Tensors: 
In some cases we will need an array with more than two axes.
In the general case, an array of numbers arranged on a regular grid with a
variable number of axes is known as a tensor. We denote a tensor named “A”
with this typeface: **A**. We identify the element of A at coordinates (i, j, k)
by writing $A_{i,j,k}$.
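
As a small illustration, the four kinds of objects can be sketched with NumPy arrays. The variable names and values below are our own choices, not part of the text, and NumPy indices start at 0 while the notation above starts at 1.

```python
import numpy as np

s = 3.5                                  # scalar: a single number
x = np.array([1.0, 2.0, 3.0])            # vector: 1-D array, one index per element
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # matrix: 2-D array with m = 2 rows, n = 3 columns
T = np.zeros((2, 3, 4))                  # tensor: here an array with three axes

print(x[0])            # first element of x
print(A[1, 2])         # row 2, column 3 in the text's 1-based notation
print(A.shape, T.ndim) # shape of the matrix and number of axes of the tensor
```

The element of the tensor at coordinates (i, j, k) is read the same way, as `T[i, j, k]`.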

### Transpose

One important operation on matrices is the **transpose**. The transpose of a
matrix is the mirror image of the matrix across a diagonal line, called the main
diagonal, running down and to the right, starting from its upper left corner.  We denote the transpose of a
matrix $A$ as $A^T$.

- Vectors can be thought of as matrices that contain only one column. The transpose of a vector is therefore a matrix with only one row. Sometimes we define a vector by writing out its elements in the text inline as a row matrix, then using the transpose operator to turn it into a standard column vector, e.g., $x = [x_1, x_2, x_3 ]^T$.

- A scalar can be thought of as a matrix with only a single entry. From this, we can see that a scalar is its own transpose: $a = a^T$.

- We can add matrices to each other, as long as they have the same shape, just by adding their corresponding elements: $C = A + B$ where $C_{i,j} = A_{i,j} + B_{i,j}$.

- We can also add a scalar to a matrix or multiply a matrix by a scalar, just by performing that operation on each element of a matrix: $D = a · B + c$ where $D_{i,j} = a · B_{i,j} + c$.

In the context of deep learning, we also use some less conventional notation.
We allow the addition of a matrix and a vector, yielding another matrix: $C = A + b$,
where $C_{i,j} = A_{i,j} + b_{j}$. In other words, the vector b is added to each row of the
matrix. This shorthand eliminates the need to define a matrix with b copied into
each row before doing the addition. This implicit copying of b to many locations
is called **broadcasting**.
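
The operations in this section map directly onto NumPy. The sketch below uses made-up matrices `A`, `B` and a vector `b` (our own example values) to show the transpose, element-wise addition, scalar operations and broadcasting, and compares broadcasting with the explicit copy it replaces.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
B = np.array([[10, 20, 30],
              [40, 50, 60]])     # same shape as A
b = np.array([100, 200, 300])    # one entry per column of A

At = A.T                         # transpose: At[j, i] == A[i, j]
C = A + B                        # element-wise sum of same-shaped matrices
D = 2 * B + 1                    # scalar multiply and add, applied to each element
E = A + b                        # broadcasting: b is added to every row of A

# The explicit version of broadcasting: copy b into each row first.
E_explicit = A + np.tile(b, (A.shape[0], 1))
print((E == E_explicit).all())
```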

## Multiplying Matrices and Vectors

One of the most important operations involving matrices is multiplication of two
matrices. The **matrix product** of matrices A and B is a third matrix C. In
order for this product to be defined, A must have the same number of columns as
B has rows. If A is of shape $m × n$ and B is of shape $n × p$, then C is of shape
$m × p$. We can write the matrix product just by placing two or more matrices
together, e.g.

$$C = AB.$$

The product operation is defined by

$$C_{i,j} = \sum_k A_{i,k}B_{k,j}.$$

Note that the standard product of two matrices is not just a matrix containing
the product of the individual elements. Such an operation exists and is called the
**element-wise product** or **Hadamard product**, and is denoted as $A \odot B$.

The **dot product** between two vectors x and y of the same dimensionality
is the matrix product $x^Ty$. We can think of the matrix product $C = AB$ as
computing $C_{i,j}$ as the dot product between row i of A and column j of B.
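
To make the definition concrete, here is a minimal NumPy sketch with example matrices of our own choosing. It computes the matrix product with the `@` operator, recomputes it from the row-times-column definition, and contrasts it with the Hadamard product.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # shape (3, 2)
B = np.array([[1., 0., 2.], [0., 1., 3.]])     # shape (2, 3)

C = A @ B                                      # matrix product, shape (3, 3)

# Same product from the definition C[i, j] = sum_k A[i, k] * B[k, j].
C_loop = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        C_loop[i, j] = np.dot(A[i, :], B[:, j])   # row i of A dotted with column j of B

H = A * B.T                                    # Hadamard product: operands must share a shape

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])
d = x @ y                                      # dot product x^T y = 4 + 10 + 18 = 32
```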

Matrix product operations have many useful properties that make mathematical
analysis of matrices more convenient. For example, matrix multiplication is
distributive:

$$ A(B + C) = AB +AC.$$ 

It is also associative:

$$ A(BC) = (AB)C.$$

Matrix multiplication is not commutative (the condition AB = BA does not
always hold), unlike scalar multiplication. However, the dot product between two
vectors is commutative:

$$x^Ty = y^Tx.$$

The transpose of a matrix product has a simple form:

$$(AB)^T = B^TA^T.$$

This allows us to demonstrate that the dot product is commutative, by exploiting the fact that the value
of such a product is a scalar and therefore equal to its own transpose:

$$x^Ty =(x ^Ty)^T = y^Tx .$$
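
These identities are easy to spot-check numerically. The sketch below draws random matrices (the seed and sizes are arbitrary choices of ours) and tests each property up to floating-point tolerance. A numerical check like this is an illustration, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
y = rng.standard_normal(3)

distributive = np.allclose(A @ (B + C), A @ B + A @ C)
associative = np.allclose(A @ (B @ C), (A @ B) @ C)
commutative = np.allclose(A @ B, B @ A)          # fails for generic random matrices
dot_commutes = np.isclose(x @ y, y @ x)
transpose_rule = np.allclose((A @ B).T, B.T @ A.T)

print(distributive, associative, commutative, dot_commutes, transpose_rule)
```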

We now know enough linear algebra notation to write down a system of linear
equations:

$$Ax = b$$ 

where $A \in R^{m×n}$ is a known matrix, $b \in R^m$ is a known vector, and $x \in R^n$ is a
vector of unknown variables we would like to solve for. Each element $x_i$ of x is one
of these unknown variables. Each row of A and each element of b provide another
constraint. We can rewrite the equation as:

$$A_{1,:}x = b_1$$
$$A_{2,:}x = b_2$$
$$\vdots$$
$$A_{m,:}x = b_m$$

or, even more explicitly, as:

$$A_{1,1}x_1 + A_{1,2}x_2 + \cdots + A_{1,n}x_n = b_1$$
$$A_{2,1}x_1 + A_{2,2}x_2 + \cdots + A_{2,n}x_n = b_2$$
$$\vdots$$
$$A_{m,1}x_1 + A_{m,2}x_2 + \cdots + A_{m,n}x_n = b_m$$

### Identity and Inverse Matrices
Linear algebra offers a powerful tool called **matrix inversion** that allows us to
analytically solve the equation $Ax = b$ for many values of A.

To describe matrix inversion, we first need to define the concept of an identity
matrix. An **identity matrix** is a matrix that does not change any vector when we
multiply that vector by that matrix. We denote the identity matrix that preserves
n-dimensional vectors as $I_n$. Formally, $I_n \in R^{n×n}$, and
$\forall x \in R^n, I_nx = x$.

The structure of the identity matrix is simple: all of the entries along the main
diagonal are 1, while all of the other entries are zero. The matrix inverse of A is denoted as $A^{−1}$, and it is defined as the matrix such that $A^{−1}A = I_n$. 

We can now solve the equation by the following steps:

$$Ax = b$$
$$A^{−1}Ax = A^{−1}b$$
$$I_nx = A^{−1}b$$
$$x = A^{-1}b.$$

Of course, this process depends on it being possible to find $A^{−1}$. 

When $A^{−1}$ exists, several different algorithms exist for finding it in closed form.
In theory, the same inverse matrix can then be used to solve the equation many
times for different values of b . However, $A^{−1}$ is primarily useful as a theoretical
tool, and should not actually be used in practice for most software applications.
Because $A^{−1}$ can be represented with only limited precision on a digital computer,
algorithms that make use of the value of b can usually obtain more accurate
estimates of x.
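
In NumPy terms, the advice above corresponds to preferring `np.linalg.solve`, which works with A and b together, over forming the inverse explicitly. The small system below is a made-up example; on a well-conditioned 2 × 2 problem like this one both routes agree, and the difference only matters for larger or ill-conditioned systems.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([3., 5.])

x_inv = np.linalg.inv(A) @ b        # textbook route: x = A^{-1} b
x_solve = np.linalg.solve(A, b)     # usually preferred: solves Ax = b without forming A^{-1}

residual = np.linalg.norm(A @ x_solve - b)   # how far Ax is from b
print(x_solve, residual)
```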

### Linear Dependence and Span

In order for $A^{−1}$ to exist, the equation must have exactly one solution for every
value of b. However, it is also possible for the system of equations to have no
solutions or infinitely many solutions for some values of b. It is not possible to
have more than one but fewer than infinitely many solutions for a particular b; if
both x and y are solutions then

$$ z = αx + (1 − α)y $$

is also a solution for any real α.

To analyze how many solutions the equation has, we can think of the columns
of A as specifying different directions we can travel from the **origin** (the point
specified by the vector of all zeros), and determine how many ways there are of
reaching b. In this view, each element of x specifies how far we should travel in
each of these directions, with $x_i$ specifying how far to move in the direction of
column i:

$$ Ax =\sum_i{x_iA_{:,i}}.$$

In general, this kind of operation is called a **linear combination**. Formally, a
linear combination of some set of vectors $\{v^{(1)}, . . . , v^{(n)}\}$ is given by multiplying
each vector $v^{(i)}$ by a corresponding scalar coefficient and adding the results:

$$\sum_i{c_iv^{(i)}}.$$

**The span** of a set of vectors is the set of all points obtainable by linear combination
of the original vectors.

Determining whether **Ax = b** has a solution thus amounts to testing whether b
is in the span of the columns of A. This particular span is known as the **column
space** or the range of A.
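
One way to test span membership numerically is to compare matrix ranks: b lies in the column space of A exactly when appending b as an extra column does not raise the rank. The helper name `in_column_space` and the example data are our own, and the rank computed by `np.linalg.matrix_rank` depends on a floating-point tolerance, so treat this as a sketch.

```python
import numpy as np

def in_column_space(A, b):
    """Return True if b is a linear combination of the columns of A."""
    augmented = np.column_stack([A, b])
    return np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(A)

A = np.array([[1., 2.],
              [2., 4.],
              [0., 1.]])                       # two independent columns in R^3

x = np.array([3., -1.])
b_reachable = A @ x                            # Ax = sum_i x_i * A[:, i]
b_unreachable = np.array([1., 0., 0.])         # not a combination of the two columns

same = np.allclose(b_reachable, x[0] * A[:, 0] + x[1] * A[:, 1])
print(in_column_space(A, b_reachable), in_column_space(A, b_unreachable))
```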

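For the system to have a solution for every $b \in R^m$, the column space of A must be all of $R^m$. This requires A to have at least m columns ($n \geq m$), but that alone is not sufficient, because some columns may be redundant: a column that is a copy or a combination of other columns adds no new direction.

Formally, this kind of redundancy is known as **linear dependence**. A set of
vectors is **linearly independent** if no vector in the set is a linear combination
of the other vectors. If we add a vector to a set that is a linear combination of
the other vectors in the set, the new vector does not add any points to the set’s
span. This means that for the column space of the matrix to encompass all of $R^m$,
the matrix must contain at least one set of m linearly independent columns.

For the matrix to have an inverse, we additionally need $Ax = b$ to have at most one solution for each value of b, so the matrix can have at most m columns. Together, this means that the matrix must be **square**, that is, we require that
m = n and that all of the columns be linearly independent. A square matrix
with linearly dependent columns is known as **singular**.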

So far we have discussed matrix inverses as being multiplied on the left. It is
also possible to define an inverse that is multiplied on the right:

$$AA^{−1} = I.$$ 

For square matrices, the left inverse and right inverse are equal.
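
As a quick numerical illustration with two hand-picked 2 × 2 matrices (our own examples), the rank tells us whether the columns are linearly independent, and for the invertible matrix the same inverse works on both the left and the right.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])          # independent columns: invertible
S = np.array([[1., 2.],
              [2., 4.]])          # second column is 2 * first column: singular

rank_A = np.linalg.matrix_rank(A)   # 2: full rank
rank_S = np.linalg.matrix_rank(S)   # 1: the columns are linearly dependent

A_inv = np.linalg.inv(A)
left_ok = np.allclose(A_inv @ A, np.eye(2))    # A^{-1} A = I
right_ok = np.allclose(A @ A_inv, np.eye(2))   # A A^{-1} = I
print(rank_A, rank_S, left_ok, right_ok)
```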

### Norms

Sometimes we need to measure the size of a vector. In machine learning, we usually
measure the size of vectors using a function called a **norm**. Formally, the $L^p$ norm
is given by

$$ ||x||_p = {(\sum_i|x_i|^p)}^{\frac{1}{p}} $$

for $p \in R,p \geq 1$.

Norms, including the $L^p$ norm, are functions mapping vectors to non-negative
values. On an intuitive level, the norm of a vector x measures the distance from
the origin to the point x. More rigorously, a norm is any function f that satisfies
the following properties:

- f (x) = 0 ⇒ x = 0
- f (x + y) ≤ f(x) + f (y) (the triangle inequality)
- ∀α ∈ R, f (αx) = |α|f (x)

The $L^2$ norm, with p = 2, is known as the Euclidean norm. It is simply the
Euclidean distance from the origin to the point identified by x. The $L^2$ norm is
used so frequently in machine learning that it is often denoted simply as ||x||, with
the subscript 2 omitted. It is also common to measure the size of a vector using
the squared $L^2$ norm, which can be calculated simply as $x^Tx$.

The squared $L^2$ norm is more convenient to work with mathematically and
computationally than the $L^2$ norm itself. For example, the derivatives of the
squared $L^2$ norm with respect to each element of x each depend only on the
corresponding element of x, while all of the derivatives of the $L^2$ norm depend
on the entire vector. In many contexts, the squared $L^2$ norm may be undesirable
because it increases very slowly near the origin. In several machine learning applications, it is important to discriminate between elements that are exactly
zero and elements that are small but nonzero. In these cases, we turn to a function
that grows at the same rate in all locations, but retains mathematical simplicity:
the  $L^1$ norm. The $L^1$ norm may be simplified to

$$||x||_1 = \sum_i|x_i|.$$

The $L^1$ norm is commonly used in machine learning when the difference between
zero and nonzero elements is very important. Every time an element of x moves
away from 0 by $\epsilon$, the $L^1$ norm increases by $\epsilon$.

We sometimes measure the size of the vector by counting its number of nonzero
elements. Some authors refer to this function as the “$L^0$ norm,” but this is incorrect
terminology. The number of non-zero entries in a vector is not a norm, because
scaling the vector by α does not change the number of nonzero entries. The $L^1$
norm is often used as a substitute for the number of nonzero entries.

One other norm that commonly arises in machine learning is the $L^{\infty}$ norm,
also known as the **max norm**. This norm simplifies to the absolute value of the
element with the largest magnitude in the vector,

$$||x||_{\infty} = \max_i |x_i |.$$
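
The norms above can be computed with `np.linalg.norm`. The vector below is an arbitrary example of ours, and `lp_norm` is a hypothetical helper that simply spells out the general definition.

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l1 = np.linalg.norm(x, 1)            # |3| + |-4| + |0| = 7
l2 = np.linalg.norm(x)               # sqrt(9 + 16) = 5 (the default is the L2 norm)
l2_squared = x @ x                   # squared L2 norm as x^T x = 25
linf = np.linalg.norm(x, np.inf)     # largest absolute entry = 4
nonzeros = np.count_nonzero(x)       # the so-called "L0 norm" = 2

def lp_norm(x, p):
    """General L^p norm from the definition, for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

# Scaling x leaves the nonzero count unchanged, which is why it is not a norm.
print(np.count_nonzero(3 * x) == nonzeros)
```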

### Special Kinds of Matrices and Vectors

Some special kinds of matrices and vectors are particularly useful.

**Diagonal matrices** consist mostly of zeros and have non-zero entries only along
the main diagonal. Formally, a matrix D is diagonal if and only if $D_{i,j} = 0$ for
all $i \neq j$ . We have already seen one example of a diagonal matrix: the identity
matrix, where all of the diagonal entries are 1. We write diag(v) to denote a square
diagonal matrix whose diagonal entries are given by the entries of the vector v.
Diagonal matrices are of interest in part because multiplying by a diagonal matrix
is very computationally efficient. To compute diag(v)x, we only need to scale each
element $x_i$ by $v_i$. In other words, $\text{diag}(v)x = v \odot x$. Inverting a square diagonal
matrix is also efficient: the inverse exists only if every diagonal entry is nonzero, and in that case $\text{diag}(v)^{-1} = \text{diag}([1/v_1, \dots, 1/v_n]^T)$. In many cases, we may derive some very general machine learning algorithm in terms of arbitrary matrices,
but obtain a less expensive (and less descriptive) algorithm by restricting some
matrices to be diagonal.
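
The efficiency argument can be seen in a few lines of NumPy. The vectors `v` and `x` are example values of ours. The dense product touches every entry of the matrix, while the element-wise product only touches the diagonal.

```python
import numpy as np

v = np.array([2.0, 4.0, 5.0])
x = np.array([1.0, 1.0, 2.0])

D = np.diag(v)                       # dense matrix diag(v)
dense = D @ x                        # general matrix-vector product
cheap = v * x                        # same result: scale each x_i by v_i

D_inv = np.diag(1.0 / v)             # valid only because every v_i is nonzero
print(np.allclose(dense, cheap), np.allclose(D @ D_inv, np.eye(3)))
```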

A **symmetric matrix** is any matrix that is equal to its own transpose:

$$A = A^T. $$

Symmetric matrices often arise when the entries are generated by some function of
two arguments that does not depend on the order of the arguments. For example,
if A is a matrix of distance measurements, with $A_{i,j}$ giving the distance from point
i to point j, then $A_{i,j} = A_{j,i}$ because distance functions are symmetric.

A **unit vector** is a vector with **unit norm**:

$$ ||x||_2 = 1.$$

A vector x and a vector y are **orthogonal** to each other if $x^Ty = 0$. If both
vectors have nonzero norm, this means that they are at a 90 degree angle to each
other. In $R^n$, at most n vectors may be mutually orthogonal with nonzero norm.
If the vectors are not only orthogonal but also have unit norm, we call them
**orthonormal**.

An **orthogonal matrix** is a square matrix whose rows are mutually orthonormal
and whose columns are mutually orthonormal:

$$A^{T}A = AA^T = I. $$

This implies that

$$A^{−1} = A^T,$$

so orthogonal matrices are of interest because their inverse is very cheap to compute.
Pay careful attention to the definition of orthogonal matrices. Counterintuitively,
their rows are not merely orthogonal but fully orthonormal. There is no special
term for a matrix whose rows or columns are orthogonal but not orthonormal.
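
A 2-D rotation matrix is a standard example of an orthogonal matrix. The sketch below uses an arbitrary angle of our choosing to check the defining identities, and also shows a pair of orthogonal vectors and how to rescale a vector to unit norm.

```python
import numpy as np

theta = np.pi / 6                                   # arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])     # 2-D rotation matrix

rows_cols_orthonormal = (np.allclose(Q.T @ Q, np.eye(2))
                         and np.allclose(Q @ Q.T, np.eye(2)))
inverse_is_transpose = np.allclose(np.linalg.inv(Q), Q.T)

x = np.array([1.0, 1.0])
y = np.array([1.0, -1.0])
orthogonal = np.isclose(x @ y, 0.0)                 # x^T y = 0
x_unit = x / np.linalg.norm(x)                      # rescaled to unit norm
print(rows_cols_orthonormal, inverse_is_transpose, orthogonal)
```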

### The Trace Operator
The trace operator gives the sum of all of the diagonal entries of a matrix:

$$ Tr(A) = \sum_i A_{i,i} .$$

The trace operator is useful for a variety of reasons. Some operations that are
difficult to specify without resorting to summation notation can be specified using matrix products and the trace operator. For example, the trace operator provides
an alternative way of writing the Frobenius norm of a matrix:

$$ ||A||_F = \sqrt{ Tr(AA^T)}. $$

Writing an expression in terms of the trace operator opens up opportunities to
manipulate the expression using many useful identities. For example, the trace
operator is invariant to the transpose operator:

$$Tr(A) = Tr(A^T).$$

The trace of a square matrix composed of many factors is also invariant to
moving the last factor into the first position, if the shapes of the corresponding
matrices allow the resulting product to be defined:

$$Tr(ABC) = Tr(CAB) = Tr(BCA)$$

or more generally,

$$ Tr\left(\prod^n_{i=1}F^{(i)}\right) = Tr\left(F^{(n)}\prod^{n-1}_{i=1}F^{(i)}\right). $$

This invariance to cyclic permutation holds even if the resulting product has a
different shape. For example, for $A \in R^{m×n}$ and $B \in R^{n×m}$, we have

$$Tr(AB) = Tr(BA)$$

even though $AB \in R^{m×m}$ and $BA \in R^{n×n}$.

Another useful fact to keep in mind is that a scalar is its own trace: a = Tr(a).
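
The trace identities can be spot-checked on random matrices; the shapes and seed below are arbitrary choices of ours. For the Frobenius norm we compare the trace formula with the usual definition as the square root of the sum of squared entries.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 2))

frob_direct = np.sqrt(np.sum(A ** 2))          # sqrt of the sum of squared entries
frob_trace = np.sqrt(np.trace(A @ A.T))        # same value via the trace

# Cyclic permutation: ABC is 2x2, CAB is 2x2, BCA is 3x3, yet the traces agree.
t_abc = np.trace(A @ B @ C)
t_cab = np.trace(C @ A @ B)
t_bca = np.trace(B @ C @ A)

shapes = ((A @ B).shape, (B @ A).shape)        # (2, 2) and (3, 3)
swap_ok = np.isclose(np.trace(A @ B), np.trace(B @ A))
print(np.isclose(frob_direct, frob_trace), swap_ok)
```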

### The Determinant

The determinant of a square matrix, denoted det(A), is a function mapping
matrices to real scalars. The determinant is equal to the product of all the
eigenvalues of the matrix. The absolute value of the determinant can be thought
of as a measure of how much multiplication by the matrix expands or contracts
space. If the determinant is 0, then space is contracted completely along at least
one dimension, causing it to lose all of its volume. If the determinant is 1, then
the transformation preserves volume.
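
A short numerical sketch of these statements, using two small matrices we picked for illustration: a scaling matrix that multiplies areas by 6, and a singular matrix that flattens the plane onto a line.

```python
import numpy as np

A = np.array([[2., 0.],
              [0., 3.]])            # stretches x by 2 and y by 3: areas scale by 6
S = np.array([[1., 2.],
              [2., 4.]])            # maps the whole plane onto a single line

det_A = np.linalg.det(A)
det_S = np.linalg.det(S)            # zero up to rounding: all area is lost
eig_product = np.prod(np.linalg.eigvals(A))   # product of the eigenvalues of A
print(det_A, det_S, eig_product)
```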

## Eigenvalues and Eigenvectors

Eigenvalues and eigenvectors are probably among the most important concepts in linear algebra. Who would expect a simple equation like $Av = \lambda v$ to be so significant? In machine learning, quantum computing, physics, and many other fields, mathematical and engineering problems can be solved by finding the eigenvalues and eigenvectors of a matrix. Let’s not only discover what they are but also answer why they are so important. We will look into Google PageRank to see how page ranking works.

By definition, scalar λ and vector v are the eigenvalue and eigenvector of A if

<table style="width:100%">
  <tr>
    <th><img src="photos/lin1.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
   <tr>
    <th><img src="photos/lin2.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

Visually, Av lies along the same line as the eigenvector v.

Here are some examples.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin3.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
   <tr>
    <th><img src="photos/lin4.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

However, Ax does not usually equal λx; only some exceptional vectors satisfy the condition. If the magnitude of the eigenvalue is greater than one, Avᵢ is longer than vᵢ (the vector expands). If it is smaller than one, the vector shrinks.
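
The definition can be checked numerically with `np.linalg.eig`, which returns the eigenvalues and a matrix whose columns are the eigenvectors. The 2 × 2 matrix below is our own example, not the one in the figures.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` are the v_i

for lam, v in zip(eigenvalues, eigenvectors.T):
    # For an eigenpair, A v and lambda * v are the same vector.
    print(lam, np.allclose(A @ v, lam * v))

x = np.array([1., 0.])                         # a vector that is not an eigenvector of A
Ax = A @ x                                     # [2, 1], which is not a multiple of x
```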

## Application

But before getting into details, let’s pause and appreciate the beauty of such an abstract concept. Many problems can be modeled with linear transformations, with solutions derived from eigenvalues and eigenvectors. Let’s look at an abstract example first, before turning to a real problem built on a billion-dollar idea: Google’s PageRank. In many systems, we can collect the properties of interest in a vector whose rates of change depend linearly on the current properties (e.g., the population growth rate depends linearly on the current population and GDP). The general equation is



<table style="width:100%">
  <tr>
    <th><img src="photos/lin5.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

So let’s guess a form of u(t) that satisfies the equation above. Since the derivative of an exponential function is proportional to the function itself, we start with an exponential function of t and multiply it by a vector x, so the output is a vector.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin6.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

From the calculation above, our solution for u(t) is

<table style="width:100%">
  <tr>
    <th><img src="photos/lin7.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

Next, we will find its complete solution. Our first-order differential equation is linear.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin8.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

For a linear equation, the complete solution is a linear combination of particular solutions: if u and v are solutions, C₁u + C₂v is also a solution. From our previous example with eigenvalues λ = 4, -2 and -2, the complete solution will be

<table style="width:100%">
  <tr>
    <th><img src="photos/lin9.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

If the system is to settle into a stable state, every eigenvalue must have a negative real part, so that each exponential term decays over time. At time t=0, we can measure the initial state u(0), say [u₀₁, u₀₂, u₀₃]ᵀ, and solve for the constants C₁, C₂, and C₃.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin10.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>
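
The procedure can be sketched in NumPy under the assumption that the system has the form du/dt = Au, with a solution built from terms Cᵢ e^{λᵢt} xᵢ. The matrix `A` and initial state `u0` below are hypothetical values chosen so that both eigenvalues are negative; they are not the example from the figures.

```python
import numpy as np

A = np.array([[-2., 1.],
              [1., -2.]])            # hypothetical system matrix, eigenvalues -1 and -3
u0 = np.array([1., 0.])              # hypothetical initial state u(0)

lams, X = np.linalg.eig(A)           # columns of X are the eigenvectors x_i
c = np.linalg.solve(X, u0)           # constants C_i from u(0) = sum_i C_i x_i

def u(t):
    """u(t) = sum_i C_i * exp(lambda_i * t) * x_i."""
    return X @ (c * np.exp(lams * t))

# Check du/dt = A u at t = 0.5 with a central finite difference.
t, h = 0.5, 1e-6
derivative = (u(t + h) - u(t - h)) / (2 * h)
print(np.allclose(derivative, A @ u(t), atol=1e-6))
```

Because both eigenvalues are negative here, `u(t)` decays toward zero as t grows.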

This is not an isolated example of the power of eigenvalues. Nature seems to have an eigenvector cookbook when making its designs. The famous time-independent Schrödinger equation is expressed with eigenvalues and eigenvectors. In quantum mechanics, the possible measured values of an observable are modeled as eigenvalues. There are many other examples, including machine learning and one of the largest eigenvector computations, Google PageRank.



<table style="width:100%">
  <tr>
    <th><img src="photos/lin11.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

Fundamentally, many systems can be modeled as

<table style="width:100%">
  <tr>
    <th><img src="photos/lin12.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

Let’s study the time sequence model a little more for the purpose of machine learning.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin13.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

First, we assume the initial state u₀ to be an eigenvector of A. Therefore, the future states can be computed as

<table style="width:100%">
  <tr>
    <th><img src="photos/lin14.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

In short, we can simplify the calculation by replacing the power of a matrix (Aᵏ) with the power of a scalar (λᵏ). Next, suppose A has n linearly independent eigenvectors, which form a basis of Rⁿ. We can decompose any vector of Rⁿ in this basis and again simplify the calculation by computing powers of the eigenvalues.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin15.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

If the system is to reach a stable state, we should expect every |λᵢ| to be smaller than or equal to 1. To compute the stable state, we can ignore the terms with |λᵢ| smaller than 1, which decay as k grows, and just find the eigenvector associated with λᵢ = 1.
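
As a sketch of this idea, assume the time sequence model has the form uₖ = Aᵏu₀. The matrix below is a hypothetical example with eigenvalues 1 and 0.7; we compute uₖ once through the eigenvector basis and once with a direct matrix power, and the two should agree.

```python
import numpy as np

A = np.array([[0.9, 0.2],
              [0.1, 0.8]])           # hypothetical matrix with eigenvalues 1 and 0.7
u0 = np.array([1., 0.])              # hypothetical initial state
k = 20

lams, X = np.linalg.eig(A)           # columns of X are the eigenvectors x_i
c = np.linalg.solve(X, u0)           # coordinates of u0 in the eigenvector basis

u_k_eig = X @ (c * lams ** k)        # sum_i c_i * lambda_i^k * x_i
u_k_direct = np.linalg.matrix_power(A, k) @ u0
print(u_k_eig, u_k_direct)
```

After 20 steps the term with eigenvalue 0.7 has almost vanished, so the state is already close to the eigenvector for λ = 1 scaled to sum to one, [2/3, 1/3].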

Let’s discuss a real multi-billion-dollar idea to see the full potential of this approach. To simplify the discussion, assume the whole internet contains only three web pages. The element Aᵢⱼ of a matrix A is the probability of a user going to page i when the user is on page j.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin16.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

If we sum up the probabilities of all possible next pages given a specific current page, the result is 1. Therefore, every column of A sums to 1.0. This kind of matrix is called a stochastic matrix, transition matrix or Markov matrix.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin17.png" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

A Markov matrix has some important properties. If the entries of x sum to one, the entries of Ax and Aᵏx also sum to one. This result gives the chance of being on page 1, 2 and 3 respectively after each click, so it is natural that it sums to one.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin17.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
   <tr>
    <th><img src="photos/lin18.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

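Any Markov matrix A has an eigenvalue of 1, and its other eigenvalues, positive or negative, have absolute values of at most one. When every entry of A is strictly positive, those other eigenvalues have absolute values strictly smaller than one. This behavior is very important. In our example,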

<table style="width:100%">
  <tr>
    <th><img src="photos/lin19.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

For a Markov matrix, we can scale the eigenvector for λ=1 so that its elements sum to 1.0. Any vector v whose elements sum to one can then be decomposed using the eigenvectors of A with c₁ equal to 1, as below.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin20.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

Since u₁, u₂, …, and uₙ are eigenvectors, applying Aᵏ to each of them simply multiplies it by λᵏ. Except for the eigenvalue λ=1, the powers λᵏ for a Markov matrix diminish as k grows, because the absolute values of these eigenvalues are smaller than one. So the system reaches a steady state that approaches the eigenvector u₁ regardless of the initial state. Both Aᵏ and the steady state can be derived from the eigenvector u₁ as below.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin21.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

In our example, the chances of landing on pages 1, 2 and 3 are about 0.41, 0.34 and 0.44 respectively. This concept has many potential applications. For instance, many problems can be modeled with Markov processes and a Markov/transition matrix.

<table style="width:100%">
  <tr>
    <th><img src="photos/lin22.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>
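
The three-page model can be sketched as follows. The transition matrix here is a hypothetical one of our own (not the matrix from the figures), chosen so that every entry is positive and every column sums to one. We find the steady state in two ways: from the eigenvector for λ = 1, and by simulating many clicks from an arbitrary starting page.

```python
import numpy as np

# Hypothetical transition matrix: A[i, j] = P(next page is i | current page is j).
A = np.array([[0.1, 0.5, 0.3],
              [0.6, 0.2, 0.3],
              [0.3, 0.3, 0.4]])
assert np.allclose(A.sum(axis=0), 1.0)     # every column sums to one

lams, vecs = np.linalg.eig(A)
i = np.argmin(np.abs(lams - 1.0))          # index of the eigenvalue closest to 1
steady = np.real(vecs[:, i])
steady = steady / steady.sum()             # rescale so the entries sum to one

# Repeated clicks from an arbitrary starting distribution (all mass on page 1).
u = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    u = A @ u

print(steady, u)                            # the two estimates should agree
```

Real PageRank adds further ingredients, such as a damping factor and handling of pages without outgoing links, that this sketch leaves out.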

## Properties of Eigenvalues and Eigenvectors

   - $Ax$ lies on the same line as the eigenvector x (same or opposite direction).
   - The sum of the eigenvalues equals the trace of the matrix (the sum of its diagonal elements).
   - The product of the eigenvalues equals the determinant.
   - Both conditions above serve as a good sanity check on the calculation of eigenvalues.
   - If no eigenvalue is repeated, all eigenvectors are linearly independent. Such an n × n matrix has n distinct eigenvalues and n linearly independent eigenvectors.
   - If eigenvalues are repeated, we may or may not have n linearly independent eigenvectors with which to diagonalize a square matrix.
   - For a symmetric matrix, the number of positive eigenvalues equals the number of positive pivots.
   - For Ax = λx,

<table style="width:100%">
  <tr>
    <th><img src="photos/lin23.jpeg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

   - If A is singular, it has an eigenvalue of 0. An invertible matrix has all eigenvalues non-zero.
   - Eigenvalues and eigenvectors can be complex numbers.
   - Projection matrices always have eigenvalues of 1 and 0 only. Reflection matrices have eigenvalues of 1 and -1.
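
Several of these properties can be spot-checked with NumPy. All matrices below are small examples of our own: `A` for the trace and determinant checks, `S` as a singular matrix, `P` as the projection onto the line spanned by a vector `a`, and `R` as a 90-degree rotation whose eigenvalues are complex.

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])
lams = np.linalg.eigvals(A)                 # eigenvalues 2 and 5

trace_ok = np.isclose(lams.sum(), np.trace(A))          # 2 + 5 = 7
det_ok = np.isclose(lams.prod(), np.linalg.det(A))      # 2 * 5 = 10

S = np.array([[1., 2.], [2., 4.]])          # singular: one eigenvalue is 0
has_zero = np.isclose(np.linalg.eigvals(S), 0.0).any()

a = np.array([[1.], [1.]])
P = a @ a.T / (a.T @ a)                     # projection onto the line spanned by a
proj_eigs = np.sort(np.linalg.eigvals(P))   # approximately [0, 1]

R = np.array([[0., -1.], [1., 0.]])         # 90-degree rotation
rot_eigs = np.linalg.eigvals(R)             # complex eigenvalues +i and -i
print(trace_ok, det_ok, has_zero, proj_eigs, rot_eigs)
```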

### More thoughts

Eigenvalues quantify the importance of information along the directions of the eigenvectors. Equipped with this information, we know what part of the information can be ignored and how to compress information (SVD, dimension reduction and PCA). It also helps us extract features when developing machine learning models. Sometimes it makes a model easier to train because it reduces tangled information. It also helps to visualize tangled raw data. Other applications include recommendation systems and financial risk analysis. For example, we suggest movies based on your personal viewing behavior and that of others. We can also use eigenvectors to understand the correlations among data, to identify trends, and to cluster information to find common factors, like the combination of genes that triggers a certain kind of disease. And all of them start from the simple equation:

<table style="width:100%">
  <tr>
    <th><img src="photos/lin1.png" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>