# Computation

Linear algebra in practice is done with software, because even basic operations on problems of realistic size require far too many arithmetic steps to carry out by hand. We are going to focus on two aspects of computation in linear algebra: *time complexity* and *numerical stability*.

## Time Complexity

Every mathematical operation we perform has a computational cost: carrying it out takes time and computing resources. For small examples like those in previous chapters the cost is negligible, but most real-world uses of linear algebra are not like that. At the time of this writing, widely used machine learning and AI tools perform extensive computations (primarily matrix-matrix and matrix-vector multiplications) with matrices containing millions to billions of entries. Parallelization can reduce the time these operations take only to a degree, so computational cost is always a consideration when actually using linear algebra; indeed, many important ideas and algorithms from linear algebra are simply not used in practice because they are too expensive.

```{admonition} Definition: Big-O notation
We say '$f(x)$ is on the order of $g(x)$' and write $f(x) = \mathcal{O}(g(x))$ as $x\to\infty$ if there exist constants $M > 0$ and $x_0$ such that $|f(x)| \leq M|g(x)|$ for all $x \geq x_0$.
```

Big-O notation is typically used to describe the growth rate of a function, that is, how the function behaves in a general sense as its input grows large. It is not meant for exact statements. It is also important to recognize that big-O notation only gives an upper bound on the growth rate of a function, and that upper bound is not necessarily the *least* upper bound.
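For a concrete instance of a bound that is valid but not tight, take $f(x) = x$ and $g(x) = x^2$ with $M = 1$ and $x_0 = 1$:

$$
|x| = x \leq x^2 = 1 \cdot |x^2| \quad \text{for all } x \geq 1, \qquad \text{so } x = \mathcal{O}(x^2).
$$

The statement $x = \mathcal{O}(x)$ is also true, and it is the more informative of the two.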

**Example:** Suppose that $f(x) = 3x^3 - 270x^2 + 121234x + 9999$. Then $f(x) = \mathcal{O}(x^3)$: for very large values of $x$ the ratio $f(x)/(3x^3)$ is close to $1$, and in particular $|f(x)| \leq 3|x^3|$ for large enough $x$. That does not mean the two functions are close in absolute terms; their difference, $-270x^2 + 121234x + 9999$, grows without bound in magnitude. Formally, for $x \geq 1$ (so that $1 \leq x \leq x^2 \leq x^3$),

$$
    \begin{align*}
        |3x^3 - 270x^2 + 121234x + 9999| &\leq 3x^3 + 270x^2 + 121234x + 9999 \\
        &\leq 3x^3 + 270x^3 + 121234x^3 + 9999x^3 \\
        &= 131{,}506\,x^3,
    \end{align*}
$$

which proves the claim with $M = 131{,}506$ and $x_0 = 1$. In fact $M$ can be made much smaller (as small as $3$) if $x_0$ is taken larger; the computation below compares $f(x)$ with $3x^3$ for large $x$:

```python
import numpy as np
import pandas as pd

def cubic(x: float) -> float:
    return 3 * x**3 - 270 * x**2 + 121234 * x + 9999

# Evaluate f(x) and 3x^3 at 11 evenly spaced points between 1 and 10^8
arr = np.empty(shape=(11, 3))
for i, x in enumerate(np.linspace(1, 100000000, 11)):
    arr[i, :] = [x, cubic(x), 3 * x**3]

df = pd.DataFrame(arr, columns=['x', 'cubic(x)', '$3x^3$'])
df
```

|   | x | cubic(x) | $3x^3$ |
|---|---|---|---|
| 0 | 1.0 | 130966.0 | 3.0 |
| 1 | 10000000.9 | 2.999974e+21 | 3.000001e+21 |
| 2 | 20000000.8 | 2.399989e+22 | 2.4e+22 |
| 3 | 30000000.7 | 8.099976e+22 | 8.100001e+22 |
| 4 | 40000000.6 | 1.919996e+23 | 1.92e+23 |
| 5 | 50000000.5 | 3.749993e+23 | 3.75e+23 |
| 6 | 60000000.4 | 6.47999e+23 | 6.48e+23 |
| 7 | 70000000.3 | 1.028999e+24 | 1.029e+24 |
| 8 | 80000000.2 | 1.535998e+24 | 1.536e+24 |
| 9 | 90000000.1 | 2.186998e+24 | 2.187e+24 |
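The table suggests that $M = 3$ eventually works, but not from which $x_0$ onward. Since $f(x) > 0$ for $x > 0$, we have $|f(x)| \leq 3x^3$ exactly when the lower-order part $-270x^2 + 121234x + 9999$ is $\leq 0$, so the threshold is the larger root of that quadratic. A minimal sketch using only Python's `math` module:

```python
import math

def cubic(x: float) -> float:
    return 3 * x**3 - 270 * x**2 + 121234 * x + 9999

# f(x) <= 3x^3 exactly when the lower-order part a*x^2 + b*x + c is <= 0.
# The quadratic opens downward (a < 0), so it is <= 0 beyond its larger root.
a, b, c = -270.0, 121234.0, 9999.0
disc = b * b - 4 * a * c
roots = sorted([(-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)])
x0 = roots[1]  # the larger root

print(f"M = 3 works for all x >= {x0:.4f}")

# Spot-check the bound just beyond x0 and far out (f > 0 here, so |f| = f).
for x in [x0 + 0.01, 500.0, 1e4, 1e8]:
    assert abs(cubic(x)) <= 3 * x**3

# Just below the threshold the bound with M = 3 fails.
assert cubic(449.0) > 3 * 449.0**3
```

The root lies a little above $449$, so $M = 3$ and $x_0 = 450$ is one valid choice of constants.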


When using big-O notation to discuss algorithms for computation, we typically have in mind a function $f(x)$ where the input size is the independent variable $x$, and the output of $f$ is the runtime of the algorithm. In this context, there is an implicit assumption that $x$ is a positive integer and the outputs of $f$ and $g$ are nonnegative real numbers.

```{admonition} Definition: Time Complexity
The *time complexity* of an algorithm is the number of elementary operations needed to perform the algorithm, expressed as a function of the size of its input.
```

The above definition rests on several implicit assumptions. The most important one for us is that each elementary operation takes the same fixed amount of time (this is sometimes called the *unit cost* model of computation). In practice this is not exactly true: on modern processors, integer multiplication is commonly estimated to take 3-6 times as long as addition (for floating-point arithmetic the gap is often much smaller). For this reason, we will often count multiplications and additions separately when analyzing algorithms that involve both, and from time to time we may even count only the multiplications and ignore the additions.

## Comparing Orders

Suppose that we want to add together the positive integers from $1$ to $n$. There are two common ways to do this. One is the obvious way, adding the terms one at a time: $\sum_{i=1}^n i = 1 + 2 + \cdots + (n-1) + n$. The other is to use the formula $\sum_{i=1}^n i = n(n+1)/2$. The first takes $n-1$ additions and is $\mathcal{O}(n)$, while the second takes a fixed number of operations (one addition, one multiplication, and one division) no matter how large $n$ is, so it is $\mathcal{O}(1)$. We can time the two approaches to see the difference:

```ipython
%%timeit

total = 0
for i in range(1, 100001):
    total += i
```

```
3.87 ms ± 337 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```


```ipython
%%timeit

100000*(100000+1)/2
```

```
5.32 ns ± 0.098 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
```


The second approach is much faster than the first, but the really important thing to note is that the second approach takes approximately the same amount of time (up to some variability that can't be controlled) regardless of how many integers we add, while the time taken by the first grows in proportion to the number of integers being added:

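```ipython
%%timeit

total = 0
for i in range(1, 100000001):
    total += i
```

The cell above was stopped by hand (`KeyboardInterrupt`) before `%%timeit` could report a result: the loop now performs 1000 times as many additions as before, so each pass should take roughly 1000 times as long, and `%%timeit` repeats it several times.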

```ipython
%%timeit

100000000*(100000000+1)/2
```

```
6.58 ns ± 0.11 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
```
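A sketch of the same comparison across several values of $n$, written with Python's standard `timeit` module instead of the `%%timeit` magic. Passing `n` as a function argument matters here: CPython typically folds an expression built only from literals, such as `100000*(100000+1)/2`, into a constant when the code is compiled, so nanosecond timings like those above may mostly measure loading a precomputed value. The helper names `loop_sum` and `formula_sum` are illustrative, and the formula uses integer division `//` so the result stays an exact integer. Timings depend on the machine, so no output is shown.

```python
import timeit

def loop_sum(n: int) -> int:
    """Add 1 + 2 + ... + n one term at a time: O(n)."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def formula_sum(n: int) -> int:
    """Use n(n+1)/2: a fixed number of operations, O(1) under the unit-cost model."""
    return n * (n + 1) // 2

for n in [10_000, 100_000, 1_000_000]:
    assert loop_sum(n) == formula_sum(n)  # both methods agree
    # Average time per call; the lambda is called within this iteration, so it sees the current n
    t_loop = timeit.timeit(lambda: loop_sum(n), number=10) / 10
    t_formula = timeit.timeit(lambda: formula_sum(n), number=100_000) / 100_000
    print(f"n = {n:>9,}: loop {t_loop:.2e} s, formula {t_formula:.2e} s")
```

On a typical run, the loop's time per call should grow roughly tenfold with each tenfold increase in $n$, while the formula's time should stay roughly flat.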


## Time Complexity of Vector and Matrix Operations

**Example:** The time complexity of adding two vectors of length $n$ is $n$, because adding the vectors requires $n$ elementary operations, one addition for each component of the new vector. In big-O notation, the time complexity is $\mathcal{O}(n)$.
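The count can be made explicit with an instrumented version of componentwise addition. This is a minimal sketch on plain Python lists; `add_vectors` is a hypothetical helper that returns the sum together with the number of scalar additions it performed (libraries such as NumPy do not report such a count).

```python
def add_vectors(u: list[float], v: list[float]) -> tuple[list[float], int]:
    """Return the componentwise sum u + v and the number of scalar additions used."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    w = []
    additions = 0
    for a, b in zip(u, v):
        w.append(a + b)
        additions += 1  # exactly one addition per component
    return w, additions

w, adds = add_vectors([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(w, adds)  # [5.0, 7.0, 9.0] 3
```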

**Example:** The time complexity of the dot product of two vectors of length $n$ is $2n-1$. There are $n$ multiplications to perform, one for each pair of corresponding components, and then the $n$ resulting products must be summed. Summing $n$ numbers takes only $n-1$ additions (check this with an example if it isn't obvious to you), so the time complexity of the dot product is $n + (n-1) = 2n-1$, which is $\mathcal{O}(n)$.
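Counting multiplications and additions separately, as discussed above, can be sketched the same way. `dot_with_counts` is a hypothetical helper for illustration; starting the running sum at the first product, rather than at zero, is what makes the addition count $n-1$ instead of $n$.

```python
def dot_with_counts(u: list[float], v: list[float]) -> tuple[float, int, int]:
    """Return (u . v, number of multiplications, number of additions)."""
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("need two nonempty vectors of the same length")
    mults = adds = 0
    products = []
    for a, b in zip(u, v):
        products.append(a * b)
        mults += 1  # one multiplication per pair of components
    total = products[0]  # start from the first product: no addition needed for it
    for p in products[1:]:
        total += p
        adds += 1  # one addition for each remaining product
    return total, mults, adds

value, m, a = dot_with_counts([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(value, m, a)  # 32.0 3 2
```

For $n = 3$ this gives $3 + 2 = 5 = 2 \cdot 3 - 1$ operations, matching the formula.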