In [5]:
exec(open("../../../python/FNC_init.py").read())

[**Demo %s**](#demo-flops-mvmult)


Here is a straightforward implementation of matrix-vector multiplication.

In [6]:
n = 6
A = random.rand(n, n)
x = ones(n)
y = zeros(n)
for i in range(n):
    for j in range(n):
        y[i] += A[i, j] * x[j]   # 2 flops

Each of the loops implies a summation of flops. The total flop count for this algorithm is

$$
\sum_{i=1}^n \sum_{j=1}^n 2 = \sum_{i=1}^n 2n = 2n^2.
$$

Since the matrix $\mathbf{A}$ has $n^2$ elements, all of which have to be involved in the product, it seems unlikely that we could get a flop count that is smaller than $O(n^2)$ in general.
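The count $2n^2$ can be checked empirically by tallying the operations as the loops run. The following is a minimal sketch, not part of the original demo; the helper name `matvec_with_flop_count` is hypothetical, and the code imports NumPy explicitly rather than relying on the names loaded by `FNC_init.py`.

```python
import numpy as np

def matvec_with_flop_count(A, x):
    """Multiply A by x with explicit loops, tallying flops along the way.

    Hypothetical helper for illustration only.
    """
    m, n = A.shape
    y = np.zeros(m)
    flops = 0
    for i in range(m):
        for j in range(n):
            y[i] += A[i, j] * x[j]   # one multiplication and one addition
            flops += 2
    return y, flops

rng = np.random.default_rng(0)
for n in (4, 8, 16):
    A = rng.random((n, n))
    x = np.ones(n)
    y, flops = matvec_with_flop_count(A, x)
    # Columns: n, counted flops, predicted 2n^2, agreement with the built-in product
    print(n, flops, 2 * n**2, np.allclose(y, A @ x))
```

The counted and predicted values should agree for every $n$, since the loop body runs exactly $n^2$ times.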

Let's run an experiment with the built-in matrix-vector multiplication. Assuming that flops dominate the computation time, we measure elapsed time as a proxy for the flop count.

In [11]:
n = 600 * arange(2, 11)
t = zeros(len(n))
results = PrettyTable(["n", "time (s)"])
for k in range(len(n)):
    A = random.randn(n[k], n[k])
    x = random.randn(n[k])
    start = timer()
    for _ in range(80):    # repeat so the elapsed time is measurable
        A @ x
    t[k] = timer() - start    # total time for all 80 products
    results.add_row([n[k], t[k]])
print(results)

+------+---------------------+
|  n   |       time (s)      |
+------+---------------------+
| 1200 | 0.08702104199983296 |
| 1800 | 0.03114179099975445 |
| 2400 | 0.06517870900006528 |
| 3000 | 0.13051145800000086 |
| 3600 | 0.20717416600018623 |
| 4200 |  0.296035958000175  |
| 4800 | 0.28200816699973075 |
| 5400 |  0.3776759580005091 |
| 6000 | 0.45094179199986684 |
+------+---------------------+


We repeat the multiplication many times at each value of $n$ so that the measured times are long enough for the resolution of the timer not to be a factor.
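To see why the repetitions matter, one can query the clock's reported resolution and compare per-call averages for different repetition counts. This is a rough sketch that assumes `timer` in the code above behaves like `time.perf_counter` from the standard library; the helper `time_per_call` is a hypothetical name introduced here.

```python
import time
import numpy as np

def time_per_call(func, reps):
    """Average wall-clock seconds per call of func() over reps repetitions.

    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    for _ in range(reps):
        func()
    return (time.perf_counter() - start) / reps

# Resolution of the clock, as reported by the platform
print(time.get_clock_info("perf_counter").resolution)

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
x = rng.standard_normal(200)
for reps in (1, 10, 1000):
    # With few repetitions the estimate tends to be noisier
    print(reps, time_per_call(lambda: A @ x, reps))
```

The printed values depend on the machine, so none are shown here. The average also includes a small amount of loop overhead, which is negligible when each call is expensive.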

The timings for $n=3000$ and $n=6000$ have the ratio

In [12]:
print(t[n == 6000] / t[n == 3000])

[3.45518929]


If the run time is dominated by flops, then we expect this ratio to be 

$$
\frac{2(6000)^2}{2(3000)^2}=4.
$$
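A ratio of timings can also be expressed as an observed exponent $p$ in the model $t \approx C n^p$, by fitting a line to $\log t$ versus $\log n$. The sketch below is an illustration under that model assumption; `observed_order` is a hypothetical helper, and the two measured values are rounded from the table above.

```python
import numpy as np

def observed_order(n, t):
    """Least-squares slope of log(t) versus log(n): the p in t ≈ C n^p.

    Hypothetical helper for illustration only.
    """
    slope, _ = np.polyfit(np.log(n), np.log(t), 1)
    return slope

# Synthetic check: timings exactly proportional to n^2 should give p = 2
n = 600 * np.arange(2, 11)
t_ideal = 1e-8 * n.astype(float) ** 2
print(observed_order(n, t_ideal))

# The two measured timings compared above (rounded from the table)
print(observed_order(np.array([3000, 6000]), np.array([0.1305, 0.4509])))
```

With only two data points the fit reduces to $\log_2$ of the timing ratio, so a ratio of exactly 4 would correspond to $p = 2$.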