In [None]:
] instantiate

## What does MultiFloat do approximately?

MultiFloats.jl is mostly code that generates Julia code:

In [1]:
using MultiFloats
MultiFloats.multifloat_mul_func(4)

:(function multifloat_mul(a::MultiFloat{T, 4}, b::MultiFloat{T, 4}) where T
      $(Expr(:meta, :inline))
      (t0_0, e0_1) = two_prod(a._limbs[1], b._limbs[1])
      (t0_1, e0_2) = two_prod(a._limbs[1], b._limbs[2])
      (t1_1, e1_2) = two_prod(a._limbs[2], b._limbs[1])
      (t0_2, e0_3) = two_prod(a._limbs[1], b._limbs[3])
      (t1_2, e1_3) = two_prod(a._limbs[2], b._limbs[2])
      (t2_2, e2_3) = two_prod(a._limbs[3], b._limbs[1])
      (t0_3, e0_4) = two_prod(a._limbs[1], b._limbs[4])
      (t1_3, e1_4) = two_prod(a._limbs[2], b._limbs[3])
      (t2_3, e2_4) = two_prod(a._limbs[3], b._limbs[2])
      (t3_3, e3_4) = two_prod(a._limbs[4], b._limbs[1])
      t0_4 = a._limbs[2] * b._limbs[4]
      t1_4 = a._limbs[3] * b._limbs[3]
      t2_4 = a._limbs[4] * b._limbs[2]
      s0 = t0_0
      (s1, m1_2, m1_3) = mpadd_3_3(t0_1, t1_1, e0_1)
      (s2, m2_3, m2_4) = mpadd_6_3(t0_2, t1_2, t2_2, e0_2, e1_2, m1_2)
      (s3, m3_4) = mpadd_9_2(t0_3, t1_3, t2_3, t3_3, e0_3, e1_3, e2_3, m1_3, 

This top-level multiplication function calls further generated functions (such as `mpadd_3_3`). But how many flop does it actually perform?
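For context on the `two_prod` calls in the listing: a common way to implement such a primitive is the FMA-based error-free product, which returns the rounded product together with its exact rounding error. The sketch below assumes that technique; `two_prod_sketch` is a hypothetical name and is not taken from the MultiFloats.jl source.

```julia
# Hypothetical helper illustrating an FMA-based error-free product.
# Assumption: this is NOT copied from MultiFloats.jl.
function two_prod_sketch(a::Float64, b::Float64)
    p = a * b            # rounded product (1 mul)
    e = fma(a, b, -p)    # exact rounding error of that product (1 fma)
    return p, e
end

a = 1.0 + 2.0^-30
p, e = two_prod_sketch(a, a)
# a^2 = 1 + 2^-29 + 2^-60 exactly; the last term does not fit into p
# and is recovered in e.
println((p, e))
```

Under this assumption, each such call costs one `mul` and one `fma`, while a plain product like `a._limbs[2] * b._limbs[4]` costs a single `mul`.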

## Counting flop by hooking into the compiler

GFlops is a magical package that hooks into the compiler through Cassette.jl. Within a given context it temporarily "overdubs" the low-level `add`, `sub`, `mul`, `div` and `fma` functions and wraps them with a counter.
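To make the "wrap them with a counter" idea concrete, here is a minimal sketch that swaps in a much simpler technique: a wrapper number type whose operators bump a counter. This is not how GFlops works internally (it overdubs through Cassette.jl and needs no special type); `Counted`, `COUNTS` and `two_sum_sketch` are hypothetical names used only for illustration.

```julia
# Hypothetical counting wrapper; not part of GFlops or MultiFloats.
const COUNTS = Dict(:add => 0, :sub => 0, :mul => 0)

struct Counted
    x::Float64
end

# Each operator increments its counter, then forwards to Float64 arithmetic.
Base.:+(a::Counted, b::Counted) = (COUNTS[:add] += 1; Counted(a.x + b.x))
Base.:-(a::Counted, b::Counted) = (COUNTS[:sub] += 1; Counted(a.x - b.x))
Base.:*(a::Counted, b::Counted) = (COUNTS[:mul] += 1; Counted(a.x * b.x))

# Branch-free error-free sum (Knuth's TwoSum), written generically.
function two_sum_sketch(a, b)
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e
end

s, e = two_sum_sketch(Counted(1.0), Counted(1e-20))
println(COUNTS)
```

A single error-free sum already costs 2 `add` and 4 `sub` here, which hints at why subtractions and additions dominate the counts for extended-precision arithmetic.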

In [2]:
import GFlops: @count_ops, flop
using MultiFloats

In [3]:
flop_per_N = Int[]

for N = 2:8
    F = Float64x{N}
    stats = @count_ops F(1) * F(2) + F(3)
    push!(flop_per_N, flop(stats))
    println("N = ", N, ": ", stats, "\n", "─"^30)
end

N = 2: Flop Counter: 36 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ fma │       1 │
│ add │      13 │
│ sub │      18 │
│ mul │       3 │
└─────┴─────────┘
──────────────────────────────
N = 3: Flop Counter: 90 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ fma │       3 │
│ add │      32 │
│ sub │      46 │
│ mul │       6 │
└─────┴─────────┘
──────────────────────────────
N = 4: Flop Counter: 179 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ fma │       6 │
│ add │      63 │
│ sub │      94 │
│ mul │      10 │
└─────┴─────────┘
──────────────────────────────
N = 5: Flop Counter: 315 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ fma │      10 │
│ add │     110 │
│ sub │     170 │
│ mul │      15 │
└─────┴─────────┘
─────────────────────

In [4]:
flop_per_N

7-element Vector{Int64}:
   36
   90
  179
  315
  510
  776
 1125

## Fitting the data

Polynomials.jl is a useful package for fitting a polynomial through our data:

In [5]:
using Polynomials: fit

In [6]:
f = fit(2:8, flop_per_N)

So one multiply-add (`a * b + c`) on a MultiFloat costs about $2N^3$ flop, i.e. $O(N^3)$. It's quite expensive.
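The cubic growth can also be checked without a fitting library. For equally spaced `N`, a degree-3 polynomial has constant third finite differences, and the leading coefficient equals that constant divided by $3!$. The sketch below uses only Base Julia; the counts are copied from the output of `flop_per_N` above, and `cubic` is a hypothetical helper name.

```julia
# Flop counts for N = 2:8, copied from the notebook output above.
flop_counts = [36, 90, 179, 315, 510, 776, 1125]

# Third finite differences: constant if the data lie on a cubic.
d3 = diff(diff(diff(flop_counts)))
println(d3)

# Leading coefficient of the cubic = third difference / 3!
lead = d3[1] // factorial(3)
println(lead)

# Exact cubic through the data, using rational arithmetic to avoid rounding.
cubic(N) = 2 * N^3 - (N^2) // 2 + (37 * N) // 2 - 15
println(cubic.(2:8) == flop_counts)
```

The third differences are all 12, so the leading coefficient is exactly 2 and the seven counts lie on $2N^3 - \tfrac{1}{2}N^2 + \tfrac{37}{2}N - 15$. With Polynomials.jl, passing the degree explicitly as in `fit(2:8, flop_per_N, 3)` should recover the same coefficients up to rounding.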