# Generated functions 

**Generated functions** are a mechanism for literally **generating** functions: they perform **code generation** separately for each combination of argument types, and are useful when you want **complete control** over the resulting code.

They are used for parametric types where we need to generate high-performance code whose exact structure changes depending on an input parameter. Examples often involve implementing **unrolled** code for arrays of known size, as in the `StaticArrays.jl` package.
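To make "unrolled code for a known size" concrete, here is a minimal sketch that sums the elements of a tuple without a loop. The name `unrolled_sum` is hypothetical (it is not from `StaticArrays.jl`), and the sketch assumes the tuple has at least one element.

```julia
# Hypothetical example: sum a tuple of known length N with fully unrolled code.
@generated function unrolled_sum(t::NTuple{N,Any}) where {N}
    # N is an ordinary integer here, so we can build one term per element.
    terms = [:(t[$i]) for i in 1:N]
    # Splice the terms into a single call: +(t[1], t[2], ..., t[N])
    return :(+($(terms...)))
end

unrolled_sum((1, 2, 3))    # 6
unrolled_sum((1.5, 2.5))   # 4.0
```

For a 3-tuple the compiled body is effectively `t[1] + t[2] + t[3]`, with no loop counter or bounds on the iteration left to evaluate at runtime.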

Code is generated *on demand*: only the versions of a function that are actually called have code generated for them.

In principle, all of this could be done at runtime, when both the types and the values are known. Doing it at compile time, when only the types are known (but not the values), lets the computation happen once, when each version of the function is compiled, rather than on every call. This is what makes the performance optimizations possible.
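The distinction between types and values can be seen in a small sketch, modelled on the kind of example used in the Julia manual. The name `square_or_identity` is hypothetical. Inside the body of a generated function the argument names are bound to the argument *types*; inside the returned expression they refer to the runtime *values*.

```julia
@generated function square_or_identity(x)
    # Here `x` is a type (e.g. Int64 or String), not a value.
    if x <: Integer
        # In the quoted expression, `x` is the runtime value.
        return :(x ^ 2)
    else
        return :(x)
    end
end

square_or_identity(4)       # 16
square_or_identity("abc")   # "abc"
```

The `if` is decided when the method is compiled for a given argument type, so the compiled code for `Int64` contains only `x ^ 2` and no branch.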

For further discussion, see https://discourse.julialang.org/t/understanding-generated-functions/10092/4

Steven Johnson keynote: "Adventures in Code Generation". https://www.youtube.com/watch?v=mSgXWpvQEHE

## Example: Multiplying polynomials

This example is adapted from the `StaticUnivariatePolynomials.jl` package (https://github.com/tkoolen/StaticUnivariatePolynomials.jl); thanks to Twan Koolen.

Suppose we have a polynomial type parametrised by its number of coefficients `N`, so that a `Poly{N}` has degree at most `N - 1`. For simplicity we will assume that the coefficients are integers. We follow the convention in `Polynomials.jl` that the coefficients are listed in increasing order of degree.

In [1]:
struct Poly{N}
    coeffs::NTuple{N,Int64}
end

In [2]:
p = Poly((1, 2, 3))

Poly{3}((1, 2, 3))

This represents the polynomial $1 + 2x + 3x^2$.
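To check the coefficient convention, the polynomial can be evaluated at a point. The following is a small sketch using Horner's method; the function `evaluate` is a hypothetical helper that is not part of the original example, and the `struct` is repeated only so that the block runs on its own.

```julia
struct Poly{N}
    coeffs::NTuple{N,Int64}
end

# Hypothetical helper: Horner evaluation, coefficients in increasing order.
# foldr processes the highest-degree coefficient first:
#   c0 + x * (c1 + x * (c2 + ...))
evaluate(p::Poly, x) = foldr((c, acc) -> c + x * acc, p.coeffs; init = zero(x))

p = Poly((1, 2, 3))
evaluate(p, 2)   # 1 + 2*2 + 3*2^2 = 17
```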

Let's also define

In [5]:
Base.getindex(p::Poly, ix) = p.coeffs[ix + 1]

so that `p[i]` is the coefficient of degree `i`, with indices starting at 0:

In [29]:
p[0], p[1], p[2]

(1, 2, 3)

For example, `p[0]` is the coefficient of degree 0.

We can define the sum of two polynomials of the same degree as

In [7]:
Base.:+(p::Poly{N}, q::Poly{N}) where {N} = Poly(p.coeffs .+ q.coeffs)

In [8]:
@time p + p

  0.013159 seconds (81.34 k allocations: 5.225 MiB, 99.52% compilation time)


Poly{3}((2, 4, 6))

In [10]:
# import Pkg; Pkg.add("BenchmarkTools")

In [11]:
using BenchmarkTools

In [12]:
@btime $(Ref(p))[] + $(Ref(p))[]

  1.304 ns (0 allocations: 0 bytes)


Poly{3}((2, 4, 6))

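But what about multiplication? The full product of two polynomials with `N` coefficients has up to `2N - 1` coefficients, so suppose we want the product to retain only the terms up to degree `N - 1`, so that the result is again a `Poly{N}`. Then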

In [13]:
function Base.:*(p::Poly{N}, q::Poly{N}) where {N}
    return @inbounds Poly(ntuple(n -> sum(p[i] * q[n-1 - i] for i in 0:n-1), Val(N)))
end

In [14]:
p * p

Poly{3}((1, 4, 10))
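This result can be checked by hand. Writing $p_i$ and $q_i$ for the coefficients of degree $i$, the truncated product has coefficients

$$
c_k = \sum_{i=0}^{k} p_i \, q_{k-i}, \qquad k = 0, \dots, N-1 .
$$

For $p = q = 1 + 2x + 3x^2$ this gives

$$
c_0 = 1 \cdot 1 = 1, \qquad
c_1 = 1 \cdot 2 + 2 \cdot 1 = 4, \qquad
c_2 = 1 \cdot 3 + 2 \cdot 2 + 3 \cdot 1 = 10 .
$$

The full product is $1 + 4x + 10x^2 + 12x^3 + 9x^4$; the terms of degree 3 and 4 are the ones being dropped.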

In [15]:
@btime $p * $p

  6.687 ns (0 allocations: 0 bytes)


Poly{3}((1, 4, 10))

However, it should be possible to make this faster by **unrolling the code**, i.e. writing the loops out explicitly:

In [19]:
function my_mult(p::Poly{3}, q::Poly{3})
    @inbounds (p[0]*q[0], 
        p[0]*q[1] + p[1]*q[0],
        p[0]*q[2] + p[1]*q[1] + p[2]*q[0])
end

my_mult (generic function with 1 method)

In [20]:
my_mult(p, p)

(1, 4, 10)

In [21]:
@btime my_mult($(Ref(p))[], $(Ref(p))[])

  2.069 ns (0 allocations: 0 bytes)


(1, 4, 10)

Of course, we do *not* want to do this by hand. Rather, we want to tell Julia *exactly* what code to generate. As usual, we build this up piece by piece.

In [31]:
all_results = []
for n = 0:2
    products = [:(p[$i] * q[$(n - i)]) for i ∈ 0:n]
    push!(all_results, :(+($(products...))) )
end

In [32]:
all_results

3-element Vector{Any}:
 :(+(p[0] * q[0]))
 :(p[0] * q[1] + p[1] * q[0])
 :(p[0] * q[2] + p[1] * q[1] + p[2] * q[0])

In [33]:
all_results2 = [
    :(+($([:(p[$i] * q[$(n - i)]) for i ∈ 0:n]...))) for n ∈ 0:2
]

3-element Vector{Expr}:
 :(+(p[0] * q[0]))
 :(p[0] * q[1] + p[1] * q[0])
 :(p[0] * q[2] + p[1] * q[1] + p[2] * q[0])

In [36]:
# Side note: unary + is the identity, so the single-term expression +(p[0] * q[0]) is valid
@assert +(0) == 0

In [24]:
:(tuple($(all_results...)))

:(tuple(+(p[0] * q[0]), p[0] * q[1] + p[1] * q[0], p[0] * q[2] + p[1] * q[1] + p[2] * q[0]))

In [34]:
:(tuple($(all_results2...)))

:(tuple(+(p[0] * q[0]), p[0] * q[1] + p[1] * q[0], p[0] * q[2] + p[1] * q[1] + p[2] * q[0]))
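Before moving the expression into a generated function, one way to check it is to splice it into an ordinary function definition with `@eval` and call the result. This is only a sketch: `check_mult` is a hypothetical name, the definitions of `Poly` and `getindex` are repeated so that the block runs on its own, and it uses the lowercase `tuple` function, which accepts several arguments.

```julia
struct Poly{N}
    coeffs::NTuple{N,Int64}
end
Base.getindex(p::Poly, ix) = p.coeffs[ix + 1]

# One expression per coefficient of the truncated product, for N = 3.
terms = [:(+($([:(p[$i] * q[$(n - i)]) for i in 0:n]...))) for n in 0:2]
body = :(tuple($(terms...)))

# Define check_mult(p, q) with the hand-built expression as its body.
@eval check_mult(p, q) = $body

p = Poly((1, 2, 3))
check_mult(p, p)   # (1, 4, 10)
```

The result agrees with the hand-unrolled version, which suggests the expression has the intended structure.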

In [25]:
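@generated function Base.:*(p::Poly{N}, q::Poly{N}) where {N}
    # This body runs at compile time. Here `p` and `q` are bound to the
    # argument types, and `N` is an ordinary integer.
    all_results = []
    for n ∈ 0:N-1
        # Terms contributing to the coefficient of degree n
        products = [:(p[$i] * q[$(n - i)]) for i ∈ 0:n]
        push!(all_results, :(+($(products...))))
    end
    tup = :(tuple($(all_results...)))
    # In the returned expression, `p` and `q` refer to the runtime values.
    return :(Poly($tup))
end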

In [26]:
p * p

Poly{3}((1, 4, 10))

In [27]:
@btime $(Ref(p))[] * $(Ref(p))[]

  2.075 ns (0 allocations: 0 bytes)


Poly{3}((1, 4, 10))
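A possible refinement, sketched here as a design suggestion rather than as part of the original example, is to move the expression-building code into an ordinary function that takes `N` and returns an `Expr`. The generated function then only calls that helper, and the generated code can be printed and tested for any `N` without compiling anything. The names `truncated_product_expr` and `truncated_mul` are hypothetical, and a separate function name is used so that the block does not overwrite `Base.:*`.

```julia
struct Poly{N}
    coeffs::NTuple{N,Int64}
end
Base.getindex(p::Poly, ix) = p.coeffs[ix + 1]

# Hypothetical helper: build the unrolled expression for N coefficients.
# It must be defined before the generated function that uses it.
function truncated_product_expr(N)
    rows = [:(+($([:(p[$i] * q[$(n - i)]) for i in 0:n]...))) for n in 0:N-1]
    return :(Poly(tuple($(rows...))))
end

@generated function truncated_mul(p::Poly{N}, q::Poly{N}) where {N}
    return truncated_product_expr(N)
end

truncated_product_expr(2)                          # inspect the code for N = 2
truncated_mul(Poly((1, 2, 3)), Poly((1, 2, 3)))    # Poly{3}((1, 4, 10))
```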