Specializing compiled code on argument types makes it faster, but the price is increased compilation time. The following example shows an extreme case.

In [1]:
f(x) = x^2 + 3x + 2
g(::Val{x}) where x = f(x)
f(10^6), g(Val(10^6))

(1000003000002, 1000003000002)

In [2]:
using BenchmarkTools
@btime f(10^6)
@btime g($(Val(10^6)));

  3.500 ns (0 allocations: 0 bytes)
  0.001 ns (0 allocations: 0 bytes)


`g(Val(10^6))` is extremely fast because the method is specialized on the argument type `Val{10^6}` and compiles down to `return 1000003000002`, so no arithmetic is left to do at run time.
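As a minimal sketch of why this is possible: `Val(10^6)` moves the number into the type parameter, so the compiler knows the value from the type alone, and two different numbers give two different types. The variable name `v` is introduced here only for illustration.

```julia
f(x) = x^2 + 3x + 2
g(::Val{x}) where x = f(x)

v = Val(10^6)

# The value is part of the type, not a field of the object.
println(typeof(v) == Val{1000000})

# Different values produce different types, hence different specializations of g.
println(typeof(Val(1)) == typeof(Val(2)))

# The result agrees with the ordinary call.
println(g(v) == f(10^6))
```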

In [3]:
@code_typed debuginfo=:none g(Val(10^6))

CodeInfo(
1 ─     return 1000003000002
) => Int64

However, `g(Val(k))` is compiled separately for each distinct `k`. If you call `g(Val(k))` for a large number of different `k` values, Julia performs just as many compilations, and the first pass is very slow. (Once a given `Val{k}` has been compiled, subsequent calls with it are extremely fast.)

In [4]:
F(n) = [f(k) for k in 1:n]
G(n) = [g(Val(k)) for k in 1:n]

@time F(10^4)
@time G(10^4);

  0.000010 seconds (2 allocations: 78.203 KiB)
  5.160295 seconds (55.43 M allocations: 3.894 GiB, 14.56% gc time, 95.65% compilation time)


The first execution of `G(10^4)` is very slow: about 96% of the 5.16 seconds is compilation time.
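To see that the cost is paid only once per `Val{k}`, one can time two consecutive runs in a fresh session. This is a rough sketch: `n`, `t_first`, and `t_second` are names chosen here, `n` is kept small so the cell finishes quickly, and the actual timings depend on the machine and Julia version.

```julia
f(x) = x^2 + 3x + 2
g(::Val{x}) where x = f(x)
G(n) = [g(Val(k)) for k in 1:n]

n = 200  # small on purpose; each new k triggers one compilation of g

t_first  = @elapsed G(n)  # includes compiling g for Val{1}, ..., Val{n}
t_second = @elapsed G(n)  # reuses the specializations compiled above

println("first run:  ", t_first, " s")
println("second run: ", t_second, " s")
```

The second run still dispatches dynamically on each `Val{k}`, so it is not as cheap as a plain call to `f`, but it no longer pays for compilation.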

In [5]:
F(10^4) == G(10^4)

true

Thus, it is not reasonable to optimize by specializing on a large number of different `Val{k}` types.

On the other hand, native code specialized on argument types is very fast, so when compilation time is not an issue, specialization on argument types should be used aggressively.
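A case where this usually pays off is a `Val` parameter that takes only a handful of distinct values, such as a small tuple length. The sketch below uses Base's `ntuple(f, Val(N))`; the helper name `squares` is hypothetical and not part of the example above.

```julia
# One specialization is compiled per distinct N, which is cheap when N takes few values.
squares(::Val{N}) where N = ntuple(i -> i^2, Val(N))

t = squares(Val(3))
println(t)          # (1, 4, 9)
println(typeof(t))  # the tuple length is known from the type
```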

In short, it is a trade-off between run-time speed and compilation time.