In [2]:
#| include: false
using Pkg
Pkg.activate(@__DIR__)
Pkg.instantiate()
cd(@__DIR__)

[32m[1m  Activating[22m[39m project at `~/gitrepos/kdheepak.github.io/blog/effect-of-type-inference-on-performance-in-julia`


In Julia, to ensure that the code you write executes fast and efficiently, it is important to benchmark frequently. There are many useful tips in the [Performance Tips] section of the official documentation.

[Performance Tips]: https://docs.julialang.org/en/v1/manual/performance-tips/

In this blog post, I want to touch on one specific performance tip: containers with abstract types and type inference.

# Toy problem

Let's define a toy problem to work with.

In [4]:
abstract type Shape end
area(::Shape) = 0.0

@kwdef struct Square <: Shape
    side::Float64 = rand()
end
area(s::Square) = s.side * s.side
    
@kwdef struct Rectangle <: Shape
    width::Float64 = rand()
    height::Float64 = rand()
end
area(r::Rectangle) = r.width * r.height
    
@kwdef struct Triangle <: Shape
    base::Float64 = rand()
    height::Float64 = rand()
end
area(t::Triangle) = 1.0/2.0 * t.base * t.height

@kwdef struct Circle <: Shape
    radius::Float64 = rand()
end
area(c::Circle) = π * c.radius^2

nothing #| hide_line

area (generic function with 5 methods)

We can use the `Test` standard library to check that the code we wrote is correct.

In [5]:
using Test
@testset "Areas" begin
    @test area(Square(2)) == 4
    @test area(Rectangle(2,3)) == 6
    @test area(Triangle(2,3)) == 3
    @test area(Circle(2)) ≈ 4π
end;

[0m[1mTest Summary: | [22m[32m[1mPass  [22m[39m[36m[1mTotal  [22m[39m[0m[1mTime[22m
Areas         | [32m   4  [39m[36m    4  [39m[0m0.1s


Let's also build 1 million random shapes.

In [43]:
using Random
Random.seed!(42)

count = 1_000_000

shapes = [rand((Square,Rectangle,Triangle,Circle))() for _ in 1:count];

In [7]:
#| echo: false
using Format
using Markdown
l = cfmt("%'d", length(shapes))
Markdown.md"The total number of shapes we have is $l."

The total number of shapes we have is 1,000,000.


# Type inference

We can use the `typeof` function to see what the type of the data in the `shapes` variable is:

In [8]:
typeof(shapes)

Vector{Shape}[90m (alias for [39m[90mArray{Shape, 1}[39m[90m)[39m

By default, Julia picks the most specific type in the type tree that fits all the data in the container.
For example, if we build a vector whose elements all have the same type (e.g. `Square`), Julia will infer the container to be `Vector{Square}`.

In [9]:
typeof([rand((Square,))() for _ in 1:10])

Vector{Square}[90m (alias for [39m[90mArray{Square, 1}[39m[90m)[39m

In [10]:
println(join(string.(supertypes(Square)), " <: "))

Square <: Shape <: Any
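
To see where the element type comes from, here is a minimal sketch using hypothetical `Animal`, `Dog`, and `Cat` types (they are not part of the toy problem above). `typejoin` returns the closest common ancestor of its arguments in the type tree, which is what an array literal falls back to when no promotion rules are defined for the element types.

```julia
# Hypothetical types, for illustration only
abstract type Animal end
struct Dog <: Animal end
struct Cat <: Animal end

# Closest common ancestor in the type hierarchy
common = typejoin(Dog, Cat)

# A homogeneous literal keeps the concrete element type,
# while a mixed one is widened to the common supertype
same_kind = [Dog(), Dog()]
mixed = [Dog(), Cat()]

@show common
@show eltype(same_kind)
@show eltype(mixed)
```

The mixed vector ends up with the abstract element type `Animal`, which is the same situation as `Vector{Shape}` in this post.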


Let's define a function that calculates the `area` of every shape and adds them all up:

In [11]:
main1(shapes) = sum(area, shapes)

main1 (generic function with 1 method)

We can test this function and compile it by running it once; note that the first call includes compilation time.

In [12]:
@time main1(shapes)

  0.081926 seconds (2.03 M allocations: 32.311 MiB, 11.31% gc time, 31.45% compilation time)


439078.977716569
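
The compilation share in the `@time` output comes from Julia compiling a specialized method the first time a function is called with a new combination of argument types. A minimal sketch with a hypothetical `sum_of_squares` function; timings vary by machine, so none are shown:

```julia
# Hypothetical function, for illustration only
sum_of_squares(xs) = sum(x -> x^2, xs)

xs = rand(1_000)

# First call with a Vector{Float64}: includes compilation time
t_first = @elapsed sum_of_squares(xs)

# Second call with the same argument types: reuses the compiled code
t_second = @elapsed sum_of_squares(xs)

@show t_first t_second
```

This is why it helps to run a function once before timing it, or to use `BenchmarkTools`, which takes many samples.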

Unfortunately, it can be easy to accidentally construct a container with an abstract type for the type parameter of a generic type.

In [13]:
bad_shapes_by_type(::Type{T}, shapes) where T = filter(s -> isa(s, T), shapes)

shape_arr1 = bad_shapes_by_type(Square, shapes)
shape_arr2 = bad_shapes_by_type(Rectangle, shapes)
shape_arr3 = bad_shapes_by_type(Triangle, shapes)
shape_arr4 = bad_shapes_by_type(Circle, shapes)

@show typeof(shape_arr1)
@show typeof(shape_arr2)
@show typeof(shape_arr3)
@show typeof(shape_arr4)
nothing

typeof(shape_arr1) = Vector{Shape}
typeof(shape_arr2) = Vector{Shape}
typeof(shape_arr3) = Vector{Shape}
typeof(shape_arr4) = Vector{Shape}


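Here, `filter` returns an array with the same element type as its input, so the result is still a `Vector{Shape}` even though every element has the same concrete type. More generally, this can happen whenever the Julia compiler cannot infer a concrete element type at "compile time".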
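
One way to catch this early is to inspect the element type of the container rather than its elements. The sketch below uses hypothetical `Fruit`, `Apple`, and `Pear` types and a hypothetical helper `has_concrete_eltype`; it illustrates that filtering keeps the abstract element type even when every surviving element is an `Apple`.

```julia
# Hypothetical types, for illustration only
abstract type Fruit end
struct Apple <: Fruit end
struct Pear <: Fruit end

# Hypothetical helper: true if the container's element type is concrete
has_concrete_eltype(v::AbstractArray) = isconcretetype(eltype(v))

basket = Fruit[Apple(), Pear(), Apple()]
apples = filter(f -> f isa Apple, basket)

@show eltype(apples)                 # still the abstract type
@show has_concrete_eltype(apples)
@show all(f -> f isa Apple, apples)  # yet every element is an Apple
```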

For better performance, it helps to have concrete types in the generic parameters for a container. 
We can do this by helping the compiler figure out the correct concrete type parameter by explicitly listing it before the brackets for constructing the array, i.e. `T[...]`.

In [21]:
good_shapes_by_type(::Type{T}, shapes) where T = T[shape for shape in filter(s -> isa(s, T), shapes)]

good_shapes_by_type (generic function with 1 method)

In [22]:
square_arr = good_shapes_by_type(Square, shapes)
rectangle_arr = good_shapes_by_type(Rectangle, shapes)
triangle_arr = good_shapes_by_type(Triangle, shapes)
circle_arr = good_shapes_by_type(Circle, shapes)

@show typeof(square_arr)
@show typeof(rectangle_arr)
@show typeof(triangle_arr)
@show typeof(circle_arr)
nothing

typeof(square_arr) = Vector{Square}
typeof(rectangle_arr) = Vector{Rectangle}
typeof(triangle_arr) = Vector{Triangle}
typeof(circle_arr) = Vector{Circle}
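
As a variation on the same idea, a typed comprehension with an `if` clause can do the filtering and the narrowing in one pass, without building an intermediate array of the abstract type. This is a sketch with a hypothetical `items_by_type` helper and stand-in types `Item`, `Small`, and `Large`; it is not the code the benchmarks in this post were run with.

```julia
# Hypothetical stand-in types, for illustration only
abstract type Item end
struct Small <: Item
    x::Float64
end
struct Large <: Item
    x::Float64
end

# Hypothetical helper: the `T[...]` prefix fixes the element type,
# the `if` clause keeps only the elements of that type
items_by_type(::Type{T}, items) where {T} = T[i for i in items if i isa T]

items = Item[Small(1.0), Large(2.0), Small(3.0)]
smalls = items_by_type(Small, items)

@show typeof(smalls)
@show length(smalls)
```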


# Benchmarks

Let's combine these arrays into three vectors of different types:

In [23]:
sorted_shapes_shape = vcat(square_arr, rectangle_arr, triangle_arr, circle_arr);
sorted_shapes_any = Any[s for s in sorted_shapes_shape];
sorted_shapes_union = Union{Square, Rectangle, Triangle, Circle}[s for s in sorted_shapes_shape];

@show typeof(sorted_shapes_shape)
@show typeof(sorted_shapes_any)
@show typeof(sorted_shapes_union);

typeof(sorted_shapes_shape) = Vector{Shape}
typeof(sorted_shapes_any) = Vector{Any}
typeof(sorted_shapes_union) = Vector{Union{Circle, Rectangle, Square, Triangle}}


We can benchmark the performance of these different types using `BenchmarkTools`:

In [33]:
using BenchmarkTools

@show typeof(shapes)
@benchmark main1($shapes)

typeof(shapes) = Vector{Shape}


BenchmarkTools.Trial: 121 samples with 1 evaluation.
 Range (min … max):  39.196 ms … 48.168 ms  ┊ GC (min … max): 0.00% … 8.63%
 Time  (median):     40.279 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   41.548 ms ±  2.268 ms  ┊ GC (mean ± σ):  3.35% ± 4.24%

Both `Vector{Shape}` and `Vector{Any}` have abstract element types, so both can be inefficient, and they usually perform similarly to each other.

The Julia manual has the following to say:

> If you cannot avoid containers with abstract value types, it is sometimes better to parametrize with `Any` to avoid runtime type checking. E.g. `IdDict{Any, Any}` performs better than `IdDict{Type, Vector}`

In [35]:
@show typeof(sorted_shapes_shape)
@benchmark main1($sorted_shapes_shape)

typeof(sorted_shapes_shape) = Vector{Shape}


BenchmarkTools.Trial: 151 samples with 1 evaluation.
 Range (min … max):  31.052 ms … 37.758 ms  ┊ GC (min … max): 0.00% … 10.25%
 Time  (median):     31.946 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   33.165 ms ±  2.176 ms  ┊ GC (mean ± σ):  4.09% ±  5.15%

In [36]:
@show typeof(sorted_shapes_any)
@benchmark main1($sorted_shapes_any)

typeof(sorted_shapes_any) = Vector{Any}


BenchmarkTools.Trial: 151 samples with 1 evaluation.
 Range (min … max):  31.087 ms … 37.602 ms  ┊ GC (min … max): 0.00% … 10.10%
 Time  (median):     32.221 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   33.200 ms ±  2.068 ms  ┊ GC (mean ± σ):  3.74% ±  4.73%

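However, what is interesting is that `Union{Circle, Rectangle, Square, Triangle}` is not a concrete type either, yet Julia can optimize small unions of concrete types: the compiler can split the dispatch over the few possible types, and arrays of such unions can store their elements inline.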

In [37]:
@show isconcretetype(Shape)
@show isconcretetype(Union{Circle, Rectangle, Square, Triangle});

isconcretetype(Shape) = false
isconcretetype(Union{Circle, Rectangle, Square, Triangle}) = false
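
Even though the `Union` itself is not concrete, each of its members is a concrete `isbits` type, which is the precondition for Julia's small-union optimizations. A minimal sketch with hypothetical stand-in types `P` and `Q` and a hypothetical function `f`:

```julia
# Hypothetical stand-in types, for illustration only
struct P
    a::Float64
end
struct Q
    a::Float64
    b::Float64
end

PQ = Union{P, Q}

@show isconcretetype(PQ)            # the union is not concrete...
@show all(isconcretetype, (P, Q))   # ...but every member is
@show all(isbitstype, (P, Q))       # and every member is plain data

# One method per member; with only a few members the compiler
# can branch on the element's type instead of dispatching dynamically
f(x::P) = x.a
f(x::Q) = x.a * x.b

v = PQ[P(2.0), Q(2.0, 3.0)]
total = sum(f, v)
@show total
```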


You can see the difference show up clearly in the performance benchmarks.

In [38]:
#| echo: false
using Format
f = cfmt("%'d", 33.877 * 1e3 / 939)
Markdown.md"`Union{Circle, Rectangle, Square, Triangle}` is roughly $f times faster than `Shape`."

`Union{Circle, Rectangle, Square, Triangle}` is roughly 36 times faster than `Shape`.


In [39]:
@show typeof(sorted_shapes_union)
@benchmark main1($sorted_shapes_union)

typeof(sorted_shapes_union) = Vector{Union{Circle, Rectangle, Square, Triangle}}


BenchmarkTools.Trial: 5292 samples with 1 evaluation.
 Range (min … max):  928.209 μs …  1.122 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     934.333 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   941.559 μs ± 13.085 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

It's possible to get even better performance by calculating the `sum` for each of the individual arrays and then adding the results together:

In [40]:
main2(arrs...) = sum(main1, arrs)

main2 (generic function with 1 method)

In [41]:
@time main2(square_arr, rectangle_arr, triangle_arr, circle_arr);

  0.002709 seconds (406 allocations: 27.609 KiB, 85.41% compilation time)


In [42]:
@benchmark main2($square_arr, $rectangle_arr, $triangle_arr, $circle_arr)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  276.792 μs … 346.958 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     278.917 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   279.432 μs ±   2.322 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

# Conclusion

The key takeaway is that if you care about performance in Julia, you have to be mindful of types! Keeping types as concrete as possible is important because when type inference fails, the uncertainty can propagate through your program. Even small changes to your code can improve performance significantly.

Many thanks to the helpful [Julia community on Discourse](https://discourse.julialang.org/t/unusual-non-deterministic-benchmark-results/113273/) for always offering insightful comments and advice.