# Lecture 2: Type Stability

## Defeating type inference: Type instabilities

To get good performance, there are some fairly simple rules that you need to follow in Julia code to avoid defeating the compiler's type inference.   See also the [performance tips section of the Julia manual](http://docs.julialang.org/en/stable/manual/performance-tips/).

Three of the most important are:

* Don't use (non-constant) global variables in critical code — put your critical code into a function (this is good advice anyway, from a software-engineering standpoint).  The compiler assumes that a **global variable can change type at any time**, so it is always stored in a "box", and "taints" anything that depends on it.

* Local variables should be "type-stable": **don't change the type of a variable inside a function**.  Use a new variable instead.

* Functions should be "type-stable": **a function's return type should only depend on the argument types, not on the argument values**.

To diagnose all of these problems, the `@code_warntype` macro that we used above is your friend.  If it labels any variables (or the function's return value) as `Any` or `Union{...}`, it means that the compiler couldn't figure out a precise type.
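As a minimal sketch of the second and third rules (the function names `unstable_halfsum` and `stable_halfsum` are hypothetical, and a recent Julia 1.x is assumed), the two functions below differ only in how the accumulator is initialized. In the first, `s` starts as an `Int` and becomes a `Float64` inside the loop, so the return type depends on the value of `n`:

```julia
# Hypothetical helper names; not part of the lecture's code.

# Type-unstable: `s` starts as an Int and becomes a Float64 inside the loop.
function unstable_halfsum(n)
    s = 0
    for i in 1:n
        s += i / 2
    end
    return s
end

# Type-stable: `s` is a Float64 from the start and never changes type.
function stable_halfsum(n)
    s = 0.0
    for i in 1:n
        s += i / 2
    end
    return s
end

# The unstable version returns an Int when the loop never runs:
typeof(unstable_halfsum(0)), typeof(unstable_halfsum(4))
# The stable version returns a Float64 either way:
typeof(stable_halfsum(0)), typeof(stable_halfsum(4))
```

Running `@code_warntype unstable_halfsum(4)` should flag `s` and the return value as a `Union`, while the stable version should show only `Float64`.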

The third point, type-stability of functions, leads to lots of important but subtle choices in library design.  For example, consider the (built-in) `sqrt(x)` function, which computes $\sqrt{x}$:

In [49]:
sqrt(4)

2.0

You might think that `sqrt(-1)` should return $i$ (or `im`, in Julia syntax).  (Matlab's `sqrt` function does this.)  Instead, we get:

In [50]:
sqrt(-1)

LoadError: DomainError:
sqrt will only return a complex result if called with a complex argument. Try sqrt(complex(x)).

In [51]:
sqrt(-1 + 0im)

0.0 + 1.0im

Why did Julia implement `sqrt` in this silly way, throwing an error for negative arguments unless you add a zero imaginary part?  Any reasonable person wants an imaginary result from `sqrt(-1)`, surely?

The problem is that defining `sqrt` to return an imaginary result from `sqrt(-1)` would **not be type stable**: `sqrt(x)` would return a real result for non-negative real `x`, and a complex result for negative real `x`, so the **return type would depend on the value of `x`** and **not just its type.**

That would defeat type inference, not just for the `sqrt` function, but for **anything the sqrt function touches**.  Unless the compiler can somehow figure out `x ≥ 0`, it will have to either store the result in a "box" or compile two branches of the result.  Let's see how that works by defining our own square-root function:

In [52]:
mysqrt(x::Complex) = sqrt(x)
mysqrt(x::Real) = x < 0 ? sqrt(complex(x)) : sqrt(x)

mysqrt (generic function with 2 methods)

This definition is an example of Julia's [multiple dispatch style](http://docs.julialang.org/en/stable/manual/methods/), which in some sense is a generalization of object-oriented programming but focuses on "verbs" (functions) rather than nouns.  We will discuss this more in a later lecture.

The `::Complex` and `::Real` are argument-type declarations.  Such declarations are **not related to performance**, but instead **act as a "filter"** to allow us to have one version of `mysqrt` for complex arguments and another for real arguments.
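A small sketch of this "filter" role, using a hypothetical function `kind` that is not part of the lecture: Julia picks the most specific method whose declared argument type matches the call.

```julia
# Hypothetical function `kind`, for illustration only.
kind(x::Real) = "real"
kind(x::Complex) = "complex"
kind(x) = "something else"   # fallback for every other argument type

# Each call is routed to the most specific matching method:
kind(3), kind(2.5), kind(1 + 2im), kind("hello")
```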

In [53]:
mysqrt(2)

1.4142135623730951

In [54]:
mysqrt(-2)

0.0 + 1.4142135623730951im

In [55]:
mysqrt(-2+0im)

0.0 + 1.4142135623730951im

Looks great, right?  But let's see what happens to type inference in a function that calls `mysqrt` instead of `sqrt`:

In [56]:
slowfun(x) = mysqrt(x) + 1
@code_warntype slowfun(2)

Variables:
  #self#::#slowfun
  x::Int64
  #temp#@_3::Union{Complex{Float64}, Float64}
  #temp#@_4::Core.MethodInstance
  #temp#@_5::Union{Complex{Float64}, Float64}

Body:
  begin 
      $(Expr(:inbounds, false))
      # meta: location In[52] mysqrt 2
      unless (Base.slt_int)(x::Int64, 0)::Bool goto 6
      #temp#@_3::Union{Complex{Float64}, Float64} = $(Expr(:invoke, MethodInstance for sqrt(::Complex{Float64}), :(Base.sqrt), :($(Expr(:new, Complex{Float64}, :((Base.sitofp)(Float64, x)::Float64), :((Base.sitofp)(Float64, 0)::Float64))))))
      goto 8
      6: 
      #temp#@_3::Union{Complex{Float64}, Float64} = (Base.Math.sqrt_llvm)((Base.sitofp)(Float64, x::Int64)::Float64)::Float64
      8: 
      # meta: pop location
      $(Expr(:inbounds, :pop))
      unless (#temp#@_3::Union{Complex{Float64}, Float64} isa Complex{Float64})::Bool goto 14
      #temp#@_4::Core.MethodInstance = Method

Because the compiler **doesn't know at compile-time that `x` is non-negative** (at compile-time it **uses only types, not values**), it doesn't know whether the result is real (`Float64`) or complex (`Complex{Float64}`) and has to store it in a "box".  This kills performance.
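For comparison, here is a sketch of one type-stable alternative, under the assumption that a complex result is acceptable for every real input. The names `unstable_sqrt` and `stable_sqrt` are hypothetical, and this is not how the built-in `sqrt` is defined:

```julia
# Hypothetical names; a sketch of the trade-off, not how Base defines sqrt.

# Return type depends on the *value* of x (real or complex result):
unstable_sqrt(x::Real) = x < 0 ? sqrt(complex(x)) : sqrt(x)

# Return type depends only on the *type* of x (always a complex result):
stable_sqrt(x::Real) = sqrt(complex(float(x)))

typeof(unstable_sqrt(4)), typeof(unstable_sqrt(-4))   # two different types
typeof(stable_sqrt(4)), typeof(stable_sqrt(-4))       # the same type twice

# Return types the compiler infers from the argument type Int alone:
Base.return_types(unstable_sqrt, (Int,))
Base.return_types(stable_sqrt, (Int,))
```

The price of the stable version is that callers always get a complex number, even for positive inputs, which is the kind of library-design trade-off described above.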

## Defining our own types

Let's define our own type to represent a **"point" in two dimensions**.  Each point will have an $(x,y)$ location.  So that we can use the points with our `sum` functions above, we'll also define `+` and `zero` functions to do the obvious **vector addition**.

One such definition in Julia is:

In [57]:
mutable struct Point1
    x
    y
end
Base.:+(p::Point1, q::Point1) = Point1(p.x + q.x, p.y + q.y)
Base.zero(::Type{Point1}) = Point1(0,0)

Point1(3,4)

Point1(3, 4)

In [58]:
Point1(3,4) + Point1(5,6)

Point1(8, 10)

Our type is very generic, and can hold any type of `x` and `y` values:

In [59]:
Point1(3.7, 4+5im)

Point1(3.7, 4 + 5im)

Perhaps too generic:

In [60]:
Point1("x", [3,4,5])

Point1("x", [3, 4, 5])

Since `x` and `y` can be *anything*, they must be **pointers to "boxes"**.  This is **bad news for performance**.
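One way to see this, sketched with a hypothetical `LoosePoint` type that mirrors the untyped definition (a recent Julia 1.x is assumed for `fieldtypes`): fields declared without a type are implicitly of the abstract type `Any`.

```julia
# Hypothetical type name, mirroring the lecture's untyped point.
mutable struct LoosePoint
    x
    y
end

# Undeclared fields default to the abstract type Any:
fieldtypes(LoosePoint)

# Any is not a concrete type, so the compiler cannot assume a fixed layout:
isconcretetype(Any), isconcretetype(Float64)

# Nothing prevents a field from changing type later:
lp = LoosePoint(1, 2)
lp.x = "now a string"
```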

A `mutable struct` is *mutable*, which means we can create a `Point1` object and then change `x` or `y`:

In [61]:
p = Point1(3,4)
p.x = 7
p

Point1(7, 4)

This means that every reference to a `Point1` object must be a *pointer* to an object stored elsewhere in memory, because *how else would we "know" when an object changes?*  Furthermore, an **array of `Point1` objects must be an array of pointers** (which is **bad news for performance** again):

In [62]:
P = [p,p,p]

3-element Array{Point1,1}:
 Point1(7, 4)
 Point1(7, 4)
 Point1(7, 4)

In [63]:
p.y = 8
P

3-element Array{Point1,1}:
 Point1(7, 8)
 Point1(7, 8)
 Point1(7, 8)

Let's test this out by creating an array of `Point1` objects and summing it.  Ideally, this would be about twice as slow as summing an equal-length array of numbers, since there are twice as many numbers to sum.  But because of all of the boxes and pointer-chasing, it should be far slower.

To create the array, we'll call the `Point1(x,y)` constructor with our array `a`, using Julia's ["dot-call" syntax](http://docs.julialang.org/en/stable/manual/functions/#dot-syntax-for-vectorizing-functions) that applies a function "element-wise" to arrays:

In [64]:
a1 = Point1.(a, a)

10000000-element Array{Point1,1}:
 Point1(0.106974, 0.106974)  
 Point1(0.188845, 0.188845)  
 Point1(0.484844, 0.484844)  
 Point1(0.885487, 0.885487)  
 Point1(0.33817, 0.33817)    
 Point1(0.663541, 0.663541)  
 Point1(0.906624, 0.906624)  
 Point1(0.601869, 0.601869)  
 Point1(0.788574, 0.788574)  
 Point1(0.75761, 0.75761)    
 Point1(0.225378, 0.225378)  
 Point1(0.234817, 0.234817)  
 Point1(0.403443, 0.403443)  
 ⋮                           
 Point1(0.410147, 0.410147)  
 Point1(0.0750395, 0.0750395)
 Point1(0.060309, 0.060309)  
 Point1(0.307107, 0.307107)  
 Point1(0.559202, 0.559202)  
 Point1(0.498393, 0.498393)  
 Point1(0.663843, 0.663843)  
 Point1(0.441784, 0.441784)  
 Point1(0.331159, 0.331159)  
 Point1(0.0522143, 0.0522143)
 Point1(0.380872, 0.380872)  
 Point1(0.985776, 0.985776)  

In [65]:
@btime sum($a1)

  584.454 ms (29999997 allocations: 610.35 MiB)


Point1(5.0003929583536135e6, 5.0003929583536135e6)

In [66]:
@btime mysum($a1)

  548.308 ms (30000001 allocations: 610.35 MiB)


Point1(5.000392958354606e6, 5.000392958354606e6)

The time is at least **50× slower** than we would like, but consistent with our other timing results on "boxed" values from last lecture.

### An imperfect solution: A concrete immutable type

We can avoid these two problems by:

* Declaring the types of `x` and `y` to be *concrete* types, so that they don't need to be pointers to boxes.
* Declaring our Point to be an [immutable](https://en.wikipedia.org/wiki/Immutable_object) type (`x` and `y` cannot change), so that Julia is not forced to make every reference to a Point into a pointer: just `struct`, not `mutable struct`:

In [67]:
struct Point2
    x::Float64
    y::Float64
end
Base.:+(p::Point2, q::Point2) = Point2(p.x + q.x, p.y + q.y)
Base.zero(::Type{Point2}) = Point2(0.0,0.0)

Point2(3,4)

Point2(3.0, 4.0)

In [68]:
Point2(3,4) + Point2(5,6)

Point2(8.0, 10.0)

In [69]:
p = Point2(3,4)
P = [p,p,p]

3-element Array{Point2,1}:
 Point2(3.0, 4.0)
 Point2(3.0, 4.0)
 Point2(3.0, 4.0)

In [70]:
p.x = 6 # gives an error since p is immutable

LoadError: type Point2 is immutable

If this is working as we hope, then summation should be much faster:

In [71]:
a2 = Point2.(a,a)
@btime sum($a2)

  12.728 ms (0 allocations: 0 bytes)


Point2(5.0003929583536135e6, 5.0003929583536135e6)

Now the time is **only about 13ms**, only slightly more than twice the cost of summing an array of individual numbers of the same length!

Unfortunately, we paid a big price for this performance: our `Point2` type only works with *a single numeric type* (`Float64`), much like a C implementation.
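A short sketch of what that restriction means in practice, using a hypothetical `FloatPoint` with the same field declarations: the constructor converts every argument to `Float64`, and arguments that cannot be converted exactly raise an error.

```julia
# Hypothetical stand-in with the same layout as the concrete point type.
struct FloatPoint
    x::Float64
    y::Float64
end

FloatPoint(3, 4)        # integers are converted to Float64
FloatPoint(1//3, 2)     # rationals are rounded to Float64, losing exactness

# A complex coordinate with nonzero imaginary part cannot be converted:
rejected = try
    FloatPoint(1 + 2im, 0)
    false
catch err
    err isa InexactError
end
```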

### The best of both worlds: Parameterized immutable types

How do we get a `Point` type that works for *any* type of `x` and `y`, but at the same time allows us to have an array of points that is concrete and homogeneous (every point in the array is forced to be the same type)?  At first glance, this seems like a contradiction in terms.

The answer is not to define a *single* type, but rather to **define a whole family of types** that are *parameterized* by the type `T` of `x` and `y`.  In computer science, this is known as [parametric polymorphism](https://en.wikipedia.org/wiki/Parametric_polymorphism).  (An example of this can be found in [C++ templates](https://en.wikipedia.org/wiki/Template_%28C%2B%2B%29).)

In Julia, we will define such a family of types as follows:

In [72]:
struct Point3{T<:Real}
    x::T
    y::T
end
Base.:+(p::Point3, q::Point3) = Point3(p.x + q.x, p.y + q.y)
Base.zero(::Type{Point3{T}}) where {T} = Point3(zero(T),zero(T))

Point3(3,4)

Point3{Int64}(3, 4)

Here, `Point3` is actually a family of subtypes `Point3{T}` for different types `T`.   The notation `<:` in Julia means "is a subtype of", and hence `T<:Real` means that we are constraining `T` to be a `Real` type (a built-in *abstract type* in Julia that includes e.g. integers or floating point).
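To make the "family of types" idea concrete, here is a sketch with a hypothetical `GPoint` declared the same way (a recent Julia 1.x is assumed). Each `GPoint{T}` is its own concrete type, while `GPoint` without a parameter is not concrete:

```julia
# Hypothetical stand-in declared like the parameterized point type.
struct GPoint{T<:Real}
    x::T
    y::T
end

gp = GPoint(3, 4)             # T is inferred from the arguments
gq = GPoint{Float64}(3, 4)    # T given explicitly; arguments are converted
typeof(gp), typeof(gq)

GPoint{Int} <: GPoint         # every GPoint{T} belongs to the family
isconcretetype(GPoint{Int}), isconcretetype(GPoint)

# The default constructor requires both coordinates to share one T:
mixed_fails = try
    GPoint(3, 4.5)
    false
catch err
    err isa MethodError
end
```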

In [73]:
Point3(3,4) + Point3(5.6, 7.8)

Point3{Float64}(8.6, 11.8)

Now, let's make an array:

In [74]:
a3 = Point3.(a,a)

10000000-element Array{Point3{Float64},1}:
 Point3{Float64}(0.106974, 0.106974)  
 Point3{Float64}(0.188845, 0.188845)  
 Point3{Float64}(0.484844, 0.484844)  
 Point3{Float64}(0.885487, 0.885487)  
 Point3{Float64}(0.33817, 0.33817)    
 Point3{Float64}(0.663541, 0.663541)  
 Point3{Float64}(0.906624, 0.906624)  
 Point3{Float64}(0.601869, 0.601869)  
 Point3{Float64}(0.788574, 0.788574)  
 Point3{Float64}(0.75761, 0.75761)    
 Point3{Float64}(0.225378, 0.225378)  
 Point3{Float64}(0.234817, 0.234817)  
 Point3{Float64}(0.403443, 0.403443)  
 ⋮                                    
 Point3{Float64}(0.410147, 0.410147)  
 Point3{Float64}(0.0750395, 0.0750395)
 Point3{Float64}(0.060309, 0.060309)  
 Point3{Float64}(0.307107, 0.307107)  
 Point3{Float64}(0.559202, 0.559202)  
 Point3{Float64}(0.498393, 0.498393)  
 Point3{Float64}(0.663843, 0.663843)  
 Point3{Float64}(0.441784, 0.441784)  
 Point3{Float64}(0.331159, 0.331159)  
 Point3{Float64}(0.0522143, 0.0522143)
 Point3{Float64}(0.38

Note that the type of this array is `Array{Point3{Float64},1}` (we could equivalently write this as `Vector{Point3{Float64}}`, since `Vector{T}` is a synonym for `Array{T,1}`).  You should learn a few things from this:

* An `Array{T,N}` in Julia is itself a parameterized type, parameterized by the element type `T` and the dimensionality `N`.

* Since the element type `T` is encoded in the `Array{T,N}` type, the element type does not need to be stored in each element.  That means that the `Array` is free to store an array of "inlined" elements, rather than an array of pointers to boxes.  (This is why `Array{Float64,1}` earlier could be stored in memory like a C `double*`.)

* It is still important that the element type be `immutable`, since an array of mutable elements would still need to be an array of pointers (so that it could "notice" if another reference to an element mutates it).
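These points can be checked with a few introspection functions. The sketch below uses hypothetical types `InlinePoint` and `BoxedPoint` and assumes a recent Julia 1.x; `isbitstype` reports whether a type is immutable plain data, which is what makes inline storage possible.

```julia
# Hypothetical immutable, concretely parameterized point.
struct InlinePoint{T<:Real}
    x::T
    y::T
end

# Hypothetical mutable point with untyped fields.
mutable struct BoxedPoint
    x
    y
end

isbitstype(InlinePoint{Float64})   # plain data, eligible for inline storage
isbitstype(BoxedPoint)             # not plain data

pts = [InlinePoint(1.0, 2.0), InlinePoint(3.0, 4.0), InlinePoint(5.0, 6.0)]
sizeof(InlinePoint{Float64})       # two 8-byte floats per element
sizeof(pts)                        # bytes of element data held by the array

# Reinterpret the same memory as a flat sequence of Float64 values:
flat = collect(reinterpret(Float64, pts))
```

If the elements are stored inline, `flat` simply lists the coordinates in order, much like a C array of structs.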

In [75]:
@btime sum($a3)
@btime mysum($a3)

  12.744 ms (0 allocations: 0 bytes)
  11.766 ms (0 allocations: 0 bytes)


Point3{Float64}(5.000392958354606e6, 5.000392958354606e6)

Hooray! It is again **only about 12ms**, the same time as our completely concrete and inflexible `Point2`.
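The flexibility half of the bargain can be sketched as follows, with a hypothetical `VPoint` defined like the parameterized type (using the `where` syntax of Julia 0.6 and later). The same `+` and `zero` definitions serve integer and exact rational coordinates without any extra code:

```julia
# Hypothetical stand-in defined like the parameterized point type.
struct VPoint{T<:Real}
    x::T
    y::T
end
Base.:+(p::VPoint, q::VPoint) = VPoint(p.x + q.x, p.y + q.y)
Base.zero(::Type{VPoint{T}}) where {T} = VPoint(zero(T), zero(T))

ints = [VPoint(1, 2), VPoint(3, 4), VPoint(5, 6)]
rats = [VPoint(1//2, 1//3), VPoint(1//2, 2//3)]

sum(ints)               # stays in integer arithmetic
sum(rats)               # stays in exact rational arithmetic
zero(VPoint{Float64})   # additive identity for Float64 points
```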