When trying to figure out why Julia is fast, we first need to think about what makes a language slow.

Here is an analogy:
Which is faster, driving on a highway or driving on local roads?
On a highway we don't have to watch out for pedestrians wandering into the street, so we can drive faster.
The same holds for computers: once the input and output types of a function are known, the compiler can generate code specialized for exactly those types instead of checking types at every step.
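To make the analogy concrete, here is a small sketch (the function names `unstable` and `stable` are hypothetical) that asks the compiler what it can infer about each function's output type. It relies on `Base.return_types`, an internal, unexported reflection helper; in an interactive session, `@code_warntype` shows the same information in a more readable form.

```julia
# Type-unstable: the result is an Int for positive input but a Float64
# otherwise, so the compiler can only infer a Union of the two types.
unstable(x::Int) = x > 0 ? x : 0.0

# Type-stable: zero(x) has the same type as x, so the output type is
# fully determined by the input type.
stable(x::Int) = x > 0 ? x : zero(x)

# Ask type inference for the return type of each method when called with an Int.
println("unstable: ", only(Base.return_types(unstable, (Int,))))
println("stable:   ", only(Base.return_types(stable, (Int,))))
```

The `stable` version is the "highway": code that consumes its result never has to check which type came back.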

So the keys to Julia's speed are its Just-In-Time (JIT) compiler working together with its multiple-dispatch design.


# Compilation Stages of Julia Code
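Source code -> AST, with macros expanded (@macroexpand)
-> lowered IR (@code_lowered) -> type-inferred SSA IR (@code_typed)
-> LLVM IR (@code_llvm) -> native code (@code_native)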

In [24]:
@macroexpand 1+2

:(1 + 2)
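`1+2` contains no macro, so `@macroexpand` simply returns the parsed expression. To see the macro-expansion stage actually rewrite code, here is a sketch with a hypothetical macro `@twice` that duplicates the expression it receives:

```julia
# Hypothetical macro: emits its argument expression twice.
# esc() makes the expression refer to the caller's variables (macro hygiene).
macro twice(ex)
    return quote
        $(esc(ex))
        $(esc(ex))
    end
end

# Expansion happens on the AST, before any type inference or compilation.
expanded = @macroexpand @twice println("hi")
println(expanded)

# Using the macro runs the duplicated code.
@twice println("hi")
```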

In [15]:
@code_lowered 1+2

CodeInfo(
53 1 ─ %1 = (Base.add_int)(x, y)
   └──      return %1
)

In [16]:
@code_typed 1+2

CodeInfo(
53 1 ─ %1 = (Base.add_int)(x, y)::Int64
   └──      return %1
) => Int64

In [17]:
@code_llvm 1+2


; Function +
; Location: int.jl:53
define i64 @"julia_+_35442"(i64, i64) {
top:
  %2 = add i64 %1, %0
  ret i64 %2
}


In [21]:
# To see the generated assembly code, use:
@code_native 1+2

	.section	__TEXT,__text,regular,pure_instructions
; Function + {
; Location: int.jl:53
	decl	%eax
	leal	(%edi,%esi), %eax
	retl
;}
; Function <invalid> {
; Location: int.jl:53
	nopw	%cs:(%eax,%eax)
;}
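Each of these macros has a plain-function counterpart that takes a function and a tuple of argument types instead of a call expression, which is handy for inspecting compilation from a script. A minimal sketch, assuming a hypothetical two-argument function `add2`; `code_llvm` and `code_native` live in the `InteractiveUtils` standard library, which the REPL loads automatically but a script must load explicitly:

```julia
using InteractiveUtils  # provides code_llvm and code_native outside the REPL

add2(x, y) = x + y  # hypothetical example function

# 1. Source -> AST: parsing produces an expression tree (an Expr).
ast = Meta.parse("add2(1, 2)")
println(ast)

# 2. Lowered IR: desugared, not yet specialized on argument types.
lowered = code_lowered(add2, (Int64, Int64))[1]
println(lowered)

# 3. Type-inferred SSA IR, paired with the inferred return type.
typed, rettype = code_typed(add2, (Int64, Int64))[1]
println(typed, " => ", rettype)

# 4. LLVM IR, captured into a string instead of printed to stdout.
llvm = sprint(code_llvm, add2, (Int64, Int64))
println(llvm)

# 5. Native assembly for the host CPU (differs between x86 and ARM).
native = sprint(code_native, add2, (Int64, Int64))
println(native)
```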


# Multiple Dispatch
Many people believe Julia is fast only because it is Just-In-Time (JIT) compiled, i.e. every statement is executed via compiled functions, which are either compiled right before they are first used or reused from an earlier cached compilation. This leads to the question of what Julia offers over JIT-compiled implementations of Python or R (and MATLAB, which uses a JIT by default). Those JIT compilers have been optimized for far longer than Julia's, so why should we believe that Julia somehow out-optimized all of them so quickly? That framing completely misunderstands Julia. What I want to show, in a very visual way, is that Julia is fast because of its design decisions. The core decision, type stability through specialization via multiple dispatch, is what makes it easy for the compiler to turn Julia code into efficient machine code, while still letting the code stay concise and "look like a scripting language". This leads to some very clear performance gains.
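Since the argument leans on multiple dispatch, here is a minimal sketch of what it means in practice. The shape types and the `collide` function are hypothetical: Julia picks a method based on the runtime types of all arguments, not just the first one as in single-dispatch object-oriented languages, and because each method body works with concrete types, each one can be compiled into specialized code.

```julia
# Hypothetical types used only to illustrate dispatch.
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Square <: Shape
    s::Float64
end

# One generic function with three methods. The most specific method that
# matches the types of both arguments is chosen.
collide(::Circle, ::Circle) = "circle-circle"
collide(::Circle, ::Square) = "circle-square"
collide(::Shape, ::Shape)   = "generic fallback"

println(collide(Circle(1.0), Circle(2.0)))
println(collide(Circle(1.0), Square(2.0)))
println(collide(Square(1.0), Circle(2.0)))  # no specific method, so the fallback runs
println(length(methods(collide)))           # number of methods defined
```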

In [63]:
my_square(x) = x^2
@code_native my_square(1)

	.section	__TEXT,__text,regular,pure_instructions
; Function my_square {
; Location: In[63]:1
; Function literal_pow; {
; Location: intfuncs.jl:243
; Function *; {
; Location: In[63]:1
	decl	%eax
	imull	%edi, %edi
;}}
	decl	%eax
	movl	%edi, %eax
	retl
	nopl	(%eax,%eax)
;}


In [60]:
@code_typed my_square(1.0)

CodeInfo(
1 1 ─ %1 = (Base.mul_float)(x, x)::Float64
  └──      return %1
) => Float64
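The Float64 result above comes from the same one-line definition that handled the integer call: Julia compiles a separate specialization of `my_square` for each concrete argument type it is called with. A sketch that makes this visible, again using the internal, unexported helper `Base.return_types`:

```julia
my_square(x) = x^2

# Only a single method is defined...
println("methods: ", length(methods(my_square)))

# ...but inference yields a different concrete return type for each
# argument type, because each call gets its own compiled specialization.
for T in (Int64, Int32, Float64, Float32)
    println(T, " -> ", only(Base.return_types(my_square, (T,))))
end
```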

In [64]:
my_int_square(x::Int64) = x^2
@code_native my_int_square(1)

	.section	__TEXT,__text,regular,pure_instructions
; Function my_int_square {
; Location: In[64]:1
; Function literal_pow; {
; Location: intfuncs.jl:243
; Function *; {
; Location: In[64]:1
	decl	%eax
	imull	%edi, %edi
;}}
	decl	%eax
	movl	%edi, %eax
	retl
	nopl	(%eax,%eax)
;}


In [65]:
@code_native my_square(1)

	.section	__TEXT,__text,regular,pure_instructions
; Function my_square {
; Location: In[63]:1
; Function literal_pow; {
; Location: intfuncs.jl:243
; Function *; {
; Location: In[63]:1
	decl	%eax
	imull	%edi, %edi
;}}
	decl	%eax
	movl	%edi, %eax
	retl
	nopl	(%eax,%eax)
;}
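The native code for `my_int_square(1)` and `my_square(1)` is identical: because Julia specializes on the concrete argument type at call time, the `::Int64` annotation does not make the method faster. What it does change is which calls the method accepts, as this sketch shows (it repeats both definitions so it runs on its own):

```julia
my_square(x) = x^2
my_int_square(x::Int64) = x^2

# Same result for an Int64 argument.
println(my_square(3), " ", my_int_square(3))

# The generic version also accepts a Float64...
println(my_square(3.0))

# ...while the annotated version has no method for it.
accepted = try
    my_int_square(3.0)
    true
catch err
    err isa MethodError || rethrow()
    false
end
println("my_int_square accepts Float64: ", accepted)
```

In other words, annotations in Julia are mainly a dispatch tool, not a performance hint.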


# Caching and Reusing Compiled Code
This is an easy concept.
A method is compiled once for each combination of argument types and the result is cached, so later calls with the same argument types don't pay the compilation cost again.
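The caching can be observed on a user-defined function (the name `sum_of_squares` is hypothetical). The first call compiles a specialization for `Vector{Float64}`; the second call finds it in the cache. Exact timings depend on the machine and Julia version:

```julia
# Hypothetical function; nothing is compiled until it is first called.
function sum_of_squares(v)
    s = zero(eltype(v))
    for x in v
        s += x^2
    end
    return s
end

v = rand(1000)

# First call: includes JIT compilation for the argument type Vector{Float64}.
t_first = @elapsed sum_of_squares(v)

# Second call with the same argument type: reuses the cached native code.
t_second = @elapsed sum_of_squares(v)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")

# Calling with a new argument type, e.g. a Vector{Int}, would compile a new specialization.
```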

In [74]:
@time 2.0^30

  0.003535 seconds (4.23 k allocations: 249.857 KiB)


1.073741824e9

In [75]:
@time 2.0^30

  0.000033 seconds (5 allocations: 176 bytes)


1.073741824e9