# Julia for Statisticians

## What is Julia?

A dynamic, high-performance language for technical computing

* Technical: anything involving numbers.
* Dynamic: you don't need to declare variable types.
* High-performance: code is just-in-time (JIT) compiled before it is executed.

## Why Julia

* To write fast, efficient code in an easy, elegant dynamic language
* It is easy to "peek under the hood":
  * Most of Julia is written in Julia
  * You can inspect various stages of the compilation process
* It's free.
* It's fun.

## Why maybe not Julia (at least yet)

* It's still a young and developing language: some things will change in future (and you may come across the odd bug).
* The library ecosystem is not as broad or well-developed as R or Python.
* There are still gaps in the statistics functionality (though getting better).


## Useful resources

* [the manual](http://docs.julialang.org/): fairly straightforward and comprehensive.
* [julia-users](https://groups.google.com/forum/?fromgroups=#!forum/julia-users) mailing list.
* [GitHub repository](https://github.com/JuliaLang/julia) (for looking at the source and issue trackers).
* [StackOverflow](https://stackoverflow.com/questions/tagged/julia-lang): another great Q&A resource.


# 1. Introduction

The basic syntax is similar to that of Matlab and Python (and not far from R):

In [3]:
x = 1 # assign a variable a value
y = x + 4 # arithmetic operators are written infix
z = sin(x) # other functions are called with arguments enclosed in parentheses
u = 1; v = 2 # semicolons can be used to separate multiple statements on one line
α = 0.01 # full Unicode support: type the TeX name (\alpha), then press Tab
β₀ = 1.2

1.2

In [4]:
exp(β₀)

3.3201169227365472

Prefixing a name with `?` brings up its help text:

In [7]:
?sin

search: sin sinh sind sinc sinpi asin using isinf asinh asind isinteger

```
sin(x)
```

Compute sine of `x`, where `x` is in radians



## Control flow and functions

Standard control flow statements `if`, `for`, `while` are supported. Blocks are completed by `end`:

In [10]:
if y < 2
    println("a")
else
    println("b")
end

b


The "ternary operator" `_ ? _ : _` allows writing 1-line `if-else` statements:

In [4]:
x < 2 ? println("a") : println("b")

The iterator variable of a for loop can be separated by either `=` or `in`:

In [13]:
for i in 1:4 # could also write "for i = 1:4"
    println("i is ",i)
end

i is 1
i is 2
i is 3
i is 4
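The text above mentions `while` but shows no example. A minimal sketch follows; `halvings` is a hypothetical helper written for illustration, not part of Julia:

```julia
# Count how many times n can be halved (integer division) before reaching 1.
function halvings(n)
    steps = 0
    while n > 1
        n = div(n, 2)   # integer division, also written n ÷ 2
        steps += 1
    end
    return steps
end

halvings(16)  # 16 → 8 → 4 → 2 → 1, so this returns 4
```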


> **Performance note**: Unlike in R, loops in Julia are not a performance sink! They compile to efficient machine code and are often as fast as, or faster than, the equivalent "vectorized" code.
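To make the comparison concrete, here is a sum of squares written both ways. This is only a sketch: `sumsq_loop` and `sumsq_vec` are hypothetical names, and no timings are claimed here. Timing both yourself with `@time` (or the `BenchmarkTools` package) on a large input is the honest way to compare them.

```julia
# Sum of squares as an explicit loop...
function sumsq_loop(v)
    s = zero(eltype(v))   # start from a zero of the vector's element type
    for x in v
        s += x^2
    end
    return s
end

# ...and in "vectorized" style, which first allocates a temporary array v .^ 2
sumsq_vec(v) = sum(v .^ 2)

v = [1.0, 2.0, 3.0]
sumsq_loop(v)   # 14.0
sumsq_vec(v)    # 14.0
```

The loop version avoids the temporary array entirely, which is one reason it can keep up with, or beat, the vectorized form.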

Functions can be defined using the `function` keyword, or written in-line:

In [14]:
function foo(a,b)
    # body
    if a < 1
        return a # can return early
    end
    a + b # implicit return on last expression 
end

foo (generic function with 1 method)

In [7]:
addtwo(x) = x+2

addtwo (generic function with 1 method)

I will explain the "generic function with 1 method" later.

> **Performance note:** functions are the unit at which just-in-time (JIT) compilation occurs. The most common piece of advice to improve performance is to "wrap the code in a function".
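As an illustration of "wrap the code in a function", the sketch below computes the same sum twice: once at global scope, and once inside a hypothetical function `basel`. It assumes Julia 1.x, where assigning to a global variable inside a top-level loop requires the `global` keyword.

```julia
# Sum of 1/i^2 for i = 1..n, which approaches pi^2/6 as n grows.

# At global scope the accumulator is an untyped global variable,
# so the compiler cannot specialize the loop on its type.
total = 0.0
for i in 1:1_000_000
    global total += 1 / i^2
end

# The same loop wrapped in a function: `s` is a local variable whose
# type the JIT compiler can infer, so it can generate tight machine code.
function basel(n)
    s = 0.0
    for i in 1:n
        s += 1 / i^2
    end
    return s
end

basel(1_000_000)   # ≈ pi^2/6, same value as `total`
```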

## Types

Every Julia object has a type

In [15]:
typeof(1.0)

Float64

In [16]:
typeof(foo) # functions are objects

Function

In [17]:
typeof(Float64) # types are also objects

DataType

In [18]:
AbstractFloat

AbstractFloat

Types in Julia form a tree: 
* branches are **abstract types**: these can have subtypes, but not instances
* leaves are **concrete types**: these can have instances, but not subtypes

(Technically, `Union{}` is a subtype of every type, but it has no instances, so we can ignore it.)
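The tree can also be explored upwards. This sketch walks from a type to the root `Any`; it assumes Julia 1.x, where the relevant functions are `supertype`, `isabstracttype` and `isconcretetype`, and `ancestors` is a hypothetical helper name.

```julia
# Collect T and all of its supertypes, up to the root of the tree, `Any`.
function ancestors(T::Type)
    chain = Any[T]
    S = T
    while S !== Any
        S = supertype(S)
        push!(chain, S)
    end
    return chain
end

ancestors(Float64)        # [Float64, AbstractFloat, Real, Number, Any]
isabstracttype(Real)      # true: a branch of the tree
isconcretetype(Float64)   # true: a leaf
Union{} <: Float64        # true: the bottom type is a subtype of everything
```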

> **Exercise:** Write a function that prints every subtype of `Number` (hint: look at the `subtypes` function).

In [22]:
?subtypes

search: subtypes issubtype

```
subtypes(T::DataType)
```

Return a list of immediate subtypes of DataType `T`. Note that all currently loaded subtypes are included, including those not visible in the current module.



In [30]:
function printsubtypes(T,indent)
    println(indent,T)
    for t in subtypes(T)
        printsubtypes(t,indent*"  ")
    end
end

printsubtypes (generic function with 2 methods)

In [31]:
printsubtypes(Number,"")

Number
  Complex{T<:Real}
  Real
    AbstractFloat
      BigFloat
      Float16
      Float32
      Float64
    Integer
      BigInt
      Bool
      Signed
        Int128
        Int16
        Int32
        Int64
        Int8
      Unsigned
        UInt128
        UInt16
        UInt32
        UInt64
        UInt8
    Irrational{sym}
    Rational{T<:Integer}


The "super smile" operator `<:` is the subtype relation

In [19]:
Float64 <: Number

true

In [20]:
Float64 <: Integer

false

Some types take parameters (hence are called *parametric*), using curly brackets

In [32]:
typeof(1.0+2.0*im)

Complex{Float64}

In [33]:
typeof(1+2*im)

Complex{Int64}
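The parameter can also be given explicitly, in which case the arguments are converted to it. A short sketch, assuming Julia 1.x on a 64-bit machine:

```julia
z = Complex{Float64}(1, 2)   # parameter given explicitly: the Ints are converted
typeof(z)                    # Complex{Float64}

w = complex(1, 2)            # parameter inferred from the arguments
typeof(w)                    # Complex{Int64}

Complex{Float64} <: Complex  # true: Complex on its own covers every Complex{T}
```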

> **Performance note:** The key way that Julia is able to achieve high performance is via *type inference*: the just-in-time (JIT) compiler attempts to figure out the type of each variable in the function, which allows it to generate efficient machine code.
> 
> As a result, it is useful for functions to be *type stable* whenever possible: for a given input type, functions should only return values of one type.

This is why, for example, `sqrt` of a negative `Float64` throws an error instead of returning a complex number:

In [15]:
sqrt(-1.0) # sqrt of a Float64 can only be a Float64

LoadError: LoadError: DomainError:
sqrt will only return a complex result if called with a complex argument. Try sqrt(complex(x)).
while loading In[15], in expression starting on line 1

In [16]:
sqrt(complex(-1.0))

0.0 + 1.0im
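The same rule applies to functions you write yourself. In this sketch (hypothetical names, Julia 1.x), `relu_unstable` returns an `Int` or a `Float64` depending on the *value* of its argument, which is exactly what type stability rules out:

```julia
# Type-unstable: returns the Int 0 for negative input but a Float64 otherwise,
# so the compiler must allow for either return type.
relu_unstable(x::Float64) = x < 0 ? 0 : x

# Type-stable: zero(x) is a zero of the same type as x.
relu_stable(x::Float64) = x < 0 ? zero(x) : x

typeof(relu_unstable(-1.0))  # Int64
typeof(relu_unstable(2.0))   # Float64
typeof(relu_stable(-1.0))    # Float64
```

In the REPL, `@code_warntype relu_unstable(1.0)` should flag the inferred return type as a `Union` of the two types; outside the REPL, that macro needs `using InteractiveUtils`.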

New types are usually defined using the `type` (mutable) or `immutable` keyword. For example, `Complex` is defined as:

```julia
immutable Complex{T<:Real} <: Number
    re::T
    im::T
end
```    

* `immutable` specifies that the fields can't be changed once the object is constructed:
  * there are certain performance advantages to this for "small" objects
* `T` here is a *parameter*, and is constrained to be a subtype of `Real`
* `Complex` is the name of the type, and is specified to be a subtype of `Number`
* `re` and `im` are the *fields*: the `::` specifies that they must both be of the type of the parameter.
  * type specification is optional, but has performance advantages as the compiler can figure out the type ahead of time.
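The `type` and `immutable` keywords shown here come from an older Julia release; in Julia 1.x they were replaced by `mutable struct` and `struct` respectively. A sketch of a parametric immutable type in the newer syntax, using a hypothetical `Point` type:

```julia
# Julia 1.x syntax: `struct` replaces `immutable`, `mutable struct` replaces `type`.
struct Point{T<:Real}     # hypothetical example type
    x::T
    y::T
end

p = Point(1.0, 2.0)       # T is inferred as Float64
typeof(p)                 # Point{Float64}
p.x                       # 1.0
# p.x = 3.0               # would throw an error: fields of a `struct` cannot be reassigned
```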


In [34]:
x = Complex(3,4) # call the type constructor

3 + 4im

In [35]:
typeof(x)

Complex{Int64}

In [36]:
x.re

3

In [37]:
typeof(x.re)

Int64

Other keywords for types:
* `abstract`: creates new abstract types, e.g.
```julia
abstract Real <: Number
```
* `bitstype`: for types consisting of raw bytes, e.g.
```julia
bitstype 64 Float64
```
* `typealias`: for type aliases, e.g.
```julia
typealias Complex128 Complex{Float64}
```
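In Julia 1.x these three forms became `abstract type ... end`, `primitive type ... end` and an ordinary `const` assignment. A sketch with hypothetical type names (the built-in `Real` and `Float64` cannot be redefined):

```julia
# Julia 1.x equivalents of the older keywords (all names here are hypothetical):
abstract type Shape end            # was: abstract Shape
primitive type Byte8 8 end         # was: bitstype 8 Byte8
const Vec64 = Vector{Float64}      # was: typealias Vec64 Vector{Float64}

# A concrete subtype of the new abstract type
struct Circle <: Shape
    r::Float64
end
```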

> **Exercise:** Create a `PolarComplex` type representing a complex number in terms of its absolute value and argument, both `Float64`, and write functions for converting between it and `Complex`.

In [38]:
immutable PolarComplex <: Number
    abs::Float64
    arg::Float64
end

In [48]:
x = PolarComplex(1.0,pi)

PolarComplex(1.0,3.141592653589793)

In [42]:
topolar(z) = PolarComplex(abs(z),angle(z))

topolar (generic function with 1 method)

In [45]:
topolar(-1+0im)

PolarComplex(1.0,3.141592653589793)

In [46]:
function frompolar(x)
    Complex(x.abs*cos(x.arg), x.abs*sin(x.arg))
end

frompolar (generic function with 1 method)

In [49]:
frompolar(x)

-1.0 + 1.2246467991473532e-16im
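The tiny imaginary part appears because `pi` stored as a `Float64` is not exactly π, so `sin(pi)` is not exactly zero. The sketch below redoes the exercise in Julia 1.x syntax (`struct` instead of `immutable`) and checks the round trip with `isapprox` rather than `==`:

```julia
struct PolarComplex <: Number
    abs::Float64
    arg::Float64
end

topolar(z) = PolarComplex(abs(z), angle(z))
frompolar(p) = Complex(p.abs * cos(p.arg), p.abs * sin(p.arg))

z = 3.0 + 4.0im
p = topolar(z)      # absolute value 5.0, argument atan(4, 3)
w = frompolar(p)
w == z              # may be false: cos and sin introduce rounding error
isapprox(w, z)      # true
```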

## Strings

Single quotes (`'`) enclose a character, double quotes (`"`) enclose strings:

In [50]:
typeof('α')

Char

In [51]:
'aa' # invalid syntax

LoadError: LoadError: syntax: invalid character literal
while loading In[51], in expression starting on line 1

In [53]:
typeof("aα")

UTF8String
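`UTF8String` belongs to an older Julia release; in Julia 1.x there is a single `String` type, which is UTF-8 encoded. One consequence is that the number of characters and the number of bytes can differ:

```julia
s = "aα"
typeof(s)     # String
length(s)     # 2 characters
sizeof(s)     # 3 bytes: 'a' takes 1 byte in UTF-8, 'α' takes 2
```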

Julia strings support interpolation using the `$` sign:

In [54]:
x = 3
"x is $x" # variable interpolation

"x is 3"

In [55]:
"exp of 3.2 is $(exp(3.2))"

"exp of 3.2 is 24.532530197109352"

Strings can be iterated over by character:

In [56]:
for c in "αβγ"
    println("char $c.")
end

char α.
char β.
char γ.


Strings are concatenated using `*` (mathematically, strings form a monoid under concatenation) and repeated using `^` (the exponentiation operator):

In [57]:
"aa"*"bb"

"aabb"

In [58]:
"abc"^5

"abcabcabcabcabc"

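The monoid remark can be checked directly: `""` is the identity element and `*` is associative. Concatenation is not commutative, unlike the usual meaning of `+`. A short sketch:

```julia
s = "ab"
"" * s == s * "" == s                    # true: "" is the identity element
("a" * "b") * "c" == "a" * ("b" * "c")   # true: * is associative
"ab" * "cd" == "cd" * "ab"               # false: not commutative
s^3 == s * s * s                         # true: ^ is repeated *
s^0                                      # "": zero repetitions give the identity
```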
## Numbers

Julia draws a distinction between integers and floats.

In [59]:
typeof(1) # an integer digit sequence creates an integer type

Int64

In [61]:
typeof(1e20) # a decimal point or scientific notation (e.g. 1e3) creates a float type

Float64

For performance reasons, integer arithmetic is not checked for overflow: results silently wrap around.

In [62]:
x = typemax(Int)

9223372036854775807

In [63]:
x + 1

-9223372036854775808

You may see different numbers on your machine: `Int` (the default `Integer` type) is an alias for the native integer type (`Int32` on a 32-bit machine, `Int64` on a 64-bit machine).
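If silent wraparound is unacceptable, Julia 1.x provides checked versions of the integer operations in `Base.Checked`, which throw an `OverflowError` instead. A sketch:

```julia
x = typemax(Int64)
x + 1 == typemin(Int64)            # true: ordinary + wraps around silently

# checked_add performs the same addition but throws on overflow
try
    Base.Checked.checked_add(x, 1)
catch err
    println(typeof(err))           # prints OverflowError
end
```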

If you need bigger integers, you can use `BigInt`, which supports arbitrary-precision integers (but is much slower), or floating-point arithmetic:

In [64]:
3^200 # incorrect due to overflow

6627890308811632801

In [65]:
BigInt(3)^200 # exact result

265613988875874769338781322035779626829233452653394495974574961739092490901302182994384699044001

In [35]:
3.0^200 # approximate result using floating point arithmetic

2.6561398887587478e95
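The overflowed value of `3^200` is not arbitrary: 64-bit integer arithmetic is arithmetic modulo 2^64, so the wrapped result agrees with the exact one modulo 2^64. A sketch that checks this with `BigInt`, writing `Int64` explicitly so the check does not depend on the machine's native `Int`:

```julia
wrapped = Int64(3)^200     # overflows and wraps around
exact   = big(3)^200       # BigInt: exact

# Both agree modulo 2^64
mod(big(wrapped), big(2)^64) == mod(exact, big(2)^64)   # true

# The floating-point result is close to the exact value, in relative terms
Float64(exact) ≈ 3.0^200                                # true
```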

Other numeric types are `Rational`s:

In [66]:
3//10

3//10

In [67]:
3//10 * 5//6

1//4
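Rationals are stored in lowest terms, and arithmetic on them is exact. A few checks, using the Julia 1.x accessor names `numerator` and `denominator`:

```julia
r = 6//20                       # automatically reduced to lowest terms
r == 3//10                      # true
numerator(r), denominator(r)    # (3, 10)
1//3 + 1//6                     # 1//2: arithmetic stays exact
float(3//10)                    # 0.3: converting to Float64 rounds
```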

`Irrational`s, which are used for mathematical constants:

In [68]:
pi

π = 3.1415926535897...

`BigFloat`s, high-precision floating point numbers:

In [69]:
BigFloat(pi)

3.141592653589793238462643383279502884197169399375105820974944592307816406286198

In [71]:
BigFloat(Float64(pi))

3.141592653589793115997963468544185161590576171875000000000000000000000000000000

> **Exercise**: Why does the following happen?

In [74]:
1//10 == 0.1

false

In [76]:
BigFloat(0.1)

1.000000000000000055511151231257827021181583404541015625000000000000000000000000e-01

In [78]:
big"0.1"

1.000000000000000000000000000000000000000000000000000000000000000000000000000002e-01

In [75]:
0.1 + 0.2

0.30000000000000004
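One way to see what is going on in the exercise is to convert `0.1` to a `Rational` exactly, which reveals the binary fraction that is actually stored. A sketch (Julia 1.x):

```julia
exact = Rational{Int64}(0.1)   # exact value of the Float64 nearest to 0.1
exact == 3602879701896397 // 36028797018963968   # true: the denominator is 2^55
exact == 1//10                 # false, which is why 1//10 == 0.1 is false
rationalize(0.1)               # 1//10: the simplest fraction within rounding error
0.1 + 0.2 ≈ 0.3                # true: compare floats with a tolerance, not ==
```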

When operations involve different types of numbers, they are *promoted* to the "best" common type:

In [79]:
1 + 1.0 # Float64

2.0

In [80]:
BigInt(1) + 2.0 # BigFloat

3.000000000000000000000000000000000000000000000000000000000000000000000000000000
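The common type can be queried with `promote_type`, and `promote` performs the conversion itself. A sketch (output comments assume Julia 1.x on a 64-bit machine):

```julia
promote_type(Int64, Float64)    # Float64
promote_type(BigInt, Float64)   # BigFloat
promote(1, 2.5)                 # (1.0, 2.5): both converted to the common type
typeof(1//2 + 1)                # Rational{Int64}
```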