# Julia basics

> Note: These materials were adapted from resources I've written for QuantEcon

In this notebook we'll move quickly to cover the basics of Julia

The intended audience is researchers or data analysts/scientists with experience in another language like Python or R

### Primitive Data Types

A particularly simple data type is a Boolean value, which can be either `true` or
`false`.

In [3]:
x = true

true

In [4]:
typeof(x)

Bool

In [5]:
y = 1 > 2  # now y = false

false

The two most common data types used to represent numbers are integers and
floats.

(Computers distinguish between floats and integers because arithmetic is
handled in a different way)

In [6]:
typeof(1.0)

Float64

In [7]:
typeof(1)

Int64
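
As a small illustration of how the two kinds of number behave differently, here is a sketch (the variable names `a` to `d` are only for illustration) showing how the result type depends on the operands and on the operator.

```julia
# Adding two integers gives an integer
a = 7 + 2            # 9
# Mixing an integer with a float promotes the result to a float
b = 7 + 2.0          # 9.0
# `/` always performs floating-point division, even on two integers
c = 7 / 2            # 3.5
# `÷` (typed \div<TAB>) or `div` performs integer division
d = 7 ÷ 2            # 3

@show typeof(a) typeof(b) typeof(c) typeof(d);
```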

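A useful tool for displaying both an expression and its value is the `@show` macro, which prints the expression text followed by the result. (The output below assumes `x` and `y` were previously assigned numbers; it is consistent with, for example, `x = 2` and `y = 1.0`.)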

In [13]:
@show 2x - 3y
@show x + y;

2x - 3y = 1.0
x + y = 3.0


Complex numbers are another primitive data type, with the imaginary part being specified by `im`.

In [6]:
x = 1 + 2im

1 + 2im

In [7]:
y = 1 - 2im

1 - 2im

In [8]:
x * y  # complex multiplication

5 + 0im
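
A few standard functions pull a complex number apart. The sketch below uses an illustrative value `z`, which is not part of the cells above.

```julia
z = 3 + 4im
real(z)   # real part: 3
imag(z)   # imaginary part: 4
conj(z)   # complex conjugate: 3 - 4im
abs(z)    # modulus: 5.0
```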

There are several more primitive data types that we’ll introduce as necessary.

### Strings

Strings are created with double quotes

In [11]:
x = "foobar"
typeof(x)

String

The `$` inside a string is used to interpolate a variable.

In [12]:
x = 10; y = 20
"x = $x"

"x = 10"

With parentheses, you can splice the results of expressions into strings as well.

In [13]:
"x + y = $(x + y)"

"x + y = 30"

To concatenate strings use `*`

In [14]:
"foo" * "bar"

"foobar"

Julia provides many functions for working with strings.

In [15]:
s = "Charlie don't surf"

"Charlie don't surf"

In [24]:
split(s)

3-element Array{SubString{String},1}:
 "Charlie"
 "don't"
 "surf"

In [25]:
replace(s, "surf" => "ski")

"Charlie don't ski"

In [26]:
split("fee,fi,fo", ",")

3-element Array{SubString{String},1}:
 "fee"
 "fi"
 "fo"

In [27]:
strip(" foobar ")  # remove whitespace

"foobar"

Julia can also find and replace using [regular expressions](https://en.wikipedia.org/wiki/Regular_expression) ([see regular expressions documentation](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions-1) for more info).

In [28]:
match(r"(\d+)", "Top 10")  # find digits in string

RegexMatch("10", 1="10")
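
The object returned by `match` can be inspected further. Here is a minimal sketch; the names `m`, `n` and `no_match` are just for illustration.

```julia
m = match(r"(\d+)", "Top 10")
m.match                    # the full matched text, "10"
m.captures[1]              # the first capture group, also "10"
n = parse(Int, m.match)    # convert the matched digits to an integer

# When nothing matches, `match` returns `nothing`
no_match = match(r"\d+", "no digits here")
no_match === nothing       # true
```

Checking for `nothing` before using the result avoids an error when the pattern is not found.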

### Containers

Julia has several basic types for storing collections of data.

One such data type is a **tuple**, which is immutable and can contain different types.

In [29]:
x = ("foo", "bar")
y = ("foo", 2)

("foo", 2)

In [30]:
typeof(x), typeof(y)

(Tuple{String,String}, Tuple{String,Int64})
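
To see what "immutable" means in practice, here is a short sketch (the names `t`, `t2` and `result` are illustrative). Assigning to an element of a tuple raises an error, so the usual approach is to build a new tuple.

```julia
t = ("foo", 2)

# Attempting to assign to an element raises an error because tuples are immutable
result = try
    t[1] = "bar"
    "assignment succeeded"
catch err
    "assignment failed with $(typeof(err))"
end
println(result)

# To "change" a tuple, build a new one instead
t2 = ("bar", t[2])
```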

Tuples can be constructed with or without parentheses.

In [31]:
x = "foo", 1

("foo", 1)

Tuples can also be unpacked directly into variables.

In [18]:
x = ("foo", 1)

("foo", 1)

In [19]:
word, val = x
println("word = $word, val = $val")

word = foo, val = 1


Tuples can be created with a trailing `,` – this is useful to create a tuple with one element.

In [23]:
x = ("foo", 1,)
y = ("foo",)
typeof(x), typeof(y)

(Tuple{String,Int64}, Tuple{String})

We can access elements by index using `[index]`

Julia starts counting at 1


This is the same as R, but different from Python (which starts at 0)

In [24]:
x[1]

"foo"

In [25]:
x[2]

1

Indexing with negative numbers is not allowed

To get values from the end of a container use `end`:

In [26]:
x[end]

1

In [27]:
x[end-1]

"foo"

### Arrays

Arrays are the core building block of numerical programming

In Julia we create arrays with `[` and `]`

In [21]:
x = [10, 20, 30, 40]

4-element Array{Int64,1}:
 10
 20
 30
 40

In [22]:
typeof(x)

Array{Int64,1}

To access multiple elements of an array or tuple, you can use slice notation.

In [39]:
x[1:3]

3-element Array{Int64,1}:
 10
 20
 30

In [40]:
x[2:end]

3-element Array{Int64,1}:
 20
 30
 40
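
One detail worth knowing is that slicing an array makes a copy. The sketch below contrasts this with a view, created here with the `@view` macro, which shares memory with the original array. The names `s` and `v` are only for illustration.

```julia
x = [10, 20, 30, 40]

s = x[1:3]        # a slice of an array is a new array (a copy)
s[1] = 99         # modifying the copy...
x[1]              # ...leaves the original untouched: still 10

v = @view x[1:3]  # a view refers to the same memory as `x`
v[1] = 99         # modifying the view...
x[1]              # ...changes the original: now 99
```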

The same slice notation works on strings.

In [41]:
"foobar"[3:end]

"obar"

#### Dictionaries

Another container type worth mentioning is dictionaries.

Dictionaries are like arrays except that the items are named instead of numbered.

In [42]:
d = Dict("name" => "Frodo", "age" => 33)

Dict{String,Any} with 2 entries:
  "name" => "Frodo"
  "age"  => 33

In [43]:
d["age"]

33
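
A few more common dictionary operations, as a quick sketch. The keys `"home"` and `"height"` are made up for this example.

```julia
d = Dict("name" => "Frodo", "age" => 33)

d["home"] = "The Shire"          # add a new key-value pair
haskey(d, "age")                 # true
get(d, "height", "unknown")      # look up with a fallback; returns "unknown"

# Iterate over key-value pairs (iteration order is not guaranteed for `Dict`)
for (key, value) in d
    println("$key => $value")
end
```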

## Iterating

One of the most important tasks in computing is stepping through a
sequence of data and performing a given action.

Julia provides neat and flexible tools for iteration as we now discuss.

### Iterables

An iterable is something you can put on the right hand side of `for` and loop over.

These include sequence data types like arrays.

In [28]:
actions = ["surf", "ski"]
for action in actions
    println("Charlie doesn't $action")
end

Charlie doesn't surf
Charlie doesn't ski


They also include so-called **iterators**.

You’ve already come across these types of values

In [30]:
typeof(1:3)

UnitRange{Int64}

In [31]:
for i in 1:3
    print(i)
end

123

If you ask for the keys of a dictionary you get an iterator

In [32]:
d = Dict("name" => "Frodo", "age" => 33)

Dict{String,Any} with 2 entries:
  "name" => "Frodo"
  "age"  => 33

In [33]:
keys(d)

Base.KeySet for a Dict{String,Any} with 2 entries. Keys:
  "name"
  "age"

This makes sense, since the most common thing you want to do with keys is loop over them.

The benefit of providing an iterator rather than an array, say, is that the former is more memory efficient.

Should you need to transform an iterator into an array you can always use `collect()`.

In [48]:
collect(keys(d))

2-element Array{String,1}:
 "name"
 "age"

### Looping without Indices

You can loop over sequences without explicit indexing, which often leads to
neater code.

For example compare

In [49]:
x_values = 1:5

1:5

In [50]:
for x in x_values
    println(x * x)
end

1
4
9
16
25


In [51]:
for i in eachindex(x_values)
    println(x_values[i] * x_values[i])
end

1
4
9
16
25
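
Two helpers that make index-free loops even more convenient are `zip` and `enumerate`. Here is a small sketch with made-up data (`countries` and `cities` are illustrative names).

```julia
countries = ("Japan", "Korea", "China")
cities = ("Tokyo", "Seoul", "Beijing")

# `zip` steps through several sequences in parallel
for (country, city) in zip(countries, cities)
    println("The capital of $country is $city")
end

# `enumerate` yields (index, value) pairs
for (i, country) in enumerate(countries)
    println("$i: $country")
end
```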


### Comprehensions

([See comprehensions documentation](https://docs.julialang.org/en/v1/manual/arrays/#man-comprehensions-1))

Comprehensions are an elegant tool for creating new arrays, dictionaries, etc. from iterables.

Here are some examples

In [54]:
doubles = [ 2i for i in 1:4 ]

4-element Array{Int64,1}:
 2
 4
 6
 8

In [55]:
animals = ["dog", "cat", "bird"];   # Semicolon suppresses output

In [56]:
plurals = [ animal * "s" for animal in animals ]

3-element Array{String,1}:
 "dogs"
 "cats"
 "birds"

In [57]:
[ i + j for i in 1:3, j in 4:6 ]

3×3 Array{Int64,2}:
 5  6  7
 6  7  8
 7  8  9

In [58]:
[ i + j + k for i in 1:3, j in 4:6, k in 7:9 ]

3×3×3 Array{Int64,3}:
[:, :, 1] =
 12  13  14
 13  14  15
 14  15  16

[:, :, 2] =
 13  14  15
 14  15  16
 15  16  17

[:, :, 3] =
 14  15  16
 15  16  17
 16  17  18

Comprehensions can also create arrays of tuples or named tuples

In [59]:
[ (i, j) for i in 1:2, j in animals]

2×3 Array{Tuple{Int64,String},2}:
 (1, "dog")  (1, "cat")  (1, "bird")
 (2, "dog")  (2, "cat")  (2, "bird")

In [60]:
[ (num = i, animal = j) for i in 1:2, j in animals]

2×3 Array{NamedTuple{(:num, :animal),Tuple{Int64,String}},2}:
 (num = 1, animal = "dog")  …  (num = 1, animal = "bird")
 (num = 2, animal = "dog")     (num = 2, animal = "bird")
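
Comprehensions can also filter with a trailing `if`, and the same syntax can feed a `Dict`. The following is a minimal sketch; `evens` and `name_lengths` are illustrative names.

```julia
animals = ["dog", "cat", "bird"]

# A trailing `if` keeps only the elements that satisfy the condition
evens = [i for i in 1:10 if iseven(i)]          # [2, 4, 6, 8, 10]

# Passing a generator of pairs to `Dict` builds a dictionary
name_lengths = Dict(animal => length(animal) for animal in animals)
name_lengths["bird"]                             # 4
```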

### Generators

([See generator documentation](https://docs.julialang.org/en/v1/manual/arrays/#Generator-Expressions-1))

In some cases, you may wish to use a comprehension to create an iterable list rather
than actually making it a concrete array.

The benefit of this is that you can use functions which take general iterators rather
than arrays without allocating and storing any temporary values.

For example, the following code generates a temporary array of size 10,000 and finds the sum.

In [35]:
xs = 1:10000
f(x) = x^2
f_x = f.(xs)
sum(f_x)

333383335000

We could have created the temporary using a comprehension, or even done the comprehension
within the `sum` function, but these all create temporary arrays.

In [36]:
f_x2 = [f(x) for x in xs]
@show sum(f_x2)
@show sum([f(x) for x in xs]); # still allocates temporary

sum(f_x2) = 333383335000
sum([f(x) for x = xs]) = 333383335000


Note that if you were to hand-code this, you would be able to calculate the sum by simply
iterating to 10000, applying `f` to each number, and accumulating the results.  No temporary
vectors would be necessary.

A generator can emulate this behavior, leading to clear (and sometimes more efficient) code when used
with any function that accepts iterators.  All you need to do is drop the `[` and `]` brackets.

In [37]:
sum(f(x) for x in xs)

333383335000
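
Generators work with other reducing functions too, and they accept a trailing `if` filter. A brief sketch, redefining `xs` and `f` as above so that it runs on its own:

```julia
xs = 1:10000
f(x) = x^2

maximum(f(x) for x in xs)            # largest squared value: 100000000
any(f(x) > 50 for x in xs)           # true
sum(x for x in 1:10 if isodd(x))     # 1 + 3 + 5 + 7 + 9 = 25
```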

We can use `BenchmarkTools` to investigate

In [38]:
using Pkg; Pkg.add("BenchmarkTools")

[32m[1m  Resolving[22m[39m package versions...
[32m[1mNo Changes[22m[39m to `~/.julia/environments/v1.5/Project.toml`
[32m[1mNo Changes[22m[39m to `~/.julia/environments/v1.5/Manifest.toml`


In [39]:
using BenchmarkTools
@btime sum([f(x) for x in $xs])
@btime sum(f.($xs))
@btime sum(f(x) for x in $xs);

  4.183 μs (2 allocations: 78.20 KiB)
  4.928 μs (2 allocations: 78.20 KiB)
  2.776 ns (0 allocations: 0 bytes)


Notice that the first two cases are nearly identical, and allocate a temporary array, while the
final case using generators has no allocations.

In this example you may see a speedup of over 1000x.  Whether using generators leads to code that is faster or slower depends on the circumstances, and you should (1) always profile rather than guess; and (2) worry about code clarity first, and performance second, if ever.

## Comparisons and Logical Operators

In [65]:
x = 1

1

In [66]:
x == 2

false

For “not equal” use `!=` or `≠` (`\ne<TAB>`).

In [40]:
@show x != 3
@show x ≠ 3

x != 3 = true
x ≠ 3 = true


true

Julia can also test approximate equality with `≈` (`\approx<TAB>`).

In [68]:
1 + 1E-8 ≈ 1

true

Be careful when using this, however, as there are subtleties involving the scales of the quantities compared.
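
One such subtlety is that `≈` uses a relative tolerance by default, so comparisons against zero behave differently from comparisons against numbers of moderate size. A short sketch using `isapprox`, the function behind `≈`, with its `atol` keyword:

```julia
1e-10 ≈ 0                          # false: the default tolerance is relative
isapprox(1e-10, 0; atol = 1e-8)    # true: an absolute tolerance handles values near zero
1 + 1e-10 ≈ 1                      # true: the difference is small relative to 1
```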

### Combining Expressions

Here are the standard logical connectives (conjunction, disjunction)

In [69]:
true && false

false

In [70]:
true || false

true

Remember

- `P && Q` is `true` if both are `true`, otherwise it’s `false`.  
- `P || Q` is `false` if both are `false`, otherwise it’s `true`.  
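
Both operators short-circuit: the right-hand side is evaluated only when it is needed to determine the result. The sketch below shows this using `error` calls that never run, then prints the full truth table.

```julia
# `&&` evaluates its right-hand side only if the left-hand side is `true`
false && error("this is never evaluated")    # returns false, no error raised

# `||` evaluates its right-hand side only if the left-hand side is `false`
true || error("this is never evaluated")     # returns true, no error raised

# Truth table for the two connectives
for P in (true, false), Q in (true, false)
    println("P = $P, Q = $Q:  P && Q = $(P && Q),  P || Q = $(P || Q)")
end
```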

## Functions

Functions are defined using the syntax

```julia
function NAME(ARGS...; KWARGS...)
    BODY
end
```

Where 

- `NAME` is the name of your function
- `ARGS...` represent an arbitrary number of positional arguments
- `;KWARGS...` represents an arbitrary number of keyword arguments (these usually have default values; a keyword argument without one must be supplied by the caller)
- `BODY` represents the body or code of your function
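
As a concrete instance of this template, here is a small sketch. The function name `scale_and_shift` and its arguments are made up for illustration.

```julia
# `x` and `factor` are positional arguments; `shift` is a keyword argument
function scale_and_shift(x, factor; shift = 0)
    return factor * x + shift
end

scale_and_shift(2, 3)              # 6
scale_and_shift(2, 3; shift = 1)   # 7
scale_and_shift(2, 3, shift = 1)   # 7 (a comma also works at the call site)
```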

### Return Statement

In Julia, the `return` statement is optional, so that the following functions
have identical behavior

In [41]:
function f1(a, b)
    return a * b
end

function f2(a, b)
    a * b
end

f2 (generic function with 1 method)

In [42]:
f1(1, 2) == f2(1, 2)

true

### Shorthand syntax

For short functions you can use the syntax

```julia
NAME(ARGS...;KWARGS...) = BODY
```

The Julia parser emits the same code for short and long form functions

I typically use the shorthand for one line functions and long form for multi-line functions

In [45]:
f(x) = sin(1 / x)
f(1 / pi)

1.2246467991473532e-16

### Anonymous functions

Sometimes you don't need a function to have a name

You can use the syntax

```julia
(ARGS...; KWARGS...) -> BODY
```

to define an anonymous function

This is helpful when passing the function as an argument to another function:

In [46]:
map(x -> sin(1 / x), randn(3))  # apply function to each element

3-element Array{Float64,1}:
  0.9379154392183842
  0.9536976397969775
 -0.679579151508236

### Optional and Keyword Arguments

([See keyword arguments documentation](https://docs.julialang.org/en/v1/manual/functions/#Keyword-Arguments-1))

Function arguments can be given default values

In [47]:
f(x, a = 1) = exp(cos(a * x))

f (generic function with 2 methods)

If the argument is not supplied, the default value is substituted.

In [48]:
f(pi)

0.36787944117144233

In [49]:
f(pi, 2)

2.718281828459045

Another option is to use **keyword** arguments.

In [52]:
g(x; a = 1) = exp(cos(a * x))
g(pi, a = 2)

2.718281828459045

Because `a` is a keyword argument of `g`, I can't pass `a` as a positional argument:

In [53]:
g(pi, 2)

LoadError: MethodError: no method matching g(::Irrational{:π}, ::Int64)
Closest candidates are:
  g(::Any; a) at In[52]:1

## Broadcasting

([See broadcasting documentation](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting-1))

A common scenario in computing is that

- we have a function `f` such that `f(x)` returns a number for any number `x`  
- we wish to apply `f` to every element of an iterable `x_vec` to produce a new result `y_vec`  


In Julia loops are fast and we can do this easily enough with a loop.

For example, suppose that we want to apply `sin` to `x_vec = [2.0, 4.0, 6.0, 8.0]`.

The following code will do the job

In [54]:
x_vec = [2.0, 4.0, 6.0, 8.0]
y_vec = similar(x_vec)
for (i, x) in enumerate(x_vec)
    y_vec[i] = sin(x)
end

But this is a bit unwieldy so Julia offers the alternative syntax

In [55]:
y_vec = sin.(x_vec)

4-element Array{Float64,1}:
  0.9092974268256817
 -0.7568024953079282
 -0.27941549819892586
  0.9893582466233818

More generally, if `f` is any Julia function, then `f.` references the broadcasted version.

Conveniently, this applies to user-defined functions as well.

To illustrate, let’s write a function `chisq` such that `chisq(k)` returns a chi-squared random variable with `k` degrees of freedom when `k` is an integer.

In doing this we’ll exploit the fact that, if we take `k` independent standard normals, square them all and sum, we get a chi-squared with `k` degrees of freedom.

In [56]:
function chisq(k)
    @assert k > 0
    z = randn(k)
    return sum(z -> z^2, z)  # same as `sum(x^2 for x in z)`
end

chisq (generic function with 1 method)

The macro `@assert` will check that the next expression evaluates to `true`, and will stop and display an error otherwise.

In [57]:
chisq(3)

3.2473129191559837

Note that calls with integers less than 1 will trigger an assertion failure inside
the function body.

In [58]:
chisq(-2)

LoadError: AssertionError: k > 0

Let’s try this out on an array of integers, adding the broadcast

In [59]:
chisq.([2, 4, 6])

3-element Array{Float64,1}:
 1.0495866330704124
 3.5524450417561586
 6.220485955677148

The broadcasting notation is not simply vectorization, as it is able to “fuse” multiple broadcasts together to generate efficient code.

In [60]:
x = 1.0:1.0:5.0
y = [2.0, 4.0, 5.0, 6.0, 8.0]
z = similar(y)
z .= x .+ y .- sin.(x) # generates efficient code instead of many temporaries

5-element Array{Float64,1}:
  2.1585290151921033
  5.090702573174318
  7.858879991940133
 10.756802495307928
 13.958924274663138

A convenience macro for adding broadcasting on every function call is `@.`

In [61]:
@. z = x + y - sin(x)

5-element Array{Float64,1}:
  2.1585290151921033
  5.090702573174318
  7.858879991940133
 10.756802495307928
 13.958924274663138

Since the `+, -, =` operators are functions, behind the scenes this is broadcasting against both the `x` and `y` vectors.

The compiler will fix anything which is a scalar, and otherwise iterate across every vector

In [64]:
f(a, b) = a + b # bivariate function
a = [1 2 3]
b = [4 5 6]
@show f.(a, b) # across both
@show f.(a, 2); # fix scalar for second

f.(a, b) = [5 7 9]
f.(a, 2) = [3 4 5]


The compiler is only able to detect “scalar” values in this way for a limited number of types (e.g. integers, floating points, etc) and some packages (e.g. Distributions).

For other types, you will need to wrap any scalars in `Ref` to fix them, or else it will try to broadcast the value.

Another place that you may use a `Ref` is to fix a function parameter you do not want to broadcast over.

In [66]:
f(x, y) = [1, 2, 3]' * x + y   # [1, 2, 3]' * x is the dot product of [1, 2, 3] and x
f([3, 4, 5], 2)   # uses vector as first parameter
f.(Ref([3, 4, 5]), [2, 3])   # broadcasting over 2nd parameter, fixing first

2-element Array{Int64,1}:
 28
 29
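
Another common case for `Ref` is a membership test with `in`. This is a minimal sketch; the array `allowed` is made up for illustration.

```julia
allowed = [1, 2, 3]

# Without `Ref`, broadcasting pairs elements up: 1 in 1, 5 in 2, 3 in 3
in.([1, 5, 3], allowed)          # [true, false, true]

# With `Ref`, the whole of `allowed` is treated as a single value
in.([1, 5], Ref(allowed))        # [true, false]
```

The first call only works because both arrays have the same length; the second asks the question most people actually mean, namely whether each element appears anywhere in `allowed`.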

### Why broadcast?

Recall that we had this line of code

```julia
z .= x .+ y .- sin.(x) # generates efficient code instead of many temporaries
```

In NumPy or R, the computation would happen as follows:

1. A new array for `sin_x = sin(x)` would be created
2. A new array for `x_y = x + y` would be created
3. These two new arrays could be combined to make a third new array `z = x_y - sin_x`

Suppose you had a large number of elements (e.g. `N=100_000_000`) in each of the arrays

When executing the simple looking code `z = x + y - sin(x)` three temporary arrays would need to be created, a total of 300_000_000 elements.

In Julia the line `z .= x .+ y .- sin.(x)` causes *zero allocations*, provided `z` has already been allocated

The equivalent operation is 

```julia
for i in eachindex(x)
    z[i] = x[i] + y[i] - sin(x[i])
end
```

This is both faster and more memory efficient (not to mention broadcasting's other benefits like expanding dimensions)
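
To check the claimed equivalence, the sketch below runs both the fused broadcast and the explicit loop on the small arrays used earlier and compares the results. It finishes with an example of dimension expansion. The name `z_loop` is only for illustration.

```julia
x = 1.0:1.0:5.0
y = [2.0, 4.0, 5.0, 6.0, 8.0]

# Fused broadcast writing into a preallocated array
z = similar(y)
z .= x .+ y .- sin.(x)

# The hand-written loop that the broadcast is equivalent to
z_loop = similar(y)
for i in eachindex(x)
    z_loop[i] = x[i] + y[i] - sin(x[i])
end

z == z_loop                      # true

# Broadcasting also expands dimensions: a column combined with a row gives a matrix
[1, 2, 3] .+ [10 20]             # 3×2 matrix: [11 21; 12 22; 13 23]
```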