# Data structures

Once we start working with many pieces of data at once, it will be convenient for us to store data in structures like arrays or dictionaries (rather than just relying on variables).

Types of data structures covered:
1. Tuples
2. Dictionaries
3. Arrays


As an overview, tuples and arrays are both ordered sequences of elements (so we can index into them). 

Dictionaries and arrays are both **mutable**, whereas tuples are **immutable**.

We'll explain this more below!
Arrays (in my opinion) are one of the most important structures in (Scientific) Computing.

## Tuples

We can create a tuple by enclosing an ordered collection of elements in `( )`.

Syntax: <br>
```julia
(item1, item2, ...)
```


In [1]:
myfavoriteanimals = ("penguins", "cats", "sugargliders")

("penguins", "cats", "sugargliders")

In [2]:
numbers = (1,1,2)

(1, 1, 2)

We can index into this tuple,

In [3]:
myfavoriteanimals[1]

"penguins"

but since tuples are immutable, we can't update its elements:

In [4]:
myfavoriteanimals[1] = "otters"

MethodError: no method matching setindex!(::Tuple{String, String, String}, ::String, ::Int64)

In [5]:
numbers[1]=5

MethodError: no method matching setindex!(::Tuple{Int64, Int64, Int64}, ::Int64, ::Int64)
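
Since a tuple cannot be modified in place, one common workaround is to build a new tuple from the old one. The following is a minimal sketch; `updatedanimals` is a name introduced here purely for illustration.

```julia
myfavoriteanimals = ("penguins", "cats", "sugargliders")

# Slice off everything after the first element, then splat it into a new tuple.
updatedanimals = ("otters", myfavoriteanimals[2:end]...)

println(updatedanimals)     # the new tuple starts with "otters"
println(myfavoriteanimals)  # the original tuple is unchanged
```

The original binding still refers to the same three animals, which is exactly what immutability guarantees.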

## Julia Specific: NamedTuples

As you might guess, `NamedTuple`s are just like `Tuple`s except that each element additionally has a name! They have a special syntax using `=` inside a tuple:

```julia
(name1 = item1, name2 = item2, ...)
```

In [6]:
myfavoriteanimals = (bird = "penguins", mammal = "cats", marsupial = "sugargliders")

(bird = "penguins", mammal = "cats", marsupial = "sugargliders")

Like regular `Tuples`, `NamedTuples` are ordered, so we can retrieve their elements via indexing:

In [7]:
myfavoriteanimals[1]

"penguins"

They also add the special ability to access values by their name:

In [8]:
myfavoriteanimals.bird

"penguins"

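A `NamedTuple` can also be inspected and "updated" with a few standard functions. The sketch below assumes the same three animals as above; `animals` and `updated` are illustrative names. It uses `keys`, `values`, and `merge` from Julia's standard library.

```julia
animals = (bird = "penguins", mammal = "cats", marsupial = "sugargliders")

println(keys(animals))    # field names, as a tuple of Symbols
println(values(animals))  # stored values, as a plain tuple

# NamedTuples are immutable too, so `merge` returns a new NamedTuple
# with the `bird` field replaced instead of changing `animals`.
updated = merge(animals, (bird = "puffins",))
println(updated.bird)
println(animals.bird)
```
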
## Dictionaries

If we have sets of data related to one another, we may choose to store that data in a dictionary. We can create a dictionary using the `Dict()` function, which we can initialize as an empty dictionary or one storing key, value pairs.

Syntax:
```julia
Dict(key1 => value1, key2 => value2, ...)
```

A good example is a contacts list, where we associate names with phone numbers.

In [22]:
myphonebook = Dict("Yiannis" => "0044000000", "Imperial" => "02075895111")

Dict{String, String} with 2 entries:
  "Yiannis"  => "0044000000"
  "Imperial" => "02075895111"

In this example, each name and number is a "key" and "value" pair. We can grab Yiannis's number (a value) using the associated key:

In [23]:
myphonebook["Yiannis"]

"0044000000"

We can add another entry to this dictionary as follows

In [24]:
myphonebook["Graduate School"] = "020 7589 5111"

"020 7589 5111"

Let's check what our phonebook looks like now...

In [25]:
myphonebook

Dict{String, String} with 3 entries:
  "Yiannis"         => "0044000000"
  "Graduate School" => "020 7589 5111"
  "Imperial"        => "02075895111"

We can delete Yiannis from our contact list - and simultaneously grab his number - by using `pop!`

In [27]:
pop!(myphonebook, "Yiannis")

"0044000000"

In [28]:
myphonebook

Dict{String, String} with 2 entries:
  "Graduate School" => "020 7589 5111"
  "Imperial"        => "02075895111"

Unlike tuples and arrays, dictionaries are not ordered, so we can't index into them by position.

In [29]:
myphonebook[1]

KeyError: key 1 not found

In the example above, Julia thinks you're trying to access a value associated with the key `1`.
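
To avoid a `KeyError` when a key might be missing, it can help to check first or to supply a fallback. The sketch below uses a small, illustrative `phonebook` (not the notebook's `myphonebook`) together with the standard `haskey` and `get` functions.

```julia
phonebook = Dict("Imperial" => "02075895111", "Graduate School" => "020 7589 5111")

# `haskey` checks for a key without risking a KeyError.
println(haskey(phonebook, "Yiannis"))

# `get` returns a fallback value when the key is missing.
println(get(phonebook, "Yiannis", "no number stored"))

# Iterating yields key => value pairs; the order is not guaranteed.
for (name, number) in phonebook
    println(name, " => ", number)
end
```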

## Arrays

Unlike tuples, arrays are mutable. Unlike dictionaries, arrays contain ordered collections. <br>
We can create an array by enclosing this collection in `[ ]`.

Syntax: <br>
```julia
[item1, item2, ...]
```

For an interesting piece of Julia history regarding the differences between matrices and vectors, feel free to read https://github.com/julialang/julia/issues/4774


For example, we might create an array to keep track of pokemon


In [30]:
mypokemon = ["Bulbasaur","Squirtle","Charmander","Pikachu"]

4-element Vector{String}:
 "Bulbasaur"
 "Squirtle"
 "Charmander"
 "Pikachu"

`Vector{String}` is an alias for `Array{String,1}`. The `1` means this is a one-dimensional array (a vector); an `Array{String,2}` would be a 2D matrix, and so on. `String` is the type of each element.

We might also use an array to store a sequence of numbers, or even a mixture of types:

In [31]:
fibonacci = [1, 1, 2, 3, 5, 8, 13]

7-element Vector{Int64}:
  1
  1
  2
  3
  5
  8
 13

In [32]:
mixture = [1, 1, 2, 3, "Yiannis", "Evripides"]

6-element Vector{Any}:
 1
 1
 2
 3
  "Yiannis"
  "Evripides"

Once we have an array, we can grab individual pieces of data from inside that array by indexing into the array. For example, if we want the third pokemon listed in `mypokemon`, we write

In [33]:
mypokemon[3]

"Charmander"

We can use indexing to edit an existing element of an array

In [34]:
mypokemon[3] = "Mewtwo"

"Mewtwo"

Yes, Julia uses 1-based indexing, not 0-based indexing like Python (although arrays with custom index offsets are possible; see https://docs.julialang.org/en/v1/devdocs/offset-arrays/).
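
Rather than hard-coding `1` as the first index, code can ask the array for its valid indices. Here is a minimal sketch using the standard `firstindex`, `lastindex`, and `eachindex` functions on a fresh copy of the pokemon list:

```julia
mypokemon = ["Bulbasaur", "Squirtle", "Charmander", "Pikachu"]

println(firstindex(mypokemon))  # 1 for a standard array
println(lastindex(mypokemon))   # 4, the same as length(mypokemon) here

# `eachindex` yields every valid index, whatever the array's indexing scheme.
for i in eachindex(mypokemon)
    println(i, ": ", mypokemon[i])
end
```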

We can also edit the array by using the `push!` and `pop!` functions. `push!` adds an element to the end of an array and `pop!` removes the last element of an array. 

Remember: by Julia convention, a function whose name ends in `!` modifies at least one of its arguments.

We can add another number to our Fibonacci sequence

In [35]:
push!(fibonacci, 21)

8-element Vector{Int64}:
  1
  1
  2
  3
  5
  8
 13
 21

and then remove it

In [36]:
pop!(fibonacci)

21

In [37]:
fibonacci

7-element Vector{Int64}:
  1
  1
  2
  3
  5
  8
 13
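
The `!` naming convention is not limited to `push!` and `pop!`. As an illustration (with a made-up `scores` array), the standard library offers both `sort`, which returns a new array, and `sort!`, which reorders its argument in place:

```julia
scores = [3, 1, 2]

sortedscores = sort(scores)  # returns a new sorted array
println(scores)              # still [3, 1, 2]

sort!(scores)                # sorts `scores` in place
println(scores)              # now [1, 2, 3]
```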

So far I've given examples of only 1D arrays of scalars, but arrays can have an arbitrary number of dimensions and can also store other arrays. 
<br><br>
For example, the following are arrays of arrays:

In [38]:
favorites = [["ice-cream", "chocolate", "eggs"],["penguins", "cats", "spiders"]]

2-element Vector{Vector{String}}:
 ["ice-cream", "chocolate", "eggs"]
 ["penguins", "cats", "spiders"]

In [39]:
numbers = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

3-element Vector{Vector{Int64}}:
 [1, 2, 3]
 [4, 5]
 [6, 7, 8, 9]

Below are examples of 2D and 3D arrays populated with random values.

In [40]:
rand(4, 3)

4×3 Matrix{Float64}:
 0.0407491  0.0925724  0.708855
 0.0397694  0.93413    0.236814
 0.312578   0.213933   0.787578
 0.777425   0.785905   0.325041

In [41]:
rand(4, 3, 2)

4×3×2 Array{Float64, 3}:
[:, :, 1] =
 0.0141338  0.0957528  0.317101
 0.360828   0.0508514  0.484834
 0.0286108  0.518769   0.471675
 0.0746651  0.962424   0.526508

[:, :, 2] =
 0.957841  0.771966  0.807781
 0.773477  0.304807  0.511314
 0.723581  0.467766  0.328393
 0.562373  0.658319  0.883337

In [None]:
?rand
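
To inspect the shape of a multidimensional array, the standard functions `ndims`, `size`, and `length` are useful. A short sketch follows; the names `M`, `T`, and `Z` are illustrative, and the random values will differ on every run.

```julia
M = rand(4, 3)     # 2D array (matrix)
T = rand(4, 3, 2)  # 3D array

println(ndims(M))    # number of dimensions: 2
println(size(M))     # (4, 3)
println(size(T, 3))  # extent of the third dimension: 2
println(length(T))   # total number of elements: 24

# `zeros` builds an array of a given shape filled with 0.0 instead of random values.
Z = zeros(2, 2)
println(Z)
```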

Be careful when you want to copy arrays!

In [42]:
fibonacci

7-element Vector{Int64}:
  1
  1
  2
  3
  5
  8
 13

In [43]:
somenumbers = fibonacci

7-element Vector{Int64}:
  1
  1
  2
  3
  5
  8
 13

In [44]:
somenumbers[1] = 404

404

In [45]:
fibonacci

7-element Vector{Int64}:
 404
   1
   2
   3
   5
   8
  13

Editing `somenumbers` caused `fibonacci` to get updated as well!

In the above example, we didn't actually make a copy of `fibonacci`. We just created a new way to access the entries in the array bound to `fibonacci`.

If we'd like to make a copy of the array bound to `fibonacci`, we can use the `copy` function.

In [46]:
# First, restore fibonacci
fibonacci[1] = 1
fibonacci

7-element Vector{Int64}:
  1
  1
  2
  3
  5
  8
 13

In [47]:
somemorenumbers = copy(fibonacci)

7-element Vector{Int64}:
  1
  1
  2
  3
  5
  8
 13

In [48]:
somemorenumbers[1] = 404

404

In [49]:
fibonacci

7-element Vector{Int64}:
  1
  1
  2
  3
  5
  8
 13

In this last example, `fibonacci` was not updated. Therefore, we see that the arrays bound to `somemorenumbers` and `fibonacci` are distinct.
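
The difference between sharing and copying can be checked directly with `===`, which tests whether two names refer to the very same object. The sketch below also shows that `copy` is shallow: for an array of arrays, the inner arrays are still shared, and `deepcopy` is needed for full independence. The names `nested`, `shallow`, and `deep` are illustrative.

```julia
fibonacci = [1, 1, 2, 3, 5, 8, 13]
somenumbers = fibonacci            # same array, second name
somemorenumbers = copy(fibonacci)  # new array with equal contents

println(somenumbers === fibonacci)      # true: identical object
println(somemorenumbers === fibonacci)  # false: a distinct array
println(somemorenumbers == fibonacci)   # true: equal element by element

# `copy` is shallow: the inner arrays of a nested array are still shared.
nested = [[1, 2, 3], [4, 5]]
shallow = copy(nested)
deep = deepcopy(nested)
shallow[1][1] = 404   # also visible through `nested`
deep[2][1] = 999      # not visible through `nested`
println(nested)
```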

## Vectors and Matrices
One-dimensional arrays are vectors. They support indexing by position, by range, and with `end`:

In [50]:
squares = [1, 4, 9, 16, 25, 36, 49, 64,81]

9-element Vector{Int64}:
  1
  4
  9
 16
 25
 36
 49
 64
 81

In [52]:
squares[1]

1

In [53]:
squares[1:3]

3-element Vector{Int64}:
 1
 4
 9

In [54]:
squares[end]

81

In [55]:
squares[-1]

BoundsError: attempt to access 9-element Vector{Int64} at index [-1]
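
Python-style negative indices are not available, but arithmetic on `end` covers the same use cases. A minimal sketch on a fresh `squares` vector:

```julia
squares = [1, 4, 9, 16, 25, 36, 49, 64, 81]

println(squares[end - 1])      # second-to-last element: 64
println(squares[end-2:end])    # last three elements: [49, 64, 81]
println(squares[1:2:end])      # every other element: [1, 9, 25, 49, 81]
```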

The code below gives us a first glimpse of Julia's powerful **type** system:

In [56]:
typeof(squares) 

Vector{Int64} (alias for Array{Int64, 1})

## Concatenation
If, instead of commas, we use spaces, then the values are concatenated horizontally. Compare `cubes` (commas) with `cubes2` (spaces):

In [57]:
cubes = [1, 8, 27, 64, 125, 216, 343, 512]

8-element Vector{Int64}:
   1
   8
  27
  64
 125
 216
 343
 512

In [58]:
cubes2 = [1 8 27]

1×3 Matrix{Int64}:
 1  8  27

In [59]:
pop!(squares)

81

In [60]:
squares

8-element Vector{Int64}:
  1
  4
  9
 16
 25
 36
 49
 64

In [71]:
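a = [1:3 4:6]   # spaces: horizontal concatenation, giving a 3×2 Matrix
b = [1:3; 4:6]  # semicolon: vertical concatenation, giving the 6-element Vector shown below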

6-element Vector{Int64}:
 1
 2
 3
 4
 5
 6

In [61]:
powers = [1:8 squares cubes]

8×3 Matrix{Int64}:
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512

In [63]:
powers[4, 2]

16

In [64]:
powers[:, 3]

8-element Vector{Int64}:
   1
   8
  27
  64
 125
 216
 343
 512

In [65]:
powers[7, :]

3-element Vector{Int64}:
   7
  49
 343

In [66]:
typeof(powers)

Matrix{Int64} (alias for Array{Int64, 2})

Semicolon separators perform vertical concatenation:

In [67]:
[squares; cubes]

16-element Vector{Int64}:
   1
   4
   9
  16
  25
  36
  49
  64
   1
   8
  27
  64
 125
 216
 343
 512

Whereas commas would simply create an array of arrays:

In [72]:
# with commas, each vector is kept as a separate element (no concatenation)
nested_powers = [[1,2,3,4,5,6,7,8], squares, cubes]

3-element Vector{Vector{Int64}}:
 [1, 2, 3, 4, 5, 6, 7, 8]
 [1, 4, 9, 16, 25, 36, 49, 64]
 [1, 8, 27, 64, 125, 216, 343, 512]

In [73]:
nested_powers[2]

8-element Vector{Int64}:
  1
  4
  9
 16
 25
 36
 49
 64

Horizontal and vertical concatenation can be used together as a simple syntax for matrix literals:

In [74]:
[1 3 5; 2 4 6]

2×3 Matrix{Int64}:
 1  3  5
 2  4  6
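
The bracket syntax has function equivalents in the standard library: `hcat` for horizontal and `vcat` for vertical concatenation. The sketch below checks that the two forms agree; `a`, `b`, and `m` are illustrative names.

```julia
a = hcat(1:3, 4:6)   # same as [1:3 4:6]
b = vcat(1:3, 4:6)   # same as [1:3; 4:6]

println(size(a))  # (3, 2)
println(size(b))  # (6,)

println(a == [1:3 4:6])
println(b == [1:3; 4:6])

# A matrix literal combines both: spaces within a row, semicolons between rows.
m = [1 3 5; 2 4 6]
println(m == vcat([1 3 5], [2 4 6]))
```

The function forms can be handy when the pieces to concatenate are computed elsewhere rather than written out literally.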

## for loops

The syntax for a `for` loop is

```julia
for *var* in *loop iterable*
    *loop body*
end
```

## while loops

The syntax for a `while` loop is

```julia
while *condition*
    *loop body*
end
```
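
To make the two templates above concrete, here is a small sketch of each. The helper function `countup` is hypothetical and exists only so that the loop counter is a local variable, which keeps the example runnable both in a notebook and as a script.

```julia
# A `for` loop visits each element of an iterable in turn.
for animal in ["penguins", "cats", "sugargliders"]
    println(animal)
end

# A `while` loop repeats as long as its condition holds.
function countup(limit)
    n = 0
    visited = Int[]
    while n < limit
        n += 1
        push!(visited, n)
    end
    return visited
end

println(countup(3))  # [1, 2, 3]
```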



Be careful of 1-based indexing and inclusive ranges! A Julia range such as `1:10` includes both endpoints. What do we think the following will produce?

In [75]:
for i in 1:10 # Python's range(1, 10) would stop at 9
    println(i)
end

1
2
3
4
5
6
7
8
9
10


We could use a for loop to generate the same `powers` matrix that we built by concatenation above:

In [76]:
A = fill(0, (8, 3)) # Allocate an 8x3 matrix to store the values into
for pow in 1:3
    for value in 1:8
        A[value, pow] = value ^ pow
    end
end
A

8×3 Matrix{Int64}:
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512

In [77]:
A == powers

true


## Array Comprehensions

In [78]:
squares = [value^2 for value in 1:8]

8-element Vector{Int64}:
  1
  4
  9
 16
 25
 36
 49
 64

In [79]:
cubes = [value^3 for value in 1:8]

8-element Vector{Int64}:
   1
   8
  27
  64
 125
 216
 343
 512

In [80]:
powers = [value^pow for value in 1:8, pow in 1:3]

8×3 Matrix{Int64}:
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
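
A comprehension is essentially a compact loop that collects its results into an array. The sketch below builds the same `squares` vector both ways and then adds an optional `if` clause as a filter; `squares_loop` and `evensquares` are names introduced for illustration.

```julia
# Comprehension version.
squares = [value^2 for value in 1:8]

# Equivalent explicit loop.
squares_loop = Int[]
for value in 1:8
    push!(squares_loop, value^2)
end
println(squares == squares_loop)

# An optional `if` clause filters the values being iterated over.
evensquares = [value^2 for value in 1:8 if iseven(value)]
println(evensquares)  # [4, 16, 36, 64]
```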

## The element type

Note that every time an array prints out, it displays its element type and dimensionality, for example `Matrix{Int64}` (an alias for `Array{Int64, 2}`). This describes what it can store, and thus what it can return upon indexing.

In [82]:
powers

8×3 Matrix{Int64}:
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512

In [81]:
typeof(powers)

Matrix{Int64} (alias for Array{Int64, 2})

In [83]:
typeof(powers[1, 1])

Int64

Further, the array will try to convert any new values assigned into it to its element type:

In [84]:
powers[1, 1] = 1.6

InexactError: Int64(1.6)

In [85]:
powers[1, 1] = -5.0 # This can be losslessly converted to an integer

-5.0

In [86]:
powers

8×3 Matrix{Int64}:
 -5   1    1
  2   4    8
  3   9   27
  4  16   64
  5  25  125
  6  36  216
  7  49  343
  8  64  512
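
When a non-integer value really does need to go into an integer array, one option is to round it explicitly; another is to choose a floating-point element type from the start. A minimal sketch, with illustrative arrays `ints` and `floats`:

```julia
ints = [1, 2, 3]
println(eltype(ints))  # Int64 on a 64-bit machine

# Assigning 1.6 directly would throw an InexactError, so round explicitly first.
ints[1] = round(Int, 1.6)
println(ints)          # [2, 2, 3]

# Alternatively, request a floating-point element type up front.
floats = Float64[1, 2, 3]
floats[1] = 1.6
println(floats)        # [1.6, 2.0, 3.0]
```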

Arrays that have an exact and concrete element type are generally significantly faster, so Julia will try to find an amenable element type for you in its literal construction syntax:

In [87]:
fortytwosarray = [42, 42.0, 4.20e1, 4.20f1, 84//2, 0x2a]

6-element Vector{Float64}:
 42.0
 42.0
 42.0
 42.0
 42.0
 42.0

In [88]:
for x in fortytwosarray
    show(x)
    println("\tisa $(typeof(x))")
end

42.0	isa Float64
42.0	isa Float64
42.0	isa Float64
42.0	isa Float64
42.0	isa Float64
42.0	isa Float64


The `Any` array can be helpful for disabling these behaviors and allowing all kinds of different objects:

In [90]:
anyfortytwos = Any[42, 42.0, 4.20e1, 4.20f1, 84//2, 0x2a]

6-element Vector{Any}:
   42
   42.0
   42.0
   42.0f0
  42//1
 0x2a

In [91]:
anyfortytwos[1] = "FORTY TWO"
anyfortytwos

6-element Vector{Any}:
     "FORTY TWO"
   42.0
   42.0
   42.0f0
  42//1
 0x2a

### Exercises

#### 4.1 
Create an array, `a_ray`, with the following code:

```julia
a_ray = [1, 2, 3]
```

Add the number `4` to the end of this array and then remove it.

In [112]:
a_ray = [1, 2, 3]
push!(a_ray, 4)
pop!(a_ray)

4

In [113]:
@assert a_ray == [1, 2, 3]

#### 4.2 
Try to add "Emergency" as a key to `myphonebook` with the integer value `911` using the following code
```julia
myphonebook["Emergency"] = 911
```

Why doesn't this work?

In [114]:
myphonebook

Dict{String, String} with 3 entries:
  "Emergency"       => "911"
  "Graduate School" => "020 7589 5111"
  "Imperial"        => "02075895111"

In [102]:
myphonebook["Emergency"] = "911"

"911"

#### 4.3 
Create a new dictionary called `flexible_phonebook` that has Jenny's number stored as an integer and Ghostbusters' number stored as a string with the following code

```julia
flexible_phonebook = Dict("Jenny" => 8675309, "Ghostbusters" => "555-2368")
```

In [103]:
flexible_phonebook = Dict("Jenny" => 8675309, "Ghostbusters" => "555-2368")

Dict{String, Any} with 2 entries:
  "Jenny"        => 8675309
  "Ghostbusters" => "555-2368"

In [104]:
@assert flexible_phonebook == Dict("Jenny" => 8675309, "Ghostbusters" => "555-2368")

#### 4.4 
Add the key "Emergency" with the value `911` (an integer) to `flexible_phonebook`.

In [106]:
flexible_phonebook["Emergency"] = 911

911

In [107]:
@assert haskey(flexible_phonebook, "Emergency")

In [108]:
@assert flexible_phonebook["Emergency"] == 911

#### 4.5 
Why can we add an integer as a value to `flexible_phonebook` but not `myphonebook`? How could we have initialized `myphonebook` so that it would accept integers as values? (hint: try using [Julia's documentation for dictionaries](https://docs.julialang.org/en/v1/base/collections/#Dictionaries))
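
One possible answer is sketched below, assuming the goal is simply to accept values of any type. The dictionaries `anyphonebook` and `strictphonebook` are hypothetical names used only for this illustration.

```julia
# Declaring the value type as `Any` keeps the keys as Strings
# but allows values of any type.
anyphonebook = Dict{String, Any}("Yiannis" => "0044000000", "Imperial" => "02075895111")
anyphonebook["Emergency"] = 911   # an integer is now accepted
println(typeof(anyphonebook))

# With the inferred type Dict{String, String}, the same assignment fails,
# because an integer cannot be converted to a String automatically.
strictphonebook = Dict("Imperial" => "02075895111")
failed = try
    strictphonebook["Emergency"] = 911
    false
catch
    true
end
println(failed)
```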

#### 4.6 
Use an array comprehension to create an array `squares_arr` that stores the squares for all integers between 1 and 100.

In [115]:
squares_arr = [x^2 for x in 1:100]

100-element Vector{Int64}:
     1
     4
     9
    16
    25
    36
    49
    64
    81
   100
     ⋮
  8464
  8649
  8836
  9025
  9216
  9409
  9604
  9801
 10000

In [110]:
@assert length(squares_arr) == 100
@assert sum(squares_arr) == 338350