# An Introduction to Basic Julia Functionality 

# What is a vector?
   **A vector is a one-dimensional array.**  It is like one row or column in a spreadsheet, and it can hold any kind of data.

How do we build one?

In [2]:
[1 2 3 4]  #use spaces to separate elements of a row vector

1×4 Array{Int64,2}:
 1  2  3  4

In [3]:
[1,2,3,4]  #use commas to separate elements of a column vector

4-element Array{Int64,1}:
 1
 2
 3
 4

In [4]:
 ["dog", "cat", "bird", "mouse"]

4-element Array{String,1}:
 "dog"  
 "cat"  
 "bird" 
 "mouse"

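To declare a vector in Julia, use square brackets [  ]

**A ROW vector has items separated by spaces.** Strictly speaking, Julia stores it as a 1×4 matrix (a two-dimensional array), as the first output above shows.

**A COLUMN vector has items separated by commas.** This is a true one-dimensional array.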
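One way to see the difference between the two layouts is to ask Julia for their shapes. The short sketch below uses the built-in `size` and `ndims` functions; the variable names are made up for this illustration.

```julia
# Hypothetical names, chosen only for this illustration
rowVector = [1 2 3 4]        # spaces: one row, four columns
columnVector = [1, 2, 3, 4]  # commas: a one-dimensional vector

println(size(rowVector))      # (1, 4)
println(size(columnVector))   # (4,)
println(ndims(rowVector))     # 2
println(ndims(columnVector))  # 1
```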


Let's say that you build a vector full of animals:

animalsVector =  ["dog", "cat", "bird", "mouse"]

Notice: by giving this vector a name, we have saved it in the computer's memory.

If I want to take out just the bird, I have to use INDEXING.
Each entry of the vector is labeled with the number of its position, aka its index. Julia starts counting at 1, so the first entry has index 1.
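Here is a small sketch of indexing on a different vector, so the exercise below is still yours to solve. The vector `colorsVector` is a made-up example; `end` and `length` are built into Julia.

```julia
colorsVector = ["red", "green", "blue"]  # hypothetical example vector

println(colorsVector[1])       # red   (the first entry has index 1)
println(colorsVector[end])     # blue  (end always means the last position)
println(length(colorsVector))  # 3
```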

In [5]:
animalsVector =  ["dog", "cat", "bird", "mouse"]
#let's find the cat!
#you can index into a vector by putting the position inside square brackets

#Your turn! Enter the index that will return "cat" as an output, then push Shift + Enter to execute this code block
animalsVector[]

"cat"

A word of warning! Indexing out of bounds will cause an error!

In [6]:
vectorFourLong = [1 2 3 4]
vectorFourLong[5]

BoundsError: BoundsError: attempt to access 1×4 Array{Int64,2} at index [5]
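If you are not sure whether a position exists, you can ask before indexing. This is only a sketch of one possible approach: it uses the built-in `length` and `checkbounds` functions on the same vector as above.

```julia
vectorFourLong = [1 2 3 4]

println(length(vectorFourLong))                # 4
println(checkbounds(Bool, vectorFourLong, 4))  # true:  position 4 exists
println(checkbounds(Bool, vectorFourLong, 5))  # false: position 5 does not
```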

# What is a Matrix?
   **A matrix is a two-dimensional array.**  It is like a bunch of vectors stacked together, like a spreadsheet.

Matrices are built just like vectors.
Spaces separate values in a row, and semicolons separate rows.

In [7]:
["a" "b" "c" "d"; "e" "f" "g" "h"]

2×4 Array{String,2}:
 "a"  "b"  "c"  "d"
 "e"  "f"  "g"  "h"

Indexing into matrices works just like indexing into vectors, but you now need two coordinates.

**Index with the format [row, column]**

In [16]:
myMatrix = ["a" "b" "c" "d"; "e" "f" "g" "h"]
myMatrix[2,3]

"g"

You can also slice a matrix into rows or columns by using the : operator during indexing

In [17]:
myMatrix = ["a" "b" "c" "d"; "e" "f" "g" "h"]
myMatrix[2,:]
#this operation will extract all of row 2

4-element Array{String,1}:
 "e"
 "f"
 "g"
 "h"

In [None]:
#Your turn! Index into myMatrix so that the output is "b" "f" (you can do this with one statement)
myMatrix[ , ]
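Slicing also accepts ranges, so you can pull out just part of a row. The sketch below uses a made-up numeric matrix, `numberMatrix`, so it does not give away the exercise above.

```julia
numberMatrix = [1 2 3 4; 5 6 7 8]  # hypothetical 2×4 matrix

println(numberMatrix[1, :])      # [1, 2, 3, 4]  all of row 1
println(numberMatrix[1, 2:3])    # [2, 3]        row 1, columns 2 through 3
println(numberMatrix[2, 3:end])  # [7, 8]        row 2, column 3 to the last column
```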

# Other ways to build an array/matrix

Use the **rand()** function to create a matrix of a desired size containing values within a given range

In [15]:
rand(0:10, 4, 3)
# the values are integers drawn from 0 through 10, and the matrix has 4 rows and 3 columns

4×3 Array{Int64,2}:
 6  10  0
 8   4  5
 3   2  3
 1  10  7
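As a rough sketch of two other common ways to call `rand`: you can ask for a vector instead of a matrix, or leave out the range to get decimal numbers between 0 and 1. The variable names are made up, and the values will differ every time you run it, so only the shapes are shown.

```julia
dieRolls = rand(1:6, 5)      # five integers, each drawn from 1 through 6
randomFloats = rand(2, 3)    # 2×3 matrix of Float64 values in [0, 1)

println(size(dieRolls))      # (5,)
println(size(randomFloats))  # (2, 3)
```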

Build now, fill later with **Array{T}(undef, dims)**, where T is the element type and dims is the dimensions.

Note: undef leaves the array uninitialized, so for plain number types like Int64 the matrix shows whatever junk values happened to be in that memory.

In [18]:
Array{Int64}(undef, 2, 3)

2×3 Array{Int64,2}:
 368363696  368363728  158578320
 156449200  157065840  163143984

In [6]:
#You can put the correct values in by indexing
junkArray = Array{Int64}(undef, 2, 3)

#Your Turn!  Make this 2x3 matrix contain the first 6 even numbers, in order.  2 is done for you
junkArray[1, 1] = 2;
junkArray[ , ] = ;
junkArray[ , ] = ;
junkArray[ , ] = ;
junkArray[ , ] = ;
junkArray[ , ] = ;
junkArray #this statement will show you the matrix

2×3 Array{Int64,2}:
 344327936         18  344327952
         6  140148768         45

There are some built-ins for quick matrix building:

**zeros(T, dims)**

**ones(T, dims)**

These functions build a matrix filled with zeros or ones, respectively, with the specified element type (T) and dimensions.
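A quick sketch of both built-ins in action. The variable names are made up for illustration; leaving out the type, as in the last line, gives Float64 values.

```julia
zeroMatrix = zeros(Int64, 2, 3)   # 2×3 matrix of integer zeros
oneMatrix = ones(Float64, 2, 2)   # 2×2 matrix of floating-point ones
defaultZeros = zeros(3)           # no type given, so Julia uses Float64

println(zeroMatrix)            # [0 0 0; 0 0 0]
println(oneMatrix)             # [1.0 1.0; 1.0 1.0]
println(eltype(defaultZeros))  # Float64
```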

# When is this useful?
# I WANT THE STUDENTS TO HAVE TO FILL THIS OUT!

Numeric data in Julia is often stored in matrices, so it is helpful to know how to manipulate that data.

The method shown below for getting data from a public link on the internet will work for any file with the extension .csv

In [18]:
#first, we are going to download some data from the internet, in CSV (comma-separated values) format
P = download("https://raw.githubusercontent.com/kjbiener/introToJulia/master/juilaIntroData.csv","fruitConsumption.csv")
#we have to tell Julia to use the CSV package before we can read the data
using CSV
#note: newer versions of CSV.jl require a second argument naming the output type,
#e.g. `using DataFrames` followed by CSV.read("fruitConsumption.csv", DataFrame)
data = CSV.read("fruitConsumption.csv")

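| Row | People | personA | personB | personC | personD |
|-----|--------|---------|---------|---------|---------|
|     | String | Int64 | Int64 | Int64 | Int64 |
| 1 | Banana | 2 | 4 | 6 | 2 |
| 2 | Pear | 4 | 6 | 7 | 5 |
| 3 | Lemon | 6 | 2 | 3 | 7 |
| 4 | Pineapple | 7 | 9 | 7 | 3 |
| 5 | Orange | 12 | 1 | 9 | 2 |
| 6 | Strawberry | 4 | 1 | 14 | 0 |
| 7 | Apple | 9 | 0 | 0 | 1 |
| 8 | Mango | 0 | 3 | 2 | 7 |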


Great! We now have the data loaded into a structure called a **DataFrame** in Julia.  DataFrames are used in the initial read because they can hold multiple types of data at once, like strings and Int64 in this case.  Regular matrices are meant to hold one type of data.

The current DataFrame format cannot interact with Julia's statistics built-ins.  To convert it to a matrix, simply use the **Matrix()** function.  

Careful!  We should not include the strings in the matrix, because they are of a different type.  Be sure to exclude that column when you index!

In [18]:
#Your Turn!  Cut out the parts of the DataFrame that contain numbers. 
dataNumeric = Matrix(data[___,____])
#Hint:  Use range notation to index
#data[startrow:endrow, startcolumn:endcolumn]

# the output should be an 8x4 Array{Int64, 2}

8×4 Array{Int64,2}:
  2  4   6  2
  4  6   7  5
  6  2   3  7
  7  9   7  3
 12  1   9  2
  4  1  14  0
  9  0   0  1
  0  3   2  7

Now that we have the raw numeric data, let's find out some things about it.  

In [24]:
dataNumeric = Matrix(data[:, 2:end])
#How much total fruit is consumed by each person?
#the sum() function will take all of the values in the matrix, and sum it up for you, 
#just specify the dimension you are summing over
sum(dataNumeric, dims=1)
#What happens if we call sum without specifying the dimensions?
#What does dims=1 mean?  dims=2?  Is this data useful?

1×4 Array{Int64,2}:
 44  26  48  27

We can also run other functions like **mean(), maximum(), minimum(), var()** and **std()**.
Again, if we specify a dimension with the dims keyword (for example, dims=1), the function will evaluate only along that dimension.

In [32]:
# we have to use the statistics package
using Statistics
#example of standard deviation calculation
std(dataNumeric)

3.6009351384005437
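To get a feel for the dims keyword with these functions, here is a sketch on a small made-up matrix, `scores`, where the answers are easy to check by hand. `vec` is a built-in that flattens the one-column result into a plain vector for printing.

```julia
using Statistics

scores = [1 2; 3 4; 5 6]  # hypothetical 3×2 matrix

println(mean(scores))                  # 3.5        one number for the whole matrix
println(mean(scores, dims=1))          # [3.0 4.0]  one mean per column
println(vec(maximum(scores, dims=2)))  # [2, 4, 6]  one maximum per row
println(std(scores, dims=1))           # [2.0 2.0]  one standard deviation per column
```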

# Iterators and Conditional Operations
# aka . . . For and If statements

FOR loops:  A chunk of code that can be repeated a certain number of times

The iterator variable, i, will successively take each value in the specified range (1 through 4) as the loop repeats.

In [38]:
#here is an example of a For Loop
for i = 1:4
    println("I ran this loop ", i, " times")
end

I ran this loop 1 times
I ran this loop 2 times
I ran this loop 3 times
I ran this loop 4 times
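A for loop does not have to count; it can also walk straight through the entries of a vector. The sketch below shows both styles on a made-up vector, `animals`, using the built-in `eachindex` to get the valid positions.

```julia
animals = ["dog", "cat", "bird"]  # hypothetical example vector

# Loop over the entries themselves
for animal in animals
    println("I see a ", animal)
end

# Loop over the positions, then index to get each entry
for i in eachindex(animals)
    println(i, ": ", animals[i])
end
```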


In [25]:
#here is an example of an If statement
x = 4
if x < 10
    println("smaller than 10")
end

smaller than 10
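An if statement can also say what to do when the test fails, using `elseif` and `else`. This is a minimal sketch with a made-up variable, `message`, that stores the result before printing it.

```julia
x = 12
if x < 10
    message = "smaller than 10"
elseif x == 10
    message = "exactly 10"
else
    message = "bigger than 10"
end
println(message)  # bigger than 10
```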


Notice that both of these chunks of code have an **end** statement.  This syntax is crucial; without it, the code will not work.

Loops and if statements allow the computer to make decisions about a bunch of numbers over and over again.  Instead of writing the same code enough times to check every value in an array, one For loop can do it for us. 
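As a small sketch of the idea, the loop below checks every value in a made-up vector, `numbers`, and counts how many are bigger than 10. The `global` keyword is only needed when this runs as a script file; inside a notebook cell it can be left out.

```julia
numbers = [3, 12, 7, 15, 1]  # hypothetical data
countBig = 0

for i in 1:length(numbers)
    if numbers[i] > 10
        # global is required in a script file, optional in a notebook
        global countBig += 1
    end
end

println(countBig, " values are bigger than 10")  # 2 values are bigger than 10
```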

Let's say that we want to figure out which person ate the most strawberries this week.  

How can we use indexing, and a for loop, to figure out who it was?

In [26]:
#first, we need to get the strawberry row out

#Your Turn!! Which row is the strawberry row?
sRow = dataNumeric[____,:]
println(sRow)  #this will help you double check that you got the right row

#now let's initialize our maximum strawberry number to be zero, and the index to be zero as well
#This way, once we find someone who ate more than zero strawberries, we will know that the maximum can't be zero anymore
maxStrawberry = 0
indexMost = 0

# now for every value in the strawberry row, we need to examine that value

for i= ____:____ #Enter the range of possible indexes in the strawberry Row
    
    #is it the biggest?
    if sRow[i] > maxStrawberry
        #we found a bigger value!  Let's update the maximum that we have, and remember the location it was in
        maxStrawberry = _____ #put in the new biggest value
        indexMost = _____  #put in the location(index) where we found that value
    end
    #if it wasn't the biggest, we just move on to the next index in the strawberry Row
end

#Now that we have found the maximum by checking every value, let's print it out, and also figure out which person it was

#the names of the test subjects were:
testPeople = ["personA", "personB", "personC", "personD"]
biggestStrawberryFan = testPeople[indexMost]
#since the order of this vector of names is the same, we can use the index to find out who ate the most

#let's print out our information!
println(biggestStrawberryFan, " ate ", maxStrawberry, " strawberries, which was the most in the study.")
    

[4, 1, 14, 0]
personC ate 14 strawberries, which was the most in the study.
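Once your loop works, you can double check it against Julia's built-in `findmax`, which returns the largest value together with its index. This is just a cross-check, not a replacement for the exercise; the row is typed in by hand from the printed output above.

```julia
sRow = [4, 1, 14, 0]  # the strawberry row, copied from the output above
testPeople = ["personA", "personB", "personC", "personD"]

# findmax returns a pair: (largest value, index of that value)
maxStrawberry, indexMost = findmax(sRow)
println(testPeople[indexMost], " ate ", maxStrawberry, " strawberries")  # personC ate 14 strawberries
```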
