# Julia test notebook

Author: Matthew K. MacLeod

### Tutorial goals

test the following in Julia:
 * basics
 * linear algebra
 * statistics
 * visualization

## Background

Julia has some obvious benefits:
* speed
* dynamic nature (yet strongly typed)
* statistical tools of R, linear algebra of MATLAB, the feel of Python...
* most of source code written in Julia itself
* coroutines, workers & the actor model
* macros - homoiconicity & metaprogramming

some of my concerns:
* mutability 
* list comprehensions lack conditionals
* lack of tail call optimization
* not yet >= 1.0

Julia is an interesting language which I hope gets more attention.

In [None]:
# these notes work with version 0.5.0-dev+876
VERSION

## Introduction to Julia

we'll go through some of the basic collection types:

* array
* tuple
* dictionary 
* set
* others

#### Array

these are mutable homogeneous collections

In [None]:
collect(1:10)

In [None]:
# note: a bare range is a different type from the Array that collect returns,
# which can cause problems where an Array is expected
(1:10)

In [None]:
div(10,3)

In [None]:
10 % 3

In [None]:
a = collect(1:10)
# showcase enumerate
for (i,value) in enumerate(a)
    println("element $i squared is $(value^2)")
end

#### Tuples

some notes:
* tuples are heterogeneous immutable containers
* strings are also immutable.

In [None]:
function doubler(a,b)
    return (a+a, b+b)
end
# unpack results into x and y
t = x, y = doubler(1.0,2)
println(t)
println(x)
println(y)

#### Sets

In [None]:
text = "Thy self thy foe to thy sweet self too cruel"
text_array = split(lowercase(text), ' ')
ts = Set(text_array)
unique_text_array = [i for i in ts]
unique_text = join(unique_text_array, " ")

In [None]:
another_line = "Within thine own bud buriest thy content"
text_array_2 = split(lowercase(another_line), ' ')
ts_2 = Set(text_array_2)
println("Union: ", union(ts,ts_2),"\n")
println("Intersection: ", intersect(ts,ts_2), "\n")
println("Set difference: ", setdiff(ts,ts_2), "\n")

#### Dictionary

In [None]:
counts = Dict{AbstractString,Int}()
for t in text_array
    # start from 0 when the word has not been seen yet
    counts[t] = get(counts, t, 0) + 1
end
   
for (k,v) in counts
    println("key: ",k,"\t value: ",v)
end

#### Regex

In [None]:
# let's match simple emoticons, smiles or frowns
text = "so happy you made it :)"
emoji_pattern = r"(:[)(])+"
m = match(emoji_pattern,text)
println(m.match)

#### Priority Queues

these can be handy

https://en.wikipedia.org/wiki/Priority_queue
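
a minimal sketch of the idea, assuming Julia 1.x (`popfirst!` was called `shift!` in this notebook's version). The helpers `pq_push!` and `pq_pop!` are hypothetical names, not a library API: they keep a plain vector of (priority, item) tuples sorted so the lowest priority value is always first. Insertion is linear time here; a heap-based queue from a library would be the usual choice for real work.

```julia
# Hypothetical helpers: a priority queue stored as a vector of
# (priority, item) tuples kept sorted by priority (lowest first).
function pq_push!(queue, priority, item)
    # first position whose priority is >= the new priority
    idx = searchsortedfirst(queue, (priority, item), by = first)
    insert!(queue, idx, (priority, item))
    return queue
end

# remove and return the item with the lowest priority value
pq_pop!(queue) = popfirst!(queue)[2]

queue = Tuple{Int,String}[]
pq_push!(queue, 3, "write report")
pq_push!(queue, 1, "fix bug")
pq_push!(queue, 2, "review patch")

first_task = pq_pop!(queue)
second_task = pq_pop!(queue)
println(first_task)
println(second_task)
```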


#### Data Frames

In [9]:
using DataFrames

In [21]:
max_row = 10
df = DataFrame()
df[:x] = 1:max_row
df[:square] = [i^2 for i in 1:max_row]
df[:cube] = [i^3 for i in 1:max_row]
df[:exponential] = [exp(i) for i in 1:max_row]
df[:log] = [log(i) for i in 1:max_row]
df[:less_than_2] = [i < 2 for i in 1:max_row]
show(df)

10x6 DataFrames.DataFrame
| Row | x  | square | cube | exponential | log      | less_than_2 |
|-----|----|--------|------|-------------|----------|-------------|
| 1   | 1  | 1      | 1    | 2.71828     | 0.0      | true        |
| 2   | 2  | 4      | 8    | 7.38906     | 0.693147 | false       |
| 3   | 3  | 9      | 27   | 20.0855     | 1.09861  | false       |
| 4   | 4  | 16     | 64   | 54.5982     | 1.38629  | false       |
| 5   | 5  | 25     | 125  | 148.413     | 1.60944  | false       |
| 6   | 6  | 36     | 216  | 403.429     | 1.79176  | false       |
| 7   | 7  | 49     | 343  | 1096.63     | 1.94591  | false       |
| 8   | 8  | 64     | 512  | 2980.96     | 2.07944  | false       |
| 9   | 9  | 81     | 729  | 8103.08     | 2.19722  | false       |
| 10  | 10 | 100    | 1000 | 22026.5     | 2.30259  | false       |

In [22]:
describe(df[1])

Summary Stats:
Mean:         5.500000
Minimum:      1.000000
1st Quartile: 3.250000
Median:       5.500000
3rd Quartile: 7.750000
Maximum:      10.000000


In [23]:
show(df[:x])

[1,2,3,4,5,6,7,8,9,10]

In [24]:
head(df)

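| Row | x | square | cube | exponential        | log                | less_than_2 |
|-----|---|--------|------|--------------------|--------------------|-------------|
| 1   | 1 | 1      | 1    | 2.718281828459045  | 0.0                | true        |
| 2   | 2 | 4      | 8    | 7.38905609893065   | 0.6931471805599453 | false       |
| 3   | 3 | 9      | 27   | 20.085536923187668 | 1.0986122886681096 | false       |
| 4   | 4 | 16     | 64   | 54.59815003314424  | 1.3862943611198906 | false       |
| 5   | 5 | 25     | 125  | 148.4131591025766  | 1.6094379124341005 | false       |
| 6   | 6 | 36     | 216  | 403.4287934927351  | 1.791759469228055  | false       |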


In [25]:
mean(df[:x])

5.5

#### Custom Types

to create a custom type, use 
    
    type
or 
    
    immutable

In [1]:
immutable Flip
    v::Int8
end

In [2]:
a = Flip(rand([0,1]))
a.v

0

In [5]:
a

Flip(0)

In [4]:
b = Flip(rand([0,1]))
b.v

1

In [6]:
b

Flip(1)
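
for readers on newer versions: from Julia 0.6 on, `immutable` is spelled `struct` and `type` is spelled `mutable struct`. The sketch below assumes Julia 1.x; `Coin` and `Counter` are hypothetical names chosen so they do not clash with `Flip` above.

```julia
# Julia 0.6+ spelling of `immutable ... end`
struct Coin
    v::Int8
end

# Julia 0.6+ spelling of `type ... end`: fields can be reassigned
mutable struct Counter
    n::Int
end

c = Counter(0)
c.n += 1                      # allowed, Counter is mutable

coin = Coin(rand([0, 1]))

# assigning to a field of an immutable value throws an error
immutable_error = try
    coin.v = 1
    false
catch
    true
end

println(c.n)
println(immutable_error)
```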

#### Macros
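
a small sketch of what macros and homoiconicity look like in practice. The macro name `@twice` is hypothetical and only for illustration: it receives the expression it is given as data and returns new code that runs it two times.

```julia
# Hypothetical macro: evaluate an expression twice in a row.
macro twice(ex)
    # esc() makes the expression run in the caller's scope
    return quote
        $(esc(ex))
        $(esc(ex))
    end
end

counter = Ref(0)        # mutable cell holding an Int
@twice counter[] += 1
println(counter[])

# homoiconicity: code is data, an Expr that can be inspected and evaluated
ex = :(1 + 2 * 3)
println(ex.head)
println(ex.args)
println(eval(ex))
```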

### functional 

In [None]:
# list comps
squares = [i^2 for i in collect(1:10)]  # julia indices start with 1

In [None]:
# could also do 
cubes = [i^3 for i in 1:10]

In [None]:
fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)
fib(40)

### Julia has some very basic pipes

In [7]:
25 |> sqrt

5.0

### filter

while list comprehensions don't have a filter in this version, it's not really a problem, as we can
get the same functionality with map and filter

In [None]:
array = collect(1:10)

In [None]:
# keep even squares
filter(x -> x % 2 == 0, map(x -> x * x, array))
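
for comparison, the Julia 0.5 release and later accept an `if` guard inside a comprehension, which this notebook's development build does not. A short sketch, assuming such a newer version, showing that the guard gives the same result as the map and filter form:

```julia
array = collect(1:10)

# comprehension with an `if` guard: keep only the even squares
even_squares = [x * x for x in array if (x * x) % 2 == 0]

# the map/filter form used in this notebook
same_result = filter(x -> x % 2 == 0, map(x -> x * x, array))

println(even_squares)
println(even_squares == same_result)
```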

In [None]:
function is_prime(x)
    # 0, 1 and negative numbers are not prime
    x < 2 && return false
    return all([x%j != 0 for j in 2:(x-1)])
end

# return primes between x and y
function primes(x,y)
    return filter(is_prime, [i for i in x:(y-1)]) 
end

In [None]:
primes(200,300)

### reduce

In [None]:
reduce(+, array)

In [None]:
# sum squares
reduce(+, map(x -> x * x, array))

In [None]:
# sum even squares
reduce(+, filter(x -> x % 2 == 0, map(x -> x*x, array)))

In [None]:
function l1_norm(a)
    return reduce(+, map(x-> abs(x),a))
end

In [None]:
function l2_norm(a)
    return sqrt(reduce(+, map(x -> x^2,a)))
end

In [None]:
function ln_norm(a,power)
    return (reduce(+, map(x -> abs(x)^power,a)))^(1/power)
end

In [None]:
println(l1_norm(array))
println(l2_norm(array))
println(ln_norm(array,2))
println(ln_norm(array,3))
println(ln_norm(array,1/2))
println(ln_norm(array,1))

In [None]:
# to make sure absolute value function is working
b = collect(1:10)
b[2] = -2
b

In [None]:
# should still return 55 ("double nickels"), since the l1 norm uses absolute values
println(ln_norm(b,1))

### Coroutines in Julia 

In [None]:
function fibber(n)
    a, b = (0, 1)
    for i in 1:n
        produce(b)
        a, b = (b, a + b)
    end
end

In [None]:
fibs = @task fibber(6)

In [None]:
for i in fibs
   print(i," , ")
end
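
note that `produce` and `consume` were deprecated in Julia 0.6 and removed in 1.0. A sketch of the same generator on Julia 1.x using a `Channel`; the name `fibber_channel` is hypothetical and only there to avoid clashing with `fibber` above.

```julia
# Hypothetical Julia 1.x version of fibber: the do-block runs as a task
# and each put! hands one value to whoever iterates over the channel.
function fibber_channel(n)
    return Channel() do ch
        a, b = 0, 1
        for i in 1:n
            put!(ch, b)
            a, b = b, a + b
        end
    end
end

fibs_list = collect(fibber_channel(6))
println(fibs_list)
```

the channel closes when the task finishes, so iteration (or `collect`) stops after the sixth value.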


#### IO 

In [None]:
outfile = open("test.txt", "w")

In [None]:
for i in 1:10
    println(outfile, "i is $i")
end
close(outfile)

In [None]:
# shell out
;cat test.txt

In [None]:
infile = open("test.txt", "r")

In [None]:
lines = readlines(infile)

In [None]:
map(split,lines)

In [None]:
# parse the third field of each line ("i is 3" -> 3.0) as a Float64
[parse(Float64, line[3]) for line in map(split, lines)]
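
the cells above open `infile` but never close it. A sketch of the do-block form of `open`, which closes the file automatically even if an error occurs; the scratch path from `tempname()` and the variable names are assumptions for illustration.

```julia
path = tempname()   # hypothetical scratch file, removed at the end

# write three lines; the file is closed when the block exits
open(path, "w") do io
    for i in 1:3
        println(io, "i is $i")
    end
end

# read the lines back and parse the third field of each as a Float64
numbers = open(path, "r") do io
    [parse(Float64, split(line)[3]) for line in eachline(io)]
end

println(numbers)
rm(path)
```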

## Linear Algebra

In [None]:
A = [1 2; 3 4]

In [None]:
eig(A)

In [None]:
eigvals(A)

In [None]:
inv(A)

In [None]:
det(A)

In [None]:
norm(A)

#### Solve linear equation: 

#### Ax=b

In [None]:
A = rand(5,5)
b = rand(5,1)
println("A:",A)
println("b:",b)

In [None]:
x = A \ b

In [None]:
A = [0 1
1 1
2 1
3 1
4 1
5 1
6.0 1
7 1
8 1]

In [None]:
y = [19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24]

In [None]:
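# A is 9x2 (more equations than unknowns), so \ returns the
# least-squares fit: [slope, intercept]
A \ y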
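
since `A` has more rows than columns, `A \ y` solves the least-squares problem rather than an exact system. A sketch, assuming Julia 1.x, that checks the result against the normal equations `(A'A) x = A'y`; the variable names are illustrative.

```julia
using LinearAlgebra   # Julia 1.x standard library

# column 1 holds the x values, column 2 is all ones (the intercept term)
A = [0.0 1; 1 1; 2 1; 3 1; 4 1; 5 1; 6 1; 7 1; 8 1]
y = [19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24]

coeffs = A \ y                     # least-squares solution
normal_eq = (A' * A) \ (A' * y)    # same solution from the normal equations

slope, intercept = coeffs
println("slope = ", slope, ", intercept = ", intercept)
println(maximum(abs.(coeffs .- normal_eq)) < 1e-8)
```

the normal equations are fine for a small, well-conditioned problem like this one; `\` is generally the safer choice because it avoids forming `A' * A`.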

## Distance examples

* examine various distance metrics
   * euclidean
   * mahalanobis

In [None]:
# an example of a function
function euclidean_distance(a,b)
    differences = a-b
    return sqrt(reduce(+, map(x -> x*x, differences)))
end


In [None]:
"""
input: one array consisting of point_1 and point_2
output: point_1, point_2
"""
function split_points(points)
    half = round(Int,length(points)/2)
    point_a = points[1:half]
    point_b = points[half+1:length(points)]
    return point_a, point_b
end

In [None]:
point_1, point_2 = split_points([1.2, 2.2, 3.8, 1.0, 2.0, 8.8])
euclidean_distance(point_1, point_2)
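
the list above also names the Mahalanobis distance, which rescales the differences by a covariance matrix `S`: `sqrt((a-b)' * inv(S) * (a-b))`. A minimal sketch assuming Julia 1.x; `mahalanobis_distance` is a hypothetical helper and the covariance matrices are made up for illustration, not estimated from data.

```julia
using LinearAlgebra   # Julia 1.x standard library, for dot and I

# Hypothetical helper: Mahalanobis distance between points a and b,
# given a covariance matrix S (assumed symmetric positive definite).
function mahalanobis_distance(a, b, S)
    d = a - b
    # S \ d solves S * z = d, which avoids forming inv(S) explicitly
    return sqrt(dot(d, S \ d))
end

point_1 = [1.2, 2.2, 3.8]
point_2 = [1.0, 2.0, 8.8]

# with the identity as covariance this reduces to the Euclidean distance
S_identity = Matrix{Float64}(I, 3, 3)
d_identity = mahalanobis_distance(point_1, point_2, S_identity)
println(d_identity)

# illustrative covariance: a large variance along the third axis
# makes the big difference in that coordinate count for less
S_scaled = [1.0 0 0; 0 1.0 0; 0 0 25.0]
d_scaled = mahalanobis_distance(point_1, point_2, S_scaled)
println(d_scaled)
```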

## Statistics

In [None]:
# uniform random numbers
rand(10)

In [None]:
# random normal numbers
randn(10)

In [None]:
linspace(1.0,2.0,10)

In [None]:
using Distributions

In [None]:
srand(100) # set seed

In [None]:
# create an array of 1,000 Normally distributed numbers
# with a mean of 100 and a standard deviation of 3
n = rand(Normal(100,3),1_000)
println(mean(n))
println(var(n))
println(sqrt(var(n)))

In [None]:
median(n)

In [None]:
# get median and 95th percentile
quantile(n,[0.5,0.95])

In [None]:
skewness(n)

In [None]:
kurtosis(n)

## Visualization

In [None]:
using Gadfly, RDatasets, DataFrames

In [None]:
set_default_plot_size(20cm,10cm);

In [None]:
# linear regression 
x = [1.0:12.0;]
y = [5.5, 6.6, 7.6, 8.8, 10.9, 11.79, 13.48, 15.02, 17.77, 20.81, 22.0, 23.2]
a, b = linreg(x, y)  # a is the intercept, b is the slope
plot(layer(x=x,y=[a+b*i for i in x], Geom.line),layer(x=x,y=y, Geom.point))

In [None]:
mlm = dataset("mlmRev","Gcsemv")
df = mlm[complete_cases(mlm), :]
println("done")

In [None]:
names(df)

In [None]:
describe(df)

In [None]:
plot(df, x="Course",y="Written", color="Gender")

In [None]:
plot(dataset("datasets", "iris"), x="SepalLength", y="SepalWidth",color="Species", Geom.point)

In [None]:
plot(dataset("car", "SLID"), x="Wages", color="Language", Geom.histogram)

Note: see the file mkm_notebooks/license.txt for the license of this notebook.