# TensorFlow and other tools for ML in Julia






**Lyndon White**
 - Research Software Engineer -- Invenia Labs, Cambridge
 - Technically still PhD Candidate -- The University of Western Australia

```julia
using Pkg: @pkg_str
pkg"activate ."

using TensorFlow
using MLDataUtils

using MLDatasets
using Statistics
using Plots
```

# Before TensorFlow
 
 - Researchers generally couldn't use Caffe etc., as it is not flexible enough.
 - One could use Theano, but it is painfully weird.
 - Just write your neural networks by hand, as matrix math,
 - and do your differentiation by hand, first on a blackboard, then with more matrix math.
 - While you are at it, maybe write your own implementation of gradient descent, etc.
 - This was not long ago; I was still doing this in 2014.
 - Julia is a fantastic language to do this in.
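For anyone who never had to work this way, here is a minimal sketch of that by-hand workflow in plain Julia, for a linear model `ŷ = X*w` with mean squared error. The gradient formula is derived on paper, and `fit_by_hand` is a hypothetical name, not from any package.

```julia
using Random

# Linear model ŷ = X * w with loss = mean((X * w .- y) .^ 2).
# Gradient derived by hand on the blackboard:
#   ∂loss/∂w = (2 / n) * X' * (X * w .- y)
function fit_by_hand(X::AbstractMatrix, y::AbstractVector; lr=0.1, steps=500)
    n, d = size(X)
    w = zeros(d)
    for _ in 1:steps
        grad = (2 / n) .* (X' * (X * w .- y))  # hand-written gradient
        w .-= lr .* grad                        # hand-written gradient descent step
    end
    return w
end

# Usage: recover known weights from noise-free synthetic data
rng = MersenneTwister(0)
X = randn(rng, 100, 3)
w_true = [1.0, -2.0, 0.5]
y = X * w_true
w_fit = fit_by_hand(X, y)
```

Every new model meant re-deriving and re-coding the gradient line by hand, which is exactly the work automatic differentiation removes.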

# Static Graphs

 - TensorFlow is all about static graphs.
 - It is basically a metaprogramming DSL for manipulating static graphs,
 - which are then used to do differentiation and linear algebra.

## 👍 Easy to manipulate mathematically and easy to think about
 - It is literally an AST for a language without control flow,
    - i.e. a language that is a lot like mathematical notation.
 - The derivative of the graph can be calculated via the chain rule, generating another graph.
 - Clear separation of **definition** from **execution**.
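To make the "it is literally an AST" point concrete, here is a toy sketch that uses Julia's own `Expr` as the graph. `deriv` applies the sum, product and chain rules and returns a *new* expression, while `evaluate` is a separate execution step that only runs once input values are supplied. Both names are hypothetical, only `+`, `*`, `sin` and `cos` are supported, and this illustrates the idea rather than how TensorFlow works internally.

```julia
# Differentiate an expression graph with respect to `wrt`, returning a NEW graph.
deriv(::Number, ::Symbol) = 0
deriv(s::Symbol, wrt::Symbol) = s == wrt ? 1 : 0
function deriv(ex::Expr, wrt::Symbol)
    ex.head == :call || error("only function calls are supported")
    op, args = ex.args[1], ex.args[2:end]
    if op == :+                                   # sum rule
        return Expr(:call, :+, (deriv(a, wrt) for a in args)...)
    elseif op == :* && length(args) == 2          # product rule
        a, b = args
        return :($(deriv(a, wrt)) * $b + $a * $(deriv(b, wrt)))
    elseif op == :sin                             # chain rule
        return :(cos($(args[1])) * $(deriv(args[1], wrt)))
    elseif op == :cos                             # chain rule
        return :(-(sin($(args[1]))) * $(deriv(args[1], wrt)))
    else
        error("unsupported operation: $op")
    end
end

# Execution: walk the graph, substituting values for the input symbols.
evaluate(x::Number, env) = x
evaluate(s::Symbol, env) = env[s]
evaluate(ex::Expr, env) = getfield(Base, ex.args[1])((evaluate(a, env) for a in ex.args[2:end])...)

graph  = :(sin(x) * x)       # definition: nothing is computed yet
dgraph = deriv(graph, :x)    # another graph, built by the chain rule
evaluate(dgraph, Dict(:x => 0.5))
```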

## 👎 Dynamic structures are impossible.
- A dynamic structure is one in which the network structure differs per input.
- RNNs have to be statically unrolled to their maximum length.
- If you want to represent, say, a tree-structured network (e.g. the work of Bowman, Socher and others for NLP)... 😿
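A plain-Julia sketch of what "statically unrolled to the maximum length" means for a simple `tanh` RNN cell. `dynamic_rnn` loops over the actual input, while `unrolled_rnn` always runs `max_len` steps, as a fixed graph must, zero-padding short sequences and using an arithmetic mask so padded steps leave the hidden state unchanged. Both are hypothetical helper names; they should agree on any sequence no longer than `max_len`.

```julia
# A simple RNN cell: h_t = tanh.(W * h_{t-1} + U * x_t)

# Dynamic structure: the loop length follows the input.
function dynamic_rnn(W, U, xs)
    h = zeros(size(W, 1))
    for x in xs
        h = tanh.(W * h + U * x)
    end
    return h
end

# Static structure: always max_len steps; padded steps are masked out.
function unrolled_rnn(W, U, xs, max_len)
    length(xs) <= max_len || error("sequence longer than max_len")
    h = zeros(size(W, 1))
    for t in 1:max_len
        is_real = t <= length(xs)                   # false for padding steps
        x_t = is_real ? xs[t] : zeros(size(U, 2))   # zero padding
        h_new = tanh.(W * h + U * x_t)
        mask = is_real ? 1.0 : 0.0
        h = mask .* h_new .+ (1 - mask) .* h        # padding keeps the old state
    end
    return h
end

W = [0.5 0.1; -0.2 0.3]
U = [1.0 0.0; 0.0 1.0]
xs = [[1.0, 0.0], [0.0, 1.0]]
unrolled_rnn(W, U, xs, 5) ≈ dynamic_rnn(W, U, xs)
```

The static version does wasted work on every padding step, and a tree-structured network has no single "maximum length" to unroll to at all.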

## 🔮 Long-term static graphs are probably going away for good.

 - Dynamic structure makes deep learning more like normal programming than math,
 - and much more flexible.
 - Even TensorFlow is pushing this way, with TensorFlow 2.0 and TensorFlow for Swift focusing on **TFEager**.


## 4 Types of Nodes, i.e. `Tensors`
 - **Placeholders:** this is where you put your inputs
 - **Operations:** these transform inputs into outputs; they do math
 - **Variables:** these are the things you train; they are mutable
 - **Actions:** these are operations with side effects, like logging (TensorBoard) and mutating Variables (Optimizers)

## 1) Placeholders
 
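  - Use these to declare your network's inputs
  
```julia
# Explicitly named; -1 marks a dimension of unknown size (e.g. the batch size)
x = placeholder(Float64, name="x", shape=[-1, 5])
# Equivalent: `@tf` supplies the name "x" automatically; `missing` marks the unknown dimension
@tf x = placeholder(Float64, shape=[missing, 5])
```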

To invoke a network, call `run`:
```julia
run(sess, output, placeholder_value_dict)
```

e.g.
```julia
run(sess, 2x, Dict(x=>[1.0 2.0 3.0 4.0 5.0]))  # feed a 1×5 matrix for x
```

It is worth noting that you do not have to provide every `placeholder` on every invocation of your network,
only those on a path leading to the requested output.
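One way to see why: the inputs an output depends on are exactly the placeholders reachable from it in the graph. A toy sketch over Julia `Expr` graphs, where the helper name `placeholders_needed` is hypothetical; TensorFlow does the equivalent pruning itself when you call `run`.

```julia
# Collect the input symbols (placeholders) an expression graph depends on.
placeholders_needed(::Number) = Set{Symbol}()
placeholders_needed(s::Symbol) = Set([s])
function placeholders_needed(ex::Expr)
    needed = Set{Symbol}()
    for arg in ex.args[2:end]      # args[1] is the function name, not an input
        union!(needed, placeholders_needed(arg))
    end
    return needed
end

# Two outputs built from three inputs
y = :(sin(x))
z = :(x * w + b)

placeholders_needed(y)   # only x must be fed to compute y
placeholders_needed(z)   # x, w and b must be fed to compute z
```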

## Automatic Node Naming

 - Notice the `@tf` macro used above (it also works on a whole `@tf begin ... end` block)
 - **This is not at all required**
 - But it does enable automatic node naming,
 - so `@tf y = sin(x)` actually becomes `y = sin(x; name="y")`
 - This gives you a good graph in TensorBoard, and also better error messages.
 - Further, it lets us look up tensors from the graph by **name**.
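The rewrite itself is ordinary Julia metaprogramming. Below is a toy sketch of the idea with a hypothetical `@autoname` macro; it is not TensorFlow.jl's actual `@tf` implementation, and it only handles a single `lhs = f(args...)` assignment. `fake_sin` is a hypothetical stand-in for a graph-building function that accepts a `name` keyword.

```julia
# Rewrite `lhs = f(args...)` into `lhs = f(args...; name="lhs")`.
macro autoname(ex)
    if ex isa Expr && ex.head == :(=) && ex.args[1] isa Symbol &&
       ex.args[2] isa Expr && ex.args[2].head == :call
        lhs, call = ex.args[1], ex.args[2]
        # Leave calls alone if they already pass keyword arguments.
        has_kw = any(a -> a isa Expr && a.head in (:kw, :parameters), call.args[2:end])
        if !has_kw
            named = Expr(:call, call.args[1],
                         Expr(:parameters, Expr(:kw, :name, string(lhs))),
                         call.args[2:end]...)
            return esc(Expr(:(=), lhs, named))
        end
    end
    return esc(ex)
end

# Stand-in for a graph-building function that accepts a `name` keyword.
fake_sin(x; name="unnamed") = (name, sin(x))

@autoname y = fake_sin(0.5)   # expands to y = fake_sin(0.5; name="y")
y
```

`@macroexpand @autoname y = fake_sin(x)` shows the rewritten call without running it.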

## 2) Operators
 
  - Use these to declare calculations
  
```julia
z = *(x,y; name="z")
@tf z = x*y
```

When you invoke a network, these can be your outputs:
```julia
run(sess, output, placeholder_value_dict)
```

e.g.
```julia
run(sess, z, Dict(x=>1, y=>2))
```


### Functions and Operators

Functions mutate **the graph** to introduce **operators**.

For example:
 - `sin(::Float64)` in julia would return a `Float64` that is the answer.
 - `sin(::Tensor)` introduces a `sin` operation into the graph, and returns a `Tensor` that is a reference to its output; this can be fed to other operations.
 
The answer to that operation is not computed until you execute the graph.
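The multiple-dispatch trick behind this can be sketched in a few lines of plain Julia. `Node`, `placeholder_node` and `execute` are hypothetical toy names (TensorFlow.jl's real `Tensor` type is far richer); the point is that adding a `Base.sin` method for a new type makes `sin` build a graph node instead of computing a number, while `sin(::Float64)` is untouched.

```julia
# A toy graph node: an input placeholder or an operation on other nodes.
struct Node
    op::Symbol             # :placeholder, :sin or :mul
    inputs::Vector{Node}
    name::String
end
placeholder_node(name) = Node(:placeholder, Node[], name)

# New methods for existing functions: calling them on Nodes records an operation.
Base.sin(x::Node) = Node(:sin, [x], "sin")
Base.:*(a::Node, b::Node) = Node(:mul, [a, b], "mul")

# Execution happens separately, once values are fed for the placeholders.
function execute(n::Node, feeds::Dict{String,Float64})
    n.op == :placeholder && return feeds[n.name]
    vals = [execute(i, feeds) for i in n.inputs]
    n.op == :sin && return sin(vals[1])
    n.op == :mul && return vals[1] * vals[2]
    error("unknown operation $(n.op)")
end

x = placeholder_node("x")
y = sin(x)          # returns a Node; nothing is computed yet
sin(0.5)            # ordinary Float64 method, computed immediately
execute(y, Dict("x" => 0.5))
```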


```julia
sess = Session(Graph())

@tf begin
    x = placeholder(Float64)
    y = sin(x)
end

@show y

@show run(sess, y, Dict(x=>0.5));
```

Output:
```
y = <Tensor y:1 shape=unknown dtype=Float64>
run(sess, y, Dict(x => 0.5)) = 0.479425538604203
```


```julia
@show sess.graph["x"]
@show sess.graph["y"]

run(sess, sess.graph["y"], Dict(sess.graph["x"]=>0.5))
```

Output:
```
sess.graph["x"] = <Tensor x:1 shape=unknown dtype=Float64>
sess.graph["y"] = <Tensor y:1 shape=unknown dtype=Float64>

0.479425538604203
```

## 3) Variables

```julia
# Explicitly named
Wout = get_variable("Wout", [128, 128], Float32)
# Equivalent: the name "Wout" is supplied by `@tf`
@tf Wout = get_variable([128, 128], Float32)
```

Variables are what get trained.
During a `run` of a network, if one of the outputs is an optimizer,
it will mutate the variables according to its loss.


## 4) Actions
 - Technically this is just a kind of operation, with side effects
 - Any time such a node is on the path between the inputs and the requested output, its action is performed.
 - This means that requesting the optimizer as an output causes the parameters to be optimized
 
```julia
opt = train.minimize(train.AdamOptimizer(), net_loss)
```

# Let's have an exciting Demo
[Link](./Examples.ipynb)

![](https://white.ucc.asn.au/posts_assets/Intro%20to%20Machine%20Learning%20with%20TensorFlow.jl_files/Intro%20to%20Machine%20Learning%20with%20TensorFlow.jl_28_0.png)



# Let's break that example down

## Defining a Custom Activation Function
```julia
leaky_relu6(x) = 0.01x + nn.relu6(x)
```

 - Trivial in the modern day with Flux, etc.
 - When TensorFlow came out, this was insane wizard tricks 🧙
 - But now we take it for granted.
 - Note that to do this, TensorFlow basically needed to implement a full linear algebra and math library.
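Part of why this is trivial in Julia is that one generic definition works for any type supporting the operations it uses. A plain-Julia sketch on ordinary numbers, with a hand-written `relu6` standing in for `nn.relu6` (clamping to the range 0 to 6), applied elementwise with broadcasting:

```julia
# Stand-in for nn.relu6: clamp to [0, 6]
relu6(x) = clamp(x, zero(x), 6 * one(x))

# Same definition as in the example, but on ordinary numbers
leaky_relu6(x) = 0.01x + relu6(x)

leaky_relu6.([-1.0, 3.0, 10.0])   # ≈ [-0.01, 3.03, 6.1]
```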

## Building up our layers

```julia
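Zs = [X]  # Zs[k] is the output of layer k-1; the input X comes first
for (ii, hlsize) in enumerate(hl_sizes)
    # Weights map the previous layer's width to this hidden layer's width
    Wii = get_variable("W_$ii", [get_shape(Zs[end], 2), hlsize], Float32)
    bii = get_variable("b_$ii", [hlsize], Float32)
    Zii = leaky_relu6(Zs[end]*Wii + bii)  # adds nodes to the graph; nothing is computed yet
    push!(Zs, Zii)
end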
```

Remember what we are actually doing here is mutating the graph.
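For contrast, a plain-Julia sketch of the same loop run eagerly, where each iteration computes numbers immediately instead of adding nodes. The sizes (`hl_sizes`, 32 observations, 10 input features) and the random `Float32` weights are assumptions for illustration; `size(Zs[end], 2)` plays the role of `get_shape(Zs[end], 2)`.

```julia
using Random

leaky_relu6(x) = 0.01f0 * x + clamp(x, 0f0, 6f0)   # Float32 constants keep everything Float32

rng = MersenneTwister(0)
X = randn(rng, Float32, 32, 10)    # 32 observations (rows), 10 features
hl_sizes = [64, 16]

Zs = [X]
for (ii, hlsize) in enumerate(hl_sizes)
    Wii = randn(rng, Float32, size(Zs[end], 2), hlsize) ./ 10
    bii = zeros(Float32, hlsize)
    Zii = leaky_relu6.(Zs[end] * Wii .+ bii')   # bias broadcast across rows
    push!(Zs, Zii)
end
size(Zs[end])                      # (32, 16)
```

Note the explicit `.+ bii'`: plain Julia needs the broadcast spelled out, whereas the TensorFlow.jl version can write `+ bii`.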


## Loss function

```julia
losses = 0.5(Y .- X).^2
loss = reduce_mean(losses) + 0.01reduce_mean(bout.^2)
optimizer = train.minimize(train.AdamOptimizer(), loss)
```

# MLDataUtils for training helpers



 - MLDataUtils is a fantastic Julia package full of helpers useful with all ML packages
 - Use it with TensorFlow, use it with Flux, use it with Knet, etc.
 - `shuffleobs`
 - `eachbatch`/`BatchView`
 - `eachobs`/`obsview`
 - Various stratified sampling, `oversample`, `undersample`
 - test/train splitting
 - feature normalization: `rescale!`, `center!`
 - MLLabelUtils for encoding/decoding

## MLDataUtils Example
```julia
train_obs = shuffleobs(train_images_flat, ObsDim.Last())
batches = eachbatch(train_obs, 1_000, ObsDim.Last())  # batch the *shuffled* observations
for (batch_ii, batch_x) in enumerate(batches)
    # ... run a training step on batch_x ...
end
```

More examples:
```julia
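for (batch_x, batch_y) in eachbatch((train_x, train_y), 1000, ObsDim.First())
    # observations along the first dimension of both train_x and train_y
end
for (batch_x, batch_y) in eachbatch((train_x, train_y), 1000, obsdim=(ObsDim.First(), ObsDim.Last()))
    # observations along the first dimension of train_x, the last of train_y
end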
```
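Roughly what `shuffleobs` followed by `eachbatch` gives you, sketched in plain Julia for a matrix with observations along the last dimension (like `ObsDim.Last()`). `shuffled_batches` is a hypothetical helper, not part of MLDataUtils, which handles many more data types and options.

```julia
using Random

# Shuffle the observation indices (columns), then cut them into batches.
# Returns views, so no data is copied.
function shuffled_batches(X::AbstractMatrix, batch_size::Integer; rng=Random.GLOBAL_RNG)
    idx = randperm(rng, size(X, 2))
    return [view(X, :, b) for b in Iterators.partition(idx, batch_size)]
end

X = reshape(collect(1:20), 2, 10)    # 10 observations with 2 features each
batches = shuffled_batches(X, 4; rng=MersenneTwister(1))
size.(batches, 2)                    # [4, 4, 2]: the last batch is partial
```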

```julia
@show FashionMNIST.testtensor() |> typeof
@show FashionMNIST.testtensor() |> size
println()
@show FashionMNIST.testlabels() |> typeof
@show FashionMNIST.testlabels() |> size

println()
data = (FashionMNIST.testtensor(), FashionMNIST.testlabels())
@show typeof(data)
println()
@show eachobs(data) |> first |> typeof
@show nobs(data);
```

Output:
```
FashionMNIST.testtensor() |> typeof = Base.ReinterpretArray{FixedPointNumbers.Normed{UInt8,8},3,UInt8,Array{UInt8,3}}
FashionMNIST.testtensor() |> size = (28, 28, 10000)

FashionMNIST.testlabels() |> typeof = Array{Int64,1}
FashionMNIST.testlabels() |> size = (10000,)

typeof(data) = Tuple{Base.ReinterpretArray{FixedPointNumbers.Normed{UInt8,8},3,UInt8,Array{UInt8,3}},Array{Int64,1}}

(eachobs(data) |> first) |> typeof = Tuple{Array{FixedPointNumbers.Normed{UInt8,8},2},Int64}
nobs(data) = 10000
```


# A Complicated Output Layer for HSV Color
Saturation and Value are easy, but Hue is angular: a hue of 0 and a hue of one full turn are the same color.

$$loss =   \left(y^\star_{sat} - y_{sat} \right)^2 + \left(y^\star_{val} - y_{val} \right)^2  + \frac{1}{2} \left(\sin(y^\star_{hue}) - y_{shue} \right)^2 + \frac{1}{2} \left(\cos(y^\star_{hue}) - y_{chue} \right)^2 $$
 
---
 
<img src="./figs/hsv_output_module.png" width="50%" height="50%"/>
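A plain-Julia sketch of this loss for a single observation, measuring hue in turns (as the TensorFlow code in this section does), so the angle is `2π * hue`. `hsv_loss` is a hypothetical name. It shows why the sin/cos encoding matters: hues of 0.99 and 0.01 turns are nearly the same color, and this loss agrees, while a naive squared difference of the raw hue values is large.

```julia
# Loss for one observation, following the equation above.
# Observed hue is in turns (0 to 1); the predicted hue is a (sin, cos) pair,
# as produced by the two tanh outputs.
function hsv_loss(pred_shue, pred_chue, pred_sat, pred_val, obs_hue, obs_sat, obs_val)
    θ = 2π * obs_hue
    return (obs_sat - pred_sat)^2 + (obs_val - pred_val)^2 +
           0.5 * (sin(θ) - pred_shue)^2 + 0.5 * (cos(θ) - pred_chue)^2
end

# Predicted hue 0.99 turns, observed hue 0.01 turns; saturation and value exact.
p = 2π * 0.99
angular = hsv_loss(sin(p), cos(p), 0.5, 0.5, 0.01, 0.5, 0.5)   # small, about 0.0079
naive = (0.99 - 0.01)^2                                        # about 0.96, misleadingly large
```

When the predicted pair lies on the unit circle, the two hue terms together reduce to `1 - cos(Δθ)`, which depends only on the angular difference.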


# How do we build this: Prediction & Output

```julia
# Y_logit size: [missing, 4] 
Y_sat = nn.sigmoid(Y_logit[:,3])  # range 0:1
Y_val = nn.sigmoid(Y_logit[:,4])  # range 0:1

Y_shue = tanh(Y_logit[:,1])       # range -1:1 -- like sin
Y_chue = tanh(Y_logit[:,2])       # range -1:1 -- like cos

# For Output, we want hue angle measured in 0:1 (units of turns)
Y_hue_o1 = Ops.atan2(Y_shue, Y_chue)/(2Float32(π))
Y_hue_o2 = select(Y_hue_o1 > 0, Y_hue_o1, Y_hue_o1+1) # Wrap around
Y_hue = reshape(Y_hue_o2, [-1]) # force shape

Y = identity([Y_hue Y_sat Y_val])
```
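The decoding step can be checked in plain Julia, where two-argument `atan(y, x)` plays the role of `Ops.atan2` and `ifelse` plays the role of `select`. `decode_hue` is a hypothetical name. Because `atan2` depends only on the direction of `(y, x)`, the two `tanh` outputs do not need to lie exactly on the unit circle.

```julia
# (sin-like, cos-like) pair -> hue in turns, mirroring the graph code above
function decode_hue(shue, chue)
    h = atan(shue, chue) / (2π)       # in [-0.5, 0.5]
    return ifelse(h > 0, h, h + 1)    # wrap into (0, 1]; exactly 0 maps to 1, the same hue
end

decode_hue(sin(2π * 0.75), cos(2π * 0.75))        # ≈ 0.75
decode_hue(0.5sin(2π * 0.3), 0.5cos(2π * 0.3))    # ≈ 0.3; the scale does not matter
```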


# How do we build this: Observations & Loss
```julia
# Obs 
Y_obs = placeholder(Float32; shape=[-1, 3])
Y_obs_hue = Y_obs[:,1]    
Y_obs_sat = Y_obs[:,2]
Y_obs_val = Y_obs[:,3]

Y_obs_shue = sin(Float32(2π) .* Y_obs_hue)
Y_obs_chue = cos(Float32(2π) .* Y_obs_hue)


# Loss                        
loss_hue = 0.5reduce_mean((Y_shue - Y_obs_shue)^2 + (Y_chue - Y_obs_chue)^2)
loss_sat = reduce_mean((Y_sat-Y_obs_sat)^2)
loss_val = reduce_mean((Y_val-Y_obs_val)^2)

loss_total = identity(loss_hue + loss_sat + loss_val)
```

# Syntax Overloads

 - One of the nicest things about Julia is how much of the syntax is available to be overloaded
 - This is used a lot in this example

## Overloading hcat & vcat

Like in:

```julia
 Y = identity([Y_hue Y_sat Y_val])
```


So that `[a b]` and `[a; b]` work.
In base TensorFlow, you would instead first have to make sure everything has the same number of dimensions,
then `concat` them,
and you couldn't use Julia-style syntax.

https://github.com/malmaud/TensorFlow.jl/blob/7099f05f523556829164aab41eccd394d29df898/src/ops/transformations.jl#L129-L150
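The mechanism is ordinary Julia: `[a b]` lowers to `hcat(a, b)` and `[a; b]` to `vcat(a, b)`, so adding methods for a new type is enough. A toy sketch with a hypothetical `Sym` type that only records the concatenation it would build; TensorFlow.jl's real methods, linked above, create actual `concat` operations in the graph.

```julia
# A toy symbolic value that just records the operation it represents.
struct Sym
    ex::String
end

# `[a b c]` calls hcat(a, b, c); `[a; b]` calls vcat(a, b).
Base.hcat(xs::Sym...) = Sym("concat([" * join((x.ex for x in xs), ", ") * "], dim=2)")
Base.vcat(xs::Sym...) = Sym("concat([" * join((x.ex for x in xs), ", ") * "], dim=1)")

Y_hue, Y_sat, Y_val = Sym("Y_hue"), Sym("Y_sat"), Sym("Y_val")
[Y_hue Y_sat Y_val]     # Sym("concat([Y_hue, Y_sat, Y_val], dim=2)")
[Y_hue; Y_sat]          # Sym("concat([Y_hue, Y_sat], dim=1)")
```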


## Overloading getindex

Like in:

```julia
    Y_sat = nn.sigmoid(Y_logit[:,3])  # range 0:1
    Y_val = nn.sigmoid(Y_logit[:,4])  # range 0:1

    Y_shue = tanh(Y_logit[:,1])       # range -1:1 -- like sin
    Y_chue = tanh(Y_logit[:,2])       # range -1:1 -- like cos
```

Indexing with slices and ranges is much nicer than `tf.gather` and `tf.gather_nd`, and even nicer than `tf.slice`.

So that `X[a:b]`, `X[a]`, `X[:, end÷2]`, etc. work.

https://github.com/malmaud/TensorFlow.jl/blob/master/src/ops/indexing.jl
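Likewise for indexing: `X[i, j]` calls `getindex(X, i, j)`, and `end` inside the brackets calls `lastindex(X, d)` for the dimension `d` it appears in. A toy sketch with a hypothetical `LazyMatrix` type whose indexing only describes the slice it would build; the real TensorFlow.jl methods are in the file linked above.

```julia
# A toy matrix of known shape whose indexing only describes the slice.
struct LazyMatrix
    name::String
    dims::Tuple{Int,Int}
end

# `end` in dimension d lowers to lastindex(X, d)
Base.lastindex(m::LazyMatrix, d::Integer) = m.dims[d]

describe(::Colon) = ":"
describe(i::Integer) = string(i)
describe(r::AbstractUnitRange) = string(first(r), ":", last(r))

Base.getindex(m::LazyMatrix, I...) = "slice " * m.name * "[" * join(map(describe, I), ", ") * "]"

Y_logit = LazyMatrix("Y_logit", (100, 4))
Y_logit[:, 3]         # "slice Y_logit[:, 3]"
Y_logit[:, end÷2]     # "slice Y_logit[:, 2]"
Y_logit[2:5, end]     # "slice Y_logit[2:5, 4]"
```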


# TensorFlow.jl Conventions vs Julia Conventions vs Python TensorFlow Conventions

**Julia:** 1-based indexing  
**Python TF:** 0-based indexing  
**TensorFlow.jl:** 1-based indexing  




**Julia:** explicit broadcasting   
**Python TF:** implicit broadcasting   
**TensorFlow.jl:** implicit or explicit broadcasting  




**Julia:** last index at `end`, second-to-last at `end-1`, etc.  
**Python TF:** last index at `-1`, second-to-last at `-2`  
**TensorFlow.jl:** last index at `end`, second-to-last at `end-1`  




**Julia:**  Operations in Julia ecosystem namespaces. (`SVD` in `LinearAlgebra`, `erfc` in `SpecialFunctions`, `cos` in `Base`)   
**Python TF:** All operations in TensorFlow's namespaces (`SVD` in `tf.linalg`, `erfc` in `tf.math`, `cos` in `tf.math`, and all reexported from `tf`)  
**TensorFlow.jl:** Existing Julia functions are overloaded to call their TensorFlow equivalents when called with TensorFlow arguments  
 



**Julia:** Container types are parametrized by number of dimensions and element type   
**Python TF:** N/A -- python does not have a parametric type system   
**TensorFlow.jl:** Tensors are parametrized by element type.  
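A few of the Julia-side conventions, checked in plain Julia with no TensorFlow involved: indexing is 1-based with `end` for the last element, and broadcasting must be requested explicitly with a dot.

```julia
v = [10, 20, 30]
v[1]        # 10: first element (Python TF would use index 0)
v[end]      # 30: last element (Python TF: -1)
v[end-1]    # 20: second-to-last (Python TF: -2)

A = [1 2; 3 4]
b = [10, 20]
A .+ b      # explicit broadcasting: [11 12; 23 24]

# Without the dot, Julia refuses to broadcast implicitly:
implicit_ok = try
    A + b
    true
catch err
    err isa DimensionMismatch || rethrow()
    false
end
```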

# Where are the bits that make TensorFlow.jl work defined?

## TensorFlow.jl (Julia)
 - Nice Things
 - RNNs
 - Training / Optimizers
 
## TensorFlow (PyCall)
 - Gradients
 - Writing TensorBoard events to file
 
## LibTensorFlow (C API)
 - Operations
 - Shape Inference

# What doesn't work

# 😢 BatchNorm

 - There is a `BatchNorm` op in LibTensorFlow
 - Actually there are several, for the different parts of the fused operation.
 - To get `BatchNorm` to work, you need to glue these together with the right predeclared variables for state and for reused working memory.
 - This is hundreds (thousands?) of lines of Python glue code that would need to be reimplemented.
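To give a sense of what that glue has to manage, here is a minimal plain-Julia sketch of batch normalization's forward pass, omitting the learned scale and shift parameters. Even this toy needs persistent mutable state (running mean and variance), a momentum hyperparameter, and separate training and inference behaviour. `BatchNormState` and `batchnorm!` are hypothetical names.

```julia
using Statistics

# State that must persist between training steps.
mutable struct BatchNormState
    running_mean::Vector{Float64}
    running_var::Vector{Float64}
    momentum::Float64
end
BatchNormState(nfeatures::Integer; momentum=0.9) =
    BatchNormState(zeros(nfeatures), ones(nfeatures), momentum)

# X is features × batch. Training: normalize with batch statistics and
# update the running averages. Inference: use the running averages.
function batchnorm!(st::BatchNormState, X::AbstractMatrix; training::Bool, eps=1e-5)
    if training
        μ = vec(mean(X; dims=2))
        σ² = vec(var(X; dims=2, corrected=false))
        st.running_mean .= st.momentum .* st.running_mean .+ (1 - st.momentum) .* μ
        st.running_var .= st.momentum .* st.running_var .+ (1 - st.momentum) .* σ²
    else
        μ, σ² = st.running_mean, st.running_var
    end
    return (X .- μ) ./ sqrt.(σ² .+ eps)
end

st = BatchNormState(2)
X = [1.0 2.0 3.0 4.0; 10.0 20.0 30.0 40.0]
Y = batchnorm!(st, X; training=true)     # each row now has mean ≈ 0
st.running_mean                          # moved 10% of the way to the batch means: [0.25, 2.5]
```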

#  😢 Windows Support

 - I've not tried to get this working in a while, but last time:
 - Unending segfaults on basic operations.
 - In theory it should just work.

#  TFEager
## Work In Progress

 - Jon Malmaud is working on this
 - Google apparently wants this.
 - But why? I have a perfectly nice eager NN framework called Flux


# Dropping the Python Dependency
 - The Python dependency is a nasty hack.
 - It is basically only used for getting gradients.
 - We actually interact with it primarily by:
     - exporting the graph
     - running some Python TF on it
     - importing the modified graph back
     
 - We need it for gradients, as they are not in the C API.
 - They are coming to the C API, but are not ready yet.
 

# Invenia Labs

![](https://www.invenia.ca/wp-content/themes/relish_theme/img/labs-logo.png)

## We're hiring
### People who know Julia
### People who know Machine Learning
I have left some fliers about open positions at the entrance.