In [2]:
# Initialize packages 

begin 
	using Colors, ColorVectorSpace, ImageShow, FileIO, ImageIO
	using PlutoUI
	using HypertextLiteral
	using LinearAlgebra
	using ForwardDiff

	# Small patch to make images look more crisp:
	# https://github.com/JuliaImages/ImageShow.jl/pull/50
	Base.showable(::MIME"text/html", ::AbstractMatrix{<:Colorant}) = false
end 

## Functions in Math and Julia 

### Univariate Functions 

... are functions of one variable, e.g. 

\begin{equation*} 
    f_1(x) = x^2 
\end{equation*} 

\begin{equation*}
    f_2(x) = \sin(x) 
\end{equation*} 

\begin{equation*} 
    f_3(x) = x^\alpha
\end{equation*}


In [3]:
# short form 
f₁(x) = x^2     # subscript unicode: \_1 + <tab> 
println("f₁(5) = ", f₁(5)) 

# anonymous form 
x -> sin(x) 
a = ( x -> sin(x) )(π/2)
println("f₂(π/2) = ", a)

# long form 
function f₃(x,α=3) # default parameter 
    return x^α
end 
println("f₃(5) = ", f₃(5)) 
println("f₃(5,2) = ", f₃(5,2)) 

# keyword argument: α has no default and must be passed by name (generic function with 1 method)
f₄(x;α) = x^α 
println("f₄(2, α=5) = ", f₄(2, α=5))

# short form with two positional arguments (a named function, not an anonymous one)
f₅(x,α) = x^α 
println("f₅(2,5) = ", f₅(2,5))


f₁(5) = 25
f₂(π/2) = 1.0
f₃(5) = 125
f₃(5,2) = 25
f₄(2, α=5) = 32
f₅(2,5) = 32
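
The comments in the cell above touch on how many methods each form defines. As a minimal sketch (the names `g3`, `g4` and `g5` are hypothetical stand-ins, chosen so that nothing above is redefined), `methods` can be used to count them. A default positional argument produces an extra method, while a keyword argument does not:

```julia
# Hypothetical stand-ins for f₃, f₄ and f₅
g3(x, α=3) = x^α     # default positional argument
g4(x; α)   = x^α     # required keyword argument
g5(x, α)   = x^α     # two positional arguments

# A default positional argument expands into two methods: g3(x) and g3(x, α)
println("g3: ", length(methods(g3)), " methods")
# Keyword arguments do not take part in dispatch, so g4 has a single method
println("g4: ", length(methods(g4)), " method")
println("g5: ", length(methods(g5)), " method")
```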


### Automatic Differentiation of Univariates 

AD is the bee's knees: it gives derivatives that are exact up to floating-point rounding, with no step size to choose. 

In [4]:
# use with short/long form function 
df1 = ForwardDiff.derivative(f₁, 5) 
println(df1) 

# use with anonymous function (fix α=3)
df3 = ForwardDiff.derivative( x->f₃(x,3), 5 )
println(df3) 


10
75
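
To give a feel for how forward-mode AD can produce these numbers, here is a minimal sketch based on dual numbers. The `Dual` type and the `dual_derivative` helper are hypothetical names written for this illustration. This is not ForwardDiff's actual implementation, only the underlying idea: carry a value and its derivative together, and apply the sum, product and chain rules at each operation.

```julia
# Hypothetical minimal dual number: a value together with its derivative
struct Dual
    val::Float64
    der::Float64
end

# Sum rule and product rule
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
# Chain rule for sin
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# Seed the input with derivative 1 and read the derivative off the result
dual_derivative(f, x) = f(Dual(x, 1.0)).der

println(dual_derivative(x -> x * x, 5.0))   # derivative of x*x at x = 5
println(dual_derivative(sin, 1.0))          # derivative of sin at x = 1
```

Only `+`, `*` and `sin` are covered here, so the sketch uses `x * x` rather than `x^2`; a real AD package defines rules for far more operations.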


In [5]:
ϵ = 0.00001 ; 

d1 = (sin(1+ϵ)-sin(1))/ϵ ; 
d2 = cos(1) ;  
d3 = ForwardDiff.derivative(sin,1) ; 
println((d1, d2, d3)) 

println("Error from ϵ = ", d1 - d2) 
println("Error from automatic differentiation = ", d3 - d2) 


(0.5402980985058647, 0.5403023058681398, 0.5403023058681398)
Error from ϵ = -4.207362275021609e-6
Error from automatic differentiation = 0.0
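
The finite-difference error above depends on the choice of ϵ. The following sketch (the helper `fd_error` is a hypothetical name) prints the error for several step sizes. Under the usual floating-point assumptions, the error first shrinks as ϵ decreases, then grows again once rounding in `sin(1+ϵ) - sin(1)` dominates:

```julia
# Absolute error of the forward difference for sin at x = 1, as a function of the step ϵ
fd_error(ϵ) = abs((sin(1 + ϵ) - sin(1)) / ϵ - cos(1))

for ϵ in (1e-1, 1e-3, 1e-5, 1e-8, 1e-12, 1e-15)
    println("ϵ = ", ϵ, "   error = ", fd_error(ϵ))
end
```

Automatic differentiation avoids this trade-off because it never subtracts two nearly equal function values.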


### Scalar Valued Multivariate Functions 

Let's try the following function: 

\begin{equation} 
    f_5(x) = 5 \sin(x_1 x_2) + \frac{x_2}{2x_3}
\end{equation} 

Scalar-valued functions of more than one variable can be written in Julia either as a function of several arguments or as a function of a single vector: 

In [6]:
begin
	f₅(v) = 5sin(v[1]*v[2]) + 2*v[2]/4v[3]
	f₅(x,y,z) = 5sin(x*y) + 2*y/4z
end

f₅(1,2,3), f₅([1,2,3])


(4.879820467461742, 4.879820467461742)

Even better, if you must write it both ways, **don't copy code**! Reuse code so that a change made in one place propagates everywhere: 

In [7]:
begin
	f₆( x,y,z)  = 5sin(x*y) + 2*y/4z
	f₆( v ) = f₆(v[1],v[2],v[3])
end

f₆(1,2,3), f₆([1,2,3])

(4.879820467461742, 4.879820467461742)

Another way to make vector code more readable is to destructure the argument in the function signature. The function then works directly on vectors (or tuples) but is written with readable names: 

In [8]:
f₇( (x,y,z) ) = 5sin(x*y) + 2*y/4z # more readable than 5sin(v[1]*v[2]) + 2*v[2]/4v[3]

a = (1,2,3) ; 
println(typeof(a))
println(f₇(a))

b = [1,2,3] ; 
println(typeof(b))
println(f₇(b))


Tuple{Int64, Int64, Int64}
4.879820467461742
Vector{Int64}
4.879820467461742


### Automatic Differentiation: Scalar Valued Multivariate Functions 

The vector of partial derivatives of the function, one for each argument direction, is known as the *gradient*: 

In [9]:
ForwardDiff.gradient(f₅, [1,2,3]) 

3-element Vector{Float64}:
 -4.161468365471424
 -1.9140675160690452
 -0.1111111111111111

In [10]:
begin 
    ∂f₅∂x =  (f₅(1+ϵ, 2, 3  ) -f₅(1, 2, 3)) / ϵ
	∂f₅∂y =  (f₅(1, 2+ϵ, 3  ) -f₅(1, 2, 3)) / ϵ
	∂f₅∂z =  (f₅(1, 2,   3+ϵ) -f₅(1, 2, 3)) / ϵ
	∇f = [ ∂f₅∂x , ∂f₅∂y, ∂f₅∂z]
end 

3-element Vector{Float64}:
 -4.1615592949462155
 -1.914090248522626
 -0.11111074069702907
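
As a check on both results, differentiating $f_5$ by hand gives

\begin{equation*}
    \nabla f_5(x) = \left( 5 x_2 \cos(x_1 x_2), \;\; 5 x_1 \cos(x_1 x_2) + \frac{1}{2x_3}, \;\; -\frac{x_2}{2x_3^2} \right)
\end{equation*}

At $x = (1, 2, 3)$ this is $(10\cos 2,\; 5\cos 2 + \tfrac{1}{6},\; -\tfrac{1}{9}) \approx (-4.1615, -1.9141, -0.1111)$. The ForwardDiff gradient matches these values, while the finite-difference estimate with ϵ = 0.00001 drifts in roughly the fifth or sixth significant digit.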

### Automatic Differentiation of Vector-Valued Multivariate Functions (Transformations / Matrices) 

Let's consider some functions with multidimensional inputs and outputs: 

In [11]:
begin
    idy((x,y)) = [x,y]
    lin1((x,y)) =  [ 2x + 3y, -5x + 4y ]
    scalex(α) = ((x,y),) -> (α*x, y)
    scaley(α) = ((x,y),) -> (x,   α*y)
    rot(θ) = ((x,y),) -> [cos(θ)*x + sin(θ)*y, -sin(θ)*x + cos(θ)*y]
    shear(α) = ((x,y),) -> [x+α*y,y]
    genlin(a,b,c,d) = ((x,y),) -> [ a*x + b*y ; c*x + d*y ]
end

rot(π/2)([4,5])

2-element Vector{Float64}:
  5.0
 -3.9999999999999996

Linear transformations like these can be written as matrix multiplications, but nonlinear multivariate functions cannot: 

In [12]:
begin
	function warp(α)
		((x,y),)  -> begin
			r = √(x^2+y^2)
			θ=α*r
			rot(θ)([x,y])
		end
	end
	
	rθ(x) = ( norm(x), atan(x[2],x[1])) # maybe vectors are more readable here?
	
	xy((r,θ)) = ( r*cos(θ), r*sin(θ))
end

warp(1)([5,6])

2-element Vector{Float64}:
  6.212853561644088
 -4.7329114318320356
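
The difference between the two kinds of function can be checked numerically. A linear map is fully determined by where it sends the basis vectors, so a matrix built from those images reproduces it everywhere. The sketch below tests this for `genlin` and for `warp`. The helper `matrix_of` and the variable names are hypothetical, and the definitions are repeated (with `warp` written more compactly but equivalently) so the block runs on its own:

```julia
# Definitions equivalent to the cells above, repeated so the sketch is self-contained
rot(θ) = ((x, y),) -> [cos(θ) * x + sin(θ) * y, -sin(θ) * x + cos(θ) * y]
genlin(a, b, c, d) = ((x, y),) -> [a * x + b * y; c * x + d * y]
warp(α) = ((x, y),) -> rot(α * √(x^2 + y^2))([x, y])

# Hypothetical helper: matrix whose columns are the images of the basis vectors
matrix_of(T) = hcat(T([1, 0]), T([0, 1]))

v = [5, 6]

linear_map = genlin(2, 3, -5, 4)
println(matrix_of(linear_map) * v ≈ linear_map(v))        # true: the matrix reproduces the map

nonlinear_map = warp(1)
println(matrix_of(nonlinear_map) * v ≈ nonlinear_map(v))  # false: no single matrix does
```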

The function `warp` is a rotation whose angle depends on the point where it is applied, which is why no single matrix represents it:  

In [13]:
begin	
	warp₂(α,x,y) = rot(α*√(x^2+y^2))
	warp₂(α) = ((x,y),) -> warp₂(α,x,y)([x,y])	
end

warp₂(1)([5,6])


2-element Vector{Float64}:
  6.212853561644088
 -4.7329114318320356

### Automatic Differentiation of Transformations 

In [14]:
ForwardDiff.jacobian( warp(3), [4,5] )

2×2 Matrix{Float64}:
   7.06684    8.0157
 -10.6677   -11.9586
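
As with the gradient earlier, the Jacobian can be cross-checked with finite differences. The following is a minimal sketch, assuming a forward difference with a fixed step. The helper `fd_jacobian` is a hypothetical name, and `rot` and `warp` are repeated in equivalent form so the block runs on its own:

```julia
# Definitions equivalent to the cells above, repeated so the sketch is self-contained
rot(θ) = ((x, y),) -> [cos(θ) * x + sin(θ) * y, -sin(θ) * x + cos(θ) * y]
warp(α) = ((x, y),) -> rot(α * √(x^2 + y^2))([x, y])

# Hypothetical helper: forward-difference Jacobian, one column per input direction
function fd_jacobian(f, v; ϵ = 1e-6)
    v = float.(v)                       # floating-point copy of the input
    f0 = f(v)
    J = zeros(length(f0), length(v))
    for j in eachindex(v)
        shifted = copy(v)
        shifted[j] += ϵ                 # perturb only the j-th input
        J[:, j] = (f(shifted) - f0) / ϵ # column j holds ∂f/∂v[j]
    end
    return J
end

J = fd_jacobian(warp(3), [4, 5])
println(J)
```

The entries should agree with the ForwardDiff result above to a few decimal places, with the remaining gap coming from the finite step size.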

## Using Zygote 

Zygote's `gradient` calculates derivatives. For example, the derivative of 

\begin{equation*}
    3x^2 + 2x + 1
\end{equation*}

is 

\begin{equation*}
    6x + 2
\end{equation*}

so at `x = 5` the derivative is `32`. 



In [15]:
using Zygote 

ag = gradient( x -> 3x^2 + 2x + 1, 5 ) ; 
println("Anonymous form gradient = ", ag)

f(x) = 3x^2 + 2x + 1 ; 
sg = gradient(f, 5)
println("Short form gradient = ", sg) 



Anonymous form gradient = (32.0,)
Short form gradient = (32.0,)


`gradient` returns a tuple, with a gradient for each argument to the function: 

In [16]:
gradient( (a,b) -> a*b, 2, 3 )

(3.0, 2.0)

`gradient` also works when the arguments are arrays, structs, or other Julia types, but the function itself should return a scalar, e.g. a loss or objective $l$ in optimization or machine learning: 

In [17]:
W = rand(2,3) ; x = rand(3) ; 

g = gradient( W -> sum(W*x), W)[1]
println("gradient of sum(W*x) = ", g)


gradient of sum(W*x) = [0.0557311731245792 0.8728705231730179 0.5068780123291883; 0.0557311731245792 0.8728705231730179 0.5068780123291883]
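
The two identical rows in this output are expected. Since `sum(W*x)` adds up every `W[i,j] * x[j]`, its derivative with respect to `W[i,j]` is `x[j]`, so each row of the gradient is `x` itself. The sketch below checks this without Zygote, using fixed example values (`x0`, `W0` and the other names are hypothetical, chosen so the notebook's `x` and `W` are not overwritten):

```julia
# Fixed example values; the cell above uses rand instead
x0 = [0.1, 0.2, 0.3]
W0 = [1.0 2.0 3.0; 4.0 5.0 6.0]
loss(W) = sum(W * x0)

# Hand-derived gradient: ∂loss/∂W[i, j] = x0[j], so every row equals x0
expected = ones(2) * x0'

# Spot-check one entry with a forward difference (loss is linear in W)
h = 1e-6
W_shifted = copy(W0)
W_shifted[2, 3] += h
fd = (loss(W_shifted) - loss(W0)) / h

println(expected)
println(isapprox(fd, expected[2, 3]; atol = 1e-6))
```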
