# 2 Derivatives

This notebook was automatically generated from the Algorithms for Optimization source code. Each cell generates a figure from the original text. While this code is not optimized for use in lectures, we provide it here to be adapted for such projects. We hope you find it useful.

In [2]:
#import Pkg; 
#Pkg.add("SymEngine");
using SymEngine



# 2.1 Analytic gradient 

In [3]:
# one variable
@vars x;
f = x^2 + x/2 - sin(x)/x;
diff(f, x)

1/2 + 2*x + sin(x)/x^2 - cos(x)/x
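
As a cross-check on the symbolic result, the same derivative can be worked out by hand with the power rule and the quotient rule. This is only a sketch of the calculation:

$ \frac{d}{dx}\left(x^2 + \frac{x}{2} - \frac{\sin x}{x}\right) = 2x + \frac{1}{2} - \frac{x\cos x - \sin x}{x^2} = \frac{1}{2} + 2x + \frac{\sin x}{x^2} - \frac{\cos x}{x} $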

In [4]:
# many variables
@vars x1, x2;
f = x1*sin(x2) + 1;
println(diff(f, x1))
println(diff(f, x2))

sin(x2)
x1*cos(x2)


# 2.2 Numerical gradient
- Finite difference
- Complex step

In [6]:
# define a target function
f0(x) = x^2 + x/2 - sin(x)/x;

In [7]:
# Finite difference method
diff_forward(f, x; h = sqrt(eps(Float64))) = (f(x+h) - f(x))/h;
diff_central(f, x; h = sqrt(eps(Float64))) = (f(x+h/2) - f(x-h/2))/h;
diff_backward(f, x; h = sqrt(eps(Float64))) = (f(x) - f(x-h))/h;

println(sqrt(eps(Float64)))
println(diff_forward(f0, 0.1))
println(diff_central(f0, 0.1))
println(diff_backward(f0, 0.1))

1.4901161193847656e-8
0.7333000227808952
0.733300007879734
0.7332999929785728
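
To see how the step size affects these estimates, the sketch below compares the forward and central differences against the analytic derivative from section 2.1. The helper name `df0` and the particular step sizes are assumptions chosen for illustration.

```julia
# Target function and its analytic derivative (from the symbolic result in 2.1)
f0(x) = x^2 + x/2 - sin(x)/x
df0(x) = 1/2 + 2x + sin(x)/x^2 - cos(x)/x   # hypothetical helper name

diff_forward(f, x; h = sqrt(eps(Float64))) = (f(x + h) - f(x)) / h
diff_central(f, x; h = sqrt(eps(Float64))) = (f(x + h/2) - f(x - h/2)) / h

# Print the absolute error of each estimate for a few step sizes
for h in (1e-2, 1e-4, 1e-6, 1e-8)
    err_fwd = abs(diff_forward(f0, 0.1; h = h) - df0(0.1))
    err_cen = abs(diff_central(f0, 0.1; h = h) - df0(0.1))
    println("h = ", h, "  forward error = ", err_fwd, "  central error = ", err_cen)
end
```

The forward difference has truncation error proportional to h, while the central difference has truncation error proportional to h^2. For very small h, floating-point roundoff in the subtraction starts to dominate both.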


In [8]:
# Complex step method
diff_complex(f, x; h=1e-20) = imag(f(x+h*im))/h

println(diff_complex(f0, 0.1))

0.7333000119025557
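
The complex step method rests on the Taylor expansion f(x + ih) = f(x) + ih f'(x) - h^2 f''(x)/2 - ..., so the imaginary part divided by h estimates f'(x) with no subtraction of nearly equal numbers. The real part of the same evaluation approximates f(x). A minimal sketch follows; the function name `value_and_derivative_complex` is hypothetical.

```julia
f0(x) = x^2 + x/2 - sin(x)/x

# One complex evaluation yields both the value (real part) and
# the derivative estimate (imaginary part divided by h).
function value_and_derivative_complex(f, x; h = 1e-20)
    y = f(x + h*im)
    return real(y), imag(y) / h
end

v, d = value_and_derivative_complex(f0, 0.1)
println(v)
println(d)
```

Because nothing is subtracted, h can be made far smaller than in the finite difference formulas without losing accuracy to cancellation.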


In [7]:
#import Pkg; Pkg.add("Zygote")

In [10]:
# Automatic differentiation

import Zygote: gradient
gradient(f0, 0.1)



(0.7333000119025559,)

In [11]:
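# Note: the two-argument form log(b, x) is the base-b logarithm, so f1 is
# log base a*b of max(a,2), not log(a*b + max(a,2)) as used in section 2.3.2.
f1(a, b) = log(a*b, max(a,2));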
gradient(f1, 3.0, 2.0)

(0.07196888754292625, -0.17110198196123422)

# 2.3 Automatic Differentiation
- Dual numbers
- Forward pass

### 2.3.1 Dual Number Notation

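A dual number pairs a function value a with a derivative value b. Instead of D(a,b) we can write a + b ϵ, where ϵ satisfies ϵ^2 = 0. (This parallels the imaginary numbers, where i is introduced with i^2 = -1.)

Another way to think about it: expand as usual and drop every O(ϵ^2) term, as engineers often do.

The four arithmetic rules (addition, subtraction, multiplication, division) are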

$ (a+b\epsilon) \pm (c+d\epsilon) = (a \pm c) + (b \pm d)\epsilon $

$ (a+b\epsilon)(c+d\epsilon) = ac + (bc+ad)\epsilon $

$ \frac{a+b\epsilon}{c+d\epsilon} = \frac{a}{c} + \frac{bc-ad}{c^2}\epsilon $
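
The multiplication and division rules follow from ordinary algebra once ϵ^2 = 0 is applied. A short derivation, multiplying the quotient by the conjugate c - dϵ:

$ (a+b\epsilon)(c+d\epsilon) = ac + (ad+bc)\epsilon + bd\,\epsilon^2 = ac + (bc+ad)\epsilon $

$ \frac{a+b\epsilon}{c+d\epsilon} = \frac{(a+b\epsilon)(c-d\epsilon)}{(c+d\epsilon)(c-d\epsilon)} = \frac{ac + (bc-ad)\epsilon}{c^2} = \frac{a}{c} + \frac{bc-ad}{c^2}\epsilon $

Reading b and d as derivatives, the ϵ coefficients are the product rule and the quotient rule.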

In [12]:
struct D <: Number  # D is a function-derivative pair
    f::Tuple{Float64,Float64}
end

# Implement the four rules
import Base: -,*,+, /, convert, promote_rule
+(x::D, y::D) = D(x.f .+ y.f)
-(x::D, y::D) = D(x.f .- y.f)
*(x::D, y::D) = D((x.f[1]*y.f[1], (x.f[2]*y.f[1] + x.f[1]*y.f[2])))
/(x::D, y::D) = D((x.f[1]/y.f[1], (y.f[1]*x.f[2] - x.f[1]*y.f[2])/y.f[1]^2))

# Let real numbers mix with D: a real x is converted to x + 0ϵ
convert(::Type{D}, x::Real) = D((x,zero(x)))
promote_rule(::Type{D}, ::Type{<:Number}) = D

promote_rule (generic function with 148 methods)

In [13]:
ϵ  = D((0,1))

D((0.0, 1.0))

In [14]:
ϵ * ϵ

D((0.0, 0.0))

In [15]:
1/(1+ϵ)

D((1.0, -1.0))

In [16]:
(1+2*ϵ)*(3-4*ϵ)

D((3.0, 2.0))
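
The same rules are enough to differentiate an ordinary function: evaluate it at x + 1ϵ and read the derivative from the ϵ coefficient. The sketch below repeats the definitions of `D` so that it runs on its own. The names `derivative` and `g` are hypothetical and introduced only for this illustration.

```julia
struct D <: Number  # D is a function-derivative pair
    f::Tuple{Float64,Float64}
end

import Base: -, *, +, /, convert, promote_rule
+(x::D, y::D) = D(x.f .+ y.f)
-(x::D, y::D) = D(x.f .- y.f)
*(x::D, y::D) = D((x.f[1]*y.f[1], x.f[2]*y.f[1] + x.f[1]*y.f[2]))
/(x::D, y::D) = D((x.f[1]/y.f[1], (y.f[1]*x.f[2] - x.f[1]*y.f[2])/y.f[1]^2))
convert(::Type{D}, x::Real) = D((x, zero(x)))
promote_rule(::Type{D}, ::Type{<:Number}) = D

# Seed the derivative slot with 1, evaluate, and extract the ϵ coefficient
derivative(f, x) = f(D((x, 1.0))).f[2]

# Example built only from +, -, *, / so that it works with D
g(x) = x*x + x/2 - 1/x

println(derivative(g, 2.0))   # analytic value: 2x + 1/2 + 1/x^2 = 4.75 at x = 2
```

Only functions composed of the four defined operations work here. Supporting `sin`, `log`, and similar functions would require adding a method for each of them.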

### 2.3.2 Forward Differentiation

In [18]:
using ForwardDiff

In [19]:
a = ForwardDiff.Dual(3,1)
log(a^2)

Dual{Nothing}(2.1972245773362196,0.6666666666666666)

In [20]:
a = ForwardDiff.Dual(3,1)
b = ForwardDiff.Dual(2,0)
log(a*b + max(a,2))

Dual{Nothing}(2.1972245773362196,0.3333333333333333)
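
Instead of seeding `Dual` numbers by hand, ForwardDiff can do the seeding itself through `ForwardDiff.derivative` (scalar input) and `ForwardDiff.gradient` (vector input). The sketch below applies them to the two expressions above. The names `g` and `f2` are assumptions introduced for illustration, and `f2` takes a single vector argument because that is what `gradient` expects.

```julia
using ForwardDiff

# Single-variable case: d/da log(a^2) = 2/a
g(a) = log(a^2)
dg = ForwardDiff.derivative(g, 3.0)

# Two-variable case, written as a function of one vector v = [a, b]
f2(v) = log(v[1]*v[2] + max(v[1], 2))
grad = ForwardDiff.gradient(f2, [3.0, 2.0])

println(dg)
println(grad)
```

The derivative with respect to a should agree with the derivative parts of the `Dual` results shown above, since both come from the same forward-mode propagation.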