Follow the video on https://www.youtube.com/watch?v=vAp6nUMrKYg.

# Autodiff:  <br> Calculus  from another angle 
(and the special role played by Julia's multiple dispatch and compiler technology)


At the heart of modern machine learning, so popular in 2018, is an optimization
problem. Optimization means gradients, so suddenly differentiation, especially automatic differentiation, is exciting.


The first time one hears about automatic differentiation, it is easy to imagine what it is. Surely it is straightforward symbolic differentiation applied to code. One imagines automatically doing what is learned in a calculus class.
  <img src="http://www2.bc.cc.ca.us/resperic/math6a/lectures/ch5/1/IntegralTable.gif" width="190">
... and anyway, if it is not that, then it must be finite differences, like one learns in a numerical computing class.
  
<img src="http://image.mathcaptain.com/cms/images/122/Diff%202.png" width="150">



## Babylonian sqrt

We start with a simple example, the computation of sqrt(x), where how autodiff works comes as both a mathematical surprise and a computing wonder. The example is the Babylonian algorithm, known to mankind for millennia, to compute sqrt(x):


 > Repeat $ t \leftarrow  (t+x/t) / 2 $ until $t$ converges to $\sqrt{x}$.
 
 Each iteration has one add and two divides. For illustration purposes, 10 iterations suffice.

In [3]:
function Babylonian(x; N = 10) 
    t = (1+x)/2
    ## The above can be thought of as the i=1 step with t = 1: t = (1 + x/1)/2
    ## Could the starting value have been something arbitrary?
    #t = 100
    #println("t = ", t)
    for i = 2:N; t=(t + x/t)/2  end    
    t
end  

Babylonian (generic function with 1 method)

Check that it works:

In [4]:
α = π
Babylonian(α), √α    

(1.7724538509055159, 1.7724538509055159)

In [5]:
x=2; Babylonian(x),√x  # Type \sqrt+<tab> to get the symbol

(1.414213562373095, 1.4142135623730951)

**(?1)** Questions about Babylonian sqrt.
- Why does this Babylonian iteration converge to `sqrt(x)`?
- Does it matter that we start with `t=1`?

**(R1)** Cf. `./babylonian_sqrt.ipynb`
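
One way to approach both questions, sketched here independently of the referenced notebook: the Babylonian update is Newton's method applied to $f(t) = t^2 - x$, whose positive root is $\sqrt{x}$.

$$ t - \frac{f(t)}{f'(t)} = t - \frac{t^2 - x}{2t} = \frac{1}{2}\left(t + \frac{x}{t}\right) $$

For $x > 0$ and any starting value $t > 0$, the first update already lands at or above $\sqrt{x}$ (arithmetic mean versus geometric mean of $t$ and $x/t$), and from there the iterates decrease monotonically to $\sqrt{x}$. So the choice $t = 1$ affects how many iterations are needed, not the limit.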

In [6]:
# import Pkg; Pkg.add("Plots")  # run once if Plots is not installed
using Plots
plotly()
#gr()
#pyplot()

LoadError: ArgumentError: Package Plots not found in current path:
- Run `import Pkg; Pkg.add("Plots")` to install the Plots package.


In [7]:
## Warning: the first plot loads packages, which takes time
i = 0:.01:49

# Note the different i's and their scopes
#plot([x->Babylonian(x,N=i) for i=1:5],i,label=["Iteration $j" for i=1:1,j=1:5])
plot([x->Babylonian(x,N=i) for i=1:5],i,label=["Iteration $j" for j=1:5])

plot!(sqrt,i,c="black",label="sqrt",
      title="Those Babylonians really knew how to √")

LoadError: UndefVarError: plot not defined

**(?2)**
Why did the above plot not work? Would the same code work in `Pluto`?

## ...and now the derivative, almost by magic

Eight lines of Julia!
- No mention of $\frac{1}{2} x^{-\frac{1}{2}}$.
- `D` for "**dual number**", invented by the famous algebraist Clifford in 1873.

In [8]:
struct D <: Number  # D is a function-derivative pair
    f::Tuple{Float64,Float64}
end

Sum Rule: (x+y)' = x' + y' <br>
Quotient Rule: (x/y)' = (yx'-xy') / y^2

In [9]:
import Base: +, /, convert, promote_rule
+(x::D, y::D) = D(x.f .+ y.f)
/(x::D, y::D) = D((x.f[1]/y.f[1], (y.f[1]*x.f[2] - x.f[1]*y.f[2])/y.f[1]^2))
convert(::Type{D}, x::Real) = D((x,zero(x)))  # derivative of a constant
promote_rule(::Type{D}, ::Type{<:Number}) = D

promote_rule (generic function with 123 methods)

For any given `D`, its `f` will store
- `f[1]`: some function $u$
- `f[2]`: the derivative $u'$

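Note the **curly brackets** `Type{}`, not **parentheses** `Type()`: `Type{D}` is the type whose only instance is `D` itself, so an argument annotated `::Type{D}` matches the type `D` passed as a value.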
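
To make the role of `convert` and `promote_rule` concrete, here is a minimal self-contained sketch. The struct name `MyDual` is a hypothetical stand-in for the notebook's `D`, renamed only so the block can run on its own; the method definitions mirror the ones above.

```julia
# Hypothetical stand-in for the notebook's `D` (same layout, different name).
struct MyDual <: Number
    f::Tuple{Float64,Float64}   # (value, derivative)
end

import Base: +, /, convert, promote_rule
+(x::MyDual, y::MyDual) = MyDual(x.f .+ y.f)
/(x::MyDual, y::MyDual) = MyDual((x.f[1]/y.f[1], (y.f[1]*x.f[2] - x.f[1]*y.f[2])/y.f[1]^2))
convert(::Type{MyDual}, x::Real) = MyDual((x, zero(x)))    # a constant has derivative 0
promote_rule(::Type{MyDual}, ::Type{<:Number}) = MyDual    # MyDual wins when mixed with plain numbers

x = MyDual((3.0, 1.0))     # the variable x = 3, with dx/dx = 1
c = convert(MyDual, 2)     # the constant 2 becomes (2.0, 0.0)
y = (1 + x) / 2            # 1 and 2 are promoted to MyDual before + and / are applied
# y.f == (2.0, 0.5): value (1 + 3)/2 and derivative 1/2
```

Because `MyDual <: Number`, Julia's generic fallbacks for mixed-type arithmetic call `promote`, which in turn uses the two small methods above. That is why `(1+x)/2` inside `Babylonian` needs no changes.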

In [10]:
Type{D}

Type{D}

In [11]:
Type{Number}

Type{Number}

What is `zero(x)` when `x` is a `Real`?

In [12]:
x = 2.71828

2.71828

In [13]:
zero(x)

0.0

In [14]:
typeof(zero(10)), typeof(zero(3.14))

(Int64, Float64)

The same algorithm, with no rewrite at all, properly computes
the derivative, as the check shows.

In [15]:
x=49; Babylonian(D((x,1))), (√x,.5/√x)

(D((7.0, 0.07142857142857142)), (7.0, 0.07142857142857142))

In [16]:
x=π; Babylonian(D((x,1))), (√x,.5/√x)

(D((1.7724538509055159, 0.28209479177387814)), (1.7724538509055159, 0.28209479177387814))

In [17]:
aaa = "aaa"
bbb = "bbb"

"bbb"

**Unlike `Pluto`**, in a Jupyter notebook we **don't need** to wrap a multi-line code cell in `begin ... end`.

## It just works!

How does it work?  We will explain in a moment.  Right now marvel that it does.  Note we did not
import any autodiff package.  Everything is just basic vanilla Julia.

## The assembler

Most folks don't read assembler, but one can see that it is short.
The shortness is a clue that suggests speed!

In [18]:
@inline function Babylonian(x; N = 10) 
    t = (1+x)/2
    for i = 2:N; t=(t + x/t)/2  end    
    t
end  
@code_native(Babylonian(D((2,1))))

	.text
; ┌ @ In[18]:1 within `Babylonian'
	movq	%rdi, %rax
; │ @ In[18]:2 within `Babylonian'
; │┌ @ In[18]:2 within `#Babylonian#8'
; ││┌ @ promotion.jl:311 within `+' @ In[9]:2
; │││┌ @ broadcast.jl:837 within `materialize'
; ││││┌ @ broadcast.jl:1046 within `copy'
; │││││┌ @ ntuple.jl:42 within `ntuple'
; ││││││┌ @ broadcast.jl:1046 within `#19'
; │││││││┌ @ broadcast.jl:621 within `_broadcast_getindex'
; ││││││││┌ @ broadcast.jl:648 within `_broadcast_getindex_evalf'
; │││││││││┌ @ float.jl:401 within `+'
	vmovsd	(%rsi), %xmm1           # xmm1 = mem[0],zero
	vmovsd	8(%rsi), %xmm2          # xmm2 = mem[0],zero
	movabsq	$.rodata.cst8, %rcx
	vaddsd	(%rcx), %xmm1, %xmm4
	vxorpd	%xmm8, %xmm8, %xmm8
	vaddsd	%xmm8, %xmm2, %xmm5
	movabsq	$140447991773272, %rcx  # imm = 0x7FBC98A8B858
	vmovsd	(%rcx), %xmm3           # xmm3 = mem[0],zero
; ││└└└└└└└└
; ││┌ @ promotion.jl:314 within `/' @ In[9]:3 @ float.jl:407
	vmulsd	%xmm3, %xmm4, %xmm6
; │││ @ promotion.jl:314 within `/' @ In[9]:3
; │││┌ @

**(?3)** Do we need to rewrite the definition again like above in order to use `@code_native()`?

## Symbolically

We haven't yet explained how it works, but it may be of some value to understand that the below is mathematically
equivalent, though not what the computation is doing.

Notice in the below that Babylonian works on SymPy symbols.

Note: Python and Julia are good friends.  It's not a competition!  Watch how nicely we can use the same code now with SymPy.

In [19]:
using Pkg

In [20]:
Pkg.add("SymPy")
using SymPy

   Updating registry at `~/.julia/registries/General`
  Resolving package versions...
  Installed RecipesBase ─ v1.1.1
  Installed SymPy ─────── v1.0.40
  Installed PyCall ────── v1.92.2
Updating `~/.julia/environments/v1.5/Project.toml`
  [24249f21] + SymPy v1.0.40
Updating `~/.julia/environments/v1.5/Manifest.toml`
  [efe28fd5] + OpenSpecFun_jll v0.5.3+4
  [438e738f] + PyCall v1.92.2
  [3cdcf5f2] + RecipesBase v1.1.1
  [276daf66] + SpecialFunctions v1.2.1
  [24249f21] + SymPy v1.0.40
   Building PyCall → `~/.julia/packages/PyCall/tqyST/deps/build.log`
┌ Info: Precompiling SymPy [24249f21-da20-56a4-8eb1-6a02cf4ae2e6]
└ @ Base loading.jl:1278


LoadError: InitError: PyError (PyImport_ImportModule

The Python package sympy could not be imported by pyimport. Usually this means
that you did not install sympy in the Python version being used by PyCall.

PyCall is currently configured to use the Python version at:

/home/phunc20/.config/miniconda3/envs/homl-1e/bin/python3

and you should use whatever mechanism you usually use (apt-get, pip, conda,
etcetera) to install the Python package containing the sympy module.

One alternative is to re-configure PyCall to use a different Python
version on your system: set ENV["PYTHON"] to the path/name of the python
executable you want to use, run Pkg.build("PyCall"), and re-launch Julia.

Another alternative is to configure PyCall to use a Julia-specific Python
distribution via the Conda.jl package (which installs a private Anaconda
Python distribution), which has the advantage that packages can be installed
and kept up-to-date via Julia.  As explained in the PyCall documentation,
set ENV["PYTHON"]="", run Pkg.build("PyCall"), and re-launch Julia. Then,
To install the sympy module, you can use `pyimport_conda("sympy", PKG)`,
where PKG is the Anaconda package that contains the module sympy,
or alternatively you can use the Conda package directly (via
`using Conda` followed by `Conda.add` etcetera).

) <class 'ModuleNotFoundError'>
ModuleNotFoundError("No module named 'sympy'")

during initialization of module SymPy

In [69]:
x = symbols("x")
display("Iterations as a function of x")
for k = 1:5
 display( simplify(Babylonian(x,N=k)))
end

display("Derivatives as a function of x")
for k = 1:5
 display(simplify(diff(simplify(Babylonian(x,N=k)),x)))
end

LoadError: UndefVarError: symbols not defined

The code is computing answers mathematically equivalent to the functions above, but it does so numerically, not symbolically.

## How autodiff is getting the answer
Let us take the "derivative" of the Babylonian iteration with respect to x by hand. Specifically, t′ = dt/dx. This is the old-fashioned way: a human rewriting code.

In [21]:
function dBabylonian(x; N = 10) 
    t = (1+x)/2
    t′ = 1/2
    for i = 1:N;  
        t = (t+x/t)/2; 
        t′= (t′+(t-x*t′)/t^2)/2; 
    end    
    t′

end  

dBabylonian (generic function with 1 method)

See that this rewritten code gets the right answer. So the trick is for the computer system to do it for you, without any loss of speed or convenience.

In [22]:
x = π; dBabylonian(x), .5/√x

(0.2820947917738782, 0.28209479177387814)

What just happened?  Answer: We created an iteration by hand for t′ given our iteration for t. Then we ran the iteration alongside the iteration for t.
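
One detail worth noting in the hand-written `dBabylonian`: it updates `t` first and then uses the already-updated `t` in the formula for `t′`, and it runs one more loop pass than `Babylonian`. As the check shows, it still reaches the right limit, but it is not step-for-step the derivative of the iteration. Below is a minimal sketch of a variant that differentiates each step using the old `t`, which is what the dual-number arithmetic does. The name `dBabylonian_sync` is hypothetical and not part of the original notebook.

```julia
# Hypothetical variant: t′ is updated from the *old* t, then t itself is updated.
function dBabylonian_sync(x; N = 10)
    t  = (1 + x) / 2      # first iterate, same as in Babylonian
    t′ = 1 / 2            # d/dx of (1 + x)/2
    for i = 2:N
        # Derivative of t_new = (t + x/t)/2 with respect to x.
        # Quotient rule: d(x/t)/dx = (t - x*t′)/t^2.
        t′ = (t′ + (t - x*t′) / t^2) / 2
        t  = (t + x / t) / 2
    end
    t′
end

dBabylonian_sync(π), 0.5/√π   # the two values should agree closely
```

The two update lines inside the loop are exactly the value part and the derivative part of one dual-number step, written out by a human.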

In [23]:
Babylonian(D((x,1)))

D((1.7724538509055159, 0.28209479177387814))

How did this work?  It created the same derivative iteration that we did by hand, using very general rules that are set once and need not be written by hand.

Important: the derivative rules are substituted before the JIT compiler runs, and thus efficient compiled code is executed.

## Dual Number Notation

Instead of D(a,b) we can write a + b ϵ, where ϵ satisfies ϵ^2=0.  (Some people like to recall imaginary numbers where an i is introduced with i^2=-1.) 

Others like to think of how engineers just drop the O(ϵ^2) terms.

The four rules are

$ (a+b\epsilon) \pm (c+d\epsilon) = (a \pm c) + (b \pm d)\epsilon$

$ (a+b\epsilon) * (c+d\epsilon) = (ac) + (bc+ad)\epsilon$

$ (a+b\epsilon) / (c+d\epsilon) = (a/c) + \frac{bc-ad}{c^2} \epsilon $
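
The quotient rule is the least obvious of these. A short derivation: multiply numerator and denominator by $c - d\epsilon$ and use $\epsilon^2 = 0$.

$$ \frac{a+b\epsilon}{c+d\epsilon} = \frac{(a+b\epsilon)(c-d\epsilon)}{(c+d\epsilon)(c-d\epsilon)} = \frac{ac + (bc-ad)\epsilon}{c^2} = \frac{a}{c} + \frac{bc-ad}{c^2}\,\epsilon $$

The denominator of the $\epsilon$ coefficient is $c^2$, which matches `y.f[1]^2` in the Julia definition of `/` for `D`. With $a = u$, $b = u'$, $c = v$, $d = v'$ this is the familiar quotient rule $(u/v)' = (vu' - uv')/v^2$.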


In [24]:
Base.show(io::IO,x::D) = print(io,x.f[1]," + ",x.f[2]," ϵ")

In [25]:
# Add the last two rules
import Base: -,*
-(x::D, y::D) = D(x.f .- y.f)
*(x::D, y::D) = D((x.f[1]*y.f[1], (x.f[2]*y.f[1] + x.f[1]*y.f[2])))

* (generic function with 414 methods)

**(?)** Commutativity of `*`?

In [26]:
D((1,0))

1.0 + 0.0 ϵ

In [27]:
D((0,1))^2

0.0 + 0.0 ϵ

In [28]:
D((2,1)) ^2

4.0 + 4.0 ϵ

In [29]:
ϵ = D((0,1))
@code_native(ϵ^2)

	.text
; ┌ @ intfuncs.jl:274 within `^'
	pushq	%rbx
	subq	$16, %rsp
	movq	%rdi, %rbx
	movabsq	$power_by_squaring, %rax
	movq	%rsp, %rdi
	callq	*%rax
	vmovups	(%rsp), %xmm0
	vmovups	%xmm0, (%rbx)
	movq	%rbx, %rax
	addq	$16, %rsp
	popq	%rbx
	retq
	nopl	(%rax)
; └


In [30]:
ϵ * ϵ 

0.0 + 0.0 ϵ

In [31]:
ϵ^2

0.0 + 0.0 ϵ

In [32]:
ϵ^3

0.0 + 0.0 ϵ

In [33]:
1/(1+ϵ)  # Exact power series:  1-ϵ+ϵ²-ϵ³+...

1.0 + -1.0 ϵ

**(?)** Why did Prof. Edelman speak of `power series:  1-ϵ+ϵ²-ϵ³+...` here? Was it really a power series?

In [34]:
(1+ϵ)*(1-ϵ)

1.0 + 0.0 ϵ

In [39]:
(1+ϵ)^5 ## Note this just works (we didn't train powers)!!

1.0 + 5.0 ϵ

In [40]:
(1+ϵ)^7

1.0 + 7.0 ϵ

In [41]:
(1+ϵ)^20

1.0 + 20.0 ϵ

## Generalization to arbitrary roots

In [42]:
function nthroot(x, n=2; t=1, N = 10) 
    for i = 1:N;   t += (x/t^(n-1)-t)/n; end   
    t
end  

nthroot (generic function with 2 methods)
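
The update inside `nthroot` can be read as Newton's method applied to $f(t) = t^n - x$ (a sketch of the reasoning, not stated in the original):

$$ t - \frac{f(t)}{f'(t)} = t - \frac{t^n - x}{n\,t^{n-1}} = t + \frac{1}{n}\left(\frac{x}{t^{n-1}} - t\right) $$

For $n = 2$ this reduces to $t + (x/t - t)/2 = (t + x/t)/2$, the Babylonian step.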

In [43]:
nthroot(2,3), ∛2 # take a cube root

(1.2599210498948732, 1.2599210498948732)

In [44]:
nthroot(2+ϵ,3)

1.2599210498948732 + 0.20998684164914552 ϵ

In [45]:
nthroot(7,12), 7^(1/12)

(1.1760474285795146, 1.1760474285795146)

In [46]:
x = 2.0
nthroot( x+ϵ,3), ∛x, 1/x^(2/3)/3

(1.2599210498948732 + 0.20998684164914552 ϵ, 1.2599210498948732, 0.20998684164914552)

In [47]:
x+ϵ == D((x,1))

true

## Forward Diff
Now that you understand it, you can use the official package.

In [49]:
using ForwardDiff

┌ Info: Precompiling ForwardDiff [f6369f11-7733-5829-9624-2563aa707210]
└ @ Base loading.jl:1278


In [50]:
ForwardDiff.derivative(sqrt, 2)

0.35355339059327373

In [51]:
ForwardDiff.derivative(Babylonian, 2)

0.35355339059327373

In [52]:
Babylonian(D((2,1))).f[2]

0.35355339059327373

In [53]:
@which ForwardDiff.derivative(sqrt, 2)

## Close Look at Convergence with big floats
The quantity $-\log_{10} \Delta t$, where $\Delta t$ is the error in $t$, gives the number of correct digits. Watch the quadratic convergence right before your eyes.

In [54]:
setprecision(3000)
Float64.(log10.([Babylonian(BigFloat(2),N=k) for k=1:10] .- √BigFloat(2)))

10-element Array{Float64,1}:
   -1.0665813663397075
   -2.610283987399077
   -5.672865645792784
  -11.797276937315402
  -24.046098868127267
  -48.543742729750505
  -97.53903045299698
 -195.52960589948992
 -391.51075679247583
 -783.4730585784476
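
The doubling pattern in these numbers follows from a one-line identity for the Babylonian step $t_{k+1} = (t_k + x/t_k)/2$:

$$ t_{k+1} - \sqrt{x} = \frac{t_k^2 + x - 2 t_k \sqrt{x}}{2 t_k} = \frac{(t_k - \sqrt{x})^2}{2 t_k} $$

The new error is proportional to the square of the old one, so $\log_{10}$ of the error roughly doubles at every iteration, as in the list above.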

In [55]:
?round

search: round rounding RoundUp RoundDown RoundToZero RoundingMode RoundNearest



```
round(z::Complex[, RoundingModeReal, [RoundingModeImaginary]])
round(z::Complex[, RoundingModeReal, [RoundingModeImaginary]]; digits=, base=10)
round(z::Complex[, RoundingModeReal, [RoundingModeImaginary]]; sigdigits=, base=10)
```

Return the nearest integral value of the same type as the complex-valued `z` to `z`, breaking ties using the specified [`RoundingMode`](@ref)s. The first [`RoundingMode`](@ref) is used for rounding the real components while the second is used for rounding the imaginary components.

# Example

```jldoctest
julia> round(3.14 + 4.5im)
3.0 + 4.0im
```

---

```
round([T,] x, [r::RoundingMode])
round(x, [r::RoundingMode]; digits::Integer=0, base = 10)
round(x, [r::RoundingMode]; sigdigits::Integer, base = 10)
```

Rounds the number `x`.

Without keyword arguments, `x` is rounded to an integer value, returning a value of type `T`, or of the same type of `x` if no `T` is provided. An [`InexactError`](@ref) will be thrown if the value is not representable by `T`, similar to [`convert`](@ref).

If the `digits` keyword argument is provided, it rounds to the specified number of digits after the decimal place (or before if negative), in base `base`.

If the `sigdigits` keyword argument is provided, it rounds to the specified number of significant digits, in base `base`.

The [`RoundingMode`](@ref) `r` controls the direction of the rounding; the default is [`RoundNearest`](@ref), which rounds to the nearest integer, with ties (fractional values of 0.5) being rounded to the nearest even integer. Note that `round` may give incorrect results if the global rounding mode is changed (see [`rounding`](@ref)).

# Examples

```jldoctest
julia> round(1.7)
2.0

julia> round(Int, 1.7)
2

julia> round(1.5)
2.0

julia> round(2.5)
2.0

julia> round(pi; digits=2)
3.14

julia> round(pi; digits=3, base=2)
3.125

julia> round(123.456; sigdigits=2)
120.0

julia> round(357.913; sigdigits=4, base=2)
352.0
```

!!! note
    Rounding to specified digits in bases other than 2 can be inexact when operating on binary floating point numbers. For example, the [`Float64`](@ref) value represented by `1.15` is actually *less* than 1.15, yet will be rounded to 1.2.

    # Examples

    ```jldoctest; setup = :(using Printf)
    julia> x = 1.15
    1.15

    julia> @sprintf "%.20f" x
    "1.14999999999999991118"

    julia> x < 115//100
    true

    julia> round(x, digits=1)
    1.2
    ```


# Extensions

To extend `round` to new numeric types, it is typically sufficient to define `Base.round(x::NewType, r::RoundingMode)`.

---

```
round(dt::TimeType, p::Period, [r::RoundingMode]) -> TimeType
```

Return the `Date` or `DateTime` nearest to `dt` at resolution `p`. By default (`RoundNearestTiesUp`), ties (e.g., rounding 9:30 to the nearest hour) will be rounded up.

For convenience, `p` may be a type instead of a value: `round(dt, Dates.Hour)` is a shortcut for `round(dt, Dates.Hour(1))`.

```jldoctest
julia> round(Date(1985, 8, 16), Dates.Month)
1985-08-01

julia> round(DateTime(2013, 2, 13, 0, 31, 20), Dates.Minute(15))
2013-02-13T00:30:00

julia> round(DateTime(2016, 8, 6, 12, 0, 0), Dates.Day)
2016-08-07T00:00:00
```

Valid rounding modes for `round(::TimeType, ::Period, ::RoundingMode)` are `RoundNearestTiesUp` (default), `RoundDown` (`floor`), and `RoundUp` (`ceil`).

---

```
round(x::Period, precision::T, [r::RoundingMode]) where T <: Union{TimePeriod, Week, Day} -> T
```

Round `x` to the nearest multiple of `precision`. If `x` and `precision` are different subtypes of `Period`, the return value will have the same type as `precision`. By default (`RoundNearestTiesUp`), ties (e.g., rounding 90 minutes to the nearest hour) will be rounded up.

For convenience, `precision` may be a type instead of a value: `round(x, Dates.Hour)` is a shortcut for `round(x, Dates.Hour(1))`.

```jldoctest
julia> round(Dates.Day(16), Dates.Week)
2 weeks

julia> round(Dates.Minute(44), Dates.Minute(15))
45 minutes

julia> round(Dates.Hour(36), Dates.Day)
2 days
```

Valid rounding modes for `round(::Period, ::T, ::RoundingMode)` are `RoundNearestTiesUp` (default), `RoundDown` (`floor`), and `RoundUp` (`ceil`).

Rounding to a `precision` of `Month`s or `Year`s is not supported, as these `Period`s are of inconsistent length.


In [56]:
setprecision(3000)
#round.(Float64.(log10.([Babylonian(BigFloat(2),N=k) for k=1:10] .- √BigFloat(2))),3)
round.(Float64.(log10.([Babylonian(BigFloat(2),N=k) for k=1:10] .- √BigFloat(2))),digits=3)

10-element Array{Float64,1}:
   -1.067
   -2.61
   -5.673
  -11.797
  -24.046
  -48.544
  -97.539
 -195.53
 -391.511
 -783.473

**(?)** Why `\sqrt BigFloat(2)` instead of `\sqrt 2`? Does it matter?

In [95]:
struct D1{T} <: Number  # D1 is a function-derivative pair, generic in the element type T
    f::Tuple{T,T}
end

In [96]:
z = D((2.0,1.0))
z1 = D1((BigFloat(2.0),BigFloat(1.0)))

D1{BigFloat}((2.0, 1.0))

In [97]:
import Base: +, /, convert, promote_rule
+(x::D1, y::D1) = D1(x.f .+ y.f)
/(x::D1, y::D1) = D1((x.f[1]/y.f[1], (y.f[1]*x.f[2] - x.f[1]*y.f[2])/y.f[1]^2))
convert(::Type{D1{T}}, x::Real) where {T} = D1((convert(T, x), zero(T)))
promote_rule(::Type{D1{T}}, ::Type{S}) where {T,S<:Number} = D1{promote_type(T,S)}

promote_rule (generic function with 160 methods)

**(?)** Why additionally define this `D1` type and the seemingly redundant `+, /, convert, promote_rule`?
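
One possible answer, as a hedged sketch: `D` hard-codes `Float64`, so a `BigFloat` derivative would be truncated to double precision. A parametric dual keeps whatever precision the input has. The names `PDual` and `babylonian_sketch` below are hypothetical, chosen so the block is self-contained; they mirror `D1` and `Babylonian`, except that `promote_rule` is restricted to `Real` partners as a simplifying assumption.

```julia
# Hypothetical parametric dual number, mirroring D1 above.
struct PDual{T<:Real} <: Number
    f::Tuple{T,T}               # (value, derivative), both of type T
end

import Base: +, /, convert, promote_rule
+(x::PDual, y::PDual) = PDual(x.f .+ y.f)
/(x::PDual, y::PDual) = PDual((x.f[1]/y.f[1], (y.f[1]*x.f[2] - x.f[1]*y.f[2])/y.f[1]^2))
convert(::Type{PDual{T}}, x::Real) where {T} = PDual((convert(T, x), zero(T)))
promote_rule(::Type{PDual{T}}, ::Type{S}) where {T,S<:Real} = PDual{promote_type(T,S)}

# Local copy of the Babylonian iteration so the block runs on its own.
function babylonian_sketch(x; N = 10)
    t = (1 + x) / 2
    for i = 2:N
        t = (t + x / t) / 2
    end
    t
end

r = babylonian_sketch(PDual((BigFloat(2), BigFloat(1))))
exact = 1 / (2 * sqrt(BigFloat(2)))   # d/dx sqrt(x) at x = 2
abs(r.f[2] - exact)                   # tiny at BigFloat precision
```

The derivative component stays a `BigFloat` throughout, so its accuracy is limited by the working precision rather than by `Float64`.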

In [64]:
A = randn(3,3)

3×3 Array{Float64,2}:
 0.832856  -1.66678   0.446673
 1.82869    2.17166  -0.149357
 0.431754   1.24945  -1.46358

In [60]:
x = randn(3)

3-element Array{Float64,1}:
  0.17874772549648885
 -0.7424719869048871
 -0.018209429442481086

In [100]:
ForwardDiff.gradient(x->x'A*x,x)

3-element Array{Float64,1}:
 -2.3083962423970785
 -2.5880816452286717
  5.52159279827821

In [101]:
(A+A')*x

3-element Array{Float64,1}:
 -2.3083962423970785
 -2.5880816452286717
  5.52159279827821

**(?)** Can you explain why the gradient is $(A + A^{T})\,x$?<br>
**(R)** For reasons why `(A+A')*x` can be used to verify `ForwardDiff.gradient(x->x'A*x,x)`, cf. `./differential_review.ipynb`
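
A short component-wise derivation, given here as a sketch and not taken from the referenced notebook. Write $f(x) = x^{T} A x = \sum_{i,j} A_{ij}\, x_i x_j$ and differentiate with respect to $x_k$:

$$ \frac{\partial f}{\partial x_k} = \sum_{j} A_{kj}\, x_j + \sum_{i} A_{ik}\, x_i = (A x)_k + (A^{T} x)_k $$

Collecting the components gives $\nabla f(x) = (A + A^{T})\,x$, which reduces to $2Ax$ when $A$ is symmetric.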

In [113]:
n = 4
using LinearAlgebra
Strang = SymTridiagonal(2*ones(n),-ones(n-1))

4×4 SymTridiagonal{Float64,Array{Float64,1}}:
  2.0  -1.0    ⋅     ⋅ 
 -1.0   2.0  -1.0    ⋅ 
   ⋅   -1.0   2.0  -1.0
   ⋅     ⋅   -1.0   2.0

In [62]:
x_dual = [D((t, 1)) for t in x]

3-element Array{D,1}:
   0.17874772549648885 + 1.0 ϵ
   -0.7424719869048871 + 1.0 ϵ
 -0.018209429442481086 + 1.0 ϵ

##  But wait there's more!

Many packages need to be taught how to compute autodiffs of matrix factorizations such as the svd or lu.  Julia will "just do it," with no
teaching necessary, for reasons such as the above.  This is illustrated in another notebook, not included here.