An easy, extensible, NumPy-based automatic differentiation tool, adapted from Autodidact and Autograd.
A complete (if a little messy) Jupyter tutorial is included.
- Less code: this re-implementation is under 300 lines of code, which makes it easier to learn the core features of autodiff. Autodidact has 700 lines and Autograd has 3000.
- Fixed bugs: Autodidact is a pedagogical re-implementation of Autograd; the simplification removes a lot of code but also introduces some bugs. For example, `grad(grad(lambda x: 2 / x))(np.array([1]))` goes wrong in Autodidact but works in Easygrad.
- Easygrad doesn't wrap all NumPy functions directly; instead, it wraps the functions that invoke them, which may be why some of these bugs are fixed.
- Only some basic functions (add, sub, exp, sin, ...) are implemented for the tutorial; more can easily be added if needed (see the sketch below).
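For a concrete picture of what adding an op involves, here is how a new function is registered with Autograd's extension API (`primitive` and `defvjp` from `autograd.extend`); Easygrad follows the same pattern, though its exact helper names may differ:

```python
import numpy as onp  # raw NumPy, not a wrapped version
from autograd import grad
from autograd.extend import primitive, defvjp

@primitive
def my_cos(x):
    return onp.cos(x)

# Register the vector-Jacobian product: d/dx cos(x) = -sin(x)
defvjp(my_cos, lambda ans, x: lambda g: -g * onp.sin(x))

print(grad(my_cos)(0.5))  # -sin(0.5), about -0.479
```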
- Similar to Autograd, Easygrad adopts `Box` to flag the variables we're taking the gradient with respect to, and to distinguish the backward paths (computation graphs) of different derivative orders (first/second/... derivative).
- Similar to Autograd, Easygrad wraps functions with `primitive`, which performs the real computation on variables inside multiple nested boxes and removes the redundant code for stripping each box; see the sketch below.
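A minimal, self-contained sketch of the box-stripping idea (illustrative only, not Easygrad's exact code; it omits the computation-graph recording that a real `primitive` wrapper also performs):

```python
class Box:
    """Wraps a value for one derivative pass; the value may itself be a Box."""
    def __init__(self, value, trace_id):
        self.value = value
        self.trace_id = trace_id

    def __repr__(self):
        return f"Box({self.value!r}, trace={self.trace_id})"

def primitive(raw_fn):
    """Strip boxes of the outermost trace, compute, then re-wrap."""
    def wrapped(*args):
        boxed = [a for a in args if isinstance(a, Box)]
        if not boxed:
            return raw_fn(*args)  # plain values: just compute
        top = max(b.trace_id for b in boxed)  # outermost derivative pass
        # Unbox only arguments belonging to the top trace; arguments from
        # lower traces (e.g. a captured x inside a nested grad) stay boxed.
        stripped = [a.value if isinstance(a, Box) and a.trace_id == top else a
                    for a in args]
        return Box(wrapped(*stripped), top)  # recurse through inner boxes
    return wrapped

mul = primitive(lambda a, b: a * b)
x = Box(5.0, trace_id=0)          # outer grad's variable
y = Box(Box(5.0, 0), trace_id=1)  # inner grad re-wraps it
print(mul(x, y))                  # Box(Box(25.0, trace=0), trace=1)
```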
- Why is `__array_priority__ > 0` important?
  - For `ArrayBox(np.array([1, 2]), 0, None) / np.array([1, 2])`, `ArrayBox.__truediv__` is invoked, and we get an `ArrayBox` with value `array([1., 1.])`, which is correct.
  - For `np.array([1, 2]) / ArrayBox(np.array([1, 2]), 0, None)`, `ArrayBox.__rtruediv__` is NOT invoked; instead `np.ndarray.__truediv__` is invoked, and we get `array([ArrayBox, ArrayBox])`, where the first `ArrayBox` holds `array([1., 0.5])` and the second holds `array([2., 1.])`. That's why everything goes wrong: the division is broadcast element-wise over the box instead of being deferred to it. Setting `__array_priority__ > 0` makes `np.ndarray` return `NotImplemented`, so Python falls back to `ArrayBox.__rtruediv__`.
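To see the dispatch mechanism in isolation, here is a small self-contained demo (plain NumPy, not Easygrad code; the class names are made up for illustration):

```python
import numpy as np

class LowPriority:
    # No __array_priority__: np.ndarray keeps the division for itself.
    def __rtruediv__(self, other):
        return ("rtruediv got", other)

class HighPriority:
    # Priority above ndarray's default of 0.0 makes np.ndarray.__truediv__
    # return NotImplemented, so Python calls our reflected method instead.
    __array_priority__ = 100.0
    def __rtruediv__(self, other):
        return ("rtruediv got", other)

# Broadcast element-wise into an object array -- each scalar element is
# divided by the box separately, mirroring array([ArrayBox, ArrayBox]):
print(np.array([1, 2]) / LowPriority())
# Deferred correctly: __rtruediv__ receives the whole array exactly once.
print(np.array([1, 2]) / HighPriority())  # ('rtruediv got', array([1, 2]))
```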
- Why is `TraceStack` important?
  - For the example code below, `grad(f)(5.)` wraps `5.` in a `Box` and calls `f(Box(5.))`. Then `grad(g)(x)` wraps `Box(5.)` again and calls `g(Box(Box(5.)))`.
  - Function `g(y)` computes `x * y`, where `x = Box(5.)` and `y = Box(Box(5.))`; here `x = Box(5.)` is fixed and does not need a gradient in the inner call. `TraceStack` generates an incrementing `trace_id`, which is `1` for `y` and `0` for `x`. That's how `primitive` functions tell the two apart.
```python
# A brief example from Autodidact
def f(x):
    def g(y):
        return x * y
    return grad(g)(x)

y = grad(f)(5.)
```
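For reference, since d(x*y)/dy = x, `grad(g)(x)` evaluates to `x`, so `f` is the identity function and `y = grad(f)(5.)` is `1.0`. Without distinct trace ids, the inner `grad` would wrongly differentiate through the captured `x` as well.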
Thanks to Matt Johnson for the great Autodidact tutorial, and to the authors of Autograd, the much more advanced original.