
adjoint vector-Jacobian product form of precomputed gradients for reverse #876

Closed
bob-carpenter opened this issue May 29, 2018 · 5 comments

@bob-carpenter
Contributor

bob-carpenter commented May 29, 2018

Summary:

Add a precomputed-gradients class for multivariate functions that reduces the user's burden from understanding autodiff stack management and expression templates to just computing the function value and a simple vector-Jacobian product.

Description:

For a multivariate function f:R^N -> R^M and input x in R^N, the chain rule for reverse mode autodiff unfolds to:

for (n in 1:N)
  for (m in 1:M)
    x[m].adjoint += f(x)[n].adjoint * J[n, m];

where J is the M x N Jacobian, i.e.,

J[m, n] = d f(x)[n] / d x[m].

Taken as a whole, this is neatly expressed as:

x.adjoint += f(x).adjoint * J;
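
As a concrete numeric check, here is a minimal standalone Eigen sketch (not Stan code; the example function and all names here are made up for illustration), using the convention that J's rows index outputs:

#include <Eigen/Dense>
#include <iostream>

int main() {
  using Eigen::MatrixXd;
  using Eigen::RowVectorXd;
  using Eigen::VectorXd;

  // example function f : R^2 -> R^2, f(x) = (x[0] * x[1], x[0] + x[1])
  VectorXd x(2);
  x << 3, 5;

  // Jacobian with J(m, n) = d f(x)[m] / d x[n]
  MatrixXd J(2, 2);
  J << x(1), x(0),
       1,    1;

  // adjoints of the outputs, written as a row vector
  RowVectorXd fx_adj(2);
  fx_adj << 1, 2;

  // the whole double loop collapses to x.adjoint += f(x).adjoint * J
  RowVectorXd x_adj = RowVectorXd::Zero(2);
  x_adj += fx_adj * J;

  std::cout << x_adj << std::endl;  // prints: 7 5
}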

So we should be able to take a user-defined functor:

/**
 * Structure for implementing a multivariate reverse-mode automatic
 * differentiation function based on vector-Jacobian products.
 */ 
struct foo {
  /**
   * Return the value of the function for the specified argument.  This
   * operation may store intermediate values needed for the vector-Jacobian
   * product as member variables, allocated only via memalloc (i.e., no
   * members on the heap).
   *
   * @param x argument to function
   * @return value of function
   */
  VectorXd apply(const VectorXd& x);

  /**
   * Return the product of the specified vector with the Jacobian
   * of the function.
   *
   * @param fx_adj adjoint of the function result
   * @return adjoint-Jacobian product
   */
  VectorXd multiply_adjoint_jacobian(const VectorXd& fx_adj) const;
};
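
For instance (just an illustrative sketch, not part of the proposal; the functor name and storage strategy are made up), an elementwise square f(x) = x .* x would look like:

struct square_elts {
  VectorXd x_;  // saved input; in the real thing this would live in memalloc

  // forward pass: compute the value and stash what the reverse pass needs
  VectorXd apply(const VectorXd& x) {
    x_ = x;
    return x.array().square();
  }

  // the Jacobian is diag(2 * x), so the vector-Jacobian product
  // reduces to an elementwise product with 2 * x
  VectorXd multiply_adjoint_jacobian(const VectorXd& fx_adj) const {
    return (fx_adj.array() * 2 * x_.array()).matrix();
  }
};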

From that, we can create an appropriate implementation of chain() for the first result and make the rest non-chainable, as we have done for other specialized functions. This chain() method will just call the object's multiply_adjoint_jacobian method and do the adjoint increments. The user will never have to even think about our autodiff types.

The obvious next step would be to generalize to matrix arguments and then to pairs of arguments in {matrix, vector, scalar}. Neither should be too hard given Eigen's templating and underlying linear memory layout and indexing.

Current Version:

v2.17.0

@bob-carpenter
Contributor Author

bob-carpenter commented Jun 9, 2018

Here's a sketch that should work once it's debugged. All the work's done in the chain() method of the first vari returned in the result vector. The rest of the vari in the result are not chainable, meaning they don't go on the autodiff stack (they do go in the arena memory and otherwise behave like other autodiff variables).

// Further generalize to generic inputs and outputs with
// template int params: Rx, Cx, and Ry, Cy

struct foo_fun : public arena_allocated {
  template <typename T>
  Matrix<T, -1, 1> apply(const Matrix<T, -1, 1>& x) const;

  VectorXd apply_jac(const VectorXd& x);

  VectorXd multiply_adjoint_jacobian(const VectorXd& fx_adj) const;
};

template <typename T>
Matrix<T, -1, 1> foo(const Matrix<T, -1, 1>& x) {
  return adj_jac_apply<foo_fun>(x);
}

template <class F>
vector_d adj_jac_apply(const vector_d& x) {
  F f;
  return f.apply(x);
}

template <class F>
vector_v adj_jac_apply(const vector_v& x) {
  F* f = new F();  // arena allocated, so never explicitly freed
  vector_d val_x = value_of(x);
  vector_d val_y = f->apply_jac(val_x);
  int M = val_y.size();
  if (M == 0) return vector_v(0);

  vector_v y(M);
  apply_vari<F>* vi = new apply_vari<F>(*f, x, val_y);
  for (int m = 0; m < M; ++m)
    y(m) = (vi->y_vi_)[m];
  return y;
}

template <typename F>
struct apply_vari : public vari {
  F& f_;
  int N_;
  vari** x_vi_;
  int M_;
  vari** y_vi_;  // only first one is chainable, its chain method is special

  apply_vari(F& f, const vector_v& x, const vector_d& val_y)
      : vari(val_y(0)), f_(f),
        N_(x.size()), x_vi_(memory::memalloc_.array_alloc<vari*>(N_)),
        M_(val_y.size()), y_vi_(memory::memalloc_.array_alloc<vari*>(M_)) {
    for (int n = 0; n < N_; ++n)
      x_vi_[n] = x(n).vi_;
    y_vi_[0] = this;
    for (int m = 1; m < M_; ++m)
      y_vi_[m] = new vari(val_y(m), false);
  }

  void chain() {
    vector_d y_adj(M_);
    for (int m = 0; m < M_; ++m)
      y_adj(m) = y_vi_[m]->adj_;
    vector_d y_adj_jac = f_.multiply_adjoint_jacobian(y_adj);
    for (int n = 0; n < N_; ++n)
      x_vi_[n]->adj_ += y_adj_jac(n);
  }
};

@bob-carpenter
Contributor Author

bob-carpenter commented Jul 3, 2018

Yes, that's the n-th dimension of the result (f maps R^N -> R^M).

If we store the Jacobian, then the code is the naive loop that propagates each of the M output adjoints to each of the N input adjoints.

The goal's not to cut down on arithmetic. It might save memory by being lazy about the product compared to a simpler implementation. But the main goal is just to make it easier to write efficient multivariate functions without having to deal with a ton of fiddly memory operations.
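
In sketch form (assuming the full M x N Jacobian had been stored as a member J_ during the forward pass; the member name is made up), that naive loop would be:

// naive adjoint propagation from a stored Jacobian
void chain() {
  for (int m = 0; m < M_; ++m)
    for (int n = 0; n < N_; ++n)
      x_vi_[n]->adj_ += y_vi_[m]->adj_ * J_(m, n);
}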

@bob-carpenter
Contributor Author

My mistake. I just always get confused about the indexing of Jacobians because it's R^N -> R^M. The loop should be over m in 1:M and index f(x)[m].

@seantalts
Member

Was this fixed by #924?

@bbbales2
Member

bbbales2 commented Aug 2, 2018

Yeah, I think that closes this. Good catch.
