
adjoint vector-Jacobian product form of precomputed gradients for reverse #876

Closed
bob-carpenter opened this issue May 29, 2018 · 5 comments

@bob-carpenter
Contributor

bob-carpenter commented May 29, 2018

Summary:

Add a precomputed-gradients class for multivariate functions that reduces the user's burden from understanding autodiff stack management and expression templates to just computing the function value and a simple vector-Jacobian product.

Description:

For a multivariate function f:R^N -> R^M and input x in R^N, the chain rule for reverse mode autodiff unfolds to:

for (n in 1:N)
  for (m in 1:M)
    x[m].adjoint += f(x)[n].adjoint * J[n, m];

where J is the M x N Jacobian, i.e.,

J[m, n] = d f(x)[n] / d x[m].

Taken as a whole, this is neatly expressed as:

x.adjoint += f(x).adjoint * J;
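
As a concrete numeric check, here is a minimal standalone Eigen sketch (not Stan code; the example function and all names here are made up for illustration), using the convention that J's rows index outputs:

#include <Eigen/Dense>
#include <iostream>

int main() {
  using Eigen::MatrixXd;
  using Eigen::RowVectorXd;
  using Eigen::VectorXd;

  // example function f : R^2 -> R^2, f(x) = (x[0] * x[1], x[0] + x[1])
  VectorXd x(2);
  x << 3, 5;

  // Jacobian with J(m, n) = d f(x)[m] / d x[n]
  MatrixXd J(2, 2);
  J << x(1), x(0),
       1,    1;

  // adjoints of the outputs, written as a row vector
  RowVectorXd fx_adj(2);
  fx_adj << 1, 2;

  // the whole double loop collapses to x.adjoint += f(x).adjoint * J
  RowVectorXd x_adj = RowVectorXd::Zero(2);
  x_adj += fx_adj * J;

  std::cout << x_adj << std::endl;  // prints: 7 5
}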

So we should be able to take a user-defined functor:

/**
 * Structure for implementing a multivariate reverse-mode automatic
 * differentiation function based on vector-Jacobian products.
 */ 
struct foo {
  /**
   * Return the value of the function for the specified argument.  This
   * operation may store intermediate values needed for the vector-Jacobian
   * product as member variables, allocated only via memalloc (i.e., no
   * members on the heap).
   *
   * @param x argument to function
   * @return value of function
   */
  VectorXd apply(const VectorXd& x);

  /**
   * Return the product of the specified vector with the Jacobian
   * of the function.
   *
   * @param fx_adj adjoint of the function result
   * @return adjoint-Jacobian product
   */
  VectorXd multiply_adjoint_jacobian(const VectorXd& fx_adj) const;
};
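
For instance (just an illustrative sketch, not part of the proposal; the functor name and storage strategy are made up), an elementwise square f(x) = x .* x would look like:

struct square_elts {
  VectorXd x_;  // saved input; in the real thing this would live in memalloc

  // forward pass: compute the value and stash what the reverse pass needs
  VectorXd apply(const VectorXd& x) {
    x_ = x;
    return x.array().square();
  }

  // the Jacobian is diag(2 * x), so the vector-Jacobian product
  // reduces to an elementwise product with 2 * x
  VectorXd multiply_adjoint_jacobian(const VectorXd& fx_adj) const {
    return (fx_adj.array() * 2 * x_.array()).matrix();
  }
};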

From that, we can create an appropriate implementation of chain() for the first result and make the rest non-chainable, as we have done for other specialized functions. This chain() method will just call the object's multiply_adjoint_jacobian method and do the adjoint increments. The user will never have to even think about our autodiff types.

The obvious next step would be to generalize to matrix arguments and then to pairs of arguments in {matrix, vector, scalar}. Neither should be too hard given Eigen's templating and underlying linear memory layout and indexing.

Current Version:

v2.17.0

@bob-carpenter
Contributor Author

bob-carpenter commented Jun 9, 2018

Here's a sketch that should work once it's debugged. All the work's done in the chain() method of the first vari returned in the result vector. The rest of the vari in the result are not chainable, meaning they don't go on the autodiff stack (they do go in the arena memory and otherwise behave like other autodiff variables).

// Further generalize to generic inputs and outputs with
// template int params: Rx, Cx, and Ry, Cy

struct foo_fun : public arena_allocated {
  template <typename T>
  Matrix<T, -1, 1> apply(const Matrix<T, -1, 1>& x) const;

  VectorXd apply_jac(const VectorXd& x);

  VectorXd multiply_adjoint_jacobian(const VectorXd& fx_adj) const;
};

template <typename T>
Matrix<T, -1, 1> foo(const Matrix<T, -1, 1>& x) {
  return adj_jac_apply<foo_fun>(x);
}

template <class F>
vector_d adj_jac_apply(const vector_d& x) {
  F f;
  return f.apply(x);
}

template <class F>
vector_v adj_jac_apply(const vector_v& x) {
  F* f = new F();  // arena allocated, so never explicitly freed
  vector_d val_x = value_of(x);
  vector_d val_y = f->apply_jac(val_x);
  int M = val_y.size();
  if (M == 0) return vector_v(0);

  vector_v y(M);
  apply_vari<F>* vi = new apply_vari<F>(*f, x, val_y);
  for (int m = 0; m < M; ++m)
    y(m) = (vi->y_vi_)[m];
  return y;
}

template <typename F>
struct apply_vari : public vari {
  F& f_;
  int N_;
  vari** x_vi_;
  int M_;
  vari** y_vi_;  // only first one is chainable, its chain method is special

  apply_vari(F& f, const vector_v& x, const vector_d& val_y)
      : vari(val_y(0)), f_(f),
        N_(x.size()), x_vi_(memory::memalloc_.array_alloc<vari*>(N_)),
        M_(val_y.size()), y_vi_(memory::memalloc_.array_alloc<vari*>(M_)) {
    for (int n = 0; n < N_; ++n)
      x_vi_[n] = x(n).vi_;
    y_vi_[0] = this;
    for (int m = 1; m < M_; ++m)
      y_vi_[m] = new vari(val_y(m), false);
  }

  void chain() {
    vector_d y_adj(M_);
    for (int m = 0; m < M_; ++m)
      y_adj(m) = y_vi_[m]->adj_;
    vector_d y_adj_jac = f_.multiply_adjoint_jacobian(y_adj);
    for (int n = 0; n < N_; ++n)
      x_vi_[n]->adj_ += y_adj_jac(n);
  }
};

@bob-carpenter
Contributor Author

bob-carpenter commented Jul 3, 2018

Yes, that's the n-th dimension of the result (f maps R^N -> R^M).

If we store the Jacobian, then the code is the naive loop that propagates each of the M output adjoints to each of the N input adjoints.

The goal's not to cut down on arithmetic. It might save memory by being lazy about the product compared to a simpler implementation. But the main goal is just to make it easier to write efficient multivariate functions without having to deal with a ton of fiddly memory operations.
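
In sketch form (assuming the full M x N Jacobian had been stored as a member J_ during the forward pass; the member name is made up), that naive loop would be:

// naive adjoint propagation from a stored Jacobian
void chain() {
  for (int m = 0; m < M_; ++m)
    for (int n = 0; n < N_; ++n)
      x_vi_[n]->adj_ += y_vi_[m]->adj_ * J_(m, n);
}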

@bob-carpenter
Contributor Author

My mistake. I just always get confused about the indexing of Jacobians because it's R^N -> R^M. The loop should be over m in 1:M and index f(x)[m].

@seantalts
Member

Was this fixed by #924?

@bbbales2
Member

bbbales2 commented Aug 2, 2018

Yeah, I think that closes this. Good catch.
