
Get autodiff working #2

Closed
rodluger opened this issue Feb 26, 2018 · 24 comments
Labels
enhancement New feature or request

Comments

@rodluger
Owner

Ultimately we want to use starry for inference, so having analytic derivatives of the light curve will be super useful.

@rodluger rodluger added the enhancement New feature or request label Apr 7, 2018
@ericagol
Collaborator

ericagol commented Apr 26, 2018

I got autodiff working on the s_n(r,b) components. I just finished coding up the s_n function in Julia, and I implemented autodiff using the ForwardDiff package. I haven't added analytic derivatives for the elliptic integrals; ForwardDiff differentiates those as well.

I still haven't computed the transformation and rotation matrices, but those should be straightforward.

@ericagol ericagol self-assigned this Apr 26, 2018
@rodluger
Owner Author

rodluger commented Apr 26, 2018 via email

@rodluger
Owner Author

@ericagol Can you create a julia folder in the top level of the repository and place your code in there?

@ericagol
Collaborator

Yes. I just created a folder called julia/ at the top level of the repo where I placed this code and a preliminary README.md.

@rodluger
Owner Author

Great, thanks!

@rodluger
Owner Author

rodluger commented May 1, 2018

@dfm Autodiff is working beautifully! Thanks for the help.

@rodluger
Owner Author

rodluger commented May 1, 2018

Currently computing gradients of the flux in this test file. Run make test_autodiff to compile it. I'm going to add a gradient option to the pybind interface so we can start using this!

@rodluger
Owner Author

rodluger commented May 2, 2018

@dfm How do I tell Eigen to not compute derivatives for a given variable? Say I have the function

template <typename T>
T testfunction(T& x1, T& x2, T& x3) {
    return x1 + x2 * x2 + x3 * x3 * x3;
}

and I want to compute derivatives with respect to x1 and x2, but not x3. Because of the way I've templated this, all three must have the same type, so if x1 and x2 are AutoDiffScalars, x3 must be one too. A cryptic comment in this example,

/**
* Important note:
* All ActiveScalars which are used in one computation must have
* either a common derivative vector length or a zero-length
* derivative vector.
*/

led me to believe that I could just resize x3.derivatives() to size zero, but that leads to weird assertion errors. Any tips?

@rodluger
Owner Author

rodluger commented May 2, 2018

@dfm I'm pretty sure there's a bug in Eigen: it's related to the reason we had to typecast many of the scalars in function calls to step() and the elliptic integrals to get the code to compile with autodiff. Check out this open Eigen issue and the corresponding code. I'm guessing what happens is that there's an issue with the * operator when the AutoDiffScalar variable has no derivatives. Long story short, if I change my function to

template <typename T>
T testfunction(T& x1, T& x2, T& x3) {
    return T(x1) + T(x2 * x2) + T(x3 * x3 * x3);
}

then everything works as expected. Here's a MWE:

#include <iostream>
#include <iomanip>
#include <cmath>
#include <vector>
#include <string>
#include <algorithm>
#include <Eigen/Core>
#include <unsupported/Eigen/AutoDiff>
using namespace std;
using Grad = Eigen::AutoDiffScalar<Eigen::VectorXd>;

// Dummy function
template <typename T>
T testfunction(T& x1, T& x2, T& x3, T& x4) {
    // This leads to an "Assertion failed" error:
    //return x1 + x2 * x2 + x3 * x3 * x3 * x3 + x4;

    // This compiles and runs fine:
    return T(x1) + T(x2 * x2) + T(x3 * x3 * x3) + T(x4 * x4 * x4 * x4);
}

// Instantiate a Grad type with or without derivatives
Grad new_grad(string name, double value, vector<string>& gradients, int& ngrad) {
    if(find(gradients.begin(), gradients.end(), name) != gradients.end()) {
        return Grad(value, gradients.size(), ngrad++);
    } else {
        return Grad(value);
    }
}

// Let's roll
int main() {

    // The user will supply this vector of parameter names
    // for which we will compute derivatives
    vector<string> gradients;
    gradients.push_back("x1");
    gradients.push_back("x2");

    // Declare our parameters: only the ones the user
    // wants will be differentiated!
    int ngrad = 0;
    Grad x1 = new_grad("x1", 4., gradients, ngrad);
    Grad x2 = new_grad("x2", 3., gradients, ngrad);
    Grad x3 = new_grad("x3", 2., gradients, ngrad);
    Grad x4 = new_grad("x4", 1., gradients, ngrad);

    // Compute the function
    Grad result = testfunction(x1, x2, x3, x4);

    // Print the flux and all the derivatives
    cout << result.value() << endl;
    cout << result.derivatives() << endl;

    return 0;
}

Curiously, if I declare the number of derivatives at compile time using Vector2d instead of VectorXd, then there is no issue. This is specifically a bug when the derivative size is dynamic. But that's the whole point: we want the user to choose which and how many derivatives to compute...
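For reference, here's a minimal standalone example of the fixed-size case (the two-parameter size is just for illustration, not starry's actual setup):

#include <iostream>
#include <Eigen/Core>
#include <unsupported/Eigen/AutoDiff>

// Fixed-size derivative vector: the length (2 here) is known at compile time.
using Grad2 = Eigen::AutoDiffScalar<Eigen::Vector2d>;

int main() {
    Grad2 x1(4., 2, 0);        // value 4, derivative slot 0 of 2
    Grad2 x2(3., 2, 1);        // value 3, derivative slot 1 of 2
    Grad2 y = x1 + x2 * x2;    // no T() casts needed; derivatives() has length 2
    std::cout << y.value() << std::endl << y.derivatives() << std::endl;
    return 0;
}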

I'm going to keep digging -- I'd rather not add T() to everything in my code!

@rodluger
Owner Author

rodluger commented May 2, 2018

FYI: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1281#c1
Looks like we might just have to add T() to everything...

@dfm
Collaborator

dfm commented May 2, 2018

You definitely won't need to manually cast everything to T. The problem appears when you have inline operations on AutoDiffScalars. It is slightly annoying, but I think you'll be able to fix it pretty fast.
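For example, a sketch of one way to avoid the inline expressions (assuming the fix is simply to name each intermediate result, which forces the same conversion as the explicit T() casts):

// Each intermediate gets its own T, so no bare expression-template
// temporaries leak into the final sum.
template <typename T>
T testfunction(T& x1, T& x2, T& x3) {
    T x2sq = x2 * x2;
    T x3cu = x3 * x3 * x3;
    return x1 + x2sq + x3cu;
}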

@rodluger
Owner Author

rodluger commented May 3, 2018

@dfm Dude:
[image]

@dfm
Collaborator

dfm commented May 3, 2018

Dude! 🎉🎈🍻

@ericagol
Collaborator

ericagol commented May 4, 2018 via email

@rodluger
Owner Author

rodluger commented May 4, 2018

@dfm This is how I'm currently structuring the code:

>>> import starry
>>> map1 = starry.Map()
>>> map1[1, 0] = 1
>>> map1.flux(axis=(0, 1, 0), theta=0.3, xo=0.1, yo=0.1, ro=0.1)
0.9626882655504516
>>> map2 = starry.grad.Map()
>>> map2[1, 0] = 1
>>> map2.flux(axis=(0, 1, 0), theta=0.3, xo=0.1, yo=0.1, ro=0.1)
array([[ 9.62688266e-01,  4.53620580e-04,  0.00000000e+00,
        -6.85580453e-05, -2.99401131e-01, -3.04715096e-03,
         1.48905485e-03, -2.97910667e-01]])

The modules starry and starry.grad are compiled from the same chunk of code, but with a healthy sprinkling of #ifdef STARRY_AUTODIFF to handle AutoDiffScalar-specific implementation details. They therefore have the same classes, methods, properties, and docstrings, but their outputs are of course different. To get this to work, I'm #include-ing that chunk of code twice, once with STARRY_AUTODIFF undefined and once with it defined.
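For concreteness, a rough sketch of the pattern (the file, namespace, and class names here are placeholders, not the actual starry layout):

// maps.h -- the shared chunk of code; no include guard, since it is
// deliberately included twice with different settings
#ifdef STARRY_AUTODIFF
namespace grad {
    using Scalar = Eigen::AutoDiffScalar<Eigen::VectorXd>;
#else
namespace orig {
    using Scalar = double;
#endif

    class Map {
    public:
        Scalar flux();   // identical interface in both modules
    };

}  // namespace grad / orig

// pybind.cpp -- pull the chunk in twice, once per module
#include "maps.h"        // STARRY_AUTODIFF undefined: plain scalars
#define STARRY_AUTODIFF
#include "maps.h"        // defined: AutoDiffScalar version

The pybind11 bindings then expose one Map per namespace, which is roughly how the starry / starry.grad split described above comes about.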

What do you think of this? It certainly hasn't helped my code legibility...

PS: I haven't finished implementing this, but starry.grad.Map().flux() is working on the master branch.

@rodluger
Owner Author

rodluger commented May 7, 2018

Quick update on this. I'm slowly getting things to work with dynamically-sized derivative vectors, which is the ideal way to do this. The most important thing I've learned is that casting to type T is not enough, since type T has an Eigen::Dynamic vector length. I need to actually force the derivative vector of all intermediate variables -- and all function outputs -- to have the correct length. For instance, the following line in flux() doesn't work:

if (b <= ro - 1) return 0;

nor does

if (b <= ro - 1) return T(0);

since neither allocates space for the derivatives of the result. What I have to do is this:

if (b <= ro - 1) return 0 * ro;

where ro is one of the variables I'm differentiating.
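For illustration, the trick amounts to something like this hypothetical helper (not part of the starry code):

#include <Eigen/Core>
#include <unsupported/Eigen/AutoDiff>

using Grad = Eigen::AutoDiffScalar<Eigen::VectorXd>;

// A zero whose derivative vector has the same (dynamic) length as x's --
// which is what `0 * ro` produces implicitly.
Grad zero_like(const Grad& x) {
    return Grad(0.0, Eigen::VectorXd::Zero(x.derivatives().size()));
}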

For some reason I no longer get any compiler errors -- just segfaults when I finally run the code. Debugging this is therefore super tedious. But I'm getting the hang of it.

@dfm
Collaborator

dfm commented May 7, 2018

That does sound tedious! Let me know if there's anything that I can do to help out.

@rodluger
Owner Author

rodluger commented May 7, 2018

Got the flux calculation to work all the way through! Now the user can dynamically choose which and how many derivatives to compute. Gonna take a while for me to clean the code up and push to the master branch, but I think it's downhill from here!

@rodluger
Owner Author

rodluger commented May 9, 2018

Quick update on this: I'm switching back to compile-time defined derivative vector sizes. AutoDiffScalar<VectorXd> is riddled with issues and it's 20-30 times slower than the same calculation with fixed-size derivative vectors. It's actually more efficient to compute all derivatives and let the user choose which ones to output than to selectively compute only a few derivatives.
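Roughly, the new setup looks like this (the derivative count, parameter ordering, and toy function below are illustrative assumptions, not starry's actual layout):

#include <iostream>
#include <Eigen/Core>
#include <unsupported/Eigen/AutoDiff>

// Derivative vector length fixed at compile time (8 is illustrative).
constexpr int NGRAD = 8;
using Grad = Eigen::AutoDiffScalar<Eigen::Matrix<double, NGRAD, 1>>;

int main() {
    // Every input gets its own derivative slot; all NGRAD derivatives are
    // computed, and the wrapper later returns only the ones the user asked for.
    Grad theta(0.3, NGRAD, 0);
    Grad xo(0.1, NGRAD, 1);
    Grad result = theta * theta + xo;           // stand-in for the flux
    std::cout << result.derivatives() << std::endl;  // always length NGRAD
    return 0;
}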

@rodluger
Owner Author

@dfm @ericagol Just need to write it up!
[image]

@rodluger
Owner Author

Closing this issue. There are things that can still be optimized, but I'm happy!

rodluger added a commit that referenced this issue Sep 12, 2019