Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/GLMs Accept Scalars #687

Merged

Conversation

VMatthijs
Copy link
Member

@VMatthijs VMatthijs commented Dec 4, 2017

Submission Checklist

  • Run unit tests: ./runTests.py test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary:

This makes the GLM functions accept scalars for its parameters (in addition to vectors).

Intended Effect:

Make it easier to use GLMs.

How to Verify:

Run included tests.

Side Effects:

Wrote function for assigning a matrix to either a broadcast array or a matrix, in order to make it easier to
define derivatives w.r.t. arguments that can either be scalar or vector.
Fixed tests for GLMs.
Found and fixed bug in scale parameter gradient of normal_id_glm.

Documentation:

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Matthijs Vákár
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@VMatthijs
Copy link
Member Author

VMatthijs commented Dec 4, 2017

Hi @seantalts , Is there any reason we want to keep stan/math/prim/scal/meta/broadcast_array.hpp ? Could we move it all into stan/math/prim/mat/meta/broadcast_array.hpp ?
(Because the way I've done things currently is not allowed. I want to reference Eigen::Matrix in stan/math/prim/scal/meta/broadcast_array.hpp .)

Also, am I right in thinking that the point of broadcast_arrays is to represent the derivatives w.r.t. scalars in a way that has array like access? Then, does my idea of assignment of an Eigen::Matrix (which will have dimension (1,1)) to a broadcast_array make sense? This would make it much easier for me to get the GLMs to also accept scalars for its parameters and not only vectors.

@seantalts
Copy link
Member

IIRC prim broadcast array is used in other files in prim, especially for the empty broadcast array. That said, I think it's now unofficially on our todo list to collapse the prim/arr/mat directory structure(! And possibly the rev/fwd/mix one as well).

I think broadcast array is for taking a scalar and making it look like an array - so far this has meant allowing operator[] and possibly giving it a .size(), I forget. Your last question is missing a verb but I think yes? That it would be in line with the design to allow a scalar to be accessed the way a matrix is accessed as well as one way of extending broadcast_array to work with multivariate math.

@VMatthijs
Copy link
Member Author

Right, I meant to ask if that assignment made sense. (Edited now.)

Argh. This is really annoying. I feel that the template metaprogramming really gets in the way of writing straightforward code, especially in combination with the prim/arr/mat directory structure.

I don't really know how to code up this assignment of an Eigen::Matrix to a broadcast array obtained as the derivative w.r.t. a scalar then, seeing that would be a stan/math/prim/scal/meta/broadcast_array.hpp.

@VMatthijs
Copy link
Member Author

VMatthijs commented Dec 4, 2017

I wonder if the easiest option would be for me to start collapsing the prim/arr/mat directory structure, by moving everything to mat. Would that be a reasonable way to go about collapsing that structure? Slowly moving everything to mat?

An alternative for now would be to leave this and to just add the GLMs in such a way that they demand to get a vector for their parameters and reject scalars. Then we could return to it later. It's a bit ugly though.

@VMatthijs
Copy link
Member Author

Also, another silly question: is there supposed to be any difference between a scalar_seq_view and a broadcast_array? They seem pretty much the same to me.

@seantalts
Copy link
Member

I think you'd find that moving broadcast array to mat would trickle down and touch many files. I think that's probably best left to a separate pull request. Why can't you put a specialization for matrices in mat? I'm not sure I understand the problem; what's the code you wish you could write but can't figure out how to fit it into the directory structure?

I probably won't have time to really dig into this stuff until after the course this week but there was some reason for broadcast_array and scalar_seq_view to be different, other than that one of them is mostly empty constructions of the thing... But maybe we could have used a separate empty_scalar_seq_view instead of empty_broadcast_array? I will have to look around more next week to figure out why. I made both of those around the same time so I would expect there to be some real reason they were different but I could have messed up or things could have changed since then.

@VMatthijs
Copy link
Member Author

Thanks for steering me away from the solution of collapsing the scal/arr/mat hierarchy! That probably would've been a foolish amount of work.

I've now found a workaround that I think is sufficiently clean and will update the pull request.

Really, I think a lot of the pain in the template meta-programming that ends up resulting is silly amounts of boring code giving definitions and their specializations is the lack of if constexpr in C++11. Sadly it's only been added in C++17 and I don't imagine we'll transition to that any time soon, right?

If you have that sort of compile time branching construct, you can avoid having to write a lot of redundant templated code just to satisfy the type checker.

@seantalts
Copy link
Member

There are some compile-time branching constructs available in C++11, though I think through using boost and only at the template level (which can make code that tries to use them pretty boilerplate-heavy). There's that whole SFINAE pattern and that kind of thing. Not sure if that helps this situation, will take a look later.

I think we are typically worried most about the version of GCC that ships with RTools for Windows, more than anything else. They just upgraded to GCC 4.9.3 recently.

@VMatthijs
Copy link
Member Author

Thanks, @seantalts ! Indeed, SFINAE would do the trick here, but I got away with using something a bit simper, actually.

@syclik
Copy link
Member

syclik commented Dec 4, 2017

@VMatthijs, it might be easy enough to just create the 12 different meta traits for now: http://discourse.mc-stan.org/t/include-order-of-headers-in-the-math-library/2368/5

But... if you do find a general way to do this all, I'm all ears.

@VMatthijs VMatthijs changed the title [WIP] Eigen::Matrix Assignment to Broadcast Arrays Feature/GLMs Accept Scalars Dec 4, 2017
@VMatthijs
Copy link
Member Author

That's what I've done now, sadly. Haha.

This is ready for review, I think.

@bob-carpenter
Copy link
Contributor

Let's make sure everyone's on board with collapsing into mat before doing it. It should be discussed at a meeting. I'm not there this week, but Im OK with that change.

Some of C++17 is supported---not sure which compilers we're going to work on, but if they all have a C++17 feature, it's fair game.

@syclik
Copy link
Member

syclik commented Dec 5, 2017 via email

…t apparently still are necessary as Stan code compiles to them
@wds15
Copy link
Contributor

wds15 commented Dec 11, 2017

FYI, the if_constexpr is supported since gcc 7!!

https://gcc.gnu.org/projects/cxx-status.html

That will take a while to trickle down to the bigger user base..

@VMatthijs
Copy link
Member Author

Thanks, @wds15 ! I never know which features we are allowed to use though. Is it documented somewhere which compilers we support exactly? Presumably the one in VS15 is the most behind wrt language features?

By the way, maybe hold off on reviewing this pull request. I might return to this and do just a little bit more testing soon, just to be sure.

@seantalts
Copy link
Member

seantalts commented Dec 12, 2017 via email

@bob-carpenter
Copy link
Contributor

Let us know when you want this reviewed again.

@bob-carpenter
Copy link
Contributor

I think this one got hosed because of the auto-reformatting. We should've reviewed all the pending pull requests that were passing before doing this to you. Really sorry about that. Hopefull this won't be too hard to sort out. I'd love to get all these in for the next release.

@VMatthijs
Copy link
Member Author

I'm back in town and will fix this today! Looking forward to finally getting the GLMs in. Sorry for the delay! Had a bunch of other deadlines.

@VMatthijs
Copy link
Member Author

Right. This is finally ready for review. Sorry that it was put on hold for a bit!

@seantalts
Copy link
Member

Hey! Sorry, just getting tagged in here and haven't been following - can you kind of tee me up and write up the idea behind the changes to operands and partials? I took a look and it seems like there's some new stuff and I'm not sure what it's addressing - no doc for set_partials, not sure why that over operator=, etc. Thanks!

@bbbales2
Copy link
Member

Sure thing!

The quick version is (@VMatthijs can expand on this if it's not enough) that sometimes the partials_ in the ops_partial_edge classes are broadcast_array<Dx> and sometimes they are Eigen::Matrix<Dx, -1, 1>s. In either case, @VMatthijs needs to assign partials he has computed in an Eigen::Matrix<Dx, -1, 1>.

So this is why I thought a set_partials method made sense. It's the underlying data structure inside ops_partials_edge is changing, so I think that calls for a get/set interface to simplify how that happens.

Otherwise @VMatthijs has to write metaprograms on the outside that know how to assign to internal variables of ops_partials_edge.

Why not operator= is I don't think we want to say edges_ = our_partials, because the right hand side isn't the same thing as the left hand side? I'm not sure on this point. I though a set mechanism was alright but I could be convinced otherwise.

@VMatthijs
Copy link
Member Author

VMatthijs commented Aug 10, 2018

Hi @seantalts , the idea of set_partials is to have a uniform way of assigning an Eigen::Matrix to the partials, both when the partials are a broadcast array (autodiff w.r.t. scalar) and when they are an Eigen::Matrix (autodiff w.r.t. vector/matrix). operator= would not take an Eigen::Matrix for scalar autodiff. I am now wondering though if I should just create a specialisation of operator= that lets us assign an Eigen::Matrix to a broadcast array. Would that be preferable?

In hindsight, that could actually be cleanest, to just add an operator= to the class broadcast_array<T> which takes an Eigen::Matrix<T, R, C> m and does something like this[0] = m(0, 0);.

@seantalts
Copy link
Member

seantalts commented Aug 10, 2018 via email

@VMatthijs
Copy link
Member Author

VMatthijs commented Aug 10, 2018

OK. I'll try to change it to that. That makes more sense probably. I might keep some of the extra tests I've created by operands_and_partials.
I'll let you know when it's done.

Please ignore the comment about the multivariate operands and partials.

Thanks for the feedback, guys! It'll definitely end up being cleaner this way.

@VMatthijs
Copy link
Member Author

Changed as per our discussion above. You were right. This is nicer.

As far as I'm concerned this is ready to be merged once it passes all tests on Jenkins.

Copy link
Member

@seantalts seantalts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have just a question - might be fine but want to make sure I understand before approving.

double two = 2.0;
broadcast_array<double> ba2(two);
std::vector<double> vd = {{1.0, 2.0}};
ba2 = vd;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I get what's happening here - I see above how a broadcast array will just take the first element... Is that always correct and fine to do? Would anyone be surprised that it dropped the rest of the elements?

Also I would not that this doesn't work with matrices, but I suppose that's probably fine and it just won't compile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, that is the behaviour I want for my application. People could potentially be surprised... It will only ever be used in my application on vectors of length one. I guess I could have it perform a check to make sure the length of the vector is 1 and throw an error otherwise. Would that not slow things down though? The problem is that the length is dynamic, so it would need to be a runtime check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it would slow things down. Do you want to only use the first one because you know that they should be all the same? Or that the first one will correspond to this one in this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, I have a situation where I have a vector of (dynamic) length 1 that I want to assign to a broadcast array. I know it has length 1, but I cannot convince the type checker of that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, got that part. But going up a level to operands and partials - is it likely that any time you have a broadcast array here and you try to assign to it, you will also have an array of length one? I'm basically trying to figure out if this is a reasonable API for future users of operands and partials.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd say so. You'd have either

  • an Eigen vector of length 1;
  • a std vector of length 1; or
  • a scalar that you'd like to assign to the broadcast array.
    What can be problematic, however, is that the Eigen vector may be a composite (e.g. a product of an Eigen matrix and an Eigen vector and is therefore technically an Eigen::Product). This means that it would not suffice to have an assignment statement with r.h.s. Eigen vector. You'd have to include all sorts of complicated Eigen types. Basically, Eigen is a pain when it comes to types. That's why I chose an assignment statement with such a ridiculously generic type. A hack I could imagine could work is to choose an informative name for the assign statement, that would prevent users from misinterpreting its functionality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github didn't notify me about your reply! I personally think this is fine as long as it acts correctly in the use cases we are expecting. Maybe add a comment about this behavior to the operands and partials high level doc if you don't mind?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I'll add the documentation to broadcast_array. (Presumably, that's what you meant?) Or would you also like me to add it to operands_and_partials? (operator= is a method of broadcast_array, but operands_and_partials uses a broadcast_array internally in some instances.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when you're looking at broadcast_array the behavior is reasonably clear (comment drawing attention to it wouldn't be out of place though). But when you're a user of operands_and_partials, it would be a really non-obvious piece of behavior when assigning to partials. So I would add the comment to the doc string for operands_and_partials somewhere since viewed from that level it's slightly tricky. You can even write that this is almost definitely what they want, but are just drawing attention to it just in case. If you think it's so obviously the correct behavior from the operands_and_partials viewpoint that this comment would serve only to distract, then please use your judgment here and ignore this request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I'll add the comments now and will hope that the PR doesn't need to be reapproved before it can be merged.

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Aug 11, 2018 via email

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Aug 11, 2018 via email

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Aug 11, 2018 via email

@VMatthijs
Copy link
Member Author

@bob-carpenter , the dilemma is that operator= looks nicer in practice, but has the potential to be confusing. For instance, I want to be able to use it if the lhs is a broadcast_array and the rhs is a vector (of unknown length). I want the effect to be to assign the first element of the vector to the broadcast array. Now, I think the choice is:
A) choose an informative name, rather than operator= for this sort of assignment, to make sure users understand what it does in case the rhs is a vector of length >1
B) use operator= and have a dynamic check that throws an error if length is >1.
Or do you think operator= without error checking is ok?

@seantalts
Copy link
Member

I commented above, but I think if we just add a bit of doc about this to the top-level operands and partials doc to call attention to it, it would be fine since operands and partials has pretty specific use cases and it works correctly for all of them.

@VMatthijs
Copy link
Member Author

OK, @seantalts , I added the comments you requested. :-) This is now ready to be merged. Of course, Jenkins wants to retest everything first, even though only a couple of comments were added...

seantalts
seantalts previously approved these changes Aug 16, 2018
@seantalts
Copy link
Member

Can we add that comment to the prim one as well? Sorry :( The doc for these is spread out all over the place because they're all just disconnected template specializations. Not sure a better way to do this; at the time it was copy and paste doc to all the specializations and tweak slightly.

@VMatthijs
Copy link
Member Author

VMatthijs commented Aug 16, 2018

Doesn't prim/operands_and_partials use an empty_broadcast_array internally, meaning that assignment should throw an error?
Do you mean fwd? I've added the comment there too.

@VMatthijs
Copy link
Member Author

@seantalts , could you approve again?

@seantalts
Copy link
Member

I meant prim because I'm worried that that's where people often look for the definition of a template class, at least from what I can tell. Sorry for the churn.

@VMatthijs
Copy link
Member Author

OK!

@VMatthijs
Copy link
Member Author

Done.

@seantalts seantalts self-requested a review August 16, 2018 15:52
@VMatthijs VMatthijs merged commit 3ed0576 into stan-dev:develop Aug 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants