Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closures #6

Merged
merged 4 commits into from
Dec 18, 2019
Merged

Closures #6

merged 4 commits into from
Dec 18, 2019

Conversation

bob-carpenter
Copy link
Collaborator

Closure spec.

@seantalts
Copy link
Member

You can post a rendered link here as well:
Rendered

Copy link
Member

@seantalts seantalts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome! Thank you so much for writing this all up, and I'm excited to get these into Stan.

In my head the only thing we need to get nailed down before merging is the capture semantics (reference v. value), and I had a question about the shorthand syntax. I'd also like to give @VMatthijs a chance to look at this but he just moved to Europe yesterday so he might not be back online until next week sometime.

that might be potentially dangerous and provide warnings?

2. What should the syntax look like? This proposal just follows the
C++ syntax for both types and closures.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ automatically deduces the return type - maybe we could too in all cases?

[unresolved-questions]: #unresolved-questions

1. Will Stan permit potentially risky capture of local variables and
function arguments? If it is allowed (as it presumably will be,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume by "risky capture" here you're referring to the capture by reference semantics? My impression is that those semantics are often confusing and infrequently useful for the programmer, modulo performance concerns. Would you mind adding some discussion of performance and other tradeoffs between capture by value and capture by reference? I would guess that would be the most difficult part of this proposal and something we'd need to settle one way or the other before implementing closures.

I would guess we'd all agree not to use R's dynamic lexical scoping semantics 😅


```
(real u) 1 / 1 + exp(-u)
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also from C++? This seems confusing without the {}, in my humble opinion.

- If applicable, provide sample error messages, deprecation warnings, or migration guidance.
- If applicable, describe the differences between teaching this to existing Stan programmers and new Stan programmers.

For implementation-oriented RFCs (e.g. for compiler internals), this section should focus on how compiler contributors should think about the change, and give examples of its concrete impact. For policy RFCs, this section should provide an example-driven introduction to the policy, and explain its impact in concrete terms.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some templating survived here.


Recall that the inner braces define a local scope; as soon as the last
statement executes, local variables go out of scope and have undefined
values. The risk in using closures to capture local variables is that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The risk in capturing them by reference, not by value, right?

| `real(int)` | `real(int)`, `real(real)`, `int(real)` |


#### Sized types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to consider sized types for this proposal since we can't read in or write out functions from Stan?

| `int(int)` | `int(int)`, `int(real)` |
| `real(real)` | `real(real)`, `int(real)` |
| `int(real)` | `int(real)` |
| `real(int)` | `real(int)`, `real(real)`, `int(real)` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome :)

* `T[]` is a subtype of `U[]` if `T` is a subtype of `U`, and
* `T0(T1, ..., TN)` is a subtype of `U0(U1, ..., UN)` if
`T0` is a subtype of `U0`, `U1` is a subtype of `T1`, ..., `UN` is a
subtype of `TN`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be a separate issue on the stanc3 issue tracker - we could probably implement this subtyping in stanc3 right now without breaking anything or confusing anyone.

@seantalts
Copy link
Member

To migrate some comments from the thread on Discourse here - you said there that you are proposing the capture-by-reference ([&] in C++) semantics for closures in this design doc, and that you weren't sure why we'd go with by-value. I still think it'd be good to have some discussion of the tradeoffs there in the doc, so I'll try to list some reasons I can think of to prefer capture-by-value:

  1. I think many people would find capture-by-value less confusing, but I'm not sure about that. I certainly find immutability in a threaded context most appropriate, and this is the wisdom I have seen presented, e.g. in books like Java Concurrency in Practice, in the design of Clojure and Scala, in the implementations of most (all?) new parallelism frameworks recently such as Spark, Hadoop, Actor frameworks for various languages, Go channels, etc.
  2. Semantics for threading will be hellacious if we don't make all variables in concurrent contexts immutable. In other words, we're opening up users to surprising race conditions in any parallel or concurrent context if we pass by reference and allow folks to mutate those captured variables within the lambda functions (which is what [&] allows in C++, if that's the proposed semantic). Making the referenced captures immutable within lambdas solves this issue but only in the context of threaded map_rect; if we start having other forms of concurrent programs it can become very confusing again.
  3. Capture-by-reference semantics with mutable variables would add a source of non-determinism in a threaded context, making reproducibility harder.
  4. We can pass these closures to threaded functions, but not to MPI, which destroys one of the major use-cases for lambdas with closures (a more user-friendly map_rect version).
  5. We lose other immutability guarantees about Stan programs that we can use for optimization.

I claim we can have pass-by-value semantics with pass-by-reference performance in most cases by compiling down to closures with [&] wherever appropriate but creating copies if there's any chance the captured variable is mutated after the fact.

I'm a bit sleepy still - just to double-check, there aren't any additional considerations when creating a copy of a var, right?

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Aug 9, 2019 via email

@wds15
Copy link
Contributor

wds15 commented Aug 9, 2019

I was always assuming for the parallelization things that the lambdas would have immutable references to its environment (like we are settling on). Anything else would be a problem.

@seantalts
Copy link
Member

seantalts commented Aug 9, 2019

Yeah, I didn't do a good job highlighting which issues occur due to lack of within-lambda immutability vs. lack of capture-by-value. If we assume within-lambda variable immutability (agreed it makes the most sense) we won't have issues with concurrency as long as it's confined to map_rect style functional concurrency. I prefer that style of concurrency and parallelism as far as user-defined constructs are concerned, but it does limit our ability to have the compiler automatically parallelize a Stan model.

I think the only thing we lose with capture-by-value would be some performance in the cases where a user writes a model that constructs a local variable (data and params are immutable already) that they want to use in a lambda function and then modify after the lambda has finished executing. But we would only lose that performance if they are actually using (accidentally or on purpose) the fact that the variable changes values in between calls to the lambda (or if the compiler can't detect that they aren't and is forced to conservatively make a copy). Do you know if anyone would want to program models like that on purpose? It seems like a confusing set of semantics to me.

On further thought I think the MPI issue could be solved with a very similar set of compiler analyses that we'd need to have cheap capture-by-value semantics.

So far capture-by-value seems to still be coming out ahead for me given that

  1. the implementations for both seem equivalently complicated, assuming we need to support map_rect
  2. capture-by-value still seems less confusing and like what people (non-R people?) would naturally expect
  3. capture-by-value allows for easier whole-program parallelization efforts, I think

@seantalts
Copy link
Member

I think either way we decide, it's good to add a section on capture-by-value vs. capture-by-reference. Would you like me to add that in?

## Variable declarations

Variable declarations for functions follows the usual syntax, with a
sized type required for block variables, an unsized type for function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean you need a "sized function type" if you allow functions at the top level? What does that buy you over an unsized function type?


Syntactically, a lambda expression is represented as a function argument
list followed by a function body. For example, in `(real u) { return
1 / (1 + exp(-u)); }`, the function argument list is `(real u)` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth considering introducing a keyword for lambdas? For example \ or fun or function? I'm asking because the parentheses are already used for a lot of different things in Stan, as is, and probably will be used for more if we ever add tuples (unless we stick to structs instead). This is a genuine question. I'm not sure what I think, myself.


Lambdas can be mapped directly to C++ lambdas with default reference
closures. This will work because the scope of the compiled C++ is the
same as that of the Stan program.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth thinking about an implementation strategy of compiling away the higher-order functions using something like defunctionalization? That would keep the intermediate representation simpler and wouldn't add more overhead to maintaining the optimizations. On the other hand, it would require more fancy footwork and some thinking.

maps, which rely on simple inline anonymous lambdas.

Disallowing closures requires passing all values to higher order
functions as arrays along with the packing and unpacking required.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this proposal immediately allow us to simplify the signatures of the ODE-solver and map_rect significantly, if we ask people to pass in a closure rather than a mere function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, if we implement closures via something similar to lambda lifting but with automatic packing and unpacking of data and parameters.

3. Should variables be captured by value instead of reference? That
avoids them changing later and avoids dangling references, but it
also requires copies and is counterintuitive for those used to
R or Python.
Copy link
Member

@VMatthijs VMatthijs Aug 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feeling is that the copies would be so bad that we should not go with by value. It also feels more consistent with ordinary function argument which are passed by constant reference. Again, I feel like the enclosing variables should be passed by constant reference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid copies in cases where people don't mutate the variables in question in between calls to the lambda. I would say that is 99% of the use cases I've seen.

I don't think people can tell that functions are applied with constant reference vs by value in Stan currently.

R or Python.

4. Should full covariance and contravariance for function and array
types be required for assignment (including function calls)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be very nice, but it'd be a lot of work to fix the whole language.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to consider this as part of the closures proposal, I think :) But I'm curious (and maybe we should take this to a github issue) - what all would be required? I guess if C++ didn't support automated casting we'd have to generate code that did the casting?

1. Will Stan permit potentially risky capture of local variables and
function arguments? If it is allowed (as it presumably will be,
as it's very very limiting not to) is there a way to flag cases
that might be potentially dangerous and provide warnings?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be too hard to catch the cases of dangling references and reject them at compile time.

@VMatthijs
Copy link
Member

Thanks, @bob-carpenter ! This is great. Sorry to have taken so long to get to it. I've been swamped with work lately. Happy to discuss on hangouts at any point!

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Aug 29, 2019 via email

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Aug 29, 2019 via email

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Aug 29, 2019 via email

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Aug 29, 2019 via email

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Aug 29, 2019 via email

@VMatthijs
Copy link
Member

Yes, which would mean they couldn't be changed in the closure. The issue, as Sean pointed out, is that they might be changed out from under the closure. x = 2; int(int) f = (int y) { return x + y; } int z1 = f(2); // 4, as one would expect x = 3; int z2 = f(2); // 5 if pass by reference, 4 if pass by value

Agreed. They could be changed out from under the closure. But that's also true in Python. (Not that Python should be our model for good language design, but at least it's a language that tons of people seem to find intuitive to use.) Generally, I don't love mutable variables at all. But given that Stan is already a language with mutable variables and given that function arguments are currently passed by constant reference, I would expect closures also to work by constant reference as those variables are basically implicit function arguments. (At least, that's often how you implement them, using something like lambda-lifting or closure conversion.)

@VMatthijs
Copy link
Member

I'm asking because the parentheses are already used for a lot of different things in Stan, as is, and probably will be used for more if we ever add tuples (unless we stick to structs instead). This is a genuine question. I'm not sure what I think, myself.
Are you imagining problems with parsing?

I don't think parsing would be a problem with the proposal as is, but it would become a problem if we ever want to move towards types being inferred more/being optional. I guess we could then let people write (auto x) {return x^2; } and it would still be OK.
Either way, I think we could get the formal parser to work with this. I was thinking more about keeping the language readable for people who aren't experts. In that respect, I'd like an informative keyword like function or fun or func. On the other hand, I can see that there might be an argument for brevity.

@VMatthijs
Copy link
Member

VMatthijs commented Aug 30, 2019

Is that some way of unfolding a function like a macro? If not, is there a simple description or reference somewhere?

It's a whole program transformation for replacing all function types with a kind of variant type and replacing all higher-order functions with first order functions. There's a concise description with examples here: http://spivey.oriel.ox.ac.uk/corner/Defunctionalization_%28Programming_Languages%29 . I guess our problem is that this is only really easy to do in a language which already has support for variant types and tuples. You would compile away the higher-order types using the variant types and tuples and then I guess you can compile away those as well later (or give support for them in the runtime). Seeing that we don't have variant types and tuples to start with, perhaps this is not the most suitable implementation strategy for us. That's sad though as it would keep the MIR simpler. I would expect variants and tuples in an IR that you optimize on, but not closures and higher-order functions.

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Aug 30, 2019 via email

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Aug 30, 2019 via email

@seantalts
Copy link
Member

I feel pretty strongly that we shouldn't add more mutability unless there's a good reason for it because of its impacts on parallelism and the increased difficulty of reasoning about a program. Once you have closures with mutable state you effectively have an object system, which obviously some folks have historically thought was useful but is much more complex to reason about. Many of OO's early champions have backtracked away from objects with mutable state (see: the Java concurrency book).

I talked with a few users at Stan Con and all of them agreed that letting a variable change out from under a closure would be really confusing. That matches my feelings about it. I think folks also find Python's behavior here subtle and unintuitive.

I would expect closures also to work by constant reference as those variables are basically implicit function arguments. (At least, that's often how you implement them, using something like lambda-lifting or closure conversion.)

I would guess a very small percentage of our users are thinking about Stan closures being lambda-lifted or closure converted :P. I think we can easily create capture-by-value semantics that are free in the case where people don't mutate in between calls to a closure, so for me the question comes down to what kind of semantics you want users to reason about. Capturing by reference makes using closures for concurrent/parallel code really difficult both for the compiler writer and, in my opinion, the user reasoning about the language.

@seantalts
Copy link
Member

I like the idea of adding a keyword for it. We could use fun so that when people want to google it, they get these results: https://www.google.com/search?q=stan+fun. I don't have a preference other than shorter is nicer.

@seantalts
Copy link
Member

it's also worth noting that if we have closures as first class values (as they are in this proposal) then any implementation is required to create a data structure to hold the captured state anyway, and that could be equally by value or by reference. But yeah, I also just think giving users a consistent simple mental model for how closures work will work out better than assuming they know how they might be implemented in compiler textbooks. For either version, we can hopefully mitigate confusion by printing a warning if someone tries to mutate a captured variable in between calls to a closure.

@seantalts
Copy link
Member

I wanna push back against requiring variables to be immutable before they are captured.

Is your preferred model that we throw an error if someone tries to change a variable that has been captured?

I'm not really sure how much more difficult it will make the implementation either. My criteria are still satisfied in both of those worlds - that we don't allow capturing variables that can then mutate in some other scope, and I think implementation will be easy enough for both so I'm happy either way.

Separately I do want to push back against thinking of a design doc as a perfect and final solution - it just has to represent an improvement and shouldn't serve to discourage future iterations and innovations, especially in a case like this where we're explicitly discussing leaving some work for the future.

@bbbales2
Copy link
Member

Is your preferred model that we throw an error if someone tries to change a variable that has been captured?

Yeah, if that's what is convenient to do and there isn't some other flaw I'm not considering :P.

@seantalts
Copy link
Member

seantalts commented Sep 30, 2019 via email

@bbbales2
Copy link
Member

bbbales2 commented Oct 1, 2019

What about if you weren't considering convenience?

I'm only really considering convenience here :P. The two things I know I want this for are making ODEs really easy to program (no packing/unpacking) and then parallel evaluation of log densities with the new parallel stuff Sebastian has been talking about (in which case it solves the packing/unpacking problem for me).

@seantalts
Copy link
Member

seantalts commented Oct 1, 2019

I'm only really considering convenience here :P. The two things I know I want this for are making ODEs really easy to program (no packing/unpacking) and then parallel evaluation of log densities with the new parallel stuff Sebastian has been talking about (in which case it solves the packing/unpacking problem for me).

Ohhh you meant Stan user programming convenience. That's the perspective I wanted - sorry, thought you were citing compiler implementation convenience.

@bbbales2
Copy link
Member

bbbales2 commented Oct 1, 2019

Well here I am talking about compiler convenience: #6 (comment), but I answered your question in terms of user perspective cause I don't really know what's convenient or not language side :D.

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Oct 2, 2019 via email

@seantalts
Copy link
Member

I'm pretty sympathetic to Ben's programming convenience argument - I think that makes sense for a direction to go in. I'm happy with that (allowing folks to capture anything, if it's not a block variable we copy it, anything captured is immutable after capture) or with Bob's last proposal (only capturing block variables, immutable after capture). Bob I think it's up to you, this is your design doc :) We could even make it talk about a two phase implementation if you want. As the reviewer for this I guess I'm happy with all of the options we're currently talking about so just let me know when it's time to do the final check and merge.

Just to quickly throw out another option - we could change the definition of what is allowed to "only capturing variables that will be live as long as the closure is alive." This is somewhat confusing to explain properly but it would allow the above stuff (capturing transformed data and marking it immutable post-capture) as well as defining and capturing variables in the model block (and marking immutable post-capture). This is because all variables defined in the model block are local and don't leave the model block scope.

Maybe we can talk this out today at the meeting - I don't feel strongly about any of these options but would like to get one of these routes merged soon.

@bob-carpenter
Copy link
Collaborator Author

I created the branch and wrote the first draft, but the project is jointly owned by everyone and I don't get the final say. I specifically don't want to make the existence of a draft an argument against doing something alternative that's better.

All else being equal, I think it's going to be convenient to capture locals. I'm even OK letting undefined behavior rest if we can't catch captures that go out of scope. Anything involving scope is going to be complicated for most of our users, the majority of whom come from R which is the wild west with lexical dynamic scope and conditionals not opening scope blocks.

Capturing variables at the top-level model block won't go out of scope, but we can have local blocks in the model that will go out of scope.

model {
  real(real) f;
  if (theta > 0.5) {
    real j = 32;
    f = (real x) { return x + j; }
  } else {
    f = (real x) { return x^2; }
  }
  // j now out of scope, f broken

In general, we won't even be able to decide if f ever gets a definition if the block inside is complex enough.

@seantalts
Copy link
Member

seantalts commented Oct 3, 2019 via email

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Oct 3, 2019 via email

@seantalts
Copy link
Member

@enetsee do you have any thoughts on this? Seems like we're between some rules intended to let folks capture things that we can nicely capture by reference versus just capturing anything they'd like and copying if necessary (and making anything captured immutable in the parent scope after capture).

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Oct 7, 2019 via email

@seantalts
Copy link
Member

I'm not sure I totally get the example, but it seems like maybe we should just go with capture-by-value and try to have the compiler be smart and save copies where it can. That seems to avoid a lot of these really tricky situations, right?

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Oct 28, 2019

The example's important. Let's work backwards. For MPI, we need to have fixed data that we can ship to the core that's going to do the work on it. How do we maintain that with closures when the closures can close different pieces of data each trip through a program?

For example,

if (theta > 0)
  f = g;
else
  f = h;
map_rect(f, ...);

Now we don't know if g or h is going to be called in map_rect. This wasn't a problem before because there were no function variables. But even now (Stan 2.21), we could have something like:

for (i in 1:2)
  map_rect(f, ..., std::vector<int>{ i });

so that the integer data varies by loop instance. What happens now in this case?

@seantalts
Copy link
Member

In the latter case I think it ships them at runtime. @wds15 can you confirm? I may be misremembering.

So for closures with values associated, in cases like that we'd need to essentially create first class closures that are runtime C++ objects with associated values and ship them over in the same way, right?

@seantalts
Copy link
Member

@bob-carpenter once you decide which way you'd like to go we can merge this design doc and get started on the ragged one :)

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Dec 16, 2019 via email

@wds15
Copy link
Contributor

wds15 commented Dec 17, 2019

The design of map_rect requires you to assign a C++ type to every call of map_rect. That type is expected to label uniquely the data going into the function. So the call as outlined above would be illegal. map_rect will ship the data upon the first call and then not any more. Maybe we need additional safety checks in the parser to avoid such a call?

@bob-carpenter
Copy link
Collaborator Author

the call as outlined above

There were two---in a loop for repeated calls or in a conditional for unpredictable calls. Either way, do we need to ensure map_rect is only used from top-level scope (not nested in loops, conditionals, etc.) and that the function used is data? What else do we need to ensure.

Whatever we do going forward, if the above are illegal now, we need to make that clear to users. I hadn't realized they were illegal.

@seantalts
Copy link
Member

There were two---in a loop for repeated calls or in a conditional for unpredictable calls. Either way, do we need to ensure map_rect is only used from top-level scope (not nested in loops, conditionals, etc.) and that the function used is data? What else do we need to ensure.

I think that's it. One day we could make the compiler smarter and generate the code to ship the data itself, similar to the way that @rok-cesnovar recently did the same thing for GPUs. But for now perhaps enforcing those conditions with nice error messages is a good thing to add, agreed.

What's left to decide?

What kind of capturing semantics you want to use - capture-by-value, capture-by-reference, or one of the few intermediate ones we discussed.

@wds15
Copy link
Contributor

wds15 commented Dec 17, 2019

We should discuss this separately...but I wonder how @rok-cesnovar solved the issues with allowing shipping the correct way. We need to put some more restrictions on map_rect, but allowing it still to be called from within functions would be vital to keep my Stan programs alive at least.... but let's stop this side-track conversation here for now.

The closure stuff is something I am really looking forward to.

@bob-carpenter
Copy link
Collaborator Author

bob-carpenter commented Dec 18, 2019 via email

@seantalts
Copy link
Member

Okay great! I hadn't realized we were good to go with this. I'll approve & merge. Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants