-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Short-circuit behavior for the &&
and ||
operators
#538
Conversation
The main motivation here is that we were apparently using bit shifting with signed shift amounts even though modern Go toolchains seem to raise that as an error if the go.mod file selects a language version less than Go 1.13. Go 1.17 is therefore newer than we strictly need to solve that particular problem, but gets us caught up with a relatively release of the language. This alone doesn't break compatibility for anyone using older versions of Go with HCL, since the Go toolchain will still attempt to compile modules targeting later versions and will only mention the newer language version if compilation would already have failed for some other reason.
Previously we just treated all binary operators as being left-associative, which for many of them didn't make a lot of difference anyway because they were associative operators. The logical AND and OR operators are also effectively associative the way we have them implemented right now, because HCL always evaluates both sides of a binary operator anyway and so there's no way to depend on the associativity for either of these operations. However, we'd like to implement a similar short-circuit behavior as we see in many other similar languages, at which point the evaluation order of the operations would become important in order to constrain which expressions get fully evaluated and which are "short-circuited". Treating these operators as right-associative instead of left-associative will make the subsequent implementation more intuitive, since the LHS of the outermost operation will decide whether to evaluate the RHS, rather than recursing into the LHS completely first and then unwinding if the innermost term causes the short-circuit. Since we expect HCL expression evaluation to always be side-effect-free it doesn't technically matter what associativity we use -- the result would be the same either way -- but this approach will make the control flow during evaluation match intuition about how a short-circuit behaves. This commit does not yet actually implement the short-circuit behavior, which will hopefully follow in a subsequent commit.
Previously we just always evaluated both operands of any binary operator and then executed the operator. For the logical operators that was inconsistent with their treatment in several other languages with a similar grammar to ours, leading to users being surprised that HCL didn't short circuit the same way as their favorite other languages did. Our initial exclusion of short-circuiting here was, as with various other parts of HCL's design, motivated by the goals of producing consistent results even when unknown values are present and of proactively returning as many errors as possible in order to give better context when debugging. However, in acknowledgement of the fact that logical operator short-circuiting has considerable ergonomic benefits when writing out compound conditional expressions where later terms rely on the guarantees of earlier terms, this commit implements a compromise design where we can get the short-circuit benefits for dynamic-value-related errors without losing our ability to proactively detect type-related errors even when short-circuiting. Specifically, if a particular operator participates in short-circuiting then it gets an opportunity to decide for a particular known LHS value whether to short-circuit. If it decides to do so then HCL will evaluate the RHS in a type-checking-only mode, where we ignore any specific values in the variable scope but will still raise errors if the RHS expression tries to do anything to them that is inconsistent with their type. If the LHS of a short-circuit-capable operator turns out to be an unknown value of a suitable type, we'll pessimistically treat it as a short-circuit and defer full evaluation of the RHS until we have more information, so as to avoid raising errors that would be guarded away once the LHS becomes known. The effect of this compromise is that it's possible to use the short-circuit operators in common value-related guard situations, like checking whether a value is null or is a number that would be valid to index a particular list: foo != null && foo.something idx < 3 && foo[idx] On the other hand though, it is _not_ possible to use the behavior to guard against problems that are related to types rather than to values, which allows us to still proactively detect various kinds of errors that are likely to be mistakes: foo != null && fo.something # Typoed "foo" as "fo" foo != null && foo.smething # Typoed "something" as "smething" num < 3 && num.something # Numbers never have attributes Those coming from dynamic languages will probably still find these errors surprising, but HCL's other features follow a similar sort of hybrid model of statically checking types where possible and that tradeoff seems to have worked well to make it possible to express more complicated situations while still providing some of the help typically expected from a static type system for "obviously wrong" situations. It should typically be possible to adjust an expression to make it friendly to short-circuit guard style by being more precise about the types being used.
581f975
to
4790ab6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This design and implementation makes sense to me. I had a concern about the asymmetry of handling LHS vs RHS unknown values, but I don't see any way to support the goals of short circuiting as flow control alongside the maximally flexible promotion of unknown to bool suggested in this downstream Terraform issue.
That is, I think to fully implement the truth tables in the linked issue, we would always need to evaluate both sides of the expression, which would result in errors for many of the use cases this change aims to permit. Hopefully documenting the evaluation order as part of this change will clarify the situation for anyone expecting unknown || true
to be promoted to true
.
// The child does not initially have any of its own functions defined, and so | ||
// it can still inherit any defined functions from the reciever. | ||
// | ||
// Because is function effectively takes a snapshot of the variables as they |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Because is function effectively takes a snapshot of the variables as they | |
// Because this function effectively takes a snapshot of the variables as they |
Thanks for linking to that Terraform issue, which I hadn't considered looking for because my focus here was on the error handling. I think it would be worth exploring a more symmetrical model as you are hinting at here. We can in principle evaluate both sides in normal mode first and then discard the diagnostics from "the other side" if one of them turns out to be a short-circuit; that would involve running the other side logic a second time if the first run had encountered an error, but we don't typically try to juice the performance of the error return path (and have plenty precedent of doing extra work there to give better feedback) so I think it's promising. This also prompted me to ponder if there is any reason why the value error discarding couldn't be symmetrical for consistency with unknown values being handled symmetrically. I've not yet thought deeply about it, but I suspect that the reason why left-to-right asymmetrical short circuit is the prominent design in familiar languages is because of the need to control for side-effects; in a language like HCL which assumes side-effect-free evaluation, perhaps symmetrical short-circuiting is reasonable and more helpful. 🤔 (There are some notable ways that applications push the bounds of this side-effect-free assumption, such as Terraform's |
I'm no longer working at HashiCorp after today, so I won't be able to interact with the branch of this PR anymore and so I'm going to close it but hope it'll still be a useful starting point for a future PR with a similar goal! |
Previously we just always evaluated both operands of any binary operator and then executed the operator.
For the logical operators that was inconsistent with their treatment in several other languages with a similar grammar to ours, leading to users being surprised that HCL didn't short-circuit the same way as other languages they had prior experience with.
For our purposes here, "short-circuit" is a colloquial term for ignoring later operands of an operation if earlier operands are sufficient to decide the result, as discussed in Short-circuit evaluation on Wikipedia.
Since HCL expressions are assumed to be side-effect-free, the primary motivation for short-circuit evaluation in this language is to permit using the LHS of a conditional operator as a "guard" to avoid certain kinds of error on the RHS which are "cancelled out" by the condition in LHS. A secondary motivation is that if LHS is known and activates short-circuit, the "known-ness" of RHS is not considered when deciding the result, and so e.g.
true && unknown(bool)
can returntrue
instead ofunknown(bool)
.Our initial exclusion of short-circuiting here was, as with various other parts of HCL's design, motivated by the goals of producing consistent partial results even when unknown values are present, and of proactively returning as many errors as possible in order to give better context when debugging.
However, in acknowledgement of the fact that logical operator short-circuiting has considerable ergonomic benefits when writing out compound conditional expressions where later terms rely on the guarantees of earlier terms, this commit implements a compromise design where we can get the short-circuit benefits for dynamic-value-related errors without losing our ability to proactively detect type-related errors even when short-circuiting, and with a suitable approximation for situations where the LHS value is unknown.
Specifically, if a particular operator participates in short-circuiting then it gets an opportunity to decide for a particular known LHS value whether to short-circuit. If it decides to do so then HCL will evaluate the RHS in a type-checking-only mode, where we ignore any specific values in the variable scope but will still raise errors if the RHS expression tries to do anything to them that is inconsistent with their type.
If the LHS of a short-circuit-capable operator turns out to be an unknown value of a suitable type, we'll pessimistically treat it as a short-circuit and defer full evaluation of the RHS until we have more information, so as to avoid raising errors that would be guarded away once the LHS becomes known.
The effect of this compromise is that it's possible to use the short-circuit operators in common value-related guard situations, like checking whether a value is
null
or is a number that would be valid to index a particular list:foo != null && foo.something
idx < 3 && foo[idx]
On the other hand though, it is not possible to use the behavior to guard against problems that are related to types rather than to values, which allows us to still proactively detect various kinds of errors that are likely to be mistakes:
foo != null && fo.something # Typoed "foo" as "fo"
foo != null && foo.smething # Typoed "something" as "smething"
num < 3 && num.something # Numbers never have attributes
Those coming from more dynamic languages will probably still find these errors surprising, but HCL's other features follow a similar sort of hybrid model of statically checking types where possible and that tradeoff seems to have worked well to make it possible to express more complicated situations while still providing some of the help typically expected from a static type system for "obviously wrong" situations.
It should typically be possible to adjust an expression to make it friendly to short-circuit guard style by being more precise about the types being used. If all else fails, the existing conditional operator
... ? ... : ...
offers a way to be more explicit about the intended sequence of evaluation and thereby "get away with" more kinds of errors on its unvisited arm.A change like this could, if designed or implemented incorrectly, cause regressions for existing configuration, and so we first need to convince ourselves that this change will not violate the guarantees we typically consider important for backward-compatibility, including:
I believe that this compromise meets all of those constraints, but given the potential scope of disruption of an incorrect design or implementation here I would like some second opinions. Therefore I would appreciate both design and early implementation review even though I currently have this marked as Draft.
If this design seems plausible (even if we identify some bugs in the implementation of that design) then another important step here will be to describe the new behavior more completely in the specification. The current native syntax specification actually lacks any specific language about the behaviors of the individual binary and unary operators at all, so that may also be a good opportunity to fill that gap and describe the intended correct behavior of all of them.