From 13c1341b3f0e786e20f79c0b414c7f1cf4b0beee Mon Sep 17 00:00:00 2001 From: Fred Hebert Date: Sat, 5 Dec 2020 02:12:10 +0000 Subject: [PATCH] Update EEP-49 as per mailing list comments --- eeps/eep-0049.md | 738 +++++++++++++++++++++-------------------------- 1 file changed, 333 insertions(+), 405 deletions(-) diff --git a/eeps/eep-0049.md b/eeps/eep-0049.md index adc8b25..63ebd9c 100644 --- a/eeps/eep-0049.md +++ b/eeps/eep-0049.md @@ -11,13 +11,13 @@ EEP 49: Value-Based Error Handling Mechanisms Abstract ======== -This EEP adds a contextual `<~` operator to `begin ... end` expressions, -which allows them to be usable for value-based error handling, based on -standard `{ok, term()} | {error, term()}` return value types. +This EEP extends the `begin ... end` expression to make it a construct +usable for control flow and value-based error handling based on pattern +matching. -This lets `begin ... end` become a control flow construct to replace or -simplify deeply-nested `case ... end` expressions, and prevent using -exceptions for control flow. +This introduces `begin ... else ... end` along with a new contextual use +of the `<-` operator to replace or simplify deeply-nested `case ... end` +expressions, and prevent using exceptions for control flow. Copyright @@ -37,66 +37,73 @@ The current syntax for a `begin ... end` expression is: The expression does not have a restricted scope, and is mostly used to group multiple distinct expressions as a single block. We propose a new -type of expressions (denoted `UnwrapExprs`), only valid within a +type of expressions (denoted `MatchOrReturnExprs`), only valid within a `begin ... end` expression: begin - Exprs | UnwrapExprs + Exprs | MatchOrReturnExprs end -`UnwrapExprs` are defined as having the following form: +`MatchOrReturnExprs` are defined as having the following form: - Pattern <~ Expr + Pattern <- Expr -This definition means that `UnwrapExprs` are only allowed at the +This definition means that `MatchOrReturnExprs` are only allowed at the top-level of `begin ... end` expressions. -The `<~` operator takes the value return by `Expr` and inspects it. -If the value is a tuple of the form `{ok, Val}`, it unwraps `Val` from -the tuple, and matches it against `Pattern`. +The `<-` operator takes the value return by `Expr` and pattern matches on +it against `Pattern`. -If the pattern matches, all variables from `Pattern` are bound in the -local environment, and the full value `{ok, Val}` is returned by the -`UnwrapExpr`. If the value does not match, a `{badunwrap, Val}` error -is raised. +If the pattern matches, all variables from `Pattern` are bound in the local +environment, and the expression is equivalent to a successful `Pattern = Expr` +call. If the value does not match, the `begin ... end` expression returns the +failed expression directly. -A special case exists when `Pattern` is the match-all -variable (`_`), which on top of allowing the value to be considered a successful -unwrapping if the returned value from `Expression` is `{ok, term()}`, -it also considers the atom `ok` to be valid as well. +A special case exists in which we extend `begin ... end` into the following form: -If the value is a tuple of the form `{error, Reason}`, then the entire -`begin ... end` expression is short-circuited and returns `{error, -Reason}`. The variables that were bound in there remain bound, the -rest are undefined. + begin + Exprs | MatchOrReturnExprs + else + Pattern -> Exprs; + ... + Pattern -> Exprs + end -The compiler should warn about any variable that is used after the -`begin ... end` expression that was bound in or after the first -`UnwrapExpr` encountered within the block. +This form exists to capture non-matching expressions in a `MatchOrReturnExprs` +to handle failed matches rather than returning their value. In such a case, an +unhandled failed match will raise an `else_clause` error, otherwise identical to +a `case_clause` error. -If the value returned does not match any of `{ok | error, term()}` as a -type, a `{badunwrap, Val}` error is raised. +This extended form is useful to properly identify and handle successful and +unsuccesful matches within the same construct without risking to confuse +happy and unhappy paths. Given the structure described here, the final expression may look like: begin Foo = bar(), - X <~ id({ok, 5}), - [H|T] <~ id({ok, [1,2,3]}), + {ok, X} <- id({ok, 5}), + [H|T] <- id([1,2,3]), ... + else + {error, Y} -> + {ok, "default"}; + {ok, _Term} -> + {error, "unexpected wrapper"} end Do note that to allow easier pattern matching and more intuitive usage, -the `<~` operator should have associativity rules lower than `=`, such that: +the `<-` operator should have associativity rules lower than `=`, such that: begin - X = [H|T] <~ exp() + X = [H|T] <- exp() end -is a valid `UnwrapExp` equivalent to the non-infix form `'<~'('='(X, [H|T]), -exp())`, since reversing the priorities would give `'='('<~'(X, [H|T]), -exp())`, which would create an `UnwrapExp` out of context and be invalid. - +is a valid `MatchOrReturnExprs` equivalent to the non-infix form `'<-'('='(X, +[H|T]), exp())`, since reversing the priorities would give `'='('<-'(X, [H|T]), +exp())`, which would create a `MatchOrReturnExp` out of context and be invalid. +In a nutshell, the matching rules for the `<-` operator should align closely +with the usage known in list comprehensions. Motivation ========== @@ -115,9 +122,8 @@ large number of programming languages. The language supports: So why should we look to add more? There are various reasons for this, incuding trying to reduce deeply nested conditional expressions, -cleaning up some messy patterns found in the wild, providing a better -separation of concern when implementing functions, and encouraging more -standard and idiomatic interfaces. +cleaning up some messy patterns found in the wild, and providing a better +separation of concerns when implementing functions. Reducing Nesting ---------------- @@ -157,14 +163,28 @@ construct: commit_write(OpaqueData) -> begin - _ <~ disk_log:sync(OpaqueData#backup.file_desc), - _ <~ disk_log:close(OpaqueData#backup.file_desc), - _ <~ file:rename(OpaqueData#backup.tmp_file, OpaqueData#backup.file), + ok <- disk_log:sync(OpaqueData#backup.file_desc), + ok <- disk_log:close(OpaqueData#backup.file_desc), + ok <- file:rename(OpaqueData#backup.tmp_file, OpaqueData#backup.file), {ok, OpaqueData#backup.file} end. -The semantics of this call are entirely identical, except that it is now -much easier to focus on the flow of individual operations. +Or, to protect against `disk_log` calls returning something else than `ok | +{error, Reason}`, the following form could be used: + + commit_write(OpaqueData) -> + begin + ok <- disk_log:sync(OpaqueData#backup.file_desc), + ok <- disk_log:close(OpaqueData#backup.file_desc), + ok <- file:rename(OpaqueData#backup.tmp_file, OpaqueData#backup.file), + {ok, OpaqueData#backup.file} + else + {error, Reason} -> {error, Reason} + end. + +The semantics of these calls are identical, except that it is now +much easier to focus on the flow of individual operations and either +success or error paths. Obsoleting Messy Patterns ------------------------- @@ -204,23 +224,23 @@ as: pre_check(Action, User, Context, ExternalThingy) -> begin - _ <~ check_request(Context, User), - _ <~ check_permissions(Action, User), - _ <~ check_dispatch_target(ExternalThingy), - _ <~ check_condition(Action, Context), + ok <- check_request(Context, User), + ok <- check_permissions(Action, User), + ok <- check_dispatch_target(ExternalThingy), + ok <- check_condition(Action, Context), dispatch(Action, User, Context) - end + end. And if there was a need for derived state between any two steps, it would be easy to weave it in: pre_check(Action, User, Context, ExternalThingy) -> begin - _ <~ check_request(Context, User), - _ <~ check_permissions(Action, User), - _ <~ check_dispatch_target(ExternalThingy), + ok <- check_request(Context, User), + ok <- check_permissions(Action, User), + ok <- check_dispatch_target(ExternalThingy), DispatchData <~ dispatch_target(ExternalThingy), - _ <~ check_condition(Action, Context), + ok <- check_condition(Action, Context), dispatch(Action, User, Context) end @@ -246,9 +266,9 @@ suspect most people have either been reasonable enough not to use it, or did not think about it. Obviously the new form would be cleaner: begin - W <~ b(), - X <~ c(W), - Y <~ d(X), + {ok, W} <- b(), + {ok, X} <- c(W), + {ok, Y} <- d(X), Z = e(Y), f(Z) end @@ -323,10 +343,10 @@ construct: Backup = filename:join(Dir, "RELEASES.backup"), Change = filename:join(Dir, "RELEASES.change"), begin - _ <~ backup_releases(Dir, NewReleases, Masters, Backup, Change, - RelFile), - _ <~ update_releases(Dir, NewReleases, Masters, Backup, Change), - _ <~ move_releases(Dir, NewReleases, Masters, Backup, Change, RelFile) + ok <- backup_releases(Dir, NewReleases, Masters, Backup, Change, + RelFile), + ok <- update_releases(Dir, NewReleases, Masters, Backup, Change), + ok <- move_releases(Dir, NewReleases, Masters, Backup, Change, RelFile) end. backup_releases(Dir, NewReleases, Masters, Backup, Change, RelFile) -> @@ -361,8 +381,9 @@ construct: end. The only reasonable way to rewrite the code was to extract all three -major multi-node operations into distinct functions. The improvements -are: +major multi-node operations into distinct functions. + +The improvements are: - The consequence of failing an operation is located near where the operation takes place @@ -444,40 +465,7 @@ execution context. It is also the best way to structure code in order to handle all exceptions and to provide the context they need as close as possible to -their source, and as far as possible from the execution flow. - - -Encouraging Standards ---------------------- - -In Erlang, `true` and `false` are regular atoms that only gained special -status through usage in boolean expressions. It would be easy to think -that more functions would return `yes` and `no` were it not from control -flow constructs. - -Similarly, `undefined` has over years of use become a kind of default -"not found" value. Values such as `nil`, `null`, `unknown`, `undef`, -`false` and so on have seen some use, but a strong consistency in format -has ended up aligning the community on one value. - -When it comes to return values for various functions, `{ok, Term}` is -the most common one for positive results that need to communicate a -value, `ok` for positive results with no other value than their own -success, and `{error, Term}` is most often uses for errors. Pattern -matching and assertions have enforced that it is easy to know whether a -call worked or not by its own structure. - -However, many success values are still larger tuples: `{ok, Val, -Warnings}`, `{ok, Code, Status, Headers, Body}`, and so on. Such -variations are not problematic on their own, but it would likely not -hurt too much either to use `{ok, {Val, Warnings}}` or `{ok, {Code, -Status, Headers, Body}}`. - -In fact, using more standard forms could lead to easier generalizations -and abstractions that can be applied to community-wide code. By choosing -specific formats for control flow on value-based error handling, we -explicitly encourage this form of standardization. - +their source, and as far as possible from the integrated flow. Rationale ========= @@ -485,8 +473,10 @@ Rationale This section will detail the decision-making behind this EEP, including: - Prior Art in Other Languages +- Whether to Normalize on Wrappers +- Adding the `else` Block - The choice of `begin ... end` as a construct and its scope -- Why introduce a new operator +- Why reuse the arrow operator - Other disregarded approaches - The choice of supported values - The choice of `{badunwrap, Val}` as a default exception @@ -572,7 +562,7 @@ Haskell's usage. ### Rust ### Rust defines two types of errors: unrecoverable ones (using `panic!`) -and recoverable ones, using the `Error` values. The latter is of +and recoverable ones, using the `Result` values. The latter is of interest to us, and defined as: enum Result { @@ -629,8 +619,8 @@ effect, but found it too cumbersome. Compare: try!(try!(try!(foo()).bar()).baz()) foo()?.bar()?.baz()? -### Swift ### +### Swift ### Swift supports exceptions, along with type annotations declaring that a function may raise exceptions, and `do ... catch` blocks. @@ -654,7 +644,6 @@ often simplified by using `let` assignments in a conditional expression: ### Go ### - Go has some fairly anemic error handling. It has panics, and error values. Error values must be assigned (or explicitly ignored) but they can be left unchecked and cause all kinds of issues. @@ -784,6 +773,154 @@ could result in either a success or error. The `~>>` operator matches and returns an `{:ok, term}` tuple, and the `~>` operator wraps a value into an `{:ok, term}` tuple. +Whether to Normalize on Wrappers +--------------------- + +In Erlang, `true` and `false` are regular atoms that only gained special +status through usage in boolean expressions. It would be easy to think +that more functions would return `yes` and `no` were it not from control +flow constructs. + +Similarly, `undefined` has over years of use become a kind of default +"not found" value. Values such as `nil`, `null`, `unknown`, `undef`, +`false` and so on have seen some use, but a strong consistency in format +has ended up aligning the community on one value. + +When it comes to return values for various functions, `{ok, Term}` is +the most common one for positive results that need to communicate a +value, `ok` for positive results with no other value than their own +success, and `{error, Term}` is most often uses for errors. Pattern +matching and assertions have enforced that it is easy to know whether a +call worked or not by its own structure. + +However, many success values are still larger tuples: `{ok, Val, +Warnings}`, `{ok, Code, Status, Headers, Body}`, and so on. Such +variations are not problematic on their own, but it would likely not +hurt too much either to use `{ok, {Val, Warnings}}` or `{ok, {Code, +Status, Headers, Body}}`. + +While using more standard forms could lead to easier generalizations +and abstractions that can be applied to community-wide code. By choosing +specific formats for control flow on value-based error handling, we +would explicitly encourage this form of standardization. + +That being said, the variety of formats existing and the low amount of strict +values being used would mean that forcing normalization calls for a potential +loss of flexibility in future language decisions. For example, +[EEP-54](https://github.com/erlang/eep/blob/master/eeps/eep-0054.md)—completed +before final revisions of this RFC—tries to add new forms of context to error +reports, and various libraries already rely on these richer patterns. + +It is therefore the opinion of the OTP technical board that we should **not** +normalize error-return values. As such, an approach closer to Elixir's `with` +has been proposed, although this EEP's approach is more general in terms of +sequences of acceptable expressions and their composition. + +Adding the else Block +--------------------- + +Avoiding normalization on error and good values introduces the need for the +`else ... end` sub-block to prevent edge cases. + +Let's look with the following type of expression as an explanation why: + + begin + {ok, {X,Y}} <- id({ok, {X,Y}}) + ... + end + +While this mechanism is fine to handle skipping pattern, it has some +problematic weaknesses in the context of error handling. + +One example of this could be taken from the OTP pull request that adds +new return value to packet reading based on inet options: +[#1950](https://github.com/erlang/otp/pull/1950). + +This PR adds a possible value for packet reception to the prior form: + + {ok, {PeerIP, PeerPort, Data}} + +To ask make it possible to alternatively get: + + {ok, {PeerIP, PeerPort, AncData, Data}} + +Based on socket options set earlier. So let’s put it in context for the +current proposal: + + begin + {ok, {X,Y}} <- id({ok, {X,Y}}), + {ok, {PeerIP, PeerPort, Data}} <- gen_udp:recv(...), + ... + end + +Since the we force a return on any non-matching value, the whole expression, if +the socket is misconfigured to return `AncData`, would return `{ok, {PeerIP, +PeerPort, AncData, Data}}` on a failure to match. + +Basically, an unexpected but good result could be returned from a +function using the `begin ... end` construct, which would look like a +success while it was actually a complete failure to match and handle the +information given. This is made even more ambiguous when data has the +right shape and type, but a set of bound variables ultimately define +whether the match succeeds or fails (in the case of a UDP socket, +returning values that comes from the wrong peer, for example). + +In worst cases, It could let raw unformatted data exit a conditional +pipeline with no way to detect it after the fact, particularly if later +functions in `begin ... end` apply transformations to text, such as +anonymizing or sanitizing data. This could be pretty unsafe +and near impossible to debug well. + +Think for example of: + + -spec fetch() -> {ok, iodata()} | {error, _}. + fetch() -> + begin + {ok, B = <<_/binary>>} <- f(), + true <- validate(B), + {ok, sanitize(B)} + end. + +If the value returned from `f()` turns out to be a list (say it’s a +misconfigured socket using `list` instead of `binary` as an option), the +expression will return early, the `fetch()` function will still return +`{ok, iodata()}` but you couldn’t know as a caller whether it is the +transformed data or non-matching content. It would not be obvious to +most developers either that this could represent a major security risk +by allowing unexpected data to be seen as clean data. + +This specific type of error is in fact possible in Elixir, but no such +warning appears to have been circulating within its community so far. +The issue is to be handled with an `else` block which this proposal +reuses to clamp down on unexpected values: + + -spec fetch() -> {ok, iodata()} | {error, _}. + fetch() -> + begin + {ok, B = <<_/binary>>} <- f(), + true <- validate(B), + {ok, sanitize(B)} + else + false -> {error, invalid_data}; + {error, R} -> {error, R} + end. + +Here misconfigured sockets won’t result in unchecked data passing trough +your app; any invalid use case is captured, and if the value for `B` turns +out to be a list, an `else_clause` error is raised with the bad value. + +Unless the clause is mandatory (it is not in Elixir and we do not plan it here +either for compatibility reasons with existing `begin ... end` expressions), +this level of additional matching is purely optional; the developer has no +obvious incentive to go and handle these errors, and if they do, the exception +raised will be through a missing clause in the `else` section, which will +obscure its origin and line nubmer. + +We will therefore have to rely on education and documentation (along with +type analysis) to prevent such issues from arising in the future. + +These problems would *not* exist with normalized error and return values as those used in statically-typed languages, but since we do not intend to normalize values, the `else` blog is a necessary workaround. + Choosing `begin ... end` Expressions ------------------------------------ @@ -794,6 +931,7 @@ the following items needed consideration: 1. what is the scope we need to cover 2. what is the format of the structure to use 3. why ending up with `begin ... end` +4. Why choose the `else` keyword ### Scoping Limits ### @@ -832,7 +970,6 @@ errors. ### Format of Structure ### - Prior attempts at abstracting value-based error handling in Erlang overloaded special constructs with parse transforms in order to provide specific workflows. @@ -1047,11 +1184,23 @@ contexts, which means that although not an ideal fit for value-based error flow, it is also not entirely outlandish and could accept the new added optional semantics without being too out of place. +### Why Choose the `else` keyword ### -A New Infix Operator +The first step here was looking at all the existing alternative reserved +keywords: `of`, `when`, `cond`, `catch`, `after`. + +None of these actually conveys the sense of requiring an alternative clause to +the construct, and so we require adding a new one. The `else` keyword is +tempting if only because it opens the door to introducing it as a reserved word +in `if` expressions at a later date. + +A quick look at the OTP code base to be sure seems to return no `else()` +function and should therefore be relatively safe to use in general. + +Reusing An Infix Operator -------------------- -In order to form `UnwrapExpr`, there is a need for a mechanism to +In order to form `MatchOrReturnExprs`, there is a need for a mechanism to introduce pattern matching with distinct semantics from regular pattern matching. @@ -1059,8 +1208,9 @@ A naive parse transform approach with fake function calls would be the most basic way to go: begin - unwrap(Pattern, Exp), + match_or_return(Pattern, Exp), % variables bound in Pattern are available in scope + ... end However, this would introduce pattern matches in non-left-hand-side @@ -1080,13 +1230,16 @@ uses them in multiple forms: confusing when handling nested maps in a pattern - `<-` could make sense. It is already restricted in scope to list and binary comprehensions and would therefore not clash nor be confused. - However, the existing semantics of the operator imply a literal - pattern match working like a filter. We're looking for the filter-like - approach, but want to introduce implicit elements (`{ok|error, ...}`) + The existing semantics of the operator imply a literal + pattern match working like a filter, which is what we are looking for. - `<=` same as `<-` but for binary generators -It would make sense to check for new operators specifically for this -context given the semantics: +The `<-` operator makes the most sense and shouldn't be confusing for +anyone. + +For completeness's sake, I also checked for alternative operators +in a prior version of this EEP that introduced prescriptive values +for `{ok, T} | {error, R}`, which had distinct semantics: ======= =========================================================== Operator Description @@ -1099,26 +1252,24 @@ context given the semantics: frame proposals (uses infix ~ and < > as delimiters). <| reverse pipe operator. No obvious clash either -There is no strong argument for or against most of these. The choice of -`<~` mostly comes down to having similarity to list comprehensions' `<-` -operator both in semantics and appearance, although being different -overall. +The `<-` operator from list comprehensions is the most adequate +option, both in terms of simplicity and cognitive costs. ### Operator Priority ### Within the expected usage of the unwrap expressions, the `<~` operator needs to have a precedence rule such that: - X = {Y,X} <~ + X = {Y,X} <- Is considered a valid pattern match operation with `X = {Y,X}` being the whole left-hand-side pattern, such that operation priorities are: - lhs <~ rhs + lhs <- rhs Instead of - lhs = rhs <~ <...> + lhs = rhs <- <...> In all other regards, the precedence rules should be the same as `=` in order to provide the most unsurprising experience possible. @@ -1129,193 +1280,57 @@ Other Disregarded Approaches and Variations Other approaches were considered in making this proposal, and ultimately disregarded. -### Elixir-Like Patterns in `with` ### - -The Elixir approach is fairly comprehensive, and rather powerful. Rather -than handling success or errors, it generalizes over pattern matching as -a whole. - -To explore bringing these semantics into the current proposed construct, -we will use the `<-` operator from list comprehensions to mean "match -the whole pattern or exit the block". So instead of - - begin - {X,Y} <~ id({ok, {X,Y}}) - ... - end +### Begin ... end with prescriptive error values -We would have to write: +An earlier version of this document simply used: begin - {ok, {X,Y}} <- id({ok, {X,Y}}) + Foo = bar(), + X <~ id({ok, 5}), + [H|T] <~ id({ok, [1,2,3]}), ... end -While this mechanism is fine to handle skipping pattern, it has some -problematic weaknesses in the context of error handling. +Which implicitly unpacked `{ok, T} = f()` by calling `T <~ f()`, and forced +all acceptable non-matching values to be of the form `{error, T}.` -One example of this could be taken from the OTP pull request that adds -new return value to packet reading based on inet options: -[#1950](https://github.com/erlang/otp/pull/1950). +To make the form useful to most existing code, it also required some magic +everyone (myself included) didn't very much like, for which `_ <~ f()` would +implicitly succeed if the return value for `f()` was `ok`. -This PR adds a possible value for packet reception to the current form: +This was judged to be too magical, and not necessarily a ton of existing +Erlang code would have benefited from the form since `ok` is often returned +for successful functions without an extra value. A stronger prescriptiveness +of the form `{ok, undefined}` (to replicate Rust's `Ok(())`) would have +been required to avoid the magic, and would have felt very unidiomatic. - {ok, {PeerIP, PeerPort, Data}} - -To ask make it possible to alternatively get: - - {ok, {PeerIP, PeerPort, AncData, Data}} - -Based on socket options set earlier. So let’s put it in context for the -current proposal: - - begin - {X,Y} <~ id({ok, {X,Y}}), - {PeerIP, PeerPort, Data} <~ gen_udp:recv(...), - ... - end +### Elixir-Like Patterns in `with` ### -If `AncData` is received, an exception is raised: the value was not an -error but didn’t have the shape or type expected for the successful -pattern to match. Errors are still returned properly by exiting the -`begin ... end` block, and we ensure correctness in what we handle and -return. +The Elixir approach is fairly comprehensive, and rather powerful. Rather +than handling success or errors, it generalizes over pattern matching as +a whole, as we do here. -However, had we used this generalized form: +The one difference is that Elixir's `with` expression forces all +conditionals to happen first, with a `do` block for the free-form +expressions that follow: - begin - {ok, {X,Y}} <- id({ok, {X,Y}}), - {ok, {PeerIP, PeerPort, Data}} <- gen_udp:recv(...), - ... + with dob <- parse_dob(params["dob"]), + name <- parse_name(params["name"]) + do + %User{dob: dob, name: name} + else + err -> err end -Since the `<-` operator would force a return on any non-matching value, -the whole expression, if the socket is misconfigured to return -`AncData`, would return `{ok, {PeerIP, PeerPort, AncData, Data}}` on a -failure to match. - -Basically, an unexpected but good result could be returned from a -function using the `begin ... end` construct, which would look like a -success while it was actually a complete failure to match and handle the -information given. This is made even more ambiguous when data has the -right shape and type, but a set of bound variables ultimately define -whether the match succeeds or fails (in the case of a UDP socket, -returning values that comes from the wrong peer, for example). - -In worst cases, It could let raw unformatted data exit a conditional -pipeline with no way to detect it after the fact, particularly if later -functions in `begin ... end` apply transformations to text, such as -anonymizing or sanitizing data. This could be pretty unsafe -and near impossible to debug well. - -Think for example of: - - -spec fetch() -> iodata(). - fetch() -> - begin - {ok, B = <<_/binary>>} <- f(), - true <- validate(B), - {ok, sanitize(B)} - end. - -If the value returned from `f()` turns out to be a list (say it’s a -misconfigured socket using `list` instead of `binary` as an option), the -expression will return early, the `fetch()` function will still return -`{ok, iodata()}` but you couldn’t know as a caller whether it is the -transformed data or non-matching content. It would not be obvious to -most developers either that this could represent a major security risk -by allowing unexpected data to be seen as clean data. - -This specific type of error is in fact possible in Elixir, but no such -warning appears to have been circulating within its community so far. - -It is basically a risky pattern if you want your code to be strict or -future-proof in the context of error handling. The current proposal, by -comparison, would raise an exception on unexpected good values, therefore -preventing ways to sneak such data into your control flow: - - -spec fetch() -> iodata(). - fetch() -> - begin - B = <<_/binary>> <~ f(), - _ <~ validate(B), % returns ok if valid - {ok, sanitize(B)} - end. - -Here misconfigured sockets won’t result in unchecked data passing trough -your app. - -The only way to give a similar amount of safety to the general pattern -approach is through an `else` clause which handles all known patterns to -implicitly exclude all unknown patterns: - - -spec fetch() -> iodata(). - fetch() -> - begin - {ok, B = <<_/binary>>} <- f(), - true <- validate(B), - {ok, sanitize(B)} - else - {error, _} = E -> E; - false -> false - end. - -This is the solution Elixir uses as well. Unless the clause is mandatory -(it is not in Elixir), this level of additional matching is purely -optional; the developer has no obvious incentive to go and handle these -errors, and if they do, the exception raised will be through a missing -clause in the `else` section, which will obscure its origin and line -nubmer. - -It would also allow some functions to return unexpected values from -other ones. In the previous example, `f()` must be allowed to return -`false` if `validate(B)` may return it. There is no way to separate such -clauses. - -None of these problems exist as long as we normalize the matching -mechanism on well-defined "good" and "bad" values (`ok | {ok, Term}` and -`{error, Term}`). This separation between good and bad values allows to -know what needs to return early without conflicts with what is a valid -or invalid pattern. - -From the moment we decide to pick such values, unwrapping them in -patterns can make code clearer: `{error, X} <- exp()` would be a pattern -that can never match by definition, since only good values are allowed -to go through and all errors return early. Automatically unwrapping good -values prevents such nonsensical expressions. - -These tricky corner cases explain why the `<~` pattern is preferred to -the general `<-` pattern's semantics in this proposal. - -### Simplifying Chaining an Pipelining ### - -One approach or pain point frequently brough up about Erlang concern -pipelining of operations. Could it be possible to make some -operations easier to chain? - -If we take a set of functions `f()`, `g()`, and `h()` that all return -`{ok | error, _}` tuples, current day Erlang requires: - - {ok, X} = f(), - {ok, Y} = g(X), - {ok, Z} = h(Y), - Z - -Could there be an easier way to handle this type of chaining, based on -say, an `unwrap` function: - - unwrap({ok, X}) -> X. - - main() -> - unwrap(h(unwrap(g(unwrap(f()))))). - -And it appeared that generally, this turns out to be simple enough to do -with the earlier fold approach we had mentioned. - -Overall, the various existing mechanisms appeared slightly inconvenient, -but not inconvenient enough to be worth adding a whole new language -mechanism just for it. +The Erlang form introduced in this document is more general since it +allows mixing `MatchOrReturnExprs` and regular expressions throughout, +without the need for a general `do` block. +The Erlang form does imply a likely more complex set of rewriting rules when +translating from the AST form to Core Erlang. [It should be possible to purely +rewrite in existing Core Erlang terms although the end result may not look like +the original code at +all](https://gist.github.com/ferd/2f1134fd88615354fbf89c068216b259) ### `cond` and `cond let` ### @@ -1363,7 +1378,8 @@ as well as Haskell's do notation, but that neither Rust nor Swift does. It seems that there is no very clear consensus on what could be done. Thus, for the simplicity of the implementation and backards compatibility of the `begin ... end` expression, just returning the -value as-is without auto-wrapping seems sensible. +value as-is without auto-wrapping seems sensible, particularly +since we do not prescribe tuple formats for handled values. It would therefore be up to the developer to just return whatever value best matches their function's type signature, making easier to still @@ -1373,111 +1389,25 @@ It also lets sequences of operations potentially return `ok` on success, even if their individual functions returned values such as `true`, for example, rather than `{ok, true}`. -The choice of supported match values -------------------------------------- - -It is kind of straightforward why `{ok, V}` and `{error, T}` are used in -pattern matches as error values: they're the most standard way to -communicate a value and an error in non-overlapping patterns whichever -way you want to match. - -On the other hand, it is less obvious why `_ <~ Exp` should positively -match on `ok` alone, and why, for example, `error` as an atom would -raise an exception as not matching any patterns. - -The reason `ok` is considered valid can be found in comparing common -Erlang return values with their matches in other languages. - -The following functions return `ok` when everything went well but -nothing is worth reporting. The list is not exhaustive: - -- `lists:foreach/2` -- over 25 functions in the `file` module -- most functions in `disk_log` -- most functions sending data or handling control of sockets and ports -- most output functions from the `io` module -- logging functions in the `logger` module -- functions from the `applications` module interacting with config and - starting or loading applications - -The pattern is fully entrenched as a core pattern in Erlang and OTP, and -very attached to side-effectful operations. - -The interesting aspect comes from seeing what Rust does for similar -functions, which is just return their own unit type, denoted as `()`. -When used with the `Result` types, it is to be returned a `OK(())`. - -The Erlang equivalent would probably be `{ok, undefined}`, but `ok` as a -single atom currently plays that role fine, and so it was decided to -support it; it will let error flow integrate well with side-effectful -functions. - -The same cannot be said of `error` as an atom result. Most errors can -and should return context with them that qualifies the error result, -since they often have more than one reason to fail. As evidence for this -line of thought, it is currently not possible to raise exceptions -without a `Reason`, whether done through `throw/1`, `error/1`, -`exit/1-2`, or `raise/3`. - -Aligning with the standard practices in the Erlang language validate -using `_ <~ Exp` as a pattern suitable for `ok`, and only this pattern -since it allows to basically match on what would be a non-existing value -that wouldn't need to be bound in further contexts. - -Discussions on earlier drafts of this proposal asked whether it would -make sense to choose all good values to be those in a tuple starting -with `ok` (`ok | {ok, _} | {ok, _, _} | ...`), and all error values all -those starting with error (`{error, _} | {error, _, _} | ...`). - -This approach would allow more flexibility on possible error values, but -would make composition more difficult. Let's take the following three -function signatures as an example: - - -spec f() -> ok | {error, term()}. - -spec g() -> {ok, term()} | {error, term(), term()}. - -spec h() -> {ok, term(), [warning()]} | {error, term()}. - -If a single `begin ... end` block calls to these as the potential return -value of a function, the caller now has to have the following type -specification: - - -spec caller() -> ok | {ok, term()} | {ok, term(), [warning()]} - | {error, term()} | {error, term(), term()}. - -As you call more and more functions and compose them together, the -cross-section of what is a valid returning function grows in complexity -and may even end up giving more trouble to tools such as Dialyzer. - -By comparison, the currently suggested mechanism can never get more -complex than: - - -spec caller() -> ok | {ok, term()} | {error, term()}. - -Or, if we prefer parametrized types: - - -type result(E) :: ok | {error, E}. - -type result(R, E) :: {ok, R} | {error, E}. - - -spec caller() -> result(term()) | result(term(), term()). - -By restricting the possible patterns (and therefore return values), we -can ensure better long-term composability and easier understanding of -various such expressions. - Choosing Exceptions Raised -------------------------- -The exception format proposed here is `{badunwrap, Value}`. This format +The exception format proposed here is `{else_clause, Value}`. This format is chosen following Erlang/OTP standards: -- `badarg` -- `badarith` -- `badfun` +- `if_clause` +- `{case_clause, Val}` +- `function_clause` (the value is provided in the stacktrace) - `{badmatch, Val}` +- Unmatching values in a `catch` block and `receive` expressions do not raise + anything explicitly + +Since `case_clause` is functionally the closest exceptions and that it carries +a value, we choose to replicate the same form here. -Since "unwrapping" is how the kind of operation where `X` is extracted -from `{ok, X}`, the name `badunwrap` was chosen, along with the -mismatching value being borrowed from `{badmatch, _}`. +The reason `else_clause` is chosen over `begin_clause` because the `else` block +could arguably be used in other constructs in the future, and constraining the +exception to the block's name itself is likely more future-proof. Backwards Compatibility @@ -1489,28 +1419,26 @@ unsafe to use outside of it. This is a change of behaviour that brings `begin` in line with the variables bound within a `case ... end` branch, a `try/catch` clause, or -a `receive ... end` branch. +a `receive ... end` branch. The same is true of the `else` block. -This lack of safety only needs to be started at the first `UnwrapExpr` -encountered, since all variables bound before respect the same semantics -as the existing `begin ... end` expression. If this analysis is done -rather than just declaring all variables as unsafe wholesale, then there -is no backwards compatibility concern to be had. +This lack of safety only needs to be started at the first `MatchOrReturnExpr` +encountered, since all variables bound before respect the same semantics as the +existing `begin ... end` expression. If this analysis is done rather than just +declaring all variables as unsafe wholesale, then there is no backwards +compatibility concern to be had. -The need for a new operator means code built with support for the new -expressions won't be portable to older Erlang releases. +The need for the `<-` operator to be used in a new context means code built +with support for the new expressions won't be portable to older Erlang +releases. However, if all the code is rewritten _after_ the AST and reuses +existing core Erlang components, built BEAM artifacts should work on older +versions fine. This is, however, not a supported use case by the OTP team. Reference Implementation ======================== -No reference implementation is usually required at this step, but one is -nevertheless provided in the original repository for this EEP draft, -at [bitbucket.org/ferd/unwrap/](https://bitbucket.org/ferd/unwrap/). -The implementation uses parse transforms rather than an operator, -since it would be difficult to add custom operators at this point of the process. - - +No reference implementation is usually required at this step. +One is to be developed at a later point in time. [EmacsVar]: <> "Local Variables:" [EmacsVar]: <> "mode: indented-text"