Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: exhaustive switching for enum type-safety #36387

Open
ermik opened this issue Jan 4, 2020 · 19 comments
Open

proposal: Go 2: exhaustive switching for enum type-safety #36387

ermik opened this issue Jan 4, 2020 · 19 comments

Comments

@ermik
Copy link

@ermik ermik commented Jan 4, 2020

  • Would you consider yourself a novice, intermediate, or experienced Go programmer?
    Intermediate, multiple projects, plenty of learning

  • What other languages do you have experience with?
    JS/TS, Swift, Python, C#, C++

  • Has this idea, or one like it, been proposed before?
    Yes, the enums are a popular under proposal subject matter.

  • If so, how does this proposal differ?
    It doubles down on the status quo (aka historically idiomatic) solution for enums in Go. All current Go enum declarations will remain valid. Instead of introducing new keywords or interfering with existing patterns (such as the Stringer interface) it adds tools for enforcement and runtime-safety of enumeration usage. It does so by employing existing type assertion and type casting idioms, and the const keyword idiomatic to Go enum declarations.

  • Who does this proposal help, and why?
    For those adopting enums in Go, it is important to limit their use to the values known at compile time, or explicitly opt-in to the type-casting and therefore verifying the primitive values. The new syntax will not eliminate the need for authors of packages exposing an enumeration type to create checks on enum usage. Rather, this proposal will clarify the party responsible for enum value validation and provide tools that will make that validation idiomatic, clearly identifiable and easy to implement.

  • Is this change backward compatible?
    Yes.

  • Show example code before and after the change.
    I used http package as an example but I do not advocate redeclaring standard library constants as enums.
    Before

    const (
        MethodGet     = "GET"
        // ...
    )
    func NewRequest(method, url string, body io.Reader) (*Request, error) {
        switch method {
        case MethodGet:
            // ...
        default:
            return nil, ErrUnknownMethod
        }
    }

    After (please see comment below for an evolved, but still work in progress update to the proposed changes)

    type Method string
    
    const (
        MethodGet Method = "GET"
        // ...
    )
    
    // NewRequest with original signature allows for freeform string method
    func NewRequest(method, url string, body io.Reader) (*Request, error) {
        // method, ok := method.(const) // ERR: can't check untyped string value against known constants
    
        // but if the user provides a set of constants to check, we can use a switch:
        switch method.(const) {
        case MethodGet:
            // ...
        default:
            // The default case in a .(const) switch is a requirement
            return nil, ErrUnknownMethod
        }
    
        // Or we can cast to a named type, effectively switching over cases for each of enum's constant values
        method, ok := Method(method).(const)
        if (!ok) {
            return nil, ErrUnknownMethod
        }
    
        // Now that method is a known constant, the .(const) switch may drop its default clause
        // ...
    }
    
    // NewRequestEnum requires enum type but has no opinion on its source
    // For the caller, it makes it clear the first argument is unlike the second.
    func NewRequestEnum(m Method, url string, body io.Reader) (*Request, error) {
        // ...
    }
    
    // NewRequestConst is not callable from if using Method cast from a primitive type without performing .(const) check.
    // For the caller, it makes it clear there is a set of values of named type already provided.
    func NewRequestConst(m const Method, url string, body io.Reader) (*Request, error) {
        // This is a "MAYBE??": addition to the syntax, borrowing from "chan type",
        // "const type" requires caller to use a pre-defined constant value
        // or use switch.(const) and .(const) type-cast to fulfill that obligation
        // when sourcing from a primitive type.
        // See comment below for me pulling back from this.
    }
  • What is the cost of this proposal? (Every language change has a cost).
    The new syntax clashes with .(type) and the const addition to function signature argument is opening a can of worms.

  • How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?
    New syntax would depend on linting and formatting to ensure ease of adoption.

  • What is the compile time cost?
    Compiler would need to identify all references to a named const type and ensure the binary has type-specific facilities for executing .(const) checks. Additionally, when a function signature (or perhaps even a struct field) requires const argument, it must identify the call site and ensure the .(const) conversion takes place prior to call.

  • What is the run time cost?
    During program execution the new .(const) checks are done against switch cases or statically instanced collections of pre-defined values. If the type casting for enum values is present in the binary, and the new syntax is employed unsafely (e.g. the .(const) casting omits ok variable), the panic messaging must clearly identify the mismatch between available pre-defined enum constants and the result of the runtime-cast of from a primitive type.

  • Can you describe a possible implementation?
    No. I don't have a familiarity with the Go source code. Point me in the right direction and I would love to contribute.

  • Do you have a prototype? (This is not required.)
    No, but willing to collaborate.

  • How would the language spec change?
    I don't have an off-hand grasp of EBNF notation, but in layman's terms the spec will extend type assertion, type switch, and function parameter usage to make possible the new syntax.

  • Orthogonality: how does this change interact or overlap with existing features?
    It makes use of type assertion and type casting syntax as the basis.

  • Is the goal of this change a performance improvement?
    No.

  • Does this affect error handling?
    Yes, the advancement of enum support in Go through this or another proposal must ensure more elaborate analysis at exception site. In case of this proposal, it is now possible to identify

  • If so, how does this differ from previous error handling proposals?
    I don't know.

  • Is this about generics?
    No.

Original ideas and research

Prior Art

In no specific order:

  • #32176 has shown there is some confusion behind the sense of enum not being a native? type (sugar), but an engineering pattern/trick.
  • There are plenty of examples of this pattern being baked into the language, some outlined in #28987. (As well as clear argumentation of what needs to improve.)
  • In fact case keyword has already been appropriated for enum declarations in #28438 leading to defacto parity with language like Swift.
  • While others, e.g. #28438 need enums to be akin structs with dot-notation hierarchy access as if it's a graph.
  • Lacking Range type sugar is another source of confusion (#30428 and).
  • Some range issues are really about constraining the enumeration — #29649
  • Aliasing possibility (#17784) for a short time further confused the enum declarations.
  • Apparently, clarity of Go int enumeration being an ordered list of things is a feature, and #13403 seems like it would be an unnecessary change to the gofmt status quo.
  • There is enum chatter which is really about getting Strings out of the enum values (#32176, #13744).
  • Interestingly enough the immutability question, which underlies the enumeration pattern, has been discussed in #27975. @beoran cites people saying "go doesn't have enums", when they mean "go doesn't implement conveniences based on enum keyword"
  • iota usage extension has been mentioned in #21473 and I take it as a sign iota will not be going away...
  • The root #19814 is again about keyword introduction to purge the unexpected conditions in a "enum" value switch. The earlier questions #17205, #9481 outline the core needs — how do we ensure only known values are accepted?
  • Ultimately, #5469 decided not to touch the enum switching, but Go 2 seems like a great time to revisit.
  • Very early on (#359) the type-safety was the only omission from the const-enum.

Thoughts

  1. enum is indeed new type, which is what type State string does, there is no idiomatic need to introduce a new keyword. Go isn't about saving space in your source code, it is about readability, clarity of purpose.

  2. Lack of type safety, confusing the new string- or int-based types for actual strings/ints is the key hurdle. All enum clauses are declared as const, which creates a set of known values compiler can check against.

  3. Stringer interface is the idiom for representing any type as human-readable text. Without customization, type ContextKey string enums this is the string value, and for iota-generated enums it's the integer, much like XHR ReadyState codes (0 - unsent, 4 - done) in JavaScript.

    Rather, the problem lies with the fallibility of custom func (k ContextKey) String() string implementation, which is usually done using a switch that must contain every known enum clause constant.

  4. In a language like Swift, there is a notion of an exhaustive switch. This is a good approach for both the type checking against a set of consts and building an idiomatic way to invoke that check. The String() function, being a common necessity, is a great case for implementation.

Proposal

package main

import (
	"context"
	"strconv"
	"fmt"
	"os"
)

// State is an enum of known system states.
type DeepThoughtState int

// One of known system states.
const (
	Unknown DeepThoughtState = iota
	Init
	Working
	Paused
	ShutDown
)

// String returns a human-readable description of the State.
//
// It switches over const State values and if called on
// variable of type State it will fall through to a default
// system representation of State as a string (string of integer
// will be just digits).
func (s DeepThoughtState) String() string {
	// NEW: Switch only over const values for State
	switch s.(const) {
	case Unknown:
		return fmt.Printf("%d - the state of the system is not yet known", Unknown)
	case Init:
		return fmt.Printf("%d - the system is initializing", Init)
	} // ERR: const switch must be exhaustive; add all cases or `default` clause

	// ERR: no return at the end of the function (switch is not exhaustive)
}

// RegisterState allows changing the state
func RegisterState(ctx context.Context, state string) (interface{}, error) {
	next, err := strconv.ParseInt(state, 10, 32)
	if err != nil {
		return nil, err
	}
	nextState := DeepThoughtState(next)

	fmt.Printf("RegisterState=%s\n", nextState) // naive logging

        // NEW: Check dynamically if variable is a known constant
	if st, ok := nextState.(const); ok {
		// TODO: Persist new state
		return st, nil
	} else {
		return nil, fmt.Errorf("unknown state %d, new state must be one of known integers", nextState)
	}
}

func main() {
	_, err := RegisterState(context.Background(), "42")
	if err != nil {
		fmt.Println("error", err)
		os.Exit(1)
	}
	os.Exit(0)
	return
}

P.S. Associated values in Swift enums are one of my favorite gimmicks. In Go there is no place for them. If you want to have a value next to your enum data — use a strongly typed struct wrapping the two.

Originally posted by @ermik in #19814 (comment)

@gopherbot gopherbot added this to the Proposal milestone Jan 4, 2020
@gopherbot gopherbot added the Proposal label Jan 4, 2020
@mvdan
Copy link
Member

@mvdan mvdan commented Jan 5, 2020

Personally, I deal with this by writing code in a conservative way, like:

type Foo int

const (
    A Foo = iota
    B
    C
)

func F(foo Foo) {
    switch foo {
    case A:
        // code
    case B:
        // code
    default: // C
        // code
    }
}

Then, exhaustive tests and code coverage will catch if I missed any of the cases.

Certainly not perfect, but I've found that this works well. In multiple years of working with iota constants in Go, I don't recall a single time where I've had bugs that could have been avoided via exhasutive switches.

@ianlancetaylor ianlancetaylor changed the title proposal: exhaustive switching for enum type-safety proposal: Go 2: exhaustive switching for enum type-safety Jan 6, 2020
@ianlancetaylor ianlancetaylor added the Go2 label Jan 6, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 6, 2020

If I understand this correctly, the suggestion here is to add two new language constructs. I'm going to try to restate them in words rather than entirely in examples.

First,

    switch x.(const)

Here x must be a value of some type that supports constants. The pcakage that defines the type may have defined const values of that type. The switch statement is required to have a case for each const value of that type.

It's not immediately clear what should happen if the type is exported and other packages define constants of that type. Do we only check for constants defined by the package that defines the type? Should we also check for constants defined in the package where the switch x.(const) appears?

The second new construct is a new kind of type assertion

    x, ok := x.(const)

As with the new switch statement, the language looks for defined const values of the type of x. The type assertion checks that x has one of the defined value, and sets ok accordingly. Presumably if the value of x is not a defined value, the first result is the zero value of the type, and if x is a defined value, the first result is simply x.

The initial result being simply x is somewhat redundant, but I guess it's worth it because of the parallelism with type assertions.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 6, 2020

I note that one common request for an enum type is a way to iterate through the valid values. That is not supported here.

Another common request for enum types is some sort of requirement that variable of the enum type can only take on valid values. That is only partially supported here, in that code can use the new type assertion syntax to verify that it has a valid value.

I do like the minimalistic approach to ensuring that switch statements are complete, as that is probably the most common request for enum values.

@ermik
Copy link
Author

@ermik ermik commented Jan 6, 2020

@ianlancetaylor Ian, thank you for jumping in to help me clarify. I don't have enough experience in language development and really appreciate the guardrails.

restate them in words rather than entirely in examples

It was an oversight on my part not to slim it down. But the core goal was to show something that feels neat and quirky yet clear like Go is.

if the type is exported and other packages define constants of that type

One thing I was thinking about is that if const usage would be the gateway to restricting the possible range of values it may be used in the function signature as well. This would allow accepting values from outside the package boundary.

Presumably if the value of x is not a defined value

There is another case I've thought of since I've reposted this as a separate issue:

package inner

type State int

const (
        A State = iota 
        B
        C
)

does not prevent, within or outside the package:

const duplicate State = 1

In such cases, vet may need to warn about typed constant collision, but the new x.(const) assert will have to be the runtime check. I would constrain x to the innermost (innermost package, earliest definition) constant value — B = 1, effectively performing the task of eliminating typed constant collision vet was suggesting, but we could also disable the culprit package enumeration, setting ok to false due to ambiguity.

I note that one common request for an enum type is a way to iterate through the valid values. That is not supported here.

In accordance with long-standing conversation about the lack of mapping functions, one would write the for loop themselves. range Type is another possibility further reusing existing syntax, look, and feel.

Another common request for enum types is some sort of requirement that variable of the enum type can only take on valid values. That is only partially supported here, in that code can use the new type assertion syntax to verify that it has a valid value.

I guess I would say that core purpose of this proposal is to introduce an incremental change which enables vet to warn about these issues with a clear resolution strategy (exhaustive const switch, const-assert, pruning typed constant collision), rather than abstracting them into the language itself by introducing new keywords and syntactic sugar.

Would love to talk more to improve this. I added links to prior conversations I was reading through recently at the top.

@gh67uyyghj

This comment was marked as off-topic.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 28, 2020

It's not clearly specified how to know which set of values are permitted for a type used with .(const). Do the values have to be in a single const block? Presumably they all have to be the same type.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 28, 2020

As this is a language change proposal, could you please fill out the template at https://go.googlesource.com/proposal/+/bd3ac287ccbebb2d12a386f1f1447876dd74b54d/go2-language-changes.md .

When you are done, please reply to the issue with @gopherbot please remove label WaitingForInfo.

Thanks!

@ermik
Copy link
Author

@ermik ermik commented Feb 4, 2020

@gopherbot please remove label WaitingForInfo

@ermik
Copy link
Author

@ermik ermik commented Feb 4, 2020

@ianlancetaylor thank you for welcoming these thoughts and guiding me through the process. The template requirement allowed me to introduce additional thoughts on how this would work, but I am hoping there is room for further discussion before a final decision is made. Please review the template and let me know if it can be improved.

As I had to make the leap and use const in a function signature — for the sake of complete picture when it comes to this proposal's scope — I've also bastardized a number of other possible declarations with it. Function declarations in public scope accepting enum arguments and public structs with enum fields are the two culprits and introducing the const syntax there will allow to decisively shift responsibility to the client. But I can see how that may be too much to ask. After all, the mere presence of enums in these declarations and the availability of the new .(const) may be a solid foundation for the slight ambiguity we have today, i.e. leaving it to the (responsible) APIs author to actively check the incoming values before use.

Lastly, again on the subject of time and proposal review schedule. I will make my best effort to stay with this and be responsive, but hope for some decent lag to be accepted without pulling the plug. Sometime life gets in the way.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 4, 2020

The comment at #36387 (comment) needs to be addressed: how exactly do we know which constant values are associated such that the .(const) syntax applies to them? What if the same type has const values defined in multiple blocks? What if they are defined in different packages? What if some of the const values happen to have the same value?

Since we can use this for const strings, the x, ok := x.(const) syntax is surprisingly expensive, as it will have to compile into a series of string comparisons. In Go we generally try to avoid the case in which a simple expression has unpredictable execution time.

@ermik
Copy link
Author

@ermik ermik commented Feb 5, 2020

I've addressed the overall concern in the code example, perhaps it was too brief and sneaky. Here are some more detailed thoughts:

  • How exactly do we know which constant values are associated, such that the .(const) syntax applies to them?
    All constant values with a named type are considered for .(const) syntax at compile time.
    All constants of the same named type are associated.
    (This would include any type based off of a primitive type, not just string and int but I expect those to dominate in usage.)

  • What if the same type has const values defined in multiple blocks?
    All declarations with a named type irregardless of block count or block usage are considered.

  • What if they are defined in different packages?
    This is being discussed.
    The set of constant values can be amended across package boundary. Doing so is a bit odd, but not illegal. This helps maintain the backwards compatibility and prevents const keyword to be appropriated to become a de-facto enum keyword.

  • What if some of the const values happen to have the same value?
    Constants are immutable and I think they are stored as such. Having multiple constants with a same value is pointless (no pun intended) and should be highlighted to the users.

In the proposal template code example I've highlighted the following:

var method string = "GET"
method, ok := method.(const) // ERR: unable to check value of unnamed type against known constants

which disallows a plain string to be matched against all possible string constants.

The user either must use a named type (which has a known, fixed set of values to check against) or the switch v.(const) syntax where the user defines the set irregardless of named types by specifying a number of constants in case statements to match against.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 5, 2020

If we aggregate valid constant values across packages, then the behavior of .(const) might depend on the exact set of packages that are imported. That doesn't sound right.

@ermik
Copy link
Author

@ermik ermik commented Feb 5, 2020

That's a good point. I didn't see the potential implications of that, although that's the allowance currently present with type Method string declaration (other packages can declare more constants of Method type).

Do you think a requirement of a single block declaration is viable as an alternative extreme to the "everywhere" approach? I am uncertain.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 5, 2020

I'm not sure what the best answer is, but I'm fairly sure that it shouldn't vary depending on which packages have been imported.

@ermik
Copy link
Author

@ermik ermik commented Feb 5, 2020

I agree. Adding an import shouldn't break or implicitly change behavior of existing code.

@ermik
Copy link
Author

@ermik ermik commented Feb 6, 2020

We could say that a given constant type is internal to the package that defines it, adopting the behavior of protecting the struct types from external method definition:

func (s *pkg.Import) String() string // ERR

Such approach would limit the .(const) scope to said package without preventing that syntax usage higher in package tree.

There just isn't (yet) a way to distinguish a type:

type Method string

as a "constant" type... It makes sense though — the types defined using that syntax for the purpose of being a constant enumeration are rarely used as a general purpose variable. The only case is the type casting, where we say Method(str) to unsafely attempt matching one of the known values.

At the same time, a definition of a "constant type" would eliminate v, ok := v.(const) as if v is of primitive type, then constant matching should be written v, ok := v.(Method) making v an implicit interface{} type. That's interesting, isn't it? 😃 (

The switch v.(const) could still be the exhaustive addition for when v is of constant type we can require either all cases to be considered or a default case to be present.


Except the famous iota example, are there currently differences between how declarations in a single block and multiple blocks are treated?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 6, 2020

Except the famous iota example, are there currently differences between how declarations in a single block and multiple blocks are treated?

No.

@jimmyfrasche
Copy link
Member

@jimmyfrasche jimmyfrasche commented Feb 7, 2020

Constants declared in the same package as their defined type are often enums—but not always. When they are, what an individual package means by enum is not at all uniform.

I'm not sure if what is meant by enum here is the best definition, but I do like that it tries to piggyback on ordinary consts as much as possible.

Maybe if there were a special way to more tightly associate consts with their type, like say

const type T int (
  A = iota
  B
  C

  synonym = C
)
const regular = B

All the constants in that block are implicitly type T. They cannot be another type. The labels A, B, and C and their values are associated with T. synonym is considered an alternate label for C.

regular is just an ordinary constant. It is not associated with T.

Otherwise, it's the same as

type T int
const (
  A T = iota
  B
  C

  synonym = C
)
const regular = B

This association could just mean that those (label, value) pairs are stored in the export data with T and accessible via reflect and go/types.

That would allow a lot of linters and libraries for handling various enum-ish things to be created to handle various enum semantics. You can kind of do that now for linters but it gets complicated since you don't know which consts are associated with the type, per the above definition. Runtime checking requires the associated labels and values to be added to the program either manually or with code generation.

The notion of associated label/values could also allow some special syntax like .(const) or range T as discussed above, but it would also be possible to use reflect to write it as library code (that could be replaced by code generated with go/types if necessary).

@ermik
Copy link
Author

@ermik ermik commented Feb 7, 2020

@jimmyfrasche thanks for chiming in. I will try to comment and clarify, what you and @ianlancetaylor shared is very helpful.

The reason I had idea for "piggybacking" on the ordinary const is that I personally believe that enumeration is a pattern, not a type. With all of its troubles we're discussing here, the enum pattern implementation in Go is a reflection of the language as a whole — clear, concise, "to the point".

It seems like a lot of issues have been created by the ambiguity of the syntax I proposed, which refers to the constant scope in question using const keyword, rather than scope itself. I also alluded to the necessity of clearing values as "known constants" before passing them to certain methods by specifying const in the function signature.

I now see that my affection for the const keyword is unhelpful. Throwing it at every one of the issues we have discussed here and elsewhere in the community is a mistake. Its distinct nature is of interest, but its presence is redundant. Perhaps a single extension of its use makes more sense:

type LetterIndex const int
//---------------^^^^^---- restricts this type to declarations of integer constants

const (
    LetterIndexA LetterIndex = iota+1
    LetterIndexB
    LetterIndexC
)

Further declarations of typed constants within package scope amend the set of known values:

const LetterIndexZ LetterIndex = 26 // valid

Declarations that duplicate values defined earlier in code are synonyms and simply point to the address of the unique value in memory:

const (
    Last = LetterIndexZ
    First LetterIndex = 0
)

The precedence and unique requirement encourage "single block" declaration, without restricting odd uses.

Declaration of a const int named type does invalidate type conversion:

var cast = LetterIndex(1) // ERR: cannot convert 1 (untyped int constant) to type LetterIndex

But we introduce two new ways of retrieving a value of a named type:

// Type assertion
var v = 1
cast, ok := v.(LetterIndex) // cast = LetterIndexA, ok = true

// Exhaustive switch
switch v.(LetterIndex) {
case A, B, C:
    break
default: // required, "25" was not handled
    break
}

Finally, the compiler is using a known set (list of unique values) to accommodate these new features. It is reasonable to expose this set to the users as well:

fmt.Printf("%+v", values(LetterIndex))
// [LetterIndexA, LetterIndexB, LetterIndexC, LetterIndexZ]

You can see that I replaced the use of const in the syntax I proposed earlier with the exact scope being targeted in code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.