Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/token: add IsIdentifier, IsKeyword, and IsExported funcs #30064

Closed
mvdan opened this issue Feb 2, 2019 · 13 comments

Comments

Projects
None yet
6 participants
@mvdan
Copy link
Member

commented Feb 2, 2019

We already have go/ast.IsExported, which is helpful; it makes the operation simple and expressive, and it avoids the simple mistake of only checking ASCII and ignoring Unicode.

Checking if a string is a valid identifier as per the spec (https://golang.org/ref/spec#Identifiers) falls under the same category, I believe. It seems intuitive, so developers tend to write a simple implementation by hand without much effort. However, there are many gotchas which are easy to get wrong:

  • Unicode
  • Digits are allowed, but not at the beginning
  • _ is allowed, even as a first character
  • Empty strings are not valid identifiers

For example, take this implementation I just came across this morning:

for _, c := range str {
	switch {
	case c >= '0' && c <= '9':
	case c >= 'a' && c <= 'z':
	case c >= 'A' && c <= 'Z':
	case c == '_':
	default:
		return false
	}
}
return true

It gets three of the gotchas wrong; it would return true on "" and "123", and return false on "αβ". I imagine that a correct implementation would be like:

if len(str) == 0 {
	return false
}
for i, c := range str {
	if unicode.IsLetter(c) || c == '_' || (i > 0 && unicode.IsDigit(c)) {
		continue
	}
	return false
}
return true

However, I'm still not 100% confident that this is correct. I'd probably compare it to implementations in go/* and x/tools before using the code, just in case.

The standard library implements variants of this in multiple places; go/scanner does it on a per-rune basis while scanning, go/build implements it in its importReader, and cmd/doc has an isIdentifier func. The same applies to other official repos like x/tools, where I've found godoc.isIdentifier, semver.isIdentChar, analysis.validIdent, and rename.isValidIdentifier.

I think we should have a canonical implementation with a proper godoc and tests. My proposal is to add this to go/ast, as it would already be useful to other packages within the standard library. I also think this is a fairly low-level API, and it's commonly needed whenever one is modifying Go syntax trees, creating new Go files, or otherwise tinkering with Go syntax.

If adding API to go/ast is a problem, perhaps golang.org/x/tools/go/ast/astutil. However, in my mind that package contains much higher-level functions, so it feels a bit out of place.

I realise this could have been a proposal, but I thought the proposed change wouldn't be controversial enough to warrant one. Feel free to retitle it as a proposal if preferred.

/cc @griesemer @josharian @alandonovan @dominikh

@mvdan mvdan added the NeedsDecision label Feb 2, 2019

@adonovan

This comment has been minimized.

Copy link

commented Feb 2, 2019

I think this is a fine idea, but the function belongs in go/token, not go/ast.

@mvdan

This comment has been minimized.

Copy link
Member Author

commented Feb 2, 2019

go/token sounds fine; I just suggested go/ast as that's where IsExported is.

@ianthehat pointed out keywords, which I was forgetting about. The spec mentions:

The following keywords are reserved and may not be used as identifiers.

So I presume that IsIdentifier should return false for all Go keywords.

Ian also mentions that go/scanner could be used for this by the user, checking that the entire string scans as a single identifier. I've implemented what I think is a correct way to do it below:

func isIdentifier(str string) bool {
        src := []byte(str)
        var s scanner.Scanner
        fset := token.NewFileSet()
        file := fset.AddFile("", fset.Base(), len(src))
        s.Init(file, src, nil, scanner.ScanComments)
        _, tok, lit := s.Scan()
        return tok == token.IDENT && lit == str
}

However, I still think that an IsIdentifier func in the standard library would be better. Compared to using go/scanner directly, it would be a single line and require zero allocations.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Feb 2, 2019

We might consider IsKeyword in go/token as well. In addition to being convenient, the implementation could use a perfect hash for faster checks (as does cmd/compile/internal/syntax).

@mvdan mvdan changed the title go/ast: add func IsIdentifier(string) bool go/token: add func IsIdentifier(string) bool Feb 2, 2019

@adonovan

This comment has been minimized.

Copy link

commented Feb 2, 2019

I'd be happy for us to add IsIdentifier, IsKeyword, and IsExported to go/token.

@mvdan

This comment has been minimized.

Copy link
Member Author

commented Feb 3, 2019

IsKeyword sounds like a great idea; I imagine that IsIdentifier would still return false for inputs like for, though. That is, IsIdentifier would use IsKeyword as part of its implementation.

and IsExported to go/token

Not sure if this is strictly necessary, since it's already part of go/ast. I assume you're suggesting it for the sake of consistency.

@adonovan

This comment has been minimized.

Copy link

commented Feb 3, 2019

I assume you're suggesting it for the sake of consistency.

Exactly. The go/ast function would delegate to the go/token implementation.

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Feb 5, 2019

I'm ok with @adonovan said.

@griesemer griesemer added this to the Go1.13 milestone Feb 5, 2019

@griesemer griesemer added NeedsFix and removed NeedsDecision labels Feb 5, 2019

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Feb 5, 2019

Please send CLs to me. Thanks.

@mvdan mvdan self-assigned this Feb 5, 2019

@mvdan mvdan changed the title go/token: add func IsIdentifier(string) bool go/token: add IsIdentifier, IsKeyword, and IsExported funcs Feb 5, 2019

@Skarlso

This comment has been minimized.

Copy link
Contributor

commented Feb 19, 2019

Hi @mvdan @adonovan @griesemer. Can I work on this one? I don't see a CL attached.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Feb 19, 2019

I’m pretty sure @mvdan is working on this; he assigned it to himself.

@Skarlso

This comment has been minimized.

Copy link
Contributor

commented Feb 19, 2019

@josharian Cool, thanks. :)

@mvdan

This comment has been minimized.

Copy link
Member Author

commented Feb 28, 2019

I will get to it soon - I just have a moderately sized backlog of stuff to go through, now that the tree is open again :)

@gopherbot

This comment has been minimized.

Copy link

commented Mar 23, 2019

Change https://golang.org/cl/169018 mentions this issue: go/token: add IsIdentifier, IsKeyword, and IsExported

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.