cmd/go: revisit allowed set of characters in module, import, and file paths #45549

Open
jayconrod opened this issue Apr 13, 2021 · 0 comments

@jayconrod jayconrod commented Apr 13, 2021

Currently, import paths have the following lexical restrictions (see module.CheckImportPath):

  • Must consist of valid path elements, separated by slashes. Must not begin or end with a slash.
  • A valid path element is a non-empty string that consists of ASCII letters, ASCII digits, and the punctuation characters - . _ ~. Must not end with a dot or contain two dots in a row.
  • A path element prefix up to the first dot must not be a reserved name on Windows, regardless of case (CON, com1, ...). An element must not have a suffix of a tilde followed by ASCII digits (like a Windows short name).
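As an illustration of the import path rules above, here is a minimal Go sketch. The names checkImportPath, checkImportElem, importCharOK, and the windowsReserved list are hypothetical. This is not the source of module.CheckImportPath, which remains authoritative and may differ in details. The tilde check follows the wording above and applies to the whole element.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// windowsReserved is an assumed list of reserved Windows device names,
// used only for this illustration.
var windowsReserved = []string{
	"CON", "PRN", "AUX", "NUL",
	"COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
	"LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9",
}

// importCharOK reports whether r is allowed in an import path element:
// ASCII letters, ASCII digits, and the punctuation - . _ ~.
func importCharOK(r rune) bool {
	return 'a' <= r && r <= 'z' || 'A' <= r && r <= 'Z' ||
		'0' <= r && r <= '9' || strings.ContainsRune("-._~", r)
}

// checkImportElem applies the per-element rules to a single element.
func checkImportElem(elem string) error {
	if elem == "" {
		return errors.New("empty path element")
	}
	if strings.HasSuffix(elem, ".") {
		return errors.New("trailing dot")
	}
	if strings.Contains(elem, "..") {
		return errors.New("two dots in a row")
	}
	for _, r := range elem {
		if !importCharOK(r) {
			return fmt.Errorf("invalid char %q", r)
		}
	}
	// The prefix up to the first dot must not be a reserved Windows name,
	// compared case-insensitively.
	short, _, _ := strings.Cut(elem, ".")
	for _, bad := range windowsReserved {
		if strings.EqualFold(short, bad) {
			return fmt.Errorf("%q is a reserved name on Windows", short)
		}
	}
	// The element must not end in a tilde followed by ASCII digits,
	// which would look like a Windows short name such as "PROGRA~1".
	if i := strings.LastIndexByte(elem, '~'); i >= 0 && i < len(elem)-1 {
		if strings.Trim(elem[i+1:], "0123456789") == "" {
			return errors.New("trailing tilde and digits")
		}
	}
	return nil
}

// checkImportPath rejects leading/trailing slashes, then checks each element.
// Empty elements (from "//") are rejected by checkImportElem.
func checkImportPath(path string) error {
	if path == "" {
		return errors.New("empty path")
	}
	if strings.HasPrefix(path, "/") || strings.HasSuffix(path, "/") {
		return errors.New("leading or trailing slash")
	}
	for _, elem := range strings.Split(path, "/") {
		if err := checkImportElem(elem); err != nil {
			return fmt.Errorf("element %q: %w", elem, err)
		}
	}
	return nil
}

func main() {
	for _, p := range []string{
		"example.com/foo/bar",
		"example.com/a..b",
		"example.com/COM1.go",
		"example.com/foo~12",
		"example.com/héllo",
	} {
		fmt.Printf("%-22s %v\n", p, checkImportPath(p))
	}
}
```

Note that "example.com/héllo" is rejected solely because "é" is not ASCII. That is the kind of restriction this issue proposes to revisit.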

Module paths have the same restrictions as import paths, with additional constraints (see module.CheckPath):

  • The first path element (by convention, a domain name) must contain only lower-case ASCII letters, ASCII digits, dots, and dashes. It must contain at least one dot and must not start with a dash.
  • If the path ends with /vN where N consists of ASCII digits and dots, N must not begin with 0, must not be 1, and must not contain any dots (there's a separate special case for gopkg.in/... module paths).
  • No path element may begin with a dot.
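The additional module path constraints can be sketched in the same way. checkModuleExtras is a hypothetical helper that checks only the constraints listed above and assumes the import path rules have already been applied separately. It deliberately ignores the gopkg.in special case and should not be read as module.CheckPath itself.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkModuleExtras sketches only the constraints module paths add on top of
// import paths; it assumes the import path rules were checked separately.
func checkModuleExtras(path string) error {
	elems := strings.Split(path, "/")

	// First element (by convention a domain name): lower-case ASCII letters,
	// ASCII digits, dots, and dashes only; at least one dot; no leading dash.
	first := elems[0]
	for _, r := range first {
		if !('a' <= r && r <= 'z' || '0' <= r && r <= '9' || r == '.' || r == '-') {
			return fmt.Errorf("invalid char %q in first element", r)
		}
	}
	if !strings.Contains(first, ".") {
		return errors.New("first element contains no dot")
	}
	if strings.HasPrefix(first, "-") {
		return errors.New("first element starts with a dash")
	}

	// No path element may begin with a dot.
	for _, e := range elems {
		if strings.HasPrefix(e, ".") {
			return fmt.Errorf("element %q begins with a dot", e)
		}
	}

	// A final /vN, where N consists of ASCII digits and dots, must not begin
	// with 0, must not be 1, and must not contain dots. The gopkg.in special
	// case is not modeled here.
	if last := elems[len(elems)-1]; len(elems) > 1 && len(last) > 1 && last[0] == 'v' {
		n := last[1:]
		if strings.Trim(n, "0123456789.") == "" {
			if strings.HasPrefix(n, "0") || n == "1" || strings.Contains(n, ".") {
				return fmt.Errorf("invalid major version suffix %q", last)
			}
		}
	}
	return nil
}

func main() {
	for _, p := range []string{
		"example.com/foo/v2",
		"localhost/foo",
		"example.com/foo/v1",
		"example.com/foo/v2.1",
	} {
		fmt.Printf("%-22s %v\n", p, checkModuleExtras(p))
	}
}
```

A final element such as "vendor" is left alone because its suffix after "v" is not made of digits and dots.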

File paths have the same restrictions as import paths, but the set of allowed characters is larger (see module.CheckFilePath):

  • Path elements may consist of Unicode letters, ASCII digits, ASCII spaces, and ASCII punctuation characters ! # $ % & ( ) + , - . = @ [ ] ^ _ { } ~. The remaining ASCII punctuation characters " * ; < > ? ` ' | / \ : are excluded.
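The file path character rule can be expressed as a rune predicate. fileCharOK and fileElemCharsOK are hypothetical names. This sketch covers only the allowed character set described above, not the other element rules (dots, reserved names, and so on).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// fileCharOK reports whether r may appear in a file path element, following
// the character rules listed above.
func fileCharOK(r rune) bool {
	if r < utf8.RuneSelf {
		// ASCII: letters, digits, space, and the allowed punctuation.
		const allowed = "!#$%&()+,-.=@[]^_{}~ "
		return 'a' <= r && r <= 'z' || 'A' <= r && r <= 'Z' ||
			'0' <= r && r <= '9' || strings.ContainsRune(allowed, r)
	}
	// Non-ASCII: Unicode letters only (no digits, marks, or symbols).
	return unicode.IsLetter(r)
}

// fileElemCharsOK reports whether elem is non-empty, valid UTF-8, and made
// up entirely of allowed characters. Other element rules are not checked.
func fileElemCharsOK(elem string) bool {
	if elem == "" || !utf8.ValidString(elem) {
		return false
	}
	for _, r := range elem {
		if !fileCharOK(r) {
			return false
		}
	}
	return true
}

func main() {
	for _, name := range []string{"résumé (final) #2.txt", "a;b", "e\u0301"} {
		fmt.Printf("%q: %v\n", name, fileElemCharsOK(name))
	}
}
```

Because only unicode.IsLetter is consulted for non-ASCII runes, this predicate rejects a decomposed "é" (e followed by U+0301 COMBINING ACUTE ACCENT) while accepting the precomposed U+00E9. It also rejects non-ASCII digits.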

These restrictions are generally in place for good reasons (see Unicode restrictions):

  • Module paths are frequently written and encoded into URLs, and we don't want to allow strings that interfere with that (for example, non-ASCII domain names).
  • Module contents are extracted into directories on a variety of systems. We don't want to allow strings that aren't valid file names or might collide with a different string (on case-insensitive or Unicode normalizing systems). We don't want to allow strings that are reserved, might be interpreted by the shell, might be interpreted as a flag (starting with -), or might be interpreted as a repository (.git).
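The case-insensitive collision hazard can be illustrated with a small hypothetical helper, caseCollisions, that flags distinct paths which strings.EqualFold treats as equal. strings.EqualFold uses simple Unicode case folding, so this only approximates how a particular file system compares names. It does not model Unicode normalization at all.

```go
package main

import (
	"fmt"
	"strings"
)

// caseCollisions returns pairs of distinct paths that would name the same
// file on a case-insensitive file system, approximated with strings.EqualFold.
// Pairwise comparison is quadratic, which is fine for an illustration.
func caseCollisions(paths []string) [][2]string {
	var out [][2]string
	for i := 0; i < len(paths); i++ {
		for j := i + 1; j < len(paths); j++ {
			if paths[i] != paths[j] && strings.EqualFold(paths[i], paths[j]) {
				out = append(out, [2]string{paths[i], paths[j]})
			}
		}
	}
	return out
}

func main() {
	files := []string{"README.md", "readme.md", "Makefile", "main.go"}
	for _, c := range caseCollisions(files) {
		fmt.Printf("%s collides with %s\n", c[0], c[1])
	}
}
```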

That being said, these restrictions are more English-centric than necessary (#45507). They're also more restrictive than GOPATH (#29101).

We should come up with a wider set of characters that may be allowed without causing compatibility problems, particularly for import and file paths.

cc @bcmills @matloob
