Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: support multiple formats for syntax directives #2937

Merged
merged 1 commit into from
Nov 17, 2022

Conversation

jedevc
Copy link
Member

@jedevc jedevc commented Jun 29, 2022

See #2812 for context, and #2904 for the related proposal regarding compatibility with shebang lines in Dockerfiles.

This proposal is presented as code to give a clearer view of what an implementation may look like, however, this is unlikely to be the final implementation. The code introduces basic support for two additional ways of syntax, so that the full list of possibilities is:

  • #syntax=foo, for script-style comments supported in most scripting languages, and Dockerfiles
  • //syntax=foo, for C-style comments supported in C/C++ and javascript/typescript
  • {"syntax": "foo", ...}, for JSON documents

(Note that XML support isn't included, and that JSON document support is included as a demonstration of how we could expand this support in the future).

The code is slightly more complex than it appears it needs to be - this is to handle a couple of constraints:

  • The default directive style should remain as #key=value
  • Mix-and-matching different directive styles should not be supported
  • Additionally allow easy extension in the future for other directive styles

Importantly, this does not change the implementation of how Dockerfiles parse their directives, these are defined in a separate regex that remains unmodified.

Please share your thoughts 馃帀

@jedevc jedevc marked this pull request as draft June 29, 2022 14:30
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this should work should be that there aren't new ways to define directives for Dockerfile, # remains the only supported syntax for that. When reading #syntax specifically from the directives, if there are no matches, then there should be a fallback that checks for other versions as well.

Eg.

// syntax=docker/dockerfile
FROM alpine

Is still an invalid Dockerfile and should error.

This is what was wrong in the previous shebang PR as well. If you do it in a fallback, feel free to add support for skipping the shebang line as well.

frontend/dockerfile/dockerfile2llb/directives.go Outdated Show resolved Hide resolved
@jedevc
Copy link
Member Author

jedevc commented Jul 19, 2022

I think what you're describing is how it works currently. This directive parsing code modified is only used for extracting the syntax directive, the directive parsing used to actually parse the dockerfile is located here. The utilities in directives.go are only used to determine to forward to a gateway.

The fallback behavior is also as you describe - we run through the directive parsers one-by-one, attempting to extract a syntax key from each - until we find one. If none are found, we can just give up and assume that it's the default # and parse directives from there.

@jedevc jedevc marked this pull request as ready for review July 19, 2022 10:13
@tonistiigi
Copy link
Member

I think what you're describing is how it works currently.

With the fallback ParseDirectives() should stay as it is currently(you can optimize regexp). Only in DetectSyntax() when ParseDirectives() call doesn't return a syntax directive it should try if the first line is //, skip shebang if needed, try json etc.

@jedevc
Copy link
Member Author

jedevc commented Jul 19, 2022

Alright sounds good - I think I was confused since DetectSyntax seems to be the only thing that ever calls the ParseDirectives function, but I suppose there could be external callers we'd want to maintain compatibility for.

@jedevc
Copy link
Member Author

jedevc commented Jul 25, 2022

Have done another pass through this, ParseDirectives now only parses # directives, while DetectSyntax searches through all the supported possibilities. I've also added a commit to skip shebang lines for #-style comments, I'm not sure if we want them for the other styles, since they won't be valid comments in the language that uses them, so I'm inclined to not allow it (or to wait until we get a valid use case).

I wonder if we might consider extending ParseDirectives in a follow-up? I do see a couple of codebases using it, so it might be nice to allow users to specify a syntax parameter for something like ParseDirectivesForSyntax to parse for multiple languages.

@jedevc jedevc added this to the v0.11.0 milestone Oct 18, 2022
frontend/dockerfile/dockerfile2llb/directives.go Outdated Show resolved Hide resolved
frontend/dockerfile/dockerfile2llb/directives.go Outdated Show resolved Hide resolved
frontend/dockerfile/dockerfile2llb/directives.go Outdated Show resolved Hide resolved
frontend/dockerfile/dockerfile2llb/directives.go Outdated Show resolved Hide resolved
@jedevc
Copy link
Member Author

jedevc commented Nov 15, 2022

Have completely rewritten the code to be significantly less clever, so the DetectSyntax function should be much easier to read from top-to-bottom.

In addition to adding the shebang, c-style comments and json parsing, I've also moved the appropriate functionality into the parser package and exposed the directives parser properly, so that we can get identical behavior to how the directives parsed inside the parser (which was actually different before, e.g. doing different casing, etc.)

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL the CI errors

d := &directives{
seen: map[string]struct{}{},
}
d := &directives{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you intentionally leaving the parser nil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I've reworked the parser to work from a nil state easily, so that when using it outside of this package, no extra New methods, etc, would be required.

frontend/dockerfile/parser/directives.go Outdated Show resolved Hide resolved

// use directive with different comment prefix, and search for //syntax=
directiveParser = DirectiveParser{Line: line}
directiveParser.setComment("//")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically shebang should be between the two directive parsers and // should not accept multiline but just do a regexp match. But whatever, we can leave it like this.

var directive struct {
Syntax string `json:"syntax"`
}
if err := json.Unmarshal(dt, &directive); err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No change required: I wonder if we could check that this is a JSON without full unmarshal. Eg. when user makes a typo in the JSON it would be better to give them JSON parse error(or maybe we can still parse syntax without hitting the typo) if the file is still mostly JSON than to just skip over the syntax forwarding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of attempting to parse only up until we have the value of syntax and then immediately stopping - not quite sure how to do that though, I'm fairly sure that the builtin json package doesn't allow that level of control.

This patch introduces support for allowing multiple formats for syntax
directives, to determine the syntax of the file to forward to the
appropriate gateway. Additionally, support for shebangs for syntax
detection is also permitted.

This does not change the dockerfile syntax and allow different
directives or shebangs during dockerfile parsing, these changes only
affect the syntax detection.

Additionally, the relevant functions are moved into the parser package
from the dockerfile2llb package, so that we can share code between the
both the dockerfile directive parsing, and the syntax detection code,
and also removes the dockerfile2llb.ParseDirectives function which
appeared to scan directives in the same was as the parser, but was not
aware of various quirks (e.g. case-insensitivity).

Signed-off-by: Justin Chadwell <me@jedevc.com>
@tonistiigi tonistiigi merged commit f771330 into moby:master Nov 17, 2022
return directives, nil
}

// DetectSyntaxFromDockerfile returns the syntax of provided input.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably should've just been DetectSyntax, right? 馃槆

(The function itself doesn't have the FromDockerfile suffix on the name 馃憖)

Copy link
Member Author

@jedevc jedevc Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3404 fixes this 馃帀

Comment on lines +26 to +28
// for some reason Moby implementation in case insensitive for escape
dt = `# EScape=\
# KEY = FOO bar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was a clear case of "fatigue" kicking in after a very long debate about the feature (it originally was implemented as a new ESCAPE Dockerfile instruction). Once we got to the concept of "parser directives", there was debate about "upper" or "lowercase"; moby/moby#22268 (comment), and in the end it was decided to "recommend" lowercase, but "allow" uppercase moby/moby#22268 (comment) moby/moby#22268 (comment)

I'd be open to (perhaps for Dockerfile instructions as well) discuss making it slightly more strict, and accept lower and UPPER, but not mIxEDcAsE (which we currently support in most cases), although we should look if there's actual uses of mixed-case and if it would be too much of a breaking change (we could "warn" ahead perhaps).

@jedevc jedevc deleted the directive-proposal branch December 14, 2022 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants