Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve heredoc parsing to allow more generic shell-words #2213

Merged
merged 1 commit into from
Jul 1, 2021

Conversation

jedevc
Copy link
Member

@jedevc jedevc commented Jun 30, 2021

Previously, heredoc names have been restricted to simple alphanumeric strings. However, heredocs should support much more complex use-cases, including quoting anywhere, as well as allowing special symbols like . for easily expressing file extensions.

This patch adds support for these more complex cases, by using the shell lexer to parse each heredoc name. Additionally, we include improvements to the lexer to optionally preserve escape tokens to avoid problems when lexing words that have already been lexed before.

Not quite sure how I missed this before in the first iteration, since all shells support some really complex word rules for the naming of heredocs, but now there's some tests to catch these weirder cases. The main one of use is doing something like COPY <<file.txt / which wasn't possible before (because . wouldn't count as part of the heredoc name).

@jedevc jedevc requested a review from tonistiigi June 30, 2021 13:35
@jedevc jedevc force-pushed the dockerfile-heredoc-parsing branch 2 times, most recently from 3419a68 to 80ccbe1 Compare June 30, 2021 14:42
@jedevc
Copy link
Member Author

jedevc commented Jun 30, 2021

Woops, missed a spot that the integration tests caught 🎉 The MustParseHeredoc function was accidentally reading heredocs that were too long, should be corrected now (and have added another test for it).

Apologies for all the extra CI runs, I don't seem to be able to cancel them once they get going.

@jedevc jedevc force-pushed the dockerfile-heredoc-parsing branch 2 times, most recently from b226aff to af08f44 Compare June 30, 2021 20:35
@jedevc
Copy link
Member Author

jedevc commented Jun 30, 2021

Caught a small issue with quoting in something like 'E'O\'F which shouldn't be expanded because E is quoted. This is fixed by comparing the number of quotes in the quoted string and the non-quoted string. If they are different, then some quotes are unnecessary and will have been removed (might not necessarily be 0 because we can have escaped quotes as well).

Should be a test now for that case.

@jedevc jedevc force-pushed the dockerfile-heredoc-parsing branch from af08f44 to ec1bfc8 Compare July 1, 2021 08:13
Previously, heredoc names were restricted to simple alphanumeric
strings. However, heredocs should support much more complex use-cases,
including quoting anywhere, as well as allowing special symbols like `.`
for easily expressing file extensions.

This patch adds support for these more complex cases, by using the shell
lexer to parse each heredoc name. Additionally, we include improvements
to the lexer to optionally preserve escape tokens to avoid problems when
lexing words that have already been lexed before.

Signed-off-by: Justin Chadwell <me@jedevc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants