
Parser does not ignore the first CR-LF in a multi-line string #262

Closed
ghost opened this issue Jan 24, 2023 · 0 comments


The TOML 1.0.0 spec states about multi-line strings that

A newline immediately following the opening delimiter will be trimmed.

The term "newline" is somewhat ambiguous in this sentence, but the ABNF grammar shows that in this context it means either LF or CR-LF.

However, the tomlkit parser trims only a leading LF in a multi-line string, not a leading CR-LF:

import tomlkit
doc = tomlkit.parse('v = """\r\nfoo"""\n')
print(repr(doc["v"]))

prints:

'\r\nfoo'

This suggests that the \r\n combination immediately following the opening quotes has not been trimmed.
If the example TOML document is changed to use only \n instead of \r\n, the tomlkit parser does in fact trim the newline from the string value.
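
For reference, here is a minimal sketch of the trimming behaviour I would expect from the spec. The helper `trim_opening_newline` is hypothetical and written only for illustration; it is not tomlkit's code, and it assumes `raw` is the text between the opening and closing triple quotes.

```python
def trim_opening_newline(raw: str) -> str:
    """Drop one newline (CR-LF or LF) directly after the opening delimiter.

    Hypothetical helper for illustration only; not part of tomlkit.
    `raw` is assumed to be the text between the triple-quote delimiters.
    """
    # Check CR-LF before LF so that the CR-LF pair is removed as a unit.
    for newline in ("\r\n", "\n"):
        if raw.startswith(newline):
            return raw[len(newline):]
    # No leading newline: the content is returned unchanged.
    return raw


print(repr(trim_opening_newline("\r\nfoo")))      # 'foo'
print(repr(trim_opening_newline("\nfoo")))        # 'foo'
print(repr(trim_opening_newline("\r\n\r\nfoo")))  # '\r\nfoo'
```

Only the first newline is removed, so a second CR-LF remains part of the string value, as the last call shows.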

This issue applies to the latest commit 6512eaa on master as well as the latest release v0.11.6.

I found this issue while experimenting with a TOML fuzzer. Most normal applications are unlikely to hit this case, so I understand if it gets low priority, but it still seems worth fixing.
