Parser does not ignore the first CR-LF in a multi-line string #262

ghost · 2023-01-24T21:16:51Z

The TOML 1.0.0 spec says this about multi-line strings:

"A newline immediately following the opening delimiter will be trimmed."

The term "newline" is somewhat ambiguous in this sentence, but the ABNF grammar shows that newline in this context must be taken to mean either LF or CR-LF.
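As a rough illustration of that reading of the grammar, here is a minimal sketch of the trimming rule in isolation. The helper name `trim_opening_newline` is hypothetical and is not part of tomlkit or any other library. It takes the raw text between the delimiters and drops at most one leading LF or CR-LF.

```python
def trim_opening_newline(body: str) -> str:
    """Drop a single newline (LF or CR-LF) that directly follows the opening delimiter.

    Hypothetical helper for illustration only; not tomlkit's implementation.
    """
    if body.startswith("\r\n"):
        return body[2:]
    if body.startswith("\n"):
        return body[1:]
    # Anything else, including a lone CR, is not a newline and is left untouched.
    return body


print(repr(trim_opening_newline("\nfoo")))        # 'foo'
print(repr(trim_opening_newline("\r\nfoo")))      # 'foo'
print(repr(trim_opening_newline("\r\n\r\nfoo")))  # '\r\nfoo' (only the first newline is trimmed)
```

Checking for CR-LF before LF keeps the two cases separate and makes it explicit that only one newline is removed.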

But the tomlkit parser trims only an opening LF in a multi-line string, not an opening CR-LF:

import tomlkit
doc = tomlkit.parse('v = """\r\nfoo"""\n')
print(repr(doc["v"]))

prints:

'\r\nfoo'

This suggests that the \r\n combination immediately following the opening quotes has not been trimmed.
If the example TOML document is changed to use only \n instead of \r\n, the tomlkit parser does in fact trim the newline from the string value.

This issue applies to the latest commit 6512eaa on master as well as the latest release v0.11.6.

I'm experimenting with a TOML fuzzer which revealed this issue. I guess most normal applications are not likely to hit this case, so I understand if this gets low priority. It still seems worthwhile to fix it though.


frostming closed this as completed in fefb29f Apr 27, 2023

frostming mentioned this issue Apr 27, 2023

Release 0.11.8 #285

Merged
