Backtrack a minimum of one character when footnote does not match #51
When a footnote fails to match in Footnote.match_reference, `lines._index` is moved backwards by a number of characters equal to the number of newlines remaining in `lines`. This causes buggy behavior when the input markdown does not contain any newlines, because `lines._index` does not change. Instead, always backtrack at least one character, so the token start `[` is covered in case there are zero newlines.
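To make the difference concrete, here is a minimal sketch of the two backtracking rules described above. The function names are hypothetical and are not taken from mistletoe; only the arithmetic is illustrated, assuming the backtrack distance is derived from the newline count of the unmatched remainder.

```python
def steps_by_newlines(unmatched: str) -> int:
    """Backtrack distance as described for the original behavior:
    one step per newline left in the unmatched text."""
    return unmatched.count("\n")


def steps_at_least_one(unmatched: str) -> int:
    """Proposed behavior: same count, but never less than one step,
    so the opening `[` is revisited even when there are no newlines."""
    return max(1, unmatched.count("\n"))


# Input without any newline: the original rule does not move at all.
no_newline = "[foo]: bar"
original_steps = steps_by_newlines(no_newline)
proposed_steps = steps_at_least_one(no_newline)

# Input with newlines: both rules agree.
multi_line = "first\nsecond\nthird\n"
agree = steps_by_newlines(multi_line) == steps_at_least_one(multi_line)
```

With newline-terminated input the two rules give the same result, which is consistent with the existing tests passing either way.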
Thanks for the pull request! Yeah, figuring out how to parse footnotes took me a while. I should add some documentation at some point.

I think the important thing is that all [...] What complicates [...]

```python
line_buffer = []
next_line = lines.peek()
while next_line is not None and next_line.strip() != '':
    line_buffer.append(next(lines))
    next_line = lines.peek()
string = ''.join(line_buffer)
```

... and then do our matches on the concatenated string (with newlines in it). When we encounter anything that does not form a footnote reference, we need to "give back" the number of lines still unmatched in the concatenated string, because they are probably part of another block-level token. This is why every time before we return [...]

More related to your question: are you passing a list of lines without newlines into mistletoe? The assumption that block-level tokens can match on a list of newline-terminated lines is quite entrenched at this point (which is why, for example, this line exists). My reasoning is that most people will be using a Markdown parser like this:

```python
with open('foo.md', 'r') as fin:
    return markdown(fin)
```

... and [...]

You have alerted me to another edge case, however, where a document does not terminate with a newline. I'll open another issue for that! Let me know what you think. Thanks again!
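The "collect, match, give back" idea can be sketched roughly as follows. This is an illustration under assumptions, not mistletoe's implementation: `LineCursor`, `read_until_blank`, and `give_back` are hypothetical names, and the "match" is faked by pretending that only the first line was consumed.

```python
class LineCursor:
    """Hypothetical stand-in for a peekable line iterator with an index."""

    def __init__(self, lines):
        self._lines = list(lines)
        self._index = 0

    def peek(self):
        # Return the next line without consuming it, or None at the end.
        if self._index < len(self._lines):
            return self._lines[self._index]
        return None

    def __iter__(self):
        return self

    def __next__(self):
        line = self.peek()
        if line is None:
            raise StopIteration
        self._index += 1
        return line


def read_until_blank(lines):
    """Consume lines up to (not including) the next blank line and join them."""
    buffer = []
    next_line = lines.peek()
    while next_line is not None and next_line.strip() != "":
        buffer.append(next(lines))
        next_line = lines.peek()
    return "".join(buffer)


def give_back(lines, unmatched):
    """Rewind the cursor by one line per newline in the unmatched text."""
    lines._index -= unmatched.count("\n")


cursor = LineCursor(["[a]: /url\n", "plain text\n", "more text\n", "\n", "next block\n"])
text = read_until_blank(cursor)

# Pretend the footnote matcher consumed only the first line.
consumed = len("[a]: /url\n")
give_back(cursor, text[consumed:])
resume_line = cursor.peek()
```

After the rewind, the cursor points at the first line that was not part of the footnote, so another block-level token can pick it up. Note that the rewind relies entirely on counting newlines, which is why input without a terminal newline is a problem.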
Thank you for the explanation about concatenating the remainder of the lines to find the end of the footnote. That makes sense. The situation I ran into is actually the second one you mentioned, where the document does not terminate with a newline. I was writing integration tests for an application and using short strings (without newlines) as the test input data. Though it was not my intention, I do believe this is still a realistic scenario; the CommonMark spec section 2.1 does not seem to require a terminal newline, and Markdown can be used in web apps that store data in, say, a database instead of a file. Even when the document is being read from a file, I don't believe it's safe to assume a newline. POSIX requires one, but Windows (and probably other systems) does not, and of course it would be nice to accommodate non-compliant files on POSIX systems if it's not difficult. Perhaps the best solution is to simply append a newline in [...]
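One way the "append a newline" idea could look is a small normalization step before parsing. This is only a sketch; `ensure_trailing_newlines` is a hypothetical helper, and where such a step would live in mistletoe is not specified here.

```python
def ensure_trailing_newlines(source):
    """Return a list of lines in which every line ends with a newline.

    Accepts either a single string or an iterable of lines (such as an
    open file), so short test strings and file input are treated alike.
    """
    if isinstance(source, str):
        lines = source.splitlines(keepends=True)
    else:
        lines = list(source)
    return [line if line.endswith("\n") else line + "\n" for line in lines]


# A short test string with no newline at all.
single = ensure_trailing_newlines("[foo]")

# A list of lines whose last entry lacks the terminal newline.
mixed = ensure_trailing_newlines(["first\n", "second"])
```

With input normalized this way, code that counts newlines to decide how far to rewind never sees a line without one.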
I implemented this in 1a43a68, and will close this pull request. As for POSIX vs. Windows newlines, see this documentation. Unless I'm misreading the doc, it seems that alternative newlines when reading files are already handled by Python itself. Thanks for the contribution!
I confess I do not fully understand the Footnote parsing. It may seem that the correct behavior is actually to always backtrack one character? The tests pass for either solution.