Fix parsing of indented markdown features by requiring line-break delimiter #2821
Comments
Although it would be possible for us to start doing more sophisticated parses of Markdown, ideally, we'd try and keep this feature very simple. Having just a one-line-at-a-time, is-it-literate-or-is-it-not heuristic makes it way easier for other tools to support the same syntax for other purposes. Are you sure it wouldn't be easier just to write it out like this?
|
A poor example; mea culpa. Something less easily “fixable” would have illustrated the point better, like making the list nested:
or a canonical markdown example:
or the more edgy cases, like neatly indented multi-paragraph list items. To be sure, some features are well enough off the beaten path to disregard, but others are common enough, and in a few cases the danger could lie well hidden, and lead to some pernicious bugs. I appreciate the power of simplicity above all else, but here I think it will prove worthwhile to find an acceptable way to chase the line break. Thanks for having a look. |
You're almost certainly right. Anyone interested in cooking up a patch? |
This is especially problematic because in many cases plain English text is parsed as valid CoffeeScript code, which means the indented markdown text would end up as JavaScript. |
So I can think of two ways to go here. There’s the line-by-line state-machinery, as in the

    deliterate: (code) ->
      PROSE = /([\s\S]*?(?:\n|$))(?:[ \t]*?\n|$)/g
      BLANK = /((?:[ \t]*\n)+)(?:( {4}|\t)|.)/g
      CODE  = /((?:(?: {4}|\t).*(?:\n|$))+)(?:([ \t]*\n)|.)/g
      out = []
      # Pick the starting state from the first line of the input.
      rx = if /^(?: {4}|\t).*(?:\n|$)/.test code then CODE
      else if /^\s*(?:\n|$)/.test code then BLANK
      else PROSE
      while match = rx.exec code
        chunk = match[1]
        break unless lines = chunk.replace(/([\s\S]*?)\n?$/, '$1').split '\n'
        switch rx
          when PROSE
            out.push '# ' + line for line in lines
            rx = BLANK
          when BLANK
            out.push '' for line in lines
            rx = if match[2]? then CODE else PROSE
          when CODE
            out.push /^(?: {4}|\t)?(.*)/.exec(line)[1] for line in lines
            rx = if match[2]? then BLANK else PROSE
        # Resume the next state's regex right after the chunk just consumed.
        rx.lastIndex = match.index + chunk.length
      out.join '\n'

Uglier looking, but here executions operate on multi-line blocks rather than per-line, so (although it’s far too soon to care)

Anyway, those are just two first quick stabs, but before moving forward I thought I’d ask for an opinion on which might be a better strategy to try, or just a more appropriate fit. (I suspect Docco will need to be kept in mind as well.) And certainly, parsing markdown is not my wheelhouse; if there’s a smarter approach that blows both of these away, I hope someone can chime in. |
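For comparison, the line-by-line variant could look roughly like the sketch below. It is written in Python rather than CoffeeScript only so that it runs standalone, and it is not the code from either patch: the function name `deliterate` is borrowed from the snippet above, while the `in_code` and `prev_blank` state variables are assumptions made for illustration. An indented line is treated as code only if it follows a blank line or continues an existing code block; everything else is commented out, and the output keeps one line per input line.

```python
import re

# Four spaces or one tab at the start of a line marks a code candidate.
INDENT = re.compile(r"^(?: {4}|\t)")

def deliterate(source):
    """Comment out prose; keep indented blocks that follow a blank line."""
    out = []
    in_code = False
    prev_blank = True  # assumption: the start of the file counts as a delimiter
    for line in source.split("\n"):
        if not line.strip():
            out.append("")       # blank lines pass through as empty lines
            prev_blank = True    # in_code is left unchanged across blank lines
            continue
        indented = INDENT.match(line) is not None
        # Code may start only after a blank line, and continues only while indented.
        in_code = indented and (in_code or prev_blank)
        out.append(INDENT.sub("", line, count=1) if in_code else "# " + line)
        prev_blank = False
    return "\n".join(out)

text = "* item\n    continued\n\n    x = 1\nprose"
result = deliterate(text)
```

Here the list item's continuation line is commented out as prose, while `x = 1` survives as code. The sketch still misreads an indented paragraph that follows a blank line inside a list item, which is one of the edge cases raised earlier in the thread.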
* Expect a blank line as delimiter between text and code (jashkenas#2821).
* Don't change indentation of code. It is not necessary and leads to erroneous locationData. (jashkenas#2835)
* Don't modify blank lines and reverse the change in the lexer.
* Don't ignore indentation with mixed whitespace.
Pinging @marchaefner to take a look at the approaches in this ticket and compare 'em to his, in his. |
Or vice-versa, @nickfargo, if you have any thoughts about the simplification in #2838 |
Wouldn't it be better to use an existing Markdown parser and then extract the Here's an example of a possible simple interpreter in the command line, using marked as the Markdown parser and hxselect to extract the
$ echo '# Awesome Literate CoffeeScript Interpreter
* Lorem
* ipsum
* dolor
* sit
amet,
consectetur adipisicing elit,
sed do eiusmod tempor
incididunt ut labore
some coffee code
et dolore
even
more code
magna aliqua.' | marked | hxselect 'code' -c -s '\n' | coffee -cs
Output:
|
That's an awesome idea! As long as no one mixes in code blocks that aren't coffeescript. |
Unless you wanted those nice error messages and source maps to coincide with the code in your .litcoffee file. |
@epidemian It's a nice approach, but it seems somewhat wasteful to run a full markdown parser just to find where the code parts are. But the biggest problem is probably generating source maps, as @marchaefner mentioned. |
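The source-map concern can be made concrete with a small sketch. It is in Python so it runs standalone, and the helper name `extract_with_line_map` and the sample text are hypothetical. It uses the plain indentation test to pull out code lines and records which source line each one came from. A pipeline that returns only the extracted code has no such table, so error positions would refer to the extracted text rather than to the .litcoffee file.

```python
def extract_with_line_map(source):
    """Return (code, line_map); line_map[i] is the 1-based source line of code line i."""
    code_lines, line_map = [], []
    for number, line in enumerate(source.split("\n"), start=1):
        if line.startswith("    "):
            code_lines.append(line[4:])   # strip the four-space indent
            line_map.append(number)
        elif line.startswith("\t"):
            code_lines.append(line[1:])   # strip the tab indent
            line_map.append(number)
    return "\n".join(code_lines), line_map

literate = "Title\n\n    a = 1\n\nMore prose\n\n    b = 2"
code, line_map = extract_with_line_map(literate)
```

In this sample, the second line of the extracted code comes from line 7 of the source, so an error reported at line 2 would have to be translated through `line_map` to point at the right place.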
Ho, that's right; totally forgot about that. Then it seems the pain of reinventing a part of a Markdown parser is necessary 😓. The corner cases in Markdown are pretty hairy though, things like code inside lists (or nested lists), or code after lists. I wouldn't like to see those pesky corner cases creep into LCS.
Well, it depends on what you're worried about "wasting". One could also argue that trying to make and maintain a (subset of a) Markdown parser when so many already exist is a waste of development time. |
I think #2838 takes it. The global regexes of Ready to close. Thanks @marchaefner! |
Here is an example of a markdown-coffee passage that cannot be lexed as literate (#1786) in 1.6.1, because the list item’s continuation line passes the simple per-line /[ ]{4}|\t/ test and as such is incorrectly identified as a code block.

This seems to imply that literate code blocks should be not only indented, but also delimited from overhead markdown by at least one immediately preceding empty line. As much is suggested by example, and by the syntax highlighting rules in the tmbundle; however, it is not presently stipulated by the implementation.
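A minimal sketch of that misidentification, in Python so it can be run standalone. The helper name `naive_is_code` and the sample lines are made up for illustration; only the pattern comes from this issue, and it is assumed to be applied at the start of each line.

```python
import re

# The per-line test quoted above: four spaces or a tab.
INDENTED = re.compile(r"[ ]{4}|\t")

def naive_is_code(line):
    """Hypothetical helper: classify one line in isolation, ignoring its neighbours."""
    return INDENTED.match(line) is not None  # match() anchors at the line start

sample = [
    "* a list item whose text wraps onto",
    "    an indented continuation line",
    "",
    "    square = (x) -> x * x",
]

flags = [naive_is_code(line) for line in sample]
```

The second line is flagged as code even though it is prose, because the test never looks at whether the preceding line was blank.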
One approach for a fix might be to expand Lexer::clean a bit, allowing the markdown passages to be read in blocks, rather than individual lines:

With this in place, existing tests appear to pass, as do ad-hoc cases similar to the example above that previously would fail.
Please offer thoughts on whether this might be an approach worth evolving into a pull.