Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Token type paragraph becomes text in master #1247

Closed
vsemozhetbyt opened this issue May 2, 2018 · 4 comments
Closed

Token type paragraph becomes text in master #1247

vsemozhetbyt opened this issue May 2, 2018 · 4 comments
Labels
L1 - broken Valid usage causes incorrect output OR a crash AND there is no known workaround for the issue

Comments

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented May 2, 2018

Marked version: 0.3.19 master

Markdown flavor: CommonMark ?

Test Script

'use strict';

const marked = require('marked');

const md = `
A Paragraph.

> A blockquote

`;

const tokens = marked.lexer(md);
console.log(tokens);

const html = marked.parser(tokens);
console.log(`\n${html}`);

Expectation

The output from the old version 0.3.6

[ { type: 'paragraph', text: 'A Paragraph.' },
  { type: 'blockquote_start' },
  { type: 'paragraph', text: 'A blockquote' },
  { type: 'blockquote_end' },
  links: {} ]

<p>A Paragraph.</p>
<blockquote>
<p>A blockquote</p>
</blockquote>

Result

The output from the 0.3.19 master:

[ { type: 'text', text: 'A Paragraph.' },
  { type: 'space' },
  { type: 'blockquote_start' },
  { type: 'text', text: 'A blockquote' },
  { type: 'blockquote_end' },
  links: {} ]

<p>A Paragraph.</p>
<blockquote>
<p>A blockquote</p>
</blockquote>

Is this cange intended?

@UziTech
Copy link
Member

UziTech commented May 2, 2018

v0.3.19 actually outputs the correct type, it is master that outputs 'text' instead of 'paragraph'

looks like the culprit is d08039e#diff-81ab0a5aa39b7a91951fc28d2d151496R36

image

Just adding that question mark back seems to fix this issue. I'm not sure if it introduces new issues.

@Feder1co5oave was that a mistake or is there a reason the question mark was removed?

looks like all tests pass when I just add the question mark back

@vsemozhetbyt vsemozhetbyt changed the title Token type paragraph becomes text in last versions Token type paragraph becomes text in master May 2, 2018
@UziTech UziTech mentioned this issue May 2, 2018
4 tasks
@UziTech
Copy link
Member

UziTech commented May 2, 2018

I added a PR that adds back the question mark and adds a test for this issue #1248

@UziTech UziTech added the L1 - broken Valid usage causes incorrect output OR a crash AND there is no known workaround for the issue label May 2, 2018
@UziTech
Copy link
Member

UziTech commented May 2, 2018

@vsemozhetbyt thanks for reporting this 💯

@Feder1co5oave
Copy link
Contributor

@UziTech it made sense to me at the time, because the types of token matched by the negative lookahead that follows (these are the ones capable of interrupting paragraphs) should be on a new line anyway, but it was a naive change.

The matching of one \n is implicit from the leading [^\n]+ that eats up everything up until the first newline. So in theory the ? should be stripped away, but in practice you cannot, because of the + quantifier at the end of that group.

paragraph: /^([^\n]+(?:\n(?!hr|heading|lheading| {0,3}>|<\/?(?:tag)(?: +|\\n|\/?>)|<(?:script|pre|style|!--))[^\n]+)+)/,
//                                                                                                             this ^

In the case of a one-line paragraph like in the reported example, the required newline (followed by some other [^\n]+ text on a new line) cannot be found.
So you either put back the ? or you change the final + to a *, the grammar should be the same (the latter makes more sense to me at the moment).

Also, I just noticed that the \\n after (?:tag) should really be a \n, it was a copy&paste mishap from the html rule above.

@UziTech UziTech mentioned this issue May 2, 2018
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
L1 - broken Valid usage causes incorrect output OR a crash AND there is no known workaround for the issue
Projects
None yet
Development

No branches or pull requests

3 participants