fix(post): use non-greedy regular expressions #4161

stevenjoezhang · 2020-02-26T15:19:03Z

What does it do?

Currently, the regular expression rSwigFullBlock used in the escapeAllSwigTags method in lib/hexo/post.js may match the wrong span for a Swig/Nunjucks block. For example, suppose a user writes the following in a Markdown document:

{% note danger %}note text, note text, note text{% endnote %}

## Title

{% note danger %}note text, note text, note text{% endnote %}

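This regular expression then matches all three lines as a single block, from {% note danger %} on the first line to the {% endnote %} on the third line. The greedy .* in the opening-tag part of the pattern runs to the last %} on the first line (the one closing {% endnote %}), so the closing tag has to be found further down. As a result, ## Title on the second line is escaped before Markdown rendering and only processed by Nunjucks, so it appears as ## Title instead of <h2>Title</h2> in the output HTML.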

With a non-greedy regular expression, the first and third lines are matched separately, which gives the correct result.
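A minimal standalone check of the difference, runnable in Node.js. OLD_FULL_BLOCK is the pattern from lib/hexo/post.js at a97bd2f; NEW_FULL_BLOCK is an assumption about the fix (only the .* inside the opening tag made lazy as .+?), since the exact committed pattern is not reproduced in this thread.

```javascript
// rSwigFullBlock as it appears in lib/hexo/post.js before this PR (a97bd2f)
const OLD_FULL_BLOCK = /\{% *(.+?)(?: *| +.*)%\}[\s\S]+?\{% *end\1 *%\}/g;

// Assumed non-greedy variant: only the ".*" inside the opening tag becomes ".+?"
const NEW_FULL_BLOCK = /\{% *(.+?)(?: *| +.+?)%\}[\s\S]+?\{% *end\1 *%\}/g;

// The Markdown example from the PR description
const markdown = [
  '{% note danger %}note text, note text, note text{% endnote %}',
  '',
  '## Title',
  '',
  '{% note danger %}note text, note text, note text{% endnote %}'
].join('\n');

// Greedy ".*": the opening tag runs to the last "%}" on line 1, so the lazy
// body has to continue until the {% endnote %} on the last line.
const oldMatches = markdown.match(OLD_FULL_BLOCK);

// Lazy ".+?": each opening tag stops at its own "%}" and each block ends at
// the nearest {% endnote %}, leaving "## Title" outside both matches.
const newMatches = markdown.match(NEW_FULL_BLOCK);

console.log(oldMatches.length, oldMatches[0].includes('## Title')); // 1 true
console.log(newMatches.length, newMatches.some(m => m.includes('## Title'))); // 2 false
```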

I'm not an expert in regular expressions, so I can't guarantee that this change won't cause other bugs. Suggestions are welcome, thank you!

Related issues:
iissnan/hexo-theme-next#1674
iissnan/hexo-theme-next#1678
iissnan/theme-next-docs#163
theme-next/hexo-theme-next#839

How to test

git clone -b stevenjoezhang-patch-1 https://github.com/stevenjoezhang/hexo.git
cd hexo
npm install
npm test

Screenshots

Pull request tasks

Add test cases for the changes.
Passed the CI test.

SukkaW · 2020-02-27T05:37:41Z

Since this PR fixes an issue but you are not sure whether your regexp actually solves it, why not add a related test case?

https://github.com/hexojs/hexo/blob/master/test/scripts/hexo/post.js

SukkaW · 2020-02-27T05:46:45Z

  // test for PR #4161
  it('render() - adjacent tags', () => {
    const content = [
      '{% quote %}',
      'content1',
      '{% endquote %}',
      '{% quote %}',
      'content2',
      '{% endquote %}'
    ].join('\n');

    return post.render(null, {
      content,
      engine: 'swig'
    }).then(data => {
      data.content.trim().should.eql([
        '<blockquote><p>content1</p>\n</blockquote>\n',
        '<blockquote><p>content2</p>\n</blockquote>\n',
      ].join(''));
    });
  });

The test case could be designed like this.

SukkaW

Please add a related test case.

stevenjoezhang · 2020-02-27T05:53:59Z

Thanks for the reminder. In fact, when Tommy351 wrote the relevant code five years ago, the test cases were incomplete: 683fd0a

More interestingly, for nested tags with the same name, the original regular expression does not actually match the last {% end %} tag. The intermediate results produced by Hexo are confusing, but this does not prevent Nunjucks from rendering the correct result.

https://www.regextester.com/15

/\{% *(.+?)(?: *| +.*)%\}[\s\S]+?\{% *end\1 *%\}/g

{% note danger %}
note text, note text, note text

{% note danger %}
note text, note text, note text
{% endnote %}

{% endnote %}

I am still reading the source code to understand how the Nunjucks preprocessing works. I'd appreciate any help.

SukkaW · 2020-02-27T06:16:00Z

@stevenjoezhang

I noticed this regexp is only used to escape content:

https://github.com/hexojs/hexo/blob/master/lib/hexo/post.js#L48-L52

The whole post rendering process looks like this:

1. Execute the before_post_render filter (hexo/lib/hexo/post.js, line 253 in a97bd2f):

   return promise.then(content => {

   Hexo's built-in backtick code filter is executed at this point.

2. Escape swig tags into placeholders (hexo/lib/hexo/post.js, line 259 in a97bd2f):

   data.content = cacheObj.escapeContent(data.content);

3. Render the post with the proper engine. Since the swig tags are escaped, no extra <p> tags are added around them.

4. After rendering is finished, restore the escaped swig tags by index (hexo/lib/hexo/post.js, lines 282 to 283 in a97bd2f):

   // Replace cache data with real contents
   data.content = cacheObj.loadContent(content);

5. Render the swig tags (hexo/lib/hexo/post.js, lines 288 to 289 in a97bd2f):

   // Render with Nunjucks
   return tag.render(data.content, data);
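The steps above can be modelled in a few lines to show why an over-long match hides ## Title from the Markdown renderer. This is a heavily simplified sketch, not Hexo's implementation: escapeBlocks, renderMarkdown (which only understands ## headings), loadContent and the placeholder format are hypothetical stand-ins for cacheObj.escapeContent, the real renderer and cacheObj.loadContent; the final Nunjucks step is left out, and lazyBlock assumes the fix makes the .* in the opening tag lazy.

```javascript
// Hypothetical placeholder character, loosely modelled on the <!-- \uFFFC N --> strings in this thread
const PLACEHOLDER = '\uFFFC';

// Pattern at a97bd2f, and an assumed lazy variant of it
const greedyBlock = /\{% *(.+?)(?: *| +.*)%\}[\s\S]+?\{% *end\1 *%\}/g;
const lazyBlock = /\{% *(.+?)(?: *| +.+?)%\}[\s\S]+?\{% *end\1 *%\}/g;

// Step 2 (simplified): swap each full tag block for an indexed placeholder
function escapeBlocks(str, regex, cache) {
  return str.replace(regex, block => `<!-- ${PLACEHOLDER} ${cache.push(block) - 1} -->`);
}

// Step 3 (toy renderer): only converts "## Heading" lines into <h2> elements
function renderMarkdown(str) {
  return str.replace(/^## (.+)$/gm, '<h2>$1</h2>');
}

// Step 4 (simplified): put the cached blocks back by index
function loadContent(str, cache) {
  return str.replace(/<!-- \uFFFC (\d+) -->/g, (_, index) => cache[Number(index)]);
}

function renderPost(content, regex) {
  const cache = [];
  return loadContent(renderMarkdown(escapeBlocks(content, regex, cache)), cache);
}

const post = [
  '{% note danger %}note text{% endnote %}',
  '',
  '## Title',
  '',
  '{% note danger %}note text{% endnote %}'
].join('\n');

const before = renderPost(post, greedyBlock);
const after = renderPost(post, lazyBlock);
console.log(before.includes('## Title'));      // true: the heading never reached the renderer
console.log(after.includes('<h2>Title</h2>')); // true: the heading was rendered
```

With the greedy pattern the whole post becomes one placeholder, so the renderer sees nothing to convert and the raw ## Title line is restored untouched.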

stevenjoezhang · 2020-02-27T06:36:45Z

@SukkaW Thank you for the explanation. The problem is in the second step:

Escape swig tags into placeholders (hexo/lib/hexo/post.js, lines 42 to 53 in a97bd2f):

    
escapeAllSwigTags(str) {
  const rSwigVar = /\{\{[\s\S]*?\}\}/g;
  const rSwigComment = /\{#[\s\S]*?#\}/g;
  const rSwigBlock = /\{%[\s\S]*?%\}/g;
  const rSwigFullBlock = /\{% *(.+?)(?: *| +.*)%\}[\s\S]+?\{% *end\1 *%\}/g;
  const escape = _str => _escapeContent(this.cache, _str);
  return str.replace(rSwigFullBlock, escape)
    .replace(rSwigBlock, escape)
    .replace(rSwigComment, '')
    .replace(rSwigVar, escape);
}

As I said earlier, rSwigFullBlock does not match the last {% end %} tag. This means that after the following line executes (hexo/lib/hexo/post.js, line 49 in a97bd2f):

return str.replace(rSwigFullBlock, escape)

the input

{% note danger %}
note text, note text, note text

{% note danger %}
note text, note text, note text
{% endnote %}

{% endnote %}

becomes

<!-- \uFFFC 0 -->

{% endnote %}

The remaining {% endnote %} is then replaced by the next line (hexo/lib/hexo/post.js, line 50 in a97bd2f):

.replace(rSwigBlock, escape)

The result is:

<!-- \uFFFC 0 -->

<!-- \uFFFC 1 -->
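The two-stage result above can be reproduced outside Hexo with a standalone copy of escapeAllSwigTags. The regular expressions are copied from the snippet at a97bd2f; the cache array and the placeholder string built by escape are simplified, hypothetical stand-ins for this.cache and _escapeContent.

```javascript
// Standalone copy of escapeAllSwigTags (a97bd2f). The cache array and the
// placeholder format are hypothetical simplifications of Hexo's internals.
function escapeAllSwigTags(str, cache) {
  const rSwigVar = /\{\{[\s\S]*?\}\}/g;
  const rSwigComment = /\{#[\s\S]*?#\}/g;
  const rSwigBlock = /\{%[\s\S]*?%\}/g;
  const rSwigFullBlock = /\{% *(.+?)(?: *| +.*)%\}[\s\S]+?\{% *end\1 *%\}/g;
  const escape = _str => `<!-- \uFFFC ${cache.push(_str) - 1} -->`;
  return str.replace(rSwigFullBlock, escape)
    .replace(rSwigBlock, escape)
    .replace(rSwigComment, '')
    .replace(rSwigVar, escape);
}

const nested = [
  '{% note danger %}',
  'note text, note text, note text',
  '',
  '{% note danger %}',
  'note text, note text, note text',
  '{% endnote %}',
  '',
  '{% endnote %}'
].join('\n');

const cache = [];
const escaped = escapeAllSwigTags(nested, cache);

// rSwigFullBlock stops at the first {% endnote %} (the inner one), so the
// outer closing tag is left over and picked up by rSwigBlock as entry 1.
console.log(escaped);
console.log(cache.length); // 2
console.log(cache[1]);     // {% endnote %}
```

Since loadContent later restores both entries in order, the text handed to Nunjucks is the original nested markup, which is why the final output is still correct.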

Of course, this is not a bug; it's just a bit confusing.

The test cases have been updated.

SukkaW · 2020-02-27T12:29:13Z

Of course, this is not a bug; it's just a bit confusing.

https://runkit.com/sukkaw/5e57b5ead3f4440013a77529

I have set up a demo to show how it works. I believe this PR is OK to merge, then.

fix(post): use non-greedy regular expressions

7c9d48a

SukkaW requested changes Feb 27, 2020


add related test cases

4961785

SukkaW approved these changes Feb 27, 2020


SukkaW merged commit a9fb3fd into hexojs:master Feb 27, 2020

SukkaW mentioned this pull request Feb 27, 2020

test(post): fix cases added in #4161 #4162

Merged


SukkaW added a commit to SukkaW/hexo that referenced this pull request Feb 27, 2020

test(post): fix cases added in hexojs#4161

fe45e21

SukkaW added a commit that referenced this pull request Feb 27, 2020

test(post): fix cases added in #4161 (#4162)

2717d09

This was referenced Mar 4, 2020

{% %} should not be rendered in a post #3346

Closed

refactor(post/tag): render tag before content #4171

Closed

SukkaW mentioned this pull request May 18, 2020

if set hljs to true in hexo config, it will cause some Tag Plugins no rendering #4317

Closed


This was referenced Jun 15, 2020

fix(#4317): non-greedy regexp for tag escape #4358

Merged

test(#4087): add related cases #4364

Merged

SukkaW mentioned this pull request Jul 25, 2020

release: 5.0.0 #4423

Merged
