Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(post): use non-greedy regular expressions #4161

Merged
merged 2 commits into from
Feb 27, 2020
Merged

fix(post): use non-greedy regular expressions #4161

merged 2 commits into from
Feb 27, 2020

Conversation

stevenjoezhang
Copy link
Member

@stevenjoezhang stevenjoezhang commented Feb 26, 2020

What does it do?

Currently, the regular expression rSwigFullBlock used in the escapeAllSwigTags method in lib/hexo/post.js may incorrectly match the Swig / Nunjucks block. For example, the user writes the following in a markdown document

{% note danger %}note text, note text, note text{% endnote %}

## Title

{% note danger %}note text, note text, note text{% endnote %}

Then, this regular expression will match all three lines, starting from {% note danger %} in the first line to the ending {% endnote %} in the third line, which means that ## Title in the second line is only processed by Nunjucks but not by the Markdown renderer, so it appears as ## Title instead of <h2>Title</h2> in the output HTML.

截屏2020-02-27上午10 46 09

By using non-greedy regular expression, the first and third line will be matched separately to get the correct results.

I'm not an expert in regular expressions, so I can't guarantee whether this change will cause other bugs. Suggestions are welcome, thank you!

Related issues:
iissnan/hexo-theme-next#1674
iissnan/hexo-theme-next#1678
iissnan/theme-next-docs#163
theme-next/hexo-theme-next#839

How to test

git clone -b stevenjoezhang-patch-1 https://github.com/stevenjoezhang/hexo.git
cd hexo
npm install
npm test

Screenshots

Pull request tasks

  • Add test cases for the changes.
  • Passed the CI test.

@SukkaW
Copy link
Member

SukkaW commented Feb 27, 2020

Since this PR has fixed a issue but you are not sure whether your regexp will solve that issue or not, why not adding a related test case?

https://github.com/hexojs/hexo/blob/master/test/scripts/hexo/post.js

@SukkaW
Copy link
Member

SukkaW commented Feb 27, 2020

  // test for PR #4161
  it('render() - adjacent tags', () => {
    const content = [
      '{% quote %}',
      'content1',
      '{% endquote %}',
      '{% quote %}',
      'content2',
      '{% endquote %}'
    ].join('\n');

    return post.render(null, {
      content,
      engine: 'swig'
    }).then(data => {
      data.content.trim().should.eql([
        '<blockquote><p>content1</p>\n</blockquote>\n',
        '<blockquote><p>content2</p>\n</blockquote>\n',
      ].join(''));
    });
  });

Test case could be designed like this.

Copy link
Member

@SukkaW SukkaW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add related test case.

@stevenjoezhang
Copy link
Member Author

stevenjoezhang commented Feb 27, 2020

Thanks for reminding. In fact, when Tommy351 created the relevant code five years ago, the test cases given were incomplete: 683fd0a

What's more interesting is that for multiple nested tags with the same name, the regular expression before modification does not actually match the last {% end %} tag. The intermediate results rendered by Hexo are confusing, but this does not affect nunjucks rendering correct results.

https://www.regextester.com/15

/\{% *(.+?)(?: *| +.*)%\}[\s\S]+?\{% *end\1 *%\}/g
{% note danger %}
note text, note text, note text

{% note danger %}
note text, note text, note text
{% endnote %}

{% endnote %}

截屏2020-02-27上午12 46 02

I am still reading the source code to determine how Nunjucks preprocessing works. I'd appreciate it if you would help.

@SukkaW
Copy link
Member

SukkaW commented Feb 27, 2020

@stevenjoezhang

I noticed this regexp is only used to escape content:

https://github.com/hexojs/hexo/blob/master/lib/hexo/post.js#L48-L52

The whole post render process are looking like this:

  1. Exec before_post_render filter

return promise.then(content => {

Hexo has built-in backtick code filter which will be executed at this time.

  1. Escape swig tags into <!--\uFFFC[index]-->

data.content = cacheObj.escapeContent(data.content);

  1. Render post with proper engine. Since swig tag is escaped, there will be no extra <p> tags added.

  2. After render is finished, restore escaped swig tag based on index:

hexo/lib/hexo/post.js

Lines 282 to 283 in a97bd2f

// Replace cache data with real contents
data.content = cacheObj.loadContent(content);

  1. swig tag to be rendered:

hexo/lib/hexo/post.js

Lines 288 to 289 in a97bd2f

// Render with Nunjucks
return tag.render(data.content, data);

@stevenjoezhang
Copy link
Member Author

stevenjoezhang commented Feb 27, 2020

@SukkaW Thank you for the explanation. The problem is in the second step

Escape swig tags into <!--\uFFFC[index]-->

hexo/lib/hexo/post.js

Lines 42 to 53 in a97bd2f

escapeAllSwigTags(str) {
const rSwigVar = /\{\{[\s\S]*?\}\}/g;
const rSwigComment = /\{#[\s\S]*?#\}/g;
const rSwigBlock = /\{%[\s\S]*?%\}/g;
const rSwigFullBlock = /\{% *(.+?)(?: *| +.*)%\}[\s\S]+?\{% *end\1 *%\}/g;
const escape = _str => _escapeContent(this.cache, _str);
return str.replace(rSwigFullBlock, escape)
.replace(rSwigBlock, escape)
.replace(rSwigComment, '')
.replace(rSwigVar, escape);
}

As I said earlier, rSwigFullBlock does not match the last {% end %} tag, which means that after executing str.replace(rSwigFullBlock, escape),

return str.replace(rSwigFullBlock, escape)

{% note danger %}
note text, note text, note text

{% note danger %}
note text, note text, note text
{% endnote %}

{% endnote %}

becomes

<!-- \uFFFC 0 -->

{% endnote %}

{% endnote %} is actually replaced by the next line

.replace(rSwigBlock, escape)

And it becomes

<!-- \uFFFC 0 -->

<!-- \uFFFC 1 -->

Of course, this is not a bug, it's just a bit confusing.


The test cases have been updated.

@SukkaW
Copy link
Member

SukkaW commented Feb 27, 2020

Of course, this is not a bug, it's just a bit confusing.

https://runkit.com/sukkaw/5e57b5ead3f4440013a77529

I have set up a demo to show how it works. I believe it is ok to merge this PR then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants