Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(open_graph): drop cheerio and use regex #3680

Merged
merged 5 commits into from Aug 22, 2019

Conversation

@SukkaW
Copy link
Member

SukkaW commented Aug 19, 2019

What does it do?

cheerio is slow and we only use that to parse and list img in post.content. This PR drop cheerio and use regex to do the job.

How to test

git clone -b BRANCH https://github.com/USER/hexo.git
cd hexo
npm install
npm test

Screenshots

913f5be

NodeJS 8 NodeJS 10 NodeJS 12
cheerio@0.22 (hexojs/hexo#master) 7.12 6.96 6.86
regex (This PR) 7.15 6.96 6.78
dom-parser (#3679) 7.2 6.93 6.82
conditional cheerio (#3670) 7.15 6.92 6.78

4f86dde

NodeJS 8 NodeJS 10 NodeJS 12
cheerio@0.22 (hexojs/hexo#master) 7.03 6.59 6.51
regex (This PR) 7.01 6.57 6.45
dom-parser (#3679) 6.98 6.56 6.41
conditional cheerio (#3670) 7.00 6.60 6.38

5c309eb

NodeJS 8 NodeJS 10 NodeJS 12
cheerio@0.22 (hexojs/hexo#master) 6.96 6.65 6.85
regex (This PR) 6.90 6.68 6.73
dom-parser (#3679) 6.88 6.67 6.73
conditional cheerio (#3670) 6.72 6.60 6.80

c11ded4

NodeJS 8 NodeJS 10 NodeJS 12
cheerio@0.22 (hexojs/hexo#master) 7.29 6.54 6.93
regex (This PR) 7.21 6.47 6.84
dom-parser (#3679) 7.16 6.50 6.87
conditional cheerio (#3670) 7.06 6.50 6.81

Pull request tasks

  • Add test cases for the changes.
  • Passed the CI test.
@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 19, 2019

Coverage Status

Coverage increased (+0.0008%) to 97.155% when pulling 3e0b0dc on SukkaW:og-img-regex into 49c7d4a on hexojs:master.

@SukkaW

This comment has been minimized.

Copy link
Member Author

SukkaW commented Aug 19, 2019

For regex, I have test those pattern with regex101 to match this post:

/<img.*?src="(.*?)".*?\/?>/gi

6491 steps, 9 matches, ~3ms: https://regex101.com/r/N1Rv9w/1

/<img.*?(?:>|\/>)/gi

8858 steps, 9 matches, ~5ms: https://regex101.com/r/N1Rv9w/2

/<img(.*?)>/gi

6527 steps, 9 matches, ~4ms: https://regex101.com/r/N1Rv9w/3

/<img.*?>/gi

4313 steps, 9 matched, ~3ms: https://regex101.com/r/N1Rv9w/4

/<img[^>]*src[="'\s]+[^.]*\/([^.]+)\.[^"']+["']?[^>]*>/gi

4627 steps, 9 matched, ~4ms: https://regex101.com/r/N1Rv9w/5

Currently in this PR I am using the first one.

Update

After 4f86dde I am now using the forth one.

Update

/<img [^>]*src=['"]([^'"]+)([^>]*>)/gi

4573 steps, 9 matched, ~4ms: https://regex101.com/r/N1Rv9w/7

@SukkaW

This comment has been minimized.

Copy link
Member Author

SukkaW commented Aug 19, 2019

I think we still need to dig into those PR: #3679 #3670 and this PR. They all increase performance, but not much.

Copy link
Contributor

segayuu left a comment

LGTM!

@curbengh

This comment has been minimized.

Copy link
Contributor

curbengh commented Aug 21, 2019

Since this PR only uses native API, naturally it supersedes #3679 #3670. This is unless there is possibility of some edge cases, which I currently don't foresee.

@SukkaW

This comment has been minimized.

Copy link
Member Author

SukkaW commented Aug 21, 2019

@segayuu @curbengh Should I add some test cases for <img> tags with empty src attribute?

@SukkaW SukkaW dismissed stale reviews from curbengh and segayuu via 3e0b0dc Aug 21, 2019
@SukkaW SukkaW requested review from curbengh and segayuu Aug 21, 2019
@curbengh curbengh referenced this pull request Aug 22, 2019
1 of 2 tasks complete
@SukkaW

This comment has been minimized.

Copy link
Member Author

SukkaW commented Aug 22, 2019

I have update the final benchmark for this PR & #3670

NodeJS 8 NodeJS 10 NodeJS 12
cheerio@0.22 (hexojs/hexo#master) 8.75 7.0 7.82
regex (This PR) 6.51 5.52 6.44
conditional cheerio (#3670) 7.68 6.49 7.51

The last benchmark shows this PR increase performance by nearly 20%.

I will merge this PR and close #3670 then.

@SukkaW SukkaW merged commit 6f16116 into hexojs:master Aug 22, 2019
4 checks passed
4 checks passed
codeclimate All good!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.0008%) to 97.155%
Details
@curbengh curbengh referenced this pull request Aug 22, 2019
3 of 4 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.