Performance issue when converting HTML with many trailing space to editorState #17

Closed
js8310 opened this issue Jul 11, 2016 · 4 comments

Comments

@js8310

js8310 commented Jul 11, 2016

Performance issue when converting HTML to editorState (with stateFromHTML from draft-js-import-html)

Below is an example of some HTML containing many trailing spaces, which cause the parsing process to slow down. The time taken depends on how large the HTML is; it can take more than 1 minute for a fairly large block of HTML.

Also, I used performance.now() to measure the time taken when using stateFromHTML.
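For anyone who wants to reproduce the measurements, a timing harness along these lines should be enough. This is a minimal sketch: `timeRounds` is a hypothetical helper name, not part of draft-js-import-html, and the converter is passed in as a parameter so the snippet does not depend on any particular library being installed.

```javascript
// Hypothetical helper (not part of any library): runs `convert(html)`
// `rounds` times and returns the elapsed time of each round in milliseconds.
function timeRounds(convert, html, rounds = 3) {
  const timings = [];
  for (let i = 0; i < rounds; i++) {
    const start = performance.now();
    convert(html);
    timings.push(performance.now() - start);
  }
  return timings;
}

// Usage, assuming draft-js-import-html is installed:
// const { stateFromHTML } = require('draft-js-import-html');
// timeRounds(stateFromHTML, html).forEach((ms, i) =>
//   console.log(`round ${i + 1} - ${ms.toFixed(2)}ms`));
```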

"<em> Nemo develops a smaller right fin <\/em> as a result of damage to his egg during the attack, which limits his <em> swimming ability <\/em> . <br><br> Worried about Nemo's safety, Marlin embarrasses Nemo during a school field trip. <br><br> Nemo sneaks away from the reef and is <em> captured by scuba divers <\/em> . <br><br> As the boat departs, a diver accidentally knocks his <a href=\"https:\/\/en.wikipedia.org\/wiki\/Diving_mask\"> <em> diving mask <\/em> <\/a> overboard. <br><br> While attempting to save Nemo, Marlin meets Dory, a good-hearted and optimistic <a href=\"https:\/\/en.wikipedia.org\/wiki\/Paracanthurus\"> <em> regal blue tang with <\/em> <\/a> <em> short-term memory loss <\/em> . <br><br> Marlin and Dory meet three <a href=\"https:\/\/en.wikipedia.org\/wiki\/Shark\"> <em> sharks <\/em> <\/a> – <em> Bruce, Anchor and Chum <\/em> – who claim to be <em> vegetarians <\/em> ."

Time taken:
1st round - 443.50ms
2nd round - 481.62ms
3rd round - 442.65ms

With the trailing spaces removed, the same HTML converts without any problem, as the timings below show.

"<em>Nemo develops a smaller right fin <\/em>as a result of damage to his egg during the attack, which limits his <em>swimming ability <\/em>.<br>\n<br>\nWorried about Nemo's safety, Marlin embarrasses Nemo during a school field trip.<br>\n<br>\nNemo sneaks away from the reef and is <em>captured by scuba divers <\/em>.<br>\n<br>\nAs the boat departs, a diver accidentally knocks his <a href=\"https:\/\/en.wikipedia.org\/wiki\/Diving_mask\"><em>diving mask <\/em><\/a>overboard.<br>\n<br>\nWhile attempting to save Nemo, Marlin meets Dory, a good-hearted and optimistic <a href=\"https:\/\/en.wikipedia.org\/wiki\/Paracanthurus\"><em>regal blue tang with <\/em><\/a><em>short-term memory loss <\/em>.<br>\n<br>\nMarlin and Dory meet three <a href=\"https:\/\/en.wikipedia.org\/wiki\/Shark\"><em>sharks <\/em><\/a>\u2013 <em>Bruce, Anchor and Chum <\/em>\u2013 who claim to be <em>vegetarians <\/em>."

Time taken:
1st round - 63.91ms
2nd round - 71.46ms
3rd round - 65.11ms
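As a stopgap, the whitespace that follows each tag can be stripped before the string is handed to the converter, which is roughly how the second string above differs from the first. The sketch below is an assumption-laden illustration: `stripSpaceAfterTags` is a hypothetical name, the regex treats every `>` as the end of a tag (so a literal `>` in text content is affected too), and removing these spaces can change the rendered text where they were the only separator between words.

```javascript
// Hypothetical pre-processing step: removes whitespace that directly follows
// a '>' character. Lossy; intended only as a workaround sketch.
function stripSpaceAfterTags(html) {
  return html.replace(/>\s+/g, '>');
}

// Example input shaped like the HTML in this report.
const cleaned = stripSpaceAfterTags('<em> fin </em> as a result . <br><br> Worried');
console.log(cleaned);
```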

@sstur
Owner

sstur commented Jul 12, 2016

Wow, thanks for reporting this. I think I might have an idea where the bottleneck is, but I'll do some profiling and see which functions are taking the longest.

@richardriman

Hi, is there any progress on this? I probably have the same problem. We have some HTML that comes from email conversations, about 10k chars, and the browser freezes on stateFromHTML().

@richardriman

Confirmed. With the trailing spaces removed, the browser no longer freezes, but performance is still bad: parsing HTML from an email conversation (about 10k chars) takes approx. 420ms. For comparison, convertFromHTML from draft-js itself is lightning fast (<50ms).

@sstur
Owner

sstur commented Nov 6, 2016

I totally know why this is happening. It's related to trimTrailingSpace (and the similar functions trimLeadingSpace and collapseWhiteSpace are just as bad). There are performance issues with manipulating the characterMetaData associated with the text.

I think this needs to be refactored so we trim all the text and then manipulate the meta data only once.

I'll try to get a fix out soon.
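To illustrate the refactoring described above, here is a rough sketch of the difference between adjusting the metadata on every trimmed character and adjusting it once. It is not the library's code: the function names are hypothetical, and a plain array with one entry per character stands in for the real character metadata structure.

```javascript
// Assumed data model for illustration only: `text` is a string and `meta`
// is an array holding one metadata entry per character of `text`.

// Slow pattern: drops one trailing space per iteration and copies both the
// text and the metadata each time, so the work grows quadratically.
function trimTrailingNaive(text, meta) {
  while (text.length > 0 && text[text.length - 1] === ' ') {
    text = text.slice(0, -1);
    meta = meta.slice(0, -1);
  }
  return { text, meta };
}

// Suggested pattern: scan to find the cut point first, then slice the text
// and the metadata exactly once.
function trimTrailingOnce(text, meta) {
  let end = text.length;
  while (end > 0 && text[end - 1] === ' ') {
    end--;
  }
  return { text: text.slice(0, end), meta: meta.slice(0, end) };
}
```

Both functions return the same result; the second only touches the metadata a single time per call, regardless of how many spaces are removed.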


3 participants