fix: Speed up stripping of markdown #2097

Closed
wants to merge 2 commits into from

@marksteve (Contributor) commented Apr 29, 2021

We were seeing huge CPU spikes that stalled our Outline server for up to an hour whenever our wiki users searched for certain keywords. After some digging, I traced the high CPU usage to the search endpoint's call to removeMarkdown() when it renders the context for search results.

My fix replaces that package with remark and the strip-markdown plugin. Note that the plugin has no option to disable HTML stripping, and it has some quirks. I'm also not sure which HTML is meant to be whitelisted here.

@auto-assign auto-assign bot requested a review from tommoor April 29, 2021 04:58
@CLAassistant commented Apr 29, 2021

CLA assistant check
All committers have signed the CLA.

@tommoor (Member) commented Apr 29, 2021

This is a great find! Any idea whether it's triggered by certain keywords, and if so, which?

Edit: Seems like it's this – stiang/remove-markdown#35 – so a large number of spaces in a searched document would trigger it.
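
A hypothetical illustration of why a long run of spaces can stall a regex-based stripper (this is not the regex from remove-markdown, and `trimEndRegex` / `trimEndLinear` are made-up names): a greedy `\s+` anchored to `$` fails when the whitespace run is followed by a non-space character, so a backtracking engine gives the run back one character at a time and then retries from every later start position. The work grows roughly with the square of the run length, whereas a single backward scan does the same job in linear time.

```typescript
// Hypothetical example only; NOT the pattern used by remove-markdown.
// On "<n spaces>a", each start position inside the run matches all the
// remaining spaces, fails at `$`, and backtracks: roughly quadratic work.
function trimEndRegex(text: string): string {
  return text.replace(/\s+$/, "");
}

// Linear-time alternative: walk backwards once over trailing whitespace.
// Uses the same `\s` character class, so results match trimEndRegex.
function trimEndLinear(text: string): string {
  let end = text.length;
  while (end > 0 && /\s/.test(text.charAt(end - 1))) {
    end--;
  }
  return text.slice(0, end);
}

// Safe to run on a huge input: the last character is not whitespace,
// so the loop exits immediately and the input is returned unchanged.
const big = " ".repeat(100000) + "a";
console.log(trimEndLinear(big).length);
```

Calling `trimEndRegex(big)` instead can take seconds or longer, depending on the engine. In practice the built-in `String.prototype.trimEnd()` is also a linear-time way to do this particular job.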

@marksteve (Contributor, Author)

Ooh. I thought it was just because of big documents. Updating your fork would be a better fix!

@tommoor (Member) commented Apr 29, 2021

I'll pull in the fix from the other repo that hasn't been merged 🙄. You're right, I think less churn would be good here, and it lets us keep the stripHTML option. Regardless, just finding this is huge.

@marksteve (Contributor, Author)

And I guess the HTML that needs to be whitelisted is for emphasizing the matching terms?

@tommoor (Member) commented Apr 29, 2021

That's right: pg returns HTML tags for that, lol

@marksteve (Contributor, Author)

Got it! Closing this then. Thank you!

@tommoor (Member) commented Apr 29, 2021

By the way, I suppose this was also the cause of the timeouts you were seeing when searching from Slack on the cloud-hosted version.

@marksteve (Contributor, Author)

Yep! Most probably.

@marksteve marksteve deleted the fix/remove-markdown branch April 29, 2021 10:22
3 participants