[DOS vulnerability fix] Limit how many DOM nodes get iterated upon #172
Why do you not add a preprocessing step that reduces the amount of HTML DOM elements given to |
Hi! This would mean preprocessing twice, since a full DOM parse would be required in order to count or limit DOM elements. It's already been done in I'm eager to make this optional and user-configurable if needed. |
This should definitely be done in |
I'm now looking at this PR for merging, but find it rather crude. My concerns with these changes:
What I see as a possible solution:
var DEFAULT_OPTIONS = {
  // ...existing options
  limits: {
    maxBaseElements: undefined, // undefined | number
    maxChildElements: undefined, // everything opt-in?
    maxTotalElements: undefined, // advice for good values can be put into docs
    ellipsis: '...' // string | undefined (if set, insert this as a block whenever content is skipped)
  },
};
I also wonder whether there is a meaningful way to move the logic to an extension point and avoid adding a lot of rigid options to html-to-text itself. But that might just overcomplicate things instead.
@valeriansaliou, @baptistejamin are you still on board with this? In addition to not breaking existing tests, this PR also requires some tests of its own.
Note: while this is important to prevent crashes, it is still a crude solution. It would be better to know the common causes and see whether they can be handled properly, before resorting to this. |
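As a rough illustration of how limits like these could behave, here is a minimal sketch that walks a simplified tree of plain objects rather than a real parsed DOM. The collectText function, the node shape ({ text, children }), and the way the ellipsis is inserted are all assumptions made for this example; this is not the code from this PR or from html-to-text.

```javascript
// Hypothetical sketch. Nodes are plain objects: { text?: string, children?: node[] }.
// Option names follow the proposal above; undefined means "no limit".
function collectText(roots, limits = {}) {
  const {
    maxBaseElements = Infinity,
    maxChildElements = Infinity,
    maxTotalElements = Infinity,
    ellipsis
  } = limits;
  const out = [];
  let visited = 0;

  // Mark skipped content once per run of skips, if an ellipsis is configured.
  function markSkipped() {
    if (ellipsis !== undefined && out[out.length - 1] !== ellipsis) {
      out.push(ellipsis);
    }
  }

  // maxHere is the sibling limit for this level (base level or child level).
  function walk(nodes, maxHere) {
    for (let i = 0; i < nodes.length; i++) {
      if (i >= maxHere || visited >= maxTotalElements) {
        markSkipped();
        return; // stop iterating this level entirely
      }
      visited++;
      const node = nodes[i];
      if (node.text !== undefined) out.push(node.text);
      if (node.children) walk(node.children, maxChildElements);
    }
  }

  walk(roots, maxBaseElements);
  return out;
}

// Usage: cap each element at two children and mark what was dropped.
const tree = [
  { text: 'a', children: [{ text: 'a1' }, { text: 'a2' }, { text: 'a3' }] },
  { text: 'b' }
];
console.log(collectText(tree, { maxChildElements: 2, ellipsis: '...' }));
```

Keeping every limit opt-in (undefined by default) means existing callers see no change in output, which is one way to avoid breaking existing tests.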
Oops. It wasn't my plan to merge this in the current state, but oh well. It will be easier for me to work from here. Current plan is to rework the settings as I've described above, and also make some other improvements to better handle some cases when deep nesting can happen when it shouldn't. |
And the rework is done.
BUT! Neither the original proposal nor my rework limits the extra work completely, because the whole input is still processed by htmlparser2. There is still a notable performance benefit on very long inputs though. (By my estimate, a file 7 times larger can be processed in about the same time if we throw away most of it.) Known issues: I want the options to be grouped, but that also raises the issue of merging options properly. Before I will address that, |
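To make the options-merge issue concrete: with grouped options, a plain shallow merge lets a user-supplied limits object replace the whole default group, so any key the user did not specify loses its default. The sketch below contrasts that with a merge that goes one level deeper. DEFAULTS, mergeShallow, mergeGrouped and the placeholder values are hypothetical names for this example, not the library's actual merge logic.

```javascript
// Hypothetical defaults; values are placeholders for illustration only.
const DEFAULTS = {
  wordwrap: 80,
  limits: {
    maxChildElements: undefined,
    maxTotalElements: undefined,
    ellipsis: '...'
  }
};

function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Shallow merge: a user-supplied `limits` replaces the default group wholesale.
function mergeShallow(defaults, options = {}) {
  return Object.assign({}, defaults, options);
}

// Grouped merge: plain-object groups are merged key by key,
// so unspecified keys inside a group keep their defaults.
function mergeGrouped(defaults, options = {}) {
  const merged = Object.assign({}, defaults, options);
  for (const key of Object.keys(defaults)) {
    if (isPlainObject(defaults[key]) && isPlainObject(options[key])) {
      merged[key] = Object.assign({}, defaults[key], options[key]);
    }
  }
  return merged;
}

const userOptions = { limits: { maxTotalElements: 1000 } };
console.log(mergeShallow(DEFAULTS, userOptions).limits.ellipsis); // undefined: default lost
console.log(mergeGrouped(DEFAULTS, userOptions).limits.ellipsis); // '...': default kept
```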
Hello @KillyMXI, sorry for the late answer; we're still on board with this and actively using our fork. We will definitely move back to using NPM's For DOS safety's sake, I'd enforce default values that are way-too-high to affect anyone using the library (instead of using |
I can argue this doesn't solve the vulnerability, just makes it a bit harder (see my previous comment). I don't want to force an incomplete solution and create a false sense of security. And I suppose different hardware can choke at vastly different input sizes, so I can't make assumptions on where "way-too-high" is. Solid solutions would be:
|
Version 6 with all the changes is now released. I decided to include the input string length limit and set it to some big value. |
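An input string length limit is the one guard that takes effect before any parsing happens. A minimal sketch of the idea follows; clampInput, the maxInputLength parameter name, and the default value used here are assumptions for illustration, not necessarily what the release ships.

```javascript
// Hypothetical helper: truncate oversized input before it reaches the parser.
// The default below is an arbitrary "big value" placeholder.
function clampInput(html, maxInputLength = 1 << 24) {
  if (typeof html !== 'string') {
    throw new TypeError('html must be a string');
  }
  if (html.length <= maxInputLength) {
    return { html, truncated: false };
  }
  // Cutting mid-tag leaves malformed markup, so the parser that consumes
  // the result has to tolerate that.
  return { html: html.slice(0, maxInputLength), truncated: true };
}

// Usage: report truncation so the caller can log or warn.
const result = clampInput('<p>' + 'x'.repeat(50) + '</p>', 20);
console.log(result.truncated, result.html.length);
```

Unlike the element-count limits, this bounds the work done by the parser itself, at the cost of silently dropping the tail of very large documents unless the caller checks the truncated flag.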
We're using node-html-to-text at scale in production for Crisp (https://crisp.chat/); thanks for the work!
Our inbound email system was put offline a few days ago due to node-html-to-text being vulnerable to attacks where the parsed HTML would be either super-deep, or have a LOT of DOM elements, or both.
We noticed via a NodeJS --prof check that the walk() method was being called a huge number of times. This can be considered a DOS vulnerability, and was fixed in our fork with hard-coded limits (that are high).
After deploying the patch in this PR in production, we've not seen any more load / downtime issues.
Here's how we fixed it:
… (MAX_BASE set to 10);
… (MAX_DOM set to 200);
… walk() method (MAX_ELEMENTS set to 2000);
I'm looking forward to getting this PR merged into the NPMJS package 😉