Count collisions during HT insertion operations #1565
Conversation
Force-pushed from d2d008e to 5fdc23e
 * be counted for add_new operations. find+add_new operations should be replaced
 * with add_or_return operations.
 */
#define HT_MAX_COLLISIONS 1000
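For context, here is a minimal, self-contained sketch of what counting collisions during insertion can look like. This is illustrative C, not the actual zend_hash.c patch; all type and function names are invented for the example. The idea is to count how many buckets an insert traverses in one collision chain and bail out once the count passes HT_MAX_COLLISIONS, so benign inserts (short chains) pay essentially nothing:

```c
#include <stdlib.h>
#include <string.h>

#define HT_MAX_COLLISIONS 1000

typedef struct bucket {
    char *key;
    int val;
    struct bucket *next;          /* collision chain within one slot */
} bucket;

typedef struct {
    bucket **slots;
    size_t nslots;
} table;

/* DJBX33A-style string hash, the same family PHP uses for string keys */
static unsigned long hash_str(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert a key, counting traversed chain entries; returns -1 where the
 * real patch would raise a fatal error on a pathologically long chain. */
static int insert(table *t, const char *key, int val)
{
    bucket **slot = &t->slots[hash_str(key) % t->nslots];
    unsigned collisions = 0;

    for (bucket *p = *slot; p; p = p->next) {
        if (strcmp(p->key, key) == 0) {
            p->val = val;         /* key exists: update in place */
            return 0;
        }
        if (++collisions > HT_MAX_COLLISIONS)
            return -1;            /* too many collisions: bail out */
    }
    bucket *b = malloc(sizeof *b);
    if (!b) return -1;
    b->key = strdup(key);
    b->val = val;
    b->next = *slot;              /* prepend to the chain */
    *slot = b;
    return 0;
}

int main(void)
{
    bucket *slots[8] = { 0 };
    table t = { slots, 8 };
    return insert(&t, "key", 42) == 0 ? 0 : 1;
}
```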
What happens if a developer runs into this limit in an application without any malicious intention? Should this maybe be an ini setting?
Or is this case really absolutely hypothetical? (I don't have an in-depth understanding of the possibility of this happening outside of actual DoS attempts.)
Thank you!
I also think this should become an optional configuration. Hardcoding 1000 may mean that special needs cannot be met.
@mfn : it is.
An ini setting is bad, because relying on the user for such a crucial, easily misunderstood concept is bad.
We hardcode many internals-related things like this, e.g. GC_ROOT_BUFFER_MAX_ENTRIES.
Very nice :) About HT_MAX_COLLISIONS: running json_decode on a big file (1,000,000 entries, 46MB), the collision rate is ~50% (top level), but the maximum number of collisions per bucket is only 8, so very small compared to the proposed value of 1000. I also measured bailout time with different values, using https://github.com/bk2204/php-hash-dos.
So I think we don't really need a new ini option for this (and we can switch to a greater value, if wanted).
@remicollet I think there should be an ini option. Some applications deal with huge data, and there is a chance they will exceed the limit; adding a new option would be more acceptable.
@@ -0,0 +1,16 @@
--TEST--
This should go into ext/json/tests and have a SKIPIF section so the test is skipped when the json extension is not loaded.
Could we please define first what huge data is? Billions of keys? Just a theoretical exercise: say only alphanumeric symbols are used for keys, which is 62 symbols. Looking at the reproducer from bk2204's repo, 1,000,000 combinations of 11 symbols from that set are already enough to trigger a sufficient amount of collisions. The whole number of combinations (11-char-long keys) is n! / ((n - m)! * m!) = 508271323092. That is way more than the distinct hash values an unsigned can hold, even if they can be distributed across buckets.

So what I think is that we need some more practical evaluation, particularly of the limits of the current hash table implementation. Besides how much a HashTable can theoretically take, there are also bare hardware, OS, etc. limitations. For this reason I'd rather assume that the hardcoded value, having been evaluated and tested, is balanced enough without the need for an additional ini setting. There are two factors here: the patch is supposed to prevent attacks in the first place, but it is also supposed to give enough room for huge arrays in the normal case. From that POV, if some normal usage leads to too many collisions and that is known to make the whole application unusable, why allow users to shoot themselves in their own foot? At that point, whether it is billions of items that go through fine and one last million that causes a DoS because of too many collisions, does it really matter whether it was caused by the programmer or by the attacker?

Besides the actual reproduce case (combinations of 11 elements from a 62-element alphabet), I'm pretty much under the impression that in real life reaching that many collisions is much, much harder. The key length might vary much more, and the source symbol set as well (e.g. what if keys use some multibyte alphabet, like Chinese). So IMHO it is 99% about fixing the vulnerability, not about hurting normal usage.

My test on this patch currently uses the script below, basically running bk2204's approach; it has been running for hours already without hitting the bailout with the patch.
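(The actual test script is not reproduced here. As a rough, hypothetical illustration of the approach behind bk2204's php-hash-dos, and assuming PHP's DJBX33A string hash: the two-character strings "Ez" and "FY" hash identically, so concatenating N such pairs yields 2^N distinct keys that all share a single hash value. A sketch of such a key generator:)

```c
/* Hypothetical generator of DJBX33A-colliding keys: "Ez" and "FY" both
 * contribute 33*'E'+'z' == 33*'F'+'Y' == 2399 to the running hash, so
 * all 2^N concatenations of N such pairs collide with each other. */
#include <stdio.h>

int main(void)
{
    static const char *pair[2] = { "Ez", "FY" };
    enum { N = 4 };                         /* 2^4 = 16 colliding keys */

    for (unsigned m = 0; m < (1u << N); m++) {
        for (unsigned i = 0; i < N; i++)
            fputs(pair[(m >> i) & 1], stdout);   /* pick "Ez" or "FY" */
        putchar('\n');
    }
    return 0;
}
```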
With the bailout, I'd rather go for a softer one, too. There's no reason to fail completely just because one hash table is unusable. Also, I would plead for backporting this to 7.0, given it could be done without an ABI break (and possibly with no additional INI). Thanks for reaching out here.
My first thought on seeing this hardcoded limit was also "we need to make it a new ini option". So I ran a few tests, especially the one from Anatol above (a bit optimized to get more unique keys faster), with a debug message in zend_hash.c to display the maximum number of collisions encountered during the run. After some time, with around 9.2 million unique keys and ~1.2GB of memory used, the max collisions never exceeded 12. So I really think, now, that we can go without a new ini option.
Since it was shown that a limit of 1000 is "enough" for almost all apps, do we need to check whether the limit is not too high? At which point does PHP "get slow"?
@@ -619,6 +651,11 @@ ZEND_API zval* ZEND_FASTCALL _zend_hash_add_new(HashTable *ht, zend_string *key,
	return _zend_hash_add_or_update_i(ht, key, pData, HASH_ADD_NEW ZEND_FILE_LINE_RELAY_CC);
}

ZEND_API zval* ZEND_FASTCALL _zend_hash_add_or_return(HashTable *ht, zend_string *key, zval *pData ZEND_FILE_LINE_DC)
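For readers following along: the intent of add_or_return (later suggested to be renamed to zend_hash_lookup, see below) is to return the existing value if the key is already present and otherwise insert and return the new one, so callers no longer need a separate find followed by add_new. A simplified sketch of that semantics with invented types, not the actual zend API:

```c
#include <string.h>

typedef struct entry {
    const char *key;
    int val;
    struct entry *next;
} entry;

/* add_or_return semantics: on a hit, return the stored value untouched;
 * on a miss, link in the caller-provided entry and return its value.
 * The chain is thus walked once, not once for find plus once for add. */
static int *add_or_return(entry **head, entry *fresh)
{
    for (entry *p = *head; p; p = p->next)
        if (strcmp(p->key, fresh->key) == 0)
            return &p->val;       /* existing entry wins */
    fresh->next = *head;          /* new entry: prepend */
    *head = fresh;
    return &fresh->val;
}

int main(void)
{
    entry *head = NULL;
    entry a = { "k", 1, NULL }, b = { "k", 2, NULL };
    int *first = add_or_return(&head, &a);   /* inserts a */
    int *second = add_or_return(&head, &b);  /* returns a's value */
    return (first == second && *second == 1) ? 0 : 1;
}
```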
Rename zend_hash_add_or_return() to zend_hash_lookup().
See #1565 (comment)
Missed that, thx.
@nikic can we get an update on this please? I see Bob made another PR; not sure of the status of either.
We didn't reach a consensus on internals about how the HashDos protection implementation should look, so this is probably going to need an RFC. I'm closing this PR for now.
For HashDos protection. The limit of 1000 is arbitrary.