[PERF] move assert to insert #11273
Conversation
@ashtul Just like the failed test-sanitizer-address test shows, we can't check whether the data is corrupted during traversal or lookup.
To make it clear, you changed an assertion in the test since there's a flow that used to
P. S. You're also missing a
I think we can keep the assert at lpNext, lpPrev and lpFirst but add a flag that will tell us whether or not to check validity. This flag will be true for RESTORE commands or other WRITE commands but will be off for READ commands.
In fact, even on reads we also need to check it; otherwise it would be possible to read arbitrary memory that can then be returned to the client.
I see the point of checking for corrupted memory, but I wonder if it justifies a 10% penalty in performance.
@ashtul the problem could be with a malicious RESTORE command, that appears to have a valid listpack header, but some of the listpack records in it reach outside the listpack allocation. we have an option to do deep sanitization when processing a RESTORE command (cost of O(N) during RESTORE), and then we know we can completely trust the data, but we preferred to avoid going down that path.
@oranagra validation of the listpack on RESTORE seems like a price worth paying to speed up all the reads.
the other issue with proceeding in that direction is that we'll have to maintain two copies of many functions, one to run on safe keys and one to run on unsafe keys, and taint keys so we know which is which.
How about having a
first, i don't wanna get into having two builds and code that behaves differently in debug and release builds. |
Moving conversation for issue (#11293) to discuss the performance degradation due to |
@oranagra The code in rdb.c is already basically a separate copy of much of the logic from every key type. What if we remove the asserts as in this PR, but call

I think forcing deep sanitization sounds like a good idea. I hope it can be reconsidered.
I don't think that calling

the only 3 alternatives i think are acceptable are:
the problem is that i don't think that adding an
Always forcing deep sanitization so all keys are safe is not an option? |
always in RESTORE command or also in rdb loading and replica? |
#7807 was a massive and awesome job. 🥇
I was thinking of all of these. I realize it might not be good to slow down RESTORE, since MIGRATE uses RESTORE and it blocks both cluster nodes until it's done. Perhaps not until we have a better solution for atomic slot migration that doesn't block two nodes can we consider going down that path...(?)

The way I think of it, data corruption can originate from several sources: a malicious user, a bad network connection, a bad hard disk, or a memory corruption in the running node (caused by a cosmic ray). Stability-wise it might be good to catch all of these (i.e. do deep sanitization and keep the asserts), but performance-wise it isn't. Security-wise, it's the data from untrusted sources we're interested in, i.e. RESTORE from random clients. What about trusting other cluster nodes (MIGRATE) by including them in the
In this PR, `lpAssertValidEntry` is removed from `lpNext`, `lpPrev`, `lpFirst` and `lpFind`. Entry validity is now checked in `lpInsert`. A difference can be seen for both HSET and HGET commands.
HGETALL a hash with 50 fields of numbers '0' to '50':

- Unstable: 19.13% / 10.2%
- Check validity on insert: 10.86% / 0%
HSET a hash with 50 fields of numbers '0' to '50':

- Unstable: 57.72% / 7.5%
- Check validity on insert: 53.61% / 3.9%