-
Notifications
You must be signed in to change notification settings - Fork 23.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CRASH] 7.0.10 memcpy sigabrt due to buffer overflow in replication.c line 367 #11965
Comments
FWIW: the machine doesnt even do replication. it is a standalone redis server. |
@darix Thx, could you provide the full crash logs? |
which part are you missing? not sure that the parts from the redis log are helpful here. sigabrt because memcpy would write beyond the allocated buffer, is pretty obvious. |
Like:
|
|
@darix Is there any logs after |
yes. but as i said ... not sure any more relevant info is in there.
|
now running with:
|
@darix Hi you said this crash happens in a standlone mode redis, could you please under which situation it crash? such as enable RDB, AOF or no persist, or provide us the config parameter? Thanks |
isnt the config part of the crash report output on top? and yes it is on openSUSE Tumbleweed. but the package is not yet in the main distro only in the devel project so far. https://build.opensuse.org/package/show/server:database/redis and the server started crashing during the night. once I switched to the version with the patch above. it is stable again. |
i was puzzled as to why would memcpy throw a SIGABRT, (and not a SIGSEGV). my guess is that for some reason, the change we did in the code caused the compiler to make some wrong assumption, and changing it back avoid that false assumption, when in fact we have no bug. more research is needed. |
|
you make it sound so simple and clear, but i don't think that's the case.
i'll also add that:
|
sadly i do not have the original coredump anymore because redis crashed 557 times before i stopped the auto restart. so i can not tell you what the original crash cause was. |
@darix can you run the redis test suite on a vanilla 7.0.10 on your system? |
#include <stdio.h>
#include <malloc.h>
#include <string.h>
struct a {
char c;
char buf[];
};
int main() {
const int BUFSIZE = 5;
struct a *a = malloc(sizeof(*a) + BUFSIZE);
size_t n = malloc_usable_size(a);
printf("usable size: %zu \n", n);
// Dummy condition to prevent compiler throw warning on compile time
// n will always be n >= BUFSIZE
size_t copy = n >= BUFSIZE ? BUFSIZE + 1 : BUFSIZE;
memcpy(a->buf, "ozanozan", copy);
printf("%.*s \n", (int) copy, (char*) a->buf);
return 0;
}
@oranagra I think your analysis correct. Looks like _FORTIFY_SOURCE=1 works with allocated size, not with "usable size". |
By disassembling the binary of https://build.opensuse.org/package/show/server:database/redis, we can see that it does use |
Reproduction steps:
# start redis
SETEX k 3 v
# stop redis and wait for 3 seconds (let redis create backlog due to expiration)
# start redis
for((i=65698;i<=1000000;i++));
do
dd if=/dev/zero of=test bs=$i count=1
./src/redis-cli -x set k < test
if [ $? -ne 0 ]; then
echo "failed"
exit
fi
done |
@sundb can you run the redis test suite on that platform and report if it passes or fails? i can easily reproduce @tezc example on my Jammy Ubuntu (even without explicitly specifying _FORTIFY_SOURCE), and indeed it seems to use the allocated size (not the usable, and also not the my guess is that a newer compiler got "smarter" and can remember an allocation size for some cases, whereas if we store it for later and then re-use the allocation, the compiler doesn't do that validation. |
openSUSE Tumbleweed moved to gcc 13 lately. |
i just remembered that a lot of debug information is in the log file. so it seems the sigabrt was triggered during a restart and then trying to read some files again.
|
#include <stdio.h>
#include <malloc.h>
#include <string.h>
struct a {
char c;
char buf[];
};
void* test()
{
const int BUFSIZE = 5;
struct a *a = malloc(sizeof(*a) + BUFSIZE);
size_t n = malloc_usable_size(a);
printf("usable size: %zu \n", n);
return a;
}
int main() {
struct a *a = test();
memcpy(a->buf, "ozanozan", 6);
printf("%.*s \n", (int) 6, (char*) a->buf);
return 0;
} I think compiler won't ever store allocation size on heap for these checks. If allocation & usage happens in the same stack (as it happens here in replication.c), it can perform this analysis without using heap. Maybe this is the new thing in the latest gcc. |
Hey guys, here is a GCC developer. Yes, it's a known issue that _FORTIFY_SOURCE does not work with |
Here is an intensive discussion with the fortification developer and systemd project where they also use it: |
For the sake of easier testing:
|
@oranagra test suite also failed.
|
CCing glibc maintainer of the functionality: @siddhesh. So basically the function https://man7.org/linux/man-pages/man3/malloc_usable_size.3.html:
|
Thanks @marxin! Right, the problem is with the usable size function returning a different value than what is allocated, thus breaking the compiler's assumption that the sizes ought to be the same. We (i.e. the glibc community) have been considering deprecating The reason why redis is running into this now and didn't before is in fact because the compiler now has more fortification coverage with The ideal fix for this should be to have the size returned from the allocator be consistent with the requested size, but if that's not possible (as it seems to me from a quick peek at the allocator; it seems to do a lot of things, including wrapping around external allocators), then the fix may be the same as systemd, i.e. to wrap the Something like this:
Alternatively, you could do this during allocation, by extending the |
Thanks to all the GCC and glibc developers for stepping in to clear this up.
@siddhesh by dummy realloc you mean actually calling realloc to request the allocation to be resized to the size we already know it has, or did you mean an empty function like in your example above which just returns the pointer blindly? please correct me if i'm wrong, won't that (the second approach) completely disable the fortification checks (the compiler will be completely unaware of the size of the allocation and won't do any checks)? maybe instead we can disable part of the fortification checks? @sundb can you try that suggestion to make sure it works and also check if it has any impact on performance? |
@oranagra I will try it. |
talloc provides functions to get the size of talloc chunk even including children. |
So to be clear, with the workaround I suggested, the code will actually become compliant. However, there's a good chance that glibc will deprecate Also, the risk is not just limited to such usage being standards-unsafe; there's also no guarantee that the size is consistent. That is, the underlying allocator is free to grow that chunk at any time, making the return value unstable and actual usage unsafe. None of the allocator implementations do that at the moment, but there's nothing stopping them from doing that if they find it to be somehow more performant. That's really what makes it a very bad interface to actually use the extra size. There are standards discussions to provide safer interfaces to do something like this, which may be a better fit for the redis (and systemd) use case and the workaround I suggested basically tries to get the usage closer to something like that.
I did mean an empty function like the example I wrote above; I wrote a workaround like that for systemd; see systemd/systemd@7929e18 and systemd/systemd@4f79f54
On the contrary, the compiler will be able to see the new size and do the right thing for your code base. For the compiler, the code will look like this:
If needed, I'll be happy to help with review to verify that it matches what I suggested (I obviously won't be very useful with nuances of the code base), please feel free to tag me into the PR. |
@siddhesh thank you. also, am i correct to understand that extra call will be optimized out and this change have zero impact on our performance? [Edit] I now see [Edit] i now see the |
yeah we can't let the function get optimized away yet because gcc (atleast, maybe even clang but I haven't checked) loses attributes whenever a function is inlined, so the So the impact is a function call overhead for now, hopefully we'll claw it back soon in gcc with a fix to that bug.
Correct too :) |
@siddhesh I tried to make some changes(sundb@19ed6a4) in redis, but ran into a strange problem. Calling
But when I move
I am not sure what is the right behavior. |
Calling extend_to_usable inside Basically, the |
@siddhesh Thanks for your help. |
do we have an ETA for the release? |
i suppose that since you indicated a recent change is what caused this (old) issue to get exposed, then we should release soon. |
I noticed that on 7.0.x we didn't have any |
distros started to enable -lto via CFLAGS |
This was code changed between 7.0.8 and 7.0.10
It seems the changed buffer size calculation leads to
Additional information
The text was updated successfully, but these errors were encountered: