Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: build systemd with clang with -Dmode=release --optimization=2 #23621

Merged
merged 2 commits into from
Jun 6, 2022

Conversation

evverx
Copy link
Member

@evverx evverx commented Jun 3, 2022

This is what's most likely used to build systemd with clang in
practice so let's test it as well.

In preparation for reverting 0bd2925
(which replaced a bogus buffer overflow found with _FORTIFY_SOURCE=3
with actual segfaults).

0bd2925 is reverted as well.
It isn't guaranteed anywhere that __builtin_dynamic_object_size can
always deduce the size of every object passed to it so systemd
can end up using either malloc_usable_size or
__builtin_dynamic_object_size when pointers are passed around,
which in turn can lead to actual segfaults like the one mentioned in
#23619.

Apparently __builtin_object_size can return different results for
pointers referring to the same memory as well but somehow it hasn't
caused any issues yet. Looks like this whole
malloc_usable_size/FORTIFY_SOURCE stuff should be revisited.

Closes #23619 and #23150.
Reopens #22801

@thesamesam
Copy link
Contributor

Good catch with CI. I was wondering why it didn't hit it.

@evverx
Copy link
Member Author

evverx commented Jun 4, 2022

The CI mostly uses clang to test systemd with ASan/UBsan and somewhat MSan (where malloc_usable_size is intercepted correctly with all optimization levels). It just wasn't prepared for plain builds with -O2 where MALLOC_SIZEOF_SAFE/MALLOC_ELEMENTSOF can return either __builtin_dynamic_object_size or malloc_usable_size (which is usually larger than __builtin_dynamic_object_size) for pointers referring to the same memory in different places.

@evverx
Copy link
Member Author

evverx commented Jun 4, 2022

Judging by

autopkgtest: WARNING: Test dependencies are unsatisfiable - calling apt install on test deps directly for further data about failing dependencies in test logs
blame: https://salsa.debian.org/systemd-team/systemd.git#upstream-ci
badpkg: Test dependencies are unsatisfiable. A common reason is that your testbed is out of date with respect to the archive, and you need to use a current testbed or run apt-get update or use -U.
autopkgtest [09:56:55]: ERROR: erroneous package: Test dependencies are unsatisfiable. A common reason is that your testbed is out of date with respect to the archive, and you need to use a current testbed or run apt-get update or use -U.

Ubuntu CI and Sempahore CI broke down again.

@mbiebl
Copy link
Contributor

mbiebl commented Jun 4, 2022

Judging by

autopkgtest: WARNING: Test dependencies are unsatisfiable - calling apt install on test deps directly for further data about failing dependencies in test logs
blame: https://salsa.debian.org/systemd-team/systemd.git#upstream-ci
badpkg: Test dependencies are unsatisfiable. A common reason is that your testbed is out of date with respect to the archive, and you need to use a current testbed or run apt-get update or use -U.
autopkgtest [09:56:55]: ERROR: erroneous package: Test dependencies are unsatisfiable. A common reason is that your testbed is out of date with respect to the archive, and you need to use a current testbed or run apt-get update or use -U.

Ubuntu CI and Sempahore CI broke down again.

I rebased the upstream-ci branch on current debian/master which includes
https://salsa.debian.org/systemd-team/systemd/-/commit/a8a310c5e6f6e466f638a246ac8840615e5019f0

This package is provided via bullseye-backports, but it seems this repository is not enabled in Semaphore CI.
For Ubuntu Focal, the package should be available via https://launchpad.net/~upstream-systemd-ci/+archive/ubuntu/systemd-ci

Not quite sure why that didn't work.

@bluca any idea?

Anyway, I reverted that commit for now to unbreak CI.

Please retrigger.

@mbiebl
Copy link
Contributor

mbiebl commented Jun 4, 2022

ah, seems package-notes was only built for jammy and not for focal

@mbiebl
Copy link
Contributor

mbiebl commented Jun 4, 2022

So, Semaphore CI failed with a different error (which I guess is a good sign).

As for Ubuntu CI, might be that they are caching the git repo as it seems the still have the old state.
[EDIT: some picked up the revert commit and now fail similarly to Semaphore CI]

@evverx
Copy link
Member Author

evverx commented Jun 5, 2022

So, Semaphore CI failed with a different error (which I guess is a good sign).

I opened #23630

@evverx
Copy link
Member Author

evverx commented Jun 5, 2022

FWIW I'm not sure if https://salsa.debian.org/systemd-team/systemd/-/commit/453537cfbead2f0780b51c5b24d598496575770f was released or not but that warning boils down to journal-remote crashing due to assert_not_reached() in check_permissions without gnutls.

@mbiebl
Copy link
Contributor

mbiebl commented Jun 5, 2022

Argh, sorry. Wasn't really my intention to disrupt CI that much by updating upstream-ci.
Should I revert https://salsa.debian.org/systemd-team/systemd/-/commit/453537cfbead2f0780b51c5b24d598496575770f again?
Or should 507fc32 be merged into main?

Then again, upstream-ci, afaics, is also used for systemd-stable, so this commit would have to be pulled there as well.

@mbiebl
Copy link
Contributor

mbiebl commented Jun 5, 2022

@evverx
Copy link
Member Author

evverx commented Jun 5, 2022

Should I revert https://salsa.debian.org/systemd-team/systemd/-/commit/453537cfbead2f0780b51c5b24d598496575770f again?
Or should 507fc32 be merged into main?

I think if it's expected that journald-remote should crash there 507fc32 should probably be merged. I haven't looked at that part of the code closely though so I can't say for sure.

This is what's most likely used to build systemd with clang in
practice so let's test it as well.

Preparation for reverting systemd@0bd2925
(which replaced bogus buffer overflow found with _FORTIFY_SOURCE=3
with actual segfaults).
@evverx
Copy link
Member Author

evverx commented Jun 5, 2022

Looks like __builtin_object_size is equally dangerous when pointers are passed around. I'm not sure why it has never manifested itself in actual buffer overruns (but now I know where to look for them closer :-)) but I think it would make sense to revert 6df28e1 as well. I don't think the part of systemd allocating memory should rely on compiler features that can return different results in different places depending on how pointers are passed around.

# cat test.c
#include <malloc.h>

void print_sizes(void *p) {
	printf("usable_size: %ld, dynamic_size: %ld, object_size: %ld\n", malloc_usable_size(p), __builtin_dynamic_object_size(p, 0), __builtin_object_size(p, 0));
}

int main(int argc, char *argv[]) {
	char *p = malloc(10);
	printf("usable_size: %ld, dynamic_size: %ld, object_size: %ld\n", malloc_usable_size(p), __builtin_dynamic_object_size(p, 0), __builtin_object_size(p, 0));
	print_sizes(p);
	free(p);
	return 0;
}
# gcc --version
gcc (GCC) 12.1.1 20220507 (Red Hat 12.1.1-1)
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

# gcc -O2 test.c
# ./a.out
usable_size: 24, dynamic_size: 10, object_size: 10
usable_size: 24, dynamic_size: -1, object_size: -1

…_size."

This reverts commit 0bd2925.

It isn't guaranteed anywhere that __builtin_dynamic_object_size can
always deduce the size of every object passed to it so systemd
can end up using either malloc_usable_size or
__builtin_dynamic_object_size when pointers are passed around,
which in turn can lead to actual segfaults like the one mentioned in
systemd#23619.

Apparently __builtin_object_size can return different results for
pointers referring to the same memory as well but somehow it hasn't
caused any issues yet. Looks like this whole
malloc_usable_size/FORTIFY_SOURCE stuff should be revisited.

Closes systemd#23619 and
systemd#23150.

Reopens systemd#22801
@keszybz
Copy link
Member

keszybz commented Jun 6, 2022

/cc @marxin

I guess we should merge this to fix those crashes. clang and -O2 is super standard and if the code doesn't work with that, we don't have much choice. Maybe we can figure out a way to support -D_FORTIFY_SOURCE=3 later on.

but I think it would make sense to revert 6df28e1 as well. I don't think the part of systemd allocating memory should rely on compiler features that can return different results in different places depending on how pointers are passed around.

Hmm, maybe. But let's do this one step at a time.

@keszybz keszybz merged commit 369151c into systemd:main Jun 6, 2022
@evverx
Copy link
Member Author

evverx commented Jun 6, 2022

But let's do this one step at a time

Agreed. Since it hasn't caused any issues yet I think it should be fine to keep it for now. I'm a bit concerned though that those codepaths are completely hidden from ASan, UBSan, MSan and Valgrind and whatever is tested by the CI isn't what is actually shipped by distributions.

Maybe we can figure out a way to support -D_FORTIFY_SOURCE=3 later on.

As far as I understand at this point -D_FORTIFY_SOURCE isn't compatible with malloc_usable_size and it can be fixed either by getting rid of malloc_usable_size or by intercepting malloc_usable_size correctly with _FORTIFY_SOURCE somehow by analogy with ASan, UBSan, MSan and Valgrind.

@marxin
Copy link
Contributor

marxin commented Jun 15, 2022

Looks like __builtin_object_size is equally dangerous when pointers are passed around. I'm not sure why it has never manifested itself in actual buffer overruns (but now I know where to look for them closer :-))

Can you please explain why is dangerous? Note that -1 is of size_t so it's a huge number and it will also satisfy checks like:

if (size < buffer_size)
   abort();

Can you please explain how exactly does it crash with clang and would it be possible to isolate a reproducer?

Edited: Oh it's about MALLOC_SIZEOF_SAFE which is then used for knowing how big a buffer is (and so can be used). So it even more leads to not using malloc_usable_size.

@marxin
Copy link
Contributor

marxin commented Jun 15, 2022

As far as I understand at this point -D_FORTIFY_SOURCE isn't compatible with malloc_usable_size and it can be fixed either by getting rid of malloc_usable_size or by intercepting malloc_usable_size correctly with _FORTIFY_SOURCE somehow by analogy with ASan, UBSan, MSan and Valgrind.

Yes, it does not work correctly with malloc_usable_size and I would consider not using it by such a system tool like systemd. It seems to me like a super-micro optimization that does not worth it.

@evverx
Copy link
Member Author

evverx commented Jun 16, 2022

I would consider not using it by such a system tool like systemd

@marxin agreed. I think ideally it would be great if _FORTIFY_SOURCE was compatible with malloc_usable_size somehow (to avoid false positives in general) but even if it worked by analogy with say Valgrind right now if would basically mean that d4b604b would never work as intended with systemd packages built with FORTIFY_SOURCE (and as far as I know basically all distributions include _FORTIFY_SOURCE=2 in their CFLAGS one way or another).

@evverx evverx deleted the clang-release branch June 16, 2022 02:06
@marxin
Copy link
Contributor

marxin commented Jun 16, 2022

@marxin agreed. I think ideally it would be great if _FORTIFY_SOURCE was compatible with malloc_usable_size somehow (to avoid false positives in general) but even if it worked by analogy with say Valgrind right now if would basically mean that d4b604b would never work as intended with systemd packages built with FORTIFY_SOURCE (and as far as I know basically all distributions include _FORTIFY_SOURCE=2 in their CFLAGS one way or another).

Good! And yes, most of the distros likely use _FORTIFY_SOURCE=2. So are planning a commit that will remove the usage of malloc_usable_size or should I do that?

@evverx
Copy link
Member Author

evverx commented Jun 16, 2022

@marxin I'd wait for @poettering to chime in. Looking at #22801 (comment) and #19203 (comment) I'm not sure the idea of removing malloc_usable_size would be approved right away unanimously.

@marxin
Copy link
Contributor

marxin commented Jun 27, 2022

Looking at the codebase, dropping of malloc_usable_size would be quite some work as individual array needs to track its size and not rely on malloc_usable_size being used by GREEDY_REALLOC and GREEDY_REALLOC0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

systemd built with clang and --optimization>=2 is unusable
6 participants