Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify kroundup macros and fix gcc warnings #1051

Merged
merged 2 commits into from Mar 19, 2020

Conversation

daviesrob
Copy link
Member

Since 29c294e, which fixed overflow bugs in kroundup32() and kroundup_size_t() among other things, samtools and bcftools have been getting gcc warnings like htslib/kstring.h:163:16: error: argument 1 value ‘18446744073709551615’ exceeds maximum object size 9223372036854775807 [-Werror=alloc-size-larger-than=]. These apparently appear due to jump threading optimisations which generate code paths for the case when kroundup avoids the overflow and then warns that realloc() has been called on the result. (See https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings-part-2/ for an explanation.)

This is my best attempt to fix this. It replaces several kroundup32 and kroundup_size_t macro definitions with a new unified implementation. The new one will work correctly for all integer types (both signed and unsigned) up to 64 bits. It also avoids conditionals with large numbers in the macro definition to work around the jump threading problem. This mostly works - with this change all of the warnings go from bcftools and only one is left in samtools (where a signed value is rounded up and then multiplied by two in the realloc statement, so probably actually legitimate).

It's been tested on 32-bit and 64-bit builds, and works on both (barring the issue with samtools).

Includes extra tests to ensure the new kroundup works as desired.

Replaces #1048, which failed on 32-bit.

@jkbonfield
Copy link
Contributor

What does all this ghastly code end up compiling to? This is heavily inlined. Will it be oveflowing the sizes of what is permissible for inline with the extra logic, or does the compiler manage to realise it's mostly const and just replace it with 0 or 1?

@jkbonfield
Copy link
Contributor

jkbonfield commented Mar 17, 2020

Looking at godbolt it appears that a function doing:

size_t kround(size_t size)
{
       return kroundup_size_old(size);
}

adds 4 instructions for gcc9.2 (x86_64) and a whopping 24 instructions if using clang, compared to my branch. That may have implications for inlining.

I'm wondering if we can just do something to do disable the bogus warning, or completely ditch this and only round up if values are < some fixed size. The case the compilers are spotting are only for sizes which wouldn't actually work anyway, so failing to round up and just allocate what the user asked for isn't an issue as rounding up or otherwise it will result in a memory allocation failure. The difference is asking for 9 yotta bytes vs 18 I think :-)

My PR already had checks to stop it rounding up if it got too big, but apparently that still triggered the compiler warnings. Was that because the use of INT_MAX is still too large? I haven't managed to reproduce any of these warnings, even using an -m32 capable compiler.

Do you know which piece of code gave the warning originally? (Edit: no matter - it's bam_fastq.c:28. I'll try pasting that into godbolt to explore different compiler/arch.)

@jkbonfield
Copy link
Contributor

The same macro (32 bit version at least) turns up in hts.h, khash.h, kseq.h, rbuf.h, bgzf.c and faidx.c also. Blergh.

I guess we could ignore the original problem which was memory failure when you overflow and round down to zero, becoming a free statement instead, but that feels like we're just ignoring known bugs. Still we're no worse off at least as they're already there.

However in every single use case it is prior to a realloc (occasionally 2 of the same size) or a malloc, with very variable degrees of bounds and overflow checking. It's not even appropriate to put size checking in kroundup either as sometimes we then take that rounded up value and do things like this:

            kroundup32(heap->mdat);                                     \
            heap->dat = (kheap_t*)realloc(heap->dat, heap->mdat*sizeof(kheap_t));

What this makes me think is that kroundup is a total red herring and what we want is simply a krealloc function which automatically does more some rounding up and has uniformly good overflow checks.

However, in the interests of expediency, I'm inclined to go with this as a temporary workaround for the over-zealous warnings reported by some gcc/system combos. Comments?

@daviesrob
Copy link
Member Author

Did you remember to use -O2 in Godbolt, and change the language to 'C'? I get about 4 extra lines on clang 9.0.0 compared to #1048 (also checkout the ARM64 gcc 8.2 version - it's tiny!)

The same macro (32 bit version at least) turns up in hts.h, khash.h, kseq.h, rbuf.h, bgzf.c and faidx.c also. Blergh.

The version in rbuf.h would be an issue for bcftools. I think I found all of them in HTSlib.

It's not even appropriate to put size checking in kroundup either as sometimes we then take that rounded up value and do things like this (multiply by another value without checking for overflow)

Yes, we do need to sort out the places where the rounded-up value gets multiplied by something, although most are likely harmless because the array in question is growing slowly and you run out of memory before it gets the chance to overflow. Some of these (e.g. kstring) might work better with a slower growth factor. I think others (e.g. khash) rely on the size being a power of two so it's possible to use a bit-mask to restrict values to the size of the allocated array.

@jmarshall
Copy link
Member

jmarshall commented Mar 17, 2020

However in every single use case it is prior to a realloc (occasionally 2 of the same size) or a malloc, with very variable degrees of bounds and overflow checking.

Yep and for this reason IMHO there's little point in inlining the rounding up — you're about to do a relatively expensive memory allocation anyway. In the case of ks_resize() it's worth keeping the s->m >= size fast case inline, but the rest could be via a function call. See for example 04b71e6 — which suffices to avoid the warnings in samtools, and with the roundup now inside libhts.so the embiggening algorithm can be tweaked with impunity.

…which is basically equivalent to James's krealloc concept:

What this makes me think is that kroundup is a total red herring and what we want is simply a krealloc function which automatically does more some rounding up and has uniformly good overflow checks.

(IMHO the kroundupXYZ macro has always basically been a premature optimisation: it's written obscurely and tightly so that it can be a macro that's inlined all over the place, but there's really no reason to do that.)

daviesrob and others added 2 commits March 18, 2020 09:11
Replace several kroundup32 and kroundup_size_t macro definitions
with a new unified implementation.  The new one will work
correctly for all integer types (both signed and unsigned) up to
64 bits.  An attempt to round up that would overflow results
in the maximum value that can be stored in the given type.

The implementation avoids conditionals involving large values
in an attempt to fix some gcc warnings in samtools and bcftools.
They appeared after commit 29c294e which fixed a bug where kroundup
would wrap around to zero on large values.  The warnings are
caused by jump threading optimisations that create code paths
for the large value cases, which gcc then warns about. See:
https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings-part-2/

Includes extra tests to ensure the new kroundup works as desired.
@daviesrob
Copy link
Member Author

@jmarshall Yes, I've also considered moving most of ks_resize() out of the inline function. Let's go for it 😄

It doesn't make that much difference to the size of the binary:

File Before After
libhts.a 8388028 8349306
libhts.so 4440834 4403764

but checking with -Winline shows that quite a few missed inlines are fixed:

Before:

    148 htslib/kstring.h warning: inlining failed in call to 'kputc': --param inline-unit-growth limit reached [-Winline]
     70 htslib/kstring.h warning: inlining failed in call to 'kputsn': --param inline-unit-growth limit reached [-Winline]
     48 htslib/kstring.h warning: inlining failed in call to 'kputw': --param inline-unit-growth limit reached [-Winline]
     42 htslib/kstring.h warning: inlining failed in call to 'kputc_': --param inline-unit-growth limit reached [-Winline]
     28 htslib/kstring.h warning: inlining failed in call to 'kputsn_': --param inline-unit-growth limit reached [-Winline]
      6 htslib/kstring.h warning: inlining failed in call to 'kputs': --param inline-unit-growth limit reached [-Winline]
      6 htslib/kstring.h warning: inlining failed in call to 'kputll': --param inline-unit-growth limit reached [-Winline]
      4 htslib/kstring.h warning: inlining failed in call to 'kputuw': --param inline-unit-growth limit reached [-Winline]
      2 htslib/kstring.h warning: inlining failed in call to 'kputsn': call is unlikely and code size would grow [-Winline]

After:

     99 htslib/kstring.h warning: inlining failed in call to 'kputc': --param inline-unit-growth limit reached [-Winline]
     50 htslib/kstring.h warning: inlining failed in call to 'kputw': --param inline-unit-growth limit reached [-Winline]
     30 htslib/kstring.h warning: inlining failed in call to 'kputsn': --param inline-unit-growth limit reached [-Winline]
      9 htslib/kstring.h warning: inlining failed in call to 'kputc_': --param inline-unit-growth limit reached [-Winline]
      8 htslib/kstring.h warning: inlining failed in call to 'kputsn_': --param inline-unit-growth limit reached [-Winline]
      2 htslib/kstring.h warning: inlining failed in call to 'kputsn': call is unlikely and code size would grow [-Winline]

@jkbonfield
Copy link
Contributor

Looks like we've reached a good consensus here. I'm happy, but will do a quick bit of compilation and testing and hopefully merge soon.

PS. Yes, I forgot to turn on -O2 in godbolt. Oops. I see what you mean about arm too. Nice instruction set!

@jkbonfield jkbonfield merged commit 04b71e6 into samtools:develop Mar 19, 2020
@jkbonfield
Copy link
Contributor

Merged, although the samtools cut_target.c:66 warning is still to be resolved. I suspect something like this would work:

diff --git a/cut_target.c b/cut_target.c
index e59f51b..8eae0d1 100644
--- a/cut_target.c
+++ b/cut_target.c
@@ -61,8 +61,8 @@ static uint16_t gencns(ct_t *g, int n, const bam_pileup1_t *plp)
     int i, j, ret, tmp, k, sum[4], qual;
     float q[16];
     if (n > g->max_bases) { // enlarge g->bases
-        g->max_bases = n;
-        kroundup32(g->max_bases);
+        g->max_bases = n*2; // prevent overflow in kroundup32
+        kroundup32(g->max_bases); g->max_bases /= 2;
         g->bases = realloc(g->bases, g->max_bases * 2);
     }
     for (i = k = 0; i < n; ++i) {

Although ideally we'd have kmalloc and krealloc and it'd just work without horrid shenanigans. Patch now to stop compilation errors and we can get more time to come up with a better solution.

@daviesrob
Copy link
Member Author

Thanks. Easiest solution for cut_target.c for now is to simply cast to size_t in the realloc:

        g->bases = realloc(g->bases, (size_t) g->max_bases * 2);

As it goes from signed to unsigned, it always fits.

@jkbonfield
Copy link
Contributor

Good point. I'm happy for you to make that 1 line tweak without PR if you wish provided it doesn't trigger any warnings. Looks like you've verified that already.

@daviesrob
Copy link
Member Author

I've pushed the fix to samtools.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants