Errors when compiling with -fsanitize=undefined #812
The alignment warnings may be just normal operation: contrary to what the message says, x86/x64 doesn't (generally) require any alignment, and that is intentionally taken advantage of to pack internal structures more tightly. You can force align-to-4 or align-to-8 by editing DUK_USE_ALIGN_BY.

The shift issues are something else, and are most likely caused by a 5-bit signed constant being shifted left 27 places (e.g. 23 << 27) with the result then used as an unsigned value. The intermediate value, a 32-bit signed integer, cannot technically hold the value without overflow, but there's no actual problem with the shift in practice because the result goes into an unsigned result field. This should be fixable by changing the constants to be unsigned, or just casting them to unsigned before the shift.
Aren't there performance pitfalls for unaligned access, though? I know x86/x64 will allow it, but I was under the impression it degraded performance because the CPU has to do extra masking/fetches.
@saghul Aww, you stole my favorite number! :)
Unaligned reads/writes may have anywhere from almost zero impact to some impact, depending on the specific x86 processor in question (my understanding is that with newer processors the effect is negligible). Unaligned reads are used e.g. for the object property table to minimize memory usage, but you can force a specific alignment by setting DUK_USE_ALIGN_BY.
Out of curiosity I set it to 8 and the failures got reduced to:
@saghul Thanks - while I can't directly replicate those shift warnings (only GCC 6 seems to have a warning option for that) I should be able to fix the warnings. They should be totally benign though.
Clang also has it AFAIK, but not in the version that ships with OSX. There you can compile with
Tested the microbenchmarks quickly with align-by-1 vs. align-by-8 (natural alignment up to IEEE doubles) and on this laptop (Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz) some tests are marginally faster while others are marginally slower - essentially there's no net difference. But the situation is most likely different for e.g. Atom. As said above, maybe it'd be a useful change to default to natural alignment on all platforms unless the user explicitly modifies DUK_USE_ALIGN_BY.
Clang with
Oh, I didn't know that one!
This is true (provided the AC flag is off, and no one runs with AC set, so...), but the C language itself imposes alignment restrictions that may be stronger than the underlying platform's. In fact, just creating an unaligned pointer is already undefined behaviour. Compilers may optimize assuming that unaligned accesses don't happen according to C's rules, not just x86's. However, optimizers are also aware of the alignment requirements of the target, and may generate unaligned accesses at the machine code level. So it's possible to write code that obeys C's alignment rules while still generating efficient unaligned moves, for example by using memcpy():

```c
#include <stdlib.h>
#include <string.h>

char *unaligned_bad() {
    char *ptr = malloc(sizeof(int) + 1);
    int *iptr = (int *)(ptr + 1); // Already undefined behaviour
    *iptr = 42;                   // Also undefined
    return ptr;
    // Compiles with gcc -O3 to
    //     subq  $8, %rsp
    //     movl  $5, %edi
    //     call  malloc
    //     movl  $42, 1(%rax)
    //     addq  $8, %rsp
    //     ret
}

char *unaligned_good() {
    char *ptr = malloc(sizeof(int) + 1);
    int value = 42;
    memcpy(ptr + 1, &value, sizeof(value));
    return ptr;
    // Also compiles with gcc -O3 to
    //     subq  $8, %rsp
    //     movl  $5, %edi
    //     call  malloc
    //     movl  $42, 1(%rax)
    //     addq  $8, %rsp
    //     ret
}
```
@tavianator Interesting, I wasn't aware of that. Could you provide a link to the alignment rules?
I guess that also means that any compiler supporting packed structs will necessarily have well-defined behavior for unaligned fields, because otherwise struct packing wouldn't work. However, compilers not supporting struct packing would be free to do otherwise.
Just to be clear, there are several of these. But even so, it's interesting if there are no guarantees even on x86, and it's definitely worth mentioning somewhere. The AC flag bit is already mentioned in the code comments:

```c
/* XXX: This is technically not guaranteed because it's possible to configure
 * an x86 to require aligned accesses with Alignment Check (AC) flag.
 */
#if !defined(DUK_USE_ALIGN_BY)
#define DUK_USE_ALIGN_BY 1
#endif
```

... but like @tavianator said, almost no-one forces alignment on x86 so this is not a practical issue. If it were, it could be overridden by giving -DDUK_USE_ALIGN_BY=8 to configure.py. Quite apart from the portability issue, I'd like to reduce the layout options to 2 (or even just 1) simply to make the code nicer to maintain.
The latest C11 draft says (§6.3.2.3 p7):

> A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
Indeed, gcc for example carves out an exception for accesses to packed struct fields, but things like this are still undefined:

```c
struct __attribute__((packed)) foo {
    char c;
    int i;
};

struct foo f;
int *p = &f.i; // Undefined, f.i is not properly aligned
```

(See some of the discussion at this bug report for example.)
@tavianator What I find puzzling is what "correctly aligned" means - is this a reference to the target architecture's alignment rules, or something defined at the C standard level (e.g. all integer pointers must be sizeof(int) aligned)? For example, for x86 the target architecture alignment is align-by-1 (at least with AC off), so wouldn't this mean there are never unaligned pointers on that specific target? This matters for the warning because the intent for Duktape is to enable what amounts to struct packing for the object property table when the target is, indeed, align-by-1.
The sample above with *iptr = 42; is not about alignment rules but about aliasing. It can be resolved by
on most compilers. The alignment concern is also real; it can crash on ARM, for example. It can be resolved by
MSVC has different non-standard keywords.
@lieff Just to be clear, Duktape respects default alignment rules unless DUK_USE_ALIGN_BY (specified for each architecture/compiler) allows lesser alignment. For example, x86 generally allows align-by-1, some 32-bit targets allow align-by-4 (but allow 8-byte values to be aligned by only 4), etc. So this is just about a warning that gets issued with certain sanitizer options. There was a similar issue with MSVC warnings: when compiling as 32-bit code it would warn about some code constructs being unsafe on 64-bit targets. But the warning was actually incorrect because those constructs would never be enabled on 64-bit targets.
The warnings are 100% correct, because compiler optimizations can make assumptions based on them. Clang has already started to miscompile code that relies on undefined behavior, and gcc is going the same way.
If the warning is correct and leads to actual broken code on x86, then the best step would be to enable
Or if possible:
All the x86 compilers I've worked with have supported packed structures because they're so ubiquitous in embedded low-memory programming. But I'm happy to change the default if that default is not safe. The goal is to use nice target-specific tricks when they are available and avoid them when not, and
It's my understanding that alignment rules apply to the declared type of the pointer -- so an int * should be int-aligned, a double * should be double-aligned, etc. This is why it's legal to alias any pointer with a char *, because the alignment requirement then is 1 byte by definition.
@fatcerberus Right, but there are targets where e.g. IEEE double alignment is align-by-4, and a lot of code happily works with that (and expects it). So that's why I was asking above whether the standard requirement is relative to the target architecture or some abstract, non-target-specific notion of required alignment. The most portable way to align is obviously a multiple of sizeof(type), but that's not always desirable, e.g. when one can save memory by packing data more tightly. It doesn't matter on desktops, but targets running with a few hundred kBs of total memory often benefit from packing - so it's available and
The standard requirement is non-target-specific. It's just that many compilers violate the standard. But the latest clang and gcc integrate some aggressive optimizations that rely on the standard. The standard-conforming fix is tricky: char * and memcpy, for example. Compiler-dependent __attribute__ and __declspec in macros produce more elegant code.
Assuming it's too unsafe to try to default to anything other than natural alignment in
Because that's really the goal of
Low memory applications would then need to force alignment explicitly, but that's not generally a problem because they already do a lot of tweaking with their compilers for optimizing footprint, memory usage, etc.
@lieff Thanks, I've learned something new (and disturbing) today ;) This means I would never write an operating system in C because one couldn't do half the things one would want to ;)
Yeah, I was referring to what the C specification says (to my knowledge - a copy of the standard isn't exactly free :))
@tavianator Thanks, excellent points! Couple of comments:
I agree; my comment was based on the impression that the warning meant the issue might be a problem in some other compilation context -- not the exact one being compiled. For example, MSVC will issue 64-bit warnings (if enabled) when compiling for a 32-bit target, and those warnings would be valid if one compiled the exact same code as 64-bit. However, in Duktape's case, the code paths triggering 64-bit warnings in 32-bit mode never come into play in a 64-bit compilation because they'll be
So similarly, if a compiler warns about undefined behavior but the compiler itself does provide deterministic behavior, the warning may be irrelevant if that code is
That's true, but there's also a semi-agreed notion of architecture alignment. To be able to link binaries from different compilers and compiler versions, dynamically link libraries, etc., there must be some agreement on what alignment rules are used for struct layout. So in practice x86/x64 targets have been quite deterministic in how they pack structures, and in allowing align-by-1, at least as far as generating correct code and not trapping on unaligned accesses.

Note that I'm not at all arguing that it wouldn't be a good idea to follow natural alignment for portability. But people use struct packing and lower alignment assumptions for specific reasons, usually to save memory. In practice Duktape needs to be able to work with that for low memory targets, as in a lot of cases such assumptions are valid for the specific target (like x86 for a long while) and do free up memory for other parts of the application. I think the approach in master is the best for this case: assume nothing unportable w.r.t. alignment, and document how you can use memory more efficiently on targets where it matters.
Good to know. I mainly meant that typedefs ultimately mapping to char types will get the same aliasing behavior as char pointers.
Casting an out-of-range floating-point value to an integer is explicitly undefined behaviour (as opposed to casting integers to narrower integers, which is merely implementation-defined). See for example http://blog.frama-c.com/index.php?post/2013/10/09/Overflow-float-integer.
@tavianator Right, but as long as the result is any integer (and not
The reason I ask is that avoiding the cast means either peeking into the IEEE representation directly to make the determination (which is slower), adding explicit range checks (also slower), or relying on more platform bindings, which cause portability headaches for people working on bare metal and other exotic platforms.
@tavianator Never mind, re-reading the article:
So yeah, apparently it can be a real world issue. So the question is:
Humdidum. |
There may exist platforms where the natural instruction to convert a float to an int traps on range errors, I'm not sure. I worry more about the fact that compilers like to use the assumption that undefined behaviour doesn't occur to infer data flow facts. And even if today's compilers don't do a particular thing, tomorrow's compilers may begin to exploit new kinds of undefined behaviour. Here's something I can realistically see an optimizer doing:

```c
double x = ...;
int n = (int)x;
if (n != x) {
    // This can only happen if x was out of range, so the compiler might
    // delete this entire block as unreachable!
    report_error("double was out of range!");
}
```

Whether or not compilers currently perform such an optimization is less important to me than whether they are allowed to. We might have had a similar discussion 15 years ago about integer overflow, but now compilers will definitely make similar optimizations (e.g. removing
I guess the only potential issue would be if the result happened to be one which passed the check even though it's actually out of range.
Just to be sure, I'm not a fan of relying on undefined behavior or portability assumptions. But over time I've come to the conclusion that you really can't write truly portable code and must make some practical assumptions. For example, assuming two's complement and the ASCII character set are two assumptions Duktape makes because: (1) almost no-one encounters issues with them, and (2) it's very costly to avoid them. Out of interest I tried to figure out what avoiding the ASCII assumption (and supporting e.g. EBCDIC) would cost; it's just too much effort for the trouble. So I try to treat portability assumptions quite practically:

Not sure where this issue lands.
That shouldn't be an issue. There are two cases:
That's true, and I also see this as a potential risk. Especially because the results may be highly unpredictable and difficult to diagnose, so there's a large cost when anyone encounters an actual issue.
@tavianator I'll try to see if there's a way to rework that check into a portable one with minimal downsides; it would then be an easy decision to make. In practical terms, I'd be happy if the code is awkward but compiles down to good assembly with gcc/clang/msvc, like e.g. most memcpy() patterns do.

My previous attempt involved inspecting the IEEE double representation, i.e. looking at the exponent and masking the mantissa based on the exponent. This was slower and larger unfortunately, but maybe I can figure out another way. (There's a similar check for fastint code so maybe I'll be able to adapt that.)

Another easy fix would be an explicit range check - and that would be fine if gcc/clang/msvc had enough information to optimize it away (given that they know how they do the conversion). But I suspect they won't be able to infer this and optimize away the check.
@tavianator Regarding this example:

```c
double x = ...;
int n = (int)x;
if (n != x) {
    // This can only happen if x was out of range, so the compiler might
    // delete this entire block as unreachable!
    report_error("double was out of range!");
}
```

This particular case couldn't happen here, because the comparison is also used and relevant for in-range inputs (and comes out both true and false for them too). But I get your point, and it would of course be easy to write a compiler that specifically sabotaged the out-of-range case, and similar behavior may turn up unexpectedly when doing enough optimizations, combining optimizations, etc. Even in this case, while the branch cannot be eliminated entirely, the taken branch of the if-clause could be compiled with knowledge of the range of the values which incorporates the assumption that the undefined behavior case is not possible. And that might then cause something interesting depending on what's done in the branch.
Maybe something like this would help?
Also I'm not sure if the isnan() case should be handled.
@lieff It wouldn't, because the input may be any IEEE double (overflowing the 64-bit value), and Duktape doesn't generally assume availability of 64-bit types because they're not always available on real targets (and in some cases are available but broken).

Re: NaNs, the simple cast approach (which is undefined behavior) deals with them automatically in the final comparison: regardless of what the integer-converted value is, when it is converted back to a double and compared against the input NaN, the result is always false. Same for infinities.
Hmmm,
then? Still doesn't look too complex.
@lieff That would work as long as fmin() and fmax() have the property that NaN + "something" returns the non-NaN "something" (which should be true for modern compilers but not necessarily legacy ones). To be safe, an explicit NaN check would make the behavior more portable to older platforms.

This approach would be equivalent to the explicit range check I suggested above. But both have the downside of additional footprint and being slower. How much that matters depends on whether the code is a hot path. Anything happening in the bytecode executor, for example, gets hugely amplified, so each cycle added/removed from opcode handling matters.

I'll see if I can adapt a similar check from the fastint code to this purpose. It is faster than a double-integer-double conversion on most platforms I checked, but has the downside of a much larger footprint. It also relies on 64-bit integers now, which are not always available, so I'll need to port the check to 32-bit integers, which should be straightforward.
Casting NaN to integer is already UB: pandas-dev/pandas#12303 |
@lieff Right, but if fmin() and fmax() obey the rule that fmin(NaN, x) and fmax(NaN, x) return the non-NaN argument, your code works because the result of fmin() + fmax() will be one of the non-NaN limits. So a NaN-to-integer coercion wouldn't happen unless fmin() and fmax() didn't obey this policy.
Returning the non-NaN argument (if any) is the standard-specified behaviour of fmin()/fmax(). But you could also define

```c
static double duk_fmin(double x, double y) {
    return x < y ? x : y;
}
```

which propagates the second argument on NaN, and then call it like
Anyway, I'm currently trying to find out if there's a way to get gcc to generate just a single
@tavianator I think a plain range check using double comparisons would be better than fmin() + fmax(). It would automatically evaluate to false for NaNs. But even that check is unnecessary in practical terms, at least for the vast majority of platforms.
Which case is better is compiler-dependent. fmin/fmax may be preferred to remove branches. Using gcc 7 with -O2 -ffast-math it's compiled into:
@lieff I was just about to say the same; sometimes branches are more expensive than extra computation.
@lieff Testing on gcc 7 with -O2 but standard math (not -ffast-math), fmin() + fmax() make actual function calls.
@svaarala Right, -ffast-math is dangerous. One way to bypass that is to use architecture-dependent intrinsic functions directly. It would be something like this:
@lieff Or, like I wrote above, just do

```c
static double duk_fmin(double x, double y) {
    return x < y ? x : y;
}
```

and GCC will optimize it to minsd.
@tavianator That didn't happen in my case unfortunately. The result was explicit comparisons, at least with -O2: https://github.com/svaarala/duktape/blob/master/src-input/duk_util_double.c#L127-L134. |
@saghul Long time, but could you check what your test yields from master? My test run is now clean so I'll close this issue in a while unless something new is reported (and naturally a separate issue should be opened if something pops up later). |
In my case there's no minsd either.
gcc 7.2 compiles it into:
It removes one branch with cmova, but the second branch remains.
This compiles into:
And no branches.
Oh wow, I'd call that a gcc bug. Looks like the explicit constants inhibit the optimization. This does what I expected at least:

```
$ cat foo.c
#include <math.h>
#include <limits.h>

static inline double duk_fmin(double x, double y) {
    return x < y ? x : y;
}
static inline double duk_fmax(double x, double y) {
    return x > y ? x : y;
}

int test(int x, int y, int z) {
    return duk_fmin(duk_fmax(x, y), z);
}
$ gcc -O2 -S foo.c
$ cat foo.s
	.file	"foo.c"
	.text
	.p2align 4,,15
	.globl	test
	.type	test, @function
test:
.LFB5:
	.cfi_startproc
	pxor	%xmm2, %xmm2
	pxor	%xmm0, %xmm0
	pxor	%xmm1, %xmm1
	cvtsi2sd	%esi, %xmm2
	cvtsi2sd	%edi, %xmm0
	cvtsi2sd	%edx, %xmm1
	maxsd	%xmm2, %xmm0
	minsd	%xmm1, %xmm0
	cvttsd2si	%xmm0, %eax
	ret
	.cfi_endproc
.LFE5:
	.size	test, .-test
	.ident	"GCC: (GNU) 7.2.0"
	.section	.note.GNU-stack,"",@progbits
```
True, it does get optimized in that case too.
This reproduces upstream commit de7ae8a2ecc597e1c2024c15dbeae4d28c9f2a2c.

* svaarala/duktape@de7ae8a

It was applied to master after the release of Duktape 2.2.0. See also:

* svaarala/duktape#1783
* svaarala/duktape#812 (comment)
Using Duktape 1.5.0 and GCC 6.1.0 on OSX: