do not rely on undefined behavior in glfwSetWindowIcon for X11 #1986
While working on [Zig bindings for GLFW](https://github.com/hexops/mach-glfw) me and @Andoryuuta noticed that `glfwSetWindowIcon` was crashing. I wrote about debugging this and the cause [in an article](https://devlog.hexops.com/2021/perfecting-glfw-for-zig-and-finding-undefined-behavior#finding-lurking-undefined-behavior-in-6-year-old-glfw-code) but the summary is that when compiling with UBSan (which Zig does by default) clang will insert `asm { ud1 }` traps when it thinks there is undefined behavior. This code in particular is problematic:

```c
*target++ = (images[i].pixels[j * 4 + 0] << 16) |
            (images[i].pixels[j * 4 + 1] <<  8) |
            (images[i].pixels[j * 4 + 2] <<  0) |
            (images[i].pixels[j * 4 + 3] << 24);
```

We see in IDA Pro that clang inserted a jump (pictured below) to an `asm { ud1 }` instruction:

![image](https://user-images.githubusercontent.com/3173176/139594073-b2159e4c-6764-44b1-882d-802724f424e8.png)

What is happening here is that:

* `images[i].pixels[j * 4 + 0]` is returning an `unsigned char` (8 bits)
* It is then being shifted left by `<< 16` bits. !!! That's further than an 8-bit number can be shifted left by, hence undefined behavior.

In [an equal snippet of code in Godbolt](https://godbolt.org/z/ddq75WsYK), we can see that without UBSan clang merely uses the 32-bit EAX register as an optimization. It loads the 8-bit number into the 32-bit register, and then performs the left shift. Although the shift exceeds 8 bits, it _does not get truncated to zero_ - instead it is effectively as if the number was converted to a `long` (32 bits) prior to the left-shift operation.

This explains why nobody has caught this UB in GLFW yet, too: it works by nature of compilers liking to use 32-bit registers in this context.

So, to fix this, ensure we cast to `long` first before shifting.

Helps hexops/mach#20

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
Upstream pull request: glfw/glfw#1986 Article: https://devlog.hexops.com/2021/perfecting-glfw-for-zig-and-finding-undefined-behavior Fixes #20 Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
Upstream pull request: glfw/glfw#1986 Article: https://devlog.hexops.com/2021/perfecting-glfw-for-zig-and-finding-undefined-behavior Fixes hexops/mach#20 Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
The explanation of why this is undefined behavior is incorrect. In an expression such as this: Instead the undefined behavior comes from the following expression: The standard says the following (E1 is the left operand and E2 is the right operand of the << operator):
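The correction above can be checked with a small sketch. It assumes a platform where `int` is 32 bits, and the helper name `shift_fits_in_int` is hypothetical, not part of GLFW. An `unsigned char` operand is promoted to `int` before the shift, so the only question is whether the shifted value is still representable in `int`:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GLFW: reports whether `value << shift`
 * is representable in int once `value` has been promoted from unsigned
 * char to int. The check is done in long long (at least 64 bits) so that
 * it cannot overflow itself. */
static bool shift_fits_in_int(unsigned char value, int shift)
{
    assert(shift >= 0 && shift < 32); /* precondition of this sketch */

    long long widened = (long long) value << shift;
    return widened <= INT_MAX;
}
```

Under that assumption, `shift_fits_in_int(0x80, 16)` is true, while `shift_fits_in_int(0x80, 24)` is false: the shift by 16 is fine, and only the shift by 24 of a value with its top bit set leaves the range of `int`.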
Oh wow, I totally missed that @Maato - thank you for pointing that out. I should've caught that, too, because I knew |
Wouldn't it be better to cast to |
```c
// gcc -fsanitize=undefined
#include <stdint.h>
#include <stdio.h>
int main () {
  unsigned char foo = 0x80;
  int x =
  (((int) foo) << 16) |
  (((int) foo) << 8) |
  (((int) foo) << 0) |
  (((int) foo) << 24);
  printf("x: %i \n", x);
  return 0;
}
// test.c:10:16: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
// x: -2139062144
```

Yes I wonder if there is a set of warning flags that could have helped with this.
In case anyone is thinking of replacing the incorrect "cast to long" fix with a cast to "long long" (which would at least work properly everywhere): another reason martinhath's suggestion (cast to unsigned int) is the correct solution is that it is potentially more efficient on various architectures and at various optimization levels. No extra machine code is generated to widen to a 64-bit value, never mind that 64-bit shifts are slower in some situations. It also indicates more clearly what the real issue is: shifting a one into the sign bit, which an unsigned int doesn't have.
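As a rough illustration of the unsigned approach described above, the following sketch packs four 8-bit channels into one 32-bit ARGB value. The function name `pack_argb` is made up for this example and is not GLFW's code. The sketch assumes `unsigned int` is at least 32 bits wide, which the `static_assert` checks at compile time.

```c
#include <assert.h>
#include <limits.h>

/* Assumption of this sketch: unsigned int has at least 32 bits. */
static_assert(UINT_MAX >= 0xFFFFFFFFu, "unsigned int must be at least 32 bits");

/* Hypothetical helper, not GLFW's actual code. Each channel is converted
 * to unsigned int before shifting, so the shift is performed in an
 * unsigned type that has no sign bit to overflow into. */
static unsigned int pack_argb(unsigned char r, unsigned char g,
                              unsigned char b, unsigned char a)
{
    return ((unsigned int) r << 16) |
           ((unsigned int) g <<  8) |
           ((unsigned int) b <<  0) |
           ((unsigned int) a << 24);
}
```

With every channel set to `0x80`, the same input as the UBSan test earlier in the thread, `pack_argb` returns `0x80808080` and the shift by 24 is well defined.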
As suggested by Maato, martinhath, dcousens, and drfuchs. Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
Pushed a commit to switch to |
To repeat what @Maato said but in arithmetic terms: The maximum signed 32-bit integer is 2³¹ - 1, hence you get an overflow.
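Written out for the case reported by UBSan earlier in the thread (a channel value of 128 shifted left by 24, assuming a 32-bit `int`), the arithmetic is:

$$
128 \cdot 2^{24} = 2^{7} \cdot 2^{24} = 2^{31} > 2^{31} - 1 = \mathtt{INT\_MAX}
$$

Any channel value of 128 or more therefore overflows, while values up to 127 stay in range, since $127 \cdot 2^{24} = 2^{31} - 2^{24} < 2^{31} - 1$.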
Hi, the proper type for an unsigned 32-bit integer is Not that it matters much here because GLFW is unlikely to ever be compiled for a 16-bit CPU, but the C standard only guarantees Also note that |
Thank you @slimsag and @Andoryuuta for making and documenting this bug fix, and thank you everyone for helping to refine it! I agree that |
Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
Appreciate the feedback, I've swapped the cast for Let me know what you think! |
Hi, I believe the change from |
I'm not sure this is true, do you have a source for this? Here is, from what I can tell, the spec, and it says (under
Further, this is reflected in the actual call to X that GLFW already does:

```c
XChangeProperty(_glfw.x11.display, window->x11.handle,
                _glfw.x11.NET_WM_ICON,
                XA_CARDINAL, 32,
                PropModeReplace,
                (unsigned char*) icon,
                elements);
```

where we explicitly specify that we're passing in

I'm no Xpert though, so I might be mistaken :)
I thought similar to what @martinhath said above based on the
And because the signature takes However, looking into it more it seems like Xlib casts the pointer back to long*, and passes it to this
And on IL64 platforms that ends up defined as a function which takes https://sourcegraph.com/github.com/mirror/libX11@2356e59/-/blob/src/XlibInt.c?L1670-1697 So it would seem the Xlib docs are wrong, the function does take Will update soon. |
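To make the point about element width concrete, here is a sketch of filling a buffer for a format-32 property when Xlib reads it as an array of `long`. It is not GLFW's actual code: `fill_icon_buffer` is a hypothetical name, and the layout (width, height, then one packed ARGB value per pixel) is an assumption taken from the EWMH description of `_NET_WM_ICON`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GLFW's actual code. Writes width, height and
 * then one packed ARGB value per pixel into `out`, which must have room
 * for 2 + width * height elements. The elements are unsigned long because
 * Xlib reads format-32 property data as an array of long, which may be
 * wider than 32 bits on 64-bit platforms. */
static size_t fill_icon_buffer(const unsigned char* rgba,
                               int width, int height,
                               unsigned long* out)
{
    assert(rgba != NULL && out != NULL);
    assert(width > 0 && height > 0);

    unsigned long* target = out;
    *target++ = (unsigned long) width;
    *target++ = (unsigned long) height;

    const size_t count = (size_t) width * (size_t) height;
    for (size_t j = 0; j < count; j++)
    {
        /* Cast before shifting so each shift happens in an unsigned type. */
        *target++ = ((unsigned long) rgba[j * 4 + 0] << 16) |
                    ((unsigned long) rgba[j * 4 + 1] <<  8) |
                    ((unsigned long) rgba[j * 4 + 2] <<  0) |
                    ((unsigned long) rgba[j * 4 + 3] << 24);
    }

    return (size_t) (target - out); /* number of elements written */
}
```

The returned count would correspond to the `elements` argument in the `XChangeProperty` call quoted earlier, and the buffer itself to `icon`.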
…pectations Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
See glfw/glfw#1986 Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
OK, I believe this is finally good to go, we now cast to I have tested with UBSan as well and this passes. |
Thank you everyone for the fix!
The conversion of window icon image data involves unsigned char color values being promoted to int and then shifted to the left by 24. For 32-bit ints this is just far enough to trigger undefined behavior.

It worked by accident because of how current compilers translate this piece of code.

This was caught by @slimsag while working on [Zig bindings for GLFW][1], and diagnosed together with @Andoryuuta, as described [in an article][2].

Zig has UBSan enabled by default, which caught this undefined behavior.

[1]: https://github.com/hexops/mach-glfw
[2]: https://devlog.hexops.com/2021/perfecting-glfw-for-zig-and-finding-undefined-behavior#finding-lurking-undefined-behavior-in-6-year-old-glfw-code

Thanks to Maato, martinhath, dcousens, drfuchs and Validark for helping to refine the solution.

This commit message was rewritten by @elmindreda to hopefully reflect the conclusions of the pull request thread.

Related to hexops/mach#20
Closes #1986

(cherry picked from commit 9cd4d2f)