
do not rely on undefined behavior in glfwSetWindowIcon for X11 #1986

Closed

wants to merge 4 commits

Conversation

slimsag (Contributor) commented Oct 31, 2021


While working on [Zig bindings for GLFW](https://github.com/hexops/mach-glfw), @Andoryuuta and I
noticed that `glfwSetWindowIcon` was crashing. I wrote about debugging this and the cause [in an article](https://devlog.hexops.com/2021/perfecting-glfw-for-zig-and-finding-undefined-behavior#finding-lurking-undefined-behavior-in-6-year-old-glfw-code)
but the summary is that when compiling with UBSan (which Zig does by default) clang will insert
`asm { ud1 }` traps when it thinks there is undefined behavior. This code in particular is problematic:

```c
                *target++ = (images[i].pixels[j * 4 + 0] << 16) |
                            (images[i].pixels[j * 4 + 1] <<  8) |
                            (images[i].pixels[j * 4 + 2] <<  0) |
                            (images[i].pixels[j * 4 + 3] << 24);
```

We see in IDA Pro that clang inserted a jump (pictured below) to an `asm { ud1 }` instruction:

![image](https://user-images.githubusercontent.com/3173176/139594073-b2159e4c-6764-44b1-882d-802724f424e8.png)

What is happening here is that:

* `images[i].pixels[j * 4 + 0]` is returning an `unsigned char` (8 bits)
* It is then being shifted left by `<< 16` bits. !!! That's further than an 8-bit number can be shifted left by, hence undefined behavior.

In [an equal snippet of code in Godbolt](https://godbolt.org/z/ddq75WsYK), we can see that without UBSan
clang merely uses the 32-bit EAX register as an optimization. It loads the 8-bit number into the 32-bit
register, and then performs the left shift. Although the shift exceeds 8 bits, it _does not get truncated
to zero_ - instead it is effectively as if the number was converted to a `long` (32 bits) prior to the
left-shift operation.

This explains why nobody has caught this UB in GLFW yet, too: it works by nature of compilers liking to
use 32-bit registers in this context.

So, to fix this, ensure we cast to `long` first before shifting.
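The change can be sketched with a hypothetical helper (the name `packPixel` is mine for illustration, not GLFW's actual code; the real diff modifies the loop shown above in place):

```c
#include <stdint.h>

/* Sketch of the fix as initially proposed in this PR: widen each
 * 8-bit channel to long before shifting, so the shift happens in a
 * type wide enough to hold the result. */
static long packPixel(const unsigned char* p)
{
    return ((long) p[0] << 16) |  /* red   */
           ((long) p[1] <<  8) |  /* green */
           ((long) p[2] <<  0) |  /* blue  */
           ((long) p[3] << 24);   /* alpha */
}
```

(As the discussion below this point shows, the exact cast type was refined over the course of the review.)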

Helps hexops/mach#20

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
slimsag added a commit to hexops/mach that referenced this pull request Oct 31, 2021
slimsag added a commit to hexops/mach-glfw that referenced this pull request Oct 31, 2021
Maato commented Oct 31, 2021

The explanation of why this is undefined behavior is incorrect.

In an expression such as `images[i].pixels[j * 4 + 0] << 16`, the integer promotions are performed on each of the operands. So the left operand, which starts out as `unsigned char`, will get promoted to `int`. Shifting a value known to be at most 255 to the left by 16 bits will not cause undefined behavior in this case.

Instead, the undefined behavior comes from the following expression: `images[i].pixels[j * 4 + 3] << 24`. Again the left operand will be promoted to `int`. If the value happens to be >= 0x80, shifting left by 24 will cause the sign bit of the `int` to be set (or really, the mathematical result of the shift operation will not be representable in the `int`).

The standard says the following (`E1` is the left operand and `E2` is the right operand of the `<<` operator):

> If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

slimsag (Contributor, Author) commented Oct 31, 2021

Oh wow, I totally missed that @Maato, thank you for pointing that out. I should've caught it, too, because I knew `<< 24` was the shift directly before the crash, but I guess I muddied the water and got confused :)

martinhath commented Oct 31, 2021

Wouldn't it be better to cast to `unsigned int` or something similar (like `uint32_t`), since `long` can still be 32 bits? If the size of `long` and `int` is the same, wouldn't the cast as proposed in the PR be semantically identical to the current state?

@elmindreda elmindreda added bug Bug reports and bugfix pull requests X11 labels Oct 31, 2021
dcousens commented Nov 1, 2021

> If the size of `long` and `int` is the same, wouldn't the cast as proposed in the PR be semantically identical to the current state?

```c
// gcc -fsanitize=undefined
#include <stdint.h>
#include <stdio.h>

int main () {
	unsigned char foo = 0x80;
	int x =
		(((int) foo) << 16) |
		(((int) foo) <<  8) |
		(((int) foo) <<  0) |
		(((int) foo) << 24);

	printf("x: %i \n", x);
	return 0;
}
// test.c:10:16: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
//   x: -2139062144
```

Yes.

I wonder if there is a set of warning flags that could have helped with this.
Unfortunately -Wconversion misses the mark.

drfuchs commented Nov 1, 2021

> Wouldn't it be better to cast to `unsigned int` or something similar (like `uint32_t`), since `long` can still be 32 bits? If the size of `long` and `int` is the same, wouldn't the cast as proposed in the PR be semantically identical to the current state?

In case there's a thought of fixing the incorrect "cast to `long`" fix by casting to `long long` instead (which would at least work properly everywhere): another reason that martinhath's suggestion (to cast to `unsigned int`) is the correct solution is that it's potentially more efficient on various architectures and at various optimization levels, with zero extra machine code generated for widening into a 64-bit value, never mind 64-bit shifts being slower in some situations. It also more clearly indicates what the real issue is: shifting a one into the sign bit, which an `unsigned int` doesn't have.
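The point above can be sketched with a hypothetical helper (`packAlpha` is my name for illustration, not GLFW's code): shifting into bit 31 of an unsigned type is well-defined, because unsigned arithmetic wraps modulo 2^32 instead of overflowing.

```c
/* With an unsigned int cast, an alpha byte >= 0x80 shifted left by
 * 24 lands in bit 31 without undefined behavior. */
static unsigned int packAlpha(unsigned char a)
{
    return (unsigned int) a << 24;
}
```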

As suggested by Maato, martinhath, dcousens, and drfuchs.

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
slimsag (Contributor, Author) commented Nov 1, 2021

Pushed a commit to switch to `unsigned int`. I really appreciate the corrections to my understanding and the feedback here, everyone!

Validark commented Nov 2, 2021

To repeat what @Maato said, but in arithmetic terms:

(byte ≥ 2⁷) × 2²⁴ ⟹ (result ≥ 2³¹)

The maximum signed 32-bit integer is 2³¹ − 1, hence you get an overflow.

@elmindreda elmindreda self-assigned this Nov 2, 2021
@elmindreda elmindreda added the verified Reproduced or otherwise verified bugs label Nov 2, 2021
@elmindreda elmindreda added this to the 3.3.6 milestone Nov 2, 2021
@elmindreda elmindreda added this to Todo in Queue Nov 2, 2021
linkmauve (Member) commented Nov 2, 2021

Hi, the proper type for an unsigned 32-bit integer is uint32_t.

Not that it matters much here, because GLFW is unlikely to ever be compiled for a 16-bit CPU, but the C standard only guarantees `unsigned int` to be at least 16 bits wide, which would still make two of these values not representable.

Also note that target is still a long*, which means the conversion from uint32_t to long still happens, and on 32-bit platforms (or platforms using LLP64) the sign bit will still be the most significant bit of the alpha value.

elmindreda (Member) commented

Thank you @slimsag and @Andoryuuta for making and documenting this bug fix, and thank you everyone for helping to refine it!

I agree that `uint32_t` is a more appropriate type to cast to before shifting, even if `unsigned int` is probably okay for anything we will run on.

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
slimsag (Contributor, Author) commented Nov 5, 2021

Appreciate the feedback! I've swapped the cast for `uint32_t` and also changed `target` and `icon` from `long` to `uint32_t`, which, if I understand correctly, should be more correct because `XChangeProperty` accepts a 32-bit format regardless of platform.

Let me know what you think!

linkmauve (Member) commented Nov 6, 2021

Hi, I believe the change from `long` to `uint32_t` is incorrect: Xlib defines the data as a series of `long`, which would be 64 bits on LP64 platforms like Linux.

martinhath commented

> Hi, I believe the change from `long` to `uint32_t` is incorrect: Xlib defines the data as a series of `long`, which would be 64 bits on LP64 platforms like Linux.
I'm not sure this is true; do you have a source for this? Here is, from what I can tell, the spec, and it says (under `_NET_WM_ICON`):

> This is an array of 32bit packed CARDINAL ARGB with high byte being A, low byte being B. The first two cardinals are width, height. Data is in rows, left to right and top to bottom.

Further, this is reflected in the actual call to X that GLFW already does:

```c
XChangeProperty(_glfw.x11.display, window->x11.handle,
                _glfw.x11.NET_WM_ICON,
                XA_CARDINAL, 32,
                PropModeReplace,
                (unsigned char*) icon,
                elements);
```

where we explicitly specify that we're passing in unsigned 32-bit numbers.

I'm no Xpert though, so I might be mistaken :)

slimsag (Contributor, Author) commented Nov 6, 2021

I thought similarly to what @martinhath said above, based on the `XChangeProperty` docs, as we pass 32 for the format, which is defined as:

> Specifies whether the data should be viewed as a list of 8-bit, 16-bit, or 32-bit quantities. Possible values are 8, 16, and 32. This information allows the X server to correctly perform byte-swap operations as necessary. If the format is 16-bit or 32-bit, you must explicitly cast your data pointer to an (unsigned char *) in the call to XChangeProperty().

And because the signature takes `unsigned char*`, not `long*`.

However, looking into it more, it seems that Xlib casts the pointer back to `long*` and passes it to this `Data32` function internally, right here:

```c
Data32 (dpy, (_Xconst long *) data, len);
```

And on LP64 platforms that ends up defined as a function which takes `long* data` and converts the 64-bit `long`s to 32-bit data:

https://sourcegraph.com/github.com/mirror/libX11@2356e59/-/blob/src/XlibInt.c?L1670-1697

So it would seem the Xlib docs are wrong: the function does take `long*` data, despite the signature being `unsigned char* data` and despite it claiming to take 32-bit data.
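The behavior described above can be sketched as follows (this is my illustration of what the linked `Data32` does, not libX11's actual code): on platforms where `long` is 64 bits, Xlib walks an array of `long`s and keeps only the low 32 bits of each element.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of 64-bit-long narrowing: each array element is truncated
 * to its low 32 bits before going onto the wire. */
static void narrowLongsTo32(const long* data, uint32_t* out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (uint32_t) data[i]; /* keep low 32 bits, drop the rest */
}
```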

Will update soon.

…pectations

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
slimsag added a commit to hexops/mach that referenced this pull request Nov 16, 2021
See glfw/glfw#1986

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
slimsag (Contributor, Author) commented Nov 16, 2021

OK, I believe this is finally good to go: we now cast to `unsigned long`. The `unsigned` avoids the UBSan complaint about overflowing into the sign bit, while `long` matches the internal expectations of Xlib, which treats the pointer as a `long` (a 64-bit integer on LP64 platforms, even though the data is only 32-bit).

I have tested with UBSan as well and this passes.
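The final approach can be sketched with a hypothetical helper (`packPixelFinal` is my name for illustration; the real change edits the conversion loop in `x11_window.c`):

```c
/* Sketch of the final fix: cast each channel to unsigned long before
 * shifting. "unsigned" makes the << 24 shift well-defined even when
 * the alpha byte is >= 0x80, and "long" matches the element type
 * Xlib expects for format-32 property data. */
static unsigned long packPixelFinal(const unsigned char* p)
{
    return ((unsigned long) p[0] << 16) |
           ((unsigned long) p[1] <<  8) |
           ((unsigned long) p[2] <<  0) |
           ((unsigned long) p[3] << 24);
}
```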

slimsag added a commit to hexops/mach-glfw that referenced this pull request Nov 16, 2021
See glfw/glfw#1986

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
@elmindreda elmindreda moved this from Todo to Now in Queue Nov 26, 2021
@elmindreda elmindreda closed this in 9cd4d2f Dec 1, 2021
elmindreda (Member) commented

Thank you everyone for the fix!

elmindreda pushed a commit that referenced this pull request Dec 9, 2021
The conversion of window icon image data involves unsigned char color
values being promoted to int and then shifted to the left by 24.  For
32-bit ints this is just far enough to trigger undefined behavior.

It worked by accident because of how current compilers translate this
piece of code.

This was caught by @slimsag while working on [Zig bindings for GLFW][1],
and diagnosed together with @Andoryuuta, as described [in an
article][2].  Zig has UBSan enabled by default, which caught this
undefined behavior.

[1]: https://github.com/hexops/mach-glfw
[2]: https://devlog.hexops.com/2021/perfecting-glfw-for-zig-and-finding-undefined-behavior#finding-lurking-undefined-behavior-in-6-year-old-glfw-code

Thanks to Maato, martinhath, dcousens, drfuchs and Validark for helping
to refine the solution.

This commit message was rewritten by @elmindreda to hopefully reflect
the conclusions of the pull request thread.

Related to hexops/mach#20
Closes #1986

(cherry picked from commit 9cd4d2f)
swarnimarun pushed a commit to swarnimarun/glfw-meson that referenced this pull request Jul 9, 2022
(cherry picked from commit 9cd4d2f)
slimsag added a commit to hexops/mach-glfw that referenced this pull request Aug 26, 2022
slimsag added a commit to hexops/mach-glfw that referenced this pull request Aug 26, 2022
See glfw/glfw#1986

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>