
do not rely on undefined behavior in glfwSetWindowIcon for X11 #1986

Closed

wants to merge 4 commits

Conversation

slimsag (Contributor) commented Oct 31, 2021


While working on [Zig bindings for GLFW](https://github.com/hexops/mach-glfw), @Andoryuuta and I
noticed that `glfwSetWindowIcon` was crashing. I wrote about debugging this and the cause [in an article](https://devlog.hexops.com/2021/perfecting-glfw-for-zig-and-finding-undefined-behavior#finding-lurking-undefined-behavior-in-6-year-old-glfw-code)
but the summary is that when compiling with UBSan (which Zig does by default) clang will insert
`asm { ud1 }` traps when it thinks there is undefined behavior. This code in particular is problematic:

```c
                *target++ = (images[i].pixels[j * 4 + 0] << 16) |
                            (images[i].pixels[j * 4 + 1] <<  8) |
                            (images[i].pixels[j * 4 + 2] <<  0) |
                            (images[i].pixels[j * 4 + 3] << 24);
```

We see in IDA Pro that clang inserted a jump (pictured below) to an `asm { ud1 }` instruction:

![image](https://user-images.githubusercontent.com/3173176/139594073-b2159e4c-6764-44b1-882d-802724f424e8.png)

What is happening here is that:

* `images[i].pixels[j * 4 + 0]` is returning an `unsigned char` (8 bits)
* It is then being shifted left by `<< 16` bits. !!! That's further than an 8-bit number can be shifted left by, hence undefined behavior.

In [an equal snippet of code in Godbolt](https://godbolt.org/z/ddq75WsYK), we can see that without UBSan
clang merely uses the 32-bit EAX register as an optimization. It loads the 8-bit number into the 32-bit
register, and then performs the left shift. Although the shift exceeds 8 bits, it _does not get truncated
to zero_ - instead it is effectively as if the number was converted to a `long` (32 bits) prior to the
left-shift operation.

This explains why nobody has caught this UB in GLFW yet, too: it works by nature of compilers liking to
use 32-bit registers in this context.

So, to fix this, ensure we cast to `long` first before shifting.
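The change can be sketched with a hypothetical helper (the name `packPixel` is mine for illustration, not GLFW's actual code; the real diff modifies the loop shown above in place):

```c
#include <stdint.h>

/* Sketch of the fix as initially proposed in this PR: widen each
 * 8-bit channel to long before shifting, so the shift happens in a
 * type wide enough to hold the result. */
static long packPixel(const unsigned char* p)
{
    return ((long) p[0] << 16) |  /* red   */
           ((long) p[1] <<  8) |  /* green */
           ((long) p[2] <<  0) |  /* blue  */
           ((long) p[3] << 24);   /* alpha */
}
```

(As the discussion below this point shows, the exact cast type was refined over the course of the review.)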

Helps hexops/mach#20

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
slimsag added a commit to hexops/mach that referenced this pull request Oct 31, 2021
slimsag added a commit to hexops/mach-glfw that referenced this pull request Oct 31, 2021
Maato commented Oct 31, 2021

The explanation of why this is undefined behavior is incorrect.

In an expression such as `images[i].pixels[j * 4 + 0] << 16`, the integer promotions are performed on each of the operands. So the left operand, which starts out as `unsigned char`, will get promoted to `int`. Shifting a value known to be at most 255 to the left by 16 bits will not cause undefined behavior in this case.

Instead, the undefined behavior comes from the following expression: `images[i].pixels[j * 4 + 3] << 24`. Again the left operand will be promoted to `int`. If the value happens to be >= 0x80, shifting left by 24 will cause the sign bit of the `int` to be set (or really, the mathematical result of the shift operation will not be representable in the `int`).

The standard says the following (`E1` is the left operand and `E2` is the right operand of the `<<` operator):

> If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

slimsag (Contributor, Author) commented Oct 31, 2021

Oh wow, I totally missed that @Maato, thank you for pointing that out. I should've caught it, too, because I knew `<< 24` was the shift directly before the crash, but I guess I muddied the water and got confused :)

martinhath commented Oct 31, 2021

Wouldn't it be better to cast to `unsigned int` or something similar (like `uint32_t`), since `long` can still be 32 bits? If the size of `long` and `int` is the same, wouldn't the cast as proposed in the PR be semantically identical to the current state?

@elmindreda elmindreda added bug Bug reports and bugfix pull requests X11 labels Oct 31, 2021
dcousens commented Nov 1, 2021

> If the size of `long` and `int` is the same, wouldn't the cast as proposed in the PR be semantically identical to the current state?

```c
// gcc -fsanitize=undefined
#include <stdint.h>
#include <stdio.h>

int main () {
	unsigned char foo = 0x80;
	int x =
		(((int) foo) << 16) |
		(((int) foo) <<  8) |
		(((int) foo) <<  0) |
		(((int) foo) << 24);

	printf("x: %i \n", x);
	return 0;
}
// test.c:10:16: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
//   x: -2139062144
```

Yes.

I wonder if there is a set of warning flags that could have helped with this.
Unfortunately -Wconversion misses the mark.

drfuchs commented Nov 1, 2021

> Wouldn't it be better to cast to `unsigned int` or something similar (like `uint32_t`), since `long` can still be 32 bits? If the size of `long` and `int` is the same, wouldn't the cast as proposed in the PR be semantically identical to the current state?

In case there's a thought of fixing the incorrect "cast to `long`" fix by casting to `long long` instead (which would at least work properly everywhere): another reason that martinhath's suggestion (to cast to `unsigned int`) is the correct solution is that it's potentially more efficient on various architectures and at various optimization levels, with zero extra machine code generated for widening into a 64-bit value, never mind 64-bit shifts being slower in some situations. It also more clearly indicates what the real issue is: shifting a one into the sign bit, which an `unsigned int` doesn't have.
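The point above can be sketched with a hypothetical helper (`packAlpha` is my name for illustration, not GLFW's code): shifting into bit 31 of an unsigned type is well-defined, because unsigned arithmetic wraps modulo 2^32 instead of overflowing.

```c
/* With an unsigned int cast, an alpha byte >= 0x80 shifted left by
 * 24 lands in bit 31 without undefined behavior. */
static unsigned int packAlpha(unsigned char a)
{
    return (unsigned int) a << 24;
}
```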

As suggested by Maato, martinhath, dcousens, and drfuchs.

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
slimsag (Contributor, Author) commented Nov 1, 2021

Pushed a commit to switch to `unsigned int`. I really appreciate the corrections to my understanding and the feedback here, everyone!

Validark commented Nov 2, 2021

To repeat what @Maato said, but in arithmetic terms:

(byte ≥ 2⁷) × 2²⁴ ⟹ (result ≥ 2³¹)

The maximum signed 32-bit integer is 2³¹ − 1, hence you get an overflow.

@elmindreda elmindreda self-assigned this Nov 2, 2021
@elmindreda elmindreda added the verified Reproduced or otherwise verified bugs label Nov 2, 2021
@elmindreda elmindreda added this to the 3.3.6 milestone Nov 2, 2021
@elmindreda elmindreda added this to Todo in Queue Nov 2, 2021
linkmauve (Member) commented Nov 2, 2021

Hi, the proper type for an unsigned 32-bit integer is uint32_t.

Not that it matters much here, because GLFW is unlikely to ever be compiled for a 16-bit CPU, but the C standard only guarantees `unsigned int` to be at least 16 bits wide, which would still make two of these values not representable.

Also note that target is still a long*, which means the conversion from uint32_t to long still happens, and on 32-bit platforms (or platforms using LLP64) the sign bit will still be the most significant bit of the alpha value.

elmindreda (Member) commented

Thank you @slimsag and @Andoryuuta for making and documenting this bug fix, and thank you everyone for helping to refine it!

I agree that `uint32_t` is a more appropriate type to cast to before shifting, even if `unsigned int` is probably okay for anything we will run on.

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
slimsag (Contributor, Author) commented Nov 5, 2021

Appreciate the feedback! I've swapped the cast for `uint32_t` and also changed `target` and `icon` from `long` to `uint32_t`, which, if I understand correctly, should be more correct because `XChangeProperty` accepts a 32-bit format regardless of platform.

Let me know what you think!

linkmauve (Member) commented Nov 6, 2021

Hi, I believe the change from `long` to `uint32_t` is incorrect: Xlib defines the data as a series of `long`, which would be 64 bits on LP64 platforms like Linux.

martinhath commented

> Hi, I believe the change from `long` to `uint32_t` is incorrect: Xlib defines the data as a series of `long`, which would be 64 bits on LP64 platforms like Linux.
I'm not sure this is true; do you have a source for this? Here is, from what I can tell, the spec, and it says (under `_NET_WM_ICON`):

> This is an array of 32bit packed CARDINAL ARGB with high byte being A, low byte being B. The first two cardinals are width, height. Data is in rows, left to right and top to bottom.

Further, this is reflected in the actual call to X that GLFW already does:

```c
XChangeProperty(_glfw.x11.display, window->x11.handle,
                _glfw.x11.NET_WM_ICON,
                XA_CARDINAL, 32,
                PropModeReplace,
                (unsigned char*) icon,
                elements);
```

where we explicitly specify that we're passing in unsigned 32-bit numbers.

I'm no Xpert though, so I might be mistaken :)

slimsag (Contributor, Author) commented Nov 6, 2021

I thought similarly to what @martinhath said above, based on the `XChangeProperty` docs, as we pass 32 for the format, which is defined as:

> Specifies whether the data should be viewed as a list of 8-bit, 16-bit, or 32-bit quantities. Possible values are 8, 16, and 32. This information allows the X server to correctly perform byte-swap operations as necessary. If the format is 16-bit or 32-bit, you must explicitly cast your data pointer to an (unsigned char *) in the call to XChangeProperty().

And because the signature takes `unsigned char*`, not `long*`.

However, looking into it more, it seems that Xlib casts the pointer back to `long*` and passes it to this `Data32` function internally, right here:

```c
Data32 (dpy, (_Xconst long *) data, len);
```

And on LP64 platforms that ends up defined as a function which takes `long* data` and converts the 64-bit `long`s to 32-bit data:

https://sourcegraph.com/github.com/mirror/libX11@2356e59/-/blob/src/XlibInt.c?L1670-1697

So it would seem the Xlib docs are wrong: the function does take `long*` data, despite the signature being `unsigned char* data` and despite it claiming to take 32-bit data.
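The behavior described above can be sketched as follows (this is my illustration of what the linked `Data32` does, not libX11's actual code): on platforms where `long` is 64 bits, Xlib walks an array of `long`s and keeps only the low 32 bits of each element.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of 64-bit-long narrowing: each array element is truncated
 * to its low 32 bits before going onto the wire. */
static void narrowLongsTo32(const long* data, uint32_t* out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (uint32_t) data[i]; /* keep low 32 bits, drop the rest */
}
```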

Will update soon.

…pectations

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
slimsag added a commit to hexops/mach that referenced this pull request Nov 16, 2021
See glfw/glfw#1986

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
slimsag (Contributor, Author) commented Nov 16, 2021

OK, I believe this is finally good to go: we now cast to `unsigned long`. The `unsigned` avoids the UBSan complaint about overflowing into the sign bit, while `long` matches the internal expectations of Xlib, which treats the pointer as a `long` (a 64-bit integer on LP64 platforms, even though the data is only 32-bit).

I have tested with UBSan as well and this passes.
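The final approach can be sketched with a hypothetical helper (`packPixelFinal` is my name for illustration; the real change edits the conversion loop in `x11_window.c`):

```c
/* Sketch of the final fix: cast each channel to unsigned long before
 * shifting. "unsigned" makes the << 24 shift well-defined even when
 * the alpha byte is >= 0x80, and "long" matches the element type
 * Xlib expects for format-32 property data. */
static unsigned long packPixelFinal(const unsigned char* p)
{
    return ((unsigned long) p[0] << 16) |
           ((unsigned long) p[1] <<  8) |
           ((unsigned long) p[2] <<  0) |
           ((unsigned long) p[3] << 24);
}
```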

slimsag added a commit to hexops/mach-glfw that referenced this pull request Nov 16, 2021
See glfw/glfw#1986

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
@elmindreda elmindreda moved this from Todo to Now in Queue Nov 26, 2021
@elmindreda elmindreda closed this in 9cd4d2f Dec 1, 2021
elmindreda (Member) commented

Thank you everyone for the fix!

elmindreda pushed a commit that referenced this pull request Dec 9, 2021
The conversion of window icon image data involves unsigned char color
values being promoted to int and then shifted to the left by 24.  For
32-bit ints this is just far enough to trigger undefined behavior.

It worked by accident because of how current compilers translate this
piece of code.

This was caught by @slimsag while working on [Zig bindings for GLFW][1],
and diagnosed together with @Andoryuuta, as described [in an
article][2].  Zig has UBSan enabled by default, which caught this
undefined behavior.

[1]: https://github.com/hexops/mach-glfw
[2]: https://devlog.hexops.com/2021/perfecting-glfw-for-zig-and-finding-undefined-behavior#finding-lurking-undefined-behavior-in-6-year-old-glfw-code

Thanks to Maato, martinhath, dcousens, drfuchs and Validark for helping
to refine the solution.

This commit message was rewritten by @elmindreda to hopefully reflect
the conclusions of the pull request thread.

Related to hexops/mach#20
Closes #1986

(cherry picked from commit 9cd4d2f)
swarnimarun pushed a commit to swarnimarun/glfw-meson that referenced this pull request Jul 9, 2022
(cherry picked from commit 9cd4d2f)
slimsag added a commit to hexops/mach-glfw that referenced this pull request Aug 26, 2022
slimsag added a commit to hexops/mach-glfw that referenced this pull request Aug 26, 2022
See glfw/glfw#1986

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>