Skip to content

There's a fix for some of that pesky non-standard undefined behaviour in here!#2942

Closed
Sebbyastian wants to merge 7 commits intophp:masterfrom
Sebbyastian:patch-7
Closed

There's a fix for some of that pesky non-standard undefined behaviour in here!#2942
Sebbyastian wants to merge 7 commits intophp:masterfrom
Sebbyastian:patch-7

Conversation

@Sebbyastian
Copy link
Copy Markdown
Contributor

At a1ff07a#diff-3c2982bb766210ce070c2eb9f6f7e3d6L73 there is no guarantee that size_t will occupy four bytes. This commit contains various refactoring to reduce type-related warnings and non-standard undefined behaviour.

When you use a standardised language which doesn't specify things like integer representation, you need to craft the integer representation which you intend to send over the wire, or else machines might get confused speaking different endians ;)

Additionally, there's the alignment requirements of `int`. An `int` must have an address that is a multiple of a certain number, or the memory and the bus don't line up, you get a bus error which, as I recall, would lead to SIGSEGV/SIGBUS on PowerPC. Typically, however, you'll just suffer a performance loss because Intel is friendly and the processor will have to fetch the remainder of the integer value over the bus (two fetches for one value, seem nice?).

This was the first real headache I got from undefined behaviour. Such is the burden of using a standardised language which has undefined behaviour... sometimes issues like this can take _hours, days, weeks or months_ to debug, might even never come to the surface. In my case I think it took a few weeks to diagnose, as I had to go back and forth between school (where the bug was present) and home (where it wasn't).

I assume you want a little endian, 32-bit integer, as that's what's common nowadays. Out of curiousity, how often is it that you cast `char` arrays to non-`char` like this?

I'll also be refactoring `phpdbg_webdata_compress` to eliminate [this CI warning](https://ci.appveyor.com/project/php/php-src/build/master.build.4823/job/7e7p4rrvbhx5cj7y#L515) (among other similar ones, I'm sure), as that was my original intention here.
Some more `s/int/size_t` where appropriate (i.e. where you don't want the processor overhead of sign processing, because negative values won't make sense anyway)
Update phpdbg_webdata_transfer.h
size_t len = strlen(PHPDBG_WG(path)) + sizeof(sock.sun_family);
char buf[(1 << 8) + 1];
int buflen;
size_t buflen;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an ssize_t, as it is used for the return value of recv(). Otherwise the error check will fail.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What error check?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, you're right. The infinite loop is likely the cause of CI failure. Ta :)

send(s, (unsigned char[]){ msglen
, msglen / 0x100
, msglen / 0x10000
, msglen / 0x1000000 }, 4, 0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a genuine bug in the original code, in that it causes a dependence on the machine endianness, which I presume was not intended.

@bwoebi @krakjoe I'm not familiar with where this protocol is used. Is it correct that the length should be encoded as little-endian?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, this should be moved into an explicit declaration, as compound literals are not C89. Not that this file appears to conform to C89 anyway, I guess it never came up due to the WIN32 guard.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic The most common representation of integers is little endian, and that's the native x86 architecture. I'm about 99.9% certain that's what was intended. It usually is someone thinking they're doing something clever by manually optimising what our compilers could've easily done even fifteen or twenty years ago.

It's probably a good idea to extract that serialisation logic into a function to be reused anywhere else such a serialisation might be used (and a deserialisation function, which is the other part of the bug I haven't checked out, should also be written). After all, that's the reason we choose named objects (i.e. variables) rather than unnamed ones (i.e. my compound literal), right? Because we're going to reuse that identifier later? Otherwise, the identifier we're only going to use once is just noisy... I mean, if you're going to multiply by some literal value, you don't necessarily want to have to assign that literal value to a variable first, right? I think the patch I just committed should be a reasonable compromise, for now.

I'd suggest an interface similar to snprintf and sscanf, for the sake of familiarity. The return value being the number of bytes written in the case of the print function, and the number of items assigned to in the case of the scan function. The only necessary difference is that %d etc parse/create a little endian base 256 representation, as opposed to the big endian base 10 representation which the analogous standard functions typically output. It's a shame you can't use C11, because there are some generics available which might allow developers to avoid that pesky format string which seems to become out of sync with its parameters so easily... It's an even bigger shame you can't use C++, because generics are even nicer in C++. We could work towards that by writing one small step at a time, as Rasmus did, except following the specs to ensure we make it as familiar as possible.

I'm honestly getting sick of one-line functions which merely wrap a macro. There's only one justification for all of this... premature optimisation... Eh, once this is done, I'm outie. I'll pop some food for thought in this review later today, so keep an eye on this space :)

{
#if 0
int n;
#if 0 // can we have a discussion about this `#if 0`?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should drop this comment and start the discussion with a comment in this PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "we", in this case, I think you know where I stand, so I'm going to sit out of the conversation. If you're going to talk about things you should drop, to be clear, it's the entire block of code that's being unconditionally excluded there that should be dropped, not just the comment...

Nonetheless, I'm no longer interested in PHP. I wasn't interested in the first place. I fully regret all of my contributions. They're likely to taint my name. I'm tempted to just delete it all, based on the way this is all dragging me back constantly. I want to work on things that aren't... unportable...

@krakjoe
Copy link
Copy Markdown
Member

krakjoe commented Dec 11, 2017

@bwoebi can you extract the reasonable parts of this PR please.

@nikic
Copy link
Copy Markdown
Member

nikic commented Dec 16, 2017

Sorry for the delay, phpdbg portion now merged as 168c6cd. Thanks!

@nikic nikic closed this Dec 16, 2017
@Sebbyastian
Copy link
Copy Markdown
Contributor Author

Sebbyastian commented Dec 17, 2017

You may think this solves the entire problem, but you are wrong. In fact much of the problem in this code still exists, and a whole new problem has emerged. There are a number of quotes to the right of Rasmus Lerdorf which exemplify the attitude here. I care not for any project which sets this as an example, so I actually look forward to PHP burning in fire at the moment...

php

The new problem: This took so long for you to merge, I've decided if I ever find a vulnerability in PHP (which is an activity equivalent to sifting through pig shit for pennies, which makes the fact that there are so many of these kinds of mistakes in PHP even worse), I'll be going straight to the public with it. It'll get fixed quicker, that way.

@Majkl578
Copy link
Copy Markdown
Contributor

Come on, 20 days is too long? There are pull requests waiting even for initial review for years, not just here, everywhere on GitHub. 😆

@Sebbyastian
Copy link
Copy Markdown
Contributor Author

Sebbyastian commented Dec 17, 2017

@nikic I would like to talk about the way you've factored in this patch...

https://github.com/php/php-src/blob/master/sapi/phpdbg/phpdbg_rinit_hook.c#L73

Those shifts introduce undefined behaviour.

C11/6.5.7p3 states:

If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

You should change the operation back to a multiplicative operator (what I suggested) in order to avoid all unnecessary unportable/undefined behaviour (which is why I chose the multiplicatives, you fuckin' fruitloops!)

@Sebbyastian
Copy link
Copy Markdown
Contributor Author

@Majkl578 I'm sure you'd be even happier if heartbleed went TWENTY years without coming to resolution...

@Majkl578
Copy link
Copy Markdown
Contributor

Majkl578 commented Dec 17, 2017

Does this fix equal to heartbleed's severity? Was it submitted as security issue? I really doubt so. 95+% of PHP users don't even know what phpdbg is and how to use it, plus it's a debug tool, not production runtime.
Don't expect everyone to react in timely maner. This is not how open source works. Open source is mostly developed by people for passion, in their free time, with no reward. This is how PHP works and how most of of the popular PHP projects work.

@petk
Copy link
Copy Markdown
Member

petk commented Dec 17, 2017

Just a friendly reminder: http://opensoul.org/99ways/

I would suggest to find the other ways that work...

@DaveRandom
Copy link
Copy Markdown
Contributor

This took so long for you to merge, I've decided if I ever find a vulnerability in PHP [...] I'll be going straight to the public with it.

Get off your high horse. 20 days is not a long time in the first place, first response was 2 days, and despite you being rude and unpleasant throughout the review process some of your work was merged anyway.

The review process seems to outrage you, "I'm fixing your mistakes, how dare you question me", your entire attitude stinks, you are behaving as if some critical security patch was ignored for months instead of what this is - a patch silencing some compiler warnings/fixing theoretical problems which weren't actually causing any problems in the real world (where are the bug refs?), responded to in a timely fashion given the content.

If you find an actual security issue in something as widely distributed as PHP and report it by just publishing it - or indeed, by opening a PR on github - have fun explaining your behaviour to the security research community. If it's not a security issue and it isn't causing real world problems, who exactly are you expecting to care?

It'll get fixed quicker, that way.

By who? Three of the most active contributors (at the moment) responded to this non-critical patch against an open-source project, one of them within 72 hours. Who exactly is going to perform this quicker fix?

Grow up, and come back when you can behave like an adult.

@nikic
Copy link
Copy Markdown
Member

nikic commented Dec 17, 2017

https://github.com/php/php-src/blob/master/sapi/phpdbg/phpdbg_rinit_hook.c#L73

Those shifts introduce undefined behaviour.

C11/6.5.7p3 states:

If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

You should change the operation back to a multiplicative operator (what I suggested) in order to avoid all unnecessary unportable/undefined behaviour (which is why I chose the multiplicatives, you fuckin' fruitloops!)

Undefined behavior does not exist in a vacuum, it also depends on the the supported target architectures. While the size_t type is only spec'd for minimum of 16 bits, PHP requires that size_t is at least 32 bit. As such, there is no undefined behavior. I've chosen shift+mask over division here, because it fits the semantics of binary encoding better.

It's important to take platform restrictions into account when analyzing undefined behavior, as you'll unnecessarily spend time handling cases that cannot actually occur. Even if not explicitly documented, it is a reasonable assumption that most projects nowadays do not support compilation for 16-bit systems, unless they specifically target the embedded domain.

Finally, let me say that while undefined behavior may lead to security issues (classical example, elision of overflow checks), undefined behavior is not a security issue in itself. I believe it's important to avoid and remove use of undefined behavior where possible, but I don't think labeling any undefined behavior as a security issue makes for a useful threat or impact classification.

@Wes1262
Copy link
Copy Markdown

Wes1262 commented Dec 25, 2017

@Sebbyastian you can ask anybody in the community, nobody will defend php as a marvelous piece of software. People here do their best in their free time and the result works for the most part because millions of users are pretty much ok with it. Can you please leave that attitude out of here? before you pointed out these problems, dozens of people added features and fixed problems for more than 20 years, again, in their free time and with no funding from big companies. People in here think as a collective. If someone writes slightly incorrect code soon someone will notice it, then another will fix it, and we progress together. Do you want to be a positive part of this group or you want to point fingers at each one of us?

@php php deleted a comment from Sebbyastian Dec 26, 2017
@php php locked as too heated and limited conversation to collaborators Dec 26, 2017
@php php unlocked this conversation Dec 26, 2017
@php php locked as off-topic and limited conversation to collaborators Dec 26, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants