There's a fix for some of that pesky non-standard undefined behaviour in here!#2942
There's a fix for some of that pesky non-standard undefined behaviour in here!#2942Sebbyastian wants to merge 7 commits intophp:masterfrom
Conversation
When you use a standardised language which doesn't specify things like integer representation, you need to craft the integer representation which you intend to send over the wire, or else machines might get confused speaking different endians ;) Additionally, there's the alignment requirements of `int`. An `int` must have an address that is a multiple of a certain number, or the memory and the bus don't line up, you get a bus error which, as I recall, would lead to SIGSEGV/SIGBUS on PowerPC. Typically, however, you'll just suffer a performance loss because Intel is friendly and the processor will have to fetch the remainder of the integer value over the bus (two fetches for one value, seem nice?). This was the first real headache I got from undefined behaviour. Such is the burden of using a standardised language which has undefined behaviour... sometimes issues like this can take _hours, days, weeks or months_ to debug, might even never come to the surface. In my case I think it took a few weeks to diagnose, as I had to go back and forth between school (where the bug was present) and home (where it wasn't). I assume you want a little endian, 32-bit integer, as that's what's common nowadays. Out of curiousity, how often is it that you cast `char` arrays to non-`char` like this? I'll also be refactoring `phpdbg_webdata_compress` to eliminate [this CI warning](https://ci.appveyor.com/project/php/php-src/build/master.build.4823/job/7e7p4rrvbhx5cj7y#L515) (among other similar ones, I'm sure), as that was my original intention here.
Some more `s/int/size_t` where appropriate (i.e. where you don't want the processor overhead of sign processing, because negative values won't make sense anyway)
Update phpdbg_webdata_transfer.h
sapi/phpdbg/phpdbg_rinit_hook.c
Outdated
| size_t len = strlen(PHPDBG_WG(path)) + sizeof(sock.sun_family); | ||
| char buf[(1 << 8) + 1]; | ||
| int buflen; | ||
| size_t buflen; |
There was a problem hiding this comment.
This should be an ssize_t, as it is used for the return value of recv(). Otherwise the error check will fail.
There was a problem hiding this comment.
What error check?
There was a problem hiding this comment.
Oh, yes, you're right. The infinite loop is likely the cause of CI failure. Ta :)
sapi/phpdbg/phpdbg_rinit_hook.c
Outdated
| send(s, (unsigned char[]){ msglen | ||
| , msglen / 0x100 | ||
| , msglen / 0x10000 | ||
| , msglen / 0x1000000 }, 4, 0); |
There was a problem hiding this comment.
There was a problem hiding this comment.
Btw, this should be moved into an explicit declaration, as compound literals are not C89. Not that this file appears to conform to C89 anyway, I guess it never came up due to the WIN32 guard.
There was a problem hiding this comment.
@nikic The most common representation of integers is little endian, and that's the native x86 architecture. I'm about 99.9% certain that's what was intended. It usually is someone thinking they're doing something clever by manually optimising what our compilers could've easily done even fifteen or twenty years ago.
It's probably a good idea to extract that serialisation logic into a function to be reused anywhere else such a serialisation might be used (and a deserialisation function, which is the other part of the bug I haven't checked out, should also be written). After all, that's the reason we choose named objects (i.e. variables) rather than unnamed ones (i.e. my compound literal), right? Because we're going to reuse that identifier later? Otherwise, the identifier we're only going to use once is just noisy... I mean, if you're going to multiply by some literal value, you don't necessarily want to have to assign that literal value to a variable first, right? I think the patch I just committed should be a reasonable compromise, for now.
I'd suggest an interface similar to snprintf and sscanf, for the sake of familiarity. The return value being the number of bytes written in the case of the print function, and the number of items assigned to in the case of the scan function. The only necessary difference is that %d etc parse/create a little endian base 256 representation, as opposed to the big endian base 10 representation which the analogous standard functions typically output. It's a shame you can't use C11, because there are some generics available which might allow developers to avoid that pesky format string which seems to become out of sync with its parameters so easily... It's an even bigger shame you can't use C++, because generics are even nicer in C++. We could work towards that by writing one small step at a time, as Rasmus did, except following the specs to ensure we make it as familiar as possible.
I'm honestly getting sick of one-line functions which merely wrap a macro. There's only one justification for all of this... premature optimisation... Eh, once this is done, I'm outie. I'll pop some food for thought in this review later today, so keep an eye on this space :)
| { | ||
| #if 0 | ||
| int n; | ||
| #if 0 // can we have a discussion about this `#if 0`? |
There was a problem hiding this comment.
you should drop this comment and start the discussion with a comment in this PR.
There was a problem hiding this comment.
By "we", in this case, I think you know where I stand, so I'm going to sit out of the conversation. If you're going to talk about things you should drop, to be clear, it's the entire block of code that's being unconditionally excluded there that should be dropped, not just the comment...
Nonetheless, I'm no longer interested in PHP. I wasn't interested in the first place. I fully regret all of my contributions. They're likely to taint my name. I'm tempted to just delete it all, based on the way this is all dragging me back constantly. I want to work on things that aren't... unportable...
|
@bwoebi can you extract the reasonable parts of this PR please. |
|
Sorry for the delay, phpdbg portion now merged as 168c6cd. Thanks! |
|
You may think this solves the entire problem, but you are wrong. In fact much of the problem in this code still exists, and a whole new problem has emerged. There are a number of quotes to the right of Rasmus Lerdorf which exemplify the attitude here. I care not for any project which sets this as an example, so I actually look forward to PHP burning in fire at the moment... The new problem: This took so long for you to merge, I've decided if I ever find a vulnerability in PHP (which is an activity equivalent to sifting through pig shit for pennies, which makes the fact that there are so many of these kinds of mistakes in PHP even worse), I'll be going straight to the public with it. It'll get fixed quicker, that way. |
|
Come on, 20 days is too long? There are pull requests waiting even for initial review for years, not just here, everywhere on GitHub. 😆 |
|
@nikic I would like to talk about the way you've factored in this patch... https://github.com/php/php-src/blob/master/sapi/phpdbg/phpdbg_rinit_hook.c#L73 Those shifts introduce undefined behaviour. C11/6.5.7p3 states:
You should change the operation back to a multiplicative operator (what I suggested) in order to avoid all unnecessary unportable/undefined behaviour (which is why I chose the multiplicatives, you fuckin' fruitloops!) |
|
@Majkl578 I'm sure you'd be even happier if heartbleed went TWENTY years without coming to resolution... |
|
Does this fix equal to heartbleed's severity? Was it submitted as security issue? I really doubt so. 95+% of PHP users don't even know what phpdbg is and how to use it, plus it's a debug tool, not production runtime. |
|
Just a friendly reminder: http://opensoul.org/99ways/ I would suggest to find the other ways that work... |
Get off your high horse. 20 days is not a long time in the first place, first response was 2 days, and despite you being rude and unpleasant throughout the review process some of your work was merged anyway. The review process seems to outrage you, "I'm fixing your mistakes, how dare you question me", your entire attitude stinks, you are behaving as if some critical security patch was ignored for months instead of what this is - a patch silencing some compiler warnings/fixing theoretical problems which weren't actually causing any problems in the real world (where are the bug refs?), responded to in a timely fashion given the content. If you find an actual security issue in something as widely distributed as PHP and report it by just publishing it - or indeed, by opening a PR on github - have fun explaining your behaviour to the security research community. If it's not a security issue and it isn't causing real world problems, who exactly are you expecting to care?
By who? Three of the most active contributors (at the moment) responded to this non-critical patch against an open-source project, one of them within 72 hours. Who exactly is going to perform this quicker fix? Grow up, and come back when you can behave like an adult. |
Undefined behavior does not exist in a vacuum, it also depends on the the supported target architectures. While the It's important to take platform restrictions into account when analyzing undefined behavior, as you'll unnecessarily spend time handling cases that cannot actually occur. Even if not explicitly documented, it is a reasonable assumption that most projects nowadays do not support compilation for 16-bit systems, unless they specifically target the embedded domain. Finally, let me say that while undefined behavior may lead to security issues (classical example, elision of overflow checks), undefined behavior is not a security issue in itself. I believe it's important to avoid and remove use of undefined behavior where possible, but I don't think labeling any undefined behavior as a security issue makes for a useful threat or impact classification. |
|
@Sebbyastian you can ask anybody in the community, nobody will defend php as a marvelous piece of software. People here do their best in their free time and the result works for the most part because millions of users are pretty much ok with it. Can you please leave that attitude out of here? before you pointed out these problems, dozens of people added features and fixed problems for more than 20 years, again, in their free time and with no funding from big companies. People in here think as a collective. If someone writes slightly incorrect code soon someone will notice it, then another will fix it, and we progress together. Do you want to be a positive part of this group or you want to point fingers at each one of us? |

At a1ff07a#diff-3c2982bb766210ce070c2eb9f6f7e3d6L73 there is no guarantee that
size_twill occupy four bytes. This commit contains various refactoring to reduce type-related warnings and non-standard undefined behaviour.