-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Handle newlines correctly when error_log set to "syslog" #2674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
main/main.c
Outdated
@@ -659,7 +659,40 @@ PHPAPI ZEND_COLD void php_log_err_with_severity(char *log_message, int syslog_ty | |||
if (PG(error_log) != NULL) { | |||
#ifdef HAVE_SYSLOG_H | |||
if (!strcmp(PG(error_log), "syslog")) { | |||
php_syslog(syslog_type_int, "%s", log_message); | |||
static char nvt[] = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not allowing characters >= 0x80 is going to break non-English systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not allowing characters >= 0x80 is going to break non-English systems.
Unfortunately, that's a basic limitation of the syslog protocol:
4.1.3 MSG Part of a syslog Packet
The MSG part will fill the remainder of the syslog packet. This will
usually contain some additional information of the process that
generated the message, and then the text of the message. There is no
ending delimiter to this part. The MSG part of the syslog packet
MUST contain visible (printing) characters. The code set
traditionally and most often used has also been seven-bit ASCII in an
eight-bit field like that used in the PRI and HEADER parts. In this
code set, the only allowable characters are the ABNF VCHAR values
(%d33-126) and spaces (SP value %d32). However, no indication of the
code set used within the MSG is required, nor is it expected. Other
code sets MAY be used as long as the characters used in the MSG are
exclusively visible characters and spaces similar to those described
above. The selection of a code set used in the MSG part SHOULD be
made with thoughts of the intended receiver. A message containing
characters in a code set that cannot be viewed or understood by a
recipient will yield no information of value to an operator or
administrator looking at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds to me like "implementation-defined", which is different from not supported. Some (most?) syslog implementations can clearly deal with UTF-8 and I don't think we should unnecessarily interfere with that (unless there is a specific problem resolved by mangling this?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that syslog is a 1-way protocol, how would syslog guarantee it's using a charset that the destination understands? Particularly in the case of remote logging.
When in doubt, I fall back on a mentor's advice, in this case, Postel's Law:
Be conservative in what you do, be liberal in what you accept from others
Of course, Jon wrote that well before zero-day exploits were a common phenomenon, if I had to recoin his advice, I'd write it as:
Be conservative in what you do, and extremely paranoid in what you accept from others
And understood or not, in UTF-8 I imagine that it's not that difficult to craft a string of executable code, or a valid Xterm escape sequence to exploit a vulnerability, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for whether it's necessary or not, I'm ambivalent. There are probably tens of thousands of syslog
clients.
If it's the syslog
server's responsibility to sanitize what goes into log files (and it is, hence Postel's Law), then it should be fixed there (and not all those other places). After all, it only takes one malicious or exploitable client...
But I made the mistake of opening bug 74860 that has precipitated some alarmist and generally unhelpful comments.
I don't want to be accused of throwing caution to the wind and ignoring them, hence the 2nd (and now 3rd) commits.
Frankly I would be happy with just the first one.
If the 2nd and 3rd are deemed necessary, I'll squash those two together before merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you quote an obsolete RFC 3164?
@bukka It's not obvious to me that all loggers out there implement RFC-5424. Even syslog-ng which is fairly rich doesn't default to RFC-5424 behavior, but instead to classic BSD syslog behavior (which predates even RFC-3164).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I really think that the default should respect the current RFC. If you want to be more strict for some non conforming loggers, then it should be optional. I think that ini setting makes most sense in this case.
Also note that you are changing the wrong place as it is not going to work for fpm. Better solution would be to change php_syslog to function and do it in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that you are changing the wrong place as it is not going to work for fpm. Better solution would be to change php_syslog to function and do it in there.
@bukka Not sure I follow:
$ grep -r php_syslog
ext/standard/php_ext_syslog.h:#include "php_syslog.h"
ext/standard/syslog.c: php_syslog(priority, "%s", message);
ext/standard/mail.c:#include "php_syslog.h"
ext/standard/mail.c: php_syslog(LOG_NOTICE, "%s", message);
sapi/fpm/fpm/fpm_conf.c:#include "php_syslog.h"
sapi/fpm/fpm/fpm_stdio.c:#include "php_syslog.h"
sapi/fpm/fpm/zlog.c:#include "php_syslog.h"
sapi/fpm/fpm/zlog.c: php_syslog(syslog_priorities[zlog_level], "%s", buf);
main/php_syslog.h:#define php_syslog std_syslog
main/php_syslog.h:#ifndef php_syslog
main/php_syslog.h:#define php_syslog syslog
main/php.h:#include "php_syslog.h"
main/main.c:#include "php_syslog.h"
main/main.c: php_syslog(syslog_type_int, "%.*s", (int)valid_len, msg);
main/main.c: php_syslog(syslog_type_int, "%.*s", (int)buf.len, buf.c);
$
php_syslog()
is a macro pointing at either syslog()
or else std_syslog()
, and both are externals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. That's why I said "better solution would be to change it to the function". :) That function would then call syslog
or std_syslog
for now. Going forward we could add support for remote logging which means reimplementing that completely in a simillar way as nginx does for example. But we could still do it inside that function so such separation would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combining both facts, bytes above 0x7f are fine, and bytes below 0x20 should be filtered. Especially now that we know UTF-8 is explicitly supported by the newer RFC :)
Done, as a complete rewrite.
Better solution would be to change php_syslog to function and do it in there.
Done.
Also added "syslog.highbit" (bool) to control whether characters above 0x7f are allowed or not.
If someone wants to add (subsequently) utf-3 compressed parsing, they can, but it's beyond me.
main/main.c
Outdated
char *buf, *ptr; | ||
size_t buflen; | ||
|
||
FILE *fp = open_memstream(&buf, &buflen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably non-portable. I'd suggest using smart_string
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I have to back-port this patch to 7.1.8, which doesn't have zend_printf_to_smart_string()
... what's the workaround?
Also ping @bukka, as iirc you were working on changes regarding syslog. |
main/main.c
Outdated
@@ -659,7 +663,47 @@ PHPAPI ZEND_COLD void php_log_err_with_severity(char *log_message, int syslog_ty | |||
if (PG(error_log) != NULL) { | |||
#ifdef HAVE_SYSLOG_H | |||
if (!strcmp(PG(error_log), "syslog")) { | |||
php_syslog(syslog_type_int, "%s", log_message); | |||
static char nvt[] = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably non-portable. I'd suggest using smart_string instead.
@nikic Fixed to use smart_string
... though I wonder why nothing else in main/main.c
was previously using smart_string
's.
ef8b017
to
14fef0d
Compare
@@ -1443,7 +1443,7 @@ PHP_ADD_SOURCES(main, main.c snprintf.c spprintf.c php_sprintf.c \ | |||
php_ini.c SAPI.c rfc1867.c php_content_types.c strlcpy.c \ | |||
strlcat.c explicit_bzero.c mergesort.c reentrancy.c php_variables.c php_ticks.c \ | |||
network.c php_open_temporary_file.c \ | |||
output.c getopt.c, -DZEND_ENABLE_STATIC_TSRMLS_CACHE=1) | |||
output.c getopt.c php_syslog.c, -DZEND_ENABLE_STATIC_TSRMLS_CACHE=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
php_syslog.c should also be added to the Windows build system in win32/build/config.w32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
d462649
to
dd552c2
Compare
@weltling Done as a rewrite. What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the further patch looks closer to what was discussed till now. Still it would be good to hear more opinions, especially about the way to handle the prohibited chars.
Thanks.
/* | ||
* The SCO OpenServer 5 Development System (not the UDK) | ||
* defines syslog to std_syslog. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be kept somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
main/php_syslog.h
Outdated
|
||
#undef syslog | ||
BEGIN_EXTERN_C() | ||
PHPAPI void php_syslog(int, const char *message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signature of php_syslog
should stay compatible to syslog()
as per man page. The simplified variant can wrap a va function, please see how zend_error() is constructed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Although at this point, the string is being traversed 4 times by my count... so not very efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It's really about the API compatibility, as the even being a macro the symbol was available through the header. A simple variant could still be kept by reorganizing a bit.
main/php_syslog.c
Outdated
|
||
if (((' ' <= c) && (c <= '~')) || (PG(syslog_highbit) && (c >= 0x80))) | ||
smart_string_appendc(&buf, c); | ||
else if (c == '\n') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some other new line variants possibly make sense, fe if CRLF appears in a message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point one has to draw a line and say "the way this code is being called is broken" and change the calling code, rather than try to accommodate every possible broken caller.
I wrote the original fix to address the issue of the default (built-in) exception handler calling error_log() with a multiline message (i.e. a stack trace). Ideally, it should have had multiple calls to error_log(), one per line.
Maybe that was my mistake: I should have just fixed the exception handler to break up its messages...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pprindeville yeah, so you've decided the way to solve it a good opensource way, so then everyone could profit from your work. Thanks you for that. Then it turned out there's RFC defining the exact thing, so what is the issue? A discussion usually helps to figure out the best solution and avoid common bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, at least it doesn't affect the output this way till now. So identifiers can be output, which wouldn't be true for arbitrary content with this condition.
main/main.c
Outdated
@@ -611,6 +611,7 @@ PHP_INI_BEGIN() | |||
#ifdef PHP_WIN32 | |||
STD_PHP_INI_BOOLEAN("windows.show_crt_warning", "0", PHP_INI_ALL, OnUpdateBool, windows_show_crt_warning, php_core_globals, core_globals) | |||
#endif | |||
STD_PHP_INI_ENTRY("syslog.highbit", "0", PHP_INI_SYSTEM, OnUpdateBool, syslog_highbit, php_core_globals, core_globals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest it should not be 7-bit by default, it is the current behavior. Maybe also name it a bit clearer, like "syslog_ascii_only" or alike.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, 7-bit is about the only thing we know for certain isn't exploitable.
What's the point of making the code more secure, only to disable the extra checking?
Anyway, the name has been changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's a purely theoretical concern with a very, very tenuous connection to security? If you want to go into pointless hypotheticals, why not consider EBCDIC systems where LF is not even mapped to the same code point as ASCII? If the spec does not specify an encoding, we have to make reasonable assumptions. The reasonable assumptions are a) forbid ASCII control chars and b) handle ASCII LF.
IMHO even having an ini option is completely unnecessary, but it we do add one, it should certainly allow non-ASCII codepoints by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pprindeville so what is the other security aspect besides escape sequences? On the other hand, not allowing UTF-8 to pass nowadays is weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic I'm neither a PHP proponent nor opponent. I'll use whatever I'm paid to use: C, C++, Perl, Python, Bash, PHP, Go, whatever...
I don't have a dog in this fight.
What I do know, though, is reading through HTTP server logs everything, over 60% of the attempted exploits are PHP. That's probably something that most of the people reading this message don't like to be reminded of.
I don't know if that's because the environment is inherently less secure or because, for whatever reason, people who code in PHP tend to be less aware of system exploits.
Either way, if I leave my fingerprints on PHP, it's not going to be in the form of an added exploit.
I've put instructions into the default php.ini that explain this control and how to open up non-ascii logging if one is willing to accept any consequences which might come from that.
As for EBCDIC, if you're running in an EBCDIC environment, you have to compile in an EBCDIC environment so that strings and character literals are encoded correctly, and that the correct table for <ctype.h>
is used.
IETF standards are woefully unaware of non-Latin languages, I used to title at that windmill in the 90's when HTML was being finalized. If you get bored, see the postings of philipp@res.enst.fr on www-html and www-talk mailing lists.
I'm not an English language bigot: I just won't compromise security for convenience, even if it's a major inconvenience to use (say) Cyrillic or Biao Gao with IETF protocols.
If the standards are inadequate, fix them. Don't willingly write broken code that violates them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the previous logger experiment, i now went a bit further and checked same with syslog-ng and dsyslog. Previously it was rsyslog, as it turned out. Anyway not sure there are more syslog daemons to get on Jessie by default and i've no other VMs to break ATM. At least what turned out is, that the behavior of all of them is same as shown before - an UTF-8 string lands in the syslog. Furthermore, seems it doesn't care whether BOM is present or not.
Regarding BOM - it might be more clear why the new RFC requires it for UTF_8 and why it requires UTF-8 to be the shortest form http://www.unicode.org/reports/tr36/#UTF-8_Exploit . To answer the earlier question - there is absolutely nothing in the core that deals with BOM. And i wouldn't even try to understand, what the given syslog daemon does in this case. Fe doing a write of the ill formed sequence 41 C2 3E 42
as from the UNICODE-TR36, it lands in the log file without any complain whether BOM is used or not. Apparently there is a security aspect. As per the RFC, the implementation does not have to interpret UTF-8 without BOM, but also it wouldn't care about an arbitrary string contents.
@pprindeville The topic is complex, we see. In PHP security and backward compatibility have both high priority. In this case, it seems that the backward compatibility breach were a far overreaction and thus not justified. It should really be thought through carefully, perhaps involving mbstring and default_charset or whatsoever measures. It is a fun topic and it is good that we discuss it so it reveals how complex it is and there indeed can be issues. At least we see now, that the best possible solution is not to achieve with just extending the intended fix a bit.
@pprindeville please let's not be hasty. I would suggest for now to reduce to an unavoidable and backward compatible minimum, which is
- split by newline
- block
c < 32
syslog.ascii_only = 0 by default
This solves the initial issue and puts in some security bits. Furthermore, it seems to be the solution with the most consent so far. Anyway, having this the PR can be completed and anything beyond can be left for further consideration.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solves the initial issue and puts in some security bits.
@weltling Well, it puts them in but largely disables them. Anyway, I don't care. All I ever wanted was splitting stack traces in the exception handler so they wouldn't look like:
Aug 11 15:19:01 ubuntu16 php: PHP Fatal error: Uncaught Error: Class 'NoSuchClass' not found in /home/philipp/bertram/php-src/test.php:4#012Stack trace:#012#0 /home/philipp/bertram/php-src/test.php(15): foo()#012#1 {main}#012 thrown in /home/philipp/bertram/php-src/test.php on line 4
which is what they currently look like. I've spent 6x more time on this issue than initially planned.
We've still not decided if DEL should ever get written in the clear or not. Nothing I've seen suggests that 0xf7 should be allowed.
Also, if "backward compatibility" was the vulnerability, don't you want to deliberately break that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syslog.ascii_only = 0 by default
Okay, changed the default and updated the text in the default php.ini
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've spent 6x more time on this issue than initially planned.
Also any participants here spend time as well :) Some cases are like this, where a big thing turns out from some relatively simple. It's the normal process we follow for reviews anyway.
Also, if "backward compatibility" was the vulnerability, don't you want to deliberately break that?
Say it plain, yes. There is this document https://wiki.php.net/security for some time already, that we use to evaluate the issue impacts. In the security regard - the known XTERM vulnerability is fixed with your patch. The other case with the UTF-8 exploit - i might be wrong, however when looking at the classifications, it's probably a low one as for me. Quite a few assumptions need to come true on the PHP side and for a vulnerable log viewer software the data source probably doesn't matter.
With 0x7f - from the RFC it is invalid and also doesn't seem a part of any code point in UTF-8. Lets see for more comments yet, otherwise the state seems quite close to finish. I'm also going for some tests next days.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weltling Looks like Big5 and SJIS also skip over 0x7f, so no problem there either.
main/php_syslog.c
Outdated
break; | ||
} | ||
|
||
if (((' ' <= c) && (c <= '~')) || (PG(syslog_highbit) && (c >= 0x80))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the >= 0x80
condition. Some encodings might still be broken by this. Fe BIG5 or other CJK. I'd suggest implementing conditions/actions like this
c
< 32 andc
is not a newline => filter out or replacec
has to be 7-bit only and ifc
> 126 then filter out or replace- otherwise pass as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything more complicated than this, and we need to implement a utf-8 compressed format parser. Maybe that's the way to go, but can we do it as a follow-on enhancement?
I'm trying to fix a vulnerability, and keep the scope of the change limited.
The bigger the change, the more likelihood of injecting another vulnerability (or multiple ones) by too big of an untested change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weltling >= 0x80
is how PHP handles this elsewhere, e.g. in identifiers. (IIRC this was changed in 7.1, previously it was >= 0x7f
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pprindeville the vulnerability linked was about escape sequences, so c< 32
is the only condition for that. Or do i miss yet more vulnerabilities? Also, how it makes the handling more complicated, c < 32 needs to be replaced anyway and otherwise there is no reason to parse anything except the key points as per RFC.
@nikic But we still supported other encodings than single byte and UTF-8? Fe check https://en.wikipedia.org/wiki/Big5#Organization where the second byte can be in the range of 0xa1-0xfe, so it'd result in a broken output. At least in some CJK encodnigs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weltling Not sure I get what you mean. 0xa1 - 0xfe will still pass the >= 0x80 check. As far as I'm aware all encodings in practical use (discounting encodings with multi-byte code units and EBCDIC, something we're assuming here anyway) are either ASCII-compatible or quasi-compatible (e.g. SJIS differs in the mapping of some characters in the ASCII-plane, but not in any way that is relevant in this context) and only do anything special for code-units >= 0x80, so allowing that range is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic ups, my c/p mistake. There are two ranges for the second byte there - 0x40-0x7e and 0xa1-0xfe, at least as per doc linked before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weltling Ah, I see. Would still be allowed by these checks. The ' '..'~' check is 0x20..0x7e
, so that includes the range. (It would probably make sense to explicitly use the hex values here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic indeed, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the vulnerability linked was about escape sequences, so c< 32 is the only condition for that. Or do i miss yet more vulnerabilities?
@weltling There are syslog implementations out there which don't implement RFC-5424, and others still which implement it but disable it by default, falling back on RFC-3164, which states:
4.1.3 MSG Part of a syslog Packet
The MSG part will fill the remainder of the syslog packet. This will
usually contain some additional information of the process that
generated the message, and then the text of the message. There is no
ending delimiter to this part. The MSG part of the syslog packet
MUST contain visible (printing) characters. The code set
traditionally and most often used has also been seven-bit ASCII in an
eight-bit field like that used in the PRI and HEADER parts. In this
code set, the only allowable characters are the ABNF VCHAR values
(%d33-126) and spaces (SP value %d32). However, no indication of the
code set used within the MSG is required, nor is it expected. Other
code sets MAY be used as long as the characters used in the MSG are
exclusively visible characters and spaces similar to those described
above. The selection of a code set used in the MSG part SHOULD be
made with thoughts of the intended receiver. A message containing
characters in a code set that cannot be viewed or understood by a
recipient will yield no information of value to an operator or
administrator looking at it.
The only actionable part of that paragraph is, "In this code set, the only allowable characters are the ABNF VCHAR values (%d33-126) and spaces (SP value %d32)."
The rest is unactionable. For example, "However, no indication of the code set used within the MSG is required, nor is it expected. Other code sets MAY be used as long as the characters used in the MSG are exclusively visible characters and spaces similar to those described above."
If you don't know what the code set is because it's not explicitly called out, then how do you know if a character is "printing" or not in that code set? You don't. You can't.
a594f73
to
715a831
Compare
f2ec1ba
to
3dd32e4
Compare
Any chance of this going in this week? |
php.ini-development
Outdated
@@ -517,6 +517,15 @@ report_memleaks = On | |||
; This setting is on by default. | |||
;report_zend_debug = 0 | |||
|
|||
; Set this to disable non-ASCII logging (more secure). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more secure
comment is very misleading and it just shows on broken loggers that can't handle UTF-8 for example which is a minority IMHO. It should be really clear that it is a work around for broken loggers.
Btw. there is no security issue in PHP. This is just a problem of the logger as RFC does not prohibit any characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bukka The loggers aren't the issue, so much as terminal emulators that allow themselves to be subverted.
main/main.c
Outdated
@@ -611,6 +611,7 @@ PHP_INI_BEGIN() | |||
#ifdef PHP_WIN32 | |||
STD_PHP_INI_BOOLEAN("windows.show_crt_warning", "0", PHP_INI_ALL, OnUpdateBool, windows_show_crt_warning, php_core_globals, core_globals) | |||
#endif | |||
STD_PHP_INI_ENTRY("syslog.ascii_only", "0", PHP_INI_SYSTEM, OnUpdateBool, syslog_ascii_only, php_core_globals, core_globals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to see something more generic like syslog.encoding
to allow us also properly support UTF-8 - add BOM if selected. We also have functions valildation for UTF-8 and some other encoding in html.c in standard ext so it might be also good idea to use. This is just an initial thought so needs probably more thinking and it's open for discussion ofc. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is turning into a bike-shedding exercise.
The original question was trivial: how to handle strings with embedded newlines?
Splitting into two issues and two PR's.
See bug #75077 for the issue related to non-NVT ASCII.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, let's not drag more changes into this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well my point is that we introduce syslog.ascii_only
and then to check utf-8, we will need to add syslog.utf8_only
or something like that. That's not really flexible IMHO. I would prefer a single ini for that where you can specify required encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a discussion some days ago in this thread #2674 (comment) where it turned out it the encoding is not really respected, specifically after this experiment is what was to see here #2674 (comment). If you mean just about checking the encoding - default_charset might suffice for that already, so no reason to introduce a complication. For encoding conversion it's probably even more complicated to start here. In general, the patch as is seemed to be OK, both for fixing the initial newline splitting issue and securing some bits, the further handling should be a matter of a presumably more complicated implementation. Since it targets master, it were OK to have this patch at least for the start, then looking whether a complicated implementation could ever make it.
Thanks.
va_end(args); | ||
|
||
for (ptr = fbuf.c; ; ++ptr) { | ||
c = *ptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could be inside the for
I guess :)
3dd32e4
to
f11d40e
Compare
Okay, now can we proceed with a merge? |
@bukka "Defense in depth" strategy entails trying to defeat an attack in more than one place. That's what we're doing here. No one said that the vulnerability is in PHP: it's not. The vulnerability is in the Xterm (and other terminal emulators) which don't handle device control strings (DCS's) in a sane way. But we can try to prevent PHP from being the vector by which the vulnerability is injected. |
Actually i was going to do some tests by the EOW. The last point was to exclude also 0x7f, but now the patch is completely different and there's yet another PR? What is the plan now, is this PR superseeded? Thanks. |
Not at all. PR #2689 is layered on this one. You can do your testing there, and I'll rebase the PR to drop the first commit as soon as it hits master. |
@pprindeville hmm, not sure to follow. #2689 is what this PR was before. Why it is split out? Or if i missed something in the correspondence, please point me there. Thanks. |
Because there are two problems: (1) how to handle strings with newlines in them, (2) how to handle non-NVT ASCII. (1) is a trivial issue easily solved by the original first commit, with hardly any discussion required. This PR reverts to the original fix (and the original commit) which handles just issue (1). The other PR is based on this PR plus the two additional commits which handle (a) strict NVT-ASCII when strict NVT-ASCII is all that is explicitly permitted/configured, and (b) either escaping non NVT-ASCII characters into safe octal notation--which I'm figuring won't be the ultimate solution-- or (b.2) translating the string into utf-8 compressed format with a BOM prefix... which is the only explicitly and concretely defined of handling anything other than NVT-ASCII. I'll revise PR #2689 to (b.2) once it's agreed that this is the way to proceed and further, someone shows me what the libraries are for translating to utf8 and for validating utf8 correctness (in case the app is logging utf8 natively). |
@pprindeville but the patch didn't touch any encoding, did it? It fixed the initial issue and the terminal issue as a bonus. Anything else was passed through for now, no BOM, etc. :) The only point came from @bukka just now about the ini naming, either it could be fixed as well or later in master, since that's what the patch was targeting. Anyway, lets see it this way now. Thanks. Thanks. |
@welting If you mean "was it codeset aware", then the answer was "no". It assumed everything was (or should be) NVT-ASCII. We could, in PR #2689, look at the default_charset as you say, and convert the string from that to utf-8 and append a BOM. Then we'd be honoring Section 6.4 in the only concrete terms it has. |
@pprindeville converting everything to UTF-8 might be not that doable - it'll likely require some heavy API and fe ext/mbstring is not an obligatory dependency. There are yet more questions than answers, as for me. I can only applaud you still strive a wider solution on this, anyway. I'm going for tests closer to the EOW, so if everything ok at least the newline part can be handled already. Thanks. |
@weltling Hi. Any news? |
Merged with master, adapted for the Windows side. Thanks. |
Were additional changes required? |
See: php/php-src#2674 Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
See: php/php-src#2674 Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
Anyway, thanks! |
@pprindeville yeah, splitting doesn't seem to be suitable for the event log. It's record based, not line based. The parts about sanitizing might still make sense, but the MSDN docs don't mention much on that. Thanks. |
See: php/php-src#2674 Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
See: php/php-src#2674 Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
See: php/php-src#2674 Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
This fixes bz#78460 in compliance with RFC-3164.