Skip to content

Handle newlines correctly when error_log set to "syslog" #2674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

pprindeville
Copy link
Contributor

This fixes bz#78460 in compliance with RFC-3164.

@pprindeville
Copy link
Contributor Author

Requesting reviews by @weltling, @dstogov.

main/main.c Outdated
@@ -659,7 +659,40 @@ PHPAPI ZEND_COLD void php_log_err_with_severity(char *log_message, int syslog_ty
if (PG(error_log) != NULL) {
#ifdef HAVE_SYSLOG_H
if (!strcmp(PG(error_log), "syslog")) {
php_syslog(syslog_type_int, "%s", log_message);
static char nvt[] = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not allowing characters >= 0x80 is going to break non-English systems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not allowing characters >= 0x80 is going to break non-English systems.

Unfortunately, that's a basic limitation of the syslog protocol:

4.1.3 MSG Part of a syslog Packet

   The MSG part will fill the remainder of the syslog packet.  This will
   usually contain some additional information of the process that
   generated the message, and then the text of the message.  There is no
   ending delimiter to this part.  The MSG part of the syslog packet
   MUST contain visible (printing) characters.  The code set
   traditionally and most often used has also been seven-bit ASCII in an
   eight-bit field like that used in the PRI and HEADER parts.  In this
   code set, the only allowable characters are the ABNF VCHAR values
   (%d33-126) and spaces (SP value %d32).  However, no indication of the
   code set used within the MSG is required, nor is it expected.  Other
   code sets MAY be used as long as the characters used in the MSG are
   exclusively visible characters and spaces similar to those described
   above.  The selection of a code set used in the MSG part SHOULD be
   made with thoughts of the intended receiver.  A message containing
   characters in a code set that cannot be viewed or understood by a
   recipient will yield no information of value to an operator or
   administrator looking at it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds to me like "implementation-defined", which is different from not supported. Some (most?) syslog implementations can clearly deal with UTF-8 and I don't think we should unnecessarily interfere with that (unless there is a specific problem resolved by mangling this?)

Copy link
Contributor Author

@pprindeville pprindeville Aug 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that syslog is a 1-way protocol, how would syslog guarantee it's using a charset that the destination understands? Particularly in the case of remote logging.

When in doubt, I fall back on a mentor's advice, in this case, Postel's Law:

Be conservative in what you do, be liberal in what you accept from others

Of course, Jon wrote that well before zero-day exploits were a common phenomenon, if I had to recoin his advice, I'd write it as:

Be conservative in what you do, and extremely paranoid in what you accept from others

And understood or not, in UTF-8 I imagine that it's not that difficult to craft a string of executable code, or a valid Xterm escape sequence to exploit a vulnerability, etc.

Copy link
Contributor Author

@pprindeville pprindeville Aug 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for whether it's necessary or not, I'm ambivalent. There are probably tens of thousands of syslog clients.

If it's the syslog server's responsibility to sanitize what goes into log files (and it is, hence Postel's Law), then it should be fixed there (and not all those other places). After all, it only takes one malicious or exploitable client...

But I made the mistake of opening bug 74860 that has precipitated some alarmist and generally unhelpful comments.

I don't want to be accused of throwing caution to the wind and ignoring them, hence the 2nd (and now 3rd) commits.

Frankly I would be happy with just the first one.

If the 2nd and 3rd are deemed necessary, I'll squash those two together before merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you quote an obsolete RFC 3164?

@bukka It's not obvious to me that all loggers out there implement RFC-5424. Even syslog-ng which is fairly rich doesn't default to RFC-5424 behavior, but instead to classic BSD syslog behavior (which predates even RFC-3164).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I really think that the default should respect the current RFC. If you want to be more strict for some non conforming loggers, then it should be optional. I think that ini setting makes most sense in this case.

Also note that you are changing the wrong place as it is not going to work for fpm. Better solution would be to change php_syslog to function and do it in there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that you are changing the wrong place as it is not going to work for fpm. Better solution would be to change php_syslog to function and do it in there.

@bukka Not sure I follow:

$ grep -r php_syslog
ext/standard/php_ext_syslog.h:#include "php_syslog.h"
ext/standard/syslog.c:	php_syslog(priority, "%s", message);
ext/standard/mail.c:#include "php_syslog.h"
ext/standard/mail.c:	php_syslog(LOG_NOTICE, "%s", message);
sapi/fpm/fpm/fpm_conf.c:#include "php_syslog.h"
sapi/fpm/fpm/fpm_stdio.c:#include "php_syslog.h"
sapi/fpm/fpm/zlog.c:#include "php_syslog.h"
sapi/fpm/fpm/zlog.c:		php_syslog(syslog_priorities[zlog_level], "%s", buf);
main/php_syslog.h:#define php_syslog std_syslog
main/php_syslog.h:#ifndef php_syslog
main/php_syslog.h:#define php_syslog syslog
main/php.h:#include "php_syslog.h"
main/main.c:#include "php_syslog.h"
main/main.c:					php_syslog(syslog_type_int, "%.*s", (int)valid_len, msg);
main/main.c:					php_syslog(syslog_type_int, "%.*s", (int)buf.len, buf.c);
$ 

php_syslog() is a macro pointing at either syslog() or else std_syslog(), and both are externals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. That's why I said "better solution would be to change it to the function". :) That function would then call syslog or std_syslog for now. Going forward we could add support for remote logging which means reimplementing that completely in a simillar way as nginx does for example. But we could still do it inside that function so such separation would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combining both facts, bytes above 0x7f are fine, and bytes below 0x20 should be filtered. Especially now that we know UTF-8 is explicitly supported by the newer RFC :)

Done, as a complete rewrite.

Better solution would be to change php_syslog to function and do it in there.

Done.

Also added "syslog.highbit" (bool) to control whether characters above 0x7f are allowed or not.

If someone wants to add (subsequently) utf-3 compressed parsing, they can, but it's beyond me.

main/main.c Outdated
char *buf, *ptr;
size_t buflen;

FILE *fp = open_memstream(&buf, &buflen);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably non-portable. I'd suggest using smart_string instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I have to back-port this patch to 7.1.8, which doesn't have zend_printf_to_smart_string()... what's the workaround?

@nikic
Copy link
Member

nikic commented Aug 7, 2017

Also ping @bukka, as iirc you were working on changes regarding syslog.

main/main.c Outdated
@@ -659,7 +663,47 @@ PHPAPI ZEND_COLD void php_log_err_with_severity(char *log_message, int syslog_ty
if (PG(error_log) != NULL) {
#ifdef HAVE_SYSLOG_H
if (!strcmp(PG(error_log), "syslog")) {
php_syslog(syslog_type_int, "%s", log_message);
static char nvt[] = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";
Copy link
Contributor Author

@pprindeville pprindeville Aug 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably non-portable. I'd suggest using smart_string instead.

@nikic Fixed to use smart_string... though I wonder why nothing else in main/main.c was previously using smart_string's.

@pprindeville pprindeville force-pushed the bug/74860 branch 2 times, most recently from ef8b017 to 14fef0d Compare August 10, 2017 03:00
@@ -1443,7 +1443,7 @@ PHP_ADD_SOURCES(main, main.c snprintf.c spprintf.c php_sprintf.c \
php_ini.c SAPI.c rfc1867.c php_content_types.c strlcpy.c \
strlcat.c explicit_bzero.c mergesort.c reentrancy.c php_variables.c php_ticks.c \
network.c php_open_temporary_file.c \
output.c getopt.c, -DZEND_ENABLE_STATIC_TSRMLS_CACHE=1)
output.c getopt.c php_syslog.c, -DZEND_ENABLE_STATIC_TSRMLS_CACHE=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

php_syslog.c should also be added to the Windows build system in win32/build/config.w32

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@pprindeville pprindeville force-pushed the bug/74860 branch 6 times, most recently from d462649 to dd552c2 Compare August 10, 2017 04:57
@pprindeville
Copy link
Contributor Author

@weltling Done as a rewrite. What do you think?

Copy link
Contributor

@weltling weltling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the further patch looks closer to what was discussed till now. Still it would be good to hear more opinions, especially about the way to handle the prohibited chars.

Thanks.

/*
* The SCO OpenServer 5 Development System (not the UDK)
* defines syslog to std_syslog.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be kept somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


#undef syslog
BEGIN_EXTERN_C()
PHPAPI void php_syslog(int, const char *message);
Copy link
Contributor

@weltling weltling Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature of php_syslog should stay compatible to syslog() as per man page. The simplified variant can wrap a va function, please see how zend_error() is constructed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Although at this point, the string is being traversed 4 times by my count... so not very efficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It's really about the API compatibility, as the even being a macro the symbol was available through the header. A simple variant could still be kept by reorganizing a bit.


if (((' ' <= c) && (c <= '~')) || (PG(syslog_highbit) && (c >= 0x80)))
smart_string_appendc(&buf, c);
else if (c == '\n') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other new line variants possibly make sense, fe if CRLF appears in a message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point one has to draw a line and say "the way this code is being called is broken" and change the calling code, rather than try to accommodate every possible broken caller.

I wrote the original fix to address the issue of the default (built-in) exception handler calling error_log() with a multiline message (i.e. a stack trace). Ideally, it should have had multiple calls to error_log(), one per line.

Maybe that was my mistake: I should have just fixed the exception handler to break up its messages...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pprindeville yeah, so you've decided the way to solve it a good opensource way, so then everyone could profit from your work. Thanks you for that. Then it turned out there's RFC defining the exact thing, so what is the issue? A discussion usually helps to figure out the best solution and avoid common bugs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, at least it doesn't affect the output this way till now. So identifiers can be output, which wouldn't be true for arbitrary content with this condition.

main/main.c Outdated
@@ -611,6 +611,7 @@ PHP_INI_BEGIN()
#ifdef PHP_WIN32
STD_PHP_INI_BOOLEAN("windows.show_crt_warning", "0", PHP_INI_ALL, OnUpdateBool, windows_show_crt_warning, php_core_globals, core_globals)
#endif
STD_PHP_INI_ENTRY("syslog.highbit", "0", PHP_INI_SYSTEM, OnUpdateBool, syslog_highbit, php_core_globals, core_globals)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest it should not be 7-bit by default, it is the current behavior. Maybe also name it a bit clearer, like "syslog_ascii_only" or alike.

Copy link
Contributor Author

@pprindeville pprindeville Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, 7-bit is about the only thing we know for certain isn't exploitable.

What's the point of making the code more secure, only to disable the extra checking?

Anyway, the name has been changed.

Copy link
Member

@nikic nikic Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's a purely theoretical concern with a very, very tenuous connection to security? If you want to go into pointless hypotheticals, why not consider EBCDIC systems where LF is not even mapped to the same code point as ASCII? If the spec does not specify an encoding, we have to make reasonable assumptions. The reasonable assumptions are a) forbid ASCII control chars and b) handle ASCII LF.

IMHO even having an ini option is completely unnecessary, but it we do add one, it should certainly allow non-ASCII codepoints by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pprindeville so what is the other security aspect besides escape sequences? On the other hand, not allowing UTF-8 to pass nowadays is weird.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic I'm neither a PHP proponent nor opponent. I'll use whatever I'm paid to use: C, C++, Perl, Python, Bash, PHP, Go, whatever...

I don't have a dog in this fight.

What I do know, though, is reading through HTTP server logs everything, over 60% of the attempted exploits are PHP. That's probably something that most of the people reading this message don't like to be reminded of.

I don't know if that's because the environment is inherently less secure or because, for whatever reason, people who code in PHP tend to be less aware of system exploits.

Either way, if I leave my fingerprints on PHP, it's not going to be in the form of an added exploit.

I've put instructions into the default php.ini that explain this control and how to open up non-ascii logging if one is willing to accept any consequences which might come from that.

As for EBCDIC, if you're running in an EBCDIC environment, you have to compile in an EBCDIC environment so that strings and character literals are encoded correctly, and that the correct table for <ctype.h> is used.

IETF standards are woefully unaware of non-Latin languages, I used to title at that windmill in the 90's when HTML was being finalized. If you get bored, see the postings of philipp@res.enst.fr on www-html and www-talk mailing lists.

I'm not an English language bigot: I just won't compromise security for convenience, even if it's a major inconvenience to use (say) Cyrillic or Biao Gao with IETF protocols.

If the standards are inadequate, fix them. Don't willingly write broken code that violates them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the previous logger experiment, i now went a bit further and checked same with syslog-ng and dsyslog. Previously it was rsyslog, as it turned out. Anyway not sure there are more syslog daemons to get on Jessie by default and i've no other VMs to break ATM. At least what turned out is, that the behavior of all of them is same as shown before - an UTF-8 string lands in the syslog. Furthermore, seems it doesn't care whether BOM is present or not.

Regarding BOM - it might be more clear why the new RFC requires it for UTF_8 and why it requires UTF-8 to be the shortest form http://www.unicode.org/reports/tr36/#UTF-8_Exploit . To answer the earlier question - there is absolutely nothing in the core that deals with BOM. And i wouldn't even try to understand, what the given syslog daemon does in this case. Fe doing a write of the ill formed sequence 41 C2 3E 42 as from the UNICODE-TR36, it lands in the log file without any complain whether BOM is used or not. Apparently there is a security aspect. As per the RFC, the implementation does not have to interpret UTF-8 without BOM, but also it wouldn't care about an arbitrary string contents.

@pprindeville The topic is complex, we see. In PHP security and backward compatibility have both high priority. In this case, it seems that the backward compatibility breach were a far overreaction and thus not justified. It should really be thought through carefully, perhaps involving mbstring and default_charset or whatsoever measures. It is a fun topic and it is good that we discuss it so it reveals how complex it is and there indeed can be issues. At least we see now, that the best possible solution is not to achieve with just extending the intended fix a bit.

@pprindeville please let's not be hasty. I would suggest for now to reduce to an unavoidable and backward compatible minimum, which is

  • split by newline
  • block c < 32
  • syslog.ascii_only = 0 by default

This solves the initial issue and puts in some security bits. Furthermore, it seems to be the solution with the most consent so far. Anyway, having this the PR can be completed and anything beyond can be left for further consideration.

Thanks.

Copy link
Contributor Author

@pprindeville pprindeville Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solves the initial issue and puts in some security bits.

@weltling Well, it puts them in but largely disables them. Anyway, I don't care. All I ever wanted was splitting stack traces in the exception handler so they wouldn't look like:

Aug 11 15:19:01 ubuntu16 php: PHP Fatal error:  Uncaught Error: Class 'NoSuchClass' not found in /home/philipp/bertram/php-src/test.php:4#012Stack trace:#012#0 /home/philipp/bertram/php-src/test.php(15): foo()#012#1 {main}#012  thrown in /home/philipp/bertram/php-src/test.php on line 4

which is what they currently look like. I've spent 6x more time on this issue than initially planned.

We've still not decided if DEL should ever get written in the clear or not. Nothing I've seen suggests that 0xf7 should be allowed.

Also, if "backward compatibility" was the vulnerability, don't you want to deliberately break that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syslog.ascii_only = 0 by default

Okay, changed the default and updated the text in the default php.ini.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've spent 6x more time on this issue than initially planned.

Also any participants here spend time as well :) Some cases are like this, where a big thing turns out from some relatively simple. It's the normal process we follow for reviews anyway.

Also, if "backward compatibility" was the vulnerability, don't you want to deliberately break that?

Say it plain, yes. There is this document https://wiki.php.net/security for some time already, that we use to evaluate the issue impacts. In the security regard - the known XTERM vulnerability is fixed with your patch. The other case with the UTF-8 exploit - i might be wrong, however when looking at the classifications, it's probably a low one as for me. Quite a few assumptions need to come true on the PHP side and for a vulnerable log viewer software the data source probably doesn't matter.

With 0x7f - from the RFC it is invalid and also doesn't seem a part of any code point in UTF-8. Lets see for more comments yet, otherwise the state seems quite close to finish. I'm also going for some tests next days.

Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weltling Looks like Big5 and SJIS also skip over 0x7f, so no problem there either.

break;
}

if (((' ' <= c) && (c <= '~')) || (PG(syslog_highbit) && (c >= 0x80)))
Copy link
Contributor

@weltling weltling Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the >= 0x80 condition. Some encodings might still be broken by this. Fe BIG5 or other CJK. I'd suggest implementing conditions/actions like this

  • c < 32 and c is not a newline => filter out or replace
  • c has to be 7-bit only and if c > 126 then filter out or replace
  • otherwise pass as is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything more complicated than this, and we need to implement a utf-8 compressed format parser. Maybe that's the way to go, but can we do it as a follow-on enhancement?

I'm trying to fix a vulnerability, and keep the scope of the change limited.

The bigger the change, the more likelihood of injecting another vulnerability (or multiple ones) by too big of an untested change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weltling >= 0x80 is how PHP handles this elsewhere, e.g. in identifiers. (IIRC this was changed in 7.1, previously it was >= 0x7f).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pprindeville the vulnerability linked was about escape sequences, so c< 32 is the only condition for that. Or do i miss yet more vulnerabilities? Also, how it makes the handling more complicated, c < 32 needs to be replaced anyway and otherwise there is no reason to parse anything except the key points as per RFC.

@nikic But we still supported other encodings than single byte and UTF-8? Fe check https://en.wikipedia.org/wiki/Big5#Organization where the second byte can be in the range of 0xa1-0xfe, so it'd result in a broken output. At least in some CJK encodnigs.

Copy link
Member

@nikic nikic Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weltling Not sure I get what you mean. 0xa1 - 0xfe will still pass the >= 0x80 check. As far as I'm aware all encodings in practical use (discounting encodings with multi-byte code units and EBCDIC, something we're assuming here anyway) are either ASCII-compatible or quasi-compatible (e.g. SJIS differs in the mapping of some characters in the ASCII-plane, but not in any way that is relevant in this context) and only do anything special for code-units >= 0x80, so allowing that range is sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic ups, my c/p mistake. There are two ranges for the second byte there - 0x40-0x7e and 0xa1-0xfe, at least as per doc linked before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weltling Ah, I see. Would still be allowed by these checks. The ' '..'~' check is 0x20..0x7e, so that includes the range. (It would probably make sense to explicitly use the hex values here.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic indeed, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the vulnerability linked was about escape sequences, so c< 32 is the only condition for that. Or do i miss yet more vulnerabilities?

@weltling There are syslog implementations out there which don't implement RFC-5424, and others still which implement it but disable it by default, falling back on RFC-3164, which states:

4.1.3 MSG Part of a syslog Packet

   The MSG part will fill the remainder of the syslog packet.  This will
   usually contain some additional information of the process that
   generated the message, and then the text of the message.  There is no
   ending delimiter to this part.  The MSG part of the syslog packet
   MUST contain visible (printing) characters.  The code set
   traditionally and most often used has also been seven-bit ASCII in an
   eight-bit field like that used in the PRI and HEADER parts.  In this
   code set, the only allowable characters are the ABNF VCHAR values
   (%d33-126) and spaces (SP value %d32).  However, no indication of the
   code set used within the MSG is required, nor is it expected.  Other
   code sets MAY be used as long as the characters used in the MSG are
   exclusively visible characters and spaces similar to those described
   above.  The selection of a code set used in the MSG part SHOULD be
   made with thoughts of the intended receiver.  A message containing
   characters in a code set that cannot be viewed or understood by a
   recipient will yield no information of value to an operator or
   administrator looking at it.

The only actionable part of that paragraph is, "In this code set, the only allowable characters are the ABNF VCHAR values (%d33-126) and spaces (SP value %d32)."

The rest is unactionable. For example, "However, no indication of the code set used within the MSG is required, nor is it expected. Other code sets MAY be used as long as the characters used in the MSG are exclusively visible characters and spaces similar to those described above."

If you don't know what the code set is because it's not explicitly called out, then how do you know if a character is "printing" or not in that code set? You don't. You can't.

@pprindeville pprindeville force-pushed the bug/74860 branch 3 times, most recently from a594f73 to 715a831 Compare August 11, 2017 16:06
@pprindeville pprindeville force-pushed the bug/74860 branch 2 times, most recently from f2ec1ba to 3dd32e4 Compare August 11, 2017 21:32
@pprindeville
Copy link
Contributor Author

Any chance of this going in this week?

@@ -517,6 +517,15 @@ report_memleaks = On
; This setting is on by default.
;report_zend_debug = 0

; Set this to disable non-ASCII logging (more secure).
Copy link
Member

@bukka bukka Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more secure comment is very misleading and it just shows on broken loggers that can't handle UTF-8 for example which is a minority IMHO. It should be really clear that it is a work around for broken loggers.

Btw. there is no security issue in PHP. This is just a problem of the logger as RFC does not prohibit any characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bukka The loggers aren't the issue, so much as terminal emulators that allow themselves to be subverted.

main/main.c Outdated
@@ -611,6 +611,7 @@ PHP_INI_BEGIN()
#ifdef PHP_WIN32
STD_PHP_INI_BOOLEAN("windows.show_crt_warning", "0", PHP_INI_ALL, OnUpdateBool, windows_show_crt_warning, php_core_globals, core_globals)
#endif
STD_PHP_INI_ENTRY("syslog.ascii_only", "0", PHP_INI_SYSTEM, OnUpdateBool, syslog_ascii_only, php_core_globals, core_globals)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see something more generic like syslog.encoding to allow us also properly support UTF-8 - add BOM if selected. We also have functions valildation for UTF-8 and some other encoding in html.c in standard ext so it might be also good idea to use. This is just an initial thought so needs probably more thinking and it's open for discussion ofc. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is turning into a bike-shedding exercise.

The original question was trivial: how to handle strings with embedded newlines?

Splitting into two issues and two PR's.

See bug #75077 for the issue related to non-NVT ASCII.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, let's not drag more changes into this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well my point is that we introduce syslog.ascii_only and then to check utf-8, we will need to add syslog.utf8_only or something like that. That's not really flexible IMHO. I would prefer a single ini for that where you can specify required encoding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a discussion some days ago in this thread #2674 (comment) where it turned out it the encoding is not really respected, specifically after this experiment is what was to see here #2674 (comment). If you mean just about checking the encoding - default_charset might suffice for that already, so no reason to introduce a complication. For encoding conversion it's probably even more complicated to start here. In general, the patch as is seemed to be OK, both for fixing the initial newline splitting issue and securing some bits, the further handling should be a matter of a presumably more complicated implementation. Since it targets master, it were OK to have this patch at least for the start, then looking whether a complicated implementation could ever make it.

Thanks.

va_end(args);

for (ptr = fbuf.c; ; ++ptr) {
c = *ptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be inside the for I guess :)

@pprindeville pprindeville changed the title Handle newlines and illegal characters safely when error_log set to "syslog" Handle newlines correctly when error_log set to "syslog" Aug 15, 2017
@pprindeville
Copy link
Contributor Author

Okay, now can we proceed with a merge?

@pprindeville
Copy link
Contributor Author

Btw. there is no security issue in PHP. This is just a problem of the loggers if there are some security issues as RFC does not prohibit anything.

@bukka "Defense in depth" strategy entails trying to defeat an attack in more than one place. That's what we're doing here.

No one said that the vulnerability is in PHP: it's not. The vulnerability is in the Xterm (and other terminal emulators) which don't handle device control strings (DCS's) in a sane way.

But we can try to prevent PHP from being the vector by which the vulnerability is injected.

@weltling
Copy link
Contributor

weltling commented Aug 15, 2017

Actually i was going to do some tests by the EOW. The last point was to exclude also 0x7f, but now the patch is completely different and there's yet another PR? What is the plan now, is this PR superseeded?

Thanks.

@pprindeville
Copy link
Contributor Author

What is the plan now, is this PR superseeded?

Not at all. PR #2689 is layered on this one. You can do your testing there, and I'll rebase the PR to drop the first commit as soon as it hits master.

@weltling
Copy link
Contributor

@pprindeville hmm, not sure to follow. #2689 is what this PR was before. Why it is split out? Or if i missed something in the correspondence, please point me there.

Thanks.

@pprindeville
Copy link
Contributor Author

Why it is split out? Or if i missed something in the correspondence, please point me there.

Because there are two problems: (1) how to handle strings with newlines in them, (2) how to handle non-NVT ASCII.

(1) is a trivial issue easily solved by the original first commit, with hardly any discussion required.
(2) is a more complex issue, which seems to have still more discussion required to arrive at a consensus.

This PR reverts to the original fix (and the original commit) which handles just issue (1).

The other PR is based on this PR plus the two additional commits which handle (a) strict NVT-ASCII when strict NVT-ASCII is all that is explicitly permitted/configured, and (b) either escaping non NVT-ASCII characters into safe octal notation--which I'm figuring won't be the ultimate solution-- or (b.2) translating the string into utf-8 compressed format with a BOM prefix... which is the only explicitly and concretely defined of handling anything other than NVT-ASCII.

I'll revise PR #2689 to (b.2) once it's agreed that this is the way to proceed and further, someone shows me what the libraries are for translating to utf8 and for validating utf8 correctness (in case the app is logging utf8 natively).

@pprindeville
Copy link
Contributor Author

That's not really flexible IMHO. I would prefer a single ini for that where you can specify required encoding.

@bukka That's fine. Let's discuss that on PR #2689 since that's where this fix will end up.

@weltling
Copy link
Contributor

@pprindeville but the patch didn't touch any encoding, did it? It fixed the initial issue and the terminal issue as a bonus. Anything else was passed through for now, no BOM, etc. :) The only point came from @bukka just now about the ini naming, either it could be fixed as well or later in master, since that's what the patch was targeting. Anyway, lets see it this way now.

Thanks.

Thanks.

@pprindeville
Copy link
Contributor Author

pprindeville commented Aug 15, 2017

but the patch didn't touch any encoding, did it? It fixed the initial issue and the terminal issue as a bonus. Anything else was passed through for now, no BOM, etc. :) The only point came from @bukka just now about the ini naming, either it could be fixed as well or later in master, since that's what the patch was targeting. Anyway, lets see it this way now.

@welting If you mean "was it codeset aware", then the answer was "no". It assumed everything was (or should be) NVT-ASCII.

We could, in PR #2689, look at the default_charset as you say, and convert the string from that to utf-8 and append a BOM. Then we'd be honoring Section 6.4 in the only concrete terms it has.

@weltling
Copy link
Contributor

@pprindeville converting everything to UTF-8 might be not that doable - it'll likely require some heavy API and fe ext/mbstring is not an obligatory dependency. There are yet more questions than answers, as for me. I can only applaud you still strive a wider solution on this, anyway. I'm going for tests closer to the EOW, so if everything ok at least the newline part can be handled already.

Thanks.

@pprindeville
Copy link
Contributor Author

@weltling Hi. Any news?

@weltling
Copy link
Contributor

Merged with master, adapted for the Windows side.

Thanks.

@weltling weltling closed this Aug 19, 2017
@pprindeville
Copy link
Contributor Author

Merged with master, adapted for the Windows side.

Were additional changes required?

pprindeville added a commit to pprindeville/packages that referenced this pull request Aug 19, 2017
See: php/php-src#2674

Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
pprindeville added a commit to pprindeville/packages that referenced this pull request Aug 19, 2017
See: php/php-src#2674

Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
@pprindeville
Copy link
Contributor Author

Anyway, thanks!

@weltling
Copy link
Contributor

@pprindeville yeah, splitting doesn't seem to be suitable for the event log. It's record based, not line based. The parts about sanitizing might still make sense, but the MSDN docs don't mention much on that.

Thanks.

pprindeville added a commit to pprindeville/packages that referenced this pull request Aug 19, 2017
See: php/php-src#2674

Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
mkrkn pushed a commit to mkrkn/packages that referenced this pull request Aug 23, 2017
See: php/php-src#2674

Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
salzmdan pushed a commit to salzmdan/packages that referenced this pull request Jan 8, 2018
See: php/php-src#2674

Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants