Skip to content

Update #81645 : header() checks the validity of HTTP status code #7676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from
Closed

Update #81645 : header() checks the validity of HTTP status code #7676

wants to merge 9 commits into from

Conversation

mdsnins
Copy link
Contributor

@mdsnins mdsnins commented Nov 23, 2021

Reference

Original Reference: header() allows arbitrary status codes (which may overflow) #81645

Description

To follow RFC 7230, now header() only accepts response code in 3 digits "000"-"999" However, atoi also allows a positive or negative sign in front of decimal representation which can lead invalid HTTP status line, an additional check logic using strlen is added. This patch also can prevent integer overflow which I raised initially.

Test1

<?php
header("HTTP/1.1 200 OK");

Expect

Nothing wrong

Result

HTTP/1.1 200 OK
Date: Tue, 23 Nov 2021 08:28:00 GMT
Connection: close
X-Powered-By: PHP/8.2.0-dev
Content-type: text/html; charset=UTF-8

Test2

<?php
header("HTTP/1.1 777 LUCKY");

Expect

It's not a standard status code, but it's acceptable.

Result

HTTP/1.1 777 LUCKY
Date: Tue, 23 Nov 2021 08:28:17 GMT
Connection: close
X-Powered-By: PHP/8.2.0-dev
Content-type: text/html; charset=UTF-8

Test3

<?php
header("HTTP/1.1 11 ELEVEN");

Expect

It's an invalid HTTP status line because it's not 3 digit

Result

HTTP/1.1 200 OK
Date: Tue, 23 Nov 2021 08:28:40 GMT
Connection: close
X-Powered-By: PHP/8.2.0-dev
Content-type: text/html; charset=UTF-8

<br />
<b>Warning</b>:  Cannot set HTTP status line - status code is malformed in <b>/home/ubuntu/mytester/ser.php</b> on line <b>5</b><br />

Test4

<?php
header("HTTP/1.1 1111 LONG");

Expect

It's an invalid HTTP status line because it's not 3 digit

Result

HTTP/1.1 200 OK
Date: Tue, 23 Nov 2021 08:29:25 GMT
Connection: close
X-Powered-By: PHP/8.2.0-dev
Content-type: text/html; charset=UTF-8

<br />
<b>Warning</b>:  Cannot set HTTP status line - status code is malformed in <b>/home/ubuntu/mytester/ser.php</b> on line <b>6</b><br />

Test5

<?php
header("HTTP/1.1 007 IMPOSSIBLE");

Expect

It's lower than 100, but it's valid 3 digit

Result

HTTP/1.1 007 IMPOSSIBLE
Date: Tue, 23 Nov 2021 08:30:37 GMT
Connection: close
X-Powered-By: PHP/8.2.0-dev
Content-type: text/html; charset=UTF-8

Test6

<?php
header("HTTP/1.1 000 ZERO");

Expect

It's totally zero, but 3 digit zero.

Result

HTTP/1.1 000 ZERO
Date: Tue, 23 Nov 2021 08:30:50 GMT
Connection: close
X-Powered-By: PHP/8.2.0-dev
Content-type: text/html; charset=UTF-8

Test7

<?php
header("HTTP/1.1 4294967496 OVERFLOW");

Expect

It occurs overflow and parsed as 200 in atoi, but must be failed.

Result

HTTP/1.1 200 OK
Date: Tue, 23 Nov 2021 08:32:08 GMT
Connection: close
X-Powered-By: PHP/8.2.0-dev
Content-type: text/html; charset=UTF-8

<br />
<b>Warning</b>:  Cannot set HTTP status line - status code is malformed in <b>/home/ubuntu/mytester/ser.php</b> on line <b>9</b><br />

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

@mdsnins
Copy link
Contributor Author

mdsnins commented Nov 23, 2021

Sorry for messy commits, I made some mistakes :(
I replaced atoi to strtoul and finally the code becomes much prettier, Thanks a lot @cmb69 💯

Anyway, raising an E_WARNING was totally my choice, is it acceptable for all?

@cmb69
Copy link
Member

cmb69 commented Nov 23, 2021

Anyway, raising an E_WARNING was totally my choice, is it acceptable for all?

It is fine for me, but maybe others have objections.

Anyhow, please add a simple test case to verify the behavior.

@mdsnins
Copy link
Contributor Author

mdsnins commented Nov 23, 2021

I updated 4 test cases under /ext/standard/test/general_functions
I'm not sure about where to place them (It seems like /sapi/cgi/tests/ also make sense for me) and how many of them are needed.

If you think test cases should be moved or removed/added, please tell me!

Since this is my first time contributing the PHP, I'm slightly worried that I'm too unskilled :(

@mdsnins
Copy link
Contributor Author

mdsnins commented Nov 23, 2021

By the way, it's not related to this issue (#81645), I want to ask something about a security issue of PHP before disclosing it, can I ask you over an email?

@cmb69
Copy link
Member

cmb69 commented Nov 23, 2021

Please file a security bug at https://bugs.php.net/ instead. :)

@SVGAnimate
Copy link
Contributor

SVGAnimate commented Nov 23, 2021

Hello,

Nice PR!

But, I have objections.

👍 It seems that it is necessary to check HAVE_STRTOUL to use ZEND_STROUL

@see ///php-src/main/php_config.h
/* Define to 1 if you have the `strtoul' function. /
/
#undef HAVE_STRTOUL */

👍 There are an issue here :

/* sapi_update_response_code doesn't free the status line if the code didn't change */

We do not need to duplicate(alloc/free) the string... We need sentinel strategy.

	ctr.line = "HTTP/1.0 500 Internal Server Error";
	ctr.line_len = sizeof("HTTP/1.0 500 Internal Server Error") - 1;// "-1" is the issue
	sapi_header_op(SAPI_HEADER_REPLACE, &ctr);

You found the bug so I'll let you find the solution.

@mdsnins
Copy link
Contributor Author

mdsnins commented Nov 24, 2021

I don't know what is the problem.
ZEND_STRTOUL is being defined in zend_long.h and used well with out any constraints in current PHP.

Also -1 is not an issue because since sizeof checks the size including a null byte at the end, -1 is correct for proper string length

@mdsnins
Copy link
Contributor Author

mdsnins commented Nov 24, 2021

Is there anything more I have to do about the patch?

@SVGAnimate
Copy link
Contributor

I disprove this RP because this bug alert/show the technical debt while this patch conceals it. This is my current opinion.
For example : https://github.com/php/php-src/blob/master/main/SAPI.c#L685

	header_line = estrndup(p->line, p->line_len);
	header_line_len = p->line_len;

Error :

	header_line[header_line_len] = '\0';// missing string terminaison so atoi become unpredictable 

mdsnins, the issue is for 32 bit computers.
https://github.com/php/php-src/blob/master/Zend/zend_long.h#L24

Sorry : -1 is misleading
Because the behaviors of "strings null-terminated" and "char array" are confuse.

What's more sapi_header_op() is responsable to parse/scan the header string( cf RFC)
however his signature does not allow it completely.


Maybe this code can be a source of inspiration

/**
 * response_code: [0-9]{3}
 * return -1 if failed else the http response code
 */
static int http_response_code_scan(const char *str, char **next)
{
	int code = -1;
        char *ptr = str;

        while (ptr[0]==' ') ptr++;//TODO: while (ZEND_IS_SPACE(*ptr)) ptr++;

	if (ZEND_IS_DIGIT(ptr[0]))
	if (ZEND_IS_DIGIT(ptr[1]))
	if (ZEND_IS_DIGIT(ptr[2]))
        {
                code = 100 * (ptr[0]-'0')
                     +  10 * (ptr[1]-'0')
                     +   1 * (ptr[2]-'0');
		ptr += 3;
		if (*next) { *next=ptr;} 
		return code;
	}

	return code;
}
/// usage
char *next=NULL;
ptr = header_line;
char protocol = http_response_protocol_scan(ptr, &next);// protocol: "HTTP/" [0-1] "." [0-9];
if(next) {
    ptr = next;
    //next=NULL;
    code = http_response_code_scan(ptr, &next);
    if(/* next &&*/ code>0) {
        sapi_update_response_code(code);
    } else {
        // zend_error(..)
    }
} else {
    // zend_error("TOKEN_ERROR: Unexpected %c", protocol )
}

Test1

<?php
header("HTTP/1.1 \t 200 OK");

Test2

<?php
header("HTTP/1.1\t 200 OK");

PS : For me, the priority is to "mirror" the calls malloc/free

@mdsnins
Copy link
Contributor Author

mdsnins commented Nov 24, 2021

Null byte is not missing after calling strndup. strndup always ensure that returned string is null-terminated. Refer: https://en.cppreference.com/w/c/experimental/dynamic/strndup

And also, ZEND_STRTOUL is not problem for 32bit systems. I don't understand why you're saying it's a problem to use. PHP core is already using functions related with zend_long.

https://github.com/php/php-src/blob/master/ext/standard/scanf.c#L354
https://github.com/php/php-src/blob/master/ext/standard/php_fopen_wrapper.c#L198

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The location of the tests is fine. I also don't see an issue with returning -1 in case of failure, and using ZEND_STRTOUL looks good to me as well.

main/SAPI.c Outdated
const char *ptr;

for (ptr = header_line; *ptr; ptr++) {
if (*ptr == ' ' && *(ptr + 1) != ' ') {
code = atoi(ptr + 1);
const char *code_begin = ptr + 1;
char *code_end;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Something is wrong with the indentation here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I didn't catch it. I just made a fix

Copy link

@AustinLeath AustinLeath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! looked at all of your changes and I don't necessarily see anything else wrong with any of it. Although others may find something that I overlooked. Good work!

@mdsnins
Copy link
Contributor Author

mdsnins commented Nov 24, 2021

It seems like there is a problem with azure pipelines?

@cmb69
Copy link
Member

cmb69 commented Nov 24, 2021

It seems like there is a problem with azure pipelines?

Yes, we're working on that. :)

@mdsnins
Copy link
Contributor Author

mdsnins commented Nov 24, 2021

It seems like there is a problem with azure pipelines?

Yes, we're working on that. :)

Thanks a lot :D

@SVGAnimate
Copy link
Contributor

SVGAnimate commented Nov 24, 2021

What about this tests ?

header("HTTP/1.1 \t 200 OK");// pass
header("HTTP/1.1\t 200 OK");// fail
header("HTTP/1.1 \n 200 OK");// ?
header("Http/Grandiose/1.1.1.1.1.1.1.1.1.0 200 OK");// ?


From rfc : https://datatracker.ietf.org/doc/html/rfc7230#section-3.1.2
status-line = HTTP-version SP status-code SP reason-phrase CRLF

What this code do ?

Caller

	if (header_line_len>=5
		&& !strncasecmp(header_line, "HTTP/", 5)) {
		/* filter out the response code */
		sapi_extract_response_code(header_line);

Called

static int sapi_extract_response_code(const char *header_line)
{
	int code = 200;
	const char *ptr;

	for (ptr = header_line; *ptr; ptr++) {
		if (*ptr == ' ' && *(ptr + 1) != ' ') {
			code = atoi(ptr + 1);
			break;
		}
	}

	return code;
}

Basic algorithm

Here is an algorithm for finding the first char 'A' in an array header_line [] of four elements:

char header_line[4] = {'A', 'B', 'C', 'D'};
char misleading = '\0';
size_t str_len = sizeof("ABCD") - 1;
char *end = header_line + str_len;
for ( code=0, ptr = header_line ; ptr<end && code!=0 ; ptr++)
    if ( ptr[0] == 'A' ) code = 1;

This version is not very efficient:

  • We carry out 2 initializations (2 instructions);
  • We perform 2 tests (2 instructions) before each passage in the loop (ptr<end and code! = 0)

Improved algorithm

A slightly more elaborate version of this algorithm is:

for ( ptr = header_line ; ptr<end ; ptr++)
    if ( ptr[0] == 'A' ) { code = 1; break; }

Or

ptr = header_line ;
while (  ptr<end &&  ptr[0] != 'A') ptr++;
if ( ptr<end ) { code = 1; }

We win :

  • 1 initialization (1 instruction) at the start of the algorithm;
  • 1 test (1 instruction) before each loop pass. Let n instructions on an array of n elements (in the best case).

Algorithm with sentinel

Finally, we can run the algorithm without testing the loop index. It suffices to provide a work box, where we place the key '\0' to be sure to find it. This box is called a sentry. https://en.wikipedia.org/wiki/Sentinel_value

char header_line[] = {'A', 'B', 'C', 'D', '\0'}
We can also see the sentinels as stopper elements in the execution of an algorithm.

char code = 0 ;
while (*header_line != '\0' ) {
    if (*header_line == 'A') {
        code = *header_line;
        break;
    }
    header_line++ ;
}

By eliminating the loop index test, we still gain n instructions on an array of n elements (in the best case).

That is to say that compared to the basic algorithm, we have gained up to 2n + 2 instructions.

With an array of 4 elements, the saving in instructions is insignificant, but with arrays of several thousand or million cells, the saving is several thousand or several million instructions gained.

If the search is done in the opposite direction, the sentinel is placed at the start of the table.

In the event that we do not know in which direction the search will take place, an array is created with two sentinels, one at the start of the array and one at the end of the array.

Sentinels can be used with arrays, lists, trees, and other data structures.

Best Regards

main/SAPI.c Outdated
code = ZEND_STRTOUL(code_begin, &code_end, 10);

/* rfc7230 3.1.2 status-code = 3DIGIT */
if(*code_begin < '0' || *code_begin > '9' || (code_end - code_begin) != 3) {
Copy link
Member

@bukka bukka Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you are already doing this, it might be worth not to allow anything above 599 and also first check should be probably *code_begin < '1'.

also small coding style nit: there should be a space after if

Copy link
Member

@bukka bukka Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is basically that rfc7231 6 ( https://datatracker.ietf.org/doc/html/rfc7231#section-6 ) defines semantics and doesn't really allow anything else so might make sense to check that as well. Not sure how could be useful to allow 000 for example...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, actually my first patch also it's in [100, 999] but I removed because of my confusion. Since some of applications use 000 or 999 for their special kind of response, I confused it's under standard.

However, now I'm totally agree with it should be restricted in [100, 599] and there should be a space after if :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a commit :)

@SVGAnimate
Copy link
Contributor

For me, the problem comes from the fact that there is no parser and adequate structure for the representation of the HTTP status "HTTP/%d.%d %d %s"

Here is a representation of the situation :

The red box(alloc) indicates the number of times the input string is duplicated for no apparent reason.

@SVGAnimate
Copy link
Contributor

Hello,
I analyze the code and it turns out that the parsing of the "http_header_status" is done manually in several places with subtle differences .

I propose to create /main/rfc7230.c( like /main/rfc1867.c) and refactoring.

But how to test the string output of header() function ? Do I have to compile php with Apache?

Do I need to create a PR or i can just RP from mdsnins repository ?

@cmb69
Copy link
Member

cmb69 commented Dec 6, 2021

@SVGAnimate, please don't derail this PR with imagined performance issues; the overhead appears to be completely neglectable in this case. :)

Feel free to provide another PR from your own php/php-src fork.

@SVGAnimate
Copy link
Contributor

Thanks you, I'll do it.

In that event, I don't see anymore reasons to object the modification of this RP.

@cmb69
Copy link
Member

cmb69 commented Dec 21, 2021

I've sent mail to the internals mailing list regarding this PR. Let's see where it goes. :)

@mdsnins
Copy link
Contributor Author

mdsnins commented Dec 23, 2021

Hello, I had busy days and I couldn't care about this issue.
I just followed up, and check the internal mails.

I'm also not sure about restricting status code in [100, 599]. RFC7231 is defining 5 categories of status code, however I don't know it actually restricts codes into 5 classes (maybe this can be misreading of document, since I'm not good at English). Also I found some cases (some web application or WAF) uses 000 or 999 as unusual cases' status code.

However, there are the commit that restricts to 000-999 in commit history, instead of [100, 599], the any choice of PHP will be fine.

Thank you for effort, cmb!

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants