Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assert and ??= can lead to use-of-uninitialized-value due to memoization #11580

Closed
iluuu1994 opened this issue Jul 3, 2023 · 0 comments
Closed

Comments

@iluuu1994
Copy link
Member

Description

The following code:

<?php
assert(y)[y] ??= y;

Resulted in this output:

$_main:
     ; (lines=26, args=0, vars=0, tmps=10)
     ; (before optimizer)
     ; /home/dmitry/tmp/fuzz-60129.php:1-3
     ; return  [] RANGE[0..0]
0000 V2 = ASSERT_CHECK 0007
0001 INIT_FCALL 2 112 string("assert")
0002 T0 = FETCH_CONSTANT string("y")
0003 T1 = COPY_TMP T0
0004 SEND_VAL T0 1
0005 SEND_VAL string("assert(y)") 2
0006 V2 = DO_ICALL
0007 T3 = FETCH_CONSTANT string("y")
0008 T4 = COPY_TMP T3
0009 T5 = FETCH_DIM_IS V2 T3
0010 T6 = COALESCE T5 0022
0011 T7 = FETCH_CONSTANT string("y")
0012 V8 = ASSERT_CHECK 0017
0013 INIT_FCALL 2 112 string("assert")
0014 SEND_VAL T1 1                                                 ; <== T1 may be uninitialized here
0015 SEND_VAL string("assert(y)") 2
0016 V8 = DO_ICALL
0017 V8 = SEPARATE V8
0018 T9 = ASSIGN_DIM V8 T4
0019 OP_DATA T7
0020 T6 = QM_ASSIGN T9
0021 JMP 0024
0022 FREE T1
0023 FREE T4
0024 FREE T6
0025 RETURN int(1)

PHP Version

PHP-8.1

Operating System

No response

@iluuu1994 iluuu1994 self-assigned this Jul 3, 2023
iluuu1994 added a commit to iluuu1994/php-src that referenced this issue Jul 3, 2023
Normally, PHP evaluates all expressions in offsets (property or array), as well
as the right hand side of expressions before actually fetching the offsets. This
is well explained in this blog post.

https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety

For ??= we have a bit of a problem in that the rhs must only be evaluated if the
lhs is null or undefined. Thus, we have to first compile the lhs with BP_VAR_IS,
conditionally run the rhs and then re-fetch the lhs with BP_VAR_W to to make
sure the offsets are valid if they have been invalidated.

However, we don't want to just re-evaluate the entire lhs because it may contain
side-effects, as in $array[$x++] ??= 42;. In this case, we don't want to
re-evaluate $x++ because it would result in writing to a different offset than
was previously tested. The same goes for function calls, like
$array[foo()] ??= 42;, where the second call to foo() might result in a
different value. PHP behaves correctly in these cases. This is implemented by
memoizing sub-expressions in the lhs of ??= and reusing them when compiling the
lhs for the second time. This is done for any expression that isn't a variable,
i.e. anything that can (potentially) be written to.

Unfortunately, this also means that function calls are considered writable due
to their return-by-reference semantics, and will thus not be memoized. The
expression foo()['bar'] ??= 42; will invoke foo() twice. Even worse,
foo(bar()) ??= 42; will call both foo() and bar() twice, but
foo(bar() + 1) ??= 42; will only call foo() twice. This is likely not by design,
and was just overlooked in the implementation. The RFC does not specify how
function calls in the lhs of the coalesce assignment behaves.

Now, the problem this commit actually fixes is that ??= may memoize expressions
inside assert() function calls that may not actually execute. This is not only
an issue when using the VAR in the second expression (which would usually also
be skipped) but also when freeing the VAR. For this reason, it is not safe to
memoize assert() sub-expressions.

There are two possible solutions:

1. Don't memoize any sub-expressions of assert(), meaning they will execute
   twice.
2. Compile-error

Option 2 is not quite simple, because we can't disallow all memoization inside
assert(), as that would break assertions like assert($array[foo()] ??= 'bar');.
Code like this is highly unlikely (and dubious) but possible. In this case, we
would need to make sure that a memoized value could not be used across the
assert boundary it was created in. The complexity for this does not seem worth
it. So we opt for option 1 and disable memoization immediately inside assert().

Fixes phpGH-11580
iluuu1994 added a commit to iluuu1994/php-src that referenced this issue Jul 3, 2023
Normally, PHP evaluates all expressions in offsets (property or array), as well
as the right hand side of assignments before actually fetching the offsets. This
is well explained in this blog post.

https://www.npopov.com/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety

For ??= we have a bit of a problem in that the rhs must only be evaluated if the
lhs is null or undefined. Thus, we have to first compile the lhs with BP_VAR_IS,
conditionally run the rhs and then re-fetch the lhs with BP_VAR_W to to make
sure the offsets are valid if they have been invalidated.

However, we don't want to just re-evaluate the entire lhs because it may contain
side-effects, as in $array[$x++] ??= 42;. In this case, we don't want to
re-evaluate $x++ because it would result in writing to a different offset than
was previously tested. The same goes for function calls, like
$array[foo()] ??= 42;, where the second call to foo() might result in a
different value. PHP behaves correctly in these cases. This is implemented by
memoizing sub-expressions in the lhs of ??= and reusing them when compiling the
lhs for the second time. This is done for any expression that isn't a variable,
i.e. anything that can (potentially) be written to.

Unfortunately, this also means that function calls are considered writable due
to their return-by-reference semantics, and will thus not be memoized. The
expression foo()['bar'] ??= 42; will invoke foo() twice. Even worse,
foo(bar()) ??= 42; will call both foo() and bar() twice, but
foo(bar() + 1) ??= 42; will only call foo() twice. This is likely not by design,
and was just overlooked in the implementation. The RFC does not specify how
function calls in the lhs of the coalesce assignment behaves. This should
probably be improved in the future.

Now, the problem this commit actually fixes is that ??= may memoize expressions
inside assert() function calls that may not actually execute. This is not only
an issue when using the VAR in the second expression (which would usually also
be skipped) but also when freeing the VAR. For this reason, it is not safe to
memoize assert() sub-expressions.

There are two possible solutions:

1. Don't memoize any sub-expressions of assert(), meaning they will execute
   twice.
2. Throw a compile error.

Option 2 is not quite simple, because we can't disallow all memoization inside
assert(), as that would break assertions like assert($array[foo()] ??= 'bar');.
Code like this is highly unlikely (and dubious) but possible. In this case, we
would need to make sure that a memoized value could not be used across the
assert boundary it was created in. The complexity for this is not worthwhile. So
we opt for option 1 and disable memoization immediately inside assert().

Fixes phpGH-11580
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant