Skip to content

Conversation

nikic
Copy link
Member

@nikic nikic commented Apr 24, 2016

Currently we allow performing dynamic calls to functions that introspect the parent stack frame. Apart from this being a source of segmentation faults previously and having wildly varying behavior between PHP5, PHP7 (namespaced and non-namespaced) and HHVM (for examples see bug #71220), this also causes issues for the data-flow optimizer, which is not able to detect these calls and abort optimization (see the dynamic_call_007.phpt test for a simple example of misoptimization).

Solve this by forbidding these dynamic calls.

Affects:

  • extract()
  • compact()
  • get_defined_vars()
  • parse_str() with one arg
  • mb_parse_str() with one arg
  • func_get_args()
  • func_get_arg()
  • func_num_args()

Currently we allow performing dynamic calls to functions that
introspect the parent stack frame. Apart from this being a source
of segmentation faults previously and having wildly varying
behavior between PHP5, PHP7 (namespaced and non-namespaced) and
HHVM (for examples see bug #71220), this also causes issues for
the data-flow optimizer, which is not able to detect these calls
and abort optimization (see the dynamic_call_007.phpt test for
a simple example of misoptimization).

Solve this by forbidding these dynamic calls.

Affects:
 * extract()
 * compact()
 * get_defined_vars()
 * parse_str() with one arg
 * mb_parse_str() with one arg
 * func_get_args()
 * func_get_arg()
 * func_num_args()
@nikic
Copy link
Member Author

nikic commented Apr 24, 2016

/cc @dstogov

There currently doesn't seem to be a good way to detect whether a call is dynamic. I'm currently scanning for the init opcode, but that's not a nice solution and slow. Do you think we can extend the call info flags by using const_flags as well (it shouldn't be currently used in this context)?

zend_error(E_WARNING, "func_num_args(): Called from the global scope - no function context");
RETURN_LONG(-1);
}

if (zend_forbid_dynamic_call("func_num_args()") == FAILURE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, you don't need specific "func_num_args", since it can be get by EG(current_execute_data)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func_num_args() is not a problem from a safety perspective, but the current behavior for dynamic func_num_args() calls is inconsistent depending on whether we're in namespaced / non-namespaced code: https://3v4l.org/inP5v -- anything that inspects the "next frame up" will have this issue, this includes all func_* functions.

Furthermore, while func_num_args() is not an issue for the current optimizer, it would become a problem once we implement inlining (in which case func_num_args() would get the argument number of the function we inlined into).

As such I think we should forbid it as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Great inconsistency example :)

but the following code is still legal and should provide consistent result

function foo() {

$f = 'func_num_args';

var_dump($f());

}

So the problem might be with INIT_USER_CALL only.

or you are disagree?


From: Nikita Popov notifications@github.com
Sent: Monday, April 25, 2016 11:20
To: php/php-src
Cc: Dmitry Stogov; Mention
Subject: Re: [php/php-src] Forbid dynamic calls to scope introspection functions (#1886)

In Zend/zend_builtin_functions.chttps://github.com//pull/1886#discussion_r60875906:

          zend_error(E_WARNING, "func_num_args():  Called from the global scope - no function context");
          RETURN_LONG(-1);
  }
  • if (zend_forbid_dynamic_call("func_num_args()") == FAILURE) {
    

func_num_args() is not a problem from a safety perspective, but the current behavior for dynamic func_num_args() calls is inconsistent depending on whether we're in namespaced / non-namespaced code: https://3v4l.org/inP5v -- anything that inspects the "next frame up" will have this issue, this includes all func_* functions.

Furthermore, while func_num_args() is not an issue for the current optimizer, it would become a problem once we implement inlining (in which case func_num_args() would get the argument number of the function we inlined into).

As such I think we should forbid it as well.

You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHubhttps://github.com//pull/1886/files/0fc8f96e7b5f5f42f777ff826b072a1c22ab734b#r60875906

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I think maybe I didn't make myself clear, I mean, you don't need

if (zend_forbid_dynamic_call("func_num_args()") == FAILURE) {

you could make it simpler by:

if (zend_forbid_dynamic_call() == FAILURE) {

the function name can be get via EG(current_execute_data).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laruence Ahhh, sorry, I didn't catch what you mean. The reason why I'm passing in the function name is to throw an error like "Cannot call parse_str() with a single argument dynamically" for parse_str/mb_parse_str, as these are only forbidden if the second argument is not used. If I remove this special case we can indeed determine the function name automatically.

@dstogov
Copy link
Member

dstogov commented Apr 25, 2016

The following code is legal. It just makes troubles to optimiser.

Currently we don't have a free bit in "call info flags", but this would be a better solution.

May be we should mark "unsafe" internal functions and verify their fn_flags when they are called dynamically?

anyway, I'm not sure if we should limit language behavior just because of optimization troubles.

Thanks. Dmitry.


From: Nikita Popov notifications@github.com
Sent: Monday, April 25, 2016 00:59
To: php/php-src
Cc: Dmitry Stogov; Mention
Subject: Re: [php/php-src] Forbid dynamic calls to scope introspection functions (#1886)

/cc @dstogovhttps://github.com/dstogov

There currently doesn't seem to be a good way to detect whether a call is dynamic. I'm currently scanning for the init opcode, but that's not a nice solution and slow. Do you think we can extend the call info flags by using const_flags as well (it shouldn't be currently used in this context)?

You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHubhttps://github.com//pull/1886#issuecomment-214045322

@nikic
Copy link
Member Author

nikic commented Apr 25, 2016

Currently we don't have a free bit in "call info flags", but this would be a better solution.

Yes, which is why I'm asking if there would be a problem with extending call flags to not only used the "reserved" bits, but also the "const_flags" bits. This will extend call info to 16 bit, which gives us plenty of space. Is there an issue with doing this?

anyway, I'm not sure if we should limit language behavior just because of optimization troubles.

Yes, it's tricky... The way I see it, this is the situation:

  • We've recently fixed a number of segfaults caused by these functions being called from internal functions. It's clear that nobody thought about this before and that, until recently, no one actually tried doing it. As such, I don't think a backwards compatibility issue exists.
  • The behavior when called via an internal function is not clear. You've already seen the example for namespaced and non-namespaced func_get_args. Things become even more weird if you also take the behavior by HHVM into account. For example a namespaced call_user_func('get_defined_vars') in PHP will return the variables from the user-frame where the call_user_func occurs in, while in HHVM will return the "defined variables" (which is just the function parameters" of the call_user_func() call itself. So clearly there is no consensus on how these should behave.
  • Lastly, there is the issue with the optimizer. We could abort optimization on DYNAMIC_FCALL (though it would be a pity, as it's a totally useless case), but handling things like array_map would be much more problematic, as there are also methods which invoke callbacks and we generally do not know which methods we're calling (and for that matter don't have a comprehensive list of things that can invoke callbacks if you take extensions into account). So we'd probably have to assume that something like "extract" may be called if the function contains any method call at all.

So, given that this functionality is not useful (and not used), unclear semantics and the issues in the optimizer (currently causing segfaults), I think forbidding these calls is the best solution. I would use the simple rule of "no dynamic calls" rather than try to carve out exceptions like "calling with call_user_func is not okay, but using $func() is allowed".

@dstogov
Copy link
Member

dstogov commented Apr 25, 2016

On 04/25/2016 01:16 PM, Nikita Popov wrote:

Currently we don't have a free bit in "call info flags", but this
would be a better solution.

Yes, which is why I'm asking if there would be a problem with
extending call flags to not only used the "reserved" bits, but also
the "const_flags" bits. This will extend call info to 16 bit, which
gives us plenty of space. Is there an issue with doing this?

This should be possible. May be even get rid from "const_flags" at all.

anyway, I'm not sure if we should limit language behavior just
because of optimization troubles.

Yes, it's tricky... The way I see it, this is the situation:

  • We've recently fixed a number of segfaults caused by these
    functions being called from internal functions. It's clear that
    nobody thought about this before and that, until recently, no one
    actually tried doing it. As such, I don't think a backwards
    compatibility issue exists.

these are just bugs.

  • The behavior when called via an internal function is not clear.
    You've already seen the example for namespaced and non-namespaced
    func_get_args. Things become even more weird if you also take the
    behavior by HHVM into account. For example a namespaced
    |call_user_func('get_defined_vars')| in PHP will return the
    variables from the user-frame where the call_user_func occurs in,
    while in HHVM will return the "defined variables" (which is just
    the function parameters" of the call_user_func() call itself. So
    clearly there is no consensus on how these should behave.

Right, but dynamic function call should not be a problem (only call
through trampoline).

  • Lastly, there is the issue with the optimizer. We could abort
    optimization on DYNAMIC_FCALL (though it would be a pity, as it's
    a totally useless case), but handling things like array_map would
    be much more problematic, as there are also methods which invoke
    callbacks and we generally do not know which methods we're calling
    (and for that matter don't have a comprehensive list of things
    that can invoke callbacks if you take extensions into account). So
    we'd probably have to assume that something like "extract" may be
    called if the function contains any method call at all.

array_walk($a, "extract") absolutely legal code, I think we have to
adopt optimizer to avoid DFA optimizations in case of "dangerous" code.

So, given clear lack of usage, unclear semantics and the issues in the
optimizer (currently causing segfaults), I think forbidding these
calls is the best solution. I would use the simple rule of "no dynamic
calls" rather than try to carve out exceptions like "calling with
call_user_func is not okay, but using $func() is allowed".


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#1886 (comment)

@nikic
Copy link
Member Author

nikic commented Apr 25, 2016

Another segfault related to this: https://bugs.php.net/bug.php?id=72102 (set_error_handler + func_get_args)

Also found an exciting new way to set a variable in PHP! (https://3v4l.org/NJh0H)

spl_autoload_register('parse_str');
function test() {
    $FooBar = 1;
    class_exists('FooBar');
    var_dump($FooBar); // string(0) ""
}
test();

So clearly, class_exists is a dangerous function :)

@dstogov
Copy link
Member

dstogov commented Apr 25, 2016

what about "new Foo()"?

May be we should deprecate most of this "dynamic" cases?


From: Nikita Popov notifications@github.com
Sent: Monday, April 25, 2016 19:49
To: php/php-src
Cc: Dmitry Stogov; Mention
Subject: Re: [php/php-src] Forbid dynamic calls to scope introspection functions (#1886)

Another segfault related to this: https://bugs.php.net/bug.php?id=72102 (set_error_handler + func_get_args)

Also found an exciting new way to set a variable in PHP!

spl_autoload_register('parse_str');
function test() {
$FooBar = 1;
class_exists('FooBar');
var_dump($FooBar);
}
test();

So clearly, class_exists is a dangerous function :)

You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHubhttps://github.com//pull/1886#issuecomment-214437814

Requires switching to 16-bit call info.
@nikic
Copy link
Member Author

nikic commented Apr 25, 2016

I've updated the patch to use a ZEND_CALL_DYNAMIC flag.

@dstogov Yes, the same applies to new or any other invocation of the autoloader. For new we need an extra autoloader to suppress the exception, but otherwise the same:

spl_autoload_register('parse_str');
spl_autoload_register(function($class) {
    // Avoid undefined class exception
    eval("class $class {}");
});

function test() {
    $FooBar = 1;
    new FooBar;
    var_dump($FooBar); // string(0) ""
}
test();

@bwoebi
Copy link
Member

bwoebi commented Apr 26, 2016

Regarding the error message: Not sure whether an user will understand what dynamically exactly means.
$f = "func_get_args"; $f(); is a dynamic call. The patch here is fixing the case of an internal function using a parameter as callback.
… Okay, not an issue.

Also, I'd eventually move the dynamic call check to the zend_is_callable_ex fetch? [set a flag in arginfo or similar? … We anyway have 4 byte of padding (64 bit at least) left in that structure...]
That way you can get the warning immediately instead of in a delayed way. With the patch here, you only get the failure message when the callback actually happens. [Also not slowing down these functions.]

@dstogov
Copy link
Member

dstogov commented Apr 26, 2016

This solution is cheaper, because we perform check only for "dangerous" function.


From: Bob Weinand notifications@github.com
Sent: Tuesday, April 26, 2016 14:28
To: php/php-src
Cc: Dmitry Stogov; Mention
Subject: Re: [php/php-src] Forbid dynamic calls to scope introspection functions (#1886)

Regarding the error message: Not sure whether an user will understand what dynamically exactly means.
$f = "func_get_args"; $f(); is a dynamic call. This is an internal function using a parameter as callback.

Also, I'd eventually move the dynamic call check to the zend_is_callable_ex fetch? [set a flag in arginfo or similar? ... We anyway have 4 byte of padding (64 bit at least) left in that structure...]
That way you can get the warning immediately instead of in a delayed way. With the patch here, you only get the failure message when the callback actually happens.

You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHubhttps://github.com//pull/1886#issuecomment-214710142

@nikic
Copy link
Member Author

nikic commented Apr 26, 2016

We already discussed this OTR, but repeating it here for the record: This patch does also cover $f = 'func_get_args'; $f();, i.e. it forbids dynamic calls in all forms, not just those going through zend_call_function. From a user perspective $f() and call_user_func($f) are the same thing, so if one is forbidden, the other should be as well.

@dstogov
Copy link
Member

dstogov commented Apr 26, 2016

I still fell a bit uncomfortable with this solution. I understand Nikta's point, and it makes sense.

BTW: I think we have two problems:

  1. some functions shouldn't be called from internal ones.
  • func_get_args()
  • func_get_arg()
  • func_num_args()

It's easy to fix them with additional check for prev_execute_data->func->type:

if (ex->func.type == ZEND_INTERNAL_FUNCTION) {
    zend_error(E_WARNING, "func_get_arg():  Called from internal function - no function context");
    RETURN_FALSE;
}

Interesting, but call_user_func('func_num_args') will work, because of call through trampoline :)

This also works with Nikita's solution.

We also may "unwind" to the upper user call frame (I'm not sure)

  1. All other functions that access local variables.

I really don't know how to solve the problem, but proposed solution (prohibit dynamic calls) doesn't look good.

The following code is valid, but it's prohibited, because it makes troubles to optimizer.

function foo() {

array_map("extract", [$_GET, $_POST,["a"=>"b"]);

var_dump(get_defined_vars());

}

foo();

A better solution, would require "aliases and side effects analyses" in optimizer, but this is significantly more difficult of course.

Nikita, I think the second option would require RFC. I won't object against it, but also won't support.

Thanks. Dmitry.


From: Nikita Popov notifications@github.com
Sent: Tuesday, April 26, 2016 14:59
To: php/php-src
Cc: Dmitry Stogov; Mention
Subject: Re: [php/php-src] Forbid dynamic calls to scope introspection functions (#1886)

We already discussed this OTR, but repeating it here for the record: This patch does also cover $f = 'func_get_args'; $f();, i.e. it forbids dynamic calls in all forms, not just those going through zend_call_function. From a user perspective $f() and call_user_func($f) are the same thing, so if one is forbidden, the other should be as well.

You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHubhttps://github.com//pull/1886#issuecomment-214716977

@nikic
Copy link
Member Author

nikic commented Apr 26, 2016

@dstogov I definitely see your concern here. There's a trade-off here between userland freedom and internal guarantees. As this is clearly more contentious than I thought it would be, I will have to create a discussion on the internals list.

I would like to note that your example using array_map may be "valid", but it's definitely not portable. It will not work in HHVM. The reason is that in HHVM array_map is implemented as a userland function (or rather as a HHAS function), so the "extract" will actually extract the variables into the scope of array_map, not the calling function.

This issue does not exist in PHP right now (because all our functions are implemented in C), but it does show a general issue, a dichotomy between internal and userland functions in this regard. For example, if someone were to implement a map function in userland in order to support Traversables and not only arrays using code like

function map($function, $iterable) {
    foreach ($iterable as $value) {
        yield $function($value);
    }
}

and then used your example map("extract", [$_GET, $_POST,["a"=>"b"]), this would not work (or rather, work differently), even though the same using array_map worked (in PHP at least).

If this is taken into account, i.e. thinking about the implementation of something like array_map in terms of how you'd do it in userland, it's downright weird that "extract" ends up modifying the scope of the calling function, rather the scope of array_map (which actually does the extract call). (Of course the latter is just not possible because it's an internal function -- but that's where the modification should happen logically.)

@dstogov
Copy link
Member

dstogov commented Apr 26, 2016

the best solution In long terms, from my point of view, is to deprecate (and then disable) extract() and parse_str() with one argument.


From: Nikita Popov notifications@github.com
Sent: Tuesday, April 26, 2016 15:32
To: php/php-src
Cc: Dmitry Stogov; Mention
Subject: Re: [php/php-src] Forbid dynamic calls to scope introspection functions (#1886)

@dstogovhttps://github.com/dstogov I definitely see your concern here. There's a trade-off here between userland freedom and internal guarantees. As this is clearly more contentious than I thought it would be, I will have to create a discussion on the internals list.

I would like to note that you example using array_map may be "valid", but it's definitely not portable. It will not work in HHVM. The reason is that in HHVM array_map is implemented as a userland function (or rather as a HHAS function), so the "extract" will actually extract the variables into the scope of array_map, not the calling function.

This issue does not exist in PHP right now (because all our functions are implemented in C), but it does show a general issue, a dichotomy between internal and userland functions in this regard. For example, if someone were to implement a map function in userland in order to support Traversables and not only arrays using code like

function map($function, $iterable) {
foreach ($iterable as $value) {
yield $function($value);
}
}

and then used your example map("extract", [$_GET, $_POST,["a"=>"b"]), this would not work, even though the same using array_map worked (in PHP at least).

If this is taken into account, i.e. thinking about the implementation of something like array_map in terms of how you'd do it in userland, it's downright weird that "extract" ends up modifying the scope of the calling function, rather the scope of array_map (which actually does the extract call). (Of course the latter is just not possible because it's an internal function -- but that's where the modification should happen logically.)

The following code is valid, but it's prohibited, because it makes troubles to optimizer. function foo() { array_map("extract", [$_GET, $_POST,["a"=>"b"]); var_dump(get_defined_vars()); } foo();

You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHubhttps://github.com//pull/1886#issuecomment-214724025

@wimleers
Copy link

Thanks for making PHP more sane!

@smalyshev
Copy link
Contributor

This should have a detailed note in UPGRADING with the list of functions affected.

@nikic
Copy link
Member Author

nikic commented May 24, 2016

Merged as 91f5940.

@nikic nikic closed this May 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants