Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Match expression #5371

Closed
wants to merge 14 commits into from
Closed

Conversation

iluuu1994
Copy link
Member

@iluuu1994 iluuu1994 commented Apr 11, 2020

Zend/tests/match/013.phpt Outdated Show resolved Hide resolved
@cmb69 cmb69 added the RFC label Apr 12, 2020
Copy link
Contributor

@carusogabriel carusogabriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I want to write return match($x), with no space between the constructor and the list of arguments, does the current grammar proposal support it?

@iluuu1994
Copy link
Member Author

@carusogabriel Yep. Normal whitespace rules apply.

Zend/zend_compile.c Outdated Show resolved Hide resolved
@TysonAndre
Copy link
Contributor

EDIT: Also, I guess some optimizations for ZEND_SWITCH_STRING and ZEND_SWITCH_LONG would be incorrect for match, e.g. potential of assuming that loose equality means other branches are dead code.

Because there are opcache bugs, that'll need to be addressed, almost definitely by introducing a new opcode such as ZEND_MATCH, possibly along with ZEND_MATCH_STRING, ZEND_MATCH_LONG, as you mentioned in the TODO.

On a php-src build with this PR (and a few other proposed changes)

<?php

// Observed:
// With opcache.enable_cli=0, this correctly throws an UnhandledMatchError
// With opcache.enable_cli=1, this incorrectly throws a RuntimeException
function test() {
    $x = '2';
    match($x) {
        1, 2, 3, 4, 5 => { throw new RuntimeException(); },
    };
}
test();

Opcache sees that '2' and 2 are weakly equivalent, and assumes that optimizing away the other cases of the ZEND_SWITCH_LONG is safe (command used: php -d opcache.file_cache= -d opcache.opt_debug_level=0x20000 match_test.php)

0000 V0 = NEW 0 string("RuntimeException")
0001 DO_FCALL
0002 THROW V0
LIVE RANGES:
     0: 0001 - 0002 (new)

@TysonAndre
Copy link
Contributor

Another optimization I'd wanted to see in opcache for a while, but is out of the scope of this PR would be to automatically convert \in_array($value, [1, 3, 5, 7, 9], true) to match ($value) { 1, 3, 5, 7, 9 => true, default => false } (when all array values are known constants and the resulting opcodes would be a constant-time lookup)

  • Avoid a regular php function call
  • Convert iteration to an array lookup

(Opcache already does a similar optimization for non-strict in_array)

@iluuu1994
Copy link
Member Author

Thank you @TysonAndre for your very thorough review!

Copy link
Member Author

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opcache sees that '2' and 2 are weakly equivalent, and assumes that optimizing away the other cases of the ZEND_SWITCH_LONG is safe

is the cause of the incorrect optimization. Replacing it with this (the default case) solves the problem:

for (s = 0; s < block->successors_count; s++) {
    scdf_mark_edge_feasible(scdf, block_num, block->successors[s]);
}
return;

I'll have to investigate further tomorrow.

@TysonAndre
Copy link
Contributor

Also, there'd be a merge conflict if both this and the "throw expression" RFC pass.

Passing NULL to zend_compile_throw and checking for NULL in zend_compile_throw should fix that (there's no need for a temporary for the throw in the match expression, I'm guessing the temporary is bookkeeping for binary operations, etc.)

-void zend_compile_throw(zend_ast *ast) /* {{{ */
+void zend_compile_throw(znode *result, zend_ast *ast) /* {{{ */
 {
        zend_ast *expr_ast = ast->child[0];
 
@@ -4565,6 +4565,11 @@ void zend_compile_throw(zend_ast *ast) /* {{{ */
        zend_compile_expr(&expr_node, expr_ast);
 
        zend_emit_op(NULL, ZEND_THROW, &expr_node, NULL);
+
+       if (result != NULL) {
+               result->op_type = IS_CONST;
+               ZVAL_BOOL(&result->u.constant, 1);
+       }
 }
 /* }}} */
 
@@ -5378,7 +5383,7 @@ void zend_compile_match(znode *result, zend_ast *ast) /* {{{ */
                zend_ast *error_args_ast = zend_ast_create_list(0, ZEND_AST_ARG_LIST);
                zend_ast *new_error_ast = zend_ast_create(ZEND_AST_NEW, error_name_ast, error_args_ast);
                zend_ast *throw_ast = zend_ast_create(ZEND_AST_THROW, new_error_ast);
-               zend_compile_throw(throw_ast);
+               zend_compile_throw(NULL, throw_ast);

@iluuu1994
Copy link
Member Author

The optimizer is fixed now. Initially I wanted to use extended_value of the opcode but unfortunately that wasn't possible as it's already used for the default jump opnum. Thus I had to create two new opcodes (ZEND_MATCH_LONG and ZEND_MATCH_STRING).

Zend/zend_vm_def.h Outdated Show resolved Hide resolved
@@ -925,6 +928,7 @@ static int zend_dfa_optimize_jmps(zend_op_array *op_array, zend_ssa *ssa)
}
break;
case ZEND_SWITCH_STRING:
case ZEND_MATCH_STRING:
if (opline->op1_type == IS_CONST) {
zval *zv = CT_CONSTANT_EX(op_array, opline->op1.constant);
if (Z_TYPE_P(zv) != IS_STRING) {
Copy link
Contributor

@TysonAndre TysonAndre Apr 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any obvious bugs.

For this branch in particular, I think that if something is a ZEND_SWITCH_STRING, it would have to go to the first case (case by case IS_EQUAL comparisons), but if something was a ZEND_MATCH_STRING, it could safely go to the last case (the default case) instead?

(Both are correct, but the alternative is more efficient and I think it should eliminate more dead code)

removed_ops++;
MAKE_NOP(opline);
opline->extended_value = 0;
take_successor_ex(ssa, block_num, block, block->successors[0]);  // take the last successor instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is irrelevant now that the IS_IDENTICAL/JMPNZ chain was removed, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the following doesn't get optimized:

function test1(string $value) {
    echo match($value) {
        1, 2, 3, 4, 5 => 'a',
        default => 'b',
    };
}

test1(1);
0000 CV0($value) = RECV 1
0001 MATCH_LONG CV0($value) 1: 0002, 2: 0002, 3: 0002, 4: 0002, 5: 0002, default: 0004
0002 T1 = QM_ASSIGN string("a")
0003 JMP 0005
0004 T1 = QM_ASSIGN string("b")
0005 ECHO T1
0006 RETURN null

But that has more to do with the fact that $value is a CV and not a CONST.

@@ -896,6 +898,7 @@ static int zend_dfa_optimize_jmps(zend_op_array *op_array, zend_ssa *ssa)
break;
}
case ZEND_SWITCH_LONG:
case ZEND_MATCH_LONG:
if (opline->op1_type == IS_CONST) {
zval *zv = CT_CONSTANT_EX(op_array, opline->op1.constant);
if (Z_TYPE_P(zv) != IS_LONG) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - for ZEND_MATCH_LONG, I'd assume taking the last successor instead of the first successor would be more efficient.

I'm not familiar with what take_successor_ex is actually doing though. I'd hope that'd work, but it may end up being more complicated than that.

							take_successor_ex(ssa, block_num, block, block->successors[0]);  // take last successor instead

Zend/zend_compile.c Outdated Show resolved Hide resolved
Zend/zend_compile.c Outdated Show resolved Hide resolved
Zend/tests/match/020.phpt Outdated Show resolved Hide resolved
Zend/tests/match/021.phpt Outdated Show resolved Hide resolved
@iluuu1994
Copy link
Member Author

Thanks @TysonAndre, your comments are tremendously helpful! Keep them coming 🙂

Zend/zend_compile.c Outdated Show resolved Hide resolved
@TysonAndre
Copy link
Contributor

I'm not sure why 017.phpt and 019.phpt started failing in travis, possibly adding to a hash table or looking up in a hash table the wrong way on some platforms, or ma

========DIFF========
002+ 
003+ Fatal error: Uncaught Exception in /home/travis/build/php/php-src/Zend/tests/match/017.php:4
004+ Stack trace:
005+ #0 /home/travis/build/php/php-src/Zend/tests/match/017.php(9): wrong()
006+ #1 /home/travis/build/php/php-src/Zend/tests/match/017.php(25): test_int('0')
007+ #2 {main}
008+   thrown in /home/travis/build/php/php-src/Zend/tests/match/017.php on line 4
========DIFF========
046+ string(1) "1"
046- string(7) "default"
========DONE========
FAIL Match expression long jump table [Zend/tests/match/019.phpt] 

@TysonAndre
Copy link
Contributor

And the JIT tests are failing, probably because the MATCH_STRING and MATCH_LONG opcodes would need to be updated for MATCH_STRING and MATCH_LONG) (e.g. it would work when there was an IS_IDENTICAL check to fall through to, but not when they get eliminated as dead code) https://dev.azure.com/phpazuredevops/PHP/_build/results?buildId=6794&view=ms.vss-test-web.build-test-results-tab&runId=155108&resultId=103028&paneView=debug

Unfortunately, there are few people familiar with the JIT, and I'm not one of them.


I'm terrible at assembly, but it looks like in ext/opcache/jit/zend_jit_x86.dasc,

returning 1 in zend_jit_switch and adding a todo might be an option for opline->opcode == ZEND_MATCH_LONG || opline->opcode == ZEND_MATCH_STRING.

I can't read what it's doing, but from the output of tests, it looks like it'd be emitting assembly to fall through to the next opcode in some cases, which works for ZEND_SWITCH_* but not ZEND_MATCH_*

If zend_jit_switch returned 1, it looks like opcache would just be prevented from JITing functions containing match if that was done, and people more familiar with internals could write the JIT code based on the zend_vm_def.h implementation.

e.g. 013.phpt

Stack trace
041+ string(1) "a"
042+ string(1) "a"
043+ string(1) "a"
044+ string(1) "a"
045+ string(1) "a"
046+ string(1) "a"
047+ string(1) "a"
048+ string(1) "a"
041- string(4) "b, c"
042- string(4) "b, c"
043- string(1) "d"
044- string(4) "e, f"
045- string(4) "e, f"
046- string(1) "g"
047- string(4) "h, i"
048- string(4) "h, i"

@iluuu1994
Copy link
Member Author

Finally 🎉 Thanks again @TysonAndre and @nikic for all the reviews!

@nikic
Copy link
Member

nikic commented Jul 10, 2020

Unfortunately we have some exception handling issues, found by msan:

<?php
var_dump(match(true) {
    1, 2, 3, 4, 5 => 'foo',
});
0000 INIT_FCALL 1 96 string("var_dump")
0001 MATCH bool(true) 1: 0002, 2: 0002, 3: 0002, 4: 0002, 5: 0002, default: 0004
0002 T1 = QM_ASSIGN string("foo")
0003 JMP 0005
0004 MATCH_ERROR bool(true)
0005 SEND_VAL T1 1
0006 DO_ICALL
0007 RETURN int(1)
LIVE RANGES:
     1: 0003 - 0005 (tmp/var)

Note that T1 range goes from 3-5, including MATCH_ERROR, which means that we'll try to free T1 when match throws, even though it has not been initialized.

@nikic
Copy link
Member

nikic commented Jul 10, 2020

My initial thought here was to fix this by adding a dummy T1 = MATCH_ERROR return value, that would make sure that the live range doesn't cross the MATCH_ERROR opcode. However, if I do that I get a misoptimization in block_pass (zend_t_usage).

@TysonAndre
Copy link
Contributor

My initial thought here was to fix this by adding a dummy T1 = MATCH_ERROR return value, that would make sure that the live range doesn't cross the MATCH_ERROR opcode. However, if I do that I get a misoptimization in block_pass (zend_t_usage).

Note that T1 range goes from 3-5, including MATCH_ERROR, which means that we'll try to free T1 when match throws, even though it has not been initialized.

@nikic - What about moving the default to be before the rest of the arms if it's a MATCH_ERROR? We already allow default => ... to explicitly be the first arm. This would have the benefit of avoiding an extra jump in one of the arms if UnhandledMatchError is assumed to be uncommon.

I.e. instead of this:

0001 MATCH bool(true) 1: 0002, 2: 0002, 3: 0002, 4: 0002, 5: 0002, default: 0004
0002 T1 = QM_ASSIGN string("foo")
0003 JMP 0005
0004 MATCH_ERROR bool(true)
0005 SEND_VAL T1 1

LIVE RANGES:
     1: 0003 - 0005 (tmp/var)

Use the following instead?

0001 MATCH bool(true) 1: 0003, 2: 0002, 3: 0002, 4: 0002, 5: 0002, default: 0002
0002 MATCH_ERROR bool(true)
0003 T1 = QM_ASSIGN string("foo")
0004 JMP 0005 (this opcode can be eliminated for a small performance benefit if default becomes the first arm)
0005 SEND_VAL T1 1

LIVE RANGES:
     1: 0004 - 0005 (tmp/var) (instead of including MATCH_ERROR)

I'm assuming that the solution you proposed above (T1 = MATCH_ERROR) works, it's just unoptimized, and that tooling for php 8.0 can assume that match expressions will be in php 8.0-dev.

@iluuu1994
Copy link
Member Author

@nikic I'm assuming you had the same misoptimization?

Specifically, this is wrong:

- 0045 ECHO string("1")
+ 0045 ECHO T1
...
- 0055 ECHO string("1, 1")
+ 0055 ECHO string("2, 1")

The exact place it is happening is here (for ZEND_QM_ASSIGN):

if (opline->op1_type == IS_CONST) {
literal_dtor(&ZEND_OP1_LITERAL(opline));
}
MAKE_NOP(opline);

Not sure if this is a match specific issue or a general optimization bug.

@TysonAndre Thanks for your suggestion, I'll try it!

iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request Jul 11, 2020
As suggested by Tyson Andre:
php#5371 (comment)

Also fix line number of unhandled match error
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request Jul 11, 2020
As suggested by Tyson Andre:
php#5371 (comment)

Also fix line number of unhandled match error
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request Jul 11, 2020
As suggested by Tyson Andre:
php#5371 (comment)

Also fix line number of unhandled match error
@iluuu1994
Copy link
Member Author

Implemented here. Valgrind no longer complains.

iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request Jul 11, 2020
As suggested by Tyson Andre:
php#5371 (comment)

Also fix line number of unhandled match error
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request Jul 11, 2020
As suggested by Tyson Andre:
php#5371 (comment)

Also fix line number of unhandled match error
iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request Jul 11, 2020
As suggested by Tyson Andre:
php#5371 (comment)

Also fix line number of unhandled match error
@jrfnl
Copy link
Contributor

jrfnl commented Jul 12, 2020

Hi all, I've just been looking at this RFC and the implementation of it with an eye of adding support for match expressions to PHP_CodeSniffer and I was wondering why no new T_MATCH token has been introduced.

Is there anyone here who could enlighten me ?

Scratch that. I suspect that this feature isn't yet available in alpha2 which I was using for testing. Sorry for the noise.

@iluuu1994
Copy link
Member Author

iluuu1994 commented Jul 12, 2020

@jrfnl It's not available in alpha 2. It was merged a few days later.

@jrfnl
Copy link
Contributor

jrfnl commented Jul 12, 2020

@iluuu1994 Thanks for confirming. I look forward to a next alpha (for Windows), so I can actually test it.

php-pulls pushed a commit that referenced this pull request Jul 12, 2020
As suggested by Tyson Andre:
#5371 (comment)

Also fix line number of unhandled match error

Closes GH-5841.
@GrahamCampbell
Copy link
Contributor

GrahamCampbell commented Jul 12, 2020

You can use my PHP 8.0 docker image (on linux/mac or WSL2): registry.gitlab.com/grahamcampbell/php:8.0. It's based on the official docker PHP image (official as in by the Docker team, rather than the PHP core team), and includes some commonly used extensions, on by default, including Xdebug 3.0-dev.

nikic pushed a commit to nikic/PHP-Parser that referenced this pull request Jul 15, 2020
@zmitic
Copy link

zmitic commented Sep 20, 2020

First: I know this is not support forum so I am sorry about my question.

Second; I read on RFC that match condition can be any arbitrary expression. Does it mean we can do instance of?

Real-life example; I have base class that is being extended with 2 different classes (will be more) and each child must have matching DTO:

public static function createFromEntity(BaseConfig $config): self
{
    if ($config instanceof ServiceCategoryConfig) {
        return new ServiceCategoryConfigDTO($config->getCategory());
    }
    if ($config instanceof DiscountConfig) {
        return new DiscountConfigDTO($config->getDiscount());
    }

    // do something else here
}

Can match be used in cleaning this? I know I can put a method on entity class that will create DTO but I would rather avoid it.


If it can't, what about not requiring param and be something like:

return match() {
    $config instance of ServiceCategoryConfig => new ServiceCategoryConfigDTO($config->getCategory()), 
    $config instance of DiscountConfig => new DiscountConfigDTO($config->getDiscount()),
    default => // do something else here 
}

@iluuu1994
Copy link
Member Author

@zmitic Yes, but as with switch you have to pass true as the expression.

https://3v4l.org/stkDM

@zmitic
Copy link

zmitic commented Sep 20, 2020

@iluuu1994 Thank you, this is truly amazing feature!

@spiritinlife
Copy link

spiritinlife commented Oct 1, 2020

@iluuu1994

With the addition of union types wouldn't it make sense for match to handle this.
( I may be missing something obvious here )

<?php

class ValidationError {

   public function __construct(
       public string $message
   ) {}
    
}

function validate(string $errorsOnPoupanka): bool|ValidationError {
    
    if ( $errorsOnPoupanka === 'poupanka')
        return new ValidationError("Poupanka error");
    
    return true;
}

var_dump(match(validate("poupanka")) {
        ValidationError::class => $validationResult->message,
        true => "Nice",
});

Or even better something like this

var_dump(match(validate("poupanka")) {
        (ValidationError $error) => $error,
        (bool $v) => "Nice",
});

https://3v4l.org/sB1kM

@iluuu1994
Copy link
Member Author

@spiritinlife Matching by type would be part of pattern matching which is being discussed for 8.1. The syntax is not clear yet.

@spiritinlife
Copy link

@iluuu1994 I see thank you, is there a discussion i can follow for this ?

@iluuu1994
Copy link
Member Author

@spiritinlife No not at the moment. We're working on an RFC for enums/ADTs which would also make use of pattern matching. But nothing specific for type pattern matching.

devengineerikrime pushed a commit to devengineerikrime/PHP-Parser that referenced this pull request Jul 17, 2023
jrfnl added a commit to PHPCompatibility/PHPCompatibility that referenced this pull request Aug 4, 2023
PHP 8.0 introduced match expressions.

What with originally still supporting a wide range of PHPCS versions, detection of match expressions would need a lot of custom logic, so this would get a separate sniff (which I had as WIP locally).

However, now support for PHPCS < 3.7.1 has been dropped, detection of the keyword can be handled by the `NewKeywords` sniff.
Includes the tests I had originally set up for the separate sniff.

Refs:
* https://wiki.php.net/rfc/match_expression_v2
* php/php-src#5371
* php/php-src@9fa1d13

Related to 809
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet