Skip to content

Fix bug #63217 #1649

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

hikari-no-yume
Copy link
Contributor

Fixes https://bugs.php.net/bug.php?id=63217

I wouldn't merge this just yet, as the current fix most likely hurts performance, as it removes a buggy optimisation. I am going to try and add a replacement optimisation which doesn't have the same issue.

The nature of the bug means that fixing it is possibly a BC break, although one very unlikely to have any impact. For that reason, I'd prefer for it to go into 7.0.0.

@weltling
Copy link
Contributor

Hi Andrea,

it is quite late for 7.0.0 in general, but also as RC8 is already tagged and this is definitely not a security patch. I would suggest to target master therefore. Too bad, as it's such a long living ticket.

Apart from that, IMHO i would see the heaviness of this bug as low. Clear, the consistency of the ArrayAccess API matters, but the practical win would lay very much on the implementation. Adversely affecting the regular array API fe were IMO not worth it. So some balanced patch would need some time and good review.

Thanks

@hikari-no-yume
Copy link
Contributor Author

It could go into a 7.0.x release actually, the BC break risk is very low.

I'm going to try optimising it.

@weltling
Copy link
Contributor

I would really prefer to target master with this. What you say is true - the impact ArrayAccess part is very small. I'm more concerned about the regular array API, changing the array API for the reason to have "1" instead of 1 is not a sufficient reason to touch that so late in the release.

Thanks.

@hikari-no-yume
Copy link
Contributor Author

This doesn't change the API anywhere. The change is entirely internal to the executor.

@hikari-no-yume hikari-no-yume force-pushed the ArrayAccess_fix branch 2 times, most recently from a64f548 to 469487a Compare November 26, 2015 00:48
@hikari-no-yume
Copy link
Contributor Author

Okay, that second commit is a new optimisation, intended to mitigate the impact of removing the buggy optimisation somewhat.

Let's benchmark!

Test script:

<?php

$arr = [
    "foo" => true,
    "foobar" => 3,
    "elePHPant" => "qux",
    "long_key_name" => "something contrived",
    "123" => 76,
    "001234" => null,
    "6276372623" => "another"
];

$start = microtime(true);

for ($i = 0; $i < 30000000; $i++) {
    $foo = $arr["foo"];
    $arr["foo"] = $foo;
    $exists = isset($arr["foo"]);
    $foobar = $arr["foobar"];
    $arr["foobar"] = $foobar;
    $exists = isset($arr["foobar"]);
    $elePHPant = $arr["elePHPant"];
    $arr["elePHPant"] = $elePHPant;
    $exists = isset($arr["elePHPant"]);
    $long_key_name = $arr["long_key_name"];
    $arr["long_key_name"] = $long_key_name;
    $exists = isset($arr["long_key_name"]);
}

$end = microtime(true);

echo basename(PHP_BINARY), ": ", $end - $start, "s", PHP_EOL;

Machine is my MacBook Air 2013 13" (1.3GHz i5, 4GB RAM, OS X 10.11). Compiler: Apple LLVM version 7.0.0 (clang-700.1.76).

php-unfixed is PHP-7.0, php-fixed is the first commit (bugfix which removed the optimisation), php-optimised is the second (bugfix, but with my new optimisation).

Config line: ./configure --enable-phpdbg --disable-all --disable-cgi

And...

$ ./php-unfixed foobar.php; ./php-fixed foobar.php; ./php-optimised foobar.php
php-unfixed: 12.567418813705s
php-fixed: 12.229693889618s
php-optimised: 12.708938121796s
$ ./php-unfixed foobar.php; ./php-fixed foobar.php; ./php-optimised foobar.php
php-unfixed: 12.405853033066s
php-fixed: 12.296113967896s
php-optimised: 13.292367935181s
$ ./php-unfixed foobar.php; ./php-fixed foobar.php; ./php-optimised foobar.php
php-unfixed: 12.583276987076s
php-fixed: 12.502597093582s
php-optimised: 12.523111104965s
$ ./php-unfixed foobar.php; ./php-fixed foobar.php; ./php-optimised foobar.php
php-unfixed: 12.599713087082s
php-fixed: 12.289211988449s
php-optimised: 12.567456007004s
$ ./php-unfixed foobar.php; ./php-fixed foobar.php; ./php-optimised foobar.php
php-unfixed: 12.46280002594s
php-fixed: 12.284049034119s
php-optimised: 12.654495954514s

...what the hell? It looks like removing the optimisation we had actually makes things faster! Maybe the removal of the extra branch speeds things up.

If this can be replicated, we should just merge the bugfix alone (the first commit).

        unfixed         fixed               optimised
        12.5674188137   12.2296938896   12.7089381218
        12.4058530331   12.2961139679   13.2923679352
        12.5832769871   12.5025970936   12.523111105
        12.5997130871   12.2892119884   12.567456007
        12.4628000259   12.2840490341   12.6544959545

AVG     12.5238123894   12.3203331947   12.7492738247
DIFF    +0.2034791946   -0.2034791946

@weltling
Copy link
Contributor

Andrea,

I really don't feel comfortable with this patch in 7.0. Please fetch someone for review and target master.

Thanks.

@hikari-no-yume
Copy link
Contributor Author

That means the bug won't be fixed for users until late next year, which is a shame, as it makes ArrayAccess less useful. However, if that is how it has to be, so be it.

@hikari-no-yume
Copy link
Contributor Author

Interestingly, if the performance improvement is reproducible and happens on other configs too, we could trivially remove the optimisation yet keep the bug. I wouldn't recommend that, though :p

@rtheunissen
Copy link
Contributor

For what it's worth, this would be a significant fix for 7.0.0. Here is why:

Currently the only mechanism we have to make use of array syntax is via ArrayAccess, or internal object handlers (which are also affected by this). With the introduction of scalar typehints, it will be common to treat integer and string keys differently. This is not possible without this fix.

For example, you might have a Set type structure which allows for much faster array_unique behaviour, but you can't use isset($set['1']) because it wouldn't be consistent with $set->contains('1'), which does not treat 1 and "1" as the same value.

the impact ArrayAccess part is very small

It's the only impact of this fix, for which there is no workaround.

I'm more concerned about the regular array API

This fix doesn't affect the array API in any way.

It looks like removing the optimisation we had actually makes things faster!

So not only is this a significant consistency fix, but also potentially increases performance? That to me is a no brainer for a major release, and I would personally be very disappointed if this patch is deferred.

Wouldn't this also be a BC break, and therefore not even eligible for 7.0.x?

@hikari-no-yume
Copy link
Contributor Author

Well, we don't know yet on perf. It might perform differently on other compilers and OSes. But we'll see :)

@laruence
Copy link
Member

I think this fix missed the part in opcache, zend_optimizer_update_op2_const do some similar converting..

@hikari-no-yume
Copy link
Contributor Author

I wondered if there might be an opcache part. I'll look into that.

@hikari-no-yume
Copy link
Contributor Author

I removed my new optimisation from this pull request as I think it's a distraction, and it might actually make perf worse. It could always be added later, or another approach to re-optimisation. My new optimisation is now in a separate branch at TazeTSchnitzel/ArrayAccess_fix_with_new_optimisation instead.

@laruence: Thanks for the pointer, the bugfix now addresses the opcache version of the optimisation as well and updates the test case to check for it.

@hikari-no-yume
Copy link
Contributor Author

I did the same test as before, but with GCC (gcc version 5.2.0 (Homebrew gcc 5.2.0)).

Results:

    unfixed         fixed           optimised
    9.0254018307    8.9491841793    8.8605818748
    9.9200131893    9.5331468582    9.3726780415
    9.4989490509    9.5090491772    9.5777039528
    9.7903709412    9.7089078426    9.4753980637
    9.5222520828    9.5866940022    9.5860488415
    9.5696458817    9.6496810913    9.4691541195
    9.5228209496    9.6292300224    9.5004479885
    9.558713913     9.3653769493    9.4608740807
    8.9882009029    8.8360011578    8.9184868336
    8.9244539738    8.8565421104    8.757791996
    8.992069006     8.8712611198    8.8668079376
    8.9923410416    8.8937549591    8.8201458454
AVG 9.3587693969    9.2522895435    9.2221766313
DIFF0.1064798534    -0.1064798534   

Huh. Even on GCC, the fixed version is faster. But interestingly, my new optimisation is faster still - unlike on clang, where it is slower than the unfixed version. Odd.

Going to look into this a little more.

@hikari-no-yume
Copy link
Contributor Author

At @bwoebi's suggestion I looked at the instruction counts with callgrind (valgrind-3.11.0.SVN), with only 300,000 loop iterations (instead of 30,000,000) since valgrind is very slow. Results:

        unfixed     fixed       optimised
clang   925,791,973 924,605,145 922,185,296
gcc     744,469,730 754,075,468 756,455,717

Well, this is interesting. clang generates way more instructions, which explains the huge difference between it and GCC in the benchmark.

But more importantly, the unfixed version generates the least instructions in GCC, and the most in clang. My new optimisation generates the most instructions in GCC, yet takes the least time, and generates the least instructions in clang, yet takes the most time.

?!

@laruence laruence added the Bug label Nov 29, 2015
@hikari-no-yume
Copy link
Contributor Author

Regarding whether or not this is safe to merge, the danger lies in the assumption that constant operands to indexing opcodes are never numeric strings. So long as that assumption remains, it's unsafe to remove the compilation stage numeric string check. However, as this patch removes this assumption in the executor, it's safe to remove the compiler checks. If we wanted to, we could remove the VM assumption but not the compiler check: this doesn't fix the bug, but might provide a performance boost if my numbers are somehow reproducible. This would be a bit silly, though.

Note that with the VM assumption removed, it's of no consequence whether or not the compiler still does the check, nor does it matter what the opcodes contain, as the VM will always do a numeric string check now. So we could fix this bug in a 7.0.x release and it shouldn't break anything looking at the opcodes. Opcodes for an unfixed 7.0.0 would be executed correctly by a fixed 7.0.1, though the reverse wouldn't be true.

@laruence
Copy link
Member

instead of fix this in vm, maybe we should fix this in SPL ? I mean, for array access, always convert numeric string to long, which will make thing consistent, and also won't slowdown anything?

@laruence
Copy link
Member

like this patch: http://pastebin.com/S6c7WKYj we enforce array access always convert numeric string to int. which seems also acceptable as we do it long time for plain array access

@weltling
Copy link
Contributor

Andrea, i was actually talking about the normal array API in userland. As the patch touches the place where the boringly normal access to array happens. So to fix 1 vs. "1" it looks like using a gun to shoot a mouche, because it's unknown how the normal array usage is affected.

Clear it's a bug, a long standing one. I'd ask you to please check the Hui's suggestion as well, probably also to discuss this on internals. Touching such core functionality, especially removing optimization, IMHO should be justified and verified. If it has to be - so be, but please lets clear out the concerns.

Thanks.

@nikic
Copy link
Member

nikic commented Nov 30, 2015

@laruence Imho you should be able to implement a proper, type-safe map using ArrayAccess, if you want. That patch would only make the problem worse...

@dstogov
Copy link
Member

dstogov commented Nov 30, 2015

I like laruence's approach. Instead of keeping PHP5 behaviour of ArrayAccess that allows unconverted numeric strings, we should make them behave in consistency with regular arrays.

@jerrygrey
Copy link

I'm scratching my head at the need for the casting to int based on the results @TazeTSchnitzel got, if there is no performance gains from it why have it at all in either ArrayAccess or regular arrays?

@laruence
Copy link
Member

@nikic it depends, if we all can get a rule, that is, in PHP, numeric string is treat as long while doing arrayaccess, then this will be easy for the user. he can either treat all numeric string as int or string as he like, at least he will be sure, that numeric string and int will be passed in as int form.

@hikari-no-yume
Copy link
Contributor Author

ArrayAccess already doesn't behave like an array. You can use objects with ArrayAccess, e.g. SplObjectStorage. You can also use floats.

If we are to change ArrayAccess to coerce numeric strings to integers, we should probably change how it handles all other types. But we can't do that.

@hikari-no-yume
Copy link
Contributor Author

hikari-no-yume commented Dec 22, 2015

Even if this would hurt performance, the difference does not look to be that big even in this synthetic benchmark. I think it's a distraction.

@morrisonlevi
Copy link
Contributor

I disagree with @laruence and @dstogov on this one. The fix is not in the SPL. These must be the same when $foo implements ArrayAccess:

$get = $foo['1'];

$var = '1';
$get = $foo[$var];

Similarly these must also be the same when $foo implements ArrayAccess:

$get = $foo['1'];
$get = $foo->offsetGet('1');

Additionally, offsetGet must be given a variable of type string. Deviating from standard variable conversion behavior specifically for ArrayAccess is a bad idea. We need fewer special-cases, not more of them.

This cannot be anything except a bug. In this case it's an errant optimization in the engine causing this bug, not the SPL, so the engine must be fixed.

@trowski
Copy link
Member

trowski commented Jul 7, 2016

@krakjoe @dshafik Should we merge this for 7.1?

@nikic
Copy link
Member

nikic commented Jul 7, 2016

Independently of whether we want to merge it, the current patch is not complete, in particular adjustments in zend_inference.c will be necessary.

@krakjoe
Copy link
Member

krakjoe commented Jan 1, 2017

@TazeTSchnitzel the ship has sailed for 7.0 and 7.1, this PR needs to be rebased on master, completed (see Nikitas comment) and pushed forward there ... sorry about that ...

@hikari-no-yume
Copy link
Contributor Author

I'm aware.

I might update this eventually.

@krakjoe
Copy link
Member

krakjoe commented Feb 22, 2017

@TazeTSchnitzel if you do not intend to work on this PR, please close it.

If no commits are made a month hence, it will be considered abandoned and be closed.

@hikari-no-yume
Copy link
Contributor Author

Okay, I'm closing it. There's no need to keep it open even as a reminder, bugs.php.net reminds me regularly that it needs updating.

@rtheunissen
Copy link
Contributor

@hikari-no-yume I know you've lost interest in this bug so I would like to self-assign it and try my best to fix it myself. Are you definitely not going to fix it? I'm not entirely sure what's involved but happy to give it a shot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants