Skip to content
This repository

Big performance problem on large arrays #380

Closed
geekeasy opened this Issue July 01, 2011 · 9 comments

3 participants

Adam Fabien Potencier Nikita Popov
Adam

I struggled for a long time to find and fix a performance problem I was having. Eventually I found it, and fixed it with a terrible hack. But ideally, it can be solved in twig.

I created a simple test case to show the problem.

In test #1, I create an array with 10,000 elements with random numbers in their keys. In the twig template, I run through the values 1 to 5,000, and check to see if those value are defined.

Twig is lighting fast -- 0.04 seconds.

In test #2, I only make one change to the code. Instead of having a one dimensional array, I add another level to the array with only one element. $test[rand()] becomes $test['x'][rand()].

In this 2nd test, twig grinds to a standstill. The code takes 17 seconds to execute.

That 400x slower.

Note -- this is based on code from Symfony RC3.

= = = = = = = =

Test #1 -- 0.04 seconds

Controller:

$test = array();
for ($i=0; $i<10000; $i++) {
$test[rand()] = rand();
}

Twig:

{% for i in 1..5000 %}

{% if test[i] is defined %}

{{ test[i] }}
{% endif %}
{% endfor %}

= = = = = = = =

Test #2 -- 17.5 seconds

Controller:

$test = array();
for ($i=0; $i<10000; $i++) {
$test['x'][rand()] = rand();
}

Twig:

{% for i in 1..5000 %}

{% if test['x'][i] is defined %}

{{ test['x'][i] }}
{% endif %}
{% endfor %}

Nikita Popov
nikic commented July 02, 2011

Hi!

There are two issues here: The first one will be fixed as soon as I find a more appropriate fix than #381. That already reduces the runtime from 8 seconds to 4 seconds on my machine.

The second one is more complicated: After applying the above fix your first example (using one-dimensional array) takes 4 seconds, whereas the multidimensional array is blazingly fast. (So my results are exactly contrary to yours). I tracked the issue down to isset($context['test']) ? $context['test'] : null always copying the value of $context['test']. So in the one-dimensional case 5000 elements are copied every iteration whereas in the multidimensional only one needs to be copied (I think).

I'm not sure how the latter issue can be resolved. One thing I could think of is making sure that all used elements in $context are initialized to null and then access the elements directly in the code, without need for isset. Though I fear that this isn't as easy to implement as it sounds as the $context array is passed around between multiple files, so it's hard to say which indexes are actually used.

Adam

Thanks for the quick response. I'm also continuing to look into the problem to see if I can find a fix, but I'm not very familiar with the compiler code.

First of all, let me say that I'm perplexed. I have no idea what I did wrong with my tests yesterday, but now the performance seems at least as bad for one-dimensional arrays as 2-dimensional arrays. That may be good news in helping find and solve the problem.

But, it is important to note that performance degrades exponentially:

5,000 elements: 6 seconds
10,000 elements: 17 seconds
20,000 elements: 48 seconds
30,000 elements: 89 seconds
40,000 element: 2+ minutes (php timed out)

Nikita Popov
nikic commented July 02, 2011

First of all, let me say that I'm perplexed. I have no idea what I did wrong with my tests yesterday, but now the performance seems at least as bad for one-dimensional arrays as 2-dimensional arrays.

That's probably because you tested with random data. In that case results can vary. You can reproduce the issue by just filling an array with continuous data and outputting it in Twig:

<?php
$test = array();
for ($i = 0; $i <= 5000; $i++) {
    $test[$i] = $i;
}
{% for i in 0..5000 %}
    {{ test[i] }}
{% endfor %}

The fact that performance doesn't degrade linearly makes copying in isset($context['test']) ? $context['test'] : null a plausible explaination:
For 5000 elements PHP needs to copy these 5000 times, i.e. 5000*5000 = 25000000 elements to copy.
For 10000 elements PHP already needs to copy 10000 elements 10000 times: 10000*10000 = 100000000
For 40000 elements we're at 40000*40000 = 1600000000 copy processes

Adam

Here's a test solely in php:
<?php>

   $timer = microtime(true);
    $context = array('test'=>array());

    for ($i=0; $i<100000; $i++) {
        $context['test'][$i] = $i;
    }

    for ($i=0; $i<5000; $i++) {
        print isset($context['test']) ? $context['test'][$i] : '';
    }
    print "\nTIME =  " . (microtime(true) - $timer);

In this example, we have 100,000 element arrays and it executes in 0.06 seconds.

So, I'm not sure where the problem is. But from that test, it doesn't seem like isset() is causing it.

Nikita Popov
nikic commented July 02, 2011

Replace

print isset($context['test']) ? $context['test'][$i] : '';

with

isset($context['test']) ? $context['test'] : '';

That [$i] makes the difference. With it only one element is copied, without it the whole array. (Removed the print as it doesn't change the situation.)

Adam

LOL. Ummm ... is possible this is a bug/fault in php's ternary (?:) operator?

Try replacing this:

   $x = isset($context['test']) ? $context['test'] : '';

With a written out if statement:

        if (isset($context['test'])) {  
            $x = $context['test'] ;
        } else { 
            $x = null;
        }

The ternary operator never stopped loading. The if block ended in 0.05 seconds.

Nikita Popov
nikic commented July 02, 2011

It's not a bug, it's by design: The ternary always returns by value, as noted in the documentation:

Please note that the ternary operator is a statement, and that it doesn't evaluate to a variable, but to the result of a statement. This is important to know if you want to return a variable by reference. The statement return $var == 42 ? $a : $b; in a return-by-reference function will therefore not work and a warning is issued in later PHP versions.

I suppose though that the behavior could still be optimized. I don't see a particular reason why the return value of a ternary shouldn't be able to share a zval with the original variable. Maybe I'll poke somebody on internals about that.

For Twig though the fix is simple: Just don't use the ternary (#382).

Adam

It does seem that copying by value is the way the ternary operator works by design.

So, to fix this, the code really needs to be re-written with if-blocks.

But.... here's a hacked up solution if it helps you any.

isset($context['test']) ? (($x=$context['test'])||true) : ($x=false );

$x is assigned the value by reference, so it's super fast.

And it returns true or false.

Adam

Another solution, which would be easy to implement would be to write our own ternary function which mirrors the functionality of php's ternary operator, with the exception of returning values by reference.

But some tests would certainly have to be done to see how this additional function call affects performance.

Are there benchmarks in place which could be used?

Fabien Potencier fabpot referenced this issue from a commit July 12, 2011
Fabien Potencier merged branch nikic/optimizeVariableAccess2 (PR #382)
Commits
-------

5b91f12 Update tests
a5ef326 Use Template->getContext in non-strict mode too

Discussion
----------

Use Template->getContext() when in non-strict mode, too

As described in #380 using the current `isset($context['test']) ? $context['test'] : null` approach can cause problems in some cases: The ternary operator always returns by value and thus PHP's copy-on-write concept does not apply. So `$context['test']` needs to be copied in any case, even though a write never happens to it. Basically Twig is copying the values of all variables ever used inside Twig if it operates in non-strict mode. This is problematic when `$context['test']` contains big arrays as the whole array is copied.

In this patch I change Twig to use `Template->getContext` when in non-strict mode too and add an `isStrictVariables` check in there.
0ef96f6
Fabien Potencier fabpot referenced this issue from a commit July 12, 2011
Fabien Potencier merged branch nikic/improveIsDefinedPerformanceInNonStrictMode (PR #381)
Commits
-------

e4b8371 Improve performance of is defined in non strict mode

Discussion
----------

Improve performance of is defined in non strict mode

Addresses issue #380: This gives a good performance improvement on defined tests in non-strict code. About 2x.

Reasoning: It doesn't make sense to propagate the `is_defined_test` flag into deeper levels in non-strict mode, because all getAttr functions and name accesses will just return `null` if it doesn't exist and `null` is just as good as `false`.

---------------------------------------------------------------------------

by nikic at 2011/07/02 03:22:53 -0700

Though, do not merge yet please. Changes behavior concerning methods.

---------------------------------------------------------------------------

by nikic at 2011/07/02 13:34:32 -0700

Hm, I really don't know what I was thinking when I wrote my last comment. There shouldn't be any behavior changes introduced by that change (and tests pass).
90d99a7
Fabien Potencier fabpot closed this July 12, 2011
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.