
Conversation

kelunik
Member

@kelunik kelunik commented Mar 7, 2017

The cache evicts buckets with round robin now instead of not caching new entries at all.

See https://bugs.php.net/bug.php?id=73887.

/cc @bwoebi @nikic @staabm @krakjoe

@kelunik
Member Author

kelunik commented Mar 7, 2017

@weltling Can this be merged into the next patch release or does it have to wait to go through RC?

@nikic nikic added the Feature label Mar 7, 2017
@bwoebi
Member

bwoebi commented Mar 7, 2017

It doesn't look like anything ultra-urgent, I'd rather wait for the next RC.

@weltling
Contributor

weltling commented Mar 8, 2017

@kelunik thanks for the PR. As usual, everything has to go through QA. Also, the patch looks like it targets master, as there's an API break.

But also, I wouldn't be that quick. I see that this is an attempt to implement some kind of LRU handling. However, some points are questionable to me. Those are

  • a benchmark/reproduce case is missing
  • how does incremental index ensure a bucket is least used? The entries are added based on the key, not incrementally.
  • ttl should not be ignored, at least as long as it's impossible to say the entry is the least used one

Please expand more on the implementation idea and give a benchmark. This part affects every FS function, so we have to be very conservative when evaluating changes to it. What would be safe for stable branches is to follow the realpath_cache_find logic and walk through the buckets within the same or multiple slots, respecting ttl. A more comprehensive solution could target the next minor for a proper LRU or LFU implementation.

Thanks.

@kelunik
Member Author

kelunik commented Mar 8, 2017

how does incremental index ensure a bucket is least used? The entries are added based on the key, not incrementally.

It doesn't. But "random" eviction should be better than no eviction at all. Currently we just don't cache anything in case the cache is full and only evict on finds.

respecting ttl

The TTL is a maximum cache period, not a minimum. In case of a full cache, an item can be evicted at any time.

Both LFU and LRU are more complex and need additional data structures.

@weltling
Contributor

weltling commented Mar 8, 2017

@kelunik yeah, that's my point as well, there's currently no suitable structure to integrate the algorithm. Whether it's complicated - probably not; a queue that pops entries from the end would already be an LRU, but it would need more refactoring, though. Anyway, depending on the approach, that would be more for master.

Fixing in user land can be done by

  • increasing the cache size, already done for default and for a specific app it's suitable to do individually
  • reducing the ttl

Both are fine tuning specific to a given app, without any patching. But regarding the patch approach, worst case it would free the entries just added in the previous invocation. In that case the risk is to do wasted work by adding the same things over and over - that's Murphy's law.

If you want to target a stable branch, IMO some less intrusive and more robust approach is needed. Simply mirroring the logic from realpath_cache_find would already be an improvement at low cost. Maybe as an extension to this, it would be OK to also free items that are about to expire, or use some kind of grace-period criterion.

For a given application, fine tuning will be needed in any case, if one cares. Otherwise, a comprehensive implementation should target master.

Thanks.

@kelunik
Member Author

kelunik commented Mar 9, 2017

Fixing in user land can be done by

  • increasing the cache size, already done for default and for a specific app it's suitable to do individually
  • reducing the ttl

Both of those aren't fixes.

  • Increasing the cache size is possible if you know the maximum, but it's not possible to predict that if you access and serve random files.
  • Reducing the TTL doesn't help at all, as entries are only evicted if find hits that bucket. In the worst case many files fill up bucket n at the beginning and are never evicted, because further file access only maps to buckets m with m !== n.

What works is watching the remaining space and triggering a full cache clear in case it's full, but that's obviously a lot worse.

But regarding the patch approach, worst case it would free the entries just added in the previous invocation. In that case the risk is to do wasted work by adding the same things over and over - that's Murphy's law.

Regardless of the approach, something like that might happen. In case of LRU it happens in case you repeatedly access n + 1 files in the same order where only n files fit into the cache.

@weltling
Contributor

weltling commented Mar 9, 2017

Exactly, it's not perfect, but it's no different from any other cache, like Apache's or MySQL's. Resources aren't endless, so one has to fine tune for a specific app. That's devops work, which is hard to do from within PHP.

Reducing the TTL doesn't help at all, as entries are only evicted if find hits that bucket. In the worst case many files fill up bucket n at the beginning and are never evicted, because further file access only maps to buckets m with m !== n.

Yes. So other slots might need to be walked through as well, to free space. The current TTL logic should be kept if you target a stable branch. Things that are configurable INI options should not be ignored.

Regardless of the approach, something like that might happen. In case of LRU it happens in case you repeatedly access n + 1 files in the same order where only n files fit into the cache.

No. When something is found or inserted, the item is moved to the top of the queue. Then, when you pop something, it is exactly the least recently used item, which is known precisely. Removing something random and hoping for the best is not suitable for this case. LFU could probably be even better, if we're refactoring anyway.

In the end, there should be either a solid implementation targeting master, or a compatible approach targeting a stable branch. The patch in its current form is neither. Plus, a benchmark is required - please provide one.

Thanks.

@nikic
Member

nikic commented Mar 9, 2017

I think doing random eviction (like this patch essentially does) is fine as a starting point. As long as the usual assumptions about temporal locality hold true, random eviction still performs better than no eviction, as it is more likely that the newly inserted path will be reused than any random path.

Of course we do need some benchmarks to determine the impact of this change. This would also tell us whether or not we should invest in a more sophisticated eviction strategy. A naive LRU implementation will take 16 bytes per realpath cache entry and it's not clear to me whether that's a good tradeoff or not.

We should also reevaluate the default realpath cache size if eviction is implemented. I think the change to a 4MB default size is excessive. I think in the thread where this change was made it was determined that even large applications do not use more than 1MB of realpath cache, so I'm not sure why we went with that particular number. It has been pointed out elsewhere that such a large realpath cache is problematic if you have many worker processes, as the cache is replicated in each one. With eviction implemented, we can probably go for a lower value here -- but I'm not sure just how well the random eviction will perform in some cases, for example a generational use case caused by symlink-swapping deployments. An LRU cache would be able to displace all the entries from the previous generation in one iteration, while random displacement will take more time to converge to a stable state.

@@ -232,6 +232,7 @@ typedef struct _virtual_cwd_globals {
zend_long realpath_cache_size;
zend_long realpath_cache_size_limit;
zend_long realpath_cache_ttl;
zend_ulong realpath_cache_clean_index;
Member

As @weltling pointed out, this is an ABI break.

Member Author

Do you have a suggestion what to do instead?

Member

In this case just moving the new member to be the last member in the struct is fine.

Member Author

Done. 👍

@kelunik
Member Author

kelunik commented Mar 9, 2017

@nikic In case of symlink deployments, the cache should be cleared manually to prevent inconsistencies.

@nikic @weltling What kind of benchmark do you have in mind? Real access patterns will obviously vary a lot, so finding a good benchmark will be hard.

@nikic
Member

nikic commented Mar 9, 2017

@kelunik Symlink deployments resolve symlinks at the webserver level (otherwise the deployment is non-atomic if opcache is used), so the realpath cache does not really matter for consistency.

@weltling
Contributor

weltling commented Mar 9, 2017

A usual app can be fine tuned, that's a fact. The actual issue targets servers written in PHP, which is by far not the common case. The common case is an app running under some web server, with known requirements, etc. The increase of the default cache size is already an improvement for any case, even if it's oversized.

I'm not saying there should be no eviction, just that the patch is not yet well thought out and there's currently no repro case to observe the behavior. realpath_cache_lookup() can also be called outside the core, so the patch can affect external extensions as well.

We currently iterate the same slot in realpath_cache_find(), which could, for example, be optimized. If we find the exact path but the bucket has expired, a simple action could be to renew its TTL - no free required. In general, the find function should not free anything. Being lazy there would spare processor time, especially if the cache never gets full.

Even though the default size is big now, we promise the cache TTL, even if it's not obvious on the user side. If we had a reproduce case, we'd see the further strategy, indeed. With this promise, only expired buckets should actually go away. If we abandon the promise, even conditionally, it should be done in a consistent way.

All in all, I don't see any reason to rush this into the stable branch. The patch doesn't target the common case, the default cache size is big, and fine tuning can be done. Far better would be to first work out an idea to fix this in all its complexity, preferably based on a showcase that we can observe, taking into account the points/issues mentioned - not only fixing the given case, but taking care of the other parts as well.

@kelunik for a repro, ideally there should be a synthetic case, or a test repository with a prepared app setup. As a last resort, a VM to fetch would probably work too - VMware or Hyper-V, as far as I'm concerned :)

Thanks.

@kelunik
Member Author

kelunik commented Mar 9, 2017

Even though the default size is big now, we promise the cache TTL, even if it's not obvious on the user side. If we had a reproduce case, we'd see the further strategy, indeed. With this promise, only expired buckets should actually go away. If we abandon the promise, even conditionally, it should be done in a consistent way.

A cache doesn't promise any TTL. A value store might, but not a cache. A cache evicts entries as it sees need in case it's full.

If you want to see a simple example benchmark:

┌[kelunik@kelunik] - [~/GitHub/php/php-src] - [18:18:08] - [bug-73887]
└[4855] $ time php -d "realpath_cache_size=16K" -r '$files = glob("/home/kelunik/GitHub/php/**/*"); for ($i = 0; $i < 1000000; $i++) { fopen($files[mt_rand(0, count($files) - 1)], "r"); } var_dump(realpath_cache_size());'
int(16384)

real	0m4.046s
user	0m2.268s
sys	0m1.700s

┌[kelunik@kelunik] - [~/GitHub/php/php-src] - [18:18:15] - [bug-73887]
└[4856] $ php -v
PHP 7.2.0-dev (cli) (built: Mar  7 2017 15:42:41) ( NTS DEBUG )
Copyright (c) 1997-2017 The PHP Group
Zend Engine v3.2.0-dev, Copyright (c) 1998-2017 Zend Technologies

┌[kelunik@kelunik] - [~/GitHub/php/php-src] - [18:18:17] - [bug-73887]
└[4857] $ time sapi/cli/php -d "realpath_cache_size=16K" -r '$files = glob("/home/kelunik/GitHub/php/**/*"); for ($i = 0; $i < 1000000; $i++) { fopen($files[mt_rand(0, count($files) - 1)], "r"); } var_dump(realpath_cache_size());'
int(16382)

real	0m2.948s
user	0m1.196s
sys	0m1.728s

@kelunik
Member Author

kelunik commented Mar 9, 2017

A usual app can be fine tuned, that's a fact. The actual issue targets servers written in PHP, which is by far not the common case. The common case is an app running under some web server, with known requirements, etc. The increase of the default cache size is already an improvement for any case, even if it's oversized.

In case you fine tuned the limit and it's large enough for your app, nothing will change. Entries are only evicted in case of a full cache.

@kelunik
Member Author

kelunik commented Mar 9, 2017

The previous benchmark wasn't really fair: the build I used was configured differently. I've now compiled master again with the same options. The first run is with the patch, the second without.

┌[kelunik@kelunik] - [~/GitHub/php/php-src] - [19:28:06] - [bug-73887]
└[4888] $ time sapi/cli/php -d "realpath_cache_size=16K" -r '$files = glob("/home/kelunik/GitHub/php/**/*"); for ($i = 0; $i < count($files); $i++) { for ($j = 0; $j < 5000; $j++) { fopen($files[$i], "r"); } } var_dump(realpath_cache_size());'
int(16300)

real	0m3.124s
user	0m1.100s
sys	0m2.016s
┌[kelunik@kelunik] - [~/GitHub/php/php-src] - [19:21:24] - [master]
└[4882] $ time sapi/cli/php -d "realpath_cache_size=16K" -r '$files = glob("/home/kelunik/GitHub/php/**/*"); for ($i = 0; $i < count($files); $i++) { for ($j = 0; $j < 5000; $j++) { fopen($files[$i], "r"); } } var_dump(realpath_cache_size());'
int(16328)

real	0m3.758s
user	0m1.260s
sys	0m2.496s

@kelunik
Member Author

kelunik commented Mar 17, 2017

@weltling ping

@weltling
Contributor

Thanks for the repro case. Likely the result will be different with a different cache size, especially with the current default size - have you tried? For now, the gain seems not very significant.

A cache doesn't promise any TTL. A value store might, but not a cache. A cache evicts entries as it sees need in case it's full.

That's a tautology. The cache is built on storage. There's an INI option to control both the size and the TTL. It is a feature.

In case you fine tuned the limit and it's large enough for your app, nothing will change. Entries are only evicted in case of a full cache.

But this is the exact point - the doable solution is in place, and we see there's improvement potential in the patch that possibly won't fit into the stable branch or might have side effects that we don't see yet.

I'm all for improving this part. The current state is not convincing for a stable branch, we should really rethink and go for some refactoring that allows robust handling, especially since there's enough time for it to be well tested for 7.2. As for me, I'd feel more certain with an approach where we would know exactly why the entries are freed.

Thanks.

@kelunik
Member Author

kelunik commented Mar 20, 2017

Thanks for the repro case. Likely the result will be different with a different cache size, especially with the current default size - have you tried? For now, the gain seems not very significant.

Before I do that... what's significant for you?

There's an INI option to control both the size and the TTL. It is a feature.

Again, the TTL is a maximum. It doesn't make sense to say it is guaranteed, because it's not. Every path accessed after the cache is full is currently not cached at all. Quoting @nikic:

I think doing random eviction (like this patch essentially does) is fine as a starting point. As long as the usual assumptions about temporal locality hold true, random eviction still performs better than no eviction, as it is more likely that the newly inserted path will be reused than any random path.

The current state is not convincing for a stable branch, we should really rethink and go for some refactoring

What's the eviction strategy you propose? The patch as is improves the situation for all long-running scripts, while not affecting applications that are fine tuned to fit exactly in any significant way (a single if).

But this is the exact point - the doable solution is in place

There's no solution in place. It just doesn't cache anything in case the cache is full.

@weltling
Contributor

I took the time to test your patch. The snippet, current 7.0 TS 64-bit, run on the 7.0 branch source:


$files = glob("./**/*");

$t = microtime(true);
for ($i = 0; $i < count($files); $i++) {
        for ($j = 0; $j < 5000; $j++) {
                if (is_file($files[$i])) {
                        fopen($files[$i], "r");
                }
        }
}
var_dump(microtime(true) - $t);
var_dump(realpath_cache_size());

Output with patch
float(234.97988390923)
int(134778)

Without patch
float(229.38653302193)
int(135266)

The realpath cache size is the default, 4096k as of today. From what I see, the performance even gets worse, at least with this snippet.

Seems we talk past each other :) TTL is an INI option, even if the cache is full. Your approach is to free some random item, with no regard to TTL. But even if the storage is full, it doesn't mean there are no expired items. In theory, as Nikita also mentioned, the recent items are likely to be used. We need to test this. My concern, in theory, is that items just added can be freed in the very next step, thus producing more overhead, because nothing in your patch ensures the items being freed are actually expired.

I'm not proposing any approach, at least not yet. All I'm saying is that the current approach is not convincing and doesn't fit a stable branch. If I were to propose or start one, it would rather be a proper LRU/LFU implementation, maybe involving some refactoring and targeting 7.2, because

  • the fix doesn't target the common case, but could impact it
  • the algorithm to free items is different in the search part and in the add part
  • the time frame between stable releases is too short to ensure stability
  • more tests, with real-life apps, would be preferable

Though I might be able to provide some QA, I likely can't work closely on this for at least the next two weeks. If there's another approach without stability concerns, it might still be included in a stable branch; otherwise there are still 2-3 months to work out a good solution for 7.2 that could be well tested in the pre-release cycle.

Thanks.

@nikic
Member

nikic commented Mar 23, 2017

I've implemented an alternative in #2431, which is probably more in line with what @weltling has in mind. If the cache is full it will remove all expired entries from the cache. This pruning is limited to intervals >= ttl, as it is a full cache scan.

To validate the implementation, I've used the following test script:

<?php

function createFiles($gen, $n) {
    $dir = __DIR__ . '/rctest' . $gen;
    @mkdir($dir);
    for ($i = 0; $i < $n; $i++) {
        touch("$dir/$i.txt");
    }
}

function accessFiles($gen, $n, $k) {
    $dir = __DIR__ . '/rctest' . $gen;
    for ($j = 0; $j < $k; $j++) {
        for ($i = 0; $i < $n; $i++) {
            fopen("$dir/$i.txt", "r");
        }
    }
}

echo "Realpath cache size: ", ini_get('realpath_cache_size'), "\n";
echo "Realpath cache TTL: ", ini_get('realpath_cache_ttl'), "\n";

$n = 1000;
$k = 100;

createFiles(1, $n);

$t = microtime(true);
accessFiles(1, $n, $k);
echo "1. gen access time: ", microtime(true) - $t, "\n";

createFiles(2, $n);

$t = microtime(true);
accessFiles(2, $n, $k);
echo "2. gen access time: ", microtime(true) - $t, "\n";

sleep(4);

$t = microtime(true);
accessFiles(2, $n, $k);
echo "2. gen access time after sleep: ", microtime(true) - $t, "\n";

With sample output:

Realpath cache size: 70K
Realpath cache TTL: 4
1. gen access time: 0.28273296356201
2. gen access time: 0.39254808425903
2. gen access time after sleep: 0.28854489326477

The idea behind this benchmark is that we create two generations of files, where the cache size matches the size of one generation. The expectation was that prior to this change the access to the 1. gen is fast (cached) and to the 2. gen slow both before and after we exceed the TTL. After this change the first access to the 2. gen should still be slow, but the second one after expiry should once again be fast.

To my surprise it turned out that even with the current implementation the second (post-TTL) access is fast and the patch only provides a minor improvement. The reason is that the lookup code will remove expired entries not only for the path that is being looked up, but for any path that falls in the same cache bucket.

Because any cache insertion also involves a cache lookup, this means that chances are pretty good that the previous lookup will clean out a bucket, at least in this kind of generational test, in which all cache entries expire simultaneously.

Due to this effect, I believe that the currently implemented invalidation strategy is actually much better than we initially assumed. It implies that at least over longer time frames (>>ttl) the realpath cache remains efficient. Of course, the change proposed in this PR (random invalidation without respect for ttl) may improve the situation by reducing the reaction time of the cache to access pattern changes (below the ttl level).

@weltling
Contributor

I've now compared both patches and benchmarks crosswise, with the same INI settings (-d realpath_cache_size=70k -d realpath_cache_ttl=4), using the two given benchmarks.

vanilla
$ x64\Release\php.exe -n -d realpath_cache_size=70k -d realpath_cache_ttl=4 bugs\realpath\pr2407\var0.php
float(133.771504879)
int(60714)

patch 2407
$ x64\Release\php.exe -n -d realpath_cache_size=70k -d realpath_cache_ttl=4 bugs\realpath\pr2407\var0.php
float(123.37045788765)
int(60714)

patch 2431
$ x64\Release\php.exe -n -d realpath_cache_size=70k -d realpath_cache_ttl=4 bugs\realpath\pr2407\var0.php
float(124.28930401802)
int(60714)


vanilla
$ x64\Release\php.exe -n -d realpath_cache_size=70k -d realpath_cache_ttl=4 bugs\realpath\pr2431\var0.php
Realpath cache size: 70k
Realpath cache TTL: 4
1. gen access time: 12.686713933945
2. gen access time: 12.094058990479
2. gen access time after sleep: 13.321831941605

patch 2407
$ x64\Release\php.exe -n -d realpath_cache_size=70k -d realpath_cache_ttl=4 bugs\realpath\pr2431\var0.php
Realpath cache size: 70k
Realpath cache TTL: 4
1. gen access time: 13.221785068512
2. gen access time: 12.695547103882
2. gen access time after sleep: 12.606356859207

patch 2431
$ x64\Release\php.exe -n -d realpath_cache_size=70k -d realpath_cache_ttl=4 bugs\realpath\pr2431\var0.php
Realpath cache size: 70k
Realpath cache TTL: 4
1. gen access time: 10.875287055969
2. gen access time: 12.079700946808
2. gen access time after sleep: 11.939666032791

With the given INI conditions, some improvement is visible with the bench from #2407 (comment). With the bench from #2407 (comment), it seems this PR loses against the patch in #2431. In the end, what this tells me is that both patches optimize only for their corresponding benchmark. From the previous test, the patch from this PR makes things slower with the bigger cache size.

It has also come to my attention that we have a similar situation in ext/pcre: patterns are cached and freed arbitrarily. From this point of view, thinking about LRU might pay off there, too. Regarding realpath, it seems we still need to test with a permutation matrix of size and TTL. I still think it is safer to target master with the developments on this. As soon as the cache is full, the eviction algorithm affects the common case. More benchmarks probably wouldn't be bad, but there should also be at least some basic tests with apps like WordPress or Symfony. I might have some spare time this weekend to experiment with an LRU implementation as well.

Thanks.

@kelunik
Member Author

kelunik commented Mar 27, 2017

Unfortunately I currently don't have much free time to test this extensively. @staabm could you redo the tests with Symfony / WordPress you initially did when creating the bug report?

@weltling
Contributor

weltling commented Apr 4, 2017

While I still haven't gotten around to playing with an implementation, here's a very fitting read on the subject: https://blogs.dropbox.com/tech/2012/10/caching-in-theory-and-practice/.

Thanks.

@weltling weltling closed this Apr 11, 2017
@weltling weltling reopened this Apr 11, 2017
@weltling
Contributor

I was checking a possible LRU implementation some time ago (master...weltling:realpath_lru2) but was drawn away by other topics. So far it's just an experiment that brought me to some more thoughts. Basically, the distinct cases we're talking about are

  • paths don't change, some small amount of random paths might be present
  • paths don't change, but there can be many so they won't fit cache size
  • random paths that may possibly not fit cache size

The current realpath cache implementation is good enough for the first case, which is also the base use case. The second one is what this PR tries to improve. The third one is currently not handled: when the cache is full, paths are simply not cached.

In general, case one can be solved quite simply by observing the system and configuring the realpath cache. In the arbitrary situation it is possible as well, but in some cases might be hard to achieve. From what I've seen in my experiment, especially if paths don't change much, it might make sense to handle the case with the LRU strategy. That can allow setting the cache size lower while not having to care about TTL. The basic use case is still the simplest way to achieve speed, but in the long run LRU might win in certain scenarios.

For example, one could say that if the TTL setting is set to -1, we'd use the LRU strategy, and otherwise the currently implemented strategy. In my patch, I've tried to combine both strategies, but that doesn't seem to work well; I've seen neither a huge performance increase nor a significant memory increase. Under certain conditions LRU is still quite an option. If it were possible to combine both, that would be a good reason to reduce the cache size. So I'm dropping this here for anyone interested to perhaps review the patch and the idea.

Thanks.

@krakjoe
Member

krakjoe commented Jul 25, 2017

There seems to be a consensus that actually this isn't really necessary, or at least that there is nothing to gain by merging this implementation (or another that @nikic worked on) ...

This is a topic that may be revisited at some time in the future, but for the moment I'm closing this PR, as the conversation has died and the will has evaporated for now ...

Thanks for all the effort folks :)

@krakjoe krakjoe closed this Jul 25, 2017