-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix performance issue in cache locking mechanism #22829
Fix performance issue in cache locking mechanism #22829
Conversation
Hi @IvanChepurnyi. Thank you for your contribution
For more details, please, review the Magento Contributor Assistant documentation |
Static test failures related to legacy code in AbstractBlock that are unrelated to the current PR. |
lib/internal/Magento/Framework/Cache/LockGuardedCacheLoader.php
Outdated
Show resolved
Hide resolved
Still, don't get why we need locks on write and how it will resolve problem? All client will try to write to cache storage and wait on lock |
…to wait for a lock in this case
The latest update exactly resolves an issue, test it with the load test and put logging on a number of writes happening during the flush operation. |
Well, now only one customer will write data. |
@kandy I would say something is wrong if you want to cache a number of lines in the 100GB file. Locking is currently used only in 2 places: config cache and block cache. |
This is just an example. You can choose any resource-intensive operation. |
Can you explain how this theoretical problem is affecting this specific pull request? The second option with continuous lock is still available for such cases - an original method is not removed. PS: Also, as probably you know, I am not a fan of optimization by caching, but for some places caching is inevitable. For example in resource-intensive operations, if this data can be stored and invalidated by any other means than volatile cache storage, I prefer that method. Usually, resource-intensive operations can be optimized by using indexation, using a better algorithm, building memory based microservice, etc. |
Well, let's discuss the initial implementation with your suggestion. PS: agree that if the cache has overheads comparable to the generation of data, we must delete the cache |
@kandy The reason is simple, webshop needs to sell goods to make money for the merchant. If it saves CPU cycles to make sure only one non-cached request happen, the merchant won't be happy. When all the requests are waiting for a released lock, then nothing can be sold. PHP-FPM pool gets used up and NGINX receives a growing number of incoming connections it cannot resolve to send through to fpm, so it responds with 503 error. All of these just because of a single lock. In the second case, requests are processed, yes slower than with a case when we have cached data, but still, nothing is putting a plug on FPM pool and server responds, hoster can scale by adding additional frontend nodes - everyone is happy! |
your sentence is correct only in case when time on generation is less than time on the check of lock present + sleep timeout. |
@kandy I am not talking theoretically, I am talking out of my experience. I've run into similar errors before when was working directly with merchants who received huge traffic where these issues were real. Mutex is not the answer, especially for the cache, especially in PHP. |
@kandy @IvanChepurnyi - What do you think if we will have community architecture call and discuss the solution, instead of typing a lot of text? |
@IvanChepurnyi I more deeply analyzed your solution and think it will resolve problems with locks on reading, especially on edge cases where cache disabled or data always return false. |
@vzabaznov @slavvka Wow. This PR was finally merged even after it was closed? 🤯 |
@ihor-sviziev Yes, it was closed because the related issue was closed. Once the code synced it got merged. I was also amazed :) |
Thank you @slavvka. I look forward to upgrading to 2.4 now |
We're also facing this issue, is there a patch available for 2.3? |
@SilvanLaroo you can extract a patch from this pull request but you have to separate patched files into composer patches per package as PR has diff file for a monorepo. |
did anyone face installation issues with this patch since 2.3.5? When we run a setup with an existing codebase and env.php on an empty database (
|
@riconeitzel if you’ll be able to reproduce this issue on 2.4-develop branch - please report it as separate issue. Thank you! |
if it's broken on 2.3.x not? |
@riconeitzel are you sure this pull request is a source of your issue? As seen in your stack trace, it is calling |
@IvanChepurnyi this patch isn't compatible with 2.3.5-p2, right? 😢 |
We encountered an issue after applying this patch on a Magento 2.3.5-p2 + Varnish 5. When a Varnish request was made to the homepage for the first time, the X-Magento-Tags were missing. We only got the tags on the next requests to the same page. The "fix" was removing this patch |
Hi @serbaninno, |
[$this->currentLock] | ||
) | ||
); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creates infinite loop if Database is used for AbstractBlock locking. All next $this->locker->lock()
calls return false.
If a cacheable block has a child cacheable block, the LockGuardedCacheLoader::lockedLoadData()
call is nested, but since there is already a lock in the Database backend, and it does not depend on the lock name, lock() returns false. See 2 lockedLoadData()
in this call stack.
LockGuardedCacheLoader.php:111, Magento\Framework\Cache\LockGuardedCacheLoader->lockedLoadData()
AbstractBlock.php:1125, Amasty\Label\Block\Label->_loadCache()
AbstractBlock.php:674, Amasty\Label\Block\Label->toHtml()
...
Interceptor.php:429, Magento\Catalog\Block\Product\Image\Interceptor->toHtml()
grid.phtml:52, include()
...
Template.php:301, Magento\CatalogWidget\Block\Product\ProductsList\Interceptor->_toHtml()
AbstractBlock.php:1100, Magento\CatalogWidget\Block\Product\ProductsList\Interceptor->Magento\Framework\View\Element\{closure:/var/www/html/vendor/magento/framework/View/Element/AbstractBlock.php:1094-1101}()
LockGuardedCacheLoader.php:103, Magento\Framework\Cache\LockGuardedCacheLoader->lockedLoadData()
AbstractBlock.php:1125, Magento\CatalogWidget\Block\Product\ProductsList\Interceptor->_loadCache()
AbstractBlock.php:674, Magento\CatalogWidget\Block\Product\ProductsList\Interceptor->toHtml()
Interceptor.php:830, Magento\CatalogWidget\Block\Product\ProductsList\Interceptor->toHtml()
Filter.php:121, Magento\Widget\Model\Template\Filter\Interceptor->generateWidget()
Interceptor.php:24, Magento\Widget\Model\Template\Filter\Interceptor->generateWidget()
Filter.php:132, Magento\Widget\Model\Template\Filter\Interceptor->widgetDirective()
Interceptor.php:37, Magento\Widget\Model\Template\Filter\Interceptor->widgetDirective()
LegacyDirective.php:42, ReflectionMethod->invokeArgs()
LegacyDirective.php:42, Magento\Framework\Filter\DirectiveProcessor\LegacyDirective->process()
Template.php:184, Magento\Widget\Model\Template\Filter\Interceptor->filter()
Filter.php:1080, Magento\Widget\Model\Template\Filter\Interceptor->filter()
Interceptor.php:58, Magento\Widget\Model\Template\Filter\Interceptor->___callParent()
Interceptor.php:138, Magento\Widget\Model\Template\Filter\Interceptor->Magento\Framework\Interception\{closure:/var/www/html/vendor/magento/framework/Interception/Interceptor.php:104-151}()
Interceptor.php:153, Magento\Widget\Model\Template\Filter\Interceptor->___callPlugins()
Interceptor.php:403, Magento\Widget\Model\Template\Filter\Interceptor->filter()
Page.php:170, Magento\VersionsCms\Block\Cms\Page->_toHtml()
AbstractBlock.php:1100, Magento\VersionsCms\Block\Cms\Page->Magento\Framework\View\Element\{closure:/var/www/html/vendor/magento/framework/View/Element/AbstractBlock.php:1094-1101}()
AbstractBlock.php:1104, Magento\VersionsCms\Block\Cms\Page->_loadCache()
AbstractBlock.php:674, Magento\VersionsCms\Block\Cms\Page->toHtml()
...
Http.php:120, Magento\Framework\App\Http\Interceptor->launch()
Interceptor.php:24, Magento\Framework\App\Http\Interceptor->launch()
Bootstrap.php:261, Magento\Framework\App\Bootstrap->run()
index.php:40, {main}()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please report a separate issue with all details for verifying and than fixing that?
We're having the same issues with very long load times on a 2.3.5-p2 shop, where sometimes page loading takes around 30 seconds and lockedLoadData is just busy waiting. We may upgrade to 2.4.x, but I'm a bit unsure if this really helps. This merge merges initial parts of this PR and is tagged with 2.4.2, but if I look e.g. at the test file in the tagged 2.4.2 version, it does not mention the non blocking variant and even the history does not mention fixes in these files. Can someone clear this up? @vzabaznov @ihor-sviziev is this really merged in 2.4.x and fixed? Or was the non blocking lock method removed again somehow? |
Probably it was merged and then removed, as they implemented something own; I don't think that rewritten solution is efficient and really solves the concurrency issue, but I already gave up on the original pull request. Maybe someday I will make it as an installable module that can implement a non-blocking cache, but that one is far from even the start of development as I am too busy right now. |
Hello all. On our project the config cache is huge because we have a lot of websites. Each time the config cache was flushed (config save in backoffice, or during imports or any other reason), I was seing on PHP process generating the configuration. And the other processes waiting for the lock release. After enabling l2 stale cache on the config as suggested by @kandy ☝️ , it worked like magic, and our problems went away ! Important thing : we only enabled the l2 cache for the config cache. Because I heard from colleagues that enabling it for all the caches slows overall response times, and also as it basically duplicates the cache on all local fronts, I did not want to overload my web front memories. Here is an extract from env.php:
In order to test it. On a single web front on dev or preprod: watch the php fpm processes piling up in one terminal window:
on another window, run a then quickly run 4 curls in background, and observe the response time, the locking and how processes pile up.
before the config fix: 1 process taking 30s, 3 processes taking 15s |
Description
This change replaces the locking strategy for resolving race condition in the configuration cache loader.
The previous implementation was waiting for the release of the lock for all concurrent connections when the cache was flushed. It was redundant to do as the whole website visitors wait for a lock release.
The new implementation creates a lock only for the write operation and the other connections who try to write it just return data uncached without performing any write operations. This approach removes any possibility of cyclic locking.
Fixed Issues
Manual testing scenarios
Contribution checklist (*)