-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unnecessary Redis Session Locking On All HTTP GET Requests - Affecting PWA Studio Concurrent GraphQL Requests #34758
Comments
Hi @mttjohnson. Thank you for your report.
Make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, Add a comment to the issue:
For more details, review the Magento Contributor Assistant documentation. Add a comment to assign the issue: To learn more about issue processing workflow, refer to the Code Contributions.
🕙 You can find the schedule on the Magento Community Calendar page. 📞 The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, join the Community Contributions Triage session to discuss the appropriate ticket. 🎥 You can find the recording of the previous Community Contributions Triage on the Magento Youtube Channel ✏️ Feel free to post questions/proposals/feedback related to the Community Contributions Triage process to the corresponding Slack Channel |
@mttjohnson,
|
I confirmed the issue. I think is valid to address this at least in GraphQL where we don't really need session |
We are also facing the same problem using our headless storefront and Magento checkout pages. Although we don't use graphql, our requests towards default and custom rest endpoints are stuck in Redis operations. We contacted Adobe support and they recommended us to disable the session locking and they also mentioned that this is default in the cloud version. I noticed that @colinmollenhour mentioned a few times (1, 2, 3) that the risk of this is low and that it could only cause minor side effects, which is encouraging.
Matt, is this something that actually happened to you or is it just a hypothetical situation? |
@damienwebdev I was reading the release notes on 2.4.5 last week and noticed that it mentioned there was some ability to disable sessions on GraphQL requests, and I think that would solve the big issues for operating Magento with a PWA that makes a lot of concurrent GraphQL requests. While writing to and locking sessions on all other requests hitting the backend is still excessive, it would be nice to confirm if this provides a workaround to this issues I identified above, so we could resolve the critical aspects of this issue. @luishenriqueamaral For the sites I managed we always enforced having session locking enabled, but I have heard reports from other people claiming the issues during checkout when session locking was disabled. I have however witnessed and helped troubleshoot issues where session locking was enabled but long running concurrent requests during checkout placement would cause the session lock to expire and session data similarly getting overwritten, causing payments to be captured without associated orders in the database. Sometimes we'd see this as payments without orders, and other times it would show up as multiple payments from the same initial quote. It's also worth pointing out that I've seen payments without order, and duplicate payments that didn't relate to any session problems, so that symptom can have multiple causes, it is just that session can be related to causing problems. I suspect that most of the session data getting overwritten isn't of much concern, but during checkout if critical data is written to the session, the act of it getting overwritten can result in some seriously difficult to troubleshoot issues in addition to customer service problems. |
I've come up with an acceptable temporary solution for this: colinmollenhour/php-redis-session-abstract#50 To temporarily fix this issue, here are the patches to apply: On And on In fact the A handy way of clearly and consistently reproducing this performance issue is by following these steps:
The
In this situation, the request But the scenario I've just described can also happen when using file storage instead of Redis. I understand the point of @colinmollenhour which says that API requests should not rely on instantiating the session since the common and widely adopted architectural style for building APIs is REST, and being RESTful implies being stateless. And the solution proposed by @mttjohnson brings the burden of having to list every requests that need read-only access to the session in some kind of whitelist (another Magento xml file?). Maybe the definitive solution is for Magento to implement a "last second session state check" with a diff mechanism that would only propagate actual changes made to the session to the write operation, and apply it on all types of session storage (file, memcache or Redis). Both solutions are actually implemented in the patches I supplied at the beginning of this comment. It's now just a matter of moving some part in Magento and another in @colinmollenhour's module, if both parties accept this way of resolving this extremely long running issue once and for all. |
@Yonn-Trimoreau I suppose you're right if you are referencing "Only one process may get a write lock on a session." - it should just read "Only one process may get a lock on a session." but that is just semantics.. As of now when you start a session is is basically assumed that there will be a write, like a "SELECT ... FOR UPDATE". Unlike a database, you can't know if there will be a write until the very end, basically during the "COMMIT". So I don't see how a concept of separate read/write locks translates, but I digress.. For example, let's say you add a product to the cart and so a "Added to cart" success message is added to the session storage. Then two requests are made at once and they both start the session reading the data without waiting. When they write the session they will both delete the message so that's fine, but still you have a different outcome than if you had a read lock (two requests get the success message instead of two). Another example may be a "firstPageAfterLogin" flag or a security nonce that is meant to be read only once. The problem is not serializing the writes in these cases, it is the reads. I'm not familiar with the Magento 2 frontend so I don't know what requests are occurring and why they need to use the session but my general suggestion would be to find which requests modify the session and which ones don't. The ones that don't should just be read-only and the ones that do should just use the normal locking. If you're in a hurry this could even be done outside of Magento 2 code: In if (preg_match('#^/(banner|customer)/#', $_SERVER['REQUEST_URI']??'')) {
define('SESSION_READ_ONLY', TRUE);
} Then replace
$this->connection[$pid] = new \Cm\RedisSession\Handler($this->config, $this->logger, defined('SESSION_READ_ONLY')); The read-only flag is already supported in https://github.com/colinmollenhour/php-redis-session-abstract/blob/master/src/Cm/RedisSession/Handler.php#L263 EDIT: After reading up farther I see/remember this was basically already suggested by @mttjohnson a while back. Using read-only sessions is safer than disabling locking since the read-only sessions can't clobber good data. |
@colinmollenhour Yes I am referencing this sentence "Only one process may get a write lock on a session". And semantics is actually the key to the problem here. A read lock is what LibreOffice or Microsoft Word implements for example. When you open it, a lock file is written next to the file you're editing, just to remind you that you've already opened it in another text editor instance, in case you try to double open it. So that you won't overwrite your data from one instance to another. A write lock is what the filesystem implements, or any NoSQL database, they run write operations one at a time to preserve atomicity of the data which is currently being written. What you're talking about when mentioning the "COMMIT" operation in SQL language is a transaction.
What's a real shame is that Redis actually provide such transactions. But they operate at the entry level, and since the whole session is stored serialized in a single entry, we cannot profit from this. As you can see, the following workflow is needed:
So if Magento takes care of the diff part of the process, your module should expose three methods: And yes, this will change the outcome, 100% agree on that. But since most Magento projects are presently forced to disable_locking for performance reasons, and that works quite ok 95% of the time, we have already proven that Magento can safely work without a read lock. What I would seriously like to remove is the "quite" part of this sentence, by ensuring that there is no random or weird and un-debuggable behavior in all this. Using read-only is half of the answer, since if you have multiple requests that need to write in parallel (and this actually happens in the Magento checkout), you will still have unnecessary waiting time. |
@colinmollenhour We did as you suggested in #34758 (comment) for requests that we're 100% confident we do not care about the session for.
It's not a perfect metric, but you can see where it was deployed and any session variation has dropped off And you can see average response time has dropped in this table, as well as the "max" time dropping way down because we're no longer fighting with session locking etc i think. Cheers for the simple recommendation! |
Yes definitely a simpler and faster solution, if you don't care about locking at all |
@Yonn-Trimoreau definitely a stop gap until we can get magento up to date to latest, this helps with some locking performance before the holiday period at least :) |
@Yonn-Trimoreau did you make any progress on this after the |
@Swahjak sorry no, the patch is working on my customers' ends, and to be frank with you, I know making this MR will take some time, and has probably a 30% chance of being approved and integrated in the core. Lot of efforts for a small chance of success, and since most of my customers are Enterprise Edition customers (that's why I massively encounter the /banner/ajax/load issue...), I would have hoped that a "Severity S0" issue would be taken more seriously by Adobe's team and that they would take up on the production of a definitive fix, based on my last comprehensive explanation + working patch. |
@Yonn-Trimoreau I feel you. Thank you for taking the time to respond, it's appreciated. |
Seems 2.4.5 and above have this flag which may help provided all the graphql endpoints are coded correctly.
|
Monitoring we have suggests this same issue occurs within the admin wysiwyg editor in Magento 2.4.7. When the editor is loaded it will attempt to load the thumbnails with For now i will implement @colinmollenhour's suggestion #34758 (comment) |
Hello, We have implemented @colinmollenhour's suggestion #34758 (comment) and are testing in pre-production environments. We are primarily concerned about improving performance (ie eliminating unnecessary redis locking) around the checkout flow. Wondering if other's could share the request URIs they are targeting. So far we have identified these:
Logging the requests in this flow looks like this (
|
@denniskopitz Looks like a great step forward, thanks for sharing. Why does |
I've mainly been focussing on the backend since that caused the most problems for our clients for that i had:
I don't think it was Magento framework code but a plugin, but i can definitely remember finding some messages being set on the session in some rest endpoints (after wondering why the hell our headless frontend wasn't showing these messages) |
Apologies for the delayed response @colinmollenhour. When testing, we found that setting the |
@indykoning Did your effort improve performance on the admin wysiwyg editor? We have also experienced that issue. |
We have the same issue with Cm\RedisSession\Handler::read 30sec Here is a Redis session config, I've also set debug log level to catch the issue, maybe this can shed light
|
I'm not familiar with Magento 2 checkout - does it send one request per quote item or something like that? Also if any other requests had a fatal error and the session lock was not released that could explain it. E.g. perhaps there is an out of memory error on the request just before the totals-information one. |
The vanilla m2 one step checkout sends everything via ajax to update quote as you enter information, then shipping options, then payment accepted redirects to success page. We utilize Bolt payment which has to communicate to Bolt servers then updates quote back in Magento via API which is why I'm reading this issue. @colinmollenhour can you clarify, if there is a fatal error anywhere in Magento 2 process then it does not release the lock for session? |
@AndresInSpace there is a "catch all" exception management in place. If any exception occurs, the session is closed cleanly. So the lock is released, don't worry about it. @colinmollenhour some situations can lead to having at least 3 Ajax requests sent in parallel in the normal checkout (not one per item, but each request can be time-consuming, especially if the cart contains a lot of items) And we've already discussed this, the 3 definitive solutions are:
@AndresInSpace you should look into using the patches I have provided above in this discussion. They are presently in place on 5 sites and they fix exactly the kind of issue you are describing. They are however not adapted to the latest version of colinmollenhour's library so I can send you an updated version if you want (which a colleague of mine has made compatible with the latest version). @colinmollenhour just to clarify things up: I understand why it would be problematic to do this in your library, as this would be a major breaking change for all your non-Magento users, and I don't think you should change your mind because of Magento's current issue. |
Edit: Please ignore my misunderstanding here :)
|
@AndresInSpace I think you are going out of road with this reasoning. |
My apologies everyone I had a misunderstanding, did not mean to branch the subject. @vadim4err IIRC the |
Preconditions (*)
Helpful Insights
Steps to reproduce (*)
Configure Magento
Load the PWA storefront home page
domain:syseng-seldon.cldev.io -media -static graphql
Simulate the PWA storefront requests on ANY Magento site
Replace the domain variable with the domain of the magento site you're testing
Load the HTML page containing js that will fetch multiple graphql requests concurrently
You can open Chrome Developer Tools on the Network tab and where you see one of the GraphQL calls, you can right click on the request line and "Copy as fetch" to get the javascript fetch statement for making that same request in an html page js inline script.
You can do this to re-create all the requests for a specific page by adding each fetch() call to the html file, or duplicate the exact same fetch() graphql request (40x) to reproduce the session locking behavior being seen here.
Simulate SOME of the expected behavior by globally disabling redis session locking on ALL requests
Expected result (*)
When concurrent GraphQL GET requests are made from a visitor, requests should be able to complete in parallel to keep page load time minimal, while still allowing requests that require session locking (where important session data is being written) in order to prevent some other request from overwriting data in the session. Important data getting overwritten in a session can negatively affect critical application behavior.
In simulating some of the correct behavior with Redis session locking disabled, I was able to load the home page and all 15 graphql requests within a window of 600 ms. There are other pages that may contain many more GraphQL requests where this can be even more important to have concurrent requests complete in parallel.
Looking at a waterfall of how the concurrent GraphQL requests complete, can reveal that multiple requests are completing at or within close to the same time.
Actual result (*)
With Redis session locking enabled which is the default and recommended safe behavior for redis session configs, the concurrent requests queue up, each waiting in sequence for a redis session lock to clear before the next request is able to complete.
This makes it look like several of the graphql requests are taking an excessive amount of time to complete, while others completed in less time, but while these requests started close to the same time, they spent a lot of time waiting for session locking to clear, resulting in requests being completed in sequence rather than in parallel.
Cause of Behavior
With the PWA Studio (Client Side React App) running as the frontend "storefront" of Magento and sending GraphQL calls to the backend of Magento there are different behavioral patterns in how requests are sent to and processed by the web server from how we have been seeing interactions when using the Magento "theme" as the frontend.
The PWA sends multiple concurrent AJAX calls to
/graphql
end points on the web server, and these requests are all processed by the Magento backend PHP application. It turns out that Magento architecture has it creating a session and locking that session regardless of the type of request or response being issued.Some
/graphql
requests are able to be cached by Varnish and because cached requests in Varnish do not execute code and therefore do not interact with redis sessions. This can mask the fact that each request that hits the backend will lock the session for the request and cause requests to be completed in sequence. Many of the graphql requests are not able to be cached in Varnish at this time.We have seen this behavior in the Magento "theme" frontend also, but there are typically only a few requests that typically happen concurrently, and we see these results show up in New Relic on transaction traces a lot for AJAX calls that are likely to be running concurrently with other requests with the same session.
The problem is not just isolated to AJAX calls like
customer/section/load
on the Magento "theme" frontend, it also tends to occasionally interfere with other requests on product pages, or AJAX calls in the checkout. In the example below the unlucky visitor ended up waiting 8.8s to get an initial response for loading the page instead of what should have taken 300ms because they had some other request that was locking their session. It doesn't happen often, but it's not pleasant when it does.The big behavioral difference is that most of the requests in the Magento "theme" frontend do NOT happen concurrently, so session locking on a couple concurrent requests doesn't affect things overall that much, and it tends to be more of an exception and most requests (even customer/section/load AJAX requests) are not delayed waiting for session locks to release on average, so this known deficiency is only causing mild problems with the Magento "theme" frontend overall, and limiting concurrent AJAX calls is a way of working around it.
This session locking causes lots of problems with a PWA by delaying concurrent AJAX calls causing them to wait in line and finish sequentially as the locks they are waiting on clear. This ends up making requests randomly appear like they are taking a really long time to complete while under the hood they are primarily just waiting in line for the session to be available so it can lock the session and complete it's request. This makes it very hard to identify from the client side where the problem is as it relates to concurrent requests for the same session and running the requests independently results in a very fast response.
While it's possible to disable session locking, this can cause serious issues with requests that make changes to session data and overwrite each other, and that can result in serious problems on checkout where payments are captured but orders fail to be saved in Magento resulting sometimes in a customer re-submitting the order and getting charged multiple times if they are able to get the order to complete successfully.
The library used by Magento (v2.3.4) is the latest available (colinmollenhour/php-redis-session-abstract v1.4.4) and it does support the ability to process read only requests without locking a session when the global config is set to utilize session locking... it just so happens that Magento does not utilize this functionality and opens all session connections without specifying if it needs to write to the session or not, thus defaulting to write mode and session locking.
Possible Solution
Requests that come into Magento as
GET
requests are typically expected to return generic publicly identical response that can be cached by Varnish, whilePOST
requests are explicitly not allowed to be cached by Varnish as they are expected to contain visitor/session/customer private data in the responses. Important write operations should typically happen inPOST
requests, and those types of requests would be expected to need and utilize session locking, whileGET
requests would generally be for returning generic data that is not visitor/session/customer specific and the same response would be returned to all requests and not involve any kind of write to the session. If any session write activity were to occur onGET
requests, it would likely be to update the timestamp of the most recent request to indicate the session is still active and has not expired yet (this is an assumption that should be verified).The text was updated successfully, but these errors were encountered: