-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unnecessary Redis Session Locking On All HTTP GET Requests - Affecting PWA Studio Concurrent GraphQL Requests #34758
Comments
Hi @mttjohnson. Thank you for your report.
Make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, Add a comment to the issue:
For more details, review the Magento Contributor Assistant documentation. Add a comment to assign the issue: To learn more about issue processing workflow, refer to the Code Contributions.
🕙 You can find the schedule on the Magento Community Calendar page. 📞 The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, join the Community Contributions Triage session to discuss the appropriate ticket. 🎥 You can find the recording of the previous Community Contributions Triage on the Magento Youtube Channel ✏️ Feel free to post questions/proposals/feedback related to the Community Contributions Triage process to the corresponding Slack Channel |
@mttjohnson,
|
I confirmed the issue. I think is valid to address this at least in GraphQL where we don't really need session |
Sessions don't start unless something initializes them so it should be possible to add a breakpoint or debug statement and look at the stack trace to see why they are starting in the first place and fix that. There are likely multiple places. |
@colinmollenhour, unfortunately, in Magento 2 it's not true. They're getting initialized automatically because of dependencies to SessionManager since they're started in the SessionManager constructor. magento2/lib/internal/Magento/Framework/Session/SessionManager.php Lines 99 to 124 in 38e2ae5
Some time ago I tried to fix that in #31350, but a lot of tests are getting broken :( |
The session supports disabling locking in the config so if you can figure out how to update the config in this context that would work: https://github.com/colinmollenhour/php-redis-session-abstract/blob/b70508a9b2183d4fc13871cf9138a52fbef776f3/src/Cm/RedisSession/Handler/ConfigInterface.php#L131 |
@colinmollenhour Why the lock sleep time is at least 0.5s ? Can't we sleep less ?
0.5s seems a huge amount of time |
@Nuranto Good question, it wasn't a highly researched conclusion.. The general idea is that such locks should be rare, basically only occurring if the user has some sort of long-running process that is locking the session or the server is being completely overwhelmed by traffic in which case checking the lock too frequently can just make the issue worse. If you have a situation that every page load causes locking delays such as this issue with concurrent GraphQL requests then it was not intended for this purpose and probably locking should be disabled entirely for these requests (since they don't modify the session so there is no point in locking) or ideally the session should not even be started at all since using sessions defeats the purpose of a stateless API design. Even if the lock sleep time is just 10ms, you're still causing the concurrent requests to be serialized so the delays to total page load time will be a lot more than 10ms. |
I am facing this issue in ScandiPWA as well. If the locking is disabled, then if you open multiple new tabs. And then, if you add the product to the cart, it gets destroyed when there are pending page loads. |
@colinmollenhour Maybe make this delay configurable via options. And also, a coworker suggest to randomize a bit delays so we don't have many processes sleeping at the same time, and then retry to lock at the same time too, leading to more locks. And @rsidhaarth is right, disabling the locking can cause bugs... However, I guess we don't have the choice to do that as long as this is not fixed in magento. |
I was thinking it would be worth exploring if it was possible to determine the HTTP request type before setting up the session handler so that it would be possible to pass in a third parameter to /**
* Get connection
*
* @return \Cm\RedisSession\Handler
* @throws SessionException
*/
private function getConnection()
{
$pid = getmypid();
if (!isset($this->connection[$pid])) {
# Determine HTTP request type
$requestType = mysteriousGetHttpRequestTypeMethod(); # returns GET/POST/OPTIONS etc...
# Parameter value to send to session handler indicating
# if locking the session is needed for writing to the session
$readOnlySession = $requestType === 'GET' ? true : false;
try {
$this->connection[$pid] = new \Cm\RedisSession\Handler($this->config, $this->logger, $readOnlySession);
} catch (ConnectionFailedException $e) {
throw new SessionException(new Phrase($e->getMessage()));
}
}
return $this->connection[$pid];
} Setting the session to read only would still allow the session to be read, but only require locking it when important POST requests come through that needs to write to the session, like during checkout where writing the order number to the session is a critical part of the checkout process during payment capture... which is when the session locking is really needed. If a GET request needs to write to the session for something as simple as updating a timestamp in the session, it looks like this wouldn't work as the _readOnly property of the session handler is checked on a write and would just return and ignore the request to write to the session.
I feel like if we dynamically disabled locking on different requests, that could still run into problems as there would be the potential of some random request coming through that could overwrite the value written from a request where session locking was used. Scenario being say something stupid like a GET request that takes for whatever reason 8 seconds to complete (but that never happens right...) and it writes to the session at the end of it's request based on some value it modified from reading the session at the beginning of it's request. If some other request (like and important POST) that enabled locking comes through and happens to send a write after the GET request read it's initial value from the session, the POST request sets up locking on the redis session, and then writes back to the session before the GET request writes, it's value could get overwritten by the long running GET request. I think we still need session locking on all writes to the session to ensure safety, but there should be some way to reduce the frequency of the writes to the session, as this seems unnecessary on most of the requests hitting the backend and at the very least unnecessary for being written as frequently as they are. |
@mttjohnson I'm open to a PR for supporting a no-locking flag in the handler. As a dirty hack you might be able to call I would still encourage the M2 community to focus on avoiding instantiating a session at all for requests that don't actually use it, though. It seems it should be up to the controller to determine if a session is instantiated and it should never be instantiated in an API request, period. |
@mttjohnson couldn't that be done on Magento's side, by disabling the disable_locking parameter when POST ? |
@Nuranto In thinking through this, I don't think we'd want to disable locking on any session writes because that puts the risk of whatever has a write with locking disabled of being able to overwrite session data for a request where locking is enabled and locking enabled requests expecting other write requests to be utilizing locking. The session locking is going to be important on all writes in order to guarantee nothing important gets overwritten... so the best approach would be to eliminate the need for a write (like in the case of an API call as @colinmollenhour has pointed out), and reduce the frequency that a write is performed while still allowing for read only operations. I haven't looked into things to see where in the request lifecycle the session is initially read, and where it's possibly written to, but if it was possible to separate those two operations, I'm wondering if the read operation could be done with a connection established for read-only requests so that the reads are not blocked by any other concurrent write operations and then immediately close the connection... and then if the request later needs to write to the session open a separate connection for writing with locking, re-read the session to ensure nothing has changed, and now that it has a lock it will stay consistent until it has a chance to write and change the data, then write the changes. I don't know how much overhead that would introduce by opening and closing the read-only connection quickly, and then opening a new connection for writing that would incur a read/write operations... as that introduces an additional read operation and connection setup/teardown into the mix, but that would be a possible way of handling it safely and minimally impact session writes so they aren't held open very long. If the session writes are managed separately and something simple is being updated in the session like just a last active timestamp in order to determine when the session should be expired (I haven't actually looked to see specifically what is getting written on every request), I wonder if after reading the session data initially, it could be determined if it should update the session or wait longer before updating the session. It would be possible to set a threshold of say 1-5 minutes so that a session write isn't occurring but once every 1-5 minutes in order to update the last active timestamp or something. If sessions are not expiring for 60 minutes, several hours, or days, then checking it to see if it needs to write, and then not writing but once every 5 minutes would significantly reduce the frequency of session writes for any call where sessions are currently getting written to on every request. |
We are also facing the same problem using our headless storefront and Magento checkout pages. Although we don't use graphql, our requests towards default and custom rest endpoints are stuck in Redis operations. We contacted Adobe support and they recommended us to disable the session locking and they also mentioned that this is default in the cloud version. I noticed that @colinmollenhour mentioned a few times (1, 2, 3) that the risk of this is low and that it could only cause minor side effects, which is encouraging.
Matt, is this something that actually happened to you or is it just a hypothetical situation? |
@damienwebdev I was reading the release notes on 2.4.5 last week and noticed that it mentioned there was some ability to disable sessions on GraphQL requests, and I think that would solve the big issues for operating Magento with a PWA that makes a lot of concurrent GraphQL requests. While writing to and locking sessions on all other requests hitting the backend is still excessive, it would be nice to confirm if this provides a workaround to this issues I identified above, so we could resolve the critical aspects of this issue. @luishenriqueamaral For the sites I managed we always enforced having session locking enabled, but I have heard reports from other people claiming the issues during checkout when session locking was disabled. I have however witnessed and helped troubleshoot issues where session locking was enabled but long running concurrent requests during checkout placement would cause the session lock to expire and session data similarly getting overwritten, causing payments to be captured without associated orders in the database. Sometimes we'd see this as payments without orders, and other times it would show up as multiple payments from the same initial quote. It's also worth pointing out that I've seen payments without order, and duplicate payments that didn't relate to any session problems, so that symptom can have multiple causes, it is just that session can be related to causing problems. I suspect that most of the session data getting overwritten isn't of much concern, but during checkout if critical data is written to the session, the act of it getting overwritten can result in some seriously difficult to troubleshoot issues in addition to customer service problems. |
I've come up with an acceptable temporary solution for this: colinmollenhour/php-redis-session-abstract#50 To temporarily fix this issue, here are the patches to apply: On And on In fact the A handy way of clearly and consistently reproducing this performance issue is by following these steps:
The
In this situation, the request But the scenario I've just described can also happen when using file storage instead of Redis. I understand the point of @colinmollenhour which says that API requests should not rely on instantiating the session since the common and widely adopted architectural style for building APIs is REST, and being RESTful implies being stateless. And the solution proposed by @mttjohnson brings the burden of having to list every requests that need read-only access to the session in some kind of whitelist (another Magento xml file?). Maybe the definitive solution is for Magento to implement a "last second session state check" with a diff mechanism that would only propagate actual changes made to the session to the write operation, and apply it on all types of session storage (file, memcache or Redis). Both solutions are actually implemented in the patches I supplied at the beginning of this comment. It's now just a matter of moving some part in Magento and another in @colinmollenhour's module, if both parties accept this way of resolving this extremely long running issue once and for all. |
@Yonn-Trimoreau I suppose you're right if you are referencing "Only one process may get a write lock on a session." - it should just read "Only one process may get a lock on a session." but that is just semantics.. As of now when you start a session is is basically assumed that there will be a write, like a "SELECT ... FOR UPDATE". Unlike a database, you can't know if there will be a write until the very end, basically during the "COMMIT". So I don't see how a concept of separate read/write locks translates, but I digress.. For example, let's say you add a product to the cart and so a "Added to cart" success message is added to the session storage. Then two requests are made at once and they both start the session reading the data without waiting. When they write the session they will both delete the message so that's fine, but still you have a different outcome than if you had a read lock (two requests get the success message instead of two). Another example may be a "firstPageAfterLogin" flag or a security nonce that is meant to be read only once. The problem is not serializing the writes in these cases, it is the reads. I'm not familiar with the Magento 2 frontend so I don't know what requests are occurring and why they need to use the session but my general suggestion would be to find which requests modify the session and which ones don't. The ones that don't should just be read-only and the ones that do should just use the normal locking. If you're in a hurry this could even be done outside of Magento 2 code: In if (preg_match('#^/(banner|customer)/#', $_SERVER['REQUEST_URI']??'')) {
define('SESSION_READ_ONLY', TRUE);
} Then replace
$this->connection[$pid] = new \Cm\RedisSession\Handler($this->config, $this->logger, defined('SESSION_READ_ONLY')); The read-only flag is already supported in https://github.com/colinmollenhour/php-redis-session-abstract/blob/master/src/Cm/RedisSession/Handler.php#L263 EDIT: After reading up farther I see/remember this was basically already suggested by @mttjohnson a while back. Using read-only sessions is safer than disabling locking since the read-only sessions can't clobber good data. |
@colinmollenhour Yes I am referencing this sentence "Only one process may get a write lock on a session". And semantics is actually the key to the problem here. A read lock is what LibreOffice or Microsoft Word implements for example. When you open it, a lock file is written next to the file you're editing, just to remind you that you've already opened it in another text editor instance, in case you try to double open it. So that you won't overwrite your data from one instance to another. A write lock is what the filesystem implements, or any NoSQL database, they run write operations one at a time to preserve atomicity of the data which is currently being written. What you're talking about when mentioning the "COMMIT" operation in SQL language is a transaction.
What's a real shame is that Redis actually provide such transactions. But they operate at the entry level, and since the whole session is stored serialized in a single entry, we cannot profit from this. As you can see, the following workflow is needed:
So if Magento takes care of the diff part of the process, your module should expose three methods: And yes, this will change the outcome, 100% agree on that. But since most Magento projects are presently forced to disable_locking for performance reasons, and that works quite ok 95% of the time, we have already proven that Magento can safely work without a read lock. What I would seriously like to remove is the "quite" part of this sentence, by ensuring that there is no random or weird and un-debuggable behavior in all this. Using read-only is half of the answer, since if you have multiple requests that need to write in parallel (and this actually happens in the Magento checkout), you will still have unnecessary waiting time. |
@colinmollenhour We did as you suggested in #34758 (comment) for requests that we're 100% confident we do not care about the session for.
It's not a perfect metric, but you can see where it was deployed and any session variation has dropped off And you can see average response time has dropped in this table, as well as the "max" time dropping way down because we're no longer fighting with session locking etc i think. Cheers for the simple recommendation! |
Yes definitely a simpler and faster solution, if you don't care about locking at all |
@Yonn-Trimoreau definitely a stop gap until we can get magento up to date to latest, this helps with some locking performance before the holiday period at least :) |
@Yonn-Trimoreau did you make any progress on this after the |
@Swahjak sorry no, the patch is working on my customers' ends, and to be frank with you, I know making this MR will take some time, and has probably a 30% chance of being approved and integrated in the core. Lot of efforts for a small chance of success, and since most of my customers are Enterprise Edition customers (that's why I massively encounter the /banner/ajax/load issue...), I would have hoped that a "Severity S0" issue would be taken more seriously by Adobe's team and that they would take up on the production of a definitive fix, based on my last comprehensive explanation + working patch. |
@Yonn-Trimoreau I feel you. Thank you for taking the time to respond, it's appreciated. |
Seems 2.4.5 and above have this flag which may help provided all the graphql endpoints are coded correctly.
|
Monitoring we have suggests this same issue occurs within the admin wysiwyg editor in Magento 2.4.7. When the editor is loaded it will attempt to load the thumbnails with For now i will implement @colinmollenhour's suggestion #34758 (comment) |
Hello, We have implemented @colinmollenhour's suggestion #34758 (comment) and are testing in pre-production environments. We are primarily concerned about improving performance (ie eliminating unnecessary redis locking) around the checkout flow. Wondering if other's could share the request URIs they are targeting. So far we have identified these:
Logging the requests in this flow looks like this (
|
Preconditions (*)
Helpful Insights
Steps to reproduce (*)
Configure Magento
Load the PWA storefront home page
domain:syseng-seldon.cldev.io -media -static graphql
Simulate the PWA storefront requests on ANY Magento site
Replace the domain variable with the domain of the magento site you're testing
Load the HTML page containing js that will fetch multiple graphql requests concurrently
You can open Chrome Developer Tools on the Network tab and where you see one of the GraphQL calls, you can right click on the request line and "Copy as fetch" to get the javascript fetch statement for making that same request in an html page js inline script.
You can do this to re-create all the requests for a specific page by adding each fetch() call to the html file, or duplicate the exact same fetch() graphql request (40x) to reproduce the session locking behavior being seen here.
Simulate SOME of the expected behavior by globally disabling redis session locking on ALL requests
Expected result (*)
When concurrent GraphQL GET requests are made from a visitor, requests should be able to complete in parallel to keep page load time minimal, while still allowing requests that require session locking (where important session data is being written) in order to prevent some other request from overwriting data in the session. Important data getting overwritten in a session can negatively affect critical application behavior.
In simulating some of the correct behavior with Redis session locking disabled, I was able to load the home page and all 15 graphql requests within a window of 600 ms. There are other pages that may contain many more GraphQL requests where this can be even more important to have concurrent requests complete in parallel.
Looking at a waterfall of how the concurrent GraphQL requests complete, can reveal that multiple requests are completing at or within close to the same time.
Actual result (*)
With Redis session locking enabled which is the default and recommended safe behavior for redis session configs, the concurrent requests queue up, each waiting in sequence for a redis session lock to clear before the next request is able to complete.
This makes it look like several of the graphql requests are taking an excessive amount of time to complete, while others completed in less time, but while these requests started close to the same time, they spent a lot of time waiting for session locking to clear, resulting in requests being completed in sequence rather than in parallel.
Cause of Behavior
With the PWA Studio (Client Side React App) running as the frontend "storefront" of Magento and sending GraphQL calls to the backend of Magento there are different behavioral patterns in how requests are sent to and processed by the web server from how we have been seeing interactions when using the Magento "theme" as the frontend.
The PWA sends multiple concurrent AJAX calls to
/graphql
end points on the web server, and these requests are all processed by the Magento backend PHP application. It turns out that Magento architecture has it creating a session and locking that session regardless of the type of request or response being issued.Some
/graphql
requests are able to be cached by Varnish and because cached requests in Varnish do not execute code and therefore do not interact with redis sessions. This can mask the fact that each request that hits the backend will lock the session for the request and cause requests to be completed in sequence. Many of the graphql requests are not able to be cached in Varnish at this time.We have seen this behavior in the Magento "theme" frontend also, but there are typically only a few requests that typically happen concurrently, and we see these results show up in New Relic on transaction traces a lot for AJAX calls that are likely to be running concurrently with other requests with the same session.
The problem is not just isolated to AJAX calls like
customer/section/load
on the Magento "theme" frontend, it also tends to occasionally interfere with other requests on product pages, or AJAX calls in the checkout. In the example below the unlucky visitor ended up waiting 8.8s to get an initial response for loading the page instead of what should have taken 300ms because they had some other request that was locking their session. It doesn't happen often, but it's not pleasant when it does.The big behavioral difference is that most of the requests in the Magento "theme" frontend do NOT happen concurrently, so session locking on a couple concurrent requests doesn't affect things overall that much, and it tends to be more of an exception and most requests (even customer/section/load AJAX requests) are not delayed waiting for session locks to release on average, so this known deficiency is only causing mild problems with the Magento "theme" frontend overall, and limiting concurrent AJAX calls is a way of working around it.
This session locking causes lots of problems with a PWA by delaying concurrent AJAX calls causing them to wait in line and finish sequentially as the locks they are waiting on clear. This ends up making requests randomly appear like they are taking a really long time to complete while under the hood they are primarily just waiting in line for the session to be available so it can lock the session and complete it's request. This makes it very hard to identify from the client side where the problem is as it relates to concurrent requests for the same session and running the requests independently results in a very fast response.
While it's possible to disable session locking, this can cause serious issues with requests that make changes to session data and overwrite each other, and that can result in serious problems on checkout where payments are captured but orders fail to be saved in Magento resulting sometimes in a customer re-submitting the order and getting charged multiple times if they are able to get the order to complete successfully.
The library used by Magento (v2.3.4) is the latest available (colinmollenhour/php-redis-session-abstract v1.4.4) and it does support the ability to process read only requests without locking a session when the global config is set to utilize session locking... it just so happens that Magento does not utilize this functionality and opens all session connections without specifying if it needs to write to the session or not, thus defaulting to write mode and session locking.
Possible Solution
Requests that come into Magento as
GET
requests are typically expected to return generic publicly identical response that can be cached by Varnish, whilePOST
requests are explicitly not allowed to be cached by Varnish as they are expected to contain visitor/session/customer private data in the responses. Important write operations should typically happen inPOST
requests, and those types of requests would be expected to need and utilize session locking, whileGET
requests would generally be for returning generic data that is not visitor/session/customer specific and the same response would be returned to all requests and not involve any kind of write to the session. If any session write activity were to occur onGET
requests, it would likely be to update the timestamp of the most recent request to indicate the session is still active and has not expired yet (this is an assumption that should be verified).The text was updated successfully, but these errors were encountered: