Lots of calls to the cache to get Schema during the same query #6193
To give an example, each time the server receives a request, there are at least 4 such calls to the cache. Would there be a way to save the schema in the request or the database operation, to minimize the number of calls during one request/cloud function? I have a cloud function where we do 3 queries in parallel, each with one pointer field included (the same one). Last question: why is it so important to preserve the exact order of operations through a queue in the cacheAdapter? In which case would it be really annoying to have an unordered operation? A sessionToken that would be invalidated but still accessible for a few milliseconds?
I had an idea a while ago about a global singleton schema cache that gets updated every time a class / field is added or removed. I don't believe this will be a problem for me personally because I use enableSingleSchemaCache. I believe the reason the cache isn't built this way was to minimize side effects; they didn't want too many changes to the controller. We could leverage redis pub/sub or a mongo change stream (real-time mongo updates, 3.6+) to reduce side effects. Edit: We could use live query to watch the _Schema class. If only the schema wasn't stored in the DB; this is an open source framework after all. I agree with you: although the schema cache has been improving, a better solution is needed. How is your schema so big? @davimacedo @acinader @TomWFox Thoughts? I mentioned this on slack before.
Not that big, but each Schema find operation takes around 2-3ms, so in the case above it's 50ms on a cloud function that takes 80ms to perform...
Feel free to add test cases so we know how many cache lookups you are using: #6060 (comment). Besides that, we need a solid solution. This problem is easily solved if a developer has 1 instance of parse server, but multiple instances running against the same DB is what I'm trying to figure out.
@SebC99 How does this sound: we remove the schema cache in favor of a singleton object. The singleton object will always be in sync with the _Schema class in the DB by leveraging the PubSub Adapter. This method would mimic LiveQuery subscribed to the _Schema class. It would support multiple instances if redis is passed in, remove all side effects and memory leaks, and queries will be much faster because there will be no schema cache lookup. @acinader @davimacedo Thoughts?
sounds reasonable to me |
So with multiple servers we would still rely on redis? I don't think redis is an issue anyway. But in which case does the order of operations really matter? The only one I can think of is the user cache. Sessions, roles and schema shouldn't be problematic even with a slight desync, I guess. And then, queuing the get commands seems useless; we could simply have a kind of lock when we have set / put / del commands, and no lock for a get. It's just a thought, I would not know at all how to implement that, and I'm really not familiar with PubSub, sorry! (Typically in our case, we use a Redis Cluster with a master & read-only slave, so we are not fully synced between write and read operations, and we haven't seen any issue so far.)
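The "lock on writes, no lock on reads" idea above can be sketched with a promise chain. This is a hypothetical sketch, not parse-server code: put / del are serialized in order behind a shared pending-writes chain, while get reads the backing store immediately, accepting that a read may be a few milliseconds stale.

```javascript
// Hypothetical cache where only mutating operations queue up; reads
// never wait. A get() may observe a slightly stale value, which the
// discussion above argues is acceptable for schema, roles and sessions.
class ReadUnlockedCache {
  constructor() {
    this.store = new Map();
    this.writeChain = Promise.resolve(); // pending writes, in order
  }
  get(key) {
    return this.store.get(key); // no queue on the read path
  }
  put(key, value) {
    this.writeChain = this.writeChain.then(() => {
      this.store.set(key, value);
    });
    return this.writeChain;
  }
  del(key) {
    this.writeChain = this.writeChain.then(() => {
      this.store.delete(key);
    });
    return this.writeChain;
  }
}
```

Chaining onto a single promise preserves write ordering without any explicit lock primitive, which is all the original queue guaranteed for mutations.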
It sounds like a good idea and I believe it will be a huge improvement in terms of performance. In most applications, the schema does not change that much, and we are paying a high cost to make sure it is always updated. The PubSub should solve this problem.
Yes, for the schema fix I'm proposing. Most users already have live query with redis. We could make it required for multiple instances. A single instance will just use event emitters.
I honestly haven't looked into session, user and roles cache optimization. The queue was built specifically for schema caching (will be removed). We can definitely look into it at some point.
Here is how it works: you subscribe to a channel and get updates whenever you publish to the channel. I have a question. Both LiveQuery and RedisCacheAdapter use a redisURL, and it's recommended to use two different databases. Why? Does redis pub/sub store anything in the database? I don't think so.
Ideally, we wouldn't need to require redis. Postgres has notify / listen and, as you point out, mongo has a mechanism for this too, so we should be able to do without bringing redis into the mix?
I'm sure GraphQL has a similar feature too.
I think GraphQL will not help in this context since it is in the API layer. I'd go with our PubSubAdapter and we could, in the long run, provide multiple options. In the beginning, we could offer Redis for multiple-process deployments and an in-memory one for single-process deployments (not requiring redis to be installed). I think Postgres + MongoDB adapters are also a good idea, but I wouldn't rely only on them in the beginning because only the most recent versions have the change stream functionality available.
PG has supported LISTEN / NOTIFY since version 7 (we test against 9.5). I'll have some time next week to work on this. @vitaly-t For postgres I'm looking to listen / notify changes made to
@vitaly-t I just saw that, will probably add it in the future. Quick update: I got this working, but I'm going to leverage
An optimization can also be done with the
@dplewis Have you had time to think about this? The number of calls to the __SCHEMA__MAIN_SCHEMA key is incredible. I don't know why the calls to Redis are sometimes longer, as the redis cluster is large enough to handle this, so it may be the queue mechanism... Not sure about this. Happy to discuss this further!
Yeah, I'm trying to figure out what to remove from the current implementation. Since the solution to sharing across multiple servers has been changed so many times over the years, most of it can be removed. I'll spend some time next week on a PR.
I was looking at the way mongoDB Then the I don't really see the use of the PubSub adapter here... Am I missing something?
Don't need the pub sub anymore since we require Mongodb 3.6 as a minimum.
Well then, that's a bit more complicated as it means moving the databaseController initialization... I'm afraid I won't be of any help.
Good work! I started looking at past PRs (from when it was singleCache). Changing the databaseController back should be simple: 09bd9e3#diff-d5ca9e73131b4f7750feeb9b51c43efb. If you want to do a PR, I can have a look at it and add Postgres as well.
Sure, the only thing harder to change is the
Here it is: #6743
Closing via #7214 |
In a similar way to #6060, we tried to instrument our Parse-Server production server using AWS X-Ray (we're using an Elastic Beanstalk stack), and it seems we do a lot of requests to the cache to load the schema for any single request (it's very high for some cloud functions we have).
When looking at the code, a weird thing is that we often call the loadSchema() method; then, when it's fulfilled, we call the getOneSchema() one on the resulting schemaController, and both of them get the full schema from the cache - with a big schema it's not a free call (it's around 5ms each in our case). Is there a specific reason for that? Is there room for improvement?
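One way to avoid the double cache hit described above is to memoize the full-schema fetch for the lifetime of a single request. This is an illustrative sketch, with hypothetical class and adapter names rather than parse-server's actual API: loadSchema() fetches from the cache at most once, and getOneSchema() answers from that memoized copy.

```javascript
// Hypothetical per-request controller: the full schema is fetched from
// the cache adapter once, then every getOneSchema() call reuses it.
class RequestSchemaController {
  constructor(cacheAdapter) {
    this.cacheAdapter = cacheAdapter;
    this.fullSchemaPromise = null; // per-request memo
  }
  loadSchema() {
    if (!this.fullSchemaPromise) {
      // first caller pays the cache round trip; everyone else shares it
      this.fullSchemaPromise = this.cacheAdapter.getAllClasses();
    }
    return this.fullSchemaPromise;
  }
  async getOneSchema(className) {
    const all = await this.loadSchema(); // no second cache lookup
    return all.find(s => s.className === className);
  }
}
```

With a 5ms cache round trip, collapsing the loadSchema() + getOneSchema() pair into one fetch halves that cost per lookup, and parallel queries in the same request share a single fetch.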