Lots of calls to the cache to get Schema during the same query #6193
To give an example, each time the server receives a request, there are at least 4 such calls to the cache. Would there be a way to save the schema in the request or the database operation, to minimize the number of calls during one request/cloud function? I have a cloud function where we do 3 queries in parallel, each with one pointer field included (the same one). Last question: why is it so important to preserve the exact order of operations through a queue in the cacheAdapter? In which case would it be really annoying to have an unordered operation? A sessionToken that would be invalidated but still accessible for a few milliseconds?
I had an idea a while ago about a global singleton schema cache that gets updated every time a class / field is added or removed. I don't believe this will be a problem for me personally because I use enableSingleSchemaCache. I believe the reason the cache isn't built this way was to minimize side effects; they didn't want too many changes to the controller. We could leverage redis pub/sub or a mongo change stream (real-time mongo updates, 3.6+) to reduce side effects. Edit: We could use live query to watch the _Schema class. If only the schema wasn't stored in the DB; this is an open source framework after all. I agree with you: although the schema cache has been improving, a better solution is needed. How is your schema so big? @davimacedo @acinader @TomWFox Thoughts? I mentioned this on slack before.
Not that big, but each Schema find operation takes around 2-3ms, so in the case above it's 50ms on a cloud function that takes 80ms to perform...
Feel free to add test cases so we know how many cache lookups you are using: #6060 (comment). Besides that, we need a solid solution. This problem is easily solved if a developer has 1 instance of parse server, but multiple instances running against the same DB is what I'm trying to figure out.
@SebC99 How does this sound: we remove the schema cache in favor of a singleton object. The singleton object will always be in sync with the _Schema class in the DB by leveraging the PubSub Adapter. This method would mimic LiveQuery subscribed to the _Schema class. It would support multiple instances if redis is passed in, remove all side effects and memory leaks, and queries will be much faster because there will be no schema cache lookup. @acinader @davimacedo Thoughts?
sounds reasonable to me |
So with multiple servers we would still rely on redis? I don't think redis is an issue anyway. But in which case does the order of operations really matter? The only one I can think of is the user cache. Sessions, roles and schema shouldn't be problematic even with a slight desync, I guess. And then, queuing the get commands seems useless; we could simply have a kind of lock when we have set / put / del commands, and no lock for a get. It's just a thought, I would not know at all how to implement that, and I'm really not familiar with PubSub, sorry! (Typically in our case, we use a Redis Cluster with a master & read-only slave, so we are not fully synced between write and read operations, and we haven't seen any issue so far.)
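The "lock on writes, no lock on reads" idea above can be sketched with a promise chain. This is a hypothetical sketch, not parse-server code: put / del are serialized in order behind a shared pending-writes chain, while get reads the backing store immediately, accepting that a read may be a few milliseconds stale.

```javascript
// Hypothetical cache where only mutating operations queue up; reads
// never wait. A get() may observe a slightly stale value, which the
// discussion above argues is acceptable for schema, roles and sessions.
class ReadUnlockedCache {
  constructor() {
    this.store = new Map();
    this.writeChain = Promise.resolve(); // pending writes, in order
  }
  get(key) {
    return this.store.get(key); // no queue on the read path
  }
  put(key, value) {
    this.writeChain = this.writeChain.then(() => {
      this.store.set(key, value);
    });
    return this.writeChain;
  }
  del(key) {
    this.writeChain = this.writeChain.then(() => {
      this.store.delete(key);
    });
    return this.writeChain;
  }
}
```

Chaining onto a single promise preserves write ordering without any explicit lock primitive, which is all the original queue guaranteed for mutations.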
It sounds like a good idea and I believe it will be a huge improvement in terms of performance. In most applications, the schema does not change that much, and we are paying a high cost to make sure it is always updated. The PubSub should solve this problem.
Yes, for the schema fix I'm proposing. Most users already have live query with redis. We could make it required for multiple instances. A single instance will just use event emitters.
I honestly haven't looked into session, user and roles cache optimization. The queue was built specifically for schema caching (will be removed). We can definitely look into it at some point.
Here is how it works: you subscribe to a channel and get updates whenever you publish to the channel. I have a question. Both LiveQuery and RedisCacheAdapter use a redisURL, and it's recommended to use two different databases. Why? Does redis pub/sub store anything in the database? I don't think so.
Ideally, we wouldn't need to require redis. Postgres has notify / listen and, as you point out, mongo has a mechanism for this too, so we should be able to do without bringing redis into the mix?
I'm sure GraphQL has a similar feature too.
I think GraphQL will not help in this context since it is in the API layer. I'd go with our PubSubAdapter and we could, in the long run, provide multiple options. In the beginning, we could offer Redis for multiple-process deployments and an in-memory one for single-process deployments (not requiring redis to be installed). I think Postgres + MongoDB adapters are also a good idea, but I wouldn't rely only on them in the beginning because only the most recent versions have the change stream functionality available.
PG has supported LISTEN / NOTIFY since version 7 (we test against 9.5). I'll have some time next week to work on this. @vitaly-t For postgres I'm looking to listen / notify changes made to
@vitaly-t I just saw that, will probably add it in the future. Quick update: I got this working, but I'm going to leverage
An optimization can also be done with the
@dplewis Have you had time to think about this? The number of calls to the __SCHEMA__MAIN_SCHEMA key is incredible. I don't know why the calls to Redis are sometimes longer, as the redis cluster is large enough to handle this, so it may be the queue mechanism... Not sure about this. Happy to discuss this further!
Yeah, I'm trying to figure out what to remove from the current implementation. Since the solution to sharing across multiple servers has been changed so many times over the years, most of it can be removed. I'll spend some time next week on a PR.
I was looking at the way mongoDB Then the I don't really see the use of the PubSub adapter here... Am I missing something?
Don't need the pub sub anymore since we require Mongodb 3.6 as a minimum.
Well then, that's a bit more complicated as it means moving the databaseController initialization... I'm afraid I won't be of any help.
Good work! I started looking at past PRs (from when it was singleCache). Changing the databaseController back should be simple: 09bd9e3#diff-d5ca9e73131b4f7750feeb9b51c43efb. If you want to do a PR, I can have a look at it and add Postgres as well.
Sure, the only thing harder to change is the
Here it is: #6743
Closing via #7214 |
In a similar way to #6060, we tried to instrument our Parse-Server production server using AWS X-Ray (we're using an Elastic Beanstalk stack), and it seems we do a lot of requests to the cache to load the schema for any single request (it's very high for some cloud functions we have).
When looking at the code, a weird thing is that we often call the loadSchema() method; then, when it's fulfilled, we call the getOneSchema() one on the resulting schemaController, and both of them get the full schema from the cache - with a big schema it's not a free call (it's around 5ms each in our case). Is there a specific reason for that? Is there room for improvement?
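One way to avoid the double cache hit described above is to memoize the full-schema fetch for the lifetime of a single request. This is an illustrative sketch, with hypothetical class and adapter names rather than parse-server's actual API: loadSchema() fetches from the cache at most once, and getOneSchema() answers from that memoized copy.

```javascript
// Hypothetical per-request controller: the full schema is fetched from
// the cache adapter once, then every getOneSchema() call reuses it.
class RequestSchemaController {
  constructor(cacheAdapter) {
    this.cacheAdapter = cacheAdapter;
    this.fullSchemaPromise = null; // per-request memo
  }
  loadSchema() {
    if (!this.fullSchemaPromise) {
      // first caller pays the cache round trip; everyone else shares it
      this.fullSchemaPromise = this.cacheAdapter.getAllClasses();
    }
    return this.fullSchemaPromise;
  }
  async getOneSchema(className) {
    const all = await this.loadSchema(); // no second cache lookup
    return all.find(s => s.className === className);
  }
}
```

With a 5ms cache round trip, collapsing the loadSchema() + getOneSchema() pair into one fetch halves that cost per lookup, and parallel queries in the same request share a single fetch.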