
Firestore very slow loading from cache #2466

Closed
andyboyd opened this issue Mar 4, 2019 · 4 comments

@andyboyd commented Mar 4, 2019

  • Xcode version: 10.1
  • Firebase SDK version: 5.17.0
  • Firebase Component: Firestore
  • Component version: 1.0.1

It's difficult to describe this in a "steps to reproduce" kind of way, but the general problem is that a particular listener is very slow to return results the first time it's set up when offline caching is enabled.

Our database structure is reasonably complex: we have a collection called groups, which has subcollections for users and chat threads. Each chat thread document itself has subcollections for messages, read receipts and chat participants. Normally, every user in the group is in every chat thread.

To render the group in our app, we have snapshot listeners on the group document itself, on the collection of chat threads, and on the collection of users.

We noticed on one of our larger, more active groups that it takes a long time (about 5 seconds on an iPhone X, reportedly 30 seconds to multiple minutes on older devices) to load the group every time the user navigates to its screen. This behaviour only becomes apparent once a user has viewed several of the chat threads; until then the group loads quickly.

We originally had an order-by-date query set up on the list of chat threads, and discovered that the Firebase iOS SDK can't cache that query (lack of client-side indexing, I believe?), so it has to pull down the full collection every time. We removed the ordering from the query, now just listen to the collection itself, and sort it in our own code, which improved things.

However, while it no longer displays the long lag every time the screen loads, it still has the same lag the first time the query is run. We can largely get rid of this by turning off offline persistence. Doing so reduces the lag significantly, but it has the drawback of requiring an internet connection at all times, and it also makes certain operations appear much slower to the user, since they're now dependent on the connection. This isn't really a viable solution for us.
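For reference, this is roughly how we toggled persistence off while testing (a settings sketch; `isPersistenceEnabled` is the Firestore iOS SDK flag for offline persistence in this SDK generation):

```swift
import FirebaseFirestore

// Sketch: disable offline persistence before first use of Firestore.
// Must be set before any other Firestore calls.
let settings = FirestoreSettings()
settings.isPersistenceEnabled = false
Firestore.firestore().settings = settings
```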

From the research I've done, it seems Firestore uses LevelDB internally for its offline caching, and LevelDB doesn't support indexing. My hunch is that whenever the data is accessed from the cache it has to be unpacked into memory, since without indexing in LevelDB, this must be the only way to get any kind of query performance. It seems to me that, however the cache is constructed, the structure of our data means the actual contents of the chat threads are being included in that "unpacking" operation, which is why performance only degrades once the threads have been opened (and therefore cached).

I have three questions off the back of this:

  1. Are my assumptions about why this is happening somewhere close to the mark?
  2. Are there any ways I can tweak the caching behaviour that might help alleviate this?
  3. If the only thing that will properly fix this is client-side indexing in the SDK, is that something that's being worked on?
@mikelehen (Contributor) commented Mar 4, 2019

Thank you for the report and sorry you're running into perf issues. :-( A couple notes:

> This behaviour only becomes apparent once a user has viewed several of the chat threads; until then the group loads quickly.

I think I understand part of what's happening here, and there's a relatively simple optimization we should be able to make. Right now, when you do a query over a collection (e.g. /groups/123/threads), we actually end up enumerating and parsing documents in subcollections underneath that collection as well (e.g. /groups/123/threads/456/messages/789) even though they cannot match the query. Because of the way data is persisted locally and the current lack of client-side indexing, we can't avoid enumerating those documents, but we should be able to avoid parsing them via a simple inspection of the document path, and that should help performance considerably. We already do this in our Android client (https://github.com/firebase/firebase-android-sdk/blob/master/firebase-firestore/src/main/java/com/google/firebase/firestore/local/SQLiteRemoteDocumentCache.java#L122), but that short-cut seems to be missing from iOS. I've opened an internal bug (b/127302263) and, barring complications, I should be able to open a PR for this shortly.
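The path inspection in question can be sketched in plain Swift (illustrative only, not the SDK's actual code): a document is an immediate child of a collection exactly when its path has one more segment than the collection path and shares the collection path as a prefix, so subcollection documents can be skipped without parsing them.

```swift
// Returns true only for documents directly inside the collection, so
// deeper subcollection documents can be skipped without being parsed.
func isImmediateChild(documentPath: String, ofCollection collectionPath: String) -> Bool {
    let docSegments = documentPath.split(separator: "/")
    let collSegments = collectionPath.split(separator: "/")
    // An immediate child has exactly one more path segment than the collection...
    guard docSegments.count == collSegments.count + 1 else { return false }
    // ...and shares the collection path as a prefix.
    return zip(docSegments, collSegments).allSatisfy { $0.0 == $0.1 }
}
```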

> We originally had an order-by-date query set up on the list of chat threads, and discovered that the Firebase iOS SDK can't cache that query (lack of client-side indexing, I believe?), so it has to pull down the full collection every time. We removed the ordering from the query, now just listen to the collection itself, and sort it in our own code, which improved things.

I'm having trouble understanding this. While we do lack client-side indexing right now (which means we must read each document in the collection and then apply the query filters in memory), that shouldn't really matter with respect to ordering, because we're going to sort the documents that matched the query in memory regardless. (If you don't provide your own orderBy, we order by document ID, but that should be about the same amount of work as any orderBy you specify.) So if you're sure you saw a performance improvement, I'd be interested in a bit more detail: the exact queries you were running and a rough measurement of the perf difference.

As for your questions:

  1. Yes, definitely close to the mark. Generally speaking, we store documents as a linear list, ordered by path. To perform queries we have to scan over the part of the list that could match the query (based on the query path), reading each document and applying the query filters. Once we've found all the documents that match the query filters we apply the orderBy and limit constraints.
  2. The one piece of tuning you can do today is set the cacheSizeBytes setting. You can try lowering this setting to reduce the accumulated amount of cached data, which should help keep queries from getting too slow (example below). If you continue to struggle with perf even after the fix I've mentioned, the other thing you could consider is restructuring your data to avoid subcollections, but this would be an unfortunate concession.
  3. As mentioned, I think there's a short-term optimization we can do that should help. Client-side indexing is definitely on our roadmap, and a lot of the initial design work has been done. That said, we have a couple other large efforts that are currently preventing us from actively working on it right now.
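The scan-then-filter-then-sort model from point 1 can be sketched as a toy in plain Swift (hypothetical `CachedDoc` type and integer fields for illustration; the real cache stores serialized documents):

```swift
// Toy model of the local query path described above: the cache is a list of
// documents ordered by path; a query scans the path range, applies filters
// per document, then applies orderBy and limit to the matches.
struct CachedDoc {
    let path: String
    let fields: [String: Int]
}

func runQuery(cache: [CachedDoc],          // assumed sorted by path
              collection: String,
              filter: (CachedDoc) -> Bool,
              orderBy key: String,
              limit: Int) -> [CachedDoc] {
    let matches = cache
        .filter { $0.path.hasPrefix(collection + "/") } // scan the path range
        .filter(filter)                                 // apply query filters in memory
        .sorted { ($0.fields[key] ?? 0) < ($1.fields[key] ?? 0) } // then orderBy
    return Array(matches.prefix(limit))                 // then limit
}
```

Note that the naive prefix check here would also sweep up subcollection documents, which is exactly the enumeration cost discussed earlier in this thread.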

Example of setting cacheSizeBytes:

  let settings = FirestoreSettings()
  settings.cacheSizeBytes = 1024*1024
  firestore.settings = settings

Hope this helps! I'll follow up once we've implemented the fix I mentioned.

@mikelehen (Contributor) commented Mar 5, 2019

I've made the perf improvement I mentioned. I believe you can try it out by pulling from github master:

pod 'FirebaseCore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'master'
pod 'FirebaseFirestore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'master'

Otherwise it should be released in the next few weeks (note it missed the cut for our very next release, so it'll be in the one after that).

I suspect this will provide a meaningful improvement, and it's likely all we can do until we tackle the client-side indexing work I mentioned, so I'm closing this for now. But feel free to re-open if you have additional questions or concerns.

@mikelehen closed this Mar 5, 2019
@andyboyd (Author) commented Mar 6, 2019

Thanks @mikelehen !

The fix you applied does seem to have resolved the issue for us. Which is awesome, thanks for the fast response.

To respond to the other points in the discussion above, I had already tried setting the cache size, which unfortunately didn't improve anything. But your fix has made it a moot point, so yay.

Regarding the ordered query, the query was along the lines of:

groupDocument
    .collection("chat_threads")
    .order(by: "lastPostedMessage.createdDate", descending: true)
    .addSnapshotListener { snapshot, error in
        // render the thread list from the snapshot
    }

And it was taking several seconds to respond every time it ran. On my phone it was around 5-10 seconds, but that's an iPhone X, so a pretty fast device. Some of our users were reporting multi-minute waits. The time did seem to scale with how many of the documents in the collection had been cached by the user. As I mentioned, there were subcollections under those documents, some of which have hundreds of documents in them, and one of those document types has a map that would have around 200 entries in this particular scenario.

When I removed the order(by:descending:) part of the query, the lag was still present the first time the query was run, but subsequent executions were fast. I presume that's because the data was already in memory and didn't need to be read from the cache again.
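The client-side sort we switched to is roughly this shape (hypothetical `ChatThread` type; the real model decodes more fields from the snapshot):

```swift
import Foundation

// Sketch of sorting thread documents in app code instead of in the query,
// equivalent to order(by: "lastPostedMessage.createdDate", descending: true).
struct ChatThread {
    let id: String
    let lastPostedDate: Date
}

func sortThreads(_ threads: [ChatThread]) -> [ChatThread] {
    return threads.sorted { $0.lastPostedDate > $1.lastPostedDate }
}
```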

Not sure if that's helpful or not, but in any case your fix has resolved the issue as far as I'm concerned, so thanks again. Looking forward to when you get the client side indexing in to hopefully bring more performance improvements.

Thanks for the good work!

@paulb777 added this to the M45 milestone Mar 6, 2019
@mikelehen (Contributor) commented Mar 6, 2019

Great to hear! Thanks for confirming that the fix had the desired effect.

And thanks for the additional details about the ordered query. Offhand I'd expect the query to behave exactly the same performance-wise, with and without the orderBy. But as it's a moot point now, I'm not going to worry over it. 😄

@firebase locked and limited conversation to collaborators Oct 21, 2019