
Slow firestore listener performance with large data sets #1477

Closed
KabukiAdam opened this issue Jun 28, 2018 · 23 comments

@KabukiAdam commented Jun 28, 2018

  • Xcode version: 9.4.1
  • Firebase SDK version: 5.3.0
  • Firebase Component: Firestore
  • Component version: 0.12.4

I’ve been experiencing surprisingly slow performance with Firestore listeners on larger data sets (thousands of documents). There are three seemingly related issues; let me know if you’d like me to break this up into separate issues.

All testing was done using a simple sample app running on an iPhone 7.

Slow listener response when querying large data set

For a data set with thousands of documents, it takes several seconds for the initial update from the listener. For example, a listener querying a data set with 3000 simple one-key documents takes 1-2 seconds for the initial listener update closure to be called. From what I’ve seen this appears to be simply the time it takes to query the local cache and return the 3000 documents. This seems surprisingly slow for locally persisted data. I’ve tested this both online and offline with the same results.

I found this closed issue which seems similar, but I wasn't sure if it was the exact same problem or not:
#413

Steps to reproduce:

  1. Create listener to query a data set of 3000 documents.
  2. Measure the time it takes for the first listener update to be called.
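The two steps above can be sketched roughly as follows. This is a minimal sketch, not code from the SDK docs: the collection name `items` and the timing code are placeholders of mine; only the first snapshot delivery is timed.

```swift
import Foundation
import FirebaseFirestore

// Sketch: attach a listener to a ~3000-document collection and log how long
// the first snapshot takes to arrive. "items" is a placeholder collection name.
let db = Firestore.firestore()
let start = Date()
var firstUpdate = true

_ = db.collection("items").addSnapshotListener { snapshot, error in
    guard let snapshot = snapshot else {
        print("Listener error: \(String(describing: error))")
        return
    }
    if firstUpdate {
        firstUpdate = false
        let elapsed = Date().timeIntervalSince(start)
        print("First update: \(snapshot.documents.count) docs after \(elapsed)s")
    }
}
```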

Paging with limit() does not help (actually a bit slower)

I looked into using limit(to: 100) on my listener query to only return the first 100 results, thinking this might get me faster results from the listener. But it was actually a bit slower when I did this. Should paging using limit() improve the performance of the listener query?

Steps to reproduce:

  1. Create listener to query a data set of 3000 documents, with a .limit(to: 100) added to the end of the query.
  2. Measure the time it takes for the first listener update to be called.
  3. It will take slightly longer than without the limit (maybe 10% longer).

Listener performance is worse immediately after adding the large data set

If I set up the listener after adding the 3000 documents from the app (without a delete/reinstall in between), the initial listener query takes MUCH longer, around 20 seconds. This happens even after a re-launch of the app, and after I’ve made sure all the documents have been added to the cloud. If I then delete and reinstall the app, the listener performs at the 1-2 second timing again. It almost feels as if, after adding the 3000 documents, the local cache is in some degraded-performance state and a reinstall lets it re-cache everything cleanly (but I have no evidence that this is what’s actually happening).

Steps to reproduce:

  1. From an empty database, add 3000 simple documents from the app. Wait for all documents to be written to the cloud.
  2. Create listener to query that same set of 3000 documents.
  3. Measure the time it takes for the first listener update to be called.
  4. It will take much longer, like 20 seconds or so.
  5. Force-quit the app, repeat steps 2-4; same result.
  6. Delete and re-install the app, repeat steps 2-3; the measured time is now 1-2 seconds.
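For reference, step 1 can be sketched like this (a sketch only: a tight loop of individual setData() calls, with a DispatchGroup to know when every completion block has fired; `items` and the document payload are placeholders of mine):

```swift
import Foundation
import FirebaseFirestore

// Sketch of step 1: write 3000 small documents with individual setData calls,
// then wait for all completion blocks before attaching the listener.
// "items" and the ["value": i] payload are placeholders.
let db = Firestore.firestore()
let group = DispatchGroup()

for i in 0..<3000 {
    group.enter()
    db.collection("items").document("doc\(i)").setData(["value": i]) { error in
        if let error = error { print("Write \(i) failed: \(error)") }
        group.leave()
    }
}

group.notify(queue: .main) {
    print("All 3000 completion blocks fired; now attach the listener (step 2).")
}
```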

I first ran into these issues with our own in-development app. But I’ve been able to consistently reproduce them using a very simple test app (which I can share if that would help).

@mikelehen (Contributor) commented Jun 29, 2018

@KabukiAdam Thank you for this detailed performance feedback! This is useful and I appreciate you taking the time to provide it. I haven't dug in too deep, but I'll briefly respond with my initial analysis.

Your first point (slow w/ large data set) is semi-expected. We may be able to optimize this some, but on the order of ~1 second per 1000 documents is probably expected right now. We generally assume apps will be querying smaller result sets (on the order of what fits on a screen).

Your second issue (paging with limit() does not help) is expected right now but we have planned improvements that should help a lot. Currently the client does not implement its own indexing of data which means that in order to satisfy your limit(), we need to load all of the data, sort it, and then apply the limit in-memory. As you noticed, this is just as slow (or slower) than querying the full data unfortunately. Once we implement client-side indexing, limit(100) should be ~30x faster than the full 3000 documents.
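The explanation above can be illustrated with a toy model in plain Swift (this is my own illustration, not the SDK's actual code): without a client-side index, the cache must scan and sort every document before the limit can be applied, so the limit saves nothing.

```swift
import Foundation

// Toy model (not SDK code) of what an unindexed client must do to satisfy
// something like query.order(by: "name").limit(to: 100): read every cached
// document, sort all of them in memory, and only then apply the limit.
struct Doc {
    let id: String
    let name: String
}

func unindexedLimitQuery(_ cache: [Doc], limit: Int) -> [Doc] {
    // O(n log n) over the ENTIRE cache, regardless of the limit...
    let sorted = cache.sorted { $0.name < $1.name }
    // ...because the limit can only be applied after the full sort.
    return Array(sorted.prefix(limit))
}

// 3000 docs with names "00001"..."03000", deliberately stored out of order.
let cache = (0..<3000).map { Doc(id: "doc\($0)", name: String(format: "%05d", 3000 - $0)) }
let page = unindexedLimitQuery(cache, limit: 100)
print("Returned \(page.count) of \(cache.count) scanned docs; first = \(page.first!.name)")
```

With a client-side index on `name`, the same query could instead read just the first 100 entries in index order, which is where the ~30x estimate for limit(100) vs. the full 3000 documents comes from.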

Your 3rd item (listener performance worse immediately after adding the large data set) is very surprising to me. I could imagine there could be some performance quirks while those writes have not yet been committed to the backend, but once they're committed I don't know why there would be a lingering performance issue. Can you confirm your completion block(s) were called? And were you using the WriteBatch API to do multiple writes at once? Or just 3000 individual setData() calls or similar? If it's easy to provide a repro app, that would make it much faster for us to investigate. Else we'll try to get this assigned to somebody for repro and investigation in the coming weeks.
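For comparison with 3000 individual setData() calls, a WriteBatch version might look like the sketch below. This is an assumption-laden sketch, not a prescribed fix: `items` is a placeholder collection, and batched writes were capped at 500 operations per batch at the time of this thread, hence the chunking.

```swift
import FirebaseFirestore

// Sketch: commit 3000 writes as chunked batches instead of 3000 individual
// setData() calls. Batches were capped at 500 operations each at the time,
// so the writes are split into 6 batches. "items" is a placeholder.
let db = Firestore.firestore()
let chunkSize = 500

for chunkStart in stride(from: 0, to: 3000, by: chunkSize) {
    let batch = db.batch()
    for i in chunkStart..<min(chunkStart + chunkSize, 3000) {
        let ref = db.collection("items").document("doc\(i)")
        batch.setData(["value": i], forDocument: ref)
    }
    batch.commit { error in
        if let error = error { print("Batch starting at \(chunkStart) failed: \(error)") }
    }
}
```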

Thanks again for the feedback!

@KabukiAdam (Author) commented Jul 2, 2018

@mikelehen Thank you for the prompt and helpful reply.

Regarding the listener performance with a large data set, it's good to hear that my results are not unexpected. However, I can't help but feel this is still "slow" relative to what one would typically expect for locally-persisted data queries (similar queries using Core Data or SQLite are ~100 times faster). I agree that 3000 documents is far more than would ever fit on a screen, but it is fairly common to have a very long list in the UI that the user can scroll through quickly (even if only a small subset appears on screen at once). And there are cases where even a screenful of data could number 100 documents (for example, "events" on a month-view calendar), which would take ~0.1 seconds to query; that's just barely fast enough given the expected UI latencies for native mobile apps. Anyway, that's just some extra feedback on the performance in general. If there's a better place to give this kind of feedback, I'd be happy to do so and go into more detail. :)

Regarding the issue of paging with limit(), I'm very glad to hear that there are plans to implement client-side indexing to proportionally improve performance. Is there any information about the plan/timeframe for this to be implemented? I know time estimates (especially public ones) are somewhat tricky, but it would be extremely helpful if we had some idea about the timeframe you're shooting for. (Our app will depend on paging to deal with querying these larger data sets).

Regarding the issue of listener performance being worse right after adding the large data set: in my tests, it was doing 3000 individual setData() calls (in a tight loop). And yes, I can confirm that the completion block is called for each write.

I’ve pushed a copy of our test app to Github here: https://github.com/KabukiAdam/FirestoreListenerTest

Steps to reproduce using the above test app:

  1. Run app, tap “Test Large Data Set” to create the 3000 documents. The on-screen log will show when each callback is called after creating the document. Check Firebase console to confirm that all documents have been created.
  2. Force-quit the app, disable networking (to guarantee a local query), open the app, tap Create Listener. Takes ~20s. NOTE: if you instead tap Create Listener at the end of step 1, it only takes ~7s here; not sure why the difference.
  3. Delete the app from the device (to start with an empty local cache), enable networking, open the app, tap Create Listener. The first time it needs to query from the cloud, so it takes a bit longer, but subsequent Create Listener taps only take ~1.2s.
  4. Force-quit the app, disable networking (to guarantee a local query), open the app, tap Create Listener. Takes ~1.2s, and always will from then on.

Results summary: ~1.2s seems to be the "normal" time, but I only see it after a delete/re-install. The listener time right after adding the documents is much longer (and it never seems to come down on its own).

Thanks again for your help with this.

@rsgowman self-assigned this Jul 3, 2018
@rsgowman (Member) commented Jul 4, 2018

@KabukiAdam I think we've got this reproduced locally now. The issue seems to stem from looking up the 3000 documents locally in the underlying leveldb datastore, specifically when searching for local mutations to the documents (of which, there aren't any in this case, so those searches all fail as expected.)

Compacting the underlying database during startup helps this particular situation quite a bit; when I change the code to do that, step 2 goes from ~20s to ~1.2s, as you'd otherwise expect. But I'm not sure about the other implications of that change. We'll have to discuss this internally and see what the best approach is. I'll update this issue again once we've figured that out.

@businessengine commented Jul 5, 2018

We faced the same issues: we have 10 collections with an average of 500-3000 docs each, all of which must be listened to when the app starts. When the number of docs in each is small (e.g. 100), performance is fine, but at an average of 500-3000 docs (each doc is under 2 KB) it becomes extremely slow, both online and offline.

Profiling: [profiler screenshot omitted]

Local DB cache: 23 MB [screenshot omitted]
@rsgowman We're interested in how you compact the leveldb as a quick remedy right now. Could you give us a tip on that? Thanks.

@rsgowman (Member) commented Jul 5, 2018

@businessengine Check out #1494. It's not a finalized PR, but demonstrates the issue.

However, note that the specific performance degradation this seems to help with only applies to offline queries. When online, simply creating the query is enough to make leveldb perform quickly, so this may not be relevant for your use case at all.

As Mike pointed out up-thread, large query sets could run as slow as ~1 second per 1000 docs. So if you're querying (10 collections * 3000 docs each = ) 30000 docs and it's taking ~30s (according to your profiler capture), this may be in line with current expectations.

Nonetheless, if you'd like to try patching your copy of firestore with the change in the above PR and report back here, it'd be incredibly useful for us to know if that improved things for you or not. (Alternatively, if you can provide a minimal repro app, I can make the above change and see if that helps.)

@businessengine commented Jul 6, 2018

Thanks @rsgowman, we applied the patch and saw the leveldb size drop from 23 to 15 MB; however, overall performance is still the same (or the difference is unnoticeable). We may have to consider caching big collections ourselves with an alternate db such as Realm or SnappyDB. Is there any roadmap/milestone for when the Firebase team will work on the client-side indexing feature, which may help improve performance for cases like this? Thanks.

@businessengine commented Jul 7, 2018

We noticed differences when running our app offline vs. online: offline is much faster (e.g. instantly responsive when we update data).

@var-const (Contributor) commented Jul 14, 2018

@KabukiAdam Thanks for a great repro case! Experimenting locally with the test app you provided, it appears that #1533 significantly speeds up creating a listener for several thousand documents. If you have the opportunity, it would be great if you could try out that branch on your side, just to check whether you see a similar performance improvement. (I slightly modified the app to create 9K documents. In release mode on an iPhone 8 Plus simulator in Xcode 9.4, I get a ~54.3s delay before listener creation on master and ~2.9s on the branch. If you see comparable numbers, I think it should solve the issue.)

@businessengine I can't verify locally, but #1533 optimizes the code path that shows up in the stack trace you provided, so it's reasonable to presume it should improve performance for your case as well. If you have the opportunity to try out that branch on your side, I'd be very interested to know the results. Thanks!

@businessengine commented Jul 14, 2018

Thanks @var-const Great news! We will try it out today.

@businessengine commented Jul 14, 2018

Tried the branch varconst/coll-opt but got a missing-header error: 'FirebaseCore/FIRAppEnvironmentUtil.h' file not found

We used
pod 'FirebaseCore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'varconst/coll-opt'
pod 'FirebaseFirestore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'varconst/coll-opt'

Any help, please?

@var-const (Contributor) commented Jul 14, 2018

@businessengine Oh, really sorry about the trouble. There was some project restructuring recently, and this failure looks related to it. The way I had it in my Podfile was to use the non-standard pod only for Firestore:

pod 'Firebase/Core'
pod 'GoogleUtilities'
pod 'FirebaseFirestore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'varconst/coll-opt'

I'll look into this more closely tomorrow.

@paulb777 (Member) commented Jul 14, 2018

Our pod structure is in flux until the next release, and installing from branches will vary by component and over time until we ship the M30 milestone. More background at #1535.

Since Firestore is still compatible with both the latest and last release dependencies, it should be sufficient to add the one line to the Podfile:

pod 'FirebaseFirestore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'varconst/coll-opt'

@KabukiAdam (Author) commented Jul 16, 2018

@var-const Yes, I was able to confirm that #1533 does seem to fix the long listener times that I was experiencing right after creating a large set of documents (the issue I described under Listener performance is worse immediately after adding the large data set in my original post). The listener time now seems to be the same as it would be after a fresh install.

Thanks for quick investigation and fix!

@var-const (Contributor) commented Jul 16, 2018

@KabukiAdam That's great to hear, thanks for the confirmation!

@businessengine To echo what @paulb777 Paul said, I could successfully build FirestoreListenerTest by only pointing the FirebaseFirestore pod to a Git branch, so that the Podfile looks like:

use_frameworks!

target 'FirestoreListenerTest' do
   pod 'Firebase/Core'
   pod 'FirebaseFirestore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'varconst/coll-opt'
end

However, if I change FirebaseCore to also point to Git, I start having issues similar to those you're describing. Please let us know if you still run into build errors when having a Podfile similar to the snippet above.

@businessengine commented Jul 18, 2018

Thanks @var-const we used that same Pod content too.

@var-const (Contributor) commented Jul 18, 2018

@businessengine Are you still seeing build errors with a similar Podfile? If errors persist, could you share the Firestore-related bits of your Podfile and the errors you're seeing, and I'll take a look.

@businessengine commented Jul 19, 2018

@var-const Sorry for the misunderstanding; we were able to build successfully with the above pod lines. We noticed a huge performance boost in batch writing, but caching 10-15 big collections when the app starts is about the same as before. We can extract our code and send it to you via email for further investigation. Please let us know if that works for you.

@var-const (Contributor) commented Jul 20, 2018

@businessengine Would you be able to upload a snippet of the problematic code to Github? It doesn't necessarily have to compile as long as it clearly demonstrates the issue (of course, a minimal test app is best if at all possible).

@var-const (Contributor) commented Jul 27, 2018

@KabukiAdam The fix is part of Firestore 0.13.0. I'm not sure whether it takes care of all the issues you listed in the original post or only the 3rd one ("Listener performance is worse immediately after adding the large data set"). Please let me know if you're still experiencing any of the performance issues outlined.

@mono0926 commented Aug 3, 2018

I noticed that Firebase 5.5.0, which contains Firestore-0.13.0, was released and I tried to use it.

I was troubled by this performance issue before v5.5.0, but after updating to v5.5.0 it seems to be fixed 👍

@wilhuff added this to the 5.5.0 milestone Aug 3, 2018
@wilhuff (Contributor) commented Aug 3, 2018

This fix was released as a part of Firebase 5.5.0.

@wilhuff closed this Aug 3, 2018
@KabukiAdam (Author) commented Aug 7, 2018

@var-const Thanks, and sorry for the late reply. It looks like this only fixes the 3rd part, which is what I expected.

From previous replies, it sounds like the first issue I reported ("Slow listener response when querying large data set") might simply be the intended performance.

The second issue ("Paging with limit() does not help (actually a bit slower)") is still an issue. Should I create a new Github issue so it can be tracked separately from this one?

@var-const (Contributor) commented Aug 7, 2018

@KabukiAdam Thanks for clarifying! Yes, it would be great if you could create a new issue.
