Low-end and mid-range devices cannot handle large collections of documents. #1052

Closed
markic22 opened this issue Dec 12, 2019 · 12 comments

Comments

@markic22

markic22 commented Dec 12, 2019

[REQUIRED] Step 1: Describe your environment

  • Android Studio version: 3.5.2
  • Firebase Component: firebase-firestore
  • Component version: 21.3.0

[REQUIRED] Step 2: Describe the problem

Steps to reproduce:

The main issue we are having is with large collections of documents. When we have, for example, a collection of 6000 documents and need all of them on the device, the phone/app runs out of memory, or the allocation goes on for hours.

To give you a real-world example: we have a collection of 6000 documents, and when I add a snapshot listener to get those documents on my Samsung A20s (just an example of a low-end device), the Firestore SDK cannot handle it. It just goes on and on forever, trying to allocate enough memory to perform the insert into the local database, I guess. This is a deal breaker for us, because we need all that data, in some cases combining over 40,000 documents. I know about query limits, that we can set a max size and only get a limited number of documents, but we need all of it. Pagination is not really the answer here either, as we need this data at the start of the application to do calculations later on over all of the data combined.

The issue was reported to Firestore support; they confirmed that they could reproduce it and pointed me to this GitHub page.

Relevant Code:

Here is a sample app I made to demonstrate the issue. You need to add your own google-services.json. Then use the app to create around 6k documents (each button press adds 1000, but you must wait for all of them to be uploaded), then uninstall the app or clear its storage (once you have 6k documents online) and open the app on a low-end or probably mid-range device.

The result will be an OOM or simply an endless attempt to allocate memory for this job.

https://github.com/markic22/largeDataFirestore

Let me know if you need any additional data!

Regards, Marko.

@google-oss-bot
Contributor

I found a few problems with this issue:

  • I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.
  • This issue does not seem to follow the issue template. Make sure you provide all the required information.

@aguatno

aguatno commented Dec 13, 2019

Hi @markic22, thank you for taking the time to file this issue. I was able to get the application to crash with an OOM error. Could you confirm whether this error log is similar to what you see on your end?

@markic22
Author

Hi @aguatno. Thanks for checking the issue so fast.

There are actually different possible outcomes. On some real devices (Samsung A20s) it just tries to allocate memory indefinitely, logging something about the GC trying to allocate memory, with each document taking a couple of seconds, and never crashes at all. Once I waited for 5 hours and it was still trying to allocate. On other devices I get the same crash as you did here.

@aguatno

aguatno commented Dec 13, 2019

Could you share a screenshot of the log showing "it's just trying to allocate memory indefinitely" so I know where to look? Thanks

@markic22
Author

markic22 commented Dec 13, 2019

Sure, here is the screenshot of what is happening.

Now it also crashed... similar to yours above:

2019-12-13 18:27:23.249 32166-32183/com.example.largecollectiontest E/System: java.lang.OutOfMemoryError: Failed to allocate a 40 byte allocation with 0 free bytes and 0B until OOM, max allowed footprint 201326592, growth limit 201326592
    at java.lang.ref.FinalizerReference.add(FinalizerReference.java:56)
    at com.android.internal.os.BinderInternal$GcWatcher.finalize(BinderInternal.java:62)
    at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:250)
    at java.lang.Daemons$FinalizerDaemon.runInternal(Daemons.java:237)
    at java.lang.Daemons$Daemon.run(Daemons.java:103)
    at java.lang.Thread.run(Thread.java:764)

@wilhuff
Contributor

wilhuff commented Dec 13, 2019

Unfortunately, you're running up against limitations in the Firestore SDK and the VM on these kinds of devices. From the log in the screenshot you posted, the heap is limited to 192 MB, which sounds like a lot except it's shared with code, and whatever else you're using. To get any reasonable performance you need to use significantly less than the maximum heap.
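As a quick check of the 192 MB figure, the growth limit in the posted OutOfMemoryError converts as shown in the sketch below. The `estimateMaxDocuments` helper is purely illustrative: the usable share of the heap and the per-document cost are assumptions a caller would have to supply, not numbers measured from the Firestore SDK.

```java
public class HeapBudget {
    // Growth limit in bytes, copied from the OutOfMemoryError log in this thread.
    static final long GROWTH_LIMIT_BYTES = 201_326_592L;

    /** Converts a byte count to whole mebibytes. */
    static long toMebibytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /**
     * Rough upper bound on how many documents fit in a share of the heap.
     * usablePercent and bytesPerDocument are caller-supplied guesses (assumptions),
     * not measured Firestore values.
     */
    static long estimateMaxDocuments(long heapBytes, int usablePercent, long bytesPerDocument) {
        long budget = heapBytes * usablePercent / 100;
        return budget / bytesPerDocument;
    }

    public static void main(String[] args) {
        System.out.println(toMebibytes(GROWTH_LIMIT_BYTES) + " MiB"); // prints "192 MiB"
        // Runtime.maxMemory() reports the heap ceiling of the process running this code.
        System.out.println(toMebibytes(Runtime.getRuntime().maxMemory()) + " MiB for this process");
    }
}
```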

Documents use memory, and loading thousands of them uses nontrivial memory. While we have some initiatives underway to reduce memory usage, these are big changes and they're not going to happen in the short term.

Practically speaking what you're trying to do just isn't going to work on this kind of device. You can potentially make it work by disabling persistence. In that case we don't have to convert between the network representation of the data and our backing store, but even this might not be enough if the volume of your data is large enough. It also may not work for your use case.

In any case there really isn't anything we can do for you. My best advice is to find a way to just use less data. Preprocess the dataset elsewhere so that clients can load documents representing the view you want to present rather than trying to calculate it from the underlying data client-side.

@markic22
Author

@wilhuff Thanks for your response.

Well, this is simply unacceptable. In the previous iteration of our application we managed the local database ourselves, and we wrote to it in batches to avoid OOM issues. Can you please point me to the documentation that says this is a limit of using Firestore?

We spent almost 2 years on an application whose data layer is based on Firestore, and now you are saying you have no intention of fixing OOM issues in your product? We only noticed this issue now, because we needed to onboard a client with a database that large.

We need persistent data, as the offline functionality is what got us to choose Firestore in the first place, and yes, we need all that data for our application to be functional.
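For readers unfamiliar with the batching approach mentioned here, a minimal sketch follows. It is not the application's actual code and does not use any Firestore API: `writeInBatches` and the `sink` callback are hypothetical names, with the sink standing in for whatever performs the local database insert. The point is that only one batch is held in memory at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchedWriter {
    /**
     * Drains the source in fixed-size batches and hands each batch to the sink.
     * Returns the number of batches handed over.
     */
    static <T> int writeInBatches(Iterator<T> source, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);                 // e.g. one local database transaction
                batch = new ArrayList<>(batchSize); // let the previous batch be collected
                batches++;
            }
        }
        if (!batch.isEmpty()) {                     // flush the final partial batch
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 6000; i++) {
            rows.add(i);
        }
        int batches = writeInBatches(rows.iterator(), 500, b -> { /* insert into local DB here */ });
        System.out.println(batches); // prints 12
    }
}
```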

@wilhuff
Contributor

wilhuff commented Dec 13, 2019

This isn’t an OOM issue we can fix. You’re trying to load so much data that the device is running out of RAM.

We’ve given performance guidance on this point in various forums over time. A fairly recent example is here: https://firebase.googleblog.com/2019/08/why-is-my-cloud-firestore-query-slow.html.

As I said earlier, we have some ideas for how to reduce memory usage, but these are big changes to how we process document data that can’t be done quickly. They’re not going to address your immediate concern. As much as I’d like to be able to do something, there’s no immediate remediation we can supply. If you want to support this kind of device you have to use less data and fewer documents.

@markic22
Author

OK. So your official response is that the Firestore SDK cannot handle more than a few thousand documents with persistent storage enabled?

I just want to make this completely clear, because I want to write to Firestore support so they can include this information in the documentation and limitations, so other companies and developers do not make the same mistake we did. And I want this answer to be the reference point for them.

I have read this article before, but those "optimisations" do not suit our use case.

@wilhuff
Contributor

wilhuff commented Dec 13, 2019

I know this is frustrating, and I'm sorry I don't have a better answer for you.

What you’re running up against is that Firestore is designed to expose snapshots of data over time. This is different from a regular database like SQLite, where cursors only operate on a single row of data at a time at a single point in time. Firestore exposes the whole result set and shows what changed from snapshot to snapshot, making it really easy to directly integrate into UIs. However, this means it has to be able to hold the whole snapshot plus the metadata and overhead in RAM plus the next snapshot for comparison.

At the same time, we’ve seen some Android devices OOM just trying to hold a moderately sized image in memory. This is why we give the advice that your result sets should be small. Low-end Android devices run out of RAM fast.
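To make the memory cost concrete, here is a toy model of snapshot-to-snapshot comparison. It is a sketch only, not how the SDK is implemented: documents are reduced to an id mapped to a string payload, and the `SnapshotDiff` class is hypothetical. The change types mirror the ADDED, MODIFIED and REMOVED values that Firestore reports for document changes. Note that `diff` needs both complete snapshots in memory at once.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class SnapshotDiff {
    enum ChangeType { ADDED, MODIFIED, REMOVED }

    /**
     * Compares two full snapshots (document id -> payload) and reports what changed.
     * Both maps must be fully resident in memory for the comparison to work.
     */
    static Map<String, ChangeType> diff(Map<String, String> previous, Map<String, String> current) {
        Map<String, ChangeType> changes = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String id = entry.getKey();
            if (!previous.containsKey(id)) {
                changes.put(id, ChangeType.ADDED);
            } else if (!Objects.equals(previous.get(id), entry.getValue())) {
                changes.put(id, ChangeType.MODIFIED);
            }
        }
        for (String id : previous.keySet()) {
            if (!current.containsKey(id)) {
                changes.put(id, ChangeType.REMOVED);
            }
        }
        return changes;
    }
}
```

With 6000 documents per snapshot, this model keeps roughly 12000 payloads alive during a comparison, before counting any metadata or overhead.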

To be clear, many Firestore apps cache thousands of documents in persistence without an issue. The memory problem you're running into is due to trying to read them all in a single snapshot. This isn't how most Firestore apps work and it's not something we're optimized for.

We're happy to work with you on what design changes could be made to your application to make it work better with Firestore. Could you describe some more what these documents represent? Why is it so critical that they all be loaded at once? Are there ways to summarize the data where you keep the summaries in RAM but load main data separately? From what little you've told us of what you're actually doing, it seems like you're trying to do something we're just not designed to handle and there really isn’t any way I can fix that.

@markic22
Author

Sure. For example, we have an application for hotel management. It is meant for maids, supervisors, CEOs and so on. First, we separate our data in the UI by facility; each customer can have multiple facilities (for example, Hilton hotels can have 10 facilities: New York, San Francisco, Las Vegas...).

In each facility we have sectors (floors), spaces, assignments, consumptions (minibar, laundry...)... Hotels can have up to 1500 spaces in one facility and up to 3000 assignments. Assignments are not necessarily tied to spaces.

Spaces are their own collection, assignments are their own collection, and so on.
Assignments, consumptions and other things are related to spaces, and we display all that data on the screen that shows all spaces and, separately, on the assignments list screen.

Because internet connectivity in hotels can be spotty or nonexistent in some areas, we wanted to implement an offline mode, so the maid just needs to log in where she has internet and can then go clean rooms anywhere, without worrying about some data not displaying or not being sent.

Is there any way we can talk through Google Meet, Skype or anything similar, so I can show and explain things in more detail and explain why we made the choices we did?

@wilhuff
Contributor

wilhuff commented Dec 17, 2019

In very broad strokes, the high-level description of the data layout makes sense to me.

However, this high-level description seems to be at odds with the sample you shared. There you're packing all the data into a single "list" collection, discriminating on type. This structure is essentially an anti-pattern and leads to poor client-side performance (for reasons described in the doc I linked). You're much better off using actual collections and subcollections.

Secondarily, the sample is loading all the data for all the types at once, meaning that if it succeeded, you'd have 20k documents in RAM at the same time. I'm surprised this works on any device at all.

You seem to be treating Firestore as a way to bulk synchronize all the data for a given customer or facility to a device but that just won't work. To use Firestore effectively, query for the specific data that applies to the currently logged in user for the specific views you're showing. If you need to perform calculations over all the data, do so server-side (or in a function, etc) so that clients don't need to fetch it.
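As an illustration of the server-side approach, the sketch below reduces raw assignments to one small number per space, so a client would fetch the summary rather than every assignment. Everything here is hypothetical: the `Assignment` fields, the `openAssignmentsPerSpace` name, and the choice of "open assignments per space" as the summary are assumptions, since the thread does not say which calculations the app needs. In practice this logic would run in a backend job or function, not on the device.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacilitySummary {
    /** Hypothetical, simplified view of an assignment document. */
    static final class Assignment {
        final String spaceId;
        final boolean completed;

        Assignment(String spaceId, boolean completed) {
            this.spaceId = spaceId;
            this.completed = completed;
        }
    }

    /** Counts open (not completed) assignments per space; spaces with none are omitted. */
    static Map<String, Integer> openAssignmentsPerSpace(List<Assignment> assignments) {
        Map<String, Integer> open = new TreeMap<>();
        for (Assignment a : assignments) {
            if (!a.completed) {
                open.merge(a.spaceId, 1, Integer::sum);
            }
        }
        return open;
    }

    public static void main(String[] args) {
        List<Assignment> raw = Arrays.asList(
                new Assignment("101", false),
                new Assignment("101", true),
                new Assignment("102", false),
                new Assignment("102", false));
        System.out.println(openAssignmentsPerSpace(raw)); // prints {101=1, 102=2}
    }
}
```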

As far as a meeting goes, please understand that we're a tiny development team and just can't provide that level of support to everyone directly. Our Cloud support organization can consult on these kinds of performance problems and can give you specific advice on your application. We're not able to provide that kind of service here.

@firebase firebase locked and limited conversation to collaborators Jan 13, 2020