Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Problems when working with large datasets #2499

Closed
thekevinbrown opened this issue Aug 27, 2019 · 21 comments
Closed

Performance Problems when working with large datasets #2499

thekevinbrown opened this issue Aug 27, 2019 · 21 comments
Assignees

Comments

@thekevinbrown
Copy link

thekevinbrown commented Aug 27, 2019

Goals

Our app allows users to sort the photos on their device in a unique way. This means they can't just use the built in photos app to do the sorting.

As a result we have Photos and Albums. We store a URI to the photo and what albums the photo has been sorted to.

Expected Results

Writing 50 records should take less than a second.

Actual Results

If there are more than 5,000 records in the database, adding 50 more takes ~1 second. Since this is react native this absolutely kills the responsiveness of the app.

NOTE: This is definitely a release build, Android.

Steps to Reproduce

I'll create a repro if necessary, just want to make sure this isn't expected behaviour before I do.

Code Sample

export class Photo {
	static schema: ObjectSchema = {
		name: 'Photo',
		primaryKey: 'uri',
		properties: {
			timestamp: { type: 'date', indexed: true },
			uri: 'string',
			width: 'double',
			height: 'double',
			albums: 'Album[]',
			inTrash: { type: 'bool', default: false, indexed: true },
			lastSeenImportId: { type: 'string?', indexed: true },
		},
	};

	// ...
}

export class Album {
	static schema: ObjectSchema = {
		name: 'Album',
		primaryKey: 'id',
		properties: {
			id: 'string',
			name: 'string',
			photos: { type: 'linkingObjects', objectType: 'Photo', property: 'albums' },
		},
	};

	// ...
}

Version of Realm and Tooling

  • Realm JS SDK Version: 2.29.2
  • React Native: 0.59.10
  • Client OS & Version: Android, all tested versions.
  • Which debugger for React Native: None, Release build with console.log profiling.
@kneth
Copy link
Member

kneth commented Aug 27, 2019

Do you write the records in one big transaction or 50 small transactions? The overhead of durability guaranty can be large, so smaller transaction can be expensive.

@bmunkholm
Copy link
Contributor

Also ensure you don't open/close the entire Realm every time.

A simple repro would be appreciated in case none of the above helps you.

@thekevinbrown
Copy link
Author

It's one big transaction and we leave a persistent Realm connection open.

I'll get to work on a repro.

@thekevinbrown
Copy link
Author

Alright, I haven't reproduced the slow writes yet, but I'm definitely getting much slower reads than I'd expect when doing a slice to get the end of an array. It looks like it's doing a scan through the whole dataset to find the last 3 records?

I can work around this in my app by sorting differently and grabbing the first record, but yeah, it's surprising behaviour to me.

Here's the repro:
https://github.com/thekevinbrown/realm-js-slow-writes

@thekevinbrown
Copy link
Author

thekevinbrown commented Aug 28, 2019

Still haven't been able to reproduce this in my minimal test case, but in the full app I've narrowed it down. It's the lag between calling realm.write and the start of my callback that should run in the transaction. During this lag the thread is locked.

const start = new Date();
realm.write(() => {
    console.log(
        `Transaction start time: ${(new Date().getTime() - start.getTime()) / 1000} seconds`
    );

    // Rest of my saving logic
});

Produces:

Transaction start time: 0.078 seconds
Transaction start time: 6.469 seconds
Transaction start time: 12.144 seconds
Transaction start time: 17.062 seconds
Transaction start time: 22.686 seconds
Transaction start time: 27.618 seconds
Transaction start time: 32.302 seconds
Transaction start time: 30.094 seconds
Transaction start time: 28.346 seconds
Transaction start time: 34.79 seconds
Transaction start time: 29.006 seconds
Transaction start time: 33.876 seconds
Transaction start time: 26.208 seconds
Transaction start time: 29.622 seconds
Transaction start time: 36.142 seconds
Transaction start time: 33.235 seconds
Transaction start time: 37.449 seconds
Transaction start time: 27.984 seconds
Transaction start time: 28.512 seconds
Transaction start time: 31.641 seconds
Transaction start time: 33.39 seconds
Transaction start time: 35.34 seconds
Transaction start time: 39.772 seconds
Transaction start time: 57.994 seconds
Transaction start time: 66.733 seconds
Transaction start time: 55.885 seconds
Transaction start time: 56.414 seconds

...etc

What can I do to figure out why this is taking so long? Is there logging of Realm internals I can enable?

I'm also happy to send the full app to you guys privately if that'd help.

@bmunkholm
Copy link
Contributor

Hmm,regarding read perforamnce: I would hope slice is implemented better than that :-) We just provide an iterator to slice.
Sorting in the other direction may help. Also be aware that keeping your sorted resultset around instead of regenerating it every time you want to use it can be an advantage if you have a large resultset.
You could also time the different operations to see if it really is slice or the filter or the sorting that takes up time.
There is also the option to use "SORT(timestamp, DESC) LIMIT(3)" in the filter predicate, which you give much better performance (I noticed that hasn't been documented actually... our bad - that will be fixed.).

Another thing, using .filtered('inTrash == false').filtered('albums.@count > 0') is slower than including both conditions in just one .filtered('inTrash == false AND albums.@count > 0').
So in conclusion, try to use:
.filtered('inTrash == false AND albums.@count > 0 SORT(timestamp, DESC) LIMIT(3)') and see if that helps.

@thekevinbrown
Copy link
Author

thekevinbrown commented Aug 28, 2019

@bmunkholm, thanks for that. I definitely see 18 second read times if I slice the end rather than the beginning. If I reverse the sort and grab the beginning it's <100ms. That's definitely reproduced in the repo above. I'll try your suggestions to see if I can make it even better.

The write lag is definitely the killer at the moment.

@bmunkholm
Copy link
Contributor

Re Write, I'm not sure what your "write logic" is and what happens between the timing logs. But writing something definitely don't take seconds.
You are not running this while debugging right? Is this on a device?
Try to see a trivial example here: #2154 (comment).

@thekevinbrown
Copy link
Author

It’s definitely release mode and it’s a Samsung Galaxy S9.

Is there any logging I can enable so I can see what it’s doing in the 60 seconds between calling realm.write and getting the callback called?

@thekevinbrown
Copy link
Author

There is nothing between the start line and the console log line other than the realm.write call.

I’m happy to send the project over so you can see the write logic. Would that help?

@bmunkholm
Copy link
Contributor

We don't have any logging.
But write() is synchronous and blocking in case another write is in progress.
Are you just writing this one place? Are you syncing to Cloud?

You can use realm.beginTransaction() and realm.commitTransaction() instead on the write(). to separate things and see what's using the time. Also realm.isInTransaction to check before the write that you are actually not in a transaction.

@thekevinbrown
Copy link
Author

Ok, I'll keep logging in my code and see if I can figure out what's causing it, then put that into the reproduction.

The idea is the import of photos happens in the 'background' (as much as we can in JS anyway) and that the user can keep using the app while the import is happening, so yes, there are other writes happening while this is underway. But again since JS isn't multithreaded, saying it's happening simultaneously would be wrong.

@thekevinbrown
Copy link
Author

thekevinbrown commented Aug 29, 2019

To answer the cloud question, no, there is no cloud syncing happening at all, this is purely an on device database.

I've reproduced very odd behaviour around addListener, and can actually make the app crash by exploiting what I think is non-ACID behaviour.

I've added a reproduction case in thekevinbrown/realm-js-slow-writes@c7ba3ba

My leading theory at the moment is that my writes are so slow because:

  1. I subscribe to changes on a query
  2. I create a record that triggers the listener
  3. Which then re-runs the query, but it doesn't see the record that was created in step 2.
  4. So then it decides it needs to create a record, go to 2, ergo infinite loop.

There was logic like this to work around a "Transaction already in progress" error, but the real question is why in the world is a transaction already in progress there? If JS is single threaded and realm.write() is synchronous, then it should be impossible to have that happen, no?

In any case, I wouldn't expect to be able to crash the app like this by setting my JS up this way.

@thekevinbrown
Copy link
Author

Running the repro as it stands immediately crashes the app on the Galaxy S9 we're using, but just in case you have trouble reproducing it, here's an Android bug report bundle from the device:

bugreport-2019-08-29-15-52-58.zip

FS/data/tombstones/tombstone_07 is an example of one of these crashes.

@kneth
Copy link
Member

kneth commented Aug 30, 2019

@thekevinbrown

Doing write transactions in listener can be tricky as the listener might be called before the first call has terminated. We have seen issues in some cases, and they "guided" us to implement the manual transactions (beginTransaction, etc.).

I need to investigate a bit more in order to explain the crash.

@thekevinbrown
Copy link
Author

Thanks very much for your help @kneth.

Logically I don't understand the difference between the two constructs, but if there is a difference between:

realm.write(() => {
    // whatever
});

and

realm.beginTransaction();
// whatever
realm.commitTransaction();

That'd be good to understand better.

We've worked around this for the time being in our app by replacing all Realm listeners with JS listeners that fire after the write() block has finished and it's working great thus far. I'm not sure how Realm's architecture is set up or how realistic this is, but the ideal for a database to me is that whether the data is flushed to disk or not, when you call create then immediately query, the record you just created is always returned. isInTransaction could still return true and the data may not have been persisted and flushed to disk, but at least our own queries return a coherent view of the data. Otherwise we can't really trust it as the source of truth.

Or another way to say it is there are use cases for serializable transactions in SQL, or atomic writes in CouchDB / Redis / MongoDB and in those cases it's crucial that data is either there or not there and nothing in between. It seems like we as a team don't understand the guarantees that Realm gives us in this regard, so if there are docs I can read about it that'd be really helpful, or if we've just hit a bug that's good to know too.

@kneth
Copy link
Member

kneth commented Sep 3, 2019

write() is nothing more than

try {
  realm.beginTransaction();
  // whatever
  realm.commitTransaction();
} catch (e) {
  realm.cancelTransaction();
}

The advantage of manual transaction is that you can do something like

doCommit = false;
if (!realm.isInTransaction) {
  realm.beginTransaction();
  doCommit = true;
}
// whatever
if (doCommit) {
  realm.commitTransaction();
}

Compared to other databases, Realm objects are live. It means that they are updated as the database is updated. The notifications are done through the event loop (on Android we use Looper).

Once commitTransaction() returns, data is persisted but the live objects (instances of Realm, result sets, objects, etc.) must be updated. We are notifying first as we believe that your app's UI must be updated to make it more responsive (we don't consider it a bug but a feature).

We don't have much documentation on how it works. A bit iOS centric but https://academy.realm.io/posts/live-objects-fine-grained-notifications-realm-update/ gives you an overview.

@thekevinbrown
Copy link
Author

@kneth so if I've understood correctly you're saying that this is a feature:

  • addListener
  • create
  • In listener, query for objects
  • No objects found

That's the expected behaviour?

I buy the logic for notifying quickly, and for why there might be a delay in updating existing live objects to reflect the new data, but asking for a new result set needs to return a correct result.

Otherwise how do we actually build a feature with listeners? The app's UI will be updated quickly but it'll be wrong. The screen won't reflect the item the user just added to the DB, and then when it's "really" there we don't get any notification to re-render.

@thekevinbrown
Copy link
Author

@kneth, it's now been a while, can you at least confirm whether this is expected behaviour or a bug?

I'm considering splitting out the various issues so we can untangle them and follow progress on each if they're going to be long running things.

@thekevinbrown
Copy link
Author

Since this has been running for a month and there are separate issues raised, I've split them out into separate issues so we can track progress on each item.

@kneth
Copy link
Member

kneth commented Oct 1, 2019

@thekevinbrown Thanks for splitting into multiple issues. I hope that I can return to them shortly.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants