Skip to content

Optimistic opening of a Realm file#3013

Merged
cmelchior merged 10 commits intoreleasesfrom
cm/bug/retry-locked-realm
Jun 16, 2016
Merged

Optimistic opening of a Realm file#3013
cmelchior merged 10 commits intoreleasesfrom
cm/bug/retry-locked-realm

Conversation

@cmelchior
Copy link
Copy Markdown
Contributor

This PR is the result of investigating #2459

We have anecdotal evidence that multiple processes might be open for the same app. We have not been able to verify this, but the result matches the "IncompatibleLockFile" errors reported in #2459. This happens if the two processes have two versions of Realm (e.g. during app upgrades).

This PR is based on our assumption that such an overlap is not intentional and only happens because one process is started before the previous was completely shut down.

So this PR introduces an optimistic opening scheme where we retry for 1 second before crashing as before.

I also added logging so users can log if they run into this issue.
The settings for this are currently public in SharedGroup, which will allow users to modify the timeout and retry-policy without us needing to release a new version of Realm Java.

@realm/java

}

// Returns the time to sleep before retrying opening the SharedGroup.
private long getSleepTime(int tries) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this could be static

// Returns the time to sleep before retrying opening the SharedGroup.
private long getSleepTime(int tries) {
if (INCREMENTAL_BACKOFF_MS == null) {
return 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning INCREMENTAL_BACKOFF_LIMIT_MS is better. Otherwise you will get many calls to Thread.sleep().

@stk1m1
Copy link
Copy Markdown
Contributor

stk1m1 commented Jun 16, 2016

Awesome job! 👍 (It would be x100 awesome if there is an multi-process test, but oh well... 😢)

// Keep these public so we can ask users to experiment with these values if needed.
// Should be locked down as soon as possible.
public static long[] INCREMENTAL_BACKOFF_MS = new long[] {1, 3, 10, 20}; // Will keep re-using last value until LIMIT is hit
public static long INCREMENTAL_BACKOFF_LIMIT_MS = 1000;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like make the default time out longer, like 3 seconds. The time before ANR to kill is 5 seconds. As long as we don't trigger ANR, we will always get the accurate results.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I could probably live with that. Retrying every 20 ms for 3 seconds is kinda massive though, so I'll probably cap at 50 ms instead then? So 1,3, 10, 20, 50 ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1, 10, 100, 200, 400] makes more sense IMO. The magnitudes of retry times should be enough differentiated. And also, we should give some free cpu time for kernel to do scheduling.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjust to {1, 10, 20, 50, 100, 200, 400} and 3 sec timeout. This is a good comprise between being responsive and allowing the scheduler to work.

@beeender
Copy link
Copy Markdown
Contributor

You made it!! 👍

assertTrue(lockFile.exists());
FileOutputStream fooStream = new FileOutputStream(lockFile, false);
fooStream.write("Boom".getBytes());
fooStream.close();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a flush to ensure the file is modified before proceeding with the rest of the test

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flush is a no-op for FileOutputStream:

    /**
     * Flushes this stream. Implementations of this method should ensure that
     * any buffered data is written out. This implementation does nothing.
     *
     * @throws IOException
     *             if an error occurs while flushing this stream.
     */
    public void flush() throws IOException {
        /* empty */
    }

@nhachicha
Copy link
Copy Markdown
Collaborator

LGTM 👍

@cmelchior cmelchior merged commit f4e340c into releases Jun 16, 2016
@cmelchior cmelchior deleted the cm/bug/retry-locked-realm branch June 16, 2016 13:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants