Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat (client): flag to disable FKs on incoming TXs in SQLite #1281

Merged
merged 11 commits into from
Jun 4, 2024

Conversation

kevin-dp
Copy link
Contributor

This PR introduces an additional flag that can be enabled when electrifying a SQLite database. When the flag is enabled, the Satellite process disables FK checks on incoming transactions. This flag is useful for performance reasons as skipping FK checks greatly speeds up initial data sync when there is a lot of data.

Note that disabling FK checks can't be done transactionally so we have to disable FKs, apply the TX, and then re-enable FKs. Therefore, it is important that no transactions are ran concurrently (this is ensured by the mutex that our drivers hold).
We assume that Electric exlusively owns the DB connection and thus that nobody executes a concurrent TX directly on the DB connection.

Copy link

linear bot commented May 20, 2024

@samwillis samwillis self-requested a review May 20, 2024 15:27
Copy link
Contributor

@samwillis samwillis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevin-dp I think its possible for a non-satellite transaction to happen between dissabling FK checks and the actual satellite transaction, and again possible afterwards, before they are turned on again.

Our adapter mutex is at the transaction level only. Do we not need another mutex, a connection level one, that satellite can hold during what is in effect multiple transactions?

@davidmartos96
Copy link
Contributor

@kevin-dp Do you have some numbers about this performance improvement?

I'm wondering if using builder.deferOrDisableFKsForTx inside the adapter transaction would be equivalent, same as it's currently done in _applySubscriptionData. It would solve the concurrency issue.

@kevin-dp
Copy link
Contributor Author

@kevin-dp I think its possible for a non-satellite transaction to happen between dissabling FK checks and the actual satellite transaction, and again possible afterwards, before they are turned on again.

Our adapter mutex is at the transaction level only. Do we not need another mutex, a connection level one, that satellite can hold during what is in effect multiple transactions?

You're right, concurrent TXs could occur between the DAL and the Satellite process which could get badly interleaved.
Here's a summary of a discussion with @samwillis about how our drivers currently use mutexes:

  • drivers that are sync don't need the mutex in order to implement the interactive transactions safely (better-sqlite3)
  • our transaction APIs use the callback style so that in sync code there is no promise/await and so no mutex needed for sync drivers
  • async drivers (wa-sqlite, capacitor, tauri, and others) need the mutex for the transactions

All async drivers are built on top of our generic drivers.
The generic driver coordinates all methods run/query/runInTransaction/transaction through a mutex.

What we are trying to achieve here is to group PRAGMA foreign_keys = OFF, a remote transaction, and PRAGMA foreign_keys = ON inside a SQLite transaction. However, SQLite does not allow the foreign_keys pragma to occur in a transaction. Hence, to guarantee isolation with other TXs we need to manually coordinate through a mutex. I propose to introduce a group(f: (uncoordinatedAdapter: Adapter) => void) method in the adapter interface which can be used to conceptually group several statements that are isolated from other transactions by coordinating through a mutex.

So that the Satellite process could do:

adapter.group((a) => {
  a.run({ sql: `PRAGMA foreign_keys = OFF` })
  ...
  a.run({ sql: `PRAGMA foreign_keys = ON` })
})

We can extend the generic database adapter with the group method which starts by acquiring the lock, then calls the provided function (passing along a version of the adapter that does coordinate through the lock), and then releases the lock.

@kevin-dp
Copy link
Contributor Author

@kevin-dp Do you have some numbers about this performance improvement?

@samwillis did some initial experiments with disabling FKs and has some numbers about the speedup it can provide.

I'm wondering if using builder.deferOrDisableFKsForTx inside the adapter transaction would be equivalent, same as it's currently done in _applySubscriptionData. It would solve the concurrency issue.

Do you mean to defer/disable FKs inside every TX and thus calling that inside the adapter's transaction method?
The idea is to support a flag that developers can enable to speedup e.g. initial sync of large amounts of data by disabling FKs. If we just defer the foreign key checks to the end of the TX we will not get the speedup we're after.

@davidmartos96
Copy link
Contributor

Do you mean to defer/disable FKs inside every TX and thus calling that inside the adapter's transaction method?

Yes, that's what I meant. I suggested the defer because I thought that a possible performance degradation could be SQLite doing foreign key checks for every INSERT/UPDATE in the Satellite transaction and just deferring all checks to the end would be faster.

@kevin-dp kevin-dp force-pushed the kevindp/vax-1871-disable-fks-sqlite branch from ba748f5 to 1d0ccb3 Compare May 22, 2024 13:35
@kevin-dp
Copy link
Contributor Author

kevin-dp commented May 22, 2024

I added an implementation of group on the database adapters and used it in the Satellite process to disable FKs for certain transactions when the disableFKs flag is set. There is one weird case which is when we receive a remote migration, this calls into the migrator's applyIfNotAlready method which had to be changed to also disable the FKs when running SQLite and when the flag is set.

To be honest, this PR adds quite some additional complexity around the drivers for this optimization. I don't think i like it.

@samwillis
Copy link
Contributor

@davidmartos96

The performance issue that this PR is addressing is related to large initial syncs. What we have found is that under some circumstances the time to apply the transaction increases exponentially.

With the linearlite example, 1100 issues + their comments takes under a second (to fully load the app), and has scaled to that point almost linearly. At 1200 it's up to something like 5-10 seconds, and at 1300 it's at over 30. I forget the exact numbers (currently sitting in an airplane awaiting takeoff).

It's directly related to the order of the inserts and if the FK is pointing to a row inserted before or after its own row. I'm suspicious it affects the WASM SQLite specifically, but need to test. I think it's related to doing a full scan for each FK reference, and it may be that there is memory allocations that cause the slow down with WASM.

Deferring FK checks does not solve the problem unfortunately, only turning them off does. However, there is another potential fix, to topologically sort the inserts based on FK, this almost entirely eliminates the slow down, but unfortunately won't solve the problem for circular references.

I'm not sure yet which solution we will go for, maybe both.

@samwillis
Copy link
Contributor

@kevin-dp agreed, it's looking quite complex. Maybe we should take a look at the topological sort of the inserts before we make a decision?

@davidmartos96
Copy link
Contributor

davidmartos96 commented May 22, 2024

@samwillis Thanks for the input!

We would be interested to try to reproduce it in the Dart client. But first we would like to reproduce it with the official client. In the linearlite example we've added this. It creates 1500 issues and 1 comment per issue (using faker to get random values).
After inserting everything and replicating to Postgres, if we reset the local database, the browser console says it synced in around 1.5s. Higher values don't seem to make much difference in the sync time either. We are using the wa-sqlite driver in Linearlite.
What's noticeable is that higher values (3000-5000) will leave the issues list blank for a while after the random inserts loop finishes, but that could be something else, as the sync time stays low.

     <button
        onClick={async () => {
          const total = 1500

          for (let i = 0; i < total; i++) {
            console.log(i, 'out of', total)

            const title = faker.company.name()

            const date = new Date()
            const issue_id = uuidv4()
            db.issue.create({
              data: {
                id: issue_id,
                title: title,
                username: 'testuser',
                priority: Priority.NONE,
                status: Status.BACKLOG,
                description: faker.lorem.paragraphs(3),
                modified: date,
                created: date,
                kanbanorder: generateKeyBetween(null, null),
              },
            })

            db.comment.create({
              data: {
                id: uuidv4(),
                issue_id: issue_id,
                body: faker.lorem.paragraphs(1),
                created_at: new Date(),
                username: 'testuser',
              },
            })
          }
        }}
      >
        INSERT RANDOM
      </button>

Copy link
Contributor

@msfstef msfstef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, nice work! I like the refactoring of the transaction adapter API and the group one is quite simple so I think it works well, left a couple of comments w.r.t. tests and conventions

clients/typescript/src/util/transactions.ts Outdated Show resolved Hide resolved
clients/typescript/src/migrators/index.ts Outdated Show resolved Hide resolved
clients/typescript/src/drivers/generic/adapter.ts Outdated Show resolved Hide resolved
clients/typescript/src/config/index.ts Outdated Show resolved Hide resolved
@samwillis
Copy link
Contributor

@kevin-dp If we are going ahead with this could we maybe name the new grouping method exclusive rather than group?

@kevin-dp
Copy link
Contributor Author

kevin-dp commented Jun 4, 2024

@kevin-dp If we are going ahead with this could we maybe name the new grouping method exclusive rather than group?

Not a fan of exclusive it doesn't describe what it does.
What about groupExclusively or runExclusively ?

@kevin-dp
Copy link
Contributor Author

kevin-dp commented Jun 4, 2024

Agreed with @samwillis and @msfstef to rename group to runExclusively.

@kevin-dp kevin-dp merged commit f4f020d into main Jun 4, 2024
22 checks passed
@kevin-dp kevin-dp deleted the kevindp/vax-1871-disable-fks-sqlite branch June 4, 2024 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants