Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Schema deltas are no longer transactional on sqlite #14909

Closed
squahtx opened this issue Jan 24, 2023 · 4 comments · Fixed by #14910 or #14926
Closed

Schema deltas are no longer transactional on sqlite #14909

squahtx opened this issue Jan 24, 2023 · 4 comments · Fixed by #14910 or #14926
Assignees
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@squahtx
Copy link
Contributor

squahtx commented Jan 24, 2023

#13873 may have introduced a regression where schema deltas are no longer executed inside a transaction on sqlite. This can lead to database inconsistencies and bootloops when Synapse is killed while running database migrations.

Examples:

Previously: #6467.
Split out from #14401.

@squahtx squahtx added A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Jan 24, 2023
@squahtx
Copy link
Contributor Author

squahtx commented Jan 24, 2023

When using sqlite, schema deltas are not fully transactional (see docstring here). The schema delta will end up being half applied and the database will end up in an inconsistent state. Depending on the nature of the schema delta, the next run may fail.

It looks like this may have been introduced by #13873, which replaced the old handrolled code for parsing sql files with something more conventional. That was probably a good idea, but it does break the transactional semantics we were relying on for our migrations...

Originally posted by @richvdh in #14401 (comment)

@squahtx
Copy link
Contributor Author

squahtx commented Jan 24, 2023

When using sqlite, schema deltas are not fully transactional (see docstring here). The schema delta will end up being half applied and the database will end up in an inconsistent state. Depending on the nature of the schema delta, the next run may fail.

It looks like this may have been introduced by #13873, which replaced the old handrolled code for parsing sql files with something more conventional. That was probably a good idea, but it does break the transactional semantics we were relying on for our migrations...

Sorry, that's my mistake. FWIW the old parser still exists, see e.g.

def get_statements(f: Iterable[str]) -> Generator[str, None, None]:

The only other option would be to inject an explicit BEGIN TRANSACTION and COMMIT before and after the script. But that's not a like-for-like replacement for what had before. Maybe it's best to revert back.

(I recall that when I was making the full schema dumps that parser couldn't handle something postgres-specific. Trigger definitions maybe?)

Originally posted by @DMRobertson in #14401 (comment)

@squahtx
Copy link
Contributor Author

squahtx commented Jan 24, 2023

The only other option would be to inject an explicit BEGIN TRANSACTION and COMMIT before and after the script. But that's not a like-for-like replacement for what had before

It's not, but I think it would have the right semantics, and would avoid us maintaining our own SQL parser, with all the pitfalls that brings.

Originally posted by @richvdh in #14401 (comment)

squahtx pushed a commit that referenced this issue Jan 24, 2023
#13873 introduced a regression which causes sqlite database migrations
to no longer run inside a transaction. Wrap them in a transaction again,
to avoid database corruption when migrations are interrupted.

Fixes #14909.

Signed-off-by: Sean Quah <seanq@matrix.org>
squahtx added a commit that referenced this issue Jan 25, 2023
#13873 introduced a regression which causes sqlite database migrations
to no longer run inside a transaction. Wrap them in a transaction again,
to avoid database corruption when migrations are interrupted.

Fixes #14909.

Signed-off-by: Sean Quah <seanq@matrix.org>
@squahtx squahtx reopened this Jan 25, 2023
@squahtx
Copy link
Contributor Author

squahtx commented Jan 25, 2023

@DMRobertson points out here that there is a small window in between the schema delta being committed and Synapse updating its bookkeeping which is still non-transactional.

We need to leave the transaction open in executescript, if that is permitted, or move the bookkeeping update into the sql that gets run.

@squahtx squahtx self-assigned this Jan 25, 2023
squahtx pushed a commit that referenced this issue Jan 26, 2023
#14910 fixed the regression introduced by #13873 where sqlite database
migrataions would no longer run inside a transaction. However, it
committed the transaction before Synapse updated its bookkeeping of
which migrations have been run, which means that migrations may be run
again after they have completed successfully.

Leave the transaction open at the end of `executescript`, to restore the
old, correct behaviour.

Fixes #14909.

Signed-off-by: Sean Quah <seanq@matrix.org>
squahtx pushed a commit that referenced this issue Jan 26, 2023
#14910 fixed the regression introduced by #13873 where sqlite database
migrataions would no longer run inside a transaction. However, it
committed the transaction before Synapse updated its bookkeeping of
which migrations have been run, which means that migrations may be run
again after they have completed successfully.

Leave the transaction open at the end of `executescript`, to restore the
old, correct behaviour.

Fixes #14909.

Signed-off-by: Sean Quah <seanq@matrix.org>
squahtx added a commit that referenced this issue Jan 31, 2023
#14910 fixed the regression introduced by #13873 where sqlite database
migrations would no longer run inside a transaction. However, it
committed the transaction before Synapse updated its bookkeeping of
which migrations have been run, which means that migrations may be run
again after they have completed successfully.

Leave the transaction open at the end of `executescript`, to restore the
old, correct behaviour. Also make the PostgreSQL behaviour consistent
with SQLite.

Fixes #14909.

Signed-off-by: Sean Quah <seanq@matrix.org>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
1 participant