Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

psycopg2.errors.UniqueViolation: could not create unique index "receipts_graph_unique_index" when upgrading from <1.68.0 to >=1.70.0 #14406

Closed
squahtx opened this issue Nov 10, 2022 · 2 comments · Fixed by #14453
Assignees
Labels
A-Background-Updates Filling in database columns, making the database eventually up-to-date A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Occasional Affects or can be seen by some users regularly or most users rarely S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Release-Blocker Must be resolved before making a release

Comments

@squahtx
Copy link
Contributor

squahtx commented Nov 10, 2022

Seen as #14123

When upgrading Synapse from a version older than 1.68.0, the receipts_graph_unique_index background update may fail with

2022-10-10 16:50:14,874 - synapse.storage.background_updates - 428 - INFO - background_updates-0- Starting update batch on background update 'receipts_graph_unique_index'
2022-10-10 16:50:14,877 - synapse.storage.background_updates - 620 - INFO - background_updates-0- Adding index receipts_graph_unique_index to receipts_graph
2022-10-10 16:50:14,949 - synapse.storage.background_updates - 299 - ERROR - background_updates-0- Error doing update
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/background_updates.py", line 291, in run_background_updates
    result = await self.do_next_background_update(sleep)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/background_updates.py", line 421, in do_next_background_update
    await self._do_background_update(desired_duration_ms)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/background_updates.py", line 464, in _do_background_update
    items_updated = await update_handler(progress, batch_size)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/background_updates.py", line 621, in updater
    await self.db_pool.runWithConnection(runner)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 976, in runWithConnection
    return await make_deferred_yieldable(
  File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/usr/local/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 282, in _runWithConnection
    result = func(conn, *args, **kw)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 969, in inner_func
    return func(db_conn, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/background_updates.py", line 572, in create_index_psql
    c.execute(sql)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 389, in execute
    self._do_execute(self.txn.execute, sql, *args)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 432, in _do_execute
    return func(sql, *args, **kwargs)
psycopg2.errors.UniqueViolation: could not create unique index "receipts_graph_unique_index"
DETAIL:  Key (room_id, receipt_type, user_id)=(!watercooler-v9:maunium.net, m.read, @cat:feline.support) is duplicated.

The series of schema deltas involved are:

21/receipts.sql

  • Creates the receipts_graph(room_id, receipt_type, user_id, event_ids, data) table with CONSTRAINT receipts_graph_uniqueness UNIQUE (room_id, receipt_type, user_id)
    ie. there is one receipt for each user for each room.

72/07thread_receipts.sql.postgres (1.68.0)

  • Adds a thread_id column to receipts_graph
  • Adds a per-thread unique constraint: CONSTRAINT receipts_graph_uniqueness_thread UNIQUE (room_id, receipt_type, user_id, thread_id);
    ie. there is one receipt for each user for each thread in a room.
  • At this point, per-thread receipts don't work yet, because the original (room_id, receipt_type, user_id) constraint is too restrictive.

72/08thread_receipts.sql (1.68.0)

  • Queues up the receipts_graph_unique_index background update, which adds a (room_id, receipt_type, user_id) constraint WHERE thread_id IS NULL.
    ie. there is one non-thread receipt for each user for each room.

73/08thread_receipts_non_null.sql.postgres (1.70.0)

  • Drops the original, non-thread-aware, receipts_graph_uniqueness constraint, allowing thread receipts to work.

sqlite takes a similar, equally confusing path.

The window where there is no unique constraint

Since background updates are run asynchronously, the receipts_graph_unique_index background update may run after the last schema delta, leaving a window where there is no unique constraint on (room_id, receipt_type, user_id) for NULL thread_ids.

Unsafe upserts

But that isn't the bug. We have logic to deal with this window. See

UNIQUE_INDEX_BACKGROUND_UPDATES = {
"user_ips": "user_ips_device_unique_index",
"device_lists_remote_extremeties": "device_lists_remote_extremeties_unique_idx",
"device_lists_remote_cache": "device_lists_remote_cache_unique_idx",
"event_search": "event_search_event_id_idx",
"local_media_repository_thumbnails": "local_media_repository_thumbnails_method_idx",
"remote_media_cache_thumbnails": "remote_media_repository_thumbnails_method_idx",
"event_push_summary": "event_push_summary_unique_index2",
"receipts_linearized": "receipts_linearized_unique_index",
"receipts_graph": "receipts_graph_unique_index",
}
.
When one of these background updates is in progress, all our simple_upsert* operations are done manually without relying on unique constraints.
And we don't upsert into receipts_graph with handwritten SQL anywhere.

Emulated upsert internals

The emulated upsert first tries an UPDATE, then an INSERT if the UPDATE modified 0 rows.
The default isolation level in Synapse is REPEATABLE READ, which does not prevent the race where two upserts try to insert the same row at the same time.
But we've already thought of this and lock the entire table when doing the emulated upsert:

if lock:
self.engine.lock_table(txn, table)

Except the locking is controlled by a parameter... and we've left it as False:

self.db_pool.simple_upsert_txn(
txn,
table="receipts_graph",
keyvalues=keyvalues,
values={
"event_ids": json_encoder.encode(event_ids),
"data": json_encoder.encode(data),
},
where_clause=where_clause,
# receipts_graph has a unique constraint on
# (user_id, room_id, receipt_type), so no need to lock
lock=False,
)


In summary, there's a window where there is no non-thread unique constraint on receipts_graph and a race where we try to insert two new rows at the same time for the same (room_id, receipt_type, user_id).

The same probably applies to receipts_linearized.

@squahtx
Copy link
Contributor Author

squahtx commented Nov 10, 2022

Next steps

@squahtx squahtx added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db A-Background-Updates Filling in database columns, making the database eventually up-to-date O-Occasional Affects or can be seen by some users regularly or most users rarely labels Nov 10, 2022
@DMRobertson
Copy link
Contributor

@DMRobertson suggests we remove lock=False entirely.
@reivilibre suspects it's left over from a time when sqlite did not support native upserts.

It's possible I should have done this in #13760. I removed can_native_upsert but was very confused about whether it was appropriate and/or safe to remove lock.

@squahtx squahtx self-assigned this Nov 10, 2022
@squahtx squahtx added the X-Release-Blocker Must be resolved before making a release label Nov 15, 2022
squahtx pushed a commit that referenced this issue Nov 15, 2022
Before creating the `receipts_graph_unique_index` and
`receipts_linearized_unique_index` unique indexes, we have to clean up
any duplicate receipts that may have crept in due to
#14406.

Signed-off-by: Sean Quah <seanq@matrix.org>
squahtx pushed a commit that referenced this issue Nov 15, 2022
Before creating the `receipts_graph_unique_index` and
`receipts_linearized_unique_index` unique indexes, we have to clean up
any duplicate receipts that may have crept in due to
#14406.

Signed-off-by: Sean Quah <seanq@matrix.org>
squahtx pushed a commit that referenced this issue Nov 15, 2022
Before creating the `receipts_graph_unique_index` and
`receipts_linearized_unique_index` unique indexes, we have to clean up
any duplicate receipts that may have crept in due to
#14406.

Signed-off-by: Sean Quah <seanq@matrix.org>
squahtx pushed a commit that referenced this issue Nov 16, 2022
To perform an emulated upsert into a table safely, we must either:
 * lock the table,
 * be the only writer upserting into the table
 * or rely on another unique index being present.

When the 2nd or 3rd cases were applicable, we previously avoided locking
the table as an optimization. However, as seen in #14406, it is easy to
slip up when adding new schema deltas and corrupt the database.

Since #13760, Synapse has required SQLite >= 3.27.0, which has support
for native upserts. This means that we now only perform emulated upserts
while waiting for background updates to add unique indexes.

Since emulated upserts are far less frequent now, let's remove the
option to skip locking tables, so that we don't shoot ourselves in the
foot again.

Signed-off-by: Sean Quah <seanq@matrix.org>
squahtx pushed a commit that referenced this issue Nov 16, 2022
To perform an emulated upsert into a table safely, we must either:
 * lock the table,
 * be the only writer upserting into the table
 * or rely on another unique index being present.

When the 2nd or 3rd cases were applicable, we previously avoided locking
the table as an optimization. However, as seen in #14406, it is easy to
slip up when adding new schema deltas and corrupt the database.

Since #13760, Synapse has required SQLite >= 3.27.0, which has support
for native upserts. This means that we now only perform emulated upserts
while waiting for background updates to add unique indexes.

Since emulated upserts are far less frequent now, let's remove the
option to skip locking tables, so that we don't shoot ourselves in the
foot again.

Signed-off-by: Sean Quah <seanq@matrix.org>
squahtx added a commit that referenced this issue Nov 28, 2022
To perform an emulated upsert into a table safely, we must either:
 * lock the table,
 * be the only writer upserting into the table
 * or rely on another unique index being present.

When the 2nd or 3rd cases were applicable, we previously avoided locking
the table as an optimization. However, as seen in #14406, it is easy to
slip up when adding new schema deltas and corrupt the database.

The only time we lock when performing emulated upserts is while waiting
for background updates on postgres. On sqlite, we do no locking at all.

Let's remove the option to skip locking tables, so that we don't shoot
ourselves in the foot again.

Signed-off-by: Sean Quah <seanq@matrix.org>
H-Shay pushed a commit that referenced this issue Dec 13, 2022
To perform an emulated upsert into a table safely, we must either:
 * lock the table,
 * be the only writer upserting into the table
 * or rely on another unique index being present.

When the 2nd or 3rd cases were applicable, we previously avoided locking
the table as an optimization. However, as seen in #14406, it is easy to
slip up when adding new schema deltas and corrupt the database.

The only time we lock when performing emulated upserts is while waiting
for background updates on postgres. On sqlite, we do no locking at all.

Let's remove the option to skip locking tables, so that we don't shoot
ourselves in the foot again.

Signed-off-by: Sean Quah <seanq@matrix.org>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Background-Updates Filling in database columns, making the database eventually up-to-date A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Occasional Affects or can be seen by some users regularly or most users rarely S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Release-Blocker Must be resolved before making a release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants