
Crashes on startup while generating presence stream #10106

Open
benbz opened this issue Jun 2, 2021 · 1 comment
Labels
S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@benbz
Contributor

benbz commented Jun 2, 2021

Description

Crashes on startup while generating presence stream

Steps to reproduce

  • The HS is running a set of workers: event-persister, federation-{inbound,reader,sender}, presence-writer
  • All workers and the main process were restarted
  • The event-persister and federation-sender workers each crashed 3 times on startup before successfully starting

Event Persister:

2021-06-02 07:46:12,318 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for pushers(id)
2021-06-02 07:46:12,318 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for deleted_pushers(stream_id)
2021-06-02 07:46:12,319 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for public_room_list_stream(stream_id)
2021-06-02 07:46:12,321 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for push_rules_stream(stream_id)
2021-06-02 07:46:12,327 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for pushers(id)
2021-06-02 07:46:12,327 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for deleted_pushers(stream_id)
2021-06-02 07:46:12,328 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for local_group_updates(stream_id)
2021-06-02 07:46:12,330 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for device_lists_stream(stream_id)
2021-06-02 07:46:12,331 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for user_signature_stream(stream_id)
2021-06-02 07:46:12,331 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for device_lists_outbound_pokes(stream_id)
2021-06-02 07:46:12,343 - twisted - 258 - ERROR - sentinel- Traceback (most recent call last):
2021-06-02 07:46:12,343 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
2021-06-02 07:46:12,343 - twisted - 258 - ERROR - sentinel-     return _run_code(code, main_globals, None,
2021-06-02 07:46:12,344 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
2021-06-02 07:46:12,344 - twisted - 258 - ERROR - sentinel-     exec(code, run_globals)
2021-06-02 07:46:12,344 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/app/generic_worker.py", line 483, in <module>
2021-06-02 07:46:12,345 - twisted - 258 - ERROR - sentinel-     start(sys.argv[1:])
2021-06-02 07:46:12,345 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/app/generic_worker.py", line 470, in start
2021-06-02 07:46:12,345 - twisted - 258 - ERROR - sentinel-     hs.setup()
2021-06-02 07:46:12,345 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/server.py", line 280, in setup
2021-06-02 07:46:12,346 - twisted - 258 - ERROR - sentinel-     self.datastores = Databases(self.DATASTORE_CLASS, self)
2021-06-02 07:46:12,346 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/__init__.py", line 76, in __init__
2021-06-02 07:46:12,346 - twisted - 258 - ERROR - sentinel-     main = main_store_class(database, db_conn, hs)
2021-06-02 07:46:12,346 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/user_directory.py", line 553, in __init__
2021-06-02 07:46:12,347 - twisted - 258 - ERROR - sentinel-     super().__init__(database, db_conn, hs)
2021-06-02 07:46:12,347 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/user_directory.py", line 40, in __init__
2021-06-02 07:46:12,347 - twisted - 258 - ERROR - sentinel-     super().__init__(database, db_conn, hs)
2021-06-02 07:46:12,347 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/stats.py", line 103, in __init__
2021-06-02 07:46:12,348 - twisted - 258 - ERROR - sentinel-     super().__init__(database, db_conn, hs)
2021-06-02 07:46:12,348 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/presence.py", line 46, in __init__
2021-06-02 07:46:12,348 - twisted - 258 - ERROR - sentinel-     self._presence_id_gen = MultiWriterIdGenerator(
2021-06-02 07:46:12,348 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/util/id_generators.py", line 283, in __init__
2021-06-02 07:46:12,349 - twisted - 258 - ERROR - sentinel-     self._load_current_ids(db_conn, tables)
2021-06-02 07:46:12,349 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/util/id_generators.py", line 304, in _load_current_ids
2021-06-02 07:46:12,349 - twisted - 258 - ERROR - sentinel-     cur.execute(sql, (self._stream_name, self._writers))
2021-06-02 07:46:12,350 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 290, in execute
2021-06-02 07:46:12,350 - twisted - 258 - ERROR - sentinel-     self._do_execute(self.txn.execute, sql, *args)
2021-06-02 07:46:12,350 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 316, in _do_execute
2021-06-02 07:46:12,350 - twisted - 258 - ERROR - sentinel-     return func(sql, *args)
2021-06-02 07:46:12,351 - twisted - 258 - ERROR - sentinel- psycopg2.errors.SerializationFailure: could not serialize access due to concurrent update

Federation Sender:

2021-06-02 07:46:12,819 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for pushers(id)
2021-06-02 07:46:12,820 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for deleted_pushers(stream_id)
2021-06-02 07:46:12,821 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for public_room_list_stream(stream_id)
2021-06-02 07:46:12,821 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for push_rules_stream(stream_id)
2021-06-02 07:46:12,826 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for pushers(id)
2021-06-02 07:46:12,826 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for deleted_pushers(stream_id)
2021-06-02 07:46:12,827 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for local_group_updates(stream_id)
2021-06-02 07:46:12,829 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for device_lists_stream(stream_id)
2021-06-02 07:46:12,829 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for user_signature_stream(stream_id)
2021-06-02 07:46:12,830 - synapse.storage.util.id_generators - 55 - INFO - main- initialising stream generator for device_lists_outbound_pokes(stream_id)
2021-06-02 07:46:12,841 - twisted - 258 - ERROR - sentinel- Traceback (most recent call last):
2021-06-02 07:46:12,842 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
2021-06-02 07:46:12,842 - twisted - 258 - ERROR - sentinel-     return _run_code(code, main_globals, None,
2021-06-02 07:46:12,842 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
2021-06-02 07:46:12,843 - twisted - 258 - ERROR - sentinel-     exec(code, run_globals)
2021-06-02 07:46:12,843 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/app/federation_sender.py", line 23, in <module>
2021-06-02 07:46:12,843 - twisted - 258 - ERROR - sentinel-     start(sys.argv[1:])
2021-06-02 07:46:12,843 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/app/generic_worker.py", line 470, in start
2021-06-02 07:46:12,844 - twisted - 258 - ERROR - sentinel-     hs.setup()
2021-06-02 07:46:12,844 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/server.py", line 280, in setup
2021-06-02 07:46:12,844 - twisted - 258 - ERROR - sentinel-     self.datastores = Databases(self.DATASTORE_CLASS, self)
2021-06-02 07:46:12,845 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/__init__.py", line 76, in __init__
2021-06-02 07:46:12,845 - twisted - 258 - ERROR - sentinel-     main = main_store_class(database, db_conn, hs)
2021-06-02 07:46:12,845 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/user_directory.py", line 553, in __init__
2021-06-02 07:46:12,846 - twisted - 258 - ERROR - sentinel-     super().__init__(database, db_conn, hs)
2021-06-02 07:46:12,846 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/user_directory.py", line 40, in __init__
2021-06-02 07:46:12,846 - twisted - 258 - ERROR - sentinel-     super().__init__(database, db_conn, hs)
2021-06-02 07:46:12,846 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/stats.py", line 103, in __init__
2021-06-02 07:46:12,846 - twisted - 258 - ERROR - sentinel-     super().__init__(database, db_conn, hs)
2021-06-02 07:46:12,847 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/presence.py", line 46, in __init__
2021-06-02 07:46:12,847 - twisted - 258 - ERROR - sentinel-     self._presence_id_gen = MultiWriterIdGenerator(
2021-06-02 07:46:12,847 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/util/id_generators.py", line 283, in __init__
2021-06-02 07:46:12,847 - twisted - 258 - ERROR - sentinel-     self._load_current_ids(db_conn, tables)
2021-06-02 07:46:12,848 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/util/id_generators.py", line 304, in _load_current_ids
2021-06-02 07:46:12,848 - twisted - 258 - ERROR - sentinel-     cur.execute(sql, (self._stream_name, self._writers))
2021-06-02 07:46:12,848 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 290, in execute
2021-06-02 07:46:12,848 - twisted - 258 - ERROR - sentinel-     self._do_execute(self.txn.execute, sql, *args)
2021-06-02 07:46:12,849 - twisted - 258 - ERROR - sentinel-   File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 316, in _do_execute
2021-06-02 07:46:12,849 - twisted - 258 - ERROR - sentinel-     return func(sql, *args)
2021-06-02 07:46:12,849 - twisted - 258 - ERROR - sentinel- psycopg2.errors.SerializationFailure: could not serialize access due to concurrent update

Raising this because this HS is testing out the presence worker, and the presence stream appears to be the problematic one.

Version information

  • Homeserver: vector.modular.im
  • Version: 1.34.0
@erikjohnston
Member

erikjohnston commented Jun 11, 2021

The function causing this is:

def _load_current_ids(
    self,
    db_conn,
    tables: List[Tuple[str, str, str]],
):
    cur = db_conn.cursor(txn_name="_load_current_ids")

    # Load the current positions of all writers for the stream.
    if self._writers:
        # We delete any stale entries in the positions table. This is
        # important if we add back a writer after a long time; we want to
        # consider that a "new" writer, rather than using the old stale
        # entry here.
        sql = """
            DELETE FROM stream_positions
            WHERE
                stream_name = ?
                AND instance_name != ALL(?)
        """
        cur.execute(sql, (self._stream_name, self._writers))

        sql = """
            SELECT instance_name, stream_id FROM stream_positions
            WHERE stream_name = ?
        """
        cur.execute(sql, (self._stream_name,))

        self._current_positions = {
            instance: stream_id * self._return_factor
            for instance, stream_id in cur
            if instance in self._writers
        }

    # We set the `_persisted_upto_position` to be the minimum of all current
    # positions. If empty we use the max stream ID from the DB table.
    min_stream_id = min(self._current_positions.values(), default=None)

    if min_stream_id is None:
        # We add a GREATEST here to ensure that the result is always
        # positive. (This can be a problem for e.g. backfill streams where
        # the server has never backfilled).
        max_stream_id = 1
        for table, _, id_column in tables:
            sql = """
                SELECT GREATEST(COALESCE(%(agg)s(%(id)s), 1), 1)
                FROM %(table)s
            """ % {
                "id": id_column,
                "table": table,
                "agg": "MAX" if self._positive else "-MIN",
            }
            cur.execute(sql)
            (stream_id,) = cur.fetchone()
            max_stream_id = max(max_stream_id, stream_id)

        self._persisted_upto_position = max_stream_id
    else:
        # If we have a min_stream_id then we pull out everything greater
        # than it from the DB so that we can prefill
        # `_known_persisted_positions` and get a more accurate
        # `_persisted_upto_position`.
        #
        # We also check if any of the later rows are from this instance, in
        # which case we use that for this instance's current position. This
        # is to handle the case where we didn't finish persisting to the
        # stream positions table before restart (or the stream position
        # table otherwise got out of date).
        self._persisted_upto_position = min_stream_id

        rows = []
        for table, instance_column, id_column in tables:
            sql = """
                SELECT %(instance)s, %(id)s FROM %(table)s
                WHERE ? %(cmp)s %(id)s
            """ % {
                "id": id_column,
                "table": table,
                "instance": instance_column,
                "cmp": "<=" if self._positive else ">=",
            }
            cur.execute(sql, (min_stream_id * self._return_factor,))

            rows.extend(cur)

        # Sort so that we handle rows in order for each instance.
        rows.sort()

        with self._lock:
            for (
                instance,
                stream_id,
            ) in rows:
                stream_id = self._return_factor * stream_id
                self._add_persisted_position(stream_id)

                if instance == self._instance_name:
                    self._current_positions[instance] = stream_id

    cur.close()

We should split that transaction up so that (a) it's shorter, and so hopefully less likely to conflict, and (b) we don't have a single transaction poking multiple tables (any of which changing can cause a problem).
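For illustration only, here is a rough sketch of what the stream_positions half of that split could look like, presumably avoiding the conflict where several workers run the same startup DELETE against stream_positions at once. It is written against a raw psycopg2 connection (hence the %s placeholders rather than the ? style used by Synapse's cursor wrapper), and load_writer_positions is a hypothetical helper, not the actual fix:

from typing import Dict, List

def load_writer_positions(db_conn, stream_name: str, writers: List[str]) -> Dict[str, int]:
    # Hypothetical: run the stale-row DELETE in its own short transaction and
    # commit immediately, so it no longer sits open while the rest of the
    # startup queries (which touch other tables) run.
    with db_conn.cursor() as cur:
        cur.execute(
            """
            DELETE FROM stream_positions
            WHERE stream_name = %s AND instance_name != ALL(%s)
            """,
            (stream_name, writers),
        )
    db_conn.commit()

    # Second, separate transaction: read back the remaining writer positions.
    with db_conn.cursor() as cur:
        cur.execute(
            "SELECT instance_name, stream_id FROM stream_positions WHERE stream_name = %s",
            (stream_name,),
        )
        positions = {instance: stream_id for instance, stream_id in cur if instance in writers}
    db_conn.commit()

    return positions

The per-table MAX/MIN scans and the prefill of _known_persisted_positions would likewise each get their own short transaction, so a concurrent update to any one table only risks conflicting with the statement that actually reads it.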

erikjohnston added the S-Minor (Blocks non-critical functionality, workarounds exist.) and T-Defect (Bugs, crashes, hangs, security vulnerabilities, or other reported issues.) labels on Jun 11, 2021