Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

When loading current ids, sort by stream_id to avoid incorrect overwrite and avoid errors caused by sorting alphabetical instance name which can be null #13585

Merged
merged 4 commits into from
Aug 24, 2022
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 11 additions & 3 deletions synapse/storage/util/id_generators.py
Original file line number Diff line number Diff line change
Expand Up @@ -446,23 +446,31 @@ def _load_current_ids(

rows: List[Tuple[str, int]] = []
for table, instance_column, id_column in tables:
logger.info(
"table=%s instance_column=%s id_column=%s",
table,
instance_column,
id_column,
)
sql = """
SELECT %(instance)s, %(id)s FROM %(table)s
WHERE ? %(cmp)s %(id)s
/* Sort so that we handle rows in order for each instance. */
ORDER BY %(id)s ASC
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
""" % {
"id": id_column,
"table": table,
"instance": instance_column,
"cmp": "<=" if self._positive else ">=",
}
logger.info(
"sql=%s args=%s", sql, (min_stream_id * self._return_factor,)
)
cur.execute(sql, (min_stream_id * self._return_factor,))

# Cast safety: this corresponds to the types returned by the query above.
rows.extend(cast(Iterable[Tuple[str, int]], cur))

# Sort so that we handle rows in order for each instance.
Copy link
Contributor Author

@MadLittleMods MadLittleMods Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the correct order that we want? "Sort by [what] so that ... because [why]"

The previous code sorted it by instance_name because that was the first column selected in the SQL. Why do we want to do that? That code has been there since it was introduced

instance_name can be null instead of master which blows up the sort code. Another question is why do we see it as null sometimes instead of master in monolith mode?


As @richvdh pointed out, it does work to sort worker instances. But it's unclear to me why we want that or if that is intended. It feels like we might not need to sort at all.

Further discussion: https://matrix.to/#/!vcyiEtMVHIhWXcJAfl:sw1v.org/$XjldnX6z-QfMK-eAS77tEIbwN5kNA3tq8s9temUwONs?via=matrix.org&via=element.io&via=beeper.com

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(requesting review for this comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My best guest is the line:

self._current_positions[instance] = stream_id

Where if we didn't ensure that we handled the rows in ascending stream ID order (per-instance) then we could end up setting the current position to a lower stream ID than we ought to.

I think its enough to do rows.sort(key=lambda _, stream_id: stream_id) or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context @erikjohnston!

I don't know of the symptoms this would manifest as or underlying problems this causes but I guess this fixes some other bugs with incorrect stream_ids (all the places we use MultiWriterIdGenerator).

Probably solving some duplicate primary key database errors at the very least. And maybe larger bugs like to_device messages not sending.

rows.sort()

with self._lock:
for (
instance,
Expand Down