Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Filter out auth chain queries that don't exist #16552

Conversation

realtyem
Copy link
Contributor

@realtyem realtyem commented Oct 26, 2023

Short version:

It is kind of a logic fix in that you're doing less work by disregarding the sentinel value of 0. This has two improvements:

  1. Avoiding expanding the chains dictionary unnecessarily.
  2. Avoiding passing more arguments through the SQL engine (and to the database) which we know will result in no matches.
Long version:

In _get_auth_chain_ids_using_cover_index_txn():

# Add the initial set of chains, excluding the sequence corresponding to
# initial event.
for chain_id, seq_no in event_chains.items():
chains[chain_id] = max(seq_no - 1, chains.get(chain_id, 0))

Seems to have a small logic problem, where the max() can evaluate to 0. The chains dict is passed into the following SQL block:

# We can use `execute_values` to efficiently fetch the gaps when
# using postgres.
sql = """
SELECT event_id
FROM event_auth_chains AS c, (VALUES ?) AS l(chain_id, max_seq)
WHERE
c.chain_id = l.chain_id
AND sequence_number <= max_seq
"""
rows = txn.execute_values(sql, chains.items())
results.update(r for r, in rows)

where it inhabits the max_seq variable. Inside the event_auth_chains database table, that column can never be 0.

Filter out that 0 before it gets to the SQL.

The temporary metric attached produces results that appear like this(over less than 24 hours):
auth_chain_condition_hit

The metric commit isn't particularly efficient and has no long term relevance, so it will not be included in the final render.

Pull Request Checklist

Signed-off-by: Jason Little realtyem@gmail.com

Auth chain chain_id's and sequence numbers can't be zero, but that can appear in the query here. Stop that from showing up.
@realtyem realtyem marked this pull request as ready for review October 26, 2023 10:30
@realtyem realtyem requested a review from a team as a code owner October 26, 2023 10:30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that chain IDs are never 0. They come from a DB-backed sequence:

self.event_chain_id_gen = build_sequence_generator(
db_conn,
database.engine,
get_chain_id_txn,
"event_auth_chain_id",
table="event_auth_chains",
id_column="chain_id",
)

CREATE SEQUENCE IF NOT EXISTS event_auth_chain_id;

which per https://www.postgresql.org/docs/16/sql-createsequence.html#id-1.9.3.81.6 should start at 1 and increase by 1.

Therefore querying for events in chain 0 will never return anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, your point is that the sequence_number in the chains is never 0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these are also seeded at 1 and grow by 1:

# We found a chain ID/sequence number candidate, check its
# not already taken.
proposed_new_id = existing_chain_id[0]
proposed_new_seq = existing_chain_id[1] + 1
if chain_to_max_seq_no[proposed_new_id] < proposed_new_seq:
new_chain_tuple = (
proposed_new_id,
proposed_new_seq,
)

for chain_id, seq_no in event_chains.items():
chains[chain_id] = max(seq_no - 1, chains.get(chain_id, 0))
max_sequence_result = max(seq_no - 1, chains.get(chain_id, 0))
if max_sequence_result > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if comparing to != 0 would be clearer?

Suggested change
if max_sequence_result > 0:
if max_sequence_result != 0:

for chain_id, seq_no in event_chains.items():
chains[chain_id] = max(seq_no - 1, chains.get(chain_id, 0))
max_sequence_result = max(seq_no - 1, chains.get(chain_id, 0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we understand when max_sequence_result is 0?

I think I've convinced myself that sequence numbers are strictly positive, and chains is a map to sequence numbers. Therefore we would need to have seq_no == 1 and chain_id not in chains. Is there any particular meaning to this situation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this corresponds to a brand-new chain that isn't the target of some other chain. Which sounds like the kind of thing that only happens at the start of a room? (cc @erikjohnston: is any of this sane?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically handling the case where an initial event is the first thing in the chain (and nothing else references it). This is actually pretty common, as we start a new chain whenever we see a new type / state key pair.

chains[chain_id] = max(seq_no - 1, chains.get(chain_id, 0))
max_sequence_result = max(seq_no - 1, chains.get(chain_id, 0))
if max_sequence_result > 0:
chains[chain_id] = max_sequence_result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your change, is it possible that chains can be empty after this loop? If so, will the query below fall over if we pass in an empty chains.items() to execute_values?

Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably fine, but I was careful to double-check this since event auth chain logic is important to keep watertight. I would like @erikjohnston to double-check though.

Comment on lines 305 to 306
for chain_id, seq_no in event_chains.items():
chains[chain_id] = max(seq_no - 1, chains.get(chain_id, 0))
max_sequence_result = max(seq_no - 1, chains.get(chain_id, 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd think I'd replace this change with something like:

        for chain_id, seq_no in event_chains.items():
            # Check if the initial event is the first item in the chain. If so, then
            # there is nothing new to add from this chain.
            if seq_no == 1:
                continue
                
            chains[chain_id] = max(seq_no - 1, chains.get(chain_id, 0))

Maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that is clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced logic in 94e22c2

Would you like me to lose that comment above the condition?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yeah, I think that comment is now redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 3eaeaff

Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@erikjohnston erikjohnston merged commit 460743d into matrix-org:develop Nov 22, 2023
38 checks passed
@realtyem realtyem deleted the auth-chain-exclude-retrieving-nonexistent-sequence branch November 22, 2023 12:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants