Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Users table has all users set to in_guild=False on user upsert #157

Closed
jchristgit opened this issue Jun 18, 2024 · 9 comments · Fixed by #165
Closed

Users table has all users set to in_guild=False on user upsert #157

jchristgit opened this issue Jun 18, 2024 · 9 comments · Fixed by #165

Comments

@jchristgit
Copy link
Member

When metricity starts up and synchronizes the users, all users temporarily have
their in_guild field set to false.

From looking at the code I am currently unsure how this can happen. As far as I
understand, the complete upsert is run in a transaction that is committed at
the end of each individual insert.

Demonstration as taken during a metricity restart (whilst User upsert:
messages can be seen in the logs):

metricity=# SELECT in_guild, COUNT(*) FROM users GROUP BY in_guild;  -- During sync
 in_guild | count
----------+--------
 f        | 884763
(1 row)

metricity=# SELECT in_guild, COUNT(*) FROM users GROUP BY in_guild;  -- After sync
 in_guild | count
----------+--------
 f        | 494638
 t        | 390125
(2 rows)
@wookie184
Copy link
Contributor

Looks like it's currently written to set all to not in the guild, then set those in the guild to true after

await sess.execute(update(models.User).values(in_guild=False))

@jchristgit
Copy link
Member Author

jchristgit commented Jun 18, 2024 via email

ChrisLovering added a commit that referenced this issue Jun 18, 2024
Closes #157

Previously we set all users in_guild to False, and relied on users being set back to in_guild when iterating through guild.members

However, this caused two problems
1. For a short window a users in_guild status was incorrect
2. It required an update for all users in_guild to be sent to postgres to update in_guild back to True.

This diff changes that, so instead only users who are not found in the guild have in_guild set to False
@ChrisLovering
Copy link
Member

ChrisLovering commented Jun 18, 2024

#159 seems like a way we could fix this. I haven't been able to test that postgres is ok about serialising that many strings in a single query though.

@ChrisLovering
Copy link
Member

sqlalchemy.exc.InterfaceError: (sqlalchemy.dialects.postgresql.asyncpg.InterfaceError) <class 'asyncpg.exceptions._base.InterfaceError'>: the number of query arguments cannot exceed 32767 will need to relook at this approach

@ChrisLovering
Copy link
Member

ChrisLovering commented Jun 18, 2024

My only thoughts on how to make the above PR work would be to create a TEMPORARY table with ON COMMIT DROP and somehow bulk insert all in-guild users into it.

Can't look into how to achieve that with SQLAlchemy right now though.

@wookie184
Copy link
Contributor

It might be easiest to select all in guild users from the database and then do db_in_guild - guild.members and mark them as not in guild with another query.

It's not quite as nice as doing it all on the database but in practice I don't think it makes a difference for our case.

@jchristgit
Copy link
Member Author

It might be easiest to select all in guild users from the database and then
do db_in_guild - guild.members and mark them as not in guild with another
query.

Won't this run into the same parameter issue?

I have an alternative idea, although it might be a bit crazy:

First we sort the list of user IDs to update ascending. Then we move the data in chunks, with special handling for the first and last chunk to incorporate an update for anything outside our known range as well.

Pseudocode (unsure of the exact SQLAlchemy syntax):

async with async_session() as sess:
    total = len(in_guild_user_id_chunks)
    for idx, chunk in enumerate(in_guild_user_id_chunks):
        not_user_in_chunk = ~models.User.id.in_(chunk)
        user_below_upper_bound = models.User.id <= max(chunk)
        user_above_lower_bound = models.User.id >= min(chunk)

        if idx == 0:
            # Update the complete lower end we are aware of
            condition = (user_below_upper_bound & not_user_in_chunk)
        elif idx == total - 1:
            # Update the complete upper end we are aware of
            condition = (user_above_lower_bound & not_user_in_chunk)
        else:
            # Just update this batch
            condition = (user_below_upper_bound & user_above_lower_bound & not_user_in_chunk)

        await sess.execute(update(models.User).where(condition).values(in_guild=False))
        await sess.commit()

Things to consider:

  • What happens if users join the server during the process?
    • Their in_guild value might be incorrectly set to False
    • We could guard against this by write locking the table at the start of the
      transaction (would have to do a single big transaction then)
  • What happens if users leave the server during the process?
    • Nothing of note, because at worst their in_guild value won't be set to
      False, which is a no-op.

@wookie184
Copy link
Contributor

My suggestion was to do the comparison against guild.members on the Python side rather than the database site, so basically what we currently do but only change the users we need to. Something like this:

        async with async_session() as sess:
            in_guild_users = await sess.execute(select(models.User).filter_by(in_guild=True)).scalars().all()
            guild_member_ids = {member.id for member in guild.members}

            for user in in_guild_users:
                if user.id not in guild_member_ids:
                    user.in_guild = False

            await sess.commit()

@jchristgit
Copy link
Member Author

Can confirm it works as expected now, no more dips are visible in statistics.

Thanks for the fix!

users

jchristgit added a commit to python-discord/infra that referenced this issue Jul 8, 2024
jchristgit added a commit to python-discord/infra that referenced this issue Jul 22, 2024
jchristgit added a commit to python-discord/infra that referenced this issue Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants