Handle Spanner Migration replaced_at records in the purge_old_records script #1538
➤ JR Conlin commented: Adding some additional context to this bug. We believe that not all records were properly migrated from AWS to GCP. An unknown number of user records were migrated by setting the user's current node to Spanner, but with the replaced_at field also set. The purge_old_records.py script deletes old user records from the database. The script determines whether a user record is "old" by querying (
There may be an edge case where a user's assigned node was set to Spanner and the replaced_at datestamp was also set. This would make all records for that user candidates for deletion. In Tokenserver, see syncstorage-rs/tokenserver-db/src/models.rs at commit b777fa0, lines 357, 402, 444 to 445, and 459 to 460.
As a result, if the most recent user record has a replaced_at value set, we may accidentally delete that user's record, even if they're already assigned to the Spanner node. Our purge script should avoid that situation. Ideally, it should also clear the spurious replaced_at on the user's most recent record.
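A minimal sketch of the guard described above, in Python since purge_old_records.py is a Python script. This is not the script's actual code: the UserRecord shape, grouping records by email alone, and ordering them by created_at are all assumptions for illustration. A replaced record is only treated as a purge candidate when a newer record exists for the same user; a newest record that still has replaced_at set is returned separately so its replaced_at can be cleared instead of the record being deleted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


# HYPOTHETICAL, simplified view of a Tokenserver user row; the real schema
# and the purge script's actual query may differ.
@dataclass(frozen=True)
class UserRecord:
    uid: int
    email: str
    nodeid: int
    created_at: datetime
    replaced_at: Optional[datetime]


def split_purge_candidates(
    records: Iterable[UserRecord],
) -> Tuple[List[UserRecord], List[UserRecord]]:
    """Return (safe_to_purge, spurious_replaced).

    A replaced record is a purge candidate only if a newer record exists for
    the same user. If the newest record itself has replaced_at set (the
    Spanner migration edge case), it is kept and reported so that its
    replaced_at can be cleared instead.
    """
    # Assumption: email identifies a user (a simplification for this sketch).
    by_user: Dict[str, List[UserRecord]] = {}
    for rec in records:
        by_user.setdefault(rec.email, []).append(rec)

    safe_to_purge: List[UserRecord] = []
    spurious: List[UserRecord] = []
    for user_records in by_user.values():
        # Newest record first; uid breaks ties between identical timestamps.
        user_records.sort(key=lambda r: (r.created_at, r.uid), reverse=True)
        newest, older = user_records[0], user_records[1:]
        if newest.replaced_at is not None:
            spurious.append(newest)
        safe_to_purge.extend(r for r in older if r.replaced_at is not None)
    return safe_to_purge, spurious
```

In this sketch the second list would feed an update that resets replaced_at rather than a delete; how the real script would issue that update is left open.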
The purge_old_records.py script was disabled around the same time (early 2020) that the syncstorage migration from AWS (MySQL-based syncstorage nodes) to GCP (a Google Cloud Spanner-based syncstorage node) began. Users were migrated to Spanner essentially by:
… replaced_at value in their user record in the database), which signals: …
The problem we've encountered with the purge_old_records script is due to:
A) All syncstorage data was copied to the Spanner database, including invalidated data pending deletion by the purge script, which was no longer running on either the MySQL or the Spanner nodes. The script was modified in 2022 to detect invalidated data copied to Spanner and issue deletes for it (the "force" option).
B) Step #2 of the migration process did not take the purge script's actions into account:
The work here is modifying the script to further analyze "replaced" records: the script's current logic assumes these records point to invalidated syncstorage data, which is not always the case.
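One way to sketch that extra analysis of "replaced" records, again under stated assumptions rather than as the script's actual logic. Each replaced record gets a verdict: the user's newest record is kept because its replaced_at is spurious, and a replaced record is also kept if a live record still resolves to the same data location. The storage_key field is hypothetical, a stand-in for whatever identifies a user's data on a syncstorage node, and treating shared data as still in use is an illustrative heuristic, not a confirmed failure mode.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Verdict(Enum):
    PURGE = "purge"              # data appears invalidated; safe to delete
    KEEP_LIVE = "keep_live"      # newest record; its replaced_at is spurious
    KEEP_SHARED = "keep_shared"  # a live record still points at the same data


@dataclass(frozen=True)
class Record:
    uid: int
    nodeid: int
    storage_key: str  # HYPOTHETICAL identifier for the user's data on a node
    created_at: int   # epoch milliseconds, used only for ordering
    replaced_at: Optional[int]


def classify_replaced(record: Record, user_records: List[Record]) -> Verdict:
    """Decide whether a replaced record really points to invalidated data.

    user_records must contain every record for the same user, including
    record itself.
    """
    if record.replaced_at is None:
        raise ValueError(f"uid {record.uid} is not a replaced record")
    newest = max(user_records, key=lambda r: (r.created_at, r.uid))
    if record.uid == newest.uid:
        # Migration edge case: the current record was marked as replaced.
        return Verdict.KEEP_LIVE
    # Records still in use: unreplaced ones plus the newest record.
    live = [r for r in user_records if r.replaced_at is None or r.uid == newest.uid]
    for other in live:
        if (other.nodeid, other.storage_key) == (record.nodeid, record.storage_key):
            return Verdict.KEEP_SHARED
    return Verdict.PURGE
```

Under this sketch, only records classified as PURGE would be passed on to the delete path, including the "force" path for data copied to Spanner.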