Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add percentage routing and migration status to user table #163

Merged
merged 7 commits into from
Dec 31, 2019

Conversation

jrconlin
Copy link
Member

This adds a few entries into the .ini:

[tokenserver]
spanner_entry  = # Name of the spanner node name e.g.  https://spanner.example.com
spanner_node_id = # default spanner node id
migrate_new_user_percentage = # percentage of users to send to spanner

note the "percentage" is a terrible hack that just sends the first n
of every 100 users to spanner.

Closes #159

This adds a few entries into the .ini:

```
[tokenserver]
spanner_entry  = # Name of the spanner node name e.g.  https://spanner.example.com
spanner_node_id = # default spanner node id
migrate_new_user_percentage = # percentage of users to send to spanner

```
*note* the "percentage" is a terrible hack that just sends the first _n_
of every 100 users to spanner.

Closes #159
@jrconlin jrconlin requested a review from rfk December 19, 2019 01:00
Copy link
Contributor

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up @jrconlin!

My r- here is specifically due to the comment below about modifying historical assignments records, which I don't think will be acceptable.

More broadly, I'm not sure I understand what you want to achieve here with the new "migration_state" column and the way you're managing the data model (which may be because the current data-model is not well explained). It would be great to get a short prose overview of the mechanics of the approach here.

It looks like you're basing the decision of whether to move the user to spanner on the autoincrement uid column. This column is not stable for a user over time - if they reset their FxA password, then the mechanics in update_user will replace their current node-assignment record with a new one with a new uid. In the process, I believe this patch would try to keep them on the spanner node, but would reset their effective migration_state column to the empty string.

In my head, a simpler version of this could look like:

  • In allocate_user:
    • Use random.randint(0, 100) to decide whether to assign to spanner of not; I don't think we need this to be based on anything other than pure probability do we?
    • Make the decision of whether to assign to spanner at the point where it currently calls get_best_node, so that the logic more clearly reads like "assign this user to spanner if they are lucky, otherwise to the best available node".
    • Pass the selected node into _CREATE_USER_RECORD and let it do its thing, rather than using a separate _MIGRATE_USER call.
  • In update_user:
    • Don't do anything special; this method already has logic for keeping a user assigned to the same node (but with a different uid) across password resets, so the choice to send them to spanner will be sticky by default.

But it's entirely possible that this will mis-behave in some cases that I have not considered.

tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/assignment/__init__.py Outdated Show resolved Hide resolved
tokenserver/tests/test_memorynode.ini Outdated Show resolved Hide resolved
tokenserver/tests/test_memorynode.py Outdated Show resolved Hide resolved
@jrconlin
Copy link
Member Author

Thanks Ryan,

Part of this is probably due to my not quite understanding all the interaction points, so I absolutely expect some confusion.

The "migration_state" field serves several roles in my mind. One, it indicates a record that has been sent over to the Spanner system. Currently, this would only be new users, and the code only calls this in the case where a user is first assigned, but as older users are eventually migrated over, there would need to be some state to indicate that this user is no longer stored in MySQL (I am guessing this is the historical state you're concerned about).

This is partly due to the fact that the migration work I was doing shows that transferring over users may result in less projected interruption time than locking a node, transferring the users, then routing them to the new service. Since the user identification information would remain the same in both mysql and spanner, and this data would be reflected in the data set here, it would be straightforward to honor any data deletion requests.

In any case, that transition state needs to live somewhere, which means either a table or an additional column. Since the column doesn't require an index, does not impact existing data, and is tied to the same index that is used for the users table, adding the column made the most sense to me at the time. Very happy to move that state info somewhere else if it makes more sense.

I chose to select users to send based on their ID partly to ease testing and to give some level of predictability about which users would be routed to the spanner system. I also agree that it's probably needlessly complicated since the uid for mysql is the row sequence number, which is not available until after the insertion, thus resulting in the immediate MIGRATE_USER call. That same call could be used later as part of the future user migration script. Still, it's probably causing more problems than it's fixing and I'll simply opt for a random generator.

@rfk
Copy link
Contributor

rfk commented Dec 19, 2019

there would need to be some state to indicate that this user is no longer stored in MySQL

Sure, but the nodeid field indicates this already, but virtue of pointing to the spanner node :-)

I chose to select users to send based on their ID partly to ease testing and to give some level of
predictability about which users would be routed to the spanner system

Another option could be to use some deterministic function of the email parameter, e.g. int(md5(email).hexdigest()[:8], 16) or similar.

This is partly due to the fact that the migration work I was doing shows that transferring over users
may result in less projected interruption time than locking a node, transferring the users, then
routing them to the new service.
Since the user identification information would remain the same in both mysql and spanner,

Ah, interesting - so is it still a possibility that we'll try to move user data over to spanner on the backend rather than requiring existing users to re-upload their data on the client?

I see the attraction of trying to keep uid stable across such a migration, but I think it's likely to cause unexpected side-effects elsewhere. I'd recommend creating a new record in the users table at the time of migration.

and this data would be reflected in the data set here, it would be straightforward to honor
any data deletion requests.

To honor data deletion requests, we'd have to remember which MySQL node they were assigned to prior to the migration, but the current version of _MIGRATE_USER overwrites that with the nodeid of the spanner node.

I'm not opposed to adding a migration_state column, just be be clear. I think it would work OK to create a new record for the user at the time of migration (marking their previous record as replaced) and set migration_state only on their current record. Then update_user could propagate that new column forward across password resets etc.

That said, I reckon we could safely defer adding it until we need to use it on existing users.

* removed `migration_state` column
* changed picker to work off of sha1 of the email (for repro)
* normalized backend settings as much as possible.
* added "universal" test
@jrconlin jrconlin requested a review from rfk December 19, 2019 22:14
Copy link
Contributor

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is shaping up nicely, thanks for splitting out the migration_state column for followup work.

My "request changes" here is because of my questions around the handling of the node id for the spanner backend, and the way it currently lives in config in a way that's entirely separate from the MySQL node ids. More details in comments below, but I wonder what you think about putting the spanner node into the db as a peer of the MySQL nodes, to reduce the possibility of configuration error.

tokenserver/assignment/memorynode.py Outdated Show resolved Hide resolved
tokenserver/assignment/__init__.py Outdated Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/tests/test_service.py Outdated Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/tests/test_service.py Outdated Show resolved Hide resolved
@pjenvey pjenvey requested a review from fzzzy December 23, 2019 18:48
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/assignment/__init__.py Outdated Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/tests/test_service.py Outdated Show resolved Hide resolved
@rfk
Copy link
Contributor

rfk commented Dec 30, 2019

Is it important that we prevent existing users from being organically node-assigned to the spanner node?

@tublitzed
Copy link
Contributor

Is it important that we prevent existing users from being organically node-assigned to the spanner node?

@rfk - We've been planning the rollout in two stages; first new users only (less risky, no data to lose), and then migrating existing users. So, ideally yes; we'd like to be able to route only new users first. I don't think it would be catastrophic if a small number of existing users were routed over, but since that's the riskiest bit here it would be good to be able to ensure we've migrated their data first to reduce the risk of them connecting on some new device that doesn't have their sync data and losing access.

Copy link
Contributor

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing through with this @jrconlin! I'd be comfortable shipping this as-is, but please also consider the potential config simplifications from #164

tokenserver/assignment/memorynode.py Outdated Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/tests/test_service.py Outdated Show resolved Hide resolved
* add #164 changes
* move spanner migration to SQL test suite
 * only check migration for the migration check to prevent future issues
@jrconlin jrconlin requested a review from rfk December 31, 2019 00:16
Copy link
Member Author

@jrconlin jrconlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dismissing @rfk 's review since I merged in #164 and want to make sure that I didn't screw more things up.

Copy link
Contributor

@rfk rfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I left some minor nits, take or leave them as you will.

tokenserver/assignment/memorynode.py Outdated Show resolved Hide resolved
tokenserver/assignment/memorynode.py Outdated Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
tokenserver/tests/assignment/test_sqlnode.py Outdated Show resolved Hide resolved
tokenserver/tests/test_service.py Outdated Show resolved Hide resolved
tokenserver/tests/test_service.py Outdated Show resolved Hide resolved
tokenserver/tests/test_sql.ini Outdated Show resolved Hide resolved
tokenserver/tests/test_memorynode.ini Outdated Show resolved Hide resolved
tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved
@jrconlin jrconlin merged commit 07ef6b6 into master Dec 31, 2019
@jrconlin jrconlin deleted the feat/159 branch January 15, 2020 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adjust routing logic for new users to allow us to point X% of them to Spanner.
3 participants