
Add migration state flag to data store #125

Closed
jrconlin opened this issue Oct 29, 2019 · 20 comments

Comments

@jrconlin
Member

For each user, we should add a state flag column in the data store to provide fine control over migration.

e.g.
"LOCAL" - User should continue using this node.
"MIGRATING" - return error putting client in pending migration state
"MIGRATED" - return 503 to force client to fetch new token and migrate
"ERROR" - An error occurred, user requires special attention.

@rfk
Contributor

rfk commented Oct 29, 2019

we should add a state flag column in the data store

Doing a db migration on these nodes scares me. Possible approaches mentioned in the meeting:

  • Store per-user state in memcached.
  • Calculate the state based on buckets, e.g. all users with uid % 100 < X are migrating.

Memcached is probably going to be the simplest option and gives the finest-grained control.
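
For comparison, a rough sketch of both approaches (assuming pymemcache as the client library and that the numeric uid is at hand; the key name and threshold are illustrative):

```python
# Rough sketch only: two schema-free ways to answer "what state is this
# user in?". Key name, default, and threshold are illustrative.
from pymemcache.client.base import Client

mc = Client(("localhost", 11211))

def state_from_memcached(uid):
    # Per-user flag written by the migration tooling, e.g. b"MIGRATING".
    value = mc.get(f"migration_state:{uid}")
    return value.decode() if value else "LOCAL"

def is_migrating_by_bucket(uid, percent_migrating):
    # Bucket approach from above: all users with uid % 100 < X are migrating.
    return uid % 100 < percent_migrating
```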

@jrconlin
Member Author

Agreed. That's why I left it open and nebulous as "data store". Memcached works just fine.

@rfk
Contributor

rfk commented Oct 30, 2019

@jrconlin just to ensure we're all on the same page re: expectations, is this something your team is happy to take on, or would you like my input on implementation details?

@mhammond

FWIW, I think MIGRATING is what would return a 5XX and MIGRATED would return a 401. I'm not sure how you see ERROR being used, but from the client's POV that should probably also return a 5XX, so maybe ERROR and MIGRATING are actually the same if you squint?

@jrconlin
Member Author

Our team can do it, but we definitely need to make sure we have all the requirements defined.

@tublitzed tublitzed added this to Prioritized in Services Engineering Oct 30, 2019
@tublitzed
Contributor

@rfk we'll take it on. It'd be great to work with your team though just to ensure we're all on the same page for requirements/approach.

@tublitzed
Contributor

@jrconlin - should this still be open since it looks like you're tackling it as part of this PR?

@jrconlin
Member Author

jrconlin commented Jan 8, 2020

Closing this.

Ops identified that Node 800 is considered the dedicated "spanner" node, and that it would suffice as an indicator of customer state for tokenserver.

@jrconlin jrconlin closed this as completed Jan 8, 2020
Services Engineering automation moved this from Prioritized to Done Jan 8, 2020
@tublitzed
Contributor

@jrconlin should we also update the related migration plan to reflect this?

@jrconlin
Member Author

jrconlin commented Jan 8, 2020 via email

@jrconlin
Member Author

jrconlin commented Jan 9, 2020

Looping back around after a cross-team meeting today.

The client cannot tolerate writing data, getting a success response back, and then having the server potentially lose that information. During migration, there is a chance (albeit very small) that a migration starts, the client attempts a write to syncstorage, and then the migration completes and the user is "moved" to the new data store. This would result in the server "losing" the previous write after telling the client that it was successfully stored. The client would be MUCH HAPPIER if the server indicated that the write failed, for whatever reason, so that the client can try again.

Database modification was ruled out as undesirable, so creating a distinct migration state flag to indicate the user's transition state is a bit harder.

Alternate options:

  1. Much like how the spanner node is reserved, use a similar "migration" node number (say 10,000 + node). The migration script sets this node value at the start. StorageServer looks for this magical node value and returns a 500, preventing client writes while the client's data is being migrated. (See the sketch after this list.)

  2. Create an empty "migrate" collection (666?). If this collection is present, fail the client as in option 1.

  3. Migrate the user, then check whether the data update times match; if not, re-migrate the user's data. (Problem: what if the client has already updated Spanner?)

  4. Migrate the user, then drop the Spanner user data if the update times do not match.

  5. ???
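
A minimal sketch of option 1, just to make the check concrete (the helper and exception are hypothetical; the 10,000 offset comes from the list above, and the check would have to run on the legacy node before every write):

```python
# Sketch of option 1 only. The 10,000 offset comes from the option above;
# the helper and exception names are hypothetical.
MIGRATION_NODE_OFFSET = 10_000  # reserved "migration" range

class MigrationInProgress(Exception):
    """Translated into a 500 response so the client retries later."""

def assert_not_migrating(assigned_node_id):
    # The migration script bumps the user's node id into the reserved range
    # when it starts copying data; writes must fail until it finishes.
    if assigned_node_id >= MIGRATION_NODE_OFFSET:
        raise MigrationInProgress(f"node {assigned_node_id} is mid-migration")
```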

@jrconlin jrconlin reopened this Jan 9, 2020
Services Engineering automation moved this from Done to Prioritized Jan 9, 2020
@mhammond

mhammond commented Jan 9, 2020

Just fleshing out some of these options:

  • much like how the spanner node is reserved, use a similar "migration" node number (say 10,000 + node). The migration script sets this node value at the start. StorageServer looks for this magical node value and returns a 500 preventing client writes while client data is being migrated.

This implies that the legacy storage servers would need to check this node value before every write.

  • Create an empty "migrate" collection (666?). If this collection is present, fail the client as step No.1

Similar to the above, but at least the read would be local. In practice, this sounds a lot like adding a "migrated" flag to the storage server rather than the tokenserver.

  • Migrate the user, then note if the data update times match, if not, remigrate user data. (Problem, what if the client has already updated spanner?)

  • Migrate the user, then drop spanner user data if update times do not match.

In both of these scenarios, we would probably just need to detect the situation and drop the spanner data entirely - ie, these users would, in effect, not be "migrated" but would end up treating this as an old-school node migration.

While this doesn't sound great, it might be OK because I'd expect that in-practice very few users would be impacted. However, I think we should treat this as the last-resort option.

  • ???

Another option we briefly discussed was that, assuming there's a brief (ie, ~1s) period where there's a possibility of things not being in sync, we could (say) return a 500 for all users on that node. I don't know enough about the server architecture to know if that makes sense or not.

It's also worth reiterating that clients cache the tokenserver token, so any solution will need to handle the fact that until that token expires or they see a 401, the clients aren't going to ask which storage server to use. IOW, from the client's POV, the perfect scenario is that:

  • from the instant the migration starts, their existing storage server returns 5XX responses.
  • at some point after the migration completes, their existing storage server returns a 401.
  • at no point between these 2 states is a 2XX seen.

@jrconlin
Member Author

jrconlin commented Jan 9, 2020

@mhammond

Similar to the above, but at least the read would be local. In practice, this sounds alot like adding a "migrated" flag to the storage server rather than the tokenserver.

Right, likewise the other option would require a cross check. They both feel messy compared with creating a new column, but would work.

Setting a flag globally on the server is a bit tricky, because it means communicating with the various threads handling the links. There is also a user meta record on the storage server that might be extended to indicate the data is being migrated.

@rfk
Contributor

rfk commented Jan 11, 2020

The client cannot tolerate the idea of writing data, getting a success state back and then
having the server potentially lose that information.
[...]
IOW, from the clients POV, the perfect scenario is that:

  • from the instant the migration starts, their existing storage server returns 5XX responses.
  • at some point after the migration completes, their existing storage server returns a 401.
  • at no point between these 2 states is a 2XX seen.

Sorry I missed the synchronous discussion, but why is it important that the client not see a 2XX here? IIUC at the end of the migration process, the user is going to be assigned to a new node, so the client will see a node-reassignment and will do a full-reconcile sync, which should recover from any lost writes.

I think it's important that we don't accept client writes while a migration is in progress, in order to let the migration work on a consistent snapshot of data, so the 5XX part seems important. But the possibility for lost writes is not clear to me here.

@jrconlin
Member Author

Ok, I get the feeling that we're exposing a much, much larger issue here.

If I understand your point correctly, when a client discovers it's been assigned to a new node, it does a full reconcile sync write. This repopulates the new destination node with data. In that case it doesn't matter what was previously stored in the node, since the client will simply write new data to it. Correct? (We had discussion about this in several internal documents.)

If that's the case, then migrating user data from existing MySQL nodes to the spanner node is effectively busy-work.

Now, if that's not quite correct, and there is some value in copying existing data from an old node to the new node, then we need to ensure that the snapshot is "clean". To that end, both nodes (old and new) should block accidental writes of sync data to prevent potential corruption or loss of data state. That could happen if a client gets a 2xx response message and updates what it thinks the server has.

The current plan is not to update the token server to indicate the new node is "available" until the data has been migrated over. Since that may take time, a client may try to update data in the interim, creating the potential for lost data.

Mind you, if the larger "migrate doesn't matter" issue is in play, then you're right.

@rfk
Contributor

rfk commented Jan 15, 2020

when a client discovers it's been assigned to a new node, it does a full reconcile sync write.
This repopulates the new destination node with data. In that case it doesn't matter what was
previously stored in the node, since the client will simply write new data to it. Correct?

I don't believe it will clobber existing data in the node, but will instead try to fetch any existing data from the new node and locally reconcile that data with its own view of the world, then upload any changes. But I'm not the most appropriate authority on client behavior here; @mhammond or @linacambridge may want to weigh in.

@linabutler
Member

I don't believe it will clobber existing data in the node, but will instead try to fetch any existing data from the new node and locally reconcile that data with its own view of the world, then upload any changes.

Correct. The way Desktop decides to do this is via the (global and per-collection) syncIDs in the meta/global record, and nothing else. As long as all the syncIDs match, it doesn't matter if the actual storage server URL is different or not; the client assumes it's talking to the same storage server.

If the IDs match, Desktop syncs as before. If they don't, it'll throw away all local Sync metadata—change counters, sync IDs, last sync times, and so on—pull down everything from the server, do a full reconcile, and upload any new changes. In most cases, that delta will be nothing, so there shouldn't be an increased write load on the server. But it is super read-heavy.

However, Desktop will wipe all collections for a user if it's missing a meta/global record, or if m/g is missing a top-level syncID. It'll also wipe a collection if it doesn't have an entry in the engines field in m/g.
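
Roughly, the decision looks like this (an illustration of the behavior described above, not Desktop's actual code; the argument shapes are assumed):

```python
# Illustration only of the Desktop behavior described above. `cached_mg`
# and `server_mg` stand for the locally cached and freshly fetched
# meta/global payloads.
def plan_sync(cached_mg, server_mg):
    if server_mg is None or "syncID" not in server_mg:
        # Missing m/g, or m/g without a top-level syncID: wipe the server.
        return "wipe-all-collections"
    if server_mg["syncID"] != cached_mg.get("syncID"):
        # New global syncID: throw away local Sync metadata, pull everything,
        # do a full reconcile, and upload any new changes.
        return "full-reconcile"
    # Same syncIDs: sync as before, regardless of the storage server URL.
    # (A collection with no entry in m/g's `engines` field gets wiped.)
    return "sync-as-before"
```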

So I think there are a couple of approaches we can take:

  1. What @mhammond suggested in #125 (comment). When we're ready to migrate, we have the old node return a 5xx status code. Once migration is done, the old node returns a 401, and the client syncs with the new Spanner server. This approach lets us migrate all of a user's collections unmodified, including m/g. It will not trigger a full reconcile on the next sync, which means other clients might miss partial writes.
  2. If we want the client to do a full reconcile while keeping the server data, we'll need to insert a fake m/g record for the migrated user, with new syncIDs. We can't not migrate m/g, because that will make the client wipe the server.

In both cases, it's possible we'll do the migration just as a client is syncing. Since there's no way for the client to tell the server "hey, I'm syncing right now", and batch uploads aren't atomic across collections (and might even be split up for large collections), a client could successfully commit some writes, then get a 5xx. In this case, the client won't fast-forward its last sync time or update any Sync metadata. So we'll leave a partial write on the old server, which the client can fix up after the migration. But that gets tricky—if the user has multiple devices, and another one syncs with Spanner first, it'll see the result of the partial write, and try to fix it up before the interrupted device syncs again.

Option (2) is better for working around this because it will make every client download and re-apply the partial state. It might be wrong, but we're guaranteed not to lose those partial writes, and it'll be consistently wrong everywhere (and the user can fix it up).
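
To make option (2) a bit more concrete, a rough sketch of rewriting the migrated user's m/g payload with fresh syncIDs (the ID shape matches what Sync clients generate; the helpers and how the migration tooling would apply this are assumptions):

```python
# Sketch of option (2) only: clone the user's existing meta/global payload
# but swap in fresh syncIDs so every client does a full reconcile. How the
# migration tooling would actually write this record is not decided here.
import base64, os

def new_sync_id():
    # 12-character URL-safe ID, the same shape Sync clients generate.
    return base64.urlsafe_b64encode(os.urandom(9)).decode()

def with_new_sync_ids(meta_global_payload):
    mg = dict(meta_global_payload)
    mg["syncID"] = new_sync_id()                       # global syncID
    mg["engines"] = {name: dict(info, syncID=new_sync_id())
                     for name, info in mg.get("engines", {}).items()}
    return mg
```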

@rfk
Contributor

rfk commented Jan 16, 2020

As long as all the syncIDs match, it doesn't matter if the actual storage server URL is different
or not; the client assumes it's talking to the same storage server.

This was not in my mental model of the system, thanks for clarifying!

@pjenvey
Member

pjenvey commented Jan 16, 2020

But that gets tricky—if the user has multiple devices, and another one syncs with Spanner first, it'll see the result of the partial write, and try to fix it up before the interrupted device syncs again.

What's the end result of this situation?

@tublitzed
Contributor

Closing as wontfix based on the recent change of plans to migrate per node rather than per user.

Services Engineering automation moved this from Prioritized to Done Feb 27, 2020
@tublitzed tublitzed moved this from Done to Archived in Services Engineering Mar 2, 2020