Skip to content
This repository has been archived by the owner. It is now read-only.

Data model refactor #973

Closed
rfk opened this issue Jul 2, 2015 · 11 comments
Closed

Data model refactor #973

rfk opened this issue Jul 2, 2015 · 11 comments

Comments

@rfk
Copy link
Member

@rfk rfk commented Jul 2, 2015

We've got a fair backlog of refactorings that we'd like to apply to the core account data model. In service of #957, let's bite the bullet and rejigger things. Below is what @dannycoates, @philbooth and I came up with the other day.

The current data model has all the account data in one big table like this:

+-----------------+---------------------+------+-----+---------+-------+
| Field           | Type                | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| uid             | binary(16)          | NO   | PRI | NULL    |       |
| normalizedEmail | varchar(255)        | NO   | UNI | NULL    |       |
| email           | varchar(255)        | NO   |     | NULL    |       |
| emailCode       | binary(16)          | NO   |     | NULL    |       |
| emailVerified   | tinyint(1)          | NO   |     | 0       |       |
| kA              | binary(32)          | NO   |     | NULL    |       |
| wrapWrapKb      | binary(32)          | NO   |     | NULL    |       |
| authSalt        | binary(32)          | NO   |     | NULL    |       |
| verifyHash      | binary(32)          | NO   |     | NULL    |       |
| verifierVersion | tinyint(3) unsigned | NO   |     | NULL    |       |
| verifierSetAt   | bigint(20) unsigned | NO   |     | NULL    |       |
| createdAt       | bigint(20) unsigned | NO   |     | NULL    |       |
| locale          | varchar(255)        | YES  |     | NULL    |       |
| lockedAt        | bigint(20) unsigned | YES  |     | NULL    |       |

Key ideas for refactoring:

  • pull out email addresses and their related metadata into a separte table, many-to-one related to the account
  • pull out password authentication data into a separate table, in a at-most-one-to-one relationship with the account; federated login uers may not have password data
  • add a new table indicating any federated login details that are associated with the account, also at-most-one-to-one with the account

So we'll wind up with a much slimmer accounts table:

+-----------------+---------------------+------+-----+---------+-------+
| Field           | Type                | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| uid             | binary(16)          | NO   | PRI | NULL    |       |
| kA              | binary(32)          | NO   |     | NULL    |       |
| createdAt       | bigint(20) unsigned | NO   |     | NULL    |       |
| locale          | varchar(255)        | YES  |     | NULL    |       |
| lockedAt        | bigint(20) unsigned | YES  |     | NULL    |       |

An "email" table that breaks out each address as its own thing, keyed by normalized address and mapping to a particular uid:

+-----------------+---------------------+------+-----+---------+-------+
| Field           | Type                | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| normalizedEmail | varchar(255)        | NO   | PRI | NULL    |       |
| email           | varchar(255)        | NO   |     | NULL    |       |
| uid             | binary(16)          | NO   |     | NULL    |       |
| primary         | tinyint(1)          | NO   |     | NULL    |       |
| emailCode       | binary(16)          | NO   |     | NULL    |       |
| emailState      | tinyint(5)          | NO   |     | 0       |       |
| createdAt       | bigint(20) unsigned | NO   |     | NULL    |       |

The "emailState" field here would allow us to represent states other than "unverified" and "verified", such as "bounced". This will probably evolve as we add more features in the multi-email case, like managing communication preferences.

Obviously we need to be able to look up the email addresses associated with a particular account:

+----------+------------+-----------------+--------------+-----------------+------------+
| Table    | Non_unique | Key_name        | Seq_in_index | Column_name     | Index_type | 
+----------+------------+-----------------+--------------+-----------------+------------+
| emails   |          0 | byUserID        |            1 | uid             | BTREE      |

We also create a separate "passwordAuth" table for the login-related data:

+-----------------+---------------------+------+-----+---------+-------+
| Field           | Type                | Null | Key | Default | Extra |
+-----------------+---------------------+------+-----+---------+-------+
| uid             | binary(16)          | NO   | PRI | NULL    |       |
| authEmail       | varchar(255)        | NO   |     | NULL    |       |
| wrapWrapKb      | binary(32)          | NO   |     | NULL    |       |
| authSalt        | binary(32)          | NO   |     | NULL    |       |
| verifyHash      | binary(32)          | NO   |     | NULL    |       |
| verifierVersion | tinyint(3) unsigned | NO   |     | NULL    |       |
| verifierSetAt   | bigint(20) unsigned | NO   |     | NULL    |       |

Of note here, we're explicitly storing the "authEmail" that should be used for salting the authPW calculation, but you can't use it to look up the row. This table is indexed by uid and the authEmail may not match the primary email on the account. To do a login based on email/password we'd need to:

  • SELECT by normalizeEmail from emails table, and join onto passwordAuth table via uid
  • if the password doesn't match, return "authEmail" to the client as part of the error response, so they can re-calculate authPW and try again

Also note that everything related to the password goes in this table, including wrap(wrap(kB)). Users without a password do not have a kB.

For a future federated-login world, we'd have a "federatedAuth" table to track what accounts came from what identity providers. The details of such a table are out of scope for this refactor.

@dannycoates @philbooth f?

How did I do capturing the discussion from the other day? Anything to add?


This is all fine and well from a schema-design point of view, but I suspect we might hit a few bumps when we get around to implementing it. For example, many of our queries rely on being able to slurp in a bunch of account data when you look up a sessionToken. But I suspect the only way to find the rough edges here will be to try it out.

If the above captures most of what we wanted in the refactor, I'll go ahead and try to break it down into smaller incremental tasks.

@dannycoates
Copy link
Member

@dannycoates dannycoates commented Jul 13, 2015

I'm finally looking at this...

@dannycoates
Copy link
Member

@dannycoates dannycoates commented Jul 13, 2015

I think this looks great relationally. The email records have a lot of mutable state (they do represent a state machine after all), but I like that the passwordAuth records can be immutable.

I think we should change authEmailpasswordSalt to unhitch that implementation detail from its intended use.

I'm sure we'll need to adjust it as we go, but I like this a lot 👍

@chilts
Copy link
Contributor

@chilts chilts commented Jul 14, 2015

if the password doesn't match, return "authEmail" to the client as part of the error response ...

Does this mean that I could try hitting the API with an email address of someone I know and gaining knowledge of another email address they might use? e.g. if I tried your Mozilla address but you'd signed up with your personal one (and added your Mozilla address later).

@rfk
Copy link
Member Author

@rfk rfk commented Jul 14, 2015

Does this mean that I could try hitting the API with an email address of someone
I know and gaining knowledge of another email address they might use?

This is a very good point. We could mitigate this by only allowing login with the primary email, and trying really hard to update authEmail whenever we discover that it's mis-matched. Or if we've come this far, we could try migrating to e.g. a hash of the email as salt rather than the raw email.

@rfk
Copy link
Member Author

@rfk rfk commented Jul 14, 2015

Taking this bug back, next step is for me to break it down into smaller implementation tasks.

@rfk rfk assigned rfk and unassigned chilts Jul 14, 2015
@dannycoates
Copy link
Member

@dannycoates dannycoates commented Jul 15, 2015

Does this mean that I could try hitting the API with an email address of someone I know and gaining knowledge of another email address they might use?

Depending on how we implement it I think the email change flow will cause passwordSalt and email to not be equal for a very brief window (between change and next successful login), or always equal. That window should be a factor in how we design the flow.

@rfk
Copy link
Member Author

@rfk rfk commented Jul 16, 2015

To be explicit: we still only plan to allow login with your "primary" email address. So you couldn't use a non-primary email to hit this API and discover the primary one, but there might be a brief window wherein you can hit it with the new primary email and discovery what the old one was.

@rfk
Copy link
Member Author

@rfk rfk commented Aug 2, 2015

@dannycoates says he can proceed with the federated-login stuff without us having to land any parts of this refactor, so I'm putting it back in the backlog for a little while. It will come up again as we move on device-management user stories.

@rfk rfk removed their assignment Aug 2, 2015
@dannycoates
Copy link
Member

@dannycoates dannycoates commented Aug 2, 2015

he can proceed with the federated-login stuff without us having to land any parts of this refactor

Specifically, I can easily land "phase 1" of federated-login now. Later stages, whoops flow, key options, and others will be made vastly easier with a refactor :)

@rfk
Copy link
Member Author

@rfk rfk commented Sep 21, 2015

ref #721 which describes some additional API changes we can make to reflect the status of an email.

@rfk
Copy link
Member Author

@rfk rfk commented Feb 27, 2017

We're doing parts of this with the "add recovery email" feature, so I don't think we need to keep this old issue open. We can still refer back to it if need be.

@rfk rfk closed this Feb 27, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants