Skip to content

Conversation

@ehfeng
Copy link
Contributor

@ehfeng ehfeng commented Aug 17, 2017

Because emails are not distinct per User, we're using the Email to store email-specific settings (email subscription state). UserEmail will represent verification that a given user "owns" an email address.

Alternatively, UserEmail.email could just be a key for Email.id, where the id value is the email string. The value for this seems minimal (UserEmail.email is never updated in place, so the chance of UserEmail.email != UserEmail.global_email.email is low), so I've done the more naive approach.


user = FlexibleForeignKey(settings.AUTH_USER_MODEL, related_name='emails')
email = models.EmailField(_('email address'))
global_email = FlexibleForeignKey('sentry.Email', related_name='useremails', null=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need this since you can just index the email string and query on it

Copy link
Contributor

@mattrobenolt mattrobenolt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need to ship the data migration after the schema migration, otherwise you'll end up with some rows that are still NULL.

It should also be considered if we care about case sensitivity here or not. I feel like we should always lowercase the value to prevent unintended duplicates due to case differences. Matt@example.com vs matt@example.com

And, what is going to keep this up to date for new rows? I don't see any code that's adding new Email rows except for the migration.

for useremail in RangeQuerySetWrapperWithProgressBar(queryset):
email, _ = Email.objects.get_or_create(email=useremail.email)
useremail.global_email = email
useremail.save()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't do this .save() Do the UserEmail.objects.filter(id=useremail.id).update(global_email=email) pattern instead to avoid race condition.

@ehfeng
Copy link
Contributor Author

ehfeng commented Aug 18, 2017

@mattrobenolt good point.

I'm looking at receivers/useremail.py and I'm assuming this is a good case for having a receiver on post_save. Especially as changes to User and UserEmail are generally user-triggered, so adding latency isn't bad?

@ehfeng
Copy link
Contributor Author

ehfeng commented Aug 21, 2017

@mattrobenolt I could modify it such that the host is case insensitive, but the username part of email can be case-sensitive. Obviously, Gmail, Yahoo, and Outlook make the username case-insensitive, so we could be the same, but email standard itself mandates case sensitivity.

I'm not sure if this would be a security issue—I'll defer to whatever you decide.

@mattrobenolt
Copy link
Contributor

Yeah, internally @tkaemming brought this up. But our own behavior for normal User.email is to treat it as case-insensitive and we use this for logins. My concern is that we don't 100% handle this everywhere correctly since it requires explicit action to handle this behavior.

If we have control over this, we can just apply a functional index that would handle uniqueness as case insensitive to enforce it correctly.

@dcramer
Copy link
Member

dcramer commented Aug 21, 2017

Can someone pull data on if we have case duplicated emails? if not lets move towards case insensitive.

@dcramer
Copy link
Member

dcramer commented Aug 21, 2017

Let's do a few things:

  1. Email.email is unique
  2. Email.email is stored in LOWERCASE at all times (i dont know if this is easy to do via a custom field?)
  3. UserEmail has a unique constraint on (user_id, email)
  4. UserEmail.email is stored same as Email.email (always lowercase)
  5. Migrate all existing emails in UserEmail to be lowercase

@mattrobenolt
Copy link
Contributor

Email.email is stored in LOWERCASE at all times (i dont know if this is easy to do via a custom field?)

I was going to do this with a functional index to enforce it on the database. Something like, CREATE UNIQUE INDEX ... (lower(email)) since we run custom SQL anyways for indexes.

@dcramer
Copy link
Member

dcramer commented Aug 21, 2017

@mattrobenolt is there a good reason to not store things lowercase? its a lot easier to be compatible if we do that

@mattrobenolt
Copy link
Contributor

But yeah, we can do a field as well, I think having the index enforce it makes it more explicit on the db side.

I think my concerns are how we have to deal with User.email today. We have to always make sure we do a case insensitive filter, etc. Which may be error prone.

I don't think a field would help with this case, would it? For example, doing Email.objects.filter(email='FOO@EXAMPLE.COM') vs Email.objects.filter(email__iexact='FOO@EXAMPLE.COM') Not sure if a custom field could coerce this behavior automatically? I know the db index would.

@dcramer
Copy link
Member

dcramer commented Aug 21, 2017

An index wouldn't coerce a filter clause, but a field could. We should see if we can do that, though someone from @getsentry/platform might need to help here.

I think doing the unique on lower() would be useful to ensure the guarantee, but we should do both.

@ehfeng
Copy link
Contributor Author

ehfeng commented Aug 21, 2017

We can't assume all email servers are case insensitive. If example.com's email server is case-sensitive, then someone with the address UserName@example.com would literally never be able to get our emails.

I prefer Matt's suggestion of enforcing in the index, but storing the case-sensitive value.

@dcramer
Copy link
Member

dcramer commented Aug 21, 2017

I'd prefer a case insensitive text field, but given we dont have that, I'm ok with saiyng "we will not support case sensitive email servers".

@ehfeng
Copy link
Contributor Author

ehfeng commented Aug 21, 2017

Adding this as reference:

While the above definition for Local-part is relatively permissive, for maximum interoperability, a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form or where the Local-part is case-sensitive.

Therefore, it's probably safe to assume case-insensitivity.

@dcramer
Copy link
Member

dcramer commented Aug 21, 2017

Per talk internally, let's just make it a 'citext' field for postgres, and assume its case insensitive.

In MySQL we might be able to use a case insensitive collation, but im not sure if Django lets us provide that. Ensuring correctness for SQLite would be nice, but i'm not sure we have an option.

Copy link
Contributor

@mattrobenolt mattrobenolt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to move the backfill to another deploy. The problem is there will be a race condition. We'll deploy the new model, run the backfill, then between running the backfill and new code wtih the signal being live, we'll miss data.

"""
__core__ = True

email = CIEmailField(_('email address'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need unique=True here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. Added.

@ehfeng
Copy link
Contributor Author

ehfeng commented Aug 25, 2017

Removed backfill

@ehfeng
Copy link
Contributor Author

ehfeng commented Aug 25, 2017

@mattrobenolt We need to create the citext extension within the test database. Can you help with this?

@ghost
Copy link

ghost commented Sep 7, 2017

2 Warnings
⚠️ Changes require @getsentry/security sign-off
⚠️ PR includes migrations

Security concerns found

  • src/sentry/models/email.py
  • src/sentry/receivers/email.py

Migration Checklist

  • new columns need to be nullable (unless table is new)
  • migration with any new index needs to be done concurrently
  • data migrations should not be done inside a transaction
  • before merging, check to make sure there aren't conflicting migration ids

Generated by 🚫 danger

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a date_added field here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@dcramer
Copy link
Member

dcramer commented Sep 7, 2017

this is actually good to go at this point right @ehfeng? (i didnt check tests)

@mattrobenolt
Copy link
Contributor

I gotta take over and figure out what's causing citext to fail on some of the tests when I get a few spare minutes to poke at it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code should probably happen after fix_south is called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. Unless you mean immediately after. But either way, I'm gonna see what makes things happy here. This was me just guessing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean immediately after, but not sure if it will actually cause issues by keeping it below. Theres sort of a "things should be in an ok state at this point", and I'd just suggest bundling the "database should be ready" parts.

@dcramer
Copy link
Member

dcramer commented Sep 7, 2017

i'll take this over and figure out why tests are broken real quick

@dcramer
Copy link
Member

dcramer commented Sep 7, 2017

pushed a fix for citext extension

@ehfeng
Copy link
Contributor Author

ehfeng commented Sep 7, 2017

@dcramer not sure why it's not working, but it looks like things passed: https://travis-ci.org/getsentry/sentry/builds/273059785
Merge?

@ehfeng ehfeng force-pushed the email-model branch 2 times, most recently from be8d53c to 5120b0a Compare September 7, 2017 23:21
@ehfeng
Copy link
Contributor Author

ehfeng commented Sep 7, 2017

It looks like master is broken right now. I'm going to wait until it's green, rebase, and see what happens. The current failing test seems to have appeared in 09883f6

@mattrobenolt
Copy link
Contributor

You'll need to regenerate migration for new id since it's out of date from current master.

@ehfeng
Copy link
Contributor Author

ehfeng commented Sep 8, 2017

Regenerated.

Adding Email model
Adding post_save triggers on UserEmail to fill in Email
Adding on_delete signal
adding CIText to testing setup
@ehfeng ehfeng merged commit 54cb964 into master Sep 8, 2017
@ehfeng ehfeng deleted the email-model branch September 8, 2017 18:23
__core__ = True

email = CIEmailField(_('email address'), unique=True)
date_added = models.DateTimeField(default=timezone.now)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't need an index on this now, we'll probably need to add one later if we try to do anything with this table regarding cleanup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to punt on this because I'm unsure when we'd need to do such a cleanup.



def delete_email(instance, **kwargs):
if UserEmail.objects.filter(email=instance.email).exists():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to note here, that this might be slightly inaccurate due to this not being a case insensitive lookup. You should do email__iexact. Because Email is case insensitive but UserEmail is, you'll get false positives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point: #6070

@ehfeng ehfeng restored the email-model branch September 8, 2017 20:08
@ehfeng ehfeng deleted the email-model branch September 8, 2017 20:09
@ehfeng ehfeng mentioned this pull request Sep 8, 2017
@github-actions github-actions bot locked and limited conversation to collaborators Dec 22, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants