Skip to content

updated email_regexp and added test cases#4001

Merged
ulissesalmeida merged 4 commits intoheartcombo:masterfrom
kimgb:email_regexp
Apr 26, 2016
Merged

updated email_regexp and added test cases#4001
ulissesalmeida merged 4 commits intoheartcombo:masterfrom
kimgb:email_regexp

Conversation

@kimgb
Copy link
Copy Markdown
Contributor

@kimgb kimgb commented Mar 20, 2016

updated Devise::email_regexp to meet existing test cases plus valid emails "test@tt" & "test@valid---domain.com", non valid emails "test_user@example..server" & "test_user@example-.server".

@kimgb
Copy link
Copy Markdown
Contributor Author

kimgb commented Mar 21, 2016

Having sent the PR - I've solved one particular issue relating to a false negative, but I wonder that an increasingly complex validation is wise? Domains and emails are not simple validations, and I'd worry there are other false negatives waiting to happen. I wonder if it wouldn't be better to fall back to a simple regexp e.g. /\A[^@]+@[^@]+\z/ that can be trusted not to register false negatives? Users can always plug in stronger validation from elsewhere if they have a need.

Alternatively, more email test cases of the "valid, but unusual" kind.

Just a thought.

@ulissesalmeida
Copy link
Copy Markdown
Contributor

@kimgb Thank you for your thoughts. I think it is more frustrating a user with a valid e-mail that can't sign up than a user with non-valid e-mail sign up.

I'll summon a help of @britto . What do you think about these e-mail regexes?

@britto
Copy link
Copy Markdown
Contributor

britto commented Apr 8, 2016

As @ulissesalmeida pointed out, the general thought behind the default email validation is to be as permissive as possible, while catching some obvious cases of invalid email addresses. To keep it really minimal, I would suggest applying only 4 simple rules:

  1. No spaces allowed.
  2. No consecutive dots allowed.
  3. There must be exactly one @ character, and it must be surrounded.
  4. It must end with an alphanumerical character, even if it's an accented letter.

The regex below implements these rules:

/(?!.*\.\.)(?=\A[^@\s]+@[^@\s]+\z).*[[:alnum:]]\z/

That is the most permissive validation I can think of. Its rules are almost compatible with the current regex, except for a few special cases.

Scenarios where it gets more restrictive:

  1. Emails can no longer end with _. This is currently allowed because [^@\W]+ is equivalent to \w+, which includes underscore.
  2. Consecutive dots are no longer allowed anywhere. So emails like ..@a...b are now invalid.

Scenarios where it gets more permissive:

  1. Emails with no dots after the @ sign are now allowed. So a@b is now valid.
  2. Emails can now end with non-ascii letters. So a@b.ç is now valid.

@britto
Copy link
Copy Markdown
Contributor

britto commented Apr 8, 2016

If we want to keep backwards compatibility, we should not add new restrictions. Adding no new restrictions ensures we won't break existing apps on upgrade. In order to do that, we need to drop rules 2 and 4. Reducing the proposed regex to just:

/\A[^@\s]+@[^@\s]+\z/

It drops 2 restrictions from the current ruleset:

  1. No spaces allowed.
  2. There must be exactly one @ character, and it must be surrounded.
  3. There must be at least one subdomain.
  4. The TLD must be composed of word characters only.

@kimgb
Copy link
Copy Markdown
Contributor Author

kimgb commented Apr 8, 2016

I've since done more reading on email localparts and it doesn't get any easier. While highly unusual, localparts do permit a quoted string that can include a range of characters normally disallowed - including spaces, consecutive dots and @. E.g. if I'm remembering right, "a ..b@c"@d is a valid email. See: isemail.info/"a ..b@c"@d

Supporting this would mean dropping/rewriting the 2 remaining restrictions as listed by @britto above, to something like /\A.+@[^@\s]+\z/. I don't know to what extent, if at all, such unusual emails might appear in the wild - certainly never in my experience, but that doesn't tell you much.

@rafaelfranca
Copy link
Copy Markdown
Collaborator

If backwards compatibility is important why we don't change only the configured email regexp in the generated configuration? https://github.com/plataformatec/devise/blob/b397d33246ae3dbc725f681a3f718a20d34313af/lib/generators/templates/devise.rb#L156

That way new applications will have a better default, existing applications don't need to worry about breaking compatibility by mistake only because the library was upgraded and how want to break compatibility can just use the install generator to use the new regexp.

@britto
Copy link
Copy Markdown
Contributor

britto commented Apr 10, 2016

@kimgb @rafaelfranca Good points. Actually, I feel the default regexp should be the same as the generator regexp. Like any other default, for that matter. Currently only a few generated configs differ from their default values, such as allow_unconfirmed_access_for, confirm_within, paranoid and password_length.

@ulissesalmeida
Copy link
Copy Markdown
Contributor

@kimgb I think we can follow @rafaelfranca idea. We should release first a deprecation warning about and make the configuration explicit.

  • You can change the template in this file. We can update the documentation and uncomment the config with new default.
  • We can warn the default regex will change after the setup when the email regex is not set. Is it the best place?

I think we can use @britto regex. We should no worry about quoted e-mails now.

What do you think?

@ulissesalmeida
Copy link
Copy Markdown
Contributor

Issue related: #3997

@kimgb
Copy link
Copy Markdown
Contributor Author

kimgb commented Apr 13, 2016

OK - I've reverted to the older, more permissive regex /\A[^@\s]+@[^@\s]+\z/. Wise move, I think. The warning is handled in #4036 if I followed that right?

@ulissesalmeida
Copy link
Copy Markdown
Contributor

@kimgb Yes! Thank you @kimgb, it looks good. It will be released on Devie 4.1.

@ulissesalmeida
Copy link
Copy Markdown
Contributor

@kimgb After the release of 4.0.0 your branch now has conflicts with the master branch. Could you please rebase it? I think you also can remove together the deprecation message about email_regex.

Best

@ulissesalmeida
Copy link
Copy Markdown
Contributor

@kimgb I think something happened after your rebase, the PR still have conflicts and there's commits that is already on master.

@kimgb
Copy link
Copy Markdown
Contributor Author

kimgb commented Apr 21, 2016

Yes, definitely no idea how to use rebase! Tidied up some, but still conflicts. I think maybe I misunderstood the part about removing deprecation warnings?

@kimgb
Copy link
Copy Markdown
Contributor Author

kimgb commented Apr 22, 2016

I was using rebase fine, just needed to get past my aversion to rewriting history. I removed deprecation warnings, but only about email_regexp. Is that right?

@ulissesalmeida
Copy link
Copy Markdown
Contributor

@kimgb Yep, totally right. Now it looks very good. Thanks! \o/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants