# Email validator

Create an email address validator to ensure the addresses in the database are valid.

An email address consists of a local part, the symbol `@`, and a domain name. The validation is therefore done in three steps, each using RegEx patterns:
1. Check if the email address has a valid base format, i.e., `localpart@domainname`.
2. Check the local part format.
3. Check the domain name format.
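
The three steps above can be sketched as follows. This is only an illustration under stated assumptions, not the code in `src/validator.py`: the helper name `check_address` and the exact patterns are hypothetical, and the sketch rejects quoted local parts and bracketed IP domains.

```python
import re

# Step 1: exactly one "@" between a non-empty local part and a non-empty domain.
BASE_RE = re.compile(r"([^@]+)@([^@]+)")

# Step 2: unquoted local part made of dot-separated atoms, so a dot can be
# neither the first nor the last character and cannot appear twice in a row.
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
LOCAL_RE = re.compile(rf"{ATOM}(?:\.{ATOM})*")

# Step 3: host name made of dot-separated labels (letters, digits, hyphens);
# a label cannot start or end with a hyphen.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DOMAIN_RE = re.compile(rf"{LABEL}(?:\.{LABEL})*")


def check_address(address):
    """Return True if `address` passes the three checks (hypothetical helper)."""
    base = BASE_RE.fullmatch(address)
    if base is None:
        return False
    local_part, domain = base.groups()
    # The local part may be at most 64 characters long.
    if len(local_part) > 64 or LOCAL_RE.fullmatch(local_part) is None:
        return False
    return DOMAIN_RE.fullmatch(domain) is not None


print(check_address("simple@example.com"))    # True
print(check_address("Abc.example.com"))       # False, no @ character
print(check_address("jsmith@[192.168.2.1]"))  # False, IP domain
```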

Because the vast majority of users have an unquoted address, the validator does not accept addresses with a quoted local part. For more information about the formatting of the local part, see the [Local-part section](https://en.wikipedia.org/wiki/Email_address#Local-part) of the Wikipedia article on email addresses. In addition, as IP-address domains are extremely rare except in email spam, the validator does not accept them either. For more information about the domain formatting, see the [Domain section](https://en.wikipedia.org/wiki/Email_address#Domain) of the same article.

Of course, using RegEx patterns I can only verify that the email address is syntactically correct, not whether it was mistyped or whether it actually exists (which would require checking the SMTP server). For a more complete validator that performs these extra verifications, one might use the third-party `Python` library `validate-email` (see its GitHub [repo](https://github.com/syrusakbary/validate_email)).

The first code cell below imports `EmailValidator`, a `Python` class designed to check whether an email address is valid. To use it, one only needs to instantiate it and call its convenience method `is_email_valid`, which returns a boolean indicating whether the input email is valid. If `is_email_valid` returns `False`, it also prints a message explaining why. See the following example:
```python
validator = EmailValidator()
validator.is_email_valid('user@example.com')
# output: True
```

For a faster (and less exhaustive) validation, one can use the class method `fast_validation`, which also returns a boolean according to the validity of the input address but prints no messages and raises no exceptions. See the following example:
```python
EmailValidator.fast_validation('userexample.com') # notice there is no need to instantiate the class
# output: False
```
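One plausible way to get such a fast path is to match the whole address against a single precompiled pattern stored on the class. The sketch below assumes exactly that. The class name `FastValidatorSketch` and the pattern are hypothetical and are not taken from `EmailValidator`.

```python
import re

_ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"


class FastValidatorSketch:
    """Hypothetical stand-in showing a single-pattern validation."""

    # Compiled once at class creation, so each call only runs the match.
    # The lookahead limits the local part to 64 characters.
    _PATTERN = re.compile(
        rf"(?=[^@]{{1,64}}@){_ATOM}(?:\.{_ATOM})*@{_LABEL}(?:\.{_LABEL})*"
    )

    @classmethod
    def fast_validation(cls, address):
        # No messages and no exceptions: just a boolean.
        return cls._PATTERN.fullmatch(address) is not None


print(FastValidatorSketch.fast_validation("user@test.com"))    # True
print(FastValidatorSketch.fast_validation("userexample.com"))  # False
```

A single pattern cannot say which part of the address is wrong, which is the trade-off against the step-by-step validation.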

Yet another example (with error message):
```python
validator = EmailValidator()
validator.is_email_valid('userexample.com')
# prints: NotValidEmailAddressSyntaxError: Expecting address syntax like `localpart@domainname`
# output: False
```
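The printed message has the form `ErrorName: explanation`, which suggests that each check raises a dedicated exception that the convenience method catches and prints. The following is a minimal sketch of that pattern, covering only the base format check. The base class `EmailSyntaxError` and the functions `check_base_format` and `is_email_valid_sketch` are assumptions made for illustration; only the error name and message text come from the output above.

```python
import re


class EmailSyntaxError(ValueError):
    """Hypothetical common base class for the validation errors."""


class NotValidEmailAddressSyntaxError(EmailSyntaxError):
    """Raised when the address is not of the form localpart@domainname."""


def check_base_format(address):
    # Assumed base check: exactly one "@" with text on both sides.
    if re.fullmatch(r"[^@]+@[^@]+", address) is None:
        raise NotValidEmailAddressSyntaxError(
            "Expecting address syntax like `localpart@domainname`"
        )


def is_email_valid_sketch(address):
    """Return True or False; on failure, print the reason instead of raising."""
    try:
        check_base_format(address)
    except EmailSyntaxError as error:
        print(f"{type(error).__name__}: {error}")
        return False
    return True


is_email_valid_sketch("userexample.com")
# prints: NotValidEmailAddressSyntaxError: Expecting address syntax like `localpart@domainname`
```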

In [1]:
import sys
sys.path.append('./src/')

from validator import EmailValidator

## Timing

In [2]:
%%timeit
validator = EmailValidator()
validator.is_email_valid("user@test.com")

10.8 µs ± 59.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [3]:
%%timeit
EmailValidator.fast_validation("user@test.com")

4.23 µs ± 267 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


## Testing

In [23]:
valid_addresses = [
    'simple@example.com',
    'very.common@example.com',
    'disposable.style.email.with+symbol@example.com',
    'other.email-with-hyphen@example.com',
    'fully-qualified-domain@example.com',
    'user.name+tag+sorting@example.com',
    'x@example.com',
    'example-indeed@strange-example.com',
    'admin@mailserver1', 
    'example@s.example', 
    '" "@example.org',
    '"john..doe"@example.org',
    'mailhost!username@example.org',
    'user%example.com@example.org',
    'user-@example.org',
    '"A@b@c"@example.com',
    
    'jsmith@[192.168.2.1]',
    'jsmith@[IPv6:2001:db8::1]'
]

invalid_addresses = [
    'Abc.example.com', # no @ character
    'A@b@c@example.com', # only one @ is allowed outside quotation marks
    r'a"b(c)d,e:f;g<h>i[j\k]l@example.com', # none of the special characters in this local-part are allowed outside quotation marks (raw string, since `\k` is not a valid escape sequence)
    'just"not"right@example.com', # quoted strings must be the only element making up the local-part
    '"quote"separated"@address.com',
   r'this is"not\allowed@example.com', # spaces, quotes, and backslashes may only exist when within quoted strings and preceded by a backslash
    r'this\ still\"not\\allowed@example.com', # even if escaped (preceded by a backslash), spaces, quotes, and backslashes must still be contained by quotes
    '1234567890123456789012345678901234567890123456789012345678901234+x@example.com', # local-part is longer than 64 characters
]

In [None]:
# it should consider all valid
all(EmailValidator.fast_validation(address) for address in valid_addresses)

In [17]:
# again, all valid
validator = EmailValidator()
all(validator.is_email_valid(address) for address in valid_addresses)

True

In [24]:
# it should consider all not valid
all(not EmailValidator.fast_validation(address) for address in invalid_addresses)

True

In [19]:
# again, not valid (and print error messages explaining why)
validator = EmailValidator()
all(not validator.is_email_valid(address) for address in invalid_addresses)

NotValidEmailAddressSyntaxError: Expecting address syntax like `localpart@domainname`
LocalPartSyntaxError: Invalid syntax for unquoted local part `a@b@c`.
It contains the following non-valid characters: `@`.
The accepted ones are printable US-ASCII characters not including the specials, i.e.:
  - Latin letters `a` to `z` and `A` to `Z`
  - Digits `0` to `9`
  - Printable characters `!#$%&'*+-/=?^_`{|}~`
  - Dot `.`, as long as it is not the first or last character and that it does not appear consecutively
LocalPartSyntaxError: Invalid syntax for unquoted local part `a"b(c)d,e:f;g<h>i[j\k]l`.
It contains the following non-valid characters: `:,<[;\>]"`.
The accepted ones are printable US-ASCII characters not including the specials, i.e.:
  - Latin letters `a` to `z` and `A` to `Z`
  - Digits `0` to `9`
  - Printable characters `!#$%&'*+-/=?^_`{|}~`
  - Dot `.`, as long as it is not the first or last character and that it does not appear consecutively
LocalPartSyntaxError: Invalid syntax

True
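
Because `all(...)` collapses the whole list into a single boolean, it does not show which address broke the expectation when the result is `False`. A small helper can list the offending addresses instead. This is a sketch: `failing_addresses` and the stand-in validator `has_single_at` are hypothetical names, and with the notebook's class one would pass `EmailValidator.fast_validation` as `validate`.

```python
def failing_addresses(validate, addresses, expected):
    """Return the addresses whose validation result differs from `expected`."""
    return [
        address for address in addresses
        if bool(validate(address)) != expected
    ]


# Stand-in validator, used only to keep the sketch runnable on its own.
def has_single_at(address):
    return address.count("@") == 1


sample = ["simple@example.com", "Abc.example.com"]
print(failing_addresses(has_single_at, sample, True))
# prints: ['Abc.example.com']
```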