# Email validator

This project takes a simple Python email validator and speeds it up step by step: precompiling the regular expression, replacing the `for` loop with a list comprehension, and finally using the built-in `filter()` function. Each version is timed against the same input.

In [2]:
import re
from typing import List

## Version 1

In [3]:
def valid_emails(strings: List[str]) -> List[str]:
    """Take a list of potential emails and return only the valid ones."""

    valid_email_regex = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"

    def is_valid_email(email: str) -> bool:
        return bool(re.fullmatch(valid_email_regex, email))

    emails = []
    for email in strings:
        if is_valid_email(email):
            emails.append(email)

    return emails
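As a quick check of what the pattern accepts, the sketch below applies it to a handful of inputs. The sample strings and the name `EMAIL_PATTERN` are illustrative assumptions rather than part of the original notebook. The last sample shows that the pattern is a heuristic and not a full address validator, since it accepts a domain with two consecutive dots.

```python
import re

# Same pattern string as in Version 1.
EMAIL_PATTERN = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"

# Hypothetical sample inputs, chosen to exercise each part of the pattern.
samples = [
    "example1@email.com",   # accepted
    "invalid1.email@.com",  # rejected: nothing between "@" and the first "."
    "email1@domain.",       # rejected: nothing after the final "."
    "not_an_email1",        # rejected: no "@"
    "user@example..com",    # accepted, although the domain has an empty label
]

results = {s: bool(re.fullmatch(EMAIL_PATTERN, s)) for s in samples}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```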

There are several ways to potentially speed up the code:

1. The pattern is passed to `re.fullmatch()` as a string for every email. The `re` module caches compiled patterns, so the pattern is not recompiled from scratch each time, but every call still pays for a cache lookup. Compiling the pattern once with `re.compile()` outside the function and passing the compiled object in as an argument removes that per-call overhead.
2. Instead of appending valid emails to a list in a `for` loop, a list comprehension can build the list directly. A comprehension is usually somewhat faster than the equivalent loop because it avoids looking up and calling `emails.append` on every iteration.
3. The built-in `filter()` function can select the strings for which `is_valid_email()` returns `True`. `filter()` returns a lazy iterator, so the result has to be wrapped in `list()` to get a list back. It can be faster than a loop or a comprehension because the iteration itself runs in C, although the predicate is still called once per item.
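To make the behaviour of `filter()` in item 3 concrete, here is a minimal sketch. The predicate `has_at` and the input list are hypothetical stand-ins, not the notebook's validator. It shows that `filter()` on its own produces an iterator and that a list only exists once `list()` consumes it.

```python
def has_at(s: str) -> bool:
    """Hypothetical stand-in predicate: keep strings that contain '@'."""
    return "@" in s


strings = ["a@b.c", "nope", "x@y.z"]

lazy = filter(has_at, strings)     # nothing has been evaluated yet
is_list = isinstance(lazy, list)   # False: filter() returns an iterator
kept = list(lazy)                  # consuming the iterator builds a new list

print(is_list, kept)
```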

## Version 2. Precompiled regular expression

In [4]:
valid_email_regex = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

def valid_emails_2(strings: List[str], pattern: re.Pattern) -> List[str]:
    """Take a list of potential emails and return only the valid ones."""

    def is_valid_email(email: str) -> bool:
        return bool(pattern.fullmatch(email))

    emails = []
    for email in strings:
        if is_valid_email(email):
            emails.append(email)

    return emails

## Version 3. Precompiled regular expression and list comprehension

In [8]:
valid_email_regex = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

def valid_emails_3(strings: List[str], pattern: re.Pattern) -> List[str]:
    """Take a list of potential emails and return only the valid ones."""

    def is_valid_email(email: str) -> bool:
        return bool(pattern.fullmatch(email))

    emails = [email for email in strings if is_valid_email(email)]

    return emails

## Version 4. Precompiled regular expression and filter

In [13]:
valid_email_regex = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

def valid_emails_4(strings: List[str], pattern: re.Pattern) -> List[str]:
    """Take a list of potential emails and return only the valid ones."""

    def is_valid_email(email: str) -> bool:
        return bool(pattern.fullmatch(email))

    emails = list(filter(is_valid_email, strings))

    return emails
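One further variation, not measured in this notebook, is to drop the inner `is_valid_email()` wrapper and hand the compiled pattern's bound `fullmatch` method straight to `filter()`. This works because a match object is always truthy and `None` is falsy. The names `valid_emails_direct` and `EMAIL_PATTERN` below are hypothetical, and whether this is actually faster would need to be timed on the same data.

```python
import re
from typing import List

EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")


def valid_emails_direct(strings: List[str], pattern: re.Pattern) -> List[str]:
    """Hypothetical variant of Version 4 without the Python-level wrapper."""
    # pattern.fullmatch returns a match object (truthy) or None (falsy),
    # so it can serve as the filter predicate directly.
    return list(filter(pattern.fullmatch, strings))


print(valid_emails_direct(["example1@email.com", "not_an_email1"], EMAIL_PATTERN))
```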

## Testing

In [5]:
emails = [
    "example1@email.com", "invalid1.email@.com", "anotherexample1@email.com", "not_an_email1", "email1@domain.", "no_at_symbol1.com",
    "example2@email.com", "invalid2.email@.com", "anotherexample2@email.com", "not_an_email2", "email2@domain.", "no_at_symbol2.com",
    "example3@email.com", "invalid3.email@.com", "anotherexample3@email.com", "not_an_email3", "email3@domain.", "no_at_symbol3.com",
    "example4@email.com", "invalid4.email@.com", "anotherexample4@email.com", "not_an_email4", "email4@domain.", "no_at_symbol4.com",
    "example5@email.com", "invalid5.email@.com", "anotherexample5@email.com", "not_an_email5", "email5@domain.", "no_at_symbol5.com",
    "example6@email.com", "invalid6.email@.com", "anotherexample6@email.com", "not_an_email6", "email6@domain.", "no_at_symbol6.com",
    "example7@email.com", "invalid7.email@.com", "anotherexample7@email.com", "not_an_email7", "email7@domain.", "no_at_symbol7.com",
    "example8@email.com", "invalid8.email@.com", "anotherexample8@email.com", "not_an_email8", "email8@domain.", "no_at_symbol8.com",
    "example9@email.com", "invalid9.email@.com", "anotherexample9@email.com", "not_an_email9", "email9@domain.", "no_at_symbol9.com",
    "example10@email.com", "invalid10.email@.com", "anotherexample10@email.com", "not_an_email10", "email10@domain.", "no_at_symbol10.com",
]
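The timing cells that follow use the `%%timeit` cell magic, which is only available in IPython and Jupyter. For a plain Python script, a roughly equivalent measurement can be sketched with the standard `timeit` module. The function `keep_valid`, the shortened `sample` list, and the loop counts are assumptions chosen for illustration, so the numbers it prints will not match the ones reported here.

```python
import re
import statistics
import timeit
from typing import List

PATTERN = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")


def keep_valid(strings: List[str], pattern: re.Pattern) -> List[str]:
    """Hypothetical function under test (same idea as Version 3)."""
    return [s for s in strings if pattern.fullmatch(s)]


# Hypothetical stand-in for the test data: 60 strings, half of them valid.
sample = ["example1@email.com", "not_an_email1"] * 30

LOOPS = 200   # calls per run
RUNS = 7      # number of runs, as in the %%timeit output format

# timeit.repeat returns the total time in seconds for each run of LOOPS calls.
totals = timeit.repeat(lambda: keep_valid(sample, PATTERN), repeat=RUNS, number=LOOPS)
per_call_us = [t / LOOPS * 1e6 for t in totals]

print(
    f"{statistics.mean(per_call_us):.1f} µs ± {statistics.stdev(per_call_us):.1f} µs "
    f"per loop (mean ± std. dev. of {RUNS} runs, {LOOPS} loops each)"
)
```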

In [15]:
%%timeit
valid_emails(emails)

244 µs ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [16]:
%%timeit
valid_emails_2(emails, valid_email_regex)

121 µs ± 13.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [17]:
%%timeit
valid_emails_3(emails, valid_email_regex)

101 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [18]:
%%timeit
valid_emails_4(emails, valid_email_regex)

87.9 µs ± 8.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
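The relative speedups follow directly from the mean times reported above. The short calculation below uses only those four means; the dictionary name `mean_us` is just for illustration.

```python
# Mean times in microseconds, copied from the %%timeit outputs above.
mean_us = {
    "valid_emails": 244.0,
    "valid_emails_2": 121.0,
    "valid_emails_3": 101.0,
    "valid_emails_4": 87.9,
}

baseline = mean_us["valid_emails"]
speedup = {name: baseline / t for name, t in mean_us.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x")
```

Relative to Version 1, this gives roughly 2.0x for Version 2, 2.4x for Version 3, and 2.8x for Version 4.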


This pet project set out to speed up email validation in Python using regular expressions. The original code iterated over a list of candidate strings with a for loop and applied a regular expression to each one to decide whether it was a valid email. The later versions added three optimizations in turn.

First, the pattern was compiled once with `re.compile()` outside the function and passed in as an argument. This removed the per-call pattern lookup inside `re.fullmatch()` and gave the largest gain: the mean time fell from 244 µs to 121 µs.

Second, the for loop that appended valid emails to a list was replaced with a list comprehension, which brought the mean time down to 101 µs.

Lastly, the built-in `filter()` function was used with `is_valid_email()` as the predicate, and its result was wrapped in `list()`. This was the fastest version at 87.9 µs, most likely because the iteration runs in C rather than in Python bytecode. A new list is still created in this version.

In summary, precompiling the pattern roughly halved the run time, and the two loop rewrites each gave a smaller further improvement. Those last two gains are about the same size as the reported standard deviations, so they are best read as indicative rather than conclusive.