In [2]:
import re

#### Version 1

In [3]:
# Version 1
def is_valid_email(email):
    email_regex = r"\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,4})+$"
    return re.match(email_regex, email) is not None

Code explanation:

This code defines a Python function called `is_valid_email` that uses a regular expression to validate email addresses. Let's break down the regex pattern step by step:

1. `\w+([\.-]?\w+)*`: This part matches the username portion of the email.
   - `\w+`: One or more word characters (letters, digits, or underscores)
   - `([\.-]?\w+)*`: Optionally followed by a dot or hyphen, then more word characters, repeated any number of times

2. `@`: Matches the @ symbol

3. `\w+([\.-]?\w+)*`: This matches the domain name, similar to the username part
   - `\w+`: One or more word characters
   - `([\.-]?\w+)*`: Optionally followed by a dot or hyphen, then more word characters, repeated any number of times

4. `(\.\w{2,4})+$`: This matches the top-level domain
   - `\.`: A literal dot
   - `\w{2,4}`: 2 to 4 word characters
   - `+$`: The `+` repeats the preceding `(\.\w{2,4})` group one or more times, and `$` anchors the match at the end of the string

The function returns `True` if the email matches this pattern, and `False` otherwise. The pattern has no leading `^` because `re.match` only matches at the start of the string.
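Two edge cases of this pattern are easy to check directly. The sketch below copies the Version 1 regex and compares `re.match` with `re.fullmatch`; the name `is_valid_email_strict` is hypothetical and not part of the original code. In Python, `$` also matches just before a trailing newline, so `re.match` accepts an address followed by `\n`, while `re.fullmatch` does not. The last call shows that plus-addressing is rejected, since `+` is not a word character.

```python
import re

# Same pattern as Version 1
EMAIL_REGEX = r"\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,4})+$"

def is_valid_email(email):
    return re.match(EMAIL_REGEX, email) is not None

def is_valid_email_strict(email):
    # Hypothetical variant: fullmatch requires the whole string to match,
    # so a trailing newline is no longer tolerated
    return re.fullmatch(EMAIL_REGEX, email) is not None

print(is_valid_email("user@example.com\n"))         # True
print(is_valid_email_strict("user@example.com\n"))  # False
print(is_valid_email("user+tag@example.com"))       # False
```

If input may carry stray whitespace, stripping it before validation is another option.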

In [4]:
# List of email addresses to test for version 1
emails_to_test = [
    "user@example.com",
    "user.name@example.co.uk",
    "user-name@example.org",
    "user123@example.net",
    "invalid.email@example",
    "invalid@.com",
    "@invalid.com",
    "user@invalid.",
    "user@invalid",
    "user@example.toolong",
]

# Test each email and print the result
for email in emails_to_test:
    if is_valid_email(email):
        print(f"{email} is a valid email address.")
    else:
        print(f"{email} is not a valid email address.")

user@example.com is a valid email address.
user.name@example.co.uk is a valid email address.
user-name@example.org is a valid email address.
user123@example.net is a valid email address.
invalid.email@example is not a valid email address.
invalid@.com is not a valid email address.
@invalid.com is not a valid email address.
user@invalid. is not a valid email address.
user@invalid is not a valid email address.
user@example.toolong is not a valid email address.


#### Version 2

In [5]:
# Version 2
def is_valid_email(email):
    # More comprehensive regex pattern for email validation
    email_regex = r'^(?=[a-zA-Z0-9@._%+-]{6,254}$)[a-zA-Z0-9._%+-]{1,64}@(?:[a-zA-Z0-9-]{1,63}\.){1,8}[a-zA-Z]{2,63}$'
    
    # Explanation of regex components:
    # ^ - Start of string
    # (?=[a-zA-Z0-9@._%+-]{6,254}$) - Positive lookahead for total length between 6 and 254 characters
    # [a-zA-Z0-9._%+-]{1,64} - Username: 1-64 characters of allowed symbols
    # @ - Literal @
    # (?:[a-zA-Z0-9-]{1,63}\.){1,8} - Domain: 1-8 parts, each 1-63 characters long, separated by dots
    # [a-zA-Z]{2,63} - Top-level domain: 2-63 characters, only letters
    # $ - End of string
    
    return re.match(email_regex, email) is not None

# Additional check for common typos and invalid patterns
def has_common_errors(email):
    common_errors = [
        r'\s',  # Contains whitespace
        r'\.{2,}',  # Contains consecutive dots
        r'^[.-]',  # Starts with a dot or hyphen
        r'[.-]@',  # Ends with a dot or hyphen before @
        r'@[.-]',  # Starts with a dot or hyphen after @
        r'[.-]$'   # Ends with a dot or hyphen
    ]
    return any(re.search(pattern, email) for pattern in common_errors)

# Combined check: the address must match the regex and trigger none of the common-error patterns
def is_email_valid(email):
    return is_valid_email(email) and not has_common_errors(email)

Code explanation:

Let's break down this more complex regular expression step by step:

1. `^` - Matches the start of the string.

2. `(?=[a-zA-Z0-9@._%+-]{6,254}$)` - This is a positive lookahead:
   - It ensures the entire email is between 6 and 254 characters long.
   - `[a-zA-Z0-9@._%+-]` allows letters, numbers, and common email special characters.

3. `[a-zA-Z0-9._%+-]{1,64}` - This matches the username part of the email:
   - Allows letters, numbers, and some special characters.
   - Must be between 1 and 64 characters long.

4. `@` - Matches the @ symbol literally.

5. `(?:[a-zA-Z0-9-]{1,63}\.){1,8}` - This matches the domain name:
   - `[a-zA-Z0-9-]{1,63}` allows letters, numbers, and hyphens, 1-63 characters long.
   - `\.` matches a literal dot.
   - `{1,8}` allows this pattern to repeat 1 to 8 times, for multiple subdomains.

6. `[a-zA-Z]{2,63}` - This matches the top-level domain:
   - Only allows letters.
   - Must be between 2 and 63 characters long.

7. `$` - Matches the end of the string.
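To see what the lookahead in step 2 does on its own, it can be isolated from the rest of the pattern. This is a minimal sketch; `LENGTH_CHECK` and `passes_length_check` are hypothetical names introduced for illustration. The lookahead checks length and allowed characters without consuming any input, which is why the rest of the pattern still starts matching from the first character.

```python
import re

# Only the start anchor and the lookahead from the Version 2 pattern
LENGTH_CHECK = re.compile(r'^(?=[a-zA-Z0-9@._%+-]{6,254}$)')

def passes_length_check(text):
    # Hypothetical helper: succeeds when the whole string is 6-254 allowed characters
    return LENGTH_CHECK.match(text) is not None

print(passes_length_check("a@b.c"))      # False: only 5 characters
print(passes_length_check("a@b.cd"))     # True: exactly 6 characters
print(passes_length_check("a" * 254))    # True: upper limit
print(passes_length_check("a" * 255))    # False: one character too long
print(passes_length_check("a b@cd.ef"))  # False: space is not an allowed character

# The lookahead is zero-width: the match ends where it began
print(LENGTH_CHECK.match("a@b.cd").end())  # 0
```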

This regex improves upon the previous version by:

- Enforcing overall length limits (6-254 characters).
- Setting specific length limits for username (max 64 chars) and domain parts (max 63 chars each).
- Allowing multiple subdomains.
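- Restricting the top-level domain to letters only, while allowing 2 to 63 of them (so `user@example.toolong`, which Version 1 rejected, is accepted here).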

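It follows the length limits of email standards more closely, but it is also harder to read and maintain. It still rejects some addresses that the standards permit, such as quoted local parts, so it is a practical filter and not a full implementation. Combined with the additional checks in `has_common_errors`, it catches many common typing mistakes, although a syntactic check cannot confirm that an address actually exists.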
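Because validation happens in two stages, it can help to see which stage rejects a given address. The sketch below reuses the Version 2 regex and the patterns from `has_common_errors`; the helper `rejection_stage` is hypothetical and exists only for illustration. It shows that the regex alone accepts `user..name@example.com` and `user@-example.com`, and that the common-error patterns are what reject them.

```python
import re

# Version 2 pattern, copied unchanged
EMAIL_REGEX = (
    r'^(?=[a-zA-Z0-9@._%+-]{6,254}$)[a-zA-Z0-9._%+-]{1,64}'
    r'@(?:[a-zA-Z0-9-]{1,63}\.){1,8}[a-zA-Z]{2,63}$'
)

# Same patterns as in has_common_errors
COMMON_ERRORS = [r'\s', r'\.{2,}', r'^[.-]', r'[.-]@', r'@[.-]', r'[.-]$']

def rejection_stage(email):
    """Hypothetical helper: name the first check that rejects the address, or None."""
    if re.match(EMAIL_REGEX, email) is None:
        return "regex"
    if any(re.search(pattern, email) for pattern in COMMON_ERRORS):
        return "common_errors"
    return None

samples = [
    "user@example.com",        # accepted by both stages
    "user+tag@example.com",    # accepted: '+' is allowed in the username
    "user..name@example.com",  # regex passes, consecutive dots are caught afterwards
    "user@-example.com",       # regex passes, leading hyphen is caught afterwards
    "user@example",            # regex fails: no dot-separated top-level domain
]

for email in samples:
    print(f"{email} -> {rejection_stage(email)}")
```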

In [6]:
# List of email addresses to test
emails_to_test = [
    "user@example.com",
    "user.name@example.co.uk",
    "user-name@example.org",
    "user123@example.net",
    "invalid.email@example",
    "invalid@.com",
    "@invalid.com",
    "user@invalid.",
    "user@invalid",
    "user@example.toolong",
    "user..name@example.com",
    "user@exam ple.com",
    ".user@example.com",
    "user.@example.com",
    "us er@example.com",
    "user@example..com",
    "user@-example.com",
    "user@example.com-",
    "verylongusernamethatexceedssixtyfourcharactersisnotallowedbyemailstandards@example.com"
]

# Test each email and print the result
for email in emails_to_test:
    if is_email_valid(email):
        print(f"{email} is a valid email address.")
    else:
        print(f"{email} is not a valid email address.")

user@example.com is a valid email address.
user.name@example.co.uk is a valid email address.
user-name@example.org is a valid email address.
user123@example.net is a valid email address.
invalid.email@example is not a valid email address.
invalid@.com is not a valid email address.
@invalid.com is not a valid email address.
user@invalid. is not a valid email address.
user@invalid is not a valid email address.
user@example.toolong is a valid email address.
user..name@example.com is not a valid email address.
user@exam ple.com is not a valid email address.
.user@example.com is not a valid email address.
user.@example.com is not a valid email address.
us er@example.com is not a valid email address.
user@example..com is not a valid email address.
user@-example.com is not a valid email address.
user@example.com- is not a valid email address.
verylongusernamethatexceedssixtyfourcharactersisnotallowedbyemailstandards@example.com is not a valid email address.
