### An example dataset with email addresses

In [None]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "email": [
        "yi@gmali.com", "yi@sfu.ca", "y i@sfu.ca", "Yi@gmail.com",
        "H ELLO@hotmal.COM", "hello", np.nan, "NULL"
    ]
})
df

## 1. Default clean_email()

By default, `clean_email()` performs a strict check of whether each email address is correctly formatted and sets invalid values to NaN.

In [None]:
from dataprep.clean import clean_email
clean_email(df, "email")
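
To make the default behavior concrete, here is a minimal sketch of the general idea in plain pandas: test each value against a pattern and replace anything that fails with NaN. The pattern `SIMPLE_EMAIL` and the helper `coerce_invalid` are hypothetical names, and the pattern is far looser than the strict check `clean_email()` performs. This is an illustration, not the library's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical, deliberately simplified pattern: exactly one "@", no
# whitespace, and at least one dot in the domain part.
SIMPLE_EMAIL = r"[^@\s]+@[^@\s]+\.[^@\s]+"

def coerce_invalid(emails: pd.Series) -> pd.Series:
    """Return a copy in which values failing the simplified check are NaN."""
    # na=False treats missing values as non-matching.
    matches = emails.str.fullmatch(SIMPLE_EMAIL, na=False)
    return emails.where(matches, np.nan)

sample = pd.Series(["yi@sfu.ca", "y i@sfu.ca", "hello", np.nan, "NULL"])
cleaned = coerce_invalid(sample)
print(cleaned)
```

Only the first value survives here; the address containing a space, the bare word, the missing value, and the literal string "NULL" all become NaN.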

## 2. `split` parameter

By setting the `split` parameter to True, the returned table will contain separate columns for the domain and username of valid emails.

In [None]:
clean_email(df, "email", split=True)
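
For intuition, the split itself can be sketched in plain pandas. The column names `username` and `domain` below are chosen for illustration, and the sketch assumes the addresses have already been validated; it is not how `clean_email()` is implemented.

```python
import pandas as pd

# Assumed to be already-validated addresses.
valid = pd.Series(["yi@sfu.ca", "Yi@gmail.com"])

# Split on the last "@" so the domain is always the final piece.
parts = valid.str.rsplit("@", n=1, expand=True)
parts.columns = ["username", "domain"]  # illustrative column names
print(parts)
```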

## 3. `remove_whitespace` parameter

When the `remove_whitespace` parameter is set to True, whitespace will be removed before checking if an email is valid.

In [None]:
clean_email(df, "email", remove_whitespace=True)
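
The effect on the raw strings can be approximated with a regular expression. This sketch removes every whitespace character, which is an assumption; exactly which characters `clean_email()` strips is not stated here.

```python
import pandas as pd

raw = pd.Series(["y i@sfu.ca", "H ELLO@hotmal.COM", " yi@sfu.ca "])

# Remove all whitespace (spaces, tabs, newlines) anywhere in the string.
no_space = raw.str.replace(r"\s+", "", regex=True)
print(no_space.tolist())
```

After this step, an address such as "y i@sfu.ca" becomes "yi@sfu.ca", so it can pass a format check that it would otherwise fail.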

## 4. `fix_domain` parameter

When the `fix_domain` parameter is set to True, `clean_email()` will try to correct invalid domains.

In [None]:
clean_email(df, "email", fix_domain=True)
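
One simple way to think about domain correction is a lookup table of known misspellings. The table `DOMAIN_FIXES` and the helper `fix_domain_sketch` below are hypothetical; the method `clean_email()` actually uses to correct domains is not described here and may differ.

```python
import pandas as pd

# Hypothetical lookup table of misspelled domains; illustration only.
DOMAIN_FIXES = {"gmali.com": "gmail.com", "hotmal.com": "hotmail.com"}

def fix_domain_sketch(email: str) -> str:
    """Lower-case the domain and replace it if it is a known misspelling."""
    username, sep, domain = email.rpartition("@")
    if not sep:
        return email  # no "@" present, so there is no domain to correct
    domain = domain.lower()
    return f"{username}@{DOMAIN_FIXES.get(domain, domain)}"

fixed = pd.Series(["yi@gmali.com", "yi@sfu.ca", "hello"]).map(fix_domain_sketch)
print(fixed.tolist())
```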

## 5. `errors` parameter

When `errors="ignore"`, invalid emails are left unchanged in the output instead of being set to NaN.

In [None]:
clean_email(df, "email", errors="ignore")
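
The difference between the two behaviors can be sketched with a small helper. `handle_invalid` is hypothetical, the mode name `"coerce"` for the NaN-replacing behavior is an assumption, and the validity mask is a crude stand-in (it only checks for an "@"), not the library's strict check.

```python
import numpy as np
import pandas as pd

def handle_invalid(emails: pd.Series, is_valid: pd.Series, errors: str = "coerce") -> pd.Series:
    """Apply an error-handling mode to values flagged as invalid."""
    if errors == "ignore":
        return emails.copy()                    # keep invalid values as they are
    if errors == "coerce":
        return emails.where(is_valid, np.nan)   # replace invalid values with NaN
    raise ValueError(f"unsupported errors mode: {errors!r}")

emails = pd.Series(["yi@sfu.ca", "hello"])
is_valid = emails.str.contains("@", regex=False)  # crude stand-in check

kept = handle_invalid(emails, is_valid, errors="ignore")
coerced = handle_invalid(emails, is_valid, errors="coerce")
print(kept.tolist())
print(coerced.tolist())
```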

## 6. `validate_email()`

The function `validate_email()` returns True if an email address is valid and False otherwise. It can be applied to a single string or to a column of email addresses.

In [None]:
from dataprep.clean import validate_email
print(validate_email('Abc.example.com'))
print(validate_email('prettyandsimple@example.com'))
print(validate_email('disposable.style.email.with+symbol@example.com'))
print(validate_email(r'this is"not\allowed@example.com'))  # raw string keeps the backslash literal

In [None]:
validate_email(df["email"])

Note that `validate_email()` performs the strict semantic check by default.