Parse UTF-8 emails per RFC6532 #1103

Closed
wants to merge 1 commit into from

Conversation

@jeremy
Collaborator

commented May 13, 2017

Implement RFC6532 extension to RFC5322 for parsing UTF-8 messages.

  • Ragel parser for valid UTF-8 characters
  • Parse as bytes rather than chars
  • Encode parsed strings as UTF-8
  • No longer b/q-encode UTF-8 header values when parsing emails
  • For compatibility with others, b/q-encode UTF-8 headers when generating emails

Fixes #39
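
The list above separates two directions: incoming header bytes are interpreted as UTF-8, while outgoing non-ASCII header values are still b/q-encoded for compatibility. A minimal sketch of that split, using only the Ruby standard library, might look like the following. The helper names `decode_utf8_header` and `b_encode_header` are hypothetical and are not part of the Mail gem's API; the real parser in this PR is generated by Ragel, and the encoder here ignores the 75-character limit that RFC 2047 places on encoded-words.

```ruby
# Hypothetical helpers for illustration only; not the Mail gem's API.

# Parsing side: treat the raw header bytes as UTF-8 and reject
# byte sequences that are not well-formed UTF-8.
def decode_utf8_header(raw_bytes)
  value = raw_bytes.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "invalid UTF-8 in header" unless value.valid_encoding?
  value
end

# Generating side: leave pure ASCII values alone and wrap anything else
# in a single RFC 2047 "B" (Base64) encoded-word.
# Assumption: the value is short enough to fit in one encoded-word.
def b_encode_header(value)
  return value if value.ascii_only?
  "=?UTF-8?B?#{[value].pack('m0')}?="
end

raw = "J\xC3\xBCrgen".b            # bytes as they might arrive on the wire
name = decode_utf8_header(raw)     # a UTF-8 string
puts b_encode_header(name)         # ASCII-safe form for outgoing mail
```

Keeping the two steps separate means a parsed message can hold readable UTF-8 strings internally while generated messages stay ASCII-only in their headers.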

@jeremy jeremy added this to the 2.7.0 milestone May 13, 2017

@jeremy jeremy force-pushed the jeremy:utf8-header-parsing branch 6 times, most recently from dfecac1 to da0d4b9 May 13, 2017

@jeremy jeremy force-pushed the jeremy:utf8-header-parsing branch 5 times, most recently from e542dbd to ccfd476 May 14, 2017

@nickrivadeneira nickrivadeneira referenced this pull request May 1, 2018

koppen added a commit to substancelab/activemodel-email_address_validator that referenced this pull request Aug 2, 2019

Clarify support for UTF-8 characters in email addresses
First of all, we already validate those as valid, so this is just a codification
of that expectation.

Second of all, it is a valid email address - at least according to RFC6532
<https://tools.ietf.org/html/rfc6532>. It still needs to be encoded properly
before actually being delivered. The Mail gem added support for this in v2.7.0:
mikel/mail#1103.

That said, actually finding services that support this is still hard (in 2019).
Neither Gmail nor Hotmail supports this for their own local accounts, but both
can send to email addresses with non-US ASCII characters. Exim has added
experimental support, it seems (https://bugs.exim.org/show_bug.cgi?id=1516).

Firefox, Chrome, and Safari reject email addresses with non-US ASCII characters
when validating email fields. Mailgun's email validation API says the address is
invalid (or rather, it gives `"result": "unknown"`).

Regardless of the lack of widespread support we allow these addresses because
the guiding principle is:

> Don't reject a valid email address realistically in use by a potential user.
> Err on the side of accepting too much.

An email address with non-US ASCII characters can realistically be in use by a
potential user, even though the major international players don't support
that.
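
As a rough illustration of the "accept too much" principle, a deliberately permissive check could look like the sketch below. The constant `PERMISSIVE_EMAIL` and the method `plausibly_valid_email?` are hypothetical names invented here; this is not the validator gem's actual implementation, only an assumption about what a check that lets non-US ASCII characters through might look like.

```ruby
# Hypothetical permissive check; not the validator gem's real implementation.
# It only requires something@something.something with no whitespace and
# exactly one "@", so non-US ASCII characters pass on both sides.
PERMISSIVE_EMAIL = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

def plausibly_valid_email?(address)
  address.is_a?(String) &&
    address.valid_encoding? &&
    address.match?(PERMISSIVE_EMAIL)
end

puts plausibly_valid_email?("jürgen@example.com")
puts plausibly_valid_email?("not an address")
```

A check like this says nothing about deliverability; as the message above notes, the address still has to be encoded properly before it is actually sent.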
