Validation of email address with non-ASCII character fails #14

Open
Alina-Valea-Forter opened this issue Oct 18, 2023 · 2 comments

@Alina-Valea-Forter

Alina-Valea-Forter commented Oct 18, 2023

Hi,

The JSON that I am trying to validate has a non-ASCII character in the email field (e.g. the Spanish letter "ñ"), and consequently validation fails with the following error:

A subschema had errors - #/email
Value fails format check "email", was "mu\u00F1eca@test.com" - #/email

Here is the test code:

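import net.pwall.json.schema.JSONSchema

// Validates a document whose "email" value contains a non-ASCII letter.
fun test() {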
    val schemaString = """
        {
          "${'$'}schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "properties": {
            "email": {
              "type": "string",
              "format": "email"
            }
          },
          "required": ["email"]
        }
    """.trimIndent()

    val jsonString = """
        {
            "email": "muñeca@test.com"
        }
    """.trimIndent()

    val schema = JSONSchema.parse(schemaString)
    val output = schema.validateBasic(jsonString)
    require(output.errors == null) {
        output.errors?.forEach {
            println("${it.error} - ${it.instanceLocation}")
        }
        "Json schema validation failed."
    }
}

Is there a way around this?

@pwall567
Owner

Hi, thanks for the message.

In implementing this library I have attempted to follow the JSON Schema specification strictly, and it says (JSON Schema Validation, section 7.3.2):

email: As defined by the "Mailbox" ABNF rule in RFC 5321, section 4.1.2

And RFC 5321, section 4.1.2 contains the following ABNF rules:

Mailbox        = Local-part "@" ( Domain / address-literal )
Local-part     = Dot-string / Quoted-string
Dot-string     = Atom *("."  Atom)
Atom           = 1*atext

atext is defined in RFC 5322, section 3.2.3 as the ASCII alphabetic and numeric characters, plus the following ASCII special characters:

! # $ % & ' * + - / = ? ^ _ ` { | } ~
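To make the rules above concrete, here is a minimal Kotlin sketch of the Dot-string branch of Local-part, with atext taken from RFC 5322. It is an illustration only: it ignores Quoted-string and the domain part, and the function names (isAtext, isAtom, isDotString) are hypothetical, not part of the library.

```kotlin
// The special characters allowed in atext (RFC 5322, section 3.2.3).
val atextSpecials: Set<Char> = "!#\$%&'*+-/=?^_`{|}~".toSet()

// atext: an ASCII letter, an ASCII digit, or one of the specials above.
fun isAtext(c: Char): Boolean =
    c in 'A'..'Z' || c in 'a'..'z' || c in '0'..'9' || c in atextSpecials

// Atom = 1*atext (one or more atext characters)
fun isAtom(s: String): Boolean = s.isNotEmpty() && s.all { isAtext(it) }

// Dot-string = Atom *("." Atom): every dot-separated piece must be an Atom,
// so empty pieces (leading, trailing or doubled dots) are rejected.
fun isDotString(s: String): Boolean = s.split('.').all { isAtom(it) }

fun main() {
    println(isDotString("muneca"))      // true
    println(isDotString("mu\u00F1eca")) // false: 'ñ' is not atext
    println(isDotString("first.last"))  // true
    println(isDotString("first..last")) // false: empty Atom
}
```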

The Quoted-string rule allows any combination of printable ASCII characters within double quotes, but even that does not allow characters above hex 7E. In fact, the specification goes on to say:

Systems MUST NOT define mailboxes in such a way as to require the use in SMTP of non-ASCII characters (octets with the high order bit set to one) or ASCII "control characters" (decimal value 0-31 and 127).
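The "high order bit" wording refers to the octets actually sent. As a quick check, assuming the address is encoded as UTF-8, the "ñ" in the example becomes two octets, both with the high bit set. The helper name highBitOctets is hypothetical.

```kotlin
// Returns the UTF-8 octets of s whose high-order bit is 1 (values 0x80..0xFF).
fun highBitOctets(s: String): List<Int> =
    s.toByteArray(Charsets.UTF_8)
        .map { it.toInt() and 0xFF } // convert the signed Byte to 0..255
        .filter { (it and 0x80) != 0 }

fun main() {
    // U+00F1 ('ñ') encodes as 0xC3 0xB1 in UTF-8.
    println(highBitOctets("mu\u00F1eca").map { "%02X".format(it) }) // [C3, B1]
    println(highBitOctets("muneca")) // []
}
```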

I realise that in practice, many mail systems may ignore these rules and allow non-ASCII characters in mail addresses, but I feel that as an implementer of JSON Schema I have no option but to follow the specification as closely as possible.

All this explanation doesn't help in your case, but you might like to try a "pattern" validation instead; the emailregex website contains a number of suggestions (the regex syntax used by the library is, of course, the Java form).
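As a rough illustration of that approach, the sketch below swaps "format": "email" in the schema from the issue for a "pattern". The pattern is a deliberately permissive assumption, not one of the emailregex suggestions: it only requires a non-empty part before a single "@" and a dot somewhere in the domain, and it accepts non-ASCII letters. Because JSON Schema patterns are not implicitly anchored, it carries explicit ^ and $ anchors. The names emailPattern, patternSchema and looksLikeEmail are hypothetical; looksLikeEmail just runs the same regex with Kotlin's Regex (backed by java.util.regex on the JVM).

```kotlin
// Hypothetical permissive pattern: one or more characters other than "@" and
// whitespace, then "@", then a domain containing at least one ".".
// Anchored with ^...$ because JSON Schema "pattern" is not implicitly anchored.
val emailPattern = """^[^@\s]+@[^@\s]+\.[^@\s]+$"""

// The schema from the issue with "format": "email" replaced by "pattern".
// Backslashes are doubled so the regex survives JSON string escaping.
val patternSchema = """
    {
      "${'$'}schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "properties": {
        "email": {
          "type": "string",
          "pattern": "${emailPattern.replace("\\", "\\\\")}"
        }
      },
      "required": ["email"]
    }
""".trimIndent()

// containsMatchIn performs an unanchored search, so the anchors in
// emailPattern are what force the whole string to match.
fun looksLikeEmail(value: String): Boolean = Regex(emailPattern).containsMatchIn(value)

fun main() {
    println(looksLikeEmail("mu\u00F1eca@test.com")) // true
    println(looksLikeEmail("mu\u00F1eca@test"))     // false: no "." in the domain
    // patternSchema can then be passed to JSONSchema.parse as in the issue.
    println(patternSchema)
}
```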

I may consider allowing pluggable implementations of the format validations in a later version of the library, but I can't give you a timeline for that.

Sorry I can't be of more help,

-Peter Wall

@Alina-Valea-Forter
Author

Thanks, that was very informative. I will look into other options.
