Skip to content

Validating XML against XSD is slow for long strings if maxLength and pattern restrictions are defined

Notifications You must be signed in to change notification settings

petmaark/xsd-pattern-validation-test

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

XSD pattern validation example

Overview

This POC demonstrates that during JAXB unmarshalling invalid input can slow down the validation process if there is a regex pattern defined on the input.

Resources

xsd

There are two xsd schema definitions under the resources, both defining a schema for the following xml structure, where the value present in the data:notBlank tag must not be empty string, and must have a length between 1 and 255.

<data:NotBlankElement xmlns:data="http://icellmobilsoft.hu/pattern/validation/test/dto">
<!--Optional:-->
<data:notBlank>string</data:notBlank>
</data:NotBlankElement>

The xsds defines the restrictions in different ways:

xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd

unsafe.xsd defines a simple type named SimpleText255NotBlankType it is a string with the following restrictions:

  • minLength = 1

  • maxLength = 255

  • pattern = .*[^\s].*

    <xs:simpleType name="SimpleText255NotBlankType">
        <xs:annotation>
            <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
        </xs:annotation>
        <xs:restriction base="xs:string">
            <xs:minLength value="1"/>
            <xs:maxLength value="255"/>
            <xs:pattern value=".*[^\s].*"/>
        </xs:restriction>
    </xs:simpleType>

xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd

workaround.xsd defines a simple type named SimpleText255Type it is a string with the following restrictions:

  • minLength = 1

  • maxLength = 255

  • pattern = .{1,255} (regex pattern for min and max length)

This xsd also defines SimpleText255NotBlankType which restricts SimpleText255Type with the following pattern .*[^\s].*.

     <xs:simpleType name="SimpleText255Type">
        <xs:annotation>
            <xs:documentation xml:lang="en">String of maximum 255 characters</xs:documentation>
        </xs:annotation>
        <xs:restriction base="xs:string">
            <xs:minLength value="1"/>
            <xs:maxLength value="255"/>
            <xs:pattern value=".{1,255}"/>
        </xs:restriction>
    </xs:simpleType>
    <xs:simpleType name="SimpleText255NotBlankType">
        <xs:annotation>
            <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
        </xs:annotation>
        <xs:restriction base="SimpleText255Type">
            <xs:pattern value=".*[^\s].*"/>
        </xs:restriction>
    </xs:simpleType>

xml

The project contains 3 xml files for testing:

  • valid.xml

    valid by both xsd-s

  • invalid.xml

    contains an empty string as value in the notBlank tag.

  • long_string.xml

    contains a long (ca. 1 000 000 chars) string as value in the notBlank tag.

Running

Prerequisite

  • java 8+

  • maven

Compile and run

  • Run mvn clean install in xsd-pattern-validation-test root directory

  • Run java -jar target/xsd-pattern-validation-test-1.0.0-SNAPSHOT.jar from project root directory

MainApplication.main()

The main method unmarshalls and validates the 3 xml-s by both xsd-s.

Effect:

The unmarshalling and validating of long_string.xml by the unsafe.xsd takes a lof more time (several minutes), then the other xml-s, even if its clearly invalid because of the character length. The reason is that the xerces implementation validates the regex first, and the other restrictions later, which is quite time consuming for a 1000000 character long string.

A possible workaorund for the problem is shown in the workaround.xsd. It defines a parent simple type with a simple regex limiting length (.{1,255}). The workaround validates the specific value much more quickly, because xerces validates the regex patterns on the extension chain, starting with the farest parent, until the first unmatching regex. Therefore the regex for the length is evaluated before the not blank regex, and since it fails quickly the other regex won’t be evaluated.

Warning
The validation error will be different for the two xsds. The unsafe.xsd will report cvc-maxLength-valid since the value conforms to the not blank regex pattern (.*[^\s].\*) but violates the maxLength restriction. The workaround.xsd will report cvc-pattern-valid error as the value violates the first regex (.{1,255}), therefore it won’t be checked against the remaining restrictions.

Example

log

[user@user-pc xsd-pattern-validation-test]$ java -jar target/xsd-pattern-validation-test-1.0.0-SNAPSHOT.jar
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[valid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd]...
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshall finished, xml:[valid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd], unmarshalled:[hu.icellmobilsoft.xsd.pattern.validation.test.dto.NotBlankElement@19dfb72a]
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 10 ms
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[invalid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd]...
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
SEVERE: Unmarshall failed, xml:[invalid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd]...
javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 35; cvc-pattern-valid: Value ' ' is not facet-valid with respect to pattern '.*[^\s].*' for type 'SimpleText255NotBlankType'.]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335)
        ...
Caused by: org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 35; cvc-pattern-valid: Value ' ' is not facet-valid with respect to pattern '.*[^\s].*' for type 'SimpleText255NotBlankType'.
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
        ...

Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 6 ms
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[long_string.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd]...
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
SEVERE: Unmarshall failed, xml:[long_string.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd]...
javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 1048636; cvc-maxLength-valid: Value 'KORLÄÂĂ...ĂÂÄÂĂÂG' with length = '1048602' is not facet-valid with respect to maxLength '255' for type 'SimpleText255NotBlankType'.]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335)
       ...
Caused by: org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 1048636; cvc-maxLength-valid: Value 'KORLÄÂĂ...ĂÂÄÂĂÂG' with length = '1048602' is not facet-valid with respect to maxLength '255' for type 'SimpleText255NotBlankType'.
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
        ...

Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 226,955 ms
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[valid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd]...
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshall finished, xml:[valid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd], unmarshalled:[hu.icellmobilsoft.xsd.pattern.validation.test.dto.NotBlankElement@38082d64]
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 1 ms
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[invalid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd]...
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
SEVERE: Unmarshall failed, xml:[invalid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd]...
javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 35; cvc-pattern-valid: Value ' ' is not facet-valid with respect to pattern '.*[^\s].*' for type 'SimpleText255NotBlankType'.]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335)
        ...
Caused by: org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 35; cvc-pattern-valid: Value ' ' is not facet-valid with respect to pattern '.*[^\s].*' for type 'SimpleText255NotBlankType'.
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
        ...

Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 3 ms
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[long_string.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd]...
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
SEVERE: Unmarshall failed, xml:[long_string.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd]...
javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 1048636; cvc-pattern-valid: Value 'KORLÄÂĂ...ĂÂÄÂĂÂG' is not facet-valid with respect to pattern '.{1,255}' for type 'SimpleText255NotBlankType'.]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335)
        ...
Caused by: org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 1048636; cvc-pattern-valid: Value 'KORLÄÂĂ...ĂÂÄÂĂÂG' is not facet-valid with respect to pattern '.{1,255}' for type 'SimpleText255NotBlankType'.
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
        ...

Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 290 ms
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication logStatistics
INFO: Statistics:
UnmarshallStatistics{timeElapsed=10, xmlInput='valid.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd', valid=true}
UnmarshallStatistics{timeElapsed=6, xmlInput='invalid.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd', valid=false}
UnmarshallStatistics{timeElapsed=226955, xmlInput='long_string.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd', valid=false}
UnmarshallStatistics{timeElapsed=1, xmlInput='valid.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd', valid=true}
UnmarshallStatistics{timeElapsed=3, xmlInput='invalid.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd', valid=false}
UnmarshallStatistics{timeElapsed=290, xmlInput='long_string.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd', valid=false}
Table 1. Example statistics

Time elapsed (ms)

Xml input

Xsd used for validation

Validation result

10

valid.xml

unsafe.xsd

VALID

6

invalid.xml

unsafe.xsd

INVALID

226955

long_string.xml

unsafe.xsd

INVALID

1

valid.xml

workaround.xsd

VALID

3

invalid.xml

workaround.xsd

INVALID

290

long_string.xml

workaround.xsd

INVALID

About

Validating XML against XSD is slow for long strings if maxLength and pattern restrictions are defined

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages