This POC demonstrates that during JAXB unmarshalling invalid input can slow down the validation process if there is a regex pattern defined on the input.
There are two xsd schema definitions under the resources, both defining a schema for the following xml structure, where the value present in the data:notBlank
tag must not be empty string, and must have a length between 1 and 255.
<data:NotBlankElement xmlns:data="http://icellmobilsoft.hu/pattern/validation/test/dto">
<!--Optional:-->
<data:notBlank>string</data:notBlank>
</data:NotBlankElement>
The xsds defines the restrictions in different ways:
unsafe.xsd
defines a simple type named SimpleText255NotBlankType
it is a string with the following restrictions:
-
minLength = 1
-
maxLength = 255
-
pattern =
.*[^\s].*
<xs:simpleType name="SimpleText255NotBlankType">
<xs:annotation>
<xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:string">
<xs:minLength value="1"/>
<xs:maxLength value="255"/>
<xs:pattern value=".*[^\s].*"/>
</xs:restriction>
</xs:simpleType>
workaround.xsd
defines a simple type named SimpleText255Type
it is a string with the following restrictions:
-
minLength = 1
-
maxLength = 255
-
pattern =
.{1,255}
(regex pattern for min and max length)
This xsd also defines SimpleText255NotBlankType
which restricts SimpleText255Type
with the following pattern .*[^\s].*
.
<xs:simpleType name="SimpleText255Type">
<xs:annotation>
<xs:documentation xml:lang="en">String of maximum 255 characters</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:string">
<xs:minLength value="1"/>
<xs:maxLength value="255"/>
<xs:pattern value=".{1,255}"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="SimpleText255NotBlankType">
<xs:annotation>
<xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
</xs:annotation>
<xs:restriction base="SimpleText255Type">
<xs:pattern value=".*[^\s].*"/>
</xs:restriction>
</xs:simpleType>
-
Run
mvn clean install
inxsd-pattern-validation-test
root directory -
Run
java -jar target/xsd-pattern-validation-test-1.0.0-SNAPSHOT.jar
from project root directory
The main method unmarshalls and validates the 3 xml-s by both xsd-s.
The unmarshalling and validating of long_string.xml
by the unsafe.xsd
takes a lof more time (several minutes), then the other xml-s, even if its clearly invalid because of the character length.
The reason is that the xerces implementation validates the regex first, and the other restrictions later, which is quite time consuming for a 1000000 character long string.
A possible workaorund for the problem is shown in the workaround.xsd
.
It defines a parent simple type with a simple regex limiting length (.{1,255}
).
The workaround validates the specific value much more quickly, because xerces validates the regex patterns on the extension chain, starting with the farest parent, until the first unmatching regex.
Therefore the regex for the length is evaluated before the not blank regex, and since it fails quickly the other regex won’t be evaluated.
Warning
|
The validation error will be different for the two xsds.
The unsafe.xsd will report cvc-maxLength-valid since the value conforms to the not blank regex pattern (.*[^\s].\* ) but violates the maxLength restriction.
The workaround.xsd will report cvc-pattern-valid error as the value violates the first regex (.{1,255} ), therefore it won’t be checked against the remaining restrictions.
|
[user@user-pc xsd-pattern-validation-test]$ java -jar target/xsd-pattern-validation-test-1.0.0-SNAPSHOT.jar
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[valid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd]...
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshall finished, xml:[valid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd], unmarshalled:[hu.icellmobilsoft.xsd.pattern.validation.test.dto.NotBlankElement@19dfb72a]
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 10 ms
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[invalid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd]...
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
SEVERE: Unmarshall failed, xml:[invalid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd]...
javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 35; cvc-pattern-valid: Value ' ' is not facet-valid with respect to pattern '.*[^\s].*' for type 'SimpleText255NotBlankType'.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335)
...
Caused by: org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 35; cvc-pattern-valid: Value ' ' is not facet-valid with respect to pattern '.*[^\s].*' for type 'SimpleText255NotBlankType'.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
...
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 6 ms
Nov 05, 2019 12:59:54 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[long_string.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd]...
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
SEVERE: Unmarshall failed, xml:[long_string.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd]...
javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 1048636; cvc-maxLength-valid: Value 'KORLÄÂĂ...ĂÂÄÂĂÂG' with length = '1048602' is not facet-valid with respect to maxLength '255' for type 'SimpleText255NotBlankType'.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335)
...
Caused by: org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 1048636; cvc-maxLength-valid: Value 'KORLÄÂĂ...ĂÂÄÂĂÂG' with length = '1048602' is not facet-valid with respect to maxLength '255' for type 'SimpleText255NotBlankType'.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
...
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 226,955 ms
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[valid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd]...
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshall finished, xml:[valid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd], unmarshalled:[hu.icellmobilsoft.xsd.pattern.validation.test.dto.NotBlankElement@38082d64]
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 1 ms
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[invalid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd]...
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
SEVERE: Unmarshall failed, xml:[invalid.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd]...
javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 35; cvc-pattern-valid: Value ' ' is not facet-valid with respect to pattern '.*[^\s].*' for type 'SimpleText255NotBlankType'.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335)
...
Caused by: org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 35; cvc-pattern-valid: Value ' ' is not facet-valid with respect to pattern '.*[^\s].*' for type 'SimpleText255NotBlankType'.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
...
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 3 ms
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Unmarshalling xml:[long_string.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd]...
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
SEVERE: Unmarshall failed, xml:[long_string.xml], xsd:[xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd]...
javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 1048636; cvc-pattern-valid: Value 'KORLÄÂĂ...ĂÂÄÂĂÂG' is not facet-valid with respect to pattern '.{1,255}' for type 'SimpleText255NotBlankType'.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335)
...
Caused by: org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 1048636; cvc-pattern-valid: Value 'KORLÄÂĂ...ĂÂÄÂĂÂG' is not facet-valid with respect to pattern '.{1,255}' for type 'SimpleText255NotBlankType'.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
...
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication unmarshallXml
INFO: Time elapsed: 290 ms
Nov 05, 2019 1:03:41 PM hu.icellmobilsoft.xsd.pattern.validation.test.application.MainApplication logStatistics
INFO: Statistics:
UnmarshallStatistics{timeElapsed=10, xmlInput='valid.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd', valid=true}
UnmarshallStatistics{timeElapsed=6, xmlInput='invalid.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd', valid=false}
UnmarshallStatistics{timeElapsed=226955, xmlInput='long_string.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/unsafe.xsd', valid=false}
UnmarshallStatistics{timeElapsed=1, xmlInput='valid.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd', valid=true}
UnmarshallStatistics{timeElapsed=3, xmlInput='invalid.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd', valid=false}
UnmarshallStatistics{timeElapsed=290, xmlInput='long_string.xml', xsdPath='xsd/hu/icellmobilsoft/pattern/validation/test/dto/workaround.xsd', valid=false}
Time elapsed (ms) |
Xml input |
Xsd used for validation |
Validation result |
10 |
|
|
VALID |
6 |
|
|
INVALID |
226955 |
|
|
INVALID |
1 |
|
|
VALID |
3 |
|
|
INVALID |
290 |
|
|
INVALID |