Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update from 1.0.81 to 1.0.83 causes schema validation failure #814

Closed
scrawfor99 opened this issue Jun 7, 2023 · 5 comments · Fixed by #815
Closed

Update from 1.0.81 to 1.0.83 causes schema validation failure #814

scrawfor99 opened this issue Jun 7, 2023 · 5 comments · Fixed by #815
Assignees

Comments

@scrawfor99
Copy link

Hi all,

I am reaching out with an issue I have encountered.

I am trying to upgrade the dependency from 1.0.81 to 1.0.83 however after updating my schema validation tests now fail:

[validate JSON][ERROR][delete_by_query.json][$.delete_by_query.params.search_type.options[0]: does not match the regex pattern ^\d*|[a-zA-Z_]+$]
[validate JSON][ERROR][delete_by_query.json][$.delete_by_query.params.search_type.options[1]: does not match the regex pattern ^\d*|[a-zA-Z_]+$]
[validate JSON][ERROR][delete_by_query.json][$.delete_by_query.params.slices.type: does not match the regex pattern ^list|date|time|string|enum|int|double|long|boolean|number$]
[validate JSON][ERROR][exists.json][$.exists.params.version_type.options[0]: does not match the regex pattern ^\d*|[a-zA-Z_]+$]
[validate JSON][ERROR][exists.json][$.exists.params.version_type.options[1]: does not match the regex pattern ^\d*|[a-zA-Z_]+$]
[validate JSON][ERROR][exists.json][$.exists.params.version_type.options[2]: does not match the regex pattern ^\d*|[a-zA-Z_]+$]
[validate JSON][ERROR][exists.json][$.exists.params.version_type.options[3]: does not match the regex pattern ^\d*|[a-zA-Z_]+$]

From looking at the CHANGELOG it does not look like there were any changes between the versions that should cause a break in the regex matching or the the schema to be parsed differently.

Is this something you have seen before?

Here is an example doc:

{
  "exists":{
    "documentation":{
      "url":"https://www.elastic.co/guide/en/elasticsearch/reference/master/docs-get.html",
      "description":"Returns information about whether a document exists in an index."
    },
    "stability":"stable",
    "url":{
      "paths":[
        {
          "path":"/{index}/_doc/{id}",
          "methods":[
            "HEAD"
          ],
          "parts":{
            "id":{
              "type":"string",
              "description":"The document ID"
            },
            "index":{
              "type":"string",
              "description":"The name of the index"
            }
          }
        }
      ]
    },
    "params":{
      "stored_fields":{
        "type":"list",
        "description":"A comma-separated list of stored fields to return in the response"
      },
      "preference":{
        "type":"string",
        "description":"Specify the node or shard the operation should be performed on (default: random)"
      },
      "realtime":{
        "type":"boolean",
        "description":"Specify whether to perform the operation in realtime or search mode"
      },
      "refresh":{
        "type":"boolean",
        "description":"Refresh the shard containing the document before performing the operation"
      },
      "routing":{
        "type":"string",
        "description":"Specific routing value"
      },
      "_source":{
        "type":"list",
        "description":"True or false to return the _source field or not, or a list of fields to return"
      },
      "_source_excludes":{
        "type":"list",
        "description":"A list of fields to exclude from the returned _source field"
      },
      "_source_includes":{
        "type":"list",
        "description":"A list of fields to extract and return from the _source field"
      },
      "version":{
        "type":"number",
        "description":"Explicit version number for concurrency control"
      },
      "version_type":{
        "type":"enum",
        "options":[
          "internal",
          "external",
          "external_gte",
          "force"
        ],
        "description":"Specific version type"
      }
    }
  }
}

And the related SCHEMA:

},
            "title": "API url parts",
            "description": "Variable parts that make up an API url path"
        },
        "ParamPart": {
            "type": "object",
            "additionalProperties": false,
            "properties": {
                "type": {
                    "type": "string",
                    "pattern": "^list|date|time|string|enum|int|double|long|boolean|number$",
                    "title": "The type of the parameter."
                },
                "options": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "pattern": "^\\d*|[a-zA-Z_]+$"
                    },
                    "title": "Valid options when type is an enum"
                },
                "default": {
                    "type": ["string", "number", "boolean"],
                    "title": "Default value"
                },
                "deprecated": {
                    "oneOf": [
                      { "$ref": "#/definitions/Deprecated" },
                      { "type": "boolean" }
                    ]
                },
                "description": {
                    "type": "string",
                    "title": "A description for the parameter"
                },
                "required": {
                    "type": "boolean",
                    "title": "Whether the parameter is required"
                }
            },
            "required": [
                "description",
                "type"
            ],
            "allOf": [
                {
                    "if": {
                        "properties": { "type": { "const": "enum" } }
                    },
                    "then": {
                        "required": ["options"]
                    }
                }
            ],
            "title": "API parameter or part",
            "description": "The properties of an API parameter or part"
        },
        "Path": {
@fdutton
Copy link
Contributor

fdutton commented Jun 7, 2023

Both 1.0.82 and 1.0.83 contain changes to how regular expressions work. See #738 and #782

I wrote some unit-tests and was able to reproduce the issue with options but not the issue with slices. The problem with options is that the pattern is not matching the end anchor when using the JDK's pattern. This does not occur when using Joni's regex. You can switch to Joni by setting SchemaValidatorsConfig.setEcma262Validator(true). I'll continue investigating why the end anchor does not match.

package com.networknt.schema.regex;

import static org.junit.jupiter.api.Assertions.*;

import org.junit.jupiter.api.Test;

class Issue814Test {

    @Test
    void jdkTypePattern() {
        JDKRegularExpression ex = new JDKRegularExpression("^list|date|time|string|enum|int|double|long|boolean|number$");
        assertTrue(ex.matches("list"));
        assertTrue(ex.matches("string"));
        assertTrue(ex.matches("boolean"));
        assertTrue(ex.matches("number"));
        assertTrue(ex.matches("enum"));
    }

    @Test
    void jdkOptionsPattern() {
        JDKRegularExpression ex = new JDKRegularExpression("^\\d*|[a-zA-Z_]+$");
        assertTrue(ex.matches("external"));
        assertTrue(ex.matches("external_gte"));
        assertTrue(ex.matches("force"));
        assertTrue(ex.matches("internal"));
    }

    @Test
    void joniTypePattern() {
        JoniRegularExpression ex = new JoniRegularExpression("^list|date|time|string|enum|int|double|long|boolean|number$");
        assertTrue(ex.matches("list"));
        assertTrue(ex.matches("string"));
        assertTrue(ex.matches("boolean"));
        assertTrue(ex.matches("number"));
        assertTrue(ex.matches("enum"));
    }

    @Test
    void joniOptionsPattern() {
        JoniRegularExpression ex = new JoniRegularExpression("^\\d*|[a-zA-Z_]+$");
        assertTrue(ex.matches("internal"));
        assertTrue(ex.matches("external"));
        assertTrue(ex.matches("external_gte"));
        assertTrue(ex.matches("force"));
    }

}

@fdutton fdutton self-assigned this Jun 7, 2023
@scrawfor99
Copy link
Author

Hi @fdutton, thanks for the quick follow-up. Let me know if you need a hand with anything and I will open a PR if I come across a fix.

@fdutton
Copy link
Contributor

fdutton commented Jun 8, 2023

@scrawfor99 This issue is actually in Java.

The patterns in JSON Schema are not implicitly anchored so we must use Matcher.find() rather than Matcher.matches() or Matcher.lookingAt(). However, this method does not honor the end anchor when it is immediately preceded by a quantifier (e.g., ?, *, +, {n,m}, etc.). So, replacing your expression, ^\\d*|[a-zA-Z_]+$ with ^(\\d*|[a-zA-Z_]+)$ works.

To make this work in all cases, I am now checking for an end anchor and wrapping the pattern in a group.

stevehu pushed a commit that referenced this issue Jun 9, 2023
…ded by a quantifier. (#815)

Resolves #814

Co-authored-by: Faron Dutton <faron.dutton@insightglobal.com>
@kool79
Copy link

kool79 commented Oct 11, 2023

@scrawfor99
you use incorrect regexp.
when you write ^\\d*|[a-zA-Z_]+$ it is the same as (^\\d*)|([a-zA-Z_]+$)
and the 1st group ^\\d* is matched either to string which starts with digits OR to zero-length string at the start if string does not start with digit. i.e. it ALWAYS matched to something. I think it is not your target.

@fdutton
Please revert your fix. Java does not have any issues with regexp. It was verified by many coders during long many years.
Your fix made impossible and brokes regexp like "something which start with this OR ends with that". For example, if I want to pass any string which EITHER starts with "red" or ends with "banana" I will write:
^red|banana$.
So both "red apple" and "green banana", "red banana" will pass.
but your fix will change this to ^(red|banana)$
which will pass ONLY "red" or "banana"

p.s. you can use this tool: https://regexper.com/#%5E%5C%5Cd*%7C%5Ba-zA-Z_%5D%2B%24 to visualize how regexp works

@Stephan202
Copy link
Contributor

I think @kool79 is right. CC @stevehu.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants