change regex to use re.search or re.fullmatch #1631

@yurikhan

Bug

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.5.1
            pydantic compiled: False
                 install path: /home/khan/.local/lib/python3.6/site-packages/pydantic
               python version: 3.6.9 (default, Nov  7 2019, 10:44:02)  [GCC 8.3.0]
                     platform: Linux-4.15.0-88-generic-x86_64-with-Ubuntu-18.04-bionic
     optional deps. installed: ['typing-extensions']

Related: #1396

import jsonschema
import pydantic

class Foo(pydantic.BaseModel):
    bar: str = pydantic.Field(..., regex='baz')

try:
    Foo(bar='bar baz quux')
except pydantic.ValidationError as e:
    print(e)
    # ValidationError: 1 validation error for Foo
    # bar
    #   string does not match regex "baz" (type=value_error.str.regex; pattern=baz)
else:
    print('Valid')

try:
    jsonschema.validate({'bar': 'bar baz quux'}, Foo.schema())
except jsonschema.ValidationError as e:
    print(e)
else:
    print('Valid')
    # Valid

In every version of JSON Schema so far, the regular expression in the pattern keyword is treated as unanchored at both ends, i.e. it has re.search behavior rather than re.match or re.fullmatch behavior.

Pydantic uses re.match to validate strings with a Field regex argument, as explained in #1396.
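
For illustration, the difference between the three functions can be reproduced with the standard library alone. This is a minimal sketch that reuses the value and pattern from the example above; the variable names are made up for the demonstration and are not part of Pydantic.

```python
import re

value = 'bar baz quux'
pattern = 'baz'

# re.search looks for the pattern anywhere in the string (JSON Schema semantics).
search_hit = re.search(pattern, value) is not None
# re.match only tries at the start of the string (what Pydantic 1.5.1 does).
match_hit = re.match(pattern, value) is not None
# re.fullmatch requires the whole string to match.
fullmatch_hit = re.fullmatch(pattern, value) is not None

print(search_hit, match_hit, fullmatch_hit)  # True False False
```

Only re.search accepts the value, which is why the generated schema validates it while the model rejects it.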

When constructing a JSON Schema from a model, Pydantic generates a pattern keyword from a Field regex argument, without any regex postprocessing. Thus, the resulting JSON Schema validates differently from the original Pydantic model.

I strongly feel Pydantic should follow suit and use re.search to validate fields with a regex argument, and explicitly call this out in the documentation. (Anchors in example usage are not sufficient.)

Alternatively, if you are concerned with backward compatibility, I propose the following long-term solution:

  1. Add a new Field (and constr) argument, perhaps named pattern after the JSON Schema keyword, mutually exclusive with the existing regex argument. This is an API extension and thus backward compatible.
  2. Implement validation for pattern using re.search.
  3. Change the behavior of BaseModel.schema to copy the pattern Field argument, if present, to the pattern schema keyword as is. This way, new users of the pattern argument get correct schema generation.
  4. One or more of:
    • Fix the behavior of BaseModel.schema so that a regex argument with value REGEX produces a schema pattern keyword with value ^(REGEX). (The grouping is necessary because the original regex may contain alternatives.) This way, users of the regex argument get schemas that behave consistently with the original Pydantic model.
    • Deprecate regex, suggesting pattern with explicit anchoring.
    • Emit a warning if a model containing fields with the regex argument is used to generate a JSON Schema. This way, users of the regex argument who are likely to be adversely affected by the inconsistency get a heads-up.
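
The anchoring suggested in the first option of step 4 can be sketched as follows. The helper name anchor_for_search is hypothetical, and it uses a non-capturing group, ^(?:REGEX), a small variation on ^(REGEX) so that group numbering inside the original regex is not shifted. It assumes no regex flags are in use; it only illustrates why the grouping matters when the regex contains alternatives.

```python
import re


def anchor_for_search(regex: str) -> str:
    """Hypothetical helper: build a pattern that, under re.search,
    accepts the same strings that `regex` accepts under re.match."""
    # The group keeps the leading ^ applying to every alternative.
    return f'^(?:{regex})'


regex = 'foo|bar'
naive = '^' + regex          # '^foo|bar': the anchor binds to 'foo' only
anchored = anchor_for_search(regex)

for value in ['foo', 'barx', 'xbar']:
    print(
        value,
        re.match(regex, value) is not None,      # current Pydantic behavior
        re.search(naive, value) is not None,     # anchoring without a group
        re.search(anchored, value) is not None,  # anchoring with a group
    )
# foo True True True
# barx True True True
# xbar False True False
```

Without the group, 'xbar' is accepted by the search even though re.match rejects it, so the schema would still disagree with the model.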
