Permit use of (?-u) in byte-regex strategies (#336) #337

Merged: 2 commits, Sep 24, 2023

Commits on Sep 18, 2023

  1. Permit use of (?-u) in byte-regex strategies (proptest-rs#336)

    It is desirable to be able to generate, from a regex, byte sequences
    that are not necessarily valid UTF-8.  For example, suppose you have a
    parser that accepts any string generated by the regex
    
    ```
    [0-9]+(\.[0-9]*)?
    ```
    
    Then, in your test suite, you might want to generate strings from the
    complementary regular language, which you could do with the regex
    
    ```
    (?s:|[^0-9].*|[0-9]+[^0-9.].*|[0-9]+\.[0-9]*[^0-9].*)
    ```
    
    However, this will still only generate valid UTF-8 strings.  Maybe you
    are parsing directly from byte sequences read from disk, in which case
    you want to test the parser’s ability to reject invalid UTF-8 _as well
    as_ valid UTF-8 but not within the accepted language.  Then you want
    this slight variation:
    
    ```
    (?s-u:|[^0-9].*|[0-9]+[^0-9.].*|[0-9]+\.[0-9]*[^0-9].*)
    ```
    
    But this regex will be rejected by `bytes_regex`, because by default
    `regex_syntax::Parser` errors out on any regex that potentially
    matches invalid UTF-8.  The application — i.e. proptest — must opt
    into use of such regexes.  This patch makes proptest do just that, for
    `bytes_regex` only.
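
    To make the running example concrete, here is a minimal hand-written
    sketch of a byte-oriented parser for `[0-9]+(\.[0-9]*)?`, together
    with one sample input per alternative of the complement regex.  The
    names `accepts` and `REJECTED_SAMPLES` are hypothetical and not part
    of proptest; the samples are hand-picked, not generated.

    ```rust
    /// Hand-written acceptor for the language of `[0-9]+(\.[0-9]*)?`,
    /// operating on raw bytes rather than on `&str`.
    /// Illustrative only; this is not code from proptest.
    fn accepts(input: &[u8]) -> bool {
        // Count the leading ASCII digits; at least one is required.
        let int_len = input.iter().take_while(|b| b.is_ascii_digit()).count();
        if int_len == 0 {
            return false;
        }
        match &input[int_len..] {
            // Digits only.
            [] => true,
            // A dot followed by zero or more digits, and nothing else.
            [b'.', frac @ ..] => frac.iter().all(|b| b.is_ascii_digit()),
            // Anything else after the digits is rejected.
            _ => false,
        }
    }

    /// One hand-picked input per alternative of the complement regex.
    /// The last entry contains the byte 0xFF, which never occurs in valid
    /// UTF-8, so only the `(?s-u:...)` variant can describe it.
    const REJECTED_SAMPLES: [&[u8]; 5] = [
        b"",         // the empty alternative
        b"x1",       // [^0-9].*
        b"12a",      // [0-9]+[^0-9.].*
        b"1.5e3",    // [0-9]+\.[0-9]*[^0-9].*
        b"12\xff",   // [0-9]+[^0-9.].* with a non-UTF-8 byte
    ];
    ```

    A parser like this should reject every entry in `REJECTED_SAMPLES`
    and accept inputs such as `b"42"`, `b"42."` and `b"3.14"`.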
    
    There should be no change to the behavior of any existing test suite,
    because opting to allow use of `(?-u)` does not change the semantics
    of any regex that _doesn’t_ contain `(?-u)`, and any existing regex
    that _does_ contain `(?-u)` must be incapable of generating invalid
    UTF-8 for other reasons, or `regex_syntax::Parser` would be rejecting it.
    (For example, `(?-u:[a-z])` cannot generate invalid UTF-8.)
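
    The `(?-u:[a-z])` case can be checked by hand.  The sketch below is a
    simplified illustration for a pattern that consists of a single byte
    class `[lo-hi]`; it is not how `regex_syntax` performs its check, and
    the function name is hypothetical.

    ```rust
    /// Returns true if a pattern made of one byte class `[lo-hi]`
    /// (as written under `(?-u)`) could match a string that is not
    /// valid UTF-8.  A lone byte is valid UTF-8 only if it is ASCII.
    fn byte_class_can_match_invalid_utf8(lo: u8, hi: u8) -> bool {
        (lo..=hi).any(|b| std::str::from_utf8(&[b]).is_err())
    }
    ```

    Under this simplification, `[a-z]` yields `false` (every byte is
    ASCII), while a class such as `[\x00-\xff]` yields `true`.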
    
    This patch also adds a bunch of tests for `bytes_regex`, which AFAICT
    was not being tested at all.  Some of these use the new functionality
    and others don’t.  There is quite a bit of code duplication in the
    test helper functions — `do_test` and `do_test_bytes` are almost
    identical, as are `generate_values_matching_regex` and
    `generate_byte_values_matching_regex`.  I am not good enough at
    generic metaprogramming in Rust to factor out the duplication.
    zackw committed Sep 18, 2023
    Commit 5f5b02b
  2. [to squash] Correct for API change in regex-syntax 0.7

    Commit rust-lang/regex@706b07d renamed `ParserBuilder::allow_invalid_utf8`
    to `ParserBuilder::utf8` and inverted the sense of its argument.
    
    Separate commit for review purposes; should be squashed before landing
    to preserve bisectability of trunk.
    zackw committed Sep 18, 2023
    Commit bdc2b1e