It is desirable to be able to generate, from a regex, byte sequences
that are not necessarily valid UTF-8. For example, suppose you have a
parser that accepts any string generated by the regex
```
[0-9]+(\.[0-9]*)?
```
Then, in your test suite, you might want to generate strings from the
complementary regular language, which you could do with the regex
```
(?s:|[^0-9].*|[0-9]+[^0-9.].*|[0-9]+\.[0-9]*[^0-9].*)
```
However, this will still only generate valid UTF-8 strings. Maybe you
are parsing directly from byte sequences read from disk, in which case
you want to test the parser’s ability to reject invalid UTF-8 _as well
as_ valid UTF-8 but not within the accepted language. Then you want
this slight variation:
```
(?s-u:|[^0-9].*|[0-9]+[^0-9.].*|[0-9]+\.[0-9]*[^0-9].*)
```
But this regex will be rejected by `bytes_regex`, because by default
`regex_syntax::Parser` errors out on any regex that potentially
matches invalid UTF-8. The application — i.e. proptest — must opt
into use of such regexes. This patch makes proptest do just that, for
`bytes_regex` only.
There should be no change to the behavior of any existing test suite,
because opting to allow use of `(?-u)` does not change the semantics
of any regex that _doesn’t_ contain `(?-u)`, and any existing regex
that _does_ contain `(?-u)` must be incapable of generating invalid
UTF-8 for other reasons, or `regex_syntax::Parser` would be rejecting it.
(For example, `(?-u:[a-z])` cannot generate invalid UTF-8.)
This patch also adds a bunch of tests for `bytes_regex`, which AFAICT
was not being tested at all. Some of these use the new functionality
and others don’t. There is quite a bit of code duplication in the
test helper functions — `do_test` and `do_test_bytes` are almost
identical, as are `generate_values_matching_regex` and
`generate_byte_values_matching_regex`. I am not good enough at
generic metaprogramming in Rust to factor out the duplication.