release regex 1.0 (May 1, 2018) #457

Closed
BurntSushi opened this issue Mar 13, 2018 · 15 comments

@BurntSushi
Member

BurntSushi commented Mar 13, 2018

I think the 0.2 release has baked long enough. I propose that regex 1.0 be released on May 1, 2018.

Here are the key breaking changes (all supremely minor) I'd like to make:

  • Increase the minimum Rust version to 1.??.
  • Establish a policy around bumping the minimum Rust version. I would like to propose that new patch releases (1.x.y) should never increase the minimum Rust version required to compile regex, but that new minor version releases (1.x) may increase the minimum Rust version required to compile regex.
  • Disable octal syntax by default. Today, Regex::new(r"\1").unwrap().is_match("\u{1}") evaluates to true. Instead, I'd like it to emit an error that backreferences are not supported. We will provide a method on RegexBuilder to opt into the old syntax with octal escape sequences supported.
  • Ban (?-u:\B) from use in Regex::new, since it is permitted to match invalid UTF-8 boundaries. We, of course, continue to allow it for bytes::Regex::new. (?-u:\b) remains legal in Regex::new, since it cannot match invalid UTF-8 boundaries.
  • Remove the impl From<regex_syntax::Error> for regex::Error definition. The fact that this exists was an oversight, and it actually causes regex-syntax to be a public dependency of regex, which we very much do not want to happen.
  • As noted by @cuviper, we may some day want to support non-std but core/alloc-only use cases. To make this happen, we'll need to gate things on a std feature that is enabled by default. We need to add this feature in 1.0 and gate the entire crate on it. If we didn't, and added this gate in the future, then existing uses of default-features = false would likely break, which would be a breaking change.
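To illustrate the octal item in the list above: with octal syntax enabled, `\1` denotes the character U+0001, which is why the old default matched "\u{1}". The sketch below is a toy model of that one decision and is not the regex-syntax parser. The names `Escape` and `interpret_digit_escape` are hypothetical; in the real crate, the opt-in is the `octal` method on RegexBuilder.

```rust
/// Toy result of interpreting a pattern of the form `\` followed by octal digits.
#[derive(Debug, PartialEq)]
enum Escape {
    /// Octal syntax enabled: the escape denotes a literal character.
    Literal(char),
    /// Octal syntax disabled: the escape is rejected with a message.
    Unsupported(&'static str),
}

/// Hypothetical helper (not part of the regex crate): interprets `\N...`
/// where every N is an octal digit. Returns None for anything else.
fn interpret_digit_escape(pattern: &str, octal: bool) -> Option<Escape> {
    let digits = pattern.strip_prefix('\\')?;
    if digits.is_empty() || !digits.bytes().all(|b| (b'0'..=b'7').contains(&b)) {
        return None;
    }
    if !octal {
        // Proposed 1.0 default: report that backreferences are unsupported.
        return Some(Escape::Unsupported("backreferences are not supported"));
    }
    // Old default: read the digits as an octal code point.
    let code = u32::from_str_radix(digits, 8).ok()?;
    char::from_u32(code).map(Escape::Literal)
}

fn main() {
    println!("{:?}", interpret_digit_escape(r"\1", true));
    println!("{:?}", interpret_digit_escape(r"\1", false));
}
```

The design point is that rejecting `\1` by default leaves room for a useful error message when someone expects a backreference, while callers who rely on octal escapes can still opt in.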
BurntSushi added a commit to BurntSushi/regex that referenced this issue Mar 13, 2018
Previously, we had some inconsistencies in how we were handling ASCII
word boundaries. In particular, the translator was accepting a negated
ASCII word boundary even if the caller didn't disable the UTF-8 invariant.
This is wrong, since a negated ASCII word boundary can match between any
two arbitrary bytes. However, fixing this is a breaking change, so for
now we document the bug. We plan to fix it with regex 1.0. See rust-lang#457.

Additionally, we were incorrectly declaring that an ASCII word boundary
matched invalid UTF-8 via the Hir::is_always_utf8 property. An ASCII word
boundary must always match an ASCII byte on one side, which implies a
valid UTF-8 position.
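The reasoning in this commit message can be checked with a small standard-library sketch. It assumes the usual ASCII definition of a word byte (`[0-9A-Za-z_]`) and uses hypothetical helper names; it is not the crate's implementation. For the string "aé", the negated ASCII boundary holds at byte offset 2, which falls inside the two-byte encoding of "é", while the plain ASCII boundary only holds at offsets that are valid character boundaries.

```rust
/// ASCII definition of a word byte: [0-9A-Za-z_].
fn is_ascii_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True if an ASCII word boundary, as in `(?-u:\b)`, holds at byte offset `at`.
/// Exactly one of the two neighboring bytes must be an ASCII word byte.
fn ascii_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_ascii_word(haystack[at - 1]);
    let after = at < haystack.len() && is_ascii_word(haystack[at]);
    before != after
}

fn main() {
    // "a" followed by U+00E9, whose UTF-8 encoding is the two bytes C3 A9.
    let text = "a\u{e9}";
    let bytes = text.as_bytes();
    for at in 0..=bytes.len() {
        let b = ascii_word_boundary(bytes, at);
        println!(
            "offset {}: \\b = {}, \\B = {}, char boundary = {}",
            at,
            b,
            !b,
            text.is_char_boundary(at)
        );
    }
}
```

Because `\b` needs an ASCII byte on one side, and an ASCII byte is always a complete character in valid UTF-8, every position where it holds is a character boundary. The negation has no such guarantee.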
@jethrogb

jethrogb commented Mar 14, 2018

> Establish a policy around bumping the minimum Rust version. I would like to propose that new patch releases (1.x.y) should never increase the minimum Rust version required to compile regex, but that new minor version releases (1.x) may increase the minimum Rust version required to compile regex.

I'd advocate against this. Requiring a newer Rust version should be considered a breaking change.

Discussion on this topic: rust-lang/api-guidelines#123

@BurntSushi
Member Author

@jethrogb It's a compromise. The topic has been discussed to death. I frankly don't have the energy to continue discussing it.

@BurntSushi
Member Author

My thoughts on the matter are here: rust-lang/api-guidelines#123 (comment)

@BurntSushi
Member Author

BurntSushi commented Mar 14, 2018

> I frankly don't have the energy to continue discussing it.

Let me rephrase this, because I do want to discuss it if there are things that I've missed or haven't thought about deeply. In particular, I would appreciate it if further discussion built on my thoughts here. If I'm missing something, then let's talk about it, but let's please try to avoid rehashing things.

BurntSushi changed the title from "release regex 1.0" to "release regex 1.0 (May 1, 2018)" on Mar 14, 2018
@jethrogb

My hope is that a community-wide decision on the matter can be reached (possibly as a part of the API guidelines initiative) before committing to a particular strategy just for this crate.

@BurntSushi
Member Author

I don't see how that's going to happen. Did you read my comment that I linked? The regex crate will not be the first to adopt this policy.

@WiSaGaN

WiSaGaN commented Mar 14, 2018

> I'd advocate against this. Requiring a newer Rust version should be considered a breaking change.
> Discussion on this topic rust-lang/api-guidelines#123

@jethrogb In that case, you can just pin to the minor version, which still allows bugfix updates without upgrading your compiler.
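As a rough sketch of what pinning to the minor version means, the following models two Cargo requirement forms for versions with major >= 1: the default caret form (`^1.1`, same as writing `1.1`) and the tilde form (`~1.1`). The function names are hypothetical and pre-release versions are ignored, so treat it as an illustration rather than Cargo's resolver.

```rust
/// (major, minor, patch)
type Version = (u64, u64, u64);

/// Caret requirement `^major.minor`, assuming major >= 1:
/// accepts >= major.minor.0 and < (major + 1).0.0.
fn caret_accepts(req: (u64, u64), v: Version) -> bool {
    let (major, minor, _patch) = v;
    major == req.0 && minor >= req.1
}

/// Tilde requirement `~major.minor`:
/// accepts >= major.minor.0 and < major.(minor + 1).0.
fn tilde_accepts(req: (u64, u64), v: Version) -> bool {
    let (major, minor, _patch) = v;
    major == req.0 && minor == req.1
}

fn main() {
    let candidates: [Version; 4] = [(1, 1, 0), (1, 1, 5), (1, 2, 0), (2, 0, 0)];
    for v in candidates {
        println!(
            "{}.{}.{}: ^1.1 accepts = {}, ~1.1 accepts = {}",
            v.0,
            v.1,
            v.2,
            caret_accepts((1, 1), v),
            tilde_accepts((1, 1), v)
        );
    }
}
```

Under the proposed policy, only a new minor release may raise the minimum Rust version, so a tilde requirement stays on patch releases that keep the same compiler floor. The maintainer later points to known problems with constraints like `~1.1`, so this shows the mechanics only.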

@jethrogb

jethrogb commented Mar 14, 2018

@BurntSushi but it will be (I think) the first crate under the rust-lang umbrella to adopt that policy, no?

@WiSaGaN Not if you have dependencies that silently update their dependency version requirements (which should be ok because they're semver compatible)! See the linked thread for additional discussion.

@WiSaGaN

WiSaGaN commented Mar 15, 2018

@jethrogb If the owners of dependency crates decide to upgrade to regex 1.0, I think they should decide whether to allow minor version updates, depending on their own compatibility promises. They should pin the minor version of regex if they promise a stable minimum rustc version.

As long as regex explicitly spells out its version convention, I think other crate owners should take responsibility for accounting for it. Have I missed something?

@jethrogb

> Have I missed something?

@WiSaGaN Yes. The point of semver is to not have to meticulously read each dependency's "compatibility policy" because it's spelled out in semver.

@BurntSushi
Member Author

BurntSushi commented Mar 15, 2018

This conversation is frustrating. Semver does not spell out anything other than a high level interpretation of version numbers. There is already broad precedent in the Rust ecosystem that certain types of technically backwards incompatible changes aren't actually backwards incompatible when reasoning about semver. Moreover, the assumption that bumping the minimum Rust version is even a semver incompatible change is contested, and acting like it isn't is not productive.

I will stress this again: let's please table this discussion unless you have something new to add.

@jethrogb Your distaste has been acknowledged. I don't find any of your arguments compelling because they don't bring anything new to the table. My plan at the moment is to continue as planned, and if that in practice causes problems in the ecosystem, then we can re-evaluate that policy.

@jethrogb

jethrogb commented Mar 15, 2018

@BurntSushi Sorry, it wasn't my intention to have this discussion in this thread all over again. I'm fine with having the discussion in rust-lang/api-guidelines#123 and I suggest @WiSaGaN express their arguments over there as well.

My main gripe is with a crate under the rust-lang umbrella (this crate) unilaterally defining what the policy is before consensus is achieved/a decision is made in that issue.

@BurntSushi
Member Author

BurntSushi commented Mar 15, 2018

> My main gripe is with a crate under the rust-lang umbrella (this crate) unilaterally defining what the policy is before consensus is achieved/a decision is made in that issue.

OK, well, if you want to take that angle, then my proposal is far more conservative than what we currently do, at least for the nursery crates anyway. The nursery crates have bumped the minimum Rust version required to compile the crate in semver compatible releases quite a bit (some have even done so recently, albeit with good reason). The only official policy I'm aware of is that rust-lang crates must support at least stable minus 2 releases back, which is compatible with my proposal.

@BurntSushi
Member Author

@WiSaGaN As @jethrogb suggested, please take this discussion to rust-lang/api-guidelines#123. In particular, please carefully read my comment on that thread, which points out problems with constraints like ~1.1: https://github.com/kbknapp/clap-rs#warning-about--dependencies

BurntSushi added a commit to BurntSushi/regex that referenced this issue Apr 28, 2018
This also clarifies our policy on increasing the minimum Rust version
required. In particular, we reserve the right to increase the minimum
Rust version in minor version releases of regex, but never in patch
releases. We will default to a reasonably conservative interpretation
of this policy, and not bump the minimum required Rust version lightly.

If this policy turns out to be too aggressive, then we may alter it in
the future to state that the minimum Rust version is fixed for all of
regex 1.y.z, and can only be bumped on major regex version releases.

See: rust-lang#457
BurntSushi added a commit to BurntSushi/regex that referenced this issue Apr 28, 2018
This commit disables octal syntax by default, which will permit us to
produce useful error messages if a user tried to invoke a backreference.

This commit adds a new `octal` method to RegexBuilder and RegexSetBuilder
which permits callers to re-enable octal syntax.

See rust-lang#457
BurntSushi added a commit to BurntSushi/regex that referenced this issue Apr 28, 2018
The issue with the ASCII version of \B is that it can match between code
units of UTF-8, which means it can cause match indices reported to be on
invalid UTF-8 boundaries. Therefore, similar to things like `(?-u:\xFF)`,
we ban negated ASCII word boundaries from Unicode regular expressions.
Normal ASCII word boundaries remain accessible from Unicode regular
expressions.

See: rust-lang#457
BurntSushi added a commit to BurntSushi/regex that referenced this issue Apr 28, 2018
This removes a public `From` impl that automatically converts errors from
the regex-syntax crate to a regex::Error. This actually causes regex-syntax
to be a public dependency of regex, which was an oversight. We now remove
it, which completely breaks any source code coupling between regex and
regex-syntax.

See rust-lang#457
BurntSushi added a commit to BurntSushi/regex that referenced this issue Apr 28, 2018
This commit adds a new 'std' feature and enables it by default. This
permits us to one day add support for building regex without 'std' (but
with 'alloc', probably) by avoiding the introduction of incompatibilities.
Namely, this setup ensures that all of today's uses of
'--no-default-features' won't compile without also adding the 'std'
feature.

Closes rust-lang#457
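A minimal sketch of the gating idea in this commit message, assuming a Cargo feature named `std` that is listed in the crate's default features. The module name `imp` and the function `backend` are hypothetical. In the scheme the commit describes, a build without the feature does not compile at all; the second module here is only a placeholder so that the sketch runs in either configuration.

```rust
// Stand-in for the crate's real contents, compiled only when `std` is enabled.
#[cfg(feature = "std")]
mod imp {
    pub fn backend() -> &'static str {
        "std"
    }
}

// Placeholder for the configuration without `std`. A crate following the
// commit's scheme would fail to build here instead (one way to express that
// is the `compile_error!` macro), which keeps this slot free for a future
// core/alloc-only implementation.
#[cfg(not(feature = "std"))]
mod imp {
    pub fn backend() -> &'static str {
        "no std feature"
    }
}

fn main() {
    println!("built with: {}", imp::backend());
}
```

Because default features are on unless a dependent opts out, existing users are unaffected, and anyone who disables default features has to request `std` explicitly from the start.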
@BurntSushi BurntSushi mentioned this issue Apr 28, 2018
@BurntSushi
Member Author

For anyone following along at home, I've opened #471 that implements the above changes to bring us to 1.0.

Unless something comes up, my plan is to release this on Tuesday (May 1).

BurntSushi added a commit to BurntSushi/regex that referenced this issue May 1, 2018
This commit adds a new 'use_std' feature and enables it by default.
This permits us to one day add support for building regex without
'use_std' (but with 'alloc', probably) by avoiding the introduction
of incompatibilities. Namely, this setup ensures that all of today's
uses of '--no-default-features' won't compile without also adding the
'use_std' feature.

Closes rust-lang#457
BurntSushi added a commit that referenced this issue May 1, 2018
This commit adds a new 'use_std' feature and enables it by default.
This permits us to one day add support for building regex without
'use_std' (but with 'alloc', probably) by avoiding the introduction
of incompatibilities. Namely, this setup ensures that all of today's
uses of '--no-default-features' won't compile without also adding the
'use_std' feature.

Closes #457