Needs canonical examples / reference implementation (including RegExp) #59

coolaj86 · 2013-04-24T23:01:09Z

RegExp in various languages that correctly parse semver
Reference implementations

JavaScript RegExp:

/^((\d+)\.(\d+)\.(\d+))(?:-([\dA-Za-z\-]+(?:\.[\dA-Za-z\-]+)*))?(?:\+([\dA-Za-z\-]+(?:\.[\dA-Za-z\-]+)*))?$/

JavaScript Reference Implementation:

https://github.com/coolaj86/semver-utils

I would suggest having two sections: one for tested regular expressions that match and another for modules / reference implementations.

There should be at least one canonical reference implementation of a parser / validator with an API that would make sense to copy in any language.
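A minimal sketch of what such a copyable parser / validator API could look like, in Python. The names `SemVer`, `parse` and `is_valid` are hypothetical, and the pattern is assembled from the identifier rules worked out later in this thread (numeric: 0|[1-9][0-9]*; pre-release identifier: 0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*; build identifier: [0-9A-Za-z-]+), so treat it as illustrative, not canonical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical building blocks. [0-9] is used instead of \d so that
# Python's Unicode-aware \d can never accept non-ASCII digits.
NUM = r"0|[1-9][0-9]*"
PRE_ID = r"0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*"
BUILD_ID = r"[0-9A-Za-z-]+"

SEMVER_RE = re.compile(
    rf"(?P<major>{NUM})\.(?P<minor>{NUM})\.(?P<patch>{NUM})"
    rf"(?:-(?P<prerelease>(?:{PRE_ID})(?:\.(?:{PRE_ID}))*))?"
    rf"(?:\+(?P<build>{BUILD_ID}(?:\.{BUILD_ID})*))?"
)


@dataclass(frozen=True)
class SemVer:
    major: int
    minor: int
    patch: int
    prerelease: Tuple[str, ...] = ()
    build: Tuple[str, ...] = ()


def parse(text: str) -> Optional[SemVer]:
    """Return a SemVer for a valid version string, or None."""
    # fullmatch anchors both ends; unlike "$" it does not let a
    # trailing newline slip through.
    m = SEMVER_RE.fullmatch(text)
    if m is None:
        return None
    pre, build = m.group("prerelease"), m.group("build")
    return SemVer(
        int(m.group("major")), int(m.group("minor")), int(m.group("patch")),
        tuple(pre.split(".")) if pre else (),
        tuple(build.split(".")) if build else (),
    )


def is_valid(text: str) -> bool:
    return parse(text) is not None
```

Returning structured fields (rather than a bare match/no-match) is what would make the API easy to port: every language can expose the same five fields.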

ghost · 2013-06-24T04:40:53Z

The regular expression needs to be updated to detect the presence of leading zeros.

Unfortunately, I haven't found a way to do that, otherwise I would have provided an updated example.

coolaj86 · 2013-06-24T05:38:43Z

Can you give an example of a valid semver string that has leading zeros? and also provide a reference to the documentation that suggests this is allowed (i.e. a phrase or example in the docs)?

If so I'll add your string to the tests in semver-utils and fix it.

ghost · 2013-06-24T06:56:30Z

Sorry, I meant to disallow.

The regular expression as used now does not seem to reject version numbers with leading zeros.
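A quick way to see the problem, sketched in Python. The version core of the first comment's expression is written in its evidently intended form, (\d+)\.(\d+)\.(\d+), with the pre-release and build parts unchanged; only the JavaScript /.../ delimiters are dropped.

```python
import re

# The first comment's pattern, transcribed from JavaScript.
LOOSE = re.compile(
    r"^((\d+)\.(\d+)\.(\d+))"
    r"(?:-([\dA-Za-z\-]+(?:\.[\dA-Za-z\-]+)*))?"
    r"(?:\+([\dA-Za-z\-]+(?:\.[\dA-Za-z\-]+)*))?$"
)

# \d+ happily matches "01", and nothing in the tag parts forbids "0123".
for candidate in ["1.2.3", "01.02.03", "1.0.0-0123"]:
    print(candidate, bool(LOOSE.match(candidate)))
```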

tbull · 2013-06-24T14:15:42Z

We decided only recently that leading zeroes are no longer allowed: semver/semver#112
That decision invalidated all regexps developed earlier.

gvlx · 2014-10-03T19:19:10Z

Hi,

I have been playing with the regex for version 2.0.0 and came up with this on regex101.

Expanded:

/^
(?'MAJOR'
    0|(?:[1-9]\d*)
)
\.
(?'MINOR'
    0|(?:[1-9]\d*)
)
\.
(?'PATCH'
    0|(?:[1-9]\d*)
)
(?:-(?'prerelease'
    (?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*))
    (?:\.
        (?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*))
    )*
))?
(?:\+(?'build'
    (?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*))
    (?:\.
        (?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*))
    )*
))?
$/

The pre-release and build patterns are very complex because they require the 'no leading zeros' rule.
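For anyone wanting to run the expanded form outside regex101, here is a sketch of the same pattern in Python's re module. Python does not accept the (?'NAME'...) syntax, so the groups are rewritten as (?P<NAME>...), and re.VERBOSE lets the whitespace and line breaks stay in the pattern.

```python
import re

# gvlx's expanded pattern, transcribed for Python (group syntax only changed).
EXPANDED = re.compile(r"""
    ^
    (?P<MAJOR> 0|(?:[1-9]\d*) ) \.
    (?P<MINOR> 0|(?:[1-9]\d*) ) \.
    (?P<PATCH> 0|(?:[1-9]\d*) )
    (?:-(?P<prerelease>
        (?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*))
        (?:\.(?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*)))*
    ))?
    (?:\+(?P<build>
        (?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*))
        (?:\.(?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*)))*
    ))?
    $
""", re.VERBOSE)

m = EXPANDED.match("1.2.3-rc.1+build.5")
print(m.group("MAJOR"), m.group("prerelease"), m.group("build"))
```

Because every identifier must be either a lone 0 or start with a non-zero character, this form also rejects strings such as 1.0.0-0abc, a point that comes up again later in the thread.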

I can't see any benefit in that over a (very) relaxed pattern such as /[0-9A-Za-z]+([.-][0-9A-Za-z]+)*/, which just matches 'dot-or-dash' separated alphanumeric identifiers (e.g. "00000-aaaaa.bbbbb") and which, for me, would be more useful (I usually have to use UUIDs and other mechanical identifiers).

Notice that according to the railroad diagram and the BNF (boy, isn't that hard to read! 😕) the identifier "0000.0000.0000.0000.------" is valid (leading zeros allowed).

If you can, please supply more edge cases 😄 on the regex101 page (in the first block all cases are valid; in the second, invalid).

Happy hacking!

coolaj86 · 2014-10-03T19:26:49Z

I'm in the camp to veto the use of leading zeros. In JavaScript (and some other languages) parsers will default to octal and then your sorting could get all out of whack because suddenly '011' is less than '10' both lexicographically and numerically.

And what about 007 vs 07 vs 7? Numerically they're all the same in base 10 and base 8 so how would you know which version is the "newer" one?

For the love of all that is good on this earth: no leading zeros!!!
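The ambiguity above can be reproduced in a few lines of Python. Python itself requires an explicit base, so `as_c_style_int` is a hypothetical helper that mimics the C / legacy-JavaScript rule "a leading 0 followed by more digits means octal".

```python
def as_c_style_int(s: str) -> int:
    """Hypothetical helper: treat a leading 0 (with more digits) as octal."""
    return int(s, 8) if len(s) > 1 and s.startswith("0") else int(s, 10)


print(sorted(["10", "011"]))                         # ['011', '10'] as text
print(as_c_style_int("011"), as_c_style_int("10"))   # 9 10
print({int(s) for s in ["007", "07", "7"]})          # {7}: three strings, one number
```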

coolaj86 · 2014-10-03T19:30:15Z

Oh, sorry I missed the part about that being a build number. I'll have to look at the spec, but I don't think the parser I mentioned prohibits this either way.

gvlx · 2014-10-03T23:41:06Z

Hi,

The requirement is on 2.0.0 for pre-release but not on build version:

9 A pre-release version (...) Numeric identifiers MUST NOT include leading zeroes. (...)

But on the BNF:

<pre-release identifier> ::= <alphanumeric identifier>
                           | <numeric identifier>

<build identifier> ::= <alphanumeric identifier>
                     | <digits>

<alphanumeric identifier> ::= <non-digit>
                            | <non-digit> <identifier characters>
                            | <identifier characters> <non-digit>
                            | <identifier characters> <non-digit> <identifier characters>

<identifier characters> ::= <identifier character>
                          | <identifier character> <identifier characters>

<identifier character> ::= <digit>
                         | <non-digit>

<non-digit> ::= <letter>
              | "-"

<digit> ::= "0"
          | <positive digit>
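To make the asymmetry in the BNF concrete, here is a sketch of the identifier rules as plain Python predicates, with no regex involved. The `<numeric identifier>` rule is not in the excerpt above; it is taken from the spec's BNF, where it is "0", a positive digit, or a positive digit followed by digits. All function names are hypothetical.

```python
import string

DIGITS = set(string.digits)
NON_DIGITS = set(string.ascii_letters) | {"-"}
IDENTIFIER_CHARS = DIGITS | NON_DIGITS


def is_alphanumeric_identifier(s: str) -> bool:
    # <alphanumeric identifier>: identifier characters, at least one <non-digit>
    return (s != "" and set(s) <= IDENTIFIER_CHARS
            and any(c in NON_DIGITS for c in s))


def is_digits(s: str) -> bool:
    # <digits>: one or more digits; leading zeros are allowed
    return s != "" and set(s) <= DIGITS


def is_numeric_identifier(s: str) -> bool:
    # <numeric identifier>: "0", or a <positive digit> optionally followed by <digits>
    return s == "0" or (is_digits(s) and s[0] != "0")


def is_prerelease_identifier(s: str) -> bool:
    return is_alphanumeric_identifier(s) or is_numeric_identifier(s)


def is_build_identifier(s: str) -> bool:
    return is_alphanumeric_identifier(s) or is_digits(s)
```

With these definitions 0000 is a valid build identifier but not a valid pre-release identifier, which is the difference between <numeric identifier> and <digits> in the two rules.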

The railroad diagram is less clear.

So maybe the text requires some correction.
Added pull request #95

So, version 2.0.1? (patterns allowed on 2.0.0 will still work here).

gvlx · 2014-10-04T00:09:33Z

New regex101 pattern:

^
(?'MAJOR'(?:
    0|(?:[1-9]\d*)
))
\.
(?'MINOR'(?:
    0|(?:[1-9]\d*)
))
\.
(?'PATCH'(?:
    0|(?:[1-9]\d*)
))
(?:-(?'prerelease'
    [0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*
))?
(?:\+(?'build'
    [0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*
))?
$

FichteFoll · 2014-10-12T22:26:26Z

9 A pre-release version (...) Numeric identifiers MUST NOT include leading zeroes. (...)

~~How I read this, it means that a pre-release identifier like 0123456789 is just not interpreted as a numeric but as an alphanumeric identifier and thus compared lexically instead of numerically.~~

identifiers consisting of only digits are compared numerically and identifiers with letters or hyphens are compared lexically in ASCII sort order. Numeric identifiers always have lower precedence than non-numeric identifiers.

Never mind, it appears that numeric identifiers with leading zeros are not accepted at all or at least have no precedence defined which is pretty much the same.
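The comparison rule quoted above can be sketched as a small Python function. It assumes both identifiers are already valid (in particular, no leading zeros on numeric ones); the names `compare_identifiers` and `_is_numeric` are hypothetical.

```python
def _is_numeric(s: str) -> bool:
    # ASCII digits only; str.isdigit() would also accept e.g. superscripts.
    return s != "" and all("0" <= c <= "9" for c in s)


def compare_identifiers(a: str, b: str) -> int:
    """Compare two valid pre-release identifiers; return -1, 0 or 1."""
    a_num, b_num = _is_numeric(a), _is_numeric(b)
    if a_num and b_num:
        x, y = int(a), int(b)       # both numeric: compare as integers
    elif a_num or b_num:
        return -1 if a_num else 1   # numeric always ranks below non-numeric
    else:
        x, y = a, b                 # both alphanumeric: ASCII order
    return (x > y) - (x < y)
```

The numeric branch is what makes 2 sort before 10, even though "2" > "10" as strings.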

haacked · 2016-08-05T16:04:01Z

I'm cool with adding a new page that has a regex example. We just need to figure out how we'll change the layout of the site to accommodate such links. #57 has the same design issue.

fer-rum · 2017-01-04T08:55:25Z

The posted regex seems to have a little trouble with the spec: In §9 it states

Numeric identifiers MUST NOT include leading zeroes

So I assume that a version like 1.0.0-0123 should not be valid; However in the provided regex it will be accepted. I suppose the error is in the prerelease capture group where
[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)* should be
[1-9A-Za-z-]+(\.[0-9A-Za-z-]+)*.

Also, am I correct that according to the spec, pre-release identifiers and build metadata behave differently with respect to the leading-zero policy, since §10 lacks the corresponding statement?
In this case the build capture group can be reduced to
[0-9A-Za-z]+, can't it?

Before I mess with the provided regex, it would be nice if someone could confirm or refute my suggestion.

Edit: Noticed the discrepancy between pre-release and build metadata spec.

haacked · 2017-01-04T18:16:49Z

Good catch. However, your change would also make 1.0.0-0abc invalid, but there's no reason it should be invalid.

Taking a step back, an identifier is either a numeric or an alphanumeric. That's a bit tricky to capture in Regex.

numeric : [1-9][0-9*]
alphanumeric: [0-9]*[A-Za-z-]+[0-9A-Za-z-]* At least one character must be non-numeric

Hence it'd combine to be something like ([1-9][0-9*])|([0-9]*[A-Za-z-]+[0-9A-Za-z-]*)

So complicated. 😦 Does that look correct?

FichteFoll · 2017-01-09T20:25:24Z

Looks good, except that the asterisk in the numeric pattern has to go out of the set.

You could also speed it up if the second set in the alpha was matched exactly once so the engine doesn't have to backtrack.

fer-rum · 2017-02-10T09:40:17Z

@haacked would you like to update the regex then (Including the suggestion by @FichteFoll )?

This is probably a regurgitation of previously discussed topics, but can a reference regex be put into the spec?

FichteFoll · 2017-02-10T13:23:21Z

Another problem: Just 0 is not considered.

numeric : 0|[1-9][0-9]*
alphanumeric: [0-9]*[A-Za-z-][0-9A-Za-z-]* at least one character must be non-numeric
combined: 0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*
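A quick check of the combined pattern, sketched in Python, against the cases raised so far (0, 0123, 0abc) plus a few edge cases.

```python
import re

NUMERIC = r"0|[1-9][0-9]*"
ALPHANUMERIC = r"[0-9]*[A-Za-z-][0-9A-Za-z-]*"
IDENTIFIER = re.compile(rf"(?:{NUMERIC})|(?:{ALPHANUMERIC})")

for ident in ["0", "7", "10", "0123", "0abc", "-", "x.y", ""]:
    # fullmatch anchors the whole alternation, not just its first branch
    print(repr(ident), IDENTIFIER.fullmatch(ident) is not None)
```

One design note: if this is embedded with explicit anchors, the alternation must be grouped, as in ^(?:0|[1-9][0-9]*|...)$. Written as ^0|[1-9][0-9]*|...$, the ^ binds only to the first alternative and the $ only to the last.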

fer-rum · 2017-02-14T10:12:12Z

Why should the term "0" be valid? It is purely numeric, starts with a '0' so it should be invalid.

Side thought:
(Why are leading zeros in a numeric version excluded anyway?
Is it an explicit goal to parse the pattern as natural numbers if applicable?
What about negative numbers then?
Or hex-representation?
Can I introduce a leading '0' if I want the number to be interpreted as octal?
Specs are clear, but the intentions aren't.)

FichteFoll · 2017-02-14T15:00:48Z

https://en.wikipedia.org/wiki/Leading_zero

Therefore, the usual decimal notation of integers does not use leading zeros except for the zero itself, which would be denoted as an empty string otherwise.

Leading zeroes are excluded for simplification and prevention of ambiguity. Is 0.01.1 bigger or smaller than 0.1.01? Are they equal? If they are equal, why do they not have the same string representation?
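The 0.01.1 vs 0.1.01 example, worked through in Python: two distinct strings collapse to the same numeric triple once each part is read as a decimal integer. `as_ints` is a hypothetical helper.

```python
def as_ints(version: str) -> tuple:
    """Read each dot-separated part as a base-10 integer."""
    return tuple(int(part) for part in version.split("."))


a, b = "0.01.1", "0.1.01"
print(a == b)                    # False: the strings differ
print(as_ints(a) == as_ints(b))  # True: both read as (0, 1, 1)
```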

For negative numbers, I can only speculate. Considering that, in numbering, you want to "start" at a certain point and increase from that onward, it makes sense to have a generally specified starting point (i.e. lowest member) as zero instead of operating on the entire set of whole numbers, which is infinite in both directions.

It should be obvious to everyone that the spec speaks of decimal numbers in all places, which are the standard numbering system pretty much everywhere in the world, afaik.

fer-rum · 2017-02-14T16:44:54Z

It should be obvious to everyone that the spec speaks of decimal numbers in all places, which are the standard numbering system pretty much everywhere in the world, afaik.

The thing is that this will often be used by programming people who tend to think in octal sometimes. :)

Leading zeroes are excluded for simplification and prevention of ambiguity. Is 0.01.1 bigger or smaller than 0.1.01? Are they equal? If they are equal, why do they not have the same string representation?

This is not about the version numbers which are clearly decimal natural numbers w/o leading 0s.
In the build information, lexical sorting is the way to go, which means the first one comes before the second one and the two are not identical.

I am in favour of stating explicitly that natural decimal numbers (including 0 per se) without leading 0s are the only accepted form of purely numeric notation in the build information. (Alternatively one could go the C-way and assume that pure numeric expressions with leading 0s are interpreted as an octal number.)

Hexadecimal expressions should be no problem, they are usually prefixed by 0x, # or alphanumeric anyway.

I am still not sure why exactly the contents of the build information are restricted in such a way.

jwdonahue · 2017-12-03T22:54:38Z

Do we have an oracle of valid and invalid version strings to test against? We should not publish any regexes without such an oracle and test scripts to verify correctness. We should probably also do the same for version 1.0.0, which is still in use.
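A minimal sketch of such an oracle plus test script, in Python. The lists contain only strings whose status is argued explicitly in this thread; the candidate pattern is one assembled from the identifier rules discussed here and is an example input to the harness, not an endorsed answer. `check`, `VALID`, `INVALID` and `CANDIDATE` are hypothetical names.

```python
import re
from typing import List

# Hypothetical oracle: cases whose status is stated in this thread.
VALID = [
    "1.2.3", "1.0.0-0abc", "1.0.0-0123zyx", "1.0.0-0.0cba",
    "1.0.0+0.build.1-rc.10000aaa-kk-0.1",
]
INVALID = [
    "01.2.3", "1.2.3-0123", "1.2.3-0123.0123", "1.2.3-2.02123.abc",
    "9.8.7+meta+meta", "9.8.7-whatever+meta+meta",
]


def check(pattern: "re.Pattern") -> List[str]:
    """Return the oracle entries the pattern gets wrong (empty = all pass)."""
    wrong = [v for v in VALID if not pattern.fullmatch(v)]
    wrong += [v for v in INVALID if pattern.fullmatch(v)]
    return wrong


PRE = r"(?:0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*)"
CANDIDATE = re.compile(
    r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    rf"(?:-{PRE}(?:\.{PRE})*)?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
)

print(check(CANDIDATE))  # []
```

Any regex proposed in the thread could be fed to `check` the same way; the oracle itself would of course need to grow well beyond these eleven strings.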

jwdonahue · 2017-12-03T23:21:05Z

@haacked

We just need to figure out how we'll change the layout of the site to accommodate such links...

The current home page has a menu of version links, why not add another one with links to a FAQ and any other pages that need to be added?

haacked · 2017-12-06T00:28:30Z

Yeah, I think we could change the menu to have the following links:

Latest | Versions | About | FAQ

That'd probably cover what we need now. Down the road we might add "Implementations" or "Related Specs".

wolf99 · 2018-05-10T10:36:54Z

Did this get added to the site? Is there a final recommended regex?

gvlx · 2018-05-17T12:32:34Z

Hi,

I don't think so, although no one has come forward with other options either.

The current version (with the comment from @haacked above) is now this one:

^
    (?'MAJOR'(?:
        0|(?:[1-9]\d*)
    ))
    \.
    (?'MINOR'(?:
        0|(?:[1-9]\d*)
    ))
    \.
    (?'PATCH'(?:
        0|(?:[1-9]\d*)
    ))
    (?:-(?'prerelease'
        [1-9A-Za-z-][0-9A-Za-z-]*(\.[0-9A-Za-z-]+)*
    ))?
    (?:\+(?'build'
        [0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*
    ))?
$

on regex101 [online regex tester]
on RegExr [online regex tester]
on Debuggex [online regex railroad diagram]
on jex.im [online regex railroad diagram]

It would be nice if someone could contribute more tests here.

FichteFoll · 2018-05-17T15:47:23Z

1.2.3-2.02123.abc should be invalid but is matched (semver/semver#112 (comment)), whereas 1.0.0-0s123 is valid but not matched.

See #59 (comment).

jwdonahue · 2018-05-18T00:33:21Z

1.0.0+0.build.1-rc.10000aaa-kk-0.1 is a legal version string that is among the list of invalid strings in the test data. There are no special rules for numeric fields in the build meta tag. Anything in the set [0-9A-Za-z-] is allowed in any meta tag sequence.

gvlx · 2018-05-18T09:26:45Z

@FichteFoll, 1.0.0-0s123 is not valid because the spec says "Numeric identifiers MUST NOT include leading zeroes.".

Yes, technically 0s123 is not a full numeric identifier, but I do think this interpretation is reasonable.

gvlx · 2018-05-22T08:56:36Z

Hi @jwdonahue, your expression works on regex101.com with Python if you remove the group naming (only a few regex engines understand that, and comments too).

It almost works with golang if you remove the (?= lookaheads, which are giving the "not quantifiable" errors.

Tell me, you added some redundant matches and quantifiers {1}. Do they make the pattern more efficient on some regexp engines? Look at the description on Debuggex.

jwdonahue · 2018-05-22T17:15:13Z

Ok, I asked Stackoverflow for help last night and got some very useful feedback. It turns out the (?P<name>... named group form works on PCRE, .NET, Java and Python, and there is a PCRE library for golang, so this version works on regex101.com for everything but the golang mode on regex101.com and golang coders can use it with the library. Getting closer.

^(?P<VersionTripple>(?P<Major>0|[1-9][0-9]*)\.(?P<Minor>0|[1-9][0-9]*)\.(?P<Patch>0|[1-9][0-9]*)){1}(?P<Tags>(?:\-(?P<Prerelease>(?:(?=[0]{1}[0-9A-Za-z-]{0})(?:[0]{1})|(?=[1-9]{1}[0-9]*[A-Za-z]{0})(?:[0-9]+)|(?=[0-9]*[A-Za-z-]+[0-9A-Za-z-]*)(?:[0-9A-Za-z-]+)){1}(?:\.(?=[0]{1}[0-9A-Za-z-]{0})(?:[0]{1})|\.(?=[1-9]{1}[0-9]*[A-Za-z]{0})(?:[0-9]+)|\.(?=[0-9]*[A-Za-z-]+[0-9A-Za-z-]*)(?:[0-9A-Za-z-]+))*){1}){0,1}(?:\+(?P<Meta>(?:[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))){0,1})$

@gvlx, named capture groups are required in my book. Relying on numbered capture groups is unstable. It's too easy for a random numbered capture group to accidentally show up in a revision of the regex that breaks all the code that uses the results of the previous version. I am not just looking for match/no-match Boolean result, I want to be able to access each of the fields by name. Though it would be neat if the results could somehow have PrereleaseTags[] and MetaTags[] arrays (minus the dots), it's simple enough to split the Prerelease and Meta tags on the dots, in a post processing step.
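A sketch of the by-name access and the dot-splitting post-processing step described above, in Python, which accepts the (?P<name>...) form. The group names follow jwdonahue's (Major, Minor, Patch, Prerelease, Meta), but the identifier rules are the simplified ones discussed elsewhere in this thread rather than his lookahead version; `fields`, `PrereleaseTags` and `MetaTags` are hypothetical names.

```python
import re

VERSION = re.compile(
    r"(?P<Major>0|[1-9][0-9]*)\.(?P<Minor>0|[1-9][0-9]*)\.(?P<Patch>0|[1-9][0-9]*)"
    r"(?:-(?P<Prerelease>(?:0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*)"
    r"(?:\.(?:0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*))*))?"
    r"(?:\+(?P<Meta>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
)


def fields(text):
    m = VERSION.fullmatch(text)
    if m is None:
        return None
    # groupdict() keys are the group names, so adding or removing an
    # unnamed group in a later revision does not shift anything.
    d = m.groupdict()
    # Post-processing step: split the dotted tags into arrays.
    d["PrereleaseTags"] = d["Prerelease"].split(".") if d["Prerelease"] else []
    d["MetaTags"] = d["Meta"].split(".") if d["Meta"] else []
    return d


print(fields("1.0.0-alpha.1+exp.sha.5114f85"))
```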

I think we're very close now, just need to eliminate some redundant quantifiers and investigate whether there are any useful optimizations. Regex 101 counts a total of 591 steps to process the 72 lines (1293 bytes) of test data. Not sure exactly what they are measuring there, but if there's any way to speed this thing up without losing functionality, that would be awesome.

Also, the test data is a bit random and should be rationalized and probably requires expansion.

jwdonahue · 2018-05-22T17:22:07Z

I have only one major worry with regard to posting a regex as a canonical example, and that is, I know of no way to prove its correctness relative to the spec. In the absence of such a proof, we should label these as plausible examples that are explicitly not canonical, if we publish them at all. The spec should always be the gold standard.

dls314 · 2018-05-22T18:38:38Z

if we publish them at all

I think this should be published, with some of the gory details, if for no other reason than otherwise people will re-create worse regex for the same purpose.

jwdonahue · 2018-05-22T21:22:05Z

Yup, I just did another search and looked a bit deeper than the first page, and there are hundreds of them here on GitHub; most are useless and some may even be worse than useless.

dgn · 2018-05-29T12:04:00Z

@gvlx I think you meant me. I removed my comment after realising I was wrong about that.

gvlx · 2018-05-31T15:04:00Z

Following the discussion in EBNF grammar and this clarification, I updated this description in W3C simple EBNF grammar, which is much clearer than the official SemVer specification's BNF grammar to help clear the regex ambiguities:

Version ::= VersionCore ('-' PreRelease)? ('+' Build)?

VersionCore ::= Major '.' Minor '.' Patch
Major ::= NumericIdentifier  
Minor ::= NumericIdentifier  
Patch ::= NumericIdentifier  
PreRelease ::= CompoundPreReleaseIdentifier 
Build      ::= CompoundBuildIdentifier 

CompoundPreReleaseIdentifier ::= PreReleaseIdentifier ('.' PreReleaseIdentifier)*
CompoundBuildIdentifier      ::= BuildIdentifier      ('.' BuildIdentifier)*

PreReleaseIdentifier     ::= AlphaNumericIdentifier | NumericIdentifier+
BuildIdentifier          ::= IdentifierCharacter+

NumericIdentifier        ::= '0' | ( PositiveNumeric NumericCharacter* )
AlphaNumericIdentifier   ::= IdentifierCharacter* NonNumericCharacter IdentifierCharacter*

IdentifierCharacter ::= NonNumericCharacter | NumericCharacter

NonNumericCharacter ::= [A-Za-z-]
NumericCharacter    ::= '0' | PositiveNumeric
PositiveNumeric     ::= [1-9]

Update 2018/07/17: finally I understood (call me slooow) the construction of the pre-release identifier! The element AlphaNumericIdentifier is made so that it always has a non-numeric character, avoiding the leading zero on pure numeric identifiers. (Thanks to @jwdonahue for correcting me on this.)

Following this, I think the actual regexes for the two main extra tags are (using @DavidFichtmueller's simplifications here):

BuildIdentifier ::== /[0-9A-Za-z-]+/
PreReleaseIdentifier ::== /0|[1-9]\d*|\d*[A-Za-z-][0-9A-Za-z-]*/

which have to be tested as a compound identifier, i.e., /<identifier>(\.<identifier>)*/.

Maybe these patterns can be further optimized?

I suppose that, for performance reasons, pattern processing could be done on two passes?

Extract the main groups first /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[.0-9a-zA-Z-]+)?(\+[.0-9a-zA-Z-]+)?$/ (will produce false positives);
Test the pre-release and build matches separately as compound identifiers.
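The two-pass idea can be sketched in Python as follows. Pass 1 is the loose extraction pattern above; pass 2 splits each captured tag on dots and checks every part against the per-identifier patterns. re.ASCII is added because Python's \d otherwise matches non-ASCII digits, and fullmatch replaces reliance on $ (which in Python also matches before a trailing newline). Whether this is actually faster than a single regex is not measured here.

```python
import re

# Pass 1: loose extraction (produces some false positives).
LOOSE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[.0-9a-zA-Z-]+)?(\+[.0-9a-zA-Z-]+)?$",
    re.ASCII,
)
# Pass 2: per-identifier patterns from the same comment.
PRE_ID = re.compile(r"0|[1-9]\d*|\d*[A-Za-z-][0-9A-Za-z-]*", re.ASCII)
BUILD_ID = re.compile(r"[0-9A-Za-z-]+")


def valid_compound(tag: str, ident: "re.Pattern") -> bool:
    # <identifier>(\.<identifier>)*: empty parts (e.g. "a..b") fail the check
    return all(ident.fullmatch(part) for part in tag.split("."))


def is_semver(text: str) -> bool:
    m = LOOSE.fullmatch(text)
    if m is None:
        return False
    pre, build = m.group(4), m.group(5)
    if pre and not valid_compound(pre[1:], PRE_ID):      # drop leading "-"
        return False
    if build and not valid_compound(build[1:], BUILD_ID):  # drop leading "+"
        return False
    return True
```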

Relequestual · 2018-07-12T14:30:54Z

If anyone is interested, here is a javascript regex version (regex101) I made, which passes all the same tests as those in the above PCRE version.

^((0|[1-9][0-9]*)\.){2}(0|[1-9][0-9]*){1}(-([0-9A-Za-z-]+\.?)+)*(\+([0-9A-Za-z-]+-?\.?)+)*$

It's a little more condensed, as it's not concerned with group names, and just validation. (I'm using it for a JSON Schema)

Relequestual · 2018-07-12T14:33:11Z

Question, the spec for pre-release and metadata says the identifiers may consist of ASCII alphanumerics and hyphens only. It doesn't mention anything about dots, however the previously mentioned regex allows for it. Is this implied somewhere other than by the attached examples?

jwdonahue · 2018-07-12T18:20:46Z

@Relequestual

Quoting from the spec.

9. A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version.

10. Build metadata MAY be denoted by appending a plus sign and a series of dot separated identifiers immediately following the patch or pre-release version.

I would add that your regex has an unbounded or at least extremely long run-time when faced with this tortuous, nearly legal example:

99999999999999999999999.999999999999999999.99999999999999999----RC-SNAPSHOT.12.09.1--------------------------------..12

Notice the empty prerelease field just before that final '.12'? That triggers back-tracking, which can result in infinite and pointless machinations. Now it's possible that your javascript implementation simply times out and correctly fails this on your system, but it's not something you should count on in a production environment. To the best of my knowledge, .NET implementations seem to be the most susceptible (they don't give up easily). Since it can only happen with near-matches, and you simply want a pass/fail, your regex will be fine, provided that you ensure that a reasonable timeout is in place to cut-off this degenerate behavior.
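For contrast, a sketch in Python that runs the tortuous string through a pattern built from the identifier rules discussed in this thread. Identifiers here cannot contain dots and each separator \. is mandatory, so there is essentially one way to split the tag into identifiers; that removes the many-ways-to-split problem of a nested form like ([0-9A-Za-z-]+\.?)+. Python's built-in re module has no match timeout, so in Python the pattern shape matters even more. The timing line is informational only.

```python
import re
import time

PRE = r"(?:0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*)"
SPEC_STYLE = re.compile(
    r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    rf"(?:-{PRE}(?:\.{PRE})*)?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
)

TORTUOUS = ("99999999999999999999999.999999999999999999.99999999999999999"
            "----RC-SNAPSHOT.12.09.1--------------------------------..12")

start = time.perf_counter()
result = SPEC_STYLE.fullmatch(TORTUOUS)
elapsed = time.perf_counter() - start
print(result, f"{elapsed:.6f}s")  # result is None: the string is rejected
```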

jwdonahue · 2018-07-12T18:29:55Z

@Relequestual, FYI: I got around the back-tracking by using look-ahead to choose which regex snippet to apply. It's those sneak peeks ahead of the current anchor point that make my regex so long. Adding named capture groups to your regex would not overly complicate it, but they won't fix the bugs in your code.

Your regex fails to reject 1.2.3-0123 and 1.2.3-0123.0123. Both are invalid due to leading zero in the purely numeric fields of the prerelease tag.

jwdonahue · 2018-07-12T18:35:40Z

@Relequestual, here are two more examples it failed to reject:

9.8.7+meta+meta
9.8.7-whatever+meta+meta
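The four strings reported above can be checked against Relequestual's pattern in Python. search() is used because JSON Schema's pattern keyword is not implicitly anchored (this pattern carries its own ^ and $ anyway). These inputs match quickly; the near-miss string that triggers the backtracking problem is deliberately not run here.

```python
import re

# Relequestual's JSON Schema pattern, unchanged apart from Python string syntax.
RELEQUESTUAL = re.compile(
    r"^((0|[1-9][0-9]*)\.){2}(0|[1-9][0-9]*){1}"
    r"(-([0-9A-Za-z-]+\.?)+)*(\+([0-9A-Za-z-]+-?\.?)+)*$"
)

reported = ["1.2.3-0123", "1.2.3-0123.0123",
            "9.8.7+meta+meta", "9.8.7-whatever+meta+meta"]
for s in reported:
    print(s, "accepted" if RELEQUESTUAL.search(s) else "rejected")
```

All four come out "accepted": the tag parts never distinguish numeric from alphanumeric identifiers, and the trailing * lets the +... group repeat.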

jwdonahue · 2018-07-12T18:53:05Z

@gvlx, I think I missed something in your last post, regarding what you are calling AlphanumIdentifierNLZ. That terminology is incorrect. It should be something like AllNumericNoLeadingZerosIdentifier. Alphanumeric identifiers do not have the leading zero prohibition, it only applies to all numeric identifiers, so -0zyz is a legal prerelease tag according to the spec.

FichteFoll · 2018-07-12T23:32:41Z

Depending on your regex engine, you could also look at atomic groups or possessive quantifiers to prevent backtracking.

gvlx · 2018-07-13T13:01:05Z

@jwdonahue, the name comes from the BNF, as pointed out in semver/semver#181 (comment).

Yes, not the best choice but I kept it for consistency with the BNF.

As you can read, it is a modified alphanumeric identifier which cannot have a leading zero. This only applies to the pre-release identifier, and it is one of the sources of performance degradation.

This is also the reason @Relequestual's #59 (comment) is not correct.

I still wonder whether the two-step matching approach I suggested would improve performance, as each regex would be much simpler.

jwdonahue · 2018-07-14T04:05:10Z

@gvlx, are we talking past each other? Is there something I've missed? Nothing in the spec prohibits alphanumeric identifiers from having leading zeros. Only the pure numeric identifiers have that prohibition on them. I've been trying to get this point across to you and others for some time now. -0123zyx is a legal prerelease tag, even with the leading zero (maybe that wasn't obvious with your display font?). -0123 is not legal, due to the leading zero, -0 is legal. So -0.0cba is also legal. It's an important nuance that adds unavoidable complexity to the regex.

So again, I argue that the NLZ postfix belongs on the numeric identifier tag, not the alphanumeric identifier. In fact, the no-leading-zero numeric identifier is a perfect description of the Major, Minor and Patch fields as well. NumericIdentifierNLZ applies to the major, minor and patch fields and to any numeric identifiers in the prerelease tag.

jwdonahue · 2018-07-14T04:29:11Z

@gvlx, I looked at atomic groups, but they aren't widely supported and weren't required to avoid the back-tracking problem and I am pretty sure that possessive quantifiers won't work due to all the cases where a zero is allowed to follow a dot.

jwdonahue · 2018-07-14T04:34:42Z

I am going to say this one more time, for clarity:

There are three classes of prerelease identifiers, just a zero, all numeric with no leading zero, and alphanumeric (leading zeros allowed).
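The three classes can be expressed directly as a small Python classifier; `classify` and the class labels are hypothetical names, and the alphanumeric pattern is the [0-9]*[A-Za-z-][0-9A-Za-z-]* form used elsewhere in this thread.

```python
import re
from typing import Optional

NUMERIC_NLZ = re.compile(r"[1-9][0-9]*")
ALPHANUMERIC = re.compile(r"[0-9]*[A-Za-z-][0-9A-Za-z-]*")


def classify(ident: str) -> Optional[str]:
    """Name the pre-release identifier class, or None if it is not legal."""
    if ident == "0":
        return "zero"
    if NUMERIC_NLZ.fullmatch(ident):
        return "numeric (no leading zero)"
    if ALPHANUMERIC.fullmatch(ident):
        return "alphanumeric (leading zeros allowed)"
    return None


for ident in ["0", "42", "0123zyx", "0cba", "0123"]:
    print(ident, classify(ident))
```

0123 is the only one of these that falls through to None, matching the examples given above.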

jwdonahue · 2018-07-15T08:18:15Z

I just ran @DavidFichtmueller's suggested regex against our current data set and it passed with flying colors. See semver/semver#232. All it lacks are named capture groups.

Relequestual · 2018-07-17T14:44:55Z

Thanks @jwdonahue, I guess I'll need to revise my regex!
regex101 timed out with the example you provided.
I'm not a regex expert, so open to suggestions, specifically to fix the possible performance issue.

I'm mostly expecting this to be used in cases where there are no pre-release or meta information provided, but good to note the potential issues!

MyNameIsCosmo · 2018-08-06T20:53:15Z

Hello all,

Back in June I played around with the regex from both @gvlx and @jwdonahue and came up with some solution for named grouping, that I further condensed down to a regex without grouping, and then a regex with POSIX /compatibility/ (I say this loosely, cause it 'worked for me')

I put the expressions in the following Github repository along with some notes:
https://github.com/T5CC/semver-regex

Here's the named capture group expression:
^(?P<SemVer>(?P<Major>0|[1-9][0-9]*)\.(?P<Minor>0|[1-9][0-9]*)\.(?P<Patch>0|[1-9][0-9]*))(?P<Tags>(?:-(?P<Prerelease>(?:[0-9A-Za-z-]\.?)*))?(?:\+(?P<Meta>(?:[0-9A-Za-z-]\.?)*))?)?$
https://regex101.com/r/JpUgtQ/1

Edit:
Found out that the above regex does not invalidate leading zeros in prerelease tags.

This solution hasn't been checked for every edge case, but it does cover all the ones provided by @jwdonahue and it also processes 99999999999999999999999.999999999999999999.99999999999999999----RC-SNAPSHOT.12.09.1--------------------------------..12 without recursion (though a ton of steps).

Edit:
Catching up to recent issues, semver/semver#232 has an efficient expression.
I've updated the semver-regex repo to @DavidFichtmueller's solution, and gave credit.

jwdonahue · 2018-08-08T22:21:50Z

Okay, I think we've converged on a good set of regexes over on semver/semver#232.

@coolaj86 please close this thread at your earliest possible convenience.

joewiz · 2018-12-16T08:13:20Z

@jwdonahue Thanks for your reply to my gist and for pointing me to this issue. I've updated my semver parsing and comparing library for XQuery, which I've rewritten to use the regex and test version strings shown here. Comments welcome - though gist won't notify me of comments, so I'd appreciate a ping here or via Twitter (same username there). https://gist.github.com/joewiz/b349e2853a17bf817e5d0013d01fa9f9

jwdonahue · 2019-08-10T20:41:16Z

It seems that PR #460 has been completed. We now have a pair of well-vetted regexes included in the spec.

Can we please close this discussion now?


gvlx mentioned this issue on Oct 5, 2014: PreRelease denotation regular expression (regex) semver/semver#149 (Closed)

rugk mentioned this issue on Nov 3, 2015: Complete RegExp to verify version numbers semver/semver#279 (Closed)

silkentrance mentioned this issue on Nov 22, 2017: RegEx for validating SemVer-numbers semver/semver#232 (Closed)

gvlx mentioned this issue on Jun 27, 2018: EBNF grammar semver/semver#192 (Closed)

FichteFoll mentioned this issue on Jun 27, 2018: BN form does not match spec semver/semver#448 (Closed)

jwdonahue mentioned this issue on Oct 8, 2018: Is there any test vectors for testing? semver/semver#289 (Closed)

steveklabnik closed this as completed on Aug 13, 2019

jwcranford added a commit to jwcranford/semver4j that referenced this issue on Sep 11, 2019: 729a9c3, "Added unit tests based on test cases from semver" (test cases from semver/semver.org#59 (comment))

jayschwa mentioned this issue on Oct 6, 2020: std: Introduce SemanticVersion data structure ziglang/zig#6566 (Merged)
