Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Needs canonical examples / reference implementation (including RegExp) #59

Closed
coolaj86 opened this issue Apr 24, 2013 · 65 comments
Closed

Comments

@coolaj86
Copy link

See semver/semver#32

  • RegExp in various languages that correctly parse semver
  • Reference implementations

JavaScript RegExp:

/^(([\d+)\.(\d|)\.(\d+))(?:-([\dA-Za-z\-]+(?:\.[\dA-Za-z\-]+)*))?(?:\+([\dA-Za-z\-]+(?:\.[\dA-Za-z\-]+)*))?$/

JavaScript Reference Implementation:

https://github.com/coolaj86/semver-utils

I would suggest having two sections - one for tested Regular Expressions that match and another for modules / reference implementations.

There should be at least canonical reference implementation of a parser / validator with an api that would make sense to copy in any language.

@ghost
Copy link

ghost commented Jun 24, 2013

The regular expression needs to be updated to detect the presence of leading zeros.

Unfortunately, I haven't found a way to do that, otherwise I would have provided an updated example.

@coolaj86
Copy link
Author

Can you give an example of a valid semver string that has leading zeros? and also provide a reference to the documentation that suggests this is allowed (i.e. a phrase or example in the docs)?

If so I'll add your string to the tests in semver-utils and fix it.

@ghost
Copy link

ghost commented Jun 24, 2013

Sorry, I meant to disallow.

The regular expression as used now does not seem to reject version numbers with leading zeros..

@tbull
Copy link

tbull commented Jun 24, 2013

We decided only recently that leading zeroes are no longer allowed: semver/semver#112
That decision invalidated all regexps developed earlier.

@gvlx
Copy link

gvlx commented Oct 3, 2014

Hi,

I have been playing with the regex for version 2.0.0 and came up with this on regex101.

Expanded:

/^
(?'MAJOR'
    0|(?:[1-9]\d*)
)
\.
(?'MINOR'
    0|(?:[1-9]\d*)
)
\.
(?'PATCH'
    0|(?:[1-9]\d*)
)
(?:-(?'prerelease'
    (?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*))
    (?:\.
        (?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*))
    )*
))?
(?:\+(?'build'
    (?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*))
    (?:\.
        (?:0|(?:[1-9A-Za-z-][0-9A-Za-z-]*))
    )*
))?
$/

The pre-release and build patterns are very complex because they require the 'no leading zeros' rule.

I can't figure any benefits of that over a (very) relaxed pattern as in /[0-9A-Za-z]+([.-][0-9A-Za-z]+)*/ which is just 'dot-or-dash' separated alphanumeric identifiers (e.g "00000-aaaaa.bbbbb") which, for me, would be more useful (I usually have to use UUIDs and other mechanical identifiers).

Notice that according to the railroad diagram and the BNF (boy, isn't that hard to read! 😕) the identifier "0000.0000.0000.0000.------" is valid (leading zeros allowed).

If you can, please supply more edge cases 😄 on the regex101 page (in the first block is all cases are valid, in the second, invalid).

Happy hacking!

@coolaj86
Copy link
Author

coolaj86 commented Oct 3, 2014

I'm in the camp to veto the use of leading zeros. In JavaScript (and some other languages) parsers will default to octal and then your sorting could get all out of wack because suddenly '011' is less than '10' both lexicographically and numerically.

And what about 007 vs 07 vs 7? Numerically they're all the same in base 10 and base 8 so how would you know which version is the "newer" one?

For the love of all that is good on this earth: no leading zeros!!!

@coolaj86
Copy link
Author

coolaj86 commented Oct 3, 2014

Oh, sorry I missed the part about that being a build number. I'll have to look at the spec, but I don't think the parser I mentioned prohibits this either way.

@gvlx
Copy link

gvlx commented Oct 3, 2014

Hi,

The requirement is on 2.0.0 for pre-release but not on build version:

9 A pre-release version (...) Numeric identifiers MUST NOT include leading zeroes. (...)

But on the BNF:

<pre-release identifier> ::= <alphanumeric identifier>
                           | <numeric identifier>

<build identifier> ::= <alphanumeric identifier>
                     | <digits>

<alphanumeric identifier> ::= <non-digit>
                            | <non-digit> <identifier characters>
                            | <identifier characters> <non-digit>
                            | <identifier characters> <non-digit> <identifier characters>

<identifier characters> ::= <identifier character>
                          | <identifier character> <identifier characters>

<identifier character> ::= <digit>
                         | <non-digit>

<non-digit> ::= <letter>
              | "-"

<digit> ::= "0"
          | <positive digit>

The railroad diagram is less clear.

So maybe the text requires some correction.
Added pull request #95

So, version 2.0.1? (patterns allowed on 2.0.0 will still work here).

@gvlx
Copy link

gvlx commented Oct 4, 2014

New regex101 pattern:

^
(?'MAJOR'(?:
    0|(?:[1-9]\d*)
))
\.
(?'MINOR'(?:
    0|(?:[1-9]\d*)
))
\.
(?'PATCH'(?:
    0|(?:[1-9]\d*)
))
(?:-(?'prerelease'
    [0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*
))?
(?:\+(?'build'
    [0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*
))?
$

@FichteFoll
Copy link
Contributor

9 A pre-release version (...) Numeric identifiers MUST NOT include leading zeroes. (...)

How I read this, it means that a pre-release identifier like 0123456789 is just not interpreted as a numeric but as an alphanumeric identifier and thus compared lexically instead of numerically.


identifiers consisting of only digits are compared numerically and identifiers with letters or hyphens are compared lexically in ASCII sort order. Numeric identifiers always have lower precedence than non-numeric identifiers.

Never mind, it appears that numeric identifiers with leading zeros are not accepted at all or at least have no precedence defined which is pretty much the same.

@haacked
Copy link
Contributor

haacked commented Aug 5, 2016

I'm cool with adding a new page that has a regex example. We just need to figure out how we'll change the layout of the site to accommodate such links. #57 has the same design issue.

@fer-rum
Copy link

fer-rum commented Jan 4, 2017

The posted regex seems to have a little trouble with the spec: In §9 it states

Numeric identifiers MUST NOT include leading zeroes

So I assume that a version like 1.0.0-0123 should not be valid; However in the provided regex it will be accepted. I suppose the error is in the prerelease capture group where
[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)* should be
[1-9A-Za-z-]+(\.[0-9A-Za-z-]+)*.

Also, am I correct that according to spec pre-release identifiers and build metadata behave differently with respect to the leading zero policy, since §10 misses the appropriate statement?
In this case the build capture group can be reduced to
[0-9A-Za-z]+, can't it?

Before I mess with the provided regex, it would be nice if someone could confirm/ falsify my suggestion.

Edit: Noticed the discrepancy between pre-release and build metadata spec.

@haacked
Copy link
Contributor

haacked commented Jan 4, 2017

Good catch. However, your change would also make 1.0.0-0abc invalid but there's no reason that shouldn't be invalid.

Taking a step back, an identifier is either a numeric or an alphanumeric. That's a bit tricky to capture in Regex.

numeric : [1-9][0-9*]
alphanumeric: [0-9]*[A-Za-z-]+[0-9A-Za-z-]* At least one character must be non-numeric

Hence it'd combine to be something like ([1-9][0-9*])|([0-9]*[A-Za-z-]+[0-9A-Za-z-]*)

So complicated. 😦 Does that look correct?

@FichteFoll
Copy link
Contributor

FichteFoll commented Jan 9, 2017

Looks good, except that the asterisk in the numeric pattern has to go out of the set.

You could also speed it up if the second set in the alpha was matched exactly once so the engine doesn't have to backtrack.

@fer-rum
Copy link

fer-rum commented Feb 10, 2017

@haacked would you like to update the regex then (Including the suggestion by @FichteFoll )?

This is probably a regurgitation of previously discussed topics, but can a reference regex be put into the spec?

@FichteFoll
Copy link
Contributor

Another problem: Just 0 is not considered.

numeric : 0|[1-9][0-9]*
alphanumeric: [0-9]*[A-Za-z-][0-9A-Za-z-]* at least one character must be non-numeric
combined: 0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*

@fer-rum
Copy link

fer-rum commented Feb 14, 2017

Why should the term "0" be valid? It is purely numeric, starts with a '0' so it should be invalid.

Side thought:
(Why are leading zeros in a numeric version excluded anyway?
Is it an explicit goal to parse the pattern as natural numbers if applicable?
What about negative numbers then?
Or hex-representation?
Can I introduce a leading '0' if I want the number to be interpreted as octal?
Specs are clear, but the intentions aren't.)

@FichteFoll
Copy link
Contributor

FichteFoll commented Feb 14, 2017

https://en.wikipedia.org/wiki/Leading_zero

Therefore, the usual decimal notation of integers does not use leading zeros except for the zero itself, which would be denoted as an empty string otherwise.

Leading zeroes are excluded for simplification and prevention of ambiguity. Is 0.01.1 bigger or smaller than 0.1.01? Are they equal? If they are equal, why do they not have the same string representation?

For negative numbers, I can only speculate. Considering that, in numbering, you want to "start" at a certain point and increase from that onward, it makes sense to have a generally specified starting point (i.e. lowest member) as zero instead of operating on the entire set of whole numbers, which is infinite in both directions.

It should be obvious to everyone that the spec speaks of decimal numbers in all places, which are the standard numbering system pretty much everywhere in the world, afaik.

@fer-rum
Copy link

fer-rum commented Feb 14, 2017

It should be obvious to everyone that the speak speaks of decimal numbers in all places, which are the standard numbering system pretty much everywhere in the world, afaik.

The thing is that this will often be used by programming people who tend to think in octal sometimes. :)

Leading zeroes are excluded for simplification and prevention of ambiguity. Is 0.01.1 bigger or smaller than 0.1.01? Are they equal? If they are equal, why do they not have the same string representation?

This is not about the version numbers which are clearly decimal natural numbers w/o leading 0s.
In the build information, lexical sorting is the way to go, which states that the first one comes before the second one and both are not identical.

I am in favour of stating explicitly that natural decimal numbers (including 0 per se) without leading 0s are the only accepted form of purely numeric notation in the build information. (Alternatively one could go the C-way and assume that pure numeric expressions with leading 0s are interpreted as an octal number.)

Hexadecimal expressions should be no problem, they are usually prefixed by 0x, # or alphanumeric anyway.

I still am not aware why exactly the contents of the build information is restricted in such a way.

@jwdonahue
Copy link

jwdonahue commented Dec 3, 2017

Do we have an oracle of valid and invalid version strings to test against? We should not publish any regex's without such an oracle and test scripts to verify correctness. We should probably also do the same for version 1.0.0, which is still in use.

@jwdonahue
Copy link

@haacked

We just need to figure out how we'll change the layout of the site to accommodate such links...

The current home page has a menu of version links, why not add another one with links to a FAQ and any other pages need to be added?

@haacked
Copy link
Contributor

haacked commented Dec 6, 2017

Yeah, I think we could change the menu to have the following links:

Latest | Versions | About | FAQ

That'd probably cover what we need now. Down the road we might add "Implementations" or "Related Specs".

@wolf99
Copy link

wolf99 commented May 10, 2018

Did get added to the site? Is there a final recommended regex?

@gvlx
Copy link

gvlx commented May 17, 2018

Hi,

I don't think so, although no one did advance with other options.

The current version (with the comment from @haacked above) is now this one:

^
    (?'MAJOR'(?:
        0|(?:[1-9]\d*)
    ))
    \.
    (?'MINOR'(?:
        0|(?:[1-9]\d*)
    ))
    \.
    (?'PATCH'(?:
        0|(?:[1-9]\d*)
    ))
    (?:-(?'prerelease'
        [1-9A-Za-z-][0-9A-Za-z-]*(\.[0-9A-Za-z-]+)*
    ))?
    (?:\+(?'build'
        [0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*
    ))?
$

on regex101 [online regex tester]
on RegExr [online regex tester]
on Debuggex [online regex railroad diagram]
on jex.im [online regex railroad diagram]

It would be nice if someone could contribute with more tests here.

@FichteFoll
Copy link
Contributor

FichteFoll commented May 17, 2018

1.2.3-2.02123.abc should be invalid but is matched (semver/semver#112 (comment)), whereas 1.0.0-0s123 is valid but not matched.

See #59 (comment).

@jwdonahue
Copy link

1.0.0+0.build.1-rc.10000aaa-kk-0.1 is a legal version string that is among the list of invalid strings in the test data. There are no special rules for numeric fields in the build meta tag. Anything in the set [0-9A-Za-z-] is allowed in any meta tag sequence.

@gvlx
Copy link

gvlx commented May 18, 2018

@FichteFoll, 1.0.0-0s123 is not valid because the spec says "Numeric identifiers MUST NOT include leading zeroes.".

yes, technically 0s123 is not a full numeric identifier but I do think this interpretation is reasonable.

@gvlx
Copy link

gvlx commented May 22, 2018

Hi @jwdonahue , your expression works on regex101.com with python if you remove the group naming (only a few regex engines understand that, and comments too).

It will almost works with golang if you remove the lookaheads (?= (that are giving the "not quantifiable" errors).

Tell me, you added some redundant matches and quantifiers {1}. Do they make the pattern more efficient on some regexp engines? Look at the description on Debuggex.

@jwdonahue
Copy link

jwdonahue commented May 22, 2018

Ok, I asked Stackoverflow for help last night and got some very useful feedback. It turns out the (?P<name>... named group form works on PCRE, .NET, Java and Python, and there is a PCRE library for golang, so this version works on regex101.com for everything but the golang mode on regex101.com and golang coders can use it with the library. Getting closer.

^(?P<VersionTripple>(?P<Major>0|[1-9][0-9]*)\.(?P<Minor>0|[1-9][0-9]*)\.(?P<Patch>0|[1-9][0-9]*)){1}(?P<Tags>(?:\-(?P<Prerelease>(?:(?=[0]{1}[0-9A-Za-z-]{0})(?:[0]{1})|(?=[1-9]{1}[0-9]*[A-Za-z]{0})(?:[0-9]+)|(?=[0-9]*[A-Za-z-]+[0-9A-Za-z-]*)(?:[0-9A-Za-z-]+)){1}(?:\.(?=[0]{1}[0-9A-Za-z-]{0})(?:[0]{1})|\.(?=[1-9]{1}[0-9]*[A-Za-z]{0})(?:[0-9]+)|\.(?=[0-9]*[A-Za-z-]+[0-9A-Za-z-]*)(?:[0-9A-Za-z-]+))*){1}){0,1}(?:\+(?P<Meta>(?:[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))){0,1})$

@gvlx, named capture groups are required in my book. Relying on numbered capture groups is unstable. It's too easy for a random numbered capture group to accidentally show up in a revision of the regex that breaks all the code that uses the results of the previous version. I am not just looking for match/no-match Boolean result, I want to be able to access each of the fields by name. Though it would be neat if the results could somehow have PrereleaseTags[] and MetaTags[] arrays (minus the dots), it's simple enough to split the Prerelease and Meta tags on the dots, in a post processing step.

I think we're very close now, just need to eliminate some redundant quantifiers and investigate whether there are any useful optimizations. Regex 101 counts a total of 591 steps to process the 72 lines (1293 bytes) of test data. Not sure exactly what they are measuring there, but if there's any way to speed this thing up without losing functionality, that would be awesome.

Also, the test data is a bit random and should be rationalized and probably requires expansion.

@jwdonahue
Copy link

jwdonahue commented May 22, 2018

I have only one major worry with regard to posting a regex as a canonical example, and that is, I know of no way to prove its correctness relative to the spec. In the absence of such a proof, we should label these as plausible examples that are explicitly not canonical, if we publish them at all. The spec should always be the golden standard.

@dls314
Copy link

dls314 commented May 22, 2018

if we publish them at all

I think this should be published, with some of the gory details, if for no other reason than otherwise people will re-create worse regex for the same purpose.

@jwdonahue
Copy link

jwdonahue commented May 22, 2018

Yup, I just did another search and looked a bit deeper than the the first page, and there's over hundreds of them here on GitHub and most are useless and some may even be worse than useless.

@dgn
Copy link

dgn commented May 29, 2018

@gvlx I think you meant me. I removed my comment after realising I was wrong about that.

@gvlx
Copy link

gvlx commented May 31, 2018

Following the discussion in EBNF grammar and this clarification, I updated this description in W3C simple EBNF grammar, which is much clearer than the official SemVer specification's BNF grammar to help clear the regex ambiguities:

Version ::= VersionCore ('-' PreRelease)? ('+' Build)?

VersionCore ::= Major '.' Minor '.' Patch
Major ::= NumericIdentifier  
Minor ::= NumericIdentifier  
Patch ::= NumericIdentifier  
PreRelease ::= CompoundPreReleaseIdentifier 
Build      ::= CompoundBuildIdentifier 

CompoundPreReleaseIdentifier ::= PreReleaseIdentifier ('.' PreReleaseIdentifier)*
CompoundBuildIdentifier      ::= BuildIdentifier      ('.' BuildIdentifier)*

PreReleaseIdentifier     ::= AlphaNumericIdentifier | NumericIdentifier+
BuildIdentifier          ::= IdentifierCharacter+

NumericIdentifier        ::= '0' | ( PositiveNumeric NumericCharacter* )
AlphaNumericIdentifier   ::= IdentifierCharacter* NonNumericCharacter IdentifierCharacter*

IdentifierCharacter ::= NonNumericCharacter | NumericCharacter

NonNumericCharacter ::= [A-Za-z-]
NumericCharacter    ::= '0' | PositiveNumeric
PositiveNumeric     ::= [1-9]

Update 2018/07/17: finally I understood (call me slooow) the construction of the pre-release identifier! The element AlphaNumericIdentifier is made so that it always has a non-numeric character, avoiding the leading zero on pure numeric identifiers. (Thanks for @jwdonahue for correcting me on this).

Following this, I think the actual Regex for the two main extra tags are (using @DavidFichtmueller's simplifications here):

  • BuildIdentifier ::== /[0-9A-Za-z-]+/
  • PreReleaseIdentifier ::== /0|[1-9]\d*|\d*[A-Za-z-][0-9A-Za-z-]*/

which have to be tested as a compound identifier, i.e., /<identifier>(\.<identifier>)*/.

Maybe these patterns can be further optimized?

I suppose that, for performance reasons, pattern processing could be done on two passes?

  • Extract the main groups first /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[.0-9a-zA-Z-]+)?(\+[.0-9a-zA-Z-]+)?$/ (will produce false positives);
  • Test the pre-release and build matchs separately as compound identifiers

@Relequestual
Copy link

If anyone is interested, here is a javascript regex version (regex101) I made, which passes all the same tests as those in the above PCRE version.

^((0|[1-9][0-9]*)\.){2}(0|[1-9][0-9]*){1}(-([0-9A-Za-z-]+\.?)+)*(\+([0-9A-Za-z-]+-?\.?)+)*$

It's a little more condensed, as it's not concerned with group names, and just validation. (I'm using it for a JSON Schema)

@Relequestual
Copy link

Question, the spec for pre-release and metadata says the identifiers may consist of ASCII alphanumerics and hyphens only. It doesn't mention anything about dots, however the previously mentioned regex allows for it. Is this implied somewhere other than by the attached examples?

@jwdonahue
Copy link

jwdonahue commented Jul 12, 2018

@Relequestual

Quoting from the spec.

9 . A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version.

10. Build metadata MAY be denoted by appending a plus sign and a series of dot separated identifiers immediately following the patch or pre-release version.

I would add that your regex has an unbounded or at least extremely long run-time when faced with this tortuous, nearly legal example:

99999999999999999999999.999999999999999999.99999999999999999----RC-SNAPSHOT.12.09.1--------------------------------..12

Notice the empty prerelease field just before that final '.12'? That triggers back-tracking, which can result in infinite and pointless machinations. Now it's possible that your javascript implementation simply times out and correctly fails this on your system, but it's not something you should count on in a production environment. To the best of my knowledge, .NET implementations seem to be the most susceptible (they don't give up easily). Since it can only happen with near-matches, and you simply want a pass/fail, your regex will be fine, provided that you ensure that a reasonable timeout is in place to cut-off this degenerate behavior.

@jwdonahue
Copy link

@Relequestual, FYI: I got around the back-tracking by using look-ahead to chose which regex snippet to apply. It's those sneak peaks ahead of the current anchor point that makes my regex so long. Adding named capture groups to your regex would not overly complicate it, but they won't fix the bugs in your code.

Your regex fails to reject 1.2.3-0123 and 1.2.3-0123.0123. Both are invalid due to leading zero in the purely numeric fields of the prerelease tag.

@jwdonahue
Copy link

@Relequestual, here's two more examples it failed to reject:

9.8.7+meta+meta
9.8.7-whatever+meta+meta

@jwdonahue
Copy link

@gvlx, I think I missed something in your last post, regarding what you are calling AlphanumIdentifierNLZ. That terminology is incorrect. It should be something like AllNumericNoLeadingZerosIdentifier. Alphanumeric identifiers do not have the leading zero prohibition, it only applies to all numeric identifiers, so -0zyz is a legal prerelease tag according to the spec.

@FichteFoll
Copy link
Contributor

Depending on your regex engine, you could also look at atomic groups or possessive quantifiers to prevent backtracking.

@gvlx
Copy link

gvlx commented Jul 13, 2018

@jwdonahue, the name comes from the BNF, as pointed out in semver/semver#181 (comment).

Yes, not the best choice but I kept it for consistency with the BNF.

As you can read, it is a modified alphanumeric identifier which cannot have a leading zero. This only applies to the pre-release identifier, and it is one of the sources of performance degradation.

This is also the reason @Relequestual's #59 (comment) is not correct.

I still wonder if making the two step matching approach I suggested would improve performance as each regex would much simpler.

@jwdonahue
Copy link

jwdonahue commented Jul 14, 2018

@gvlx, are we talking past each other? Is there something I've missed? Nothing in the spec prohibits alphanumeric identifiers from having leading zeros. Only the pure numeric identifiers have that prohibition on them. I've been trying to get this point across to you and others for some time now. -0123zyx is a legal prerelease tag, even with the leading zero (maybe that wasn't obvious with your display font?). -0123 is not legal, due to the leading zero, -0 is legal. So -0.0cba is also legal. It's an important nuance that adds unavoidable complexity to the regex.

So again, I argue that the NLZ postfix belongs on the numeric identifier tag, not the alphanumeric identifier. In fact, the no-leading-zero numeric-identifier is a perfect description of the Major, Minor and Patch fields as well. NumericIdentifierNLZ applies to those majore, minor, patch and any numeric identifiers in the prerelease tag.

@jwdonahue
Copy link

@gvlx, I looked at atomic groups, but they aren't widely supported and weren't required to avoid the back-tracking problem and I am pretty sure that possessive quantifiers won't work due to all the cases where a zero is allowed to follow a dot.

@jwdonahue
Copy link

I am going to say this one more time, for clarity:

There are three classes of prerelease identifiers, just a zero, all numeric with no leading zero, and alphanumeric (leading zeros allowed).

@jwdonahue
Copy link

I just ran @DavidFichtmueller's suggested regex against our current data set and it passed with flying colors. See semver/semver#232. All it lacks are named capture groups.

@Relequestual
Copy link

Thanks @jwdonahue, I guess I'll need to revise my regex!
regex101 timed out with the example you provided.
I'm not a regex expert, so open to suggestions, specifically to fix the possible performance issue.

I'm mostly expecting this to be used in cases where there are no pre-release or meta information provided, but good to note the potential issues!

@MyNameIsCosmo
Copy link

MyNameIsCosmo commented Aug 6, 2018

Hello all,

Back in June I played around with the regex from both @gvlx and @jwdonahue and came up with some solution for named grouping, that I further condensed down to a regex without grouping, and then a regex with POSIX /compatibility/ (I say this loosely, cause it 'worked for me')

I put the expressions in the following Github repository along with some notes:
https://github.com/T5CC/semver-regex

Here's the named capture group expression:
^(?P<SemVer>(?P<Major>0|[1-9][0-9]*).(?P<Minor>0|[1-9][0-9]*).(?P<Patch>0|[1-9][0-9]*))(?P<Tags>(?:-(?P<Prerelease>(?:[0-9A-Za-z-]\.?)*))?(?:\+(?P<Meta>(?:[0-9A-Za-z-]\.?)*))?)?$
https://regex101.com/r/JpUgtQ/1

Edit:
Found out that the above regex does not invalidate leading zeros in prerelease tags.

This solution hasn't been checked for every edge case, but it does cover all the ones provided by @jwdonahue and it also processes 99999999999999999999999.999999999999999999.99999999999999999----RC-SNAPSHOT.12.09.1--------------------------------..12 without recursion (though a ton of steps).

Edit:
Catching up to recent issues, semver/semver#232 has an efficient expression.
I've updated the semver-regex repo to @DavidFichtmueller's solution, and gave credit.

@jwdonahue
Copy link

jwdonahue commented Aug 8, 2018

Okay, I think we've converged a good set of regex's over on semver/semver#232.

@coolaj86 please close this thread at your earliest possible convenience.

@joewiz
Copy link

joewiz commented Dec 16, 2018

@jwdonahue Thanks for your reply to my gist and for pointing me to this issue. I've updated my semver parsing and comparing library for XQuery, which I've rewritten to use the regex and test version strings shown here. Comments welcome - though gist won't notify me of comments, so I'd appreciate a ping here or via Twitter (same username there). https://gist.github.com/joewiz/b349e2853a17bf817e5d0013d01fa9f9

@jwdonahue
Copy link

It seems that PR #460 has been completed. We now have a pair of well vetted pair of regex's included in the spec.

Can we please close this discussion now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests