Version is no longer totally ordered and comparison speed regressed #26464
Comments
Note that in #22973 we were happy with a 2x speedup, now we have a 4x slowdown :(
For the record, version comparison affects in non-negligible ways the "setup" time of each concretization (it is needed to compute a lot of facts about packages and preferences)
So, if instead we make the spec parser create GitVersion objects we pay the cost of the regex only once, and otherwise there's just the cost of dispatching to the right (waiting for someone to mention memoization on is_commit 😆...)
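A minimal sketch of the idea above, where the parser decides once which kind of version it has. The names `parse_version`, `StandardVersion` and `COMMIT_RE` are hypothetical and not taken from Spack, the sketch assumes a commit version is a full 40-character lowercase hex SHA, and `StandardVersion` only handles purely numeric dotted versions.

```python
import re

# Assumption: a commit version is a full 40-character lowercase hex SHA-1.
COMMIT_RE = re.compile(r"[0-9a-f]{40}")


class StandardVersion:
    """Hypothetical stand-in for a regular version; numeric dotted strings only."""

    def __init__(self, string):
        self.string = string
        # Parse once at construction so comparisons never touch the string again.
        self.parts = tuple(int(part) for part in string.split("."))

    def __lt__(self, other):
        if not isinstance(other, StandardVersion):
            return NotImplemented
        return self.parts < other.parts


class GitVersion:
    """Hypothetical stand-in for a version given as a commit SHA."""

    def __init__(self, string):
        self.string = string


def parse_version(string):
    """Run the commit regex once, at parse time, and dispatch on the result."""
    if COMMIT_RE.fullmatch(string):
        return GitVersion(string)
    return StandardVersion(string)
```

With this shape, comparing two `StandardVersion` objects never runs the regex; the type of the object already records whether the string was a commit.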
A few random thoughts that almost date back to #1997. The semantics of versions are package specific, in the sense that it makes little sense to compare versions across packages (what is the meaning of comparing For version semantic like the one in git commits (that can be used in any package that has an underlying git repo), we can maybe define a literal tag (similar to
@alalazo I think I know what you're trying to say with "what is the meaning of comparing I guess what you're getting at is that every package can impose a different total order on a different set of versions S:
... and both S and < may depend on the package in some cases. In principle VersionRange could be implemented just knowing it takes two elements of S and the binary operator <, but because we've chosen a weird open-ended range for Version, it also needs some function called x:y contains all elements s in S such that
That's exactly my point. We don't need to be able to compare arbitrary Version objects to each other (which would be a valid reason to have a single class representing a version in Spack's codebase). We just need a total order for objects referring to the same package. For the rest that's where I'd like to arrive. A relevant example of 2 for instance are C++ versions where 98 < 03 < 11 < 14 < 17 < 20.
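As a rough illustration of a package-specific total order, here is a sketch for the C++ standards example. `CxxStandard` and `CXX_STANDARD_ORDER` are made-up names for this sketch, not Spack classes; the only thing taken from the comment is the order 98 < 03 < 11 < 14 < 17 < 20.

```python
from functools import total_ordering

# Order quoted in the comment above; note that it is neither numeric nor lexical.
CXX_STANDARD_ORDER = ("98", "03", "11", "14", "17", "20")


@total_ordering
class CxxStandard:
    """Hypothetical version type whose order is an explicit ranking."""

    _rank = {name: index for index, name in enumerate(CXX_STANDARD_ORDER)}

    def __init__(self, name):
        if name not in self._rank:
            raise ValueError(f"unknown C++ standard: {name!r}")
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, CxxStandard):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        return hash(self.name)

    def __lt__(self, other):
        # Only objects of the same kind are comparable.
        if not isinstance(other, CxxStandard):
            return NotImplemented
        return self._rank[self.name] < self._rank[other.name]
```

Comparing a `CxxStandard` with anything else raises `TypeError` for `<`, which is the point: the order is only defined within one set of versions.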
I was planning to start another issue on this, and may well use this comment as a seed for it anyway, but one significant issue with the current situation is that it's horrifyingly broken for some common git usage models. The main one is the infamous git-flow, which we use on the raja project and which is used by a number of others. By design, the commits that represent any given tag are not reachable from develop commits, or vice versa. There is an actual ordering to them, but only by merging into release branches from develop and from release branches to main, tagging the merge commit that references the release branch. For all intents and purposes, that means that all git revisions on develop after the most recent release are treated as a version below zero. All commits that are between two tags are also treated as a negative version because none of their parents are tagged; their parents are on release branches or develop. The current model works only with exactly the tagged hashes; everything else is completely divorced from the order. As far as I can tell, there's no way for a package to deal with these, other than defining something generic with
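The reachability problem described above can be shown on a toy commit graph. This is only a sketch: the graph, the commit names and the helper functions are invented for illustration, and it assumes the release branch itself (not the tagged merge commit on main) is what gets merged back into develop.

```python
# Hypothetical git-flow history: each commit maps to a tuple of its parents.
PARENTS = {
    "m0": (),            # initial commit on main
    "d1": ("m0",),       # develop
    "d2": ("d1",),
    "r1": ("d2",),       # release branch cut from develop
    "m1": ("m0", "r1"),  # merge of the release branch into main
    "d3": ("d2",),
    "d4": ("d3", "r1"),  # release branch merged back into develop
    "d5": ("d4",),
}

# Only the merge commit on main carries a tag.
TAGS = {"m1": "v1.0"}


def ancestors(commit):
    """Return the commit itself plus everything reachable through its parents."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(PARENTS[current])
    return seen


def reachable_tags(commit):
    """Tags whose commits are ancestors of (or equal to) the given commit."""
    return sorted(TAGS[c] for c in ancestors(commit) if c in TAGS)
```

In this graph `reachable_tags("d5")` is empty even though `d5` contains all the code of the release, so an ordering based on tagged ancestors has nothing to anchor a develop commit to.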
So, you mentioned memoization... Much as I would love to see the many, many corner cases go away, the implementation tolerates this one much better with a few tweaks, like doing a string length comparison with 40 before paying to match a regex that must be exactly 40 characters long. PR in the works, but this is the progress so far:
script:
from IPython import get_ipython
from spack.version import Version as V
import spack.repo as repo
ipython = get_ipython()
a=V('1.0.0')
b=V('1.0.1')
print('normal versions')
ipython.magic("%timeit a<b")
a=V('1f3a4cf7f8210e6cb50db9c193f1e843d1fc0ec4')
a.generate_commit_lookup(repo.get('raja'))
b=V('357933a42842dd91de5c1034204d937fce0a2a44')
print('hashes')
ipython.magic("%timeit a<b")
before:
after:
The timing for commit versions assumes the repository has already been downloaded; without that, the whole thing gets skewed like crazy by the download time.
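The length check and the memoization mentioned above could look roughly like this. It is a sketch, not the code from the PR: `is_commit` here is a free function written for illustration, and it assumes a commit is written as exactly 40 lowercase hex characters.

```python
import re
from functools import lru_cache

# Assumption: a commit version is exactly 40 lowercase hex characters.
COMMIT_RE = re.compile(r"[0-9a-f]{40}")


@lru_cache(maxsize=None)
def is_commit(string):
    """Return True if the string looks like a full commit SHA."""
    # The length comparison is cheap and rejects almost every normal version
    # before the regex is tried; `and` short-circuits, so the regex only runs
    # on strings that are already 40 characters long.
    return len(string) == 40 and COMMIT_RE.fullmatch(string) is not None
```

The cache means a given version string is classified at most once per process, at the price of keeping every string that was ever checked alive.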
Yeah, that's a hacky fix though. It'd be better to just have a GitVersion object. Again, git versions are the exception.
I agree, but the two vaguely doable options for that both have some nasty downsides when I look at the required refactoring.
My actual preference right now is to make it work in
Possibly even replace the version tuple with a commit-version class actually, that would probably be less overall divergence.
Steps to reproduce
After #24639 the set of Version objects no longer satisfies a total order with <:
You may say this is an edge case, but generally I'm not a fan of extending Version with git-related things, because Git commit SHAs are the exception, and they don't map perfectly to Spack registered versions.
If what we want is to compare a Git commit SHA to a Spack registered Version, why not create a GitVersion class and implement < and friends to compare with Version?
Soon enough we'll be extending commit SHAs to general git refs (branches, tags, SHAs... of forks even?), and if that's going to be part of Version that'd be really bad.
Also note that we're executing a regex multiple times on every single version string for every comparison with <:
spack/lib/spack/spack/version.py
Lines 370 to 386 in e2ee306
spack/lib/spack/spack/version.py
Lines 208 to 215 in e2ee306
which is redundant if we had a GitVersion object directly.
Currently:
Before #24639:
So, version comparison got 4x slower.