-
Notifications
You must be signed in to change notification settings - Fork 691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Version
type opaque
#3905
Make Version
type opaque
#3905
Conversation
data Version = Version [Int] | ||
deriving (Data,Eq,Ord,Generic,Show,Read,Typeable) | ||
|
||
instance Binary Version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we say it explicitly we aren't going to preserve Binary
compatibility (pun unintended)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that makes sense; I already mention that Data.Version
and D.V.Version
are distinct types, so I'd just have to point out that Binary
instances cannot be expected to be compatible either.
unVersion :: Version -> [Int] | ||
unVersion (Version vs) = vs | ||
|
||
-- | Constant representing the /null' 'Version' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatting?
Haven't read carefully, but +1 from me in general. |
BTW, this also means that we can make the |
@23Skidoo unfortunately such Eq/Ord instances would violate obvious laws such as 'x == y' --> 'f x == f y' for all 'f'; the weird Eq/Ord instances were iirc also the reason why So if we want more sane Eq/Ord instances, we have to forget the trailing zeros in such a way that they can't be observed anymore. |
Hmm, the substitution law is not obvious actually. Usually in mathematics an equivalence relation is only required to be symmetric, reflexive and transitive. |
Well, we can't do that because people want to see them. So
The substitution law does indeed seem to be controversial, see https://www.reddit.com/r/haskell/comments/1njlqr/laws_for_the_eq_class/. |
@hvr Well, if we make 0.8 and 0.8.0 equivalent, with a #1368 Eq/Ord instance, your law is essentially stating that all functions f must respect the equivalence relation that relates 0.8 and 0.8.0. If you have an appropriate interface which doesn't make it possible to distinguish these two numbers (e.g., always normalizes), then the law will hold. |
For this PR, I don't think we should change Eq/Ord. Leave that for a later patchset. |
Build failure is real! (Test with old version of GHC). |
@23Skidoo I know that code in matrix.hho (and I bet in hackage-server as well) would break if the substitution/leibniz property wouldn't hold, and there's code I just recently added to cabal (guess which :-) ) which would result in subtle/confusing effects if |
Similiar to dabd9d9 which made `PackageName` opaque, this makes `Distribution.Version.Version` opaque. The most common version numbers occuring on Hackage are 3- and 4-component version. This results in significant Heap overhead due to `Data.Version`'s inefficient internal representation. So like the `PackageName` commit, this commit is a preparatory commit to pave the way for replacing `Version`'s internal representation by a representation with a memory footprint which can be an order of magnitude smaller.
e6b2b10
to
5f40eb0
Compare
The hash-consing code? Yes, that would break if there was both foo-1.0 and foo-1.0.0 in the index. |
unfortunately there are: #3515 (comment) we did add code to Hackage though to make it possible at some more distant point in the future to ignore those (or maybe carefully manually remove those historic duplicate versions from the index): haskell/hackage-server@27f9727 |
Needs changelog entry. |
@hvr Yeah, I remember, just couldn't find that ticket. I guess that we'll have to settle for using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to bikeshed, but naming is important for readability, and we use versions a lot.
So I suggest the follow renamings:
unVersion -> versionDigits
withVersion -> alterVersion
mkNullVersion -> nullVersion
and delete the existing nullVersion, replace the few places where it's used with == nullVersion
.
On naming, I also suggest we use "digits" as the name of the thing that a version consists of, rather than the slightly strange name "branch" that we inherited from base Version (which contrasted branch vs tags).
So apart from doing that search and replace, there's only a few minor things I've commented on specifically like now-out-of-date comments or where we could use alterVersion
.
-- | ||
-- @since 2.0 | ||
unVersion :: Version -> [Int] | ||
unVersion (Version vs) = vs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we should give this a name like versionDigits
.
Once the internal rep changes, it'll be less clear that unVersion
should refer to producing the abstract representation rather than the underlying physical representation.
Having read through the rest of the code I also think it'd just read a lot better / more clearly.
|
||
-- functor-like helper | ||
withVersion :: ([Int] -> [Int]) -> Version -> Version | ||
withVersion f = mkVersion . f . unVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would alterVersion
be clearer? We tend to use withFoo for more IOish things.
-- | ||
-- @since 2.0 | ||
nullVersion :: Version -> Bool | ||
nullVersion = null . unVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear we're going to get confused between nullVersion
and mkNullVersion
.
What if we just had nullVersion :: Version
and for the test sites we used v == nullVersion
?
|
||
isWildcardRange :: Version -> Version -> Bool | ||
isWildcardRange (Version branch1 _) (Version branch2 _) = check branch1 branch2 | ||
isWildcardRange ver1 ver2 = check (unVersion ver1) (unVersion ver2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an example case where arguably it'd read nicer / more clearly as
isWildcardRange ver1 ver2 = check (versionDigits ver1) (versionDigits ver2)
[] -> [0,1] -- should not happen | ||
[m1] -> [m1,1] -- e.g. version '1' | ||
(m1:m2:_) -> [m1,m2+1] | ||
majorUpperBound = withVersion $ \ver -> case ver of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alterVersion $ \digits -> case digits of
perhaps?
@@ -477,11 +476,11 @@ pvpize :: Version -> VersionRange | |||
pvpize v = orLaterVersion v' | |||
`intersectVersionRanges` | |||
earlierVersion (incVersion 1 v') | |||
where v' = (v { versionBranch = take 2 (versionBranch v) }) | |||
where v' = mkVersion . take 2 . unVersion $ v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alterVersion
@@ -2233,8 +2233,8 @@ parseDependencyOrPackageId = parse Parse.+++ liftM pkgidToDependency parse | |||
where | |||
pkgidToDependency :: PackageIdentifier -> Dependency | |||
pkgidToDependency p = case packageVersion p of | |||
Version [] _ -> Dependency (packageName p) anyVersion | |||
version -> Dependency (packageName p) (thisVersion version) | |||
v | nullVersion v -> Dependency (packageName p) anyVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be just as clear with v == nullVersion
. So I think we could simplify.
instance Arbitrary Version where | ||
arbitrary = do | ||
branch <- shortListOf1 4 $ | ||
frequency [(3, return 0) | ||
,(3, return 1) | ||
,(2, return 2) | ||
,(1, return 3)] | ||
return (Version branch []) -- deliberate [] | ||
return (mkVersion branch) -- deliberate [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop the (now out of date) comment.
@@ -531,9 +530,9 @@ extractInstallPlan = catMaybes . map confPkg . CI.SolverInstallPlan.toList | |||
|
|||
srcPkg :: SolverPackage UnresolvedPkgLoc -> (String, Int) | |||
srcPkg cpkg = | |||
let C.PackageIdentifier pn (Version (n:_) _) = | |||
let C.PackageIdentifier pn ver = -- (Version (n:_) _) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the comment?
++ [dateToSnapshotNumber date] | ||
} | ||
snapshotVersion date ver | ||
= mkVersion (unVersion ver ++ [dateToSnapshotNumber date]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alterVersion
@hvr Oh and I forgot to say, I wholly approve of this! Very good :-) |
@dcoutts I'm totally ok with all renamings, except for
Naming is important, hence do I cringe at calling something like "3001" a digit, which is really a number (or in the context of a "version number" a kind of sequence) I've tried looking for prior art, but descriptions of version numbering schemes I could find almost never refer to version elements as "digits" as that's a quite ambigous term (c.f. "PVP uses a 2-digit major version" vs "PVP uses a 2-{component,part} major version", or "trailing zero digits" vs "trailing zero components"), some specifications even evade the issue of naming the subcomponents (c.f. semver.org) and calls them generically "identifiers", Microsoft tends talk about e.g. 4-part version numbers, IBM refers to integers, some other refer to the numbers a version is made up as number, which are all more accurate terms than "digit" imho. At some level it would make sense to call them "version-digits" if version numbers were a positional numeral system, but version numbers are a different kind of mathematical structure since they're closer to a infinite-dimensional vectorspace, rather than a scalar, so the version components act rather like a coefficient to their respective basis-element, or like elements of a list or tuple, rather than a digit in a numeral... So I'd suggest alternatives:
|
My vote is for either |
Ok, fair enough.
I'm happy with either of those, with a preference for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Squash back into one patch and merge when green!
versionBranch = versionBranch version | ||
++ [dateToSnapshotNumber date] | ||
} | ||
snapshotVersion date = alterVersion (++ [dateToSnapshotNumber date]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
maxPoint | null allVersionsUsed = minPoint | ||
| otherwise = case maximum allVersionsUsed of | ||
Version vs _ -> Version (vs ++ [1]) [] | ||
| otherwise = mkVersion . (++[1]) . versionNumbers . maximum $ allVersionsUsed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alterVersion (++[1]) (maximum allVersionsUsed)
This optimises the `[Int]` representation to a 16-byte heap object for ~99% of version numbers (up to 4 components, each within a [0..0xfffe] value range) occuring on Hackage. One noteworthy improvement of this optimisation is a significant reduction of the size of the 01-index.cache file from previously 6299700 bytes (before haskell#3905) down to 3408864 bytes, i.e. down to ~54% Also, this reduces the memory footprint and GC overhead a bit for e.g. `cabal info zzz` (which reads in the index cache) from cabal.0: There is no package named 'zzz'. You may need to run 'cabal update' to get the latest list of available packages. 859,337,368 bytes allocated in the heap 447,261,128 bytes copied during GC 37,385,208 bytes maximum residency (19 sample(s)) 1,311,136 bytes maximum slop 103 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 1613 colls, 0 par 0.268s 0.268s 0.0002s 0.0012s Gen 1 19 colls, 0 par 0.227s 0.227s 0.0119s 0.0506s TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1) SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) INIT time 0.001s ( 0.001s elapsed) MUT time 0.431s ( 0.758s elapsed) GC time 0.495s ( 0.495s elapsed) EXIT time 0.006s ( 0.005s elapsed) Total time 0.934s ( 1.259s elapsed) Alloc rate 1,991,870,623 bytes per MUT second Productivity 46.9% of total user, 34.8% of total elapsed to cabal.1: There is no package named 'zzz'. You may need to run 'cabal update' to get the latest list of available packages. 834,314,392 bytes allocated in the heap 440,791,176 bytes copied during GC 36,663,112 bytes maximum residency (19 sample(s)) 2,225,040 bytes maximum slop 96 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 1574 colls, 0 par 0.254s 0.254s 0.0002s 0.0007s Gen 1 19 colls, 0 par 0.223s 0.223s 0.0118s 0.0474s TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1) SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) INIT time 0.001s ( 0.001s elapsed) MUT time 0.383s ( 0.699s elapsed) GC time 0.477s ( 0.477s elapsed) EXIT time 0.005s ( 0.005s elapsed) Total time 0.869s ( 1.182s elapsed) Alloc rate 2,175,866,164 bytes per MUT second Productivity 44.9% of total user, 33.0% of total elapsed
This optimises the `[Int]` representation to a 16-byte heap object for ~99% of version numbers (up to 4 components, each within a [0..0xfffe] value range) occuring on Hackage. One noteworthy improvement of this optimisation is a significant reduction of the size of the 01-index.cache file from previously 6299700 bytes (before #3905) down to 3408864 bytes, i.e. down to ~54% Also, this reduces the memory footprint and GC overhead a bit for e.g. `cabal info zzz` (which reads in the index cache) from cabal.0: There is no package named 'zzz'. You may need to run 'cabal update' to get the latest list of available packages. 859,337,368 bytes allocated in the heap 447,261,128 bytes copied during GC 37,385,208 bytes maximum residency (19 sample(s)) 1,311,136 bytes maximum slop 103 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 1613 colls, 0 par 0.268s 0.268s 0.0002s 0.0012s Gen 1 19 colls, 0 par 0.227s 0.227s 0.0119s 0.0506s TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1) SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) INIT time 0.001s ( 0.001s elapsed) MUT time 0.431s ( 0.758s elapsed) GC time 0.495s ( 0.495s elapsed) EXIT time 0.006s ( 0.005s elapsed) Total time 0.934s ( 1.259s elapsed) Alloc rate 1,991,870,623 bytes per MUT second Productivity 46.9% of total user, 34.8% of total elapsed to cabal.1: There is no package named 'zzz'. You may need to run 'cabal update' to get the latest list of available packages. 834,314,392 bytes allocated in the heap 440,791,176 bytes copied during GC 36,663,112 bytes maximum residency (19 sample(s)) 2,225,040 bytes maximum slop 96 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 1574 colls, 0 par 0.254s 0.254s 0.0002s 0.0007s Gen 1 19 colls, 0 par 0.223s 0.223s 0.0118s 0.0474s TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1) SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) INIT time 0.001s ( 0.001s elapsed) MUT time 0.383s ( 0.699s elapsed) GC time 0.477s ( 0.477s elapsed) EXIT time 0.005s ( 0.005s elapsed) Total time 0.869s ( 1.182s elapsed) Alloc rate 2,175,866,164 bytes per MUT second Productivity 44.9% of total user, 33.0% of total elapsed
Similiar to dabd9d9 which made
PackageName
opaque, this makesDistribution.Version.Version
opaque.The most common version numbers occuring on Hackage are 3- and
4-component version. This results in significant Heap overhead due to
Data.Version
's inefficient internal representation.So like the
PackageName
commit, this commit is a preparatory commit topave the way for replacing
Version
's internal representation by arepresentation with a memory footprint which can be an order of
magnitude smaller.