Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Version type opaque #3905

Merged
merged 3 commits into from
Sep 28, 2016
Merged

Make Version type opaque #3905

merged 3 commits into from
Sep 28, 2016

Conversation

hvr
Copy link
Member

@hvr hvr commented Sep 27, 2016

Similiar to dabd9d9 which made
PackageName opaque, this makes Distribution.Version.Version opaque.

The most common version numbers occuring on Hackage are 3- and
4-component version. This results in significant Heap overhead due to
Data.Version's inefficient internal representation.

So like the PackageName commit, this commit is a preparatory commit to
pave the way for replacing Version's internal representation by a
representation with a memory footprint which can be an order of
magnitude smaller.

data Version = Version [Int]
deriving (Data,Eq,Ord,Generic,Show,Read,Typeable)

instance Binary Version
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say it explicitly we aren't going to preserve Binary compatibility (pun unintended)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that makes sense; I already mention that Data.Version and D.V.Version are distinct types, so I'd just have to point out that Binary instances cannot be expected to be compatible either.

unVersion :: Version -> [Int]
unVersion (Version vs) = vs

-- | Constant representing the /null' 'Version'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting?

@23Skidoo
Copy link
Member

Haven't read carefully, but +1 from me in general.

@23Skidoo
Copy link
Member

BTW, this also means that we can make the Eq and Ord instances more sane (see #1368).

@hvr
Copy link
Member Author

hvr commented Sep 27, 2016

@23Skidoo unfortunately such Eq/Ord instances would violate obvious laws such as 'x == y' --> 'f x == f y' for all 'f'; the weird Eq/Ord instances were iirc also the reason why versionTags are being deprecated.

So if we want more sane Eq/Ord instances, we have to forget the trailing zeros in such a way that they can't be observed anymore.

@23Skidoo
Copy link
Member

Hmm, the substitution law is not obvious actually. Usually in mathematics an equivalence relation is only required to be symmetric, reflexive and transitive.

@23Skidoo
Copy link
Member

23Skidoo commented Sep 27, 2016

So if we want more sane Eq/Ord instances, we have to forget the trailing zeros in such a way that they can't be observed anymore.

Well, we can't do that because people want to see them. So show will necessarily break the substitution law.

Usually in mathematics an equivalence relation is only required to be symmetric, reflexive and transitive.

The substitution law does indeed seem to be controversial, see https://www.reddit.com/r/haskell/comments/1njlqr/laws_for_the_eq_class/.

@ezyang
Copy link
Contributor

ezyang commented Sep 27, 2016

@hvr Well, if we make 0.8 and 0.8.0 equivalent, with a #1368 Eq/Ord instance, your law is essentially stating that all functions f must respect the equivalence relation that relates 0.8 and 0.8.0. If you have an appropriate interface which doesn't make it possible to distinguish these two numbers (e.g., always normalizes), then the law will hold.

@ezyang
Copy link
Contributor

ezyang commented Sep 28, 2016

For this PR, I don't think we should change Eq/Ord. Leave that for a later patchset.

@ezyang
Copy link
Contributor

ezyang commented Sep 28, 2016

Build failure is real! (Test with old version of GHC).

@hvr
Copy link
Member Author

hvr commented Sep 28, 2016

@23Skidoo I know that code in matrix.hho (and I bet in hackage-server as well) would break if the substitution/leibniz property wouldn't hold, and there's code I just recently added to cabal (guess which :-) ) which would result in subtle/confusing effects if Version had hidden state.

Similiar to dabd9d9 which made
`PackageName` opaque, this makes `Distribution.Version.Version` opaque.

The most common version numbers occuring on Hackage are 3- and
4-component version. This results in significant Heap overhead due to
`Data.Version`'s inefficient internal representation.

So like the `PackageName` commit, this commit is a preparatory commit to
pave the way for replacing `Version`'s internal representation by a
representation with a memory footprint which can be an order of
magnitude smaller.
@hvr
Copy link
Member Author

hvr commented Sep 28, 2016

@ezyang fyi, I pushed a fix which I expect to fix all build failures; as a nice benefit, this refactoring allows us to kill all that CPP-hackery in .Version

@phadej I've also tweaked the haddocks

@23Skidoo
Copy link
Member

@hvr

and there's code I just recently added to cabal (guess which :-) ) which would result in subtle/confusing effects if Version had hidden state.

The hash-consing code? Yes, that would break if there was both foo-1.0 and foo-1.0.0 in the index.

@hvr
Copy link
Member Author

hvr commented Sep 28, 2016

@23Skidoo

Yes, that would break if there was both foo-1.0 and foo-1.0.0 in the index.

unfortunately there are: #3515 (comment)

we did add code to Hackage though to make it possible at some more distant point in the future to ignore those (or maybe carefully manually remove those historic duplicate versions from the index): haskell/hackage-server@27f9727

@ezyang
Copy link
Contributor

ezyang commented Sep 28, 2016

Needs changelog entry.

@23Skidoo
Copy link
Member

@hvr Yeah, I remember, just couldn't find that ticket. I guess that we'll have to settle for using compare on normaliseVersion where appropriate, or add a NormalisedVersion newtype.

@hvr
Copy link
Member Author

hvr commented Sep 28, 2016

@23Skidoo @ezyang @phadej @dcoutts ...so, is the changelog entry the only thing missing before this PR can be merged?

Copy link
Contributor

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to bikeshed, but naming is important for readability, and we use versions a lot.

So I suggest the follow renamings:

unVersion -> versionDigits
withVersion -> alterVersion
mkNullVersion -> nullVersion

and delete the existing nullVersion, replace the few places where it's used with == nullVersion.

On naming, I also suggest we use "digits" as the name of the thing that a version consists of, rather than the slightly strange name "branch" that we inherited from base Version (which contrasted branch vs tags).

So apart from doing that search and replace, there's only a few minor things I've commented on specifically like now-out-of-date comments or where we could use alterVersion.

--
-- @since 2.0
unVersion :: Version -> [Int]
unVersion (Version vs) = vs
Copy link
Contributor

@dcoutts dcoutts Sep 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we should give this a name like versionDigits.

Once the internal rep changes, it'll be less clear that unVersion should refer to producing the abstract representation rather than the underlying physical representation.

Having read through the rest of the code I also think it'd just read a lot better / more clearly.


-- functor-like helper
withVersion :: ([Int] -> [Int]) -> Version -> Version
withVersion f = mkVersion . f . unVersion
Copy link
Contributor

@dcoutts dcoutts Sep 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would alterVersion be clearer? We tend to use withFoo for more IOish things.

--
-- @since 2.0
nullVersion :: Version -> Bool
nullVersion = null . unVersion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear we're going to get confused between nullVersion and mkNullVersion.

What if we just had nullVersion :: Version and for the test sites we used v == nullVersion?


isWildcardRange :: Version -> Version -> Bool
isWildcardRange (Version branch1 _) (Version branch2 _) = check branch1 branch2
isWildcardRange ver1 ver2 = check (unVersion ver1) (unVersion ver2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an example case where arguably it'd read nicer / more clearly as

isWildcardRange ver1 ver2 = check (versionDigits ver1) (versionDigits ver2)

[] -> [0,1] -- should not happen
[m1] -> [m1,1] -- e.g. version '1'
(m1:m2:_) -> [m1,m2+1]
majorUpperBound = withVersion $ \ver -> case ver of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alterVersion $ \digits -> case digits of

perhaps?

@@ -477,11 +476,11 @@ pvpize :: Version -> VersionRange
pvpize v = orLaterVersion v'
`intersectVersionRanges`
earlierVersion (incVersion 1 v')
where v' = (v { versionBranch = take 2 (versionBranch v) })
where v' = mkVersion . take 2 . unVersion $ v
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alterVersion

@@ -2233,8 +2233,8 @@ parseDependencyOrPackageId = parse Parse.+++ liftM pkgidToDependency parse
where
pkgidToDependency :: PackageIdentifier -> Dependency
pkgidToDependency p = case packageVersion p of
Version [] _ -> Dependency (packageName p) anyVersion
version -> Dependency (packageName p) (thisVersion version)
v | nullVersion v -> Dependency (packageName p) anyVersion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be just as clear with v == nullVersion. So I think we could simplify.

instance Arbitrary Version where
arbitrary = do
branch <- shortListOf1 4 $
frequency [(3, return 0)
,(3, return 1)
,(2, return 2)
,(1, return 3)]
return (Version branch []) -- deliberate []
return (mkVersion branch) -- deliberate []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the (now out of date) comment.

@@ -531,9 +530,9 @@ extractInstallPlan = catMaybes . map confPkg . CI.SolverInstallPlan.toList

srcPkg :: SolverPackage UnresolvedPkgLoc -> (String, Int)
srcPkg cpkg =
let C.PackageIdentifier pn (Version (n:_) _) =
let C.PackageIdentifier pn ver = -- (Version (n:_) _) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the comment?

++ [dateToSnapshotNumber date]
}
snapshotVersion date ver
= mkVersion (unVersion ver ++ [dateToSnapshotNumber date])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alterVersion

@dcoutts
Copy link
Contributor

dcoutts commented Sep 28, 2016

@hvr Oh and I forgot to say, I wholly approve of this! Very good :-)

@hvr
Copy link
Member Author

hvr commented Sep 28, 2016

@dcoutts I'm totally ok with all renamings, except for

On naming, I also suggest we use "digits" as the name of the thing that a version consists of, rather than the slightly strange name "branch" that we inherited from base Version (which contrasted branch vs tags)

Naming is important, hence do I cringe at calling something like "3001" a digit, which is really a number (or in the context of a "version number" a kind of sequence)

I've tried looking for prior art, but descriptions of version numbering schemes I could find almost never refer to version elements as "digits" as that's a quite ambigous term (c.f. "PVP uses a 2-digit major version" vs "PVP uses a 2-{component,part} major version", or "trailing zero digits" vs "trailing zero components"), some specifications even evade the issue of naming the subcomponents (c.f. semver.org) and calls them generically "identifiers", Microsoft tends talk about e.g. 4-part version numbers, IBM refers to integers, some other refer to the numbers a version is made up as number, which are all more accurate terms than "digit" imho.

At some level it would make sense to call them "version-digits" if version numbers were a positional numeral system, but version numbers are a different kind of mathematical structure since they're closer to a infinite-dimensional vectorspace, rather than a scalar, so the version components act rather like a coefficient to their respective basis-element, or like elements of a list or tuple, rather than a digit in a numeral...

So I'd suggest alternatives:

  • versionParts
  • versionComponents
  • versionIds
  • versionElements
  • versionNumbers
  • versionNumerals
  • versionToList
  • ...

@23Skidoo
Copy link
Member

My vote is for either versionNumbers or versionComponents.

@dcoutts
Copy link
Contributor

dcoutts commented Sep 28, 2016

Naming is important, hence do I cringe at calling something like "3001" a digit, which is really a number (or in the context of a "version number" a kind of sequence)

Ok, fair enough.

My vote is for either versionNumbers or versionComponents.

I'm happy with either of those, with a preference for versionNumbers.

Copy link
Contributor

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Squash back into one patch and merge when green!

versionBranch = versionBranch version
++ [dateToSnapshotNumber date]
}
snapshotVersion date = alterVersion (++ [dateToSnapshotNumber date])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

maxPoint | null allVersionsUsed = minPoint
| otherwise = case maximum allVersionsUsed of
Version vs _ -> Version (vs ++ [1]) []
| otherwise = mkVersion . (++[1]) . versionNumbers . maximum $ allVersionsUsed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alterVersion (++[1]) (maximum allVersionsUsed)

@hvr hvr merged commit bb2026c into haskell:master Sep 28, 2016
hvr added a commit to hvr/cabal that referenced this pull request Oct 2, 2016
This optimises the `[Int]` representation to a 16-byte heap object for
~99% of version numbers (up to 4 components, each within a [0..0xfffe]
value range) occuring on Hackage.

One noteworthy improvement of this optimisation is a significant reduction
of the size of the 01-index.cache file from previously 6299700 bytes
(before haskell#3905) down to 3408864 bytes, i.e. down to ~54%

Also, this reduces the memory footprint and GC overhead a bit for
e.g. `cabal info zzz` (which reads in the index cache) from

    cabal.0: There is no package named 'zzz'.
    You may need to run 'cabal update' to get the latest list of available
    packages.
         859,337,368 bytes allocated in the heap
         447,261,128 bytes copied during GC
          37,385,208 bytes maximum residency (19 sample(s))
           1,311,136 bytes maximum slop
                 103 MB total memory in use (0 MB lost due to fragmentation)

                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0      1613 colls,     0 par    0.268s   0.268s     0.0002s    0.0012s
      Gen  1        19 colls,     0 par    0.227s   0.227s     0.0119s    0.0506s

      TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

      SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

      INIT    time    0.001s  (  0.001s elapsed)
      MUT     time    0.431s  (  0.758s elapsed)
      GC      time    0.495s  (  0.495s elapsed)
      EXIT    time    0.006s  (  0.005s elapsed)
      Total   time    0.934s  (  1.259s elapsed)

      Alloc rate    1,991,870,623 bytes per MUT second

      Productivity  46.9% of total user, 34.8% of total elapsed

to

    cabal.1: There is no package named 'zzz'.
    You may need to run 'cabal update' to get the latest list of available
    packages.
         834,314,392 bytes allocated in the heap
         440,791,176 bytes copied during GC
          36,663,112 bytes maximum residency (19 sample(s))
           2,225,040 bytes maximum slop
                  96 MB total memory in use (0 MB lost due to fragmentation)

                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0      1574 colls,     0 par    0.254s   0.254s     0.0002s    0.0007s
      Gen  1        19 colls,     0 par    0.223s   0.223s     0.0118s    0.0474s

      TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

      SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

      INIT    time    0.001s  (  0.001s elapsed)
      MUT     time    0.383s  (  0.699s elapsed)
      GC      time    0.477s  (  0.477s elapsed)
      EXIT    time    0.005s  (  0.005s elapsed)
      Total   time    0.869s  (  1.182s elapsed)

      Alloc rate    2,175,866,164 bytes per MUT second

      Productivity  44.9% of total user, 33.0% of total elapsed
hvr added a commit that referenced this pull request Oct 2, 2016
This optimises the `[Int]` representation to a 16-byte heap object for
~99% of version numbers (up to 4 components, each within a [0..0xfffe]
value range) occuring on Hackage.

One noteworthy improvement of this optimisation is a significant reduction
of the size of the 01-index.cache file from previously 6299700 bytes
(before #3905) down to 3408864 bytes, i.e. down to ~54%

Also, this reduces the memory footprint and GC overhead a bit for
e.g. `cabal info zzz` (which reads in the index cache) from

    cabal.0: There is no package named 'zzz'.
    You may need to run 'cabal update' to get the latest list of available
    packages.
         859,337,368 bytes allocated in the heap
         447,261,128 bytes copied during GC
          37,385,208 bytes maximum residency (19 sample(s))
           1,311,136 bytes maximum slop
                 103 MB total memory in use (0 MB lost due to fragmentation)

                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0      1613 colls,     0 par    0.268s   0.268s     0.0002s    0.0012s
      Gen  1        19 colls,     0 par    0.227s   0.227s     0.0119s    0.0506s

      TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

      SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

      INIT    time    0.001s  (  0.001s elapsed)
      MUT     time    0.431s  (  0.758s elapsed)
      GC      time    0.495s  (  0.495s elapsed)
      EXIT    time    0.006s  (  0.005s elapsed)
      Total   time    0.934s  (  1.259s elapsed)

      Alloc rate    1,991,870,623 bytes per MUT second

      Productivity  46.9% of total user, 34.8% of total elapsed

to

    cabal.1: There is no package named 'zzz'.
    You may need to run 'cabal update' to get the latest list of available
    packages.
         834,314,392 bytes allocated in the heap
         440,791,176 bytes copied during GC
          36,663,112 bytes maximum residency (19 sample(s))
           2,225,040 bytes maximum slop
                  96 MB total memory in use (0 MB lost due to fragmentation)

                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0      1574 colls,     0 par    0.254s   0.254s     0.0002s    0.0007s
      Gen  1        19 colls,     0 par    0.223s   0.223s     0.0118s    0.0474s

      TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

      SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

      INIT    time    0.001s  (  0.001s elapsed)
      MUT     time    0.383s  (  0.699s elapsed)
      GC      time    0.477s  (  0.477s elapsed)
      EXIT    time    0.005s  (  0.005s elapsed)
      Total   time    0.869s  (  1.182s elapsed)

      Alloc rate    2,175,866,164 bytes per MUT second

      Productivity  44.9% of total user, 33.0% of total elapsed
@hvr hvr deleted the pr/opaque-version branch October 2, 2016 08:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants