After implementing new installed package ID (hash of sdist), get rid of package keys #2745

Closed
ezyang opened this issue Jul 30, 2015 · 5 comments

3 participants
@ezyang
Contributor

commented Jul 30, 2015

The Cabal Nix GSOC is planning to change installed package IDs to be the hash of the sdist tarball of a package (instead of the ABI hash of the compiled code). https://ghc.haskell.org/trac/ghc/wiki/Commentary/GSoC_Cabal_nix Once we do this, SPJ and I propose to GET RID of package keys, removing a nettlesome source of indirection from Cabal and GHC. Here are the details:

Currently, GHC has two notions of package identity: the installed package ID (presently the ABI hash) which uniquely identifies a package in the installed package database, and the package key, which is used for linker symbols and type identity. In fact, package keys are a COARSE version of installed package IDs: they contain information about the transitive dependencies Cabal picked, but not any details about the actual source code. (Put alternately: if the package key changes, the installed package ID changes, but not vice versa.)
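
To make the coarse/fine relationship concrete, here is a minimal sketch in Python. It is a toy model, not how Cabal or GHC actually compute these values: the `Package` record, the `short_hash` helper, and the idea of folding the source text directly into the hash are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


def short_hash(*parts: str) -> str:
    """Hash the given strings into a short hex digest (illustrative only)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()[:12]


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    dep_keys: tuple  # package keys of the dependencies Cabal picked
    source: str      # stand-in for the package's source code


def package_key(pkg: Package) -> str:
    # Coarse: sees the name, version, and chosen dependencies only.
    return short_hash(pkg.name, pkg.version, *sorted(pkg.dep_keys))


def installed_package_id(pkg: Package) -> str:
    # Fine: everything the package key sees, plus the source itself.
    return short_hash(package_key(pkg), pkg.source)
```

In this model, editing `source` changes the installed package ID but leaves the package key alone, while changing `dep_keys` changes both, which matches the "not vice versa" remark above.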

Why do we have a fine-grained notion of identity, and a coarse-grained version? Historically, there were two reasons:

  1. The installed package ID was computed with ghc --abi-hash after compilation, whereas the package key needed to be passed to GHC before compilation; and
  2. When developing a package, it seemed desirable to not recompile whenever source changed, which is what would occur if the package key were based on a hash of the sdist.

(1) is no longer applicable when installed package IDs are computed by hashing the sdist. And (2) can be addressed simply by picking STABLE, fake installed package IDs when doing local development; e.g. "containers-2.0-inplace". If you want to install to the global database, you'll have to recompile everything with the right IPIDs, but for inplace development this should work great.
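
A rough sketch of how point (2) could play out, in Python. The function name `choose_ipid` and the `name-version-digest` layout for installed packages are hypothetical; only the `containers-2.0-inplace` shape comes from the text above.

```python
import hashlib


def choose_ipid(name: str, version: str, sdist: bytes, inplace: bool) -> str:
    """Pick an installed package ID (hypothetical format, for illustration)."""
    if inplace:
        # Stable across source edits, so the ID does not change while hacking.
        return f"{name}-{version}-inplace"
    # For a real install, tie the ID to the exact sdist contents.
    digest = hashlib.sha256(sdist).hexdigest()[:16]
    return f"{name}-{version}-{digest}"
```

The inplace ID stays the same no matter how the source changes, whereas the installed ID moves with every change to the tarball bytes.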

@ttuegel

Member

commented Aug 8, 2015

👍

@ezyang

Contributor Author

commented Aug 25, 2015

I went ahead and implemented a first draft of this at PR #2792

The primary complication is that whatever we do has to remain compatible with old versions of GHC. This means we still have to generate "package keys", and whatever format we generate has to be compatible with what ghc-pkg accepts. The best thing to do is to just say that an IPID is the PK (and this is what I have implemented). Unfortunately, when I coded ghc-pkg I gave it a very restrictive parser which only accepted "pkgname_HASH". This means, for GHC 7.10, we have to be very careful about the format of IPIDs, because otherwise they will not get past the parser.

One possible workaround is to bring back the distinction between an installed package ID and a "fuller" identifier which includes versions and is used in directory names, but this adds some modest extra complication to the code. This would allow us to make Cabal continue to work with GHC 7.10 in a nearly equivalent way (except for the fact that package keys are being calculated differently).
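
As an illustration of the format constraint, here is a small Python check. The regular expression is an assumption about what "pkgname_HASH" means (an alphanumeric name part, one underscore, an alphanumeric hash); the real ghc-pkg grammar may well differ, and `fits_old_key_format` is a made-up helper.

```python
import re

# Assumed shape: a name part, one underscore, then an alphanumeric hash.
# This is NOT the real ghc-pkg grammar; it only illustrates the constraint.
PKG_KEY_RE = re.compile(r"[A-Za-z0-9]+_[A-Za-z0-9]+")


def fits_old_key_format(ipid: str) -> bool:
    """Return True if the ID would pass the assumed pkgname_HASH shape."""
    return PKG_KEY_RE.fullmatch(ipid) is not None
```

Under this assumption, an ID such as "containers-2.0-inplace" would be rejected, which is why the IPID format needs care when targeting GHC 7.10.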

@ezyang

Contributor Author

commented Aug 25, 2015

OK, I think the simplest thing to do here is to special case 7.10 to know about package keys (but in a very skeletal form).

ezyang added a commit to ezyang/cabal that referenced this issue Sep 10, 2015

Make InstalledUnitId (PackageKey) the primary identifier of packages.
In Backpack, it is useful for a Cabal file to contain multiple
components (henceforth referred to as units), which can be built
and installed individually.  However, it would be inappropriate to
identify these units by an 'InstalledPackageId' (since a package
can contain multiple units), so this patch set changes all relevant
occurrences of 'InstalledPackageId' to 'InstalledUnitId'.

NOTE: This is a lie, this patch actually renames the relevant
occurrences of 'InstalledPackageId' to the (existing) 'PackageKey';
the actual renaming will occur in a separate patch.

Simultaneously, this patch gets rid of the distinction between
'InstalledPackageId' and 'PackageKey' as much as possible (haskell#2745), so that
the 'InstalledPackageId' is (for non-Backpack packages) directly
equivalent to an 'InstalledUnitId' that can be used by GHC for
linker symbols and type equality.  This means that Cabal directly computes
'InstalledPackageId' during the configure step (very similar to how
'PackageKey' was computed.)  The upshot is that GHC and Cabal only ever
have to care about 'InstalledUnitId' (whereas prior to this patch you
had to juggle 'InstalledPackageId' and 'PackageKey'.)

The rest of the patch is to deal with the fallout of these two changes.
Here's the capsule summary of the rest of the changes:

    - 'id' is an InstalledUnitId; 'package-id' records the
      old (now non-unique) InstalledPackageId

    - 'depends' is now a list of InstalledUnitIds

    - New 'abi' field to record what the ABI of a unit is
      (as the InstalledUnitId is no longer computed by
      looking at the output of ghc --abi-hash).

    - The 'HasInstalledPackageId' typeclass is renamed to
      'HasInstalledUnitId'.

    - GHC 7.10 has explicit compatibility handling with
      a 'compatPackageKey' (an 'InstalledUnitId') which is
      in a compatible format.  The value of this is read out
      from the 'key' field.

    - An install path is based off of the installed package
      ID, replacing the "library name" (which was derived from
      a package key.) There's a new variable '$ipid', and
      the old '$pkgkey' and '$libname' are updated to use this
      variable.

    - "-inplace" IPIDs are completely retired.  An internal
      package ID is now simply pkgname-pkgversion.  You can
      use the --ipid flag to explicitly ask for a different
      inplace IPID.

    - When we register a non-inplace package, we check to see
      if we are clobbering an existing IUID, and bail out if
      we are.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
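
The last bullet of the commit message (refusing to clobber an existing IUID) could look roughly like this in Python. It is a toy model under stated assumptions: the package database is a plain dict, and `register` and `AlreadyRegistered` are hypothetical names, not Cabal's API.

```python
class AlreadyRegistered(Exception):
    """Raised when a non-inplace registration would overwrite an entry."""


def register(db: dict, unit_id: str, info: dict, inplace: bool = False) -> None:
    """Add a unit to a toy package database keyed by installed unit ID."""
    if not inplace and unit_id in db:
        # Bail out rather than clobber an existing non-inplace entry.
        raise AlreadyRegistered(unit_id)
    db[unit_id] = info
```

Inplace registrations are allowed to overwrite in this sketch, on the assumption that a development build is expected to re-register the same ID repeatedly.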

@ezyang

Contributor Author

commented Sep 23, 2015

See also #2830

ezyang added a commit to ezyang/cabal that referenced this issue Sep 24, 2015


@ezyang

Contributor Author

commented Mar 30, 2016

OK I think this is done.

@ezyang ezyang closed this Mar 30, 2016
