Skip to content
This repository has been archived by the owner on Aug 2, 2020. It is now read-only.

Get rid of ghc-cabal and package-data.mk #18

Closed
snowleopard opened this issue Dec 21, 2015 · 67 comments
Closed

Get rid of ghc-cabal and package-data.mk #18

snowleopard opened this issue Dec 21, 2015 · 67 comments
Assignees

Comments

@snowleopard
Copy link
Owner

A large part of the build system is dedicated to dancing around package-data.mk files containing package-related data (such as package name, version, dependencies, etc.). These files are generated by utils/ghc-cabal program, which also needs to be built in a non-trivial way. See Rules/Data.hs and Oracles/PackageData.hs in particular.

Our long term plan is to eliminate this and get rid of ghc-cabal and package-data.mk files altogether. See Rules/Cabal.hs where Distribution.Package is used instead of ghc-cabal to extract package dependencies directly from .cabal files.

A potential solution will need to be carefully thought through and discussed. This is the thread to do this.

Let me formulate the rationale behind this more clearly:

  • Separation of build artefacts from sources: ghc-cabal creates package-data.mk files in a fixed location inside a package directory. This prevents us from moving build artefacts outside the source tree Move build products out of the ghc tree #113.
  • Performance: ghc-cabal always runs expensive configure scripts, significantly affecting performance of the whole build system. On top of that, parallel invocations of ghc-cabal are currently broken.
  • Complexity: Having a standalone ghc-cabal program built in a non-trivial way and package-data.mk files that need to be parsed significantly adds to the complexity of the build system and hence makes it more difficult to understand and maintain. Right now this is undoubtedly the most complicated and unreliable part of the build system.
@snowleopard
Copy link
Owner Author

Another example of accessing the information stored in *.cabal files directly: cf825fe

@angerman
Copy link
Collaborator

angerman commented Jan 3, 2016

My understanding is, that we would basically absorb ghc-cabal into the shake based build system, instead of shelling out to ghc-cabal?

@snowleopard
Copy link
Owner Author

@angerman Yes, that's the intention.

@snowleopard
Copy link
Owner Author

This is now being attempted in ghc-cabal branch. I merged @bgamari's PR #108 and now trying to compile. Something is wrong with deriving Generic.

@snowleopard
Copy link
Owner Author

@ndmitchell @angerman I'd appreciate your help with ghc-cabal branch if you've got time, as I won't be able to contribute much in the next 24h. The goal is to compile it and make sure it doesn't break the build. I'll then take over and use it to finalise #113.

@ndmitchell
Copy link
Collaborator

I'm not at a machine with MSYS/Mingw in the next 24 hours, so I'm not much use I'm afraid.

@angerman
Copy link
Collaborator

I'll give this a try.

@angerman angerman self-assigned this Jan 12, 2016
@angerman
Copy link
Collaborator

Alright, I've tried to give this a go. But I think this is not possible, I believe we run into this: https://ghc.haskell.org/trac/ghc/ticket/10514. We simply can not derive Generic for GADTs as it seems.
I.e. I couldn't even get this to compile in any way

data X a where
    A :: X Bool
    B :: X [String]

using deriving or standalone deriving.

  • deriving erring out with:

    Can't make a derived instance of ‘Generic (X a)’:
      Constructor ‘A’ has existentials or constraints in its type
      Constructor ‘B’ has existentials or constraints in its type
      Possible fix: use a standalone deriving declaration instead
    In the data declaration for ‘X’
    
  • Standalone Deriving erring out with:

    Can't make a derived instance of ‘Generic (X a)’:
      A must be a vanilla data constructor, and
      B must be a vanilla data constructor
    In the stand-alone deriving instance for
      ‘Generic a => Generic (X a)’
    

While I do not see Generic explicitly in @bgamari #108, even deriving instance Hashable (X a) wants Generic (X a).

@ndmitchell
Copy link
Collaborator

Can't you just write your own Hashable by hand? It shouldn't be too difficult.

@angerman
Copy link
Collaborator

Did I ever mention I have no idea what I'm doing? ;-)

We also need Binary. Let me see if I manage to build Hashable. There is however https://hackage.haskell.org/package/instant-hashable

@ndmitchell
Copy link
Collaborator

You can do deriving Show on a GADT, and then you can just make Hashable by doing hashWithSalt salt x = hashWithSalt salt $ show x. For Binary you need to undergo a bit more pain, and do it by hand.

@snowleopard
Copy link
Owner Author

This is what we have in Way.hs. Does this help?

-- Instances for storing in the Shake database
instance Binary Way where
    put = put . show
    get = fmap read get

instance Hashable Way where
    hashWithSalt salt = hashWithSalt salt . show

instance NFData Way where
    rnf (Way s) = s `seq` ()

@angerman
Copy link
Collaborator

@ndmitchell thanks!

@snowleopard if we had Read, that would work.

@angerman
Copy link
Collaborator

Hmm. We could of course just go ahead and implement read for each and every single one on our own ...

@ndmitchell
Copy link
Collaborator

@snowleopard, if you're going to do that, you might as well do: instance NFData Way where rnf = rnf . show, which again is entirely relying on read/show. Not sure how hard it is to do Read for a GADT, or if it's even possible with the given type signature.

@ggreif
Copy link
Contributor

ggreif commented Jan 12, 2016

You can (kind of) read a GADT, but the type index must be
existentially hidden:

data Hidden :: (* -> *) -> * where
  Hide :: gadt index -> Hidden gadt

So if you have

data MyGadt :: * -> * where
  C0 :: MyGadt ()
  C1 :: a -> MyGadt a

You can write a Read instance on Hidden MyGadt.

On 1/12/16, Neil Mitchell notifications@github.com wrote:

@snowleopard, if you're going to do that, you might as well do: instance NFData Way where rnf = rnf . show, which again is entirely relying on
read/show. Not sure how hard it is to do Read for a GADT, or if it's
even possible with the given type signature.


Reply to this email directly or view it on GitHub:
#18 (comment)

@ndmitchell
Copy link
Collaborator

I wonder if the GADTs are an essential part of this patch, or could be removed? They seem to be causing a lot of problems.

@angerman
Copy link
Collaborator

I wonder if the GADTs are an essential part of this patch, or could be removed? They seem to be causing a lot of problems.

Well. I must admit, I've hit a wall trying to get Binary to work.

@snowleopard
Copy link
Owner Author

I am pretty sure GADTs can be dropped at the cost of losing some type-safety.

@snowleopard snowleopard modified the milestones: tree-tremble, first-shake Jan 20, 2016
@snowleopard
Copy link
Owner Author

I've changed the milestone to tree-tremble. It looks like we won't settle this issue before first-shake, as I'd like to release it this week.

@ezyang
Copy link

ezyang commented Aug 15, 2016

Hey Cabal dev here. Are there things we can do to help? I think this is definitely a place where we can take changes to the Cabal library to make your life easier.

@snowleopard
Copy link
Owner Author

snowleopard commented Aug 16, 2016

Hey @ezyang!

Sure, any help would be great. Basically: we'd like to integrate Cabal into Hadrian in a more natural way: as a Haskell library, instead of calling ghc-cabal executable. That would greatly simplify Hadrian, get rid of stringly-typed oracle-ified interface to ghc-cabal, and will allow to optimise (re)builds, e.g. when there is no need for re-running configure scripts. If you'd like to try this I'd be happy to help make sense of the Hadrian codebase.

@ezyang
Copy link

ezyang commented Aug 16, 2016

I feel like we may have had this discussion before, but if you access Cabal directly as a library, how are you going to make sure Hadrian gets built against the most recent (bootstrap) version of Cabal? Assuming you have the correct version of Cabal, I think you just want a Shake rule for the configure step (not quite just configure though because we also need to run ./configure scripts), and then read out the things off of LocalBuildInfo. If you have a story for bootstrapping then I can probably do this part.

A more ambitious thing to do is completely Shake-ify Cabal's build system, and then slot in those rules into Hadrian. Probably not now!

@snowleopard
Copy link
Owner Author

how are you going to make sure Hadrian gets built against the most recent (bootstrap) version of Cabal?

We already use Cabal library (see src/Rules/Cabal.hs) to determine GHC package dependencies, and we build Hadrian using the in-tree Cabal by passing -i../libraries/Cabal/Cabal. So, I think this is a solved problem already. Have a look at *.sh, cabal.project, hadrian.cabal and stack.yaml files.

I think you just want a Shake rule for the configure step (not quite just configure though because we also need to run ./configure scripts), and then read out the things off of LocalBuildInfo.

Yep, this sounds right.

@ezyang
Copy link

ezyang commented Nov 17, 2016

So, I recently circled back to thinking about this problem.

The stated reason why we can't eliminate stage0/stage1/stage2 directories is because ghc-cabal hard codes where it places package-data.mk files. But I don't see any reason why we couldn't add a new flag to ghc-cabal to output the data, e.g., to stdout, where we could slurp it up directly, solving point (1).

I reviewed the PR at #166 (is there more significant code?) and I did not see any reason to believe that issues (2) or (3) could be addressed without significantly expanding the scope of the Cabal changes. Indeed, for example, see #170 for an example of where you MUST modify the Cabal library code, as it makes assumptions about the current working directory which you cannot easily manage in process. Another example is the calls to ./configure: that is managed by autoconfUserHooks and if you want to do caching you are going to have to drill down into this code.

To be clear, it would be very nice if we could rewrite Cabal's build system (in the Cabal library) in Shake. It's something that I've had my eye on for a while. But in my eye, it makes sense to shave that yak first, and then integrate those rules with Hadrian (we'll have to work a bit to make sure the two build systems compose.)

@snowleopard
Copy link
Owner Author

snowleopard commented Nov 17, 2016

@ezyang Thanks for your input!

But I don't see any reason why we couldn't add a new flag to ghc-cabal to output the data, e.g., to stdout, where we could slurp it up directly, solving point (1).

Yes, something like this is indeed possible, although ghc-cabal outputs not only package-data.mk files but also some autogen stuff. So, the new flag to ghc-cabal could specify the output directory for created files. Note though that running configure scripts in-tree causes other files to be built in-tree, which is still undesirable.

I reviewed the PR at #166 (is there more significant code?)

Yes, I believe this is the biggest effort on this.

Indeed, for example, see #170 for an example of where you MUST modify the Cabal library code, as it makes assumptions about the current working directory which you cannot easily manage in process. Another example is the calls to ./configure: that is managed by autoconfUserHooks and if you want to do caching you are going to have to drill down into this code.

Oh, I didn't realise that. So, you are saying the only way we can solve this issue is to change the Cabal library itself? That's definitely not what we had in mind.

To be clear, it would be very nice if we could rewrite Cabal's build system (in the Cabal library) in Shake.

I am a bit confused here. In this ticket we do not want to pursue this goal. All we want to do is to teach Hadrian to extract package metadata from .cabal files automatically, and perhaps occasionally run configure scripts of some of the packages that require that. We do not want to build anything using Cabal library, all building is (or rather should be) taken care of by Hadrian!

@Ericson2314
Copy link

Ericson2314 commented Nov 17, 2016

I am a bit confused here.

@snowleopard the idea is rather than Hadrian scraping text files, it could link Cabal directly and use it to generate shake rules---the same shake Cabal it itself would use. Very elegant!

@Ericson2314
Copy link

Also see haskell/cabal#4047 . The plan there would teach cabal-install about different ghc's for purpose's of cross compiling, but the same logic is equally applicable for bootstrapping. A shaked-up Cabal naturally leads to a Shaked-up cabal-install, yielding the same slick integration as described above. [I believe it's also a goal to factor out the solver from cabal-install as its own library.]

@snowleopard
Copy link
Owner Author

the idea is rather than Hadrian scraping text files, it could link Cabal directly and use it to generate shake rules---the same shake Cabal it itself would use. Very elegant!

@Ericson2314 I see, this would be great and elegant, but how soon is it going to happen? Do we want to depend on this future Cabal feature in this project? My intuitive answer is No.

I think you (and @ezyang) are saying that it just doesn't make sense to invest time in solving this issue until we have a better Cabal library, which does sound reasonable. Perhaps, we shouldn't bother indeed.

This issue is not on the crititical path for Hadrian to replace the old build system, so I'm fine if we decide not to pursue it until it becomes trivial to resolve with shakified Cabal.

@Ericson2314
Copy link

I'm not in a position propose whether a stopgap is worth it, just wanted to point out that eventually things can be really really slick.

Do we want to depend on this future Cabal feature in this project? My intuitive answer is No.

Do you mean depend on linking Cabal before Cabal is rewritten with Shake? I agree that would be silly. I assume from the rest of your post that you would like to use it once Cabal is so rewritten.

@snowleopard
Copy link
Owner Author

snowleopard commented Nov 17, 2016

Do you mean depend on linking Cabal before Cabal is rewritten with Shake?

@Ericson2314 Sorry, I didn't express myself well. I meant we don't want to wait for the shakified Cabal before releasing Hadrian as the main build system of GHC, i.e. we don't want to depend on it just yet.

I assume from the rest of your post that you would like to use it once Cabal is so rewritten.

Yes, it sounds promising.

@ezyang
Copy link

ezyang commented Nov 17, 2016

All we want to do is to teach Hadrian to extract package metadata from .cabal files automatically, and perhaps occasionally run configure scripts of some of the packages that require that.

Right. And unfortunately, to get the metadata, we have to run the equivalent of a ./Setup configure step, which is a big blob of code which, among other things, assumes that the CWD is the same directory as the package.

I see, this would be great and elegant, but how soon is it going to happen? Do we want to depend on this future Cabal feature in this project? My intuitive answer is No.

Yes, it is not going to happen for a while, so don't block on it. But on the other hand, I don't think it is an insurmountably complex project (ignoring Custom builds, for the moment.)

I might make an attempt at this over Christmas break. One big question I have, though, is how to setup any such build system so that it can be integrated with Hadrian. Hadrian has its own DSL going on for assembling command line calls and I wonder if that should be factored into a library that cabal-shake could use.

@snowleopard
Copy link
Owner Author

One big question I have, though, is how to setup any such build system so that it can be integrated with Hadrian. Hadrian has its own DSL going on for assembling command line calls and I wonder if that should be factored into a library that cabal-shake could use.

@ezyang Ah, yes! We do want to factor out the generic DSL part from Hadrian into a separate library. I didn't prioritise this but I can look into it soon (December-ish) if you would like to use it.

@ezyang
Copy link

ezyang commented Nov 18, 2016

It's not that urgent on my end; what I will probably do if I actually attack this problem is to just copy paste some of the low level combinators into my implementation, with the idea of dropping it for the library later. ;)

@bgamari
Copy link
Collaborator

bgamari commented Nov 3, 2017

This idea came up again in #395.

I have confirmed with @dcoutts that lib:Cabal indeed assumes that the cwd is the the root of the package being built. It doesn't sound like this is going to change in the near future.

@snowleopard
Copy link
Owner Author

Done in #531 🎉

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
GHC Transcendence
Ideas & Research
Development

Successfully merging a pull request may close this issue.

7 participants