Extended Dependency Generation #245
Conversation
I have questions about this proposal as is:
I am totally for the goal, but suspicious of the details. Namely, this pre-/post- stuff does match what GHC does today, but it is a fundamentally impure formalism: GHC needs to read other files than its pre-depends, so there's no way to sandbox build steps up front. There's no way I could directly plug this info into Nix. The corresponding pure formalism is dynamic/monadic dependencies: some build steps, instead of creating regular file outputs, create more build plan. For example, for the TH case, GHC really ought to serialize its continuation as a dynamic build step with an extra dependency on the to-be-read file. Now it may just be easier to restart the job with the extra dep than to "really" serialize the continuation, but that can be viewed as an implementation detail. I would really like us to start with the pure formalism now, shoehorning the compiler in as needed in the short term but improving the fit as time goes on. CC @taktoa.
OK, here's a concrete proposal. GHC takes a list of paths which it assumes are up to date. If it only reads files within that set, the operation succeeds. If it needs to read files outside of that set, it fails, but provides the missing path in the dumped info. This is a small change, but it effectively mediates between GHC and the external build system.
This is still distasteful to me; it is not one of the nice, proper architectures for incremental computation. But …
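From the build system's side, the mediation loop this "fail and report the missing path" interface enables might look roughly as follows. This is only a sketch under stated assumptions: the `compile_once` and `generate` callbacks and the `missing-dependencies` field are hypothetical stand-ins invented here for illustration, not anything specified by GHC or this proposal.

```python
import json

def build_module(compile_once, known_inputs, generate, max_restarts=10):
    """Drive a compiler that fails when it reads outside the declared
    input set and reports the offending paths in machine-readable form.

    compile_once(inputs) -> (ok, report_json_str)  # stand-in for a GHC call
    generate(path)       -> brings `path` up to date (build-system callback)
    """
    inputs = set(known_inputs)
    for _ in range(max_restarts):
        ok, report_str = compile_once(inputs)
        if ok:
            return inputs
        report = json.loads(report_str)
        missing = report.get("missing-dependencies", [])
        if not missing:
            raise RuntimeError("failed for a reason other than missing deps")
        for path in missing:
            generate(path)   # make it, then retry with the larger input set
            inputs.add(path)
    raise RuntimeError("too many restarts")
```

Restarting with a grown input set is exactly the "easier than really serializing the continuation" shortcut mentioned earlier; the loop terminates once the declared set covers everything the compiler actually reads.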
On a completely different note, #243 along with https://gitlab.haskell.org/ghc/ghc/issues/10871 is very good for these sorts of things. Note that once all splices are run, GHC still has most of its work cut out for it, yet all the dynamism is gone. If we separate the TH cleanly, and serialize the post-splicing parsed source in one of those fat interface files, we can invoke GHC twice, where the second time it does most of the slower work. Note this all means we can be fine-grained in two dimensions: invoking GHC once per module, and twice per pipeline (two groupings of the pipeline stages). In particular, each TH stage can get its own cheap downsweep, and spliced module imports no longer ruin everything, since stage-separated imports limit the dynamism induced by them.
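The "twice per pipeline" idea above — run all splices first, persist the post-splice result, then do the expensive static work separately — can be modelled abstractly like this. Every name here is invented for illustration; it only shows the two-grouping structure, not any real GHC entry points.

```python
def two_phase_compile(module, run_splices, compile_post_splice, store):
    """Split compilation at the Template Haskell boundary.

    Phase 1 (dynamic): run splices; all build-plan dynamism lives here.
    Phase 2 (static): compile the serialized post-splice form; its
    dependencies are fully known by now, so it caches well.

    `store` stands in for persisted post-splice interface files."""
    key = ("post-splice", module)
    if key not in store:
        store[key] = run_splices(module)     # phase 1: only runs when stale
    return compile_post_splice(store[key])   # phase 2: pure, re-runnable
```

The payoff the comment describes falls out of the structure: as long as the phase-1 artifact is unchanged, phase 2 can be redone (or distributed, or cached) without re-running any splices.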
@Ericson2314, We discussed a bit over IRC; would this be an accurate change to the proposal: GHC will collect dependencies throughout compilation and report them all (even if compilation fails). If compilation fails due to a missing dependency (or dependencies), then the missing dependency(ies) should be reported as such. A "reasonable" effort should be made to report as many missing dependencies as possible before failing, where "reasonable" means avoiding doing too much extra work. The motivation for this is that a build system would then be able to generate the missing dependency and continue (in practice, restart) the build. Without this behaviour, the build system may have no easy way to discover and generate such dependencies.
Introducing some crude sandboxing is an interesting idea. It seems like an easy next step implementation-wise. IIUC this would be a way to check that the build system has a complete set of dependencies, which would help to identify bugs in a build system. It feels to me like this is slightly out of scope for this proposal; I imagine it would be implemented as a separate feature. Do you think it should be included in this proposal?
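To make "report missing dependencies as such" concrete, a dump along these lines would suffice. This is a purely illustrative sketch; none of these field names are specified by the proposal or implemented in GHC.

```json
{
  "module": "Main",
  "compilation-succeeded": false,
  "dependencies": [
    { "path": "src/Main.hs",    "missing": false },
    { "path": "src/Lib.hi",     "missing": false },
    { "path": "data/embed.txt", "missing": true  }
  ]
}
```

A build system seeing `"missing": true` entries can generate those files and restart the compilation, per the behaviour described above.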
I agree with that.
Actually, I no longer think that is needed. First, some background for everyone not on IRC at that time. When we say "fails due to a missing dependency", I think it's important to consider that a missing file and an out-of-date file are semantically both missing. With a traditional un-sandboxed build system, GHC has no way of knowing whether the file it reads is stale or not. Traditionally with … I really want to avoid this buggy statefulness. But that doesn't mean helping build systems that aren't sound be sound; that's their problem. I realized how the proposed stuff can work with Nix (or Nix + NixOS/rfcs#40) without changes, so that satisfies me. For anyone curious, here's how: …
So, again avoiding worrying too much about perfection, I'm satisfied that whatever GHC throws at us today can be shoehorned into soundness with a sufficiently crafty build system. I then hope such a build system can be used to speed up development of GHC itself, which will make it practical for the first time to undertake the major refactors that allow for much more idealistic schemes ("dank incremental GHC"). The better, in fact, begets the perfect.
In the scope of my summer internship at Tweag, this is very relevant. I'm working on improving support for Haskell projects in the Bazel build system. Specifically, I'm working on building a persistent worker wrapping GHC. I've been able to make a working prototype using just the GHC API, but this doesn't scale for real challenges such as incremental compilation. We haven't worked out all the details so far, but here is a rough sketch of why I need this proposal to happen. Bazel sends compilation requests to a resident process (the worker) with subsets of project contents as inputs, and expects the worker to build a target from those inputs (either a Haskell library or binary). In order to cache the results of serving such requests, we need to get a handle on the dependencies within those inputs and recompile in one-shot mode only what is needed.
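Stripped of Bazel's actual protobuf worker protocol, a persistent worker of this kind is essentially a loop that keeps compiler state warm between requests. Everything below is a schematic stand-in (not the real Bazel worker API or the GHC API), just to make the request/response shape concrete:

```python
def run_worker(read_request, write_response, compile_target):
    """Serve compilation requests from a resident process.

    read_request()  -> dict with "target" and "inputs" (path -> content
                       hash), or None when the build server shuts down.
    compile_target(target, inputs, state) -> (output, new_state)

    `state` stands in for whatever the worker keeps alive between
    requests: loaded interface files, caches, and so on."""
    state = {}
    while True:
        request = read_request()
        if request is None:
            break
        output, state = compile_target(
            request["target"], request["inputs"], state)
        write_response({"target": request["target"], "output": output})
```

The point of this proposal, from the worker's perspective, is supplying the dependency information that `compile_target` needs in order to recompile only the stale part of `inputs` instead of the whole target.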
@ulysses4ever you should talk to @DanielG, whose GSoC work on GHC is doing basically the same thing for the Haskell IDE engine instead of Bazel.
@Ericson2314 thanks for the heads up! We discussed this a bit with @mpickering, who also referenced that GSoC project. My current understanding is that there is a subtle but (in my view) essential difference between these two tasks. Namely, HIE serves compilation requests with per-file granularity, while Bazel sends compilation requests with per-subset granularity, expecting the server to produce a linked result (a library or binary).
@ulysses4ever By "subset" you mean a subset of all files, namely those that have been invalidated since the last invocation? That's a difference, but I don't feel like it's an essential one. Perhaps you need this proposal more than he does, but that's fine. I suppose my point was a more off-topic one: everyone benefits from a better division of labor in GHC between the "pure functional compilation" and the build-system-esque crawling of files and caching. Haskell-ide-engine and Bazel want to be completely in charge of the latter. Given the prominence of LSP, I can imagine a future version of Bazel that tracks individual spans, not just files, and likewise expects the persistent worker to take individual span updates. Then your and @DanielG's use cases converge.
By subset I mean a set of files in the project that are known to constitute a target (bin or lib), plus their hashes. From these data, the worker should decide by itself what is up to date and what is not since the last invocation. It also has to preserve intermediate artifacts like .o and .hi files. You may be quite right about convergence; I agree.
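Deciding "what is up to date" from file hashes amounts to diffing the hashes against the previous invocation and then dirtying transitive dependents through the module graph. A small sketch of that logic (the graph representation is invented here; in practice the edges would come from the dependency output this proposal asks GHC to produce):

```python
def modules_to_rebuild(old_hashes, new_hashes, dependents):
    """Return the set of modules that must be recompiled.

    old_hashes / new_hashes: module -> content hash
    dependents: module -> set of modules that import it

    A module is dirty if its hash changed (or it is new), or if
    anything it imports is dirty (propagated via `dependents`)."""
    dirty = {m for m, h in new_hashes.items()
             if old_hashes.get(m) != h}
    work = list(dirty)
    while work:
        m = work.pop()
        for d in dependents.get(m, ()):
            if d not in dirty:
                dirty.add(d)
                work.append(d)
    return dirty
```

When nothing changed, the result is empty and the worker can answer the request from its preserved .o/.hi artifacts without invoking the compiler at all.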
@phadej, Thanks for the questions via IRC. I'll try to respond here and in the next few comments. "How can we discover any dependencies before running CPP?" Good question! You are right. EDIT: I've decided to simply move the CPP deps to the pre-compilation dependencies. When implementing the …
"Why do we report plugins twice: as …" Yes, this is a good point. This was in the original proposal, and I left it there in case there are some other options that could introduce dependencies, but I think this is a poor justification. We don't want consumers of this information to have to parse GHC command-line arguments. If there is something from the options that introduces dependencies, then we should report those things explicitly, as we do with plugins.
"Plugins can declare if recompilation is necessary based on the plugin options (see plugins docs and proposal)." Thanks for pointing this out. It looks like we could incorporate this info when reporting the … EDIT: I've confirmed that plugins can in fact use `addDependentFile`.
@ulysses4ever, great to hear there is another potential use case! Is there anything in particular that you think is missing from the output we intend to generate?
As there seems to be little interest in adding a `DEPENDS` pragma, …
Remove `OPTIONS_GHC` pragma output. Note `addDependentFile` may come from plugins as well as TH. Remove all items in the unresolved questions section. A DEPENDS pragma is left as future work as there doesn't seem to be any interest in this.
The just-opened https://github.com/michaelpj/ghc-proposals/blob/white-box-interface-files/proposals/0000-white-box-interface-files.rst will help, in that if you do only some of the compilation, you should be able to "save your progress", not just output dependencies. I think it would want an untyped-HIE file for parsing, maybe also renaming, and certainly hi-wb files for type checking and desugaring. This would permit extremely fine-grained incremental compilation. hi-wb files can also be used to keep the hi files "free" of specialized and inlinable definitions, so type checking isn't unnecessarily invalidated. This combines well with #243 for removing TH, as I mentioned above, too. CC @michaelpj.
This information would be super helpful to Hadrian, I suspect (cc @snowleopard). At the moment, to compile N modules correctly requires running … My only query is: why allow globs in the files queried? I don't see why that is ever useful, and it does seem to complicate the design, not least by requiring a semantics for globs.
Related to the above, I'm thrilled that http://www.well-typed.com/blog/2019/08/exploring-cloud-builds-in-hadrian/ has brought up these filesystem access errors. I believe we can and must get that number to 0, and that is basically the criterion for a good design for this feature.
@ndmitchell Indeed, this should help Hadrian too! Let me also link this relevant discussion: |
@ndmitchell, the motivation to include globs was to capture cases where GHC might itself use glob patterns, but now that you mention it, I did a little searching and I suspect GHC never uses glob patterns to find dependencies. I'll remove the glob patterns from the proposal, as they don't seem necessary.
These were intended to capture the cases where GHC uses glob patterns, but I suspect GHC does not do that.
This is a very promising proposal; I would love to see it implemented. Also, @Ericson2314, thanks for your extensive comments; they have some valuable ideas that will take me some time to digest.
I'm following a request for feedback on this proposal. Thanks for pointing me here! And thanks to the authors for working on this! I'm working on rules_haskell, the Bazel extension for Haskell.

Bazel does not support dynamic dependencies in its Starlark API, so we cannot use dynamic dependency discovery to construct the dependency graph. However, Bazel does support pruning unused dependencies via …

Bazel builds are sandboxed, and all required dependencies have to be declared upfront; otherwise, they will be missing during the build step, causing build errors. Without dynamic dependencies it is difficult to achieve Haskell module-level granularity with Bazel build actions, and manually maintaining a module-level dependency graph is very cumbersome. One way around this is caching at the level of a persistent worker, as described above by @ulysses4ever. Another approach is …

In that context, something like …

In summary, I think …
@dcbaker we would love your input on the matter :)
I've given this a brief look. I'm sick at the moment, so I think I'll reserve my full analysis until I feel better and have a clear head :) I think this is the right direction, though for my use case (as a Meson developer wanting a plan that is language-agnostic) this is fairly GHC-specific. I should probably go through the exercise of implementing GHC support in Meson to familiarize myself with its command line and features a bit more :)
@DavidEichmann Where are we currently with this proposal? :)
Hello, it is now 2024 and I don't see much disagreement with the proposal. Can we move this forward in any way? It seems there are a lot of upstream consumers (Hadrian, Cabal, Stack) that could make good use of this feature. @DavidEichmann are you still available to implement? @dcbaker any further comments?
@doyougnu I don't think @DavidEichmann is actively working on Haskell things these days, and so he is unlikely to pick this up (though he's welcome to correct me). There is recent interest from others, though (https://gitlab.haskell.org/ghc/ghc/-/issues/24384 and https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11994), so hopefully that will result in some progress. Perhaps @TerrorJack or @wavewave may be able to say more about their plans. Stepping back a bit, this feature primarily needs implementation effort (and perhaps input from build system experts), rather than an upfront design as a GHC proposal. So it's not obvious to me that moving something through the GHC proposals process is necessarily the right way to make progress. (Though if someone working on this wants to write up a proposal as a way to articulate their design and gather feedback, they are certainly welcome to do so.)
This is a proposal to add a new GHC feature that would output detailed build dependency information in a machine readable format.
Rendered