GHC's base libraries: Combining stability with innovation #51

Merged
Changes from 19 commits
e6194f9
Initial conversion from google doc
Ericson2314 Jun 11, 2023
adff787
Update 050-ghc-base-libraries.rst
adamgundry Jun 11, 2023
e492131
Reword goals
adamgundry Jun 14, 2023
8ce3f1b
Minor formatting fixes
adamgundry Jun 14, 2023
9b89aa5
Apply suggestions from code review
Ericson2314 Jun 15, 2023
9f5a4e9
Merge pull request #2 from adamgundry/patch-1
Ericson2314 Jun 15, 2023
2dcf6b4
Remove idea on dependency analysis
Ericson2314 Jun 24, 2023
0ea3a27
Introduce 3 libraries in dependency not reverse dependency order
Ericson2314 Jun 24, 2023
934a740
Mention cabal description and readme under social means
Ericson2314 Jun 24, 2023
6a3b708
Mention reaching out to HLint team
Ericson2314 Jun 24, 2023
1e54527
Reword opening introducion
Ericson2314 Jun 26, 2023
7d9a71d
New number
Ericson2314 Jun 26, 2023
beecddf
A few more small changes from SPJ
Ericson2314 Jun 26, 2023
6ab9197
Remove trailing whitespace
Ericson2314 Jun 26, 2023
66c8f93
Add blank line
Ericson2314 Jun 26, 2023
d03211e
Improve dash
Ericson2314 Jun 26, 2023
f165b46
Add team affiliations (or lack thereof) to authors table
Ericson2314 Jun 26, 2023
4123e23
Fix Typo
Ericson2314 Jun 26, 2023
f3d8e74
Add link
Ericson2314 Jun 26, 2023
82263f1
Add additional link to prior discussion
Ericson2314 Jun 27, 2023
e709d67
Reword first sentence
Ericson2314 Jun 27, 2023
9cd683d
Fix Typo
Ericson2314 Jun 27, 2023
c6a8c28
New goals section from SPJ
Ericson2314 Jul 7, 2023
e75a82a
Move discouragement ideas to discussion
Ericson2314 Jul 7, 2023
8ca912a
Fix intra-proposal link
Ericson2314 Jul 7, 2023
54a55a3
Fix more reST issues
Ericson2314 Jul 7, 2023
489c809
Remove block quoting
Ericson2314 Jul 9, 2023
89a3796
Add section on reinstalling base
Ericson2314 Jul 9, 2023
c157f56
agreement -> consensus
Ericson2314 Jul 9, 2023
330 changes: 330 additions & 0 deletions proposals/accepted/051-ghc-base-libraries.rst
@@ -0,0 +1,330 @@
====================
GHC's base libraries
====================
-----------------------------------
Combining stability with innovation
-----------------------------------

:Date: June 2023
:Authors:
John Ericson — secretary,
Ben Gamari — GHC,
Adam Gundry — GHC,
Andrew Lelechenko — CLC,
Julian Ospald — CLC,
Simon Peyton Jones — GHC

.. sectnum::
.. contents::

Introduction
============

This document describes a plan agreed between the GHC Team and the Core Libraries Committee, setting out how we will work in partnership to reconcile the goals of innovation and stability for GHC and its ecosystem.

Formally, then, it is not so much a proposal as a record of an outcome.
We are, nevertheless, using the Haskell Foundation Technical Working Group proposals repo, so that we can have a permanent public record of how we plan to work, so that others can comment, and so that the document can be polished to add clarity where necessary.

Goals
=====

This proposal seeks to reconcile two goals, both of which are complex and multi-faceted.

1. **The Core Libraries Committee seeks to exercise its control over decisions affecting 'base' (API, performance, semantics), including its dependencies.**
Managing API changes of ``base`` is the CLC's primary mandate.
If ``base`` changes gratuitously, everyone suffers.

2. **The GHC team seeks the freedom to:**

- Innovate in the language design.
GHC has hundreds of extensions, and people suggest more all the time, via the GHC Proposals process.

- Move rapidly to fix bugs, improve performance, and refactor GHC's internals to pay down technical debt.

The two goals are at risk of conflict, and we have seen such conflict in the past.
The purpose of this proposal is to reconcile these goals, in a way that satisfies all parties.

Each of us places different emphasis on these goals, but respects them both.
The CLC does not want to constrain GHC unnecessarily; and the GHC team does not want to undermine CLC's efforts, indeed quite the reverse!

The proposal is based on `HF Proposal: standard library reform <https://github.com/haskellfoundation/tech-proposals/pull/47>`__, but is independent of it.

Things we all agree about
=========================

Here are some points that everyone agrees about:

1. ``base``, and other packages that come with GHC, should adhere rigorously to the PVP.

2. Any complicated package, certainly including ``base``, has implementation "internals" that it may want to expose to hard-core users, but not to regular clients.
The right pattern for accommodating this is described in `Nikita's blog post <https://nikita-volkov.github.io/internal-convention-is-a-mistake/>`__: have two packages, an "internals" one exposing the internals, and a "stable" one that exposes the stable API. Both adhere to the PVP.

3. We should use this pattern for ``base``.

4. The Core Libraries Committee curates the API of ``base`` (here is `the charter <https://github.com/haskell/core-libraries-committee#base-package>`__), including:

- Types, specifically including what instances are exposed

- Semantics (including strictness)

- Performance

- Semantic changes to documentation (the charter says *"Documentation changes normally fall under GHC developers purview, except significant ones (e.g., adding or changing type class laws)."*)

Beyond that, it has no interest in the implementation details (e.g. alpha renaming, moving things between modules, comments).

5. In curating the ``base`` API, it is immaterial where code lives.
For example, if a change to ``ghc-prim`` changes the ``base`` API (as defined above) the GHC developers must consult the CLC.
The fact that the change isn't physically part of the ``base`` package is immaterial.
(Incidentally, ``base`` and ``ghc-prim`` are both part of the same GitHub repository, which also includes GHC's source code.)

Proposal
========

We propose to divide ``base`` into three packages:

- ``ghc-internals``: exposes aspects of GHC's internals that may be of interest to "hard-core" developers interested in maximum performance (see `Nikita's blog post <https://nikita-volkov.github.io/internal-convention-is-a-mistake/>`__).
The API of ``ghc-internals`` is fully under the control of the GHC team, and of no direct interest to the CLC — only its effects on the API of base.

- ``base``: as now, whose API is curated by CLC.
Depends on ``ghc-internals``, and hence on ``ghc-bignum`` and ``ghc-prim``.

- ``ghc-experimental``, initially empty, depends on ``base`` and on ``ghc-internals``.
Comment on lines +86 to +92

OK, so the governance structure is:

  • base governed by CLC and CLC proposals
  • ghc-internals governed solely by GHC team, no proposal necessary
  • ghc-experimental governed by GHC proposal process

The Win here is that GHC Team can unilaterally make changes to ghc-internals without asking anyone, which allows them to iterate quickly. GHC Proposals can go in ghc-experimental without involving CLC directly. And CLC has less extraneous work. Feels like a win-win-win to me.

Functions and data types here are intended to have their ultimate home in base, but while they are settling down they are subject to much weaker stability guarantees.

Something I'm always wary of here is the tendency for things to go stale - not quite popular enough to move to base, but useful enough to stay in ghc-{experimental,internal}. Is there room for a policy on how long something should be "unstable" before getting shelved or promoted?

Contributor

I agree with your concern -- and it also applies to accepted-but-not-implemented GHC proposals.

For present purposes I think we should avoid scope creep for this document, and leave it as a matter for the GHC Steering Committee

100% agree that ghc-experimental should have a clear lifecycle, and ideally an "up-or-out" policy of some kind.

Generally, new functions and types introduced in GHC Proposals would start their life here.
Example: new type families and type constructors for tuples, `GHC Proposal #475 <https://github.com/ghc-proposals/ghc-proposals/pull/475>`__.

Another example: future APIs to access RTS statistics, which are fairly stable and user-exposed, but which are (by design) coupled closely to GHC's runtime and hence may change.

As its name suggests, the API of ``ghc-experimental`` is curated by the GHC team, although the CLC is willing to offer (non-binding) opinions, if consulted.

All three packages conform rigorously to the PVP.
(But see Section 5.3)
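
As a reminder of what PVP conformance means in practice, here is a minimal sketch of how the PVP reads a version number ``A.B.C``: ``A.B`` is the major version and ``C`` the minor version. A breaking API change requires a new major version, while a purely additive change requires at least a new minor version. The names ``Bump`` and ``classifyBump`` are invented for this illustration, and the version numbers in ``main`` are only examples.

```haskell
import Data.Version (Version, makeVersion, versionBranch)

-- | How two releases of one package relate under the PVP (A.B.C numbering).
-- Hypothetical type, introduced only for this sketch.
data Bump = MajorBump | MinorBump | PatchOrSame
  deriving (Eq, Show)

-- | Classify the step from an old version to a new one.
-- Missing trailing components are treated as 0, so 1 and 1.0.0 compare equal.
classifyBump :: Version -> Version -> Bump
classifyBump old new
  | major new /= major old = MajorBump   -- A.B changed: breaking changes allowed
  | minor new /= minor old = MinorBump   -- C changed: additions only
  | otherwise              = PatchOrSame -- API unchanged
  where
    padded v = versionBranch v ++ repeat 0
    major    = take 2 . padded
    minor v  = padded v !! 2

main :: IO ()
main = do
  print (classifyBump (makeVersion [4,18,0,0]) (makeVersion [4,19,0,0]))
  print (classifyBump (makeVersion [4,18,0,0]) (makeVersion [4,18,1,0]))
```

Under the proposed split, a breaking change confined to ``ghc-internals`` would be a major bump for that package alone, provided the API of ``base`` is unaffected.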

Some observations about this structure:

- We should use all possible social and technical means to discourage people from depending directly on ``ghc-internals``, because if such dependencies become frequent and ossified, it will lead to future pain when the API changes.
Saying "we told you not to rely on it" may be true, but won't lessen that pain.

This part of the proposal feels orthogonal to the rest of the proposal, and I think it can be removed without impacting the benefits of the proposal.

If ghc-internals makes a breaking change, then per PVP, it will do a major version bump. This means that depending on ghc-internals has the same pain points as any other library - lax version bounds and a new release may break your code, or strict version bounds and then you have to make lots of Hackage revisions. I don't think providing a special case for ghc-internals makes much sense here, since we do not do that for ghc, ghc-prim, or any other package.

I also don't feel like this part of the proposal is well-motivated - specifically why ghc-internals should be treated separately from a potential text-internals or bytestring-internals.

From my perspective, writing build-depends: ghc-internals is just as prone to pain as any other unbounded dependency. And build-depends: ghc-internals == 9.6.4.* should be perfectly safe, unless PVP is violated. But the point of this proposal is that we can make "breaking changes" to ghc-internals as a major version bump without incurring a major version bump for base - in other words, so we can be PVP compliant.

Contributor

I also don't feel like this part of the proposal is well-motivated - specifically why ghc-internals should be treated separately from a potential text-internals or bytestring-internals.

It's really a response to a legitimate CLC concern that if it's too easy to depend on ghc-internals then that's what people will do. And if that happens we might grow an ecosystem of libraries many of which depend, perhaps by accident, on ghc-internals. So every new release of GHC would force updates to all of those libraries. You could say "if you depend on ghc-internals then that's what you get", but the pain is real and not mitigated by saying "I told you so". To put it another way, it risks entirely bypassing the stability efforts of the CLC.

So that is, I think, the reason for this text. You are right that all the same issues apply to any -internals package, and one might wonder about mechanisms to discourage depending on them. But to avoid scope creep in this document we just stuck to ghc-internals.

Contributor

Discouragement is well motivated. Prohibition isn't.

We have to educate users more generally about the pattern we're trying to introduce here and how it relates to PVP, API stability, etc.

I could imagine that we try to summarize the result of these efforts for end users and library maintainers in a more distilled way and explain the pattern there in more depth.

I don't think we have to be particularly scared about ecosystem degrading.

  1. Hackage has a reverse deps feature and we can monitor packages depending on ghc-internals easily.
  2. There are many eyes on popular packages like aeson, servant, conduit etc. and I'm sure someone will scold any attempt at introducing a ghc-internals dependency
  3. Even if a package accidentally does so, it's not irreversible. We can scold the maintainer and ask why they saw the need to do so.
  4. "Pick your libraries carefully" already applies. Well educated maintainers who care about the ecosystem will play nicely.

@simonpj I think that makes good sense. I want GHC to be able to innovate freely and iterate quickly, and I wouldn't want CLC to start encroaching on the ghc-internals package for this reason. Where I think I'd draw the line is at describing the problem, noting it is a known ecosystem-wide problem that deserves a good solution, and somehow put a reminder on the package to use that solution when it is implemented.

There are a few perspectives here, and I think it'd be good to be really clear about which perspective a bullet is trying to satisfy. This one appears to be trying to satisfy both "industrial users want fewer breaking changes when upgrading major versions of GHC" and "CLC wants to be able to promote stability for the ecosystem" - which are overlapping but subtly different concerns.

As an industry user and OSS contributor, the primary pain point of upgrading GHC is that there are breaking changes that require a large chunk of the ecosystem to be touched. base is usually a relatively small part of this pain, with ghc-prim and template-haskell causing far more issues. I don't expect that ghc-internals is going to meaningfully impact the work that gets done here.

If ghc-internals is banned by cabal-install, then I don't see how it is different from un-exposed modules in base. If ghc-internals is banned on Hackage, then industrial users will need to vendor code in order to share libraries that depend on it - which increases the pain of a major upgrade by requiring applications to depend on ghc-internals directly, rather than sharing a library which can abstract over multiple versions of ghc-internals via CPP or other techniques. If there are warnings, they will be ignored for good and bad reasons, and maintainers will have another package to consider when doing ecosystem upgrades, but the overall workload doesn't change much.

As a CLC member, ghc-internals makes my life easier by having fewer changes to base, and changes to base that can be backed up by real ecosystem use in ghc-internals. But if using ghc-internals is heavily discouraged, then it's difficult to identify whether or not something in that package is satisfying the needs of the ecosystem.

Summarizing,

  • I don't think we should use strong wording like "all possible social and technical means."
  • I'm not even sure we need to discourage beyond calling the package ghc-internals and writing docs that it'll be a major version bump with breaking changes every GHC release.
  • I definitely think that proposing specific technical means for discouraging folks is out-of-scope here.

Collaborator

The more I think about it, the more I am convinced that "all possible social and technical means" is too strong. It is obviously not literal. We are not going to hire a crack commando unit who survive as soldiers of fortune to go to your house if you depend on it.

Better language might be "develop both social and technical mechanisms to discourage..."

Contributor

@simonpj simonpj Jun 28, 2023

@parsonsmatt thanks, but I'm still not quite getting this. (It's surprisingly hard to explain all this accurately, as we are all finding!)

  • I do think we should discourage import of ghc-internals. For example, if the same function is available through base you should definitely use it from there. Why? Because the GHC team might simply move it around to a different module in ghc-internals -- it's "just" an implementation matter, after all. But that would break your code.

  • I think we all agree that "discourage" does not mean "prevent" or "ban".

  • I'm entirely open to changing the words that express "discourage". The "all social and technical means" is a direct quote from @Bodigrim, who may have a view here.

    At very least "discourage" must include "cannot happen except by conscious choice", so that uninformed users don't casually depend on a function from ghc-internals. Eg. if it's also available from base, that option should be presented more prominently somehow.

  • I think that one process you have in mind is this:

    • CLC notices that lots of libraries are importing GHC.Foo( wombat ) from ghc-internals.
    • Even though depending on ghc-internals is discouraged, they still push through that pain barrier, because there is no alternative to wombat.
    • So CLC thinks "based on this evidence, perhaps we should adopt wombat into base".

    I think that's a great plan. But it's entirely compatible with (indeed somewhat based on) discouraging use of ghc-internals.

Would you like to propose some alternative concrete form of words? (We could move the discussion of possible mechanisms into Section 6, i.e. plainly rumination around the theme rather than part of the core plan.)

What mechanisms could we use?

- The name ``ghc-internals`` is a pretty strong signal all by itself.

- The Cabal description and README explain how it is intended to be used (and not used).

- Hoogle could (by default anyway) never show stuff from ``ghc-internals``.

- Do not upload Haddocks for ``ghc-internals`` to Hackage.
(Ditto ``ghc-prim``.) We need to make sure that anyone who follows a Haddock source-code link to (say) Functor still finds it, regardless of where it is actually defined.

- We could consider issuing a warning if you say ``-package ghc-internals`` (or ``ghc-bignum`` or ``ghc-prim``), one that was hard to silence.
Since we can have module-level ``WARNING`` pragmas with custom categories, one way to realise this would be to pick a category and add such pragmas to every module in the relevant packages, though we might want to do something more systematic.

This sounds excellent. Much like "partial", "internal" seems like a fairly natural warning category that other people in the ecosystem might want to use. And it's easy for a user to say "I know what I'm doing" by just turning off that warning category in the module where they use internals.

Collaborator

@gbaz gbaz Jun 29, 2023

There's the category field in cabal package files which is pretty unstructured and all over the map. We could easily initiate a convention of the internal category actually meaning something -- e.g. triggering an extra textual notice on hackage pages listing the package, and extending cabal so it could maybe optionally filter or warn on such stuff.

That seems like more work for less fine control? If we already have the mechanism in GHC at a module level, that seems sufficient to me.

Collaborator

Hackage itself cannot and should not read module-level pragmas, nor can cabal. This proposal is about having some packages be internal and some not. So we need package-level mechanisms for marking things internal, such that both hackage and cabal can take appropriate action (perhaps additional html, like a big red warning box on hackage, and for cabal the ability to warn on using internals as a direct rather than indirect dependency, etc)

Contributor

and for cabal the ability to warn on using internals as a direct rather than indirect dependency, etc

From what I understood people are also worried about indirect dependencies (coming from things other than the original package), because they may easily lag behind upgrades and bubble up churn.

Collaborator

Yes, that would be another thing that would be nice to address. The category field does not address that, but seems useful nonetheless, and I would encourage its consideration regardless.

(Sorry, I realised that what I was saying was ambiguous based on the line I was commenting on: I'm in favour of a module-level INTERNAL warning, added manually to source files, not magically added to packages by cabal or Hackage)

The text of the warnings could encourage users to

- switch to a function exposed by base, and/or
- petition the CLC to expose this super-useful function from base.

- ``cabal check`` (a per-package check) could warn on packages that use ``ghc-internals``.

- ...what else?
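
To make the ``WARNING``-pragma idea concrete, here is a minimal sketch of a module carrying a module-level warning in a custom category. It assumes a GHC recent enough to support warning categories (9.8 or later). The module name, the category ``x-ghc-internals``, the message text, and the function ``wombat`` are all invented for illustration; none of them is part of the agreed plan.

```haskell
-- Hypothetical internal module: every name here is made up for this sketch.
module Hypothetical.Internal.Wombat
  {-# WARNING in "x-ghc-internals"
      "This module is a GHC implementation detail and may change in any release. Prefer the export from base, or ask the CLC to expose it." #-}
  ( wombat
  ) where

-- | A stand-in for some internal function that clients might be tempted to use.
wombat :: Int -> Int
wombat n = n * 2
```

A client that imports this module would see the message at compile time. A client that has made a conscious choice can silence just this category, for example with ``{-# OPTIONS_GHC -Wno-x-ghc-internals #-}`` in the importing module, without turning off other warnings.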

We should use all possible social and technical means to discourage people from depending directly on ghc-internals

Really all? Including GHC/Cabal simply refusing to allow packages called ghc-internals if the package currently built isn't base or another internal whitelisted package?

But it says “discourage”, not “prevent”, which sounds reasonable.

Maybe more realistic: hackage could reject uploads of packages that depend on ghc-internals?

Contributor Author

But it says “discourage”, not “prevent”, which sounds reasonable.

Yes that is key. If people want to tinker with the internals of some open source software, we should not stop them. But we should make sure they are very aware they are stepping outside the stable safe zone. And likewise we ought to someday make anyone that uses the tinker's software (transitively!) also aware they are exposed to instability.

Contributor Author

(The difference between ghc-prim and ghc-internals also isn't clear. But one answer is

GHC devs can structure (and restructure) their internals however they like, including dividing definitions arbitrarily between ghc-prim and ghc-internals if they so desire, so this question doesn't need to be answered in this document.

Another answer is

ghc-prim is (mostly, perhaps should be entirely) auto-generated docs for code that doesn't actually exist in textual form (it's just hard-coded in GHC), whereas ghc-internals is human-written code.

)


- In contrast, clients are *not* discouraged from depending on ``ghc-experimental``; although again its name should convey the idea that it might change at short notice.

``ghc-experimental`` allows the GHC Steering Committee to make initially-experimental language changes, which often involve new types and functions, without committing to permanently supporting the precise API, since it often takes a little while for these designs to settle down.

The existence of ``ghc-experimental`` should substantially ameliorate a recurring difficulty: many GHC Proposals have a library-function component whose API is unlikely to be *stable* (having just been invented), and which is therefore in conflict with the CLC's goals.

As they become stable, the CLC may want to consider adopting the new types and functions from ``ghc-experimental`` into ``base``.
(But CLC would not expect to curate the API of ``ghc-experimental``.)

- Perhaps ``ghc-experimental`` should be in the purview of the GHC Proposals process.
GHC devs should not just make up random APIs and pop them into ``ghc-experimental``; a scrutiny process would be valuable.

- Under this proposal, there is initially no change (whatsoever) to the API exposed by ``base``, or its performance characteristics.
The impact on clients should therefore be zero.

Over time, the GHC developers may make CLC proposals to remove types and functions that are currently in the ``base`` API, but are in truth part of GHC's implementation, and were originally exposed by historical accident.
But these are *future* proposals.

To make the transition suggested in these future proposals easier to manage, we have in progress a `"deprecated exports" <https://github.com/ghc-proposals/ghc-proposals/pull/595>`__ mechanism.
For a transitional period, ``base`` can continue to export the function, but with a deprecation warning saying something like:

This is going to disappear from base.
You probably don't want to use it at all.
But if you absolutely must, get it from ``ghc-internals``.

Contributor

We can't say in the same document that we should use "all means necessary" to avoid ghc-internals dependencies, and also recommend people to import a function from there.

Suggested change
But if you absolutely must, get it from ``ghc-internals``.
If you absolutely must, define it yourself locally, or if this is impossible, let the CLC and GHC team know!

Contributor Author

I would expect ghc-internals to mainly contain things one cannot define locally? I would expect also some zany integrate-with-ghc-so-tightly-someone-turns-blue projects to always need to use ghc-internals too. I think it's less important to deny that category exists, than emphasize that the vast majority of projects are not in that category.

Put another way, I'd expect those projects to know who they are and not reach out to GHC devs or the CLC, and I'd expect almost everyone who does reach out to be told "actually, you don't need to do things that way". Is that what you were thinking too? Or were you imagining we might sometimes move something to ghc-internals (without a reexport elsewhere) that shouldn't be there, by mistake?

Contributor

If such projects are relying on functionality from base that GHC developers and the CLC both agree should be moved to ghc-internals, they ought to reach out and say "hey, we use this for such-and-such", and make a case for it to be put in ghc-prim instead.

The point here is to make, FAIAP, a "hidden" library containing code in the strictest intersection of "compiler bootstrap" and "necessary to allow general-purpose programming": stuff like the guts of IO, datatype representations, and the like. There should be nothing in ghc-internals that is both:

  • useful for general programming, and
  • not exported from base, ghc-prim, or ghc-experimental.

Contributor Author

I agree with the main "There should be nothing in ghc-internals that is both: useful for general programming, and not exported from..." point, but do note that I don't think anyone intends ghc-prim to be any more user-facing than ghc-internals. Perhaps the proposal should make this more explicit?

Contributor

not exported from base, ghc-prim, or ghc-experimental.

This should be "not exported from base or ghc-experimental". ghc-prim is every bit as "internal" as ghc-internals. Anything in ghc-prim that is useful for general programming should be exposed by base or ghc-experimental. Indeed maybe we should make that clearer.


- To expose a new function from ``ghc-internals`` requires that any functions on which it depends are also in ``ghc-internals`` (not ``base``).
So we may need to move code from ``base`` to ``ghc-internals``, leaving a shim behind in ``base``.
In practice, that may mean that quite a lot of code will move into ``ghc-internals`` quite quickly.
But that's fine: *it is just an implementation matter*: provided the modules, exports, and API of ``base`` are maintained, it is immaterial to clients (and hence to CLC) exactly *how* they are maintained.

- This proposal is fully compatible with, and actively supports, the `CLC charter <https://github.com/haskell/core-libraries-committee#base-package>`__:

The primary responsibility of CLC is to manage API changes of ``base`` package.
The ownership of ``base`` belongs to GHC developers, and they can maintain it freely without CLC involvement as long as changes are invisible to clients.
Documentation changes normally fall under GHC developers purview, except significant ones (e.g., adding or changing type class laws).

- It also supports GHC innovation, by

- allowing GHC freedom to change aspects of its implementation

- allowing the GHC Steering Committee to add new functions and types in ``ghc-experimental``.

- One might wonder why GHC has three "internal" packages: ``ghc-internals``, ``ghc-bignum``, and ``ghc-prim``? Could they not be a single package? Answer: technically yes, but it helps to keep dependencies and responsibilities clear.
And it's purely an internal GHC matter; if the team wants to structure GHC's internals with three packages, or ten, that's up to them.

Continuous integration
======================

A major difficulty is **knowing when the API of 'base' (as defined in Section 2) has changed.** A change requires CLC approval; but how do we know what commits (to ``base``, to ``ghc-internals``, to ``ghc-prim``) make such a change?

In the past we have relied on best efforts; but with a bunch of volunteers, mistakes will be made.
And mistakes can lead to a loss of trust.

The solution is obvious: we need to automate.
We therefore propose the following, as part of CI:

1. Compile a good chunk of Hackage (around 500 packages) against ``base``.
We already do this, and it is a huge help in reassuring ourselves that a change does not lead to accidental breakage.

2. Test whether any of the types (including their kinds), functions (including their types) and instances exposed by the ``base`` API are accidentally changed by a commit.
This is definitely going to happen, soon: @bgamari already has a prototype.
Comment on lines +173 to +174

This tool would be amazing to have for other libraries - I'd love to have something here so I could know for sure if a new release is versioned appropriately.

I think this already exists as https://kowainik.github.io/posts/policeman-bristol - or is what you're looking for something else?

@bgamari bgamari Jun 28, 2023

policeman is a good approximation, but when I examined it I found a few shortcomings that made it challenging to use in GHC:

  • it makes no attempt at comparing anything beyond the names of exports; that is, the types of bindings, exposed instances, MINIMAL declarations of classes, and the like are not accounted for in the PVP assessment
  • there isn't a clear way to handle packages with platform-dependent exports; base is (unfortunately) such a package
  • it relies on .hie files, which we currently don't produce in the GHC build

Happily, dumping the declarations of a package is quite straightforward. I suspect someone could turn GHC's test into a useful Hackage package (or fold it into policeman).


3. Run the test suite of each package whose test suite

(a) is usable (e.g. that doesn't take too long to run),

(b) does not have dependencies that are outside the set mentioned in point (1), and

(c) passes before the change to GHC/``base``.

This checks semantics as well as types.

4. Run the performance test suite of some carefully chosen packages.
This checks for performance regressions.
Similar to (3), except that perf suites are less common and often more expensive to run.

5. Develop a new suite of performance tests, specifically for base.
This is quite open-ended; it is not clear what would be desirable, or how much it would cost.
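
The API check in step 2 can be pictured as a diff over two dumps of exported declarations. The following is only a minimal sketch under strong assumptions: it is not the prototype mentioned above, ``ApiDump``, ``Change`` and ``diffApi`` are hypothetical names, and each export is modelled as nothing more than a name paired with its pretty-printed type.

.. code-block:: haskell

   module Main (main) where

   -- Hypothetical, much-simplified model of an API dump: each exported
   -- name is paired with its pretty-printed type (or kind).
   type ApiDump = [(String, String)]

   data Change
     = Removed String
     | Added String
     | TypeChanged String String String  -- name, old type, new type
     deriving (Eq, Show)

   -- | List every difference between an old and a new API dump.
   diffApi :: ApiDump -> ApiDump -> [Change]
   diffApi old new = concatMap check old ++ additions
     where
       check (name, oldTy) = case lookup name new of
         Nothing -> [Removed name]
         Just newTy
           | newTy == oldTy -> []
           | otherwise      -> [TypeChanged name oldTy newTy]
       additions = [Added name | (name, _) <- new, name `notElem` map fst old]

   -- | A CI job could fail, or notify the CLC, when this returns False.
   apiUnchanged :: ApiDump -> ApiDump -> Bool
   apiUnchanged old new = null (diffApi old new)

   main :: IO ()
   main = do
     -- Made-up exports, purely for illustration.
     let before = [("foo", "Int -> Int"), ("bar", "[a] -> a")]
         after  = [("foo", "Int -> Int"), ("bar", "[a] -> Maybe a"), ("baz", "Bool")]
     mapM_ print (diffApi before after)

A real check would also need to cover instances, class ``MINIMAL`` pragmas and platform-dependent exports, which this sketch ignores.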

Some modules in ``ghc-internals`` will very directly affect exports of ``base`` (e.g. via a shim).
These modules could be identified, via the existing ``CODEOWNERS`` mechanism, to ping CLC on any commit to those modules.
This list could be selective, or include all of ``ghc-internals``, at CLC's preference.

Some of these are cheap to do; others are less so.
Fortunately the HF seems willing to help.

*But whatever we do here will be a step forward* from our current, unsatisfactory situation.
Moreover, these checks will help with CI for changes to GHC itself! (It is rather *more* likely that a commit to GHC's simplifier will cause a perf regression in some package, than a commit to ``ghc-internals``.)

Discussion
==========

GHC Proposals process
---------------------

Some GHC proposals (a minority) directly affect the existing API of ``base``, and are not simply additions that can be exposed in ``ghc-experimental``.
It is unproductive for the GHC Steering Committee to have a long discussion, accept the proposals, and only *then* involve the CLC.

We propose that:

- A GHC Proposal should advertise, in a separate section:

- What changes, if any, it makes to ``ghc-experimental``

- What changes, if any, it makes to ``base``

- If there are any such changes, the author (and shepherd) should explicitly invite the CLC to participate in the discussion about the proposal.
The CLC will devote some effort to participating and, in the case of changes to ``base``, will subsequently hold a non-binding vote.

- Approval of the proposal (by the GHC Steering Committee, with the non-binding vote of CLC) is not a guarantee that the final implementation will land;
that depends on the implementation being well engineered (a matter for the GHC team),
and the implementor should still make an explicit proposal to the CLC specifying the precise changes.

Abstraction leakage
-------------------

We may foresee a couple of ways in which changes in ``ghc-internals`` could become client visible:

- Occasionally, an error message may mention a fully qualified name for an out-of-scope identifier.
For example (GHC test ``mod153``)::

    Ambiguous occurrence ‘id’
    It could refer to either ‘Prelude.id’,
    imported from ‘Prelude’ at mod153.hs:2:8
    (and originally defined in ‘GHC.Base’)
    or ‘M.id’, defined at mod153.hs:2:21

The "originally defined in" mentions a module; and if that module is in a package that is not imported, GHC will package-qualify the module name.
And seeing ``ghc-internals:GHC.Base`` is perhaps less nice.
This is not a new problem: we already package-qualify modules in ``ghc-prim``.
One solution is to remove the "originally defined in ..." parenthesis for types and functions that would require such package qualification.
Comment on lines +257 to +269

IMO a lot of this would be helped by tracking the import/re-export provenance more directly. "Imported from A and originally defined in E" is nice but it's also often very nice to know the exact chain of imports and re-exports that brought something into scope. GHC could then have a flag to trim this provenance information to the direct dependencies of a package. That way, if you find yourself depending on ghc-internals:SecretType, you can see exactly how it got introduced into your codebase.

In asking ghc-proposals/ghc-proposals#595 (comment) I was wondering something similar: ideally we would know the entire provenance of imports and (re)exports, and there are a number of things we could do with it.


- Another form of leakage could be: a new class in ``ghc-internals``, *not exposed in base*, that is given instances for existing data types.
There is a risk that those instances might confusingly be visible to clients of ``base``.
If so, the CLC should at least be consulted.
Comment on lines +271 to +273

Wouldn't you need to import the relevant class to have these instances be visible? And if it's only exposed in ghc-internals, wouldn't that require an explicit import of the class?

This may just be an artifact of how instance visibility works across component boundaries, though.

Not necessarily. The primary way that this exposure might happen is via Haddocks. For instance, imagine that we have a class, exposed via ghc-experimental (and not in base) but defined in ghc-internals, of which some type exposed via base is an instance. Under haddock's current logic, this instance would be shown in the documentation for base, despite the class itself not being visible in base.


These issues concern error messages and documentation, neither of which is in the direct scope of CLC.
They are not new because we already have ``ghc-prim``.
They may not be show-stoppers, but we should be thoughtful about mitigating them.

Versions and backports
----------------------

We agree that the version number of ``ghc-internals`` may have a major bump between minor releases of GHC.
(Why? Because fixing a bug may require changing something in ``ghc-internals``.)

This makes an exception to a general rule: a minor release of GHC (say 9.6.4), which only fixes bugs, never makes a major version bump to ``base`` or indeed to any boot package.

We should discuss this (rather important) exception with the Stackage curators.

But this same issue could in principle affect ``base`` too.
Very occasionally a **bug-fix** might involve a change to the user-visible API.
Example: `role annotations on SNat <https://github.com/haskell/core-libraries-committee/issues/170>`__ (although there is a debate as to whether this specific change constitutes a "breaking change" under the PVP).

Under these circumstances we (together) will have to decide whether to

- Backport the fix, and not bump the major version of ``base`` (i.e. bend the PVP), or
- Bump the major version of ``base``, but therefore be unable to fix the bug in the released GHC.

This is a decision for the CLC.
See PVP issue https://github.com/haskell/pvp/issues/10.
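
To make the notion of a "major bump" concrete: under the PVP the first two components of a version number (``A.B``) form the major version. The sketch below, with the hypothetical helper names ``majorOf`` and ``isMajorBump`` and made-up version numbers, shows how a release script might detect such a bump; it is an illustration, not existing GHC tooling.

.. code-block:: haskell

   module Main (main) where

   import Data.Version (Version, makeVersion, versionBranch)

   -- | The PVP major version: the first two components (A.B),
   -- padding with zeros when a version has fewer than two components.
   majorOf :: Version -> [Int]
   majorOf v = take 2 (versionBranch v ++ repeat 0)

   -- | True when moving from @old@ to @new@ raises the major version.
   isMajorBump :: Version -> Version -> Bool
   isMajorBump old new = majorOf new > majorOf old

   main :: IO ()
   main = do
     -- Hypothetical version numbers, purely for illustration.
     let released = makeVersion [1, 2, 0]
         bugFix   = makeVersion [1, 2, 1]
         breaking = makeVersion [1, 3, 0]
     print (isMajorBump released bugFix)
     print (isMajorBump released breaking)

A check of this kind could flag a minor GHC release that ships a major bump of a boot package, so that the exception described above is always a deliberate decision.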
Comment on lines +293 to +299

This is a technical limitation of the way that base and GHC work, and would in principle be fixed by allowing a GHC version to work with multiple versions of base, right?

Are you proposing (eventually) to continue to ship the base with the bug alongside a new one without the bug? That would work. But I think the point in this case was that some bugs are not a matter of implementation but inherent to the interface itself.

(I remain very pro multiple base versions with one GHC, to be clear :).)

I think I'm confused - can a released GHC use a different version of base? I was under the impression that the base <-> GHC relationship was fixed, and every version of GHC can only use a single version of base. That means that GHC X.Y.Z must always use the same base version, and a new base version means a new release of GHC.

(Going on a tangent. This is not what this proposal is about.)

I think there is much confusion around this topic because people confuse GHC the software with GHC the official bindist.

The answer is Yes, and it's not even a new feature. See https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Unit/Types.hs#L562.

The short version is that one is liable to shoot oneself in the foot with more than one base in a package database, but one is always free to have multiple package databases that will not mix together. In each "parallel universe" of the separate package database, there is only one base, and no potential for confusion.

GHC in fact has no way of knowing whether this is the "official base" that was shipped with the bindist, or one's own. How could it know? (It isn't shipped with trusted ABI hashes for base, for example; on the contrary, it expects base to have no ABI hashes. It is maximally trusting of whatever base you give it!)

For a variety of reasons, no one has yet proposed we take advantage of this. But we can. There is no technical limitation, just the ergonomic/human problem of it being easy to get our package databases with their separate bases mixed up.

(I believe @alt-romes might be working on getting all wired-in packages ABI hashes to solve the humans-getting-confused problem, but I am not sure about that.)

@simonpj simonpj Jun 28, 2023

(I remain very pro multiple base versions with one GHC, to be clear :).)

Indeed -- I think there is a consensus in favour of that!

But, to be clear, achieving that goal is not part of the current document. So this thread is somewhat "by the way".

Still, the plan outlined in this document should make the goal of a reinstallable base much more achievable. GHC knows (wired into its binary) where, say, GHC.Base.map is defined. If we move that definition from one module to another, everything will stop working. I think that is one reason that base and GHC are so tightly bound together. But if GHC.Base.map was in ghc-internals then GHC and ghc-internals would remain tightly bound together, but base would be much more loosely coupled. I don't yet see any technical obstacle to installing a fresh base without changing ghc.

I could be wrong; but all that matters for this conversation is that the plan here is a step in the right direction for the reinstallable-base goal.

Fantastic, thanks for the clarification!

Allowing base to iterate without incurring a GHC release cycle would be a big win for the ecosystem, for sure.


New classes
-----------

Suppose the author of a new library ``l`` defines a new class ``C``.
Good practice is for them to define an instance of ``C`` for all types in boot packages (packages needed to build GHC and Cabal).

Should ``ghc-experimental`` be considered a boot package in this sense?
After all, type ``S`` in ``ghc-experimental`` may change, which would break ``l``.
Agreed answer: no.
That is, we do not make it best-practice for library authors to give ``C`` instances for types exported only by ``ghc-experimental``.
(They can, of course, but it's fine not to.)
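
As a small illustration of this practice, the sketch below defines a made-up class ``C`` (with a hypothetical method ``describe``) and gives it instances for a few types from ``base``. It is not taken from any real library; by the agreement above, the author would be under no obligation to add an instance for a type ``S`` exported only by ``ghc-experimental``.

.. code-block:: haskell

   module Main (main) where

   -- | Hypothetical class defined by library @l@.
   class C a where
     describe :: a -> String

   -- Instances for types that come from a boot package (here, base).
   instance C Bool where
     describe b = if b then "yes" else "no"

   instance C Int where
     describe n = "int " ++ show n

   instance C a => C (Maybe a) where
     describe Nothing  = "nothing"
     describe (Just x) = "just " ++ describe x

   -- No instance is required for a type exported only by
   -- ghc-experimental; such a type may change between releases.

   main :: IO ()
   main = do
     putStrLn (describe True)
     putStrLn (describe (Just (3 :: Int)))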

Other teams to consult
======================

There are other stakeholders in this space who we should consult, in addition to seeking GHC Steering Committee and CLC approval:

**Stackage curators**

- Is it OK to make a major bump in ``ghc-internals`` for a minor release of GHC?

**Haddock team**

- Hiding (in the documentation) instances that are not usable because the type or the class is not exposed.
Not clear that this is worth a technical solution.

**Hackage team**

- Can/should we support hiding ``ghc-internals`` on Hackage?

**Security team** / **Stability working group**

- It might be easy for the new security-vulnerability mechanism to also flag packages that depend transitively on ``ghc-internals``.
If they depend on it via ``base``, this is fine.
But if they depend on it via another package, this could be a hazard when migrating to a newer GHC, and one that the code's authors were not aware of.

**HLint team**

- Can we add a check for imports from ``ghc-internals``?