Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design for dataToTag# #104

Closed
simonpj opened this issue Nov 3, 2022 · 66 comments
Closed

Design for dataToTag# #104

simonpj opened this issue Nov 3, 2022 · 66 comments
Labels
approved Approved by CLC vote awaits-merge Approved, but MR is still unmerged

Comments

@simonpj
Copy link

simonpj commented Nov 3, 2022

Matthew and I would like to consult you about the API exposed to users of base for the dataToTag# operator.

Currently we have a primop

dataToTag# :: forall (a :: Type). a -> Int#

and a strangely named alias in GHC.Base

getTag :: forall (a :: Type). a -> Int#
getTag x = dataToTag# x

Of course this is all wrong:

  • dataToTag# is way too polymorphic: it can't possibly work for every type
  • It is levity-monomorphic, so you can't use it on unlifted data types
  • Plus, its implementation is a mess -- there is an ad hoc check that all uses of dataToTag# are to data types; but it's a fragile check,.

So Matthew is fixing that by introducing

type DataToTag :: forall {lev :: Levity}. TYPE (BoxedRep lev) -> Constraint
class DataToTag a where
     dataToTag# :: a -> Int#

That fixes both things at one blow:

  • dataToTag# now has a type class constrained type, so it's not over-polymoprhic
  • Both the class and its operation are levity-polymorphic.
  • The implementation is very nice; no hacks any more.

So far so good: it's a change, but a backward-compatible change.

But we'd also like to kill off the strangely named getTag while we are about it (with a deprecation cycle). And we propose to define

dataToTag :: forall (a :: TYPE LiftedRep). DataToTag a => a -> Int
dataToTag' :: forall (a :: TYPE UnliftedRep). DataToTag a => a -> Int

as wrappers for dataToTag# that return a civilised boxed Int. It would be nice to make these levity-polymorphic too, but you can't write

dataToTag x = I# (dataToTag# x)

because there is a levity-polymorphic binder x.

An alternative would be to put dataToTag into the class like this

class DataToTag a where
     dataToTag# :: a -> Int#
     dataToTag :: a -> Int

and now dataToTag can be levity-monomorphic. But the implementation is significantly more fiddly, because we have to build that dictionary on the fly.

The naming of the unlifted version is up for grabs. I suggested dataToTag' by analogy with foldl'

@treeowl
Copy link

treeowl commented Nov 3, 2022

I don't have any thoughts on your question. However, I do have thoughts about the class, because of some work I was doing recently. In particular, it would be nice to have one or two associated types, and probably another method:

  1. RoundTrips, which when 'True guarantees that tagToEnum# (dataToTag# x) = x. This holds for enumeration-like types (which may be GADTs and thus not Enum).
  2. NumCons :: Nat, the number of constructors.
  3. numCons :: Int, the number of constructors.

@tomjaguarpaw
Copy link
Member

it's a change, but a backward-compatible change

Is it backwards-compatible? That means that everywhere a user can successfully compile dataToTag# today they can compile dataToTag# after the change. How about

myDataToTag :: forall (a :: Type). a -> Int#
myDataToTag = dataToTag#

That compiles today. Will it compile after this change? I doubt it. But have I missed something here?

@simonpj
Copy link
Author

simonpj commented Nov 3, 2022

That compiles today. Will it compile after this change?

Ah, you are right. Of course: myDataToTag can't be that polymorphic: indeed that's the point!

So, fair cop: it's not backward compatible if you write over-polymorphic functions involving dataToTag.

@treeowl
Copy link

treeowl commented Nov 3, 2022

What about newtype wrappers of various sorts? Will dataToTag# still be able to peer through those?

@simonpj
Copy link
Author

simonpj commented Nov 3, 2022

What about newtype wrappers of various sorts? Will dataToTag# still be able to peer through those?

Can you give concrete examples of what you mean? DItto for your earlier suggestions, not all of which I could parse. Thanks!

@treeowl
Copy link

treeowl commented Nov 3, 2022

@simonpj dataToTag# and tagToEnum# are both lightweight "generic" functions, but they're missing some metadata. If we have a TagToEnum class, my round trip thing is irrelevant, so ignore that. But someone who has something TagToEnum and DataToTag is relatively likely to want to know how many constructors the type has. That will let them know things like how many bits are required to serialize the type.

@Bodigrim
Copy link
Collaborator

Bodigrim commented Nov 3, 2022

And we propose to define

dataToTag :: forall (a :: TYPE LiftedRep). DataToTag a => a -> Int
dataToTag' :: forall (a :: TYPE UnliftedRep). DataToTag a => a -> Int

as wrappers for dataToTag# that return a civilised boxed Int.

dataToTag# is usually used in low-level, performance-critical manipulations, so I would not say that there is users' demand for "civilised boxed Int", if it simplifies.

@treeowl
Copy link

treeowl commented Nov 3, 2022

The newtype question: today we can apply dataToTag# to All to determine whether it's true or false. Will that remain the case? I don't think it necessarily needs to, but it would be good to make an intentional decision.

A symmetry question: I don't see a TagToEnum constraint counterpart proposed above. That seems very sad; I would surely want both. Perhaps the answer is performance? tagToEnum# is not operationally polymorphic today, as I understand it. Would it be possible to make it so, and yet be sure to specialize it even with -O0?

@treeowl
Copy link

treeowl commented Nov 3, 2022

I just discovered that tagToEnum# doesn't work for GADTs. That makes it much less useful than I imagined it to be. Oh well.

@simonpj
Copy link
Author

simonpj commented Nov 3, 2022

Can I ask: does the CLC see dataToTag# as part of the API for base that users can rely on (and hence which the CLC creates) or as part of GHC internals, purely an internal implementation matter?

If the former, we are totally open for the CLC's guidance. Eg. no dataToTag and dataToTag' is fine with me.

@simonpj
Copy link
Author

simonpj commented Nov 3, 2022

The newtype question: today we can apply dataToTag# to All to determine whether it's true or false

I see. To spell it out, given

newtype All = MkAll Bool

is dataToTag# (x :: All) well typed? And if so what answer does it give?

Today I think the answer is "yes it is well typed", and would return the tag on the enclosed Bool. I don't know if that's what you want, and would welcome the CLC's guidance. It's a good question.

We can implement either behaviour I think; but rejecting the above is certainly easier. Accepting it would lead to more design questions like "what if the data constructor is not in scope".

My instinct to make a conservative choice now (reject), and loosen up later if we find compelling applications.

@simonpj
Copy link
Author

simonpj commented Nov 3, 2022

A symmetry question: I don't see a TagToEnum constraint counterpart proposed above

There will certainly be a TagToEnum follow up. But I thought it'd be simpler to do a trial run on a more focused topic.

@simonpj
Copy link
Author

simonpj commented Nov 3, 2022

But someone who has something TagToEnum and DataToTag is relatively likely to want to know how many constructors the type has. That will let them know things like how many bits are required to serialize the type.

Hmm. I'm not convinced. Can you show some compelling examples? I suspect you'll quickly need the Data class anyway if you want to do anything polymorphic. Currently dataToTag# is essentially always used monomorphically.

@Bodigrim
Copy link
Collaborator

Bodigrim commented Nov 3, 2022

Can I ask: does the CLC see dataToTag# as part of the API for base that users can rely on (and hence which the CLC creates) or as part of GHC internals, purely an internal implementation matter?

It's kinda borderline, but I'm inclined to say that it's in scope for CLC because of a new type class.

@Bodigrim
Copy link
Collaborator

Bodigrim commented Nov 4, 2022

@simonpj could you please elaborate why do you need a type class? Could we just have dataToTag# :: forall {lev :: Levity}. TYPE (BoxedRep lev) . a -> Int# as a primop?

@treeowl
Copy link

treeowl commented Nov 4, 2022

Hmm. I'm not convinced. Can you show some compelling examples? I suspect you'll quickly need the Data class anyway if you want to do anything polymorphic. Currently dataToTag# is essentially always used monomorphically.

tagToEnum# bans GADTs, which makes it unsuitable for what I wanted, so here's a more simplistic example. Suppose I want to serialize and deserialize arrays ("words") consisting of some enumeration type ("letters"). Efficient serialization using dataToTag# takes little code; there's no need to specialize to different types. Deserialization must invert serialization, so it must use tagToEnum#. But for that, it needs a dictionary. And it also needs to know how many bits to use for each element.


What I actually wanted to use this for, that would require more changes to work:

dependent-map offers a type

type DMap :: (k -> Type) -> (k -> Type) -> Type

which is basically the same as Data.Map, but with different kinds (and keys that are usually GADTs). I want to write something similar for the special case of keys with only nullary constructors. Rather than a binary tree, I want a sparse SmallArray (like one node of a HashMap). This requires knowing at the type level whether the number of constructors is appropriate for this representation, dataToTag# to get array indices, and a (thoroughly unsafe) use of tagToEnum# for indexed traversals.

@treeowl
Copy link

treeowl commented Nov 4, 2022

@simonpj could you please elaborate why do you need a type class? Could we just have dataToTag# :: forall {lev :: Levity}. TYPE (BoxedRep lev) . a -> Int# as a primop?

I don't think dataToTag# is supposed to make sense for function types. And somewhere (above? In the GHC ticket?) there's mention of the fact that it breaks parametricity; that's generally unfortunate, but it's not the only primop to do so.

@simonpj
Copy link
Author

simonpj commented Nov 4, 2022

@simonpj could you please elaborate why do you need a type class? Could we just have dataToTag# :: forall {lev :: Levity}. TYPE (BoxedRep lev) . a -> Int# as a primop?

Well, dataToTag# is, as its name suggests, a way of finding the tag of a data constructor. So you need to be sure that you are applying it to a data constructor and not, as David says, to a function. Or even perhaps a newtype. This is a way to make sure.

Does breaking parametricity matter? Well, foldr/build fusion becomes unsound.

tagToEnum# :: Int# -> a is a more compelling example, where a has to be a data type; but there is a nice duality here.

@ocharles
Copy link

ocharles commented Nov 4, 2022

Well, dataToTag# is, as its name suggests, a way of finding the tag of a data constructor. So you need to be sure that you are applying it to a data constructor and not, as David says, to a function. Or even perhaps a newtype. This is a way to make sure.

I have no idea if this is true, but are there perhaps other functions in the future that might also be interested in this? If so, that suggests maybe the type class you want (or perhaps even just a built in IsData :: Type -> Constraint constraint) would do the job, rather than bundling this into DataToTag? Or maybe DataToTag is exactly this class, but a more general name (IsData or something) would be more suitable.

I may well be picking bike shed paint colours out before we've even got the bike shed, though.

@Ericson2314
Copy link
Contributor

GHC.Base sounds to me like something that users are not supposed to think is stable?

@Bodigrim
Copy link
Collaborator

Bodigrim commented Nov 8, 2022

@Ericson2314 the situation is as follows:

  • Changing primops, exported from ghc-prim, does not currently mandate a CLC vote. That is fine.
  • However, GHC.Base is an ultimate source of many core type classes, e. g., class Monad, changes to which do require CLC vote.
  • The proposal suggests to add a new type class to GHC.Base. Strictly speaking, such changes require the same level of scrutiny as changing class Monad.

@simonpj how magical is the proposed new type class? Can users define their own instances?

@simonpj
Copy link
Author

simonpj commented Nov 9, 2022

Changing primops, exported from ghc-prim, does not currently mandate a CLC vote. That is fine.

Once again I ask: what is the criterion you are using here. I am super keen to have a clear criterion for

  • What is the API of base, which we try to keep stable, and about which we consult the CLC
  • What is the internals of base, which we make no attempt to keep stable, and about which we do not consult the CLC.

I am really keen to know! You clearly have a criterion @Bodigrim: can you say what it is? Talking of "core type classes" is just not precise. Is DataToTag a core type class?

I think your implicit criterion may be:

  • The stable API of base is defined to consist of the exports of all modules other than those with a GHC. prefix.

Let's review your bullets in the light of this proposed definition:

  • "Changing primops, exported from ghc-prim, does not currently mandate a CLC vote. That is fine." That would be true not because primops are expoted from ghc-prim, but because they are not exported by any non GHC.* module.
  • "However, GHC.Base is an ultimate source of many core type classes, e. g., class Monad, changes to which do require CLC vote." That is true precisely for those classes that are exported by non GHC.* module. For example Monad is exported by Control.Monad
  • "The proposal suggests to add a new type class to GHC.Base. Strictly speaking, such changes require the same level of scrutiny as changing class Monad." Under my proposed criterion that would not be true. It would require scrutiny iff it was exported from a non GHC.* module.

So, is that the definitino? And if not, what is the definition? Thanks! Clearing this up once and for all would be incredibly helpul

@tomjaguarpaw
Copy link
Member

tomjaguarpaw commented Nov 9, 2022

Here is an example of a criterion that might work:

  • The stable API of base is is defined to consist of the exports of all modules other than
    1. those with Internal as a component of the module name
    2. those, grandfathered in for stability, which are very clearly indicated at the top of their Haddocks to be unstable. No more such modules should be added in future.

This "internal modules" convention is widely and successfully used in the Haskell ecosystem.

[EDIT: corrected Unstable to Internal]

@simonpj
Copy link
Author

simonpj commented Nov 9, 2022

For base could we not use GHC.* as the prefix? I think we are close to that already, so it'd be a much less disruptive change than renaming all GHC.* to Unstable.*

(Separately, I though the widely-used convention was Internal not Unstable. At least that's what I've been told by others. But regardless, I'd love to make progress on base first, since a system-wide convention seems elusive.)

@tomjaguarpaw
Copy link
Member

Separately, I though the widely-used convention was Internal not Unstable

Oh, I beg your pardon, yes. I'll amend my post.

@tomjaguarpaw
Copy link
Member

For base could we not use GHC.* as the prefix?

That might be rather misleading, since there are plenty of modules exported from the ghc package whose names start GHC. and which are supposed to be public and reasonably stable, aren't there? I think a different stability naming convention between base and ghc would be rather confusing.

it'd be a much less disruptive change than renaming all GHC.* to Unstable.*

I agree that renaming GHC.* modules in base is very disruptive. I suggest it doesn't happen. Instead all such modules should be grandfathered in and simply marked prominently as "unstable" or "internal" at the top of their Haddocks.

@simonpj
Copy link
Author

simonpj commented Nov 9, 2022

That might be rather misleading, since there are plenty of modules exported from the ghc package whose names start GHC. and which are supposed to be public and reasonably stable, aren't there?

Are there? Most stuff comes through Prelude, or Data.List, or Control.Monad etc. What things do you have in mind? And if there are, would it not be better to export them through some more "blessed" module name?

I agree that renaming GHC.* modules in base is very disruptive. I suggest it doesn't happen. Instead all such modules should be grandfathered in and simply marked prominently as "unstable" or "internal" at the top of their Haddocks.

That is one possibility. It has the disadvantage that one has to know the list ot stable modules -- you can't tell from their names. But that's just my opinion. I am entirely content to submit to the Will of the CLC, if the CLC can come to a view on the matter. Thanks for contributing to the dialogue!

@clyring
Copy link
Member

clyring commented Nov 9, 2022

Sorry for my late arrival here. I have several remarks:

@simonpj could you please elaborate why do you need a type class? Could we just have dataToTag# :: forall {lev :: Levity}. TYPE (BoxedRep lev) . a -> Int# as a primop?

I included some discussion about this in (the current draft of) the overview note for the MR, which I have reproduced here for convenience.

Its one method, `dataToTag#`, has for decades existed as a primop with
the type `forall a. a -> Int#`.  And unlike its cousin `tagToEnum#`,
it really was as polymorphic as that type claimed it to be.  Every
built-in DataToTag instance is just a trivial wrapper around a primop.
So it's natural to ask: Why does the DataToTag class exist at all?

1. With its old type, `dataToTag#` broke parametricity and could peek
   through abstraction barriers because it is too polymorphic, and its
   behavior at types that are not represented as algebraic data types
   was both unspecified and useless.  It's a bit unfortunate for what
   is "morally" a very tame, safe, pure function to carry these defects.

2. Useful information about the type `dataToTag#` is used at is most
   conveniently available during typechecking:

   2a. For large constructor families, the constructor tag must
       sometimes be read from an object's info table, but for small
       constructor families it can be read directly from the pointer
       tag.  (See also Note [Data constructor dynamic tags] in
       GHC.StgToCmm.Closure.)

       We can generate slightly smaller and more efficient code for
       the latter scenario.  But during Stg-to-Cmm, the relevant type
       information is not reliably available.  To work around this,
       built-in DataToTag instances use different primops for each
       scenario: `dataToTagLarge#` and `dataToTagSmall#`.

       (This optimization was #17079 and then part of #21710.)

   2b. Branching on the result of a `dataToTag#` call is equivalent to
       branching on its argument directly.  Transforming the former
       into the latter often enables further transformations such as
       case-flattening and case-of-known-constructor, so we do so.
       See also Note [caseRules for dataToTag] in GHC.Core.Opt.ConstantFold.

       For data family instances, this transformation must be done at
       the representation type of the relevant data instance, rather
       than the data family type that the user sees.  Converting
       between these two types can be done by getting the relevant
       coercion from the FamInstEnvs, which would require tiresome
       plumbing to make available to caseRules.

       Instead, while creating a DataToTag instance for a data family
       type, we apply the underlying primop at the representation type
       and cast the resulting function to make the types match.  This
       establishes the invariant (not currently checked by lint) that
       a dataToTag primop is never applied to a value argument whose
       type is not headed by an AlgTyCon.

3. Using a typeclass affords us greater freedom to change the
   representation of some data types, since we may (if necessary)
   simultaneously change the implementation of `dataToTag#` for the
   affected types to avoid breaking things.

Similar reasoning to point 2b also means that uses of dataToTag# at a newtype are currently sometimes optimized less well than uses of dataToTag# at its underlying data type. Point 3 relates to this GHC proposal which calls for a breaking change to dataToTag# for this reason. To see the unspecified behavior suggested by point 1, try running the following program with today's GHC but with various settings:

{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int(..), dataToTag#)

main :: IO ()
main = print (I# (dataToTag# flip))

Running interactively or compiling with -O0 I see 0 but with -O1 I see 2. (I think that's because flip has arity 3.)


Is it backwards-compatible? That means that everywhere a user can successfully compile dataToTag# today they can compile dataToTag# after the change. How about

myDataToTag :: forall (a :: Type). a -> Int#
myDataToTag = dataToTag#

That compiles today. Will it compile after this change? I doubt it. But have I missed something here?

Correct: This code is currently OK but will fail to typecheck after DataToTag is added, most likely with an error like No instance for ‘DataToTag a’ arising from a use of ‘dataToTag#’. But the breakage is all compile-time-only.

dataToTag# is usually used in low-level, performance-critical manipulations, so I would not say that there is users' demand for "civilised boxed Int", if it simplifies.

The most compelling use for dataToTag# in user code I have seen is to reduce boilerplate in Eq or Ord instances that aren't quite derivable. Here's an example from within GHC: Being able to say dataToTag lit1 < dataToTag lit2 instead of isTrue# (dataToTag# lit1 <# dataToTag# lit2) or I# (dataToTag# lit1) < I# (dataToTag# lit2) would be slightly prettier. But these wrappers are small, and if I am importing GHC.Exts to get dataToTag# anyway I am probably unfazed by such a small inconvenience.

@simonpj how magical is the proposed new type class? Can users define their own instances?

Rather magical: Users cannot define their own instances. (I suppose that since DataToTag is a one-method class users can technically provide something like local instances via withDict. But that's probably not very useful.)

I may well be picking bike shed paint colours out before we've even got the bike shed, though.

There is one un-answered question about the current implementation's interaction with backpack, and there are a few other non-essential tasks I'd like to take care of before merging. But the implementation is basically complete!

Here's a quick run-down of the basic user-facing design questions, and how the current implementation answers them:

  • What should the name of the class be? (Currently: DataToTag)
  • Where should the class live? (Currently: GHC.Magic)
  • Which modules should re-export DataToTag? (Currently: GHC.Exts and GHC.Base)
  • Should GHC.Base.getTag be deprecated? (Currently: it isn't)
  • Should convenience wrappers returning Int instead of Int# be provided? (Currently: they aren't)
  • Should a DataToTag T constraint be solved when not all data constructors of T are in scope?
    • Currently: They are not, because that could break an abstraction barrier.
  • How should a DataToTag T constraint behave when T uses DatatypeContexts?
    • Currently: Any datatype context is ignored. It could perhaps be argued that this breaks abstraction, since an equivalent user-written pattern-match on the type may require some constraints from that context. But this isn't very interesting, and I suspect nobody will be much upset about this loophole.
  • Should DataToTag's special solving behavior look through newtypes? (Currently: it doesn't)

@tomjaguarpaw
Copy link
Member

tomjaguarpaw commented Nov 9, 2022

That might be rather misleading, since there are plenty of modules exported from the ghc package whose names start GHC. and which are supposed to be public and reasonably stable, aren't there?

Are there?

I'm talking about the GHC API, which is a collection of modules exposed from the ghc package, all of whose names start with GHC. and some of which are presumably intended to be stable.

@simonpj
Copy link
Author

simonpj commented Nov 9, 2022

I'm talking about the GHC API, which is a collection of modules exposed from the ghc package, all of whose names start with GHC. and some of which are presumably intended to be stable.

Aha. I was talking exclusively about the base package; the GHC package is another matter, since every single module starts with GHC.! A global consensus (across all packages) seems elusive which is why I'm trying to narrow the question to a single package, a package that is the CLC's primary focus, in the hope of a definite view emerging.

@clyring
Copy link
Member

clyring commented Dec 10, 2022

@Bodigrim: I have pushed a commit reverting the unrelated primop type changes. I also bumped the version number to 9.2.100 just to make a stale-interfaces issue less likely. (I really hope that's the cause of the mime-mail-0.5.1 problem because I have no clue what else it could be, and still can't reproduce it.)

@Bodigrim
Copy link
Collaborator

@clyring this looks better. But could you please ensure that your branch is atop of 9.2.5? It seems to suffer from https://gitlab.haskell.org/ghc/ghc/-/issues/21964. I'm on M2.

@clyring
Copy link
Member

clyring commented Dec 18, 2022

The branch does contain the fix for that issue (74ca6191fa0), which is currently still the last commit in the ghc-9.2 branch.

@Bodigrim
Copy link
Collaborator

Sigh. I switched to my old, x64 machine and happy to tell that the change does not seem to affect any Stackage packages, which is good.

Here's a quick run-down of the basic user-facing design questions, and how the current implementation answers them: ...

@clyring could you confirm that your comment remains up-to-date with the most recent developments and there were no design changes?

@Bodigrim Bodigrim removed the awaits-impact-assessment Awaits an impact assessment (e. g., using https://github.com/Bodigrim/clc-stackage) label Dec 18, 2022
@clyring
Copy link
Member

clyring commented Dec 18, 2022

I'm surprised there's no breakage in clc-stackage. Yes, that comment remains up-to-date.

@Bodigrim
Copy link
Collaborator

Dear CLC members, let's vote on the proposal to replace

GHC.Exts.dataToTag# :: forall (a :: Type). a -> Int#

with

type DataToTag :: forall {lev :: Levity}. TYPE (BoxedRep lev) -> Constraint
class DataToTag a where
     dataToTag# :: a -> Int#

as detailed in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8912. The motivation for the change is in the first half of #104 (comment). The final user-facing design is described in #104 (comment). In theory, as explained in #104 (comment), this is a breaking change, but practically no package in clc-stackage project is affected.
@tomjaguarpaw @chessai @emilypi @mixphix @cgibbard


+1 from me.

@mixphix
Copy link
Collaborator

mixphix commented Dec 19, 2022

+1

@tomjaguarpaw
Copy link
Member

-1 (would change to +1 if we don't expose DataToTag from base)


The net effect of this proposal on CLC-land (i.e. base) is

  1. improve the type of getTag
  2. expose DataToTag from GHC.Exts

1 seems fine. Although this is an implementation detail which probably have never been exposed from base (exposing from ghc-prim seems sufficient) the change doesn't make anything worse, since it seems there are no consumers who used it at the bad type. 2 is not fine. DataToTag is a GHC implementation detail and therefore should not be exposed from base. Anyone who wants to use it should import it from ghc-prim. Granted, we have a lot of implementation details that are already exposed from base, but let's not add more.

@simonpj
Copy link
Author

simonpj commented Dec 20, 2022

Tom as I understand it, you want to expose

getTag :: DataToTag a => a -> Int
dataToTag# :: DataToTag a => a -> Int#

That is, the same names as before, but with DataToTag class constraint. Is that what you want?

(The inconsistent naming is a wart, but perhpas one that the CLC thinks that the costs of making it consistent exceed the benefits?)

Naming aside though, can you do that without exposing the DataToTag class? Surely I must be able to write

myGet :: DataToTag a => a -> Int
myGet x = getTag x + 1

So I must be able to import DataToTag! It's not a GHC implementation detail: it's a proper type-class constraint on the types that you can all myGet on, isn't it?

@tomjaguarpaw
Copy link
Member

tomjaguarpaw commented Dec 21, 2022

I'm saying that both getTag and DataToTag/dataToTag# are GHC implementation details. People who want to use them should import them from ghc-prim. I don't understand why they'd be exposed from base at all, except by historical mistake in the case of getTag.

Since getTag is already exposed by base it seems prudent to reluctantly allow it to continue to be. DataToTag is not already and it shouldn't be. Users should import it from ghc-prim.

Does that help?

@simonpj
Copy link
Author

simonpj commented Dec 21, 2022

Aha, thank you. That is indeed clearer. Two thoughts.

First.

I'm saying that both getTag and DataToTag/dataToTag# are GHC implementation details.

If that's what the CLC wants then fine. But I think that many people entirely uninterested in GHC's implementation will want getTag to help them build efficient Ord instances. I don't think it's an implementation detail at all.

If the CLC wants to remove it from base (and the CLC's purview), then perhaps DEPRECATEing getTag would be a good way forward. If you don't deprecated it, its type DataToTag a => a -> Int will mention a class that is not available from base which would be ... unusual. I don't know any package that exports a function whose type constructors are not importable.

Second

People who want to use them should import them from ghc-prim.

That might be difficut, because ghc-prim is the stuff we need to build the Integer package(s), which are in turn needed for much of base. So in general it is not necessarily easy to move stuff from base into ghc-prim.

Since the CLC's wish is that the external (stable) API of base is all exposed modules, and non-exposed modules can't be imported, the only way to expose an implementation detail is from another package. That is perhpas a further reason to split base into ghc-base and base, with base being the purview of CLC, and ghc-base being implementation details.

@tomjaguarpaw
Copy link
Member

If that's what the CLC wants then fine.

I'm just speaking for myself here. The CLC makes decisions by majority vote. Whether that leads to the CLC "wanting" something, or to coherent long-range policy, is an open question. But it is about the maximum the CLC has the bandwidth for at the moment.

I don't think it's an implementation detail at all.

It exposes the implementation detail that GHC chooses to represent constructors with integers assigned in a particular way, and it works on the assumption that this assignment is stable across compiler invocations and also across compiler versions. It's not the deepest magic ever but it does seem deep magic enough that I'm wary of seeing it exposed from base. "It's already exposed from base": sure, but I don't want to perpetuate that. Avoiding exposing DataToTag seems like a practical way of discouraging people using this function from base!

I don't know any package that exports a function whose type constructors are not importable.

Sure, because normally packages expose functions that people are supposed to use.
I'm saying that people should not use dataToTag/getTag from base. I'm also open to marking the existing version DEPRECATED.

People who want to use them should import them from ghc-prim.

That might be difficut, because ghc-prim is the stuff we need to build the Integer package(s), which are in turn needed for much of base. So in general it is not necessarily easy to move stuff from base into ghc-prim.

But your MR already exposes them from ghc-prim doesn't it? I'm not trying to make a suggestion that applies in general; I'm trying to make a suggestion that applies in this case.

That is perhpas a further reason to split base into ghc-base and base, with base being the purview of CLC, and ghc-base being implementation details.

Exposing getTag/dataToTag stuff from a package other than base sounds like a good idea to me. Then, if people were to start using it successfully and it proved sufficiently popular, then I think it would be worth considering exposing it from base. But inclusion in base should lag popular usage, not lead it.

To reiterate, this is my personal opinion, not a coherent policy, certainly not CLC policy. Other CLC members may disagree with what I have written here. To help improve the situation in future, regardless of the outcome of this proposal, I think it would be valuable to try to form a coherent policy for base, GHC, and their relationship by continuing discussions such as #105.

@simonpj
Copy link
Author

simonpj commented Dec 21, 2022

To help improve the situation in future, regardless of the outcome of this proposal, I think it would be valuable to try to form a coherent policy for base, GHC, and their relationship by continuing discussions such as #105.

I would warmkly welcome such a policy, thank you.

@simonpj
Copy link
Author

simonpj commented Dec 21, 2022

But your MR already exposes them from ghc-prim doesn't it? I'm not trying to make a suggestion that applies in general; I'm trying to make a suggestion that applies in this case.

OK yes, that is true. Fine: I think Matthew and I would be open to what the CLC wants. But I suggest that the alternatives are:

  • Expose dataToTag and DataToTag as part of base's API.
  • Do not expose them, and deprecate the current getTag.
    To continue to expose an un-deprecated getTag, with a type that mentions a class that is not importable from base seems inconsistent.

We'll await a resolution from CLC. Thanks!

@clyring
Copy link
Member

clyring commented Dec 21, 2022

The advice at the top of GHC.Magic, the current proposed home of DataToTag (and its method dataToTag#), is:

Use GHC.Exts from the base package instead of importing this module directly.

Most of the other modules in ghc-prim have similar disclaimers. So our current messaging seems to suggest that DataToTag should be re-exported from GHC.Exts. Of course, we can break with that existing messaging, but I think we ought to have a somewhat clear vision of what our policy and messaging in this area should be before doing so.

@tomjaguarpaw
Copy link
Member

To continue to expose an un-deprecated getTag, with a type that mentions a class that is not importable from base seems inconsistent.

Yes, I appreciate my point of view on this is unconventional and I don't anticipate others in the CLC agreeing with me.

@Bodigrim
Copy link
Collaborator

@chessai @emilypi @cgibbard just a gentle reminder to vote.

@chessai
Copy link
Member

chessai commented Dec 23, 2022

+1

1 similar comment
@emilypi
Copy link
Member

emilypi commented Dec 23, 2022

+1

@Bodigrim
Copy link
Collaborator

Thanks all, 4 votes in favour are enough to approve.

@Bodigrim Bodigrim added the approved Approved by CLC vote label Dec 23, 2022
@cgibbard
Copy link
Contributor

+1

The change in type is certainly an improvement here, as at least as I understand it, one could imagine instances of the DataToTag class being handwritten and nothing being out of the ordinary semantically with the new version of the function, so while I would have been in favour of deprecating and removing the previous getTag, I think the new constrained one is probably fine.

@chshersh
Copy link
Member

I'm trying to summarise the state of this proposal as part of my volunteering effort to track the progress of all approved CLC proposals.

Field Value
Authors @simonpj, @clyring
Status not merged
base version Unknown
Merge Request (MR) https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8912
Blocked by ???
CHANGELOG entry present
Migration guide missing

Please, let me know if you find any mistakes 🙂


@clyring could you share a status update on the implementation and what are next steps? Also, please do let CLC know if you need any help, so we can coordinate and prioritise approved proposals accordingly!

@chshersh chshersh added the awaits-merge Approved, but MR is still unmerged label Mar 24, 2023
@clyring
Copy link
Member

clyring commented Mar 27, 2023

The underlying primop has some smelly special-cases and gunk near exprOkForSpeculation and in code generation (STG-to-whatever). I've slowly been working on cleaning that situation up, but doing so is no longer a blocker now that there is consensus about the user-facing design. I can probably remove most of the code generation changes from the patch implementing the class so that it can move forward concurrently.

sthagen pushed a commit to sthagen/ghc-ghc that referenced this issue Nov 9, 2023
Closes #20532. This implements CLC proposal 104:
  haskell/core-libraries-committee#104

The design is explained in Note [DataToTag overview]
in GHC.Tc.Instance.Class. This replaces the existing
`dataToTag#` primop.

These metric changes are not "real"; they represent Unique-related
flukes triggering on a different set of jobs than they did previously.
See also #19414.

Metric Decrease:
    T13386
    T8095
Metric Increase:
    T13386
    T8095

Co-authored-by: Simon Peyton Jones <simon.peytonjones@gmail.com>
@clyring
Copy link
Member

clyring commented Nov 15, 2023

The merge request !8912 implementing this proposal has finally landed.

These changes will appear in base-4.20.0, shipped with ghc-9.10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved by CLC vote awaits-merge Approved, but MR is still unmerged
Projects
None yet
Development

No branches or pull requests