Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relaxing HasField constraints (under review) #515

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ocharles
Copy link

@ocharles ocharles commented Jun 11, 2022

In the overloaded record fields proposal, there is a set of limitations as to when a user can declare custom HasField instances. In this proposal we relax these restrictions, allowing users to define HasField instances when they were previously unable to.

Rendered

@simonpj
Copy link
Contributor

simonpj commented Jun 14, 2022

Generally I like this proposal

Accepting incoherence for HasField instances may reduce optimization opportunities in some cases, because the compiler will not be able to assume that two dictionaries of the same type have the same value.

Well, that is already true for Num, Eq, Ord and every other class. You can define incoherent instances in different (sibling) modules. It's no different here, I think, is it? So it seems cruel to count this as a downside, since we accept it for all other classes with manual instances.

@adamgundry
Copy link
Contributor

Accepting incoherence for HasField instances may reduce optimization opportunities in some cases, because the compiler will not be able to assume that two dictionaries of the same type have the same value.

Well, that is already true for Num, Eq, Ord and every other class. You can define incoherent instances in different (sibling) modules. It's no different here, I think, is it? So it seems cruel to count this as a downside, since we accept it for all other classes with manual instances.

True, we could take the position that HasField allows incoherence only in rather dark corners (much like other "normal" classes) and hence it should be reasoned about and optimised under the assumption that incoherence will not occur. Indeed, with this proposal, the only cases where incoherence can arise are where the user defines a very polymorphic HasField instance that can conflict with normal selector instances, which should be rare. (For example, in the motivating example from the proposal the unB field is not exported and hence incoherence cannot arise.)

However, my position on this is influenced by a change I'd like to make but have not yet formally written up as a proposal: that we should automatically solve HasField constraints based on fields of record pattern synonyms. This would provide better backwards compatibility if users replace a record data constructor with a record pattern synonym. But since record pattern synonyms can be defined for any type in any module, this would also lead to HasField incoherence, which could be easier to observe in "normal" code. (Though perhaps we could define a notion of "orphan" pattern synonyms and discourage them to make this less likely...)

@ocharles
Copy link
Author

Hopefully @adamgundry's comment has responded adequately to @simonpj's. What's next for this? I note it's been mostly quiet for the two weeks since submission - should I wait a bit longer?

@nomeata
Copy link
Contributor

nomeata commented Jun 26, 2022

If you think that it's quiet because it just wasn't looked at by the community in general, maybe beat the drum on forums/reddit/mailing lists and solicit more input.

If you think it's quiet because there isn't much too comment, and possible issues would have brought up, you can formally submit for committee consideration. That is often a forcing function to get committee members to look at it more closely.

@ocharles
Copy link
Author

Thanks @nomeata! Will have a think tomorrow morning 👍

@goldfirere
Copy link
Contributor

After being initially skeptical, I'm in support after thinking about the ramifications.

@ocharles
Copy link
Author

ocharles commented Jul 5, 2022

Good to hear @goldfirere! Could you share your initial reservations anyway? I'd be curious to hear them 😃

@goldfirere
Copy link
Contributor

Well, I was wrong: I was most concerned by strange user experiences. For example, your instance HasField l r f => HasField l (T r) (T f) would work for everything -- except act very strangely for a label of unB. But this isn't actually a problem, because the magical instance triggers only when unB is in scope (and it normally wouldn't be). Precisely because we can control the availability of the magical instance, this makes sense.

@ocharles
Copy link
Author

Thanks for expanding, @goldfirere!

I would like to now submit this to the committee for consideration.

@adamgundry
Copy link
Contributor

ping @nomeata now you're back ^^^

@nomeata nomeata added the Pending shepherd recommendation The shepherd needs to evaluate the proposal and make a recommendataion label Aug 1, 2022
@nomeata nomeata changed the title Relaxing HasField constraints Relaxing HasField constraints (under review) Aug 1, 2022
@i-am-tom
Copy link

Hi! I've finally read through the proposal - thanks for your patience. Perhaps I'm biased as this is a problem I've run into myself, but I'm very in favour of this proposal. I'll recommend it to the committee!

@i-am-tom i-am-tom added Pending committee review The committee needs to evaluate the proposal and make a decision and removed Pending shepherd recommendation The shepherd needs to evaluate the proposal and make a recommendataion labels Aug 10, 2022
@simonpj
Copy link
Contributor

simonpj commented Aug 19, 2022

The proposal says

This change allows incoherent solutions to HasField constraints: two HasField constraints solved in different modules may be instantiated with different dictionaries.

But I don’t get it, and I don’t understand the example. I am confident that the analogy with IP is a red herring, because implicit parameter make (in effect) local instance declarations, and there is none of that here — only top level ones. So it seems there is more in common with ordinary overlapping instance decks.

In the example,

module M where
  data T = MkT { foo :: Int }

  bar = getField @"foo" (MkT 42)

module N where
  import M (T(MkT))

  instance HasField "foo" T Int where
    getField (MkT x) = negate x

  baz = getField @"foo" (MkT 42)

Surely the instance decl in N simply overlaps (completely in fact) the implicit instance decl in M (the built-in solver one), so baz is simply rejected, just as with overlapping instances.

I’m baffled here.

Generally I think the proposal should make it clear that the behaviour is JUST AS IF the built-in solver-magic instance declarations were written by the user in the module defining the type EXCEPT that they be once invisible if the field is not imported. (That last part is a bit magic.)

OH. Maybe in the example the point is that import M (T(MkT)) does not import the field foo and hence does not import the instance. If that is the point, please spell it out precisely, and much more loudly, both with and without the filtered import.

@simonpj
Copy link
Contributor

simonpj commented Aug 19, 2022

Generally speaking, I’m in support of this proposal.

Copy link
Contributor

@aspiwack aspiwack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the question of whether the compiler may assume coherence or not is rather important here. The getField method is one that I expect to lend itself pretty well to specialisation (as it's virtually always called with a static field name). I don't think that GHC is currently capable of doing specialisation for classes that it doesn't assume to be coherent, so the performance implications may be important.

On the other hand, I may be wrong, maybe the standard getField will simply be inlined, no need for specialisation. I don't know. At any rate, this discussion should feature in the Alternatives section.

proposals/0000-hasfield-incoherence.rst Outdated Show resolved Hide resolved
@adamgundry
Copy link
Contributor

@simonpj

OH. Maybe in the example the point is that import M (T(MkT)) does not import the field foo and hence does not import the instance. If that is the point, please spell it out precisely, and much more loudly, both with and without the filtered import.

That is indeed the point. More broadly, the ability to "hide" instances based on imports/exports is what (I think) makes it similar to local instance declarations for IPs, though of course for HasField the visible instances don't vary within a single module, unlike IPs.

@aspiwack

It seems to me that the question of whether the compiler may assume coherence or not is rather important here. The getField method is one that I expect to lend itself pretty well to specialisation (as it's virtually always called with a static field name). I don't think that GHC is currently capable of doing specialisation for classes that it doesn't assume to be coherent, so the performance implications may be important.

Yes. I think it would be reasonable to say that GHC is entitled to assume HasField is coherent for optimisation purposes. (As @simonpj pointed out this is just as it does for normal classes like Eq, even though this assumption can be violated by a sufficiently-determined user.) In the original Reactive.Banana motivating example, incoherence cannot arise in practice because the field of Behaviour is not exported.

Here I suppose the analogy for IPs breaks down, because for IPs we really do expect different values depending on local scope, and it would be manifestly unreasonable to specialise.

@simonpj
Copy link
Contributor

simonpj commented Aug 24, 2022

That is indeed the point. More broadly, the ability to "hide" instances based on imports/exports is what (I think) makes it similar to local instance declarations for IPs,

Perhaps you can modify the proposal to make it impossible for a reader to misunderstand in the way that I did. At least point out the highly-significant selective import; and remind readers that the instance is only in scope if the field selector is.

@simonpj
Copy link
Contributor

simonpj commented Aug 24, 2022

Some thoughts

  1. Please add section numbers. It's easy to do so.

  2. In "Proposed spec" there are two paragraphs beginning "...HasField constraints have special treatment in the constraint solver". Please put these in a subsection headed "Overlapping instances" -- it wasn't immediately clear to me that the two paras belong together

  3. In that overlapping instance subsection, please say explicity that the language should obey all the existing overlap rule, just as if the built-in instances were user-written. And please give an example of the built-in instance for HasField, so that the overlap behaviour becomes clear.

  4. Give a variant of Example 2 in which the user-written instance needs an {-# OVERLAPPING #-} pragma; that it, it overlaps the built-in one, rather than being overlapped by it.

I will note that the built-in instances aren't really expressed as instances; they are code. So implementing these overlap rules will be fiddly.

  1. Under "Effects and interactions" in the first code example, I thought the instance HasField "foo" T Int decl would clash with the built-in one. But no: because the built-in one is invisible because the foo field is not imported. Please say this loud and clear, and point to the appropriate section of the user manual that describes this scoping behaviour.

@simonpj
Copy link
Contributor

simonpj commented Aug 24, 2022

I'm generally in favour; would like to see the above clarifications executed though

adamgundry and others added 2 commits August 24, 2022 21:03
Co-authored-by: Arnaud Spiwack <arnaud@spiwack.net>
@goldfirere
Copy link
Contributor

One question that seems unaddressed here is optimization. Right now, if the optimizer has d1 :: T and d2 :: T, where T :: Constraint, then it will freely substitute d1 for d2, for any T not headed by IP. This is because we generally assume that each constraint has exactly one form of evidence. The exception is IP, whose evidence can be written locally. By eliminating coherence for HasField, we would also seem to need to eliminate this coherence-backed optimization. Worse, we would need to do so unconditionally, for all HasField constraints, to be safe. It's hard for me to guess the impact of this change, but I know the optimization is there for a good reason.

Relatedly, IP has other restrictions as well, such as not being able to appear in the context of an instance declaration. This actually makes sense, because if an IP constraint is on an instance declaration for C, then C's dictionaries might not be coherent either, ruining the optimization above. To prevent contagious incoherence, GHC stops IP from being a constraint on an instance declaration. Would this proposal then mean we can no longer write HasField on an instance? If so, I would be against.

Sorry for being late with this negative analysis, but it didn't occur to me earlier.

@ocharles
Copy link
Author

ocharles commented Sep 7, 2022

Thanks for the feedback @simonpj and @goldfirere! I see @adamgundry has already responded to some points, but there's clearly some work left to do here. Some of this may be beyond my current abilities, but I'll try and do what I can and work with Adam to get all those open questions answered 👍

@nomeata
Copy link
Contributor

nomeata commented Sep 7, 2022

@ocharles or @adamgundry, are you by any chance attending Haskell Symposium? If so, would you like to present this proposal at the GHC Proposals session?

@adamgundry
Copy link
Contributor

@goldfirere I agree that the proposal could be clearer on the optimization point. I think refusing to specialise HasField or let it appear as a superclass would be too restrictive. It came up in earlier discussion and my suggestion is:

... GHC is entitled to assume HasField is coherent for optimisation purposes. (As @simonpj pointed out this is just as it does for normal classes like Eq, even though this assumption can be violated by a sufficiently-determined user.) In the original Reactive.Banana motivating example, incoherence cannot arise in practice because the field of Behaviour is not exported.

I'd appreciate feedback on whether this is acceptable.

If the risk of users accidentally introducing incoherence is too great, we could imagine requiring a modifier/pragma on potentially-problematic HasField instances (somewhat like INCOHERENT, although probably not INCOHERENT itself as we wouldn't want to force all such instances to be marked as incoherent for overlap resolution).


@ocharles or @adamgundry, are you by any chance attending Haskell Symposium? If so, would you like to present this proposal at the GHC Proposals session?

Alas I'm not able to make the Haskell Symposium this year. Good idea to organise such a session though!

@adamgundry
Copy link
Contributor

If the risk of users accidentally introducing incoherence is too great, we could imagine requiring a modifier/pragma on potentially-problematic HasField instances (somewhat like INCOHERENT, although probably not INCOHERENT itself as we wouldn't want to force all such instances to be marked as incoherent for overlap resolution).

Another possible approach occurred to me: we could have an on-by-default warning about potentially-problematic HasField instances (i.e. those that currently would be rejected outright). Users are then alerted that there is a need to ensure they will not be negatively affected by incoherence, and can disable the warning if appropriate.

@goldfirere
Copy link
Contributor

But I'm worried -- with the possibility of cross-module inlining, a key part of GHC's optimization pipeline -- that there will be no good way for even expert users to know whether they might be bitten by the incoherence. Module boundaries pretty effectively cease to exist in the optimizer.

It makes for a strange UX, but maybe this would work: If a module does not export a field name and never uses a HasField constraint at that field name, then an overlapping HasField instance is possible. Really, the built-in HasField instance is as if it never existed. This is strange, though, because it involves very spooky action at great distances.

This little optimization around class constraints causes us some pain (both here and elsewhere, though I don't have a ready link to the "elsewhere"). Do we know that it makes a difference? That is, perhaps we could just disable it and then see how much of a slowdown we get.

@nomeata
Copy link
Contributor

nomeata commented Feb 9, 2023

Whatever happened to Richard’s concerns? Could they be addressed?

@nomeata nomeata assigned angerman and unassigned i-am-tom Feb 9, 2023
@nomeata
Copy link
Contributor

nomeata commented Feb 9, 2023

I’ve reassigned shepherding this proposal to @angerman.

@angerman
Copy link
Contributor

After reading (and trying to understand this proposal), as well as reading the comments:

  • This looks like solid improvement to me from a "I want to do this, why can't I do this" perspective.
  • I share some of @goldfirere's concerns about "How do I know what's wrong if things go south"?
  • I can not find any impact (on current code) analysis? Will this potentially break any code?
  • @goldfirere (and the proposal) mention potential performance regressions. Is there a way to ballpark this? I assume this would largely depend on the codebase being a heavy user of HasField/getField?
  • Will this feature be guarded by -XIncoherentHasField?

@adamgundry
Copy link
Contributor

Thanks for the quick review @angerman!

There are two options on the table here:

  1. Treat HasField like IP (i.e. do not assume all dictionaries of the same type have the same value).
  2. Continue to assume all HasField dictionaries of the same type have the same value, i.e. accept that the new HasField instances permitted here may lead to incoherence (like INCOHERENT instances already do).

The current proposal text describes option 1, but I'm leaning towards option 2.

  • I share some of @goldfirere's concerns about "How do I know what's wrong if things go south"?

Assuming we choose option 2, my preferred suggestion is that we should have an on-by-default warning (maybe even a "severe" warning that is an error unless explicitly disabled?) that tells the user the instance might lead to incoherence and they are proceeding at their own risk. (Obviously the exact text would need to explain a bit more!)

This is not massively different to INCOHERENT pragmas in terms of the potential for problems later.

As @goldfirere explained, it is difficult to characterise exactly when an instance will cause problems in practice (and perhaps in the future we will identify a subset of instances that are safe, and hence not warned about, e.g. where the module doesn't use or export any conflicting fields). Thus I think it is reasonable to have a warning, which can be refined if necessary to not warn about safe cases.

  • I can not find any impact (on current code) analysis? Will this potentially break any code?

No. This proposal allows the user to define strictly more instances (those that are currently prohibited by a check in GHC). Thus I don't think it can break existing code.

  • @goldfirere (and the proposal) mention potential performance regressions. Is there a way to ballpark this? I assume this would largely depend on the codebase being a heavy user of HasField/getField?

Option 1 seems likely to reduce specialisation opportunities, and hence seems like it could be quite bad if HasField is used in a tight loop that suddenly acquires an extra dictionary argument. I don't have any concrete numbers, but I'm worried about this because it could affect any use of HasField, and that's a key motivation for option 2, which shouldn't regress performance.

  • Will this feature be guarded by -XIncoherentHasField?

We don't currently require a language extension to use HasField or define its instances. In general the idea is that importing a "special" typeclass is enough to opt in to its behaviour (e.g. we similarly don't use extensions to guard Typeable, KnownNat, etc.). Thus I would say not, and that a warning on such instances would be enough. But I won't insist if others feel an extension is necessary.

@nomeata
Copy link
Contributor

nomeata commented Feb 10, 2023

This looks like some serious discussion is still happening. Maybe label it as needs-revision (which should not be thought of as a “bad” thing), and re-submit when refined?

@simonpj
Copy link
Contributor

simonpj commented Feb 10, 2023

I had another look.

The proposal plans to lift four restrictions, but it is only supported by two examples. That's a bit thin on motivation. At least it'd be good to have an example supporting each restriction-lifting.

More fundamentally I'm trying to understand the overlap/incoherence issue. Take the main example:

class HasField x r a | x r -> a where
  getField :: r -> a

newtype Behavior a = Behavior { unB :: Time -> a }
  -- I have picked a particular representation for behaviours

-- (I1) Implicitly-generated instance for HasField
-- generated from the `newtype` decl
instance HasField "unB" (Behaviour a) (Time -> a) where ...

-- (I2) Explict user instance:
instance HasField x a b
      => HasField (x :: k) (Behavior a) (Behavior b) where ...
  getField (Behaviour a) = Behaviour (getField @x a)

The intent is that if r :: Behaviour t and you write r.unB, or getField @"unB" r, the implicitly-generated instance "wins".

But what about the functional dependencies? If we have [W] HasField alpha (Behaviour t) beta,
the first two arguments match the instance (I2) so we'll emit [W] beta ~ (Behaviour gamma).
And that would be totally wrong if alpha later got unified with "unB".

For ordinary instance declarations GHC avoids this problem by checking
for conflicts between instance decls. So if I do all this manually with
a normal class C like this:

data T a = MkT [a]

class C a b c | a b -> c where
  get :: b -> c

instance            C "foo" (T a) [a]
instance C r a b => C r     (T a) (T b)

then GHC complains

Foo.hs:12:10: error: [GHC-46208]
    Functional dependencies conflict between instance declarations:
      instance C "foo" (T a) [a] -- Defined at Foo.hs:12:10
      instance forall k (r :: k) a b. C r a b => C r (T a) (T b)
        -- Defined at Foo.hs:13:10

and rightly so.

So I'm a bit stumped:

  1. I don't think GHC actually implements fundeps for HasField just as if the implicit HasField instances were really there. I have just looked at doTopFundepImprovement and I see nothing special for HasField. Nor do I see anything in improveFromInstEnv. Those omissions look like bugs, don't they?

  2. If we fixed that bug, then if we lift the restrictions as proposed, the user-defined instance would be rejected above.

Some of this might go away if the instances looked more like this:

-- (I1')
instance (b ~ (Time -> a)) => HasField "unB" (Behaviour a) b

-- (I2')
instance (HasField x a b, c ~ (Behaviour b)
      => HasField (x :: k) (Behavior a) c where ...
  getField (Behaviour a) = Behaviour (getField @x a)

But then of course the fundep on HasField does nothing useful, with consequences that
are unclear to me.

All this seems pretty fundamental. I must be missing something

@phadej
Copy link
Contributor

phadej commented Feb 10, 2023

And that would be totally wrong if alpha later got unified with "unB".

That's what @adamgundry mentions as behaving as INCOHERENT instances, if you write r.unB you won't know what you get. (EDIT: actually I think you'll always get the user defined instance, as implicit instance solver won't have chance to kick in; I might be wrong - and that's just because it's implemented that way).

Those omissions look like bugs, don't they?

No. Special code solving HasField takes care of that. See Note [HasField instances] and e.g. matchHasField implentation:

Note [HasField instances]
~~~~~~~~~~~~~~~~~~~~~~~~~
Suppose we have

    data T y = MkT { foo :: [y] }

and `foo` is in scope.  Then GHC will automatically solve a constraint like

    HasField "foo" (T Int) b

by emitting a new wanted

    T alpha -> [alpha] ~# T Int -> b
matchHasField dflags short_cut clas tys
  = do { fam_inst_envs <- tcGetFamInstEnvs
       ; rdr_env       <- getGlobalRdrEnv
       ; case tys of
           -- We are matching HasField {k} x r a...
           [_k_ty, x_ty, r_ty, a_ty]
               -- x should be a literal string
             | Just x <- isStrLitTy x_ty
               -- r should be an applied type constructor
             , Just (tc, args) <- tcSplitTyConApp_maybe r_ty

The implicit HasField instance solver kicks in when the field name and type are known, and then it calculates the field and emits a new wanted, i.e. it simulates the functional dependency while producing the evidence. I guess this is more like if HasField type class were defined with associated type family, rather than with a fundep.

@simonpj
Copy link
Contributor

simonpj commented Feb 10, 2023

No. Special code solving HasField takes care of that.

Ah I see, thank you. The Note does not express it this way, but it's exactly asif we had implicitly generated (I1') above:

-- (I1')
instance (b ~ (Time -> a)) => HasField "unB" (Behaviour a) b

That is, in the implicitly-generated instance, apparently, the last parameter is always a type variable. So we never get any useful fundeps from it, so neglecting them is fine.

But then why don't we do write the user written instances like that too:

-- (I2')
instance (HasField x a b, c ~ (Behaviour b)
      => HasField (x :: k) (Behavior a) c where ...
  getField (Behaviour a) = Behaviour (getField @x a)

That would render the fundep essentially powerless, which consequences I don't understand. After all, it seems that the fundep on HasField is essentially powerless today, if you don't write any user instances.

Ah -- I suppose that today if you have [W] HasField f r alpha, [W] HasField f r beta then we will generate alpha ~ beta and perhaps that is useful.....

Returning to the original case, if you have [W] HasField alpha (Behaviour t) beta, from (I2) we'll emit [W] beta ~ Behaviour gamma. Then if later we discover beta ~ "unB" we'll end up with

  [W] HasField "unB" (Behaviour t) (Behaviour gamma)

which the built-in solver solves using (I1) yielding the perplexing result

  [W] (Behaviour gamma) ~ (Time -> delta)

This does not look good to me.

Maybe we should insist that user-defined instances also always have a type variable as the last parameter? (Or use a type family instead of the last parameter.)

@adamgundry
Copy link
Contributor

I agree that we can get strange behaviour if code uses both the implicitly-defined HasField "unB" (Behaviour a) b instance and the user-written HasField (x :: k) (Behavior a) c instance; that's precisely the original motivation for the restrictions this proposal asks to lift!

But in the motivating examples, we typically don't want the implicitly-defined instance, because unB isn't part of the public API. The problem is that we don't have a mechanism to suppress it.

Perhaps we could introduce such a mechanism. But the only options I see are having the semantics depend on module exports, which @goldfirere aptly called "very spooky action at great distances", or having a syntactic marker that a particular datatype/field shouldn't have HasField instances, which is addressed in the proposal's Alternatives:

We could imagine adding explicit annotations to particular types or fields to prevent HasField constraints being solved automatically for those cases, and relax the restrictions only when the annotations are present. However this requires new syntax (or a new modifier) and requires the author of the original datatype to add an annotation, which may prevent downstream users from adding useful HasField instances.

@phadej
Copy link
Contributor

phadej commented Feb 10, 2023

@adamgundry

The problem is that we don't have a mechanism to suppress it.

If the unB selector (even in NoFieldSelectors sense) isn't exported, the implicit solver won't solve for it.

@phadej
Copy link
Contributor

phadej commented Feb 10, 2023

which the built-in solver solves using (I1) yielding the perplexing result

I don't think the built-in solver will solve it. It should only ac if there are no matching user-defined instances.

I don't see how to construct that example using orphans, as you start from knowing I2 exists, so it's there already. And if beta ~ "unB" is discovered, the user instance I2 should match (again), and built-in solver isn't considered.

TL;DR built-in solver should be the fallback and user-defined instances should always take precedence.

@angerman
Copy link
Contributor

Given the ongoing discussion, and @adamgundry's preference for Option 2, I'll take @nomeata suggestion to change this to needs-revision.

@angerman angerman added Needs revision The proposal needs changes in response to shepherd or committee review feedback and removed Pending committee review The committee needs to evaluate the proposal and make a decision labels Feb 14, 2023
* Introduce severe warning when a potentially incoherent instance is used
* Clarify proposed change specification around overlapping instances
* Allow optimiser to assume canonicity for HasField
@adamgundry
Copy link
Contributor

Thanks everyone for your input. I've amended this proposal as follows:

  • Defining instances that violate the conditions now results in a warning; moreover this warning is an error by default. That is, users will have to explicitly opt in by downgrading the warning severity in order to define an instance that may conflict with a built-in instance. This reflects the fact that defining such instances is useful in some cases, but requires care to avoid problems arising from incoherence.
  • The proposed change specification is now hopefully clearer about the treatment of overlapping instances and functional dependencies.
  • The optimiser is no longer required to exercise caution around HasField constraints, avoiding the potential performance regressions. Instead, users defining potentially-incoherent instances are responsible for avoiding situations where optimisation might make semantic changes (much like with INCOHERENT pragmas).

I'd welcome further feedback.

@AntC2
Copy link
Contributor

AntC2 commented Feb 27, 2023

I'd welcome further feedback.

Thank you @adamgundry, very clear write up. (Which shows your design decisions.)

[implicit] instance[s] will not have the OVERLAPPABLE, OVERLAPPING or INCOHERENT flags set, regardless of language extensions in effect.

Yes I think that's a sensible design choice.

I agree the interaction between OVERLAP* of implicit vs explicit instances and FunDeps is ugly. I think this more reflects that those two extensions don't go together well in general, rather than anything specific to HasField. To take your example in 2.1:

    data T = MkT { f :: Int }  -- implicitly defines instance b ~ Int => HasField "f" T b

    instance HasField fld T () where
      getField _ = ()

I want to say the explicit instance is OVERLAPPABLE wrt the implicit instance -- because label fld is more general than "f". But banjaxed because resulting type () is more specific than the implicit instance's b [**]. Presumably I could write that explicit instance as:

instance {-# OVERLAPPABLE #-}  b ~ () => HasField fld T b where
  getField _ = ()

in which I do need explicit OVERLAPPABLE per your Example 2.

[**] I don't think this needs the 'Dysfunctional Instances' proposal that you link to. Merely: for the purposes of overlap-check, ignore FunDep target positions. (Yes this gets ugly if there's multiple FunDeps -- but that doesn't apply with HasField.) §2.2 says

The "implicit instance" introduced above is not legal for users to define because it violates the functional dependency on HasField.

That would be legal with UndecidableInstances. Don't we assume that's switched on for HasField instances even if not switched on in the module? That's one of the things that's 'magic'.

The compiler will not check consistency of functional dependencies between "implicit instances" and user-defined instances.

Hmm. That seems sad. From your examples given (though perhaps you have gnarlier examples in mind), I think the 'magic' needs four things that apply for both explicit and implicit instances of HasField [***]:

  • Assume UndecidableInstances applies for all instances of HasField.
  • For the purposes of overlap-checking, ignore the instance head's target position of the FunDep.
  • Require explicit OVERLAPPABLE or OVERLAPPING on the user-written instances. (Interpreting them to apply to the FunDep source positions of the instance head.)
  • Require instances overlap in strict substitution ordering -- that is, considering the FunDep source positions only.

IOW I'm anticipating FunDeps for HasField working the same as your possible future of a FieldType type family.

I'm thinking the -Wincoherent-hasfield-instances won't get triggered/won't need disabling -- but by all means retain it if it's already spec'd/coded ;-). I'm very reluctant to support INCOHERENT annotation on HasField instances.

[***] If it's any help, I've already amended Hugs to apply these rules for instances in general -- including the gnarly cases of multiple FunDeps ...

... and specifically for some very fancy overlap of instances over TRex records.

Addit: I should say the other key consideration for relaxing FunDep rules is that HasField's FunDep is Full in the Sulzmann et al jfp06 Definition 13 sense.

@simonpj
Copy link
Contributor

simonpj commented Feb 27, 2023

I said above

The proposal plans to lift four restrictions, but it is only supported by two examples. That's a bit thin on motivation. At least it'd be good to have an example supporting each restriction-lifting.

Would it be possible to address this?

@simonpj
Copy link
Contributor

simonpj commented Feb 27, 2023

`Several points.

  1. The proposal says

The "implicit instance" for a type T with a field f of type A is:

instance b ~ A => HasField "f" T b where
getField = f

But this directly contradicts the user manual entry for HasField. Which is right? (Answer: the proposal is; I just looked at GHC.Tc.Instance.Class.matchHasField.)
If the user manual is wrong, let's point that out in the proposal and raise a MR to fix the manual.

This matters, even aside from this proposal. Earlier I said

  • I don't think GHC actually implements fundeps for HasField just as if the implicit HasField instances were really there. I have just looked at doTopFundepImprovement and I see nothing special for HasField. Nor do I see anything in improveFromInstEnv. Those omissions look like bugs, don't they?

But actually, with the above implicit instance, these omissions make sense: given [W] HasField "f" T ty there is no "improvement" to do; we simply fire the instance declaration to simplify it to [W] ty ~ A. We should document this non-obvious point in Note [HasField] wherever that is in the compiler.


  1. Section 2.1 should explain what happens if you give an {-# OVERLAPPING #-} pragma to the user-define instance:
instance {-# OVERLAPPING #-} HasField fld T () where
  getField _ = ()

Then I think eg2 will be accepted.

Ditto what if you add {-# INCOHERENT #-}:

instance {-# INCOHERENT #-} HasField fld T () where
  getField _ = ()

Then I think eg1 will be accepted, via the implict instance. We should spell these things out, even though they are
consequences of saying "imagine that there is this implicit instance".


  1. The proposal says:

The "implicit instance" introduced above is not legal for users to define because it violates the functional dependency on HasField.

Please give the class for HasField so we can make sense of this comment. Also, the instance does satisfy
the Liberal Coverage Condition, so it would be OK as a user-defined instance if you said -XUndecidableInstances
(Or (better) if we had a per-instance undecidable-instance pragma, which we jolly well should.)


  1. The proposal says

The compiler will not check consistency of functional dependencies between "implicit instances" and user-defined instances

What does that mean? Give an example. I think it's something like this:

-- Class decl
class HasField x r a | x r -> a where ...

-- Implicit instance
instance b ~ A => HasField "f" T b where ...

-- User-defined instance
instance HasField "f" T () where ...

Do these contradict? No they do not; there will be consistency-check failure. Example without special purpose classes:

{-# LANGUAGE FunctionalDependencies, UndecidableInstances, DataKinds #-}
module Foo where

data T

class C x r a | x r -> a where {}

instance b ~ Int => C "f" T b where {}

instance C "f" T () where {}

This module is accepted today.

So, what does the sentence mean? I think it's simply the case that there can be no consistency failure; i.e. you can just delete the sentence.


  1. Earlier I wrote, using this code, based on the example in the proposal:
newtype Behavior a = Behavior { unB :: Time -> a }
  -- I have picked a particular representation for behaviours

-- (I1) Implicitly-generated instance for HasField
-- generated from the `newtype` decl
instance (b ~ (Time -> a)) => HasField "unB" (Behaviour a) b where ...

-- (I2) Explict user instance:
instance HasField x a b
      => HasField (x :: k) (Behavior a) (Behavior b) where ...
  getField (Behaviour a) = Behaviour (getField @x a)

If you have [W] HasField alpha (Behaviour t) beta, from the fundeps in (I2) we'll emit [W] beta ~ Behaviour gamma.
Let's say we unify beta, so we now have

  [W] HasField alpha (Behaviour t) (Behaviour gamma)

That matches (I2), but unifies with (I1) . So according to the user manual, no instance will fire, because (I1) is not marked as incoherent.

Now, if later we discover alpha ~ "wombat" we'll end up with

  [W] HasField "wombat" (Behaviour t) (Behaviour gamma)

Now (I1) no longer unifies, and we can unambiguously solve with (I2). That's good: a wombat field on t is lited to a wombat field on Behaviour t.

If alpha ~ "unB" things are murkier, becuase unB is a field of Behaviour itself. We get

  [W] HasField "unB" (Behaviour t) (Behaviour gamma)

Now both (I1) and (I2) match, so (absent overlapping instances) we won't pick either, which is probably a good thing.
(If you have a global -XOverlappingInstances, then (I2) may fire.)

None of this was obvious to me. I strongly urge that you work this through in the proposal to demonstrate the consequences.


  1. The proposal speaks of incoherence:

More precisely, a new "severe" warning flag -Wincoherent-hasfield-instances is introduced. "Severe" means that it is treated as an error by default (as if -Werror=incoherent-hasfield-instances were specified); see proposal #571 for a proposal to introduce this category more generally. When a user defines an instance that would violate the restrictions above, a warning will be emitted controlled by the -Wincoherent-hasfield-instances flag.

But is there anything incoherent? We simply behave as if the implicit instance existed, with GHC's existing rules for selecting instances, don't we?

Or are you suggesting that the instance lookup rules for HasField don't obey these rules, and hence may give rise to incoherence?

@AntC2
Copy link
Contributor

AntC2 commented Feb 28, 2023

@simonpj (I think you're being harsh: this stuff is a tangled mess inside GHC, whether or not we try to thread HasField through it.)

  1. Section 2.1 should explain what happens if you give an {-# OVERLAPPING #-} pragma to the user-define instance:
instance {-# OVERLAPPING #-} HasField fld T () where
  getField _ = ()

Then I think eg2 will be accepted.

I'd hope not (or at least I hope no attempts at usage sites will be accepted): because the instance heads overlap in no substitution ordering. OVERLAPPING means 'more specific than' vs OVERLAPPABLE means 'less specific than'(?)

Ditto what if you add {-# INCOHERENT #-}:

Then I'd expect the instance decls to get accepted, but usage sites to be problematic.

(As I said above, this is really nothing to do with HasField. Overlaps and FunDeps don't play well together generally.)

  1. ... would be OK as a user-defined instance if you said -XUndecidableInstances (Or (better) if we had a per-instance undecidable-instance pragma, which we jolly well should.)

We should treat all HasField instances (explicit or implicit) as if UndecidableInstances.

... None of this was obvious to me. I strongly urge that you work this through in the proposal to demonstrate the consequences.

Hmm. I'd say all the non-obvious behaviour is already there with existing FunDeps/overlap. I'm not sure HasField should get the 'blame' for it -- or carry the burden of fixing it. I'd rather HasField be allowed to overlook the existing (mostly undocumented) behaviour; show a clear path for sensible behaviour; then learn from that to tame GHC-at-large. (I believe I've arrived at the right behaviour in Hugs, but I wasn't at all concerned for backwards compatibility.)

  1. The proposal speaks of incoherence:

...

But is there anything incoherent? We simply behave as if the implicit instance existed, with GHC's existing rules for selecting instances, don't we?

Or are you suggesting that the instance lookup rules for HasField don't obey these rules, and hence may give rise to incoherence?

You've hinted HasField instances be Undecidable/overlaps Incoherent whether or not that's explicitly set in the module. Is an intermediate-level user expected to understand how HasField lookup goes despite not switching those on? And 'just get right' their explicit instances and usage sites?

@AntC2
Copy link
Contributor

AntC2 commented Mar 2, 2023

@simonpj The proposal plans to lift four restrictions, but it is only supported by two examples. That's a bit thin on motivation. At least it'd be good to have an example supporting each restriction-lifting.

IIRC the motivation was "a bit thin" for those restrictions in the first place. They're there out of an abundance of caution. We now have -XNoFieldSelectors (combined with Import/Export controls ) that can mean there's no field selectors at all for some datatypes, but the usual (implicit) ones for other datatypes.

  • If I get the drift of this proposal (but I agree it's not very explicit), the approach is to do away with syntactically-defined restrictions, and prefer an approach based on overall coherence [**] well-behavedness of instances for HasField.
  • 'HasField` has a particularly tractable definition for checking well-behavedness of instances/overlaps: there's a single FunDep; it's Full in the Sulzmann et al jfp06 Definition 13 sense.
  • OTOH HasField is strange in that it has a phantom type param (the field name) that drives the FunDep.

[**] I'd better avoid "coherence" as a term: that has already a specific sense wrt instances & overlaps.

So to take those four restrictions:

  • HasField _ r _ where r is a variable;
  • instance {-# OVERLAPPABLE #-} HasField "dataConstr" r String where ... as a diagnostic stub during development.
    (The idea being there should be diagnostic instance HasField "dataConstr" T String ... for all types T. But yes I'm struggling to think of a real example.)
  • HasField _ (T ...) _ if T is a data family (because it might have fields introduced later, using data instance declarations);
instance {-# OVERLAPPABLE #-} (HasField "foreName" (T ...) String, HasField "lastName" (T ...) String) =>
       HasField "fullName" (T ...) String where ...
  • All data instances of (T ...) are expected to include name fields; "fullName" is a virtual field that concats them.
  • HasField x (T ...) _ if x is a variable and T has any fields at all (but this instance is permitted if T has no fields);
  • Example 2 lifting the datatype into Behaviour (T ...) (and example 0? in sect 2.1) in the proposal. Again a catch-all/diagnostic stub during development.
  • HasField "foo" (T ...) _ if T has a field foo (but this instance is permitted if it does not).
  • instance {-# OVERLAPPING #-} HasField "foo" (T String Bool) String where ... -- as compared to the implicit instance HasField "foo" (T a b) a where ... -- to produce a specially formatted result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs revision The proposal needs changes in response to shepherd or committee review feedback
Development

Successfully merging this pull request may close these issues.

None yet

10 participants