-
Notifications
You must be signed in to change notification settings - Fork 562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Constraints and overlapping instances #3596
Comments
If by "should" you mean in an ideal world, then maybe 😄 But as things are, it is right that they are not allowed - instance resolutions works by looking at the instead head (type part) only, and then only afterward are constraints checked. So |
It's perhaps worth noting that GHC also rejects this as overlapping, although you only get an error when you attempt to use the instance: {-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}
module Main where
class Foo a where
foo :: a -> Bool
instance (Eq a) => Foo a where
foo _ = True
data Bar = Bar
instance Foo Bar where
foo _ = False
main = putStrLn $ show $ foo Bar produces
I'm not particularly familiar with the tradeoffs here, but I think I can at least say that having contexts affect resolution could make instance resolution quite a bit more complicated to follow, as well as complicating the implementation quite a bit, so I'd suggest not implementing this. |
GHC is not as strict about orphan instances as PureScript is, and orphan instances would make my proposed behavior much more non-local, so I wouldn't suggest this for GHC, but maybe it makes sense for PureScript? I'm not very familiar with the tradeoffs here either, although it's clear that this would change constraint solving from being a relatively deterministic, constraint-to-instance-to-more-constraints process to something that might involve some backtracking (but not in a way that slows down compiling any existing code, due to PureScript's current restrictions). I'm (perhaps naively) optimistic about the code for that being pretty easy to follow, and I'd be happy to risk the implementation time trying it to see if you like it—but if the idea is a no-go at the conceptual level that would be silly. To be clear, I don't think I'm suggesting the compiler should do anything more complicated than handling the specific case where for a pair of instances, one instance's head is an unknown but with constraints, and the other instance's head is known; and handling that case entails trying to solve the constraints for the known instance head. (The code might get more complicated than that, because let's not do the work to solve those constraints twice and all—but conceptually, that's the request.) If the value of this feature is unclear and practical use cases would swing the balance, I can offer more words about my real-world motivations. |
I think that proposal to restrict it to work "guard style" for instance chains is intended as a way of keeping things local. I don't know if anyone has the consequences of doing this kind of thing really, it seems like there should be something out there though, as it is an "obvious" thing to want. |
I’ve thought about this a bit more and I’m pretty certain that I don’t want us to implement this, for a few reasons:
|
I’ve realised the example in my point 4 above is not a very good one, because it still doesn’t make sense if we did add this feature; of course we’d need to require that no types to be both Eq and Show for that to work, which is untenable. I came up with another issue with this though: it would mean that adding an instance has to be treated as a breaking change. Imagine version 1.0.0 of |
Ah. That, I concede, could be a hill worth dying on. A somewhat weaker form of this objection also applies to #3120, does it not? If you use overlapping instances in an instance chain, a dependency could add an instance which causes code in your module or a downstream module to select an earlier instance in the chain. This is weaker in two senses: first, it only changes which instance is selected instead of failing to compile (is that truly better though?); second, one doesn't accidentally write an instance chain—it could just be a strong best practice that if you write overlapping instances in your instance chain, you had better be truly indifferent about which is selected. What are your thoughts on that? I could rechannel my efforts into implementing #3120 and get almost as good a solution for my use case, but if that's also a no-go, I have a bigger problem. |
Yes, it does also apply to #3120, although in the case of an instance chain I think it's slightly less of an issue because you're explicitly casing on whether a particular instance exists, so presumably you want the behaviour to change if an upstream data type gets an instance it didn't previously have. That said, I think my objections 2, 3, and 5 above also apply to #3120, so I'm not particularly keen on doing that either, sorry.
I think the whole point of instance chains is to allow you to express things which you previously would have had to resort to overlapping instances to do; if your instances don't overlap then you shouldn't need chains at all. |
I think guards on instance chains would need a way to scope the backtracking so that you could tell the typechecker when to commit to an instance, otherwise you'll always have to backtrack which will definitely lead to terrible errors and poor performance. That's why the instance chain paper introduces a separate syntactic form for guards. |
Also, I should say that I am also not keen on this change. I really don't like the idea that I could create a type and it's automatically made instances of things without me opting into it. GHC has default signatures which provide implementations as long as you implement some other class, but you still have to opt into the instance. |
The paper you're talking about is Instance Chains: Type Class Programming Without Overlapping Instances by J. Garrett Morris and Mark P. Jones right? Because the impression I had from that paper is that there wasn't a separate syntactic form for guards, they just decided to use a different syntax for instance contexts, and also that instance search can backtrack more generally than just in one particular instance chain:
|
I very well could be misremembering. I'd compare it to writing a parser that always backtracks at every alternative regardless of whether any tokens were consumed. There isn't anything "wrong" with this, and you'll still get a working parser, but you're going to get very bad errors without a lot of work, and you're going to probably do far too much work before failing. Or another way to look at it, if you think of instances chains as type functions, then the head is like a pattern match and the context is like the branch expression. If you backtracked in the context, that would be like backtracking to another pattern if you threw any exception anywhere. We have a separate syntactic construct for value guards, so we should probably have it for type-level guards. |
Okay, let's leave these specific proposals to the side for a moment. There is sometimes a need to write recursive functions over an open set of possibly non-cooperating data types. A good example is So what should PureScript do, if anything, to meet this need? Logically, the only options are:
Let's talk about 1. It's obviously the most attractive from the perspective of a language maintainer, and as someone who occasionally has to tell petitioners that the thing they want isn't a good fit for a language I help to maintain, I promise I'm sympathetic to that position. But these are the options of a programmer trying to write a generic function like
I could be missing something, but I think that's it. I don't think this is a good place to be, and I doubt I'm alone in that! I say that not to be critical of PureScript's evolution thus far, and obviously I'm interested in helping and not just complaining. But I hope to start with a mutual recognition that this is a problem which deserves a better solution. So option number 2: tweaking type classes, by adding some limited backtracking to instance resolution, or adding guards to instance chains, or something else. If type classes are the answer to supporting recursive functions over open sets of possibly non-cooperating data types, and instances of classes are how you add new behavior to such functions, then adding an instance will be a potentially breaking change no matter how the feature works. It's unavoidable—if you write a function with behavior that's extensible through the participation of other modules but doesn't require that participation, then modules can change that function's behavior in a potentially breaking way. The question is how and whether this can be limited so that people don't live in fear every time they make something Maybe—and bear with me, this is a little half-baked as a proposal—there are actually two kinds of type classes. Type classes like I would venture that type classes of the first kind ( I know this still leaves a lot of your very good points unresolved—about ability of users to follow instance resolution, about the time complexity of backtracking, especially the FUD about the consequences of trying something new—but I can't even begin to address those until there's some agreement on the goal. Hopefully you agree that PureScript should have a way to write recursive functions over open etc. etc., and maybe you agree that type classes, as the closest thing to a solution we have so far, could in principle be adapted in some way to get all the way there. But that brings me to option 3: a language feature in some different area. Look, if some solution to the expression problem that isn't type classes is waiting on the roadmap to be implemented... well, I would say ‘great’, but I'm not sure PureScript needs two different approaches to the expression problem. I don't think this would happen. I don't think you think this would happen. I only included it in my list of options so as not to create a false dichotomy, since it's technically possible and I could be wrong about what you're planning! But this comment is too long already so I won't belabor it. To conclude, I think because PureScript, relative to GHC, makes fewer escape hatches available to its users for getting around the rules, someone trying to write hashing, pretty-printing, serializing, arbitrary-value-generating, and other generally-applicable but special-caseable recursive functions in PureScript has to contend with some severe limitations, even with instance chains in play. I think this is a sufficiently important use case to merit a little bit of rethinking of how some parts of the language—probably type classes—work. The recent addition of instance chains gives me hope that another incremental and well-motivated addition to the language might still be on the table. If I can get you on board with just those ideas, I'm willing to keep pitching different ways to solve the problem for as long as you're willing to read them. Thanks for your engagement so far, and let me know what you think. |
I think we could do with a bit more discussion on what you can do about this problem with the language as it is today, as I have a couple of suggestions. Let's suppose that I am depending on both One way I think you can get around the issue by defining a new class locally to your project, and you provide the missing instances via this class and instance chains: class Hashable' a where
-- this class will have the same methods as the class I care about
hash' :: a -> Hash a
instance hashableComplex :: Hashable a => Hashable' (Complex a) where
hash' = genericHash
-- If there are other third-party types we want to give instances to,
-- we can provide those instances here
else instance hashableSomethingElse :: Hashable' SomethingElse where
hash' = genericHash
else instance alreadyHashable :: Hashable a => Hashable' a where
hash' = hash Now we can use a newtype to allow using the above class together with things which use a standard newtype Hashable' a = Hashable' a
instance Hashable' a => Hashable (Hashable' a) where
hash (Hashable' a) = hash' a This is admittedly a bit more noisy, but it has a couple of properties which I think are really nice. Firstly, there's less non-local reasoning required, especially during upgrades of third-party libraries. If I upgrade to a later version of |
Oh actually, that doesn't really work does it, because if I want to hash, say, a
|
Yup, that pattern works great for defining a new function |
I'm still thinking through your comment but I'll just post a couple more immediate thoughts for now:
I'm not sure this is the best example, because Hashable does have a mathematical property which is very important for the correct operation of data structures like HashMap, that
As it happens I think my reaction to someone automatically making my custom Complex number type serializable would be more along the lines of 'uh, wait ... which one?', because there will often be a number of options: 1+2i could be represented in JSON as any of One scenario which I think would be somewhat unpleasant is if one of these default catch-all instances ended up being broken. Imagine Alice maintains a complex number library, and Bob has written some other library which provides a class with a catch-all instance, but neither of them are aware of each other's libraries. Now suppose this catch-all instance applies to Alice's complex number type, but that instance ends up being broken. This puts pressure on either (or both) of Alice and Bob to add a dependency on the other's library and provide a specific instance, even if neither of the libraries really have anything to do with each other. |
Fair, having a ‘mathematical property’ probably isn't the best way to describe the distinction. It's possible there is no meaningful distinction, but I feel like there is one and I'm just struggling to characterize it rigorously. It probably has more to do with being part of an interface to a system you can largely treat as a black box, versus being an input to a combinator that you do want to be able to reason about. So yes,
Granted. Please consider my comments to only apply to the case where the consumer only cares that
Two responses: first, with the current language, there's already pressure on either (or both) of Alice and Bob to take the other as a dependency if those libraries are to work together at all; so that specific complaint hardly makes the hypothetical a worse state of affairs. Second, it's pretty obviously a bug in Bob's library if his catch-all instance doesn't satisfy the laws of his own class. If his catch-all is lawful, then I don't know in what sense a consumer can consider it to be broken. It would be nice (ignoring the downsides) if a third party consumer could define an orphan instance specifying a better implementation of Bob's class for Alice's data (one that is faster or produces fewer hash collisions are the examples I have in mind, not one that satisfies more laws), but of course orphans bring their own troubles and I'm not campaigning for that—unless, perhaps, the solution that comes out of the rest of this conversation happens to invalidate the objections to orphans as well. |
I haven't made up my mind yet 😄
I think it does give us a worse state of affairs: previously, attempting to use Alice's data with Bob's class, you would end up with a failure at compile time. Now, you end up with a failure at runtime. I think this does increase the pressure on one or both of Alice and Bob because runtime failures are rightly considered to be worse than compile time failures. Then again I suppose it is fairly clearly a bug in Bob's library. |
Honestly, I think I'm kind of grasping at straws here to keep the conversation going; my responses to this thread are mostly guided by my feelings that the language and community is best served by reducing the number of type system features we're putting in (which I've expanded more on in a Discourse thread "Thoughts on future additions of type-level features"). In particular I'm not keen on adding type system features which haven't already been discussed in PL academia, because I think that way we run a much larger risk of implementing something which turns out to be broken; see for example the GeneralizedNewtypeDeriving extension in GHC Haskell. To go back to your original motivation:
I think the viewpoint I'm converging on is that this is just a really difficult problem. Are you aware of any features provided by other languages which can satisfy this need better than either of your 1a and 1b approaches above? Perhaps Scala's implicits? I don't know much about them but I think they suffer from a lot of the same issues as orphans. Maybe a good approach in this situation is to use regular functions instead of type classes (cf "Scrap your typeclasses"); I think these problems would be a fair bit easier to deal with if you had easy access to value-level versions of any instances which themselves need access to instances of inner types. For example, if I could easily get my hands on functions such as |
You might find this interesting too: https://github.com/reactormonk/modules |
One thing I'm not entirely clear on in this discussion is what the problem with newtypes is, in the presence of |
Let's say I've written an algorithm that tosses around a lot of records. I love records in PureScript, partly because their types don't need to be pre-declared, so you can work with a lot of variations on the same type without having to fill your code with similar declarations (unless it makes sense to). For example, I might have a function that accepts {
a :: Boolean,
b :: Int,
c :: Array (Either (NonEmptyArray String) String)
} and returns {
a :: Boolean,
b :: Maybe Int,
c :: Array (Either (NonEmptyArray String) (Tuple String Boolean))
} . Having written that function, I now want to lift it over Now I'm sad, because here are the two options I think I have:
\{ a, b, c } -> { a, b, c: map (left unwrap) c } . And of course this logic is different for many of the subtly different anonymous record types my algorithm is using. This adds clutter and brittleness, since it's now more code that needs to change if the details of those record types change, even in ways that don't involve the
I don't see how (If the intended resolution to this dilemma is to stop using records in such a free and easy way... well, lightweight anonymous records was one of the biggest attractions of PureScript for my current project, so I really hope that isn't it.) |
I'm wondering about a compromise solution. What if we allow that there be one default instance/instance of last resort, defined in the same module as the typeclass, that is tried only when there would fail to be another instance available. Would this be enough to solve the practical problems we've seen? Would this still work nicely with the rules of the existing typeclass system? I think there's potential in a simple idea like this. I think it does solve a large number of the common problems with providing instances for types that you do not control. Most of the time, like in the case of hashable, you can write some kind of instance for the typeclass (possibly depending on another typeclass) – it might not be the perfect instance, but it's better than nothing, and there's value in having it autoderived, for convenience and to ease over compatibility concerns. Note that, since we don't have instance guards, this instance would only be selected when there are no matching instance heads, so I don't think coherence would be a worry. And it wouldn't introduce any functionality like instance guards. And actually, it should be relatively simple to implement, since we already have instance chains. It would act like the last item of an instance chain. (But in a global manner, not limited to a single instance chain as it currently stands.) Does this sound reasonable? I know that this kind of feature has no precedent in Haskell, but I think it could work. What say you? |
Yeah, Scala's implicits score full marks on flexibility and get a fat zero on keeping dependencies from breaking your code and being easy for people to understand when they get buggy. I loved and hated them when Scala was my primary professional language for a few years. I think you're right that the problem is difficult but part of my point is that PureScript's design constraints make it especially difficult here, or maybe just especially necessary. In any language that doesn't enforce referential transparency, this kind of thing gets done with a run-time registration pattern, possibly with the aid of a distinct reflection system (Java, .NET), or possibly by extending a prototype (JS) or run-time representation of a base class (Python, Ruby). Of the languages that remain, Haskell permits orphan instances and limited overlapping of instances, and also has things like Template Haskell. Agda's instance arguments don't permit overlap by default, but do allow local instances and backtracking through dependency instance arguments. I'm very unfamiliar with Coq, but from its wiki, it appears to allow multiple instances in cases where Haskell does not. Among languages I know of, PureScript stands alone in the quadrant of ‘referential transparency at all costs’ and ‘instance resolution must always be simple and safe’, and it's here that the problem lives.
In other words, the combinator pattern. I'll gladly put up with building complex combinator values in some cases. Serialization, for instance; if I only need to communicate one big ADT across the wire, building an instance for that type with combinators is a small price to pay. I'm a lot less happy if I need to build a unique combinator assembly for every different, one-off, complex data type I want to put into a Without having tried it, that looks awesome. Its implementation, I'd like to note, uses overlapping instances. (And type families, and TH of course.) So that implementation wouldn't fly here, but maybe something like that interface, built into the language directly, would. It would certainly make me happier. And on that note, have you seen the Cochis paper? Here is a paper from PL academia which formalizes a scheme for resolving implicit values (or type class instances) that supports locally scoped values/instances like the modules library, maintains coherence, and is deterministic. I believe in practical terms, the upshot of this is that you would declare or import instances locally, and when resolving constraints, the matching value in the nearest lexical scope wins. (I'm thinking that existing global type class instances would be treated as if imported into an outermost global scope, and thus would still need to be forbidden from overlapping.) Integrating this into PureScript would entail some new syntax and design decisions (not to mention probably a lot of work under the hood), but it would be backed by research, it wouldn't require anyone to divide type classes into two categories, and I'm pretty sure it would result in a design where adding instances in dependencies still can't change the behavior of downstream code. I'd make a more fleshed-out proposal if you think this might have legs. |
It's not clear to me how the original proposal (or @MonoidMusician's suggestion) would fix the issue in the given example, since |
I'm assuming the default instance would be implemented in terms of |
Additionally, #3351 would make the burden of conversions quite small. Small enough that I personally would opt for newtypes just so I didn't have to accept a poor default implementation. |
Heh, fair. The answer is, if my original proposal were to be implemented, I'd go over to (I don't know about @MonoidMusician, but when I'm talking about It's a reasonable counter to ask why I don't just go over to It's an imperfect solution, though. If something like reactormonk/modules or Cochis ends up being more palatable to PS devs, that's great news to me. I just opened by asking for what I thought would be the least disruptive addition to the language that would offer some way forward. |
Is there something particularly insufficient about |
No, |
I think there are two separate discussions going on here:
SpecialisationThis issue started with an example of specialisation, which is currently not allowed in both PureScript and Haskell. For module Data.Hashable where
class Eq a <= Hashable a where
hash :: a -> Hash
instance Generic a => Hashable a where
hash x = ...
module MyApp where
data T = ...
-- Should be perfectly fine because `T` is more special then `forall a. Generic a`
instance Hashable T where
hash T = ... Let me refer to Rust. It's trait system is greatly similar to Haskell's type classes. They studied specialisation at length and implemented it in language. The details are described in a request for comments. So with a good definition of specialisation, it should not affect coherence at all! OrphansThe problem of instantiating a non-local class (or trait) for a non-local type is really a hard one. These orphan rules are also studied at great length by the Rust community, but they did not come up with an answer yet. Specialisation though, is thought as a way to soften the pain. See this special repository for the orphan problem for a quick overview. |
I believe Rust monomorphizes all functions when they're compiled, correct? This changes some of the trade-offs around specialization; in particular, see this playground. Since Rust makes two copies of Also, there may be a third category of thing being danced around: overriding, which is when an instance can be declared locally that is meant to replace another instance made available globally. This feature is present in Scala, and in the reactormonk/modules library linked above and the Cochis paper, and when available it can also be used to soften the absence of either of the other features you describe. |
Actually, hey @natefaubion, I have a question about As I've seen it, the usual argument for global instance uniqueness involves a |
I don't think so because HashSet wouldn't have a |
Rust indeed monomorphises everything. But I think that is not important here. The decision which implementation should be used (or which dictionary should be passed) is still made by the caller (i.e. the calls to module Upstream where
class Named a where
getName :: Proxy a -> String
instance {-# OVERLAPPABLE #-} Named a where
getName _ = "unnamed"
callGetName :: forall a. Named a => a -> String
callGetName _ = getName (Proxy :: Proxy a)
module Local where
import Upstream
data This = This
instance {-# OVERLAPPING #-} Named This where
getName _ = "This"
t0 = ( callGetName (0 :: Int), callGetName This )
-- > ("unnamed","This") Thing is, the compiler cannot reduce the context In PureScript you can have the same behaviour only by defining both instances in the same module and chain them together. The overlapping instance cannot be defined in another module, as one can do in Haskell or Rust. That is the pain point. I think specialisation of instances should work without using chains. |
To this point:
I don't think NonEmptyArray can have a Generic instance because Array doesn't have a Generic instance, and it isn't clear what that would even look like. I know it sounds like I'm just being pendantic, but I want to make sure we are on firm ground as to what the exact problem and solution space is. In the example you gave, the only robust solution is still to use newtyping. Note, I'm not saying specialization isn't useful, and I think we should keep discussing it and what it would mean for PureScript specifically (not Rust). |
Does that matter? This code compiles fine for me: import Data.Generic.Rep (class Generic)
newtype NonEmptyArray a = NonEmptyArray (Array a)
derive instance nonEmptyArrayGeneric :: Generic (NonEmptyArray a) _ |
It matters if you try to do anything with it, and if having the instance can let you violate the constraints of |
Yeah, Generic only “unwraps” one layer of a structure, so that’s fine. However, NonEmptyArray should not have a Generic instance, because providing a Generic instance effectively makes the constructors public, which breaks the guarantees which make NonEmptyArray useful. Map doesn’t have a Generic instance for the same reason. |
Cases like this are what the nominal role is for: if the structure of a value depends on more than the representation of a type argument, for example if it depends on a particular instance that that type argument has. I didn’t understand this until recently and I have been meaning to comment to that effect in the Coercible PR. So |
I was under the impression (perhaps incorrectly) that something like reactormonk/modules might be implementable in userland in PS; that was why I brought it up. I haven’t investigated that in any depth though. It’s probably worth looking into if you are interested in this problem. |
Another point in this design space is class morphisms which sounds like it would let you define a morphism from |
Given that there is no
Eq Bar
instance, should these two instances be allowed? PureScript 0.12.4 complains of overlapping instances.I know instance chains would be one workaround for this minimized example, but suppose
Foo
andequatableFoo
are put in one file, andBar
andbarFoo
in another—then instance chains wouldn't help, but provided that noEq Bar
instance is defined inBar
's file, I think the compiler should still be able to determine thatbarFoo
doesn't overlap withequatableFoo
, right?(I didn't find an exact duplicate for what I want, but #3120 seems related—that enhancement asks for constraints to be treated as guards in instance chains, with first-to-match semantics; and this asks for constraints to be treated as sort of like guards in overlapping instance sets, with exactly-one-match semantics.)
The text was updated successfully, but these errors were encountered: