ISet Should Be Partially-Applied on its Order #671
Comments
|
Sounds good to me, but I'd like to hear more about this fancy merge
|
|
I'll see about writing it up more fully. The essence of it is lazily amortizing the reordering required to make the conventional algorithms work. The amortization can be performed in such a way that the complexity addition to subsequent operations decays logarithmically, thus avoiding compounding complexity with subsequent unions. I suspect it is possible to play this trick with intersect as well, but I haven't given it any thought. My only concern at present is the fact that the algorithm I have in mind requires an enormous amount of thunking, which is likely to make baseline constant-factor performance dramatically slower on the JVM. An evil fast-path would be to use |
|
This would be an extremely bad idea. I am begging, seriously, let us not do these terrible, costly things that provide zero benefit. I have work to do now. |
|
Zero cost with several significant benefits (ie not being completely broken On Thursday, March 13, 2014, tonymorris notifications@github.com wrote:
|
|
Yeah sure. This kind of nonsense does not belong in a useful project. Keep it out please. |
|
Benefits:
Downsides:
And I don't know about the "not like Haskell" downside: Haskell could even do this, but it'd be more obviously bad. |
|
It's not obviously broken now. You lose data and you know now why. With my If you don't use local typeclasses, this doesn't affect you. The only cases On Thursday, March 13, 2014, Brian McKenna notifications@github.com wrote:
|
|
@djspiewak making broken things (e.g. local type-classes) only seem less broken (without actually changing the brokeness) while sacrificing sensible things (e.g. the universally quantified constraints on functions) is not a goal. |
|
@djspiewak I just want to leave this by pointing out that it does affect me: scala> val example1 = ISet.empty[Int => String].isEmpty
example1: Boolean = true
scala> val example2 = ISet.singleton[Int => String](_.toString).isEmpty
example2: Boolean = false
scala> val example3 = ISet.singleton[Int => String](_.toString).deleteMin
example3: scalaz.ISet[Int => String] = Tip() |
|
It is possible to debunk this nonsense with a whisper of breath. It is annoying to be compelled to expend any further effort on it. I refuse to do so. |
|
I'm in agreement that this is a bad idea. The basic point for me is that it would be impossible to construct an empty set of type Forall[ISet], which makes no sense. The existential type class thing works alright until you need to do something that is not completely first-order. Basically, if you are using more than one instance of a type class per type, then you're doing something very wrong. |
|
I also agree that this is a bad idea. It's regrettable that we don't have genuine newtypes in scala (I still can't reason about all the occasions under which value classes cause boxing), but I don't believe that we should ever make a change that supports incoherent type classes. If someone wants to do something that awful, they always have the option to maintain their own fork for their evil purposes; I think that the chances of such a fork becoming dominant are infinitesimal. |
|
Apparently a simple matter of correctness and robust API design in the face In any case, it's done, and clearly ISet will remain as it is. On Thursday, March 13, 2014, Kris Nuttycombe notifications@github.com
|
|
Yes clearly. Let's move on now. |
|
I didn't realize this was such a contentious issue. Could someone explain to me "like I'm five", why this is a bad idea? I am genuinely asking. Here are the points I see that have been made:
Also, for the record, I think @djspiewak was posting this in good faith, and you guys were pretty rude to him. |
|
This is really unfair and unreasonable. |
|
Yes, absolutely it will silently lose data. If you e.g. union with an Again, if you're using more than one ordering per type, you are already off the rails. |
|
@runarorama that example actually sounds okay to me, because the resulting ISet is well-formed with respect to the Order that was passed to union. You may have chosen a silly Order that considers all elements equal, but that seems okay to me. I was more asking about a use case where say I merge two ISet[A] where one was built with x: Order[A] and the other was built with x.reverseOrder, and I then union them with x as the Order. Will the resulting ISet be silently ill-formed or missing data, with respect to x, the ordering passed to union? I don't think I agree that it is necessarily a problem having more than one Order per type. I may be dynamically generating Order[A] values, storing them in a Map[String, Order[A]], and letting users pick an Order[A] by name at runtime. (Kind of like what we were doing with assembling Monoid/Reducer based computations at runtime.) Clearly this sort of thing is sometimes useful. It seems the issue is not with having multiple instances per se, it's that we sometimes would like to enforce or assume that there is a single instance, for operations like union, or for other reasons (like if we don't want to have to specify an Order for ISet.empty). Also, not that I endorse doing this, but you can enforce the one-instance rule by tagging your instances, and using the same universal quantification trick as ST, so something like: newtype Tag r a = Tag a newtype Order a = Order (a -> a -> Ordering) use :: Order a -> (forall t . Tag t (Order a) -> Tag t r) -> r |
|
"I was more asking about a use case where say I merge two In that case, you should be fine. But, you should have two different types for those orderings. In fact, perhaps "Clearly this sort of thing is sometimes useful." It's not all that clear to me. But the larger question here isn't whether this or that thing is useful. It is whether we should try to correct for incoherence introduced by the user. No, we should not. We should absolutely assume that the user is moral and good, and if they try to pass |
|
It's very important to note that, because Scala doesn't implement typeclasses but rather implicit dictionaries, it is possible to get into a bad situation with |
|
I know why Furthermore, what difference does it make if I supply the I thought about this and came up with the following:
|
|
The first criterion is trivial, while the second criterion is obvious when you consider that equality cannot be generally defined on |
|
@djspiewak I don't see how that's possible. What if I do Or did you mean to say that the ordering of the set's elements is not part of its identity? Because I don't buy that assumption - if |
|
@pchiusano only one instance per type-class per type gives type-class coherency, also known as confluence or instance consistency. It's an important property for reasoning and sometimes performance. There's some papers which quickly describe why we want this property - but nothing very in-depth. As to why not to do this: {-# LANGUAGE ExistentialTypes #-}
data Set a = Ord a => Tip | Bin (Set a) a (Set a)Well, it's just negatively useful. We now put an artificial constraint on our data type. Now all of our functions have more constraints - I want the minimum amount of constraints, since that gives me the most powerful parametricity. I can go on, but I think this demonstrates that this is a bad idea! |
|
No, the burden of proof is the other way. What is the actual argument for Let's try to keep this discussion at the level of principles. Rather than On Fri, Mar 14, 2014 at 9:42 AM, Paul Chiusano notifications@github.comwrote:
|
|
|
|
@runarorama I gave a long argument in the OP why the current situation is problematic, and I showed later on how users can get into that state without even violating typeclass coherence. The only strong counter-argument I've heard to date is the |
|
"The problem with the current ISet implementation is that it takes an Order[A] with every operation." This is not a problem in and of itself. "This is entirely a non-issue if we can make the assumption that no one is ever, ever using more than one Order instance for the same type." Exactly. Which is why it's not a problem in and of itself. It only becomes an issue if there are incoherent type class instances in play. In which case, any attempt to correct for that is as misguided as an attempt to correct for |
Explicitly pass two different orders. No typeclasses involved. |
|
@djspiewak explicitly passing something violates coherency. |
|
"Explicitly pass two different orders. No typeclasses involved." Right, so if the user chooses to do that, that is their choice. The law should not try to stop you from doing something stupid with your own life. |
|
At the end of the day, everything in Scalaz comes implicitly with the following warning: Do not expect this to work in a lawful and sane way in the face of |
For the love of god, will someone please link me a definition? This isn't twitter; we can actually cite things. |
|
@puffnfresh Thank you for your response. But it's not 'more constraints across the board'. We accept more constraints on @runarorama I don't see how this decision (of where to bind constraints) can be resolved by general principle. It feels like a question of what factoring makes the most sense for your program. For instance, one sometimes uses a For the record, I do not have strong feelings one way or another, nor do I have a strong argument for one way or the other for this particular case. |
|
"I think we all agree t hat mixing multiple instances of Order when working with sets is bad, so we can stop talking about that." That is the whole issue here: "This is entirely a non-issue if we can make the assumption that no one is ever, ever using more than one Order instance for the same type." So I agree, let's stop talking about this. |
|
@djspiewak Functional Pearl: Implicit Configurations quickly defines coherence:
In Scala, we use implicit parameters for type-classes, but we have to leave that as an implementation detail in order to have hopes of guaranteeing the above. |
|
@runarorama you still did not respond to my point about where constraints 'should' be bound. Can you please do that, otherwise we are just talking past each other. |
|
@puffnfresh Thanks for the definition. I think we are now going somewhat beyond the scope of this ticket, but I'll point out the critical element here:
The definition of coherence is a function of your definition of environment, which is essentially the argument I was making in New York. |
|
@pchiusano here is final def isEmpty =
this match {
case Tip() => true
case Bin(_, _, _) => false
}We know nothing about what the Set holds, so it has to be a function only of the Set's structure. Woo, parametricity! Let's add final def isEmpty(implicit o: Order[A]) =
/* let's do something awful by comparing elements */By giving our function another constraint, we've limited what we know about the function by limiting what we know about what the function does not do. |
|
I'm closing this issue. There's recent precedent with #619 to not capture instances in data types; see also our ports of (To contributors: Please do not reopen this issue.) |
|
@pchiusano Consider a simpler case, a type class If Now we change the type of
Our law is now incoherent. How do we reason about You can come up with a one-off, ad-hoc, and completely unnecessary solution to this question of how to recover the coherence of the |
|
@puffnfresh Yes, that makes sense, though depending on where constraints are, it means you get possibly fewer free theorems in other contexts that use @runarorama let me try to paraphrase your argument. You are saying that when you have some interface, governed by certain laws, it can be incoherent / undefined when implementing operations that accept multiple existential instances. It's not clear which existential instance should win, and in general, this has no correct answer, so let's not even try to answer it. And in general, existential instances are rather questionable in many cases anyway. I think I understand all the arguments here so I'm just going to drop it. Thanks for the discussion, and @larsrh sorry for hijacking this thread. :) |
|
@pchiusano Yes, exactly. I would make an even stronger statement and say that a general policy against existential instances is warranted because, more generally, we should not think of type class instances as values, even though Scala allows us to do that. Because thinking of them that way is harmful to reasoning about programs. |
|
@runarorama this is a little off-topic, but I wonder about the policy you're suggesting, because I think that existential instances have a place: specifically, they're important for information hiding. Existential instances at least seem like they allow you to potentially limit your maintenance burden by exposing only a minimal set of operations for a type. For a trivial example, something like this: data Foo a = forall f. Foldable f => Foo (f a) seems appropriate when I don't want to reveal anything more about a type other than the fact that it can be folded over, but still leave multiple providers of values of this type free to choose whatever representation is appropriate. If you disallow existential typeclasses, what's the alternative here? In my mind, there's a symmetry here: when defining functions, you may universally quantify the input to give your clients flexibility, and you may existentially quantify the output (with a data type as defined above) to give the library designer (yourself) similar flexibility. |
|
@nuttycom It's not existential type classes that I want to disallow, but treating instances as values. You would resolve this in Scala in the same way that it's resolved in Haskell: using universally quantified types. trait Fold[A,R] {
def apply[F[_]:Foldable](f: F[A]): R
}
sealed trait Foo[A] {
def apply[R](f: Fold[A,R]): R
}
object Foo {
def apply[F[_]:Foldable, A](a: F[A]): Foo[A] = new Foo[A] {
def apply[R](f: Fold[A,R]) = f(a)
}
}Note that while the // Requires `Foldable[List]`
val foo: Foo[Int] = Foo(List(1,2,3))
val x = foo(new Fold[Int, Int]) {
def apply[F[_]:Foldable](xs: F[Int]) =
// Can reference the foldable instance, but it's not usable outside of here
Foldable[F].foldMap(xs)(x => x)
}) |
|
@runarorama Ah, very clever. Thanks! |
|
@runarorama there is a typo in your code example // Requires `Foldable[List]`
val foo: Foo[Int] = Foo(List(1,2,3))
// here v
val x = foo(new Fold[Int, Int] {
def apply[F[_]:Foldable](xs: F[Int]) =
// Can reference the foldable instance, but it's not usable outside of here
Foldable[F].foldMap(xs)(x => x)
}) |
|
Free your mind! https://gist.github.com/bvenners/d07ca7bf2cf8021aab63 |
So to set the stage, this is not an argument for or against local typeclasses or anything similar. I'm well aware of what typeclass coherence is. This is simply about what we can do within Scala.
The problem with the current ISet implementation is that it takes an
Order[A]with every operation. This is entirely a non-issue if we can make the assumption that no one is ever, ever using more than oneOrderinstance for the same type. However, the instant someone does this (and we cannot prevent it, because of Scala being…Scala), data is lost. This is directly analogous to the clearly-insane situation where someone puts a mutable value with an unstable equality into a Java hashtable.The alternative to this implementation is to capture a single
Order[A]at construction time. This implementation has no differences relative to the current situation if people are obeying the unspoken single-instance contract for typeclasses. The only difference arises when someone actually violates this rule and starts using multiple orderings. Specifically: no data will be lost. The ISet will continue to function exactly as per normal under all operations, despite the new (possibly contradictory) ordering that was introduced by the user.The only point where things become different is in the case of union. The union of two sets involving different orderings is a major problem. The key realization though is it is no more or less of a problem under either implementation! If you union two sets with different orders under the current scheme, data will be lost from at least one side, and possibly both depending on how the orders interact. If you union two sets under different order with the captured, you still lose data. From an outside perspective, nothing changes.
What is interesting is that it is actually algorithmically possible to improve on the incoherent merge situation if we capture the order at construction time, whereas we have absolutely no hope if the order is always passed locally. I can provide more details on this point if necessary, but for now it suffices to say that there is an algorithm which can achieve a sub-log-linear merge (i.e. better than a full re-insertion) on a pair of lazy binary search trees preserving all data even when the trees are using distinct orderings (and this algorithm gracefully degrades to the standard merge algorithm when the orderings are identical). Said algorithm has no reliance on evil tricks like
ord1 eq ord2or similar.Benefits to partially-applying on the
Orderat construction time:Downsides:
Honestly, I don't even see a question here. If you're not using local typeclasses, then this change carries absolutely no effect. If you are using local typeclasses, then this change removes silent data loss cases and enables potentially much more efficient implementation details.
The text was updated successfully, but these errors were encountered: