New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Equality on arrays is wonky #403

Closed
eeue56 opened this Issue Sep 9, 2015 · 15 comments

Comments

Projects
None yet
6 participants
@eeue56
Contributor

eeue56 commented Sep 9, 2015

The problem

Issue #176 highlighted the fact that array equality takes into account the internal structure when comparing two arrays. The problem was raised by two arrays, of identical elements and length, being generated differently -

Array.fromList <| List.repeat 32 1

will generate

Object {ctor: "_Array", height: 1, table: Array[32], lengths: Array[32]}

whereas

Array.repeat 32 1

will generate

Object {ctor: "_Array", height: 2, table: Array[1], lengths: Array[1]}

Due to the way eq is implemented, the two arrays which are element-by-element equal are considered different as they have different heights and therefore nested differently.

Expected behvaiour

As datastructures in Elm are immutable, we don't really care about checking if the internal data structures are identical - [1, 1] should always be equal to [1, 1], no matter if one uses two tables and the other uses a single table internally.

The fix

eq should check if an object's ctor is an Array. If they both are, then we should compare Elm.Native.Array.length(right) == Elm.Native.Array.length(left). If they match, then we should iterate over both simultaneously and called eq on each element-pair as we go.

@eeue56

This comment has been minimized.

Show comment
Hide comment
@eeue56

eeue56 Sep 10, 2015

Contributor

So, I have a fix for this now by adding length and unsafeGet to Utils.js and adding


        if (l.ctor === "_Array") {
            if (l.ctor === r.ctor){
                if (length(l) != length(r)) return false;

                for (var i = 0; i < length(l); i++){
                    if (!eq(unsafeGet(i, l), unsafeGet(i, r))){
                        return false;
                    }
                }
                return true;
            } else {
                return false;
            }
        }

to the start of eq. It also decreased the speed for Array comparisons by a factor of ~5 times for unnested arrays.

For nested arrays it cut eq down from ~11600 ms to ~20ms for the test below

main = 
  show
    <| List.filter (\x -> not x)
    <| List.map (\n -> (Array.fromList (List.repeat n <| List.repeat 50 1)) == (Array.repeat n <| List.repeat 50 1)) [50000..50002]

So, overall, I think I have a fix and it will generally increase performance slightly, and in other cases it will massive increase performance.

The one blocker for me to create a PR to fix this though is where should I put Array.unsafeGet and Array.length? I'm not really fond of having specialized functions within Utils, and they can't be accessed in Array.js from Utils.js due to the fact that Array depends on List which depends on Utils.

Any thoughts?

Contributor

eeue56 commented Sep 10, 2015

So, I have a fix for this now by adding length and unsafeGet to Utils.js and adding


        if (l.ctor === "_Array") {
            if (l.ctor === r.ctor){
                if (length(l) != length(r)) return false;

                for (var i = 0; i < length(l); i++){
                    if (!eq(unsafeGet(i, l), unsafeGet(i, r))){
                        return false;
                    }
                }
                return true;
            } else {
                return false;
            }
        }

to the start of eq. It also decreased the speed for Array comparisons by a factor of ~5 times for unnested arrays.

For nested arrays it cut eq down from ~11600 ms to ~20ms for the test below

main = 
  show
    <| List.filter (\x -> not x)
    <| List.map (\n -> (Array.fromList (List.repeat n <| List.repeat 50 1)) == (Array.repeat n <| List.repeat 50 1)) [50000..50002]

So, overall, I think I have a fix and it will generally increase performance slightly, and in other cases it will massive increase performance.

The one blocker for me to create a PR to fix this though is where should I put Array.unsafeGet and Array.length? I'm not really fond of having specialized functions within Utils, and they can't be accessed in Array.js from Utils.js due to the fact that Array depends on List which depends on Utils.

Any thoughts?

@evancz

This comment has been minimized.

Show comment
Hide comment
@evancz

evancz Oct 30, 2015

Member

It is also wonky on Set and Dict. The root in all the cases is that the "meaning" of the data structure does not determine a 100% unique structure. It makes sense to me that all of these situations, where structural equality is not really what is desired, should be handled in a coherent way. There are solutions for that (like type classes) but I don't want to introduce a relatively intense language feature to resolve this particular issue.

So I guess I'm saying, if we go in this direction, it means we will special case Array and possibly Dict. But we will not special case any user defined thing. Hash maps, graphs, whatever else. So implicitly, we are saying "we have consistent behavior now, but we want to add some special cases, making things a bit better, but also making the behavior inconsistent." That's what I think.

I don't know if that's good or bad. I guess I'd be curious to know if these are things you considered in looking at this issue.

Member

evancz commented Oct 30, 2015

It is also wonky on Set and Dict. The root in all the cases is that the "meaning" of the data structure does not determine a 100% unique structure. It makes sense to me that all of these situations, where structural equality is not really what is desired, should be handled in a coherent way. There are solutions for that (like type classes) but I don't want to introduce a relatively intense language feature to resolve this particular issue.

So I guess I'm saying, if we go in this direction, it means we will special case Array and possibly Dict. But we will not special case any user defined thing. Hash maps, graphs, whatever else. So implicitly, we are saying "we have consistent behavior now, but we want to add some special cases, making things a bit better, but also making the behavior inconsistent." That's what I think.

I don't know if that's good or bad. I guess I'd be curious to know if these are things you considered in looking at this issue.

@evancz

This comment has been minimized.

Show comment
Hide comment
@evancz

evancz Oct 30, 2015

Member

Also, of all the things related to making Array nicer, this is probably the one thing that branches out into broader language design questions, so it's not just a matter of writing code and putting it in the right place. There are at least 3 viable solutions that are mutually exclusive, so it's a question of which one and when.

Member

evancz commented Oct 30, 2015

Also, of all the things related to making Array nicer, this is probably the one thing that branches out into broader language design questions, so it's not just a matter of writing code and putting it in the right place. There are at least 3 viable solutions that are mutually exclusive, so it's a question of which one and when.

@eeue56

This comment has been minimized.

Show comment
Hide comment
@eeue56

eeue56 Oct 30, 2015

Contributor

When I was looking at this, it seemed to make sense to me to consider adding a _eq method or something to each of the Elm objects, much like the current ctor. The algorithm for comparisons would then look like this -

if a.ctor === b.ctor:
  if not a._eq:
    False
  a._eq(b) 

There are a few benefits from this:

It would now be possible to add custom eq operations to Native modules. This would be great for anything implementing a new data structure that would enable things like fast eqs and still using Elm's built in == operator. The current eq is slow in some places, such as for Array (as my posts note). This would get around that issue.

From a language perspective, it would also avoid people going down the route of defining a equals method which did what == should do, but faster. This helps keep the language clean and readable.

This is the route that Python takes, for example. It would also translate from the Eq typeclass in Haskell -

instance Eq Array where
...

In an ideal world, I would love it if it were possible to define an eq that gets converted to the Native _eq at compile time, but I'm not sure how feasible that would be within the language. I would therefore only allow this _eq override to be done through Native modules.

I would prefer this to bringing in custom Array/Dict stuff into Utils.js.

Contributor

eeue56 commented Oct 30, 2015

When I was looking at this, it seemed to make sense to me to consider adding a _eq method or something to each of the Elm objects, much like the current ctor. The algorithm for comparisons would then look like this -

if a.ctor === b.ctor:
  if not a._eq:
    False
  a._eq(b) 

There are a few benefits from this:

It would now be possible to add custom eq operations to Native modules. This would be great for anything implementing a new data structure that would enable things like fast eqs and still using Elm's built in == operator. The current eq is slow in some places, such as for Array (as my posts note). This would get around that issue.

From a language perspective, it would also avoid people going down the route of defining a equals method which did what == should do, but faster. This helps keep the language clean and readable.

This is the route that Python takes, for example. It would also translate from the Eq typeclass in Haskell -

instance Eq Array where
...

In an ideal world, I would love it if it were possible to define an eq that gets converted to the Native _eq at compile time, but I'm not sure how feasible that would be within the language. I would therefore only allow this _eq override to be done through Native modules.

I would prefer this to bringing in custom Array/Dict stuff into Utils.js.

@eeue56

This comment has been minimized.

Show comment
Hide comment
@eeue56

eeue56 Oct 30, 2015

Contributor

Alternatively, Elm could just take the stance that comparables will be provided for in core through Utils.js, and if anyone else wants to define a eq, they have to implement it with a different name. So it would become common to see eq a b instead of a == b for non-core datastructures.

Contributor

eeue56 commented Oct 30, 2015

Alternatively, Elm could just take the stance that comparables will be provided for in core through Utils.js, and if anyone else wants to define a eq, they have to implement it with a different name. So it would become common to see eq a b instead of a == b for non-core datastructures.

@rtfeldman

This comment has been minimized.

Show comment
Hide comment
@rtfeldman

rtfeldman Nov 15, 2015

Member

Conceptually I think of Elm's philosophy as "it's not a big deal that you don't have a ton of power to define custom things, because the core data structures are really nice and you can get a long way just mixing and matching them."

I think special-casing equality for the core data structures of Array, Set, and Dict is consistent with this philosophy in the same way that number gets its own typeclass so that things like (+) and (*) can be nice, even though users can't define their own typeclasses.

Member

rtfeldman commented Nov 15, 2015

Conceptually I think of Elm's philosophy as "it's not a big deal that you don't have a ton of power to define custom things, because the core data structures are really nice and you can get a long way just mixing and matching them."

I think special-casing equality for the core data structures of Array, Set, and Dict is consistent with this philosophy in the same way that number gets its own typeclass so that things like (+) and (*) can be nice, even though users can't define their own typeclasses.

@rtfeldman

This comment has been minimized.

Show comment
Hide comment
@rtfeldman

rtfeldman Nov 15, 2015

Member

Also, it seems a bit weird that == is a runtime exception for functionA == functionB but just doesn't work properly for setA == setB. Seems like either the latter should be considered a bug or else something explicitly unsupported, like function equality, which results in a runtime exception.

That said, appreciate the point that this is a broader language discussion. 😃

Member

rtfeldman commented Nov 15, 2015

Also, it seems a bit weird that == is a runtime exception for functionA == functionB but just doesn't work properly for setA == setB. Seems like either the latter should be considered a bug or else something explicitly unsupported, like function equality, which results in a runtime exception.

That said, appreciate the point that this is a broader language discussion. 😃

@eeue56

This comment has been minimized.

Show comment
Hide comment
@eeue56

eeue56 Nov 16, 2015

Contributor

I have this issue with Set and Dict having special casing in Utils.js. Technically, I don't think they should need it - the fact they do is the result of eq being broken from a language perspective. The problem with all these datastructures is that they can be represented in many different ways - eq from Utils.js compares the representation, rather than the data. This is probably right from a "true" eq standpoint, but from a usability standpoint users don't care if 9 was used as the base node instead of 5, they just care that a set has both [9, 5].

At the moment, both Dict and Set are pure Elm. Special casing Array is pretty easy to do, since the implementation is in JS already. Javascript is where runtime exceptions exist and happen. I would much prefer to have things in Elm rather than JS wherever possible. It's easy enough to write a (more or less) performant eq in Elm for Dicts and Sets - my AllDict implementation does just that, albeit in a hacky way.

As for function equality, I also have a solution for that which works in a hacky way, for two identical implementations of the same function (though, names aren't checked as they currently aren't kept from compile)

Contributor

eeue56 commented Nov 16, 2015

I have this issue with Set and Dict having special casing in Utils.js. Technically, I don't think they should need it - the fact they do is the result of eq being broken from a language perspective. The problem with all these datastructures is that they can be represented in many different ways - eq from Utils.js compares the representation, rather than the data. This is probably right from a "true" eq standpoint, but from a usability standpoint users don't care if 9 was used as the base node instead of 5, they just care that a set has both [9, 5].

At the moment, both Dict and Set are pure Elm. Special casing Array is pretty easy to do, since the implementation is in JS already. Javascript is where runtime exceptions exist and happen. I would much prefer to have things in Elm rather than JS wherever possible. It's easy enough to write a (more or less) performant eq in Elm for Dicts and Sets - my AllDict implementation does just that, albeit in a hacky way.

As for function equality, I also have a solution for that which works in a hacky way, for two identical implementations of the same function (though, names aren't checked as they currently aren't kept from compile)

@eeue56

This comment has been minimized.

Show comment
Hide comment
@eeue56

eeue56 Nov 16, 2015

Contributor

FWIW - I know how to fix eq for Set and Dict currently (using special casing), but I think it might be worth having this discussion before adding it

Contributor

eeue56 commented Nov 16, 2015

FWIW - I know how to fix eq for Set and Dict currently (using special casing), but I think it might be worth having this discussion before adding it

@mgold

This comment has been minimized.

Show comment
Hide comment
@mgold

mgold Dec 27, 2015

Contributor

So I guess I'm saying, if we [fix the issue without adding a language feature], it means we will special case Array and possibly Dict. But we will not special case any user defined thing. Hash maps, graphs, whatever else. So implicitly, we are saying "we have consistent behavior now, but we want to add some special cases, making things a bit better, but also making the behavior inconsistent." That's what I think.

Inconsistent behavior is a matter of perspective. If you see Dict as a purely Elm data structure that nevertheless gets special treatment in the compiler that is unavailable to other authors, that's inconsistent. But if you see Int and List Int as supporting equality but suddenly Array Int and Set Int do not, then it's the current situation that's inconsistent.

The fact that Dict happens to be a pure-Elm data structure behind an opaque type, and Array is a bunch of JavaScript code, is an implementation detail. The fact that equality will silently fail, when silent failures are almost unheard of in Elm, speaks much louder than an implementation detail.

If the long-term goal is to add a language feature, than this is a short-term hack. Whenever that feature is added, data structure equality will be supported, so to the user, nothing in core changes.

Contributor

mgold commented Dec 27, 2015

So I guess I'm saying, if we [fix the issue without adding a language feature], it means we will special case Array and possibly Dict. But we will not special case any user defined thing. Hash maps, graphs, whatever else. So implicitly, we are saying "we have consistent behavior now, but we want to add some special cases, making things a bit better, but also making the behavior inconsistent." That's what I think.

Inconsistent behavior is a matter of perspective. If you see Dict as a purely Elm data structure that nevertheless gets special treatment in the compiler that is unavailable to other authors, that's inconsistent. But if you see Int and List Int as supporting equality but suddenly Array Int and Set Int do not, then it's the current situation that's inconsistent.

The fact that Dict happens to be a pure-Elm data structure behind an opaque type, and Array is a bunch of JavaScript code, is an implementation detail. The fact that equality will silently fail, when silent failures are almost unheard of in Elm, speaks much louder than an implementation detail.

If the long-term goal is to add a language feature, than this is a short-term hack. Whenever that feature is added, data structure equality will be supported, so to the user, nothing in core changes.

@rtfeldman

This comment has been minimized.

Show comment
Hide comment
@rtfeldman

rtfeldman Apr 15, 2016

Member

Sort 'em afterwards, yeah?

On Fri, Apr 15, 2016, 4:16 AM Kris Jenkins notifications@github.com wrote:

This also raises the question, how can we reliably compare sets for
equality? The toList workaround doesn't work:

» elm repl
---- elm repl 0.16.0 -----------------------------------------------------------

:help for help, :exit to exit, more at https://github.com/elm-lang/elm-repl

import Set
Set.fromList [0,1] == Set.fromList [1,0]
False : Bool


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
https://github.com/elm-lang/core/issues/403#issuecomment-210420961

Member

rtfeldman commented Apr 15, 2016

Sort 'em afterwards, yeah?

On Fri, Apr 15, 2016, 4:16 AM Kris Jenkins notifications@github.com wrote:

This also raises the question, how can we reliably compare sets for
equality? The toList workaround doesn't work:

» elm repl
---- elm repl 0.16.0 -----------------------------------------------------------

:help for help, :exit to exit, more at https://github.com/elm-lang/elm-repl

import Set
Set.fromList [0,1] == Set.fromList [1,0]
False : Bool


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
https://github.com/elm-lang/core/issues/403#issuecomment-210420961

@krisajenkins

This comment has been minimized.

Show comment
Hide comment
@krisajenkins

krisajenkins Apr 15, 2016

Contributor

@rtfeldman Yup. Sorry, I was being slow & clearly didn't delete that comment fast enough. 😊

Contributor

krisajenkins commented Apr 15, 2016

@rtfeldman Yup. Sorry, I was being slow & clearly didn't delete that comment fast enough. 😊

NobbZ referenced this issue in NobbZ/xelm Apr 17, 2016

Makes the testsuite more faulttolerant
There seems to be a bug in elm (elm-lang/core#403) which makes equality
checks fail when the should succeed.

FIX #67
@NobbZ

This comment has been minimized.

Show comment
Hide comment
@NobbZ

NobbZ Apr 18, 2016

Perhaps add an equals function to the affected types modules. That has to be used then instead of ==.

Another approach were to introduce overloading of functions. This way you can avoid typeclasses but introduce other ambiguities.

NobbZ commented Apr 18, 2016

Perhaps add an equals function to the affected types modules. That has to be used then instead of ==.

Another approach were to introduce overloading of functions. This way you can avoid typeclasses but introduce other ambiguities.

@mgold

This comment has been minimized.

Show comment
Hide comment
@mgold

mgold Apr 18, 2016

Contributor

equals works well for third-party libraries but would be a blight on the core libraries.

Type classes are a form of overloading. If you want to continue this discussion please do so on the mailing list.

Contributor

mgold commented Apr 18, 2016

equals works well for third-party libraries but would be a blight on the core libraries.

Type classes are a form of overloading. If you want to continue this discussion please do so on the mailing list.

@evancz

This comment has been minimized.

Show comment
Hide comment
@evancz

evancz Jun 25, 2016

Member

6f5fbb8 and 1c00777 implement a stopgap measure for the data structures in core. This reduces the corner cases to when people are implementing their own data structures (like Queue or Graph) in other packages.

We'll do a better thing later, but it is not the right time to focus on that.

Member

evancz commented Jun 25, 2016

6f5fbb8 and 1c00777 implement a stopgap measure for the data structures in core. This reduces the corner cases to when people are implementing their own data structures (like Queue or Graph) in other packages.

We'll do a better thing later, but it is not the right time to focus on that.

@evancz evancz closed this Jun 25, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment