Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop Semigroup/Monoid instances for Map; add SemigroupMap #38

Merged
merged 45 commits into from Jan 26, 2021

Conversation

JordanMartinez
Copy link
Contributor

@JordanMartinez JordanMartinez commented Dec 20, 2020

Based on feedback from the community, it seems like having two versions of Map, the current one unchanged (in that it's Semigroup instance prefers the first Map's values over the second Map's values when they both have the same key) and the new one added in this PR having its Semigroup instance unbiased, which can provide the same behavior as the biased Map but allows flexibility to handle that case differently.

Based on Harry's comment, the unbiased version of Map should use unionWith append for its implementation of append and union for its implementation of alt.

I have the following questions:

  • Should we introduce a breaking change here, in that the original Map is moved to Data.Map.Biased, so it's even that much clearer?
  • I foresee people getting a "Could not unify Map with type Map" errors because they imported functions from both modules on accident. Should we also change the name of one or both of these types to make such errors easier? For example, MapU or MapB?

Regarding documentation, where should we document the difference between the Biased and Unbiased Maps? I assume at both the Module level, the Map declaration and the Semigroup instance documentation for completeness.

And what specifically should the documentation say?

@hdgarrood
Copy link
Contributor

hdgarrood commented Dec 20, 2020

I think it should be possible to duplicate the API without duplicating the implementations. Something like this:

newtype Map k v = UnbiasedMap (Biased.Map k v)

deriving newtype instance [...]

lookup :: forall k v. Ord k => k -> Map k v -> Maybe v
lookup = coerce Biased.lookup

@hdgarrood
Copy link
Contributor

I foresee people getting a "Could not unify Map with type Map" errors because they imported functions from both modules on accident. Should we also change the name of one or both of these types to make such errors easier? For example, MapU or MapB?

Those names seem strange to me, and using different names would prevent being able to easily switch between them just by adjusting the imports. We really just need to fix this in the compiler.

@hdgarrood
Copy link
Contributor

Should we introduce a breaking change here, in that the original Map is moved to Data.Map.Biased, so it's even that much clearer?

I think if we do this (I’m undecided personally) we ought to have a deprecation cycle first.

@kl0tl
Copy link
Member

kl0tl commented Dec 20, 2020

  • Should we introduce a breaking change here, in that the original Map is moved to Data.Map.Biased, so it's even that much clearer?

I don’t think that having both Data.Map.Biased and Data.Map.Unbiased is much clearer than Data.Map and Data.Map.Unbiased, at least not enough to warrant the work implied by the deprecation of Data.Map.

  • I foresee people getting a "Could not unify Map with type Map" errors because they imported functions from both modules on accident. Should we also change the name of one or both of these types to make such errors easier? For example, MapU or MapB?

I agree with Harry, this isn’t something we ought to workaround with suffixes.

Thinking about usage a bit, if both Data.Map and Data.Map.Unbiased are imported in the same module at least one of them will have to be qualified to avoid conflicts.

import Data.Map
import Data.Map.Unbiased as Unbiased
import Data.Map as Biased
import Data.Map.Unbiased
import Data.Map as Biased
import Data.Map.Unbiased as Unbiased

In each case I don’t think prefixing the newtype constructor (UnbiasedMap instead of Map) helps much to disambiguate given the qualified imports, so could we drop the prefix from Harry suggestion?

I think it should be possible to duplicate the API without duplicating the implementations.

This is indeed possible, and it would nicely help to highlight where Data.Map.Unbiased diverges from Data.Map. The compiler cannot coerce polytypes though, so they need to be instantiated manually with type annotations:

lookup :: forall k v. Ord k => k -> Map k v -> Maybe v
lookup = coerce (Biased.lookup :: k -> Biased.Map k v -> Maybe v)

@kritzcreek
Copy link
Member

Should we introduce a breaking change here, in that the original Map is moved to Data.Map.Biased, so it's even that much clearer?

Let's take a step back and think about what a Map data structure's core functionality is. The two operations a Map needs to provide are insert and lookup, everything else is just fluff on top. Anything that doesn't impact how these two functions behave/perform shouldn't affect the naming scheme. A Semigroup instance is very low in the list of priorities when it comes to an implementation of Map.

I foresee people getting a "Could not unify Map with type Map" errors because they imported functions from both modules on accident. Should we also change the name of one or both of these types to make such errors easier? For example, MapU or MapB?

I feel strongly that it would be bad if we duplicated the entire API and ended up with incompatible autocompletes and imports.

My preferred solutions at this point in descending order:

  1. Remove the Semigroup instance. It's not canonical enough and we can't silently change the existing one.
  2. Remove the Semigroup instance and re-introduce it as the unbiased one in the next major release.
  3. Just leave things as they are

@hdgarrood
Copy link
Contributor

To me, not having the option of getting a Map type with the ideal Semigroup instance is much worse than any confusion that having two copies of the API would cause. Any autocompletion ought not to get too confused if one out of Data.Map and Data.Map.Unbiased is already imported (like, hopefully if I type Map.lookup and there is already one module imported as Map which exports lookup I should be fine?) but if it does get confused, I think we should just consider that as something we can fix in the tooling; I don’t think that should influence this decision. I suppose we could provide the newtype but not duplicate the rest of the API, but I think that would be kind of horrid to use as you’d have to coerce back and forth whenever you wanted to use the Semigroup instance.

@kritzcreek
Copy link
Member

kritzcreek commented Dec 20, 2020

To me, not having the option of getting a Map type with the ideal Semigroup instance is much worse than any confusion that having two copies of the API would cause

Sounds like Option 2 then. If there's an ideal one, there's no reason to have two different modules.

Any autocompletion ought not to get too confused if one out of Data.Map and Data.Map.Unbiased is already imported (like, hopefully if I type Map.lookup and there is already one module imported as Map which exports lookup I should be fine?) but if it does get confused,

If I'm typing Map.lookup without an existing import the IDE will suggest both modules. How do I decide which I need? There is no such thing as an "unbiased" Map in the CS literature. And then when I try to call an API function that asked me for a Map because that's how Pursuit/the IDE renders that signature , I pass it the wrong one.

@hdgarrood
Copy link
Contributor

If there's an ideal one, there's no reason to have two different modules.

There still is a reason to have two different modules: it allows us to provide the functionality we want to provide without breaking anything.

How do I decide which I need? There is no such thing as an "unbiased" Map in the CS literature.

I don’t think “is this already a known thing in CS literature” should be that much of a factor here, especially since I imagine a fairly large chunk of PureScript users don’t concern themselves with CS literature. If people don’t know why there are two, they could read the docs on Pursuit, where the difference would be explained. You’re right that it could be a bit awkward if you have the wrong kind of Map for a function you want to use, although that should be addressable by an application of coerce in most cases.

I also just want to check you’re aware that the idea is that we would only have two separate Map modules for a certain period of time; eventually we would just have Data.Map with the unbiased Semigroup instance. Does knowing that affect your preferred option?

@kritzcreek
Copy link
Member

kritzcreek commented Dec 21, 2020

I don’t think “is this already a known thing in CS literature” should be that much of a factor here, especially since I imagine a fairly large chunk of PureScript users don’t concern themselves with CS literature.

Well I'll just include StackOverflow in CS literature for this argument. Similarly Googling for various combinations of Biased Map/Dictionary/HashMap doesn't yield any results.

If people don’t know why there are two, they could read the docs on Pursuit,

So people go to Pursuit to try and understand which module to pick. I'd say the "large chunk of PureScript users that don't concern themselves with CS literature" also aren't going to be happy that they need to understand what a Semigroup is, before they can use a Map data type, Especially since they've used Maps in all their other languages just fine.

should be addressable by an application of coerce in most cases

I wouldn't want Coercible to be something non-expert users have to concern themselves with unless they really want to. It's a pretty advanced feature.

I also just want to check you’re aware that the idea is that we would only have two separate Map modules for a certain period of time; eventually we would just have Data.Map with the unbiased Semigroup instance. Does knowing that affect your preferred option?

I wasn't aware of that. If the end-goal is to end up with a single Data.Map that has the functionality we want we're working towards the same goal! I still think the best way to go about it is to delete the current instance, so that people have to rewrite all usages of <> for maps to union (whichever bias it is right now), before introducing the new instance, for all the reasons above.

@JordanMartinez
Copy link
Contributor Author

If the end-goal is to end up with a single Data.Map that has the functionality we want we're working towards the same goal!

That's exactly the goal.

The current Semigroup instance of Map...

instance semigroupMap :: Ord k => Semigroup (Map k v) where
  append = union

... will be changed to the following

instance semigroupMap :: (Ord k, Semigroup v) => Semigroup (Map k v) where
  append = unionWith append

The original behavior can still be achieved by using First v as the v in Map

leftMap `union` rightMap == (c leftMap) <> (c rightMap) where
  c :: forall k v. Map k v -> Map k (First v)
  c = coerce

@JordanMartinez
Copy link
Contributor Author

I've updated the unbiased Map to be a newtype around the biased Map. CI fails with MonadZero deprecation warnings and these errors:

[1/2 TypesDoNotUnify] test/Test/Data/Map.purs:47:15

  47    arbitrary = elements $ A :| [B, C, D, E, F, G, H, I, J]
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  
  Could not match type
  
    NonEmpty Array
  
  with type
  
    NonEmptyArray
  
  while trying to match type NonEmpty t1 SmallKey
    with type NonEmptyArray t0
  while checking that expression (NonEmpty A) [ B
                                              , C
                                              , D
                                              , E
                                              , F
                                              , G
                                              , H
                                              , I
                                              , J
                                              ]
    has type NonEmptyArray t0
  in value declaration arbSmallKey
  
  where t0 is an unknown type
        t1 is an unknown type

[2/2 TypesDoNotUnify] test/Test/Data/Map/Unbiased.purs:47:15

  47    arbitrary = elements $ A :| [B, C, D, E, F, G, H, I, J]
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  
  Could not match type
  
    NonEmpty Array
  
  with type
  
    NonEmptyArray
  
  while trying to match type NonEmpty t1 SmallKey
    with type NonEmptyArray t0
  while checking that expression (NonEmpty A) [ B
                                              , C
                                              , D
                                              , E
                                              , F
                                              , G
                                              , H
                                              , I
                                              , J
                                              ]
    has type NonEmptyArray t0
  in value declaration arbSmallKey
  
  where t0 is an unknown type
        t1 is an unknown type

@JordanMartinez
Copy link
Contributor Author

CI now builds. A few questions I have:

  • Since the tests for the biased map never test anything Semigroup-related, should the unbiased map even have tests? If not, I can remove that file. If so, I can port the values used across both (e.g. SmallKey) to a new module that both can use.
  • Any other bikeshedding we want to do here before this gets merged?

@kl0tl
Copy link
Member

kl0tl commented Dec 23, 2020

I think testing the Semigroup instance like I did in purescript/purescript-foreign-object#19 should be enough.

@hdgarrood
Copy link
Contributor

I think we should consider the points @kritzcreek raised a bit more. I can’t help but agree that everyone having to understand why there are two versions of every Map function is a fairly big disadvantage of this approach. I think I might also prefer just to remove the instance entirely and re-add it later.

@JordanMartinez
Copy link
Contributor Author

I'm not going to argue for/against dropping the instance below. Merely make a few comments.

I can’t help but agree that everyone having to understand why there are two versions of every Map function is a fairly big disadvantage of this approach. I think I might also prefer just to remove the instance entirely and re-add it later.

If people upgrade from v0.13.x to the future v0.15.x, the change will be silent. However, it's hard to make such a change less silent no matter which approach is taken.

The disadvantage of dropping it altogether is that Map no longer has a Semigroup instance. However, this latter issue can be worked around. Someone (say me) could publish a purescript-map-unbiased repo that implements the Data.Map.Unbiased module I did above which is a newtype around this repo's Map type and add the unbiased Semigroup instance. I would have to maintain the API, so that it stays consistent with this repo until the v0.15.x release. However, I don't think that's a huge maintenance burden. Those who are relying on the biased Semigroup behavior would be forced to either stop using type classes and return to using functions (e.g. union) or update their repos to use the unbiased repo with the map's values coerced to First. Those who want the unbiased Semigroup behavior would depend on my repo.

While core team members could agree not to create this repo personally, anyone in the community could create it. Eventually, someone will likely create it since those who need it will need to reimplement it in each repo/project in which they need it.

Thus, "dropping the instance" wouldn't remove the "two Map types" issue above but merely change how it manifests itself.

@JordanMartinez
Copy link
Contributor Author

Just thought I'd summarize what we have discussed so far.

AFAICT, the plan we have in place for making such a change accounts for problematic issues, such as the "silent changes" in the library, people jumping two major releases, enabling Map to still work in polymorphic code that uses a Semigroup at some point via SemigroupMap (excluding Phil's example), and addressing new learners confusion via docs and a cookbook recipe.

One of the desires to unbias the Semigroup instance comes from the desire to be consistent with Object, which is currently unbiased. Phil and Harry already basically agreed that this motive is a moot or at least less-important point. I imagine that less problems arise with Object because it's not being used for the same things that Map is.

The second motive, as claimed by a number of core contributors, is that unbiasing this instance makes it "more useful" in that one can make it left-biased, right-biased, or v-biased (i.e. depends on v's Semigroup instance). To date, no one has presented a case here that shows this "usefulness." Still, I found a previous issue where one proposed a right-biased union, so that fromFoldable (a <> b) == fromFoldable a <> fromFoldable b. I don't have examples off the top of my head, but one can see what leftMap <> rightMap would do if v was Semigroup a => Semigroup b => Tuple a b or Semigroup a => Array a or Semigroup a => Maybe a.

As Phil and Joe point out above, the current left-biased instance is not broken. And for better or worse, it has been around longer, so the chance that more of the ecosystem already depends on the instance being left-biased is higher. Lastly, if we didn't make this change, one could still get right-biased or v-biased <> if they define their own newtype and coerce between the two. However, v-biased and right-biased developers would still run afoul of Phil's example if it exists. While defining a separate Map type that newtypes the current Map type was the approach I originally took in this PR, we decided against that because it introduces two Map types that could further confuse beginners.

Upon reflection, I realized we already have two Map types: this library's Map and unordered-collections' HashMap. (side note: thoughts on renaming Map to OrdMap?) I imagine that whatever decision we make here will be followed in unordered-collections.

Phil's example above complicates things because he's right. If someone defines a type class like what he shows above, one cannot use coercion to solve the problem. AFAICT, one would be forced to do one of the following things:

  1. change the library's instance to be left-biased, which may break a third-party's code which needs it to be unbiased
  2. add a compatibility layer to the library and hope tha maintainer accepts the PR and continues to maintain the code
  3. fork the library, update the instance there, and depend on and maintain a fork
  4. update their own code, so that the Map type that is ultimately passed into such functions is Map k (First v)
  5. replace type class usages with regular ol' functions, which exposes the boilerplate that type classes hide

Phil's example clearly demonstrates his point. However, isn't this problem just as problematic for v-biased and right-biased developers if we don't unbias this instance because the left-biased instance is "blessed?" Moreover, I agree with @kl0tli that the type class feels contrived. It was useful for clarifying what Phil described originally in text by providing a concrete code-based example. However, is there a second more realistic one that further illustrates this problem, something that demonstrates lost business value? If there isn't, this point might not matter. If there is and it takes long to think of an example, we might not realize that until after we've made a breaking change that's hard to undo. While PureScript seems willing to make breaking changes to get things "right," it seems we don't want another Eff-to-Effect transition in the form of Map's Semigroup instance.

Let's hypothesize.

  1. We agree not to make this change and close the PR because "that ship has sailed". Right-biased and v-biased developers suffer if they run into Phil's example above. Those who want the unbiased or right-biased union can define their own newtype in their own project. If it's frequent enough, someone might decide to publish a library that provides a newtype over Map that changes the instance to be unbiased. Great! Now we have 3 Map types. To make things consistent, a similar newtype is created for HashMap in unordered-collections. Now we have 4 Map types.
  2. We merge this PR. Right-biased and left-biased developers suffer if they run into Phil's example above. Those who want the left-biased instance can define their own newtype. If frequent enough, the same 4 Map types issue above occurs.
  3. We do what this PR originally did and provide two Map types in this repo, one that is left-biased and another that is unbiased. Someone suffers if they run into Phil's example above and the author only supports one of the Map types. unordered-collections follows-suit and we have 4 Map types again.
  4. We drop the instance altogether and force everyone to use newtypes or regular old functions again because, as Gary implied, all three instances are law-abiding and useful in the correct context in the same way that Additive and Multiplicative are law-abiding and useful for Int and Number in the right context.
  5. We decide to implement this breaking change in the future once other language features @kl0tli mentioned in the original issue are implemented that help mitigate more of these problems/annoyances.

Finally, (and just as a reminder because I know these discussions can get out of hand quickly) we're not enemies. Thank you for participating in this discussion and making arguments that (hopefully) push us towards better conclusions on what to do in these situations. I hope that throughout this discussion I have understood you and shown respect to you and your reasoning.

@garyb
Copy link
Member

garyb commented Jan 10, 2021

I'm not sure if this is convincing for you Joe, but my support for this change comes from the fact I specifically avoid using the current instance since I find it such a footgun. That's also why I originally suggested dropping the instance entirely in favour of newtyping to choose an instance, Additive / Multiplicative style. If we can't agree on which instance should be the default for Map, that would still be my preference.

@paf31
Copy link
Contributor

paf31 commented Jan 13, 2021

I'm curious to hear what people think about changing the Semigroup instance for lists and arrays to the equivalent thing (append = alignWith (mergeThese append)) because arguably a lot of the same arguments could be made there (it's more general, it's not inherently left-biased, I'm likely to use it in any cases where I'm thinking of a list as a map from a set of indices to values), but I doubt anyone would seriously entertain doing it.

@hdgarrood
Copy link
Contributor

I don’t think that instance is either more or less general than the current one, as there’s no situation in which the two instances can be made to behave in the same way (unlike for Map, where you do get the same behaviour if the values are wrapped in First). But yes, this is why I’ve never brought up any idea of “unbiasedness” because I have never found that compelling either. I really just want it because it’s more useful in practice. If we want to talk about general principles which make the proposed instance preferable to the current one, I’d suggest that implicitly chucking data away is a poor default.

I don’t think lists can be thought of as maps from indices to values because you can’t “insert” at index n unless all indices <= n are already occupied.

@i-am-the-slime
Copy link

i-am-the-slime commented Jan 20, 2021

Here's another unsubstantiated data point of one:
I have introduced a bug into software by assuming that <> would "recursively" apply <> on the values of the Map. Ironically this assumption of "recursiveness" was why I used <> over union because that one has the note about it being biased. The intuition for my assumption was that this is what happens for Just and Right values. But most importantly this one:

{ a: "hi" } <> { a: "there"} == `{ a: "hithere" }`

So that's what's wrong with the current instance from my naïve standpoint.

But more importantly than trying to find what's wrong about the old instance, I'm interested in what's right about it. We've established that it is law-abiding and the same as Haskell. I'm very curious about practical applications of the old instance in some applications or libraries (I have the same reservations about union) and I'd wager that changing it to be "unbiased" would actually fix more bugs than it would produce.

@garyb
Copy link
Member

garyb commented Jan 25, 2021

We're going to go ahead with this change. Most of the feedback when this was originally proposed was positive, and most of the pushback seems to have been around inadvertently breaking the behaviour of existing code by changing the meaning of the instance, but the proposal as it stands now sidesteps that by removing the instance. It also mirrors the choice we made for numbers where a newtype is required to direct the behaviour, since there's an absence of a clear best default instance. The fix for the breaking change this will impose is also pretty trivial, excluding situations like those that Phil outlined.

I do see Phil's concern about not being able to do the kind of thing he suggested in his hypothetical example too, but it seems Cyril had some suggestions to work around that. I think the number of people affected by that situation is going to be a lot smaller than the number of feet being shot by the current instance in the long run.

I am also definitely sympathetic to the argument that the core libraries should offer good stability, but I actually take that as an argument for doing this now rather than later - the longer we leave it, the worse it will be to change it.

This change isn't just for aesthetic or ideological reasons - the fact there's a part of the core libraries that I think "oh yeah, avoid using that" says there is definitely a problem to me. (Although ironically, after all this discussion, I personally would probably never again forget that it behaves the way it does just now 😄).

src/Data/Map/Internal.purs Outdated Show resolved Hide resolved
@JordanMartinez
Copy link
Contributor Author

CI passes. Any other updates to make before merging this?

@garyb
Copy link
Member

garyb commented Jan 26, 2021

Maybe worth mentioning in the changelog that Alt and Plus were added with the current semigroup/monoid behaviour too, but otherwise LGTM!

@JordanMartinez
Copy link
Contributor Author

Does the following work?

@garyb
Copy link
Member

garyb commented Jan 26, 2021

Yup!

@JordanMartinez JordanMartinez changed the title Add Unbiased Map as separate module Drop Semigroup/Monoid instances for Map; add SemigroupMap Jan 26, 2021
@JordanMartinez JordanMartinez merged commit 0b540ea into purescript:master Jan 26, 2021
@JordanMartinez JordanMartinez deleted the addUnbiasedMap branch January 26, 2021 20:25
srghma added a commit to srghma/purescript-halogen that referenced this pull request Feb 3, 2021
milesfrain added a commit to milesfrain/purescript-react-basic-hooks that referenced this pull request Mar 3, 2021
milesfrain added a commit to milesfrain/purescript-react-basic-hooks that referenced this pull request Mar 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants