Skip to content
This repository was archived by the owner on Oct 4, 2020. It is now read-only.

Faster & simpler traverse for StrMap #96

Merged
merged 2 commits into from
May 29, 2017
Merged

Faster & simpler traverse for StrMap #96

merged 2 commits into from
May 29, 2017

Conversation

joshuahhh
Copy link
Contributor

Please take a look!

Here's some benchmarking (code):

sequence

@hdgarrood
Copy link
Contributor

This looks very encouraging, but I'm not overly keen on having a copy of the old implementation in the tests to test against. Do we not already have some code testing that StrMap's Traversable instance does what it should do?

@joshuahhh
Copy link
Contributor Author

Sounds reasonable. I added some more straightforward tests of Traverse.

Technical tangent: Since the entries in a StrMap are not ordered, StrMaps are not naturally foldable or traversable. You have two options...

  • Write instances of fold / traverse which proceed in an arbitrary order. This will work perfectly if you only fold commutative functions, or traverse using commutative applicatives (those for which f <$> a <*> b = flip f <$> b <*> a). It will work imperfectly but usefully otherwise. For instance, Array is not a commutative applicative. This means there are actually pairs of StrMap (Array Int)s m1 and m2 such that m1 == m2 but sequence m1 /= sequence m2. However, the applicative instance for Array IS commutative up to reordering of the array, so A.sort (sequence m1) == A.sort (sequence m2). (This is why the "sequence works (for m = Array)" test calls A.sort before comparing the results.)

  • Write instances of fold / traverse which proceed in sorted order. This will avoid the problems above, but with a performance cost. And, since StrMaps are fundamentally not ordered structures and you really shouldn't be folding non-commutative functions on them anyway, it seems like a silly cost to ensure a pedantic property.

Just throwing that out there, as a disclaimer and in case you have any thoughts on it.

Thanks a bunch!

@joshuahhh
Copy link
Contributor Author

Not-at-all-urgent ping on this, @hdgarrood. (Thanks!)

@hdgarrood
Copy link
Contributor

This looks sensible to me but I am not sure why the current instance was written that way, and I'd like to get input from someone who does before approving this.

Also just wondering re your tangent: the current Traversable StrMap instance proceeds in an arbitrary order, like the current one, right? Also,

This means there are actually pairs of StrMap (Array Int)s m1 and m2 such that m1 == m2

Presumably this depends on insertion order? Can you construct such a pair?

@joshuahhh
Copy link
Contributor Author

This looks sensible to me but I am not sure why the current instance was written that way, and I'd like to get input from someone who does before approving this.

Sounds wise. The traverse instance for StrMap dates from 1463e59. @joneshf, can you see this pull request? Do you see anything wrong about replacing

traverse f ms = foldr (\x acc -> union <$> x <*> acc) (pure empty) ((map (uncurry singleton)) <$> (traverse f <$> toArray ms))

with

traverse f ms = fold (\acc k v -> insert k <$> f v <*> acc) (pure empty) ms

?

Also just wondering re your tangent: the current Traversable StrMap instance proceeds in an arbitrary order, like the current one, right?

Yup!

Can you construct such a pair?

Here is an example with the old traverse. Presumably the new version works the same way.

@hdgarrood
Copy link
Contributor

Actually I've just realised - won't the new version still be quadratic, since each insert needs to make a copy of the entire accumulator? I wonder if just converting to Map String for the traversal and then back again at the end might be better.

@joshuahhh
Copy link
Contributor Author

Good point. I suppose it depends on use-case.

If you are sequenceing a StrMap (IO Int) with 1000 key-value pairs, you win by converting to / from Maps.

The situation that motivated me was more similar to sequenceing a StrMap (Array Int) with a small handful of key-value pairs, but with potentially large arrays. In this case, the conversions would likely be quite expensive.

If you think the first situation is important to protect, then I might back off on my approach, since I probably want to add further optimizations which move away from use of a generic sequence anyway.

@hdgarrood
Copy link
Contributor

I would imagine that converting StrMap a to Map String a can be done in n log n time, regardless of the size of the values; the values themselves shouldn't need to be deep-copied or anything.

@hdgarrood
Copy link
Contributor

Ohh, I see, you would need to do map mapToStrMap at the end, which would potentially involve calling mapToStrMap n^2 times.

It's just occurred to me that I have never wanted to use this instance, so I really don't have any strong opinions here.

This PR does seem to be an improvement over the current code in any case so I'd be happy to see it merged now, and then we can revisit these issues later.

@paf31
Copy link
Contributor

paf31 commented May 28, 2017

@hdgarrood Let's merge this if you're happy with the trade-offs. Thanks for reviewing!

Thanks @joshuahhh!

@hdgarrood
Copy link
Contributor

Yes, thank you!

@hdgarrood hdgarrood merged commit 7bdd5c1 into purescript-deprecated:master May 29, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants