Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uniformly add ordered collection functions to Set module where they apply, so further harmonising collection functions #1361

Open
6 tasks done
tomcl opened this issue May 18, 2024 · 11 comments

Comments

@tomcl
Copy link

tomcl commented May 18, 2024

I propose we extend the Set library module collection by adding existing ordered collection module functions, where those make sense. This is similar to the previous harmonisation of all ordered collections.

Additions would be (I think):

  • allPairs
  • average
  • averageBy
  • collect
  • compareWith
  • concat (alias of existing unionMany)
  • countBy
  • exactlyOne
  • groupBy
  • init
  • sum
  • sumBy
  • tryExactlyOne
  • unfold
  • where (alias of existing filter)

I've omitted functions that would be non-deterministic (or rather Set implementation-dependent) for sets. But we do have such Set functions (maxElement, minElement) and in order to be minimally inconsistent the following could also be added:

  • min (expected alias of minElement)
  • minBy (this does not have the strong performance justification of min, but has many use cases)
  • max (expected alias of maxElement)
  • maxBy (this does not have the strong performance justification of max, but has many use cases)

The existing way of approaching this problem in F# is ...

  • Add a set of non-standard library helpers
  • Add boilerplate to access ordered collection functions via conversion to/from

Pros and Cons

The advantages of making this adjustment to F# are

  • Set could be used in code whenever that is appropriate without extra code to work around lack of Set functions, and with the expressiveness of Set union and difference operators. At the moment the lack of expected (from other collections) functions discourages this and can lead to overuse of List simply because it has a more complete set of functions even though semantically a set is appropriate.
  • Cognitive overhead is minimised, because available functions on collection modules are more uniform

The disadvantages of making this adjustment to F# are ...

  • There would be a larger number of functions in the Set module. See below for discussion.
  • Set as Map has quite a high performance overhead. Set might be overused, not considering this, if its use were not discouraged. I do not myself consider this a valid argument.

Extra information

Estimated cost

Small, if implemented via equivalent List or Seq functions. Although not optimal the loss of speed here would be slow relative to the overhead of the Set itself.

Related suggestions: (put links to related suggestions here)

The discussion in #803 is similarly about adding to Set module in a low-cognitive-overhead way, but specifically related to total versions of non-total functions. That introduces extra issues and the (lamentable) fact that closing all F# non-total library functions by adding total equivalents would mean a large number of try-prefix functions added.

However some of the argument is identical to here. When is it worth the cost of adding library functions to reduce cognitive overhead overall?

In both cases there is less cognitive overhead if functions are added where they are expected copies (changing module, or adding try) of existing non-total functions.

Adding less-used functions to a library module makes finding more-used functions from autocomplete list or module documentation more difficult.

Personally, I support #803, in spite of the disadvantage.

That cognitive overhead disadvantage is less extreme - in fact negative - here, it can be argued, because the larger set of functions are a subset of what we already have in the more heavily used List, Array, and Seq modules.

Affidavit (please submit!)

Please tick these items by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on StackOverflow) and I have searched StackOverflow for discussions of this issue
  • This is a language change and not purely a tooling change (e.g. compiler bug, editor support, warning/error messages, new warning, non-breaking optimisation) belonging to the compiler and tooling repository
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate. (Maybe it was discussed during the ordered collection harmonisation - which I cannot now find?)

Please tick all that apply:

  • This is not a breaking change to the F# language design
  • I would be willing to help implement and/or test this

For Readers

If you would like to see this issue implemented, please click the 👍 emoji on this issue. These counts are used to generally order the suggestions by engagement.

@abelbraaksma
Copy link
Contributor

Add boilerplate to access ordered collection functions via conversion to/from

Since a Set implements IEnumerable<'T>, the boilerplate is just mySet :> seq<_> |> Seq.average/sum/exactlyOne. Of course, for those that filter the set, the returned value has to be converted into a set again, which is costly.

I think it makes sense to have these. We should seek parity between collections where possible. I'd prefer the same on Map, but its nature makes that a little less obvious.

Other related: #768.

@T-Gro
Copy link

T-Gro commented May 20, 2024

Reading the proposed lists, is there any function where the expectation would be to utilize inner structure of the Set and gain a lot better perf characteristics compared to using Seq.* module functions ?

@tomcl
Copy link
Author

tomcl commented May 20, 2024

We already have minElement, maxElement, contains that do this.

I can't think of any big performance advantage from the others - hence I think implementation could be via List or Array and quick to do.

The motivation here is making the core libraries more consistent and reducing cognitive burden, rather than adding anything.

FYI there is some additional functionality - not available in any other module - that for performance reasons would be desirable which is:

largestElementBelow value set // returns largest element smaller than value in set as option
smallestElementAbove value set // return smallest element larger than value in set as option

Both should return an option in case there are no elements meeting the condition.

@abelbraaksma
Copy link
Contributor

is there any function where the expectation would be to utilize inner structure of the Set and gain a lot better perf characteristics compared to using Seq.* module functions ?

@T-Gro, I'd think about init, concat and collect, for instance. But is performance the only motivation here?

Perhaps a better question to ask is whether we should strive for parity between collections, as we've done in the past. We've managed to bring lists, seqs and arrays close together and many functions already exist in Set (since a very long time, btw). I believe it is time to bring this orthogonality back, as seq/list/array have evolved, but Set hasn't.

@voronoipotato
Copy link

Sets are often really convenient, especially when you have all the items up front. Performance is not meaningfully a factor in my personal usage. For me the benefits are the ability to think in terms of sets. What would be the difference between Set.concat and Set.unionMany? If the benefits are performance, why not just improve the performance of Set.unionMany ? I wouldn't object to an alias to Set.concat for people looking for it though. Init sounds good. Collect sounds like it could be kinda useful.

@tomcl
Copy link
Author

tomcl commented May 20, 2024

The motivation is uniformity between collections and so reducing the cognitive burden of remembering (or looking up) new functions.

concat is indeed an alias of unionMany.

Probably the one new operation on Set module we should have but do not is symmetricDifference, that is approved in principle as #768 . That is a bit more work if implemented natively using the Set implementation with good performance.

@abelbraaksma
Copy link
Contributor

Also related, binary search function (approved) for map/set: #82

@voronoipotato
Copy link

@tomcl sweet then I support this suggestion. sounds awesome.

@tomcl
Copy link
Author

tomcl commented May 20, 2024

@abelbraaksma re #82 good find. I was thinking of slightly different version of same thing but there is a clear need for this in terms of performance. But, if added to Set, it should also be added to Map.

@tomcl
Copy link
Author

tomcl commented May 20, 2024

@tomcl sweet then I support this suggestion. sounds awesome.

:) Great. You could up-vote the issue! More votes => more likely to happen.

It looks to me as though a good procedure would be to add any outstanding approved-in-principle functions and the cloned from ordered collection functions all at one go.

@voronoipotato
Copy link

voted, thanks for reminding me :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants