Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set.disjoint #6450

Closed
vicuna opened this issue Jun 3, 2014 · 10 comments

Comments

Projects
None yet
2 participants
@vicuna
Copy link

commented Jun 3, 2014

Original bug ID: 6450
Reporter: @alainfrisch
Assigned to: @nojb
Status: resolved (set by @nojb on 2018-08-21T17:01:03Z)
Resolution: fixed
Priority: normal
Severity: feature
Target version: 4.08.0+dev/beta1/beta2
Category: standard library

Bug description

Checking disjointness of sets is a common operation and it can be implemented more efficiently from inside the Set module:

    let rec disjoint s1 s2 =
      match (s1, s2) with
        (Empty, _) | (_, Empty) -> true
      | (Node(l1, v1, r1, _), t2) ->
          if s1 == s2 then false
          else match split v1 t2 with
            (l2, false, r2) -> disjoint l1 l2 && disjoint r1 r2
          | (l2, true, r2) -> false

(it could even probably be more clever, using respective depth of the subtree.)

Any opposition to including this function?

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 4, 2014

Comment author: @mshinwell

If the answer to PR 6279 is that these functions should go in external libraries, I don't see how the inclusion of this (far less useful) function could be justified.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 4, 2014

Comment author: @alainfrisch

#6279 is about a function which would not really benefit from being included in the functor. Well, actually, I just realized that the split function is exposed in Set, so the disjoint function could indeed be implemented outside the functor as well. But keeping it in Set would make it possible to apply further optimizations to it.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 4, 2014

Comment author: @chambart

Note that this one cannot be efficiently implemented outside. The split function is effectively exported, but there is no function to find the root of the tree (here v1 in the pattern matching). And we wouldn't want such a root function, since it wouldn't have a clear semantics.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 5, 2014

Comment author: @alainfrisch

Ah yes, indeed.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 6, 2014

Comment author: @mshinwell

I get the feeling that, increasingly, the criterion for inclusion in the standard library should be whether or not the feature is needed now in the compiler distribution itself. If the feature is not needed there, then it goes outside, as per Damien's comment.

If that is not the criterion, then the next most reasonable criterion would seem to be whether the feature assists in the provision of a comprehensive standard library. In that scenario, I cannot see how the inclusion of [Set.disjoint] but not [Set.map] could be justified.

I cannot see why a good criterion for inclusion in the standard library would be one based on efficiency of some core set of functions, leaving users to write the remaining functions on top. That would seem to lead to a fragmented interface, and bugs.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 6, 2014

Comment author: @alainfrisch

There is no consensus in the community about a good general purpose alternative to the standard library, and I believe the "official" stdlib will remain widely used, especially for libraries. If an extension fits well in the stdlib, I don't see why we should prevent ourselves from including it. Of course, we need to apply some reason and not include gazillions of tiny one-liner helper functions which only complicate the interface. I've no strong opinion about Set.map. This does not seem to be a very common operation on sets and it does not benefit particularly from the choice of the underlying concrete data structure; and it's a one-liner. I would have a more positive opinion if the function did something non-trivial, such as guaranteeing a linear complexity for the case where the mapped function is monotonic on the elements in the set (this does not seem very difficult).

Checking disjointness of sets, or combining maps, on the other hand, seem more common and immediately benefit from being included in the stdlib.

I cannot see why a good criterion for inclusion in the standard library would > be one based on efficiency of some core set of functions, leaving users to
write the remaining functions on top. That would seem to lead to a fragmented
interface, and bugs.

So what do you propose? Rejecting simple and useful addition, esp. when they cannot be implemented externally without switching to an entirely different library, means that we encourage users to stop using the official stdlib. This is not a good approach while there is no good alternative to it.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jun 11, 2014

Comment author: ybarnoy

There are advantages to having no dependencies. My old research project's approach was not to use any dependencies that were not in-tree. We therefore used the stdlib with our own enhancements. This makes the code very portable, especially for windows systems where opam doesn't work.

Therefore, I see nothing wrong with enhancing the stdlib with useful functionality, even if it's never going to be at the level of Batteries or Core. Heck, had stdlib had low precedence function application (@@) earlier, we wouldn't have littered our code with our own symbol. I'm still amazed that there's no standard function composition operator in stdlib.

Stdlib is the gold standard by which the other 2 libraries live, even if Core likes to flaunt its divergence from stdlib, and it's worth maintaining, even if it's not cutting edge IMO.

@vicuna

This comment has been minimized.

Copy link
Author

commented Dec 6, 2016

Comment author: @alainfrisch

For the records, we have been using the following version for some time:

    (* Same as split, but compute the left and right subtrees
       only if the pivot element is not in the set.  The right subtree
       is computed on demand. *)

    type split_bis =
      | Found
      | NotFound of t * (unit -> t)

    let rec split_bis x = function
        Empty ->
          NotFound (Empty, (fun () -> Empty))
      | Node(l, v, r, _) ->
          let c = Ord.compare x v in
          if c = 0 then Found
          else if c < 0 then
            match split_bis x l with
            | Found -> Found
            | NotFound (ll, rl) -> NotFound (ll, (fun () -> join (rl ()) v r))
          else
            match split_bis x r with
            | Found -> Found
            | NotFound (lr, rr) -> NotFound (join l v lr, rr)

    let rec disjoint s1 s2 =
      match (s1, s2) with
        (Empty, _) | (_, Empty) -> true
      | (Node(l1, v1, r1, _), t2) ->
          if s1 == s2 then false
          else match split_bis v1 t2 with
            NotFound(l2, r2) -> disjoint l1 l2 && disjoint r1 (r2 ())
          | Found -> false
@vicuna

This comment has been minimized.

Copy link
Author

commented Aug 12, 2018

Comment author: @nojb

#1986

@vicuna

This comment has been minimized.

Copy link
Author

commented Aug 21, 2018

Comment author: @nojb

PR merged.

@vicuna vicuna closed this Aug 21, 2018

@vicuna vicuna added the stdlib label Mar 14, 2019

@vicuna vicuna added this to the 4.08.0 milestone Mar 14, 2019

@vicuna vicuna added the feature-wish label Mar 20, 2019

@nojb nojb modified the milestones: 4.08.0, 4.08 Mar 29, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.