Skip to content
{{ message }}

# ocaml / ocaml Public

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

# Add efficient functions to create sets from lists or arrays#4986

Closed
opened this issue Feb 26, 2010 · 5 comments
Closed

# Add efficient functions to create sets from lists or arrays#4986

opened this issue Feb 26, 2010 · 5 comments
Labels

## Comments

### vicuna commented Feb 26, 2010

Original bug ID: 4986
Reporter: @alainfrisch
Status: resolved (set by @alainfrisch on 2013-07-09T11:02:47Z)
Resolution: open
Priority: normal
Severity: feature
Fixed in version: 4.02.0+dev
Category: ~DO NOT USE (was: OCaml general)
Monitored by: mehdi @hcarty

## Bug description

Creating a set from a list or an array is quite common. It could be useful
to add such functions in the Set module.

A trivial implementation is:

let of_array a =
Array.fold_right add a empty

A more efficient implementation is to sort the array first, and then build the balanced tree directly:

``````let of_sorted_array a =
let rec sub i0 = function
| 0 -> Empty
| 1 -> Node (Empty, a.(i0), Empty, 1)
| 2 -> Node (Node(Empty, a.(i0), Empty, 1), a.(i0 + 1), Empty, 2)
| 3 -> Node (Node(Empty, a.(i0), Empty, 1), a.(i0 + 1), Node(Empty, a.(i0 + 2), Empty, 1), 2)
| n ->
let nl = n / 2 in
let l = sub i0 nl in
let r = sub (i0 + nl + 1) (n - nl - 1) in
create l a.(i0 + nl) r
in
sub 0 (Array.length a)
``````

(The cases for n=1,2,3 are just extra optimizations.)

When creating sets from arrays of random integers of size 100000,
the optimized version is almost twice as fast as the naive one (ocamlopt, x86).

## File attachments

### vicuna commented Jul 11, 2012

 Comment author: @damiendoligez Note: the code looks wrong, nl should probably be something like (i0 + n) / 2.

### vicuna commented Jul 11, 2012

 Comment author: @alainfrisch Why is it wrong? nl represents the length of the subarray, i0 the index of its first element.

### vicuna commented Sep 17, 2012

 Comment author: @damiendoligez I don't know what I was thinking when I wrote that. The code looks fine.

### vicuna commented Dec 4, 2012

 Comment author: @alainfrisch A patch is attached, with also a new List.sort_dedup functions (like List.sort, but removes duplicates). Performance compared to an implementation based on folding Set.add is usually improved in a significant way, especially for long lists. The only case of degraded performances is for long lists with a lot of duplicates. Examples: list of length 10000 made of random strings of length 5 (x1000): 6.61s -> 4.62s list of length 10 made of random strings of length 5 (x1000000): 0.96s -> 0.76s list of length 1000 made of random integers between 0 and 9 (x10000): 0.45s -> 0.49s list of length 1000 made of random integers between 0 and 9999 (x10000): 1.89s -> 1.28s

### vicuna commented Jul 9, 2013

 Comment author: @alainfrisch Commit 13876 on trunk adds List.sort_uniq and Set.of_list.

closed this Jul 9, 2013
mentioned this issue Mar 14, 2019
Closed
to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant