New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functions `List.compare_length` and `List.compare_length_with` #760

Merged
merged 1 commit into from Sep 18, 2016

Conversation

Projects
None yet
@lefessan
Contributor

lefessan commented Aug 12, 2016

A standard inefficiency pattern is the test List.length list > N where N is often 0, 1 or 2, while List.length will still iter on the whole list. We propose to introduce two new functions:

  • List.compare_length l1 l2 compares the lengths of the two lists
  • List.length_compare l n compares the length of the list to an integer
    Both functions stops after the minimal number of iterations needed on the lists.
@lpw25

This comment has been minimized.

Show comment
Hide comment
@lpw25

lpw25 Aug 12, 2016

Contributor

I like both those functions, but I think the names are too similar. compare_length seems to fit quite well with what it does, but length_compare is not a particularly obvious name so perhaps rename that one.

Contributor

lpw25 commented Aug 12, 2016

I like both those functions, but I think the names are too similar. compare_length seems to fit quite well with what it does, but length_compare is not a particularly obvious name so perhaps rename that one.

@hcarty

This comment has been minimized.

Show comment
Hide comment
@hcarty

hcarty Aug 12, 2016

Contributor

Maybe check_length?

Contributor

hcarty commented Aug 12, 2016

Maybe check_length?

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Aug 12, 2016

Contributor

Could also be compare_lengths and compare_length, depending on the number of lists passed as arguments.

Contributor

lefessan commented Aug 12, 2016

Could also be compare_lengths and compare_length, depending on the number of lists passed as arguments.

@lijunsong

This comment has been minimized.

Show comment
Hide comment
@lijunsong

lijunsong Aug 17, 2016

Contributor

This seems trivial, but ;; at the end of the two functions feel so odd in this file.

Contributor

lijunsong commented Aug 17, 2016

This seems trivial, but ;; at the end of the two functions feel so odd in this file.

@bobzhang

This comment has been minimized.

Show comment
Hide comment
@bobzhang

bobzhang Aug 17, 2016

Member

@lefessan I wrote those two functions several times, btw, can compare_length do a little more work, it returns an int instead, if < 0 then is false, otherwise return the length

Member

bobzhang commented Aug 17, 2016

@lefessan I wrote those two functions several times, btw, can compare_length do a little more work, it returns an int instead, if < 0 then is false, otherwise return the length

@braibant

This comment has been minimized.

Show comment
Hide comment
@braibant

braibant Aug 18, 2016

Contributor

it returns an int instead, if < 0 then is false, otherwise return the length

I really think we should avoid this kind of encodings. If this behavior is important, let's use a (polymorphic) variant type.

Re. the naming of length_compare, what about compare_length_with_int?

Contributor

braibant commented Aug 18, 2016

it returns an int instead, if < 0 then is false, otherwise return the length

I really think we should avoid this kind of encodings. If this behavior is important, let's use a (polymorphic) variant type.

Re. the naming of length_compare, what about compare_length_with_int?

@jberdine

This comment has been minimized.

Show comment
Hide comment
@jberdine

jberdine Aug 18, 2016

jberdine commented Aug 18, 2016

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Aug 29, 2016

Contributor

I followed @jberdine naming suggestion. Encoding the length of the shortest list in the result would be possible, but not very friendly (i.e. the result would be 1+length to avoid the zero).

Maybe an alternative to List.compare_to_length would be to provide a List.length_smaller_than list n that returns the length of the list if smaller than n, else returns n.

Contributor

lefessan commented Aug 29, 2016

I followed @jberdine naming suggestion. Encoding the length of the shortest list in the result would be possible, but not very friendly (i.e. the result would be 1+length to avoid the zero).

Maybe an alternative to List.compare_to_length would be to provide a List.length_smaller_than list n that returns the length of the list if smaller than n, else returns n.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Sep 2, 2016

Member

shorter_than l1 l2 and length_less_than l1 n would also be explicit names.

Member

gasche commented Sep 2, 2016

shorter_than l1 l2 and length_less_than l1 n would also be explicit names.

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 2, 2016

Member

I like @gasche's suggestions. Both shorter_than and length_less_than encourage a less imperative way of thinking, and read nicely in code:

match e with
  A x, B y when shorter_than x y -> ...
Member

yallop commented Sep 2, 2016

I like @gasche's suggestions. Both shorter_than and length_less_than encourage a less imperative way of thinking, and read nicely in code:

match e with
  A x, B y when shorter_than x y -> ...
@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 2, 2016

Contributor

@gasche The functions do not return booleans, but integers, like the compare function.

Contributor

lefessan commented Sep 2, 2016

@gasche The functions do not return booleans, but integers, like the compare function.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Sep 2, 2016

Member

Right. Then the best I can think of right now is compare_length (to be read in the sense of Haskell's compareonlength and compare_length_with -- which I now notice is very similar to Thomas compare_length_with_int.

Another function that could replace most use-cases of compare_length_with is valid_index (or has_nth).

Member

gasche commented Sep 2, 2016

Right. Then the best I can think of right now is compare_length (to be read in the sense of Haskell's compareonlength and compare_length_with -- which I now notice is very similar to Thomas compare_length_with_int.

Another function that could replace most use-cases of compare_length_with is valid_index (or has_nth).

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 5, 2016

Contributor

Thought experiment: what would it take to let users write if List.length l < 100 then ... and have it compiled the efficient way? Could flambda (be extended to) optimize the code automatically based on the definition of List.length? A more pedestrian way would be to turn List.length into an external primitive and add ad hoc rules in the compiler to detect forms that can be optimized. Many more optimizations could be implemented on functions from the stdlib if the compiler knew about them (e.g. avoiding intermediate copies in s1 ^ s2 ^ s3).

Contributor

alainfrisch commented Sep 5, 2016

Thought experiment: what would it take to let users write if List.length l < 100 then ... and have it compiled the efficient way? Could flambda (be extended to) optimize the code automatically based on the definition of List.length? A more pedestrian way would be to turn List.length into an external primitive and add ad hoc rules in the compiler to detect forms that can be optimized. Many more optimizations could be implemented on functions from the stdlib if the compiler knew about them (e.g. avoiding intermediate copies in s1 ^ s2 ^ s3).

@Drup

This comment has been minimized.

Show comment
Hide comment
@Drup

Drup Sep 5, 2016

Contributor

@alainfrisch So basically GHC's fusion rules ? A lot of people are going to feel strongly about that. :)

Contributor

Drup commented Sep 5, 2016

@alainfrisch So basically GHC's fusion rules ? A lot of people are going to feel strongly about that. :)

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 5, 2016

Contributor

I'm not familiar with GHC fusion rules, but it seems they can be specified in "normal" user code. I was referring to something more ad hoc (exposing some well-chosen functions from the stdlib as external % primitives to let the compiler know about them). This is much less powerful, but much simpler and perhaps a bit less controversial.

Contributor

alainfrisch commented Sep 5, 2016

I'm not familiar with GHC fusion rules, but it seems they can be specified in "normal" user code. I was referring to something more ad hoc (exposing some well-chosen functions from the stdlib as external % primitives to let the compiler know about them). This is much less powerful, but much simpler and perhaps a bit less controversial.

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 5, 2016

Contributor

The problem with "fusion rules" is how to be sure they are going to apply to your particular case ? For example, if List.length x < 100 then ... is easy to detect, but what about let l = List.length x in if l < 100 then ... where l is never reused ? At the end, if you need to know what is detected and what is not, it might be easier to just use directly the optimized functions.

Contributor

lefessan commented Sep 5, 2016

The problem with "fusion rules" is how to be sure they are going to apply to your particular case ? For example, if List.length x < 100 then ... is easy to detect, but what about let l = List.length x in if l < 100 then ... where l is never reused ? At the end, if you need to know what is detected and what is not, it might be easier to just use directly the optimized functions.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche Sep 5, 2016

Member

I think that having optimizations that reduce constant factors (for example a more efficient test for str = "") is fair game, but I would be very careful with optimization that introduce algorithmic speedups. In fact it's not so much about constant-time vs. more, it's about preserving or not my model of what code is running.

With list operations, there is a crisp type definition that determines how most operations are implemented (but not their stack space usage), and anyone, even beginners, can quickly guess that List.length cannot possibly not traverse the whole list, same for fold, map etc. If we start optimizing arbitrary operations, how are we going to let people reason about efficiency without having to check, for any possible optimizable pattern, whether it is optimized or not?

To summarize, I think optimizing (length li) <comparison> <constant> is not a good idea.

Member

gasche commented Sep 5, 2016

I think that having optimizations that reduce constant factors (for example a more efficient test for str = "") is fair game, but I would be very careful with optimization that introduce algorithmic speedups. In fact it's not so much about constant-time vs. more, it's about preserving or not my model of what code is running.

With list operations, there is a crisp type definition that determines how most operations are implemented (but not their stack space usage), and anyone, even beginners, can quickly guess that List.length cannot possibly not traverse the whole list, same for fold, map etc. If we start optimizing arbitrary operations, how are we going to let people reason about efficiency without having to check, for any possible optimizable pattern, whether it is optimized or not?

To summarize, I think optimizing (length li) <comparison> <constant> is not a good idea.

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 5, 2016

Contributor

Actually, since compiler plugins will be available in 4.04, why not write a plugin to do these optimizations, and let developers choose to use or not use it for a particular module.

Contributor

lefessan commented Sep 5, 2016

Actually, since compiler plugins will be available in 4.04, why not write a plugin to do these optimizations, and let developers choose to use or not use it for a particular module.

@xavierleroy

This comment has been minimized.

Show comment
Hide comment
@xavierleroy

xavierleroy Sep 5, 2016

Contributor

Thought experiment: what would it take to let users write if List.length l < 100 then ... and have it compiled the efficient way?

Don't overestimate the cleverness of the compiler. I can see a compiler that performs compile-time symbolic evaluation turning List.length l < 100 into

  match l with [] -> true | _:: l ->
  match l with [] -> true | _:: l ->
  (* etc, etc *)
  match l with [] -> true | _ :: _ -> false

Re-rolling these 100 matches into a recursive function is another matter entirely, and more of the domain of a program synthesis tool than of an optimizing compiler.

Contributor

xavierleroy commented Sep 5, 2016

Thought experiment: what would it take to let users write if List.length l < 100 then ... and have it compiled the efficient way?

Don't overestimate the cleverness of the compiler. I can see a compiler that performs compile-time symbolic evaluation turning List.length l < 100 into

  match l with [] -> true | _:: l ->
  match l with [] -> true | _:: l ->
  (* etc, etc *)
  match l with [] -> true | _ :: _ -> false

Re-rolling these 100 matches into a recursive function is another matter entirely, and more of the domain of a program synthesis tool than of an optimizing compiler.

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 6, 2016

Contributor

Following @gasche comment, I changed compare_to_length into compare_length_with, but I still prefer compare_lengths to compare_length. I don't like valid_index, as a name for a function over lists, as it seems to encourage using a list instead of an array for indexed operations.

Contributor

lefessan commented Sep 6, 2016

Following @gasche comment, I changed compare_to_length into compare_length_with, but I still prefer compare_lengths to compare_length. I don't like valid_index, as a name for a function over lists, as it seems to encourage using a list instead of an array for indexed operations.

@lefessan lefessan changed the title from Add functions `List.compare_length` and `List.length_compare` to Add functions `List.compare_length` and `List.compare_length_with` Sep 8, 2016

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 15, 2016

Contributor

It seems that we have reached a consensus. I add a commit to update CHANGES, and I merge.

Contributor

lefessan commented Sep 15, 2016

It seems that we have reached a consensus. I add a commit to update CHANGES, and I merge.

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 15, 2016

Member

compare_lengths is too specific. It would be better to have a function

List.compare : ('a -> 'a -> int) -> 'a list -> 'a list -> int

(generalizing the type to support different element types, if you prefer)

and then define

compare_lengths = List.compare (fun _ _ -> 0)
Member

yallop commented Sep 15, 2016

compare_lengths is too specific. It would be better to have a function

List.compare : ('a -> 'a -> int) -> 'a list -> 'a list -> int

(generalizing the type to support different element types, if you prefer)

and then define

compare_lengths = List.compare (fun _ _ -> 0)
@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 15, 2016

Contributor

I see your point, but I am pretty sure it would make the performance of compare_lengths much worse (except maybe with flambda). I would suggest to make an independant PR for it, that would not modify compare_lengths.

Contributor

lefessan commented Sep 15, 2016

I see your point, but I am pretty sure it would make the performance of compare_lengths much worse (except maybe with flambda). I would suggest to make an independant PR for it, that would not modify compare_lengths.

@lefessan lefessan merged commit f872d67 into ocaml:trunk Sep 18, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@lefessan lefessan deleted the lefessan:2016-08-12-list-length-comparisons branch Sep 18, 2016

@@ -232,7 +232,7 @@
(apply f (field 0 param) (field 1 param)))
map =
(function f l
(apply (field 12 (global List!)) (apply uncurry f)
(apply (field 14 (global List!)) (apply uncurry f)

This comment has been minimized.

@avsm

avsm Sep 18, 2016

Member

was this intended to be committed with the rest of the patch? It appears unrelated.

@avsm

avsm Sep 18, 2016

Member

was this intended to be committed with the rest of the patch? It appears unrelated.

This comment has been minimized.

@gasche

gasche Sep 18, 2016

Member

I think this test exposes the offset of some functions (List.map here, I suppose) in the List module, so it may change when new definitions are added before the functions used.

@gasche

gasche Sep 18, 2016

Member

I think this test exposes the offset of some functions (List.map here, I suppose) in the List module, so it may change when new definitions are added before the functions used.

This comment has been minimized.

@lefessan

lefessan Sep 19, 2016

Contributor

Yes, it is a fix for the testsuite needed by the patch, because it changes the offset of List.map in the signature.

@lefessan

lefessan Sep 19, 2016

Contributor

Yes, it is a fix for the testsuite needed by the patch, because it changes the offset of List.map in the signature.

camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment