New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an overflow bug in String.concat #805

Closed
wants to merge 3 commits into
base: trunk
from

Conversation

Projects
None yet
6 participants
@yallop
Member

yallop commented Sep 12, 2016

[tl;dr: String.concat can segfault due to overflow; this PR fixes the bug and makes the function faster.]

The Description of the Bug

This PR fixes a bug from last century in the String.concat function, which catenates a list of strings:

String.concat "," ["a"; "b"; "c"]
~>
"a,b,c"

and whose description includes the following:

Raise [Invalid_argument] if the result is longer than
{!Sys.max_string_length} bytes. *)

Unfortunately, computing the length of the output string may overflow an int, in which case the result is not the Invalid_argument exception but an out-of-bounds write. This situation is perhaps unlikely on a modern 64-bit platform, but on what the configure script calls "a regular 32 bit architecture" the bug can be triggered with a single large string and a list of moderate length.

The Example

Here's an example to illustrate the problem. The idea is to create a list containing multiple references to a single large string of length Sys.max_string_length, or 16777211. The number of entries in the list is carefully chosen so that computing their sum wraps around past max_int (1073741823) and min_int, all the way past zero again to give a number between zero and Sys.max_string_length -- i.e. a valid length for a string but significantly less than the combined lengths of the members of the list.

let big = String.make Sys.max_string_length 'x'

let push x l = l := x :: !l
let (+=) a b = a := !a + b

let sz, l = ref 0, ref []

let () = begin
  while !sz >= 0 do push big l; sz += Sys.max_string_length done;
  while !sz <= 0 do push big l; sz += Sys.max_string_length done;
end

let _ = String.concat "" !l

Here's the unfortunate result of compiling and running the program:

Segmentation fault (core dumped)

The Proposed Fix

The fix is simple, in principle: the code that computes the size of the result string must be extended with a check that the sum does not overflow an int. In general overflow is only detectable during the computation; by the time the final result is computed it may be too late, as the example above shows.

Here is the current code for computing the size of the result string:

let num = ref 0 and len = ref 0 in
List.iter (fun s -> incr num; len := !len + length s) l;
(!len + length sep * (!num - 1))

Naively adding an overflow check to each iteration (i.e. to the function passed to List.iter) slows things down slightly, as we might expect:

Effect on performance of adding an overflow check

List length: 1 2 4 8 16 32 64 128 256
Current code (ns): 13.08 20.44 23.72 39.31 83.46 141.42 266.81 505.84 994.86
With overflow check (ns): 13.82 19.94 27.07 43.77 83.76 157.16 300.61 578.60 1119.99

In mitigation, it seems unlikely that computing the size of the result in String.concat is a significant bottleneck in most OCaml programs.

Still, it's a shame to slow things down unnecessarily. So this PR takes advantage of some optimization opportunities to make the size computation faster (while adding in the overflow check, of course). Instead of calling List.iter with a closure we might compute the size in a simple loop implemented as a recursive function with an accumulator. And we can then unroll the loop so as not to perform the overflow check every iteration. The unrolling uses the following observation: since Sys.max_string_length is significantly smaller than max_int, if adding String.length s to size causes an overflow, then we can add quite a few more string lengths to size before it becomes positive again.

Here's a comparison of the performance of the existing size computation and the implementation in this PR:

Performance of current implementation vs this PR

List length: 1 2 4 8 16 32 64 128 256
Current code (ns): 13.08 20.44 23.72 39.31 83.46 141.42 266.81 505.84 994.86
New implementation (ns): 3.70 5.04 8.51 14.70 27.15 53.66 112.77 215.85 419.42

With flambda the gap narrows a bit:

Performance of current implementation vs this PR (flambda)

List length: 1 2 4 8 16 32 64 128 256
Current code (ns): 8.49 10.79 15.74 25.84 48.46 93.41 183.54 357.02 711.02
New implementation (ns): 4.29 5.25 8.43 15.74 27.38 52.16 108.34 209.81 419.05

A final note

This PR fixes a bug in String.concat and Bytes.concat. The Array.concat function has a similar problem, which I'll address in a separate PR.

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 12, 2016

Member

One question: does OCaml actually guarantee that int is "modulo" — i.e. that overflow wraps? If not, I'll need to tweak things a bit.

Member

yallop commented Sep 12, 2016

One question: does OCaml actually guarantee that int is "modulo" — i.e. that overflow wraps? If not, I'll need to tweak things a bit.

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 12, 2016

Contributor

The "Integer arithmetic" section in Pervasives' doc starts with:

Integers are 31 bits wide (or 63 bits on 64-bit processors). All operations are taken modulo 2^31
(or 2^63). They do not fail on overflow.

js_of_ocaml has 32-bit integers (with modulo semantics, I believe), and probably Bucklescript as well.

Contributor

alainfrisch commented Sep 12, 2016

The "Integer arithmetic" section in Pervasives' doc starts with:

Integers are 31 bits wide (or 63 bits on 64-bit processors). All operations are taken modulo 2^31
(or 2^63). They do not fail on overflow.

js_of_ocaml has 32-bit integers (with modulo semantics, I believe), and probably Bucklescript as well.

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 12, 2016

Contributor

(I don't know what's the maximum string length for JS backends, but I can very well imagine that it is not small enough compared to max_int to make the overflow detection safe with the unrolling hack. That said, missing an overflow in a JS backend is probably harmless: no "segfault", just a JS exception.)

Contributor

alainfrisch commented Sep 12, 2016

(I don't know what's the maximum string length for JS backends, but I can very well imagine that it is not small enough compared to max_int to make the overflow detection safe with the unrolling hack. That said, missing an overflow in a JS backend is probably harmless: no "segfault", just a JS exception.)

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 12, 2016

Contributor

Which inputs did you use for the benchmark? The optimization is cute, but I wonder what's the impact on the whole String.concat (not just the length computation). If the cost of the first List.iter deserves the optimization, so should the cost of the second one.

Contributor

alainfrisch commented Sep 12, 2016

Which inputs did you use for the benchmark? The optimization is cute, but I wonder what's the impact on the whole String.concat (not just the length computation). If the cost of the first List.iter deserves the optimization, so should the cost of the second one.

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 12, 2016

Member

For the benchmark I used separators of length zero and one (which together account for the majority of calls to String.concat in the OCaml implementation), and strings of length 26.

If the cost of the first List.iter deserves the optimization, so should the cost of the second one.

Indeed. And there are further opportunities to improve the performance of the function overall. For example, since zero-length separators are common, it might be worth detecting that case upfront and avoiding n-1 calls to unsafe_blit that have no effect.

Member

yallop commented Sep 12, 2016

For the benchmark I used separators of length zero and one (which together account for the majority of calls to String.concat in the OCaml implementation), and strings of length 26.

If the cost of the first List.iter deserves the optimization, so should the cost of the second one.

Indeed. And there are further opportunities to improve the performance of the function overall. For example, since zero-length separators are common, it might be worth detecting that case upfront and avoiding n-1 calls to unsafe_blit that have no effect.

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Sep 12, 2016

Contributor

The unrolling uses the following observation: since Sys.max_string_length is significantly smaller than max_int

I personally don't like this assumption and Alain already pointed places where it may not hold. Since I have the same bug in Astring I have been thinking about solving the problem a bit differently.

My idea is to check each addition of the members of the list and at the same time count the separators (in other words compute list length - 1). At the end of the loop we can do some arithmetic to see if 1) multiplying the sep length by the sep count would be greater than Sys.max_string_length 2) check if adding the total sep lengths to the accumulated lengths is greater than Sys.max_string_length.

Now, of course the sep count could overflow aswell. However in order to do on a 32-bit platform your list would need to be of length > (1 lsl 31). But this list would use ((1 lsl 31) * 4 * 2) / (1024 * 1024 * 1024) = 16Go of memory which is beyond the addressable memory limit.

Contributor

dbuenzli commented Sep 12, 2016

The unrolling uses the following observation: since Sys.max_string_length is significantly smaller than max_int

I personally don't like this assumption and Alain already pointed places where it may not hold. Since I have the same bug in Astring I have been thinking about solving the problem a bit differently.

My idea is to check each addition of the members of the list and at the same time count the separators (in other words compute list length - 1). At the end of the loop we can do some arithmetic to see if 1) multiplying the sep length by the sep count would be greater than Sys.max_string_length 2) check if adding the total sep lengths to the accumulated lengths is greater than Sys.max_string_length.

Now, of course the sep count could overflow aswell. However in order to do on a 32-bit platform your list would need to be of length > (1 lsl 31). But this list would use ((1 lsl 31) * 4 * 2) / (1024 * 1024 * 1024) = 16Go of memory which is beyond the addressable memory limit.

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Sep 12, 2016

Contributor

What I said can be seen in code here.

Contributor

dbuenzli commented Sep 12, 2016

What I said can be seen in code here.

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 12, 2016

Member

I don't know what's the maximum string length for JS backends

js_of_ocaml retains 32-bit OCaml's string restrictions, but has 32-bit rather than 31-bit ints:

js_of_ocaml

Bucklescript is less constrained by convention, and has Sys.max_string_length > max_int.

buckle1
buckle2

Member

yallop commented Sep 12, 2016

I don't know what's the maximum string length for JS backends

js_of_ocaml retains 32-bit OCaml's string restrictions, but has 32-bit rather than 31-bit ints:

js_of_ocaml

Bucklescript is less constrained by convention, and has Sys.max_string_length > max_int.

buckle1
buckle2

@bobzhang

This comment has been minimized.

Show comment
Hide comment
@bobzhang

bobzhang Sep 12, 2016

Member

In JS vm, the string is represented by a rope, so the limit would be memory.
For this function in particular, we probably we will provide a much more efficient alternative for JS backend(shortcut String.concat avoid using buffer), so it may not affect us.
But it would be nice to avoid exception as a control flow if you can, local exception is preferred if necessary.

Member

bobzhang commented Sep 12, 2016

In JS vm, the string is represented by a rope, so the limit would be memory.
For this function in particular, we probably we will provide a much more efficient alternative for JS backend(shortcut String.concat avoid using buffer), so it may not affect us.
But it would be nice to avoid exception as a control flow if you can, local exception is preferred if necessary.

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 12, 2016

Contributor

@yallop What are you benchmarking here exactly ? The cost of concat, or the cost of computing the final size ? A naive solution would be to keep the current (wrong) final size computation, but do an overflow check before the unsafe_blit operations ? Is it too inefficient ?

Contributor

lefessan commented Sep 12, 2016

@yallop What are you benchmarking here exactly ? The cost of concat, or the cost of computing the final size ? A naive solution would be to keep the current (wrong) final size computation, but do an overflow check before the unsafe_blit operations ? Is it too inefficient ?

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 12, 2016

Member

@yallop What are you benchmarking here exactly ? The cost of concat, or the cost of computing the final size ?

The cost of computing the final size.

A naive solution would be to keep the current (wrong) final size computation, but do an overflow check before the unsafe_blit operations ? Is it too inefficient ?

I haven't measured, but it sounds less efficient. Wouldn't that involve two checks before each write -- for overflow of offset+length, and for out-of-bounds writes? (Consider the case where the sum of the lengths of the strings is 3+2*max_int, i.e. 1.)

Member

yallop commented Sep 12, 2016

@yallop What are you benchmarking here exactly ? The cost of concat, or the cost of computing the final size ?

The cost of computing the final size.

A naive solution would be to keep the current (wrong) final size computation, but do an overflow check before the unsafe_blit operations ? Is it too inefficient ?

I haven't measured, but it sounds less efficient. Wouldn't that involve two checks before each write -- for overflow of offset+length, and for out-of-bounds writes? (Consider the case where the sum of the lengths of the strings is 3+2*max_int, i.e. 1.)

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 14, 2016

Contributor

If you are only measuring the cost of the computation of the final size, I am a bit skeptical that it is worth adding so much complexity to make it efficient, when I would assume that most of the cost of String.concat comes from copying the strings content into the new string.

Contributor

lefessan commented Sep 14, 2016

If you are only measuring the cost of the computation of the final size, I am a bit skeptical that it is worth adding so much complexity to make it efficient, when I would assume that most of the cost of String.concat comes from copying the strings content into the new string.

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 14, 2016

Member

so much complexity

I don't think the code is complex at all. It's a simple recursive function that adds numbers and checks for overflow.

Which part do you consider difficult to understand?

Member

yallop commented Sep 14, 2016

so much complexity

I don't think the code is complex at all. It's a simple recursive function that adds numbers and checks for overflow.

Which part do you consider difficult to understand?

@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Sep 14, 2016

Contributor

Which part do you consider difficult to understand?

Personally I really dislike your assumption about the relationship between Sys.max_string_length and max_int at least it should be asserted in some kind of test (even though that test wouldn't be run e.g. in the js backends).

I'm also doubtful about the real world performance impact, as @lefessan suggests the nano seconds you are fishing are likely to be lost in the noise of the actual memory copies. If that is the case there is little point to be made to go through these convolutions to implement this.

Contributor

dbuenzli commented Sep 14, 2016

Which part do you consider difficult to understand?

Personally I really dislike your assumption about the relationship between Sys.max_string_length and max_int at least it should be asserted in some kind of test (even though that test wouldn't be run e.g. in the js backends).

I'm also doubtful about the real world performance impact, as @lefessan suggests the nano seconds you are fishing are likely to be lost in the noise of the actual memory copies. If that is the case there is little point to be made to go through these convolutions to implement this.

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 14, 2016

Member

I'm also doubtful about the real world performance impact

This was addressed in the description.

And the broader point is in danger of being lost in the noise: this PR is about fixing a bug (without degrading performance), and deliberately makes no claims about real world performance impacts.

Member

yallop commented Sep 14, 2016

I'm also doubtful about the real world performance impact

This was addressed in the description.

And the broader point is in danger of being lost in the noise: this PR is about fixing a bug (without degrading performance), and deliberately makes no claims about real world performance impacts.

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 15, 2016

Contributor

Which part do you consider difficult to understand?

It's not the complexity of the code in terms of understanding (although understanding the code after 6 months takes more time than just after writing it), but in terms of number of lines. If we add 30 lines of code to fix an insignificant performance problem in each stdlib function, when just one line would be enough to fix the bug, the code of stdlib would just become unmanageable. Also, it seems easier to prove the function correct if there is only one check, instead of a call to an auxilliary function.

Contributor

lefessan commented Sep 15, 2016

Which part do you consider difficult to understand?

It's not the complexity of the code in terms of understanding (although understanding the code after 6 months takes more time than just after writing it), but in terms of number of lines. If we add 30 lines of code to fix an insignificant performance problem in each stdlib function, when just one line would be enough to fix the bug, the code of stdlib would just become unmanageable. Also, it seems easier to prove the function correct if there is only one check, instead of a call to an auxilliary function.

@lefessan

Too many lines of code when just one line would fix the bug.
If we keep this code for performance reason, I suggest to factor Bytes.sum_lengths and String.sum_lengths (by adding as parameter the function length)

@bobzhang

This comment has been minimized.

Show comment
Hide comment
@bobzhang

bobzhang Sep 15, 2016

Member

The patch is overly complicated and it might be slower in some backends (exception used as control flow here), I like @lefessan 's simple fix

Member

bobzhang commented Sep 15, 2016

The patch is overly complicated and it might be slower in some backends (exception used as control flow here), I like @lefessan 's simple fix

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 15, 2016

Member

@bobzhang: I have no idea what you mean by "exception used as control flow" (or what you mean by "complicated", for that matter).

Exceptions are used as, well, exceptions.

Member

yallop commented Sep 15, 2016

@bobzhang: I have no idea what you mean by "exception used as control flow" (or what you mean by "complicated", for that matter).

Exceptions are used as, well, exceptions.

@bobzhang

This comment has been minimized.

Show comment
Hide comment
@bobzhang

bobzhang Sep 15, 2016

Member

I mean complicated because a single line would fix the issue. (sorry for my poor English)

Member

bobzhang commented Sep 15, 2016

I mean complicated because a single line would fix the issue. (sorry for my poor English)

@yallop yallop closed this Sep 15, 2016

@yallop yallop deleted the yallop:string-concat-bounds branch Sep 15, 2016

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Sep 15, 2016

Member

My thoughts:

  • This bug needs to be fixed. Why did you close this PR?
  • Loop unrolling looks like a premature optimization.
  • @bobzhang is talking about the cost of setting a handler for Overflow, although I don't see any obvious way to avoid it.
  • I don't understand how any int can be larger than max_int. There must be something very wrong in Bucklescript to make Sys.max_string_length > max_int true.
  • I'd like to see the one-line fix that @lefessan is talking about.
Member

damiendoligez commented Sep 15, 2016

My thoughts:

  • This bug needs to be fixed. Why did you close this PR?
  • Loop unrolling looks like a premature optimization.
  • @bobzhang is talking about the cost of setting a handler for Overflow, although I don't see any obvious way to avoid it.
  • I don't understand how any int can be larger than max_int. There must be something very wrong in Bucklescript to make Sys.max_string_length > max_int true.
  • I'd like to see the one-line fix that @lefessan is talking about.
@dbuenzli

This comment has been minimized.

Show comment
Hide comment
@dbuenzli

dbuenzli Sep 15, 2016

Contributor

@bobzhang is talking about the cost of setting a handler for Overflow, although I don't see any obvious way to avoid it.

FWIW the final solution I have in astring avoids it.

Contributor

dbuenzli commented Sep 15, 2016

@bobzhang is talking about the cost of setting a handler for Overflow, although I don't see any obvious way to avoid it.

FWIW the final solution I have in astring avoids it.

@bobzhang

This comment has been minimized.

Show comment
Hide comment
@bobzhang

bobzhang Sep 15, 2016

Member

@damiendoligez Per discussions with @dbuenzli we will set Sys.max_string_length to max_int

Member

bobzhang commented Sep 15, 2016

@damiendoligez Per discussions with @dbuenzli we will set Sys.max_string_length to max_int

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 15, 2016

Member

@damiendoligez:

This bug needs to be fixed. Why did you close this PR?

For one thing, there's unanimous opposition from OCaml core developers and others. For another, a lot of the comments have been unsupported value judgements or technically-dubious assertions. To be fair, there have also been one or two reasonable points. Overall, though, this thread has been no fun at all.

Loop unrolling looks like a premature optimization.

Fair enough. I don't mind removing the unrolling. My personal view is that the tradeoff for standard library code should generally be slightly more in favour of performance over clarity, and that loop unrolling doesn't really affect clarity much anway. But others will (reasonably) differ here, and it's up to the core developers to decide.

@bobzhang is talking about the cost of setting a handler for Overflow, although I don't see any obvious way to avoid it.

It's easy to avoid: raise invalid_arg directly from the summation function and remove the handler in concat.

Member

yallop commented Sep 15, 2016

@damiendoligez:

This bug needs to be fixed. Why did you close this PR?

For one thing, there's unanimous opposition from OCaml core developers and others. For another, a lot of the comments have been unsupported value judgements or technically-dubious assertions. To be fair, there have also been one or two reasonable points. Overall, though, this thread has been no fun at all.

Loop unrolling looks like a premature optimization.

Fair enough. I don't mind removing the unrolling. My personal view is that the tradeoff for standard library code should generally be slightly more in favour of performance over clarity, and that loop unrolling doesn't really affect clarity much anway. But others will (reasonably) differ here, and it's up to the core developers to decide.

@bobzhang is talking about the cost of setting a handler for Overflow, although I don't see any obvious way to avoid it.

It's easy to avoid: raise invalid_arg directly from the summation function and remove the handler in concat.

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 16, 2016

Contributor

Here is a version (I have done a manual CSE on String.length to simplify the code):

let concat sep l =
  match l with
  | [] -> ""
  | hd :: tl ->
      let num = ref 0 and len = ref 0 in
      List.iter (fun s -> incr num; len := !len + length s) l;
      let r_len = !len + length sep * (!num - 1) in
      let r = B.create r_len in
      let hd_len = length hd in
      if hd_len > r_len then invalid_arg "String.concat overflow";
      unsafe_blit hd 0 r 0 hd_len;
      let pos = ref hd_len in
      let sep_length = length sep in
      List.iter
        (fun s ->
          let s_len = length s in
          let final_pos = !pos + sep_length + s_len in
          if final_pos > r_len then invalid_arg "String.concat overflow";
          unsafe_blit sep 0 r !pos sep_length;
          pos := !pos + sep_length;
          unsafe_blit s 0 r !pos s_len;
          pos := !pos + s_len)
        tl;
      Bytes.unsafe_to_string r
Contributor

lefessan commented Sep 16, 2016

Here is a version (I have done a manual CSE on String.length to simplify the code):

let concat sep l =
  match l with
  | [] -> ""
  | hd :: tl ->
      let num = ref 0 and len = ref 0 in
      List.iter (fun s -> incr num; len := !len + length s) l;
      let r_len = !len + length sep * (!num - 1) in
      let r = B.create r_len in
      let hd_len = length hd in
      if hd_len > r_len then invalid_arg "String.concat overflow";
      unsafe_blit hd 0 r 0 hd_len;
      let pos = ref hd_len in
      let sep_length = length sep in
      List.iter
        (fun s ->
          let s_len = length s in
          let final_pos = !pos + sep_length + s_len in
          if final_pos > r_len then invalid_arg "String.concat overflow";
          unsafe_blit sep 0 r !pos sep_length;
          pos := !pos + sep_length;
          unsafe_blit s 0 r !pos s_len;
          pos := !pos + s_len)
        tl;
      Bytes.unsafe_to_string r
@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 16, 2016

Member

@lefessan: Your code is much larger than the version in the standard library:

  • 40% more lines

  • 60% more characters

  • 70% more non-whitespace characters

Member

yallop commented Sep 16, 2016

@lefessan: Your code is much larger than the version in the standard library:

  • 40% more lines

  • 60% more characters

  • 70% more non-whitespace characters

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 16, 2016

Contributor

@lefessan Should we protect the call to B.create so that users don't get an Invalid_argument referring to String.create when they call String.concat?

Contributor

alainfrisch commented Sep 16, 2016

@lefessan Should we protect the call to B.create so that users don't get an Invalid_argument referring to String.create when they call String.concat?

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 16, 2016

Contributor

Wouldn't it be simpler and cleaner to compute the real length on the fly in the first List.iter (including the sep length) and checking against max_string_length during the accumulation?

Contributor

alainfrisch commented Sep 16, 2016

Wouldn't it be simpler and cleaner to compute the real length on the fly in the first List.iter (including the sep length) and checking against max_string_length during the accumulation?

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 16, 2016

Contributor

Here is what I mean:

let concat sep l =
  match l with
  | [] -> ""
  | hd :: tl ->
      let len = ref (length hd) in
      List.iter
        (fun s ->
           len := !len + length sep + length s;
           if !len > Sys.max_string_length then
             invalid_arg "String.concat"
        )
        tl;
      let r = Bytes.create !len in
      unsafe_blit hd 0 r 0 (length hd);
      let pos = ref (length hd) in
      List.iter
        (fun s ->
          unsafe_blit sep 0 r !pos (length sep);
          pos := !pos + length sep;
          unsafe_blit s 0 r !pos (length s);
          pos := !pos + length s)
        tl;
      Bytes.unsafe_to_string r

(The overflow detection depends on the fact that max_string_length * 3 < max_int. This can be avoided at the cost of doing two tests.)

Contributor

alainfrisch commented Sep 16, 2016

Here is what I mean:

let concat sep l =
  match l with
  | [] -> ""
  | hd :: tl ->
      let len = ref (length hd) in
      List.iter
        (fun s ->
           len := !len + length sep + length s;
           if !len > Sys.max_string_length then
             invalid_arg "String.concat"
        )
        tl;
      let r = Bytes.create !len in
      unsafe_blit hd 0 r 0 (length hd);
      let pos = ref (length hd) in
      List.iter
        (fun s ->
          unsafe_blit sep 0 r !pos (length sep);
          pos := !pos + length sep;
          unsafe_blit s 0 r !pos (length s);
          pos := !pos + length s)
        tl;
      Bytes.unsafe_to_string r

(The overflow detection depends on the fact that max_string_length * 3 < max_int. This can be avoided at the cost of doing two tests.)

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 16, 2016

Contributor

@yallop As I said, I did some manual CSE, so the longer code. Anyway, @alainfrisch 's solution is much better (although it could benefit from some CSE :-) ).

Contributor

lefessan commented Sep 16, 2016

@yallop As I said, I did some manual CSE, so the longer code. Anyway, @alainfrisch 's solution is much better (although it could benefit from some CSE :-) ).

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 16, 2016

Member

@lefessan: you argued that my code adds 30 lines (which it doesn't), and that your solution used only one line (which it doesn't).

Member

yallop commented Sep 16, 2016

@lefessan: you argued that my code adds 30 lines (which it doesn't), and that your solution used only one line (which it doesn't).

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 16, 2016

Contributor

By the way, you said that there was an "unanimous opposition from OCaml core developers and others", but, at least for me, the opposition was on the way you over-optimized your code, not on the fact that you found the bug and proposed a solution. Anyway, if you look at other PR (take a look at mine :-) ), you will see that the treatment is the same one for everybody.

Contributor

lefessan commented Sep 16, 2016

By the way, you said that there was an "unanimous opposition from OCaml core developers and others", but, at least for me, the opposition was on the way you over-optimized your code, not on the fact that you found the bug and proposed a solution. Anyway, if you look at other PR (take a look at mine :-) ), you will see that the treatment is the same one for everybody.

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 16, 2016

Contributor

@yallop If you look closely at my code, you will see that, except the additions of let for CSEing already existing computations, I only added 2 lines if XXX_len > r_len then invalid_arg.... Alain's solution only adds one.

Again, that's part of the reviewing process, you shouldn't take it personally when another developer complains about the length of your code, or proposes to use another solution, even if you think they are wrong.

Contributor

lefessan commented Sep 16, 2016

@yallop If you look closely at my code, you will see that, except the additions of let for CSEing already existing computations, I only added 2 lines if XXX_len > r_len then invalid_arg.... Alain's solution only adds one.

Again, that's part of the reviewing process, you shouldn't take it personally when another developer complains about the length of your code, or proposes to use another solution, even if you think they are wrong.

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 16, 2016

Contributor

And an optimized version:

let concat sep l =
  match l with
  | [] -> ""
  | hd :: tl ->
    let hd_len = length hd in
    let len = ref hd_len in
    let sep_len = length sep in

      let l = ref tl in
      while begin
        match !l with
        | [] -> false
        | s :: rest ->
           l := rest;
           len := !len + sep_len + length s;
           if !len > Sys.max_string_length then
             invalid_arg "String.concat";
           true
      end do () done;

      let r = Bytes.create !len in
      unsafe_blit hd 0 r 0 hd_len;
      let pos = ref hd_len in
      let l = ref tl in
      while begin
        match !l with
        | [] -> false
        | s :: rest ->
           l := rest;
           let s_len = length s in
           unsafe_blit sep 0 r !pos sep_len;
           pos := !pos + sep_len;
           unsafe_blit s 0 r !pos s_len;
           pos := !pos + s_len;
           true
      end do () done;

      Bytes.unsafe_to_string r

(about 25% faster than the current String.concat on:

let () =
  let s = String.make 16 ' ' in
  let l = [s;s;s;s;s;s;s;s;s;s] in
  for i = 1 to 10000000 do
   ignore (concat s l)
  done

)

Contributor

alainfrisch commented Sep 16, 2016

And an optimized version:

let concat sep l =
  match l with
  | [] -> ""
  | hd :: tl ->
    let hd_len = length hd in
    let len = ref hd_len in
    let sep_len = length sep in

      let l = ref tl in
      while begin
        match !l with
        | [] -> false
        | s :: rest ->
           l := rest;
           len := !len + sep_len + length s;
           if !len > Sys.max_string_length then
             invalid_arg "String.concat";
           true
      end do () done;

      let r = Bytes.create !len in
      unsafe_blit hd 0 r 0 hd_len;
      let pos = ref hd_len in
      let l = ref tl in
      while begin
        match !l with
        | [] -> false
        | s :: rest ->
           l := rest;
           let s_len = length s in
           unsafe_blit sep 0 r !pos sep_len;
           pos := !pos + sep_len;
           unsafe_blit s 0 r !pos s_len;
           pos := !pos + s_len;
           true
      end do () done;

      Bytes.unsafe_to_string r

(about 25% faster than the current String.concat on:

let () =
  let s = String.make 16 ' ' in
  let l = [s;s;s;s;s;s;s;s;s;s] in
  for i = 1 to 10000000 do
   ignore (concat s l)
  done

)

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 16, 2016

Member

you shouldn't take it personally when another developer complains about the length of your code

It's not just "another developer": it's one of the maintainers of the project. And it's not just about complaints; it's exaggerations and untruths.

If the OCaml team welcomes contributions of this sort, it would be better to at least strive for basic accuracy in reviews.

Reviews are essential, but they should aim to say what steps need to be taken to have the PR merged, not just list some things you don't like.

Alain's solution only adds one.

The current standard library code is 17 lines long. Alain's code is 23 lines long. (And he's just posted another version, 38 lines long.)

Member

yallop commented Sep 16, 2016

you shouldn't take it personally when another developer complains about the length of your code

It's not just "another developer": it's one of the maintainers of the project. And it's not just about complaints; it's exaggerations and untruths.

If the OCaml team welcomes contributions of this sort, it would be better to at least strive for basic accuracy in reviews.

Reviews are essential, but they should aim to say what steps need to be taken to have the PR merged, not just list some things you don't like.

Alain's solution only adds one.

The current standard library code is 17 lines long. Alain's code is 23 lines long. (And he's just posted another version, 38 lines long.)

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 16, 2016

Contributor

Reviews are essential, but they should aim to say what steps need to be taken to have the PR merged, not just list some things you don't like.

That's not how it works here. In many projects, there is only one project leader, who can spend all his time doing that. Unfortunately, that's not the case for OCaml, there are some project members with commit rights, that's my case, but I cannot say "here is what you have to do to have this PR merged", because other project members might disagree with me. So, I give my opinion, it's just one opinion among other ones, and many PRs have been merged in the last years/months against my opinion... Now, if other members also share that opinion, it becomes harder to dismiss it as just "exaggerations and untruths".

Contributor

lefessan commented Sep 16, 2016

Reviews are essential, but they should aim to say what steps need to be taken to have the PR merged, not just list some things you don't like.

That's not how it works here. In many projects, there is only one project leader, who can spend all his time doing that. Unfortunately, that's not the case for OCaml, there are some project members with commit rights, that's my case, but I cannot say "here is what you have to do to have this PR merged", because other project members might disagree with me. So, I give my opinion, it's just one opinion among other ones, and many PRs have been merged in the last years/months against my opinion... Now, if other members also share that opinion, it becomes harder to dismiss it as just "exaggerations and untruths".

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 16, 2016

Contributor

@yallop Frankly, I think you are overreacting to this. Most people involved in the discussion simply expressed their feeling that the extra logic to "remove the performance overhead" was not worth unless justified by benchmarks on the entire function. It's not a matter of number of lines or size of the patch, but about justifying the addition of potentially useless logic and avoiding premature optimization.

Contributor

alainfrisch commented Sep 16, 2016

@yallop Frankly, I think you are overreacting to this. Most people involved in the discussion simply expressed their feeling that the extra logic to "remove the performance overhead" was not worth unless justified by benchmarks on the entire function. It's not a matter of number of lines or size of the patch, but about justifying the addition of potentially useless logic and avoiding premature optimization.

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 16, 2016

Member

Now, if other members also share that opinion, it becomes harder to dismiss it as just "exaggerations and untruths".

Whether the code is 30 lines long isn't really something on which opinions can legitimately differ.

It's not a matter of number of lines

It's not the complexity of the code in terms of understanding [...] but in terms of number of lines.

Member

yallop commented Sep 16, 2016

Now, if other members also share that opinion, it becomes harder to dismiss it as just "exaggerations and untruths".

Whether the code is 30 lines long isn't really something on which opinions can legitimately differ.

It's not a matter of number of lines

It's not the complexity of the code in terms of understanding [...] but in terms of number of lines.

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 16, 2016

Contributor

Whether the code is 30 lines long isn't really something on which opinions can legitimately differ.

That seems about right (considering the two copies in Bytes and String), though, even if largely irrelevant in my opinion.

It's not a matter of number of lines
It's not the complexity of the code in terms of understanding [...] but in terms of number of lines.

Are you trying to make the point that different people have different opinions?

Contributor

alainfrisch commented Sep 16, 2016

Whether the code is 30 lines long isn't really something on which opinions can legitimately differ.

That seems about right (considering the two copies in Bytes and String), though, even if largely irrelevant in my opinion.

It's not a matter of number of lines
It's not the complexity of the code in terms of understanding [...] but in terms of number of lines.

Are you trying to make the point that different people have different opinions?

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 16, 2016

Member

considering the two copies in Bytes and String

The claim was about 30 additional lines per function.

Are you trying to make the point that different people have different opinions?

I'm a bit disappointed to see this kind of comment from you, @alainfrisch, given how reasonable you usually are (and were in the early comments in this thread). I thought it was reasonable to group both excerpts together, given that you approved one and wrote the other.

Member

yallop commented Sep 16, 2016

considering the two copies in Bytes and String

The claim was about 30 additional lines per function.

Are you trying to make the point that different people have different opinions?

I'm a bit disappointed to see this kind of comment from you, @alainfrisch, given how reasonable you usually are (and were in the early comments in this thread). I thought it was reasonable to group both excerpts together, given that you approved one and wrote the other.

@lefessan

This comment has been minimized.

Show comment
Hide comment
@lefessan

lefessan Sep 16, 2016

Contributor

@alainfrisch Could you submit a PR referencing this one with your code ? (the one with exactly 23 lines) So that we can close the issue.

Contributor

lefessan commented Sep 16, 2016

@alainfrisch Could you submit a PR referencing this one with your code ? (the one with exactly 23 lines) So that we can close the issue.

@damiendoligez

This comment has been minimized.

Show comment
Hide comment
@damiendoligez

damiendoligez Sep 16, 2016

Member

Just a question for you guys: why do we care about optimizing String.concat? I can see a case for optimizing maps/sets/hashtables to death, but this? Can you even imagine a program where the performance of String.concat would be critical?

Member

damiendoligez commented Sep 16, 2016

Just a question for you guys: why do we care about optimizing String.concat? I can see a case for optimizing maps/sets/hashtables to death, but this? Can you even imagine a program where the performance of String.concat would be critical?

@alainfrisch

This comment has been minimized.

Show comment
Hide comment
@alainfrisch

alainfrisch Sep 16, 2016

Contributor

why do we care about optimizing String.concat?

It's certainly less important than for datastructures, but since we are all looking at this function very closely anyway, a 25% gain on a typical payload with only a few lines of extra source code (and probably smaller binary code) seems a reasonable tradeoff to me.

Contributor

alainfrisch commented Sep 16, 2016

why do we care about optimizing String.concat?

It's certainly less important than for datastructures, but since we are all looking at this function very closely anyway, a 25% gain on a typical payload with only a few lines of extra source code (and probably smaller binary code) seems a reasonable tradeoff to me.

@yallop

This comment has been minimized.

Show comment
Hide comment
@yallop

yallop Sep 16, 2016

Member

a 25% gain on a typical payload with only a few lines of extra source code (and probably smaller binary code) seems a reasonable tradeoff to me.

In that case I'm sure you'll like this implementation, which is just as fast as yours (i.e. 25% below the stdlib on your benchmark) and less than half the length

let check_pos v = if v < 0 then invalid_arg "String.concat" else v

let rec sum_lengths acc seplen = function
  | [] -> acc
  | [x] -> length x + acc
  | hd :: tl -> sum_lengths (check_pos (length hd + seplen + acc)) seplen tl

let rec unsafe_blits dst pos sep seplen = function
    [] -> dst
  | [x] ->
    unsafe_blit x 0 dst pos (length x); dst
  | hd :: tl ->
    unsafe_blit hd 0 dst pos (length hd);
    unsafe_blit sep 0 dst (pos + seplen) seplen;
    unsafe_blits dst (pos + seplen + length hd) sep seplen tl

let concat sep l =
  let sep_len = length sep in
  unsafe_blits (B.create (sum_lengths 0 sep_len l)) 0 sep sep_len l
Member

yallop commented Sep 16, 2016

a 25% gain on a typical payload with only a few lines of extra source code (and probably smaller binary code) seems a reasonable tradeoff to me.

In that case I'm sure you'll like this implementation, which is just as fast as yours (i.e. 25% below the stdlib on your benchmark) and less than half the length

let check_pos v = if v < 0 then invalid_arg "String.concat" else v

let rec sum_lengths acc seplen = function
  | [] -> acc
  | [x] -> length x + acc
  | hd :: tl -> sum_lengths (check_pos (length hd + seplen + acc)) seplen tl

let rec unsafe_blits dst pos sep seplen = function
    [] -> dst
  | [x] ->
    unsafe_blit x 0 dst pos (length x); dst
  | hd :: tl ->
    unsafe_blit hd 0 dst pos (length hd);
    unsafe_blit sep 0 dst (pos + seplen) seplen;
    unsafe_blits dst (pos + seplen + length hd) sep seplen tl

let concat sep l =
  let sep_len = length sep in
  unsafe_blits (B.create (sum_lengths 0 sep_len l)) 0 sep sep_len l
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment