Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate Obj.truncate #2279

Merged
merged 5 commits into from Apr 1, 2019

Conversation

@stedolan
Copy link
Contributor

commented Mar 4, 2019

In the multicore GC, object headers (tags and lengths) are immutable. These fields are not modified by any normal OCaml features, but can be changed using the primitives Obj.set_tag and Obj.truncate, which are therefore unsupported in multicore.

Obj.set_tag was deprecated by #1725, and this PR does the same for Obj.truncate. I found 20-odd uses of Obj.truncate in OPAM, of which most were copy-pasted from the stdlib uses discussed below. (I also found two "real" uses, in frama-c and base).

Most of this PR is removal of the uses of Obj.truncate internally in the stdlib:

camlinternalMod.ml

There was an old bug caused by an interaction between the bytecode VM's implementation of currying and the initialisation of recursive modules. In short, the bytecode VM used the length of a closure to determine how many arguments have been partially applied, and the recursive module initialisation did not preserve closure lengths. The fix was to use Obj.truncate to ensure the closure length didn't change.

Instead, I've added an extra field to the closures generated by partial application in the bytecode VM to track the number of partially applied arguments. I've also added a test based on the original bug report.

weak.ml

The implementation of weak hashtables used Obj.truncate to shorten the arrays used as per-bucket collision chains. I changed this to create a new array instead. (Creating a new array is slower than truncating, but since this only occurs in the slow path where we're scanning and compacting a bucket anyway it's still O(n)).

Sys.argv in the toplevel

The toplevel uses an arguably horrible hack to mutate the Sys.argv array when a script is run with ocaml foo.ml arg. The script should see [| "foo.ml"; "arg" |] in Sys.argv, even though the program starts with Sys.argv as [| "ocaml"; "foo.ml"; "arg" |]. To get around this, the toplevel copies the new argv over the old one, and then Obj.truncates the argv array to the new length. (This only works if the new argv is shorter than the old, otherwise you get an error).

I replaced this arguably horrible hack with an unarguably, definitively horrible hack, based on a patch we've used in multicore for the last while: instead of mutating the Sys.argv array, the toplevel now mutates the Sys module itself to swap in the new argv.

Testsuite

There was one test that used Obj.truncate in the implementation of array_to_list_in_place. I've replaced that function with Array.to_list.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 4, 2019

I wonder about the performance impact, in bytecode, of the change of closure representation. Adding one extra word makes each closure consume more space, but maybe the idea is that the code length is already relatively high, so the impact is negligible?

If I understand the camlinternalMod.update_mod problem correctly, an alternative approach would be similar in spirit to your previous Obj.with_tag work: provide an Obj.extend_closure function to create a new copy of a closure with more, dummy environment fields (with () in them, I guess). When we use a tie-the-knot approach of backpatching with a new closure, as in update_mod, this allows us fix the case where the final closure has less environment variables than the pre-initialized one.

(I don't remember exactly how the indexing of closure variables work; any need to fix the offsets in the code would kill the proposal, but I guess that it should be possible to avoid it by padding the extra environments at the end or at the beginning, wisely.)

This is just idle speculation; my guess is that the performance impact of the closure representation change will end up being very small (a test would still be useful), and your own time to experiment with alternative approaches non-existent anyway.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 4, 2019

I think that this PR would be easier to review with the various fixes (modules, Weak, argv, sorts) being split into different PRs. I reviewed the Sys.argv change and believe it is correct), I would approve the Weak stuff, and the misc/sorts.ml change is obviously good to go. I haven't reviewed the bytecode closure representation change yet, and I think that the Sys.argv trick might deserve a discussion of its own (but I think it is a better hack than the existing one).

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 4, 2019

Instead, I've added an extra field to the closures generated by partial application in the bytecode VM to track the number of partially applied arguments.

I don't like this at all. The VM is fine; recursive modules are the problem.

@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 4, 2019

@gasche

I wonder about the performance impact, in bytecode, of the change of closure representation. Adding one extra word makes each closure consume more space, but maybe the idea is that the code length is already relatively high, so the impact is negligible?

This is not what's happening! There is no change here to the general bytecode closure representation. Making a closure still uses the same amount of space.

When a bytecode function is called, and there are fewer arguments available than required, the interpreter creates a thunk. This thunk is laid out as a closure and contains n+2 fields, where n is the number of arguments available (the extra 2 are used to store the code pointer and environment). This patch changes these thunks (and only these) to have n+3 fields, using the new field to store n.

There is no appreciable performance change. (I measured by bootstrapping, which is a heavy user of the bytecode VM. The new code actually ran very slightly faster, but the difference was well within measurement noise).

If I understand the camlinternalMod.update_mod problem correctly, an alternative approach would be similar in spirit to your previous Obj.with_tag work: provide an Obj.extend_closure function to create a new copy of a closure with more, dummy environment fields (with () in them, I guess). When we use a tie-the-knot approach of backpatching with a new closure, as in update_mod, this allows us fix the case where the final closure has less environment variables than the pre-initialized one.

(I don't remember exactly how the indexing of closure variables work; any need to fix the offsets in the code would kill the proposal, but I guess that it should be possible to avoid it by padding the extra environments at the end or at the beginning, wisely.)

I am afraid that this won't work. The issue is that the RESTART instruction, which is the target of the partial application thunks described above, uses the actual length of the closure object to determine how many arguments were previously supplied. Extending the closure with dummy fields will cause this calculation to give the wrong answer, which is why this patch stores the number of previously-supplied arguments explicitly.

@xavierleroy

I don't like this at all. The VM is fine; recursive modules are the problem.

I agree, so I spent some time thinking about a solution that only changed recursive modules.

The current code implements recursive modules by compiling them as normal modules and overwriting closures afterwards (not overwriting module fields, but mutating the actual closures). Since the stub closure and the real closure do not in general have the same length, this means that either closure lengths must be mutable or the VM must not rely on closure lengths.

I think the cleanest solution would be to change how recursive modules are built: instead of compiling them as normal modules and resolving all references eagerly, all references should go through a mutable cell, which initially points at a dummy module and is later updated to point at the real module. This would have several advantages: as well as removing the need to truncate, it would mean that closures become actually immutable (allowing us to remove some ugly special cases from multicore that handle "mutating immutable objects"), and it would allow removal of CamlinternalMod.

However, it also has disadvantages: it amounts to a rewrite of the tricky recursive module initialisation code (which is not a project I have time for at the moment), and there are backwards compatibility concerns because the exact set of modules that can be initialised without raising Undefined_recursive_module would presumably change.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 4, 2019

Apologies for the (misunderstanding above and) naive questions, I'm not familiar with the compilation of recursive modules.

For recursive value declarations (which include functions in mixed functions/values scenarios), we use a size computation to allocate closures of the right size from the start, backpatched later with caml_update_dummy_function. Two questions:

  • Isn't this also a case where closure are mutated -- so @stedolan would also need to change that to get rid of the mutating-closure exceptions?
  • Could we use the same size-calculation strategy when compiling recursive modules? I see that recursive modules only use the signature of the modules to generate the initialization code, but we have the definitions at hand as well.
@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 4, 2019

  • Isn't this also a case where closure are mutated -- so @stedolan would also need to change that to get rid of the mutating-closure exceptions?

  • Could we use the same size-calculation strategy when compiling recursive modules? I see that recursive modules only use the signature of the modules to generate the initialization code, but we have the definitions at hand as well.

caml_update_dummy_function is indeed awkward, but recursive modules are much worse. We can distinguish three levels of mutability:

  1. Only one write - e.g. ordinary immutable structures, like lists, tuples, etc.
  2. All writes precede all reads - e.g. immutable structures with complex initialisation, where structures are initially filled with dummy values and then patched.
  3. Arbitrary mixture of reads and writes - e.g. reference cells

2 is indeed a bit more complicated to handle than 1: in multicore, we need to be careful about the write barrier for the final initialisation. This style is used for let rec (when there's at least one non-closure value) and also for functional update of large records (these are copied first and then the copy is mutated).

2 is still easier to deal with than 3, since optimisations that assume immutability still work: for instance, common subexpression elimination is valid since any two reads must read the same write, the final one.

The perhaps surprising fact is that recursive modules fall into case 3, not 2. Unlike let rec, it is possible for code to observe half-constructed recursive modules, and to observe "immutable" module fields changing as the initialisation completes. (Here's an example)

The trouble is that the well-formedness of a recursive module definition is detected dynamically, not statically. This means that the same-size strategy doesn't work, as the sizes cannot be determined at compile time.

We cannot easily replace this dynamic check with a static one, as very complex dynamic properties are relied upon. For instance, a standard use of recursive modules is to pass the module being defined to a functor, knowing that the functor's initialisation happens not to invoke the functions of its parameter which are not yet defined.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 4, 2019

I agree the implementation of recursive modules is tricky. One reason is the desire to update the dummy function values with the actual function values as soon as possible, so that later computations see the actual function.

I'm assuming the problematic code is the following (in camlinternalMod):

      if Obj.tag n = Obj.closure_tag && Obj.size n <= Obj.size o
      then begin overwrite o n; Obj.truncate o (Obj.size n) (* PR#4008 *) end
      else overwrite o (Obj.repr (fun x -> (Obj.obj n : _ -> _) x))

Note that the then branch (overwrite and truncate) is an optimization. The else branch (jump through an intermediate function) is always correct. Perhaps we could disable the optimization in bytecode (speed matters less than in native code...), and keep it only in native code, in which case the Obj.truncate is not necessary? (To be checked!) Something like

      if Obj.tag n = Obj.closure_tag && Obj.size n <= Obj.size o && Sys.backend_type = Native
      then overwrite o n
      else overwrite o (Obj.repr (fun x -> (Obj.obj n : _ -> _) x))
@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 4, 2019

That works very nicely, thanks! I had initially worried that it would cause an extra allocation for multi-argument functions since effectively, functions end up eta-expanded for one argument, turning f a b into something like (fun x -> f x) a b. However, this pattern only causes an extra closure allocation under the strategy native code uses for currying, not that used by bytecode.

In other words, the optimisation makes a big difference for native code and not much for bytecode, while the issue requiring truncate only seems to affect bytecode. So enabling the optimisation only for native code does seem like the right solution.

I had a look for any possible dependencies on closure lengths in native code, and I don't think there are any. Cmm only generates accesses to the object length on array accesses (bound checks) and for the Parraylength primitive. Parraylength is only generated in two places: matching.ml, in array patterns, and from the Obj.size primitive. Every use of Obj.size other than this one is either on an object that's definitely not a closure, or works for any object (e.g. the debugger, tools/dumpobj.ml).

Copy link
Member

left a comment

With the substantial simplification to the recursive-module part of the change, I could review everything and I believe that the PR is correct.

I'll leave the PR open for a few more days in case someone wants to react to the following technique to update Sys.argv, used in this patch. (I see nothing worse with it than the previous one, and in fact it feels cleaner to me.)

let override_sys_argv new_argv =
  let new_argv = Obj.repr new_argv in
  let old_argv = Obj.repr Sys.argv in
  let sys_mod = Obj.repr (module Sys : SYS) in
  for i = 0 to Obj.size sys_mod - 1 do
    if Obj.field sys_mod i == old_argv then
      Obj.set_field sys_mod i new_argv
  done;
  Arg.current := 0
stdlib/weak.ml Show resolved Hide resolved
toplevel/opttoploop.ml Outdated Show resolved Hide resolved
@lpw25

This comment has been minimized.

Copy link
Contributor

commented Mar 4, 2019

Does the Sys.argv change really work in general? At least with flambda I would expect Sys.argv to be given a static location and for all static accesses to go directly to that location rather than through the module block.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 4, 2019

The Sys.argv change is used exclusively by the toplevel loop. If it works with the bytecode compiler, that's good enough in my opinion.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 4, 2019

There are users of the native toplevel in the wild, and also I think it would feel wrong to have code in the distribution that works incorrectly when native-compiled.

@gasche gasche dismissed their stale review Mar 4, 2019

the Sys.argv issue requires more thinking

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 4, 2019

Would it be possible for Sys to define argv as follows

external argv : int array = "%argv"

and have accesses to Sys.argv be rewritten by the compiler into a reference lookup?

@bobot

This comment has been minimized.

Copy link
Contributor

commented Mar 4, 2019

and there are backwards compatibility concerns because the exact set of modules that can be initialised without raising Undefined_recursive_module would presumably change.

Your proposition (and others like using mutable fields), should have no impact for existing code. Indeed your proposition just change the way the closures are accessed and updated but not the time when the updates are done and so the state of the closures is the same at any point.

But it would be indeed easier to discuss that once this MR is splitted. This MR could be reduced to the statement that Obj.truncate is deprecated: modification of obj.mli and Changes.

@mshinwell

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

I think we should at least consider changing the type of argv to be a function. This is obviously a breaking change, but the fix required to code is very straightforward, and with some careful management perhaps we could fix packages in advance (for example by providing some kind of shim package in OPAM that does the type change).

(Edit: I haven't grepped all of the package sources, but I wouldn't be surprised if there are fewer occurrences of argv than one might think, by virtue of people using libraries such as Cmdliner.)

@stedolan stedolan force-pushed the stedolan:deprecate-obj-truncate branch from 6853123 to 079d463 Mar 5, 2019
@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 5, 2019

@gasche

Would it be possible for Sys to define argv as follows

external argv : int array = "%argv"

and have accesses to Sys.argv be rewritten by the compiler into a reference lookup?

This is an excellent suggestion, thank you! I hadn't thought of that. I've changed argv to work this way and removed the previous hack. References to Sys.argv now expand to a call to the primitive caml_sys_argv, and there's a new primitive caml_sys_modify_argv for use by the toplevel (but not exported by the stdlib).

@lpw25

Does the Sys.argv change really work in general? At least with flambda I would expect Sys.argv to be given a static location and for all static accesses to go directly to that location rather than through the module block.

I think you're right. (I admit I hadn't given the native toplevel much thought). Following @gasche's suggestion, the new version is much more Flambda-friendly: %sys_argv is expanded during Lambda generation, so Flambda only sees an ordinary C call.

@mshinwell

I think we should at least consider changing the type of argv to be a function. This is obviously a breaking change, but the fix required to code is very straightforward, and with some careful management perhaps we could fix packages in advance (for example by providing some kind of shim package in OPAM that does the type change).

(Edit: I haven't grepped all of the package sources, but I wouldn't be surprised if there are fewer occurrences of argv than one might think, by virtue of people using libraries such as Cmdliner.)

I have in fact grepped all of the package sources, and there are an order of magnitude more references to Sys.argv than to Cmdliner in OPAM. While I am generally on the "break things" side of the argument, I don't think this is an option.

@bobot

Your proposition (and others are like using mutable fields), should have no impact for existing code. Indeed your proposition just change the way the closures are accessed and updated but not the time when the updates are done and so the state of the closures is the same at any point.

But it would be indeed easier to discuss that once this MR is splitted. This MR could be reduced to the statement that Obj.truncate is deprecated: modification of obj.mli and Changes.

Since implementing @xavierleroy's suggestion yesterday, I'm not planning to do any more work on recursive modules in this PR, so splitting that off does not seem to be worth the effort of testing and updating multiple PRs.

runtime/caml/sys.h Outdated Show resolved Hide resolved
@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

There are users of the native toplevel in the wild

The native toplevel is not officially supported, as far as I know. Plus, we're not talking about breaking it, just about a potential breakage if it is compiled with flambda. I'm not even sure ocamlnat works with flambda today.

, and also I think it would feel wrong to have code in the distribution that works incorrectly when native-compiled.

... and flambda-optimized. Well, unsafe code that does things impossible in the language (mutating a field of a structure) risks being miscompiled, yes.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

About this "0-ary primitive" business: it feels like a crime against call-by-value. Are we really sure we want names (value components of structs) that can change value between two evaluations? You start like that and you end up with Algol 60 :-)

@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 5, 2019

I will hold off on submitting a PR for Sys.__COUNTER__ for the moment, then.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 5, 2019

Are we really sure we want names (value components of structs) that can change value between two evaluations?

If we are going to hack around our mutability model somewhere for backward-compatibility reasons, then I think that "don't trust external declarations to behave like real values" is probably one of the saner ways to do it. The previous approach of mutating array lengths really has devastating effects on correct-optimization opportunities, while at least this one can be taught to an optimizer (or program analyzer, etc.) without pessimizing everything else.

@gasche
gasche approved these changes Mar 5, 2019
Copy link
Member

left a comment

Approving again: I like the new external approach and the PR looks good to me. Again, I'm leaving more time for the discussion before we could consider a merge.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Mar 5, 2019

I don't understand why we need to switch from 0-ary to 1-ary. Is there anything that prevents having 0-ary external primitives?

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 6, 2019

@alainfrisch: the current patchset does use a zero-ary primitive, which is compiled into a call to a unary C function (with a unit argument passed). See https://github.com/ocaml/ocaml/pull/2279/files#diff-211d3d6458c9b887e83e308ff769be00 .

I think that 0-ary primitives work, but we avoided them so far because they look like values (from a distance) while in fact they are not. (0-ary methods have a similar property) I believe it's fine once you got used to it, but I can understand that reasonable people would disagree.

@lpw25

This comment has been minimized.

Copy link
Contributor

commented Mar 6, 2019

I think that 0-ary primitives work, but we avoided them so far because they look like values (from a distance) while in fact they are not.

That's not the only reason to avoid them. They behave strangely with module ascription. If you ascribe Sys a module type with val argv : string array then you have a side-effect in a module coercion. I don't think it particularly matters in this case since the effect shouldn't ever really be observable, but you wouldn't want to do it for other primitives.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Mar 6, 2019

to a unary C function

I've seen that, and I'm asking why we need to pass one argument to the C functions. C functions are happy not to take any argument. Do we have something in the compiler that doesn't work well with 0-ary C function calls?

They behave strangely with module ascription.

Just for the fun it:

let () = let open (Stdlib : sig val __LOC__ : string end) in print_endline __LOC__
let () = let open Stdlib in print_endline __LOC__

prints:

File "stdlib.mli", line 239, characters 0-38
File ".\\foo.ml", line 2, characters 42-49
@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 6, 2019

Yes, __LOC__ is an unfortunate precedent, showing what goes wrong with this approach :-)

At least %sys_argv in this PR is a special primitive, and there is still no way for users to declare their own 0-ary primitives as calling their own C functions. (I wrongly assumed the latter.) So, the potential for damage is no worse than __LOC__. Not sure I'm fully reassured.

My mental picture of the problem is not that Sys.argv changes over time, but that toplevel scripts execute in an environment that has a different implementation of the Sys module, differing from the initial implementation of Sys by the value of Sys.argv. With the bytecode toplevel I think I know how to implement this, using the Symtable functions. With the native toplevel I have no clue, but not much patience either.

@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 6, 2019

I did consider that approach, and it does seem cleaner than any of the various forms of mutability. The difficulty is that it is not sufficient just to shadow Sys: one must also replace Arg with a module whose functions close over the new Sys.argv rather than the old, and one must also ensure any seperately-compiled libraries (e.g. Cmdliner) which are loaded by the script see the new Sys and not the one from the environment in which they were compiled. (I am not saying that this difficulty is insurmountable; I am saying that this is where I gave up)

@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 6, 2019

(It's also possible that this difficulty is imagined, and that the Symtable functions you mention will just do the right thing. I don't understand how binding works here)

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 6, 2019

You're right, there is probably something wrong with the Arg module in this approach, since it is linked and initialized before the new Sys module shadows the initial module. For cmdliner and other external libraries, if they are loaded with #load inside the toplevel script, they should see the new Sys module.

@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 18, 2019

@xavierleroy are you OK with this PR being merged?

Copy link
Contributor

left a comment

Looks globally good to me. One question and one suggestion below.

toplevel/opttoploop.ml Outdated Show resolved Hide resolved
toplevel/opttoploop.ml Show resolved Hide resolved
toplevel/toploop.ml Outdated Show resolved Hide resolved
stdlib/camlinternalMod.ml Outdated Show resolved Hide resolved
@stedolan stedolan force-pushed the stedolan:deprecate-obj-truncate branch from 8d93d10 to a0581fa Mar 23, 2019
@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 24, 2019

CI precheck is cranky these days but eventually said "yes".
check-typo was not happy because of TAB characters in runtime/sys.c. (Naughty boy!) I pushed a fix straight to the branch.
I think this is ready for squashing and merging, but there is a bootstrap involved, so I'm not hitting the "squash" button, trying to remember the bootstrap policy instead.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 24, 2019

My conclusion is that the bootstrap is fine. Merging ASAP.

@gasche

This comment has been minimized.

Copy link
Member

commented Mar 24, 2019

@xavierleroy I would rather keep a clean history on this one, because it is made out of several changes that are (1) independent and (2) tricky. If we look back at them in the future (I would be surprised if we didn't), we will be happy to have clean commits.

Could we just let @stedolan rebase appropriately and merge then? If you would like to go faster than that, I can offer to rework the history on my side.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Mar 24, 2019

I doubt the changes are completely independent, and some of the intermediate states are dead-ends.

I was in the process of writing a neat summary in a commit message, but I can let @stedolan handle the rebase / squash / merge. I still prefer commit messages that are not one-liners and actually explain what is being done, however.

stedolan added 5 commits Mar 4, 2019
CamlinternalMod contains an optimisation for the initialisation
of recursive modules containing closures, where dummy closures
are updated in-place. This optimisation was buggy on bytecode,
since the bytecode interpreter relies on the lengths of blocks
containing closures (see #4008).

This commit disables the optimisation for bytecode (where it
had much less effect than on native code, and where performance
is of less concern anyway). The optimisation is still applied
on native-code, but without the use of Obj.truncate.

Also adds a test for #4008 (which introduced the truncate).
When running a script with "ocaml foo.ml", the toplevel needs to
run foo.ml with a different Sys.argv than the initial value, since
foo.ml must not see the initial "ocaml" argument.

Previously, this was done with Obj.truncate to shorten the Sys.argv
array. This patch changes it by introducing a primitive %sys_argv.
Uses of this primitive expand to a call to a new C primitive, which
returns the argv array (and can be modified by the toplevel).
@stedolan stedolan force-pushed the stedolan:deprecate-obj-truncate branch from 6c503eb to 466d3bc Apr 1, 2019
@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Apr 1, 2019

I've rebased the patch series to be more informative and less historically accurate (The dead-ends for camlinternalmod and the Sys.argv hack are now gone). The commit messages for the nontrivial changes (Sys.argv and Camlinternalmod) are now informative. The trivial ones, in weak.ml and the testsuite, are still one-liners.

I would prefer if this is merged either by "Create a merge request" or by "Rebase and merge", not squashed into a single commit. This series contains two bootstraps, as this is necessary when changing a primitive used by the stdlib. Dropping the first bootstrap makes it hard to reproduce.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Apr 1, 2019

Thank you very much! I'm delighted to see this perfectly curated sequence of commits! I'll merge using a merge commit, so that the history is preserved and there is a pointer to these interesting discussions on the PR.

@xavierleroy xavierleroy merged commit 36d299b into ocaml:trunk Apr 1, 2019
0 of 2 checks passed
0 of 2 checks passed
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
@gasche

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

Thanks @stedolan for the nice work. Finding a suitable replacement for the Sys.argv hack was sort of fun, even if it didn't happen on April Fool's day.

@shindere

This comment has been minimized.

Copy link
Contributor

commented Apr 2, 2019

This PR broke the spacing in runtime/sys.c.

Fixed in commit 364a2b6

@aalekseyev

This comment has been minimized.

Copy link
Contributor

commented Jul 18, 2019

The argv truncation hack is used in three separate applications at Jane Street and it's now packaged in the form of:

val Core.Sys.override_argv : string array -> unit

I'm not sure this was the expected outcome of this PR, given the comments above like this one:

the effect shouldn't ever really be observable

We don't think any of those applications are likely to break because of this, but they do become more fragile (an addition of let argv = argv somewhere can make the "override" ineffective).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.