Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate Obj.set_tag #1725

Merged
merged 6 commits into from Feb 14, 2019

Conversation

@stedolan
Copy link
Contributor

commented Apr 16, 2018

Under the multicore GC, object headers (tags and lengths) are immutable. Not in the current sense of "don't change these unless you know what you're doing", but in the stronger sense of "if you ever change these, the GC will randomly segfault as it scans the object you're messing with from another thread". Consequently, Obj.set_tag is unsupported. (Obj.truncate too, although this PR only talks about the former).

Currently, this PR doesn't contain any code. Before doing a bunch of hacking, I'd like to discuss with the maintainers (cc @xavierleroy, @damiendoligez) whether (a) deprecating and later removing set_tag is something they find acceptable in principle and (b) whether the below is a reasonable plan for doing it.

This change will break code that uses Obj.set_tag, although I'd argue that occasional breakage is the price of using Obj. Still, I did some grepping of the stdlib and the OPAM universe to find uses of set_tag, and work out replacements for common uses:

Lazy

CamlinternalLazy uses set_tag to change Lazy_tag to Forward_tag when thunks are forced.

I suggest using an extra word in the Lazy.t structure to distinguish forced and unforced thunks, which will increase the size of unforced thunks from 2 to 3 words. For thunks that are eventually forced the increase is temporary, since the GC short-circuits forced thunks during tracing.

Creating new blocks with unexpected tags

Some code (example) uses set_tag to create a block with a tag other than what OCaml normally chooses. (@alainfrisch has also written about using this trick to get polymorphic comparison to understand hashconsing, although I haven't come across uses in the wild).

For these uses, a new primitive Obj.with_tag : int -> Obj.t -> Obj.t suffices: like dup, but allows the user to specify an alternative tag for the copy. This still allows the construction of objects with unusual tags, but by creating a fresh object with the desired tag, rather than by mutation. If performance is a concern, an optimisation that detects when with_tag is applied directly to an allocation and elides the first allocation is easy enough to implement.

Preventing the GC from scanning int array

At runtime, int array looks just like any other array, so gets needlessly scanned by the GC. Some programs (example) explicitly change the tag of integer arrays to prevent scanning.

Unlike the other uses, this use of set_tag can be silently ignored without affecting correctness. It is a reasonable optimisation, though, and it would be nice to support it properly. Perhaps we could introduce a new intarray type (matching floatarray) which has the same layout as ordinary arrays, but is tagged with No_scan_tag. (Unlike floatarray, the layout is always compatible with standard arrays, so = private int array is a reasonable definition of this type).

Unfixable cases

Some code is not so easily fixable, either by making optimisations based on undocumented properties of the current runtime (example), or by targeting not the OCaml language but a runtime model that explictly includes set_tag (example).

@bluddy

This comment has been minimized.

Copy link

commented Apr 16, 2018

Perhaps we could introduce a new intarray type (matching floatarray) which has the same layout as ordinary arrays, but is tagged with No_scan_tag.

There should be other cases like this. Perhaps it would be better to dedicate a bit in the 64-bit header to indicate whether the GC should scan the data.

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Apr 26, 2018

There should be other cases like this. Perhaps it would be better to dedicate a bit in the 64-bit header to indicate whether the GC should scan the data.

I don't think so. Whether data should be scanned by the GC or not, is highly dependent on the data itself, there is no orthogonality here.

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented May 11, 2018

I suggest using an extra word in the Lazy.t structure to distinguish forced and unforced thunks, which will increase the size of unforced thunks from 2 to 3 words. For thunks that are eventually forced the increase is temporary, since the GC short-circuits forced thunks during tracing.

More generally, can you describe how lazy blocks are supported in the multicore GC? Marking the block as a "forward" block and storing the forward pointer need to be done atomically from the point of view of other threads. How do you achieve that?

The performance impact of the new fields should be evaluated. The size increase could be noticeable for lazy-heavy data structures , and even if lazy thunks are forced quickly the extra allocations can put more pressure on the GC. Also, scanning and other generic operations could become a bit slower.

(Unlike floatarray, the layout is always compatible with standard arrays, so = private int array is a reasonable definition of this type).

I agree that adding intarray is a good idea, but I would refrain from exposing the fact that it's a subtype of "int array". This would allow creating two int array values with the same elements, but which would be different when compared with the generic equality operator.

@bobot bobot referenced this pull request May 14, 2018
@kayceesrk kayceesrk referenced this pull request May 15, 2018
@kayceesrk

This comment has been minimized.

Copy link
Contributor

commented May 16, 2018

@alainfrisch Here is the issue that tracks Lazy proposal for Multicore OCaml ocaml-multicore/ocaml-multicore#188

@whitequark whitequark referenced this pull request Jun 16, 2018
3 of 10 tasks complete
Copy link
Contributor

left a comment

I apologise hugely for requesting changes on a 1-line GPR!

stdlib/obj.mli Outdated Show resolved Hide resolved
stedolan added 2 commits Feb 12, 2019
  - Adds Obj.with_tag as a partial replacement.
  - Adds caml_obj_make_forward for use of Camlinternal{Lazy,Mod}
@stedolan stedolan force-pushed the stedolan:deprecate-set-tag branch from 65b7714 to 51025d1 Feb 12, 2019
@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Feb 12, 2019

New implementation just pushed.

This one keeps the current behaviour for Lazy, which was the most controversial part of the current design. Multicore's invariants have been weakened from "tags never change" to "tags never change unless they're Lazy_tag" (see ocaml-multicore/ocaml-multicore#226).

A new primitive, Obj.with_tag has been added to cover the use-cases of allocating an object with a nonstandard tag. There's also a new optimisation (with tests) in simplif.ml to ensure that patterns like Obj.with_tag (Obj.repr { ... new allocation ... }) do not cause any more allocations than the original set_tag-based code.

@stedolan stedolan dismissed dra27’s stale review Feb 12, 2019

(refers to old version)

bytecomp/simplif.ml Show resolved Hide resolved
runtime/obj.c Outdated Show resolved Hide resolved
stdlib/camlinternalLazy.ml Show resolved Hide resolved
@@ -55,6 +55,7 @@ external field : t -> int -> t = "%obj_field"
*)
external set_field : t -> int -> t -> unit = "%obj_set_field"
external set_tag : t -> int -> unit = "caml_obj_set_tag"
[@@ocaml.deprecated "Use with_tag instead."]

This comment has been minimized.

Copy link
@gasche

gasche Feb 12, 2019

Member

It's not necessarily a great idea to deprecate a feature at the same version that the replacement is added: a maintainer that would follow the deprecation advice must force its users to upgrade to the latest version (or use conditionals). Could we wait for one version between with_tag and set_tag's deprecation?

This comment has been minimized.

Copy link
@stedolan

stedolan Feb 12, 2019

Author Contributor

Happy to follow whatever deprecation policy works best here, but I don't really understand the issue. I thought @@ocaml.deprecated was the way to signal that a replacement is available? Is the concern here programs compiled with -warn-error breaking?

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez Feb 13, 2019

Member

Normally, we would deprecate it now and remove it in the next deprecation purge, in about 5 years. We still don't have any mechanism to indicate degrees of deprecation (or how much time before it gets removed). Such a mechanism might be an intermediate step before deprecation.

In the case of Obj.set_tag I think we'll remove it out-of-cycle to avoid delaying multicore, so it's better to deprecate right now anyway.

This comment has been minimized.

Copy link
@dbuenzli

dbuenzli Feb 13, 2019

Contributor

It's not necessarily a great idea to deprecate a feature at the same version that the replacement is added:

Regarding this comment about @gasche I fear a bit nothing is ever going to be deprecated (people will simply forget) if nothing special is done. One simple measure would be to have in the repo text file with a table of the form:

ID | vReplacement | vDeprecated notice | vRemove
--------------------------------------------------

Before a release one could then apply appropriate changes according to what the table says.

This comment has been minimized.

Copy link
@gasche

gasche Feb 13, 2019

Member

@dbuenzli yes, that sounds like a good idea. I guess some of your merged PRs have some maybe-deprecate-later identifiers in flight, do you have enough to send a PR to populate that file? (Did we merge PrintExc successor yet?)

(The file should also point to the (M/G)PR that contains the deprecation action/discussion.)

This comment has been minimized.

Copy link
@nojb

nojb Feb 13, 2019

Contributor

(Did we merge PrintExc successor yet?)

No, it is #2137, but it is not yet ready, more work needed.

This comment has been minimized.

Copy link
@dbuenzli

dbuenzli Feb 13, 2019

Contributor

I could have a look but a bit busy® right now®.

@gasche

This comment has been minimized.

Copy link
Member

commented Feb 12, 2019

I have an uneasy feeling when compiler code matches on the name of C functions. I suspect that the externals whose semantic is part of the language (enough that the compiler knows about them) should be %foo functions, that later get turned into C primitives by the unsupporting backends (possibly all backends) if desired.
However, I understand why the PR just uses a new C primitive, given that the rest of the Obj module is in this style (but then, it looks like there were no Obj-specific optimizations in the compiler before?).

@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Feb 12, 2019

I was also unsure about exposing this as "caml_obj_with_tag" vs. "%obj_with_tag". In doubt, I went for the version that's less code, but I don't have a strong preference. Do you think this would be better as a primitive + later conversion to a C call?

@gasche

This comment has been minimized.

Copy link
Member

commented Feb 12, 2019

Personally I think that all of Obj's primitives could/should be moved to %obj_* primitives instead of C calls, because we make assumptions on them, but that can be done separately. (And it may break compatibility of some libraries that re-export Obj, sigh...)

I also don't have very strong feelings on the best deprecation policy.

runtime/obj.c Show resolved Hide resolved
@gasche
gasche approved these changes Feb 12, 2019
Copy link
Member

left a comment

I reviewed the patch for correctness and I think the new primitive is a nice thing to have in any case. The only thing I'm unsure about is the idea of deprecating set_tag right away; for a more widely used API I would frown upon deprecated in the same version its replacement appears, but this might not make a big difference for such a low-level function?

I'm approving, but I think this deserves a third pair of eyes (and maybe a decision on deprecation policy) before merging.

@stedolan

This comment has been minimized.

Copy link
Contributor Author

commented Feb 12, 2019

I don't understand the AppVeyor failure. It's complaining about the new C primitive, but the Travis build is fine. Perhaps a bootstrap would fix it? I don't understand why this differs on windows. @dra27 ?

@nojb

This comment has been minimized.

Copy link
Contributor

commented Feb 12, 2019

It fails when building flexlink (only used on Windows) using the bootstrap compiler. My understanding is that the only way out is a bootstrap.

Copy link
Member

left a comment

Third pair of eyes here. Looks good except for a small (and optional) suggestion.
In this case I think it's OK to deprecate right away.

@@ -55,6 +55,7 @@ external field : t -> int -> t = "%obj_field"
*)
external set_field : t -> int -> t -> unit = "%obj_set_field"
external set_tag : t -> int -> unit = "caml_obj_set_tag"
[@@ocaml.deprecated "Use with_tag instead."]

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez Feb 13, 2019

Member

Normally, we would deprecate it now and remove it in the next deprecation purge, in about 5 years. We still don't have any mechanism to indicate degrees of deprecation (or how much time before it gets removed). Such a mechanism might be an intermediate step before deprecation.

In the case of Obj.set_tag I think we'll remove it out-of-cycle to avoid delaying multicore, so it's better to deprecate right now anyway.

testsuite/tests/lib-obj/with_tag.ml Show resolved Hide resolved
@gasche

This comment has been minimized.

Copy link
Member

commented Feb 13, 2019

@stedolan: you should add Damien as reviewer and then it's good to merge.

@gasche gasche merged commit 5a29ea7 into ocaml:trunk Feb 14, 2019
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@gasche

This comment has been minimized.

Copy link
Member

commented Feb 14, 2019

Merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.