Skip to content

atomic record fields: implement [%atomic.field ...]#13707

Closed
gasche wants to merge 25 commits into
ocaml:trunkfrom
gasche:atomic_fields_separated_1
Closed

atomic record fields: implement [%atomic.field ...]#13707
gasche wants to merge 25 commits into
ocaml:trunkfrom
gasche:atomic_fields_separated_1

Conversation

@gasche
Copy link
Copy Markdown
Member

@gasche gasche commented Jan 5, 2025

This PR is an alternative to #13404 which implements an alternative design that was also mentioned in RFC #39. It is currently built on top of #13404, so it implements both [%atomic.loc foo.count] (as #13404) and [%atomic.field count] (new in the present PR). I marked it as a "draft" because, if we decided that [%atomic.field count] is the better design, we would probably remove support for [%atomic.loc foo.count] before merging.

Summary of previous episodes

Consider a record type declaration with a field annotation as atomic:

type t = { mutable count : int [@atomic] }

The atomic record fields RFC (#39) initially proposed a new construction [%atomic.field count], which would have type (t, int) Atomic.Field.t, and whose runtime representation would be the integer 0 (the offset of the record field). One difficulty we noted in the RFC is that this has to support type-based disambiguation of record fields.

Then @bclement-ocp made a simplifying proposal, which looks like [%atomic.loc x.count] (when x has type t), at type int Atomic.Loc.t. The runtime representation is the pair (x, 0), but the pair is optimized away if an atomic primitive is directly applied to this value. This is simpler because the magic type Atomic.Foo.t takes a single type parameter, and there is no need for new/specific label disambiguation code, we can reuse the exact same logic as field access.

This proposal was implemented by @clef-men in #13404, and reviewed by @OlivierNicole. When we discussed it at the last developer meeting, people asked if we really want the [%atomic.loc x.count] version and if we should consider [%atomic.field count] instead, for the following reasons:

  • this is slightly more expressive
  • when the atomic field expression is not directly applied to an atomic primitive (which is the common case), then this is slightly more efficient

After positive feedback on this suggestion from @polytypic, we decided to explore the [%atomic.field ...] route. The present PR is the result of my exploration.

Bad interactions between %atomic.field and inline record fields.

My conclusion after implementing this feature is that [%atomic.field ...] does not work well for atomic fields in inline records. I consider this a serious blocker because inline records are being used in concurrent OCaml code that would benefit from atomic fields.

The problem comes from an intentional limitation of OCaml support for inline records: consider

type u = Empty | Filled of { mutable amount : int [@atomic] }

let fetch_and_add (x : u) n = match x with
  | Empty -> invalid_arg "fetch_and_add"
  | Filled r -> Atomic.Field.fetch_and_add r [%atomic.field amount] n

In this example, Filled r has type u, so the type of r is the inline record type of Filled. This type is pretty-printed as u.Filled by the type-checker, but:

  1. this is not valid syntax, the user is not allowed to write this
  2. the type-checker ensures that this type does not "escape" the scope of r by preventing any use of r other than direct field access

Due to restriction (1), we cannot explicitly disambiguate all %atomic.field expressions with a type annotation, in this example for example we cannot write ([%atomic.field amount] : (u.Filled, _) Atomic.Field.t), so there are situations where it will be impossible for the user to correctly indicate which of the several record fields amount they mean.

I thought (see my "hunch" in ocaml/RFCs#39) that this would be okay in practice because in the cases where the %atomic.field expression is directly applied to an atomic primitive, as is the case in this example, bidirectional propagation of argument types should do the trick: Atomic.Field.fetch_and_add has type 'r -> ('r, int) Atomic.Field.t -> int -> int, so the type of r (the first argument of fetch_and_add should be propagated) into the expected type of the second argument, allowing disambiguation.

Unfortunately this does not work as expected, due to restriction (2) the use of r in argument position is forbidden by the type-checking code for variables of inline record type, and this example fails with the following error:

Error: This form is not allowed as the type of the inlined record could escape.

My conclusion is that %atomic.field is not usable with inline records, unless we consider significant changes to the type-checking of inline records.

Where to go from there?

I see three reasonable options going forward:

  1. Give up on %atomic.field and merge Atomic record fields #13404 which implements the simpler %atomic.loc, has been morally approved and carefully reviewed.

  2. Merge something like the current state of the present PR, with a fully working %atomic.loc and additionally a version of %atomic.field that has advantages but does not work with inline records. Later we can consider changing the type-checking of inline records to lift this limitation, and we would end up with two features that work well. (And some duplication as each atomic primitive foo would be available as Atomic.Field.foo and also Atomic.Loc.foo, but there are 5 of them so it's okay.)

  3. Decide that we absolutely want %atomic.field and not %atomic.loc, and that we must first change the type-checking of inline records to lift the current limitations of %atomic.field.

To improve the type-checking of inline records, I think we could do one of the following:

A. give up on the idea that inline record types cannot be named, make u.Filled actual syntax, and lift all limitations on those types, or

B. instead of the current heavy-handed approach to preventing the escape of inline record types, handle them like existential types (scoping them at the current level), so that mentioning them in the result type of the expression is forbidden but passing them to arbitrary functions inside the expression is fine.

gasche and others added 25 commits November 21, 2024 14:50
This is a breaking change because this function was (unfortunately)
exposed outside CAML_INTERNALS, and is used by exactly one external
user, you guessed it:
  https://github.com/ocaml-multicore/multicore-magic/blob/360c2e829c9addeca9ccaee1c71f4ad36bb14a79/src/Multicore_magic.mli#L181-L185
  https://github.com/ocaml-multicore/multicore-magic/blob/360c2e829c9addeca9ccaee1c71f4ad36bb14a79/src/unboxed5/multicore_magic_atomic_array.ml#L36-L43

We chose to change the prototype to remain consistent with the naming
convention for the new caml_atomic_*_field primitives, which will be
added to support atomic record fields.

User code can easily adapt to this new prototype we are using, but not
in a way that is compatible with both old and new versions of
OCaml (not without some preprocessing at least).

Another option would be to expose

    int caml_atomic_cas_field(value obj, intnat fld, value, value)
    value caml_atomic_cas_field_boxed(value obj, value vfld, value, value)

but no other group of primitives in the runtime uses this _boxed
terminology, they instead use

    int caml_atomic_cas_field_unboxed(value obj, intnat fld, value, value)
    value caml_atomic_cas_field(value obj, value vfld, value, value)

and this would again break compatiblity -- it is not easier to convert
code to that two-version proposal, and not noticeably more efficient.

So in this case we decided to break compatibility (of an obscure,
experimental, undocumented but exposed feature) in favor of
consistency and simplificity of the result.
…_exchange_field] and [caml_atomic_fetch_add_field].
Uses of existing atomic primitives %atomic_foo, which act on
single-field references, are now translated into %atomic_foo_field,
which act on a pointer and an offset -- passed as separate arguments.

In particular, note that the arity of the internal Lambda primitive
    Patomic_load
increases by one with this patchset. (Initially we renamed it into
    Patomic_load_field
but this creates a lot of churn for no clear benefits.)

We also support primitives of the form %atomic_foo_loc, which
expects a pair of a pointer and an offset (as a single argument),
as we proposed in the RFC on atomic fields
  ocaml/RFCs#39
(but there is no language-level support for atomic record fields yet)

Co-authored-by: Clément Allain <clef-men@orange.fr>
Requires a bootstrap.

Co-authored-by: Gabriel Scherer <gabriel.scherer@gmail.com>
This type will be used for ['a Atomic.Loc.t], as proposed
in the RFC
  ocaml/RFCs#39

We implement this here to be able to use it in the stdlib later,
after a bootstrap.
We want to use [mark_label_used] in a context where we cannot easily
find the label declaration, only the label description (from the
environment).
This bootstrap is not required by a compiler change, but it enables
the use of the predefined type `'a atomic_loc` and the
expression-former [%atomic.loc ...] in the standard library.
@fpottier
Copy link
Copy Markdown
Contributor

fpottier commented Jan 8, 2025

Maybe we could discuss the treatment of inline records again. In the above code, when we match with the pattern Filled r, what is the reason why we currently insist that r should not escape? As long as we never mutate an object's tag, it seems that we could allow r to escape, with a record type. Am I too naive?

@bclement-ocp
Copy link
Copy Markdown
Contributor

While discussing an optimisation that could be made for what I'll call "heterogenous arrays":

type _ field = A : int field | B : float field | C : string field

type drecord = { mutable a : int ; mutable b : float ; mutable c : string }

let[@inline] ( .!() ) (type a) r (f : a field) : a =
  match f with
  | A -> r.a
  | B -> r.b
  | C -> r.c

let[@inline] ( .!()<- ) (type a) r (f : a field) (v : a) =
  match f with
  | A -> r.a <- v
  | B -> r.b <- v
  | C -> r.c <- v

where the accessed could be simplified to direct access instead of using a switch, @lthls remarked that the [%atomic.field] construction, if extended to non-atomic records, would effectively perform this optimisation.

type drecord = { mutable a : int ; mutable b : float ; mutable c : string }

type 'a field = (drecord, 'a) Field.t

let a : int field = [%field a]
let b : float field = [%field b]
let c : string field = [%field c]

let[@inline] ( .!() ) (type a) r (f : a field) : a = Field.get r f

let[@inline] ( .!()<- ) (type a) r (f : a field) (v : a) = Field.set r f v

It's not related to atomics at all, but if we figure out the issues related to inline records (which I believe is ongoing), it is an example where it would be beneficial to extend the Field API to non-atomics.

@clef-men
Copy link
Copy Markdown
Contributor

I believe this optimization could be useful in Kcas:
https://github.com/ocaml-multicore/kcas/blob/e859db298765dfa4b63dc9c7f50a92ef21a51dbf/src/kcas/kcas.ml#L253
https://github.com/ocaml-multicore/kcas/blob/e859db298765dfa4b63dc9c7f50a92ef21a51dbf/src/kcas/kcas.ml#L262

We discussed it with @gasche some time ago. As far as I remember, we were thinking of performing it in the compiler.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Feb 10, 2025

Yes, I think it wouldn't be too hard to extend the affine-switch-detection pass to deal with integer constants under a few (hardcoded) contexts of interest, here applications of P{get,set}field. But I agree that doing it in the language is also nice... if we can manage to stay on the razor-thin consensual side of the first-class-field debate.

@gasche gasche mentioned this pull request Apr 12, 2025
@OlivierNicole
Copy link
Copy Markdown
Contributor

I assume we can close this now that #13404 has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants