Translate structured constants into their Obj.t representation at compile rather than link time #11997

Merged
2 commits merged on Feb 22, 2023


shindere
Contributor

@shindere shindere commented Feb 7, 2023

This PR has been taken out of #11996. It changes the way structured
constants are represented in CMO files. Before this PR, these constants were
stored using their internal representation as provided by the Lambda module
(Lambda.structured_constant). With this PR, constants are translated further
at compile time so that they can be stored as values of type Obj.t. In other
words, compilation now goes one step further, and less work has to be done
when a CMO file is read: the translation step that used to take place at
load time now takes place at compile time.
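To make "translating a structured constant into its Obj.t representation" concrete, here is a minimal sketch. It assumes a hypothetical, simplified constant type whose constructor names merely mimic Lambda's (the real Lambda.structured_constant has a different shape), and transl_const below is an illustration rather than the compiler's actual code.

```ocaml
(* Hypothetical, simplified stand-in for Lambda.structured_constant. *)
type structured_constant =
  | Const_int of int
  | Const_char of char
  | Const_string of string
  | Const_float of float
  | Const_block of int * structured_constant list
  | Const_float_array of float list

(* Translate a typed constant into its untyped runtime representation.
   After this step the type information is gone: an Obj.t is just "a value". *)
let rec transl_const : structured_constant -> Obj.t = function
  | Const_int i -> Obj.repr i
  | Const_char c -> Obj.repr c            (* chars are immediates, like ints *)
  | Const_string s -> Obj.repr s
  | Const_float f -> Obj.repr f           (* a boxed float *)
  | Const_block (tag, fields) ->
      (* Allocate a block with the given tag and fill its fields. *)
      let block = Obj.new_block tag (List.length fields) in
      List.iteri (fun pos cst -> Obj.set_field block pos (transl_const cst)) fields;
      block
  | Const_float_array fs ->
      (* Float.Array.t always uses the flat (unboxed) float representation. *)
      Obj.repr (Float.Array.of_list fs)
```

Note that the result no longer records whether a field was a char or an int: `Const_char 'A'` and `Const_int 65` translate to the same Obj.t, which is why tools reading the translated constants have less information to work with.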

This change is one step towards making file formats use standard types only
(that is, types provided by the standard library) rather than types
which are internal to the compiler.

As reported here, this PR has an impact on the output of ocamlobjinfo.
Before this PR, constants were stored in a typed way, which is no longer the
case with this PR. ocamlobjinfo can therefore no longer print the constants
as nicely as it did before, since it has less knowledge about them. That's
why this PR proposes to mark the corresponding change as a breaking change.

Contributor

@nojb nojb left a comment

I reviewed the code, which is clearly correct.

@@ -162,7 +186,7 @@ let init () =
  Const_base(Const_int (-i-1))
  ])
  in
- literal_table := (c, cst) :: !literal_table)
+ literal_table := (c, (transl_const cst)) :: !literal_table)
Contributor

Suggested change
- literal_table := (c, (transl_const cst)) :: !literal_table)
+ literal_table := (c, transl_const cst) :: !literal_table)

@shindere
Contributor Author

shindere commented Feb 7, 2023 via email

@@ -20,7 +20,7 @@ open Misc
(* Relocation information *)

type reloc_info =
- Reloc_literal of Lambda.structured_constant (* structured constant *)
+ Reloc_literal of Obj.t (* structured constant *)
Contributor

The comment is no longer valid

Contributor

relocatable literal reads fine to me

@hhugo
Contributor

hhugo commented Feb 7, 2023

Could you show how the object dump differs? Should we add a test for this?

@nojb
Contributor

nojb commented Feb 7, 2023

> Could you show how the object dump differs? Should we add a test for this?

As far as I can see, the only difference is that it cannot differentiate some immediate (int-like) values, e.g. char and int (both are now printed as integers).
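The loss of information can be illustrated with a best-effort printer that only sees Obj.t values. This is a hypothetical sketch (to_string is not part of dumpobj or ocamlobjinfo) and assumes the input is constant data without closures or cycles.

```ocaml
(* Hypothetical best-effort printer for untyped constants. *)
let rec to_string (v : Obj.t) : string =
  if Obj.is_int v then
    (* ints, chars, bools and constant constructors all look the same here *)
    string_of_int (Obj.obj v : int)
  else begin
    let tag = Obj.tag v in
    if tag = Obj.string_tag then Printf.sprintf "%S" (Obj.obj v : string)
    else if tag = Obj.double_tag then string_of_float (Obj.obj v : float)
    else if tag = Obj.double_array_tag then begin
      (* flat float array: read it through Float.Array *)
      let a : Float.Array.t = Obj.obj v in
      "[|" ^ String.concat "; " (List.map string_of_float (Float.Array.to_list a)) ^ "|]"
    end
    else if tag = Obj.closure_tag then "<closure>"
    else if tag < Obj.no_scan_tag then begin
      (* ordinary block: print its tag and its fields recursively *)
      let fields = List.init (Obj.size v) (fun i -> to_string (Obj.field v i)) in
      Printf.sprintf "<%d>(%s)" tag (String.concat ", " fields)
    end
    else Printf.sprintf "<opaque, tag %d>" tag
  end
```

For example, `to_string (Obj.repr 'A')` and `to_string (Obj.repr 65)` both produce `"65"`, whereas a printer working on typed constants could still tell the char apart.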

@shindere
Contributor Author

shindere commented Feb 7, 2023 via email

@gasche
Member

gasche commented Feb 7, 2023

Remark: I think that I was wrong about this change being useful for camlboot. The problem we have in camlboot is that the format of bytecode executables is determined by the host runtime rather than the compiler sources: constants are serialized as OCaml values (and the serialization format of OCaml values is itself implementation-defined) instead of being serialized in a format defined in the compiler/byterun sources themselves.

The present change affects the representation of constants in .cmo files but not in bytecode executables: it moves the constants in .cmo files from Lambda.structured_constant to Obj.t, but bytecode executables were already using Obj.t. I have no objection there (why not indeed use the same format in .cmo files and final binaries), but this is not related to camlboot's needs.

@hhugo
Contributor

hhugo commented Feb 7, 2023

(The change will be useful to jsoo)

@shindere
Contributor Author

shindere commented Feb 7, 2023 via email

@shindere
Contributor Author

shindere commented Feb 7, 2023 via email

@hhugo
Contributor

hhugo commented Feb 7, 2023

To me, it seems weird to have a "breaking change" without any test exhibiting the change. That said, I don't care too much about the change of behavior and wouldn't be sad if tests are not added.

I'm unsure what we should be testing for?

I think just an expect test with the result of dumpobj on some (stable enough) bytecode file.

@nojb
Contributor

nojb commented Feb 7, 2023

There is already a test for dumpobj at https://github.com/ocaml/ocaml/blob/trunk/testsuite/tests/tool-dumpobj/test.ml.

Having said that, I don't think this PR should be marked as a breaking change and I am not even sure a test should be added for the output of dumpobj: dumpobj is strictly a developer tool and is not documented anywhere.

@hhugo
Contributor

hhugo commented Feb 7, 2023

Extending the existing test with one value per constructor in structured_constant seems good and should not add too much noise.

@nojb
Contributor

nojb commented Feb 7, 2023

> Extending the existing test with one value per constructor in structured_constant seems good and should not add too much noise.

Good idea.

@shindere
Contributor Author

shindere commented Feb 7, 2023 via email

@alainfrisch
Contributor

alainfrisch commented Feb 7, 2023

I vaguely remember thinking about proposing that a couple of decades ago, but changing my mind because I reached the conclusion that it would introduce the first manipulation of "generic" Obj.t values within the compiler itself (and could perhaps complicate some bootstrap/cross-compilation scenario, etc). Implicitly the format of .cmo files, seen as typed OCaml values, would now depend on the concrete representation of OCaml values at runtime. The compiler already depends on unsafe marshaling, but this could conceptually be replaced by explicit serializers derived from type definitions; after this change, we are stuck with the need to use unsafe features in the compiler itself. At least, we are lucky that js_of_ocaml is quite compatible with the marshaling format (at least for reading it)!
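To make the contrast concrete: as long as constants have a proper OCaml type, a serializer can be written by hand (or derived) from the type definition, with no dependence on the runtime's value layout or on Marshal. The sketch below assumes a hypothetical toy constant type and an ad-hoc byte format invented for this illustration; nothing like it exists in the compiler. With Obj.t there is no such type definition to derive from.

```ocaml
(* Hypothetical toy constant type. *)
type const =
  | Int of int
  | Str of string
  | Block of int * const list

(* Ad-hoc format: one tag character, then 8-byte little-endian integers. *)
let add_int buf n = Buffer.add_int64_le buf (Int64.of_int n)

let rec write (buf : Buffer.t) (c : const) : unit =
  match c with
  | Int n -> Buffer.add_char buf 'i'; add_int buf n
  | Str s ->
      Buffer.add_char buf 's';
      add_int buf (String.length s);
      Buffer.add_string buf s
  | Block (tag, fields) ->
      Buffer.add_char buf 'b';
      add_int buf tag;
      add_int buf (List.length fields);
      List.iter (write buf) fields

(* Returns the decoded constant and the position just after it. *)
let rec read (s : string) (pos : int) : const * int =
  let int_at p = Int64.to_int (String.get_int64_le s p) in
  match s.[pos] with
  | 'i' -> (Int (int_at (pos + 1)), pos + 9)
  | 's' ->
      let len = int_at (pos + 1) in
      (Str (String.sub s (pos + 9) len), pos + 9 + len)
  | 'b' ->
      let tag = int_at (pos + 1) and n = int_at (pos + 9) in
      let rec fields k p acc =
        if k = 0 then (List.rev acc, p)
        else
          let c, p' = read s p in
          fields (k - 1) p' (c :: acc)
      in
      let fs, next = fields n (pos + 17) [] in
      (Block (tag, fs), next)
  | c -> invalid_arg (Printf.sprintf "read: unexpected tag %C" c)
```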

@xavierleroy
Contributor

> Meanwhile, I have marked the change as non-breaking. This really makes
> sense to me since the tool is not even installed. So I am indeed a bit
> skeptical about the importance of adding more tests.

Agreed! dumpobj is just there to help with debugging the bytecode compiler, if the need ever arises. Its output is a best effort, and remains so after the proposed change.

@alainfrisch
Contributor

Concretely, imagine we want to implement maximal sharing of structured constants during the final linking stage. We would need to work on Obj.t values within the compiler. And in the mode where float arrays have their special unboxed representation, it's not ok to consider Obj.t as a universal type. E.g. it's unsound to create an array of Obj.t if some of them can be floats but not all of them. With the change proposed here, we'd need to worry about such low-level problems (as in any project working on Obj.t).
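The float-array pitfall can be shown in a few lines. This sketch assumes a runtime configured with flat float arrays (the default) and relies on the documented behaviour that `Array.make` picks a flat float array when its initial value is a boxed float; the names used are illustrative only.

```ocaml
(* True when the runtime uses flat float arrays (the default configuration). *)
let flat_float_arrays =
  Obj.tag (Obj.repr [| 0.0 |]) = Obj.double_array_tag

(* Array.make inspects its initial element at run time: since it is a boxed
   float, a flat float array is allocated, even though the static element
   type is Obj.t. *)
let unsafe_table : Obj.t array = Array.make 2 (Obj.repr 1.0)

(* Writing a non-float into [unsafe_table], e.g.
     unsafe_table.(1) <- Obj.repr "x"
   would be unsound: the generic array write would treat the string block as
   a boxed float. It is therefore left commented out. *)

(* One safe alternative: wrap every element in a constructor, so that the
   array only ever contains ordinary blocks and is never flattened. *)
type any = Any of Obj.t

let safe_table : any array = [| Any (Obj.repr 1.0); Any (Obj.repr "x") |]
```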

Contributor

@xavierleroy xavierleroy left a comment

Looks good to me. Thanks! Minor comments below.

In more detail: this is moving the conversion from Lambda.structured_constant to Obj.t earlier in the compilation chain. I don't see what can go wrong with this. If it can help simplify Dynlink, that's good. Actually, it can also reduce the size of .cmo and .cma files (because of a more compact encoding of structured constants), and that's good too.
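A rough illustration of the size argument: marshaling a typed tree of constructors stores one block per constructor and list cell, whereas the translated Obj.t stores only the blocks of the final value. The sketch below uses a hypothetical toy type, not Lambda's, and compares the two marshaled sizes for the same constant.

```ocaml
(* Hypothetical, simplified typed constant representation. *)
type const = Int of int | Block of int * const list

(* Translate to the runtime representation (assumes non-empty field lists). *)
let rec to_obj : const -> Obj.t = function
  | Int n -> Obj.repr n
  | Block (tag, fields) ->
      let b = Obj.new_block tag (List.length fields) in
      List.iteri (fun i c -> Obj.set_field b i (to_obj c)) fields;
      b

(* The constant (1, 2, (3, 4)), in typed form. *)
let example = Block (0, [Int 1; Int 2; Block (0, [Int 3; Int 4])])

(* Bytes needed to marshal each representation of the same constant. *)
let typed_size = String.length (Marshal.to_string example [])
let obj_size = String.length (Marshal.to_string (to_obj example) [])
```

Here `obj_size` is smaller than `typed_size`, since the Obj.t form has two blocks where the typed form has eleven.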

@@ -20,7 +20,7 @@ open Misc
(* Relocation information *)

type reloc_info =
- Reloc_literal of Lambda.structured_constant (* structured constant *)
+ Reloc_literal of Obj.t (* relocatable literal *)
Contributor

@xavierleroy xavierleroy Feb 7, 2023

I'd rather leave "structured constant", as this is what it is: a compile-time constant that is not just an integer or a constant constructor. Or just "compile-time constant" if you prefer. There's nothing "relocatable" in this constant.

Changes (outdated)
Comment on lines 430 to 431
- #11997: translate structured constants into their Obj.t representation
at compile time rather than run time. Changes the way dumpobj prints
Contributor

I disagree with the "rather than run time" part. Currently, the translation is performed at link-time: static link-time in the bytecode linker, run-time link-time in Dynlink. Suggested change:

Suggested change
- - #11997: translate structured constants into their Obj.t representation
-   at compile time rather than run time. Changes the way dumpobj prints
+ - #11997: translate structured constants into their Obj.t representation
+   at compile time rather than link time. Changes the way dumpobj prints

@shindere
Contributor Author

shindere commented Feb 8, 2023 via email

@xavierleroy
Contributor

This PR is stalled, and it might be because I haven't replied to @shindere's question:

> What do you think about the concerns raised by @alainfrisch, though?

I think that if we really want to hash-cons the initial global table at link-time, we can still do it by working at the Obj.t level instead of at the structured_constant level. It will be more ugly, for sure, but I don't see any impossibility here.

Moreover, I don't think something that we might do in the future (but haven't done in 25 years of OCaml yet) and might become a little bit more difficult should block an improvement that we can do now.
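As a sketch of what hash-consing "at the Obj.t level" could look like, the function below shares structurally equal sub-values bottom-up using a hash table keyed on Obj.t. It is a hypothetical illustration, not something the linker does, and it assumes the values are acyclic and contain no closures, which holds for structured constants.

```ocaml
(* Hypothetical sketch: maximal sharing of constant data at the Obj.t level. *)
let rec share (tbl : (Obj.t, Obj.t) Hashtbl.t) (v : Obj.t) : Obj.t =
  if Obj.is_int v then v
  else begin
    let tag = Obj.tag v in
    let v' =
      if tag < Obj.no_scan_tag && tag <> Obj.closure_tag then begin
        (* Share the fields first (bottom-up), working on a copy so that the
           input value is left untouched. *)
        let copy = Obj.dup v in
        for i = 0 to Obj.size copy - 1 do
          Obj.set_field copy i (share tbl (Obj.field copy i))
        done;
        copy
      end else
        (* Strings, boxed floats, flat float arrays, custom blocks: leaves. *)
        v
    in
    (* Hashtbl uses structural hashing and comparison, which work on Obj.t
       values as long as they contain no closures. *)
    match Hashtbl.find_opt tbl v' with
    | Some existing -> existing
    | None -> Hashtbl.add tbl v' v'; v'
  end
```

As the sketch shows, this has to inspect tags and sizes by hand, which is the "more ugly" part, but it needs no type information beyond the runtime representation.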

@shindere shindere changed the title Translate structured constants as part of compilation rather than at runtime Translate structured constants into their Obj.t representation at compile rather than link time Feb 22, 2023
shindere and others added 2 commits February 22, 2023 17:31
Before this commit, relocatable literals under the Reloc_literal
constructor were represented by objects
of type Lambda.structured_constant
in the CMO files. These objects were translated into their Obj.t
representation as they were loaded.

With this commit, the translation from Lambda.structured_constant
to Obj.t occurs as part of the compilation rather than when the CMO file
is loaded.
@shindere
Contributor Author

shindere commented Feb 22, 2023 via email
