-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Translate structured constants into their Obj.t representation at compile rather than link time #11997
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the code, which is clearly correct.
bytecomp/symtable.ml
Outdated
@@ -162,7 +186,7 @@ let init () = | |||
Const_base(Const_int (-i-1)) | |||
]) | |||
in | |||
literal_table := (c, cst) :: !literal_table) | |||
literal_table := (c, (transl_const cst)) :: !literal_table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
literal_table := (c, (transl_const cst)) :: !literal_table) | |
literal_table := (c, transl_const cst) :: !literal_table) |
7c918c7
to
fe283ce
Compare
Nicolás Ojeda Bär (2023/02/07 04:59 -0800):
@nojb commented on this pull request.
I reviewed the code, which is clearly correct.
thanks a lot!
> @@ -162,7 +186,7 @@ let init () =
Const_base(Const_int (-i-1))
])
in
- literal_table := (c, cst) :: !literal_table)
+ literal_table := (c, (transl_const cst)) :: !literal_table)
```suggestion
literal_table := (c, transl_const cst) :: !literal_table)
```
I have applied your suggestion and added you as a reviewer to the
Changes entry.
|
@@ -20,7 +20,7 @@ open Misc | |||
(* Relocation information *) | |||
|
|||
type reloc_info = | |||
Reloc_literal of Lambda.structured_constant (* structured constant *) | |||
Reloc_literal of Obj.t (* structured constant *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is no longer valid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relocatable literal
reads fine to me
Could you show how obj dump differ ? Should we add a test for this ? |
As far as I can see, the only difference is that it cannot differentiate some immediate (int-like) values, so eg |
hhugo (2023/02/07 05:20 -0800):
@hhugo commented on this pull request.
> @@ -20,7 +20,7 @@ open Misc
(* Relocation information *)
type reloc_info =
- Reloc_literal of Lambda.structured_constant (* structured constant *)
+ Reloc_literal of Obj.t (* structured constant *)
The comment is no longer valid
Indeed. Any suggestion beyond `relocatable literal`?
What is no longer (so) valid is that it's structured, right? But
shallthe comment mention that it's a constant?
|
Remark: I think that I was wrong about this change being useful for camlboot. The problem we have in camlboot is that the format of bytecode executables is determined by the host runtime rather than the compiler sources: constants are serialized as OCaml values (and the serialization format of OCaml values is itself implementation-defined) instead of being serialized in a format defined in the compiler/byterun sources themselves. The present change affects the representation of constants in .cmo files but not in bytecode executables, it moves .cmo files constant from |
(The change will be useful to jsoo) |
fe283ce
to
b3b200e
Compare
hhugo (2023/02/07 05:52 -0800):
@hhugo commented on this pull request.
> @@ -20,7 +20,7 @@ open Misc
(* Relocation information *)
type reloc_info =
- Reloc_literal of Lambda.structured_constant (* structured constant *)
+ Reloc_literal of Obj.t (* structured constant *)
`relocatable literal` reads fine to me
Okay, I did the change.
|
hhugo (2023/02/07 05:22 -0800):
Could you show how obj dump differ ?
I tried to come up with an example to elaborate on @nojb's response but,
to my surprise, I was not able to exhibit any difference. I should make
clear thatthe expected difference is in the output of `dumpobj` and not
`ocamlobjinfo` as I think I have written, so I updated the Changes entry
accordingly.
Should we add a test for this ?
I'm unsure what we should be testing for?
|
To me, it seems weird to have a "breaking change" without any test exhibiting the change. That said, I don't care too much about the change of behavior and wouldn't be sad if tests are not added.
I think just an expect test with the result of |
There is already a test for Having said that, I don't think this PR should be marked as a breaking change and I am not even sure a test should be added for the output of |
Extending the existing test with one value per constructor in structure_constant seems good and should not add too much noise |
Good idea. |
b3b200e
to
578b2da
Compare
Nicolás Ojeda Bär (2023/02/07 07:50 -0800):
There is already a test for `dumpobj` at https://github.com/ocaml/ocaml/blob/trunk/testsuite/tests/tool-dumpobj/test.ml.
Having said that, I don't think this PR should be marked as a breaking
change and I am not even sure a test should be added for the output of
`dumpobj`: `dumpobj` is strictly a developer tool and is not
documented anywhere.
@nojb should I understand that @hhugo has convinced you and that you thus
would like to have a test, or am I not understanding you correctly here?
I now remember that we added the `dumpobj` test to make sure the tool
was compiled and linked properly and had no obvious breakage.
Meanwhile, I have marked the change as non-breaking. This really makes
sense to me since the tool is not even installed. So I am indeed a bit
skeptical about the importance of adding more tests.
|
I vaguely remember thinking about proposing that a couple of decades ago, but changing my mind because I reached the conclusion that it would introduce the first manipulation of "generic" Obj.t values within the compiler itself (and could perhaps complicate some bootstrap/cross-compilation scenario, etc). Implicitly the format of .cmo files, seen as typed OCaml values, would now depend on the concrete representation of OCaml values at runtime. The compiler already depends on unsafe marshaling, but this could conceptually be replaced by explicit serializers derived from type definitions; after this change, we are stuck with the need to use unsafe features in the compiler itself. At least, we are lucky that js_of_ocaml is quite compatible with the marshaling format (at least for reading it)! |
Agreed! |
Concretely, imagine we want to implement maximal sharing of structured constants during the final linking stage. We would need to work on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks! Minor comments below.
In more details: this is moving the conversion from Lambda.structured_constant to Obj.t earlier in the compilation chain. I don't see what can go wrong with this. If it can help simplifying Dynlink, that's good. Actually, it can also reduce the size of .cmo and .cma files (because of a more compact encoding of structured constants), and that's good too.
file_formats/cmo_format.mli
Outdated
@@ -20,7 +20,7 @@ open Misc | |||
(* Relocation information *) | |||
|
|||
type reloc_info = | |||
Reloc_literal of Lambda.structured_constant (* structured constant *) | |||
Reloc_literal of Obj.t (* relocatable literal *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather leave "structured constant", as this is what it is: a compile-time constant that is not just an integer or a constant constructor. Or just "compile-time constant" if you prefer. There's nothing "relocatable" in this constant.
Changes
Outdated
- #11997: translate structured constants into their Obj.t representation | ||
at compile time rather than run time. Changes the way dumpobj prints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with the "rather than run time" part. Currently, the translation is performed at link-time: static link-time in the bytecode linker, run-time link-time in Dynlink. Suggested change:
- #11997: translate structured constants into their Obj.t representation | |
at compile time rather than run time. Changes the way dumpobj prints | |
- #11997: translate structured constants into their Obj.t representation | |
at compile time rather than link time. Changes the way dumpobj prints |
578b2da
to
967f5e6
Compare
Xavier Leroy (2023/02/07 09:02 -0800):
@xavierleroy approved this pull request.
Thanks a lot foor your approval, Xavier.
Looks good to me.
What do you think about the concerns raised by @alainfrisch, though?
In more details: this is moving the conversion from
Lambda.structured_constant to Obj.t earlier in the compilation chain.
I don't see what can go wrong with this. If it can help simplifying
Dynlink, that's good. Actually, it can also reduce the size of .cmo
and .cma files (because of a more compact encoding of structured
constants), and that's good too.
Indeed. To make this point more precise, the `structured_constant` type
depends on the `constant` type defined in `Asttypes`, which means that,
for string constants, their location in the source file is encoded in
the `structured_constant` type and was hence stored in .cmo and .cma
files, which would no longer be the case after this change since `Obj.t`
objects do not embed locations.
> @@ -20,7 +20,7 @@ open Misc
(* Relocation information *)
type reloc_info =
- Reloc_literal of Lambda.structured_constant (* structured constant *)
+ Reloc_literal of Obj.t (* relocatable literal *)
I'd rather leave "structured constant", as this is what it is: a
compile-time constant that is not just an integer, or a constant
constructor. Or just "compile-time constant" if you prefer. There's
nothing "relocatable" in this constant.
I chose the more conservative way. I wanted to add `translated from
Lambda.structured_constant` but that would have involved shifting all
the comments in the type definition to the left, which I didn't dare
doing.
> +- #11997: translate structured constants into their Obj.t representation
+ at compile time rather than run time. Changes the way dumpobj prints
I disagree with the "rather than run time" part. Currently, the translation is performed at link-time: static link-time in the bytecode linker, run-time link-time in Dynlink. Suggested change:
```suggestion
- #11997: translate structured constants into their Obj.t representation
at compile time rather than link time. Changes the way dumpobj prints
```
It's more accurate indeed, thanks. Done. And added you to the list of
reviewers in Changes.
|
This PR is stalled, and it might be because I haven't replied to @shindere's question
I think that if we really want to hash-cons the initial global table at link-time, we can still do it by working at the Moreover, I don't think something that we might do in the future (but haven't done in 25 years of OCaml yet) and might become a little bit more difficult should block an improvement that we can do now. |
Before this commit, relocatable literals under the Reloc_literal constructor were represented by objects of type Lambda.structured_constant in the CMO files. These objects were translated into their Obj.t representation as they were loaded. With this commit, the translation from Lambda.structured_constant to Obj.t occurs as part of the compilation rather than when the CMO file is loaded.
967f5e6
to
464100a
Compare
Xavier Leroy (2023/02/21 09:24 -0800):
This PR is stalled, and it might be because I haven't replied to
@shindere's question
Well it's rather because I have been clumsy in the first place, sorry
about that.
My intention was to make sure Alain's objection has been taken int
consideration although you did already approve the PR. Then I should
have asked the question in a wider way so that others also felt invited
to response, so sorry for having kept things more narrow than necessary.
> What do you think about the concerns raised ***@***.***, though?
I think that if we *really* want to hash-cons the initial global table
at link-time, we can still do it by working at the `Obj.t` level
instead of at the `structured_constant` level. It will be more ugly,
for sure, but I don't see any impossibility here.
Okay. After all, I can even imagine a variant of the format where we
could have either Obj.t or structured constants, I guess.
Moreover, I don't think something that we might do in the future (but
haven't done in 25 years of OCaml yet) and might become a little bit
more difficult should block an improvement that we can do now.
Okay, thanks for your feedback.
Consequently, I rebased the PR on latest trunk and re-did the bootstrap.
Now waiting until CI is happy and then I think I'll merge.
Thanks again to all those who have participated.
|
This PR has been taken out of #11996. It changes the way structured
constants are represented in CMO file. Before this PR, these constants were
stored using their internal representation as provided by the Lambda module.
With this PR constants are further translated at compile-time so that they
can be stored as objects of type
Obj.t
. In other words, with this PRcompilation goes one step further and thus less has to be done to read a CMO
file, since the translation step that took place at load time now takes
place at compile time.
This change is one step towards making file formats use standard types only
(that is, types provided by the standard library) rather than types
which are internal to the compiler.
As reported
here,
this PR has an impact on the output of
ocamlobjinfo
. Indeed, before thisPR constants were stored in a typed way, which is nolonger the case with
this PR.
ocamlobjinfo
can thus no longer print the constants as nicely asit did before, since it has less knowledge about them. That's why this PR
proposes to mark the corresponding change as a breakingchange.