Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make string/bytes distinguishable in bytecode #1617

Closed
wants to merge 10 commits into
base: trunk
from

Conversation

Projects
None yet
8 participants
@hhugo
Copy link
Contributor

hhugo commented Feb 17, 2018

Follow up to #596.
This PR move the distinction between bytes and string down to the bytecode.
This should allow the js_of_ocaml compiler to choose different runtime representation for bytes and string (when safe-string is on).

The most important changes are:

  • In bytecode, Bytes.unsafe_to_string & Bytes.unsafe_of_string are no longer no-op and now call into the C runtime. The native compilation is unchanged.
  • Add a new bytecode instruction for %string_unsafe_get, different from the instruction used for %bytes_unsafe_get
@hhugo

This comment has been minimized.

Copy link
Contributor Author

hhugo commented Feb 17, 2018

For people reading emails only, I've amended the feature description:

In bytecode, Bytes.unsafe_to_string & Bytes.unsafe_of_string are no longer no-op and now call into the C runtime. The native compilation is unchanged.

"%caml_string_set32u", Pstring_set_32(true);
"%caml_string_set64", Pstring_set_64(false);
"%caml_string_set64u", Pstring_set_64(true);
"%caml_string_set16", Pbytes_set_16(false);

This comment has been minimized.

@nojb

nojb Feb 20, 2018

Contributor

Is there a good reason to keep the old names %caml_string_set*? I agree that backwards compatibility is generally good, but am just wondering.

This comment has been minimized.

@hhugo

hhugo Feb 20, 2018

Author Contributor

As @xclerc pointed out, a library such as ocp-indent will benefit from this backwards compatible change.

This comment has been minimized.

@gasche

gasche Feb 20, 2018

Member

Several packages were broken by a previous version of a similar change by Hongbo that turned %string_safe_set into %bytes_safe_set -- see #596 (comment) . I would be wary of changing the external-referenceable name of a primitive.

/**
* [caml_string_set16] is deprecated,
* use [caml_bytes_set16] instead
*/

This comment has been minimized.

@nojb

nojb Feb 20, 2018

Contributor

What is the reason for keeping these caml_string_set* C primitives? AFAICS they are not exposed to user code and not used by the bytecode interpreter.

This comment has been minimized.

@xclerc

xclerc Feb 20, 2018

Contributor

I guess it is related to your previous comment, but someone might
reference them through external, e.g. ocp-endian.

This comment has been minimized.

@nojb

nojb Feb 20, 2018

Contributor

But they would reference the %caml_string_set* name, not the C primitive, right?

This comment has been minimized.

@xclerc

xclerc Feb 20, 2018

Contributor

Sorry, you meant "make the old OCaml primitive name point to a new C symbol"?

I think there could still be a problem, with an .o file generated by a previous version
of the compiler and then linked with the new runtime; however, if magic numbers are
bumped it should not be a problem (cannot remember whether magic numbers are
automatically bumped nowadays).

This comment has been minimized.

@nojb

nojb Feb 20, 2018

Contributor

Yes, bumping the magic number would do the trick; I don't think supporting "old-with-new" linking is a good idea anyway.

This comment has been minimized.

@hhugo

hhugo Feb 20, 2018

Author Contributor

I've made the change.

@hhugo hhugo force-pushed the hhugo:safe-string2 branch 2 times, most recently from 7ff7b7b to ef36776 Feb 20, 2018

@hhugo

This comment has been minimized.

Copy link
Contributor Author

hhugo commented Feb 20, 2018

I expect the tests to pass now.
Bootstrapping was required in order to remove primitives from the runtime.

@hhugo

This comment has been minimized.

Copy link
Contributor Author

hhugo commented Feb 20, 2018

File "/home/travis/build/ocaml/ocaml/toplevel/opttopmain.ml", line 213, characters 24-29:
Error: Unbound value unset
Hint: Did you mean set?
make: *** [toplevel/opttopmain.cmx] Error 2

I'm not sure what this is about..

@shindere

This comment has been minimized.

Copy link
Contributor

shindere commented Feb 20, 2018

@hhugo hhugo force-pushed the hhugo:safe-string2 branch from ef36776 to df6f5b4 Feb 20, 2018

@hhugo

This comment has been minimized.

Copy link
Contributor Author

hhugo commented Feb 21, 2018

Marshal.to_string and Marshal.to_bytes were calling into the same c function. It is now fixed.

@nojb

This comment has been minimized.

Copy link
Contributor

nojb commented Feb 27, 2018

This patch looks correct to me, and it should be backwards compatible thanks to leaving the name of old externals caml_string_set* untouched.

However, I think the cmo magic number needs to be bumped due to the bytecode change.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Mar 2, 2018

Warning: this PR contains a bootstrap. (Several, actually.) According to house regulations it will have to be merged manually by a core developer.

@trefis

This comment has been minimized.

Copy link
Contributor

trefis commented Mar 6, 2018

However, I think the cmo magic number needs to be bumped due to the bytecode change.

Indeed, but I think it was agreed recently that we'd change all the magic numbers between each major release anyway.
For example, we've recently merged some changes to types.ml without changing the magic number of cmis just yet.

@damiendoligez

This comment has been minimized.

Copy link
Member

damiendoligez commented Mar 7, 2018

@hhugo could you reorder the commits (moving up the marshall commit) so we only have to deal with two bootstraps instead of three?

Also, rebase Changes (but not the bootstrap compilers, they are a nice reminder that we need to merge manually).

@hhugo hhugo force-pushed the hhugo:safe-string2 branch from b901fc4 to 6104ce5 Mar 7, 2018

@hhugo hhugo force-pushed the hhugo:safe-string2 branch from 6104ce5 to 66ee4af Mar 7, 2018

@hhugo

This comment has been minimized.

Copy link
Contributor Author

hhugo commented Mar 7, 2018

Not sure what you meant by

Also, rebase Changes (but not the bootstrap compilers, they are a nice reminder that we need to merge manually).

This PR now only include 2 bootstraps

@hhugo

This comment has been minimized.

Copy link
Contributor Author

hhugo commented Mar 13, 2018

I guess this PR needs a formal approval and a core-dev to manual merge & redo the bootstraps.
@nojb, would you be able to do this ?

@nojb

This comment has been minimized.

Copy link
Contributor

nojb commented Mar 13, 2018

@hhugo Yes, will do.

@nojb

nojb approved these changes Mar 13, 2018

nojb added a commit that referenced this pull request Mar 15, 2018

Merge pull request #1659 from nojb/safe-string2
Rebase of #1617 (make string/bytes distinguishable in bytecode)
@nojb

This comment has been minimized.

Copy link
Contributor

nojb commented Mar 15, 2018

Merged as #1659, thanks!

@nojb nojb closed this Mar 15, 2018

@hhugo

This comment has been minimized.

Copy link
Contributor Author

hhugo commented Mar 15, 2018

Thanks for all the boostraps

@hhugo hhugo deleted the hhugo:safe-string2 branch Mar 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.