Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: move the UTF8 transform to the compiler core #995

Merged
merged 3 commits into from
Dec 21, 2023

Conversation

anmonteiro
Copy link
Member

  • found in ppx: show strange quoted strings transform #863
  • moves the utf8 transformations from the ppx to the compiler core, to make sure they're applied last
  • still missing the integration with [@@mel.inline], as can be seen in the snapshotted test case

Copy link
Member

@jchavarri jchavarri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure i fully understand what the utf8 string processing does, but I have a couple of questions:

  • Does this PR remove the need of ppx authors to process j* attrs? If yes, I understand it'll be done in another PR? couldn't find any refs to that in this one.
  • Should Melange be able to process utf strings like \u{1F680}? I tried on the playground but the ppx will error out.
  • Do you think it'd make sense to test the UTF8 end to end with the ppx? Right now I can see some direct tests of Utf8_string.Interp.transform_test but I can't find anything that uses the j or js attrs.
  • Finally, what are the downsides of this approach vs the options we discussed in the original issue?

Thanks!

| ( Some ({ attr_name = { txt; loc }; _ } as attr),
Pexp_constant (Pconst_string (s, _, Some dec)) ) -> (
match
Melange_ffi.Utf8_string.Interp.transform ~loc ~delim:dec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the transform have to be moved from the expression to the structure item section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what's happening here. This code concerns only [@@mel.inline], which transforms let x = {j| ... |j} [@@mel.inline] into an external with Ffi_inline_const (Const_string {s;unicode}).

We perform this transformation in the PPX because there's code that can be inlined, and code that can't. Example:

let x = {j| foo |j} [@@mel.inline] (* <-- constant string, can be inlined *)

(* will be transformed into `"foo" ^ x`, which isn't a constant, and cannot be inlined *)
let y = {j| foo $(x) |j} [@@mel.inline] 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what's happening here.

I am confused. This code seems to have been moved from line 405, where it was inside the method! expression e section, into this section, which is the method! structure_item str one.

We perform this transformation in the PPX because there's code that can be inlined, and code that can't.

Makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s moved to structure item to catch the only the inlined cases. The remaining transformations happen in the core lib now

@anmonteiro
Copy link
Member Author

Does this PR remove the need of ppx authors to process j* attrs? If yes, I understand it'll be done in another PR? couldn't find any refs to that in this one.

Yes, it does. By moving the transformation to the compiler, there are no longer cases where *j appears in the AST that PPX authors interact with.

Should Melange be able to process utf strings like \u{1F680}? I tried on the playground but the ppx will error out.

This is not a Melange issue. It works if you use OCaml syntax, and it's not a feature of {j| ... |j}. Js.log "\u{1F680}" will do what you expect.

For Reason syntax, you're running into reasonml/reason#2337

Do you think it'd make sense to test the UTF8 end to end with the ppx? Right now I can see some direct tests of Utf8_string.Interp.transform_test but I can't find anything that uses the j or js attrs.

These are tested in the runtime tests under jscomp/test. The fact that we don't see a diff in the snapshotted JS means that they're working end-to-end. In fact, you'll see that the first commit (3e6f7d6) broke {j| .. |j} + [@@mel.inline] and produced such a diff.

Finally, what are the downsides of this approach vs the options we discussed in the original issue?

The downsides here are:

  • we call Utf8_string.Interp.transform in 2 places rather than 1:
    1. in the compiler core, for processing regular strings
    2. in the PPX, to transform [@@mel.inline]d strings into externals
  • the unicode string transformation is sort of a "compiler frontend" job that's being done in the core compiler (which is a bit against [melange.ppx]: move the entire PPX frontend to melange.ppx #583), but needs to run after all the external PPXes have been applied.

Other than that, I think this actually solves the problem quite cleanly and has the additional benefit of merging the utf8 string code back into one module in jscomp/common.

@@ -22,6 +22,7 @@ module Ppx_entry = struct

let rewrite_implementation (ast : Parsetree.structure) : Parsetree.structure =
Ast_config.iter_on_mel_config_stru ast;
let ast = Melange_ffi.Utf8_string.rewrite_structure ast in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this will have impact on compilation times? Is there a way to measure it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think so. What we were doing in the ppx before, we’re doing here now. The additional transformation is happening for Mel.inline which is a very small subset of cases

@anmonteiro anmonteiro merged commit 0735ef1 into main Dec 21, 2023
4 checks passed
@anmonteiro anmonteiro deleted the anmonteiro/move-utf8-transform branch December 21, 2023 19:37
anmonteiro added a commit that referenced this pull request Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants