Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compilation of nested packs #12609

Merged
merged 3 commits into from
Nov 21, 2023
Merged

Fix compilation of nested packs #12609

merged 3 commits into from
Nov 21, 2023

Conversation

lthls
Copy link
Contributor

@lthls lthls commented Sep 26, 2023

Follow-up from #1391.
That PR was meant to prevent accessing a packed module outside its pack, but it failed to properly account for nested packs.

Copy link
Contributor

@chambart chambart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@chambart
Copy link
Contributor

chambart commented Sep 26, 2023

This probably warrant a bug fix release (sorry)

@Octachron
Copy link
Member

Indeed, I will cut one once a plan has been settled for the zstd issue.

@gasche
Copy link
Member

gasche commented Sep 26, 2023

(cross-referencing #12581 which reported the issue at hand here.)

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the implementation too verbose -- see inline comment.

As someone who hasn't followed the issue closely, I would like to read a bit more details about what is going on. Could someone explain, as if I had completely forgotten what packs are, what is the compilation model of packs, why we get errors when accessing things from inside the pack, and why "nested packs" are in fact okay?

(Does the compiler know about "nested packs" or is this an emergent property? When we pass -for-pack Foo.Bar, does the compiler somehow know (before this PR) that the . is a separator?)

if p2.[l1] = '.' then ()
else error ()
end
else error ());
Copy link
Member

@gasche gasche Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too much code. Could you:

  1. Move this to an auxiliary function.
  2. Write a comment to explain what the criterion is.
  3. Write a simpler implementation that is close to the comment?

For example:

String.equal p1 p2 || String.starts_with ~prefix:(p1 ^ ".") p2

@lthls
Copy link
Contributor Author

lthls commented Sep 28, 2023

The pack model is to take a bunch of compilation units, and to put them together to create a single compilation unit with a sub-module for each input unit.
In bytecode, in addition to creating the new unit this replaces the Psetglobal and Pgetglobal primitives in the input units by prefixed version (packing a.cmo and b.cmo into p.cmo will end up with three Psetglobal primitives, for P.A, P.B and P). All of this can be done at packing time, and compiling the units a.cmo and b.cmo with -for-pack P is not strictly necessary. It is also possible to link both a.cmo and p.cmo in the same executable, as they won't conflict. Linking the original a.cmo is not a good idea, as you will end up with two copies of the same module, but linking another unrelated unit named a.cmo is supported (one might say it's the whole point).

In native mode, we don't want to patch object files, whose format is mostly out of our control. So to be able to link in the same executable both a.cmx, packed in p.cmx, and another a.cmx that is not packed, we need to make sure they define different symbols for their toplevel modules. That's where -for-pack comes into play: a.cmx would normally define a symbol named camlA, but if it was compiled with -for_pack P then it will instead define camlP.A (or camlP__A in older versions). The actual packing operation becomes simply concatenating the object files (plus a simple new object for the pack itself), and creating a .cmx file that subsumes the individual ones.

The difference with bytecode is that it becomes impossible to link the original a.cmx with p.cmx, as it would introduce duplicate symbols. Flambda also detects this early when a module outside the pack depends on both the pack and one of the individual modules, as it will read two .cmx files defining the same symbol (this is the error reported at #7645).

Nested packs are not special. You can create one using -pack Inner -for-pack Outer, and the modules in the inner pack need to be compiled with -for-pack Outer.Inner. Both the bytecode and native scheme support arbitrary nesting of packs.
As to what is allowed to refer to what, the rule is simple: you can refer to a module that is going to be packed only if the current module will ultimately end up in the pack of the imported unit. This implies that the pack has not been created yet, which ensures that there are no conflicts. This also ensures that a build system removing/hiding individual .cmx (or .cmo) files after packing is not going to break anything.
This prevents weird cases where you compile a.cmx, to be packed in p.cmx, but before creating the pack you compile c.cmx, which uses A directly (not through a pack). This case would have worked before (unless your build system starts making the pack too early), but is now considered an error. Note that nothing in p.cmx could have depended on c.cmx, as otherwise there wouldn't be any valid linking order.

The rule was implemented too strictly in #1391, as Outer.Inner.X will end up in Outer.Inner which will end up in the same pack as Outer.Y.

(Does the compiler know about "nested packs" or is this an emergent property? When we pass -for-pack Foo.Bar, does the compiler somehow know (before this PR) that the . is a separator?)

Yes and no. In bytecode, this doesn't matter, as discussed elsewhere. In native code, before #11430 there was specific code to translate . in pack names to the module separator __, but this is now unnecessary.
Apparently pack names are never validated, so you can create packs with arbitrary strings in their names. You will of course not be able to refer to them if they're not valid module names.
In native code only (and since before #1391), when creating a pack the compiler checks that all packed units were compiled with the expected -for-pack option, and that also happens to enforce . as the separator for nested packs, but that's all.

@gasche
Copy link
Member

gasche commented Oct 11, 2023

I am abstractly convinced that this makes sense, but I have not made the actual work of thinking deeply about this and convincing myself in the details that this makes sense. However, I don't see myself having the time do this before at least October 20th, and I would rather not block the process. I propose to work from @chambart's approval and @lthls masterful explanation above, and merge this.

(Someone needs to fix the Changes conflict, and I think maybe Vincent is in holidays now? @Octachron, would you maybe rebase the Changes and merge? We also need to backport this in 5.1, so you want to look at it anyway.)

@Octachron
Copy link
Member

Changes file refreshed, and I have tested that camlp4 works without structural changes with this patch. Once the CI is green, I will merge and cherry-pick to the 5.1 branch.

@Octachron Octachron merged commit ecf19e0 into ocaml:trunk Nov 21, 2023
9 checks passed
Octachron pushed a commit that referenced this pull request Nov 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants