Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #8769 -- "index out of bounds" when compiling packed module #8770

Merged
merged 4 commits into from Jun 28, 2019

Conversation

@lpw25
Copy link
Contributor

@lpw25 lpw25 commented Jun 26, 2019

When compiling alias coercions we generated invalid Pfield primitives. This didn't matter normally because they were always just removed by Simplif. However, Simplif was not called when creating a -pack. This PR stops generating invalid Pfield primitives, and starts calling Simplif when creating a -pack.

Fixes #8769

@lpw25
Copy link
Contributor Author

@lpw25 lpw25 commented Jun 26, 2019

I'll add the failing code as a test tomorrow

@gasche
Copy link
Member

@gasche gasche commented Jun 26, 2019

There is another suspicious usage of pos_cc_listin Translmod.transl_package, calling get_component g.(pos). Is this also unsafe?

(Maybe we should add to typing/TODO.md that positions in structures should become int option instead of carrying around negative values?)

@@ -91,7 +91,10 @@ let rec apply_coercion loc strict restr arg =
(fun _ -> apply_coercion loc Alias cc lam)

and apply_coercion_field loc get_field (pos, cc) =
apply_coercion loc Alias cc (get_field pos)
if pos >= 0 then
apply_coercion loc Alias cc (get_field pos)
Copy link
Contributor

@nojb nojb Jun 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add assert (pos >= 0) in the definition of get_field?

Copy link
Contributor Author

@lpw25 lpw25 Jun 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now moved the check itself into get_field which is slightly cleaner and more consistent with the other get_field function that appears in this file.

@gasche
Copy link
Member

@gasche gasche commented Jun 27, 2019

Here is a reference to the original PR that revealed the bug: #1610.

I find this part of the codebase confusing to read, and maybe a few comments would help. Some questions below.

I understand that some module components are "erased" and others are "runtime". Only runtime components have a valid position. The pos field is -1 for erased components, and a consecutive sequence starting from 0 for runtime components.

One would thus expect the runtime representation of modules to be a block whose size is the number of runtime components. This is what the definitions of get_field suggest -- it gets values by indexing a block at the pos index. But this is not what apply_coercion does in the Tcoerce_structure case, where it returns a block whose size is the total number of components, ordered according to their positions in the pos_cc_field list. transl_structure also builds a block whose size is the size of pos_cc_list. What gives?

Is the intent that erased components still get a slot in the runtime value, but their value is just ()? In that case, in the definition of apply_coercion_field in your patch, why are you returning apply_coercion loc Alias cc lambda_unit in that case instead of just lambda_unit? (This seems definitely wrong in, say, the Tcoerce_structure case, and possibly an important difference in the Tcoerce_primitive case.)

@gasche
Copy link
Member

@gasche gasche commented Jun 27, 2019

In the Tcoerce_structure case of compose_coercions, why is v2.(pos1) correct?

@lpw25
Copy link
Contributor Author

@lpw25 lpw25 commented Jun 27, 2019

The pos_cc_field list describes how to construct the runtime components of one module from the runtime components of another. The construction is done by fetching values out of the old module at given positions and then applying coercions to them.

So a pos_cc_field list like [(3, Tcoerce_none)] means build a single (runtime) component module by fetching the 3rd component out of the old module and using it directly without a coercion.

The issue comes from the coercions Tcoerce_primitive and Tcoerce_alias -- they coerce from a non-runtime component to a runtime component. That means that the position of their input is -1 which is the position given to non-runtime components.

Since #1610 the use of -1 as the position of non-runtime components has mostly disappeared. This use in pos_cc_field is probably the last one. A better representation for pos_cc_field would be one that did not require a position when the coercion was Tcoerce_primitive or Tcoerce_alias, and I would like to make a PR with that change -- but for trunk rather than 4.08/4.09 where I think this simple fix is more appropriate.

@gasche
Copy link
Member

@gasche gasche commented Jun 27, 2019

A better representation for pos_cc_field would be one that did not require a position when the coercion was Tcoerce_primitive or Tcoerce_alias, and I would like to make a PR with that change -- but for trunk rather than 4.08/4.09 where I think this simple fix is more appropriate.

Yes, I agree with the reasoning of having a cleaner refactor in trunk and a less-invasive fix in release branches.

Looking at the code more: instead of changing apply_coercion_field in the way you are doing, why don't you change the definition of get_field in apply_coercion to return lambda_unit on negative positions, just like the transl_structure version is doing?

With your version, I cannot convince myself that wrap_id_loc_list is correct on negative positions when called from apply_coercion, while it's immediate with mine.

@lpw25
Copy link
Contributor Author

@lpw25 lpw25 commented Jun 27, 2019

Looking at the code more: instead of changing apply_coercion_field in the way you are doing, why don't you change the definition of get_field in apply_coercion to return lambda_unit on negative positions, just like the transl_structure version is doing?

I had the same observation and pushed the same change just a few of minutes before your comment went up.

gasche
gasche approved these changes Jun 27, 2019
Copy link
Member

@gasche gasche left a comment

Good to go (but it should have a Changes entry).

@lpw25
Copy link
Contributor Author

@lpw25 lpw25 commented Jun 27, 2019

I've added #8769 as a regression test. For the changes entry, should I add a new section for 4.08.1 or should I just rebase this over onto the 4.09 branch? Basically, are we going to make a 4.08.1?

@gasche
Copy link
Member

@gasche gasche commented Jun 27, 2019

4.08 already has a post-release Changes section, and I think it could go there. (I would copy this section in the Changes file of the 4.09 branch, placed before the 4.09 part.) I don't know whether we will have a 4.08.1. My guess is that we will find out depending on the bugs reported -- this one for example.

@lpw25
Copy link
Contributor Author

@lpw25 lpw25 commented Jun 28, 2019

Ah, my PR was against an old version of the 4.08 branch that did not yet have the post-release Changes section. I've rebased it and added a Changes entry.

@gasche
Copy link
Member

@gasche gasche commented Jun 28, 2019

I think it would be nice to also credit the bug reporter in the Changes entry.

(..., report by Fabian @copy)

@lpw25
Copy link
Contributor Author

@lpw25 lpw25 commented Jun 28, 2019

Good point. Done.

@gasche
Copy link
Member

@gasche gasche commented Jun 28, 2019

I think you should feel free to merge when the CI is happy again. Can you take care of the cherry-picking as well? (And: thanks for the fix!)

@lpw25 lpw25 merged commit c4dc6ba into ocaml:4.08 Jun 28, 2019
2 checks passed
lpw25 added a commit that referenced this issue Jun 28, 2019
* Don't generate illegal Pfield's when compiling alias coercions

* Simplify lambda code when compiling packs

* Add regression test for pr8769

* Add Changes entry
@xavierleroy
Copy link
Contributor

@xavierleroy xavierleroy commented Jun 28, 2019

CI is broken on all platforms:

File "/home/barsac/ci/builds/workspace/main/flambda/false/label/ocaml-linux-64/bytecomp/bytepackager.ml", line 198, characters 12-35:
198 |   let lam = Simplif.simplify_lambda target_name lam in
                  ^^^^^^^^^^^^^^^^^^^^^^^
Error: This function has type Lambda.lambda -> Lambda.lambda
       It is applied to too many arguments; maybe you forgot a `;'.

@lpw25
Copy link
Contributor Author

@lpw25 lpw25 commented Jun 28, 2019

Yeah, cherry-picking accident. Fix incoming.

lpw25 added a commit that referenced this issue Jun 28, 2019
* Don't generate illegal Pfield's when compiling alias coercions

* Simplify lambda code when compiling packs

* Add regression test for pr8769

* Add Changes entry
@lpw25
Copy link
Contributor Author

@lpw25 lpw25 commented Jun 28, 2019

Ok, cherry-picked onto trunk as 11edca8 and onto 4.09 as 686756a and 18c0f50.

@gasche gasche changed the title Fix #8769 Fix #8769 -- "index out of bounds" when compiling packed module Jul 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants