Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #8769 -- "index out of bounds" when compiling packed module #8770

Merged
merged 4 commits into from Jun 28, 2019

Conversation

@lpw25
Copy link
Contributor

commented Jun 26, 2019

When compiling alias coercions we generated invalid Pfield primitives. This didn't matter normally because they were always just removed by Simplif. However, Simplif was not called when creating a -pack. This PR stops generating invalid Pfield primitives, and starts calling Simplif when creating a -pack.

Fixes #8769

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Jun 26, 2019

I'll add the failing code as a test tomorrow

@gasche

This comment has been minimized.

Copy link
Member

commented Jun 26, 2019

There is another suspicious usage of pos_cc_listin Translmod.transl_package, calling get_component g.(pos). Is this also unsafe?

(Maybe we should add to typing/TODO.md that positions in structures should become int option instead of carrying around negative values?)

@@ -91,7 +91,10 @@ let rec apply_coercion loc strict restr arg =
(fun _ -> apply_coercion loc Alias cc lam)

and apply_coercion_field loc get_field (pos, cc) =
apply_coercion loc Alias cc (get_field pos)
if pos >= 0 then
apply_coercion loc Alias cc (get_field pos)

This comment has been minimized.

Copy link
@nojb

nojb Jun 27, 2019

Contributor

Would it make sense to add assert (pos >= 0) in the definition of get_field?

This comment has been minimized.

Copy link
@lpw25

lpw25 Jun 27, 2019

Author Contributor

I've now moved the check itself into get_field which is slightly cleaner and more consistent with the other get_field function that appears in this file.

@gasche

This comment has been minimized.

Copy link
Member

commented Jun 27, 2019

Here is a reference to the original PR that revealed the bug: #1610.

I find this part of the codebase confusing to read, and maybe a few comments would help. Some questions below.

I understand that some module components are "erased" and others are "runtime". Only runtime components have a valid position. The pos field is -1 for erased components, and a consecutive sequence starting from 0 for runtime components.

One would thus expect the runtime representation of modules to be a block whose size is the number of runtime components. This is what the definitions of get_field suggest -- it gets values by indexing a block at the pos index. But this is not what apply_coercion does in the Tcoerce_structure case, where it returns a block whose size is the total number of components, ordered according to their positions in the pos_cc_field list. transl_structure also builds a block whose size is the size of pos_cc_list. What gives?

Is the intent that erased components still get a slot in the runtime value, but their value is just ()? In that case, in the definition of apply_coercion_field in your patch, why are you returning apply_coercion loc Alias cc lambda_unit in that case instead of just lambda_unit? (This seems definitely wrong in, say, the Tcoerce_structure case, and possibly an important difference in the Tcoerce_primitive case.)

@gasche

This comment has been minimized.

Copy link
Member

commented Jun 27, 2019

In the Tcoerce_structure case of compose_coercions, why is v2.(pos1) correct?

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Jun 27, 2019

The pos_cc_field list describes how to construct the runtime components of one module from the runtime components of another. The construction is done by fetching values out of the old module at given positions and then applying coercions to them.

So a pos_cc_field list like [(3, Tcoerce_none)] means build a single (runtime) component module by fetching the 3rd component out of the old module and using it directly without a coercion.

The issue comes from the coercions Tcoerce_primitive and Tcoerce_alias -- they coerce from a non-runtime component to a runtime component. That means that the position of their input is -1 which is the position given to non-runtime components.

Since #1610 the use of -1 as the position of non-runtime components has mostly disappeared. This use in pos_cc_field is probably the last one. A better representation for pos_cc_field would be one that did not require a position when the coercion was Tcoerce_primitive or Tcoerce_alias, and I would like to make a PR with that change -- but for trunk rather than 4.08/4.09 where I think this simple fix is more appropriate.

@lpw25 lpw25 force-pushed the lpw25:fix-8769 branch from 90edcbe to bc4196e Jun 27, 2019

@gasche

This comment has been minimized.

Copy link
Member

commented Jun 27, 2019

A better representation for pos_cc_field would be one that did not require a position when the coercion was Tcoerce_primitive or Tcoerce_alias, and I would like to make a PR with that change -- but for trunk rather than 4.08/4.09 where I think this simple fix is more appropriate.

Yes, I agree with the reasoning of having a cleaner refactor in trunk and a less-invasive fix in release branches.

Looking at the code more: instead of changing apply_coercion_field in the way you are doing, why don't you change the definition of get_field in apply_coercion to return lambda_unit on negative positions, just like the transl_structure version is doing?

With your version, I cannot convince myself that wrap_id_loc_list is correct on negative positions when called from apply_coercion, while it's immediate with mine.

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Jun 27, 2019

Looking at the code more: instead of changing apply_coercion_field in the way you are doing, why don't you change the definition of get_field in apply_coercion to return lambda_unit on negative positions, just like the transl_structure version is doing?

I had the same observation and pushed the same change just a few of minutes before your comment went up.

@gasche

gasche approved these changes Jun 27, 2019

Copy link
Member

left a comment

Good to go (but it should have a Changes entry).

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Jun 27, 2019

I've added 8769 as a regression test. For the changes entry, should I add a new section for 4.08.1 or should I just rebase this over onto the 4.09 branch? Basically, are we going to make a 4.08.1?

@gasche

This comment has been minimized.

Copy link
Member

commented Jun 27, 2019

4.08 already has a post-release Changes section, and I think it could go there. (I would copy this section in the Changes file of the 4.09 branch, placed before the 4.09 part.) I don't know whether we will have a 4.08.1. My guess is that we will find out depending on the bugs reported -- this one for example.

@lpw25 lpw25 force-pushed the lpw25:fix-8769 branch from b404d5f to 0aa8156 Jun 28, 2019

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2019

Ah, my PR was against an old version of the 4.08 branch that did not yet have the post-release Changes section. I've rebased it and added a Changes entry.

@gasche

This comment has been minimized.

Copy link
Member

commented Jun 28, 2019

I think it would be nice to also credit the bug reporter in the Changes entry.

(..., report by Fabian @copy)

@lpw25 lpw25 force-pushed the lpw25:fix-8769 branch from 0aa8156 to 2dbf6a7 Jun 28, 2019

@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2019

Good point. Done.

@gasche

This comment has been minimized.

Copy link
Member

commented Jun 28, 2019

I think you should feel free to merge when the CI is happy again. Can you take care of the cherry-picking as well? (And: thanks for the fix!)

@lpw25 lpw25 merged commit c4dc6ba into ocaml:4.08 Jun 28, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

lpw25 added a commit that referenced this pull request Jun 28, 2019

Fix #8769 (#8770)
* Don't generate illegal Pfield's when compiling alias coercions

* Simplify lambda code when compiling packs

* Add regression test for pr8769

* Add Changes entry
@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

CI is broken on all platforms:

File "/home/barsac/ci/builds/workspace/main/flambda/false/label/ocaml-linux-64/bytecomp/bytepackager.ml", line 198, characters 12-35:
198 |   let lam = Simplif.simplify_lambda target_name lam in
                  ^^^^^^^^^^^^^^^^^^^^^^^
Error: This function has type Lambda.lambda -> Lambda.lambda
       It is applied to too many arguments; maybe you forgot a `;'.
@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2019

Yeah, cherry-picking accident. Fix incoming.

lpw25 added a commit that referenced this pull request Jun 28, 2019

Fix #8769 (#8770)
* Don't generate illegal Pfield's when compiling alias coercions

* Simplify lambda code when compiling packs

* Add regression test for pr8769

* Add Changes entry
@lpw25

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2019

Ok, cherry-picked onto trunk as 11edca8 and onto 4.09 as 686756a and 18c0f50.

@gasche gasche changed the title Fix #8769 Fix #8769 -- "index out of bounds" when compiling packed module Jul 23, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.