New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly perform side effects for "/" and "mod" (PR#7533) #1173

Merged
merged 2 commits into from May 15, 2017

Conversation

Projects
None yet
4 participants
@mshinwell
Contributor

mshinwell commented May 15, 2017

This is a first attempt at a patch for PR#7533, where side effects in dividends occurring in "/" or "mod" expressions may be erroneously deleted.

mshinwell added some commits May 15, 2017

@mshinwell mshinwell added the bug label May 15, 2017

@mshinwell mshinwell added this to the 4.05.0 milestone May 15, 2017

@stedolan

This comment has been minimized.

Show comment
Hide comment
@stedolan

stedolan May 15, 2017

Contributor

Looks good.

Contributor

stedolan commented May 15, 2017

Looks good.

@lpw25

This comment has been minimized.

Show comment
Hide comment
@lpw25

lpw25 May 15, 2017

Contributor

Looks right to me

Contributor

lpw25 commented May 15, 2017

Looks right to me

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche May 15, 2017

Member

There is a difference between the two bind calls on the dividend and the divisor: for the divisor, we bind because the value may be used more than once, so not-binding could duplicate computation (and effects). For the dividend in this patch, we bind because the value may not be used, but we want to ensure its effects are not deleted. This means that for this bind it would perfectly correct (and generate better code) to do a substitution directly if the value is pure. Is this substitution performed later? If not, would it make sense to have separate bind_several_uses and bind_atmost_one_use variants to indicate intent, and have the second just pass pure values to its function parameter?

P.S.: I just looked at cmmgen, it seems that bind_load is going in this direction, but a variant with more complete behavior and clearer name could still be nice.

Member

gasche commented May 15, 2017

There is a difference between the two bind calls on the dividend and the divisor: for the divisor, we bind because the value may be used more than once, so not-binding could duplicate computation (and effects). For the dividend in this patch, we bind because the value may not be used, but we want to ensure its effects are not deleted. This means that for this bind it would perfectly correct (and generate better code) to do a substitution directly if the value is pure. Is this substitution performed later? If not, would it make sense to have separate bind_several_uses and bind_atmost_one_use variants to indicate intent, and have the second just pass pure values to its function parameter?

P.S.: I just looked at cmmgen, it seems that bind_load is going in this direction, but a variant with more complete behavior and clearer name could still be nice.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell May 15, 2017

Contributor

I think it's worth considering improvements in this area, but we should do the most straightforward fix first, especially given the history of bugs in this area.

@lpw25 and I were discussing trying to force these terms into a normal form (as Flambda does) so that this sort of problem doesn't arise. One way might be to partition side-effecting and non-side-effecting terms; then I think the kind of separation of "bind" functions that you propose would fall out naturally.

Contributor

mshinwell commented May 15, 2017

I think it's worth considering improvements in this area, but we should do the most straightforward fix first, especially given the history of bugs in this area.

@lpw25 and I were discussing trying to force these terms into a normal form (as Flambda does) so that this sort of problem doesn't arise. One way might be to partition side-effecting and non-side-effecting terms; then I think the kind of separation of "bind" functions that you propose would fall out naturally.

@mshinwell

This comment has been minimized.

Show comment
Hide comment
@mshinwell

mshinwell May 15, 2017

Contributor

(By "force" I mean change the types of expressions so that side effecting operations are forced to be let-bound.)

Contributor

mshinwell commented May 15, 2017

(By "force" I mean change the types of expressions so that side effecting operations are forced to be let-bound.)

@gasche gasche merged commit fd5c5c8 into ocaml:4.05 May 15, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche May 15, 2017

Member

Let's merge this, then.

Member

gasche commented May 15, 2017

Let's merge this, then.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche May 15, 2017

Member

"mantis user jmi" is Jan Midtgaard, I'll add better attribution to the Changelog eventually.

Member

gasche commented May 15, 2017

"mantis user jmi" is Jan Midtgaard, I'll add better attribution to the Changelog eventually.

gasche added a commit that referenced this pull request May 15, 2017

Merge pull request #1173 from mshinwell/pr7533-4.05
Correctly perform side effects for "/" and "mod" (PR#7533)
@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche May 15, 2017

Member

I cherry-picked in trunk as e55ac0a.

Member

gasche commented May 15, 2017

I cherry-picked in trunk as e55ac0a.

@gasche

This comment has been minimized.

Show comment
Hide comment
@gasche

gasche May 15, 2017

Member

I had a further look at cmmgen. There are some opportunities for doing the refinement I suggested in div_int, mod_int, safe_divmod_bi, lookup_{tag,label}, bigarray_{get,set}, and in the Cached case of the Usend translation.

There is code to analyse purity of expressions in two parts of the backend:

  • there is a nice effect-analysis of code for Clambda expressions in bytecomp/semantics_of_primitives
  • there is effect-and-coeffect analysis code for Cmm in the code selection pass (asmcomp/selectgen and asmcomp//selection).

I have the impression that the effect analysis loses precision as the program representation becomes more low-level: Clambda primitives that are known to be pure (or "generative", so erasable) become translated in ways that are harder to prove pure (instruction selection has architecture-specific parts to reason about which primitives get turned into assembly instead of a C call on which precision is lost). This suggests that, for the purpose of this question (can the evaluation of an unused expression be erased without changing the program semantics?), relying on Clambda-level information is better than relying on Cmm-level information (which is hard to flow from Selection to Cmmgen anyway).

I think that if we wanted to do this, one could consider changing the transl_* functions of Cmmgen to produce not a Cmm.expression but a pair Cmm.expression * bool indicating whether the original Clambda expression was pure (in the sense of Closure.no_effects). Helper functions such as div_int could access this information.

The main question, of course, is whether this would actually matter on real-world code.

Just for fun, I implemented the Selection.effects_of approach ( gasche@e351278 ), but I don't really know how I would go about assessing the impact of the change. On one artificial example I just tried (let test x y = x / y), the code produced is different (the untagging of x after after the division-by-zero test rather than before it), but of the exact same size and I wouldn't know about performance differences.

Member

gasche commented May 15, 2017

I had a further look at cmmgen. There are some opportunities for doing the refinement I suggested in div_int, mod_int, safe_divmod_bi, lookup_{tag,label}, bigarray_{get,set}, and in the Cached case of the Usend translation.

There is code to analyse purity of expressions in two parts of the backend:

  • there is a nice effect-analysis of code for Clambda expressions in bytecomp/semantics_of_primitives
  • there is effect-and-coeffect analysis code for Cmm in the code selection pass (asmcomp/selectgen and asmcomp//selection).

I have the impression that the effect analysis loses precision as the program representation becomes more low-level: Clambda primitives that are known to be pure (or "generative", so erasable) become translated in ways that are harder to prove pure (instruction selection has architecture-specific parts to reason about which primitives get turned into assembly instead of a C call on which precision is lost). This suggests that, for the purpose of this question (can the evaluation of an unused expression be erased without changing the program semantics?), relying on Clambda-level information is better than relying on Cmm-level information (which is hard to flow from Selection to Cmmgen anyway).

I think that if we wanted to do this, one could consider changing the transl_* functions of Cmmgen to produce not a Cmm.expression but a pair Cmm.expression * bool indicating whether the original Clambda expression was pure (in the sense of Closure.no_effects). Helper functions such as div_int could access this information.

The main question, of course, is whether this would actually matter on real-world code.

Just for fun, I implemented the Selection.effects_of approach ( gasche@e351278 ), but I don't really know how I would go about assessing the impact of the change. On one artificial example I just tried (let test x y = x / y), the code produced is different (the untagging of x after after the division-by-zero test rather than before it), but of the exact same size and I wouldn't know about performance differences.

gasche added a commit to gasche/ocaml that referenced this pull request May 16, 2017

bind_affine: a hacky attempt at reducing the number of Clet generated…
… by cmmgen

(see #1173)

Using Selection from Cmmgen is not very nice -- and currently the code
would be broken for any arch that is not amd64, although that could be
easily fixed. It would be better to rely on the result of
Clambda-level effect analysis, and propagate them around within
Cmmgen.

camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017

Merge pull request #1173 from mshinwell/pr7533-4.05
Correctly perform side effects for "/" and "mod" (PR#7533)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment