Skip to content

Conversation

@rxwei
Copy link
Contributor

@rxwei rxwei commented Nov 14, 2019

Instruction visitors in PullbackEmitter should not consume adjoint values preemptively, becuase adjoint values are managed "globally" in a basic block by the block temporary mechanism. The crasher in TF-962 was caused by the previously unexercised logic in PullbackEmitter::visitTupleInst, where a destructure_tuple instruction is emitted with an adjoint value being its operand. This causes an over-consume.

This patch fixes this by creating a copy of the adjoint value before destructuring it, and recording all destructured elements as block temporaries.

TODO: The differentiation transform rarely visits a tuple instruction. More tests should be added, for example, cases where the tuple type's tangent type is not a tuple (([Float], Int)).

Resolves TF-962.

Instruction visitors in `PullbackEmitter` should not consume adjoint values preemptively, becuase adjoint values are managed "globally" in a basic block by the block temporary mechanism. The crasher in TF-962 was caused by the previously unexercised logic in `PullbackEmitter::visitTupleInst`, where a `destructure_tuple` instruction is emitted with an adjoint value being its operand. This casues an over-consume.

This patch fixes this by creating a copy of the adjoint value before destructuring it, and recording all destructured elements as block temporaries.

TODO: The differentiation transform rarely visits a `tuple` instruction. More tests should be added, for example, cases where the tuple type's tangent type is not a tuple (`([Float], Int)`).

Resolves TF-962.
@rxwei rxwei added the tensorflow This is for "tensorflow" branch PRs. label Nov 14, 2019
@rxwei rxwei requested review from dan-zheng and marcrasi November 14, 2019 09:56
@dan-zheng dan-zheng requested a review from bgogul November 14, 2019 10:02
Copy link
Contributor

@dan-zheng dan-zheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Instruction visitors in PullbackEmitter should not consume adjoint values preemptively, becuase adjoint values are managed "globally" in a basic block by the block temporary mechanism.

I'll keep this principle in mind!

@dan-zheng
Copy link
Contributor

@swift-ci Please test tensorflow

@dan-zheng
Copy link
Contributor

TODO: The differentiation transform rarely visits a tuple instruction. More tests should be added, for example, cases where the tuple type's tangent type is not a tuple (([Float], Int)).

Thanks for pointing this out!
Filed TF-964 to track crash for active tuple instruction with non-tuple tangent type.
The fix is simple but has some complications with other pending patches (#28259 (comment)).

Copy link
Contributor

@dan-zheng dan-zheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rxwei!

@dan-zheng dan-zheng merged commit 79160af into swiftlang:tensorflow Nov 15, 2019
bgogul pushed a commit that referenced this pull request Nov 19, 2019
…#28257)

Instruction visitors in `PullbackEmitter` should not consume adjoint values preemptively, because adjoint values are managed "globally" in a basic block by the block temporary mechanism. The crasher in TF-962 was caused by the previously unexercised logic in `PullbackEmitter::visitTupleInst`, where a `destructure_tuple` instruction is emitted with an adjoint value being its operand. This causes an over-consume.

This patch fixes this by creating a copy of the adjoint value before destructuring it, and recording all destructured elements as block temporaries.

TODO: The differentiation transform rarely visits `tuple` instructions. More tests should be added, for example, cases where the tuple type's tangent type is not a tuple (`([Float], Int)`).

Resolves TF-962.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tensorflow This is for "tensorflow" branch PRs.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants