[AutoDiff] Fix over-consume when differentiating tuple instruction.
#28257
Instruction visitors in `PullbackEmitter` should not consume adjoint values preemptively, because adjoint values are managed "globally" within a basic block by the block temporary mechanism.

The crasher in TF-962 was caused by previously unexercised logic in `PullbackEmitter::visitTupleInst`, which emitted a `destructure_tuple` instruction with an adjoint value as its operand. Because `destructure_tuple` consumes its operand and the block temporary mechanism later cleans up the same adjoint value, the value was over-consumed.

This patch fixes the problem by creating a copy of the adjoint value before destructuring it, and by recording all destructured elements as block temporaries.

TODO: The differentiation transform rarely visits a `tuple` instruction. More tests should be added, for example, cases where the tuple type's tangent type is not a tuple (`([Float], Int)`).

Resolves TF-962.
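The ownership problem described above can be illustrated with a small, self-contained C++ toy model. This is only a sketch under stated assumptions, not the compiler's actual code: `Value`, `Block`, `copyValue`, `destructure`, and the two `...Visit...` functions are hypothetical names invented for this illustration. They stand in, respectively, for an owned value, the block temporary mechanism, a copy operation, the consuming `destructure_tuple` instruction, and the visitor's behavior before and after the patch.

```cpp
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an owned value: it may be consumed exactly once.
struct Value {
  int numElements;       // toy notion of how many tuple elements it holds
  bool consumed = false;
  explicit Value(int n) : numElements(n) {}
  void consume() {
    if (consumed) throw std::logic_error("over-consume");
    consumed = true;
  }
};
using ValueRef = std::shared_ptr<Value>;

// Hypothetical stand-in for the block temporary mechanism: every recorded
// value is consumed exactly once when the block is finished.
struct Block {
  std::vector<ValueRef> temporaries;
  ValueRef record(ValueRef v) {
    temporaries.push_back(v);
    return v;
  }
  void finish() {
    for (auto &v : temporaries) v->consume();
  }
};

// Produces a fresh owned value; the original is left untouched.
ValueRef copyValue(const ValueRef &v) {
  return std::make_shared<Value>(v->numElements);
}

// Models a consuming destructure: the operand is consumed and one new owned
// value is produced per element.
std::vector<ValueRef> destructure(const ValueRef &tuple) {
  tuple->consume();
  std::vector<ValueRef> elements;
  for (int i = 0; i < tuple->numElements; ++i)
    elements.push_back(std::make_shared<Value>(1));
  return elements;
}

// Pre-patch shape: the adjoint itself is destructured, so the block cleanup
// consumes it a second time. Returns true if the over-consume is detected.
bool buggyVisitOverConsumes() {
  Block bb;
  ValueRef adjoint = bb.record(std::make_shared<Value>(2));
  destructure(adjoint);  // consumes the adjoint preemptively
  try {
    bb.finish();         // consumes the adjoint again
  } catch (const std::logic_error &) {
    return true;
  }
  return false;
}

// Post-patch shape: a copy is destructured, and every element is recorded as
// a block temporary. Returns true if each recorded value is consumed once.
bool fixedVisitIsBalanced() {
  Block bb;
  ValueRef adjoint = bb.record(std::make_shared<Value>(2));
  for (auto &element : destructure(copyValue(adjoint)))
    bb.record(element);
  bb.finish();  // throws if anything were consumed twice
  for (auto &v : bb.temporaries)
    if (!v->consumed) return false;
  return true;
}
```

In this model, `buggyVisitOverConsumes()` reports the double consume, while `fixedVisitIsBalanced()` completes with the adjoint, its copy, and every destructured element each consumed exactly once. The real fix operates on SIL values inside the compiler; the toy only mirrors the accounting.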
dan-zheng
left a comment
Great!
> Instruction visitors in `PullbackEmitter` should not consume adjoint values preemptively, because adjoint values are managed "globally" in a basic block by the block temporary mechanism.
I'll keep this principle in mind!
@swift-ci Please test tensorflow
Thanks for pointing this out!
dan-zheng
left a comment
Thanks @rxwei!
…#28257) Instruction visitors in `PullbackEmitter` should not consume adjoint values preemptively, because adjoint values are managed "globally" in a basic block by the block temporary mechanism. The crasher in TF-962 was caused by the previously unexercised logic in `PullbackEmitter::visitTupleInst`, where a `destructure_tuple` instruction is emitted with an adjoint value being its operand. This causes an over-consume. This patch fixes this by creating a copy of the adjoint value before destructuring it, and recording all destructured elements as block temporaries. TODO: The differentiation transform rarely visits `tuple` instructions. More tests should be added, for example, cases where the tuple type's tangent type is not a tuple (`([Float], Int)`). Resolves TF-962.