Skip to content
Permalink
Browse files

add TouchInst for buffers that don't need to be initialized (#4000)

Summary:
Per #3903 when IRGening ConcatNode we insert a `SplatInst` so the output buffer is initialized. In that case since we overwrite the entire buffer we don't need that splat and it's wasted work. This PR solves the problem by doing nothing, or rather by adding a new instruction `TouchInst` which does nothing except mark its output as initialized. I've only updated the case of Concat, but there may be other situations we use `splat` as an initializer for memory that is overwritten.

Documentation: N/A
Pull Request resolved: #4000

Test Plan:
Ran tests in debug and release. This code is covered by the Concat operator tests for backends that use IR - Interpreter, CPU and OpenCL (Habana and NNPI use Nodes instead).

Fixes #3903.

Differential Revision: D19376756

Pulled By: nickgg

fbshipit-source-id: 2eccc64f3501090c5919b8162502586877eaf716
  • Loading branch information
nickgg authored and facebook-github-bot committed Jan 14, 2020
1 parent 87f00a3 commit bc8a675d06d3abed925365670555137cc0b39779
@@ -1392,6 +1392,10 @@ void BoundInterpreterFunction::fwdSplatInst(const glow::SplatInst *I) {
llvm_unreachable("Unsupported tensor type");
}

void BoundInterpreterFunction::fwdTouchInst(const glow::TouchInst *) {
// Do nothing for a TouchInst
}

void BoundInterpreterFunction::fwdInsertTensorInst(
const glow::InsertTensorInst *I) {
Tensor *outT = getTensor(I->getDest());
@@ -736,7 +736,7 @@ Error OpenCLFunction::execute(ExecutionContext *context) {
for (const auto &I : F_->getInstrs()) {
// Skip memory allocation instructions as they are NOPs.
if (isa<AllocActivationInst>(I) || isa<DeallocActivationInst>(I) ||
isa<TensorViewInst>(I)) {
isa<TensorViewInst>(I) || isa<TouchInst>(I)) {
continue;
}
// The kernels are named after the name of the instruction, plus the "W"
@@ -269,7 +269,9 @@ void IRGenVisitor::post(Node *parent, Node *N) {

auto *dest = builder_.createAllocActivationInst(CC->getName(),
CC->getResult().getType());
builder_.createSplatInst(CC->getName(), dest, 0);
// Mark the buffer as initialized, this is safe since the InsertTensors
// below will fully overwrite the buffer.
builder_.createTouchInst(CC->getName(), dest);
auto inputs = CC->getInputs();

// We start inserting to the shape at (0,0, ... ).
@@ -972,6 +972,10 @@ void LLVMIRGen::generateLLVMIRForDataParallelInstr(
ARITHMETIC_UNARY_OP_WITH_IMM_CASE(Splat, "splat", Value);
#undef ARITHMETIC_UNARY_OP_WITH_IMM_CASE

case Kinded::Kind::TouchInstKind:
// do nothing;
break;

case Kinded::Kind::ElementSelectInstKind: {
auto *ES = cast<ElementSelectInst>(I);
auto *dest = ES->getDest();
@@ -621,6 +621,11 @@ int main(int argc, char **argv) {
.autoVerify(VerifyKind::NoVerify)
.autoIRGen();

BB.newInstr("Touch")
.addOperand("Dest", OperandKind::Out)
.dataParallel()
.autoVerify(VerifyKind::NoVerify);

BB.newInstr("InsertTensor")
.addOperand("Dest", OperandKind::InOut)
.addOperand("Src", OperandKind::In)

0 comments on commit bc8a675

Please sign in to comment.
You can’t perform that action at this time.