Conversation

@dan-garvey
Collaborator

@dan-garvey dan-garvey commented Nov 2, 2021

Probably should have just kept dropout separate. One issue with dropout is that it has an argument called p, the probability that an element is zeroed. This name conflicts with the asm printer's p. I'm not sure what the best way to avoid this is, and am open to suggestions. Currently, to get things working, I just edited the generated cpp file, but that obviously isn't upstreamable.
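For reference, a minimal illustration of the op in question (not part of the change): aten::dropout takes the input tensor, the drop probability p, and a training flag, and p is the name that collides with the printer variable.

import torch

x = torch.rand(3, 4)
# aten::dropout(Tensor input, float p, bool train) -> Tensor
y = torch.ops.aten.dropout(x, 0.5, True)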

I've tried so far to model slice in a way that easily extends to other slice-like ops. My plan is to continue to generalize things out of the if (auto SliceTensor = dyn_cast<AtenSliceTensorOp>(op)) statement.

There is currently an issue with AtenSliceTensor when there is a start and end for each dimension. For a reason I have yet to determine, the final dimension always maintains the shape of the original tensor instead of being updated to the slice size. In the IR, it actually correctly determines the final dim, but somehow loses track of the initial dim:

I'm not too familiar with how the canonicalizer works, so I'm looking into that.

(attached IR screenshot: slice_mlir)

@ramiro050
Collaborator

A possible way to deal with the naming conflict could be to rename the argument name in the beginning of

def raw_emit_op(operator: JitOperator, f: TextIO, *, traits: List[str],

right before the ODS is generated.

For example, you could do

def raw_emit_op(operator: JitOperator, f: TextIO, *, traits: List[str],
                has_folder: bool, has_canonicalizer: bool):
    for arg in operator.arguments:
        if arg['name'] == 'p':
            arg['name'] = 'p0'

    # rest of function

@dan-garvey
Collaborator Author

A possible way to deal with the naming conflict could be to rename the argument name in the beginning of

def raw_emit_op(operator: JitOperator, f: TextIO, *, traits: List[str],

right before the ODS is generated.

For example, you could do

def raw_emit_op(operator: JitOperator, f: TextIO, *, traits: List[str],
                has_folder: bool, has_canonicalizer: bool):
    for arg in operator.arguments:
        if arg['name'] == 'p':
            arg['name'] = 'p0'

    # rest of function

Yeah, I considered this also, but I don't know if continuing to add exceptional cases there scales very prettily. Maybe it's fine for now?

@ramiro050
Collaborator

Yeah, I considered this also, but I don't know if continuing to add exceptional cases there scales very prettily. Maybe it's fine for now?

Hm, yeah. To make it scale, we can just have a global list of reserved keywords, and then have something like

for value in itertools.chain(operator.arguments, operator.returns):
    if value['name'] in RESERVED_KEYWORDS:
        value['name'] += '0'

Then, every time a reserved keyword becomes a problem, we just add it to RESERVED_KEYWORDS.
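A minimal sketch of how this could look alongside raw_emit_op (the set name and exact placement are assumptions):

import itertools

# Hypothetical module-level set of argument names that collide with
# identifiers in the generated code (e.g. the asm printer's `p`).
RESERVED_KEYWORDS = {"p"}

def sanitize_reserved_names(operator):
    # Rename any argument or return value whose name is reserved,
    # e.g. "p" -> "p0", before the ODS for the op is emitted.
    for value in itertools.chain(operator.arguments, operator.returns):
        if value["name"] in RESERVED_KEYWORDS:
            value["name"] += "0"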

@silvasean
Contributor

Created LLVM bug for this: https://bugs.llvm.org/show_bug.cgi?id=52387

Pending that, Ramiro's suggestion sounds good.

@jpienaar
Member

jpienaar commented Nov 3, 2021

Yeah, I considered this also, but I don't know if continuing to add exceptional cases there scales very prettily. Maybe it's fine for now?

Hm, yeah. To make it scale, we can just have a global list of reserved keywords, and then have something like

for value in itertools.chain(operator.arguments, operator.returns):
    if value['name'] in RESERVED_KEYWORDS:
        value['name'] += '0'

Then, every time a reserved keyword becomes a problem, we just add it to RESERVED_KEYWORDS.

This is similar to what we do in the Haskell bindings. Having such a mechanism is useful. Uniformly prefixing is another option; then it is your own "namespace".

@ramiro050
Collaborator

ramiro050 commented Nov 3, 2021

The issue with the AtenSliceTensor is very tricky! What is happening during the canonicalization pass is that the last slice is folded into a cast, because the last slice (depending on your inputs) will have a form similar to

%13 = tensor.insert_slice %12 into %10[0, 0, 0] [1, 1, 2] [1, 1, 1] : tensor<?x1x2xf32> into tensor<1x1x2xf32>

Since the offset is zero for all dims, the result shape is statically known, and the stride is 1, this is the same as a cast, so the folding makes sense.

The problem at the moment is that there are subsequent casts that erase the folded cast knowledge, as can be seen from the IR before the folding:

func @forward(%arg0: !torch.vtensor<[?,?,?],f32>) -> !torch.vtensor<[?,?,?],f32> {
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  %0 = torch_c.to_builtin_tensor %arg0 : !torch.vtensor<[?,?,?],f32> -> tensor<?x?x?xf32>
  %1 = tensor.dim %0, %c1 : tensor<?x?x?xf32>
  %2 = tensor.dim %0, %c2 : tensor<?x?x?xf32>
  %3 = linalg.init_tensor [1, %1, %2] : tensor<1x?x?xf32>
  %4 = tensor.cast %0 : tensor<?x?x?xf32> to tensor<1x?x?xf32>
  %5 = tensor.insert_slice %4 into %3[0, 0, 0] [1, %1, %2] [1, 1, 1] : tensor<1x?x?xf32> into tensor<1x?x?xf32>
  %6 = tensor.dim %5, %c2 : tensor<1x?x?xf32>
  %7 = linalg.init_tensor [1, 1, %6] : tensor<1x1x?xf32>
  %8 = tensor.cast %5 : tensor<1x?x?xf32> to tensor<1x1x?xf32>
  %9 = tensor.insert_slice %8 into %7[0, 0, 0] [1, 1, %6] [1, 1, 1] : tensor<1x1x?xf32> into tensor<1x1x?xf32>
  %10 = linalg.init_tensor [1, 1, 2] : tensor<1x1x2xf32>
  %11 = tensor.cast %10 : tensor<1x1x2xf32> to tensor<?x1x2xf32>
  %12 = tensor.cast %9 : tensor<1x1x?xf32> to tensor<?x1x2xf32>
  %13 = tensor.insert_slice %12 into %10[0, 0, 0] [1, 1, 2] [1, 1, 1] : tensor<?x1x2xf32> into tensor<1x1x2xf32>
  %14 = tensor.cast %13 : tensor<1x1x2xf32> to tensor<?x1x2xf32>
  %15 = tensor.cast %14 : tensor<?x1x2xf32> to tensor<?x?x?xf32>
  %16 = torch_c.from_builtin_tensor %15 : tensor<?x?x?xf32> -> !torch.vtensor<[?,?,?],f32>
  return %16 : !torch.vtensor<[?,?,?],f32>
}

This is happening because RefineTypes is not handling this op correctly. At the moment, all RefineTypes does is assign the result shape to unknowns:

} else if (auto sliceTensor = dyn_cast<AtenSliceTensorOp>(op)) {
  // Select several elements from the target dim according to the start,
  // end, step. All the other dims are the same as input.
  auto setDim = [](int64_t &targetDim, int64_t dim,
                   ArrayRef<LatticeElement<ValueKnowledge> *> operands) {
    targetDim = kUnknownSize;

However, if the slicer is able to determine the final shape statically, RefineTypes should also be able to do this. Once this is fixed, the result tensor should be of the correct size. However, you might run into the cast problem I stumbled upon in #376 (comment) when performing e2e testing.
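As a rough illustration, the per-dimension output size such a static refinement could compute looks something like this (a sketch assuming constant, non-negative start/end/step and simplified clamping; not the actual RefineTypes code):

def slice_dim_size(dim_size: int, start: int, end: int, step: int) -> int:
    # Clamp start/end into [0, dim_size], then take ceil((end - start) / step).
    start = min(max(start, 0), dim_size)
    end = min(max(end, 0), dim_size)
    return max(0, -((start - end) // step))

# e.g. slice_dim_size(7, 1, 3, 1) == 2, matching the static 2 in the IR above.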

Edit: The reason the folding into a cast is valid is the additional condition that the input tensor to the slice

%13 = tensor.insert_slice %12 into %10[0, 0, 0] [1, 1, 2] [1, 1, 1] : tensor<?x1x2xf32> into tensor<1x1x2xf32>

is compatible with the cast.

@cathyzhyi
Contributor

The issue with the AtenSliceTensor is very tricky! What is happening during the canonicalization pass is that the last slice is folded into a cast [...]

@ramiro050 I think I fail to understand the issue. Generally speaking, RefineTypes is allowed to give less accurate info, and we rely on TorchToLinalg to get more accurate info. This is the reason we often have a cast op at the end of each TorchToLinalg conversion, to up-cast the more accurate type info to the type decided by RefineTypes.

I guess I didn't get what's going wrong. The IR looks all right to me.

  %13 = tensor.insert_slice %12 into %10[0, 0, 0] [1, 1, 2] [1, 1, 1] : tensor<?x1x2xf32> into tensor<1x1x2xf32>
  %14 = tensor.cast %13 : tensor<1x1x2xf32> to tensor<?x1x2xf32>
  %15 = tensor.cast %14 : tensor<?x1x2xf32> to tensor<?x?x?xf32>

@ramiro050
Collaborator

@cathyzhyi I may be misunderstanding how the types work. Every cast I've seen done at the end of a TorchToLinalg conversion gets the result type using something of the form

RankedTensorType resultType =
    typeConverter->convertType(op->getResult(0).getType()).cast<RankedTensorType>();

This uses information from RefineTypes. I've also checked that this is the case by implementing the RefineTypes part, and the resulting MLIR is

module attributes {torch.debug_module_name = "MyModule"}  {
  func @forward(%arg0: tensor<?x?x?xf32>) -> tensor<1x1x2xf32> {                                                                            
    %c2 = arith.constant 2 : index
    %c1 = arith.constant 1 : index
    %0 = tensor.dim %arg0, %c1 : tensor<?x?x?xf32>
    %1 = tensor.dim %arg0, %c2 : tensor<?x?x?xf32>
    %2 = linalg.init_tensor [1, %0, %1] : tensor<1x?x?xf32>
    %3 = tensor.cast %arg0 : tensor<?x?x?xf32> to tensor<1x?x?xf32>
    %4 = tensor.insert_slice %3 into %2[0, 0, 0] [1, %0, %1] [1, 1, 1] : tensor<1x?x?xf32> into tensor<1x?x?xf32>
    %5 = linalg.init_tensor [1, 1, %1] : tensor<1x1x?xf32>
    %6 = tensor.cast %4 : tensor<1x?x?xf32> to tensor<1x1x?xf32>
    %7 = tensor.insert_slice %6 into %5[0, 0, 0] [1, 1, %1] [1, 1, 1] : tensor<1x1x?xf32> into tensor<1x1x?xf32>
    %8 = tensor.cast %7 : tensor<1x1x?xf32> to tensor<1x1x2xf32>
    return %8 : tensor<1x1x2xf32>
  }
}

As can be seen, the result type of the function is set by the RefinePublicReturn pass, which is then the information used in the final cast of a TorchToLinalg conversion.

@ramiro050
Collaborator

ramiro050 commented Nov 3, 2021

For context, I am doing the following computation:

def forward(self, x):
    return x[0:1, 1:2, 1:3]
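For reference, a rough Python sketch of how that indexing presumably decomposes into one aten::slice per dimension, which would match the three insert_slice ops in the IR above (the exact decomposition is an assumption):

import torch

x = torch.rand(2, 3, 4)
y = torch.ops.aten.slice(x, 0, 0, 1, 1)  # dim 0: 0:1
y = torch.ops.aten.slice(y, 1, 1, 2, 1)  # dim 1: 1:2
y = torch.ops.aten.slice(y, 2, 1, 3, 1)  # dim 2: 1:3
assert torch.equal(y, x[0:1, 1:2, 1:3])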

@cathyzhyi
Contributor

Yea, type conversion like this

Type newResultType = getTypeConverter()->convertType(op.getType());
rewriter.replaceOpWithNewOp<tensor::CastOp>(op, newResultType, conv2d);

should be used only for getting element type or rank info, or for the castOp to up-cast to the type decided by RefineTypes. We should not depend on it for shape information.

Is the issue here that the result type of the function set by the RefinePublicReturn pass doesn't work for dynamic shapes? Are you trying to make the return shape static? I think I am still not seeing the issue.

@dan-garvey
Collaborator Author

dan-garvey commented Nov 3, 2021

Yea, type conversion like this

Type newResultType = getTypeConverter()->convertType(op.getType());
rewriter.replaceOpWithNewOp<tensor::CastOp>(op, newResultType, conv2d);

should be used only for getting element type or rank info, or for the castOp to up-cast to the type decided by RefineTypes. We should not depend on it for shape information.

Is the issue here that the result type of the function set by the RefinePublicReturn pass doesn't work for dynamic shapes? Are you trying to make the return shape static? I think I am still not seeing the issue.

The issue is that the final result ends up being the wrong shape, and even contains the values that should have been sliced away.

return x[0:5:1, 0:3:1, 0:4:1]

results in: https://gist.github.com/dan-garvey/47a0877dc64ec14920d9717817bf2738

while return x[0:5:1, :, 0:4:1] is successful.
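For completeness, a minimal repro sketch of the two cases (module names and the standalone driver are assumptions; only the indexing expressions come from the comment above):

import torch

class SliceAllDimsModule(torch.nn.Module):
    # Failing case: every dimension sliced with explicit start:end:step.
    def forward(self, x):
        return x[0:5:1, 0:3:1, 0:4:1]

class SliceOpenMiddleDimModule(torch.nn.Module):
    # Working case: the middle dimension is left fully open.
    def forward(self, x):
        return x[0:5:1, :, 0:4:1]

x = torch.rand(6, 4, 7)
print(SliceAllDimsModule()(x).shape)        # torch.Size([5, 3, 4])
print(SliceOpenMiddleDimModule()(x).shape)  # torch.Size([5, 4, 4])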

@cathyzhyi
Contributor

cathyzhyi commented Nov 3, 2021

After some discussion with @ramiro050, there seems to be a weird canonicalization pattern for insert_slice. I filed a bug upstream: https://bugs.llvm.org/show_bug.cgi?id=52391. For now, using extract_slice is probably more efficient anyway, so we could use that instead.
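In NumPy terms, the difference between the two styles is roughly the following (just an analogy for intuition, not the actual lowering code):

import numpy as np

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

# insert_slice-style: allocate a destination and copy the slice into it.
out = np.empty((1, 1, 2), dtype=x.dtype)
out[0:1, 0:1, 0:2] = x[0:1, 1:2, 1:3]

# extract_slice-style: take the slice of the source directly.
out2 = x[0:1, 1:2, 1:3]

assert np.array_equal(out, out2)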

@silvasean
Contributor

The "p" parameter issue should be fixed in https://reviews.llvm.org/rGa7fc39f21353b657af941e84f964aa0607c92e93 Just need to update llvm.

@cathyzhyi
Contributor

cathyzhyi commented Nov 4, 2021

@ramiro050 @dan-garvey Just FYI, we used tensor.insert_slice incorrectly. see https://llvm.discourse.group/t/meaning-of-sizes-for-tensor-insert-slice/4669/2?u=cathyzhyi.

@ramiro050
Collaborator

@ramiro050 @dan-garvey Just FYI, we used tensor.insert_slice incorrectly. see https://llvm.discourse.group/t/meaning-of-sizes-for-tensor-insert-slice/4669/2?u=cathyzhyi.

Oh, I suppose that makes some sense. So the folding we saw is correct then.

I do, however, find the documentation very misleading. From the documentation (https://mlir.llvm.org/docs/Dialects/TensorOps/#tensorinsert_slice-mlirtensorinsertsliceop)

The insert_slice operation supports the following arguments:

  • source: the tensor that is inserted.
  • dest: the tensor into which the source tensor is inserted.
  • offsets: tensor-rank number of offsets into the dest tensor into which the slice is inserted.
  • sizes: tensor-rank number of sizes which specify the sizes of the result tensor type.
  • strides: tensor-rank number of strides that specify subsampling in each dimension.

I suppose this has to be a typo because I don't know how to interpret this other than as: sizes represents the size of the output tensor.

@ramiro050
Collaborator

I do, however, find the documentation very misleading. From the documentation (https://mlir.llvm.org/docs/Dialects/TensorOps/#tensorinsert_slice-mlirtensorinsertsliceop)

Oh, I just realized your first comment on Discourse, @cathyzhyi, mentions this issue with the documentation 😂

@cathyzhyi
Contributor

@ramiro050 @dan-garvey Just FYI, we used tensor.insert_slice incorrectly. see https://llvm.discourse.group/t/meaning-of-sizes-for-tensor-insert-slice/4669/2?u=cathyzhyi.

Oh, I suppose that makes some sense. So the folding we saw is correct then.

I do, however, find the documentation very misleading. From the documentation (https://mlir.llvm.org/docs/Dialects/TensorOps/#tensorinsert_slice-mlirtensorinsertsliceop)

The insert_slice operation supports the following arguments:

  • source: the tensor that is inserted.
  • dest: the tensor into which the source tensor is inserted.
  • offsets: tensor-rank number of offsets into the dest tensor into which the slice is inserted.
  • sizes: tensor-rank number of sizes which specify the sizes of the result tensor type.
  • strides: tensor-rank number of strides that specify subsampling in each dimension.

I suppose this has to be a typo because I don't know how to interpret this other than as: sizes represents the size of the output tensor.

Yea, the wording is ambiguous here. I guess it means the size after restoring the reduced rank in the rank-reduced case, not the size of the output tensor. I will try to update the doc upstream to make it clearer.
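As a rough analogy (NumPy, just for intuition; not the MLIR semantics verbatim): sizes describes the extent of the region written inside dest, i.e. the source shape after restoring any reduced rank, not the shape of the overall result.

import numpy as np

dest = np.zeros((4, 4))
source = np.ones((2, 3))
# offsets = (1, 0), sizes = (2, 3), strides = (1, 1)
dest[1:1 + 2:1, 0:0 + 3:1] = source
# The result is still the full dest, shape (4, 4); sizes only described the
# inserted slice.
print(dest.shape)  # (4, 4)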

@dan-garvey dan-garvey force-pushed the aten-slice-select-dropout branch 3 times, most recently from 5784843 to 65cc5f9 on November 10, 2021 00:27
@dan-garvey dan-garvey force-pushed the aten-slice-select-dropout branch 3 times, most recently from 2ac3c33 to e1b5a32 on November 12, 2021 05:36
@dan-garvey dan-garvey force-pushed the aten-slice-select-dropout branch from 4cf5547 to c868e89 on November 19, 2021 20:50
@dan-garvey dan-garvey force-pushed the aten-slice-select-dropout branch 2 times, most recently from e1b5a32 to 611f411 on November 22, 2021 21:28
@cathyzhyi
Contributor

@dan-garvey this contains multiple ops. The slice and select are related, but dropout and AtenAddInt are quite separate. Can you split the dropout and AtenAddInt into separate PRs so that it's easier to merge those smaller ones first?

@dan-garvey dan-garvey mentioned this pull request Nov 23, 2021
@dan-garvey dan-garvey force-pushed the aten-slice-select-dropout branch 3 times, most recently from 2e4cfc5 to ec404ca on November 29, 2021 21:26
@dan-garvey dan-garvey changed the title from "Add lowering for slice, select int, dropout" to "Add lowering for slice and select int" on Nov 30, 2021
@dan-garvey dan-garvey force-pushed the aten-slice-select-dropout branch 3 times, most recently from dd0953a to 703502d on November 30, 2021 23:21
@dan-garvey dan-garvey force-pushed the aten-slice-select-dropout branch from 703502d to b6cc375 on December 1, 2021 19:30
SmallVector<Value> inputShape = getTensorSizes(rewriter, loc, input);
Value dimSize = inputShape[dim];

auto adjustStartOrEnd = [](Location loc, Value dimSize, Value &startOrEnd,
Contributor

Using [&], the lambda could capture variables from the surrounding context, so Location loc, Value dimSize, and ConversionPatternRewriter &rewriter can be used directly rather than passed as arguments. The logic to check for NoneType can also be wrapped into the lambda. The lambda becomes

auto adjustStartOrEnd = [&](Value startOrEndTorch, Value startOrEndBuiltIn, Value valueForNone) {
  if (startOrEndTorch.getType().isa<NoneType>())
    return valueForNone;
  ...
};

@dan-garvey dan-garvey force-pushed the aten-slice-select-dropout branch from b6cc375 to 4c223c9 on December 2, 2021 19:16
Contributor

@cathyzhyi cathyzhyi left a comment

LGTM! Thanks!!

# ==============================================================================

# This Test currently xfails due to https://github.com/llvm/torch-mlir/issues/448
class SliceStartEqEnd_Module(torch.nn.Module):
Contributor

SliceStartEqEnd_Module => SliceStartEqEndModule

@dan-garvey dan-garvey force-pushed the aten-slice-select-dropout branch from 4c223c9 to a4f7dcf on December 3, 2021 00:22
@dan-garvey dan-garvey force-pushed the aten-slice-select-dropout branch from a4f7dcf to b15324b on December 3, 2021 03:54
@dan-garvey dan-garvey merged commit a52aded into llvm:main Dec 3, 2021
@dan-garvey dan-garvey deleted the aten-slice-select-dropout branch December 3, 2021 04:09
sjarus pushed a commit to sjarus/torch-mlir that referenced this pull request Dec 3, 2021
@AmosLewis
Collaborator

AmosLewis commented Dec 20, 2022

Looks like the DecomposeAtenSelectIntOp led to a bug in distilgpt2. The gdb trace is in the comment. @dan-garvey
https://gist.github.com/AmosLewis/465795bfa1a4004fb96c742666515027
