Add QLinearConcat for DML EP by zhangxiang1993 · Pull Request #16971 · microsoft/onnxruntime

zhangxiang1993 · 2023-08-02T15:42:14Z

[ OK ] QLinearConcatS8.ExpectFail_WrongZeroPointType_1 (372 ms)
[ RUN ] QLinearConcatS8.InputOne_Dynamic
[ OK ] QLinearConcatS8.InputOne_Dynamic (255 ms)
[ RUN ] QLinearConcatS8.InputOne_Const
[ OK ] QLinearConcatS8.InputOne_Const (255 ms)
[----------] 11 tests from QLinearConcatS8 (3385 ms total)

[----------] Global test environment tear-down
[==========] 21 tests from 3 test suites ran. (9355 ms total)
[ PASSED ] 21 tests.

zhangxiang1993 · 2023-08-02T16:02:33Z

+
+        // broadcast y_scale and y_zero_point to output shape
+        m_inputTensorDescs[OnnxInputIndex::yScale] = TensorDesc(
+            kernelCreationContext.GetInputEdgeDescription(OnnxInputIndex::yScale).tensorDataType,


kernelCreationContext.GetInputEdgeDescription(OnnxInputIndex::yScale).tensorDataType

yScaleDataType #Resolved

zhangxiang1993 · 2023-08-02T16:02:43Z

+        );
+
+        m_inputTensorDescs[OnnxInputIndex::yZeroPoint] = TensorDesc(
+            kernelCreationContext.GetInputEdgeDescription(OnnxInputIndex::yZeroPoint).tensorDataType,


kernelCreationContext.GetInputEdgeDescription(OnnxInputIndex::yZeroPoint).tensorDataType

yZeroPointDataType #Closed

zhangxiang1993 · 2023-08-02T16:03:15Z

+
+            // broadcast x_scale and x_zero_point to shape of corresponding x
+            m_inputTensorDescs[tuple_start + 1] = TensorDesc(
+                kernelCreationContext.GetInputEdgeDescription(tuple_start + 1).tensorDataType,


kernelCreationContext.GetInputEdgeDescription(tuple_start + 1).tensorDataType

xScaleDataType #Closed

zhangxiang1993 · 2023-08-02T16:03:26Z

+            );
+
+            m_inputTensorDescs[tuple_start + 2] = TensorDesc(
+                kernelCreationContext.GetInputEdgeDescription(tuple_start + 2).tensorDataType,


kernelCreationContext.GetInputEdgeDescription(tuple_start + 2).tensorDataType

xZeroPointDataType #Resolved

sumitsays · 2023-08-02T16:12:53Z

    // Given an axis in ONNX axis numbering, return the axis adjusted for DML based on how the sizes have been coerced.
    // Note this function presumes the axis attribute is relative to the first input tensor (which is always the case).
+    uint32_t GetDmlAdjustedAxis(int32_t onnxAxis, const MLOperatorKernelCreationContext& kernelCreationContext, uint32_t dmlDimCount, uint32_t firstInputIndex);
    uint32_t GetDmlAdjustedAxis(int32_t onnxAxis, const MLOperatorKernelCreationContext& kernelCreationContext, uint32_t dmlDimCount);


Should we assign 0 as a default value to firstInputIndex parameter and remove the 2nd overloaded method? #Resolved
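A minimal sketch of what that suggestion could look like, with a single function taking a defaulted `firstInputIndex` in place of two overloads. `FakeContext` and `GetDmlAdjustedAxisSketch` are hypothetical stand-ins, not the real DML EP types, and the right-aligned axis adjustment is an assumption about how the sizes are coerced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the kernel creation context: only the rank of each input.
struct FakeContext
{
    std::vector<uint32_t> inputRanks;
};

// One function with a defaulted firstInputIndex instead of two overloads.
// Existing three-argument callers keep compiling and still use input 0.
inline uint32_t GetDmlAdjustedAxisSketch(
    int32_t onnxAxis,
    const FakeContext& context,
    uint32_t dmlDimCount,
    uint32_t firstInputIndex = 0)
{
    const uint32_t onnxDimCount = context.inputRanks.at(firstInputIndex);

    // Negative ONNX axes count from the end.
    if (onnxAxis < 0)
    {
        onnxAxis += static_cast<int32_t>(onnxDimCount);
    }

    // Assumption: shapes are right-aligned, so leading dimensions are padded.
    return static_cast<uint32_t>(onnxAxis) + dmlDimCount - onnxDimCount;
}
```

For QLinearConcat the first data tensor is not input 0 (the first two inputs are `y_scale` and `y_zero_point`), which is why the extra index is needed at all.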

sumitsays · 2023-08-02T16:16:39Z

+{
+// QLinearConcat = Dequantize + Join + Quantize
+// This kernel is the first usage of graph based implementation
+class DmlOperatorQLinearConcat : public DmlOperator, public QLinearConcatHelper


[nit] Should we remove this comment? #Resolved
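For readers unfamiliar with the decomposition named in that class comment (Dequantize + Join + Quantize), here is a CPU reference sketch for 1-D uint8 inputs. It is not the DML graph from this PR; `QuantizedInput` and `QLinearConcat1D` are hypothetical names, and round-to-nearest with saturation is assumed for the quantize step.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical container for one (x, x_scale, x_zero_point) tuple.
struct QuantizedInput
{
    std::vector<uint8_t> data;
    float scale;
    uint8_t zeroPoint;
};

// Dequantize each input with its own parameters, join along the only axis,
// then quantize the result with the shared output parameters.
inline std::vector<uint8_t> QLinearConcat1D(
    const std::vector<QuantizedInput>& inputs,
    float yScale,
    uint8_t yZeroPoint)
{
    std::vector<uint8_t> output;
    for (const auto& input : inputs)
    {
        for (uint8_t q : input.data)
        {
            // Dequantize: real = (q - x_zero_point) * x_scale
            const float real = static_cast<float>(static_cast<int32_t>(q) - input.zeroPoint) * input.scale;

            // Quantize: round(real / y_scale) + y_zero_point, saturated to [0, 255]
            const float requantized = std::nearbyint(real / yScale) + static_cast<float>(yZeroPoint);
            output.push_back(static_cast<uint8_t>(std::min(std::max(requantized, 0.0f), 255.0f)));
        }
    }
    return output;
}
```

Appending each dequantized input in order plays the role of the Join here; a real implementation would join along an arbitrary axis.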

sumitsays · 2023-08-02T16:25:19Z

+        std::vector<DML_ELEMENT_WISE_DEQUANTIZE_LINEAR_OPERATOR_DESC> dequantizeOperatorDescs(input_count);
+        std::vector<DML_OPERATOR_DESC> dmlOpDesc(input_count);
+        std::vector<const DML_OPERATOR_DESC*> opDescs = {};
+        for (uint32_t input_index = 0; input_index < input_count; ++input_index)


[nit] Is there any specific reason we have used = {} to initialize this particular std::vector? #Closed

        static const int sc_sinceVer_QuickGelu = 1;
        static const int sc_sinceVer_GroupNorm = 1;
        static const int sc_sinceVer_DynamicQuantizeMatMul = 1;
+        static const int sc_sinceVer_QLinearConcat = 1;


AnaghaRaoAMD · 2023-08-02T21:37:20Z

+constexpr static std::array<SupportedTensorDataTypes, 3> supportedTypeListQLinearConcat= {
+    SupportedTensorDataTypes::Float32,
+    SupportedTensorDataTypes::Ints8Bit,
+    SupportedTensorDataTypes::Ints8Bit|SupportedTensorDataTypes::Float32,


The contrib op documentation says TF supports any float tensor type. Should we also consider supporting fp16?

I think only tensor(float) is specified; fp16 would be tensor(float16). The type constraints read:
Type Constraints
T8 : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit signed and unsigned tensors.
TF : tensor(float)
Constrain scale types to any float tensor type.
TV : tensor(uint8), tensor(int8), tensor(float)
Sequence of (Tensor, Scale, ZeroPoint) tuples. The type is sequence of (T8, TF, T8).

fdwr · 2023-08-04T19:01:49Z

+    // This order matches the ONNX schema.
+    enum OnnxInputIndex
+    {
+        yScale,


yScale

Suggested change

yScale,

YScale

Consistent casing with Count. #Closed

fdwr · 2023-08-04T19:02:34Z

+        QLinearConcatHelper(kernelCreationContext, kernelCreationContext.GetTensorShapeDescription())
+    {
+
+        DmlOperator::Initialize(kernelCreationContext);


[nit] extra blank line #Closed

fdwr · 2023-08-04T19:04:33Z

+        auto outputShape = kernelCreationContext.GetTensorShapeDescription().GetOutputTensorShape(0);
+
+        // inputs: {y_scale, y_zero_point, tuple(x_tensor, x_scale, x_zero_point)}
+        uint32_t input_def_count = kernelCreationContext.GetInputCount();


input_def_count

inputDefinitionCount

Naming intrafile consistency. (also, DML and DML EP always use 🐪, not 🐍) #Closed

fdwr · 2023-08-04T19:06:01Z

+        std::vector<const DML_OPERATOR_DESC*> opDescs;
+        for (uint32_t input_index = 0; input_index < input_count; ++input_index)
+        {
+            auto tuple_start = 2 + input_index * 3;


tuple_start

tupleStartIndex #Resolved

fdwr · 2023-08-04T19:06:38Z

+        std::vector<DML_ELEMENT_WISE_DEQUANTIZE_LINEAR_OPERATOR_DESC> dequantizeOperatorDescs(input_count);
+        std::vector<DML_OPERATOR_DESC> dmlOpDesc(input_count);
+        std::vector<const DML_OPERATOR_DESC*> opDescs;
+        for (uint32_t input_index = 0; input_index < input_count; ++input_index)


input_count

inputCount

etcetera... #Closed

fdwr · 2023-08-04T19:10:46Z

+        // inputs: {y_scale, y_zero_point, tuple(x_tensor, x_scale, x_zero_point)}
+        uint32_t input_def_count = kernelCreationContext.GetInputCount();
+        ML_CHECK_VALID_ARGUMENT(input_def_count >= 5 && (input_def_count - 2) % 3 == 0,
+              "Each input must be (tensor, scale, zero_point) tuple!");


[nit] 6->4 space indent.

Alternately, consider splitting the long line more readably.

    ML_CHECK_VALID_ARGUMENT(
        input_def_count >= 5 && (input_def_count - 2) % 3 == 0,
        "Each input must be (tensor, scale, zero_point) tuple!"
    );

Or better yet, split it into two separate conditions, which would make the specific error much clearer to the user. Then the lines are also shorter and don't need to wrap:

    ML_CHECK_VALID_ARGUMENT(inputDefinitionCount >= 5, "Require at least 5 inputs.");
    ML_CHECK_VALID_ARGUMENT((inputDefinitionCount - 2) % 3 == 0, "Each input must be (tensor, scale, zero_point) tuple!");

Which is like what you do below, two separate ones rather than &&ing them:

    ML_CHECK_VALID_ARGUMENT(xScaleDataType == yScaleDataType, "Wrong input type encountered for scale");
    ML_CHECK_VALID_ARGUMENT(xZeroPointDataType == yZeroPointDataType, "Wrong input type encountered for zero point");

#Resolved
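As a standalone illustration of the input layout being validated here ({y_scale, y_zero_point, then (x_tensor, x_scale, x_zero_point) tuples}), the two separate checks and the tuple index arithmetic can be sketched as below. `CountInputTuples` and `TupleStartIndex` are hypothetical helpers, and `std::invalid_argument` stands in for `ML_CHECK_VALID_ARGUMENT`.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Inputs are laid out as: y_scale, y_zero_point, then one
// (x_tensor, x_scale, x_zero_point) tuple per tensor to concatenate.
inline uint32_t CountInputTuples(uint32_t inputDefinitionCount)
{
    // Two separate checks, so each failure reports its own message.
    if (inputDefinitionCount < 5)
    {
        throw std::invalid_argument("Require at least 5 inputs.");
    }
    if ((inputDefinitionCount - 2) % 3 != 0)
    {
        throw std::invalid_argument("Each input must be (tensor, scale, zero_point) tuple!");
    }
    return (inputDefinitionCount - 2) / 3;
}

// Index of x_tensor for the given tuple; x_scale and x_zero_point follow at +1 and +2.
inline uint32_t TupleStartIndex(uint32_t inputIndex)
{
    return 2 + inputIndex * 3;
}
```

With 8 inputs this yields two tuples, starting at input indices 2 and 5.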

fdwr · 2023-08-04T19:13:59Z

+                TensorAxis::W,
+                TensorAxis::RightAligned,
+                NchwDimensionCount, // minDimensionCount
+                0 // guaranteedBaseOffsetAlignment)


)

0 // guaranteedBaseOffsetAlignment) ->
0 // guaranteedBaseOffsetAlignment #Resolved

[nit] Indent also inconsistent from all the other TensorDesc calls (8 instead of 4)

fdwr · 2023-08-04T19:15:48Z

+        joinDesc.OutputTensor = &namedJoinOutputTensorDesc;
+        joinDesc.Axis = dmlAxis;
+
+        const DML_OPERATOR_DESC opJoinDesc{DML_OPERATOR_JOIN, &joinDesc};


Suggested change

const DML_OPERATOR_DESC opJoinDesc{DML_OPERATOR_JOIN, &joinDesc};

const DML_OPERATOR_DESC opJoinDesc = {DML_OPERATOR_JOIN, &joinDesc};

[nit] Consistent assignment style with nearby MLOperatorGraphDesc operatorGraphDesc = {};. #Closed

fdwr

Nits, but otherwise looks good Xiang.

+            ML_CHECK_VALID_ARGUMENT(xZeroPointDataType == yZeroPointDataType, "Wrong input type encountered for zero point");
+
+            // broadcast x_scale and x_zero_point to shape of corresponding x
+            m_inputTensorDescs[tupleStartIndex + 1] = TensorDesc(


+                0 // guaranteedBaseOffsetAlignment
+            );
+
+            m_inputTensorDescs[tupleStartIndex + 2] = TensorDesc(


+            namedDequantizeOperatorDescs[inputIndex] = intermediateOutputTensorDescs[inputIndex].GetDmlDesc();
+
+            dequantizeOperatorDescs[inputIndex].InputTensor = &inputDescs[tupleStartIndex];
+            dequantizeOperatorDescs[inputIndex].ScaleTensor = &inputDescs[tupleStartIndex + 1];


+
+            dequantizeOperatorDescs[inputIndex].InputTensor = &inputDescs[tupleStartIndex];
+            dequantizeOperatorDescs[inputIndex].ScaleTensor = &inputDescs[tupleStartIndex + 1];
+            dequantizeOperatorDescs[inputIndex].ZeroPointTensor = &inputDescs[tupleStartIndex + 2];


fdwr · 2023-08-17T20:53:42Z

@@ -65,7 +65,7 @@ namespace Dml

    // Given an axis in ONNX axis numbering, return the axis adjusted for DML based on how the sizes have been coerced.
    // Note this function presumes the axis attribute is relative to the first input tensor (which is always the case).


Note this function presumes the axis attribute is relative to the first input tensor (which is always the case).

Stale comment. It's not always the case anymore. #Resolved

fdwr

TY, XZ.

### Description
[Cherry Pick Reviewed]
```
[ OK ] QLinearConcatS8.ExpectFail_WrongZeroPointType_1 (372 ms)
[ RUN ] QLinearConcatS8.InputOne_Dynamic
[ OK ] QLinearConcatS8.InputOne_Dynamic (255 ms)
[ RUN ] QLinearConcatS8.InputOne_Const
[ OK ] QLinearConcatS8.InputOne_Const (255 ms)
[----------] 11 tests from QLinearConcatS8 (3385 ms total)

[----------] Global test environment tear-down
[==========] 21 tests from 3 test suites ran. (9355 ms total)
[ PASSED ] 21 tests.
```
[#16971](#16971)
### Motivation and Context
Co-authored-by: Xiang Zhang <xianz@microsoft.com>

zhangxiang1993 added 2 commits August 2, 2023 06:38

Add QLinearConcat

fd789d8

add comments

b7852c7

zhangxiang1993 requested review from AnaghaRaoAMD, PatriceVignola and fdwr and removed request for PatriceVignola August 2, 2023 15:58

zhangxiang1993 commented Aug 2, 2023

sumitsays reviewed Aug 2, 2023

github-advanced-security AI found potential problems Aug 2, 2023

AnaghaRaoAMD reviewed Aug 2, 2023

resolve comments

cf09e8a

fdwr reviewed Aug 4, 2023

resolve comments

6a1e79d

github-advanced-security AI found potential problems Aug 16, 2023

fdwr reviewed Aug 17, 2023

resolve comments

acadfb7

fdwr approved these changes Aug 17, 2023

resolve comments

af19dab

fdwr approved these changes Aug 17, 2023

zhangxiang1993 merged commit d3345f3 into DmlPrototype Aug 17, 2023

zhangxiang1993 deleted the user/xianz/QLinearConcat branch August 17, 2023 22:15

AnaghaRaoAMD pushed a commit that referenced this pull request Nov 3, 2023

Add QLinearConcat for DML EP (#16971)

6273ddf

AnaghaRaoAMD mentioned this pull request Nov 3, 2023

Add QLinearConcat for DML EP (#16971) #18268

Merged
