MLAS: improve quantized depthwise convolution#6513

Merged
tracysh merged 9 commits into master from tracysh/qdwconv
Feb 2, 2021

Conversation

@tracysh
Contributor

@tracysh tracysh commented Jan 31, 2021

Description: Improve the performance of QLinearConv for depthwise convolutions.

Motivation and Context

  1. Added support for an im2col variant that returns an array of buffer pointers instead of memcpy'ing the original data (see https://arxiv.org/pdf/1907.02129.pdf). This saves memcpy time and reduces the size of the intermediate buffer needed to hold the im2col transform. This im2col will also be used to support quantized pooling ops.
  2. Added an AVX2 depthwise convolution kernel.
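The indirection-buffer idea in item 1 can be sketched as follows. This is an illustrative example only, not the MLAS implementation: all names (`BuildIndirectionBuffer`, `DepthwiseConv`) are hypothetical, and it handles a single channel with unit stride. Instead of memcpy'ing each kernel-sized patch into a dense im2col buffer, it stores a pointer per output position and kernel tap, with padded taps pointing at a shared zero buffer:

```cpp
// Sketch of indirection-buffer im2col for a 1-channel depthwise conv.
// Hypothetical names; not the MLAS API.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Classic im2col copies every kernel-sized patch into a dense buffer.
// The indirection variant instead stores, for each output position and
// kernel offset, a POINTER to the input element (or to a shared
// zero-point buffer for padded positions), avoiding the memcpy.
std::vector<const uint8_t*> BuildIndirectionBuffer(
    const uint8_t* input, int H, int W,
    int KH, int KW, int pad, const uint8_t* zero_point_buffer) {
  const int OH = H + 2 * pad - KH + 1;
  const int OW = W + 2 * pad - KW + 1;
  std::vector<const uint8_t*> indirection;
  indirection.reserve(size_t(OH) * OW * KH * KW);
  for (int oh = 0; oh < OH; ++oh) {
    for (int ow = 0; ow < OW; ++ow) {
      for (int kh = 0; kh < KH; ++kh) {
        for (int kw = 0; kw < KW; ++kw) {
          const int ih = oh + kh - pad;
          const int iw = ow + kw - pad;
          const bool inside = ih >= 0 && ih < H && iw >= 0 && iw < W;
          // Padded taps point at the shared zero-point buffer instead
          // of copied zeros, so the kernel needs no bounds checks.
          indirection.push_back(inside ? input + ih * W + iw
                                       : zero_point_buffer);
        }
      }
    }
  }
  return indirection;
}

// The convolution kernel then just walks the pointer array.
std::vector<int32_t> DepthwiseConv(const std::vector<const uint8_t*>& ind,
                                   const int8_t* filter, int taps,
                                   int num_outputs) {
  std::vector<int32_t> out(num_outputs, 0);
  for (int o = 0; o < num_outputs; ++o)
    for (int k = 0; k < taps; ++k)
      out[o] += int32_t(*ind[o * taps + k]) * int32_t(filter[k]);
  return out;
}
```

The intermediate buffer shrinks from one copied byte per (output, tap) pair to one pointer per pair that references the original data in place, which is what saves the memcpy time described above.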

On an older Broadwell test system, the mobilenetv2-7.quant.onnx from the E2E example drops from 4.6ms to 3.6ms per inference. The original FP32 model runs in 3.2ms on the same system.

@tracysh tracysh requested a review from a team as a code owner January 31, 2021 03:12
output_start,
output_count,
worker_col_buffer,
padding_data.data());
Member


The input data is contiguous. Say the size of one image is NSize. Would it be faster to just add NSize to each entry of the indirection buffer for each iteration?
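The suggestion in this comment can be sketched as a pointer-offset pass: since consecutive batch images are NSize bytes apart, the indirection buffer built for image 0 could be reused for the next image by shifting every pointer, rather than rebuilding the buffer. A minimal sketch, with a hypothetical name; in practice, entries pointing at a shared zero/padding buffer would have to be excluded from the shift:

```cpp
// Sketch of reusing an indirection buffer across contiguous images by
// adding the per-image stride NSize to every tap pointer.
// Hypothetical helper; padded entries that point into a shared zero
// buffer must not be advanced in a real implementation.
#include <cstddef>
#include <cstdint>
#include <vector>

void AdvanceIndirectionBuffer(std::vector<const uint8_t*>& indirection,
                              std::ptrdiff_t NSize) {
  for (const uint8_t*& p : indirection)
    p += NSize;  // shift each tap pointer to the next image
}
```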

Member

@yufenglee yufenglee left a comment


:shipit:

@tracysh tracysh merged commit 9a6e715 into master Feb 2, 2021
@tracysh tracysh deleted the tracysh/qdwconv branch February 2, 2021 05:22

2 participants