New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[QNNPACK] Block Sparse kernel. First commit. #50585
Conversation
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
@@ -571,6 +572,15 @@ if(PYTORCH_QNNPACK_BUILD_TESTS) | |||
target_link_libraries(q8gemm-test PRIVATE pytorch_qnnpack cpuinfo fp16 gtest gtest_main) | |||
add_test(q8gemm-test q8gemm-test) | |||
|
|||
add_executable(q8gemm-sparse-test test/q8gemm_sparse.cc) | |||
set_target_properties(q8gemm-sparse-test PROPERTIES | |||
CXX_STANDARD 14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pointing out that the rest of the code uses C++11, but this is OK with me personally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take this back, I was looking at the original file. :)
|
||
#pragma once | ||
#include <cstdint> | ||
#include <vector> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our fork of QNNPACK is still a C only library except for the unittests. Are you OK with introducing C++ into the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should structure this better. This is not really used in any kernels, so operators wont be using this. We will use packing on cpp side. But thanks for pointing this out.
const uint8_t* zero_points) { | ||
BCSRMatrix bcsr_mat; | ||
bcsr_mat.row_values.resize(N); | ||
// K must be > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An assert would be helpful to check for preconditions. I think generous use of asserts is good practice - has no runtime cost but can come in handy when dealing with code that behaves in unexpected ways.
(K % col_block_size) == 0 ? num_blocks : num_blocks - 1; | ||
const uint32_t remainder_elements = K % col_block_size; | ||
|
||
bcsr_mat.row_values.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bcsr_mat.row_values.resize() followed by clear should not be necessary when followed by reserve() and operating on a newly constructed vector. If you just want to pre-allocate memory without modifying the vector size then a call to reserve should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is an oversight. Thanks for catching. Will fix it.
#include <qnnpack/params.h> | ||
#include <qnnpack/requantization.h> | ||
|
||
namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't matter since this is a unit test but placing anonymous or static functions in a header file means each compilation unit is going to end up with its own copy since this implies internal linkage. I am not actually sure if the compiler consolidates all [similar] functions into one in practice but avoid this usage personally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right. It will just create a separate copy for each translation unit. But I am more lax with test code and did this just out of practice.
|
||
#define MR 8 | ||
#define COL_BLOCK_SIZE 4 | ||
#define PACKED_A_BLOCK_SIZE COL_BLOCK_SIZE*MR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be used anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch.
#define COL_BLOCK_SIZE 4 | ||
#define PACKED_A_BLOCK_SIZE COL_BLOCK_SIZE*MR | ||
|
||
#define LOAD_TRANSPOSED_A_BLOCK(id, index, col_block_id) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally lean on using forced inline functions if possible. The thing with macros like this is that if you end up accidentally using any arithmetic in the arguments at call-site LOAD_TRANSPOSED_A_BLOCK(a, b, c+1) this will have unexpected consequences. Also the commonality like col_block_id * COL_BLOCK_SIZE + index
can be factored out more readily when using functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted this way mainly because I can generate variables names like va0, va1 etc. that are later used and can avoid passing 10 args to the function. I agree about arithmetics used in arg leading to unexpected consequences. I will fix it appropriately.
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
Summary: This diff introduces sparse kenel for sse. Uses 1x4 block sparse pattern. Test Plan: ./build/local/q8gemm-sparse-test Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500) [ghstack-poisoned]
This pull request has been merged in 520f96b. |
Stack from ghstack:
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.
Test Plan:
./build/local/q8gemm-sparse-test
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D25925500