Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QNNPACK] Block Sparse kernel. First commit. #50585

Closed
wants to merge 16 commits into from

Conversation

kimishpatel
Copy link
Contributor

@kimishpatel kimishpatel commented Jan 15, 2021

Stack from ghstack:

Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D25925500

Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
@@ -571,6 +572,15 @@ if(PYTORCH_QNNPACK_BUILD_TESTS)
target_link_libraries(q8gemm-test PRIVATE pytorch_qnnpack cpuinfo fp16 gtest gtest_main)
add_test(q8gemm-test q8gemm-test)

add_executable(q8gemm-sparse-test test/q8gemm_sparse.cc)
set_target_properties(q8gemm-sparse-test PROPERTIES
CXX_STANDARD 14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pointing out that the rest of the code uses C++11, but this is OK with me personally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take this back, I was looking at the original file. :)


#pragma once
#include <cstdint>
#include <vector>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our fork of QNNPACK is still a C only library except for the unittests. Are you OK with introducing C++ into the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should structure this better. This is not really used in any kernels, so operators wont be using this. We will use packing on cpp side. But thanks for pointing this out.

const uint8_t* zero_points) {
BCSRMatrix bcsr_mat;
bcsr_mat.row_values.resize(N);
// K must be > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An assert would be helpful to check for preconditions. I think generous use of asserts is good practice - has no runtime cost but can come in handy when dealing with code that behaves in unexpected ways.

(K % col_block_size) == 0 ? num_blocks : num_blocks - 1;
const uint32_t remainder_elements = K % col_block_size;

bcsr_mat.row_values.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bcsr_mat.row_values.resize() followed by clear should not be necessary when followed by reserve() and operating on a newly constructed vector. If you just want to pre-allocate memory without modifying the vector size then a call to reserve should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is an oversight. Thanks for catching. Will fix it.

#include <qnnpack/params.h>
#include <qnnpack/requantization.h>

namespace {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't matter since this is a unit test but placing anonymous or static functions in a header file means each compilation unit is going to end up with its own copy since this implies internal linkage. I am not actually sure if the compiler consolidates all [similar] functions into one in practice but avoid this usage personally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right. It will just create a separate copy for each translation unit. But I am more lax with test code and did this just out of practice.


#define MR 8
#define COL_BLOCK_SIZE 4
#define PACKED_A_BLOCK_SIZE COL_BLOCK_SIZE*MR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be used anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch.

#define COL_BLOCK_SIZE 4
#define PACKED_A_BLOCK_SIZE COL_BLOCK_SIZE*MR

#define LOAD_TRANSPOSED_A_BLOCK(id, index, col_block_id) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally lean on using forced inline functions if possible. The thing with macros like this is that if you end up accidentally using any arithmetic in the arguments at call-site LOAD_TRANSPOSED_A_BLOCK(a, b, c+1) this will have unexpected consequences. Also the commonality like col_block_id * COL_BLOCK_SIZE + index can be factored out more readily when using functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted this way mainly because I can generate variables names like va0, va1 etc. that are later used and can avoid passing 10 args to the function. I agree about arithmetics used in arg leading to unexpected consequences. I will fix it appropriately.

Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
Summary:
This diff introduces sparse kenel for sse. Uses 1x4 block sparse
pattern.

Test Plan:
./build/local/q8gemm-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925500](https://our.internmc.facebook.com/intern/diff/D25925500)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 520f96b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants