[QNNPACK, Sparsity] Sparse kernel with 4x8 blocking #50590

Closed

wants to merge 18 commits
Conversation

@kimishpatel (Contributor) commented on Jan 15, 2021

Stack from ghstack:

Summary:
The larger blocking across the M dimension (mr = 8 in the previous PR) likely
introduces wasted compute on the shapes being benchmarked. Here we introduce
4x8 (mr x nr) blocking. This helps because 1) it packs less data for small
values of M, and 2) the compute kernel writes the same number of bytes but
more contiguously. The second effect is not certain, but it likely helps.
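
For intuition, below is a minimal dense reference sketch of how an
mr x nr = 4x8 output tiling is structured. This is an illustration only, not
the PR's kernel: the actual micro-kernel operates on quantized uint8 data with
a block-sparse weight format, and every name here is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MR 4 /* rows of the output tile: blocking across M */
#define NR 8 /* columns of the output tile: blocking across N */

/* Dense reference of an MR x NR tiled GEMM: C (m x n) = A (m x k) * B (k x n),
 * all row-major. */
static void gemm_tiled_4x8(size_t m, size_t n, size_t k,
                           const int8_t* a, const int8_t* b, int32_t* c) {
  for (size_t mb = 0; mb < m; mb += MR) {
    const size_t mr = (m - mb < MR) ? (m - mb) : MR; /* M % 4 edge tile */
    for (size_t nb = 0; nb < n; nb += NR) {
      const size_t nr = (n - nb < NR) ? (n - nb) : NR; /* N % 8 edge tile */
      /* One micro-kernel invocation fills an mr x nr tile of C. With mr = 4
       * the tile spans fewer rows than an 8-wide M block, so less of A must
       * be packed when M is small; and because each tile row is 8 outputs
       * wide rather than 4, the same total bytes are written in longer
       * contiguous runs. */
      for (size_t i = 0; i < mr; i++) {
        for (size_t j = 0; j < nr; j++) {
          int32_t acc = 0;
          for (size_t p = 0; p < k; p++) {
            acc += (int32_t)a[(mb + i) * k + p] * (int32_t)b[p * n + nb + j];
          }
          c[(mb + i) * n + nb + j] = acc;
        }
      }
    }
  }
}
```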

Test Plan:
q8gemm-sparse-test
fully-connected-sparse-test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D25925499](https://our.internmc.facebook.com/intern/diff/D25925499)

@facebook-github-bot (Contributor) commented:

This pull request has been merged in 70830b5.

@facebook-github-bot facebook-github-bot deleted the gh/kimishpatel/39/head branch February 9, 2021 15:14