dnn: add gemm_layer in place of fully_connected_layer for onnx models #23897

Merged: 61 commits, Sep 19, 2023. The changes shown below are from 55 of those commits.

Commits
6cf9631
first commit
fengyuentau May 31, 2023
d94d776
turned C from input to constant; force C constant in impl; better han…
fengyuentau Jun 1, 2023
03476d5
integrate with gemm from ficus nn
fengyuentau Jun 30, 2023
d466f77
fix const inputs
fengyuentau Jul 2, 2023
f4c3640
adjust threshold for int8 tryQuantize
fengyuentau Jul 2, 2023
7a19272
adjust threshold for int8 quantized 2
fengyuentau Jul 2, 2023
05d0793
support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; up…
fengyuentau Jul 4, 2023
fd14e6b
add gemm perf against innerproduct
fengyuentau Jul 6, 2023
9d5ac58
add perf tests for innerproduct with bias
fengyuentau Jul 7, 2023
de0beac
fix perf
fengyuentau Jul 13, 2023
baac71d
add memset
fengyuentau Jul 18, 2023
ec613bc
renamings for next step
fengyuentau Jul 20, 2023
8657065
add dedicated perf gemm
fengyuentau Jul 25, 2023
dfff691
add innerproduct in perf_gemm
fengyuentau Jul 25, 2023
7cacfc0
remove gemm and innerproduct perf tests from perf_layer
fengyuentau Jul 25, 2023
61e458e
add perf cases for vit sizes; prepack constants
fengyuentau Aug 2, 2023
8164e3b
remove batched gemm; fix wrong trans; optimize KC
fengyuentau Aug 2, 2023
3ed1d48
remove prepacking for const A; several fixes for const B prepacking
fengyuentau Aug 7, 2023
c24d944
add todos and gemm expression
fengyuentau Aug 8, 2023
c78a09e
add optimized branch for avx/avx2
fengyuentau Aug 21, 2023
e9301b7
trigger build
fengyuentau Aug 21, 2023
b5c4bc4
update macros and signature
fengyuentau Aug 21, 2023
6a3cf14
update signature
fengyuentau Aug 21, 2023
7d00e56
fix macro
fengyuentau Aug 25, 2023
5c0897a
fix bugs for neon aarch64 & x64
fengyuentau Aug 25, 2023
a7b9c3a
add backends: cuda, cann, inf_ngraph and vkcom
fengyuentau Aug 28, 2023
66eb2e2
fix cuda backend
fengyuentau Aug 28, 2023
8eadede
test commit for cuda
fengyuentau Aug 29, 2023
4ec306b
test cuda backend
fengyuentau Aug 29, 2023
d81dae6
remove debug message from cuda backend
fengyuentau Aug 29, 2023
d407727
use cpu dispatcher
fengyuentau Aug 29, 2023
0697b49
fix neon macro undef in dispatcher
fengyuentau Aug 30, 2023
8d94a23
fix dispatcher
fengyuentau Aug 30, 2023
6ba3c9a
fix inner kernel for neon aarch64
fengyuentau Aug 30, 2023
67ee373
fix compiling issue on armv7; try fixing accuracy issue on other plat…
fengyuentau Aug 30, 2023
0e543f4
broadcast C with beta multiplied; improve func namings
fengyuentau Aug 31, 2023
fc35800
fix bug for avx and avx2
fengyuentau Aug 31, 2023
66c3d47
put all platform-specific kernels in dispatcher
fengyuentau Aug 31, 2023
e843852
fix typos
fengyuentau Aug 31, 2023
8a84865
attempt to fix compile issues on x64
fengyuentau Aug 31, 2023
9a1747a
run old gemm when neon, avx, avx2 are all not available; add kernel f…
fengyuentau Sep 5, 2023
2b307a7
fix typo
fengyuentau Sep 5, 2023
4b5cd4b
quick fix: add macros for pack4
fengyuentau Sep 5, 2023
ae4247c
quick fix: use vmlaq_f32 for armv7
fengyuentau Sep 5, 2023
5ae13a9
quick fix for missing macro of fast gemm pack f32 4
fengyuentau Sep 6, 2023
bf274bf
disable conformance tests when optimized branches are not supported
fengyuentau Sep 6, 2023
d00060e
disable perf tests when optimized branches are not supported
fengyuentau Sep 6, 2023
beaddba
decouple cv_try_neon and cv_neon_aarch64
fengyuentau Sep 10, 2023
efb7dab
drop googlenet_2023; add fastGemmBatched
fengyuentau Sep 10, 2023
1275cd3
fix step in fastGemmBatched
fengyuentau Sep 11, 2023
07cf1c5
cpu: fix initialization ofb; gpu: support batch
fengyuentau Sep 11, 2023
78afd01
quick followup fix for cuda
fengyuentau Sep 11, 2023
d88577a
add default kernels
fengyuentau Sep 11, 2023
8feb258
quick followup fix to avoid macro redef
fengyuentau Sep 11, 2023
e695285
optmized kernels for lasx
fengyuentau Sep 12, 2023
235156c
resolve mis-alignment; remove comments
fengyuentau Sep 12, 2023
f614554
tune performance for x64 platform
fengyuentau Sep 12, 2023
c08dd61
tune performance for neon aarch64
fengyuentau Sep 12, 2023
c1406ca
tune for armv7
fengyuentau Sep 13, 2023
02718dc
comment time consuming tests
fengyuentau Sep 13, 2023
a0f7379
quick follow-up fix
fengyuentau Sep 13, 2023
1 change: 1 addition & 0 deletions modules/dnn/CMakeLists.txt
@@ -9,6 +9,7 @@ ocv_add_dispatched_file_force_all("int8layers/layers_common" AVX2 AVX512_SKX LAS
ocv_add_dispatched_file_force_all("layers/cpu_kernels/conv_block" AVX AVX2)
ocv_add_dispatched_file_force_all("layers/cpu_kernels/conv_depthwise" AVX AVX2 RVV LASX)
ocv_add_dispatched_file_force_all("layers/cpu_kernels/conv_winograd_f63" AVX AVX2)
ocv_add_dispatched_file_force_all("layers/cpu_kernels/fast_gemm_kernels" AVX AVX2 NEON LASX)
Contributor (review comment on the NEON dispatch entry):
We usually don't use runtime dispatching with NEON (it doesn't work due to the different ABI); the whole library is compiled with NEON instead.

Member Author:
Fixing via #24315


ocv_add_module(dnn opencv_core opencv_imgproc WRAP python java objc js)
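
The reviewer's comment above concerns how the NEON path gets selected. As a minimal illustrative sketch (not part of this PR): on ARM the whole library is built with NEON, so a kernel is typically guarded by the existing CV_NEON compile-time macro rather than routed through the runtime dispatcher. The kernel name and body below are hypothetical and only show the selection mechanism.

#include <opencv2/core/cvdef.h>   // defines CV_NEON as 0 or 1 at compile time
#if CV_NEON
#include <arm_neon.h>
#endif

// Illustrative row kernel: accumulate a length-k dot product into acc[0].
static void fastGemmRowSketch(const float* a, const float* b, float* acc, int k)
{
#if CV_NEON
    // NEON build: chosen at compile time because the library is compiled
    // with NEON on ARM; no per-function runtime dispatch is involved.
    float32x4_t s = vdupq_n_f32(0.f);
    int i = 0;
    for (; i + 4 <= k; i += 4)
        s = vmlaq_f32(s, vld1q_f32(a + i), vld1q_f32(b + i));
    float buf[4];
    vst1q_f32(buf, s);
    acc[0] += buf[0] + buf[1] + buf[2] + buf[3];
    for (; i < k; ++i)
        acc[0] += a[i] * b[i];
#else
    // Scalar fallback for builds without NEON.
    for (int i = 0; i < k; ++i)
        acc[0] += a[i] * b[i];
#endif
}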

10 changes: 10 additions & 0 deletions modules/dnn/include/opencv2/dnn/all_layers.hpp
@@ -1101,6 +1101,16 @@ CV__DNN_INLINE_NS_BEGIN
static Ptr<LayerNormLayer> create(const LayerParams& params);
};

class CV_EXPORTS GemmLayer : public Layer {
public:
bool trans_a;
bool trans_b;
float alpha;
float beta;

static Ptr<GemmLayer> create(const LayerParams& params);
};

//! @}
//! @}
CV__DNN_INLINE_NS_END
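
As a usage sketch of the new layer (based on the attribute names used by the perf test added in this PR, not an authoritative API reference): the layer computes Y = alpha * op(A) * op(B) + beta * C, where op() transposes its argument when the corresponding trans flag is set.

#include <opencv2/dnn.hpp>
#include <opencv2/dnn/all_layers.hpp>

// Minimal sketch: build a standalone Gemm layer from LayerParams.
// The helper name is hypothetical; the attribute names (transA, transB,
// alpha, beta) follow the ONNX Gemm operator and the perf test below.
static cv::Ptr<cv::dnn::GemmLayer> makeGemmLayer(bool transA, bool transB,
                                                 float alpha, float beta)
{
    cv::dnn::LayerParams lp;
    lp.type = "Gemm";
    lp.name = "gemm_example";
    lp.set("transA", transA);   // Y = alpha * op(A) * op(B) + beta * C
    lp.set("transB", transB);
    lp.set("alpha", alpha);
    lp.set("beta", beta);
    return cv::dnn::GemmLayer::create(lp);
}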
249 changes: 249 additions & 0 deletions modules/dnn/perf/perf_gemm.cpp
@@ -0,0 +1,249 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.

#include "perf_precomp.hpp"
#include <opencv2/dnn/shape_utils.hpp>

namespace opencv_test {

struct GemmParam_t {
std::vector<int> a_shape;
std::vector<int> b_shape;
std::vector<int> c_shape;
bool trans_a;
bool trans_b;

GemmParam_t(std::vector<int> a_shape_, std::vector<int> b_shape_, std::vector<int> c_shape_ = {}, bool trans_a_ = false, bool trans_b_ = false)
: a_shape(a_shape_), b_shape(b_shape_), c_shape(c_shape_), trans_a(trans_a_), trans_b(trans_b_) {}
};

// TODO: Disable most of the test cases except vision transformers to save time
static const GemmParam_t test_gemm_configs[] = {
// vision transformers cases
{ { 768, 768 }, { 768, 768 }, { 768 } },
{ { 1024, 1024 }, { 1024, 1024 }, { 1024 } },
{ { 50, 768 }, { 768, 2304 } },
{ { 197, 768 }, { 768, 2304 } },
{ { 50, 1024 }, { 1024, 3072 } },
{ { 197, 1024 }, { 1024, 3072 } },

// square mat
{ { 64, 64 }, { 64, 64 } },
{ { 128, 128 }, { 128, 128 } },
{ { 256, 256 }, { 256, 256 } },
{ { 512, 512 }, { 512, 512 } },
{ { 1024, 1024 }, { 1024, 1024 } },
{ { 4096, 4096 }, { 4096, 4096 } },

// rectangular mat
{ { 256, 256 }, { 256, 1024 } },
{ { 256, 1024 }, { 1024, 256 } },
{ { 256, 1024 }, { 1024, 1024 } },
{ { 1024, 1024 }, { 1024, 256 } },
{ { 1024, 256 }, { 256, 1024 } },
{ { 1024, 256 }, { 256, 256 } },

// with C
{ { 256, 256 }, { 256, 256 }, { 256 } },
{ { 256, 256 }, { 256, 1024 }, { 1024 } },
{ { 256, 1024 }, { 1024, 256 }, { 256 } },
{ { 256, 1024 }, { 1024, 1024 }, { 1024 } },
// { { 1024, 1024 }, { 1024, 1024 }, { 1024 } },
{ { 1024, 1024 }, { 1024, 256 }, { 256 } },
{ { 1024, 256 }, { 256, 1024 }, { 1024 } },
{ { 1024, 256 }, { 256, 256 }, { 256 } },

// with C and trans_b
{ { 256, 256 }, { 256, 256 }, { 256 } , false, true},
{ { 256, 1024 }, { 256, 1024 }, { 256 } , false, true},
{ { 256, 1024 }, { 1024, 1024 }, { 1024 } , false, true},
{ { 1024, 1024 }, { 1024, 1024 }, { 1024 } , false, true},
{ { 1024, 256 }, { 1024, 256 }, { 1024 } , false, true},
{ { 1024, 256 }, { 256, 256 }, { 256 } , false, true},

// with C and trans_b and trans_a
{ { 256, 256 }, { 256, 256 }, { 256 } , true, true},
{ { 1024, 256 }, { 256, 1024 }, { 256 } , true, true},
{ { 256, 1024 }, { 1024, 256 }, { 1024 } , true, true},
{ { 1024, 1024 }, { 1024, 1024 }, { 1024 } , true, true},
};

struct GemmParamId
{
enum {
GEMM_0 = 0,
GEMM_LAST = sizeof(test_gemm_configs) / sizeof(test_gemm_configs[0])
};
int val_;
GemmParamId(int val = 0) : val_(val) {}
operator int() const { return val_; }
static ::testing::internal::ParamGenerator<GemmParamId> all()
{
enum { NUM = (int)GEMM_LAST };
GemmParamId v_[NUM]; for (int i = 0; i < NUM; ++i) { v_[i] = GemmParamId(i); } // reduce generated code size
return ::testing::ValuesIn(v_, v_ + NUM);
}
};

static inline void PrintTo(const GemmParamId& v, std::ostream* os)
{
CV_Assert((int)v >= 0); CV_Assert((int)v < GemmParamId::GEMM_LAST);
const GemmParam_t& p = test_gemm_configs[(int)v];

auto print_shape = [os](const std::vector<int>& shape, const std::string tag) {
if (shape.empty()) {
return ;
}

*os << tag << "=[";
for (size_t i = 0; i < shape.size(); ++i) {
if (i == shape.size() - 1) {
*os << shape[i] << "]";
break;
}
*os << shape[i] << ", ";
}
};

print_shape(p.a_shape, "A");
print_shape(p.b_shape, ", B");
print_shape(p.c_shape, ", C");
*os << ", trans_a=" << p.trans_a << ", trans_b=" << p.trans_b;
}

typedef tuple<GemmParamId, tuple<Backend, Target> > GemmTestParam_t;
typedef TestBaseWithParam<GemmTestParam_t> Gemm;

PERF_TEST_P_(Gemm, gemm)
{
int test_id = (int)get<0>(GetParam());
ASSERT_GE(test_id, 0); ASSERT_LT(test_id, GemmParamId::GEMM_LAST);
const GemmParam_t& params = test_gemm_configs[test_id];
auto a_shape = params.a_shape;
auto b_shape = params.b_shape;
auto c_shape = params.c_shape;
auto trans_a = params.trans_a;
auto trans_b = params.trans_b;
float alpha = 1.f;
float beta = 1.f;

Backend backend_id = get<0>(get<1>(GetParam()));
Target target_id = get<1>(get<1>(GetParam()));

bool have_bias = c_shape.empty() ? false : true;

Mat A(static_cast<int>(a_shape.size()), a_shape.data(), CV_32F);
randu(A, -1.0f, 1.0f);
Mat B(static_cast<int>(b_shape.size()), b_shape.data(), CV_32F);
randu(B, -1.0f, 1.0f);

LayerParams lp;
lp.type = "Gemm";
lp.name = "testLayer";
lp.set("transA", trans_a);
lp.set("transB", trans_b);
lp.set("alpha", alpha);
lp.set("beta", beta);
lp.set("real_ndims_C", static_cast<int>(c_shape.size()));

lp.set("constB", true);
lp.blobs.push_back(B);
if (have_bias) {
Mat C(static_cast<int>(c_shape.size()), c_shape.data(), CV_32F);
randu(C, -1.0f, 1.0f);
lp.set("have_bias", true);
lp.set("constC", true);
lp.blobs.push_back(C);
}

Net net;
int id = net.addLayerToPrev(lp.name, lp.type, lp);
net.connect(0, 0, id, 0);
net.setPreferableBackend(backend_id);
net.setPreferableTarget(target_id);

// warmup
{
net.setInput(A);
Mat out = net.forward();
}

TEST_CYCLE()
{
Mat res = net.forward();
}

SANITY_CHECK_NOTHING();
}

PERF_TEST_P_(Gemm, innerproduct)
{
int test_id = (int)get<0>(GetParam());
ASSERT_GE(test_id, 0); ASSERT_LT(test_id, GemmParamId::GEMM_LAST);
const GemmParam_t& params = test_gemm_configs[test_id];
auto a_shape = params.a_shape;
auto b_shape = params.b_shape;
auto c_shape = params.c_shape;
auto trans_a = params.trans_a;
auto trans_b = params.trans_b;

Backend backend_id = get<0>(get<1>(GetParam()));
Target target_id = get<1>(get<1>(GetParam()));

bool have_bias = c_shape.empty() ? false : true;

Mat A(static_cast<int>(a_shape.size()), a_shape.data(), CV_32F);
randu(A, -1.0f, 1.0f);
Mat B(static_cast<int>(b_shape.size()), b_shape.data(), CV_32F);
randu(B, -1.0f, 1.0f);

LayerParams lp;
lp.type = "InnerProduct";
lp.name = "testLayer";
if (trans_a) {
cv::transpose(A, A);
}
if (!trans_b) {
cv::transpose(B, B);
}
lp.blobs.push_back(B);
lp.set("num_output", B.size[0]);
if (have_bias) {
Mat C(static_cast<int>(c_shape.size()), c_shape.data(), CV_32F);
randu(C, -1.0f, 1.0f);
lp.blobs.push_back(C);
lp.set("bias_term", true);
} else {
lp.set("bias_term", false);
}

Net net;
int id = net.addLayerToPrev(lp.name, lp.type, lp);
net.connect(0, 0, id, 0);
net.setPreferableBackend(backend_id);
net.setPreferableTarget(target_id);

// warmup
{
std::vector<std::string> input_names(2);
input_names[0] = "A";
net.setInputsNames(input_names);
net.setInput(A, input_names[0]);
Mat out = net.forward();
}

TEST_CYCLE()
{
Mat res = net.forward();
}

SANITY_CHECK_NOTHING();
}

INSTANTIATE_TEST_CASE_P(/**/, Gemm, Combine(
GemmParamId::all(),
dnnBackendsAndTargets(false, false) // defined in ../test/test_common.hpp
));

} // namespace
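
With these cases in place, the Gemm path can be compared head-to-head against the old InnerProduct path by running the dnn performance binary with a gtest filter such as --gtest_filter=*Gemm* (the binary is typically opencv_perf_dnn; exact names depend on the local build and are an assumption here).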
1 change: 1 addition & 0 deletions modules/dnn/src/init.cpp
@@ -101,6 +101,7 @@ void initializeLayerFactory()
CV_DNN_REGISTER_LAYER_CLASS(Reduce, ReduceLayer);
CV_DNN_REGISTER_LAYER_CLASS(LRN, LRNLayer);
CV_DNN_REGISTER_LAYER_CLASS(InnerProduct, InnerProductLayer);
CV_DNN_REGISTER_LAYER_CLASS(Gemm, GemmLayer);
CV_DNN_REGISTER_LAYER_CLASS(Softmax, SoftmaxLayer);
CV_DNN_REGISTER_LAYER_CLASS(SoftMax, SoftmaxLayer); // For compatibility. See https://github.com/opencv/opencv/issues/16877
CV_DNN_REGISTER_LAYER_CLASS(MVN, MVNLayer);