Gemmer::sgemm not match caffe_cpu_gemm<float> completely #46

Closed

victorygogogo opened this issue Sep 27, 2017 · 11 comments

@victorygogogo

Gemmer::sgemm does not match caffe_cpu_gemm<float> completely. When you test a model, you can see that the loss increases. I then found that Gemmer::sgemm does not support transposed matrices.

Can Baidu fix this?

@cocodark
Contributor

Is the 'loss' you mentioned the accuracy loss caused by quantization?

@victorygogogo
Author

I mean that if an input matrix needs to be transposed, using this function gives a wrong result, which may make the loss bigger. Do you know this function from caffe, caffe_cpu_gemm?

```cpp
void caffe_cpu_gemm(const CBLAS_TRANSPOSE TransA,
                    const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
                    const float alpha, const float* A, const float* B, const float beta,
                    float* C) {
  LOG(INFO) << "Running for caffe_cpu_gemm trans start";
  // Leading dimension of the *stored* matrix: A is M x K when not
  // transposed, K x M when transposed (and likewise for B).
  int lda = (TransA == CblasNoTrans) ? K : M;
  int ldb = (TransB == CblasNoTrans) ? N : K;
  cblas_sgemm(CblasRowMajor, TransA, TransB, M, N, K, alpha, A, lda, B,
              ldb, beta, C, N);
}
```
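For reference, here is a naive sketch of what those transpose flags mean (illustration only: sgemm_ref is a made-up name, and the bools stand in for the CBLAS_TRANSPOSE enum):

```cpp
// Naive row-major sgemm illustrating the transpose semantics that
// cblas_sgemm provides: C = alpha * op(A) * op(B) + beta * C,
// where op(X) is X or its transpose depending on the flag.
void sgemm_ref(bool transA, bool transB, int M, int N, int K,
               float alpha, const float *A, const float *B,
               float beta, float *C) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                // op(A) is M x K, op(B) is K x N; the indexing follows the
                // same leading-dimension rule as in caffe_cpu_gemm above.
                float a = transA ? A[k * M + i] : A[i * K + k];
                float b = transB ? B[j * K + k] : B[k * N + j];
                acc += a * b;
            }
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```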

@cocodark
Contributor

Indeed, our Gemmer does not support transposing matrices; we put the transpose step in the model-conversion stage in order to accelerate matrix manipulation. @victorygogogo

@cocodark
Contributor

```cpp
/**
 * transpose matrix in advance
 * @param data
 * @param shape
 * @return
 */
float *trans_matrix(const float *data, vector<int> shape) {
    int m = shape[0];
    int n = shape[1];

    float *trans = new float[m * n];

    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) {
            trans[i * m + j] = data[j * n + i];
        }
    }
    return trans;
}
```

in caffe2mdl.cpp @victorygogogo
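For instance, a hypothetical conversion-time usage (only trans_matrix above is from caffe2mdl.cpp; convert_weights and the shape here are made up):

```cpp
#include <vector>
using std::vector;

// Hypothetical sketch: pre-transpose a k x m weight matrix once during
// model conversion, so the runtime gemm never needs a transpose flag.
// trans_matrix allocates with new[], so the converter owns the buffer
// and must eventually delete[] it.
float *convert_weights(const float *weights, int k, int m) {
    vector<int> shape = {k, m};  // stored shape before transpose
    // The result is laid out m x k, row-major; at inference time
    // Gemmer::sgemm can use it directly, matching what a
    // caffe_cpu_gemm(CblasTrans, ...) call would compute on the original.
    return trans_matrix(weights, shape);
}
```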

@victorygogogo
Author

OK, thank you.

@cocodark
Contributor

You're welcome.

@victorygogogo
Author

I found that when I use this function, it is slower than the OpenBLAS gemm. By the way, I am using NEON on a phone.

@cocodark
Contributor

You found that our gemm is slower than OpenBLAS?

@wangshankun

OpenBLAS takes into account cache control and reuse, work sharing, and multithreading; those three factors speed up its gemm.
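A rough sketch of two of those tricks, cache blocking and multithreading (illustration only; real OpenBLAS kernels also pack panels and use hand-tuned assembly, and the tile sizes here are placeholders to be tuned per cache):

```cpp
#include <algorithm>

// Cache blocking: work on tiles small enough to stay in cache, so each
// loaded element of A and B is reused while hot. Multithreading: the
// OpenMP pragma splits the (i0, j0) tiles of C across threads; each
// thread owns distinct tiles, so there are no races. Compile with
// -fopenmp. C must be zero-initialized by the caller.
void sgemm_blocked(int M, int N, int K,
                   const float *A, const float *B, float *C) {
    const int BM = 64, BN = 64, BK = 64;  // tile sizes (tune per cache)
    #pragma omp parallel for collapse(2)
    for (int i0 = 0; i0 < M; i0 += BM) {
        for (int j0 = 0; j0 < N; j0 += BN) {
            for (int k0 = 0; k0 < K; k0 += BK) {
                const int iMax = std::min(i0 + BM, M);
                const int jMax = std::min(j0 + BN, N);
                const int kMax = std::min(k0 + BK, K);
                for (int i = i0; i < iMax; ++i)
                    for (int k = k0; k < kMax; ++k) {
                        const float a = A[i * K + k];
                        for (int j = j0; j < jMax; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
        }
    }
}
```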

@cocodark reopened this Sep 29, 2017
@victorygogogo
Author

@cocodark Yes, I use NEON, but some matrices need to be transposed, so I did the transpose inside this function to match cblas_sgemm. Even without the transpose, 30 runs cost about 6.7 s; with cblas_sgemm it costs about 2.7 s.
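For context, a comparison along those lines can be reproduced with a minimal harness like this (a sketch; run_gemm is a placeholder for invoking either Gemmer::sgemm or cblas_sgemm on identical inputs):

```cpp
#include <chrono>
#include <cstdio>

// Time 30 back-to-back calls of a gemm implementation, as in the
// measurement above, and return the elapsed wall-clock seconds.
template <typename F>
double seconds_for_30_runs(F run_gemm) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 30; ++i) run_gemm();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    // Replace the lambda body with the real gemm call under test.
    double s = seconds_for_30_runs([] { /* gemm call here */ });
    std::printf("30 runs took %.2f s\n", s);
}
```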

@cocodark
Contributor

cocodark commented Sep 30, 2017

@victorygogogo, excellent research work. Currently our gemm is accelerated with NEON; we'll try the tricks mentioned by @wangshankun, such as cache control & reuse and multithreading, to make it faster. If you are interested in gemm optimization work, code contributions will be appreciated.
