Use a BLAS lib & CLBlast for CPU & GPU speedups. [Enhancement] #7
Comments
Yeah... I definitely want to keep the simplicity of the repo. I'll take a look.
If CPU (BLAS) and GPU (CLBlast) speedups are applied, I'd be very interested in benchmarking the program with differently quantized models on different edge devices, such as the RK3588, the Nvidia Jetson Orin Nano, or even Android phones (e.g. Qualcomm Snapdragon 7/8).
So I tried using cblas:
#include <math.h>
#include <cblas.h>
/* a += b: saxpy with alpha = 1 and unit strides (the strides are ints) */
void accum(float *a, float *b, int size) {
    cblas_saxpy(size, 1.0f, b, 1, a, 1);
}
void rmsnorm(float* o, float* x, float* weight, int size) {
    /* sum of squares as a dot product of x with itself */
    float ss = cblas_sdot(size, x, 1, x, 1);
    ss /= size;
    ss += 1e-5f;
    ss = 1.0f / sqrtf(ss);
    /* normalize and scale */
    for (int j = 0; j < size; j++) {
        o[j] = weight[j] * (ss * x[j]);
    }
}
/* W (d,n) @ x (n,) -> xout (d,), with W stored row-major */
void matmul(float* xout, float* x, float* w, int n, int d) {
    cblas_sgemv(CblasRowMajor, CblasNoTrans, d, n, 1.0f, w, n, x, 1, 0.0f, xout, 1);
}
Gives a decent speedup.
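For anyone checking a BLAS build against the plain loops, here is a minimal reference sketch of what the `cblas_sgemv` and `cblas_saxpy` calls above compute. The names `matmul_ref`, `accum_ref` and `max_abs_diff` are made up for this example and are not part of the repo or of any BLAS library.

```c
/* Plain-C reference for:
 *   cblas_sgemv(CblasRowMajor, CblasNoTrans, d, n, 1.0f, w, n, x, 1, 0.0f, xout, 1)
 * w holds d rows of n floats (row-major), so xout[i] = sum_j w[i*n + j] * x[j]. */
void matmul_ref(float *xout, const float *x, const float *w, int n, int d) {
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}

/* Plain-C reference for cblas_saxpy(size, 1.0f, b, 1, a, 1): a[i] += b[i]. */
void accum_ref(float *a, const float *b, int size) {
    for (int i = 0; i < size; i++) {
        a[i] += b[i];
    }
}

/* Largest element-wise absolute difference between two vectors (hypothetical helper). */
float max_abs_diff(const float *p, const float *q, int size) {
    float worst = 0.0f;
    for (int i = 0; i < size; i++) {
        float diff = p[i] - q[i];
        if (diff < 0.0f) diff = -diff;
        if (diff > worst) worst = diff;
    }
    return worst;
}
```

Running both versions on the same inputs and comparing with `max_abs_diff` should give a small but not necessarily zero difference, since a BLAS library may sum in a different order than the loop.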
Added BLAS support:
+ OpenBLAS
+ CLBlast (GPU)
CLBlast is considerably slower. Needs investigation.
Added APE binary prompt support.
Usage:
APE run: $ run.com
Bare-metal boot: $ qemu-system-x86_64 -serial stdio -hda run.com (input is broken on bare metal)
Updated Makefile.
Usage:
make runopenblas
make runclblast
Available in a separate fork. Thanks for the discussion. Closing.
@karpathy I was wondering whether you'd consider using a BLAS lib to speed up compute so that larger models may work.
If so, please also add an option to compile with CLBlast (a compatible drop-in BLAS) so that compute can be offloaded to the GPU via OpenCL.
https://www.netlib.org/blas/#_reference_blas_version_3_11_0
https://www.openblas.net/
https://github.com/CNugteren/CLBlast
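One way such an optional build could look, as a rough sketch: the BLAS call sits behind a compile-time switch so the default build keeps the plain loop and has no extra dependency. The macro name `USE_BLAS` is an assumption for illustration, not something the repo defines.

```c
/* USE_BLAS is a hypothetical macro, e.g. passed on the command line as -DUSE_BLAS. */
#ifdef USE_BLAS
#include <cblas.h>
#endif

/* W (d,n) @ x (n,) -> xout (d,), with W stored row-major. */
void matmul(float *xout, float *x, float *w, int n, int d) {
#ifdef USE_BLAS
    /* BLAS path: single-precision matrix-vector product, alpha = 1, beta = 0. */
    cblas_sgemv(CblasRowMajor, CblasNoTrans, d, n, 1.0f, w, n, x, 1, 0.0f, xout, 1);
#else
    /* Default path: plain loop, no external library needed. */
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
#endif
}
```

With OpenBLAS installed, a build along the lines of `cc -O3 -DUSE_BLAS your_file.c -lopenblas -lm` would take the BLAS path (the file name is a placeholder); without the flag, the code compiles as before.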