
adding support for linux binaries #5106

Open
DarkReaperBoy opened this issue Jan 24, 2024 · 9 comments

@DarkReaperBoy

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

(Yup, I searched "linux binary" and "linux binaries" and didn't find a single existing issue.)

Feature Description

As I said in the title, it would be great if llama.cpp officially provided automated builds at
https://github.com/ggerganov/llama.cpp/releases
because I run into compile issues and I'm not a programmer. It would be great to have an official binary, even if it's 343.2 MiB (about 360 MB) like kobold.cpp's.

Motivation

A lot of compile errors and Discord calls, plus the fact that kobold.cpp already ships binaries. As for why it's "necessary": newbies like me could use llama.cpp too, and kobold.cpp is slow to update and can't use the mainline features.

Possible Implementation

Maybe this would be useful?
https://github.com/Nexesenex/kobold.cpp/blob/concedo_experimental/.github/workflows/kcpp-build-release-linux.yaml
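
For illustration, the build-and-package steps such a workflow would run could be fairly small, something like the sketch below (untested; the CMake option names are assumptions on my part, not taken from the linked file):

```sh
# Sketch of the steps a Linux release job might run. LLAMA_NATIVE is an assumed
# CMake option used here to avoid -march=native so the binary stays portable.
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_NATIVE=OFF
cmake --build build --config Release
tar -czf llama-bin-linux-x86_64.tar.gz -C build/bin .   # archive to attach to the GitHub release
```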

Overall, feel free to close and ignore this issue if it isn't a priority, but it hugely sucks.

@DarkReaperBoy DarkReaperBoy added the enhancement New feature or request label Jan 24, 2024
@supportend

supportend commented Jan 24, 2024

because i get compile issues

What issues?

@Titaniumtown

Titaniumtown commented Jan 24, 2024

It compiles fine for me; what is your build environment? The CI tests whether it compiles on Linux, and it does, so the failure is more likely specific to your setup.

Edit: try using zig!
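
(For reference, the zig route in the README at the time boiled down to roughly this one-liner; treat it as a sketch and check the current instructions:)

```sh
zig build -Doptimize=ReleaseFast   # uses the repo's build.zig; needs a recent zig toolchain
```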

@DarkReaperBoy
Author

DarkReaperBoy commented Jan 26, 2024

because i get compile issues

What issues?

I llama.cpp build info: 
I UNAME_S:   Linux
I UNAME_P:   x86_64
I UNAME_M:   x86_64
I CFLAGS:    -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_OPENBLAS -I/usr/include/openblas  -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -march=native -mtune=native  -Wdouble-promotion 
I CXXFLAGS:  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_OPENBLAS -I/usr/include/openblas  -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi
I NVCCFLAGS: -use_fast_math --forward-unknown-to-host-compiler -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 
I LDFLAGS:   -L/usr/lib64/openblas-openmp -lopenblas  -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib -L/usr/local/cuda/targets/aarch64-linux/lib -L/usr/lib/wsl/lib 
I CC:        cc (conda-forge gcc 13.2.0-3) 13.2.0
I CXX:       g++ (conda-forge gcc 13.2.0-3) 13.2.0

cc  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_OPENBLAS -I/usr/include/openblas  -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -march=native -mtune=native  -Wdouble-promotion    -c ggml.c -o ggml.o
ggml.c: In function 'ggml_init':
ggml.c:2293:9: error: implicit declaration of function 'ggml_init_cublas'; did you mean 'ggml_cpu_has_cublas'? [-Werror=implicit-function-declaration]
 2293 |         ggml_init_cublas();
      |         ^~~~~~~~~~~~~~~~
      |         ggml_cpu_has_cublas
ggml.c: In function 'ggml_compute_forward':
ggml.c:14680:21: error: implicit declaration of function 'ggml_cuda_compute_forward'; did you mean 'ggml_compute_forward'? [-Werror=implicit-function-declaration]
14680 |     bool skip_cpu = ggml_cuda_compute_forward(params, tensor);
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
      |                     ggml_compute_forward
cc1: some warnings being treated as errors
make: *** [Makefile:539: ggml.o] Error 1

The error varies at times. My environment is conda with CUDA and gcc=11 installed, but that isn't the point. Also @Titaniumtown, I'm not a programmer and don't know zig; I want proper binary support ;-;

@jboero
Contributor

jboero commented Jan 27, 2024

I package RPMs using Fedora COPR for a few architectures and platforms (Fedora, CentOS, Amazon Linux, etc.) on x86_64, aarch64, etc. These are basic builds without AVX512 or some other optimizations. I use the build date as the version because code tags/hashes aren't ideal and don't sort well.

https://copr.fedorainfracloud.org/coprs/boeroboy/brynzai/monitor/

Because COPR only builds open-source software, I can't really pre-build the cuBLAS releases (CUDA has a proprietary license), but they can still be built locally. I've been pushing for these to be mainlined in Fedora. What is your OS?
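
On Fedora and friends, installing from that repo would look roughly like this (the package name below is a guess, not confirmed; check the COPR page for the real one):

```sh
sudo dnf copr enable boeroboy/brynzai   # enable the COPR repository (needs dnf-plugins-core)
sudo dnf install llama.cpp              # hypothetical package name; see the repo page
```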

I also notice that OpenBLAS builds on Fedora aren't straightforward: pkg-config --cflags-only-I openblas can't find the OpenBLAS headers with the standard openblas-devel packaging, so it has to be built manually by adding /usr/include/openblas to the include path, etc.
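
A rough workaround along those lines, assuming Fedora's openblas-devel layout and the Makefile's LLAMA_OPENBLAS flag at the time:

```sh
pkg-config --cflags-only-I openblas   # may print nothing on Fedora
# Point the compiler at the headers explicitly, then build with OpenBLAS enabled:
export C_INCLUDE_PATH=/usr/include/openblas${C_INCLUDE_PATH:+:$C_INCLUDE_PATH}
export CPLUS_INCLUDE_PATH=/usr/include/openblas${CPLUS_INCLUDE_PATH:+:$CPLUS_INCLUDE_PATH}
make LLAMA_OPENBLAS=1
```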

Then again, I get much slower inference with OpenBLAS than with a standard CPU build compiled with LLAMA_FAST optimizations. Maybe OpenBLAS isn't optimized for, or doesn't properly use, my 11th-gen Intel CPU.

What Linux distros are you looking for? I don't have .deb packages. What I would recommend is a generic statically linked binary in the releases; that way it should work on most Linux distros regardless of packaging. Again, CUDA still can't be statically linked, so this option would only support CPU, CLBlast, or OpenBLAS.
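
Something like this could produce that kind of portable CPU-only binary (a sketch that assumes the LLAMA_STATIC and LLAMA_NATIVE CMake options present in the tree at the time):

```sh
# Static link, no shared libs, and no -march=native so the binary runs on most x86_64 machines:
cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DLLAMA_STATIC=ON -DLLAMA_NATIVE=OFF
cmake --build build --config Release
```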

@jboero
Contributor

jboero commented Jan 27, 2024

(Quoting DarkReaperBoy's comment and build log above.)

It looks like you're enabling both LLAMA_OPENBLAS and LLAMA_CUBLAS, which is probably your problem. Remove one of them and try again? (Environment variables, possibly?)
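
For example, something along these lines should give a clean single-backend build (a sketch; it assumes the flags were exported or passed as make variables):

```sh
unset LLAMA_OPENBLAS LLAMA_CUBLAS   # clear any backend flags lingering in the environment
make clean                          # drop objects built with the old flag mix
make LLAMA_CUBLAS=1 -j"$(nproc)"    # CUDA-only build; use LLAMA_OPENBLAS=1 instead for a CPU/BLAS build
```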

@ngxson
Collaborator

ngxson commented Jan 28, 2024

FYI, if you're not familiar with compiling, just use Docker.

Install Docker with one command: https://gist.github.com/zulhfreelancer/254c4a157c586dd232c1a51db0f6eac3

Then head to the "Docker" section of the README: https://github.com/ggerganov/llama.cpp#docker
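
That section boils down to a command along these lines (the image tag and flags may have changed since; treat this as a sketch):

```sh
# Run the full image with your models directory mounted; no local compilation needed.
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full \
    --run -m /models/7B/ggml-model-q4_0.gguf \
    -p "Building a website can be done in 10 simple steps:" -n 512
```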

@julien-c
Contributor

Came here to +1 this.

@Vaibhavs10
Collaborator

Came here to +1, too - especially for Mac!

@github-actions github-actions bot added the stale label Apr 15, 2024
@FishHawk

For cloud servers that charge by the second, using a precompiled llama.cpp saves compilation time, which means saving money.

@github-actions github-actions bot removed the stale label Apr 30, 2024
@github-actions github-actions bot added the stale label May 30, 2024