Custom CMAKE_ARGS are overwritten in the "backend/cpp/llama/grpc-server" target for the "llama-cpp" backend #1317

Closed
countzero opened this issue Nov 22, 2023 · 0 comments · Fixed by #1334
countzero commented Nov 22, 2023

LocalAI version:
https://github.com/mudler/LocalAI/tree/763f94ca80827981d0b5e5e41ee6a21fec5f5f67

Environment, CPU architecture, OS, and Version:
Linux 9a4562508d46 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 GNU/Linux
The build is executed in a Docker container based on golang:1.21-bookworm from https://hub.docker.com/_/golang

Describe the bug
All CMAKE_ARGS from the environment are overwritten in the Makefile target backend/cpp/llama/grpc-server, which sets the variable to only the internally added arguments:

CMAKE_ARGS="${ADDED_CMAKE_ARGS}" LLAMA_VERSION=$(CPPLLAMA_VERSION) $(MAKE) -C backend/cpp/llama grpc-server

To Reproduce

Build the following Dockerfile:

FROM golang:1.21-bookworm

LABEL maintainer="stadt.werk GmbH <info@stadtwerk.org>"

SHELL ["/bin/bash", "-c"]

WORKDIR /opt/stadtwerk

RUN apt-get update && \
    apt-get install --yes \
        ca-certificates \
        cmake \
        curl \
        git \
        patch \
        pip \
        software-properties-common && \
    apt-get clean

RUN apt-add-repository contrib && \
    curl -O https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i cuda-keyring_1.1-1_all.deb && \
    rm -f cuda-keyring_1.1-1_all.deb && \
    apt-get update && \
    apt-get install --yes \
        cuda-nvcc-12-3 \
        libcublas-dev-12-3 \
        libcusparse-dev-12-3 \
        libcusolver-dev-12-3 && \
    apt-get clean

RUN git clone https://github.com/mudler/LocalAI && \
    git -C ./LocalAI checkout "763f94c"

WORKDIR /opt/stadtwerk/LocalAI

RUN CMAKE_ARGS="-DLLAMA_NATIVE=OFF" \
    BUILD_GRPC_FOR_BACKEND_LLAMA="ON" \
    GRPC_BACKENDS="backend-assets/grpc/llama-cpp" \
    make BUILD_TYPE="cublas" CUDACXX="/usr/local/cuda/bin/nvcc" GO_TAGS="" build

HEALTHCHECK --interval=1m --timeout=10m --retries=10 \
    CMD curl --fail http://localhost:8080/readyz || exit 1

EXPOSE 8080

ENTRYPOINT ["./local-ai", "--debug"]

Expected behavior

The build should pass CMAKE_ARGS="-DLLAMA_NATIVE=OFF" on to the cmake invocation of the grpc-server backend.
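For comparison with the log below, a successful configure step would then be expected to contain the flag in the cmake call, roughly like this (the exact position of the flag depends on how the Makefile assembles CMAKE_ARGS; the other arguments are elided here):

cd llama.cpp && mkdir -p build && cd build && cmake .. ... -DLLAMA_NATIVE=OFF -DLLAMA_CUBLAS=ON && cmake --build . --config Release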

Logs

521.8 cd llama.cpp && mkdir -p build && cd build && cmake .. -Dabsl_DIR=/opt/stadtwerk/LocalAI/backend/cpp/grpc/installed_packages/lib/cmake/absl -DProtobuf_DIR=/opt/stadtwerk/LocalAI/backend/cpp/grpc/installed_packages/lib/cmake/protobuf -Dutf8_range_DIR=/opt/stadtwerk/LocalAI/backend/cpp/grpc/installed_packages/lib/cmake/utf8_range -DgRPC_DIR=/opt/stadtwerk/LocalAI/backend/cpp/grpc/installed_packages/lib/cmake/grpc -DCMAKE_CXX_STANDARD_INCLUDE_DIRECTORIES=/opt/stadtwerk/LocalAI/backend/cpp/grpc/installed_packages/include -DLLAMA_CUBLAS=ON && cmake --build . --config Release
521.9 -- The C compiler identification is GNU 12.2.0
521.9 -- The CXX compiler identification is GNU 12.2.0
521.9 -- Detecting C compiler ABI info
522.0 -- Detecting C compiler ABI info - done
522.0 -- Check for working C compiler: /usr/bin/cc - skipped
522.0 -- Detecting C compile features
522.0 -- Detecting C compile features - done
522.0 -- Detecting CXX compiler ABI info
522.1 -- Detecting CXX compiler ABI info - done
522.1 -- Check for working CXX compiler: /usr/bin/c++ - skipped
522.1 -- Detecting CXX compile features
522.1 -- Detecting CXX compile features - done
522.1 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
522.1 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
522.1 -- Found Threads: TRUE
522.1 -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.3.103")
522.2 -- cuBLAS found
522.9 -- The CUDA compiler identification is NVIDIA 12.3.103
522.9 -- Detecting CUDA compiler ABI info
523.6 -- Detecting CUDA compiler ABI info - done
523.7 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
523.7 -- Detecting CUDA compile features
523.7 -- Detecting CUDA compile features - done
523.7 -- Using CUDA architectures: 52;61;70
523.7 GNU ld (GNU Binutils for Debian) 2.40
523.7 -- CMAKE_SYSTEM_PROCESSOR: x86_64
523.7 -- x86 detected
523.7 -- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.13")
523.7 -- Using protobuf version 24.3.0 | Protobuf_INCLUDE_DIRS:  | CMAKE_CURRENT_BINARY_DIR: /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build/examples/grpc-server
523.7 -- Configuring done
523.8 -- Generating done
523.9 -- Build files have been written to: /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build
523.9 gmake[2]: Entering directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
523.9 gmake[3]: Entering directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
523.9 gmake[4]: Entering directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
523.9 gmake[4]: Leaving directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
523.9 gmake[4]: Entering directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
523.9 [  1%] Building C object CMakeFiles/ggml.dir/ggml.c.o
526.9 In function 'ggml_op_name',
526.9     inlined from 'ggml_get_n_tasks' at /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml.c:15698:17:
526.9 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml.c:2019:24: warning: array subscript 69 is above array bounds of 'const char *[68]' [-Warray-bounds]
526.9  2019 |     return GGML_OP_NAME[op];
526.9       |            ~~~~~~~~~~~~^~~~
526.9 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml.c: In function 'ggml_get_n_tasks':
526.9 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml.c:1589:21: note: while referencing 'GGML_OP_NAME'
526.9  1589 | static const char * GGML_OP_NAME[GGML_OP_COUNT] = {
526.9       |                     ^~~~~~~~~~~~
530.4 [  2%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
530.8 [  3%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
531.4 [  4%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
531.7 In file included from /usr/lib/gcc/x86_64-linux-gnu/12/include/immintrin.h:105,
531.7                  from /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-impl.h:74,
531.7                  from /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.h:3,
531.7                  from /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.c:1:
531.7 /usr/lib/gcc/x86_64-linux-gnu/12/include/fmaintrin.h: In function 'ggml_vec_dot_q4_0_q8_0':
531.7 /usr/lib/gcc/x86_64-linux-gnu/12/include/fmaintrin.h:63:1: error: inlining failed in call to 'always_inline' '_mm256_fmadd_ps': target specific option mismatch
531.7    63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
531.7       | ^~~~~~~~~~~~~~~
531.7 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.c:2520:15: note: called from here
531.7  2520 |         acc = _mm256_fmadd_ps( d, q, acc );
531.7       |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
531.7 /usr/lib/gcc/x86_64-linux-gnu/12/include/fmaintrin.h:63:1: error: inlining failed in call to 'always_inline' '_mm256_fmadd_ps': target specific option mismatch
531.7    63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
531.7       | ^~~~~~~~~~~~~~~
531.7 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.c:2520:15: note: called from here
531.7  2520 |         acc = _mm256_fmadd_ps( d, q, acc );
531.7       |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
531.7 /usr/lib/gcc/x86_64-linux-gnu/12/include/fmaintrin.h:63:1: error: inlining failed in call to 'always_inline' '_mm256_fmadd_ps': target specific option mismatch
531.7    63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
531.7       | ^~~~~~~~~~~~~~~
531.7 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.c:2520:15: note: called from here
531.7  2520 |         acc = _mm256_fmadd_ps( d, q, acc );
531.7       |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
531.7 /usr/lib/gcc/x86_64-linux-gnu/12/include/fmaintrin.h:63:1: error: inlining failed in call to 'always_inline' '_mm256_fmadd_ps': target specific option mismatch
531.7    63 | _mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C)
531.7       | ^~~~~~~~~~~~~~~
531.7 /opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/ggml-quants.c:2520:15: note: called from here
531.7  2520 |         acc = _mm256_fmadd_ps( d, q, acc );
531.7       |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
531.7 gmake[4]: *** [CMakeFiles/ggml.dir/build.make:118: CMakeFiles/ggml.dir/ggml-quants.c.o] Error 1
531.7 gmake[4]: Leaving directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
531.7 gmake[3]: *** [CMakeFiles/Makefile2:664: CMakeFiles/ggml.dir/all] Error 2
531.7 gmake[3]: Leaving directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
531.7 gmake[2]: Leaving directory '/opt/stadtwerk/LocalAI/backend/cpp/llama/llama.cpp/build'
531.7 gmake[2]: *** [Makefile:146: all] Error 2
531.7 make[1]: *** [Makefile:49: grpc-server] Error 2
531.7 make[1]: Leaving directory '/opt/stadtwerk/LocalAI/backend/cpp/llama'
531.7 make: *** [Makefile:417: backend/cpp/llama/grpc-server] Error 2

Additional context

As a quick workaround you can patch the missing CMAKE_ARGS back into the Makefile:

RUN sed -i 's/CMAKE_ARGS="${ADDED_CMAKE_ARGS}"/CMAKE_ARGS="${CMAKE_ARGS} ${ADDED_CMAKE_ARGS}"/g' Makefile
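Applied to the Dockerfile from the reproduction steps, the patch goes right before the build step, for example (a sketch based on the RUN instructions above):

RUN sed -i 's/CMAKE_ARGS="${ADDED_CMAKE_ARGS}"/CMAKE_ARGS="${CMAKE_ARGS} ${ADDED_CMAKE_ARGS}"/g' Makefile && \
    CMAKE_ARGS="-DLLAMA_NATIVE=OFF" \
    BUILD_GRPC_FOR_BACKEND_LLAMA="ON" \
    GRPC_BACKENDS="backend-assets/grpc/llama-cpp" \
    make BUILD_TYPE="cublas" CUDACXX="/usr/local/cuda/bin/nvcc" GO_TAGS="" build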

This also fixes #1196.

We have to use CMAKE_ARGS="-DLLAMA_NATIVE=OFF" to fix the "inlining failed in call to ‘always_inline’ ‘_mm256_cvtph_ps’" error described in ggerganov/llama.cpp#107.

countzero added the bug label Nov 22, 2023
mudler linked a pull request Nov 25, 2023 that will close this issue