Error on Macbook Pro with M1 and 64GB (docker compose -f docker-compose-70b.yml up -d) #13

Closed
itamargero opened this issue Aug 17, 2023 · 2 comments

@itamargero
Using /usr/local/lib/python3.11/site-packages
Finished processing dependencies for llama-cpp-python==0.1.77

Initializing server with:
Batch size: 2096
Number of CPU threads: 8
Number of GPU layers: 0
Context window: 4096

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/app/llama_cpp/server/__main__.py", line 46, in <module>
    app = create_app(settings=settings)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/llama_cpp/server/app.py", line 313, in create_app
    llama = llama_cpp.Llama(
            ^^^^^^^^^^^^^^^^
  File "/app/llama_cpp/llama.py", line 313, in __init__
    assert self.model is not None
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Exception ignored in: <function Llama.__del__ at 0xffff863f6200>
Traceback (most recent call last):
  File "/app/llama_cpp/llama.py", line 1510, in __del__
    if self.ctx is not None:
       ^^^^^^^^
AttributeError: 'Llama' object has no attribute 'ctx'

/usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

    ********************************************************************************
    Please avoid running ``setup.py`` and ``easy_install``.
    Instead, use pypa/build, pypa/installer or other
    standards-based tools.

    See https://github.com/pypa/setuptools/issues/917 for details.
    ********************************************************************************

!!
  easy_install.initialize_options(self)
/usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

    ********************************************************************************
    Please avoid running ``setup.py`` directly.
    Instead, use pypa/build, pypa/installer or other
    standards-based tools.

    See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
    ********************************************************************************

!!
  self.initialize_options()
/usr/local/lib/python3.11/site-packages/pydantic/_internal/fields.py:126: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting model_config['protected_namespaces'] = ('settings_',).
  warnings.warn(

llama.cpp: loading model from /models/llama-2-70b-chat.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 4096
llama_model_load_internal: n_embd = 8192
llama_model_load_internal: n_mult = 4096
llama_model_load_internal: n_head = 64
llama_model_load_internal: n_head_kv = 64
llama_model_load_internal: n_layer = 80
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 1.0e-06
llama_model_load_internal: n_ff = 24576
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size = 0.21 MB
warning: failed to mlock 221184-byte buffer (after previously locking 0 bytes): Cannot allocate memory
Try increasing RLIMIT_MLOCK ('ulimit -l' as root).
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
llama_load_model_from_file: failed to load model

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/app/llama_cpp/server/__main__.py", line 46, in <module>
    app = create_app(settings=settings)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/llama_cpp/server/app.py", line 313, in create_app
    llama = llama_cpp.Llama(
            ^^^^^^^^^^^^^^^^
  File "/app/llama_cpp/llama.py", line 313, in __init__
    assert self.model is not None
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Exception ignored in: <function Llama.__del__ at 0xffff9187a200>
Traceback (most recent call last):
  File "/app/llama_cpp/llama.py", line 1510, in __del__
    if self.ctx is not None:
       ^^^^^^^^
AttributeError: 'Llama' object has no attribute 'ctx'
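The shape error is the telling line here: with n_embd = 8192 and n_head = 64 (head size 128), a loader that assumes n_gqa = 1 expects the K projection to be 8192 x 8192, but Llama 2 70B uses grouped-query attention with 8 KV heads, so the file only stores 8 x 128 = 1024 columns, which is exactly the "8192 x 1024" reported. That is why setting N_GQA to 8 (n_head / n_head_kv = 64 / 8) fixes the load, as the comments below confirm. The mlock warning is a separate, non-fatal issue; per its own suggestion, the locked-memory limit can be raised at the container level. A minimal sketch of that override (the "api" service name is a placeholder, not taken from the repo's compose file):

```yaml
# Sketch only: raises RLIMIT_MLOCK inside the container so llama.cpp
# can mlock its buffers. "api" is a placeholder service name.
services:
  api:
    ulimits:
      memlock: -1   # -1 = unlimited locked memory (soft and hard)
```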

@itamargero (Author) commented Aug 17, 2023

Tried setting N_GQA: '8' as suggested in another open issue; the container loads now, but the speed is about 1 token per minute.
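For anyone else hitting this, a sketch of where that variable goes in docker-compose-70b.yml (only the N_GQA entry is from this thread; the service name and surrounding keys are illustrative placeholders):

```yaml
# Sketch only: N_GQA is the fix discussed in this issue; the rest of
# the structure is illustrative, not the repo's actual compose file.
services:
  api:
    environment:
      N_GQA: '8'   # grouped-query attention factor for Llama 2 70B
```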

@mayankchhabra (Member)
Thanks for reporting, @itamargero. We have added N_GQA: 8 to the docker-compose.yml for the 70B model. Token generation is slow on M1 due to the lack of GPU offloading and Metal support, so currently only the CPU is being utilized. We hope to add Metal support soon. Closing this issue for now.
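When Metal support does land, the relevant knob is presumably the server's n_gpu_layers setting (the startup log above prints "Number of GPU layers: 0"). Assuming the server picks up N_GPU_LAYERS from the environment the same way it picks up N_GQA, the override would be a one-line addition, sketched below:

```yaml
# Sketch only; assumes N_GPU_LAYERS maps to the server's
# n_gpu_layers setting just as N_GQA maps to n_gqa. 0 = CPU only.
services:
  api:
    environment:
      N_GQA: '8'
      N_GPU_LAYERS: '0'   # raise once Metal offloading is available
```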
