Error on Macbook Pro with M1 and 64GB (docker compose -f docker-compose-70b.yml up -d) #13

Closed
itamargero opened this issue Aug 17, 2023 · 2 comments

@itamargero
Using /usr/local/lib/python3.11/site-packages
Finished processing dependencies for llama-cpp-python==0.1.77

Initializing server with:
Batch size: 2096
Number of CPU threads: 8
Number of GPU layers: 0
Context window: 4096

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/app/llama_cpp/server/__main__.py", line 46, in <module>
    app = create_app(settings=settings)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/llama_cpp/server/app.py", line 313, in create_app
    llama = llama_cpp.Llama(
            ^^^^^^^^^^^^^^^^
  File "/app/llama_cpp/llama.py", line 313, in __init__
    assert self.model is not None
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Exception ignored in: <function Llama.__del__ at 0xffff863f6200>
Traceback (most recent call last):
  File "/app/llama_cpp/llama.py", line 1510, in __del__
    if self.ctx is not None:
       ^^^^^^^^
AttributeError: 'Llama' object has no attribute 'ctx'

/usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

    ********************************************************************************
    Please avoid running ``setup.py`` and ``easy_install``.
    Instead, use pypa/build, pypa/installer or other
    standards-based tools.

    See https://github.com/pypa/setuptools/issues/917 for details.
    ********************************************************************************

!!
  easy_install.initialize_options(self)
/usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

    ********************************************************************************
    Please avoid running ``setup.py`` directly.
    Instead, use pypa/build, pypa/installer or other
    standards-based tools.

    See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
    ********************************************************************************

!!
  self.initialize_options()
/usr/local/lib/python3.11/site-packages/pydantic/_internal/fields.py:126: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting model_config['protected_namespaces'] = ('settings_',).
  warnings.warn(

llama.cpp: loading model from /models/llama-2-70b-chat.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 4096
llama_model_load_internal: n_embd = 8192
llama_model_load_internal: n_mult = 4096
llama_model_load_internal: n_head = 64
llama_model_load_internal: n_head_kv = 64
llama_model_load_internal: n_layer = 80
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 1.0e-06
llama_model_load_internal: n_ff = 24576
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size = 0.21 MB
warning: failed to mlock 221184-byte buffer (after previously locking 0 bytes): Cannot allocate memory
Try increasing RLIMIT_MLOCK ('ulimit -l' as root).
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
llama_load_model_from_file: failed to load model

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/app/llama_cpp/server/__main__.py", line 46, in <module>
    app = create_app(settings=settings)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/llama_cpp/server/app.py", line 313, in create_app
    llama = llama_cpp.Llama(
            ^^^^^^^^^^^^^^^^
  File "/app/llama_cpp/llama.py", line 313, in __init__
    assert self.model is not None
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Exception ignored in: <function Llama.__del__ at 0xffff9187a200>
Traceback (most recent call last):
  File "/app/llama_cpp/llama.py", line 1510, in __del__
    if self.ctx is not None:
       ^^^^^^^^
AttributeError: 'Llama' object has no attribute 'ctx'
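The shape error is the telling line here: with n_embd = 8192 and n_head = 64 (head size 128), a loader that assumes n_gqa = 1 expects the K projection to be 8192 x 8192, but Llama 2 70B uses grouped-query attention with 8 KV heads, so the file only stores 8 x 128 = 1024 columns, which is exactly the "8192 x 1024" reported. That is why setting N_GQA to 8 (n_head / n_head_kv = 64 / 8) fixes the load, as the comments below confirm. The mlock warning is a separate, non-fatal issue; per its own suggestion, the locked-memory limit can be raised at the container level. A minimal sketch of that override (the "api" service name is a placeholder, not taken from the repo's compose file):

```yaml
# Sketch only: raises RLIMIT_MLOCK inside the container so llama.cpp
# can mlock its buffers. "api" is a placeholder service name.
services:
  api:
    ulimits:
      memlock: -1   # -1 = unlimited locked memory (soft and hard)
```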

@itamargero (Author) commented Aug 17, 2023

Tried setting N_GQA: '8' as suggested in another open issue; the container loads now, but the speed is about 1 token per minute.
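For anyone else hitting this, a sketch of where that variable goes in docker-compose-70b.yml (only the N_GQA entry is from this thread; the service name and surrounding keys are illustrative placeholders):

```yaml
# Sketch only: N_GQA is the fix discussed in this issue; the rest of
# the structure is illustrative, not the repo's actual compose file.
services:
  api:
    environment:
      N_GQA: '8'   # grouped-query attention factor for Llama 2 70B
```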

@mayankchhabra (Member)
Thanks for reporting, @itamargero. We have added N_GQA: 8 to the docker-compose.yml for the 70B model. Token generation is slow on M1 due to the lack of GPU offloading and Metal support, so currently only the CPU is being utilized. We hope to add Metal support soon. Closing this issue for now.
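When Metal support does land, the relevant knob is presumably the server's n_gpu_layers setting (the startup log above prints "Number of GPU layers: 0"). Assuming the server picks up N_GPU_LAYERS from the environment the same way it picks up N_GQA, the override would be a one-line addition, sketched below:

```yaml
# Sketch only; assumes N_GPU_LAYERS maps to the server's
# n_gpu_layers setting just as N_GQA maps to n_gqa. 0 = CPU only.
services:
  api:
    environment:
      N_GQA: '8'
      N_GPU_LAYERS: '0'   # raise once Metal offloading is available
```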
