How to install on macOS? #146

Closed · 0xdevalias opened this issue Apr 9, 2023 · 13 comments

0xdevalias commented Apr 9, 2023

Originally posted as part of the following issue:

As part of that, I got: ModuleNotFoundError: No module named 'llama_inference_offload'

Which led me to this repo, where I tried to install the requirements as follows:

⇒ cd ..
# ..snip..

⇒ git clone git@github.com:qwopqwop200/GPTQ-for-LLaMa.git
# ..snip..

⇒ cd GPTQ-for-LLaMa
# ..snip..

⇒ pyenv local miniconda3-latest/envs/textgen
# ..snip..

⇒ pip install -r requirements.txt
Collecting git+https://github.com/huggingface/transformers (from -r requirements.txt (line 4))
  Cloning https://github.com/huggingface/transformers to /private/var/folders/j4/kxtq1cjs1l98xfqncjbsbx1c0000gn/T/pip-req-build-_6j4_tu0
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /private/var/folders/j4/kxtq1cjs1l98xfqncjbsbx1c0000gn/T/pip-req-build-_6j4_tu0
  Resolved https://github.com/huggingface/transformers to commit 656e869a4523f6a0ce90b3aacbb05cc8fb5794bb
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: safetensors==0.3.0 in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages (from -r requirements.txt (line 1)) (0.3.0)
Collecting datasets==2.10.1
  Downloading datasets-2.10.1-py3-none-any.whl (469 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 469.0/469.0 kB 6.8 MB/s eta 0:00:00
Requirement already satisfied: sentencepiece in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen/lib/python3.10/site-packages (from -r requirements.txt (line 3)) (0.1.97)
Collecting accelerate==0.17.1
  Using cached accelerate-0.17.1-py3-none-any.whl (212 kB)
ERROR: Could not find a version that satisfies the requirement triton==2.0.0 (from versions: none)
ERROR: No matching distribution found for triton==2.0.0

But that resulted in the errors:

ERROR: Could not find a version that satisfies the requirement triton==2.0.0 (from versions: none)
ERROR: No matching distribution found for triton==2.0.0

Looking at PyPI, there appears to be a 2.0.0 version of triton, so I'm not sure why it wouldn't install:

Looking at the built files for version 2.0.0:

I'm guessing it might be because there isn't a Python 3.10.x wheel built?

⇒ python --version
Python 3.10.9
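
One way to check that theory: pip only installs wheels whose platform tag matches the running interpreter, so if no macOS wheels were published for triton 2.0.0 at all, pip would report "from versions: none" regardless of the Python version. A quick way to check, using standard pip subcommands:

⇒ python -m pip debug --verbose | grep macosx
# lists the macosx wheel tags this interpreter accepts; compare them
# against the wheel filenames published on PyPI for triton 2.0.0.
# If none match, pip treats the version as nonexistent.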

0xdevalias (Author) commented Apr 9, 2023

Seems I still got the same error on Python 3.9.x:

⇒ python --version
Python 3.9.16

⇒ pip install -r requirements.txt
Collecting git+https://github.com/huggingface/transformers (from -r requirements.txt (line 4))
  Cloning https://github.com/huggingface/transformers to /private/var/folders/j4/kxtq1cjs1l98xfqncjbsbx1c0000gn/T/pip-req-build-wwj2wmga
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /private/var/folders/j4/kxtq1cjs1l98xfqncjbsbx1c0000gn/T/pip-req-build-wwj2wmga
  Resolved https://github.com/huggingface/transformers to commit 656e869a4523f6a0ce90b3aacbb05cc8fb5794bb
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: safetensors==0.3.0 in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from -r requirements.txt (line 1)) (0.3.0)
Collecting datasets==2.10.1
  Using cached datasets-2.10.1-py3-none-any.whl (469 kB)
Requirement already satisfied: sentencepiece in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from -r requirements.txt (line 3)) (0.1.97)
Collecting accelerate==0.17.1
  Using cached accelerate-0.17.1-py3-none-any.whl (212 kB)
ERROR: Could not find a version that satisfies the requirement triton==2.0.0 (from versions: none)
ERROR: No matching distribution found for triton==2.0.0

Potentially related:

0xdevalias (Author) commented Apr 9, 2023

It seems that triton will install from source OK, though:

On my 2019 MacBook Pro (Intel), I followed the install instructions from here (along with a little extra to set up a conda environment to do it in):

As follows:

⇒ conda create -n textgen_py3_9_16 python=3.9.16
# ..snip..

⇒ conda activate textgen_py3_9_16
# ..snip..

⇒ git clone git@github.com:openai/triton.git
# ..snip..

⇒ cd triton/python
# ..snip..

⇒ pip install cmake
# ..snip..

⇒ pip install -e .
Obtaining file:///Users/devalias/dev/AI/text-generation-webui/repositories/triton/python
  Preparing metadata (setup.py) ... done
Requirement already satisfied: filelock in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from triton==2.1.0) (3.11.0)
Installing collected packages: triton
  Running setup.py develop for triton
Successfully installed triton-2.1.0

Originally posted by @0xdevalias in triton-lang/triton#1465 (comment)

That installed version 2.1.0, but we can get 2.0.0 by doing the following:

⇒ git checkout v2.0.0
# ..snip..

⇒ pip install -e .
Obtaining file:///Users/devalias/dev/AI/text-generation-webui/repositories/triton/python
  Preparing metadata (setup.py) ... done
Requirement already satisfied: cmake in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from triton==2.0.0) (3.26.1)
Requirement already satisfied: filelock in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from triton==2.0.0) (3.11.0)
Requirement already satisfied: torch in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from triton==2.0.0) (2.0.0)
Collecting lit
  Downloading lit-16.0.0.tar.gz (144 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 145.0/145.0 kB 3.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: jinja2 in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from torch->triton==2.0.0) (3.1.2)
Requirement already satisfied: sympy in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from torch->triton==2.0.0) (1.11.1)
Requirement already satisfied: typing-extensions in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from torch->triton==2.0.0) (4.5.0)
Requirement already satisfied: networkx in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from torch->triton==2.0.0) (3.1)
Requirement already satisfied: MarkupSafe>=2.0 in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from jinja2->torch->triton==2.0.0) (2.1.2)
Requirement already satisfied: mpmath>=0.19 in /Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages (from sympy->torch->triton==2.0.0) (1.3.0)
Building wheels for collected packages: lit
  Building wheel for lit (setup.py) ... done
  Created wheel for lit: filename=lit-16.0.0-py3-none-any.whl size=93582 sha256=90c6c50decf1b60e45356b3a993c62d719b6506090f7899d82f6e2f9ef0ff031
  Stored in directory: /Users/devalias/Library/Caches/pip/wheels/c7/ee/80/1520ca86c3557f70e5504b802072f7fc3b0e2147f376b133ed
Successfully built lit
Installing collected packages: lit, triton
  Attempting uninstall: triton
    Found existing installation: triton 2.1.0
    Uninstalling triton-2.1.0:
      Successfully uninstalled triton-2.1.0
  Running setup.py develop for triton
Successfully installed lit-16.0.0 triton-2.0.0

Once I did that, I could go back to this project, and pip install -r requirements.txt completed successfully!
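
A quick sanity check after the editable install (the expected version assumes the v2.0.0 checkout):

⇒ python -c "import triton; print(triton.__version__)"
# should print 2.0.0 if the editable install resolved correctly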

0xdevalias (Author) commented Apr 9, 2023

After a few little hacks (see linked issue comment below) I managed to get the main webUI to start and load the model:

But then it fails when it tries to generate from any of the prompts, raising AssertionError: Torch not compiled with CUDA enabled, even though I passed --cpu through to the webui (though I suspect this project still tries to run on the GPU despite that?):

Traceback (most recent call last):
  File "/Users/devalias/dev/AI/text-generation-webui/modules/callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/Users/devalias/dev/AI/text-generation-webui/modules/text_generation.py", line 220, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/devalias/dev/AI/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 450, in forward
    out = QuantLinearFunction.apply(x.reshape(-1,x.shape[-1]), self.qweight, self.scales,
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/cuda/amp/autocast_mode.py", line 106, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/Users/devalias/dev/AI/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 364, in forward
    output = matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq)
  File "/Users/devalias/dev/AI/text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 336, in matmul248
    output = torch.empty((input.shape[0], qweight.shape[1]), device='cuda', dtype=torch.float16)
  File "/Users/devalias/.pyenv/versions/miniconda3-latest/envs/textgen_py3_9_16/lib/python3.9/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Output generated in 0.31 seconds (0.00 tokens/s, 0 tokens, context 67)

Searching for hardcoded references to cuda:

These are the files that seem to be hardcoding the device:

  • DEV = torch.device('cuda:0')
  • DEV = torch.device('cuda:0')
  • DEV = torch.device('cuda:0')
  • GPTQ-for-LLaMa/quant.py

    Lines 335 to 358 in 9463299

    def matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq):
        output = torch.empty((input.shape[0], qweight.shape[1]), device='cuda', dtype=torch.float16)
        grid = lambda META: (triton.cdiv(input.shape[0], META['BLOCK_SIZE_M']) * triton.cdiv(qweight.shape[1], META['BLOCK_SIZE_N']),)
        matmul_248_kernel[grid](input, qweight, output,
                                scales, qzeros, g_idx,
                                input.shape[0], qweight.shape[1], input.shape[1], bits, maxq,
                                input.stride(0), input.stride(1),
                                qweight.stride(0), qweight.stride(1),
                                output.stride(0), output.stride(1),
                                scales.stride(0), qzeros.stride(0))
        return output

    def transpose_matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq):
        output_dim = (qweight.shape[0] * 32) // bits
        output = torch.empty((input.shape[0], output_dim), device='cuda', dtype=torch.float16)
        grid = lambda META: (triton.cdiv(input.shape[0], META['BLOCK_SIZE_M']) * triton.cdiv(output_dim, META['BLOCK_SIZE_K']),)
        transpose_matmul_248_kernel[grid](input, qweight, output,
                                          scales, qzeros, g_idx,
                                          input.shape[0], qweight.shape[1], output_dim, bits, maxq,
                                          input.stride(0), input.stride(1),
                                          qweight.stride(0), qweight.stride(1),
                                          output.stride(0), output.stride(1),
                                          scales.stride(0), qzeros.stride(0))
        return output
  • GPTQ-for-LLaMa/quant.py

    Lines 455 to 484 in 9463299

    def autotune_warmup(model, transpose = False):
        """
        Pre-tunes the quantized kernel
        """
        from tqdm import tqdm

        n_values = {}

        for _, m in model.named_modules():
            if not isinstance(m, QuantLinear):
                continue

            k = m.infeatures
            n = m.outfeatures

            if n not in n_values:
                n_values[n] = (k, m.qweight.cuda(), m.scales.cuda(), m.qzeros.cuda(), m.g_idx.cuda(), m.bits, m.maxq)

        print(f'Found {len(n_values)} unique N values.')

        print('Warming up autotune cache ...')
        for m in tqdm(range(0, 12)):
            m = 2 ** m  # [1, 2048]
            for n, (k, qweight, scales, qzeros, g_idx, bits, maxq) in n_values.items():
                a = torch.randn(m, k, dtype=torch.float16, device='cuda')
                matmul248(a, qweight, scales, qzeros, g_idx, bits, maxq)
                if transpose:
                    a = torch.randn(m, n, dtype=torch.float16, device='cuda')
                    transpose_matmul248(a, qweight, scales, qzeros, g_idx, bits, maxq)

        del n_values

Whereas in the text-generation-webui there appears to be code setting device=cpu:
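
For reference, here's a minimal sketch of what a device-agnostic version of that hardcoding could look like (pick_device is a hypothetical helper, not something in this repo):

import torch

def pick_device():
    # Hypothetical helper: prefer CUDA, then Apple's MPS backend, else CPU.
    if torch.cuda.is_available():
        return torch.device('cuda:0')
    if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
        return torch.device('mps')
    return torch.device('cpu')

DEV = pick_device()  # instead of DEV = torch.device('cuda:0')

Though even with the device made configurable, matmul_248_kernel and friends are Triton kernels, and as far as I can tell Triton currently only targets NVIDIA GPUs, so those code paths would still need a non-CUDA fallback.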

0xdevalias changed the title from "How to install on macOS? (ERROR: Could not find a version that satisfies the requirement triton==2.0.0 (from versions: none))" to "How to install on macOS?" on Apr 9, 2023

qwopqwop200 (Owner) commented:

macOS is not supported.

0xdevalias (Author) commented Apr 10, 2023

@qwopqwop200 Is it not supported because there are technical limitations that mean it can't be, or just not supported because you don't want to put in the extra effort/capacity/etc. to do so?

If the latter, then I might look into it more; but if there are technical limitations preventing it, it would be good to know those up front.

qwopqwop200 (Owner) commented:

Currently, if you are not on Apple silicon, I think it will probably work. Apple silicon is not supported due to technical limitations.

0xdevalias (Author) commented:

Thanks for that :)

I have both a 2019 Intel MacBook Pro (which I was using for the above) and an Apple silicon M2 MacBook Pro (which I haven't tried to run anything on yet).

If you're able to say, what are the technical limitations that currently prevent it from running on Apple silicon?

qwopqwop200 (Owner) commented:

To be precise, the biggest issue is the lack of CUDA support; there are no other limitations.

0xdevalias (Author) commented:

So could the references to cuda as the device just be changed to mps or cpu or similar (which is what I was suggesting above in #146 (comment)), or are there CUDA-specific customisations happening in this repo's code?

qwopqwop200 (Owner) commented:

Currently, this code supports only CUDA users. A CPU implementation is thought to be possible, but I don't have the capacity to implement it.

qwopqwop200 (Owner) commented:

Currently, an alternative to this is to use llama.cpp.
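
For anyone taking that route, this is roughly the shape of it at the time of writing (the model path is a placeholder for a ggml-format quantized model you supply yourself):

⇒ git clone https://github.com/ggerganov/llama.cpp
⇒ cd llama.cpp
⇒ make
⇒ ./main -m ./models/7B/ggml-model-q4_0.bin -p "Hello" -n 128
# runs entirely on the CPU, so it sidesteps the CUDA requirement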

0xdevalias (Author) commented:

Ok, thanks for the info :)

Erraoudy commented:

Same issue here. Can anyone help with how to install on macOS?
