
Unable to run with Metal on M1 Mac (normal works fine) #2048

Closed
cheese-melted opened this issue Jun 29, 2023 · 11 comments

@cheese-melted

Prerequisites

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Behavior

Able to run:
$ ./main -m ./models/7B/ggml-model-q4_0.bin -n 128

Unable to run:
$ ./main -m ./models/7B/ggml-model-q4_0.bin -n 128 -ngl 1

Environment and Context

$ python3 --version
Python 3.9.12

$ make --version
GNU Make 3.81
...
This program built for i386-apple-darwin11.3.0

$ g++ --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Failure Information (for bugs)

I notice:

examples/embd-input/embd-input-lib.cpp:217:12: warning: address of stack memory associated with local variable 'ret' returned [-Wreturn-stack-address]
    return ret.c_str();
           ^~~
1 warning generated.

during the build.
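For context on that warning: c_str() returns a pointer into a local std::string, which is destroyed when the function returns, so the caller gets a dangling pointer. A minimal sketch of the pattern and one possible fix (illustrative names, not the actual embd-input code):

#include <string>

// Buggy: 'ret' is a local, so the pointer returned by c_str()
// dangles as soon as the function returns.
const char * get_message_bad() {
    std::string ret = "hello";
    return ret.c_str(); // warning: address of stack memory associated with 'ret' returned
}

// One possible fix: keep the storage alive across the call, e.g. in a
// static string (not thread-safe, but it shows the idea).
const char * get_message_ok() {
    static std::string ret;
    ret = "hello";
    return ret.c_str();
}

The warning is unrelated to the Metal failure below, but the pattern it flags is a genuine lifetime bug.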

On running:
$ ./main -m ./models/7B/ggml-model-q4_0.bin -n 128 -ngl 1

I get (see logs below for full output):

ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: ggml-metal.m:975: false

Steps to Reproduce

  1. $ make clean
  2. $ LLAMA_METAL=1 make
  3. $ ./main -m ./models/7B/ggml-model-q4_0.bin -n 128 -ngl 1

Failure Logs

$ make clean
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

rm -vf *.o *.so main quantize quantize-stats perplexity embedding benchmark-matmult save-load-state server vdot train-text-from-scratch embd-input-test build-info.h



$ LLAMA_METAL=1 make
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL -DGGML_METAL_NDEBUG
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL
I LDFLAGS:   -framework Accelerate -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

cc  -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL -DGGML_METAL_NDEBUG   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL -c llama.cpp -o llama.o
c++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL -c examples/common.cpp -o common.o
cc -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL -DGGML_METAL_NDEBUG   -c -o k_quants.o k_quants.c
cc -I.              -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL -DGGML_METAL_NDEBUG -c ggml-metal.m -o ggml-metal.o
c++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL examples/main/main.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o -o main  -framework Accelerate -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders

====  Run ./main -h for help.  ====

c++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL examples/quantize/quantize.cpp ggml.o llama.o k_quants.o ggml-metal.o -o quantize  -framework Accelerate -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders
c++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL examples/quantize-stats/quantize-stats.cpp ggml.o llama.o k_quants.o ggml-metal.o -o quantize-stats  -framework Accelerate -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders
c++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL examples/perplexity/perplexity.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o -o perplexity  -framework Accelerate -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders
c++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL examples/embedding/embedding.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o -o embedding  -framework Accelerate -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders
c++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL pocs/vdot/vdot.cpp ggml.o k_quants.o ggml-metal.o -o vdot  -framework Accelerate -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders
c++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL examples/train-text-from-scratch/train-text-from-scratch.cpp ggml.o llama.o k_quants.o ggml-metal.o -o train-text-from-scratch  -framework Accelerate -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders
c++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL examples/simple/simple.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o -o simple  -framework Accelerate -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders
c++ --shared -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL examples/embd-input/embd-input-lib.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o -o libembdinput.so  -framework Accelerate -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders
examples/embd-input/embd-input-lib.cpp:217:12: warning: address of stack memory associated with local variable 'ret' returned [-Wreturn-stack-address]
    return ret.c_str();
           ^~~
1 warning generated.
c++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS -DGGML_USE_METAL examples/embd-input/embd-input-test.cpp ggml.o llama.o common.o k_quants.o ggml-metal.o -o embd-input-test  -framework Accelerate -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders -L. -lembdinput



$ ./main -m ./models/7B/ggml-model-q4_0.bin -n 128

main: build = 762 (96a712c)
main: seed  = 1688038998
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 5439.94 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  =  256.00 MB

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0



$ ./main -m ./models/7B/ggml-model-q4_0.bin -n 128 -ngl 1
main: build = 762 (96a712c)
main: seed  = 1688038252
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 5439.94 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  =  256.00 MB
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '/Users/alan/llama.cpp/ggml-metal.metal'
ggml_metal_init: loaded kernel_add                            0x128007550
ggml_metal_init: loaded kernel_mul                            0x128007c70
ggml_metal_init: loaded kernel_mul_row                        0x1280082a0
ggml_metal_init: loaded kernel_scale                          0x1280087c0
ggml_metal_init: loaded kernel_silu                           0x128008ce0
ggml_metal_init: loaded kernel_relu                           0x128009200
ggml_metal_init: loaded kernel_gelu                           0x128009720
ggml_metal_init: loaded kernel_soft_max                       0x128009dd0
ggml_metal_init: loaded kernel_diag_mask_inf                  0x12800a430
ggml_metal_init: loaded kernel_get_rows_f16                   0x12800aab0
ggml_metal_init: loaded kernel_get_rows_q4_0                  0x12800b130
ggml_metal_init: loaded kernel_get_rows_q4_1                  0x12800b920
ggml_metal_init: loaded kernel_get_rows_q2_K                  0x12800bfa0
ggml_metal_init: loaded kernel_get_rows_q3_K                  0x12800c620
ggml_metal_init: loaded kernel_get_rows_q4_K                  0x12800cca0
ggml_metal_init: loaded kernel_get_rows_q5_K                  0x12800d320
ggml_metal_init: loaded kernel_get_rows_q6_K                  0x12800d9a0
ggml_metal_init: loaded kernel_rms_norm                       0x12800e050
ggml_metal_init: loaded kernel_norm                           0x12800e700
ggml_metal_init: loaded kernel_mul_mat_f16_f32                0x12800f0d0
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32               0x12800f7b0
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32               0x12800fe90
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32               0x128010570
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32               0x128010df0
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32               0x1280114d0
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32               0x128011bb0
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32               0x128012290
ggml_metal_init: loaded kernel_rope                           0x128012d80
ggml_metal_init: loaded kernel_alibi_f32                      0x128013640
ggml_metal_init: loaded kernel_cpy_f32_f16                    0x128013ed0
ggml_metal_init: loaded kernel_cpy_f32_f32                    0x128014760
ggml_metal_init: loaded kernel_cpy_f16_f16                    0x128014ff0
ggml_metal_init: recommendedMaxWorkingSetSize =  5461.34 MB
ggml_metal_init: hasUnifiedMemory             = true
ggml_metal_init: maxTransferRate              = built-in GPU
llama_new_context_with_model: max tensor size =   102.54 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  3648.31 MB, ( 3648.70 /  5461.34)
ggml_metal_add_buffer: allocated 'eval            ' buffer, size =   768.00 MB, ( 4416.70 /  5461.34)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   258.00 MB, ( 4674.70 /  5461.34)
ggml_metal_add_buffer: allocated 'scr0            ' buffer, size =   512.00 MB, ( 5186.70 /  5461.34)
ggml_metal_add_buffer: allocated 'scr1            ' buffer, size =   512.00 MB, ( 5698.70 /  5461.34), warning: current allocated size is greater than the recommended max working set size

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0


ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: ggml-metal.m:975: false
zsh: abort      ./main -m ./models/7B/ggml-model-q4_0.bin -n 128 -ngl 1



$ git log | head -1
commit 96a712ca1b7f427e3bd7ffc0c70b2105cfc7fbf1



$ pip list | egrep "torch|numpy|sentencepiece"
numpy                  1.24.0
sentencepiece          0.1.98
torch                  2.0.0
@artemsablin

I may be wrong, but status 5 means you are running out of memory. I think Metal implementations cannot use more than half of the unified memory (or however much is reported as recommendedMaxWorkingSetSize).

In your logs:

llama_model_load_internal: mem required  = 5439.94 MB (+ 1026.00 MB per state)

ggml_metal_init: recommendedMaxWorkingSetSize =  5461.34 MB

Try a smaller model.
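The log bears this out: the buffers that ggml_metal_add_buffer allocates sum to more than recommendedMaxWorkingSetSize. A quick back-of-the-envelope check (a sketch using the numbers from the failing run above):

#include <cstdio>

int main() {
    // Metal buffers from the failing run: data, eval, kv, scr0, scr1 (MB)
    const double buffers[] = { 3648.31, 768.00, 258.00, 512.00, 512.00 };
    double total = 0.0;
    for (double b : buffers) total += b;

    const double limit = 5461.34; // recommendedMaxWorkingSetSize (MB)
    std::printf("allocated %.2f MB vs limit %.2f MB (over by %.2f MB)\n",
                total, limit, total - limit);
    return 0;
}

That prints roughly 5698 MB allocated against a 5461 MB limit, which matches the "current allocated size is greater than the recommended max working set size" warning in the log.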

@cheese-melted
Author

Okay, thanks 😎. If anybody ends up able to run the 7B model (q4_0 quantised) with Metal enabled on an M1 Mac with 8 GB RAM, let me know!!

cheese-melted closed this as not planned on Jun 30, 2023
@gunners81

> Okay, thanks 😎. If anybody ends up able to run the 7B model (q4_0 quantised) with Metal enabled on an M1 Mac with 8 GB RAM, let me know!!

I think it's not possible for now due to hardware limits. My M1 MacBook Air 8GB is only able to load a 3B model using GPU inference / MPS, and I'm happy with it :) since it's much faster than the CPU.

@ggerganov
Owner

Have you tried this branch: #2011

@cheese-melted
Author

I've just given it a go; still the same error message.

@RonanKMcGovern

I managed to get it working with the smallest model (llama-2-7b-chat.ggmlv3.q2_K.bin) from TheBloke. I tried llama-2-7b-chat.ggmlv3.q4_K_S too but ran out of memory.

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
LLAMA_METAL=1 make

then

./main -t 8 -ngl 32 -m llama-2-7b-chat.ggmlv3.q2_K.bin --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 --in-prefix-bos --in-prefix ' [INST] ' --in-suffix ' [/INST]' -i -ins

I was able to chat with Llama 2 using that. The streaming is not smooth, but not too bad. Here is the data:

>  [INST] How's the form?
 [/INST]The form is great! It's always a pleasure to receive feedback from you. Your insights and suggestions are invaluable, and I will definitely consider them as I continue to improve my responses. Thank you for taking the time to review my work and for your ongoing support. How may I assist you today?

>  [INST] Can you search the web?
 [/INST]Of course! I can definitely help you with that. What would you like me to search for? Please provide me with the relevant keywords or terms, and I will do my best to find the information you're looking for.

>  [INST] Search for apple tart
 [/INST]Sure! Here are some delicious apple tart recipes you might enjoy:

1. Apple Tart Recipe - A classic recipe with a buttery crust and a sweet, juicy apple filling.
2. Mini Apple Tarts - Bite-sized versions of the classic tart, perfect for individual portions or parties.
3. Apple Tart Bar - A no-bake dessert that's easy to prepare and perfect for a crowd.
4. Caramel Apple Tart - A twist on the classic tart, with a layer of caramel sauce added to the filling.
5. Apple Almond Tarts - A delicious combination of apples and almonds in a buttery crust.
6. Apple Crumble Tart - A hearty tart with a crumbly oat topping, perfect for a comforting dessert.
7. Apple Cranberry Tarts - A fruity twist on the classic tart, with cranberries added to the filling.
8. Apple Cheddar Tart - A savory tart that's perfect for a cheese board or pic
>  [INST] 

llama_print_timings:        load time =  2563.87 ms
llama_print_timings:      sample time =  1275.74 ms /   360 runs   (    3.54 ms per token,   282.19 tokens per second)
llama_print_timings: prompt eval time = 13966.18 ms /    92 tokens (  151.81 ms per token,     6.59 tokens per second)
llama_print_timings:        eval time = 114639.65 ms /   360 runs   (  318.44 ms per token,     3.14 tokens per second)
llama_print_timings:       total time = 171982.80 ms

@cheese-melted
Author

Thanks! The 3-bit quantisation works as well :)

@linghunjiushu

> Have you tried this branch: #2011

Thanks. Hope this helps.

I tried #2011; it doesn't work. But when I changed n_ctx from 4096 to 2000, it works fine.

My env: Apple M2 Pro, 16 GB
Model: llama-2-13b-chat.ggmlv3.q4_0.bin
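Presumably that change was made via the -c (context size) flag; an illustrative invocation (not the commenter's exact command) would be:

$ ./main -m llama-2-13b-chat.ggmlv3.q4_0.bin -c 2000 -ngl 1 -n 128

A smaller context shrinks the KV cache and scratch buffers, which is why it can fit where -c 4096 does not.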

@RDearnaley

What would be nice is if reducing -ngl (--n-gpu-layers) fixed the problem, so that only the layers Metal actually runs counted towards recommendedMaxWorkingSetSize. I have an M2 MacBook Pro with 64 GB (recommendedMaxWorkingSetSize = 49152.00 MB), so I can run Llama 2 70B 6-bit quantized on the CPU (rather slowly), but not on the GPU. I guess I'll need to redownload one of the 5-bit quantized versions...

@kitther

kitther commented Feb 15, 2024

@cheese-melted, reducing n_gpu_layers can lower the memory pressure on the GPU, which may solve your problem. But it will use the CPU more and thus slow you down.
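For example, offloading only half of the 7B model's 32 layers instead of all of them (the layer count here is illustrative; tune it to your model and RAM):

$ ./main -m ./models/7B/ggml-model-q4_0.bin -n 128 -ngl 16

With -ngl 0 no layers are offloaded and the model runs entirely on the CPU.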

@viandmarket25

viandmarket25 commented Feb 29, 2024

From my experience:

Device: MacBook Pro 2017, 13-inch, Intel Iris GPU, 16 GB RAM (Metal not supported)

I tried running llama-2-7b.Q2_K.gguf (2.5 GB) and got errors including:

ggml_metal_graph_compute: command buffer 0 failed with status 5

My solution was to re-compile using:

make LLAMA_NO_METAL=1

It now works, even though it is not so fast.
