
Make CMake LLAMA_NATIVE flag actually use the instructions supported by the processor #3273

Merged (10 commits, Oct 3, 2023)

Conversation

@netrunnereve (Contributor) commented Sep 20, 2023

Currently LLAMA_NATIVE doesn't disable instruction-set-specific flags like LLAMA_AVX when it's set. Instead we get something like -march=native -mf16c -mfma -mavx -mavx2, which won't run on a host whose CPU doesn't support those instructions.

Also, is there a reason why LLAMA_NATIVE isn't turned on by default? The Makefile already uses -march=native, and someone wanting to compile for other users really should manually set the appropriate flags depending on what they want to support.
Okay, this seems to be because -march=native is not supported on MSVC. Merging #809 may fix this.

@staviq (Collaborator) commented Sep 20, 2023

Oh, I think this probably explains at least some of the occasional segfault bug reports here.

@Green-Sky (Collaborator) left a review comment:


Better would be a string, so I can give it "x86-64-v3" or something similar.

(Review thread on CMakeLists.txt; outdated, resolved)
@staviq (Collaborator) commented Sep 20, 2023

This is a bit tangential, and I'm writing this for future reference, but I stumbled upon a quite simple trick to detect CPU extensions, either in code or in CMake. Basically, if you run an empty string through the compiler's preprocessor, it will output all compiler defines, and if your CPU supports, say, AVX, something like #define __AVX__ will be present in the output. I found docs for gcc and msvc specifying the known define flags, and I've seen people implement that as a CMake find-module thingy.
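For example, with gcc the trick is a one-liner; which defines come back depends on the host CPU, so the output below is only illustrative:

echo | gcc -march=native -dM -E - | grep -E '__(AVX|FMA|F16C)'
#define __AVX__ 1
#define __AVX2__ 1
#define __F16C__ 1
#define __FMA__ 1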

I think somebody was working on CPU features detection but I can't find which issue/pr it was, so I'm leaving it here.

@netrunnereve (Contributor, author) replied:

> This is a bit tangential, and I'm writing this for future reference, but I stumbled upon a quite simple trick to detect CPU extensions, either in code or in CMake. Basically, if you run an empty string through the compiler's preprocessor, it will output all compiler defines, and if your CPU supports, say, AVX, something like #define __AVX__ will be present in the output. I found docs for gcc and msvc specifying the known define flags, and I've seen people implement that as a CMake find-module thingy.
>
> I think somebody was working on CPU features detection but I can't find which issue/pr it was, so I'm leaving it here.

Yeah, that's #809. Since that's intended for Windows and I don't have Windows, someone else will have to bring in that PR and test it (@howard0su seems to have stopped working on it).

@thilomichael (comment marked as off-topic)

@staviq (Collaborator) commented Sep 21, 2023

> Hi! In issue #3279 I experienced an "illegal hardware instruction" during quantization. I tried out this pull request and it fixes my bug when compiling with cmake (yay 🎉 thank you).
>
> However, when I run make on this branch, I still get the illegal hardware instruction error. So could it be that the Makefile also doesn't disable the flags and non-native instructions are compiled into the binaries?

Make and CMake are maintained separately, and sometimes a PR adds or changes something in only one of them; that usually gets corrected after somebody notices. Generally though, use whichever build system works for you: neither one is better when it comes to the binaries you get.

@thilomichael (comment marked as off-topic)

@netrunnereve (Contributor, author) commented:

@thilomichael I don't have a Mac so I can't reproduce your issue with Make. Can you provide the exact flags passed to the compiler with Make (fail) and CMake (pass)?

e.g.

g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wno-format-truncation -Wno-array-bounds -pthread  -march=native -mtune=native  examples/quantize/quantize.cpp ggml.o llama.o k_quants.o ggml-alloc.o -o quantize

@cebtenzzre (Collaborator) commented Sep 21, 2023

> However, when I run make on this branch, I still get the illegal hardware instruction error.

LLAMA_NATIVE is disabled by default with CMake; I have a feeling that if you enable it on this PR you will see the same problem. Probably some #ifdef-guarded code is using an instruction it isn't supposed to. Could you provide a backtrace (bt) and disassembly (disas) with gdb? Also, it would be a good idea to open an issue for this, since I think you have a different problem than the one this PR is intended to fix.
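For example, something along these lines captures both in one go (the binary path and model arguments are illustrative, adjust for your files):

gdb -batch -ex run -ex bt -ex 'disassemble $pc-32,$pc+32' --args ./quantize model-f16.gguf model-q5_k.gguf q5_k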

@thilomichael commented:

@netrunnereve I just ran make without any parameters. When examining what it runs with make -n, I get the following call for quantize:

c++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DGGML_USE_METAL  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wmissing-prototypes -pthread    examples/quantize/quantize.cpp ggml.o llama.o k_quants.o ggml-metal.o ggml-alloc.o -o quantize -framework Accelerate -framework Foundation -framework Metal -framework MetalKit

@cebtenzzre The thing with my problem is that it is "solved" when I check out this branch and compile with cmake (i.e., I don't get any 'illegal hardware instruction' during quantization), but I still get it when running make. That is why I suspected that the issue solved in this PR affects not only the cmake configuration but also the Makefile (though I'm really out of my depth here, so that could be totally wrong). So basically: my problem is solved, I just want to make sure the same bug isn't left unfixed in the Makefile :)

I am running macOS, so I'm using lldb for debugging. Here are the stack trace and the disassembly when running quantize:

* thread #2, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011a40)
  * frame #0: 0x000000010006de40 quantize`quantize_row_q5_K_reference + 924
    frame #1: 0x000000010006f624 quantize`ggml_quantize_q5_K + 32
    frame #2: 0x000000010004eb6c quantize`llama_model_quantize_internal(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, llama_model_quantize_params const*)::$_9::operator()() const + 136
    frame #3: 0x000000010004f030 quantize`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, llama_model_quantize_internal(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, llama_model_quantize_params const*)::$_9>>(void*) + 52
    frame #4: 0x00000001a53fbfa8 libsystem_pthread.dylib`_pthread_start + 148
* thread #2, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011a40)
    frame #0: 0x000000010006de40 quantize`quantize_row_q5_K_reference + 924
quantize`quantize_row_q5_K_reference:
->  0x10006de40 <+924>: .long  0x1e011a40                ; unknown opcode
    0x10006de44 <+928>: fcsel  s2, s18, s2, gt
    0x10006de48 <+932>: fcmp   s20, s1
    0x10006de4c <+936>: fcsel  s1, s20, s1, mi
  (threads #3 through #8 all stopped at the same address, with the identical stop reason, frame #0, and disassembly as thread #2 above)

@cebtenzzre (Collaborator) commented Sep 22, 2023

> I just ran make without any parameters. When examining what it runs with make -n, I get the following call for quantize:

You sent the command for the link step, but we need to see the command for the compile step (it uses -c and has a -o someobject.o part). It would also be helpful to see the equivalent command from cmake, which you can get by running make VERBOSE=1 after configuring.
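For example, one way to filter make's dry run down to just the compile steps (target name assumed):

make -n quantize | grep -- ' -c '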

Since you're on aarch64, the Makefile adds -mcpu=native and the CMake script on master currently doesn't. This PR is not applicable as it has to do with amd64 only. You should open a new issue.

@ggerganov (Owner) commented:
Should we make LLAMA_NATIVE=ON by default?
Otherwise the default CMake builds will not have AVX, etc.

@netrunnereve (Contributor, author) commented Sep 28, 2023

I certainly want this to be the default, but apparently MSVC doesn't support -march=native. So we'll either have to pull in #809 or only set LLAMA_NATIVE=ON when MSVC is not being used.
Then again, the Makefile has this on by default and I have no idea what happens when you try using it with MSVC. I don't have Windows so I can't test it out.

@Green-Sky (Collaborator) commented:

If you make it the default, make sure to disable it in the Nix flake (or tell me to do it).

@cebtenzzre (Collaborator) commented:

> Then again, the Makefile has this on by default and I have no idea what happens when you try using it with MSVC. I don't have Windows so I can't test it out.

GNU Makefiles aren't typically used for native MSVC builds; ours definitely doesn't support MSVC. When you use cmake on Windows, it generates Visual Studio project files by default.
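So the usual Windows flow is something like this, with no Makefile involved:

cmake -B build
cmake --build build --config Release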

@ggerganov (Owner) commented:

I agree, MSVC shouldn't be a problem. If there is no other concern, we should set LLAMA_NATIVE=ON by default.

@cebtenzzre marked this pull request as draft on September 30, 2023.
@cebtenzzre (Collaborator) commented:

Un-draft this when you're done with your "see what happens" changes. I wouldn't want it to accidentally get merged if it's not finished.

@netrunnereve (Contributor, author) commented:

Okay, this should be good once the CI passes. -march=native is now only triggered for x86 processors, just like in the Makefile, and it's already automatically disabled for MSVC, so it should work everywhere.
I still have LLAMA_NATIVE=OFF set for the Windows builds, just in case we manage to get that working in the future; I don't want it to break the architecture-specific releases. It's also turned off for the Nix flake as requested by @Green-Sky (untested, as I don't have a Nix machine).
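A minimal sketch of the guard described above (not the literal CMakeLists.txt diff; the exact processor matching in the real change may differ):

if (LLAMA_NATIVE AND NOT MSVC AND CMAKE_SYSTEM_PROCESSOR MATCHES "^(x86_64|i686|AMD64)$")
    # -march=native only makes sense for GCC/Clang on an x86 host
    add_compile_options(-march=native)
endif()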

(Review thread on flake.nix; outdated, resolved)
@ggerganov merged commit 017efe8 into ggerganov:master on Oct 3, 2023; 33 checks passed.
@netrunnereve deleted the llama_native branch on October 3, 2023.
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 5, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp: (24 commits)
  convert : fix Baichuan2 models by using vocab size in config.json (ggerganov#3299)
  readme : add project status link
  ggml : fix build after ggerganov#3329
  llm : add Refact model (ggerganov#3329)
  sync : ggml (conv 1d + 2d updates, UB fixes) (ggerganov#3468)
  finetune : readme fix typo (ggerganov#3465)
  ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (ggerganov#3453)
  main : consistent prefix/suffix coloring (ggerganov#3425)
  llama : fix session saving/loading (ggerganov#3400)
  llama : expose model's rope_freq_scale in the API (ggerganov#3418)
  metal : alibi for arbitrary number of heads (ggerganov#3426)
  cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (ggerganov#3273)
  Work on the BPE tokenizer (ggerganov#3252)
  convert : fix vocab size when not defined in hparams (ggerganov#3421)
  cmake : increase minimum version for add_link_options (ggerganov#3444)
  CLBlast: Add broadcast support for matrix multiplication (ggerganov#3402)
  gguf : add BERT, MPT, and GPT-J arch info (ggerganov#3408)
  gguf : general usability improvements (ggerganov#3409)
  cmake : make CUDA flags more similar to the Makefile (ggerganov#3420)
  finetune : fix ggerganov#3404 (ggerganov#3437)
  ...
yusiwen pushed a commit to yusiwen/llama.cpp that referenced this pull request Oct 7, 2023
…d by the processor (ggerganov#3273)

* fix LLAMA_NATIVE

* syntax

* alternate implementation

* my eyes must be getting bad...

* set cmake LLAMA_NATIVE=ON by default

* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc

* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile

* remove -DLLAMA_MPI=ON

---------

Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>