
Illegal instruction after running gpt4all-lora-quantized-linux-x86 #82

Closed
WillemDeGroef opened this issue Mar 30, 2023 · 29 comments

@WillemDeGroef

WillemDeGroef commented Mar 30, 2023

I'm trying to run gpt4all-lora-quantized-linux-x86 on an Ubuntu Linux machine with 240 Intel(R) Xeon(R) CPU E7-8880 v2 @ 2.50GHz processors and 295GB RAM. No GPUs installed.
Ubuntu 22.04 running on VMware ESXi.

I get the following error: Illegal instruction

willem@ubuntu:/data/chat$ gdb -q ./gpt4all-lora-quantized-linux-x86
Reading symbols from ./gpt4all-lora-quantized-linux-x86...
(No debugging symbols found in ./gpt4all-lora-quantized-linux-x86)
(gdb) run
Starting program: /data/gpt4all/chat/gpt4all-lora-quantized-linux-x86
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
main: seed = 1680171804
llama_model_load: loading model from 'gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size =  2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'gpt4all-lora-quantized.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 240 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000



Program received signal SIGILL, Illegal instruction.
0x0000000000425282 in ggml_set_f32 ()
@qinidema

qinidema commented Mar 30, 2023

There are 665 instructions in that function, and some of them require AVX and AVX2.
The instruction at 0x0000000000425282 is "vbroadcastss ymm1,xmm0" (C4 E2 7D 18 C8), and it requires AVX2. It lies right at the beginning of ggml_set_f32, and the only earlier AVX instruction is vmovss, which requires only AVX. So vbroadcastss was likely just the first AVX2-requiring instruction that your CPU encountered.

So you need a CPU with AVX2 support to run this; as far as I can see, the E7-8880 v2 supports AVX but not AVX2. But in the output you've provided I see

AVX = 1 | AVX2 = 1

which confuses me a bit.
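
If you want to double-check what the host actually supports without trusting the binary's printout, here is a minimal sketch (my own, assuming GCC or Clang on x86):

/* cpucheck.c - print which SIMD extensions the running CPU really has.
   Build and run: gcc -O2 -o cpucheck cpucheck.c && ./cpucheck */
#include <stdio.h>

int main(void) {
    __builtin_cpu_init(); /* populate the feature cache used by the builtin */
    printf("SSE3 = %d\n", __builtin_cpu_supports("sse3") ? 1 : 0);
    printf("AVX  = %d\n", __builtin_cpu_supports("avx")  ? 1 : 0);
    printf("AVX2 = %d\n", __builtin_cpu_supports("avx2") ? 1 : 0);
    printf("FMA  = %d\n", __builtin_cpu_supports("fma")  ? 1 : 0);
    return 0;
}

On an E7-8880 v2 this should print AVX = 1 and AVX2 = 0, matching the SIGILL.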

@qinidema

qinidema commented Mar 30, 2023

But in the output you've provided I see

AVX = 1 | AVX2 = 1

which confuses me a bit.

Lol, figured that out.

ggml_cpu_has_avx2 proc near
mov     eax, 1
retn
ggml_cpu_has_avx2 endp

ggml_cpu_has_avx2 is basically "return true;" in the code. ggml_cpu_has_avx and ggml_cpu_has_sse3 are the same. Interestingly, ggml_cpu_has_avx512 is "return false;". I.e. there are no real checks behind these statistics (printed by _Z23llama_print_system_infov, after the "system_info: " part of the line). They are decided at compile time, not at run time.
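
For comparison, a genuine run-time query (again just a sketch of mine, using GCC/Clang's <cpuid.h>; the bit positions are from Intel's manual) would look something like this:

/* cpuid_check.c - ask the CPU directly instead of hardcoding the answer.
   AVX2 is CPUID.(EAX=7,ECX=0):EBX bit 5; AVX, FMA and F16C are bits 28,
   12 and 29 of CPUID.(EAX=1):ECX. */
#include <cpuid.h>
#include <stdio.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        printf("AVX2 = %u\n", (ebx >> 5) & 1);
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        printf("AVX  = %u\n", (ecx >> 28) & 1);
        printf("FMA  = %u\n", (ecx >> 12) & 1);
        printf("F16C = %u\n", (ecx >> 29) & 1);
    }
    return 0;
}

That would report what the machine can actually execute, regardless of the flags the binary was compiled with.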

@HerbCSO

HerbCSO commented Mar 30, 2023

FWIW I ran into a similar problem running a VM under Proxmox. I was able to work around it by setting the CPU type to "host", which exposed the full instruction set (of a Ryzen 9 5900X in my case), and then it worked. Not sure if you can do something similar in ESXi, but I thought I'd mention it in case it helps you. NM, I see your 240 Xeons don't have AVX2 support - bummer!

Certainly sounds like @qinidema is onto something here though. ;]

@mvrozanti

Why is AVX2 necessary anyway? Is there a workaround?

@pirate486743186

lol the gif demo has AVX = 0 | AVX2 = 0 (M1 Mac)

This probably just needs a simple recompile.

@qinidema

qinidema commented Mar 30, 2023

@mvrozanti you need to recompile this to get a new binary with your compile-time defines.
@pirate486743186 yep.

@mvrozanti

mvrozanti commented Mar 30, 2023

@qinidema That fork also didn't work for me:

main: seed = 1680211596
llama_model_load: loading model from '/home/m/macrovip/gpt4all-lora-unfiltered-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size =  2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from '/home/m/macrovip/gpt4all-lora-unfiltered-quantized.bin'
llama_model_load: terminate called after throwing an instance of 'std::__ios_failure'
  what():  basic_ios::clear: iostream error

At least I got an error with this one. What made you believe that specific fork would work?

@pirate486743186

It's the actual source code of the project.
Same error here too.
I tried cmake; it didn't work either:

sudo apt install libpthreadpool-dev
cmake .
make chat   # or: make -lpthread chat

/usr/bin/ld: libggml.a(ggml.c.o): in function `ggml_graph_compute':
ggml.c:(.text+0x170b0): undefined reference to `pthread_create'
/usr/bin/ld: ggml.c:(.text+0x17113): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/quantize.dir/build.make:119: quantize] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:99: CMakeFiles/quantize.dir/all] Error 2
gmake: *** [Makefile:103: all] Error 2

@qinidema

@mvrozanti

At least I got an error with this one. What made you believe that fork in specific would work?

That line in this (current) repository.

For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repository.

@qinidema

qinidema commented Mar 31, 2023

@mvrozanti @pirate486743186
Had no problems compiling the executable with the following standard commands:

wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
unzip master.zip
cd gpt4all.cpp-master
mkdir build; cd build
cmake ..
make

That gave me the "chat" executable (and at least it starts and shows the help message successfully), as well as "quantize" and the "libggml.a" library.
CMake 3.25.1, make 4.3, gcc 12.2.0, glibc 2.36, Arch Linux.
Here is the full make log.

@pirate486743186

I'm on Debian 11. It's probably an incompatibility with older versions.
It would be easier to just get a precompiled binary.

@pirate486743186

@nomic-ai
Can you please fix this first? It just needs a recompile with generic flags.

@qinidema

qinidema commented Mar 31, 2023

@pirate486743186 try this: no-avx2.tar.gz
Compiled it with cmake -D LLAMA_NO_AVX2=1 (static versions included too). It still requires AVX, FMA and F16C though (I can recompile without those too).
BTW, AVX2 is "on by default" on non-MSVC x86 in that repo, with no analysis of actual CPU features even at compile time. I think that needs to be corrected.
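
As a longer-term idea (just a sketch of the general technique, nothing the repo does today), the per-CPU binaries could be avoided with run-time dispatch: compile the hot kernels in both variants and pick one at startup. Assuming GCC/Clang:

/* dispatch.c - hypothetical example: choose an implementation at run time.
   Build: gcc -O2 -o dispatch dispatch.c */
#include <stdio.h>

static void set1_scalar(float *dst, float v, int n) {
    for (int i = 0; i < n; i++) dst[i] = v;
}

/* In a real build this variant would live in a separate translation unit
   compiled with -mavx2; here it reuses the scalar loop as a placeholder. */
static void set1_avx2(float *dst, float v, int n) {
    set1_scalar(dst, v, n);
}

typedef void (*set1_fn)(float *dst, float v, int n);

int main(void) {
    __builtin_cpu_init();
    set1_fn set1 = __builtin_cpu_supports("avx2") ? set1_avx2 : set1_scalar;
    float buf[8];
    set1(buf, 3.5f, 8);
    printf("buf[0] = %f\n", (double)buf[0]);
    return 0;
}

GCC can also generate such clones automatically via __attribute__((target_clones("avx2","default"))).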

@pirate486743186

With the static build, it again gives 'illegal instruction'. I have an old laptop.

With cmake, it apparently can't find pthread when linking:

/usr/bin/ld: libggml.a(ggml.c.o): in function `ggml_graph_compute':
ggml.c:(.text+0x16eb0): undefined reference to `pthread_create'
/usr/bin/ld: ggml.c:(.text+0x16f13): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
make[3]: *** [CMakeFiles/chat.dir/build.make:119: chat] Error 1
make[2]: *** [CMakeFiles/Makefile2:153: CMakeFiles/chat.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:160: CMakeFiles/chat.dir/rule] Error 2
make: *** [Makefile:163: chat] Error 2

@vsl-iil

vsl-iil commented Mar 31, 2023

Same issue, though I get the illegal instruction in
0x000055555558a1c9 in ggml_type_sizef ()
I tried compiling gpt4all.cpp myself, with and without the -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 -D LLAMA_NO_AVX=1 flags; same issue.

@pirate486743186

For those who are frustrated: keep in mind this was released two days ago. Have ultra-low expectations.

@martinmcmillan

I have an Intel i5-3320M with no AVX2 or FMA support. I followed these steps:

#82 (comment)

@mvrozanti @pirate486743186 Had no problems compiling the executable with the following standard commands:

wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
unzip master.zip
cd gpt4all.cpp-master
mkdir build; cd build

and then

$ cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 ..
$ make
$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin

and it worked. On my laptop it is very slow, as expected.

@qinidema

qinidema commented Apr 3, 2023

@pirate486743186 and what is the address of that instruction?

@mvrozanti you can try this one: no-avx-avx2-fma-f16c.tar.gz
Compiled it with cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_AVX=1 -D LLAMA_NO_FMA=1 .. and additionally commented out the line with F16C.

@vsl-iil you can try the archive above too, though I cannot determine which instruction is at your address 0x000055555558a1c9; there's some kind of heavy ASLR in your case.

@pirate486743186

It doesn't say. I have AVX and F16C. These builds should probably work for most, or even all, CPUs.
It worked with no-avx-avx2-fma-f16c.tar.gz (you purged everything lol).

@JezausTevas

JezausTevas commented Apr 6, 2023

With the static build, it again gives 'illegal instruction'. I have an old laptop.

With cmake, it apparently can't find pthread when linking:

/usr/bin/ld: libggml.a(ggml.c.o): in function `ggml_graph_compute':
ggml.c:(.text+0x16eb0): undefined reference to `pthread_create'
/usr/bin/ld: ggml.c:(.text+0x16f13): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
make[3]: *** [CMakeFiles/chat.dir/build.make:119: chat] Error 1
make[2]: *** [CMakeFiles/Makefile2:153: CMakeFiles/chat.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:160: CMakeFiles/chat.dir/rule] Error 2
make: *** [Makefile:163: chat] Error 2

Add this to CMakeLists.txt after line 25:
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -pthread")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
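
For what it's worth, the undefined references come from ggml_graph_compute spawning worker threads; on glibc older than 2.34 (Debian 11 ships 2.31), pthread_create/pthread_join live in a separate libpthread, so the link line needs -pthread. A minimal repro of the same failure (my own sketch, not from the repo):

/* repro.c - links fine with "gcc -pthread repro.c" but fails with plain
   "gcc repro.c" on glibc < 2.34, with the same undefined references. */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    (void)arg;
    printf("hello from worker\n");
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}

The CMAKE_*_FLAGS workaround above injects -pthread into both the compile and link lines, which is why it fixes the build.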

@pirate486743186

I opened another issue for Debian 11:
#180

It compiles with this fix and the one mentioned there. But unfortunately, when it starts, it gives llama_model_load: Segmentation fault

@qinidema

qinidema commented Apr 7, 2023

@pirate486743186 can you share your resulting compiled binary and a coredump from the crash?
You can list available core dumps with coredumpctl list (it's better to start with a fresh one, i.e. launch it again and get a fresh crash on the llama_model_load line), and then export it with coredumpctl -o core.dump dump 1234, where 1234 is the PID of that fresh crash.

@pirate486743186

pirate486743186 commented Apr 11, 2023

I'm using this. It seems to work better.
https://github.com/ggerganov/llama.cpp

To use it, you'll need to convert the model.
First, download the tokenizer file: https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

Then run these commands, adapted to your case (they are working on a new unified converter script):

python3 convert-gpt4all-to-ggml.py models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model 
python3 migrate-ggml-2023-03-30-pr613.py models/gpt4all-7B/gpt4all-lora-quantized.bin models/gpt4all-7B/gpt4all-lora-quantized-new.bin

Then run it with something like this. The parameters are a bit off; you'll need to adjust them for better behavior.
./main -i --interactive-first -r "### Human:" -c 2048 --temp 0.1 --ignore-eos -b 1024 -n 10 --repeat_penalty 1.2 --instruct --color -m out.bin

@nehulagr

I am using an old MacBook Pro (mid-2012 Intel model) with 8GB RAM.

$ wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
$ unzip master.zip
$ cd gpt4all.cpp-master
$ mkdir build; cd build
$ cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 ..
$ make
$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin

This worked for me, but it's very, very slow! I am going to upgrade my RAM tomorrow and see if that helps.

@pirate486743186

It needs 4GB; more RAM will not help.
Try llama.cpp from above. It's more efficient; you'll need to convert the model.
For me, it takes some time to start talking every time it's its turn, but after that the tokens come at a tolerably slow speed.

Over the next months/year, efficiency should increase a lot. In general, software is inefficient and slow at first.

@Dave86ch

Can you provide suggestions on how to fix this error?

davesoma@Dave:~/gpt4all_/gpt4all.cpp-master/build$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin
main: seed = 1681664229
llama_model_load: loading model from '/home/davesoma/gpt4all/chat/gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
Segmentation fault

@gerardbm

The same happens to me:

2023-04-16_20:07:33

I only have 4 GB of RAM. Is this the problem?

rguo123 closed this as not planned (won't fix, can't repro, duplicate, stale) on May 10, 2023
@nitinvengurlekar

When doing cmake, I get:
cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_AVX=1 -D LLAMA_NO_FMA=1 ..
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
CMake Error in CMakeLists.txt:
Target "chat" requires the language dialect "CXX20" (with compiler
extensions), but CMake does not know the compile flags to use to enable it.
CMake Error in CMakeLists.txt:
Target "quantize" requires the language dialect "CXX20" (with compiler
extensions), but CMake does not know the compile flags to use to enable it.
-- Generating done
-- Build files have been written to: /root/master-gpt4all/gpt4all.cpp-master/build

And it fails with:
make
[ 12%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[ 25%] Linking C static library libggml.a
[ 25%] Built target ggml
[ 37%] Linking CXX executable chat
libggml.a(ggml.c.o): In function `ggml_graph_compute':
ggml.c:(.text+0x1a876): undefined reference to `pthread_join'
ggml.c:(.text+0x1a952): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
CMakeFiles/chat.dir/build.make:121: recipe for target 'chat' failed
make[2]: *** [chat] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/chat.dir/all' failed
make[1]: *** [CMakeFiles/chat.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
