
Illegal instruction after running gpt4all-lora-quantized-linux-x86 #82

Closed
WillemDeGroef opened this issue Mar 30, 2023 · 29 comments

@WillemDeGroef

WillemDeGroef commented Mar 30, 2023

I'm trying to run gpt4all-lora-quantized-linux-x86 on an Ubuntu Linux machine with 240 Intel(R) Xeon(R) CPU E7-8880 v2 @ 2.50GHz processors and 295GB RAM. No GPUs installed.
Ubuntu 22.04 running on VMware ESXi.

I get the following error: Illegal instruction

willem@ubuntu:/data/chat$ gdb -q ./gpt4all-lora-quantized-linux-x86
Reading symbols from ./gpt4all-lora-quantized-linux-x86...
(No debugging symbols found in ./gpt4all-lora-quantized-linux-x86)
(gdb) run
Starting program: /data/gpt4all/chat/gpt4all-lora-quantized-linux-x86
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
main: seed = 1680171804
llama_model_load: loading model from 'gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size =  2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'gpt4all-lora-quantized.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 240 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000



Program received signal SIGILL, Illegal instruction.
0x0000000000425282 in ggml_set_f32 ()
@qinidema

qinidema commented Mar 30, 2023

There are 665 instructions in that function, and some of them require AVX and AVX2.
The instruction at 0x0000000000425282 is "vbroadcastss ymm1,xmm0" (C4 E2 7D 18 C8), and it requires AVX2. It lies right at the beginning of ggml_set_f32, and the only earlier AVX instruction is vmovss, which requires only AVX. So vbroadcastss was likely just the first AVX2-requiring instruction that your CPU encountered.

So you need a CPU with AVX2 support to run this; as far as I can see, the E7-8880 v2 supports AVX but not AVX2. But in the output you've provided I see

AVX = 1 | AVX2 = 1

which confuses me a bit.
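
If you want to double-check what the host actually supports without trusting the binary's printout, here is a minimal sketch (my own, assuming GCC or Clang on x86):

/* cpucheck.c - print which SIMD extensions the running CPU really has.
   Build and run: gcc -O2 -o cpucheck cpucheck.c && ./cpucheck */
#include <stdio.h>

int main(void) {
    __builtin_cpu_init(); /* populate the feature cache used by the builtin */
    printf("SSE3 = %d\n", __builtin_cpu_supports("sse3") ? 1 : 0);
    printf("AVX  = %d\n", __builtin_cpu_supports("avx")  ? 1 : 0);
    printf("AVX2 = %d\n", __builtin_cpu_supports("avx2") ? 1 : 0);
    printf("FMA  = %d\n", __builtin_cpu_supports("fma")  ? 1 : 0);
    return 0;
}

On an E7-8880 v2 this should print AVX = 1 and AVX2 = 0, matching the SIGILL.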

@qinidema

qinidema commented Mar 30, 2023

But in the output you've provided I see

AVX = 1 | AVX2 = 1

which confuses me a bit.

Lol, figured that out.

ggml_cpu_has_avx2 proc near
mov     eax, 1
retn
ggml_cpu_has_avx2 endp

ggml_cpu_has_avx2 is basically "return true;" in the code. ggml_cpu_has_avx and ggml_cpu_has_sse3 are the same. Interestingly, ggml_cpu_has_avx512 is "return false;". I.e. there are no real checks behind these statistics (printed by _Z23llama_print_system_infov, after the "system_info: " part of the line). They are decided at compile time, not at run time.
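
For comparison, a genuine run-time query (again just a sketch of mine, using GCC/Clang's <cpuid.h>; the bit positions are from Intel's manual) would look something like this:

/* cpuid_check.c - ask the CPU directly instead of hardcoding the answer.
   AVX2 is CPUID.(EAX=7,ECX=0):EBX bit 5; AVX, FMA and F16C are bits 28,
   12 and 29 of CPUID.(EAX=1):ECX. */
#include <cpuid.h>
#include <stdio.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        printf("AVX2 = %u\n", (ebx >> 5) & 1);
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        printf("AVX  = %u\n", (ecx >> 28) & 1);
        printf("FMA  = %u\n", (ecx >> 12) & 1);
        printf("F16C = %u\n", (ecx >> 29) & 1);
    }
    return 0;
}

That would report what the machine can actually execute, regardless of the flags the binary was compiled with.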

@HerbCSO

HerbCSO commented Mar 30, 2023

FWIW I ran into a similar problem running a VM under Proxmox. I was able to work around it by setting the CPU type to "host", which exposed the full instruction set (of a Ryzen 9 5900X in my case), and then it worked. Not sure if you can do something similar in ESXi, but I thought I'd mention it in case it helps you. NM, I see your 240 Xeons don't have AVX2 support - bummer!

Certainly sounds like @qinidema is onto something here though. ;]

@mvrozanti

Why is AVX2 necessary anyway? Is there a workaround?

@pirate486743186

lol the gif demo has AVX = 0 | AVX2 = 0 (M1 Mac)

This probably just needs a simple recompile.

@qinidema

qinidema commented Mar 30, 2023

@mvrozanti you need to recompile this to get a new binary with your compile-time defines.
@pirate486743186 yep.

@mvrozanti

mvrozanti commented Mar 30, 2023

@qinidema That fork also didn't work for me:

main: seed = 1680211596
llama_model_load: loading model from '/home/m/macrovip/gpt4all-lora-unfiltered-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size =  2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from '/home/m/macrovip/gpt4all-lora-unfiltered-quantized.bin'
llama_model_load: terminate called after throwing an instance of 'std::__ios_failure'
  what():  basic_ios::clear: iostream error

At least I got an error with this one. What made you believe that specific fork would work?

@pirate486743186

It's the actual source code of the project.
Same error here too.
I tried cmake; it didn't work either:

sudo apt install libpthreadpool-dev
cmake .
make chat   # or: make -lpthread chat

/usr/bin/ld: libggml.a(ggml.c.o): in function `ggml_graph_compute':
ggml.c:(.text+0x170b0): undefined reference to `pthread_create'
/usr/bin/ld: ggml.c:(.text+0x17113): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/quantize.dir/build.make:119: quantize] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:99: CMakeFiles/quantize.dir/all] Error 2
gmake: *** [Makefile:103: all] Error 2

@qinidema

@mvrozanti

At least I got an error with this one. What made you believe that fork in specific would work?

That line in this (current) repository.

For custom hardware compilation, see our [Alpaca C++](https://github.com/zanussbaum/gpt4all.cpp) repository.

@qinidema

qinidema commented Mar 31, 2023

@mvrozanti @pirate486743186
Had no problems compiling the executable with the following standard commands:

wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
unzip master.zip
cd gpt4all.cpp-master
mkdir build; cd build
cmake ..
make

That gave me the "chat" executable (and at least it starts and shows the help message successfully), as well as "quantize" and the "libggml.a" library.
CMake 3.25.1, make 4.3, gcc 12.2.0, glibc 2.36, Arch Linux.
Here is the full make log.

@pirate486743186

I'm on Debian 11. It's probably an incompatibility with older versions.
It would be easier to just get a precompiled binary.

@pirate486743186

@nomic-ai
Can you please fix this first? It just needs a recompile with generic flags.

@qinidema

qinidema commented Mar 31, 2023

@pirate486743186 try this: no-avx2.tar.gz
Compiled it with cmake -D LLAMA_NO_AVX2=1 (static versions included too). It still requires AVX, FMA and F16C though (I can recompile without those too).
BTW, AVX2 is "on by default" on non-MSVC x86 in that repo, with no analysis of actual CPU features even at compile time. I think that needs to be corrected.
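
As a longer-term idea (just a sketch of the general technique, nothing the repo does today), the per-CPU binaries could be avoided with run-time dispatch: compile the hot kernels in both variants and pick one at startup. Assuming GCC/Clang:

/* dispatch.c - hypothetical example: choose an implementation at run time.
   Build: gcc -O2 -o dispatch dispatch.c */
#include <stdio.h>

static void set1_scalar(float *dst, float v, int n) {
    for (int i = 0; i < n; i++) dst[i] = v;
}

/* In a real build this variant would live in a separate translation unit
   compiled with -mavx2; here it reuses the scalar loop as a placeholder. */
static void set1_avx2(float *dst, float v, int n) {
    set1_scalar(dst, v, n);
}

typedef void (*set1_fn)(float *dst, float v, int n);

int main(void) {
    __builtin_cpu_init();
    set1_fn set1 = __builtin_cpu_supports("avx2") ? set1_avx2 : set1_scalar;
    float buf[8];
    set1(buf, 3.5f, 8);
    printf("buf[0] = %f\n", (double)buf[0]);
    return 0;
}

GCC can also generate such clones automatically via __attribute__((target_clones("avx2","default"))).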

@pirate486743186

With the static build, it again gives 'illegal instruction'. I have an old laptop.

With cmake, it apparently can't find pthread when linking:

/usr/bin/ld: libggml.a(ggml.c.o): in function `ggml_graph_compute':
ggml.c:(.text+0x16eb0): undefined reference to `pthread_create'
/usr/bin/ld: ggml.c:(.text+0x16f13): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
make[3]: *** [CMakeFiles/chat.dir/build.make:119: chat] Error 1
make[2]: *** [CMakeFiles/Makefile2:153: CMakeFiles/chat.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:160: CMakeFiles/chat.dir/rule] Error 2
make: *** [Makefile:163: chat] Error 2

@vsl-iil

vsl-iil commented Mar 31, 2023

Same issue, though I get the illegal instruction in
0x000055555558a1c9 in ggml_type_sizef ()
I tried compiling gpt4all.cpp myself, with and without the -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 -D LLAMA_NO_AVX=1 flags; same issue.

@pirate486743186

For those who are frustrated: keep in mind this was released two days ago. Have ultra-low expectations.

@martinmcmillan

I have an Intel i5-3320M with no AVX2 or FMA support. I followed these steps:

#82 (comment)

@mvrozanti @pirate486743186 Had no problems compiling the executable with the following standard commands:

wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
unzip master.zip
cd gpt4all.cpp-master
mkdir build; cd build

and then

$ cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 ..
$ make
$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin

and it worked. On my laptop it is very slow, as expected.

@qinidema

qinidema commented Apr 3, 2023

@pirate486743186 and what is the address of that instruction?

@mvrozanti you can try this one: no-avx-avx2-fma-f16c.tar.gz
Compiled it with cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_AVX=1 -D LLAMA_NO_FMA=1 .. and additionally commented out the line with F16C.

@vsl-iil you can try the archive above too, though I cannot determine which instruction is at your address 0x000055555558a1c9; there's some kind of heavy ASLR in your case.

@pirate486743186

It doesn't say. I have AVX and F16C. These builds should probably work for most, or even all, CPUs.
It worked with no-avx-avx2-fma-f16c.tar.gz (you purged everything lol).

@JezausTevas

JezausTevas commented Apr 6, 2023

With the static build, it again gives 'illegal instruction'. I have an old laptop.

With cmake, it apparently can't find pthread when linking:

/usr/bin/ld: libggml.a(ggml.c.o): in function `ggml_graph_compute':
ggml.c:(.text+0x16eb0): undefined reference to `pthread_create'
/usr/bin/ld: ggml.c:(.text+0x16f13): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
make[3]: *** [CMakeFiles/chat.dir/build.make:119: chat] Error 1
make[2]: *** [CMakeFiles/Makefile2:153: CMakeFiles/chat.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:160: CMakeFiles/chat.dir/rule] Error 2
make: *** [Makefile:163: chat] Error 2

Add this to CMakeLists.txt after line 25:
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -pthread")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
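
For what it's worth, the undefined references come from ggml_graph_compute spawning worker threads; on glibc older than 2.34 (Debian 11 ships 2.31), pthread_create/pthread_join live in a separate libpthread, so the link line needs -pthread. A minimal repro of the same failure (my own sketch, not from the repo):

/* repro.c - links fine with "gcc -pthread repro.c" but fails with plain
   "gcc repro.c" on glibc < 2.34, with the same undefined references. */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    (void)arg;
    printf("hello from worker\n");
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}

The CMAKE_*_FLAGS workaround above injects -pthread into both the compile and link lines, which is why it fixes the build.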

@pirate486743186

I opened another issue for Debian 11:
#180

It compiles with this fix and the one mentioned there. But unfortunately, when it starts, it gives llama_model_load: Segmentation fault

@qinidema

qinidema commented Apr 7, 2023

@pirate486743186 can you share your resulting compiled binary and a coredump from the crash?
You can list available core dumps with coredumpctl list (it's better to start with a fresh one, i.e. launch it again and get a fresh crash on the llama_model_load line), and then export it with coredumpctl -o core.dump dump 1234, where 1234 is the PID of that fresh crash.

@pirate486743186

pirate486743186 commented Apr 11, 2023

I'm using this. It seems to work better.
https://github.com/ggerganov/llama.cpp

To use it, you'll need to convert the model.
First, download the tokenizer file: https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

Then run these commands, adapted to your case (they are working on a new unified converter script):

python3 convert-gpt4all-to-ggml.py models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model 
python3 migrate-ggml-2023-03-30-pr613.py models/gpt4all-7B/gpt4all-lora-quantized.bin models/gpt4all-7B/gpt4all-lora-quantized-new.bin

Then run it with something like this. The parameters are a bit off; you'll need to adjust them for better behavior.
./main -i --interactive-first -r "### Human:" -c 2048 --temp 0.1 --ignore-eos -b 1024 -n 10 --repeat_penalty 1.2 --instruct --color -m out.bin

@nehulagr

I am using an old MacBook Pro (mid-2012 Intel model) with 8GB RAM.

$ wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
$ unzip master.zip
$ cd gpt4all.cpp-master
$ mkdir build; cd build
$ cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 ..
$ make
$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin

This worked for me, but it's very, very slow! I am going to upgrade my RAM tomorrow and see if that helps.

@pirate486743186

It needs 4GB; more RAM will not help.
Try llama.cpp from above. It's more efficient; you'll need to convert the model.
For me, it takes some time to start talking every time it's its turn, but after that the tokens come at a tolerably slow speed.

Over the next months/year, efficiency should increase a lot. In general, software is inefficient and slow at first.

@Dave86ch

Can you provide suggestions on how to fix this error?

davesoma@Dave:~/gpt4all_/gpt4all.cpp-master/build$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin
main: seed = 1681664229
llama_model_load: loading model from '/home/davesoma/gpt4all/chat/gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
Segmentation fault

@gerardbm

The same happens to me:

2023-04-16_20:07:33

I only have 4 GB of RAM. Is this the problem?

rguo123 closed this as not planned (won't fix, can't repro, duplicate, stale) on May 10, 2023
@nitinvengurlekar

When doing cmake, I get:
cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_AVX=1 -D LLAMA_NO_FMA=1 ..
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
CMake Error in CMakeLists.txt:
Target "chat" requires the language dialect "CXX20" (with compiler
extensions), but CMake does not know the compile flags to use to enable it.
CMake Error in CMakeLists.txt:
Target "quantize" requires the language dialect "CXX20" (with compiler
extensions), but CMake does not know the compile flags to use to enable it.
-- Generating done
-- Build files have been written to: /root/master-gpt4all/gpt4all.cpp-master/build

And it fails with:
make
[ 12%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[ 25%] Linking C static library libggml.a
[ 25%] Built target ggml
[ 37%] Linking CXX executable chat
libggml.a(ggml.c.o): In function `ggml_graph_compute':
ggml.c:(.text+0x1a876): undefined reference to `pthread_join'
ggml.c:(.text+0x1a952): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
CMakeFiles/chat.dir/build.make:121: recipe for target 'chat' failed
make[2]: *** [chat] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/chat.dir/all' failed
make[1]: *** [CMakeFiles/chat.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
