Closed
Labels
bug-unconfirmed
medium severity: Used to report medium severity bugs in llama.cpp (e.g. Malfunctioning Features but still useable)
Description
What happened?
I am running llama-perplexity on a q3_k_m quantization of qwen2.5-14b-instruct with wikitext-2. Partway through the run (~90 seconds on my computer with the Debug build, so the GPU clearly has time to compute for a while) it segfaults. The attached log output is from the Debug build, but the same crash happens with the Release build. The command I am running is llama-perplexity -m qwen2.5-14b-instruct-q3_k_m-00001-of-00002.gguf -f wikitext-2-raw/wiki.test.raw -fa -c 32768 -ngl 49.
This is fully repeatable for me.
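For a sense of the sizes involved, here is a rough sketch that only redoes arithmetic on values printed in the log below (n_vocab, n_ctx, n_batch, n_layer, n_embd_k_gqa). The helper names are made up for illustration and are not llama.cpp functions. It reproduces the 1188.00 MiB output buffer and the 3072.00 MiB K cache that the log reports, and it shows that the number of logits in one full chunk (n_ctx * n_vocab) does not fit in a signed 32-bit integer. Whether that has anything to do with the crash in process_logits is an assumption that is not verified here.

```python
# Values copied from the log output in this report.
N_VOCAB = 152064       # llm_load_print_meta: n_vocab
N_CTX = 32768          # from -c 32768
N_BATCH = 2048         # llama_new_context_with_model: n_batch
N_LAYER = 48           # llm_load_print_meta: n_layer
N_EMBD_K_GQA = 1024    # llm_load_print_meta: n_embd_k_gqa (V has the same value)

INT32_MAX = 2**31 - 1
MIB = 1024 * 1024


def logits_buffer_mib(n_rows: int, n_vocab: int) -> float:
    """Size in MiB of n_rows rows of f32 logits (hypothetical helper)."""
    return n_rows * n_vocab * 4 / MIB


def kv_cache_mib(n_layer: int, n_ctx: int, n_embd_gqa: int) -> float:
    """Size in MiB of one f16 cache (K or V) over all layers (hypothetical helper)."""
    return n_layer * n_ctx * n_embd_gqa * 2 / MIB


# Number of logits if one full chunk of n_ctx tokens is indexed as a flat array.
logits_per_chunk = N_CTX * N_VOCAB

print(f"output buffer for one batch: {logits_buffer_mib(N_BATCH, N_VOCAB):.2f} MiB")
print(f"K cache: {kv_cache_mib(N_LAYER, N_CTX, N_EMBD_K_GQA):.2f} MiB")
print(f"logits per chunk: {logits_per_chunk}")
print(f"exceeds INT32_MAX: {logits_per_chunk > INT32_MAX}")
```

If the flat-index idea is worth ruling out, a run with a smaller -c value, chosen so that n_ctx * n_vocab stays below 2^31, would be one way to test it.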
Name and Version
This is b3829, running on NixOS. I think the packaging (which needs to be reproducible) causes the version to be reported as 0:
$ llama-perplexity --version
version: 0 (unknown)
built with gcc (GCC) 13.3.0 for x86_64-unknown-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
$ time llama-perplexity -m qwen2.5-14b-instruct-q3_k_m-00001-of-00002.gguf -f wikitext-2-raw/wiki.test.raw -fa -c 32768 -ngl 49
build: 0 (unknown) with gcc (GCC) 13.3.0 for x86_64-unknown-linux-gnu (debug)
llama_model_loader: additional 1 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 29 key-value pairs and 579 tensors from qwen2.5-14b-instruct-q3_k_m-00001-of-00002.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = qwen2.5-14b-instruct
llama_model_loader: - kv   3:                            general.version str              = v0.1
llama_model_loader: - kv   4:                           general.finetune str              = qwen2.5-14b-instruct
llama_model_loader: - kv   5:                         general.size_label str              = 15B
llama_model_loader: - kv   6:                          qwen2.block_count u32              = 48
llama_model_loader: - kv   7:                       qwen2.context_length u32              = 131072
llama_model_loader: - kv   8:                     qwen2.embedding_length u32              = 5120
llama_model_loader: - kv   9:                  qwen2.feed_forward_length u32              = 13824
llama_model_loader: - kv  10:                 qwen2.attention.head_count u32              = 40
llama_model_loader: - kv  11:              qwen2.attention.head_count_kv u32              = 8
llama_model_loader: - kv  12:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  13:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                          general.file_type u32              = 12
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - kv  26:                                   split.no u16              = 0
llama_model_loader: - kv  27:                                split.count u16              = 2
llama_model_loader: - kv  28:                        split.tensors.count i32              = 579
llama_model_loader: - type  f32:  241 tensors
llama_model_loader: - type q3_K:  193 tensors
llama_model_loader: - type q4_K:  139 tensors
llama_model_loader: - type q5_K:    5 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.9310 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 152064
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 5120
llm_load_print_meta: n_layer          = 48
llm_load_print_meta: n_head           = 40
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 5
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 13824
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = Q3_K - Medium
llm_load_print_meta: model params     = 14.77 B
llm_load_print_meta: model size       = 6.83 GiB (3.97 BPW)
llm_load_print_meta: general.name     = qwen2.5-14b-instruct
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
llm_load_print_meta: max token length = 256
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
llm_load_tensors: ggml ctx size =    0.51 MiB
llm_load_tensors: offloading 48 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 49/49 layers to GPU
llm_load_tensors:        CPU buffer size =   319.04 MiB
llm_load_tensors:      CUDA0 buffer size =  6674.48 MiB
.........................................................................................
llama_new_context_with_model: n_ctx      = 32768
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =  6144.00 MiB
llama_new_context_with_model: KV self size  = 6144.00 MiB, K (f16): 3072.00 MiB, V (f16): 3072.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.58 MiB
ggml_gallocr_reserve_n: reallocating CUDA0 buffer from size 0.00 MiB to 307.00 MiB
ggml_gallocr_reserve_n: reallocating CUDA_Host buffer from size 0.00 MiB to 74.01 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   307.00 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    74.01 MiB
llama_new_context_with_model: graph nodes  = 1495
llama_new_context_with_model: graph splits = 2
llama_init_from_gpt_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
llama_output_reserve: reallocating output buffer from size 0.58 MiB to 1.16 MiB
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 2 1 1]
system_info: n_threads = 16 (n_threads_batch = 16) / 32 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
perplexity: tokenizing the input ..
perplexity: tokenization took 736.355 ms
perplexity: calculating perplexity over 9 chunks, n_ctx=32768, batch_size=2048, n_seq=1
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
llama_output_reserve: reallocating output buffer from size 1.16 MiB to 1188.00 MiB
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
perplexity: 24.67 seconds per pass - ETA 3.70 minutes
Segmentation fault (core dumped)
real	1m29,564s
user	0m28,035s
sys	0m30,288s
$ coredumpctl debug
           PID: 3147013 (llama-perplexit)
           UID: 1000 (sliedes)
           GID: 100 (users)
        Signal: 11 (SEGV)
     Timestamp: Tue 2024-10-08 04:01:36 CEST (3min 34s ago)
  Command Line: llama-perplexity -m qwen2.5-14b-instruct-q3_k_m-00001-of-00002.gguf -f wikitext-2-raw/wiki.test.raw -fa -c 32768 -ngl 49
    Executable: /nix/store/213m0yym3qdfnvnzfqdvr82pbgp0b63l-llama-cpp-3829/bin/llama-perplexity
 Control Group: /user.slice/user-1000.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-1000.slice
       Session: 2
     Owner UID: 1000 (sliedes)
       Boot ID: 6224b3f52c0e45468c99f5f5cc1d17f4
    Machine ID: 13629c48106c49a39ea48f0b10557f82
      Hostname: poyta
       Storage: /var/lib/systemd/coredump/core.llama-perplexit.1000.6224b3f52c0e45468c99f5f5cc1d17f4.3147013.1728352896000000.zst (present)
  Size on Disk: 9.6G
       Message: Process 3147013 (llama-perplexit) of user 1000 dumped core.
                Module libgomp.so.1 without build-id.
                Module libgcc_s.so.1 without build-id.
                Module libstdc++.so.6 without build-id.
                Stack trace of thread 3147061:
                #0  0x000000000041d0af _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e0af)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147046:
                #0  0x00007f83bbf909f6 __expf_fma (libm.so.6 + 0x769f6)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147049:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147047:
                #0  0x000000000041d0af _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e0af)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147050:
                #0  0x00007f83bbf90a2e __expf_fma (libm.so.6 + 0x76a2e)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147048:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147051:
                #0  0x00007f83bbf90a2e __expf_fma (libm.so.6 + 0x76a2e)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147053:
                #0  0x0000000000410e50 n/a (llama-perplexity + 0x11e50)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147054:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147052:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147056:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147055:
                #0  0x000000000041d123 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e123)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147057:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147059:
                #0  0x000000000041d123 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e123)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147060:
                #0  0x00007f83bbf909e0 __expf_fma (libm.so.6 + 0x769e0)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147064:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147062:
                #0  0x000000000041d119 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e119)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147063:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147066:
                #0  0x00007f83bbf90a75 __expf_fma (libm.so.6 + 0x76a75)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147067:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147068:
                #0  0x000000000041d123 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e123)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147069:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147070:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147071:
                #0  0x00007f83bbf90a5f __expf_fma (libm.so.6 + 0x76a5f)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147072:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147074:
                #0  0x00007f83bbf90a6d __expf_fma (libm.so.6 + 0x76a6d)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147073:
                #0  0x000000000041d119 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e119)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147075:
                #0  0x00007f83bbf90a56 __expf_fma (libm.so.6 + 0x76a56)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147076:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147058:
                #0  0x000000000041d0af _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e0af)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147020:
                #0  0x00007f83bbb0ad1f __poll (libc.so.6 + 0x101d1f)
                #1  0x00007f8394a54e3f n/a (libcuda.so.1 + 0x254e3f)
                #2  0x00007f8394b27fbf n/a (libcuda.so.1 + 0x327fbf)
                #3  0x00007f8394a51113 n/a (libcuda.so.1 + 0x251113)
                #4  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #5  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147014:
                #0  0x00007f83bba960ce __futex_abstimed_wait_common (libc.so.6 + 0x8d0ce)
                #1  0x00007f83bba98c20 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8fc20)
                #2  0x00000000004979fb _ZZN7gpt_log6resumeEvENKUlvE_clEv (llama-perplexity + 0x989fb)
                #3  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #4  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #5  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147013:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x000000000041fbb8 _ZL10perplexityP13llama_contextRK10gpt_paramsi (llama-perplexity + 0x20bb8)
                #2  0x0000000000417e8e main (llama-perplexity + 0x18e8e)
                #3  0x00007f83bba3314e __libc_start_call_main (libc.so.6 + 0x2a14e)
                #4  0x00007f83bba33209 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a209)
                #5  0x000000000041bd55 _start (llama-perplexity + 0x1cd55)
                Stack trace of thread 3147019:
                #0  0x00007f83bba960ce __futex_abstimed_wait_common (libc.so.6 + 0x8d0ce)
                #1  0x00007f83bba98f45 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8ff45)
                #2  0x00007f83949aebca n/a (libcuda.so.1 + 0x1aebca)
                #3  0x00007f8394a51113 n/a (libcuda.so.1 + 0x251113)
                #4  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #5  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147015:
                #0  0x00007f83bbb0ad1f __poll (libc.so.6 + 0x101d1f)
                #1  0x00007f8394a54e3f n/a (libcuda.so.1 + 0x254e3f)
                #2  0x00007f8394b27fbf n/a (libcuda.so.1 + 0x327fbf)
                #3  0x00007f8394a51113 n/a (libcuda.so.1 + 0x251113)
                #4  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #5  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                Stack trace of thread 3147065:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                ELF object binary architecture: AMD x86-64
Reading symbols from /nix/store/213m0yym3qdfnvnzfqdvr82pbgp0b63l-llama-cpp-3829/bin/llama-perplexity...
warning: Loadable section ".dynstr" outside of ELF segments
  in /nix/store/213m0yym3qdfnvnzfqdvr82pbgp0b63l-llama-cpp-3829/bin/llama-perplexity
Reading symbols from /nix/store/a9ahhxdscnmaa9r21apcpmgpi7rm8lzw-llama-cpp-3829-debug/lib/debug/.build-id/cd/c2662de87e621fa634f03179bc07cbc193881d.debug...
warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing
warning: core file may not match specified executable file.
[New LWP 3147061]
[New LWP 3147046]
[New LWP 3147049]
[New LWP 3147047]
[New LWP 3147050]
[New LWP 3147048]
[New LWP 3147051]
[New LWP 3147053]
[New LWP 3147054]
[New LWP 3147052]
[New LWP 3147056]
[New LWP 3147055]
[New LWP 3147057]
[New LWP 3147059]
[New LWP 3147060]
[New LWP 3147064]
[New LWP 3147062]
[New LWP 3147063]
[New LWP 3147066]
[New LWP 3147067]
[New LWP 3147068]
[New LWP 3147069]
[New LWP 3147070]
[New LWP 3147071]
[New LWP 3147072]
[New LWP 3147074]
[New LWP 3147073]
[New LWP 3147075]
[New LWP 3147076]
[New LWP 3147058]
[New LWP 3147020]
[New LWP 3147014]
[New LWP 3147013]
[New LWP 3147019]
[New LWP 3147015]
[New LWP 3147065]
Downloading separate debug info for /nix/store/jzp0hpr9avl6i7gkx19dz59xirp0q7m2-cuda_cudart-12.4.99-lib/lib/libcudart.so.12
Downloading separate debug info for /nix/store/gj56jqxsgmvzgf9kdnbhciw5p48h78lb-libcublas-12.4.2.65-lib/lib/libcublas.so.12
Downloading separate debug info for /nix/store/gj56jqxsgmvzgf9kdnbhciw5p48h78lb-libcublas-12.4.2.65-lib/lib/libcublasLt.so.12
Downloading separate debug info for /run/opengl-driver/lib/libcuda.so.1
Downloading separate debug info for system-supplied DSO at 0x7ffe0914f000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/3dyw8dzj9ab4m8hv5dpyx7zii8d0w6fi-glibc-2.39-52/lib/libthread_db.so.1".
Core was generated by `llama-perplexity -m qwen2.5-14b-instruct-q3_k_m-00001-of-00002.gguf -f wikitext'.
Program terminated with signal SIGSEGV, Segmentation fault.
Downloading source file /build/source/examples/perplexity/perplexity.cpp
#0  log_softmax (tok=5636, logits=0x7f77e6105010, n_vocab=152064) at /build/source/examples/perplexity/perplexity.cpp:107
warning: 107	/build/source/examples/perplexity/perplexity.cpp: No such file or directory
[Current thread is 1 (Thread 0x7f836eff1000 (LWP 3147061))]
(gdb) set substitute-path /build/source /home/sliedes/proj/llama.cpp
(gdb) bt
#0  log_softmax (tok=5636, logits=0x7f77e6105010, n_vocab=152064) at /build/source/examples/perplexity/perplexity.cpp:107
#1  operator() (__closure=0x1b732f18) at /build/source/examples/perplexity/perplexity.cpp:172
#2  0x00007f83bbce86d3 in execute_native_thread_routine () from /nix/store/22nxhmsfcv2q2rpkmfvzwg2w5z1l231z-gcc-13.3.0-lib/lib/libstdc++.so.6
#3  0x00007f83bba99a42 in start_thread (arg=<optimized out>) at pthread_create.c:447
#4  0x00007f83bbb1905c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb) bt -full
#0  log_softmax (tok=5636, logits=0x7f77e6105010, n_vocab=152064) at /build/source/examples/perplexity/perplexity.cpp:107
        max_logit = <optimized out>
        sum_exp = <optimized out>
        max_logit = <optimized out>
        sum_exp = <optimized out>
        i = <optimized out>
        i = <optimized out>
#1  operator() (__closure=0x1b732f18) at /build/source/examples/perplexity/perplexity.cpp:172
        lock = <optimized out>
        i = 14124
        results = <optimized out>
        v = <optimized out>
        local_nll = 784.23475355210473
        local_nll2 = 3605.6059932089684
        n_token = 16383
        tokens = 0x7f837cb02010
        logits = 0x7f79e5fff010
        n_vocab = 152064
        prob_history = 0x192460d0
        logit_history = 0x19121fa0
        nll2 = @0x7ffe09117578: 0
        nll = @0x7ffe09117570: 0
        counter = @0x7ffe0911756c: 14126
        mutex = @0x7ffe09117770: {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
                __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>,
              __align = 0}}, <No data fields>}
#2  0x00007f83bbce86d3 in execute_native_thread_routine () from /nix/store/22nxhmsfcv2q2rpkmfvzwg2w5z1l231z-gcc-13.3.0-lib/lib/libstdc++.so.6
No symbol table info available.
#3  0x00007f83bba99a42 in start_thread (arg=<optimized out>) at pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140202479652864, 78784545229976008, -152, 2, 140729050559120, 140203846717440,
                -139264307837193784, -139451664010759736}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0,
              cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#4  0x00007f83bbb1905c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) l
102	    }
103	    return probs;
104	}
105	
106	static results_log_softmax log_softmax(int n_vocab, const float * logits, int tok) {
107	    float max_logit = logits[0];
108	    for (int i = 1; i < n_vocab; ++i) {
109	        max_logit = std::max(max_logit, logits[i]);
110	    }
111	    double sum_exp = 0.0;
(gdb) p logits[0]
Cannot access memory at address 0x7f77e6105010
(gdb) p n_vocab
$1 = 152064
(gdb) p tok
$2 = 5636
(gdb)
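One possible reading of the gdb output above, offered as a hypothesis rather than a confirmed diagnosis: frame #1 holds `logits = 0x7f79e5fff010`, `i = 14124` and `n_vocab = 152064`, while frame #0 receives `logits = 0x7f77e6105010`, which lies below the base pointer. The product `14124 * 152064` does not fit in a 32-bit `int`, so if the offset at perplexity.cpp:172 is computed as something like `i * n_vocab` in `int` arithmetic (an assumption, since that source line is not shown in the log), the wrapped result would be negative. The sketch below checks that arithmetic against the two addresses. All names in it are hypothetical helpers, and since signed overflow is undefined behaviour in C++, the wraparound is modelled with 64-bit arithmetic rather than triggered.

```cpp
#include <cstdint>
#include <cstdio>
#include <limits>

// Values copied verbatim from the gdb session in the log above.
constexpr std::int64_t kIndex       = 14124;            // i in frame #1
constexpr std::int64_t kVocab       = 152064;           // n_vocab
constexpr std::int64_t kLogitsBase  = 0x7f79e5fff010;   // logits in frame #1
constexpr std::int64_t kLogitsFault = 0x7f77e6105010;   // logits in frame #0

// Exact element offset, computed in 64 bits so nothing overflows.
constexpr std::int64_t exact_offset() { return kIndex * kVocab; }

// Model of what a two's-complement 32-bit wraparound would produce.
// (Assumption: this mirrors typical x86-64 behaviour; the standard
// itself leaves signed overflow undefined.)
constexpr std::int64_t wrapped_offset() {
    constexpr std::int64_t two32 = std::int64_t{1} << 32;
    std::int64_t v = exact_offset() % two32;
    if (v >= two32 / 2) v -= two32;   // fold into [-2^31, 2^31)
    return v;
}

// Offset actually observed between the two pointers, in float elements.
constexpr std::int64_t observed_offset() {
    return (kLogitsFault - kLogitsBase) / static_cast<std::int64_t>(sizeof(float));
}

static_assert(exact_offset() > std::numeric_limits<std::int32_t>::max(),
              "i * n_vocab exceeds INT32_MAX for these values");
static_assert(wrapped_offset() == observed_offset(),
              "wrapped 32-bit product matches the faulting pointer");

// Prints the three quantities for comparison.
void print_report() {
    std::printf("exact    = %lld\n", static_cast<long long>(exact_offset()));
    std::printf("wrapped  = %lld\n", static_cast<long long>(wrapped_offset()));
    std::printf("observed = %lld\n", static_cast<long long>(observed_offset()));
}
```

If this reading is right, widening the index before the multiplication (for example `size_t(i) * n_vocab`) would keep the offset positive. That is a suggestion to verify against the actual source, not a tested fix.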