Bug: llama-perplexity segfaults

### What happened?

I am running `llama-perplexity` on [q3_k_m quantized qwen2.5-14b-instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-GGUF) with wikitext-2. Some way into the run it segfaults. With the Debug build this takes about 90 seconds on my computer, so the GPU clearly has time to compute for a while before the crash. The attached log output is from the Debug build, but the same crash happens with the Release build. The command I run is `llama-perplexity -m qwen2.5-14b-instruct-q3_k_m-00001-of-00002.gguf -f wikitext-2-raw/wiki.test.raw -fa -c 32768 -ngl 49`.

This is fully repeatable for me.
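
A quick size check may help narrow this down. It uses only numbers that appear in the log below (`n_vocab`, `n_batch`) and the `-c 32768` flag from the command. The sketch reproduces the 1188.00 MiB output buffer size from the log, then counts how many float elements the logits for a full chunk would occupy. Whether the tool indexes logits with a 32-bit offset of the form `(n_ctx / 2) * n_vocab` is an assumption made for illustration, not something confirmed from the source, so treat the last line as a hypothesis to check rather than a diagnosis.

```python
# Values copied from the report: n_vocab and n_batch from the log output,
# n_ctx from the -c flag on the command line.
N_VOCAB = 152064
N_CTX = 32768
N_BATCH = 2048

INT32_MAX = 2**31 - 1
BYTES_PER_F32 = 4


def mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / (1024 * 1024)


# Logits for one batch of tokens; this should match the
# "reallocating output buffer ... to 1188.00 MiB" line in the log.
batch_logits_mib = mib(N_VOCAB * N_BATCH * BYTES_PER_F32)

# Logits for a whole chunk of n_ctx tokens, counted in float elements.
chunk_logits_elems = N_VOCAB * N_CTX

# ASSUMPTION (hypothetical, not verified against the source): an element
# offset to the second half of a chunk, computed as (n_ctx / 2) * n_vocab.
half_chunk_offset = (N_CTX // 2) * N_VOCAB

print(f"per-batch logits:  {batch_logits_mib:.2f} MiB")
print(f"per-chunk logits:  {chunk_logits_elems} elements")
print(f"half-chunk offset: {half_chunk_offset} elements")
print(f"offset exceeds INT32_MAX: {half_chunk_offset > INT32_MAX}")
```

If such an offset were computed in a signed 32-bit integer, it would not fit at this vocabulary size and context length, which would be consistent with a crash inside `process_logits`. A smaller `-c` value would be one way to test that idea.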

### Name and Version

This is b3829, running on NixOS. I think the Nix packaging (which needs builds to be reproducible) causes the version to be reported as 0:

```
$ llama-perplexity --version
version: 0 (unknown)
built with gcc (GCC) 13.3.0 for x86_64-unknown-linux-gnu
```

### What operating system are you seeing the problem on?

Linux

### Relevant log output

```shell
$ time llama-perplexity -m qwen2.5-14b-instruct-q3_k_m-00001-of-00002.gguf -f wikitext-2-raw/wiki.test.raw -fa -c 32768 -ngl 49
build: 0 (unknown) with gcc (GCC) 13.3.0 for x86_64-unknown-linux-gnu (debug)
llama_model_loader: additional 1 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 29 key-value pairs and 579 tensors from qwen2.5-14b-instruct-q3_k_m-00001-of-00002.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = qwen2.5-14b-instruct
llama_model_loader: - kv   3:                            general.version str              = v0.1
llama_model_loader: - kv   4:                           general.finetune str              = qwen2.5-14b-instruct
llama_model_loader: - kv   5:                         general.size_label str              = 15B
llama_model_loader: - kv   6:                          qwen2.block_count u32              = 48
llama_model_loader: - kv   7:                       qwen2.context_length u32              = 131072
llama_model_loader: - kv   8:                     qwen2.embedding_length u32              = 5120
llama_model_loader: - kv   9:                  qwen2.feed_forward_length u32              = 13824
llama_model_loader: - kv  10:                 qwen2.attention.head_count u32              = 40
llama_model_loader: - kv  11:              qwen2.attention.head_count_kv u32              = 8
llama_model_loader: - kv  12:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  13:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                          general.file_type u32              = 12
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - kv  26:                                   split.no u16              = 0
llama_model_loader: - kv  27:                                split.count u16              = 2
llama_model_loader: - kv  28:                        split.tensors.count i32              = 579
llama_model_loader: - type  f32:  241 tensors
llama_model_loader: - type q3_K:  193 tensors
llama_model_loader: - type q4_K:  139 tensors
llama_model_loader: - type q5_K:    5 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.9310 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = qwen2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 152064
llm_load_print_meta: n_merges         = 151387
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 5120
llm_load_print_meta: n_layer          = 48
llm_load_print_meta: n_head           = 40
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 5
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 13824
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = Q3_K - Medium
llm_load_print_meta: model params     = 14.77 B
llm_load_print_meta: model size       = 6.83 GiB (3.97 BPW)
llm_load_print_meta: general.name     = qwen2.5-14b-instruct
llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
llm_load_print_meta: LF token         = 148848 'ÄĬ'
llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
llm_load_print_meta: max token length = 256
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
llm_load_tensors: ggml ctx size =    0.51 MiB
llm_load_tensors: offloading 48 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 49/49 layers to GPU
llm_load_tensors:        CPU buffer size =   319.04 MiB
llm_load_tensors:      CUDA0 buffer size =  6674.48 MiB
.........................................................................................
llama_new_context_with_model: n_ctx      = 32768
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =  6144.00 MiB
llama_new_context_with_model: KV self size  = 6144.00 MiB, K (f16): 3072.00 MiB, V (f16): 3072.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.58 MiB
ggml_gallocr_reserve_n: reallocating CUDA0 buffer from size 0.00 MiB to 307.00 MiB
ggml_gallocr_reserve_n: reallocating CUDA_Host buffer from size 0.00 MiB to 74.01 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   307.00 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    74.01 MiB
llama_new_context_with_model: graph nodes  = 1495
llama_new_context_with_model: graph splits = 2
llama_init_from_gpt_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
llama_output_reserve: reallocating output buffer from size 0.58 MiB to 1.16 MiB
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 2 1 1]

system_info: n_threads = 16 (n_threads_batch = 16) / 32 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
perplexity: tokenizing the input ..
perplexity: tokenization took 736.355 ms
perplexity: calculating perplexity over 9 chunks, n_ctx=32768, batch_size=2048, n_seq=1
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
llama_output_reserve: reallocating output buffer from size 1.16 MiB to 1188.00 MiB
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
ggml_backend_cuda_graph_compute: disabling CUDA graphs due to batch size > 1 [ffn_inp-0] [5120 512 1 1]
perplexity: 24.67 seconds per pass - ETA 3.70 minutes
Segmentation fault (core dumped)

real	1m29,564s
user	0m28,035s
sys	0m30,288s

$ coredumpctl debug
           PID: 3147013 (llama-perplexit)
           UID: 1000 (sliedes)
           GID: 100 (users)
        Signal: 11 (SEGV)
     Timestamp: Tue 2024-10-08 04:01:36 CEST (3min 34s ago)
  Command Line: llama-perplexity -m qwen2.5-14b-instruct-q3_k_m-00001-of-00002.gguf -f wikitext-2-raw/wiki.test.raw -fa -c 32768 -ngl 49
    Executable: /nix/store/213m0yym3qdfnvnzfqdvr82pbgp0b63l-llama-cpp-3829/bin/llama-perplexity
 Control Group: /user.slice/user-1000.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-1000.slice
       Session: 2
     Owner UID: 1000 (sliedes)
       Boot ID: 6224b3f52c0e45468c99f5f5cc1d17f4
    Machine ID: 13629c48106c49a39ea48f0b10557f82
      Hostname: poyta
       Storage: /var/lib/systemd/coredump/core.llama-perplexit.1000.6224b3f52c0e45468c99f5f5cc1d17f4.3147013.1728352896000000.zst (present)
  Size on Disk: 9.6G
       Message: Process 3147013 (llama-perplexit) of user 1000 dumped core.

                Module libgomp.so.1 without build-id.
                Module libgcc_s.so.1 without build-id.
                Module libstdc++.so.6 without build-id.
                Stack trace of thread 3147061:
                #0  0x000000000041d0af _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e0af)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147046:
                #0  0x00007f83bbf909f6 __expf_fma (libm.so.6 + 0x769f6)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147049:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147047:
                #0  0x000000000041d0af _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e0af)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147050:
                #0  0x00007f83bbf90a2e __expf_fma (libm.so.6 + 0x76a2e)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147048:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147051:
                #0  0x00007f83bbf90a2e __expf_fma (libm.so.6 + 0x76a2e)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147053:
                #0  0x0000000000410e50 n/a (llama-perplexity + 0x11e50)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147054:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147052:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147056:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147055:
                #0  0x000000000041d123 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e123)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147057:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147059:
                #0  0x000000000041d123 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e123)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147060:
                #0  0x00007f83bbf909e0 __expf_fma (libm.so.6 + 0x769e0)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147064:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147062:
                #0  0x000000000041d119 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e119)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147063:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147066:
                #0  0x00007f83bbf90a75 __expf_fma (libm.so.6 + 0x76a75)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147067:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147068:
                #0  0x000000000041d123 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e123)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147069:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147070:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147071:
                #0  0x00007f83bbf90a5f __expf_fma (libm.so.6 + 0x76a5f)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147072:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147074:
                #0  0x00007f83bbf90a6d __expf_fma (libm.so.6 + 0x76a6d)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147073:
                #0  0x000000000041d119 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e119)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147075:
                #0  0x00007f83bbf90a56 __expf_fma (libm.so.6 + 0x76a56)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147076:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147058:
                #0  0x000000000041d0af _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e0af)
                #1  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #2  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #3  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147020:
                #0  0x00007f83bbb0ad1f __poll (libc.so.6 + 0x101d1f)
                #1  0x00007f8394a54e3f n/a (libcuda.so.1 + 0x254e3f)
                #2  0x00007f8394b27fbf n/a (libcuda.so.1 + 0x327fbf)
                #3  0x00007f8394a51113 n/a (libcuda.so.1 + 0x251113)
                #4  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #5  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147014:
                #0  0x00007f83bba960ce __futex_abstimed_wait_common (libc.so.6 + 0x8d0ce)
                #1  0x00007f83bba98c20 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8fc20)
                #2  0x00000000004979fb _ZZN7gpt_log6resumeEvENKUlvE_clEv (llama-perplexity + 0x989fb)
                #3  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #4  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #5  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147013:
                #0  0x000000000041d11d _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e11d)
                #1  0x000000000041fbb8 _ZL10perplexityP13llama_contextRK10gpt_paramsi (llama-perplexity + 0x20bb8)
                #2  0x0000000000417e8e main (llama-perplexity + 0x18e8e)
                #3  0x00007f83bba3314e __libc_start_call_main (libc.so.6 + 0x2a14e)
                #4  0x00007f83bba33209 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a209)
                #5  0x000000000041bd55 _start (llama-perplexity + 0x1cd55)

                Stack trace of thread 3147019:
                #0  0x00007f83bba960ce __futex_abstimed_wait_common (libc.so.6 + 0x8d0ce)
                #1  0x00007f83bba98f45 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8ff45)
                #2  0x00007f83949aebca n/a (libcuda.so.1 + 0x1aebca)
                #3  0x00007f8394a51113 n/a (libcuda.so.1 + 0x251113)
                #4  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #5  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147015:
                #0  0x00007f83bbb0ad1f __poll (libc.so.6 + 0x101d1f)
                #1  0x00007f8394a54e3f n/a (libcuda.so.1 + 0x254e3f)
                #2  0x00007f8394b27fbf n/a (libcuda.so.1 + 0x327fbf)
                #3  0x00007f8394a51113 n/a (libcuda.so.1 + 0x251113)
                #4  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #5  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)

                Stack trace of thread 3147065:
                #0  0x00007f83bbf90a71 __expf_fma (libm.so.6 + 0x76a71)
                #1  0x000000000041d110 _ZZL14process_logitsiPKfPKiiRSt6vectorISt6threadSaIS4_EERdS8_PfS9_ENKUlvE_clEv (llama-perplexity + 0x1e110)
                #2  0x00007f83bbce86d3 execute_native_thread_routine (libstdc++.so.6 + 0xe86d3)
                #3  0x00007f83bba99a42 start_thread (libc.so.6 + 0x90a42)
                #4  0x00007f83bbb1905c __clone3 (libc.so.6 + 0x11005c)
                ELF object binary architecture: AMD x86-64

Reading symbols from /nix/store/213m0yym3qdfnvnzfqdvr82pbgp0b63l-llama-cpp-3829/bin/llama-perplexity...

warning: Loadable section ".dynstr" outside of ELF segments
  in /nix/store/213m0yym3qdfnvnzfqdvr82pbgp0b63l-llama-cpp-3829/bin/llama-perplexity
Reading symbols from /nix/store/a9ahhxdscnmaa9r21apcpmgpi7rm8lzw-llama-cpp-3829-debug/lib/debug/.build-id/cd/c2662de87e621fa634f03179bc07cbc193881d.debug...

warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing

warning: core file may not match specified executable file.
[New LWP 3147061]
[New LWP 3147046]
[New LWP 3147049]
[New LWP 3147047]
[New LWP 3147050]
[New LWP 3147048]
[New LWP 3147051]
[New LWP 3147053]
[New LWP 3147054]
[New LWP 3147052]
[New LWP 3147056]
[New LWP 3147055]
[New LWP 3147057]
[New LWP 3147059]
[New LWP 3147060]
[New LWP 3147064]
[New LWP 3147062]
[New LWP 3147063]
[New LWP 3147066]
[New LWP 3147067]
[New LWP 3147068]
[New LWP 3147069]
[New LWP 3147070]
[New LWP 3147071]
[New LWP 3147072]
[New LWP 3147074]
[New LWP 3147073]
[New LWP 3147075]
[New LWP 3147076]
[New LWP 3147058]
[New LWP 3147020]
[New LWP 3147014]
[New LWP 3147013]
[New LWP 3147019]
[New LWP 3147015]
[New LWP 3147065]
Downloading separate debug info for /nix/store/jzp0hpr9avl6i7gkx19dz59xirp0q7m2-cuda_cudart-12.4.99-lib/lib/libcudart.so.12
Downloading separate debug info for /nix/store/gj56jqxsgmvzgf9kdnbhciw5p48h78lb-libcublas-12.4.2.65-lib/lib/libcublas.so.12
Downloading separate debug info for /nix/store/gj56jqxsgmvzgf9kdnbhciw5p48h78lb-libcublas-12.4.2.65-lib/lib/libcublasLt.so.12
Downloading separate debug info for /run/opengl-driver/lib/libcuda.so.1
Downloading separate debug info for system-supplied DSO at 0x7ffe0914f000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/3dyw8dzj9ab4m8hv5dpyx7zii8d0w6fi-glibc-2.39-52/lib/libthread_db.so.1".
Core was generated by `llama-perplexity -m qwen2.5-14b-instruct-q3_k_m-00001-of-00002.gguf -f wikitext'.
Program terminated with signal SIGSEGV, Segmentation fault.
Downloading source file /build/source/examples/perplexity/perplexity.cpp
#0  log_softmax (tok=5636, logits=0x7f77e6105010, n_vocab=152064) at /build/source/examples/perplexity/perplexity.cpp:107

warning: 107	/build/source/examples/perplexity/perplexity.cpp: No such file or directory
[Current thread is 1 (Thread 0x7f836eff1000 (LWP 3147061))]
(gdb) set substitute-path /build/source /home/sliedes/proj/llama.cpp
(gdb) bt
#0  log_softmax (tok=5636, logits=0x7f77e6105010, n_vocab=152064) at /build/source/examples/perplexity/perplexity.cpp:107
#1  operator() (__closure=0x1b732f18) at /build/source/examples/perplexity/perplexity.cpp:172
#2  0x00007f83bbce86d3 in execute_native_thread_routine () from /nix/store/22nxhmsfcv2q2rpkmfvzwg2w5z1l231z-gcc-13.3.0-lib/lib/libstdc++.so.6
#3  0x00007f83bba99a42 in start_thread (arg=<optimized out>) at pthread_create.c:447
#4  0x00007f83bbb1905c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb) bt -full
#0  log_softmax (tok=5636, logits=0x7f77e6105010, n_vocab=152064) at /build/source/examples/perplexity/perplexity.cpp:107
        max_logit = <optimized out>
        sum_exp = <optimized out>
        max_logit = <optimized out>
        sum_exp = <optimized out>
        i = <optimized out>
        i = <optimized out>
#1  operator() (__closure=0x1b732f18) at /build/source/examples/perplexity/perplexity.cpp:172
        lock = <optimized out>
        i = 14124
        results = <optimized out>
        v = <optimized out>
        local_nll = 784.23475355210473
        local_nll2 = 3605.6059932089684
        n_token = 16383
        tokens = 0x7f837cb02010
        logits = 0x7f79e5fff010
        n_vocab = 152064
        prob_history = 0x192460d0
        logit_history = 0x19121fa0
        nll2 = @0x7ffe09117578: 0
        nll = @0x7ffe09117570: 0
        counter = @0x7ffe0911756c: 14126
        mutex = @0x7ffe09117770: {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
                __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>,
              __align = 0}}, <No data fields>}
#2  0x00007f83bbce86d3 in execute_native_thread_routine () from /nix/store/22nxhmsfcv2q2rpkmfvzwg2w5z1l231z-gcc-13.3.0-lib/lib/libstdc++.so.6
No symbol table info available.
#3  0x00007f83bba99a42 in start_thread (arg=<optimized out>) at pthread_create.c:447
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140202479652864, 78784545229976008, -152, 2, 140729050559120, 140203846717440,
                -139264307837193784, -139451664010759736}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0,
              cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#4  0x00007f83bbb1905c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) l
102	    }
103	    return probs;
104	}
105	
106	static results_log_softmax log_softmax(int n_vocab, const float * logits, int tok) {
107	    float max_logit = logits[0];
108	    for (int i = 1; i < n_vocab; ++i) {
109	        max_logit = std::max(max_logit, logits[i]);
110	    }
111	    double sum_exp = 0.0;
(gdb) p logits[0]
Cannot access memory at address 0x7f77e6105010
(gdb) p n_vocab
$1 = 152064
(gdb) p tok
$2 = 5636
(gdb)
```
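One possible reading of the gdb values above, offered as a hypothesis rather than a confirmed diagnosis: frame #1 has `logits = 0x7f79e5fff010`, `i = 14124` and `n_vocab = 152064`, while frame #0 receives `logits = 0x7f77e6105010`, which lies below the base of the buffer. The product `14124 * 152064` is 2147751936, just above `INT32_MAX`. If the call at `perplexity.cpp:172` computes the row pointer as `logits + i*n_vocab` with both operands of type `int` (an assumption, since that line is not shown in the trace), the offset would wrap to a negative value. The sketch below reproduces that arithmetic. The helper names are hypothetical and are not part of llama.cpp. Signed overflow is undefined behaviour in C++, so the wraparound is modelled with unsigned arithmetic.

```cpp
#include <cstdint>

// Hypothetical helper: element offset as it would come out if i * n_vocab
// were evaluated in 32 bits and wrapped around. Unsigned multiplication is
// used to model the wrap without invoking undefined behaviour.
static std::int64_t wrapped_offset(std::int32_t i, std::int32_t n_vocab) {
    const std::uint32_t prod =
        static_cast<std::uint32_t>(i) * static_cast<std::uint32_t>(n_vocab);
    // Reinterpret the low 32 bits as a signed value (two's complement).
    return static_cast<std::int32_t>(prod);
}

// Hypothetical helper: the same offset computed in 64 bits, which cannot
// overflow for values of this size.
static std::int64_t wide_offset(std::int32_t i, std::int32_t n_vocab) {
    return static_cast<std::int64_t>(i) * static_cast<std::int64_t>(n_vocab);
}

// Hypothetical helper: address of element `offset` in a float buffer that
// starts at `base` (sizeof(float) is taken to be 4 bytes).
static std::uint64_t float_address(std::uint64_t base, std::int64_t offset) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(base) + offset * 4);
}
```

With the values from the trace, `float_address(0x7f79e5fff010, wrapped_offset(14124, 152064))` evaluates to `0x7f77e6105010`, the same address gdb reports as inaccessible in frame #0. If the assumption about line 172 holds, widening the index before the multiplication (as in `wide_offset`) would keep the pointer inside the buffer.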


Bug: llama-perplexity segfaults #9779
