Name and Version
$ llama-server --version
load_backend: loaded RPC backend from /home/tipu/Applications/llamacpp/libggml-rpc.so
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
load_backend: loaded Vulkan backend from /home/tipu/Applications/llamacpp/libggml-vulkan.so
load_backend: loaded CPU backend from /home/tipu/Applications/llamacpp/libggml-cpu-haswell.so
version: 6800 (0398752)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
Vulkan
Hardware
$ hostnamectl
Operating System: Ubuntu 24.04.3 LTS
Kernel: Linux 6.14.0-33-generic
Architecture: x86-64
Hardware Vendor: GMKtec
Hardware Model: M5 PLUS
Firmware Version: M5 PLUS 1.03
Device Name AMD Radeon Graphics
PCI (domain:bus:dev.func) 0000:03:00.0
DeviceID:RevID 0x15E7.0xC1
OpenGL Driver Version Mesa 25.2.5 - kisak-mesa PPA
gfx_target_version gfx90c
GPU Type APU
Family Raven (RV)
ASIC Name Renoir
Chip Class GFX9
Shader Engine (SE) 1
Shader Array (SA/SH) per SE 1
CU per SA 8
Total CU 8
RenderBackendPlus (RB+) 2 (16 ROPs)
Peak Pixel Fill-Rate 32 GP/s
GPU Clock 200-2000 MHz
Peak FP32 2048 GFLOPS
VRAM Type DDR4
VRAM Bit Width 128-bit
VRAM Size 16384 MiB
Memory Clock 400-1333 MHz
ResizableBAR Enabled
ECC Memory Not Supported
$ vulkaninfo --summary
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /usr/lib/x86_64-linux-gnu/libvulkan_dzn.so. Skipping this driver.
==========
VULKANINFO
==========
Vulkan Instance Version: 1.4.313
Devices:
========
GPU0:
apiVersion = 1.4.318
driverVersion = 25.2.5
vendorID = 0x1002
deviceID = 0x15e7
deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
deviceName = AMD Radeon Graphics (RADV RENOIR)
driverID = DRIVER_ID_MESA_RADV
driverName = radv
driverInfo = Mesa 25.2.5 - kisak-mesa PPA
conformanceVersion = 1.4.0.0
deviceUUID = 00000000-0300-0000-0000-000000000000
driverUUID = 414d442d-4d45-5341-2d44-525600000000
GPU1:
apiVersion = 1.4.318
driverVersion = 25.2.5
vendorID = 0x10005
deviceID = 0x0000
deviceType = PHYSICAL_DEVICE_TYPE_CPU
deviceName = llvmpipe (LLVM 20.1.8, 256 bits)
driverID = DRIVER_ID_MESA_LLVMPIPE
driverName = llvmpipe
driverInfo = Mesa 25.2.5 - kisak-mesa PPA (LLVM 20.1.8)
conformanceVersion = 1.3.1.1
deviceUUID = 6d657361-3235-2e32-2e35-202d206b6900
driverUUID = 6c6c766d-7069-7065-5555-494400000000
Models
https://huggingface.co/ggml-org/granite-docling-258M-GGUF
- name: granite-docling
  model_path: /home/tipu/AI/models/ggml-org/Granite_docling/granite-docling-258M-f16.gguf
  model_params: |
    --port 8888
    --api-key 12345
    --mmproj /home/tipu/AI/models/ggml-org/Granite_docling/mmproj-granite-docling-258M-f16.gguf
    --n-predict -1
    --ctx-size 16384
    --n-gpu-layers 99
    --jinja
    --repeat-penalty 1.0
    --temp 0.0
    --top-k 0
    --top-p 1.0
    --alias granite-docling
    --mlock
    --seed -1
    --swa-full
    --no-escape
    --no-mmap
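The same model can also be served straight from that Hugging Face repo; a minimal sketch, assuming `-hf` also pulls the matching mmproj for this multimodal GGUF:

```bash
# Minimal sketch: serve the same model directly from the HF repo
# (assumes -hf also downloads the matching mmproj for this multimodal GGUF).
llama-server -hf ggml-org/granite-docling-258M-GGUF \
  --port 8888 --alias granite-docling --jinja \
  --ctx-size 16384 --n-gpu-layers 99 \
  --temp 0.0 --top-k 0 --top-p 1.0 --repeat-penalty 1.0
```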
Problem description & steps to reproduce
When I run llama-server and send an OCR request, the generated output gets stuck in a loop (see the output below).
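The request can be reproduced against the server's OpenAI-compatible chat completions endpoint with the page passed as a base64 image; a minimal sketch (the image path and prompt are placeholders, not my exact client code):

```bash
# Minimal sketch of the OCR request (image path and prompt are placeholders).
# With --mmproj loaded, llama-server's /v1/chat/completions endpoint accepts
# images as base64 data URIs in the message content.
IMG_B64=$(base64 -w0 invoice.png)

curl -s http://127.0.0.1:8888/v1/chat/completions \
  -H "Authorization: Bearer 12345" \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "model": "granite-docling",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Convert this page to docling." },
        { "type": "image_url",
          "image_url": { "url": "data:image/png;base64,${IMG_B64}" } }
      ]
    }
  ]
}
EOF
```

The launcher and server output: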
Starting model: granite-docling
Model path: /home/tipu/AI/models/ggml-org/Granite_docling/granite-docling-258M-f16.gguf
Changing directory to: /home/tipu/Applications/llamacpp/
Executing: ./llama-server -m /home/tipu/AI/models/ggml-org/Granite_docling/granite-docling-258M-f16.gguf --port 8888 --api-key 12345 --mmproj /home/tipu/AI/models/ggml-org/Granite_docling/mmproj-granite-docling-258M-f16.gguf --n-predict -1 --ctx-size 16384 --n-gpu-layers 99 --jinja --repeat-penalty 1.0 --temp 0.0 --top-k 0 --top-p 1.0 --alias granite-docling --mlock --seed -1 --swa-full --no-escape --no-mmap
Press Ctrl+C to stop the server and return to the menu.
load_backend: loaded RPC backend from /home/tipu/Applications/llamacpp/libggml-rpc.so
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
load_backend: loaded Vulkan backend from /home/tipu/Applications/llamacpp/libggml-vulkan.so
load_backend: loaded CPU backend from /home/tipu/Applications/llamacpp/libggml-cpu-haswell.so
build: 6800 (0398752d) with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
main: binding port with default address family
main: HTTP server is listening, hostname: 127.0.0.1, port: 8888, http threads: 15
main: loading model
srv load_model: loading model '/home/tipu/AI/models/ggml-org/Granite_docling/granite-docling-258M-f16.gguf'
llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon Graphics (RADV RENOIR)) (0000:03:00.0) - 37487 MiB free
llama_model_loader: loaded meta data with 45 key-value pairs and 272 tensors from /home/tipu/AI/models/ggml-org/Granite_docling/granite-docling-258M-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.size_label str = 164M
llama_model_loader: - kv 3: general.license str = apache-2.0
llama_model_loader: - kv 4: general.dataset.count u32 = 4
llama_model_loader: - kv 5: general.dataset.0.name str = SynthCodeNet
llama_model_loader: - kv 6: general.dataset.0.organization str = Ds4Sd
llama_model_loader: - kv 7: general.dataset.0.repo_url str = https://huggingface.co/ds4sd/SynthCod...
llama_model_loader: - kv 8: general.dataset.1.name str = SynthFormulaNet
llama_model_loader: - kv 9: general.dataset.1.organization str = Ds4Sd
llama_model_loader: - kv 10: general.dataset.1.repo_url str = https://huggingface.co/ds4sd/SynthFor...
llama_model_loader: - kv 11: general.dataset.2.name str = SynthChartNet
llama_model_loader: - kv 12: general.dataset.2.organization str = Ds4Sd
llama_model_loader: - kv 13: general.dataset.2.repo_url str = https://huggingface.co/ds4sd/SynthCha...
llama_model_loader: - kv 14: general.dataset.3.name str = DoclingMatix
llama_model_loader: - kv 15: general.dataset.3.organization str = HuggingFaceM4
llama_model_loader: - kv 16: general.dataset.3.repo_url str = https://huggingface.co/HuggingFaceM4/...
llama_model_loader: - kv 17: general.tags arr[str,14] = ["text-generation", "documents", "cod...
llama_model_loader: - kv 18: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 19: llama.block_count u32 = 30
llama_model_loader: - kv 20: llama.context_length u32 = 8192
llama_model_loader: - kv 21: llama.embedding_length u32 = 576
llama_model_loader: - kv 22: llama.feed_forward_length u32 = 1536
llama_model_loader: - kv 23: llama.attention.head_count u32 = 9
llama_model_loader: - kv 24: llama.attention.head_count_kv u32 = 3
llama_model_loader: - kv 25: llama.rope.freq_base f32 = 100000.000000
llama_model_loader: - kv 26: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 27: llama.attention.key_length u32 = 64
llama_model_loader: - kv 28: llama.attention.value_length u32 = 64
llama_model_loader: - kv 29: general.file_type u32 = 1
llama_model_loader: - kv 30: llama.vocab_size u32 = 100352
llama_model_loader: - kv 31: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 32: general.quantization_version u32 = 2
llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 34: tokenizer.ggml.pre str = granite-docling
llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,100352] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,100352] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,100000] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 38: tokenizer.ggml.bos_token_id u32 = 100264
llama_model_loader: - kv 39: tokenizer.ggml.eos_token_id u32 = 100257
llama_model_loader: - kv 40: tokenizer.ggml.unknown_token_id u32 = 100338
llama_model_loader: - kv 41: tokenizer.ggml.padding_token_id u32 = 100257
llama_model_loader: - kv 42: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 43: tokenizer.chat_template str = {%- for message in messages -%}\n{{- '...
llama_model_loader: - kv 44: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - type f32: 61 tensors
llama_model_loader: - type f16: 211 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 312.88 MiB (16.00 BPW)
load: printing all EOG tokens:
load: - 100257 ('<|end_of_text|>')
load: special tokens cache size = 96
load: token to piece cache size = 0.6152 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 8192
print_info: n_embd = 576
print_info: n_layer = 30
print_info: n_head = 9
print_info: n_head_kv = 3
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 3
print_info: n_embd_k_gqa = 192
print_info: n_embd_v_gqa = 192
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 1536
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 100000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 8192
print_info: rope_finetuned = unknown
print_info: model type = 256M
print_info: model params = 164.01 M
print_info: general.name = n/a
print_info: vocab type = BPE
print_info: n_vocab = 100352
print_info: n_merges = 100000
print_info: BOS token = 100264 '<|start_of_role|>'
print_info: EOS token = 100257 '<|end_of_text|>'
print_info: EOT token = 100257 '<|end_of_text|>'
print_info: UNK token = 100338 '<|unk|>'
print_info: PAD token = 100257 '<|end_of_text|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 100257 '<|end_of_text|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: offloading 30 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 31/31 layers to GPU
load_tensors: Vulkan0 model buffer size = 312.88 MiB
load_tensors: Vulkan_Host model buffer size = 110.25 MiB
.................................................
llama_init_from_model: model default pooling_type is [0], but [-1] was specified
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 16384
llama_context: n_ctx_per_seq = 16384
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 100000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (16384) > n_ctx_train (8192) -- possible training context overflow
llama_context: Vulkan_Host output buffer size = 0.38 MiB
llama_kv_cache: Vulkan0 KV buffer size = 360.00 MiB
llama_kv_cache: size = 360.00 MiB ( 16384 cells, 30 layers, 1/1 seqs), K (f16): 180.00 MiB, V (f16): 180.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context: Vulkan0 compute buffer size = 197.12 MiB
llama_context: Vulkan_Host compute buffer size = 33.14 MiB
llama_context: graph nodes = 937
llama_context: graph splits = 2
common_init_from_params: added <|end_of_text|> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 16384
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
clip_model_loader: model name:
clip_model_loader: description:
clip_model_loader: GGUF version: 3
clip_model_loader: alignment: 32
clip_model_loader: n_tensors: 198
clip_model_loader: n_kv: 36
clip_model_loader: has vision encoder
clip_ctx: CLIP using Vulkan0 backend
load_hparams: projector: idefics3
load_hparams: n_embd: 768
load_hparams: n_head: 12
load_hparams: n_ff: 3072
load_hparams: n_layer: 12
load_hparams: ffn_op: gelu
load_hparams: projection_dim: 576
--- vision hparams ---
load_hparams: image_size: 512
load_hparams: patch_size: 16
load_hparams: has_llava_proj: 0
load_hparams: minicpmv_version: 0
load_hparams: proj_scale_factor: 4
load_hparams: n_wa_pattern: 0
load_hparams: model size: 181.22 MiB
load_hparams: metadata size: 0.07 MiB
alloc_compute_meta: Vulkan0 compute buffer size = 60.00 MiB
alloc_compute_meta: CPU compute buffer size = 3.00 MiB
srv load_model: loaded multimodal model, '/home/tipu/AI/models/ggml-org/Granite_docling/mmproj-granite-docling-258M-f16.gguf'
srv init: initializing slots, n_slots = 1
slot init: id 0 | task -1 | new slot n_ctx_slot = 16384
srv init: prompt cache is enabled, size limit: 8192 MiB
srv init: use `--cache-ram 0` to disable the prompt cache
srv init: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
srv init: thinking = 0
main: model loaded
main: chat template, chat_template: {%- for message in messages -%}
{{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' -}}
{%- if message['content'] is string -%}
{{- message['content'] -}}
{%- else -%}
{%- for part in message['content'] -%}
{%- if part['type'] == 'text' -%}
{{- part['text'] -}}
{%- elif part['type'] == 'image' -%}
{{- '<image>' -}}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{{- '<|end_of_text|>
' -}}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{- '<|start_of_role|>assistant' -}}
{%- if controls -%}{{- ' ' + controls | tojson() -}}{%- endif -%}
{{- '<|end_of_role|>' -}}
{%- endif -%}
, example_format: '<|start_of_role|>system<|end_of_role|>You are a helpful assistant<|end_of_text|>
<|start_of_role|>user<|end_of_role|>Hello<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>Hi there<|end_of_text|>
<|start_of_role|>user<|end_of_role|>How are you?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>'
main: server is listening on http://127.0.0.1:8888 - starting the main loop
srv update_slots: all slots are idle
Output:
<loc_36>loc_34>loc_463>loc_40>Invoice Number: 001
<loc_36>loc_43>loc_463>loc_50>Invoice Number: 001
<loc_36>loc_57>loc_463><loc_67>Invoice Number: 001
<loc_36>loc_63><loc_463><loc_76>Invoice Number: 001
<loc_36><loc_70><loc_463><loc_81>Invoice Number: 001
<loc_36><loc_84><loc_463><loc_104>Invoice Number: 001 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 20
For comparison, output from Nanonets:
Invoice
Invoice Number: INV-20250609
Date: June 9, 2025
Bill To: Souvik Mandal
123 Business Street
Kolkata, India
| Item | Description | Quantity | Unit Price | Total |
|---|---|---|---|---|
| 001 | Consulting Services | 10 | ₹2000 | ₹20,000 |
| 002 | Design Work | 5 | ₹1500 | ₹7,500 |
| Grand Total | ₹27,500 | |||
Thank you for your business!
Payment was received on June 7, 2025
First Bad Commit
Not sure.
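If it helps, a rough way to narrow this down once a known-good build is identified (not run yet; the "good" tag below is a placeholder):

```bash
# Rough bisect sketch (not run); bXXXX is a placeholder for a build confirmed good.
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
git bisect start
git bisect bad 0398752    # build 6800 from this report reproduces the loop
git bisect good bXXXX     # placeholder: an older release tag that still works
# at each step: rebuild with Vulkan, rerun the OCR request, then mark good/bad
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
```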
Relevant log output
$ llama-server -m /home/tipu/AI/models/ggml-org/Granite_docling/granite-docling-258M-f16.gguf --port 8888 --api-key 12345 --mmproj /home/tipu/AI/models/ggml-org/Granite_docling/mmproj-granite-docling-258M-f16.gguf --n-predict -1 --ctx-size 16384 --n-gpu-layers 99 --jinja --repeat-penalty 1.0 --temp 0.0 --top-k 0 --top-p 1.0 --alias granite-docling --mlock --seed -1 --swa-full --no-escape --no-mmap -lv 1
load_backend: loaded RPC backend from /home/tipu/Applications/llamacpp/libggml-rpc.so
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
load_backend: loaded Vulkan backend from /home/tipu/Applications/llamacpp/libggml-vulkan.so
load_backend: loaded CPU backend from /home/tipu/Applications/llamacpp/libggml-cpu-haswell.so
build: 6800 (0398752d) with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
main: binding port with default address family
main: HTTP server is listening, hostname: 127.0.0.1, port: 8888, http threads: 15
main: loading model
srv load_model: loading model '/home/tipu/AI/models/ggml-org/Granite_docling/granite-docling-258M-f16.gguf'
llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon Graphics (RADV RENOIR)) (0000:03:00.0) - 36602 MiB free
llama_model_loader: loaded meta data with 45 key-value pairs and 272 tensors from /home/tipu/AI/models/ggml-org/Granite_docling/granite-docling-258M-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.size_label str = 164M
llama_model_loader: - kv 3: general.license str = apache-2.0
llama_model_loader: - kv 4: general.dataset.count u32 = 4
llama_model_loader: - kv 5: general.dataset.0.name str = SynthCodeNet
llama_model_loader: - kv 6: general.dataset.0.organization str = Ds4Sd
llama_model_loader: - kv 7: general.dataset.0.repo_url str = https://huggingface.co/ds4sd/SynthCod...
llama_model_loader: - kv 8: general.dataset.1.name str = SynthFormulaNet
llama_model_loader: - kv 9: general.dataset.1.organization str = Ds4Sd
llama_model_loader: - kv 10: general.dataset.1.repo_url str = https://huggingface.co/ds4sd/SynthFor...
llama_model_loader: - kv 11: general.dataset.2.name str = SynthChartNet
llama_model_loader: - kv 12: general.dataset.2.organization str = Ds4Sd
llama_model_loader: - kv 13: general.dataset.2.repo_url str = https://huggingface.co/ds4sd/SynthCha...
llama_model_loader: - kv 14: general.dataset.3.name str = DoclingMatix
llama_model_loader: - kv 15: general.dataset.3.organization str = HuggingFaceM4
llama_model_loader: - kv 16: general.dataset.3.repo_url str = https://huggingface.co/HuggingFaceM4/...
llama_model_loader: - kv 17: general.tags arr[str,14] = ["text-generation", "documents", "cod...
llama_model_loader: - kv 18: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 19: llama.block_count u32 = 30
llama_model_loader: - kv 20: llama.context_length u32 = 8192
llama_model_loader: - kv 21: llama.embedding_length u32 = 576
llama_model_loader: - kv 22: llama.feed_forward_length u32 = 1536
llama_model_loader: - kv 23: llama.attention.head_count u32 = 9
llama_model_loader: - kv 24: llama.attention.head_count_kv u32 = 3
llama_model_loader: - kv 25: llama.rope.freq_base f32 = 100000.000000
llama_model_loader: - kv 26: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 27: llama.attention.key_length u32 = 64
llama_model_loader: - kv 28: llama.attention.value_length u32 = 64
llama_model_loader: - kv 29: general.file_type u32 = 1
llama_model_loader: - kv 30: llama.vocab_size u32 = 100352
llama_model_loader: - kv 31: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 32: general.quantization_version u32 = 2
llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 34: tokenizer.ggml.pre str = granite-docling
llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,100352] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,100352] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,100000] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 38: tokenizer.ggml.bos_token_id u32 = 100264
llama_model_loader: - kv 39: tokenizer.ggml.eos_token_id u32 = 100257
llama_model_loader: - kv 40: tokenizer.ggml.unknown_token_id u32 = 100338
llama_model_loader: - kv 41: tokenizer.ggml.padding_token_id u32 = 100257
llama_model_loader: - kv 42: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 43: tokenizer.chat_template str = {%- for message in messages -%}\n{{- '...
llama_model_loader: - kv 44: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - type f32: 61 tensors
llama_model_loader: - type f16: 211 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 312.88 MiB (16.00 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 100351 '</code>' is not marked as EOG
load: control token: 100345 '<row_3_col_4>' is not marked as EOG
load: control token: 100344 '<row_3_col_3>' is not marked as EOG
load: control token: 100342 '<row_3_col_1>' is not marked as EOG
load: control token: 100340 '<global-img>' is not marked as EOG
load: control token: 100339 '<fake_token_around_image>' is not marked as EOG
load: control token: 100337 '<rhed>' is not marked as EOG
load: control token: 100336 '<ched>' is not marked as EOG
load: control token: 100335 '<nl>' is not marked as EOG
load: control token: 100333 '<ucel>' is not marked as EOG
load: control token: 100332 '<lcel>' is not marked as EOG
load: control token: 100329 '<rec_' is not marked as EOG
load: control token: 100324 '<unordered_list>' is not marked as EOG
load: control token: 100320 '<references>' is not marked as EOG
load: control token: 100319 '</paragraph>' is not marked as EOG
load: control token: 100317 '</text>' is not marked as EOG
load: control token: 100312 '<chart>' is not marked as EOG
load: control token: 100310 '</value_' is not marked as EOG
load: control token: 100309 '<value_' is not marked as EOG
load: control token: 100305 '<key_value_region>' is not marked as EOG
load: control token: 100304 '</form>' is not marked as EOG
load: control token: 100303 '<form>' is not marked as EOG
load: control token: 100300 '</checkbox_selected>' is not marked as EOG
load: control token: 100298 '</otsl>' is not marked as EOG
load: control token: 100296 '</section_header_level_6>' is not marked as EOG
load: control token: 100293 '<section_header_level_5>' is not marked as EOG
load: control token: 100291 '<section_header_level_4>' is not marked as EOG
load: control token: 100290 '</section_header_level_3>' is not marked as EOG
load: control token: 100288 '</section_header_level_2>' is not marked as EOG
load: control token: 100287 '<section_header_level_2>' is not marked as EOG
load: control token: 100286 '</section_header_level_1>' is not marked as EOG
load: control token: 100285 '<section_header_level_1>' is not marked as EOG
load: control token: 100284 '</picture>' is not marked as EOG
load: control token: 100283 '<picture>' is not marked as EOG
load: control token: 100281 '<page_header>' is not marked as EOG
load: control token: 100277 '<list_item>' is not marked as EOG
load: control token: 100276 '</formula>' is not marked as EOG
load: control token: 100274 '</footnote>' is not marked as EOG
load: control token: 100273 '<footnote>' is not marked as EOG
load: control token: 100272 '</caption>' is not marked as EOG
load: control token: 100269 '<title>' is not marked as EOG
load: control token: 100268 '<row_2_col_3>' is not marked as EOG
load: control token: 100266 '</title>' is not marked as EOG
load: control token: 100265 '<|end_of_role|>' is not marked as EOG
load: control token: 100263 '<row_2_col_1>' is not marked as EOG
load: control token: 100262 '<row_1_col_4>' is not marked as EOG
load: control token: 100256 '<|pad|>' is not marked as EOG
load: control token: 100343 '<row_3_col_2>' is not marked as EOG
load: control token: 100315 '<smiles>' is not marked as EOG
load: control token: 100347 '<row_4_col_2>' is not marked as EOG
load: control token: 100316 '</smiles>' is not marked as EOG
load: control token: 100321 '</references>' is not marked as EOG
load: control token: 100326 '<group>' is not marked as EOG
load: control token: 100275 '<formula>' is not marked as EOG
load: control token: 100292 '</section_header_level_4>' is not marked as EOG
load: control token: 100259 '<row_1_col_2>' is not marked as EOG
load: control token: 100294 '</section_header_level_5>' is not marked as EOG
load: control token: 100301 '<checkbox_unselected>' is not marked as EOG
load: control token: 100279 '<page_footer>' is not marked as EOG
load: control token: 100270 '<image>' is not marked as EOG
load: control token: 100295 '<section_header_level_6>' is not marked as EOG
load: control token: 100261 '<row_1_col_3>' is not marked as EOG
load: control token: 100334 '<xcel>' is not marked as EOG
load: control token: 100264 '<|start_of_role|>' is not marked as EOG
load: control token: 100348 '<row_4_col_3>' is not marked as EOG
load: control token: 100327 '<doctag>' is not marked as EOG
load: control token: 100306 '</key_value_region>' is not marked as EOG
load: control token: 100330 '<fcel>' is not marked as EOG
load: control token: 100271 '<caption>' is not marked as EOG
load: control token: 100308 '</key_' is not marked as EOG
load: control token: 100278 '</list_item>' is not marked as EOG
load: control token: 100258 '<row_1_col_1>' is not marked as EOG
load: control token: 100331 '<ecel>' is not marked as EOG
load: control token: 100289 '<section_header_level_3>' is not marked as EOG
load: control token: 100323 '</ordered_list>' is not marked as EOG
load: control token: 100280 '</page_footer>' is not marked as EOG
load: control token: 100338 '<|unk|>' is not marked as EOG
load: control token: 100346 '<row_4_col_1>' is not marked as EOG
load: control token: 100297 '<otsl>' is not marked as EOG
load: control token: 100311 '<link_' is not marked as EOG
load: control token: 100307 '<key_' is not marked as EOG
load: control token: 100328 '</doctag>' is not marked as EOG
load: control token: 100267 '<row_2_col_2>' is not marked as EOG
load: control token: 100260 '<text>' is not marked as EOG
load: control token: 100341 '<row_2_col_4>' is not marked as EOG
load: control token: 100318 '<paragraph>' is not marked as EOG
load: control token: 100314 '<page_break>' is not marked as EOG
load: control token: 100299 '<checkbox_selected>' is not marked as EOG
load: control token: 100313 '</chart>' is not marked as EOG
load: control token: 100282 '</page_header>' is not marked as EOG
load: control token: 100322 '<ordered_list>' is not marked as EOG
load: control token: 100302 '</checkbox_unselected>' is not marked as EOG
load: control token: 100325 '</unordered_list>' is not marked as EOG
load: control token: 100349 '<row_4_col_4>' is not marked as EOG
load: control token: 100350 '<code>' is not marked as EOG
load: printing all EOG tokens:
load: - 100257 ('<|end_of_text|>')
load: special tokens cache size = 96
load: token to piece cache size = 0.6152 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 8192
print_info: n_embd = 576
print_info: n_layer = 30
print_info: n_head = 9
print_info: n_head_kv = 3
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 3
print_info: n_embd_k_gqa = 192
print_info: n_embd_v_gqa = 192
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 1536
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 100000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 8192
print_info: rope_finetuned = unknown
print_info: model type = 256M
print_info: model params = 164.01 M
print_info: general.name = n/a
print_info: vocab type = BPE
print_info: n_vocab = 100352
print_info: n_merges = 100000
print_info: BOS token = 100264 '<|start_of_role|>'
print_info: EOS token = 100257 '<|end_of_text|>'
print_info: EOT token = 100257 '<|end_of_text|>'
print_info: UNK token = 100338 '<|unk|>'
print_info: PAD token = 100257 '<|end_of_text|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 100257 '<|end_of_text|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: layer 0 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 1 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 2 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 3 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 4 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 5 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 6 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 7 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 8 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 9 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 10 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 11 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 12 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 13 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 14 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 15 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 16 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 17 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 18 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 19 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 20 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 21 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 22 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 23 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 24 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 25 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 26 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 27 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 28 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 29 assigned to device Vulkan0, is_swa = 0
load_tensors: layer 30 assigned to device Vulkan0, is_swa = 0
create_tensor: loading tensor token_embd.weight
create_tensor: loading tensor output_norm.weight
create_tensor: loading tensor token_embd.weight
create_tensor: loading tensor blk.0.attn_norm.weight
create_tensor: loading tensor blk.0.attn_q.weight
create_tensor: loading tensor blk.0.attn_k.weight
create_tensor: loading tensor blk.0.attn_v.weight
create_tensor: loading tensor blk.0.attn_output.weight
create_tensor: loading tensor blk.0.ffn_norm.weight
create_tensor: loading tensor blk.0.ffn_gate.weight
create_tensor: loading tensor blk.0.ffn_down.weight
create_tensor: loading tensor blk.0.ffn_up.weight
create_tensor: loading tensor blk.1.attn_norm.weight
create_tensor: loading tensor blk.1.attn_q.weight
create_tensor: loading tensor blk.1.attn_k.weight
create_tensor: loading tensor blk.1.attn_v.weight
create_tensor: loading tensor blk.1.attn_output.weight
create_tensor: loading tensor blk.1.ffn_norm.weight
create_tensor: loading tensor blk.1.ffn_gate.weight
create_tensor: loading tensor blk.1.ffn_down.weight
create_tensor: loading tensor blk.1.ffn_up.weight
create_tensor: loading tensor blk.2.attn_norm.weight
create_tensor: loading tensor blk.2.attn_q.weight
create_tensor: loading tensor blk.2.attn_k.weight
create_tensor: loading tensor blk.2.attn_v.weight
create_tensor: loading tensor blk.2.attn_output.weight
create_tensor: loading tensor blk.2.ffn_norm.weight
create_tensor: loading tensor blk.2.ffn_gate.weight
create_tensor: loading tensor blk.2.ffn_down.weight
create_tensor: loading tensor blk.2.ffn_up.weight
create_tensor: loading tensor blk.3.attn_norm.weight
create_tensor: loading tensor blk.3.attn_q.weight
create_tensor: loading tensor blk.3.attn_k.weight
create_tensor: loading tensor blk.3.attn_v.weight
create_tensor: loading tensor blk.3.attn_output.weight
create_tensor: loading tensor blk.3.ffn_norm.weight
create_tensor: loading tensor blk.3.ffn_gate.weight
create_tensor: loading tensor blk.3.ffn_down.weight
create_tensor: loading tensor blk.3.ffn_up.weight
create_tensor: loading tensor blk.4.attn_norm.weight
create_tensor: loading tensor blk.4.attn_q.weight
create_tensor: loading tensor blk.4.attn_k.weight
create_tensor: loading tensor blk.4.attn_v.weight
create_tensor: loading tensor blk.4.attn_output.weight
create_tensor: loading tensor blk.4.ffn_norm.weight
create_tensor: loading tensor blk.4.ffn_gate.weight
create_tensor: loading tensor blk.4.ffn_down.weight
create_tensor: loading tensor blk.4.ffn_up.weight
create_tensor: loading tensor blk.5.attn_norm.weight
create_tensor: loading tensor blk.5.attn_q.weight
create_tensor: loading tensor blk.5.attn_k.weight
create_tensor: loading tensor blk.5.attn_v.weight
create_tensor: loading tensor blk.5.attn_output.weight
create_tensor: loading tensor blk.5.ffn_norm.weight
create_tensor: loading tensor blk.5.ffn_gate.weight
create_tensor: loading tensor blk.5.ffn_down.weight
create_tensor: loading tensor blk.5.ffn_up.weight
create_tensor: loading tensor blk.6.attn_norm.weight
create_tensor: loading tensor blk.6.attn_q.weight
create_tensor: loading tensor blk.6.attn_k.weight
create_tensor: loading tensor blk.6.attn_v.weight
create_tensor: loading tensor blk.6.attn_output.weight
create_tensor: loading tensor blk.6.ffn_norm.weight
create_tensor: loading tensor blk.6.ffn_gate.weight
create_tensor: loading tensor blk.6.ffn_down.weight
create_tensor: loading tensor blk.6.ffn_up.weight
create_tensor: loading tensor blk.7.attn_norm.weight
create_tensor: loading tensor blk.7.attn_q.weight
create_tensor: loading tensor blk.7.attn_k.weight
create_tensor: loading tensor blk.7.attn_v.weight
create_tensor: loading tensor blk.7.attn_output.weight
create_tensor: loading tensor blk.7.ffn_norm.weight
create_tensor: loading tensor blk.7.ffn_gate.weight
create_tensor: loading tensor blk.7.ffn_down.weight
create_tensor: loading tensor blk.7.ffn_up.weight
create_tensor: loading tensor blk.8.attn_norm.weight
create_tensor: loading tensor blk.8.attn_q.weight
create_tensor: loading tensor blk.8.attn_k.weight
create_tensor: loading tensor blk.8.attn_v.weight
create_tensor: loading tensor blk.8.attn_output.weight
create_tensor: loading tensor blk.8.ffn_norm.weight
create_tensor: loading tensor blk.8.ffn_gate.weight
create_tensor: loading tensor blk.8.ffn_down.weight
create_tensor: loading tensor blk.8.ffn_up.weight
create_tensor: loading tensor blk.9.attn_norm.weight
create_tensor: loading tensor blk.9.attn_q.weight
create_tensor: loading tensor blk.9.attn_k.weight
create_tensor: loading tensor blk.9.attn_v.weight
create_tensor: loading tensor blk.9.attn_output.weight
create_tensor: loading tensor blk.9.ffn_norm.weight
create_tensor: loading tensor blk.9.ffn_gate.weight
create_tensor: loading tensor blk.9.ffn_down.weight
create_tensor: loading tensor blk.9.ffn_up.weight
create_tensor: loading tensor blk.10.attn_norm.weight
create_tensor: loading tensor blk.10.attn_q.weight
create_tensor: loading tensor blk.10.attn_k.weight
create_tensor: loading tensor blk.10.attn_v.weight
create_tensor: loading tensor blk.10.attn_output.weight
create_tensor: loading tensor blk.10.ffn_norm.weight
create_tensor: loading tensor blk.10.ffn_gate.weight
create_tensor: loading tensor blk.10.ffn_down.weight
create_tensor: loading tensor blk.10.ffn_up.weight
create_tensor: loading tensor blk.11.attn_norm.weight
create_tensor: loading tensor blk.11.attn_q.weight
create_tensor: loading tensor blk.11.attn_k.weight
create_tensor: loading tensor blk.11.attn_v.weight
create_tensor: loading tensor blk.11.attn_output.weight
create_tensor: loading tensor blk.11.ffn_norm.weight
create_tensor: loading tensor blk.11.ffn_gate.weight
create_tensor: loading tensor blk.11.ffn_down.weight
create_tensor: loading tensor blk.11.ffn_up.weight
create_tensor: loading tensor blk.12.attn_norm.weight
create_tensor: loading tensor blk.12.attn_q.weight
create_tensor: loading tensor blk.12.attn_k.weight
create_tensor: loading tensor blk.12.attn_v.weight
create_tensor: loading tensor blk.12.attn_output.weight
create_tensor: loading tensor blk.12.ffn_norm.weight
create_tensor: loading tensor blk.12.ffn_gate.weight
create_tensor: loading tensor blk.12.ffn_down.weight
create_tensor: loading tensor blk.12.ffn_up.weight
create_tensor: loading tensor blk.13.attn_norm.weight
create_tensor: loading tensor blk.13.attn_q.weight
create_tensor: loading tensor blk.13.attn_k.weight
create_tensor: loading tensor blk.13.attn_v.weight
create_tensor: loading tensor blk.13.attn_output.weight
create_tensor: loading tensor blk.13.ffn_norm.weight
create_tensor: loading tensor blk.13.ffn_gate.weight
create_tensor: loading tensor blk.13.ffn_down.weight
create_tensor: loading tensor blk.13.ffn_up.weight
create_tensor: loading tensor blk.14.attn_norm.weight
create_tensor: loading tensor blk.14.attn_q.weight
create_tensor: loading tensor blk.14.attn_k.weight
create_tensor: loading tensor blk.14.attn_v.weight
create_tensor: loading tensor blk.14.attn_output.weight
create_tensor: loading tensor blk.14.ffn_norm.weight
create_tensor: loading tensor blk.14.ffn_gate.weight
create_tensor: loading tensor blk.14.ffn_down.weight
create_tensor: loading tensor blk.14.ffn_up.weight
create_tensor: loading tensor blk.15.attn_norm.weight
create_tensor: loading tensor blk.15.attn_q.weight
create_tensor: loading tensor blk.15.attn_k.weight
create_tensor: loading tensor blk.15.attn_v.weight
create_tensor: loading tensor blk.15.attn_output.weight
create_tensor: loading tensor blk.15.ffn_norm.weight
create_tensor: loading tensor blk.15.ffn_gate.weight
create_tensor: loading tensor blk.15.ffn_down.weight
create_tensor: loading tensor blk.15.ffn_up.weight
create_tensor: loading tensor blk.16.attn_norm.weight
create_tensor: loading tensor blk.16.attn_q.weight
create_tensor: loading tensor blk.16.attn_k.weight
create_tensor: loading tensor blk.16.attn_v.weight
create_tensor: loading tensor blk.16.attn_output.weight
create_tensor: loading tensor blk.16.ffn_norm.weight
create_tensor: loading tensor blk.16.ffn_gate.weight
create_tensor: loading tensor blk.16.ffn_down.weight
create_tensor: loading tensor blk.16.ffn_up.weight
create_tensor: loading tensor blk.17.attn_norm.weight
create_tensor: loading tensor blk.17.attn_q.weight
create_tensor: loading tensor blk.17.attn_k.weight
create_tensor: loading tensor blk.17.attn_v.weight
create_tensor: loading tensor blk.17.attn_output.weight
create_tensor: loading tensor blk.17.ffn_norm.weight
create_tensor: loading tensor blk.17.ffn_gate.weight
create_tensor: loading tensor blk.17.ffn_down.weight
create_tensor: loading tensor blk.17.ffn_up.weight
create_tensor: loading tensor blk.18.attn_norm.weight
create_tensor: loading tensor blk.18.attn_q.weight
create_tensor: loading tensor blk.18.attn_k.weight
create_tensor: loading tensor blk.18.attn_v.weight
create_tensor: loading tensor blk.18.attn_output.weight
create_tensor: loading tensor blk.18.ffn_norm.weight
create_tensor: loading tensor blk.18.ffn_gate.weight
create_tensor: loading tensor blk.18.ffn_down.weight
create_tensor: loading tensor blk.18.ffn_up.weight
create_tensor: loading tensor blk.19.attn_norm.weight
create_tensor: loading tensor blk.19.attn_q.weight
create_tensor: loading tensor blk.19.attn_k.weight
create_tensor: loading tensor blk.19.attn_v.weight
create_tensor: loading tensor blk.19.attn_output.weight
create_tensor: loading tensor blk.19.ffn_norm.weight
create_tensor: loading tensor blk.19.ffn_gate.weight
create_tensor: loading tensor blk.19.ffn_down.weight
create_tensor: loading tensor blk.19.ffn_up.weight
create_tensor: loading tensor blk.20.attn_norm.weight
create_tensor: loading tensor blk.20.attn_q.weight
create_tensor: loading tensor blk.20.attn_k.weight
create_tensor: loading tensor blk.20.attn_v.weight
create_tensor: loading tensor blk.20.attn_output.weight
create_tensor: loading tensor blk.20.ffn_norm.weight
create_tensor: loading tensor blk.20.ffn_gate.weight
create_tensor: loading tensor blk.20.ffn_down.weight
create_tensor: loading tensor blk.20.ffn_up.weight
create_tensor: loading tensor blk.21.attn_norm.weight
create_tensor: loading tensor blk.21.attn_q.weight
create_tensor: loading tensor blk.21.attn_k.weight
create_tensor: loading tensor blk.21.attn_v.weight
create_tensor: loading tensor blk.21.attn_output.weight
create_tensor: loading tensor blk.21.ffn_norm.weight
create_tensor: loading tensor blk.21.ffn_gate.weight
create_tensor: loading tensor blk.21.ffn_down.weight
create_tensor: loading tensor blk.21.ffn_up.weight
create_tensor: loading tensor blk.22.attn_norm.weight
create_tensor: loading tensor blk.22.attn_q.weight
create_tensor: loading tensor blk.22.attn_k.weight
create_tensor: loading tensor blk.22.attn_v.weight
create_tensor: loading tensor blk.22.attn_output.weight
create_tensor: loading tensor blk.22.ffn_norm.weight
create_tensor: loading tensor blk.22.ffn_gate.weight
create_tensor: loading tensor blk.22.ffn_down.weight
create_tensor: loading tensor blk.22.ffn_up.weight
create_tensor: loading tensor blk.23.attn_norm.weight
create_tensor: loading tensor blk.23.attn_q.weight
create_tensor: loading tensor blk.23.attn_k.weight
create_tensor: loading tensor blk.23.attn_v.weight
create_tensor: loading tensor blk.23.attn_output.weight
create_tensor: loading tensor blk.23.ffn_norm.weight
create_tensor: loading tensor blk.23.ffn_gate.weight
create_tensor: loading tensor blk.23.ffn_down.weight
create_tensor: loading tensor blk.23.ffn_up.weight
create_tensor: loading tensor blk.24.attn_norm.weight
create_tensor: loading tensor blk.24.attn_q.weight
create_tensor: loading tensor blk.24.attn_k.weight
create_tensor: loading tensor blk.24.attn_v.weight
create_tensor: loading tensor blk.24.attn_output.weight
create_tensor: loading tensor blk.24.ffn_norm.weight
create_tensor: loading tensor blk.24.ffn_gate.weight
create_tensor: loading tensor blk.24.ffn_down.weight
create_tensor: loading tensor blk.24.ffn_up.weight
create_tensor: loading tensor blk.25.attn_norm.weight
create_tensor: loading tensor blk.25.attn_q.weight
create_tensor: loading tensor blk.25.attn_k.weight
create_tensor: loading tensor blk.25.attn_v.weight
create_tensor: loading tensor blk.25.attn_output.weight
create_tensor: loading tensor blk.25.ffn_norm.weight
create_tensor: loading tensor blk.25.ffn_gate.weight
create_tensor: loading tensor blk.25.ffn_down.weight
create_tensor: loading tensor blk.25.ffn_up.weight
create_tensor: loading tensor blk.26.attn_norm.weight
create_tensor: loading tensor blk.26.attn_q.weight
create_tensor: loading tensor blk.26.attn_k.weight
create_tensor: loading tensor blk.26.attn_v.weight
create_tensor: loading tensor blk.26.attn_output.weight
create_tensor: loading tensor blk.26.ffn_norm.weight
create_tensor: loading tensor blk.26.ffn_gate.weight
create_tensor: loading tensor blk.26.ffn_down.weight
create_tensor: loading tensor blk.26.ffn_up.weight
create_tensor: loading tensor blk.27.attn_norm.weight
create_tensor: loading tensor blk.27.attn_q.weight
create_tensor: loading tensor blk.27.attn_k.weight
create_tensor: loading tensor blk.27.attn_v.weight
create_tensor: loading tensor blk.27.attn_output.weight
create_tensor: loading tensor blk.27.ffn_norm.weight
create_tensor: loading tensor blk.27.ffn_gate.weight
create_tensor: loading tensor blk.27.ffn_down.weight
create_tensor: loading tensor blk.27.ffn_up.weight
create_tensor: loading tensor blk.28.attn_norm.weight
create_tensor: loading tensor blk.28.attn_q.weight
create_tensor: loading tensor blk.28.attn_k.weight
create_tensor: loading tensor blk.28.attn_v.weight
create_tensor: loading tensor blk.28.attn_output.weight
create_tensor: loading tensor blk.28.ffn_norm.weight
create_tensor: loading tensor blk.28.ffn_gate.weight
create_tensor: loading tensor blk.28.ffn_down.weight
create_tensor: loading tensor blk.28.ffn_up.weight
create_tensor: loading tensor blk.29.attn_norm.weight
create_tensor: loading tensor blk.29.attn_q.weight
create_tensor: loading tensor blk.29.attn_k.weight
create_tensor: loading tensor blk.29.attn_v.weight
create_tensor: loading tensor blk.29.attn_output.weight
create_tensor: loading tensor blk.29.ffn_norm.weight
create_tensor: loading tensor blk.29.ffn_gate.weight
create_tensor: loading tensor blk.29.ffn_down.weight
create_tensor: loading tensor blk.29.ffn_up.weight
load_tensors: offloading 30 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 31/31 layers to GPU
load_tensors: Vulkan0 model buffer size = 312.88 MiB
load_tensors: Vulkan_Host model buffer size = 110.25 MiB
load_all_data: device Vulkan0 does not support async, host buffers or events
................................................load_all_data: buffer type Vulkan_Host is not the default buffer type for device Vulkan0 for async uploads
.
llama_init_from_model: model default pooling_type is [0], but [-1] was specified
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 16384
llama_context: n_ctx_per_seq = 16384
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 100000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (16384) > n_ctx_train (8192) -- possible training context overflow
set_abort_callback: call
llama_context: Vulkan_Host output buffer size = 0.38 MiB
create_memory: n_ctx = 16384 (padded)
llama_kv_cache: layer 0: dev = Vulkan0
llama_kv_cache: layer 1: dev = Vulkan0
llama_kv_cache: layer 2: dev = Vulkan0
llama_kv_cache: layer 3: dev = Vulkan0
llama_kv_cache: layer 4: dev = Vulkan0
llama_kv_cache: layer 5: dev = Vulkan0
llama_kv_cache: layer 6: dev = Vulkan0
llama_kv_cache: layer 7: dev = Vulkan0
llama_kv_cache: layer 8: dev = Vulkan0
llama_kv_cache: layer 9: dev = Vulkan0
llama_kv_cache: layer 10: dev = Vulkan0
llama_kv_cache: layer 11: dev = Vulkan0
llama_kv_cache: layer 12: dev = Vulkan0
llama_kv_cache: layer 13: dev = Vulkan0
llama_kv_cache: layer 14: dev = Vulkan0
llama_kv_cache: layer 15: dev = Vulkan0
llama_kv_cache: layer 16: dev = Vulkan0
llama_kv_cache: layer 17: dev = Vulkan0
llama_kv_cache: layer 18: dev = Vulkan0
llama_kv_cache: layer 19: dev = Vulkan0
llama_kv_cache: layer 20: dev = Vulkan0
llama_kv_cache: layer 21: dev = Vulkan0
llama_kv_cache: layer 22: dev = Vulkan0
llama_kv_cache: layer 23: dev = Vulkan0
llama_kv_cache: layer 24: dev = Vulkan0
llama_kv_cache: layer 25: dev = Vulkan0
llama_kv_cache: layer 26: dev = Vulkan0
llama_kv_cache: layer 27: dev = Vulkan0
llama_kv_cache: layer 28: dev = Vulkan0
llama_kv_cache: layer 29: dev = Vulkan0
llama_kv_cache: Vulkan0 KV buffer size = 360.00 MiB
llama_kv_cache: size = 360.00 MiB ( 16384 cells, 30 layers, 1/1 seqs), K (f16): 180.00 MiB, V (f16): 180.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 2
llama_context: max_nodes = 2184
llama_context: reserving full memory module
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
llama_context: Flash Attention was auto, set to enabled
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens = 512, n_seqs = 1, n_outputs = 512
llama_context: Vulkan0 compute buffer size = 197.12 MiB
llama_context: Vulkan_Host compute buffer size = 33.14 MiB
llama_context: graph nodes = 937
llama_context: graph splits = 2
clear_adapter_lora: call
common_init_from_params: added <|end_of_text|> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 16384
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
set_warmup: value = 1
set_warmup: value = 0
clip_model_loader: model name:
clip_model_loader: description:
clip_model_loader: GGUF version: 3
clip_model_loader: alignment: 32
clip_model_loader: n_tensors: 198
clip_model_loader: n_kv: 36
clip_model_loader: has vision encoder
clip_model_loader: tensor[0]: n_dims = 2, name = mm.model.fc.weight, tensor_size=14155776, offset=0, shape:[12288, 576, 1, 1], type = f16
clip_model_loader: tensor[1]: n_dims = 1, name = v.patch_embd.bias, tensor_size=3072, offset=14155776, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[2]: n_dims = 4, name = v.patch_embd.weight, tensor_size=2359296, offset=14158848, shape:[16, 16, 3, 768], type = f32
clip_model_loader: tensor[3]: n_dims = 2, name = v.position_embd.weight, tensor_size=3145728, offset=16518144, shape:[768, 1024, 1, 1], type = f32
clip_model_loader: tensor[4]: n_dims = 1, name = v.blk.0.ln1.bias, tensor_size=3072, offset=19663872, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[5]: n_dims = 1, name = v.blk.0.ln1.weight, tensor_size=3072, offset=19666944, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[6]: n_dims = 1, name = v.blk.0.ln2.bias, tensor_size=3072, offset=19670016, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[7]: n_dims = 1, name = v.blk.0.ln2.weight, tensor_size=3072, offset=19673088, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[8]: n_dims = 1, name = v.blk.0.ffn_up.bias, tensor_size=12288, offset=19676160, shape:[3072, 1, 1, 1], type = f32
clip_model_loader: tensor[9]: n_dims = 2, name = v.blk.0.ffn_up.weight, tensor_size=4718592, offset=19688448, shape:[768, 3072, 1, 1], type = f16
clip_model_loader: tensor[10]: n_dims = 1, name = v.blk.0.ffn_down.bias, tensor_size=3072, offset=24407040, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[11]: n_dims = 2, name = v.blk.0.ffn_down.weight, tensor_size=4718592, offset=24410112, shape:[3072, 768, 1, 1], type = f16
clip_model_loader: tensor[12]: n_dims = 1, name = v.blk.0.attn_k.bias, tensor_size=3072, offset=29128704, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[13]: n_dims = 2, name = v.blk.0.attn_k.weight, tensor_size=1179648, offset=29131776, shape:[768, 768, 1, 1], type = f16
clip_model_loader: tensor[14]: n_dims = 1, name = v.blk.0.attn_out.bias, tensor_size=3072, offset=30311424, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[15]: n_dims = 2, name = v.blk.0.attn_out.weight, tensor_size=1179648, offset=30314496, shape:[768, 768, 1, 1], type = f16
clip_model_loader: tensor[16]: n_dims = 1, name = v.blk.0.attn_q.bias, tensor_size=3072, offset=31494144, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[17]: n_dims = 2, name = v.blk.0.attn_q.weight, tensor_size=1179648, offset=31497216, shape:[768, 768, 1, 1], type = f16
clip_model_loader: tensor[18]: n_dims = 1, name = v.blk.0.attn_v.bias, tensor_size=3072, offset=32676864, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[19]: n_dims = 2, name = v.blk.0.attn_v.weight, tensor_size=1179648, offset=32679936, shape:[768, 768, 1, 1], type = f16
clip_model_loader: tensor[20]: n_dims = 1, name = v.blk.1.ln1.bias, tensor_size=3072, offset=33859584, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[21]: n_dims = 1, name = v.blk.1.ln1.weight, tensor_size=3072, offset=33862656, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[22]: n_dims = 1, name = v.blk.1.ln2.bias, tensor_size=3072, offset=33865728, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[23]: n_dims = 1, name = v.blk.1.ln2.weight, tensor_size=3072, offset=33868800, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[24]: n_dims = 1, name = v.blk.1.ffn_up.bias, tensor_size=12288, offset=33871872, shape:[3072, 1, 1, 1], type = f32
clip_model_loader: tensor[25]: n_dims = 2, name = v.blk.1.ffn_up.weight, tensor_size=4718592, offset=33884160, shape:[768, 3072, 1, 1], type = f16
clip_model_loader: tensor[26]: n_dims = 1, name = v.blk.1.ffn_down.bias, tensor_size=3072, offset=38602752, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[27]: n_dims = 2, name = v.blk.1.ffn_down.weight, tensor_size=4718592, offset=38605824, shape:[3072, 768, 1, 1], type = f16
clip_model_loader: tensor[28]: n_dims = 1, name = v.blk.1.attn_k.bias, tensor_size=3072, offset=43324416, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[29]: n_dims = 2, name = v.blk.1.attn_k.weight, tensor_size=1179648, offset=43327488, shape:[768, 768, 1, 1], type = f16
clip_model_loader: tensor[30]: n_dims = 1, name = v.blk.1.attn_out.bias, tensor_size=3072, offset=44507136, shape:[768, 1, 1, 1], type = f32
clip_model_loader: tensor[31]: n_dims = 2, name = v.blk.1.attn_out.weight, tensor_size=1179648, offset=44510208, shape:[768, 768, 1, 1], type = f16
clip_model_loader: tensor[32]: n_dims = 1, name = v.blk.1.attn_q.bias, tensor_size=3072,