
@tdakhran
Contributor

@tdakhran tdakhran commented Jan 6, 2026

Liquid AI released LFM2.5-Audio-1.5B.

LFM2.5-Audio-1.5B is Liquid AI's updated end-to-end audio foundation model. Key improvements include a custom, LFM-based audio detokenizer, llama.cpp-compatible GGUFs for CPU inference, and better ASR and TTS performance.

This PR is intended to provide a functional implementation in llama.cpp until the necessary infrastructure is in place.
The plan is to split it up and merge it upstream in smaller chunks, while keeping and tracking the functional implementation here. It will be rebased from time to time.

GGUFs, precompiled runners, and instructions live at https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-GGUF.

Merge plan:

Demo of capabilities (watch with audio on)

demo.mp4

Thank you, @ngxson for the help!

@github-actions github-actions bot added the model, examples, python, and server labels on Jan 6, 2026
@tdakhran tdakhran force-pushed the tarek/feat/os-lfm2.5-audio-1.5b-upstream branch from c275436 to e1a8fd1 on January 6, 2026 14:46
@tdakhran
Contributor Author

tdakhran commented Jan 6, 2026

@ngxson @CISC is there a way to disable CI for this PR? There is no need to build it for each commit.

@CISC
Collaborator

CISC commented Jan 6, 2026

> @ngxson @CISC is there a way to disable CI for this PR? There is no need to build it for each commit.

The only way I know is to have a merge conflict.

@ggerganov
Member

If the string [no ci] is present anywhere in the commit message, it won't execute the CI

@CISC
Collaborator

CISC commented Jan 6, 2026

> If the string [no ci] is present anywhere in the commit message, it won't execute the CI

Or that. We just have to remember to remove them all from the merge message. :)

ngxson pushed a commit that referenced this pull request Jan 6, 2026
Change is decoupled from #18641.

[LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B)
needs streaming istft for generating output audio.

* add streaming ISTFT class (`mtmd_audio_streaming_istft`) with overlap-add for audio reconstruction
* replace the global audio cache with per-instance caches; the model requires
  two independent caches, one for preprocessing (audio input) and one for ISTFT
  (audio output)
* unified templated FFT/IFFT implementation supporting both forward and inverse transforms
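
The two implementation-level ideas in that commit are fairly standard DSP, so here is a compact sketch of what they amount to. All names below (`streaming_istft_sketch`, `push_frame`, `fft_impl`) are invented for illustration and are not the PR's `mtmd_audio_streaming_istft` code; synthesis-window normalization (COLA) and the 1/N scaling of the inverse transform are left out for brevity.

```cpp
// Illustrative overlap-add state for a streaming ISTFT, plus the usual trick
// of templating a radix-2 FFT on its direction. NOT the PR's actual code.
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

struct streaming_istft_sketch {
    size_t n_fft;                // frame length
    size_t hop;                  // hop size between consecutive frames
    std::vector<float> window;   // synthesis window (Hann)
    std::vector<float> overlap;  // per-instance carry buffer kept between calls

    streaming_istft_sketch(size_t n_fft, size_t hop)
        : n_fft(n_fft), hop(hop), window(n_fft), overlap(n_fft, 0.0f) {
        for (size_t i = 0; i < n_fft; ++i) {
            window[i] = 0.5f - 0.5f * std::cos(2.0f * 3.14159265f * i / n_fft);
        }
    }

    // Takes one time-domain frame (the inverse FFT of a single STFT column)
    // and returns the `hop` samples that are now fully reconstructed.
    std::vector<float> push_frame(const std::vector<float> & frame) {
        for (size_t i = 0; i < n_fft; ++i) {
            overlap[i] += frame[i] * window[i];   // overlap-add into the carry buffer
        }
        std::vector<float> out(overlap.begin(), overlap.begin() + hop);
        // shift the unfinished tail left by `hop` and clear the freed region
        std::copy(overlap.begin() + hop, overlap.end(), overlap.begin());
        std::fill(overlap.end() - hop, overlap.end(), 0.0f);
        return out;
    }
};

// Forward and inverse FFT differ only in the twiddle sign (and a final 1/N
// scaling applied by the caller for the inverse), so a single templated
// implementation can serve both directions.
template <bool inverse>
static void fft_impl(std::vector<std::complex<float>> & x) {
    const size_t n = x.size();   // assumed to be a power of two
    if (n <= 1) return;
    std::vector<std::complex<float>> even(n / 2), odd(n / 2);
    for (size_t i = 0; i < n / 2; ++i) { even[i] = x[2 * i]; odd[i] = x[2 * i + 1]; }
    fft_impl<inverse>(even);
    fft_impl<inverse>(odd);
    const float sign = inverse ? 1.0f : -1.0f;
    for (size_t k = 0; k < n / 2; ++k) {
        const std::complex<float> w = std::polar(1.0f, sign * 2.0f * 3.14159265f * k / n) * odd[k];
        x[k]         = even[k] + w;
        x[k + n / 2] = even[k] - w;
    }
}
```

Keeping the overlap buffer inside the instance (rather than in a global cache) is what lets one cache serve audio preprocessing on the input side and a second, independent one serve ISTFT on the output side, as the commit message notes.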
@elfarolab

@tdakhran

Hello Tarek,

I am trying to build your WIP PR.
I know it is a draft and should be considered work in progress.

With the last commit ('Read n_layer from gguf') and LTO enabled, the build fails at the very end, here:

FAILED: bin/llama-liquid-audio-cli
: && /usr/bin/c++ -O3 -DNDEBUG  tools/liquid-audio/CMakeFiles/llama-liquid-audio-cli.dir/cli.cpp.o -o bin/llama-liquid-audio-cli  tools/liquid-audio/libliquid-audio.a  common/libcommon.a  /usr/lib/aarch64-linux-gnu/libcurl.so  tools/mtmd/libmtmd.a  src/libllama.a  ggml/src/libggml.a  ggml/src/libggml-cpu.a  /usr/lib/gcc/aarch64-linux-gnu/11/libgomp.a  /usr/lib/aarch64-linux-gnu/libpthread.a  ggml/src/ggml-blas/libggml-blas.a  /usr/lib/aarch64-linux-gnu/libopenblas.so.0  ggml/src/ggml-cuda/libggml-cuda.a  ggml/src/libggml-base.a  -lm  /usr/local/cuda-12.6/targets/aarch64-linux/lib/libcudart_static.a  /usr/local/cuda-12.6/targets/aarch64-linux/lib/libcublas_static.a  /usr/local/cuda-12.6/targets/aarch64-linux/lib/libcublasLt_static.a  /usr/local/cuda-12.6/targets/aarch64-linux/lib/libculibos.a  /usr/local/cuda-12.6/targets/aarch64-linux/lib/stubs/libcuda.so  -ldl  /usr/lib/aarch64-linux-gnu/librt.a && :
/usr/bin/ld: tools/mtmd/libmtmd.a(mtmd-helper.cpp.o):(.bss+0x28): multiple definition of `ma_atomic_global_lock'; tools/liquid-audio/CMakeFiles/llama-liquid-audio-cli.dir/cli.cpp.o:(.bss+0x0): first defined here
lto-wrapper: warning: using serial compilation of 17 LTRANS jobs
collect2: error: ld returned 1 exit status
[474/474] : && /usr/bin/c++ -O3 -DNDEBUG  tools/liquid-audio/CMakeFiles/llama-liquid-audio-server.dir/server.cpp.o -o bin/llama-liquid-audio-server  tools/liquid-audio/libliquid-audio.a  vendor/cpp-httplib/libcpp-httplib.a  common/libcommon.a  /usr/lib/aarch64-linux-gnu/libcurl.so  tools/mtmd/libmtmd.a  src/libllama.a  ggml/src/libggml.a  ggml/src/libggml-cpu.a  /usr/lib/gcc/aarch64-linux-gnu/11/libgomp.a  /usr/lib/aarch64-linux-gnu/libpthread.a  ggml/src/ggml-blas/libggml-blas.a  /usr/lib/aarch64-linux-gnu/libopenblas.so.0  ggml/src/ggml-cuda/libggml-cuda.a  ggml/src/libggml-base.a  -lm  /usr/local/cuda-12.6/targets/aarch64-linux/lib/libcudart_static.a  /usr/local/cuda-12.6/targets/aarch64-linux/lib/libcublas_static.a  /usr/local/cuda-12.6/targets/aarch64-linux/lib/libcublasLt_static.a  /usr/local/cuda-12.6/targets/aarch64-linux/lib/libculibos.a  /usr/local/cuda-12.6/targets/aarch64-linux/lib/stubs/libcuda.so  -ldl  /usr/lib/aarch64-linux-gnu/librt.a  /usr/lib/aarch64-linux-gnu/libssl.so  /usr/lib/aarch64-linux-gnu/libcrypto.so && :
lto-wrapper: warning: using serial compilation of 17 LTRANS jobs
ninja: build stopped: subcommand failed.

llama-server and llama-liquid-audio-server are successfully built; the cli fails.

If there is anything I can do to help with testing, let me know.
I am also building a system with this model on a Jetson Orin.

Thank you so much.

@tdakhran
Contributor Author

tdakhran commented Jan 7, 2026

@elfarolab, the mentioned commit didn't change anything related to compilation or LTO; could it be that there are stale object files somewhere?

I tested that a clean build in the ubuntu:24.04 Docker image works:

root@1641914992f4:/tmp/build# cmake /mnt -DLLAMA_CURL=OFF
root@1641914992f4:/tmp/build# make -j20 llama-liquid-audio-cli llama-liquid-audio-server
...
[ 98%] Built target liquid-audio
[100%] Built target llama-liquid-audio-cli
[100%] Built target llama-liquid-audio-server

UPD: it's related to miniaudio.

The cli defines the miniaudio implementation here https://github.com/ggml-org/llama.cpp/pull/18641/changes#diff-73f13371b37801825dc2cdbfacadf9af40aef9dca4770d9dacbbe3534c7a7dacR13 , and another implementation is defined in the mtmd audio code.

Try commenting out that line.
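
For context on the failure mode: miniaudio is a single-header library, so exactly one translation unit should define `MINIAUDIO_IMPLEMENTATION` before including the header. With two translation units defining it (here, the CLI and the mtmd audio code), globals such as `ma_atomic_global_lock` are emitted twice and the static/LTO link fails as shown above. A minimal sketch of the intended layout, with illustrative file names:

```cpp
// audio_impl.cpp (illustrative file name) -- the ONE translation unit that
// compiles the miniaudio implementation.
#define MINIAUDIO_IMPLEMENTATION
#include "miniaudio.h"

// cli.cpp, server.cpp, ... -- every other user includes the header only,
// without the implementation macro, so symbols like ma_atomic_global_lock
// are defined exactly once in the final binary.
#include "miniaudio.h"
```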

@elfarolab

Before building, I delete the build destination directory every time.
I am building with these options:

CMAKE_BUILD_TYPE=Release
CMAKE_INSTALL_PREFIX=$LLAMACPP_PREFIX_DIR
GGML_CUDA=ON
GGML_CUDA_FA=ON
GGML_CUDA_GRAPHS=ON
GGML_CUDA_FORCE_CUBLAS=ON
GGML_BLAS=ON
GGML_BLAS_VENDOR=OpenBLAS
BLAS_LIBRARIES="$OPENBLAS_LIB"
GGML_CUDA_USE_MMQ=ON
GGML_CUDA_FA_ALL_QUANTS=ON
GGML_AVX=OFF
GGML_AVX2=OFF
GGML_AVX512=OFF
GGML_SSE42=OFF
GGML_F16C=OFF
GGML_FMA=OFF
GGML_ACCELERATE=OFF
GGML_METAL=OFF
GGML_OPENCL=OFF
GGML_SYCL=OFF
GGML_HEXAGON=OFF
GGML_HIP=OFF
GGML_WEBGPU=OFF
GGML_VULKAN=OFF
GGML_LTO=ON
BUILD_SHARED_LIBS=OFF
GGML_STATIC=ON
CMAKE_CUDA_ARCHITECTURES=87
GGML_CUDA_F16=ON
GGML_CUDA_BF16=ON
BLA_STATIC=ON
LLAMA_BUILD_EXAMPLES=ON
LLAMA_BUILD_TESTS=OFF
LLAMA_OPENSSL=ON
LLAMA_CURL=ON
GGML_CUDA_JETSON_DEVICE=ON
GGML_CUDA_ENABLE_UNIFIED_MEMORY=ON
LLAMA_TOOLS_INSTALL=ON
GGML_BACKEND_DL=OFF
GGML_CPU_ALL_VARIANTS=OFF

I always build llama.cpp the same way with the options above and never get failures.
It is also not the first time I have built a PR.
I could try building without Ninja.

@tdakhran
Contributor Author

tdakhran commented Jan 7, 2026

@elfarolab, it should work now; there were two implementations of miniaudio.

@elfarolab

> @elfarolab, it should work now; there were two implementations of miniaudio.

Rebuilding.

@elfarolab

elfarolab commented Jan 7, 2026

Built successfully.
cd to the build dir:

export CKPT=/opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF
export INPUT_WAV=/opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/speech_orig_24000Hz.wav
./llama-liquid-audio-cli -m $CKPT/LFM2.5-Audio-1.5B-F16.gguf -mm $CKPT/mmproj-LFM2.5-Audio-1.5B-F16.gguf -mv $CKPT/vocoder-LFM2.5-Audio-1.5B-F16.gguf --tts-speaker-file $CKPT/tokenizer-LFM2.5-Audio-1.5B-F16.gguf -sys "Perform ASR." --audio $INPUT_WAV
...

init_audio: audio input is in experimental stage and may have reduced quality:
    https://github.com/ggml-org/llama.cpp/discussions/13759
audio_decoder_ggml_ctx: using CUDA0 backend
audio_decoder_ggml_ctx: using GPU+CPU backend
load_gguf: Loaded 85 tensors from /opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/vocoder-LFM2.5-Audio-1.5B-F16.gguf
common_chat_params_init_lfm2: Using content relying on the template
add_text: <|im_start|>system
Perform ASR.<|im_end|>
<|im_start|>user

audio_tokens->n_tokens = 136
add_text: <|im_end|>
<|im_start|>assistant

encoding audio slice...
CUDA error: out of memory
  current device: 0, in function alloc at /opt/usbhd/SRC/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:475
  cuMemAddressReserve(&pool_addr, CUDA_POOL_VMM_MAX_SIZE, 0, 0, 0)
/opt/usbhd/SRC/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:96: CUDA error
[New LWP 278228]
[New LWP 278229]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000fffface49940 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x0000fffface49940 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x0000aaaac82e034c in ggml_abort ()
#2  0x0000aaaac8531740 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) ()
#3  0x0000aaaac854774c in ggml_cuda_pool_vmm::alloc(unsigned long, unsigned long*) ()
#4  0x0000aaaac8541228 in ggml_cuda_op_mul_mat_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*) ()
#5  0x0000aaaac853ce04 in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*), void (*)(float const*, int const*, void*, ggml_type, long, long, long, long, long, long, long, long, CUstream_st*)) ()
#6  0x0000aaaac85405a8 in ggml_cuda_compute_forward(ggml_backend_cuda_context&, ggml_tensor*) ()
#7  0x0000aaaac85441b8 in ggml_cuda_graph_evaluate_and_capture(ggml_backend_cuda_context*, ggml_cgraph*, bool, bool) ()
#8  0x0000aaaac8545bf4 in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) ()
#9  0x0000aaaac82f2fd8 in ggml_backend_sched_compute_splits(ggml_backend_sched*) [clone .lto_priv.0] ()
#10 0x0000aaaac82fa310 in ggml_backend_sched_graph_compute ()
#11 0x0000aaaac83aab10 in clip_image_batch_encode(clip_ctx*, int, clip_image_f32_batch const*, float*) ()
#12 0x0000aaaac833d020 in mtmd_encode_chunk ()
#13 0x0000aaaac83a3d90 in mtmd_helper_eval_chunk_single ()
#14 0x0000aaaac80d5714 in liquid::audio::Runner::RunnerImpl::eval_messages(std::vector<common_chat_msg, std::allocator<common_chat_msg> > const&, bool) ()
#15 0x0000aaaac80d5d1c in liquid::audio::Runner::RunnerImpl::generate(std::vector<liquid::audio::Runner::Message, std::allocator<liquid::audio::Runner::Message> > const&, int, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&, std::function<void (std::vector<float, std::allocator<float> > const&)> const&) ()
#16 0x0000aaaac7d84194 in main ()
[Inferior 1 (process 278215) detached]
Aborted

That "CUDA error: out of memory" is weird, I am not running anything else. I'll reboot just in case.

@elfarolab

With a freshly rebooted system and nothing GPU- or CPU-intensive running:

free -h

               total        used        free      shared  buff/cache   available
Mem:            29Gi       693Mi       248Mi        25Mi        29Gi        28Gi
Swap:           14Gi       0.0Ki        14Gi
export CKPT=/opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF
export INPUT_WAV=/opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/speech_orig_24000Hz.wav
./llama-liquid-audio-cli -m $CKPT/LFM2.5-Audio-1.5B-F16.gguf -mm $CKPT/mmproj-LFM2.5-Audio-1.5B-F16.gguf -mv $CKPT/vocoder-LFM2.5-Audio-1.5B-F16.gguf --tts-speaker-file $CKPT/tokenizer-LFM2.5-Audio-1.5B-F16.gguf -sys "Perform ASR." --audio $INPUT_WAV

loading tensors..

then:

free -h

               total        used        free      shared  buff/cache   available
Mem:            29Gi       4.3Gi       723Mi        25Mi        24Gi        25Gi
Swap:           14Gi       0.0Ki        14Gi

same error:

warmup: *****************************************************************
init_audio: audio input is in experimental stage and may have reduced quality:
    https://github.com/ggml-org/llama.cpp/discussions/13759
audio_decoder_ggml_ctx: using CUDA0 backend
audio_decoder_ggml_ctx: using GPU+CPU backend
load_gguf: Loaded 85 tensors from /opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/vocoder-LFM2.5-Audio-1.5B-F16.gguf
common_chat_params_init_lfm2: Using content relying on the template
add_text: <|im_start|>system
Perform ASR.<|im_end|>
<|im_start|>user

audio_tokens->n_tokens = 136
add_text: <|im_end|>
<|im_start|>assistant

encoding audio slice...
/opt/usbhd/SRC/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:96: CUDA error
CUDA error: out of memory
  current device: 0, in function alloc at /opt/usbhd/SRC/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:475
  cuMemAddressReserve(&pool_addr, CUDA_POOL_VMM_MAX_SIZE, 0, 0, 0)
[New LWP 2143]
[New LWP 2144]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000ffffa90a9940 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x0000ffffa90a9940 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x0000aaaad783034c in ggml_abort ()
#2  0x0000aaaad7a81740 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) ()
#3  0x0000aaaad7a9774c in ggml_cuda_pool_vmm::alloc(unsigned long, unsigned long*) ()
#4  0x0000aaaad7a91228 in ggml_cuda_op_mul_mat_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*) ()
#5  0x0000aaaad7a8ce04 in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*), void (*)(float const*, int const*, void*, ggml_type, long, long, long, long, long, long, long, long, CUstream_st*)) ()
#6  0x0000aaaad7a905a8 in ggml_cuda_compute_forward(ggml_backend_cuda_context&, ggml_tensor*) ()
#7  0x0000aaaad7a941b8 in ggml_cuda_graph_evaluate_and_capture(ggml_backend_cuda_context*, ggml_cgraph*, bool, bool) ()
#8  0x0000aaaad7a95bf4 in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) ()
#9  0x0000aaaad7842fd8 in ggml_backend_sched_compute_splits(ggml_backend_sched*) [clone .lto_priv.0] ()
#10 0x0000aaaad784a310 in ggml_backend_sched_graph_compute ()
#11 0x0000aaaad78fab10 in clip_image_batch_encode(clip_ctx*, int, clip_image_f32_batch const*, float*) ()
#12 0x0000aaaad788d020 in mtmd_encode_chunk ()
#13 0x0000aaaad78f3d90 in mtmd_helper_eval_chunk_single ()
#14 0x0000aaaad7625714 in liquid::audio::Runner::RunnerImpl::eval_messages(std::vector<common_chat_msg, std::allocator<common_chat_msg> > const&, bool) ()
#15 0x0000aaaad7625d1c in liquid::audio::Runner::RunnerImpl::generate(std::vector<liquid::audio::Runner::Message, std::allocator<liquid::audio::Runner::Message> > const&, int, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&, std::function<void (std::vector<float, std::allocator<float> > const&)> const&) ()
#16 0x0000aaaad72d4194 in main ()
[Inferior 1 (process 2130) detached]
Aborted

@elfarolab

elfarolab commented Jan 7, 2026

The audio file is < 30 sec in duration:

ls -lh $INPUT_WAV
Permissions  Size User      Date Modified Name
.rw-r--r--  506Ki llamaexec  7 Jan 10:44  /opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/speech_orig_24000Hz.wav
mediainfo $INPUT_WAV
General
Complete name                            : /opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/speech_orig_24000Hz.wav
Format                                   : Wave
File size                                : 506 KiB
Duration                                 : 10 s 800 ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 384 kb/s
Writing application                      : Lavf62.3.100

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 10 s 800 ms
Bit rate mode                            : Constant
Bit rate                                 : 384 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 24.0 kHz
Bit depth                                : 16 bits
Stream size                              : 506 KiB (100%)

I know 24 kHz looks strange, but that is because it is later used with libopus.

@tdakhran
Contributor Author

tdakhran commented Jan 7, 2026

@elfarolab, can you try with a smaller audio file (e.g., 4 seconds)? Add the --no-mmap flag to the server and use GGUFs with Q8_0 quantization (or even Q4_0).

@elfarolab

elfarolab commented Jan 7, 2026

TEST Q4_0

With a freshly rebooted system and nothing GPU- or CPU-intensive running:
free -h

               total        used        free      shared  buff/cache   available
Mem:            29Gi       644Mi        29Gi        17Mi       241Mi        28Gi
Swap:           14Gi          0B        14Gi

tegrastats

01-07-2026 11:45:22 RAM 694/30697MB (lfb 2x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201] GR3D_FREQ 0% cpu@46.593C soc2@42.937C soc0@43.312C gpu@42.093C tj@46.593C soc1@43.093C VDD_GPU_SOC 4798mW/4800mW VDD_CPU_CV 1600mW/1520mW VIN_SYS_5V0 3326mW/3326mW
01-07-2026 11:45:23 RAM 694/30697MB (lfb 2x4MB) SWAP 0/15348MB (cached 0MB) CPU [1%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201] GR3D_FREQ 0% cpu@46.281C soc2@42.906C soc0@43.406C gpu@41.781C tj@46.281C soc1@43.093C VDD_GPU_SOC 4800mW/4800mW VDD_CPU_CV 1600mW/1533mW VIN_SYS_5V0 3326mW/3326mW
01-07-2026 11:45:24 RAM 694/30697MB (lfb 2x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201] GR3D_FREQ 0% cpu@46.343C soc2@42.937C soc0@43.406C gpu@41.875C tj@46.343C soc1@43.125C VDD_GPU_SOC 4800mW/4800mW VDD_CPU_CV 1600mW/1543mW VIN_SYS_5V0 3326mW/3326mW
01-07-2026 11:45:25 RAM 694/30697MB (lfb 2x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201] GR3D_FREQ 0% cpu@46.25C soc2@42.906C soc0@43.218C gpu@41.843C tj@46.25C soc1@43.156C VDD_GPU_SOC 4800mW/4800mW VDD_CPU_CV 1600mW/1550mW VIN_SYS_5V0 3326mW/3326mW
01-07-2026 11:45:26 RAM 694/30697MB (lfb 2x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201] GR3D_FREQ 0% cpu@46.437C soc2@43.031C soc0@43.218C gpu@41.937C tj@46.437C soc1@43.218C VDD_GPU_SOC 4800mW/4800mW VDD_CPU_CV 1600mW/1556mW VIN_SYS_5V0 3326mW/3326mW
01-07-2026 11:45:27 RAM 694/30697MB (lfb 2x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201] GR3D_FREQ 0% cpu@46.562C soc2@42.812C soc0@43.25C gpu@41.875C tj@46.562C soc1@43.031C VDD_GPU_SOC 4800mW/4800mW VDD_CPU_CV 1200mW/1520mW VIN_SYS_5V0 3326mW/3326mW
01-07-2026 11:45:28 RAM 694/30697MB (lfb 2x4MB) SWAP 0/15348MB (cached 0MB) CPU [0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201,0%@2201] GR3D_FREQ 0% cpu@46.343C soc2@42.937C soc0@43.25C gpu@41.843C tj@46.343C soc1@43.093C VDD_GPU_SOC 4800mW/4800mW VDD_CPU_CV 1200mW/1491mW VIN_SYS_5V0 3326mW/3326mW
^C
mediainfo speech_orig_24000Hz_4secs.wav
General
Complete name                            : speech_orig_24000Hz_4secs.wav
Format                                   : Wave
File size                                : 117 KiB
Duration                                 : 2 s 500 ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 385 kb/s
Writing application                      : Lavf62.3.100 (libsndfile-1.0.31)
Software                                 : Lavf62.3.100

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 2 s 500 ms
Bit rate mode                            : Constant
Bit rate                                 : 384 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 24.0 kHz
Bit depth                                : 16 bits
Stream size                              : 117 KiB (100%)
export INPUT_WAV=/opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/speech_orig_24000Hz_4secs.wav
export CKPT=/opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF
./llama-liquid-audio-cli --no-mmap -m $CKPT/LFM2.5-Audio-1.5B-Q4_0.gguf -mm $CKPT/mmproj-LFM2.5-Audio-1.5B-Q4_0.gguf -mv $CKPT/vocoder-LFM2.5-Audio-1.5B-Q4_0.gguf --tts-speaker-file $CKPT/tokenizer-LFM2.5-Audio-1.5B-Q4_0.gguf -sys "Perform ASR." --audio $INPUT_WAV

Same error.

It looks like the model tries to expand into all the available memory and beyond.

Full Log
  Device 0: Orin, compute capability 8.7, VMM: yes
build: 123 (4a2f68a) with GNU 11.4.0 for Linux aarch64
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected to use 845 MiB of device memory vs. 29463 MiB of free device memory
llama_params_fit_impl: will leave 28617 >= 1024 MiB of free device memory, no changes needed
llama_params_fit: successfully fit params to free device memory
llama_params_fit: fitting params to free memory took 0.87 seconds
llama_model_load_from_file_impl: using device CUDA0 (Orin) (0000:00:00.0) - 29465 MiB free
llama_model_loader: loaded meta data with 38 key-value pairs and 148 tensors from /opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/LFM2.5-Audio-1.5B-Q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = lfm2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = LFM2.5 Audio 1.5B
llama_model_loader: - kv   3:                           general.basename str              = LFM2.5-Audio
llama_model_loader: - kv   4:                         general.size_label str              = 1.5B
llama_model_loader: - kv   5:                            general.license str              = other
llama_model_loader: - kv   6:                       general.license.name str              = lfm1.0
llama_model_loader: - kv   7:                       general.license.link str              = LICENSE
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = LFM2 1.2B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = LiquidAI
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/LiquidAI/LFM2-...
llama_model_loader: - kv  12:                               general.tags arr[str,7]       = ["liquid", "lfm2", "audio", "lfm2-aud...
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                           lfm2.block_count u32              = 16
llama_model_loader: - kv  15:                        lfm2.context_length u32              = 128000
llama_model_loader: - kv  16:                      lfm2.embedding_length u32              = 2048
llama_model_loader: - kv  17:                   lfm2.feed_forward_length u32              = 8192
llama_model_loader: - kv  18:                  lfm2.attention.head_count u32              = 32
llama_model_loader: - kv  19:               lfm2.attention.head_count_kv arr[i32,16]      = [0, 0, 8, 0, 0, 8, 0, 0, 8, 0, 8, 0, ...
llama_model_loader: - kv  20:                        lfm2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:      lfm2.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  22:                            lfm2.vocab_size u32              = 65536
llama_model_loader: - kv  23:                     lfm2.shortconv.l_cache u32              = 3
llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = lfm2
llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,65536]   = ["<|pad|>", "<|startoftext|>", "<|end...
llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,65536]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  28:                      tokenizer.ggml.merges arr[str,63683]   = ["Ċ Ċ", "Ċ ĊĊ", "ĊĊ Ċ", "Ċ �...
llama_model_loader: - kv  29:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  30:                tokenizer.ggml.eos_token_id u32              = 7
llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  33:               tokenizer.ggml.add_sep_token bool             = false
llama_model_loader: - kv  34:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  35:                    tokenizer.chat_template str              = {{- bos_token -}}{%- set system_promp...
llama_model_loader: - kv  36:               general.quantization_version u32              = 2
llama_model_loader: - kv  37:                          general.file_type u32              = 2
llama_model_loader: - type  f32:   55 tensors
llama_model_loader: - type q4_0:   92 tensors
llama_model_loader: - type q6_K:    1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_0
print_info: file size   = 661.25 MiB (4.74 BPW)
load: 0 unused tokens
load: printing all EOG tokens:
load:   - 2 ('<|endoftext|>')
load:   - 7 ('<|im_end|>')
load: special tokens cache size = 507
load: token to piece cache size = 0.3756 MB
print_info: arch             = lfm2
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 128000
print_info: n_embd           = 2048
print_info: n_embd_inp       = 2048
print_info: n_layer          = 16
print_info: n_head           = 32
print_info: n_head_kv        = [0, 0, 8, 0, 0, 8, 0, 0, 8, 0, 8, 0, 8, 0, 8, 0]
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = [0, 0, 4, 0, 0, 4, 0, 0, 4, 0, 4, 0, 4, 0, 4, 0]
print_info: n_embd_k_gqa     = [0, 0, 512, 0, 0, 512, 0, 0, 512, 0, 512, 0, 512, 0, 512, 0]
print_info: n_embd_v_gqa     = [0, 0, 512, 0, 0, 512, 0, 0, 512, 0, 512, 0, 512, 0, 512, 0]
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 8192
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 128000
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: model type       = 1.2B
print_info: model params     = 1.17 B
print_info: general.name     = LFM2.5 Audio 1.5B
print_info: vocab type       = BPE
print_info: n_vocab          = 65536
print_info: n_merges         = 63683
print_info: BOS token        = 1 '<|startoftext|>'
print_info: EOS token        = 7 '<|im_end|>'
print_info: EOT token        = 2 '<|endoftext|>'
print_info: PAD token        = 0 '<|pad|>'
print_info: LF token         = 708 'Ċ'
print_info: EOG token        = 2 '<|endoftext|>'
print_info: EOG token        = 7 '<|im_end|>'
print_info: max token length = 30
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 15 repeating layers to GPU
load_tensors: offloaded 17/17 layers to GPU
load_tensors:          CPU model buffer size =   105.00 MiB
load_tensors:        CUDA0 model buffer size =   661.25 MiB
.......................................................................
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|im_end|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 4096
llama_context: n_ctx_seq     = 4096
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (4096) < n_ctx_train (128000) -- the full capacity of the model will not be utilized
llama_context:  CUDA_Host  output buffer size =     0.25 MiB
llama_kv_cache:      CUDA0 KV buffer size =    48.00 MiB
llama_kv_cache: size =   48.00 MiB (  4096 cells,   6 layers,  1/1 seqs), K (f16):   24.00 MiB, V (f16):   24.00 MiB
llama_memory_recurrent:      CUDA0 RS buffer size =     0.16 MiB
llama_memory_recurrent: size =    0.16 MiB (     1 cells,  16 layers,  1 seqs), R (f32):    0.16 MiB, S (f32):    0.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context:      CUDA0 compute buffer size =   136.00 MiB
llama_context:  CUDA_Host compute buffer size =    12.01 MiB
llama_context: graph nodes  = 549
llama_context: graph splits = 2
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected to use 180 MiB of device memory vs. 28283 MiB of free device memory
llama_params_fit_impl: will leave 28103 >= 1024 MiB of free device memory, no changes needed
llama_params_fit: successfully fit params to free device memory
llama_params_fit: fitting params to free memory took 0.20 seconds
llama_model_load_from_file_impl: using device CUDA0 (Orin) (0000:00:00.0) - 28283 MiB free
llama_model_loader: loaded meta data with 29 key-value pairs and 77 tensors from /opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/tokenizer-LFM2.5-Audio-1.5B-Q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = lfm2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Audio_Detokenizer
llama_model_loader: - kv   3:                         general.size_label str              = 70M
llama_model_loader: - kv   4:                           lfm2.block_count u32              = 8
llama_model_loader: - kv   5:                        lfm2.context_length u32              = 128000
llama_model_loader: - kv   6:                      lfm2.embedding_length u32              = 512
llama_model_loader: - kv   7:                   lfm2.feed_forward_length u32              = 2304
llama_model_loader: - kv   8:                  lfm2.attention.head_count u32              = 16
llama_model_loader: - kv   9:               lfm2.attention.head_count_kv arr[i32,8]       = [0, 0, 8, 0, 8, 0, 8, 0]
llama_model_loader: - kv  10:                        lfm2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  11:      lfm2.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  12:                            lfm2.vocab_size u32              = 65536
llama_model_loader: - kv  13:                     lfm2.shortconv.l_cache u32              = 3
llama_model_loader: - kv  14:              lfm2.attention.sliding_window u32              = 30
llama_model_loader: - kv  15:                  lfm2.embedding_length_out u32              = 1282
llama_model_loader: - kv  16:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  17:                         tokenizer.ggml.pre str              = lfm2
llama_model_loader: - kv  18:                      tokenizer.ggml.tokens arr[str,65536]   = ["<|pad|>", "<|startoftext|>", "<|end...
llama_model_loader: - kv  19:                  tokenizer.ggml.token_type arr[i32,65536]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  20:                      tokenizer.ggml.merges arr[str,63683]   = ["Ċ Ċ", "Ċ ĊĊ", "ĊĊ Ċ", "Ċ �...
llama_model_loader: - kv  21:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  22:                tokenizer.ggml.eos_token_id u32              = 7
llama_model_loader: - kv  23:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  24:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  25:               tokenizer.ggml.add_sep_token bool             = false
llama_model_loader: - kv  26:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  27:               general.quantization_version u32              = 2
llama_model_loader: - kv  28:                          general.file_type u32              = 2
llama_model_loader: - type  f32:   29 tensors
llama_model_loader: - type q4_0:   47 tensors
llama_model_loader: - type q6_K:    1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_0
print_info: file size   = 45.94 MiB (5.49 BPW)
load: 0 unused tokens
load: printing all EOG tokens:
load:   - 2 ('<|endoftext|>')
load:   - 7 ('<|im_end|>')
load: special tokens cache size = 507
load: token to piece cache size = 0.3756 MB
print_info: arch             = lfm2
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 128000
print_info: n_embd           = 512
print_info: n_embd_inp       = 512
print_info: n_layer          = 8
print_info: n_head           = 16
print_info: n_head_kv        = [0, 0, 8, 0, 8, 0, 8, 0]
print_info: n_rot            = 32
print_info: n_swa            = 30
print_info: is_swa_any       = 1
print_info: n_embd_head_k    = 32
print_info: n_embd_head_v    = 32
print_info: n_gqa            = [0, 0, 2, 0, 2, 0, 2, 0]
print_info: n_embd_k_gqa     = [0, 0, 256, 0, 256, 0, 256, 0]
print_info: n_embd_v_gqa     = [0, 0, 256, 0, 256, 0, 256, 0]
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 2304
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 1
print_info: freq_base_swa    = 10000.0
print_info: freq_scale_swa   = 1
print_info: n_ctx_orig_yarn  = 128000
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: model type       = ?B
print_info: model params     = 70.14 M
print_info: general.name     = Audio_Detokenizer
print_info: vocab type       = BPE
print_info: n_vocab          = 65536
print_info: n_merges         = 63683
print_info: BOS token        = 1 '<|startoftext|>'
print_info: EOS token        = 7 '<|im_end|>'
print_info: EOT token        = 2 '<|endoftext|>'
print_info: PAD token        = 0 '<|pad|>'
print_info: LF token         = 708 'Ċ'
print_info: EOG token        = 2 '<|endoftext|>'
print_info: EOG token        = 7 '<|im_end|>'
print_info: max token length = 30
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 7 repeating layers to GPU
load_tensors: offloaded 9/9 layers to GPU
load_tensors:          CPU model buffer size =    26.25 MiB
load_tensors:        CUDA0 model buffer size =    45.94 MiB
..............................
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|im_end|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 4096
llama_context: n_ctx_seq     = 4096
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = false
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (4096) < n_ctx_train (128000) -- the full capacity of the model will not be utilized
llama_context:  CUDA_Host  output buffer size =     0.25 MiB
llama_kv_cache_iswa: creating non-SWA KV cache, size = 4096 cells
llama_kv_cache: size =    0.00 MiB (  4096 cells,   0 layers,  1/1 seqs), K (f16):    0.00 MiB, V (f16):    0.00 MiB
llama_kv_cache_iswa: creating     SWA KV cache, size = 768 cells
llama_kv_cache:      CUDA0 KV buffer size =     2.25 MiB
llama_kv_cache: size =    2.25 MiB (   768 cells,   3 layers,  1/1 seqs), K (f16):    1.12 MiB, V (f16):    1.12 MiB
llama_memory_recurrent:      CUDA0 RS buffer size =     0.02 MiB
llama_memory_recurrent: size =    0.02 MiB (     1 cells,   8 layers,  1 seqs), R (f32):    0.02 MiB, S (f32):    0.00 MiB
llama_context: layer 2 is assigned to device CUDA0 but the Flash Attention tensor is assigned to device CPU (usually due to missing support)
llama_context: Flash Attention was auto, set to disabled
llama_context:      CUDA0 compute buffer size =   132.50 MiB
llama_context:  CUDA_Host compute buffer size =     2.51 MiB
llama_context: graph nodes  = 295
llama_context: graph splits = 2
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
common_chat_params_init_lfm2: Using content relying on the template
init: chat template example:
<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant

clip_model_loader: model name:   LFM2.5 Audio 1.5B
clip_model_loader: description:
clip_model_loader: GGUF version: 3
clip_model_loader: alignment:    32
clip_model_loader: n_tensors:    650
clip_model_loader: n_kv:         26

clip_model_loader: has audio encoder
clip_model_loader: tensor[0]: n_dims = 1, name = a.blk.0.attn_k.bias, tensor_size=2048, offset=0, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[1]: n_dims = 2, name = a.blk.0.attn_k.weight, tensor_size=147456, offset=2048, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[2]: n_dims = 1, name = a.blk.0.attn_out.bias, tensor_size=2048, offset=149504, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[3]: n_dims = 2, name = a.blk.0.attn_out.weight, tensor_size=147456, offset=151552, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[4]: n_dims = 1, name = a.blk.0.attn_q.bias, tensor_size=2048, offset=299008, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[5]: n_dims = 2, name = a.blk.0.attn_q.weight, tensor_size=147456, offset=301056, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[6]: n_dims = 1, name = a.blk.0.attn_v.bias, tensor_size=2048, offset=448512, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[7]: n_dims = 2, name = a.blk.0.attn_v.weight, tensor_size=147456, offset=450560, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[8]: n_dims = 1, name = a.blk.0.conv_dw.bias, tensor_size=2048, offset=598016, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[9]: n_dims = 2, name = a.blk.0.conv_dw.weight, tensor_size=18432, offset=600064, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[10]: n_dims = 1, name = a.blk.0.conv_norm.bias, tensor_size=2048, offset=618496, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[11]: n_dims = 1, name = a.blk.0.conv_norm.weight, tensor_size=2048, offset=620544, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[12]: n_dims = 1, name = a.blk.0.conv_pw1.bias, tensor_size=4096, offset=622592, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[13]: n_dims = 2, name = a.blk.0.conv_pw1.weight, tensor_size=294912, offset=626688, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[14]: n_dims = 1, name = a.blk.0.conv_pw2.bias, tensor_size=2048, offset=921600, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[15]: n_dims = 2, name = a.blk.0.conv_pw2.weight, tensor_size=147456, offset=923648, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[16]: n_dims = 1, name = a.blk.0.ffn_down.bias, tensor_size=2048, offset=1071104, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[17]: n_dims = 2, name = a.blk.0.ffn_down.weight, tensor_size=589824, offset=1073152, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[18]: n_dims = 1, name = a.blk.0.ffn_down_1.bias, tensor_size=2048, offset=1662976, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[19]: n_dims = 2, name = a.blk.0.ffn_down_1.weight, tensor_size=589824, offset=1665024, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[20]: n_dims = 1, name = a.blk.0.ffn_norm.bias, tensor_size=2048, offset=2254848, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[21]: n_dims = 1, name = a.blk.0.ffn_norm.weight, tensor_size=2048, offset=2256896, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[22]: n_dims = 1, name = a.blk.0.ffn_norm_1.bias, tensor_size=2048, offset=2258944, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[23]: n_dims = 1, name = a.blk.0.ffn_norm_1.weight, tensor_size=2048, offset=2260992, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[24]: n_dims = 1, name = a.blk.0.ffn_up.bias, tensor_size=8192, offset=2263040, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[25]: n_dims = 2, name = a.blk.0.ffn_up.weight, tensor_size=589824, offset=2271232, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[26]: n_dims = 1, name = a.blk.0.ffn_up_1.bias, tensor_size=8192, offset=2861056, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[27]: n_dims = 2, name = a.blk.0.ffn_up_1.weight, tensor_size=589824, offset=2869248, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[28]: n_dims = 2, name = a.blk.0.linear_pos.weight, tensor_size=147456, offset=3459072, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[29]: n_dims = 1, name = a.blk.0.ln1.bias, tensor_size=2048, offset=3606528, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[30]: n_dims = 1, name = a.blk.0.ln1.weight, tensor_size=2048, offset=3608576, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[31]: n_dims = 1, name = a.blk.0.ln2.bias, tensor_size=2048, offset=3610624, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[32]: n_dims = 1, name = a.blk.0.ln2.weight, tensor_size=2048, offset=3612672, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[33]: n_dims = 1, name = a.blk.0.norm_conv.bias, tensor_size=2048, offset=3614720, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[34]: n_dims = 1, name = a.blk.0.norm_conv.weight, tensor_size=2048, offset=3616768, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[35]: n_dims = 2, name = a.blk.0.pos_bias_u, tensor_size=2048, offset=3618816, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[36]: n_dims = 2, name = a.blk.0.pos_bias_v, tensor_size=2048, offset=3620864, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[37]: n_dims = 1, name = a.blk.1.attn_k.bias, tensor_size=2048, offset=3622912, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[38]: n_dims = 2, name = a.blk.1.attn_k.weight, tensor_size=147456, offset=3624960, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[39]: n_dims = 1, name = a.blk.1.attn_out.bias, tensor_size=2048, offset=3772416, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[40]: n_dims = 2, name = a.blk.1.attn_out.weight, tensor_size=147456, offset=3774464, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[41]: n_dims = 1, name = a.blk.1.attn_q.bias, tensor_size=2048, offset=3921920, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[42]: n_dims = 2, name = a.blk.1.attn_q.weight, tensor_size=147456, offset=3923968, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[43]: n_dims = 1, name = a.blk.1.attn_v.bias, tensor_size=2048, offset=4071424, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[44]: n_dims = 2, name = a.blk.1.attn_v.weight, tensor_size=147456, offset=4073472, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[45]: n_dims = 1, name = a.blk.1.conv_dw.bias, tensor_size=2048, offset=4220928, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[46]: n_dims = 2, name = a.blk.1.conv_dw.weight, tensor_size=18432, offset=4222976, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[47]: n_dims = 1, name = a.blk.1.conv_norm.bias, tensor_size=2048, offset=4241408, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[48]: n_dims = 1, name = a.blk.1.conv_norm.weight, tensor_size=2048, offset=4243456, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[49]: n_dims = 1, name = a.blk.1.conv_pw1.bias, tensor_size=4096, offset=4245504, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[50]: n_dims = 2, name = a.blk.1.conv_pw1.weight, tensor_size=294912, offset=4249600, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[51]: n_dims = 1, name = a.blk.1.conv_pw2.bias, tensor_size=2048, offset=4544512, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[52]: n_dims = 2, name = a.blk.1.conv_pw2.weight, tensor_size=147456, offset=4546560, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[53]: n_dims = 1, name = a.blk.1.ffn_down.bias, tensor_size=2048, offset=4694016, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[54]: n_dims = 2, name = a.blk.1.ffn_down.weight, tensor_size=589824, offset=4696064, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[55]: n_dims = 1, name = a.blk.1.ffn_down_1.bias, tensor_size=2048, offset=5285888, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[56]: n_dims = 2, name = a.blk.1.ffn_down_1.weight, tensor_size=589824, offset=5287936, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[57]: n_dims = 1, name = a.blk.1.ffn_norm.bias, tensor_size=2048, offset=5877760, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[58]: n_dims = 1, name = a.blk.1.ffn_norm.weight, tensor_size=2048, offset=5879808, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[59]: n_dims = 1, name = a.blk.1.ffn_norm_1.bias, tensor_size=2048, offset=5881856, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[60]: n_dims = 1, name = a.blk.1.ffn_norm_1.weight, tensor_size=2048, offset=5883904, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[61]: n_dims = 1, name = a.blk.1.ffn_up.bias, tensor_size=8192, offset=5885952, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[62]: n_dims = 2, name = a.blk.1.ffn_up.weight, tensor_size=589824, offset=5894144, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[63]: n_dims = 1, name = a.blk.1.ffn_up_1.bias, tensor_size=8192, offset=6483968, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[64]: n_dims = 2, name = a.blk.1.ffn_up_1.weight, tensor_size=589824, offset=6492160, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[65]: n_dims = 2, name = a.blk.1.linear_pos.weight, tensor_size=147456, offset=7081984, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[66]: n_dims = 1, name = a.blk.1.ln1.bias, tensor_size=2048, offset=7229440, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[67]: n_dims = 1, name = a.blk.1.ln1.weight, tensor_size=2048, offset=7231488, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[68]: n_dims = 1, name = a.blk.1.ln2.bias, tensor_size=2048, offset=7233536, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[69]: n_dims = 1, name = a.blk.1.ln2.weight, tensor_size=2048, offset=7235584, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[70]: n_dims = 1, name = a.blk.1.norm_conv.bias, tensor_size=2048, offset=7237632, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[71]: n_dims = 1, name = a.blk.1.norm_conv.weight, tensor_size=2048, offset=7239680, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[72]: n_dims = 2, name = a.blk.1.pos_bias_u, tensor_size=2048, offset=7241728, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[73]: n_dims = 2, name = a.blk.1.pos_bias_v, tensor_size=2048, offset=7243776, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[74]: n_dims = 1, name = a.blk.10.attn_k.bias, tensor_size=2048, offset=7245824, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[75]: n_dims = 2, name = a.blk.10.attn_k.weight, tensor_size=147456, offset=7247872, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[76]: n_dims = 1, name = a.blk.10.attn_out.bias, tensor_size=2048, offset=7395328, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[77]: n_dims = 2, name = a.blk.10.attn_out.weight, tensor_size=147456, offset=7397376, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[78]: n_dims = 1, name = a.blk.10.attn_q.bias, tensor_size=2048, offset=7544832, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[79]: n_dims = 2, name = a.blk.10.attn_q.weight, tensor_size=147456, offset=7546880, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[80]: n_dims = 1, name = a.blk.10.attn_v.bias, tensor_size=2048, offset=7694336, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[81]: n_dims = 2, name = a.blk.10.attn_v.weight, tensor_size=147456, offset=7696384, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[82]: n_dims = 1, name = a.blk.10.conv_dw.bias, tensor_size=2048, offset=7843840, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[83]: n_dims = 2, name = a.blk.10.conv_dw.weight, tensor_size=18432, offset=7845888, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[84]: n_dims = 1, name = a.blk.10.conv_norm.bias, tensor_size=2048, offset=7864320, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[85]: n_dims = 1, name = a.blk.10.conv_norm.weight, tensor_size=2048, offset=7866368, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[86]: n_dims = 1, name = a.blk.10.conv_pw1.bias, tensor_size=4096, offset=7868416, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[87]: n_dims = 2, name = a.blk.10.conv_pw1.weight, tensor_size=294912, offset=7872512, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[88]: n_dims = 1, name = a.blk.10.conv_pw2.bias, tensor_size=2048, offset=8167424, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[89]: n_dims = 2, name = a.blk.10.conv_pw2.weight, tensor_size=147456, offset=8169472, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[90]: n_dims = 1, name = a.blk.10.ffn_down.bias, tensor_size=2048, offset=8316928, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[91]: n_dims = 2, name = a.blk.10.ffn_down.weight, tensor_size=589824, offset=8318976, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[92]: n_dims = 1, name = a.blk.10.ffn_down_1.bias, tensor_size=2048, offset=8908800, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[93]: n_dims = 2, name = a.blk.10.ffn_down_1.weight, tensor_size=589824, offset=8910848, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[94]: n_dims = 1, name = a.blk.10.ffn_norm.bias, tensor_size=2048, offset=9500672, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[95]: n_dims = 1, name = a.blk.10.ffn_norm.weight, tensor_size=2048, offset=9502720, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[96]: n_dims = 1, name = a.blk.10.ffn_norm_1.bias, tensor_size=2048, offset=9504768, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[97]: n_dims = 1, name = a.blk.10.ffn_norm_1.weight, tensor_size=2048, offset=9506816, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[98]: n_dims = 1, name = a.blk.10.ffn_up.bias, tensor_size=8192, offset=9508864, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[99]: n_dims = 2, name = a.blk.10.ffn_up.weight, tensor_size=589824, offset=9517056, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[100]: n_dims = 1, name = a.blk.10.ffn_up_1.bias, tensor_size=8192, offset=10106880, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[101]: n_dims = 2, name = a.blk.10.ffn_up_1.weight, tensor_size=589824, offset=10115072, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[102]: n_dims = 2, name = a.blk.10.linear_pos.weight, tensor_size=147456, offset=10704896, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[103]: n_dims = 1, name = a.blk.10.ln1.bias, tensor_size=2048, offset=10852352, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[104]: n_dims = 1, name = a.blk.10.ln1.weight, tensor_size=2048, offset=10854400, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[105]: n_dims = 1, name = a.blk.10.ln2.bias, tensor_size=2048, offset=10856448, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[106]: n_dims = 1, name = a.blk.10.ln2.weight, tensor_size=2048, offset=10858496, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[107]: n_dims = 1, name = a.blk.10.norm_conv.bias, tensor_size=2048, offset=10860544, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[108]: n_dims = 1, name = a.blk.10.norm_conv.weight, tensor_size=2048, offset=10862592, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[109]: n_dims = 2, name = a.blk.10.pos_bias_u, tensor_size=2048, offset=10864640, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[110]: n_dims = 2, name = a.blk.10.pos_bias_v, tensor_size=2048, offset=10866688, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[111]: n_dims = 1, name = a.blk.11.attn_k.bias, tensor_size=2048, offset=10868736, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[112]: n_dims = 2, name = a.blk.11.attn_k.weight, tensor_size=147456, offset=10870784, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[113]: n_dims = 1, name = a.blk.11.attn_out.bias, tensor_size=2048, offset=11018240, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[114]: n_dims = 2, name = a.blk.11.attn_out.weight, tensor_size=147456, offset=11020288, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[115]: n_dims = 1, name = a.blk.11.attn_q.bias, tensor_size=2048, offset=11167744, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[116]: n_dims = 2, name = a.blk.11.attn_q.weight, tensor_size=147456, offset=11169792, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[117]: n_dims = 1, name = a.blk.11.attn_v.bias, tensor_size=2048, offset=11317248, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[118]: n_dims = 2, name = a.blk.11.attn_v.weight, tensor_size=147456, offset=11319296, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[119]: n_dims = 1, name = a.blk.11.conv_dw.bias, tensor_size=2048, offset=11466752, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[120]: n_dims = 2, name = a.blk.11.conv_dw.weight, tensor_size=18432, offset=11468800, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[121]: n_dims = 1, name = a.blk.11.conv_norm.bias, tensor_size=2048, offset=11487232, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[122]: n_dims = 1, name = a.blk.11.conv_norm.weight, tensor_size=2048, offset=11489280, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[123]: n_dims = 1, name = a.blk.11.conv_pw1.bias, tensor_size=4096, offset=11491328, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[124]: n_dims = 2, name = a.blk.11.conv_pw1.weight, tensor_size=294912, offset=11495424, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[125]: n_dims = 1, name = a.blk.11.conv_pw2.bias, tensor_size=2048, offset=11790336, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[126]: n_dims = 2, name = a.blk.11.conv_pw2.weight, tensor_size=147456, offset=11792384, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[127]: n_dims = 1, name = a.blk.11.ffn_down.bias, tensor_size=2048, offset=11939840, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[128]: n_dims = 2, name = a.blk.11.ffn_down.weight, tensor_size=589824, offset=11941888, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[129]: n_dims = 1, name = a.blk.11.ffn_down_1.bias, tensor_size=2048, offset=12531712, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[130]: n_dims = 2, name = a.blk.11.ffn_down_1.weight, tensor_size=589824, offset=12533760, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[131]: n_dims = 1, name = a.blk.11.ffn_norm.bias, tensor_size=2048, offset=13123584, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[132]: n_dims = 1, name = a.blk.11.ffn_norm.weight, tensor_size=2048, offset=13125632, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[133]: n_dims = 1, name = a.blk.11.ffn_norm_1.bias, tensor_size=2048, offset=13127680, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[134]: n_dims = 1, name = a.blk.11.ffn_norm_1.weight, tensor_size=2048, offset=13129728, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[135]: n_dims = 1, name = a.blk.11.ffn_up.bias, tensor_size=8192, offset=13131776, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[136]: n_dims = 2, name = a.blk.11.ffn_up.weight, tensor_size=589824, offset=13139968, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[137]: n_dims = 1, name = a.blk.11.ffn_up_1.bias, tensor_size=8192, offset=13729792, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[138]: n_dims = 2, name = a.blk.11.ffn_up_1.weight, tensor_size=589824, offset=13737984, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[139]: n_dims = 2, name = a.blk.11.linear_pos.weight, tensor_size=147456, offset=14327808, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[140]: n_dims = 1, name = a.blk.11.ln1.bias, tensor_size=2048, offset=14475264, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[141]: n_dims = 1, name = a.blk.11.ln1.weight, tensor_size=2048, offset=14477312, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[142]: n_dims = 1, name = a.blk.11.ln2.bias, tensor_size=2048, offset=14479360, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[143]: n_dims = 1, name = a.blk.11.ln2.weight, tensor_size=2048, offset=14481408, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[144]: n_dims = 1, name = a.blk.11.norm_conv.bias, tensor_size=2048, offset=14483456, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[145]: n_dims = 1, name = a.blk.11.norm_conv.weight, tensor_size=2048, offset=14485504, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[146]: n_dims = 2, name = a.blk.11.pos_bias_u, tensor_size=2048, offset=14487552, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[147]: n_dims = 2, name = a.blk.11.pos_bias_v, tensor_size=2048, offset=14489600, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[148]: n_dims = 1, name = a.blk.12.attn_k.bias, tensor_size=2048, offset=14491648, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[149]: n_dims = 2, name = a.blk.12.attn_k.weight, tensor_size=147456, offset=14493696, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[150]: n_dims = 1, name = a.blk.12.attn_out.bias, tensor_size=2048, offset=14641152, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[151]: n_dims = 2, name = a.blk.12.attn_out.weight, tensor_size=147456, offset=14643200, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[152]: n_dims = 1, name = a.blk.12.attn_q.bias, tensor_size=2048, offset=14790656, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[153]: n_dims = 2, name = a.blk.12.attn_q.weight, tensor_size=147456, offset=14792704, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[154]: n_dims = 1, name = a.blk.12.attn_v.bias, tensor_size=2048, offset=14940160, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[155]: n_dims = 2, name = a.blk.12.attn_v.weight, tensor_size=147456, offset=14942208, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[156]: n_dims = 1, name = a.blk.12.conv_dw.bias, tensor_size=2048, offset=15089664, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[157]: n_dims = 2, name = a.blk.12.conv_dw.weight, tensor_size=18432, offset=15091712, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[158]: n_dims = 1, name = a.blk.12.conv_norm.bias, tensor_size=2048, offset=15110144, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[159]: n_dims = 1, name = a.blk.12.conv_norm.weight, tensor_size=2048, offset=15112192, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[160]: n_dims = 1, name = a.blk.12.conv_pw1.bias, tensor_size=4096, offset=15114240, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[161]: n_dims = 2, name = a.blk.12.conv_pw1.weight, tensor_size=294912, offset=15118336, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[162]: n_dims = 1, name = a.blk.12.conv_pw2.bias, tensor_size=2048, offset=15413248, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[163]: n_dims = 2, name = a.blk.12.conv_pw2.weight, tensor_size=147456, offset=15415296, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[164]: n_dims = 1, name = a.blk.12.ffn_down.bias, tensor_size=2048, offset=15562752, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[165]: n_dims = 2, name = a.blk.12.ffn_down.weight, tensor_size=589824, offset=15564800, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[166]: n_dims = 1, name = a.blk.12.ffn_down_1.bias, tensor_size=2048, offset=16154624, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[167]: n_dims = 2, name = a.blk.12.ffn_down_1.weight, tensor_size=589824, offset=16156672, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[168]: n_dims = 1, name = a.blk.12.ffn_norm.bias, tensor_size=2048, offset=16746496, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[169]: n_dims = 1, name = a.blk.12.ffn_norm.weight, tensor_size=2048, offset=16748544, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[170]: n_dims = 1, name = a.blk.12.ffn_norm_1.bias, tensor_size=2048, offset=16750592, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[171]: n_dims = 1, name = a.blk.12.ffn_norm_1.weight, tensor_size=2048, offset=16752640, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[172]: n_dims = 1, name = a.blk.12.ffn_up.bias, tensor_size=8192, offset=16754688, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[173]: n_dims = 2, name = a.blk.12.ffn_up.weight, tensor_size=589824, offset=16762880, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[174]: n_dims = 1, name = a.blk.12.ffn_up_1.bias, tensor_size=8192, offset=17352704, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[175]: n_dims = 2, name = a.blk.12.ffn_up_1.weight, tensor_size=589824, offset=17360896, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[176]: n_dims = 2, name = a.blk.12.linear_pos.weight, tensor_size=147456, offset=17950720, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[177]: n_dims = 1, name = a.blk.12.ln1.bias, tensor_size=2048, offset=18098176, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[178]: n_dims = 1, name = a.blk.12.ln1.weight, tensor_size=2048, offset=18100224, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[179]: n_dims = 1, name = a.blk.12.ln2.bias, tensor_size=2048, offset=18102272, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[180]: n_dims = 1, name = a.blk.12.ln2.weight, tensor_size=2048, offset=18104320, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[181]: n_dims = 1, name = a.blk.12.norm_conv.bias, tensor_size=2048, offset=18106368, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[182]: n_dims = 1, name = a.blk.12.norm_conv.weight, tensor_size=2048, offset=18108416, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[183]: n_dims = 2, name = a.blk.12.pos_bias_u, tensor_size=2048, offset=18110464, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[184]: n_dims = 2, name = a.blk.12.pos_bias_v, tensor_size=2048, offset=18112512, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[185]: n_dims = 1, name = a.blk.13.attn_k.bias, tensor_size=2048, offset=18114560, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[186]: n_dims = 2, name = a.blk.13.attn_k.weight, tensor_size=147456, offset=18116608, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[187]: n_dims = 1, name = a.blk.13.attn_out.bias, tensor_size=2048, offset=18264064, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[188]: n_dims = 2, name = a.blk.13.attn_out.weight, tensor_size=147456, offset=18266112, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[189]: n_dims = 1, name = a.blk.13.attn_q.bias, tensor_size=2048, offset=18413568, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[190]: n_dims = 2, name = a.blk.13.attn_q.weight, tensor_size=147456, offset=18415616, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[191]: n_dims = 1, name = a.blk.13.attn_v.bias, tensor_size=2048, offset=18563072, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[192]: n_dims = 2, name = a.blk.13.attn_v.weight, tensor_size=147456, offset=18565120, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[193]: n_dims = 1, name = a.blk.13.conv_dw.bias, tensor_size=2048, offset=18712576, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[194]: n_dims = 2, name = a.blk.13.conv_dw.weight, tensor_size=18432, offset=18714624, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[195]: n_dims = 1, name = a.blk.13.conv_norm.bias, tensor_size=2048, offset=18733056, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[196]: n_dims = 1, name = a.blk.13.conv_norm.weight, tensor_size=2048, offset=18735104, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[197]: n_dims = 1, name = a.blk.13.conv_pw1.bias, tensor_size=4096, offset=18737152, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[198]: n_dims = 2, name = a.blk.13.conv_pw1.weight, tensor_size=294912, offset=18741248, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[199]: n_dims = 1, name = a.blk.13.conv_pw2.bias, tensor_size=2048, offset=19036160, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[200]: n_dims = 2, name = a.blk.13.conv_pw2.weight, tensor_size=147456, offset=19038208, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[201]: n_dims = 1, name = a.blk.13.ffn_down.bias, tensor_size=2048, offset=19185664, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[202]: n_dims = 2, name = a.blk.13.ffn_down.weight, tensor_size=589824, offset=19187712, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[203]: n_dims = 1, name = a.blk.13.ffn_down_1.bias, tensor_size=2048, offset=19777536, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[204]: n_dims = 2, name = a.blk.13.ffn_down_1.weight, tensor_size=589824, offset=19779584, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[205]: n_dims = 1, name = a.blk.13.ffn_norm.bias, tensor_size=2048, offset=20369408, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[206]: n_dims = 1, name = a.blk.13.ffn_norm.weight, tensor_size=2048, offset=20371456, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[207]: n_dims = 1, name = a.blk.13.ffn_norm_1.bias, tensor_size=2048, offset=20373504, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[208]: n_dims = 1, name = a.blk.13.ffn_norm_1.weight, tensor_size=2048, offset=20375552, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[209]: n_dims = 1, name = a.blk.13.ffn_up.bias, tensor_size=8192, offset=20377600, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[210]: n_dims = 2, name = a.blk.13.ffn_up.weight, tensor_size=589824, offset=20385792, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[211]: n_dims = 1, name = a.blk.13.ffn_up_1.bias, tensor_size=8192, offset=20975616, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[212]: n_dims = 2, name = a.blk.13.ffn_up_1.weight, tensor_size=589824, offset=20983808, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[213]: n_dims = 2, name = a.blk.13.linear_pos.weight, tensor_size=147456, offset=21573632, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[214]: n_dims = 1, name = a.blk.13.ln1.bias, tensor_size=2048, offset=21721088, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[215]: n_dims = 1, name = a.blk.13.ln1.weight, tensor_size=2048, offset=21723136, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[216]: n_dims = 1, name = a.blk.13.ln2.bias, tensor_size=2048, offset=21725184, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[217]: n_dims = 1, name = a.blk.13.ln2.weight, tensor_size=2048, offset=21727232, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[218]: n_dims = 1, name = a.blk.13.norm_conv.bias, tensor_size=2048, offset=21729280, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[219]: n_dims = 1, name = a.blk.13.norm_conv.weight, tensor_size=2048, offset=21731328, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[220]: n_dims = 2, name = a.blk.13.pos_bias_u, tensor_size=2048, offset=21733376, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[221]: n_dims = 2, name = a.blk.13.pos_bias_v, tensor_size=2048, offset=21735424, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[222]: n_dims = 1, name = a.blk.14.attn_k.bias, tensor_size=2048, offset=21737472, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[223]: n_dims = 2, name = a.blk.14.attn_k.weight, tensor_size=147456, offset=21739520, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[224]: n_dims = 1, name = a.blk.14.attn_out.bias, tensor_size=2048, offset=21886976, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[225]: n_dims = 2, name = a.blk.14.attn_out.weight, tensor_size=147456, offset=21889024, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[226]: n_dims = 1, name = a.blk.14.attn_q.bias, tensor_size=2048, offset=22036480, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[227]: n_dims = 2, name = a.blk.14.attn_q.weight, tensor_size=147456, offset=22038528, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[228]: n_dims = 1, name = a.blk.14.attn_v.bias, tensor_size=2048, offset=22185984, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[229]: n_dims = 2, name = a.blk.14.attn_v.weight, tensor_size=147456, offset=22188032, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[230]: n_dims = 1, name = a.blk.14.conv_dw.bias, tensor_size=2048, offset=22335488, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[231]: n_dims = 2, name = a.blk.14.conv_dw.weight, tensor_size=18432, offset=22337536, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[232]: n_dims = 1, name = a.blk.14.conv_norm.bias, tensor_size=2048, offset=22355968, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[233]: n_dims = 1, name = a.blk.14.conv_norm.weight, tensor_size=2048, offset=22358016, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[234]: n_dims = 1, name = a.blk.14.conv_pw1.bias, tensor_size=4096, offset=22360064, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[235]: n_dims = 2, name = a.blk.14.conv_pw1.weight, tensor_size=294912, offset=22364160, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[236]: n_dims = 1, name = a.blk.14.conv_pw2.bias, tensor_size=2048, offset=22659072, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[237]: n_dims = 2, name = a.blk.14.conv_pw2.weight, tensor_size=147456, offset=22661120, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[238]: n_dims = 1, name = a.blk.14.ffn_down.bias, tensor_size=2048, offset=22808576, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[239]: n_dims = 2, name = a.blk.14.ffn_down.weight, tensor_size=589824, offset=22810624, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[240]: n_dims = 1, name = a.blk.14.ffn_down_1.bias, tensor_size=2048, offset=23400448, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[241]: n_dims = 2, name = a.blk.14.ffn_down_1.weight, tensor_size=589824, offset=23402496, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[242]: n_dims = 1, name = a.blk.14.ffn_norm.bias, tensor_size=2048, offset=23992320, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[243]: n_dims = 1, name = a.blk.14.ffn_norm.weight, tensor_size=2048, offset=23994368, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[244]: n_dims = 1, name = a.blk.14.ffn_norm_1.bias, tensor_size=2048, offset=23996416, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[245]: n_dims = 1, name = a.blk.14.ffn_norm_1.weight, tensor_size=2048, offset=23998464, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[246]: n_dims = 1, name = a.blk.14.ffn_up.bias, tensor_size=8192, offset=24000512, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[247]: n_dims = 2, name = a.blk.14.ffn_up.weight, tensor_size=589824, offset=24008704, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[248]: n_dims = 1, name = a.blk.14.ffn_up_1.bias, tensor_size=8192, offset=24598528, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[249]: n_dims = 2, name = a.blk.14.ffn_up_1.weight, tensor_size=589824, offset=24606720, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[250]: n_dims = 2, name = a.blk.14.linear_pos.weight, tensor_size=147456, offset=25196544, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[251]: n_dims = 1, name = a.blk.14.ln1.bias, tensor_size=2048, offset=25344000, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[252]: n_dims = 1, name = a.blk.14.ln1.weight, tensor_size=2048, offset=25346048, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[253]: n_dims = 1, name = a.blk.14.ln2.bias, tensor_size=2048, offset=25348096, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[254]: n_dims = 1, name = a.blk.14.ln2.weight, tensor_size=2048, offset=25350144, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[255]: n_dims = 1, name = a.blk.14.norm_conv.bias, tensor_size=2048, offset=25352192, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[256]: n_dims = 1, name = a.blk.14.norm_conv.weight, tensor_size=2048, offset=25354240, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[257]: n_dims = 2, name = a.blk.14.pos_bias_u, tensor_size=2048, offset=25356288, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[258]: n_dims = 2, name = a.blk.14.pos_bias_v, tensor_size=2048, offset=25358336, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[259]: n_dims = 1, name = a.blk.15.attn_k.bias, tensor_size=2048, offset=25360384, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[260]: n_dims = 2, name = a.blk.15.attn_k.weight, tensor_size=147456, offset=25362432, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[261]: n_dims = 1, name = a.blk.15.attn_out.bias, tensor_size=2048, offset=25509888, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[262]: n_dims = 2, name = a.blk.15.attn_out.weight, tensor_size=147456, offset=25511936, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[263]: n_dims = 1, name = a.blk.15.attn_q.bias, tensor_size=2048, offset=25659392, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[264]: n_dims = 2, name = a.blk.15.attn_q.weight, tensor_size=147456, offset=25661440, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[265]: n_dims = 1, name = a.blk.15.attn_v.bias, tensor_size=2048, offset=25808896, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[266]: n_dims = 2, name = a.blk.15.attn_v.weight, tensor_size=147456, offset=25810944, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[267]: n_dims = 1, name = a.blk.15.conv_dw.bias, tensor_size=2048, offset=25958400, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[268]: n_dims = 2, name = a.blk.15.conv_dw.weight, tensor_size=18432, offset=25960448, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[269]: n_dims = 1, name = a.blk.15.conv_norm.bias, tensor_size=2048, offset=25978880, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[270]: n_dims = 1, name = a.blk.15.conv_norm.weight, tensor_size=2048, offset=25980928, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[271]: n_dims = 1, name = a.blk.15.conv_pw1.bias, tensor_size=4096, offset=25982976, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[272]: n_dims = 2, name = a.blk.15.conv_pw1.weight, tensor_size=294912, offset=25987072, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[273]: n_dims = 1, name = a.blk.15.conv_pw2.bias, tensor_size=2048, offset=26281984, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[274]: n_dims = 2, name = a.blk.15.conv_pw2.weight, tensor_size=147456, offset=26284032, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[275]: n_dims = 1, name = a.blk.15.ffn_down.bias, tensor_size=2048, offset=26431488, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[276]: n_dims = 2, name = a.blk.15.ffn_down.weight, tensor_size=589824, offset=26433536, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[277]: n_dims = 1, name = a.blk.15.ffn_down_1.bias, tensor_size=2048, offset=27023360, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[278]: n_dims = 2, name = a.blk.15.ffn_down_1.weight, tensor_size=589824, offset=27025408, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[279]: n_dims = 1, name = a.blk.15.ffn_norm.bias, tensor_size=2048, offset=27615232, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[280]: n_dims = 1, name = a.blk.15.ffn_norm.weight, tensor_size=2048, offset=27617280, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[281]: n_dims = 1, name = a.blk.15.ffn_norm_1.bias, tensor_size=2048, offset=27619328, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[282]: n_dims = 1, name = a.blk.15.ffn_norm_1.weight, tensor_size=2048, offset=27621376, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[283]: n_dims = 1, name = a.blk.15.ffn_up.bias, tensor_size=8192, offset=27623424, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[284]: n_dims = 2, name = a.blk.15.ffn_up.weight, tensor_size=589824, offset=27631616, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[285]: n_dims = 1, name = a.blk.15.ffn_up_1.bias, tensor_size=8192, offset=28221440, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[286]: n_dims = 2, name = a.blk.15.ffn_up_1.weight, tensor_size=589824, offset=28229632, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[287]: n_dims = 2, name = a.blk.15.linear_pos.weight, tensor_size=147456, offset=28819456, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[288]: n_dims = 1, name = a.blk.15.ln1.bias, tensor_size=2048, offset=28966912, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[289]: n_dims = 1, name = a.blk.15.ln1.weight, tensor_size=2048, offset=28968960, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[290]: n_dims = 1, name = a.blk.15.ln2.bias, tensor_size=2048, offset=28971008, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[291]: n_dims = 1, name = a.blk.15.ln2.weight, tensor_size=2048, offset=28973056, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[292]: n_dims = 1, name = a.blk.15.norm_conv.bias, tensor_size=2048, offset=28975104, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[293]: n_dims = 1, name = a.blk.15.norm_conv.weight, tensor_size=2048, offset=28977152, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[294]: n_dims = 2, name = a.blk.15.pos_bias_u, tensor_size=2048, offset=28979200, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[295]: n_dims = 2, name = a.blk.15.pos_bias_v, tensor_size=2048, offset=28981248, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[296]: n_dims = 1, name = a.blk.16.attn_k.bias, tensor_size=2048, offset=28983296, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[297]: n_dims = 2, name = a.blk.16.attn_k.weight, tensor_size=147456, offset=28985344, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[298]: n_dims = 1, name = a.blk.16.attn_out.bias, tensor_size=2048, offset=29132800, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[299]: n_dims = 2, name = a.blk.16.attn_out.weight, tensor_size=147456, offset=29134848, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[300]: n_dims = 1, name = a.blk.16.attn_q.bias, tensor_size=2048, offset=29282304, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[301]: n_dims = 2, name = a.blk.16.attn_q.weight, tensor_size=147456, offset=29284352, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[302]: n_dims = 1, name = a.blk.16.attn_v.bias, tensor_size=2048, offset=29431808, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[303]: n_dims = 2, name = a.blk.16.attn_v.weight, tensor_size=147456, offset=29433856, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[304]: n_dims = 1, name = a.blk.16.conv_dw.bias, tensor_size=2048, offset=29581312, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[305]: n_dims = 2, name = a.blk.16.conv_dw.weight, tensor_size=18432, offset=29583360, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[306]: n_dims = 1, name = a.blk.16.conv_norm.bias, tensor_size=2048, offset=29601792, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[307]: n_dims = 1, name = a.blk.16.conv_norm.weight, tensor_size=2048, offset=29603840, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[308]: n_dims = 1, name = a.blk.16.conv_pw1.bias, tensor_size=4096, offset=29605888, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[309]: n_dims = 2, name = a.blk.16.conv_pw1.weight, tensor_size=294912, offset=29609984, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[310]: n_dims = 1, name = a.blk.16.conv_pw2.bias, tensor_size=2048, offset=29904896, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[311]: n_dims = 2, name = a.blk.16.conv_pw2.weight, tensor_size=147456, offset=29906944, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[312]: n_dims = 1, name = a.blk.16.ffn_down.bias, tensor_size=2048, offset=30054400, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[313]: n_dims = 2, name = a.blk.16.ffn_down.weight, tensor_size=589824, offset=30056448, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[314]: n_dims = 1, name = a.blk.16.ffn_down_1.bias, tensor_size=2048, offset=30646272, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[315]: n_dims = 2, name = a.blk.16.ffn_down_1.weight, tensor_size=589824, offset=30648320, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[316]: n_dims = 1, name = a.blk.16.ffn_norm.bias, tensor_size=2048, offset=31238144, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[317]: n_dims = 1, name = a.blk.16.ffn_norm.weight, tensor_size=2048, offset=31240192, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[318]: n_dims = 1, name = a.blk.16.ffn_norm_1.bias, tensor_size=2048, offset=31242240, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[319]: n_dims = 1, name = a.blk.16.ffn_norm_1.weight, tensor_size=2048, offset=31244288, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[320]: n_dims = 1, name = a.blk.16.ffn_up.bias, tensor_size=8192, offset=31246336, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[321]: n_dims = 2, name = a.blk.16.ffn_up.weight, tensor_size=589824, offset=31254528, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[322]: n_dims = 1, name = a.blk.16.ffn_up_1.bias, tensor_size=8192, offset=31844352, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[323]: n_dims = 2, name = a.blk.16.ffn_up_1.weight, tensor_size=589824, offset=31852544, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[324]: n_dims = 2, name = a.blk.16.linear_pos.weight, tensor_size=147456, offset=32442368, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[325]: n_dims = 1, name = a.blk.16.ln1.bias, tensor_size=2048, offset=32589824, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[326]: n_dims = 1, name = a.blk.16.ln1.weight, tensor_size=2048, offset=32591872, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[327]: n_dims = 1, name = a.blk.16.ln2.bias, tensor_size=2048, offset=32593920, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[328]: n_dims = 1, name = a.blk.16.ln2.weight, tensor_size=2048, offset=32595968, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[329]: n_dims = 1, name = a.blk.16.norm_conv.bias, tensor_size=2048, offset=32598016, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[330]: n_dims = 1, name = a.blk.16.norm_conv.weight, tensor_size=2048, offset=32600064, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[331]: n_dims = 2, name = a.blk.16.pos_bias_u, tensor_size=2048, offset=32602112, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[332]: n_dims = 2, name = a.blk.16.pos_bias_v, tensor_size=2048, offset=32604160, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[333]: n_dims = 1, name = a.blk.2.attn_k.bias, tensor_size=2048, offset=32606208, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[334]: n_dims = 2, name = a.blk.2.attn_k.weight, tensor_size=147456, offset=32608256, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[335]: n_dims = 1, name = a.blk.2.attn_out.bias, tensor_size=2048, offset=32755712, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[336]: n_dims = 2, name = a.blk.2.attn_out.weight, tensor_size=147456, offset=32757760, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[337]: n_dims = 1, name = a.blk.2.attn_q.bias, tensor_size=2048, offset=32905216, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[338]: n_dims = 2, name = a.blk.2.attn_q.weight, tensor_size=147456, offset=32907264, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[339]: n_dims = 1, name = a.blk.2.attn_v.bias, tensor_size=2048, offset=33054720, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[340]: n_dims = 2, name = a.blk.2.attn_v.weight, tensor_size=147456, offset=33056768, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[341]: n_dims = 1, name = a.blk.2.conv_dw.bias, tensor_size=2048, offset=33204224, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[342]: n_dims = 2, name = a.blk.2.conv_dw.weight, tensor_size=18432, offset=33206272, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[343]: n_dims = 1, name = a.blk.2.conv_norm.bias, tensor_size=2048, offset=33224704, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[344]: n_dims = 1, name = a.blk.2.conv_norm.weight, tensor_size=2048, offset=33226752, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[345]: n_dims = 1, name = a.blk.2.conv_pw1.bias, tensor_size=4096, offset=33228800, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[346]: n_dims = 2, name = a.blk.2.conv_pw1.weight, tensor_size=294912, offset=33232896, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[347]: n_dims = 1, name = a.blk.2.conv_pw2.bias, tensor_size=2048, offset=33527808, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[348]: n_dims = 2, name = a.blk.2.conv_pw2.weight, tensor_size=147456, offset=33529856, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[349]: n_dims = 1, name = a.blk.2.ffn_down.bias, tensor_size=2048, offset=33677312, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[350]: n_dims = 2, name = a.blk.2.ffn_down.weight, tensor_size=589824, offset=33679360, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[351]: n_dims = 1, name = a.blk.2.ffn_down_1.bias, tensor_size=2048, offset=34269184, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[352]: n_dims = 2, name = a.blk.2.ffn_down_1.weight, tensor_size=589824, offset=34271232, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[353]: n_dims = 1, name = a.blk.2.ffn_norm.bias, tensor_size=2048, offset=34861056, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[354]: n_dims = 1, name = a.blk.2.ffn_norm.weight, tensor_size=2048, offset=34863104, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[355]: n_dims = 1, name = a.blk.2.ffn_norm_1.bias, tensor_size=2048, offset=34865152, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[356]: n_dims = 1, name = a.blk.2.ffn_norm_1.weight, tensor_size=2048, offset=34867200, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[357]: n_dims = 1, name = a.blk.2.ffn_up.bias, tensor_size=8192, offset=34869248, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[358]: n_dims = 2, name = a.blk.2.ffn_up.weight, tensor_size=589824, offset=34877440, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[359]: n_dims = 1, name = a.blk.2.ffn_up_1.bias, tensor_size=8192, offset=35467264, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[360]: n_dims = 2, name = a.blk.2.ffn_up_1.weight, tensor_size=589824, offset=35475456, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[361]: n_dims = 2, name = a.blk.2.linear_pos.weight, tensor_size=147456, offset=36065280, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[362]: n_dims = 1, name = a.blk.2.ln1.bias, tensor_size=2048, offset=36212736, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[363]: n_dims = 1, name = a.blk.2.ln1.weight, tensor_size=2048, offset=36214784, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[364]: n_dims = 1, name = a.blk.2.ln2.bias, tensor_size=2048, offset=36216832, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[365]: n_dims = 1, name = a.blk.2.ln2.weight, tensor_size=2048, offset=36218880, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[366]: n_dims = 1, name = a.blk.2.norm_conv.bias, tensor_size=2048, offset=36220928, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[367]: n_dims = 1, name = a.blk.2.norm_conv.weight, tensor_size=2048, offset=36222976, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[368]: n_dims = 2, name = a.blk.2.pos_bias_u, tensor_size=2048, offset=36225024, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[369]: n_dims = 2, name = a.blk.2.pos_bias_v, tensor_size=2048, offset=36227072, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[370]: n_dims = 1, name = a.blk.3.attn_k.bias, tensor_size=2048, offset=36229120, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[371]: n_dims = 2, name = a.blk.3.attn_k.weight, tensor_size=147456, offset=36231168, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[372]: n_dims = 1, name = a.blk.3.attn_out.bias, tensor_size=2048, offset=36378624, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[373]: n_dims = 2, name = a.blk.3.attn_out.weight, tensor_size=147456, offset=36380672, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[374]: n_dims = 1, name = a.blk.3.attn_q.bias, tensor_size=2048, offset=36528128, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[375]: n_dims = 2, name = a.blk.3.attn_q.weight, tensor_size=147456, offset=36530176, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[376]: n_dims = 1, name = a.blk.3.attn_v.bias, tensor_size=2048, offset=36677632, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[377]: n_dims = 2, name = a.blk.3.attn_v.weight, tensor_size=147456, offset=36679680, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[378]: n_dims = 1, name = a.blk.3.conv_dw.bias, tensor_size=2048, offset=36827136, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[379]: n_dims = 2, name = a.blk.3.conv_dw.weight, tensor_size=18432, offset=36829184, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[380]: n_dims = 1, name = a.blk.3.conv_norm.bias, tensor_size=2048, offset=36847616, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[381]: n_dims = 1, name = a.blk.3.conv_norm.weight, tensor_size=2048, offset=36849664, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[382]: n_dims = 1, name = a.blk.3.conv_pw1.bias, tensor_size=4096, offset=36851712, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[383]: n_dims = 2, name = a.blk.3.conv_pw1.weight, tensor_size=294912, offset=36855808, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[384]: n_dims = 1, name = a.blk.3.conv_pw2.bias, tensor_size=2048, offset=37150720, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[385]: n_dims = 2, name = a.blk.3.conv_pw2.weight, tensor_size=147456, offset=37152768, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[386]: n_dims = 1, name = a.blk.3.ffn_down.bias, tensor_size=2048, offset=37300224, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[387]: n_dims = 2, name = a.blk.3.ffn_down.weight, tensor_size=589824, offset=37302272, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[388]: n_dims = 1, name = a.blk.3.ffn_down_1.bias, tensor_size=2048, offset=37892096, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[389]: n_dims = 2, name = a.blk.3.ffn_down_1.weight, tensor_size=589824, offset=37894144, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[390]: n_dims = 1, name = a.blk.3.ffn_norm.bias, tensor_size=2048, offset=38483968, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[391]: n_dims = 1, name = a.blk.3.ffn_norm.weight, tensor_size=2048, offset=38486016, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[392]: n_dims = 1, name = a.blk.3.ffn_norm_1.bias, tensor_size=2048, offset=38488064, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[393]: n_dims = 1, name = a.blk.3.ffn_norm_1.weight, tensor_size=2048, offset=38490112, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[394]: n_dims = 1, name = a.blk.3.ffn_up.bias, tensor_size=8192, offset=38492160, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[395]: n_dims = 2, name = a.blk.3.ffn_up.weight, tensor_size=589824, offset=38500352, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[396]: n_dims = 1, name = a.blk.3.ffn_up_1.bias, tensor_size=8192, offset=39090176, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[397]: n_dims = 2, name = a.blk.3.ffn_up_1.weight, tensor_size=589824, offset=39098368, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[398]: n_dims = 2, name = a.blk.3.linear_pos.weight, tensor_size=147456, offset=39688192, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[399]: n_dims = 1, name = a.blk.3.ln1.bias, tensor_size=2048, offset=39835648, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[400]: n_dims = 1, name = a.blk.3.ln1.weight, tensor_size=2048, offset=39837696, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[401]: n_dims = 1, name = a.blk.3.ln2.bias, tensor_size=2048, offset=39839744, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[402]: n_dims = 1, name = a.blk.3.ln2.weight, tensor_size=2048, offset=39841792, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[403]: n_dims = 1, name = a.blk.3.norm_conv.bias, tensor_size=2048, offset=39843840, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[404]: n_dims = 1, name = a.blk.3.norm_conv.weight, tensor_size=2048, offset=39845888, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[405]: n_dims = 2, name = a.blk.3.pos_bias_u, tensor_size=2048, offset=39847936, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[406]: n_dims = 2, name = a.blk.3.pos_bias_v, tensor_size=2048, offset=39849984, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[407]: n_dims = 1, name = a.blk.4.attn_k.bias, tensor_size=2048, offset=39852032, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[408]: n_dims = 2, name = a.blk.4.attn_k.weight, tensor_size=147456, offset=39854080, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[409]: n_dims = 1, name = a.blk.4.attn_out.bias, tensor_size=2048, offset=40001536, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[410]: n_dims = 2, name = a.blk.4.attn_out.weight, tensor_size=147456, offset=40003584, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[411]: n_dims = 1, name = a.blk.4.attn_q.bias, tensor_size=2048, offset=40151040, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[412]: n_dims = 2, name = a.blk.4.attn_q.weight, tensor_size=147456, offset=40153088, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[413]: n_dims = 1, name = a.blk.4.attn_v.bias, tensor_size=2048, offset=40300544, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[414]: n_dims = 2, name = a.blk.4.attn_v.weight, tensor_size=147456, offset=40302592, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[415]: n_dims = 1, name = a.blk.4.conv_dw.bias, tensor_size=2048, offset=40450048, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[416]: n_dims = 2, name = a.blk.4.conv_dw.weight, tensor_size=18432, offset=40452096, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[417]: n_dims = 1, name = a.blk.4.conv_norm.bias, tensor_size=2048, offset=40470528, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[418]: n_dims = 1, name = a.blk.4.conv_norm.weight, tensor_size=2048, offset=40472576, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[419]: n_dims = 1, name = a.blk.4.conv_pw1.bias, tensor_size=4096, offset=40474624, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[420]: n_dims = 2, name = a.blk.4.conv_pw1.weight, tensor_size=294912, offset=40478720, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[421]: n_dims = 1, name = a.blk.4.conv_pw2.bias, tensor_size=2048, offset=40773632, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[422]: n_dims = 2, name = a.blk.4.conv_pw2.weight, tensor_size=147456, offset=40775680, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[423]: n_dims = 1, name = a.blk.4.ffn_down.bias, tensor_size=2048, offset=40923136, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[424]: n_dims = 2, name = a.blk.4.ffn_down.weight, tensor_size=589824, offset=40925184, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[425]: n_dims = 1, name = a.blk.4.ffn_down_1.bias, tensor_size=2048, offset=41515008, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[426]: n_dims = 2, name = a.blk.4.ffn_down_1.weight, tensor_size=589824, offset=41517056, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[427]: n_dims = 1, name = a.blk.4.ffn_norm.bias, tensor_size=2048, offset=42106880, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[428]: n_dims = 1, name = a.blk.4.ffn_norm.weight, tensor_size=2048, offset=42108928, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[429]: n_dims = 1, name = a.blk.4.ffn_norm_1.bias, tensor_size=2048, offset=42110976, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[430]: n_dims = 1, name = a.blk.4.ffn_norm_1.weight, tensor_size=2048, offset=42113024, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[431]: n_dims = 1, name = a.blk.4.ffn_up.bias, tensor_size=8192, offset=42115072, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[432]: n_dims = 2, name = a.blk.4.ffn_up.weight, tensor_size=589824, offset=42123264, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[433]: n_dims = 1, name = a.blk.4.ffn_up_1.bias, tensor_size=8192, offset=42713088, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[434]: n_dims = 2, name = a.blk.4.ffn_up_1.weight, tensor_size=589824, offset=42721280, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[435]: n_dims = 2, name = a.blk.4.linear_pos.weight, tensor_size=147456, offset=43311104, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[436]: n_dims = 1, name = a.blk.4.ln1.bias, tensor_size=2048, offset=43458560, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[437]: n_dims = 1, name = a.blk.4.ln1.weight, tensor_size=2048, offset=43460608, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[438]: n_dims = 1, name = a.blk.4.ln2.bias, tensor_size=2048, offset=43462656, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[439]: n_dims = 1, name = a.blk.4.ln2.weight, tensor_size=2048, offset=43464704, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[440]: n_dims = 1, name = a.blk.4.norm_conv.bias, tensor_size=2048, offset=43466752, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[441]: n_dims = 1, name = a.blk.4.norm_conv.weight, tensor_size=2048, offset=43468800, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[442]: n_dims = 2, name = a.blk.4.pos_bias_u, tensor_size=2048, offset=43470848, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[443]: n_dims = 2, name = a.blk.4.pos_bias_v, tensor_size=2048, offset=43472896, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[444]: n_dims = 1, name = a.blk.5.attn_k.bias, tensor_size=2048, offset=43474944, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[445]: n_dims = 2, name = a.blk.5.attn_k.weight, tensor_size=147456, offset=43476992, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[446]: n_dims = 1, name = a.blk.5.attn_out.bias, tensor_size=2048, offset=43624448, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[447]: n_dims = 2, name = a.blk.5.attn_out.weight, tensor_size=147456, offset=43626496, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[448]: n_dims = 1, name = a.blk.5.attn_q.bias, tensor_size=2048, offset=43773952, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[449]: n_dims = 2, name = a.blk.5.attn_q.weight, tensor_size=147456, offset=43776000, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[450]: n_dims = 1, name = a.blk.5.attn_v.bias, tensor_size=2048, offset=43923456, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[451]: n_dims = 2, name = a.blk.5.attn_v.weight, tensor_size=147456, offset=43925504, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[452]: n_dims = 1, name = a.blk.5.conv_dw.bias, tensor_size=2048, offset=44072960, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[453]: n_dims = 2, name = a.blk.5.conv_dw.weight, tensor_size=18432, offset=44075008, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[454]: n_dims = 1, name = a.blk.5.conv_norm.bias, tensor_size=2048, offset=44093440, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[455]: n_dims = 1, name = a.blk.5.conv_norm.weight, tensor_size=2048, offset=44095488, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[456]: n_dims = 1, name = a.blk.5.conv_pw1.bias, tensor_size=4096, offset=44097536, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[457]: n_dims = 2, name = a.blk.5.conv_pw1.weight, tensor_size=294912, offset=44101632, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[458]: n_dims = 1, name = a.blk.5.conv_pw2.bias, tensor_size=2048, offset=44396544, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[459]: n_dims = 2, name = a.blk.5.conv_pw2.weight, tensor_size=147456, offset=44398592, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[460]: n_dims = 1, name = a.blk.5.ffn_down.bias, tensor_size=2048, offset=44546048, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[461]: n_dims = 2, name = a.blk.5.ffn_down.weight, tensor_size=589824, offset=44548096, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[462]: n_dims = 1, name = a.blk.5.ffn_down_1.bias, tensor_size=2048, offset=45137920, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[463]: n_dims = 2, name = a.blk.5.ffn_down_1.weight, tensor_size=589824, offset=45139968, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[464]: n_dims = 1, name = a.blk.5.ffn_norm.bias, tensor_size=2048, offset=45729792, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[465]: n_dims = 1, name = a.blk.5.ffn_norm.weight, tensor_size=2048, offset=45731840, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[466]: n_dims = 1, name = a.blk.5.ffn_norm_1.bias, tensor_size=2048, offset=45733888, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[467]: n_dims = 1, name = a.blk.5.ffn_norm_1.weight, tensor_size=2048, offset=45735936, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[468]: n_dims = 1, name = a.blk.5.ffn_up.bias, tensor_size=8192, offset=45737984, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[469]: n_dims = 2, name = a.blk.5.ffn_up.weight, tensor_size=589824, offset=45746176, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[470]: n_dims = 1, name = a.blk.5.ffn_up_1.bias, tensor_size=8192, offset=46336000, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[471]: n_dims = 2, name = a.blk.5.ffn_up_1.weight, tensor_size=589824, offset=46344192, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[472]: n_dims = 2, name = a.blk.5.linear_pos.weight, tensor_size=147456, offset=46934016, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[473]: n_dims = 1, name = a.blk.5.ln1.bias, tensor_size=2048, offset=47081472, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[474]: n_dims = 1, name = a.blk.5.ln1.weight, tensor_size=2048, offset=47083520, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[475]: n_dims = 1, name = a.blk.5.ln2.bias, tensor_size=2048, offset=47085568, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[476]: n_dims = 1, name = a.blk.5.ln2.weight, tensor_size=2048, offset=47087616, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[477]: n_dims = 1, name = a.blk.5.norm_conv.bias, tensor_size=2048, offset=47089664, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[478]: n_dims = 1, name = a.blk.5.norm_conv.weight, tensor_size=2048, offset=47091712, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[479]: n_dims = 2, name = a.blk.5.pos_bias_u, tensor_size=2048, offset=47093760, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[480]: n_dims = 2, name = a.blk.5.pos_bias_v, tensor_size=2048, offset=47095808, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[481]: n_dims = 1, name = a.blk.6.attn_k.bias, tensor_size=2048, offset=47097856, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[482]: n_dims = 2, name = a.blk.6.attn_k.weight, tensor_size=147456, offset=47099904, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[483]: n_dims = 1, name = a.blk.6.attn_out.bias, tensor_size=2048, offset=47247360, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[484]: n_dims = 2, name = a.blk.6.attn_out.weight, tensor_size=147456, offset=47249408, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[485]: n_dims = 1, name = a.blk.6.attn_q.bias, tensor_size=2048, offset=47396864, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[486]: n_dims = 2, name = a.blk.6.attn_q.weight, tensor_size=147456, offset=47398912, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[487]: n_dims = 1, name = a.blk.6.attn_v.bias, tensor_size=2048, offset=47546368, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[488]: n_dims = 2, name = a.blk.6.attn_v.weight, tensor_size=147456, offset=47548416, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[489]: n_dims = 1, name = a.blk.6.conv_dw.bias, tensor_size=2048, offset=47695872, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[490]: n_dims = 2, name = a.blk.6.conv_dw.weight, tensor_size=18432, offset=47697920, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[491]: n_dims = 1, name = a.blk.6.conv_norm.bias, tensor_size=2048, offset=47716352, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[492]: n_dims = 1, name = a.blk.6.conv_norm.weight, tensor_size=2048, offset=47718400, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[493]: n_dims = 1, name = a.blk.6.conv_pw1.bias, tensor_size=4096, offset=47720448, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[494]: n_dims = 2, name = a.blk.6.conv_pw1.weight, tensor_size=294912, offset=47724544, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[495]: n_dims = 1, name = a.blk.6.conv_pw2.bias, tensor_size=2048, offset=48019456, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[496]: n_dims = 2, name = a.blk.6.conv_pw2.weight, tensor_size=147456, offset=48021504, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[497]: n_dims = 1, name = a.blk.6.ffn_down.bias, tensor_size=2048, offset=48168960, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[498]: n_dims = 2, name = a.blk.6.ffn_down.weight, tensor_size=589824, offset=48171008, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[499]: n_dims = 1, name = a.blk.6.ffn_down_1.bias, tensor_size=2048, offset=48760832, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[500]: n_dims = 2, name = a.blk.6.ffn_down_1.weight, tensor_size=589824, offset=48762880, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[501]: n_dims = 1, name = a.blk.6.ffn_norm.bias, tensor_size=2048, offset=49352704, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[502]: n_dims = 1, name = a.blk.6.ffn_norm.weight, tensor_size=2048, offset=49354752, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[503]: n_dims = 1, name = a.blk.6.ffn_norm_1.bias, tensor_size=2048, offset=49356800, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[504]: n_dims = 1, name = a.blk.6.ffn_norm_1.weight, tensor_size=2048, offset=49358848, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[505]: n_dims = 1, name = a.blk.6.ffn_up.bias, tensor_size=8192, offset=49360896, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[506]: n_dims = 2, name = a.blk.6.ffn_up.weight, tensor_size=589824, offset=49369088, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[507]: n_dims = 1, name = a.blk.6.ffn_up_1.bias, tensor_size=8192, offset=49958912, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[508]: n_dims = 2, name = a.blk.6.ffn_up_1.weight, tensor_size=589824, offset=49967104, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[509]: n_dims = 2, name = a.blk.6.linear_pos.weight, tensor_size=147456, offset=50556928, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[510]: n_dims = 1, name = a.blk.6.ln1.bias, tensor_size=2048, offset=50704384, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[511]: n_dims = 1, name = a.blk.6.ln1.weight, tensor_size=2048, offset=50706432, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[512]: n_dims = 1, name = a.blk.6.ln2.bias, tensor_size=2048, offset=50708480, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[513]: n_dims = 1, name = a.blk.6.ln2.weight, tensor_size=2048, offset=50710528, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[514]: n_dims = 1, name = a.blk.6.norm_conv.bias, tensor_size=2048, offset=50712576, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[515]: n_dims = 1, name = a.blk.6.norm_conv.weight, tensor_size=2048, offset=50714624, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[516]: n_dims = 2, name = a.blk.6.pos_bias_u, tensor_size=2048, offset=50716672, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[517]: n_dims = 2, name = a.blk.6.pos_bias_v, tensor_size=2048, offset=50718720, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[518]: n_dims = 1, name = a.blk.7.attn_k.bias, tensor_size=2048, offset=50720768, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[519]: n_dims = 2, name = a.blk.7.attn_k.weight, tensor_size=147456, offset=50722816, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[520]: n_dims = 1, name = a.blk.7.attn_out.bias, tensor_size=2048, offset=50870272, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[521]: n_dims = 2, name = a.blk.7.attn_out.weight, tensor_size=147456, offset=50872320, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[522]: n_dims = 1, name = a.blk.7.attn_q.bias, tensor_size=2048, offset=51019776, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[523]: n_dims = 2, name = a.blk.7.attn_q.weight, tensor_size=147456, offset=51021824, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[524]: n_dims = 1, name = a.blk.7.attn_v.bias, tensor_size=2048, offset=51169280, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[525]: n_dims = 2, name = a.blk.7.attn_v.weight, tensor_size=147456, offset=51171328, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[526]: n_dims = 1, name = a.blk.7.conv_dw.bias, tensor_size=2048, offset=51318784, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[527]: n_dims = 2, name = a.blk.7.conv_dw.weight, tensor_size=18432, offset=51320832, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[528]: n_dims = 1, name = a.blk.7.conv_norm.bias, tensor_size=2048, offset=51339264, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[529]: n_dims = 1, name = a.blk.7.conv_norm.weight, tensor_size=2048, offset=51341312, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[530]: n_dims = 1, name = a.blk.7.conv_pw1.bias, tensor_size=4096, offset=51343360, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[531]: n_dims = 2, name = a.blk.7.conv_pw1.weight, tensor_size=294912, offset=51347456, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[532]: n_dims = 1, name = a.blk.7.conv_pw2.bias, tensor_size=2048, offset=51642368, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[533]: n_dims = 2, name = a.blk.7.conv_pw2.weight, tensor_size=147456, offset=51644416, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[534]: n_dims = 1, name = a.blk.7.ffn_down.bias, tensor_size=2048, offset=51791872, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[535]: n_dims = 2, name = a.blk.7.ffn_down.weight, tensor_size=589824, offset=51793920, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[536]: n_dims = 1, name = a.blk.7.ffn_down_1.bias, tensor_size=2048, offset=52383744, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[537]: n_dims = 2, name = a.blk.7.ffn_down_1.weight, tensor_size=589824, offset=52385792, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[538]: n_dims = 1, name = a.blk.7.ffn_norm.bias, tensor_size=2048, offset=52975616, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[539]: n_dims = 1, name = a.blk.7.ffn_norm.weight, tensor_size=2048, offset=52977664, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[540]: n_dims = 1, name = a.blk.7.ffn_norm_1.bias, tensor_size=2048, offset=52979712, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[541]: n_dims = 1, name = a.blk.7.ffn_norm_1.weight, tensor_size=2048, offset=52981760, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[542]: n_dims = 1, name = a.blk.7.ffn_up.bias, tensor_size=8192, offset=52983808, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[543]: n_dims = 2, name = a.blk.7.ffn_up.weight, tensor_size=589824, offset=52992000, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[544]: n_dims = 1, name = a.blk.7.ffn_up_1.bias, tensor_size=8192, offset=53581824, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[545]: n_dims = 2, name = a.blk.7.ffn_up_1.weight, tensor_size=589824, offset=53590016, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[546]: n_dims = 2, name = a.blk.7.linear_pos.weight, tensor_size=147456, offset=54179840, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[547]: n_dims = 1, name = a.blk.7.ln1.bias, tensor_size=2048, offset=54327296, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[548]: n_dims = 1, name = a.blk.7.ln1.weight, tensor_size=2048, offset=54329344, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[549]: n_dims = 1, name = a.blk.7.ln2.bias, tensor_size=2048, offset=54331392, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[550]: n_dims = 1, name = a.blk.7.ln2.weight, tensor_size=2048, offset=54333440, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[551]: n_dims = 1, name = a.blk.7.norm_conv.bias, tensor_size=2048, offset=54335488, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[552]: n_dims = 1, name = a.blk.7.norm_conv.weight, tensor_size=2048, offset=54337536, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[553]: n_dims = 2, name = a.blk.7.pos_bias_u, tensor_size=2048, offset=54339584, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[554]: n_dims = 2, name = a.blk.7.pos_bias_v, tensor_size=2048, offset=54341632, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[555]: n_dims = 1, name = a.blk.8.attn_k.bias, tensor_size=2048, offset=54343680, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[556]: n_dims = 2, name = a.blk.8.attn_k.weight, tensor_size=147456, offset=54345728, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[557]: n_dims = 1, name = a.blk.8.attn_out.bias, tensor_size=2048, offset=54493184, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[558]: n_dims = 2, name = a.blk.8.attn_out.weight, tensor_size=147456, offset=54495232, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[559]: n_dims = 1, name = a.blk.8.attn_q.bias, tensor_size=2048, offset=54642688, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[560]: n_dims = 2, name = a.blk.8.attn_q.weight, tensor_size=147456, offset=54644736, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[561]: n_dims = 1, name = a.blk.8.attn_v.bias, tensor_size=2048, offset=54792192, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[562]: n_dims = 2, name = a.blk.8.attn_v.weight, tensor_size=147456, offset=54794240, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[563]: n_dims = 1, name = a.blk.8.conv_dw.bias, tensor_size=2048, offset=54941696, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[564]: n_dims = 2, name = a.blk.8.conv_dw.weight, tensor_size=18432, offset=54943744, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[565]: n_dims = 1, name = a.blk.8.conv_norm.bias, tensor_size=2048, offset=54962176, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[566]: n_dims = 1, name = a.blk.8.conv_norm.weight, tensor_size=2048, offset=54964224, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[567]: n_dims = 1, name = a.blk.8.conv_pw1.bias, tensor_size=4096, offset=54966272, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[568]: n_dims = 2, name = a.blk.8.conv_pw1.weight, tensor_size=294912, offset=54970368, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[569]: n_dims = 1, name = a.blk.8.conv_pw2.bias, tensor_size=2048, offset=55265280, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[570]: n_dims = 2, name = a.blk.8.conv_pw2.weight, tensor_size=147456, offset=55267328, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[571]: n_dims = 1, name = a.blk.8.ffn_down.bias, tensor_size=2048, offset=55414784, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[572]: n_dims = 2, name = a.blk.8.ffn_down.weight, tensor_size=589824, offset=55416832, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[573]: n_dims = 1, name = a.blk.8.ffn_down_1.bias, tensor_size=2048, offset=56006656, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[574]: n_dims = 2, name = a.blk.8.ffn_down_1.weight, tensor_size=589824, offset=56008704, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[575]: n_dims = 1, name = a.blk.8.ffn_norm.bias, tensor_size=2048, offset=56598528, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[576]: n_dims = 1, name = a.blk.8.ffn_norm.weight, tensor_size=2048, offset=56600576, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[577]: n_dims = 1, name = a.blk.8.ffn_norm_1.bias, tensor_size=2048, offset=56602624, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[578]: n_dims = 1, name = a.blk.8.ffn_norm_1.weight, tensor_size=2048, offset=56604672, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[579]: n_dims = 1, name = a.blk.8.ffn_up.bias, tensor_size=8192, offset=56606720, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[580]: n_dims = 2, name = a.blk.8.ffn_up.weight, tensor_size=589824, offset=56614912, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[581]: n_dims = 1, name = a.blk.8.ffn_up_1.bias, tensor_size=8192, offset=57204736, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[582]: n_dims = 2, name = a.blk.8.ffn_up_1.weight, tensor_size=589824, offset=57212928, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[583]: n_dims = 2, name = a.blk.8.linear_pos.weight, tensor_size=147456, offset=57802752, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[584]: n_dims = 1, name = a.blk.8.ln1.bias, tensor_size=2048, offset=57950208, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[585]: n_dims = 1, name = a.blk.8.ln1.weight, tensor_size=2048, offset=57952256, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[586]: n_dims = 1, name = a.blk.8.ln2.bias, tensor_size=2048, offset=57954304, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[587]: n_dims = 1, name = a.blk.8.ln2.weight, tensor_size=2048, offset=57956352, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[588]: n_dims = 1, name = a.blk.8.norm_conv.bias, tensor_size=2048, offset=57958400, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[589]: n_dims = 1, name = a.blk.8.norm_conv.weight, tensor_size=2048, offset=57960448, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[590]: n_dims = 2, name = a.blk.8.pos_bias_u, tensor_size=2048, offset=57962496, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[591]: n_dims = 2, name = a.blk.8.pos_bias_v, tensor_size=2048, offset=57964544, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[592]: n_dims = 1, name = a.blk.9.attn_k.bias, tensor_size=2048, offset=57966592, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[593]: n_dims = 2, name = a.blk.9.attn_k.weight, tensor_size=147456, offset=57968640, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[594]: n_dims = 1, name = a.blk.9.attn_out.bias, tensor_size=2048, offset=58116096, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[595]: n_dims = 2, name = a.blk.9.attn_out.weight, tensor_size=147456, offset=58118144, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[596]: n_dims = 1, name = a.blk.9.attn_q.bias, tensor_size=2048, offset=58265600, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[597]: n_dims = 2, name = a.blk.9.attn_q.weight, tensor_size=147456, offset=58267648, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[598]: n_dims = 1, name = a.blk.9.attn_v.bias, tensor_size=2048, offset=58415104, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[599]: n_dims = 2, name = a.blk.9.attn_v.weight, tensor_size=147456, offset=58417152, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[600]: n_dims = 1, name = a.blk.9.conv_dw.bias, tensor_size=2048, offset=58564608, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[601]: n_dims = 2, name = a.blk.9.conv_dw.weight, tensor_size=18432, offset=58566656, shape:[9, 512, 1, 1], type = f32
clip_model_loader: tensor[602]: n_dims = 1, name = a.blk.9.conv_norm.bias, tensor_size=2048, offset=58585088, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[603]: n_dims = 1, name = a.blk.9.conv_norm.weight, tensor_size=2048, offset=58587136, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[604]: n_dims = 1, name = a.blk.9.conv_pw1.bias, tensor_size=4096, offset=58589184, shape:[1024, 1, 1, 1], type = f32
clip_model_loader: tensor[605]: n_dims = 2, name = a.blk.9.conv_pw1.weight, tensor_size=294912, offset=58593280, shape:[512, 1024, 1, 1], type = q4_0
clip_model_loader: tensor[606]: n_dims = 1, name = a.blk.9.conv_pw2.bias, tensor_size=2048, offset=58888192, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[607]: n_dims = 2, name = a.blk.9.conv_pw2.weight, tensor_size=147456, offset=58890240, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[608]: n_dims = 1, name = a.blk.9.ffn_down.bias, tensor_size=2048, offset=59037696, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[609]: n_dims = 2, name = a.blk.9.ffn_down.weight, tensor_size=589824, offset=59039744, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[610]: n_dims = 1, name = a.blk.9.ffn_down_1.bias, tensor_size=2048, offset=59629568, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[611]: n_dims = 2, name = a.blk.9.ffn_down_1.weight, tensor_size=589824, offset=59631616, shape:[2048, 512, 1, 1], type = q4_0
clip_model_loader: tensor[612]: n_dims = 1, name = a.blk.9.ffn_norm.bias, tensor_size=2048, offset=60221440, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[613]: n_dims = 1, name = a.blk.9.ffn_norm.weight, tensor_size=2048, offset=60223488, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[614]: n_dims = 1, name = a.blk.9.ffn_norm_1.bias, tensor_size=2048, offset=60225536, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[615]: n_dims = 1, name = a.blk.9.ffn_norm_1.weight, tensor_size=2048, offset=60227584, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[616]: n_dims = 1, name = a.blk.9.ffn_up.bias, tensor_size=8192, offset=60229632, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[617]: n_dims = 2, name = a.blk.9.ffn_up.weight, tensor_size=589824, offset=60237824, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[618]: n_dims = 1, name = a.blk.9.ffn_up_1.bias, tensor_size=8192, offset=60827648, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[619]: n_dims = 2, name = a.blk.9.ffn_up_1.weight, tensor_size=589824, offset=60835840, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[620]: n_dims = 2, name = a.blk.9.linear_pos.weight, tensor_size=147456, offset=61425664, shape:[512, 512, 1, 1], type = q4_0
clip_model_loader: tensor[621]: n_dims = 1, name = a.blk.9.ln1.bias, tensor_size=2048, offset=61573120, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[622]: n_dims = 1, name = a.blk.9.ln1.weight, tensor_size=2048, offset=61575168, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[623]: n_dims = 1, name = a.blk.9.ln2.bias, tensor_size=2048, offset=61577216, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[624]: n_dims = 1, name = a.blk.9.ln2.weight, tensor_size=2048, offset=61579264, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[625]: n_dims = 1, name = a.blk.9.norm_conv.bias, tensor_size=2048, offset=61581312, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[626]: n_dims = 1, name = a.blk.9.norm_conv.weight, tensor_size=2048, offset=61583360, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[627]: n_dims = 2, name = a.blk.9.pos_bias_u, tensor_size=2048, offset=61585408, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[628]: n_dims = 2, name = a.blk.9.pos_bias_v, tensor_size=2048, offset=61587456, shape:[64, 8, 1, 1], type = f32
clip_model_loader: tensor[629]: n_dims = 3, name = a.conv1d.0.bias, tensor_size=1024, offset=61589504, shape:[1, 1, 256, 1], type = f32
clip_model_loader: tensor[630]: n_dims = 4, name = a.conv1d.0.weight, tensor_size=9216, offset=61590528, shape:[3, 3, 1, 256], type = f32
clip_model_loader: tensor[631]: n_dims = 3, name = a.conv1d.2.bias, tensor_size=1024, offset=61599744, shape:[1, 1, 256, 1], type = f32
clip_model_loader: tensor[632]: n_dims = 4, name = a.conv1d.2.weight, tensor_size=9216, offset=61600768, shape:[3, 3, 1, 256], type = f32
clip_model_loader: tensor[633]: n_dims = 3, name = a.conv1d.3.bias, tensor_size=1024, offset=61609984, shape:[1, 1, 256, 1], type = f32
clip_model_loader: tensor[634]: n_dims = 4, name = a.conv1d.3.weight, tensor_size=262144, offset=61611008, shape:[1, 1, 256, 256], type = f32
clip_model_loader: tensor[635]: n_dims = 3, name = a.conv1d.5.bias, tensor_size=1024, offset=61873152, shape:[1, 1, 256, 1], type = f32
clip_model_loader: tensor[636]: n_dims = 4, name = a.conv1d.5.weight, tensor_size=9216, offset=61874176, shape:[3, 3, 1, 256], type = f32
clip_model_loader: tensor[637]: n_dims = 3, name = a.conv1d.6.bias, tensor_size=1024, offset=61883392, shape:[1, 1, 256, 1], type = f32
clip_model_loader: tensor[638]: n_dims = 4, name = a.conv1d.6.weight, tensor_size=262144, offset=61884416, shape:[1, 1, 256, 256], type = f32
clip_model_loader: tensor[639]: n_dims = 2, name = a.embd_to_logits.weight, tensor_size=18883584, offset=62146560, shape:[2048, 16392, 1, 1], type = q4_0
clip_model_loader: tensor[640]: n_dims = 2, name = a.position_embd.weight, tensor_size=134283264, offset=81030144, shape:[2048, 16392, 1, 1], type = f32
clip_model_loader: tensor[641]: n_dims = 1, name = a.position_embd_norm.weight, tensor_size=8192, offset=215313408, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[642]: n_dims = 1, name = a.pre_encode.out.bias, tensor_size=2048, offset=215321600, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[643]: n_dims = 2, name = a.pre_encode.out.weight, tensor_size=1179648, offset=215323648, shape:[4096, 512, 1, 1], type = q4_0
clip_model_loader: tensor[644]: n_dims = 1, name = mm.a.mlp.0.bias, tensor_size=2048, offset=216503296, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[645]: n_dims = 1, name = mm.a.mlp.0.weight, tensor_size=2048, offset=216505344, shape:[512, 1, 1, 1], type = f32
clip_model_loader: tensor[646]: n_dims = 1, name = mm.a.mlp.1.bias, tensor_size=8192, offset=216507392, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[647]: n_dims = 2, name = mm.a.mlp.1.weight, tensor_size=589824, offset=216515584, shape:[512, 2048, 1, 1], type = q4_0
clip_model_loader: tensor[648]: n_dims = 1, name = mm.a.mlp.3.bias, tensor_size=8192, offset=217105408, shape:[2048, 1, 1, 1], type = f32
clip_model_loader: tensor[649]: n_dims = 2, name = mm.a.mlp.3.weight, tensor_size=2359296, offset=217113600, shape:[2048, 2048, 1, 1], type = q4_0
clip_ctx: CLIP using CUDA0 backend
load_hparams: projector:          lfm2a
load_hparams: n_embd:             512
load_hparams: n_head:             8
load_hparams: n_ff:               512
load_hparams: n_layer:            17
load_hparams: ffn_op:             gelu_quick
load_hparams: projection_dim:     2048

--- audio hparams ---
load_hparams: n_mel_bins:         128
load_hparams: proj_stack_factor:  0
load_hparams: audio_chunk_len:    1
load_hparams: audio_sample_rate:  16000
load_hparams: audio_n_fft:        512
load_hparams: audio_window_len:   400
load_hparams: audio_hop_len:      160

load_hparams: model size:         209.31 MiB
load_hparams: metadata size:      0.23 MiB
load_tensors: loaded 648 tensors from /opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/mmproj-LFM2.5-Audio-1.5B-Q4_0.gguf
warmup: warmup with audio size = 3000
alloc_compute_meta:      CUDA0 compute buffer size =   195.19 MiB
alloc_compute_meta:        CPU compute buffer size =     2.93 MiB
alloc_compute_meta: graph splits = 35, nodes = 1547
warmup: flash attention is enabled
warmup: *****************************************************************
warmup: WARNING: the CLIP graph uses unsupported operators by the backend
warmup:          the performance will be suboptimal
warmup:          list of unsupported ops (backend=CUDA0):
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup:            UNARY: type = f32, ne = [512 375 1 1]
warmup: flash attention is enabled
warmup: please report this on github as an issue
warmup: ref: https://github.com/ggml-org/llama.cpp/pull/16837#issuecomment-3461676118
warmup: *****************************************************************
init_audio: audio input is in experimental stage and may have reduced quality:
    https://github.com/ggml-org/llama.cpp/discussions/13759
audio_decoder_ggml_ctx: using CUDA0 backend
audio_decoder_ggml_ctx: using GPU+CPU backend
load_gguf: Loaded 85 tensors from /opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/vocoder-LFM2.5-Audio-1.5B-Q4_0.gguf
common_chat_params_init_lfm2: Using content relying on the template
add_text: <|im_start|>system
Perform ASR.<|im_end|>
<|im_start|>user

audio_tokens->n_tokens = 32
add_text: <|im_end|>
<|im_start|>assistant

encoding audio slice...
/opt/usbhd/SRC/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:96: CUDA error
CUDA error: out of memory
  current device: 0, in function alloc at /opt/usbhd/SRC/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:475
  cuMemAddressReserve(&pool_addr, CUDA_POOL_VMM_MAX_SIZE, 0, 0, 0)
[New LWP 1544]
[New LWP 1545]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000ffffa07d9940 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x0000ffffa07d9940 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x0000aaaac1a7034c in ggml_abort ()
#2  0x0000aaaac1cc1740 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) ()
#3  0x0000aaaac1cd774c in ggml_cuda_pool_vmm::alloc(unsigned long, unsigned long*) ()
#4  0x0000aaaac1cd1318 in ggml_cuda_op_mul_mat_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*) ()
#5  0x0000aaaac1ccce04 in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, CUstream_st*), void (*)(float const*, int const*, void*, ggml_type, long, long, long, long, long, long, long, long, CUstream_st*)) ()
#6  0x0000aaaac1cd05a8 in ggml_cuda_compute_forward(ggml_backend_cuda_context&, ggml_tensor*) ()
#7  0x0000aaaac1cd41b8 in ggml_cuda_graph_evaluate_and_capture(ggml_backend_cuda_context*, ggml_cgraph*, bool, bool) ()
#8  0x0000aaaac1cd5bf4 in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) ()
#9  0x0000aaaac1a82fd8 in ggml_backend_sched_compute_splits(ggml_backend_sched*) [clone .lto_priv.0] ()
#10 0x0000aaaac1a8a310 in ggml_backend_sched_graph_compute ()
#11 0x0000aaaac1b3ab10 in clip_image_batch_encode(clip_ctx*, int, clip_image_f32_batch const*, float*) ()
#12 0x0000aaaac1acd020 in mtmd_encode_chunk ()
#13 0x0000aaaac1b33d90 in mtmd_helper_eval_chunk_single ()
#14 0x0000aaaac1865714 in liquid::audio::Runner::RunnerImpl::eval_messages(std::vector<common_chat_msg, std::allocator<common_chat_msg> > const&, bool) ()
#15 0x0000aaaac1865d1c in liquid::audio::Runner::RunnerImpl::generate(std::vector<liquid::audio::Runner::Message, std::allocator<liquid::audio::Runner::Message> > const&, int, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&, std::function<void (std::vector<float, std::allocator<float> > const&)> const&) ()
#16 0x0000aaaac1514194 in main ()
[Inferior 1 (process 1531) detached]
Aborted

@tdakhran
Contributor Author

tdakhran commented Jan 7, 2026

@elfarolab, I was able to reproduce the issue on Orin Nano and will look into it.

@tdakhran
Contributor Author

tdakhran commented Jan 7, 2026

@elfarolab, a temporary solution is to build without VMM (`cmake -DGGML_CUDA_NO_VMM=ON`) until I figure out how to reuse the CUDA context.
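For example, a build could look like the following (an illustrative invocation, not taken from this thread; only `-DGGML_CUDA_NO_VMM=ON` is the flag under discussion, `GGML_CUDA=ON` and the build directory are just the usual llama.cpp defaults):

cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_NO_VMM=ON
cmake --build build --config Release -j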

@elfarolab

@elfarolab, a temporary solution is to build without VMM (`cmake -DGGML_CUDA_NO_VMM=ON`) until I figure out how to reuse the CUDA context.

OK, rebuilding now with `-DGGML_CUDA_NO_VMM=ON`.

@tdakhran
Contributor Author

tdakhran commented Jan 7, 2026

The alternative is to build with VMM but reduce the VMM pool size to 16 GB with:

--- a/ggml/src/ggml-cuda/ggml-cuda.cu
+++ b/ggml/src/ggml-cuda/ggml-cuda.cu
@@ -418,7 +418,7 @@ struct ggml_cuda_pool_leg : public ggml_cuda_pool {
 // pool with virtual memory
 #if defined(GGML_USE_VMM)
 struct ggml_cuda_pool_vmm : public ggml_cuda_pool {
-    static const size_t CUDA_POOL_VMM_MAX_SIZE = 1ull << 35; // 32 GB
+    static const size_t CUDA_POOL_VMM_MAX_SIZE = 1ull << 34; // 16 GB

     int device;
     CUdeviceptr pool_addr = 0;

@elfarolab

@tdakhran

I found this discussion about VMM on Orin:

https://forums.developer.nvidia.com/t/high-execution-latency-of-cuda-vmm-api-on-jetson-agx-orin/325025

Memory is unified on Orin, on all Jetson-series devices, and on DGX Spark.
I don't know whether this architecture affects how the VMM API should be called.
Can you shed any light on this?

Thank you!

@tdakhran
Contributor Author

tdakhran commented Jan 7, 2026

Orin Nano reports

Max successful VA reservation: 64 GB

lfm2.5-audio creates 4 CUDA contexts and 4 CUDA pools. With `CUDA_POOL_VMM_MAX_SIZE` = 16 GB they all fit; with `CUDA_POOL_VMM_MAX_SIZE` = 32 GB they don't, since 4 × 32 GB = 128 GB of reserved address space is more than the Orin can provide, while 4 × 16 GB = 64 GB fits.

For 4070

Device: NVIDIA GeForce RTX 4070 Laptop GPU

Testing 4 allocations of 32 GB each:

Pool 1: OK (addr: 0x1600000000)
Pool 2: OK (addr: 0x1e00000000)
Pool 3: OK (addr: 0x2600000000)
Pool 4: OK (addr: 0x2e00000000)

=== Summary ===
Successful pools: 4 / 4
Total VA reserved: 128 GB

For Orin

Device: Orin

Testing 4 allocations of 32 GB each:

Pool 1: OK (addr: 0x3dc400000)
Pool 2: OK (addr: 0xbdc400000)
Pool 3: OK (addr: 0x13dc400000)
Pool 4: FAIL (CUDA_ERROR_OUT_OF_MEMORY)
  Trying smaller sizes...
  16 GB: OK

=== Summary ===
Successful pools: 3 / 4
Total VA reserved: 96 GB
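For anyone who wants to reproduce this on their own hardware, a minimal sketch of such a probe against the CUDA driver API might look like the following (assumptions: driver API available and linked with -lcuda; this is an illustration, not the exact test used to produce the numbers above):

// va_probe.cpp - probe how many 32 GB virtual-address reservations the driver allows
#include <cuda.h>
#include <cstdio>
#include <vector>

int main() {
    if (cuInit(0) != CUDA_SUCCESS) { fprintf(stderr, "cuInit failed\n"); return 1; }
    CUdevice dev;
    CUcontext ctx;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    const size_t pool_size = 1ull << 35; // 32 GB, same as CUDA_POOL_VMM_MAX_SIZE
    const int    n_pools   = 4;          // lfm2.5-audio creates 4 CUDA pools
    std::vector<CUdeviceptr> reserved;

    for (int i = 0; i < n_pools; ++i) {
        CUdeviceptr addr = 0;
        if (cuMemAddressReserve(&addr, pool_size, 0, 0, 0) == CUDA_SUCCESS) {
            printf("Pool %d: OK (addr: %p)\n", i + 1, (void *) addr);
            reserved.push_back(addr);
        } else {
            printf("Pool %d: FAIL, trying 16 GB instead\n", i + 1);
            if (cuMemAddressReserve(&addr, pool_size / 2, 0, 0, 0) == CUDA_SUCCESS) {
                printf("  16 GB: OK\n");
                cuMemAddressFree(addr, pool_size / 2);
            }
        }
    }

    printf("Successful 32 GB pools: %zu / %d\n", reserved.size(), n_pools);

    for (CUdeviceptr addr : reserved) {
        cuMemAddressFree(addr, pool_size); // release the address-space reservations
    }
    cuCtxDestroy(ctx);
    return 0;
}

Compile with something like g++ va_probe.cpp -o va_probe -lcuda and run it once per device of interest.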

@elfarolab

It works with `-DGGML_CUDA_NO_VMM=ON`:

init_audio: audio input is in experimental stage and may have reduced quality:
    https://github.com/ggml-org/llama.cpp/discussions/13759
audio_decoder_ggml_ctx: using CUDA0 backend
audio_decoder_ggml_ctx: using GPU+CPU backend
load_gguf: Loaded 85 tensors from /opt/usbhd/models/LFM2.5-Audio-1.5B-GGUF/vocoder-LFM2.5-Audio-1.5B-Q4_0.gguf
common_chat_params_init_lfm2: Using content relying on the template
add_text: <|im_start|>system
Perform ASR.<|im_end|>
<|im_start|>user

audio_tokens->n_tokens = 32
add_text: <|im_end|>
<|im_start|>assistant

encoding audio slice...
audio slice encoded in 112 ms
decoding audio batch 1/1, n_tokens_batch = 32
audio decoded (batch 1/1) in 4 ms

The birch canoe slid on the smooth planks.

llama_perf_context_print:        load time =   14571.35 ms
llama_perf_context_print: prompt eval time =     222.83 ms /    51 tokens (    4.37 ms per token,   228.87 tokens per second)
llama_perf_context_print:        eval time =      89.08 ms /    12 runs   (    7.42 ms per token,   134.70 tokens per second)
llama_perf_context_print:       total time =     351.77 ms /    63 tokens
llama_perf_context_print:    graphs reused =         11
audio samples per second:        nan
text  tokens  per second:      144.6

=== GENERATED TEXT ===
The birch canoe slid on the smooth planks.

@elfarolab

elfarolab commented Jan 7, 2026

lfm2.5-audio creates 4 CUDA contexts and 4 CUDA pools. With `CUDA_POOL_VMM_MAX_SIZE` = 16 GB they all fit; with `CUDA_POOL_VMM_MAX_SIZE` = 32 GB they don't, since 4 × 32 GB = 128 GB of reserved address space is more than the Orin can provide, while 4 × 16 GB = 64 GB fits.

Do you suggest building with `-DGGML_CUDA_NO_VMM=ON`, or rather reducing the VMM pool size with `CUDA_POOL_VMM_MAX_SIZE` = 16 GB?

@tdakhran
Contributor Author

tdakhran commented Jan 7, 2026

Do you suggest building with `-DGGML_CUDA_NO_VMM=ON`, or rather reducing the VMM pool size with `CUDA_POOL_VMM_MAX_SIZE` = 16 GB?

I suggest trying both and picking the option that runs faster.

@elfarolab

elfarolab commented Jan 7, 2026

@tdakhran

I can confirm that ASR also works as expected with llama-liquid-audio-server, but only in streaming mode.
Testing with an 11-second WAV file, the generated text was correct.
In the future I would like to use the llama-server router mode.

Do you plan to integrate your changes into llama-server once everything is working?

Thank you so much.

@tdakhran
Contributor Author

tdakhran commented Jan 7, 2026

@elfarolab, thanks for testing!

I can confirm that ASR also works as expected with llama-liquid-audio-server

Note that ASR works with the existing llama-cli from the main branch.

Do you plan to integrate your changes into llama-server once everything is working?

We are working on it.
