[Bug] LTX2.3 GGML assert in CUDA #1603

@askmyteapot

Description

Git commit

latest in KoboldCPP branch Concedo_experimental

Operating System & Version

Windows 11

GGML backends

CUDA

Command-line arguments used

n/a

Steps to reproduce

ggml-org/llama.cpp#24072

What you expected to happen

Linking the llama.cpp issue above, as I'm not sure whether this is truly a bug in GGML or here.

What actually happened

see above

Logs / error messages / stack trace

C:\koboldcpp>koboldcpp.py


Welcome to KoboldCpp - Version 1.115
For command line arguments, please run --help in the terminal.
Note: The GUI mode is not accessible to screen readers.


Auto Selected CUDA Backend (flag=0)

Loading Chat Completions Adapter: C:\koboldcpp\kcpp_adapters\AutoGuess.json
Chat Completions Adapter Loaded
gemma-3-12b-it-Q4_K_S.gguf already exists, using existing file.
Auto Set Threads: 15
System: Windows 10.0.26200 AMD64 AMD64 Family 25 Model 33 Stepping 2, AuthenticAMD
Detected Available GPU Memory: 24576 MB
Detected Available RAM: 55005 MB
Initializing dynamic library: koboldcpp_cublas.dll

Namespace(model=[], model_param=None, port=5001, port_param=5001, host='', launch=False, config=None, threads=15, usecuda=['normal', '1'], usevulkan=None, usecpu=False, contextsize=8192, gpulayers=-1, tensor_split=None, autofit=False, version=False, analyze='', maingpu=-1, batchsize=512, parallelrequests=1, blasthreads=None, splitmode='layer', nommq=False, lora=None, loramult=1.0, noshift=False, nofastforward=False, noswa=False, swapadding=0, smartcache=0, ropeconfig=[0.0, 10000.0], overridenativecontext=0, usemmap=False, usemlock=False, noavx2=False, failsafe=False, debugmode=1, onready='', benchmark=None, prompt='', cli=False, genlimit=0, multiuser=10, multiplayer=False, websearch=False, remotetunnel=False, highpriority=False, foreground=False, preloadstory=None, savedatafile=None, quiet=False, ssl=None, nocertify=False, mmproj=None, mmprojcpu=False, visionmaxres=1024, visionmintokens=-1, visionmaxtokens=-1, draftmodel=None, draftamount=8, draftgpulayers=999, draftgpusplit=None, password=None, ratelimit=0, ignoremissing=False, chatcompletionsadapter='AutoGuess', jinja=False, jinja_tools=False, jinja_kwargs='', jinjatemplate='', jinjathink='default', noflashattention=False, lowvram=False, quantkv='f16', smartcontext=False, unpack='', exportconfig='', exporttemplate='', nomodel=False, moeexperts=-1, moecpu=0, defaultgenamt=1024, nobostoken=False, enableguidance=False, maxrequestsize=32, overridekv=None, overridetensors=None, showgui=False, skiplauncher=False, singleinstance=False, nopipelineparallel=False, gendefaults='', gendefaultsoverwrite=False, mcpfile=None, device='', downloaddir='', autofitpadding=1024, hordemodelname='', hordeworkername='', hordekey='', hordemaxctx=0, hordegenlen=0, sdmodel='C:/LTX2_3/ltx-2.3-22b-dev-UD-Q5_K_S.gguf', sdthreads=15, sdclamped=0, sdclampedsoft=0, sdt5xxl='', sdclip1='gemma-3-12b-it-Q4_K_S.gguf', sdclip2='C:/LTX2_3/ltx-2.3-22b-dev_embeddings_connectors.safetensors', sdphotomaker='', sdupscaler='', sdflashattention=True, 
sdoffloadcpu=True, sdvaedevice=-1, sdclipdevice=-1, sdconvdirect='off', sdvae='C:/LTX2_3/ltx-2.3-22b-dev_video_vae.safetensors', sdvaeauto=False, sdaudiovae='C:/LTX2_3/ltx-2.3-22b-dev_audio_vae.safetensors', sdquant=0, sdlora=[], sdloramult=[1.0], sdtiledvae=512, sdmaingpu=-1, sdvramlimit=0, whispermodel='', ttsmodel='', ttswavtokenizer='', ttsgpu=False, ttsmaxlen=4096, ttsthreads=15, ttsdir='', musicllm='', musicembeddings='', musicdiffusion='', musicvae='', musiclowvram=False, embeddingsmodel='', embeddingsmaxctx=0, embeddingsgpu=False, rpcmode='disabled', rpcport=5551, rpchost='0.0.0.0', rpcdevice='', rpctargets='', admin=False, adminpassword='', admindir='', adminunloadtimeout=0, routermode=False, reqtimeout=600, autoswapmode=False, baseconfig='', hordeconfig=None, sdconfig=None, noblas=False, nommap=False, pipelineparallel=False, sdnotile=False, sdvaecpu=False, sdclipgpu=False, forceversion=False, sdgendefaults=False, flashattention=False, useswa=False, testmemory=False, proxy_port=None)

ImageGen Init - Load Model: C:\LTX2_3\ltx-2.3-22b-dev-UD-Q5_K_S.gguf
With Custom VAE: C:\LTX2_3\ltx-2.3-22b-dev_video_vae.safetensors
With Audio VAE: C:\LTX2_3\ltx-2.3-22b-dev_audio_vae.safetensors
With Custom Clip-1 Model: C:\koboldcpp\gemma-3-12b-it-Q4_K_S.gguf
With Custom Clip-2 Model: C:\LTX2_3\ltx-2.3-22b-dev_embeddings_connectors.safetensors
Flash Attention is enabled
Offloading weights to system RAM

Swap to Diffusion Model Path:C:\LTX2_3\ltx-2.3-22b-dev-UD-Q5_K_S.gguf
Setting sd backend list to "", params backend list to "CPU"
model_path:
clip_l_path: C:\koboldcpp\gemma-3-12b-it-Q4_K_S.gguf
clip_g_path: C:\LTX2_3\ltx-2.3-22b-dev_embeddings_connectors.safetensors
clip_vision_path:
t5xxl_path:
llm_path:
llm_vision_path:
diffusion_model_path: C:\LTX2_3\ltx-2.3-22b-dev-UD-Q5_K_S.gguf
high_noise_diffusion_model_path:
embeddings_connectors_path:
vae_path: C:\LTX2_3\ltx-2.3-22b-dev_video_vae.safetensors
audio_vae_path: C:\LTX2_3\ltx-2.3-22b-dev_audio_vae.safetensors
taesd_path:
control_net_path:
photo_maker_path:
tensor_type_rules:
vae_decode_only: false
free_params_immediately: false
n_threads: 15
wtype: NONE
rng_type: cuda
sampler_rng_type: NONE
prediction: NONE
offload_params_to_cpu: true
max_vram: 0.000
backend:
params_backend: CPU
keep_clip_on_cpu: false
keep_control_net_on_cpu: false
keep_vae_on_cpu: false
flash_attn: true
diffusion_flash_attn: true
circular_x: false
circular_y: false
chroma_use_dit_mask: false
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1

ggml_cuda_init: found 1 CUDA devices (Total VRAM: 48934 MiB):

Device 0: NVIDIA RTX PRO 5000 Blackwell, compute capability 12.0, VMM: yes, VRAM: 48934 MiB

ggml was not compiled with any CUDA arch warning
Found 2 backend devices:
#0: CUDA0
#1: CPU
Initializing backend: CUDA0
Initializing backend: CPU
loading diffusion model from 'C:\LTX2_3\ltx-2.3-22b-dev-UD-Q5_K_S.gguf'
load C:\LTX2_3\ltx-2.3-22b-dev-UD-Q5_K_S.gguf using gguf format
init from 'C:\LTX2_3\ltx-2.3-22b-dev-UD-Q5_K_S.gguf'
loading llm from 'C:\koboldcpp\gemma-3-12b-it-Q4_K_S.gguf'
load C:\koboldcpp\gemma-3-12b-it-Q4_K_S.gguf using gguf format
init from 'C:\koboldcpp\gemma-3-12b-it-Q4_K_S.gguf'
loading vae from 'C:\LTX2_3\ltx-2.3-22b-dev_video_vae.safetensors'
load C:\LTX2_3\ltx-2.3-22b-dev_video_vae.safetensors using safetensors format
init from 'C:\LTX2_3\ltx-2.3-22b-dev_video_vae.safetensors', prefix = 'vae.'
loading embeddings connectors from 'C:\LTX2_3\ltx-2.3-22b-dev_embeddings_connectors.safetensors'
load C:\LTX2_3\ltx-2.3-22b-dev_embeddings_connectors.safetensors using safetensors format
init from 'C:\LTX2_3\ltx-2.3-22b-dev_embeddings_connectors.safetensors', prefix = ''
loading LTX audio VAE from 'C:\LTX2_3\ltx-2.3-22b-dev_audio_vae.safetensors'
load C:\LTX2_3\ltx-2.3-22b-dev_audio_vae.safetensors using safetensors format
init from 'C:\LTX2_3\ltx-2.3-22b-dev_audio_vae.safetensors', prefix = ''
Version: LTXAV
Weight type stat: f32: 2961 | q8_0: 311 | q4_K: 326 | q5_K: 1240 | q6_K: 204 | bf16: 1531
Conditioner weight type stat: f32: 289 | q4_K: 326 | q5_K: 10 | q6_K: 1
Diffusion model weight type stat: f32: 2672 | q8_0: 311 | q5_K: 1230 | q6_K: 203 | bf16: 28
VAE weight type stat: bf16: 272
ggml tensor size = 400 bytes
Try read vocab from C:\koboldcpp/embd_res/gemma2_vocab_json.embd

Try read vocab from C:\koboldcpp/embd_res/gemma2_merges_utf8_c_str.embd

vocab size: 262144
merges size 514905
llm: num_layers = 48, vocab_size = 262208, hidden_size = 3840, intermediate_size = 15360
using VAE for encoding / decoding
Using flash attention
Using flash attention in the diffusion model
loading weights
gemma3_12b params backend buffer size = 9661.05 MB(RAM) (626 tensors)
ltxav_text_projection params backend buffer size = 2205.02 MB(RAM) (4 tensors)
ltxav params backend buffer size = 15896.60 MB(RAM) (4444 tensors)
ltx_video_vae params backend buffer size = 1385.02 MB(RAM) (170 tensors)
ltx_audio_vae params backend buffer size = 339.88 MB(RAM) (1285 tensors)
NOT using mmap for 'C:\LTX2_3\ltx-2.3-22b-dev-UD-Q5_K_S.gguf' (mmap disabled by caller)
NOT using mmap for 'C:\koboldcpp\gemma-3-12b-it-Q4_K_S.gguf' (mmap disabled by caller)
NOT using mmap for 'C:\LTX2_3\ltx-2.3-22b-dev_video_vae.safetensors' (mmap disabled by caller)
NOT using mmap for 'C:\LTX2_3\ltx-2.3-22b-dev_embeddings_connectors.safetensors' (mmap disabled by caller)
NOT using mmap for 'C:\LTX2_3\ltx-2.3-22b-dev_audio_vae.safetensors' (mmap disabled by caller)
model files processing completed in 0.01s
using 15 threads for model loading
|=================================> | 4444/6573 - 2.98GB/s

|======================================> | 5070/6573 - 2.73GB/s

|=======================================> | 5240/6573 - 2.47GB/s

|=======================================> | 5244/6573 - 2.09GB/s

|==================================================| 6573/6573 - 2.05GB/s

loading tensors completed, taking 12.20s (read: 7.40s, memcpy: 0.00s, convert: 0.63s, copy_to_backend: 0.00s)
finished loaded file
total params memory size = 29487.56MB (VRAM 0.00MB, RAM 29487.56MB): text_encoders 11866.07MB(RAM), diffusion_model 15896.60MB(RAM), vae 1724.89MB(RAM), controlnet 0.00MB(N/A), pmid 0.00MB(N/A)
running in Flux FLOW mode
Setting to Video Generation Mode!
Load Image Model OK: True
Embedded KoboldAI Lite loaded.
Embedded API docs loaded.
Embedded SDUI loaded.
Llama.cpp UI loaded.

Active Modules: ImageGeneration
Inactive Modules: TextGeneration VoiceRecognition MultimodalVision MultimodalAudio NetworkMultiplayer ApiKeyPassword WebSearchProxy TextToSpeech VectorEmbeddings AdminControl MCPBridge MusicGen RouterMode
Enabled APIs: KoboldCppApi A1111ForgeApi ComfyUiApi
Note: For third party Ollama API Emulation, you should set the port to 11434.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
Starting llama.cpp secondary WebUI at http://localhost:5001/lcpp/
StableUI is available at http://localhost:5001/sdui/

Please connect to custom endpoint at http://localhost:5001
::1 - - [03/Jun/2026 08:43:45] "GET /sdapi/v1/sd-models HTTP/1.1" 200 -

Input: {"width": 1536, "height": 1024, "denoising_strength": 0.7, "frames": 129, "fps": 24, "enable_hr": false, "send_as_refimg": true, "reverse_refimg": false, "seed": 666, "cfg_scale": 5, "steps": 20, "sampler_name": "euler", "prompt": "a kobold roaching a marshmallow over a campfire", "negative_prompt": "static characters, camera pan", "init_images": [snipped], "inpainting_mask_invert": null, "inpainting_fill": null, "video_output_type": 2, "rep_pen": 1.0, "memory": "", "stop_sequence": []}

KCPP SD: Requested dimensions 1536x1024 changed to 1248x832

[08:43:45] Generating Image (20 steps)

resize input image from 1536x1024 to 1248x832

ImageGen References: RefImg=0 Wan=1 Photomaker=0

VID PROMPT:a kobold roaching a marshmallow over a campfire
NPROMPT:static characters, camera pan
CLPSKP:-1
SIZE:1248x832
STEP:20
SEED:666
STRENGTH:0.7
FRAMES:129
CTRL_FRM:0
INIT_IMGS:1

get_sigmas with LTX2 scheduler
LTX2 scheduler: tokens=17238, shift=6.7558, stretch=1, terminal=0.1000
sampling using Euler method
IMG2VID
VAE Tile size: 10x10
num tiles : 6, 4
optimal overlap : 0.420000, 0.466667 (targeting 0.500000)
processing 24 tiles
ltx_video_vae compute buffer size: 113.67 MB(VRAM)
|==================================================| 24/24 - 5.03it/s

computing vae encode graph completed, taking 4.84s
encode_first_stage completed, taking 4842 ms
split prompt "a kobold roaching a marshmallow over a campfire" to tokens ["a", "▁kob", "old", "▁ro", "aching", "▁a", "▁marshmallow", "▁over", "▁a", "▁campfire", ]
gemma3_12b compute buffer size: 2153.01 MB(VRAM)
gemma3_12b offload params (9661.10 MB, 626 tensors) to runtime backend (CUDA0), taking 1.30s
ltxav_text_projection compute buffer size: 15.96 MB(VRAM)
ltxav_text_projection offload params (2205.02 MB, 4 tensors) to runtime backend (CUDA0), taking 0.23s
computing LTXAV condition graph completed, taking 2432 ms
split prompt "static characters, camera pan" to tokens ["static", "▁characters", ",", "▁camera", "▁pan", ]
gemma3_12b compute buffer size: 2153.01 MB(VRAM)
gemma3_12b offload params (9661.10 MB, 626 tensors) to runtime backend (CUDA0), taking 1.09s
ltxav_text_projection compute buffer size: 8.71 MB(VRAM)
ltxav_text_projection offload params (2205.02 MB, 4 tensors) to runtime backend (CUDA0), taking 0.20s
computing LTXAV condition graph completed, taking 1875 ms
get_learned_condition completed, taking 4.31s
generate_video 1248x832x129
sample 39x26x17
ltxav compute buffer size: 11668.06 MB(VRAM)
ltxav offload params (15896.60 MB, 4444 tensors) to runtime backend (CUDA0), taking 2.33s
C:\koboldcpp\ggml\src\ggml-cuda\cpy.cu:229: GGML_ASSERT(grid_z < USHRT_MAX) failed
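
Note on the assert: the message names `grid_z` and `USHRT_MAX` (65535). CUDA caps the y and z dimensions of a launch grid at 65535 blocks, so this check presumably guards a kernel launch whose z dimension became too large for this request. The Python sketch below only illustrates that kind of bounds check. `check_grid` and the example values are hypothetical and are not taken from `cpy.cu`; which tensor shape actually produced the oversized `grid_z` is not visible in the log.

```python
USHRT_MAX = 65535        # value of C's USHRT_MAX on common platforms
MAX_GRID_X = 2**31 - 1   # CUDA limit for gridDim.x (compute capability 3.0 and later)


def check_grid(grid_x, grid_y, grid_z):
    """Return the names of grid dimensions that fail a strict '< limit' check.

    Hypothetical helper: it mirrors the form of the failing assertion
    (grid_z < USHRT_MAX), not the actual code in cpy.cu.
    """
    failures = []
    if not grid_x < MAX_GRID_X:
        failures.append("grid_x")
    if not grid_y < USHRT_MAX:
        failures.append("grid_y")
    if not grid_z < USHRT_MAX:
        failures.append("grid_z")
    return failures
```

For example, `check_grid(1, 1, 70000)` returns `["grid_z"]`, while `check_grid(1, 1, 65534)` returns an empty list. Note that the check is strict, so a z dimension of exactly 65535 would also fail.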

Additional context / environment details

No response
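
For triage, the shapes in the log are internally consistent: the request `generate_video 1248x832x129` becomes `sample 39x26x17`, and the scheduler reports `tokens=17238`, which equals 39 x 26 x 17. The sketch below reproduces that arithmetic. The compression factors (32 spatial, 8 temporal) and the function name `latent_shape` are assumptions inferred from the logged numbers, not read from the source code.

```python
def latent_shape(width, height, frames, spatial=32, temporal=8):
    """Map a pixel-space video size to a latent size.

    Assumed mapping: the factors 32 and 8 are inferred from the log
    (1248x832x129 -> 39x26x17), not taken from the implementation.
    """
    return width // spatial, height // spatial, (frames - 1) // temporal + 1


w, h, t = latent_shape(1248, 832, 129)
tokens = w * h * t
print((w, h, t), tokens)  # (39, 26, 17) 17238
```

Under these assumptions the diffusion model sees 17238 tokens for this request, so a reproduction with a smaller resolution or fewer frames may help show whether the assert depends on the token count.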

Labels

bug (Something isn't working)
