Name and Version
I use Ollama to run this model, but something is wrong and it shows the following:
llama_new_context_with_model: graph splits = 2
Launch params (1024, 1, 1) are larger than launch bounds (256) for kernel _ZL12rms_norm_f32ILi1024EEvPKfPfif please add launch_bounds to kernel define or use --gpu-max-threads-per-block recompile program !
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
No response
Problem description & steps to reproduce
I use Ollama to run this model, but something is wrong and it shows the following:
llama_new_context_with_model: graph splits = 2
Launch params (1024, 1, 1) are larger than launch bounds (256) for kernel _ZL12rms_norm_f32ILi1024EEvPKfPfif please add launch_bounds to kernel define or use --gpu-max-threads-per-block recompile program !
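For context, the fix the message itself suggests ("add launch_bounds to kernel define") can be sketched as follows. This is an illustrative HIP/CUDA fragment, not llama.cpp's actual source: a kernel templated on its block size can pass that size to `__launch_bounds__`, so the compiler's per-kernel thread limit matches the runtime launch configuration instead of falling back to the default (256 on some ROCm targets).

```cuda
// Hypothetical sketch, not the real llama.cpp kernel: the template
// parameter carries the block size, and __launch_bounds__ receives the
// same value so the compiled limit (e.g. 1024) matches the launch.
template <int block_size>
static __global__ void __launch_bounds__(block_size)
rms_norm_f32(const float * x, float * dst, const int ncols, const float eps) {
    // ... per-row RMS-norm body would go here ...
}

// Launched with the same block size the template was instantiated with:
// rms_norm_f32<1024><<<nrows, 1024, 0, stream>>>(x, dst, ncols, eps);
```

The alternative the message mentions is recompiling with `--gpu-max-threads-per-block` set to at least 1024, which raises the default per-kernel limit instead of annotating each kernel.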
First Bad Commit
No response
Relevant log output
No response