Replies: 2 comments 1 reply
You can see the graph splits using |
There's a relatively new (I don't know when it got added exactly) command |
Hello, hope you guys are fine.
I was wondering whether, when using -fit, there is some way to get the command line that a manual -ot setup would need to produce the same placement.
For example, I can load a model with this:
./llama-server -m '/models_llm_2tb/Kimi-K2.5-IQ3_S-00001-of-00010.gguf' -c 32768 --no-mmap -mg 0 -ub 2560 -b 2560
It works just fine with fit (since it is enabled by default), using about 250GB of VRAM and putting the rest in RAM.
But I also manually did it with:
With that I got almost the same weight placement when loading, but it took some hours to work out.
What I have in mind is something like an extra flag (--output-ot here is only a suggested name), i.e. using:
./llama-server -m '/models_llm_2tb/Kimi-K2.5-IQ3_S-00001-of-00010.gguf' -c 32768 --no-mmap -mg 0 -ub 2560 -b 2560 --output-ot
And then you would get something like:
-ot "blk.(0|1|2|3).ffn.=CUDA0,blk.(4|5|6|7).ffn.=CUDA1,blk.(8|9|10|11).ffn.=CUDA2,blk.(12|13|14|15).ffn.=CUDA3,blk.(16|17|18).ffn.=CUDA4,blk.(19|20|21).ffn.=CUDA5,blk.(22|23|24|25|26|27).ffn.=CUDA6,blk.(28|29|30|31|32|33).ffn.=CUDA7,blk.34.ffn_(norm|gate_inp|gate_shexp|down_shexp|up_shexp).weight=CUDA0,blk.34.ffn_down_exps.weight=CUDA0,blk.34.ffn_gate_exps.weight=CUDA6,blk.34.ffn_up_exps.weight=CUDA7,blk.35.ffn_gate_exps.weight=CUDA0,blk.35.ffn_up_exps.weight=CUDA0,blk.36.ffn.=CUDA7,blk.37.ffn_gate_exps.weight=CUDA6,blk.37.ffn_down_exps.weight=CUDA5,blk.37.ffn_up_exps.weight=CUDA2,blk.38.ffn_gate_exps.weight=CUDA3,blk.38.ffn_down_exps.weight=CUDA6,blk.38.ffn_up_exps.weight=CUDA1,blk.39.ffn_gate_exps.weight=CUDA4,exps=CPU"
So the idea is to run the first command with an extra flag that prints the -ot string, which could then be used manually and maybe adjusted by hand.
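To make the string format concrete, here is a minimal Python sketch of how such a value could be assembled from a per-device layer map. It is only an illustration and not part of llama.cpp: `build_ot`, its parameters, and the example layer split are all hypothetical, and it covers only the simple whole-layer `blk.(...).ffn.=DEVICE` rules, not the per-tensor ones.

```python
def build_ot(layers_per_device, tensor_pattern="ffn.", fallback="exps=CPU"):
    """Join per-device layer groups into one comma-separated -ot value.

    layers_per_device maps a device name (e.g. "CUDA0") to a list of
    block indices. All names here are illustrative assumptions.
    """
    rules = []
    for device, layers in layers_per_device.items():
        if not layers:
            continue  # nothing assigned to this device
        if len(layers) == 1:
            block = str(layers[0])
        else:
            # Several layers become a regex alternation: (0|1|2|3)
            block = "(" + "|".join(str(n) for n in layers) + ")"
        rules.append(f"blk.{block}.{tensor_pattern}={device}")
    if fallback:
        # Catch-all rule goes last, as in the string above
        rules.append(fallback)
    return ",".join(rules)


if __name__ == "__main__":
    ot = build_ot({"CUDA0": [0, 1, 2, 3], "CUDA1": [4, 5, 6, 7], "CUDA7": [36]})
    print(f'-ot "{ot}"')
```

A script like this only saves typing; deciding how many layers fit on each device is still the part that takes hours by hand, which is why having fit print its own result would help.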
Hope I explained myself.