@milpster

Qwen3.5-397B-IQ4_KSS

8k context, q8_0 KV cache:

system_info: n_threads = 1 / 128 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
perplexity_v2: tokenizing the input ..
perplexity_v2: have 772507 tokens. Calculation chunk = 8448
perplexity_v2: calculating perplexity over 8 chunks, batch_size=2048
perplexity_v2: 15.52 seconds per pass - ETA 2.07 minutes
[1]4.6632,[2]5.2037,[3]5.1015,[4]5.0195,[5]5.1195,[6]5.2209,[7]5.2348,[8]5.2971,

llama_print_timings:        load time =   56825.87 ms
ll…
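
As a quick sanity check on the log above, here is a minimal Python sketch that parses the bracketed line and re-derives the ETA. It assumes each `[i]value` pair is the running perplexity estimate after chunk `i`, and that the ETA is simply seconds per pass times the number of chunks; the helper name `parse_running_ppl` is made up for illustration and is not part of any tool.

```python
import re

# Line copied verbatim from the log output above.
line = "[1]4.6632,[2]5.2037,[3]5.1015,[4]5.0195,[5]5.1195,[6]5.2209,[7]5.2348,[8]5.2971,"

def parse_running_ppl(text):
    """Return {chunk_index: value} for every '[i]value' pair in text.

    Hypothetical helper; assumes the values are running estimates.
    """
    return {int(i): float(v) for i, v in re.findall(r"\[(\d+)\]([0-9.]+)", text)}

estimates = parse_running_ppl(line)
final_chunk = max(estimates)          # last chunk index seen in the line
final_ppl = estimates[final_chunk]    # value reported after the last chunk

# ETA check using the figures printed in the log:
# 15.52 seconds per pass over 8 chunks, converted to minutes.
seconds_per_pass = 15.52
n_chunks = 8
eta_minutes = seconds_per_pass * n_chunks / 60

print(f"chunks parsed: {len(estimates)}")
print(f"value after chunk {final_chunk}: {final_ppl}")
print(f"ETA: {eta_minutes:.2f} minutes")
```

Under those assumptions, the number to compare between runs is the value after the last chunk, and the recomputed ETA matches the 2.07 minutes in the log.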

Replies: 6 comments 6 replies

5 replies
@Dampfinchen
@milpster
@ikawrakow
@milpster
@magikRUKKOLA
1 reply
@magikRUKKOLA
Answer selected by milpster
Category
Q&A
4 participants