fixed typo by moritzbrantner · Pull Request #178 · ggml-org/llama.cpp

moritzbrantner · 2023-03-15T20:30:25Z

No description provided.

Add Zig port of project

…quency-penalty Document presence frequency penalty

* Try interleaving 8 rows for iq4_xs On Zen4, PP-512 goes up from ~260 t/s to 288 t/s for L3-8B. TG-128 reaches max. performance at 2 threads and is slightly higher than 4 interleaved rows (14.48 t/s vs 13.11 t/s @ 2 threads and 14/28 t/s @ 4 threads). * Try interleaving 8 iq4_xs rows It is also faster on AVX2. This is the NEON implementation. It is tiny bit faster than 4 interleaved rows (~0.5%). So, this looks like a winner given the Zen4/AVX2 improvement without associated NEON egression. * Cleanup * 8-rows interleaved q8_0 (AVX2) * 8-rows interleaved q8_0 (Zen4) * 8-rows interleaved q8_0 (Zen4) - slightly better PP-512 is now 284 t/s compared to 257 t/s for 4-rows interleaved. TG-128 reaches peak of 8.16 t/s at just 2 threads compared to 7.95 t/s @ 4 threads before. * 8-rows interleaved q8_0 (NEON) PP-512 is slightly better (138 t/s vs 132.5 t/s), TG-128 is about the same. * FA: repack Q8_0 to Q8_0_R8 * Remove special purpose mul_mat_q8_0_r4_q8_1_128 (Zen4) * FA: repack Q8_0 to Q8_0_R8 (NEON) Very slightly faster than the general purpose gemm, slightly slower than the D = 128 special case gemm mul_mat_q8_0_r4_q8_0_128. Still removing mul_mat_q8_0_r4_q8_0_128 as we simply don't have enough vector registers to hold 8 interleaved rows, so there is no point to have the special purpose implementation. --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

fixed typo

bb034e5

ggerganov merged commit 27944c4 into ggml-org:master Mar 15, 2023

gjmulder added the bug Something isn't working label Mar 15, 2023

blackhole89 pushed a commit that referenced this pull request Mar 15, 2023

fixed typo (#178)

23ac443

rooprob pushed a commit to rooprob/llama.cpp that referenced this pull request Aug 2, 2023

Merge pull request ggml-org#178 from cgbur/patch-1

ce05cc2

Add Zig port of project

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this pull request Dec 19, 2023

Merge pull request ggml-org#178 from Stonelinks/document-presence-fre…

3c96b43

…quency-penalty Document presence frequency penalty

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fixed typo#178

fixed typo#178
ggerganov merged 1 commit intoggml-org:masterfrom
moritzbrantner:master

moritzbrantner commented Mar 15, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

moritzbrantner commented Mar 15, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants