
ggml : new Q4 and Q5 quantization formats + backward ops #154

Merged · merged 5 commits into master from new-qnt on May 14, 2023
Conversation

@ggerganov ggerganov (Owner) commented on May 14, 2023

ref #150

sync llama.cpp

  • bump GGML_QNT_VERSION -> 1 (see the version-check sketch after this list)
  • increase ggml object overhead size from 256 to 512 in examples
  • drop Q4_2 support
  • ggml_tensor.backend member (CPU / CUDA)
  • fix data race in multi-threaded ggml_diag_mask_inf() operator a483bb2
  • fix ggml_rope() when not inplace 788381e
  • fix ggml_rope() GPT-NeoX (hopefully) 788381e
  • some of the old ops are no longer implicitly inplace! Make sure to update your code where necessary by using the explicit inplace calls, otherwise some tensors will be copied unnecessarily (see the migration sketch after this list):
    • ggml_scale() -> ggml_scale_inplace()
    • ggml_diag_mask_inf() -> ggml_diag_mask_inf_inplace()
    • ggml_soft_max() -> ggml_soft_max_inplace()
    • ggml_rope() -> ggml_rope_inplace()
    • see 5839d9e
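
For reference, a minimal sketch of the migration in a typical llama.cpp-style attention block; the names `ctx0`, `kq`, `kq_scale` and `n_past` are illustrative, not taken from this PR:

```c
// before this sync these ops modified their input implicitly;
// the non-inplace variants now make a copy of the tensor

// old (implicitly inplace):
//   kq = ggml_scale        (ctx0, kq, kq_scale);
//   kq = ggml_diag_mask_inf(ctx0, kq, n_past);
//   kq = ggml_soft_max     (ctx0, kq);

// new (explicitly inplace, no extra copies):
kq = ggml_scale_inplace        (ctx0, kq, kq_scale);
kq = ggml_diag_mask_inf_inplace(ctx0, kq, n_past);
kq = ggml_soft_max_inplace     (ctx0, kq);
```

The same applies to `ggml_rope()` -> `ggml_rope_inplace()` wherever the result is meant to overwrite its input.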
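
Regarding the GGML_QNT_VERSION bump, here is a minimal sketch of a loader-side version check, assuming the convention used by the ggml examples of folding the quantization version into the stored ftype via `GGML_QNT_VERSION_FACTOR`; the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#include "ggml.h"

// reject model files written with a different quantization version
// (ftype is assumed to encode: base_ftype + qntvr * GGML_QNT_VERSION_FACTOR)
static bool model_qnt_version_ok(int32_t ftype) {
    const int32_t qntvr = ftype / GGML_QNT_VERSION_FACTOR;
    if (qntvr != GGML_QNT_VERSION) {
        fprintf(stderr, "unsupported quantization version %d (expected %d)\n",
                (int) qntvr, GGML_QNT_VERSION);
        return false;
    }
    return true;
}
```

Files produced before this change (quantization version 0, or using the dropped Q4_2 type) need to be re-quantized with the new Q4/Q5 formats.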

@ggerganov ggerganov merged commit 3ce3145 into master May 14, 2023
@ggerganov ggerganov deleted the new-qnt branch May 14, 2023 12:18
CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this pull request Dec 18, 2023
…gerganov#154) (ggerganov#294)

* Use F16 for memory_k and memory_v

* add command line switch to use f16 instead of f32 for memory k+v

---------

Co-authored-by: Ty Everett <ty@tyweb.us>