Merge upstream by pi6am · Pull Request #4 · pi6am/koboldcpp

pi6am · 2024-07-29T05:23:58Z

No description provided.

* Improvements for Windows with Snapdragon X
* Revert "Improvements for Windows with Snapdragon X"
  This reverts commit bf21397.
* Improvements for Windows with Snapdragon X
* WOA build clarifications
* Windows on ARM build clarifications
* cmake build for Windows clarifications
* Update docs/build.md
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Add llama 3.1 rope scaling factors to llama conversion and inference
  This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192.
* Update convert_hf_to_gguf.py
  Co-authored-by: compilade <git@compilade.net>
* address comments
* address comments
* Update src/llama.cpp
  Co-authored-by: compilade <git@compilade.net>
* Update convert_hf_to_gguf.py
  Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
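The rope factors mentioned in the commit message above can be illustrated with a short sketch. This follows the publicly described Llama 3.1 frequency-scaling scheme and is not necessarily the exact code from this commit. The function name `llama3_rope_factors` is hypothetical, and the default values (`base`, `factor`, `low_freq_factor`, `high_freq_factor`, `old_context_len`) are assumptions modelled on a typical Llama 3.1 configuration, not values taken from this pull request.

```python
import math


def llama3_rope_factors(head_dim, base=500000.0, factor=8.0,
                        low_freq_factor=1.0, high_freq_factor=4.0,
                        old_context_len=8192):
    """Return one scaling factor per rotary frequency (head_dim // 2 values).

    All defaults are assumptions for illustration only.
    """
    # Wavelength thresholds separating the three frequency bands.
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor

    factors = []
    for i in range(0, head_dim, 2):
        freq = 1.0 / (base ** (i / head_dim))
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency band: left unscaled.
            factors.append(1.0)
        elif wavelen > low_freq_wavelen:
            # Low-frequency band: scaled by the full factor.
            factors.append(float(factor))
        else:
            # Middle band: interpolate smoothly between the two regimes.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            factors.append(1.0 / ((1.0 - smooth) / factor + smooth))
    return factors
```

Calling `llama3_rope_factors(128)` yields 64 values, one per rotary frequency pair. In a conversion script, a list like this would be stored as a tensor alongside the model weights so that inference can hand it to the rope operation.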

…893)

This prevents invalid frees when destroying a partially initialized vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer when running out of device memory.

Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>

…ml/895)

* Add support for float16 tensors in 1d pooling operations
* Add support for float16 input tensors in 2d pooling operations
* code cleanup: remove unnecessary casting during srow ptr initialization

---------

Co-authored-by: vanaka11 <vanaka1189@gmail.com>

# Conflicts:
#	.devops/nix/apps.nix
#	.devops/tools.sh
#	Makefile
#	README.md
#	docs/backend/SYCL.md
#	docs/build.md
#	examples/CMakeLists.txt
#	ggml/include/ggml.h
#	src/llama-vocab.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-sampling.cpp

* Update doc for MUSA
* Add GGML_MUSA in Makefile
* Add GGML_MUSA in CMake
* CUDA => MUSA
* MUSA adds support for __vsubss4
* Fix CI build failure

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

MorganRO8 and others added 30 commits July 24, 2024 19:48

readme : update games list (ggml-org#8673)

68504f0

Added link to game I made that depends on llama

llama: use sliding window for phi3 (ggml-org#8627)

8a4bad5

* use sliding window for phi3
* fix typo, "data_swa" -> "data"
* [convert_hf_to_gguf.py] add phi3 sliding window

docs : Quantum -> Quantized (ggml-org#8666)

4b0eff3

* docfix: imatrix readme, quantum models -> quantized models.
* docfix: server readme: quantum models -> quantized models.

examples : remove finetune and train-text-from-scratch (ggml-org#8669)

be6d7c0

* examples : remove finetune and train-text-from-scratch
* fix build
* update help message
* fix small typo for export-lora

ggml : add and use ggml_cpu_has_llamafile() (ggml-org#8664)

eddcb52

[SYCL] fix multi-gpu issue on sycl (ggml-org#8554)

ed67bcb

---------

Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>

fix rocminfo error

9f2076b

tests : fix printfs (ggml-org#8068)

88954f7

llama : fix build + fix fabs compile warnings (ggml-org#8683)

4226a8d

ggml-ci

ggml: handle ggml_init failure to fix NULL pointer deref (ggml-org#8692)

49ce0ab

`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_on_alloc`. This fixes it by bailing out if no context is found.

examples : export-lora : fix issue with quantized base models (ggml-org#8687)

41cd47c

server : add Speech Recognition & Synthesis to UI (ggml-org#8679)

01aec4a

* server : add Speech Recognition & Synthesis to UI
* server : add Speech Recognition & Synthesis to UI (fixes)

llama : fix order of parameters (ggml-org#8706)

01245f5

usage of `aclrtGetMemInfo` is correct: https://www.hiascend.com/doc_center/source/zh/canncommercial/63RC2/inferapplicationdev/aclcppdevg/aclcppdevg_03_0103.html

Co-authored-by: Judd <foldl@boxvest.com>

refactor some fields

4531ab5

ggml : reduce hash table reset cost (ggml-org#8698)

2b1f616

* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string

no fast forward for empty prompt

729eb1e

cann: Fix Multi-NPU execution error (ggml-org#8710)

bfb4c74

* cann: fix multi-npu exec error
* cann: update comment for ggml_backend_cann_supports_buft

common : add --no-warmup option for main/llama-cli (ggml-org#8712)

9d03d08

This commit adds a --no-warmup option for llama-cli. The motivation for this is that it can be convenient to skip the warmup llama_decode call when debugging.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

llama : add function for model-based max number of graph nodes (ggml-org#8622)

92090ec

* llama : model-based max number of graph nodes
  ggml-ci
* llama : disable 405B max_nodes path due to lack of complaints
  ggml-ci

increased padding, it is still way too little but whatever

eaa7028

ggml : remove unnecessary UNUSED macro call (ggml/880)

c12b6e8

This commit removes an UNUSED macro call that is not needed as the variable n0 is used in the code and will not produce a warning.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885)

d2b851b

ggml : loop tiling optimizations for scalar path (ggml/898)

a05ca93

Apply a loop tiling technique to the generic path, which provides performance upside for ISAs with enough registers to take advantage of it. Also helps the compiler optimize this path.
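As a generic illustration of the loop tiling technique named above, here is a minimal sketch of a tiled matrix multiply. It is not the code from this commit: the function name `matmul_tiled` and the `tile` parameter are hypothetical, and Python is used only to show the loop structure. The register-level benefit the commit message describes would show up only in compiled code.

```python
def matmul_tiled(a, b, tile=4):
    """Multiply two non-empty row-major matrices (lists of lists) tile by tile.

    The three outer loops walk over tile origins; the three inner loops
    do the work inside one tile, so each small block of operands is
    reused several times before moving on.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Work restricted to one tile; min() handles ragged edges.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

The result is the same as an untiled triple loop; only the iteration order changes. In a compiled language, a tile size that fits the available registers lets the compiler keep the accumulators in registers across the innermost loop.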

sync : ggml

ae7985c

ggml-ci

ggml : add missing semicolon (#0)

345c8c0

ggml-ci

scripts : sync ggml-aarch64 sources

56f20aa

ggerganov and others added 7 commits July 27, 2024 18:08

scripts : sync vulkan-shaders (#0)

5e2727f

not working

01afb28

fix for older phi3 models without swa

0029e36

Revert "cu11 build threads"

edbdfbc

This reverts commit c3aa259907a77b19bb5c94015de61b8178b9d283.

(+2 squashed commit)

Squashed commit:

[bf2f7e7c] missing include
[c3aa2599] cu11 build threads

don't build rope factors from ggml-org#8676 for CLBlast as it segfaults

e47477f

pi6am merged commit 694bf6b into pi6am:concedo Jul 29, 2024

pi6am merged 37 commits into pi6am:concedo from LostRuins:concedo

20 participants
