Skip to content

Merge upstream#4

Merged
pi6am merged 37 commits intopi6am:concedofrom
LostRuins:concedo
Jul 29, 2024
Merged

Merge upstream#4
pi6am merged 37 commits intopi6am:concedofrom
LostRuins:concedo

Conversation

@pi6am
Copy link
Copy Markdown
Owner

@pi6am pi6am commented Jul 29, 2024

No description provided.

MorganRO8 and others added 30 commits July 24, 2024 19:48
Added link to game I made that depends on llama
* use sliding window for phi3

* fix typo, "data_swa" -> "data"

* [conver_hf_to_gguf.py] add phi3 sliding window
* docfix: imatrix readme, quantum models -> quantized models.

* docfix: server readme: quantum models -> quantized models.
…8669)

* examples : remove finetune and train-text-from-scratch

* fix build

* update help message

* fix small typo for export-lora

---------

Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
* Improvements for Windows with Snapdragon X

* Revert "Improvements for Windows with Snapdragon X"

This reverts commit bf21397.

* Improvements for Windows with Snapdragon X

* WOA build clarifications

* WIndows on ARM build clarifications

* cmake build for Windows clarifications

* Update docs/build.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_on_alloc`.

This fixes it by bailing out if no context is found.
* server : add Speech Recognition & Synthesis to UI

* server : add Speech Recognition & Synthesis to UI (fixes)
* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string
* cann: fix multi-npu exec error

* cann: update comment  for ggml_backend_cann_supports_buft
This commit adds a --no-warmup option for llama-cli.

The motivation for this is that it can be convenient to skip the
warmup llama_decode call when debugging.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
…org#8622)

* llama : model-based max number of graph nodes

ggml-ci

* llama : disable 405B max_nodes path due to lack of complaints

ggml-ci
* Add llama 3.1 rope scaling factors to llama conversion and inference

This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope oepration, improving results for context windows above 8192

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* address comments

* address comments

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
This commit removes an UNUSED macro call that is not needed as the
variable n0 is used in the code and will not produce a warning.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
…893)

This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.

Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>
…ml/895)

* Add support for float16 tensors in 1d pooling operations

* Add support for float16 input tensors in 2d pooling operations

* code cleanup

remove unnecessary casting during srow ptr initialization

---------

Co-authored-by: vanaka11 <vanaka1189@gmail.com>
Apply a loop tiling technique to the generic path, which provides
performance upside for ISAs with enough registers to take advantage
of it. Also helps the compiler optimize this path.
ggml-ci
ggerganov and others added 7 commits July 27, 2024 18:08
# Conflicts:
#	.devops/nix/apps.nix
#	.devops/tools.sh
#	Makefile
#	README.md
#	docs/backend/SYCL.md
#	docs/build.md
#	examples/CMakeLists.txt
#	ggml/include/ggml.h
#	src/llama-vocab.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-sampling.cpp
* Update doc for MUSA

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* CUDA => MUSA

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Fix CI build failure

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
This reverts commit c3aa259907a77b19bb5c94015de61b8178b9d283. (+2 squashed commit)

Squashed commit:

[bf2f7e7c] missing include

[c3aa2599] cu11 build threads
@pi6am pi6am merged commit 694bf6b into pi6am:concedo Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.