Merge upstream by pi6am · Pull Request #3 · pi6am/koboldcpp

pi6am · 2024-07-27T04:22:50Z

No description provided.

* llama : suppress unref var in Windows MSVC

This commit suppresses two warnings that are currently generated for src/llama.cpp when building on Windows MSVC

```console
C:\llama.cpp\src\llama.cpp(14349,45): warning C4101: 'ex': unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj]
C:\llama.cpp\src\llama.cpp(19285,44): warning C4101: 'e': unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj]
```

* Update src/llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
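The commit text above does not say how the two C4101 warnings were silenced. As a minimal sketch of one common approach, the handler below leaves the caught exception unnamed, so there is no local variable for MSVC to report as unreferenced. `parse_ok` is a hypothetical helper written for illustration and is not taken from src/llama.cpp.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of llama.cpp.
// Returns true when `text` starts with a valid integer, false otherwise.
bool parse_ok(const std::string & text) {
    try {
        (void) std::stoi(text); // throws std::invalid_argument or std::out_of_range on failure
        return true;
    } catch (const std::exception &) {
        // The exception object is never inspected, so it is left unnamed.
        // Writing `const std::exception & e` here without using `e` is what
        // triggers warning C4101 on MSVC.
        return false;
    }
}
```

When the variable is needed only in some build configurations, casting it to `void` inside the handler is another option that keeps the name available.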

This commit adds the compile definition `_CRT_SECURE_NO_WARNINGS` to the root cmake subproject. The motivation for this is that currently the following warnings are displayed when compiling the tests and common cmake subprojects:

```console
test-llama-grammar.cpp
C:\llama.cpp\src\.\llama.cpp(1406,77): warning C4996: 'strerror': This function or variable may be unsafe. Consider using strerror_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. [C:\llama.cpp\build\tests\test-llama-grammar.vcxproj]
...
```

This compile definition is currently set for the `src` subproject and this change moves it into the root cmake project so that it is applied to all cmake subprojects.

* llama : add inference support and model types for T5 and FLAN-T5 model families
* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()
* common, llama-cli, llama-batched : add support for encoder-decoder models
* convert-hf : handle shared token embeddings tensors in T5Model
* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)
* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model
* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

This commit adds a new option to the tokenize example, --show-count. When this is set, the total number of tokens is printed to stdout.

This was added as an option because I was concerned that there might be scripts that use the output from this program, so it might be better not to print this information by default.

The motivation for this is that it can be useful to find out how many tokens a file contains, for example when trying to determine prompt input file sizes for testing.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* Initial OpenELM support (270M only so far)
* Fill out missing entries in llama_model_type_name
* fixup! Initial OpenELM support (270M only so far)

  Fix formatting
* llama : support all OpenELM models
* llama : add variable GQA and variable FFN sizes

  Some metadata keys can now also be arrays to support setting their value per-layer for models like OpenELM.
* llama : minor spacing changes

  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* llama : use std::array for per-layer hparams
* llama : fix save/load state
* llama : do not print hparams for vocab-only models
* llama : handle n_head == 0
* llama : use const ref for print_f and fix division by zero
* llama : fix t5 uses of n_head and n_ff
* llama : minor comment

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
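To make the per-layer idea in the commit above concrete, here is a minimal sketch of hyperparameters held in `std::array` with one entry per layer, including a guard for layers whose head count is zero. The struct, its field names, and `MAX_LAYERS_SKETCH` are assumptions made for illustration; they are not the actual definitions in src/llama.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical upper bound on layer count, chosen only for this sketch.
constexpr std::size_t MAX_LAYERS_SKETCH = 8;

// Hypothetical per-layer hyperparameters; names are illustrative.
struct hparams_sketch {
    std::uint32_t n_layer = 0;
    std::array<std::uint32_t, MAX_LAYERS_SKETCH> n_head_arr{};    // attention heads per layer
    std::array<std::uint32_t, MAX_LAYERS_SKETCH> n_head_kv_arr{}; // key/value heads per layer
    std::array<std::uint32_t, MAX_LAYERS_SKETCH> n_ff_arr{};      // feed-forward size per layer

    bool valid(std::uint32_t il) const { return il < n_layer && il < MAX_LAYERS_SKETCH; }

    std::uint32_t n_head(std::uint32_t il)    const { return valid(il) ? n_head_arr[il]    : 0; }
    std::uint32_t n_head_kv(std::uint32_t il) const { return valid(il) ? n_head_kv_arr[il] : 0; }
    std::uint32_t n_ff(std::uint32_t il)      const { return valid(il) ? n_ff_arr[il]      : 0; }

    // Grouped-query ratio for one layer; returns 0 instead of dividing by zero
    // when the layer has no key/value heads.
    std::uint32_t n_gqa(std::uint32_t il) const {
        const std::uint32_t kv = n_head_kv(il);
        return kv == 0 ? 0 : n_head(il) / kv;
    }
};

// Builds a three-layer example in which the last layer has no attention heads.
hparams_sketch example_hparams() {
    hparams_sketch hp;
    hp.n_layer       = 3;
    hp.n_head_arr    = {12, 16, 0};
    hp.n_head_kv_arr = {3, 4, 0};
    hp.n_ff_arr      = {768, 1024, 1536};
    return hp;
}
```

A fixed-size array keeps the struct trivially copyable, which is convenient when hyperparameters are compared or written out as part of saved state.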

* py : switch to snake_case

  ggml-ci
* cont

  ggml-ci
* cont

  ggml-ci
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py

  Needed for scripts/check-requirements.sh

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>

* Superfluous parens in conditionals were removed.
* Unused args in function were removed.
* Replaced unused `idx` var with `_`
* Initializing file_format and format_version attributes
* Renaming constant to capitals
* Preventing redefinition of the `f` var

Signed-off-by: Jiri Podivin <jpodivin@redhat.com>

* Adding SmolLM Pre Tokenizer
* Update convert_hf_to_gguf_update.py

  Co-authored-by: compilade <git@compilade.net>
* Update src/llama.cpp

  Co-authored-by: compilade <git@compilade.net>
* handle regex
* removed .inp and out .out ggufs

---------

Co-authored-by: compilade <git@compilade.net>

The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, so let's use it.
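The difference between the two gates can be illustrated with a small runtime sketch. The real check is a compile-time preprocessor condition that this page does not show, so the functions below are hypothetical stand-ins that only model which target names each style of check would accept.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the old gate: only the exact target gfx1030 qualifies.
bool gate_exact(const std::string & arch) {
    return arch == "gfx1030";
}

// Hypothetical model of a family-wide gate: any seven-character gfx103? name
// qualifies, which is how the commit message describes the affected targets.
bool gate_family(const std::string & arch) {
    return arch.size() == 7 && arch.compare(0, 6, "gfx103") == 0;
}
```

Under this model a target such as gfx1031 fails `gate_exact` but passes `gate_family`, which matches the regression the commit message describes for targets that do not set `HSA_OVERRIDE_GFX_VERSION`.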

…g#8508)

* llama : move sampling code into llama-sampling

  ggml-ci
* llama : move grammar code into llama-grammar

  ggml-ci
* cont

  ggml-ci
* cont : pre-fetch rules
* cont

  ggml-ci
* llama : deprecate llama_sample_grammar
* llama : move tokenizers into llama-vocab

  ggml-ci
* make : update llama.cpp deps [no ci]
* llama : redirect external API to internal APIs

  ggml-ci
* llama : suffix the internal APIs with "_impl"

  ggml-ci
* llama : clean-up

fmz and others added 30 commits July 2, 2024 16:36

Add JAIS model(s) (ggml-org#8118)

9689673

* Add `JAIS` model(s)
* cleanup
* address review comments
* remove hack
* un-hardcode max-alibi-bias
* minor tweaks

---------

Co-authored-by: fmz <quic_fzaghlou@quic.com>

Removes multiple newlines at the end of files that is breaking the ed…

07a3fc0

…itorconfig step of CI. (ggml-org#8258)

Adding step to clean target to remove legacy binary names to reduce…

3e2618b

… upgrade / migration confusion arising from ggml-org#7809. (ggml-org#8257)

fix: add missing short command line argument -mli for multiline-input (…

a27152b

…ggml-org#8261)

Dequant improvements rebase (ggml-org#8255)

fadde67

* Single load for half2
* Store scales in local mem
* Vec load quantized values

fix typo (ggml-org#8267)

f8d6a23

Co-authored-by: Judd <foldl@boxvest.com>

fix phi 3 conversion (ggml-org#8262)

916248a

adjust some defaults and gui launcher

3fdbe33

ppl : fix n_seq_max for perplexity (ggml-org#8277)

5f2d4e6

* ppl : fix n_seq_max for perplexity
* use 1 seq for kl_divergence

Define and optimize RDNA1 (ggml-org#8085)

d23287f

[SYCL] Remove unneeded semicolons (ggml-org#8280)

f619024

improvements to model downloader and chat completions adapter loader

6b07565

convert : fix gemma v1 tokenizer convert (ggml-org#8248)

20fc380

ggml-ci

build(python): Package scripts with pip-0517 compliance

b0a4699

fix: Actually include scripts in build

b1c3f26

Not namespaced though :(

fix: Update script paths in CI scripts

8219229

chore: ignore all __pychache__

de14e2e

chore: Fixup requirements and build

07786a6

chore: Remove rebase artifacts

01a5f06

doc: Add context for why we add an explicit pytorch source

1e92001

build: Export hf-to-gguf as snakecase

51d2eba

cli: add EOT when user hit Ctrl+C (ggml-org#8296)

a38b884

* main: add need_insert_eot
* do not format system prompt if it is empty

rm get_work_group_size() by local cache for performance (ggml-org#8286)

f09b7cb

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>

[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (ggml-org#8266)

a9554e2

* fix group_norm ut
* split softmax
* fix softmax
* add concat support condition
* revert debug code
* move QK_WARP_SIZE to presets.hpp

jpodivin and others added 29 commits July 22, 2024 23:44

llama : fix codeshell support (ggml-org#8599)

081fe43

* llama : fix codeshell support
* llama : move codeshell after smollm below to respect the enum order

[SYCL] fix scratch size of softmax (ggml-org#8642)

063d99a

contrib : clarify PR squashing + module names (ggml-org#8630)

e7e6487

* contrib : clarify PR squashing
* contrib : fix typo + add list of modules

Vulkan IQ4_NL Support (ggml-org#8613)

751fcfc

* Fix Vulkan matmul tests compile errors
* Add Vulkan IQ4_NL support
* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support

add a tiny amount of padding

c80d5af

Merge commit '751fcfc6c33ea5f43cadd4d976f8fb176871df5e' into concedo_…

c81d162

…experimental

# Conflicts:
# .github/workflows/build.yml
# CONTRIBUTING.md
# README.md
# flake.lock
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp

sycl : Add support for non-release DPC++ & oneMKL (ggml-org#8644)

64cf50a

* Update cmake to support nvidia hardware & open-source compiler

---------

Signed-off-by: Joe Todd <joe.todd@codeplay.com>

server : fix URL.parse in the UI (ggml-org#8646)

b841d07

Merge branch 'upstream' into concedo_experimental

eb5b4d0

# Conflicts:
# Makefile
# Package.swift
# src/CMakeLists.txt
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-llama-grammar.cpp

examples : Fix llama-export-lora example (ggml-org#8607)

de28008

* fix export-lora example
* add more logging
* reject merging subset
* better check
* typo

update lite, try fix ci

44ef87f

remove extra padding for layer guessing

c76f340

Merge branch 'upstream' into concedo_experimental

01d5175

# Conflicts:
# Makefile
# ggml/src/CMakeLists.txt

add llama_lora_adapter_clear (ggml-org#8653)

b115105

Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (ggml-org#8667)

79167d9

llama : fix llama_chat_format_single for mistral (ggml-org#8657)

96952e7

* fix `llama_chat_format_single` for mistral
* fix typo
* use printf

fix broken template, updated lite

b7fc8e6

readme : update UI list [no ci] (ggml-org#8505)

3a7ac53

Build Llama SYCL Intel with static libs (ggml-org#8668)

f19bf99

Ensure SYCL CI builds both static & dynamic libs for testing purposes

Signed-off-by: Joe Todd <joe.todd@codeplay.com>

adjusted layer estimation

e28c42d

Merge branch 'upstream' into concedo_experimental

cca2fa9

# Conflicts:
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-server-intel.Dockerfile
# README.md
# ggml/src/CMakeLists.txt
# tests/test-chat-template.cpp

adjusted layer estimation

d1f7832

fixed order of selection

0024d9d

fixed dict loading

57a98ba

revert num old cpu for ci

a84f7c5

pi6am merged commit 5a25d61 into pi6am:concedo Jul 27, 2024

pi6am merged 250 commits into pi6am:concedo from LostRuins:concedo