Merge upstream by pi6am · Pull Request #5 · pi6am/koboldcpp

pi6am · 2024-08-14T16:34:03Z

No description provided.

* llama : refactor session file management
* llama : saving and restoring state checks for overflow
  The size of the buffers should now be given to the functions working with them; otherwise, a truncated file could cause out-of-bounds reads.
* llama : stream from session file instead of copying into a big buffer
  Loading session files should no longer cause a memory usage spike.
* llama : llama_state_get_size returns the actual size instead of max
  This is a breaking change, but makes that function *much* easier to keep up to date, and it also makes it reflect the behavior of llama_state_seq_get_size.
* llama : share code between whole and seq_id-specific state saving
  Both session file types now use a more similar format.
* llama : no longer store all hparams in session files
  Instead, the model arch name is stored. The layer count and the embedding dimensions of the KV cache are still verified when loading. Storing all the hparams is not necessary.
* llama : fix uint64_t format type
* llama : various integer type cast and format string fixes
  Some platforms use "%lu" and others "%llu" for uint64_t. Not sure how to handle that, so casting to size_t when displaying errors.
* llama : remove _context suffix for llama_data_context
* llama : fix session file loading
  llama_state_get_size cannot be used to get the max size anymore.
* llama : more graceful error handling of invalid session files
* llama : remove LLAMA_MAX_RNG_STATE
  It's no longer necessary to limit the size of the RNG state, because the max size of session files is not estimated anymore.
* llama : cast seq_id in comparison with unsigned n_seq_max

…CLI options (ggml-org#8477)

* chore: Fix compiler warnings, add help text, improve CLI options
* Add prototypes for function definitions
* Invert logic of --no-clean option to be more intuitive
* Provide a new help prompt with clear instructions
* chore : Add ignore rule for vulkan shader generator
  Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
* Update ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
  Co-authored-by: 0cc4m <picard12@live.de>
* chore : Remove void and apply C++ style empty parameters
* chore : Remove void and apply C++ style empty parameters

---------

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
Co-authored-by: 0cc4m <picard12@live.de>

The token immediately before an EOT token was lost when SSE streaming was enabled if that token was contained entirely within a stop sequence. As an example of when this could happen, consider the prompt "Type the phrase 'pleas' once." In a Llama 3-derived model, 'pleas' tokenizes as 'ple' 'as'. The token 'as' is contained within the instruct-mode stop sequence `<|eot_id|><|start_header_id|>assistant<|end_header_id|>` because of the word 'assistant'. Since `string_contains_sequence_substring` returns True for 'as', this token is added to `tokenReserve` instead of being streamed immediately. If the '<|eot_id|>' token was generated next, the text in `tokenReserve` was discarded.

…ggml-org#8748) In this code, masked-off elements (where mask[i] is false) must retain the values they previously held, so the mask-undisturbed policy should be used. Under the default mask-agnostic policy of the RVV intrinsics, those elements may either keep their values or be overwritten with all 1s.
Co-authored-by: carter.li <carter.li@starfivetech.com>

…g#8751)

* added android implementation of ggml_print_backtrace_symbols
* Update ggml/src/ggml.c
  Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml/src/ggml.c
  Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml/src/ggml.c
  Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml/src/ggml.c
  Co-authored-by: slaren <slarengh@gmail.com>
* Update ggml/src/ggml.c
  Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>

…gml-org#8774)

* gguf_writer.py: add_array() should not add to kv store if empty
* Apply suggestions from code review
  I was wondering if there was a specific reason for `if val` but good to hear we can safely use `len(val) == 0`
  Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>

* Adding support for unified memory
* adding again the documentation about unified memory
* refactoring: Moved the unified memory code to the correct location.
* Fixed compilation error when using hipblas
* cleaning up the documentation
* Updating the documentation
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* adding one more case where the PR should not be enabled

---------

Co-authored-by: matteo serva <matteo.serva@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

compilade and others added 30 commits July 28, 2024 00:42

bump size of some payload arr sequences from 16 to 24

f289fb4

cmake: use 1 more thread for non-ggml in CI (ggml-org#8740)

6eeaeba

[SYCL] add conv support (ggml-org#8688)

0832de7

improvements to auto layer calcs

e39b8aa

do not offload if auto layers is less than 2, as it's usually slower

948646f

more bugfixes in auto gpu layers selection

102eec3

cuda : organize vendor-specific headers into vendors directory (ggml-…

439b3fc

…org#8746) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

[SYCL] Add TIMESTEP_EMBEDDING OP (ggml-org#8707)

c887d8b

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

cann: update cmake (ggml-org#8765)

6e2b600

flake.lock: Update (ggml-org#8729)

140074b

add magnum to colab models

1df850c

hack to fix bad unicode fragments corrupting streamed output

43c55bb

Merge branch 'upstream' into concedo_experimental

bf35652

# Conflicts: # .github/workflows/build.yml # .gitignore # flake.lock # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt

Merge branch 'upstream' into concedo_experimental

265f37f

we don't need this

7e95b80

if gpuid is specified, force specific order

2f04f84

also apply even if tensor split is set

9a04060

nix: cuda: rely on propagatedBuildInputs (ggml-org#8772)

268c566

Listing individual outputs no longer necessary to reduce the runtime closure size after NixOS/nixpkgs#323056.

cmake : fix use of external ggml (ggml-org#8787)

44d28dd

Adding Gemma 2 2B configs (ggml-org#8784)

398ede5

* Adding Gemma 2 2B configs
  Updates to Q scaling and Gemma 2 model sizes to match v2 2B model.
* Update src/llama.cpp
  Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>

Build: Fix potential race condition (ggml-org#8781)

ed9d285

* Fix potential race condition as pointed out by @fairydreaming in ggml-org#8776
* Reference the .o rather than rebuilding every time.
* Adding in CXXFLAGS and LDFLAGS
* Removing unnecessary linker flags.

server : update llama-server embedding flag documentation (ggml-org#8779)

afbbcf3

Fixes ggml-org#8763

cann: support q8_0 for Ascend backend (ggml-org#8805)

c8a0090

Merge branch 'upstream' into concedo_experimental

101efb6

# Conflicts: # .devops/nix/package.nix # CMakeLists.txt # Makefile

LostRuins and others added 9 commits August 1, 2024 17:12

Added vulkan support for SD (+1 squashed commits)

3a72410

Squashed commits: [13f42f83] Added vulkan support for SD

Merge branch 'upstream' into concedo_experimental

81bddc2

cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (ggml-org#8800)

7a11eb3

* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X
* update asserts
* only use dmmv for supported types
* add test

Build: Only include execinfo.h on linux systems that support it (ggml…

b7a08fd

…-org#8783)

* Only enable backtrace on GLIBC linux systems
* fix missing file from copy
* use glibc macro instead of defining a custom one

[SYCL] Fixing wrong VDR iq4nl value (ggml-org#8812)

0fbbd88

Merge branch 'upstream' into concedo_experimental

804d352

# Conflicts: # docs/build.md # ggml/src/ggml.c # tests/test-backend-ops.cpp

Mac builds (#1037)

0d534d8

* OSX attempt 1
* OSX Pyinstaller
* Update kcpp-build-release-osx.yaml
* Update kcpp-build-release-osx.yaml
* Update kcpp-build-release-osx.yaml
* Add .metal file
* Update kcpp-build-release-osx.yaml
* Polish Mac

(cherry picked from commit 52cc0da)

fix typo

c710874

pi6am merged commit 25ed6d8 into pi6am:concedo Aug 14, 2024

pi6am merged 39 commits into pi6am:concedo from LostRuins:concedo

20 participants