
Merge upstream#5

Merged
pi6am merged 39 commits into pi6am:concedo from LostRuins:concedo
Aug 14, 2024

Conversation

Owner

@pi6am pi6am commented Aug 14, 2024

No description provided.

compilade and others added 30 commits July 28, 2024 00:42
* llama : refactor session file management

* llama : saving and restoring state checks for overflow

The size of the buffers should now be given to the functions working
with them; otherwise a truncated file could cause out-of-bounds reads.
(A caller-side sketch of this pattern follows this commit list.)

* llama : stream from session file instead of copying into a big buffer

Loading session files should no longer cause a memory usage spike.

* llama : llama_state_get_size returns the actual size instead of max

This is a breaking change, but makes that function *much* easier
to keep up to date, and it also makes it reflect the behavior
of llama_state_seq_get_size.

* llama : share code between whole and seq_id-specific state saving

Both session file types now use a more similar format.

* llama : no longer store all hparams in session files

Instead, the model arch name is stored.
The layer count and the embedding dimensions of the KV cache
are still verified when loading.
Storing all the hparams is not necessary.

* llama : fix uint64_t format type

* llama : various integer type cast and format string fixes

Some platforms use "%lu" and others "%llu" for uint64_t.
Not sure how to handle that, so casting to size_t when displaying errors.

* llama : remove _context suffix for llama_data_context

* llama : fix session file loading

llama_state_get_size cannot be used to get the max size anymore.

* llama : more graceful error handling of invalid session files

* llama : remove LLAMA_MAX_RNG_STATE

It's no longer necessary to limit the size of the RNG state,
because the max size of session files is not estimated anymore.

* llama : cast seq_id in comparison with unsigned n_seq_max
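
As an aside, a minimal caller-side sketch of the size-checked pattern described in the buffer-size item above, assuming the post-change `llama_state_get_size`/`llama_state_get_data` signatures that take an explicit buffer size (illustrative, not code from this diff):

```cpp
#include <cstdint>
#include <vector>
#include "llama.h"

// Sketch only: size the buffer first, then pass that size along so the
// library can refuse to read or write past the end of the buffer.
static std::vector<uint8_t> snapshot_state(llama_context * ctx) {
    const size_t n = llama_state_get_size(ctx);  // actual size, not an upper bound
    std::vector<uint8_t> buf(n);
    const size_t written = llama_state_get_data(ctx, buf.data(), buf.size());
    buf.resize(written);                         // defensive; should equal n
    return buf;
}
```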
…CLI options (ggml-org#8477)

* chore: Fix compiler warnings, add help text, improve CLI options

* Add prototypes for function definitions
* Invert logic of --no-clean option to be more intuitive
* Provide a new help prompt with clear instructions

* chore : Add ignore rule for vulkan shader generator

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* Update ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp

Co-authored-by: 0cc4m <picard12@live.de>

* chore : Remove void and apply C++ style empty parameters

* chore : Remove void and apply C++ style empty parameters

---------

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
Co-authored-by: 0cc4m <picard12@live.de>

The token immediately before an eot token was lost when SSE streaming
was enabled if that token was contained entirely within a stop sequence.
As an example of when this could happen, consider this prompt:
  Type the phrase 'pleas' once.
In a Llama 3-derived model, 'pleas' tokenizes as 'ple' 'as'. The token
'as' is contained within this instruct mode stop sequence:
  <|eot_id|><|start_header_id|>assistant<|end_header_id|>
due to the word 'assistant'. Since `string_contains_sequence_substring`
returns True for 'as', this token is added to `tokenReserve` instead of
being streamed immediately. If the '<|eot_id|>' token was generated
next, the text in `tokenReserve` would be discarded.
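
A hypothetical sketch of the buffering logic described above; the names `tokenReserve` and `string_contains_sequence_substring` follow the description, but the surrounding code is illustrative rather than the actual KoboldCpp implementation:

```cpp
#include <string>

// Simplified stand-in for the real helper: is `text` found inside the stop sequence?
static bool string_contains_sequence_substring(const std::string & stop_seq, const std::string & text) {
    return !text.empty() && stop_seq.find(text) != std::string::npos;
}

static std::string tokenReserve;  // text withheld because it might belong to a stop sequence

// Returns the text that may be streamed to the client for this step.
static std::string on_token(const std::string & piece, const std::string & stop_seq, bool is_eot) {
    if (is_eot) {
        // Without flushing here, text held back on the previous step
        // (e.g. the 'as' of 'pleas') is silently dropped when <|eot_id|> arrives.
        std::string out = tokenReserve;
        tokenReserve.clear();
        return out;
    }
    if (string_contains_sequence_substring(stop_seq, piece)) {
        tokenReserve += piece;               // could be part of a stop sequence: hold it back
        return "";
    }
    std::string out = tokenReserve + piece;  // false alarm: release everything held
    tokenReserve.clear();
    return out;
}
```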
…org#8746)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
…ggml-org#8748)

In this code we want elements to retain the value they previously held
when mask[i] is false, so the undisturbed policy should be used. With the
default agnostic policy of the RVV intrinsics, those values may either be
preserved or overwritten with all 1s.

Co-authored-by: carter.li <carter.li@starfivetech.com>
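
For readers unfamiliar with the RVV tail/mask policies, a plain scalar C++ sketch of the difference (illustrative only, no actual vector intrinsics):

```cpp
#include <cstddef>
#include <cstring>

// "Undisturbed": masked-off elements keep the value they previously held.
static void masked_scale_undisturbed(float * dst, const float * src, const bool * mask, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (mask[i]) {
            dst[i] = src[i] * 2.0f;  // stand-in for the real vector operation
        }
        // mask[i] == false: dst[i] is left exactly as it was
    }
}

// "Agnostic": masked-off elements may keep their value *or* be filled with all 1s,
// at the implementation's discretion, so later reads of dst[i] cannot rely on it.
static void masked_scale_agnostic(float * dst, const float * src, const bool * mask, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (mask[i]) {
            dst[i] = src[i] * 2.0f;
        } else {
            std::memset(&dst[i], 0xFF, sizeof(float));  // the hardware is allowed to do this
        }
    }
}
```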
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	flake.lock
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
…g#8751)

* added android implementation of ggml_print_backtrace_symbols

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
…gml-org#8774)

* gguf_writer.py: add_array() should not add to kv store if empty

* Apply suggestions from code review

I was wondering if there was a specific reason for `if val` but good to hear we can safely use `len(val) == 0`

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
Listing individual outputs is no longer necessary to reduce the runtime closure size after NixOS/nixpkgs#323056.
* Adding Gemma 2 2B configs

Updates to Q scaling and Gemma 2 model sizes to match v2 2B model.

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
* Fix potential race condition as pointed out by @fairydreaming in ggml-org#8776

* Reference the .o rather than rebuilding every time.

* Adding in CXXFLAGS and LDFLAGS

* Removing unnecessary linker flags.
# Conflicts:
#	.devops/nix/package.nix
#	CMakeLists.txt
#	Makefile
LostRuins and others added 9 commits August 1, 2024 17:12
Squashed commits:

[13f42f83] Added vulkan support for SD
* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X

* update asserts

* only use dmmv for supported types

* add test
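
A rough sketch of the dispatch condition these bullets imply; `GGML_CUDA_DMMV_X` is the existing build-time macro, while the surrounding function and default value are illustrative assumptions:

```cpp
#include <cstdint>

#ifndef GGML_CUDA_DMMV_X
#define GGML_CUDA_DMMV_X 32  // assumed default; in ggml it is a build-time tunable
#endif

// Illustrative only: take the dmmv (dequantize + mat-vec) path only when the
// column count meets the kernel's alignment requirement and the tensor type
// actually has a dmmv implementation; otherwise fall back to the generic path.
static bool can_use_dmmv(int64_t ncols, bool type_has_dmmv_kernel) {
    return type_has_dmmv_kernel && ncols % (2 * GGML_CUDA_DMMV_X) == 0;
}
```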
…-org#8783)

* Only enable backtrace on GLIBC linux systems

* fix missing file from copy

* use glibc macro instead of defining a custom one
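
A minimal sketch of the gating described above, using the standard `__GLIBC__` macro and glibc's `<execinfo.h>` functions (the real ggml code may differ in detail):

```cpp
#include <cstdio>

#if defined(__linux__) && defined(__GLIBC__)
#include <execinfo.h>

// Print a symbolized backtrace of the current call stack to stderr.
// Only glibc ships <execinfo.h>; musl-based systems would fail to build without this guard.
static void print_backtrace(void) {
    void * frames[64];
    const int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, 2 /* stderr */);
}
#else
static void print_backtrace(void) {
    std::fprintf(stderr, "backtrace not supported on this platform\n");
}
#endif
```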
* Adding support for unified memory

* adding again the documentation about unified memory

* refactoring: Moved the unified memory code to the correct location.

* Fixed compilation error when using hipblas

* cleaning up the documentation

* Updating the documentation

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* adding one more case where the feature should not be enabled

---------

Co-authored-by: matteo serva <matteo.serva@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
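
A hedged sketch of the general idea behind the unified-memory option; the environment-variable name and the wrapper here are assumptions for illustration, while `cudaMallocManaged`/`cudaMalloc` are the documented CUDA runtime calls:

```cpp
#include <cstdlib>
#include <cuda_runtime.h>

// Sketch: opt into CUDA unified (managed) memory at runtime.
// Managed allocations can be oversubscribed; pages migrate between host and
// device on demand, at the cost of possible page-fault overhead.
static cudaError_t alloc_device_buffer(void ** ptr, size_t size) {
    if (std::getenv("EXAMPLE_ENABLE_UNIFIED_MEMORY") != nullptr) {  // name is illustrative
        return cudaMallocManaged(ptr, size);
    }
    return cudaMalloc(ptr, size);  // default: plain device allocation
}
```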
# Conflicts:
#	docs/build.md
#	ggml/src/ggml.c
#	tests/test-backend-ops.cpp
* OSX attempt 1

* OSX Pyinstaller

* Update kcpp-build-release-osx.yaml

* Update kcpp-build-release-osx.yaml

* Update kcpp-build-release-osx.yaml

* Add .metal file

* Update kcpp-build-release-osx.yaml

* Polish Mac

(cherry picked from commit 52cc0da)
@pi6am pi6am merged commit 25ed6d8 into pi6am:concedo Aug 14, 2024