Releases: ggerganov/llama.cpp
b2885
b2884
sync : ggml ggml-ci
b2879
server: free sampling contexts on exit (#7264)
This cleans up the last leak found by the address sanitizer. (Plus whitespace fixes.)
b2878
Revert "move ndk code to a new library (#6951)" (#7282)
This reverts commit efc8f767c8c8c749a245dd96ad4e2f37c164b54c.
b2877
ggml : add RPC backend (#6829)
The RPC backend proxies all operations to a remote server which runs a regular backend (CPU, CUDA, Metal, etc.).
* set TCP_NODELAY
* add CI workflows
* implement llama_max_devices() for RPC
* wrap sockfd into a struct
* implement get_alignment and get_max_size
* add get_device_memory
* win32 support
* add README
(Plus review-comment fixes, warning fixes, and whitespace cleanup.)
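The proxying idea can be sketched in a toy, self-contained form: a client-side "backend" that serializes each operation over TCP to a server, which executes it with a local backend and sends the result back. All names here (`RpcBackend`, `serve_once`, `LOCAL_OPS`) are illustrative assumptions, not ggml's actual RPC API; the only detail borrowed from the PR is setting TCP_NODELAY on the connection.

```python
import json
import socket
import threading

# Toy stand-in for a "regular backend" that executes ops locally on the server.
LOCAL_OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def serve_once(port, ready):
    """Accept one connection, execute one op with the local backend, reply."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    ready.set()  # signal that the server is accepting connections
    conn, _ = srv.accept()
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # as in the PR
    req = json.loads(conn.makefile().readline())
    reply = {"result": LOCAL_OPS[req["op"]](*req["args"])}
    conn.sendall((json.dumps(reply) + "\n").encode())
    conn.close()
    srv.close()

class RpcBackend:
    """Client-side proxy: every op is shipped to the remote endpoint."""
    def __init__(self, endpoint):
        host, port = endpoint.split(":")
        self.addr = (host, int(port))

    def run(self, op, *args):
        with socket.create_connection(self.addr) as c:
            c.sendall((json.dumps({"op": op, "args": args}) + "\n").encode())
            return json.loads(c.makefile().readline())["result"]
```

With `serve_once` running in another thread or process, a client would call e.g. `RpcBackend("127.0.0.1:50199").run("add", 2, 3)`; the real backend exposes many more operations (alignment, max size, device memory), but the transport shape is the same.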
b2876
llama : disable pipeline parallelism with nkvo (#7265)
b2875
move ndk code to a new library (#6951)
b2874
Add left recursion check: quit early instead of going into an infinite loop (#7083)
* Remove custom enum; rename the left recursion check and move it to the "grammar internal" section; add handling for the edge case where a leftmost nonterminal may be empty
* Remove unnecessary declaration
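The check this entry describes can be sketched independently of llama.cpp's grammar code: first compute which nonterminals can derive the empty string, then walk each production's leftmost symbols, skipping past nullable ones (the empty-nonterminal edge case the PR handles), and flag any cycle. This is a standalone illustration under those assumptions, not the actual implementation.

```python
def has_left_recursion(grammar):
    """grammar maps nonterminal -> list of productions; each production is a
    list of symbols. Symbols that are keys of `grammar` are nonterminals,
    everything else is a terminal."""
    # Fixed-point pass: which nonterminals can derive the empty string?
    nullable = set()
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            if nt not in nullable and any(
                all(s in nullable for s in prod) for prod in prods
            ):
                nullable.add(nt)
                changed = True

    def visit(nt, stack):
        if nt in stack:  # reached a nonterminal already on this leftmost path
            return True
        stack = stack | {nt}
        for prod in grammar.get(nt, ()):
            for sym in prod:
                if sym not in grammar:   # a terminal blocks left recursion
                    break
                if visit(sym, stack):
                    return True
                if sym not in nullable:  # cannot skip a non-nullable symbol
                    break
        return False

    return any(visit(nt, frozenset()) for nt in grammar)
```

Direct (`A -> A x`), indirect (`A -> B x`, `B -> A`), and nullable-mediated (`A -> B A` with `B -> ""`) left recursion are all caught; right recursion (`A -> x A`) is not flagged.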
b2871
llama : less KV padding when FA is off (#7257) ggml-ci
b2870
llava-cli: fix base64 prompt (#7248)
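For context on what a base64 prompt carries, here is a minimal sketch of inlining an image into a prompt as a base64 data URI. The `<img src="...">` tag form is an assumption modeled on llava-cli's base64 prompt support; the helper name `embed_image` is hypothetical.

```python
import base64

def embed_image(image_bytes, question, mime="image/jpeg"):
    """Build a llava-style prompt with the image inlined as a base64 data URI.
    NOTE: the exact <img src="..."> syntax is an assumption; consult the
    llava-cli docs for the real prompt format."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f'<img src="data:{mime};base64,{b64}">{question}'
```

The payload round-trips: decoding the `base64,` segment of the resulting prompt recovers the original image bytes.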