
Releases: simonw/llm-llama-cpp

0.3

09 Dec 05:48

New mechanism for running GGUF files directly, using llm -m gguf. Example:

llm -m gguf \
  -o path una-cybertron-7b-v2-bf16.Q8_0.gguf \
  'Instruction: Five reasons to get a pet walrus
Response:'

This makes it much easier to try out new GGUF files, for example those released by TheBloke on Hugging Face. #26

0.2b1

28 Sep 03:22
Pre-release
  • max_tokens now defaults to 4000. Thanks, Alexis Métaireau. #18
  • New -o max_tokens 100 option for changing the max tokens setting. #20
  • New -o n_gpu_layers 10 option for increasing the number of GPU layers. Thanks, LoopControl. #19
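These options can be combined in a single invocation. A sketch of how that might look (the llama2-chat alias is a placeholder, not from this release; substitute any model you have registered):

```shell
# Cap the response at 100 output tokens and offload 10 layers to the GPU.
# "llama2-chat" is a hypothetical alias - use whatever model you have installed.
llm -m llama2-chat \
  -o max_tokens 100 \
  -o n_gpu_layers 10 \
  'Five reasons to get a pet walrus'
```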

0.2b0

22 Sep 03:24
Pre-release
  • Support for new GGUF format model files. Thanks, Andrew Mshar. #16
  • Output from this model now streams. Thanks, Michael Hamann. #11
  • Support for compiling with METAL GPU acceleration on Apple Silicon. Thanks, vividfog. #14
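Metal acceleration requires the underlying llama-cpp-python dependency to be compiled with Metal enabled. A sketch, assuming llama-cpp-python's documented CMAKE_ARGS build switch (check the plugin README for the exact incantation):

```shell
# Reinstall llama-cpp-python with Metal support on Apple Silicon.
# CMAKE_ARGS="-DLLAMA_METAL=on" is llama-cpp-python's build flag for Metal;
# "llm install" is llm's built-in wrapper around pip.
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 \
  llm install llama-cpp-python --force-reinstall --no-cache-dir
```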

0.1a0

01 Aug 17:42
Pre-release
  • Initial alpha release. Can download, register and run GGML models, including Llama 2 Chat models. #1, #2, #3, #4
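The download, register, and run workflow from this release can be sketched as follows (the model URL is a placeholder, and the exact flag names should be checked against the plugin's --help output):

```shell
# Download a GGML model file and register it under an alias.
# <model-url> is a placeholder - point it at a real GGML model download.
llm llama-cpp download-model <model-url> --alias llama2-chat

# Run a prompt against the registered alias.
llm -m llama2-chat 'Say hello'
```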