
Releases: simonw/llm-llama-cpp

0.3

09 Dec 05:48

New mechanism for running GGUF files directly, using llm -m gguf. Example:

llm -m gguf \
  -o path una-cybertron-7b-v2-bf16.Q8_0.gguf \
  'Instruction: Five reasons to get a pet walrus
Response:'

This makes it much easier to try out new GGUF files, for example those released by TheBloke on Hugging Face. #26

0.2b1

28 Sep 03:22
Pre-release
  • max_tokens now defaults to 4000. Thanks, Alexis Métaireau. #18
  • New -o max_tokens 100 option for changing the max tokens setting. #20
  • New -o n_gpu_layers 10 option for increasing the number of GPU layers. Thanks, LoopControl. #19
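These options can be combined in a single invocation. A sketch of how that might look (the llama2-chat alias is a placeholder, not from this release; substitute any model you have registered):

```shell
# Cap the response at 100 output tokens and offload 10 layers to the GPU.
# "llama2-chat" is a hypothetical alias - use whatever model you have installed.
llm -m llama2-chat \
  -o max_tokens 100 \
  -o n_gpu_layers 10 \
  'Five reasons to get a pet walrus'
```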

0.2b0

22 Sep 03:24
Pre-release
  • Support for new GGUF format model files. Thanks, Andrew Mshar. #16
  • Output from this model now streams. Thanks, Michael Hamann. #11
  • Support for compiling with METAL GPU acceleration on Apple Silicon. Thanks, vividfog. #14
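Metal acceleration requires the underlying llama-cpp-python dependency to be compiled with Metal enabled. A sketch, assuming llama-cpp-python's documented CMAKE_ARGS build switch (check the plugin README for the exact incantation):

```shell
# Reinstall llama-cpp-python with Metal support on Apple Silicon.
# CMAKE_ARGS="-DLLAMA_METAL=on" is llama-cpp-python's build flag for Metal;
# "llm install" is llm's built-in wrapper around pip.
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 \
  llm install llama-cpp-python --force-reinstall --no-cache-dir
```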

0.1a0

01 Aug 17:42
Pre-release
  • Initial alpha release. Can download, register and run GGML models, including Llama 2 Chat models. #1, #2, #3, #4
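The download, register, and run workflow from this release can be sketched as follows (the model URL is a placeholder, and the exact flag names should be checked against the plugin's --help output):

```shell
# Download a GGML model file and register it under an alias.
# <model-url> is a placeholder - point it at a real GGML model download.
llm llama-cpp download-model <model-url> --alias llama2-chat

# Run a prompt against the registered alias.
llm -m llama2-chat 'Say hello'
```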