# Working with Llama models

This is a guide on how to set up and use other LLMs with the system.

For each model, I provide brief installation instructions for convenience. However, you should always refer to the original repository I link to for the most up-to-date instructions.

## Llama 2 and Llama-derivative models

LLaMA is an LLM from Meta whose weights initially leaked and which was then released openly as Llama 2.
It has become the base for a whole family of derivative, open-source models, such as Vicuna, an LLM fine-tuned for chat-like interactions.

### Requirements

- At least 10GB of CPU RAM for the 13B model
- Make
- Wget
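
As a quick, optional sanity check of these prerequisites on a Linux or macOS shell, something like the following can be used (a sketch; adjust the commands for your platform):

```bash
# Optional prerequisite check (Linux/macOS); adapt to your platform.
free -h 2>/dev/null || sysctl hw.memsize   # total RAM: free on Linux, sysctl on macOS
make --version | head -n 1                 # confirm Make is installed
wget --version | head -n 1                 # confirm Wget is installed
```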

### Installation

- Original instructions are here.
  Refer to them for the most up-to-date information.
- For speed optimisation, you need to decide which GPU library bindings to build llama.cpp with (via the `make` command).
  Refer to the llama.cpp documentation for more information on installing the dependent libraries.
  In short, the mapping by GPU type is:
  - NVIDIA GPU (cuBLAS): `make -j LLAMA_CUBLAS=1`
  - AMD or Intel GPU (CLBlast): `make -j LLAMA_CLBLAST=1`
  - Apple GPU (Metal): `make -j LLAMA_METAL=1`
- The following one-liner will download >9 GB of data to your machine, so you will need a fast connection or some patience.

```bash
cd <somewhere_with_enough_space>
git clone https://github.com/fredi-python/llama.cpp.git \
  && cd llama.cpp \
  && make -j LLAMA_METAL=1 \
  && cd models \
  && wget -c https://huggingface.co/CRD716/ggml-vicuna-1.1-quantized/resolve/main/ggml-vicuna-13B-1.1-q5_1.bin
```

This command downloads the 5-bit quantized model, which is larger but more accurate than lower-bit quantizations.
You may wish to experiment with other models, trading size for accuracy.
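
For example, a smaller 4-bit quantization of the same model can be fetched the same way. The file name below is an assumption; check the Hugging Face repository's file listing for the variants actually available:

```bash
# Hypothetical example: fetch a smaller, less accurate 4-bit quantization instead.
# Verify the exact file name against the repository's file listing before downloading.
cd llama.cpp/models
wget -c https://huggingface.co/CRD716/ggml-vicuna-1.1-quantized/resolve/main/ggml-vicuna-13B-1.1-q4_0.bin
```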

Here is a small list of alternative Llama-compatible GGML models to get you started.
They come in various quantizations, i.e. different trade-offs between model size and accuracy.

They are all interchangeable as far as running the code is concerned.

### Verifying Installation

Verify that the model is working and can respond by running the following command from the llama.cpp directory.
This starts llama.cpp in interactive mode, using the downloaded model.

```bash
./main \
  -m models/ggml-vicuna-13B-1.1-q5_1.bin \
  --repeat_penalty 1.0 \
  --color -i \
  -r "User:" \
  -f prompts/chat-with-vicuna-v1.txt
```

*Screenshot: Vicuna test*

If you have downloaded multiple models, you can launch each of them separately and compare the results for the same prompt.
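
As a rough sketch of how such a comparison could be scripted (the q4_0 file name is an assumption; substitute whichever models you actually downloaded):

```bash
# Run the same prompt through several downloaded models and print each answer.
# Replace the list with the model files you actually have in llama.cpp/models.
for MODEL in models/ggml-vicuna-13B-1.1-q5_1.bin models/ggml-vicuna-13B-1.1-q4_0.bin; do
  echo "=== $MODEL ==="
  ./main -m "$MODEL" -n 128 -p "Explain in two sentences what quantization does to a language model."
done
```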