
LoRA

Put the adapter files in the lora_adapters directory.
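LLMFarm reads its files from the app's Documents folder, which you can reach through the Files app or Finder file sharing. A sketch of the expected layout, using the adapter names from the configuration example below (the models folder name is an assumption based on where the app keeps its models; only lora_adapters is stated here):

Documents/
    models/
        openllama-3b-v2-q8_0.gguf
    lora_adapters/
        lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.bin
        lora-open-llama-3b-v2-q8_0-second-LATEST.bin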

mmap cannot be used while a LoRA adapter is attached, so memory consumption will be higher than usual. If you need mmap, merge the adapter into the model and export the result (see Export LoRA as a model below). To add a single LoRA adapter, use the LLMFarm chat settings interface. To add several LoRA adapters, list them manually in the chat configuration file like this:

{
...
    "prompt_format" : "{{prompt}}",
    "model" : "openllama-3b-v2-q8_0.gguf",
    "lora_adapters" : [
        {
            "adapter" : "lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.bin",
            "scale" : 0.9
        },
        {
            "adapter" : "lora-open-llama-3b-v2-q8_0-second-LATEST.bin",
            "scale" : 1.0
        }
    ],
...
}
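The scale value controls how strongly each adapter modifies the base model. As a rough sketch of the usual LoRA formulation (the exact arithmetic is up to llama.cpp), each affected weight matrix W becomes

    W' = W + scale * (B * A)

where A and B are the adapter's low-rank matrices. A scale of 1.0 applies the adapter at full strength, values below 1.0 attenuate it, and with several adapters the updates simply add up.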

Fine Tune

Now you can perform FineTune directly in LLMFarm. To do this, go to Settings -> FineTune. FineTune consumes quite a lot of RAM, so on iOS it is only possible to train 3B models, and only with minimal n_ctx and n_batch settings. You can read more about FineTune here.
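As a starting point for a 3B model on an iOS device, settings along these lines keep memory use low. The key names mirror the setting names mentioned above and the values are illustrative assumptions, not recommended defaults; check the actual configuration file for the exact keys:

{
...
    "model" : "openllama-3b-v2-q8_0.gguf",
    "n_ctx" : 256,
    "n_batch" : 8,
...
}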

Export LoRA as a model

You can also merge a LoRA adapter into its base model directly in LLMFarm. To do this, go to Settings -> Merge Lora. You can read about how to apply an adapter to a model and export the resulting model here. For proper functionality on iOS devices, it is recommended to merge with Q4_K models and below.
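If you prefer to merge on a desktop machine instead, llama.cpp ships an export-lora tool for this. A minimal sketch using the files from the configuration above, assuming the flag names of llama.cpp's export-lora example at the time of writing (check --help on your build; the output filename is just an example):

./export-lora \
    --model-base openllama-3b-v2-q8_0.gguf \
    --model-out openllama-3b-v2-q8_0-shakespeare-merged.gguf \
    --lora-scaled lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.bin 0.9

The merged model can then be copied back into LLMFarm and loaded with mmap as usual, since it no longer needs a separate adapter.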
