This is a small wrapper to build and run a llama.cpp llama-server to serve a (multimodal) LLM.
Clone the repository, build the llama-server and install a systemd user service:
git clone https://github.com/mlang/llm-api
make -C llm-apiThe file llm-api/Makefile contains a MODEL variable at the top of the file.
It is preet to a model that should fit in 16GB RAM.
Change the MODEL variable if you want to use a different HuggingFace model.
To download the model and test the server, execute:
make -C llm-api llm-apiThe LLM API will listen on 127.0.1.9:8080.
If this runs fine, you can start the systemd user service with:
systemctl --user start llm-apiIf you are using the llm Python package, you can
copy the file llm-api/extra-openai-models.yaml to your llm config directory:
ln -s $(pwd)/llm-api/extra-openai-models.yaml ~/.config/io.datasette.llm/Assuming you are using the llm Python package, you can describe an image with:
llm -m local -a image.jpgIf this works, you can enable the llm-api service permanently with:
systemctl --user enable llm-apiThis will start the llm-api service when you log in with the current user.