🖥 Local Models
===============

This is a guide to setting up a local model for use with gptme.

There are a few options; here we will cover two:

### ollama + litellm

Here's how to use ollama with the litellm proxy to get an OpenAI API-compatible server:

You first need to install ollama and litellm.
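
As a rough sketch of the install step (assuming Homebrew on macOS for ollama and pip for litellm; install methods differ per platform):

```sh
# Install ollama (on Linux, ollama.com provides an official install script instead)
brew install ollama
# Install litellm with its proxy extra, which pulls in the server dependencies
pip install 'litellm[proxy]'
```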

```sh
ollama pull mistral              # download the model weights
ollama serve                     # keeps running; use a separate terminal for the next steps
litellm --model ollama/mistral   # starts the OpenAI-compatible proxy, also keeps running
export OPENAI_API_BASE="http://localhost:8000"
```
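
To sanity-check that the proxy is responding, you can query the models route (route name assumed from the OpenAI-compatible API that litellm exposes):

```sh
curl http://localhost:8000/v1/models
```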

### llama_cpp.server

Here's how to use llama_cpp.server to get an OpenAI API-compatible server.

You first need to install and run the [llama-cpp-python][llama-cpp-python] server. To ensure you get the most out of your hardware, make sure you build it with [the appropriate hardware acceleration][hwaccel]. For macOS, you can find detailed instructions [here][metal].
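
As a minimal install sketch (assuming a plain pip environment; the exact build flags for hardware acceleration are covered in the links above):

```sh
# Install llama-cpp-python along with its server extra
pip install 'llama-cpp-python[server]'
# On Apple Silicon, rebuild with Metal acceleration (flag per the macOS guide linked above)
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir 'llama-cpp-python[server]'
```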

```sh
MODEL=~/ML/wizardcoder-python-13b-v1.0.Q4_K_M.gguf
poetry run python -m llama_cpp.server --model $MODEL --n_gpu_layers 1  # use `--n_gpu_layers 1` if you have an M1/M2 chip
export OPENAI_API_BASE="http://localhost:8000/v1"
```

### Now, to use it:

```sh
gptme --llm local "say hello!"
```
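
If you'd rather not export the variable globally, you can also set it for a single invocation (plain shell behavior, nothing gptme-specific):

```sh
OPENAI_API_BASE="http://localhost:8000/v1" gptme --llm local "say hello!"
```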

### So, how well does it work?

I've had mixed results. The local models are not nearly as good as GPT-4 and often struggle with the tools laid out in the system prompt. However, I haven't tested with models larger than 7B/13B.

I'm hoping future models, trained better for tool use and interactive coding (where outputs are fed back), can remedy this, even at 7B/13B sizes. Perhaps we can fine-tune a model on (GPT-4) conversation logs to create a purpose-fit model that knows how to use the tools.

[llama-cpp-python]: https://github.com/abetlen/llama-cpp-python
[hwaccel]: https://github.com/abetlen/llama-cpp-python#installation-with-hardware-acceleration
[metal]: https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md