
Commit 23644af

docs: updated local-models doc with better instructions

1 parent a046004 · commit 23644af


docs/local-models.md

Lines changed: 35 additions & 9 deletions
🖥 Local Models
===============

This is a guide to setting up a local model for use with gptme.

There are a few options; here we will cover two:
### ollama + litellm

Here's how to use ollama with the litellm proxy to get an OpenAI API-compatible server.

You first need to install ollama and litellm.
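One way to install both (a sketch, assuming macOS with Homebrew and a working pip; on Linux, ollama ships an install script on its website, and newer litellm versions may need the `proxy` extra):

```sh
# install ollama (assumes Homebrew; on Linux, use the install script from the ollama website)
brew install ollama

# install the litellm CLI/proxy (newer versions may require: pip install 'litellm[proxy]')
pip install litellm
```

Then pull a model and start everything up: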
```sh
ollama pull mistral
ollama serve  # starts the ollama server; it keeps running, so use a separate terminal
litellm --model ollama/mistral  # starts the proxy (the export below assumes its default port, 8000)
export OPENAI_API_BASE="http://localhost:8000"
```
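To verify the proxy works before involving gptme, you can hit it directly (a sketch; it assumes the proxy exposes the standard OpenAI-style `/chat/completions` route on its default port):

```sh
# smoke test: should return a chat completion as JSON
# (the model name may be ignored by a single-model proxy)
curl http://localhost:8000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/mistral", "messages": [{"role": "user", "content": "hello"}]}'
```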
### llama_cpp.server

Here's how to use llama_cpp.server to get an OpenAI API-compatible server.

You first need to install and run the [llama-cpp-python][llama-cpp-python] server. To ensure you get the most out of your hardware, make sure you build it with [the appropriate hardware acceleration][hwaccel]. For macOS, you can find detailed instructions [here][metal].

I recommend the WizardCoder-Python models.
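If you're on Apple Silicon, the build might look like this (a sketch based on the llama-cpp-python docs; verify the exact CMake flags against the [hwaccel] link):

```sh
# build llama-cpp-python with Metal acceleration, including the server extra
CMAKE_ARGS="-DLLAMA_METAL=on" pip install 'llama-cpp-python[server]'
```

Then start the server with a model and point gptme's OpenAI base at it: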
```sh
MODEL=~/ML/wizardcoder-python-13b-v1.0.Q4_K_M.gguf
poetry run python -m llama_cpp.server --model $MODEL --n_gpu_layers 1  # use `--n_gpu_layers 1` if you have an M1/M2 chip
export OPENAI_API_BASE="http://localhost:8000/v1"
```
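Before moving on, you can sanity-check that the server is up (assuming the default host/port; `/v1/models` is the standard OpenAI-style route the server exposes):

```sh
# sanity check: should return a JSON list containing the loaded model
curl http://localhost:8000/v1/models
```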
### Now, to use it

```sh
gptme --llm local "say hello!"
```
### So, how well does it work?

I've had mixed results. The models are not nearly as good as GPT-4, and often struggle with the tools laid out in the system prompt. However, I haven't tested models larger than 7B/13B.

I'm hoping future models, trained better for tool use and interactive coding (where outputs are fed back), can remedy this, even at 7B/13B model sizes. Perhaps we can fine-tune a model on (GPT-4) conversation logs to create a purpose-fit model that knows how to use the tools.
[llama-cpp-python]: https://github.com/abetlen/llama-cpp-python
[hwaccel]: https://github.com/abetlen/llama-cpp-python#installation-with-hardware-acceleration
[metal]: https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md
