🖥 Local Models
===============

This is a guide to setting up a local model for use with gptme.

There are a few options; here we will cover two:

### ollama + litellm

Here's how to use ollama with the litellm proxy to get an OpenAI API-compatible server:

You first need to install ollama and litellm.
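
As a rough sketch of the install step (assuming Homebrew on macOS for ollama and pip for litellm; install methods differ per platform):

```sh
# Install ollama (on Linux, ollama.com provides an official install script instead)
brew install ollama
# Install litellm with its proxy extra, which pulls in the server dependencies
pip install 'litellm[proxy]'
```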

```sh
ollama pull mistral              # download the model weights
ollama serve                     # keeps running; use a separate terminal for the next steps
litellm --model ollama/mistral   # starts the OpenAI-compatible proxy, also keeps running
export OPENAI_API_BASE="http://localhost:8000"
```
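
To sanity-check that the proxy is responding, you can query the models route (route name assumed from the OpenAI-compatible API that litellm exposes):

```sh
curl http://localhost:8000/v1/models
```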

### llama_cpp.server

Here's how to use llama_cpp.server to get an OpenAI API-compatible server.

You first need to install and run the [llama-cpp-python][llama-cpp-python] server. To ensure you get the most out of your hardware, make sure you build it with [the appropriate hardware acceleration][hwaccel]. For macOS, you can find detailed instructions [here][metal].
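
As a minimal install sketch (assuming a plain pip environment; the exact build flags for hardware acceleration are covered in the links above):

```sh
# Install llama-cpp-python along with its server extra
pip install 'llama-cpp-python[server]'
# On Apple Silicon, rebuild with Metal acceleration (flag per the macOS guide linked above)
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir 'llama-cpp-python[server]'
```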

```sh
MODEL=~/ML/wizardcoder-python-13b-v1.0.Q4_K_M.gguf
poetry run python -m llama_cpp.server --model $MODEL --n_gpu_layers 1  # use `--n_gpu_layers 1` if you have an M1/M2 chip
export OPENAI_API_BASE="http://localhost:8000/v1"
```

### Now, to use it:

```sh
gptme --llm local "say hello!"
```
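
If you'd rather not export the variable globally, you can also set it for a single invocation (plain shell behavior, nothing gptme-specific):

```sh
OPENAI_API_BASE="http://localhost:8000/v1" gptme --llm local "say hello!"
```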

### So, how well does it work?

I've had mixed results. The local models are not nearly as good as GPT-4 and often struggle with the tools laid out in the system prompt. However, I haven't tested with models larger than 7B/13B.

I'm hoping future models, trained better for tool use and interactive coding (where outputs are fed back), can remedy this, even at 7B/13B sizes. Perhaps we can fine-tune a model on (GPT-4) conversation logs to create a purpose-fit model that knows how to use the tools.

[llama-cpp-python]: https://github.com/abetlen/llama-cpp-python
[hwaccel]: https://github.com/abetlen/llama-cpp-python#installation-with-hardware-acceleration
[metal]: https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md