Add support for starting & changing models through UI #118

Open · wants to merge 22 commits into base: master
12 changes: 3 additions & 9 deletions README.md
@@ -87,14 +87,11 @@ cd llama-gpt
Run LlamaGPT with the following command:

```
./run-mac.sh --model 7b
./run-mac.sh
```

You can access LlamaGPT at http://localhost:3000.

> To run 13B or 70B chat models, replace `7b` with `13b` or `70b` respectively.
> To run 7B, 13B or 34B Code Llama models, replace `7b` with `code-7b`, `code-13b` or `code-34b` respectively.

To stop LlamaGPT, do `Ctrl + C` in Terminal.

### Install LlamaGPT anywhere else with Docker
@@ -111,20 +108,17 @@ cd llama-gpt
Run LlamaGPT with the following command:

```
./run.sh --model 7b
./run.sh
```

Or if you have an Nvidia GPU, you can run LlamaGPT with CUDA support using the `--with-cuda` flag, like:

```
./run.sh --model 7b --with-cuda
./run.sh --with-cuda
```

You can access LlamaGPT at `http://localhost:3000`.

> To run 13B or 70B chat models, replace `7b` with `13b` or `70b` respectively.
> To run Code Llama 7B, 13B or 34B models, replace `7b` with `code-7b`, `code-13b` or `code-34b` respectively.

To stop LlamaGPT, do `Ctrl + C` in Terminal.

> Note: On the first run, it may take a while for the model to be downloaded to the `/models` directory. You may also see lots of output like this for a few minutes, which is normal:
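The README change above drops the `--model` flag because this PR moves model selection into the UI. Judging from the `${MODEL_NAME:-…}` and `${MODEL_DOWNLOAD_URL:-…}` defaults in the compose files below, a model can presumably still be pre-selected at startup through environment variables; a minimal sketch, where the exported values are illustrative placeholders rather than names taken from this PR:

```
# Sketch only: MODEL_NAME and MODEL_DOWNLOAD_URL are the variables the
# compose files in this PR interpolate; this particular file name and URL
# are placeholders, not values shipped with the PR.
export MODEL_NAME=llama-2-13b-chat.bin
export MODEL_DOWNLOAD_URL=https://example.com/llama-2-13b-chat.bin
./run.sh
```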
32 changes: 16 additions & 16 deletions docker-compose-cuda-ggml.yml
@@ -28,19 +28,19 @@ services:
count: 1
capabilities: [gpu]

llama-gpt-ui:
# TODO: Use this image instead of building from source after the next release
# image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
build:
context: ./ui
dockerfile: Dockerfile
ports:
- 3000:3000
restart: on-failure
environment:
- 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
- 'OPENAI_API_HOST=http://llama-gpt-api-cuda-ggml:8000'
- 'DEFAULT_MODEL=/models/${MODEL_NAME:-llama-2-7b-chat.bin}'
- 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
- 'WAIT_HOSTS=llama-gpt-api-cuda-ggml:8000'
- 'WAIT_TIMEOUT=${WAIT_TIMEOUT:-3600}'
# llama-gpt-ui:
# # TODO: Use this image instead of building from source after the next release
# # image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
# build:
# context: ./ui
# dockerfile: Dockerfile
# ports:
# - 3000:3000
# restart: on-failure
# environment:
# - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
# - 'OPENAI_API_HOST=http://llama-gpt-api-cuda-ggml:8000'
# - 'DEFAULT_MODEL=/models/${MODEL_NAME:-llama-2-7b-chat.bin}'
# - 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
# - 'WAIT_HOSTS=llama-gpt-api-cuda-ggml:8000'
# - 'WAIT_TIMEOUT=${WAIT_TIMEOUT:-3600}'
32 changes: 16 additions & 16 deletions docker-compose-cuda-gguf.yml
@@ -28,19 +28,19 @@ services:
count: 1
capabilities: [gpu]

llama-gpt-ui:
# TODO: Use this image instead of building from source after the next release
# image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
build:
context: ./ui
dockerfile: Dockerfile
ports:
- 3000:3000
restart: on-failure
environment:
- 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
- 'OPENAI_API_HOST=http://llama-gpt-api-cuda-gguf:8000'
- 'DEFAULT_MODEL=/models/${MODEL_NAME:-code-llama-2-7b-chat.gguf}'
- 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
- 'WAIT_HOSTS=llama-gpt-api-cuda-gguf:8000'
- 'WAIT_TIMEOUT=${WAIT_TIMEOUT:-3600}'
# llama-gpt-ui:
# # TODO: Use this image instead of building from source after the next release
# # image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
# build:
# context: ./ui
# dockerfile: Dockerfile
# ports:
# - 3000:3000
# restart: on-failure
# environment:
# - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
# - 'OPENAI_API_HOST=http://llama-gpt-api-cuda-gguf:8000'
# - 'DEFAULT_MODEL=/models/${MODEL_NAME:-code-llama-2-7b-chat.gguf}'
# - 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
# - 'WAIT_HOSTS=llama-gpt-api-cuda-gguf:8000'
# - 'WAIT_TIMEOUT=${WAIT_TIMEOUT:-3600}'
32 changes: 16 additions & 16 deletions docker-compose-gguf.yml
@@ -19,19 +19,19 @@ services:
- IPC_LOCK
command: '/bin/sh /api/run.sh'

llama-gpt-ui:
# TODO: Use this image instead of building from source after the next release
# image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
build:
context: ./ui
dockerfile: Dockerfile
ports:
- 3000:3000
restart: on-failure
environment:
- 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
- 'OPENAI_API_HOST=http://llama-gpt-api:8000'
- 'DEFAULT_MODEL=/models/${MODEL_NAME:-llama-2-7b-chat.bin}'
- 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
- 'WAIT_HOSTS=llama-gpt-api:8000'
- 'WAIT_TIMEOUT=${WAIT_TIMEOUT:-3600}'
# llama-gpt-ui:
# # TODO: Use this image instead of building from source after the next release
# # image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
# build:
# context: ./ui
# dockerfile: Dockerfile
# ports:
# - 3000:3000
# restart: on-failure
# environment:
# - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
# - 'OPENAI_API_HOST=http://llama-gpt-api:8000'
# - 'DEFAULT_MODEL=/models/${MODEL_NAME:-llama-2-7b-chat.bin}'
# - 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
# - 'WAIT_HOSTS=llama-gpt-api:8000'
# - 'WAIT_TIMEOUT=${WAIT_TIMEOUT:-3600}'
1 change: 1 addition & 0 deletions docker-compose-mac.yml → docker-compose-mac-ui.yml
@@ -11,5 +11,6 @@ services:
environment:
- 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
- 'OPENAI_API_HOST=http://host.docker.internal:3001'
- 'MODEL_MANAGER_ENDPOINT=http://host.docker.internal:3002'
- 'DEFAULT_MODEL=$MODEL'
- 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely and use markdown if responding with code."}'
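`docker-compose-mac.yml` is renamed to `docker-compose-mac-ui.yml` and now points the UI at a model-manager service via `MODEL_MANAGER_ENDPOINT` on port 3002. The manager's HTTP routes are not part of this diff, so the most that can be sketched here is a reachability check from the host, assuming the service is exposed on that port as configured:

```
# Sketch only: probe the base address behind MODEL_MANAGER_ENDPOINT.
# host.docker.internal:3002 inside the container maps to port 3002 on the
# host, so from the host itself this is just localhost:3002. The concrete
# API routes are not shown in this PR.
curl -v http://localhost:3002/
```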
46 changes: 46 additions & 0 deletions docker-compose-rocm-ggml.yml
@@ -0,0 +1,46 @@
version: '3.6'

services:
llama-gpt-api-rocm-ggml:
build:
context: ./rocm
dockerfile: ggml.Dockerfile
restart: on-failure
volumes:
- './models:/models'
- './rocm:/rocm'
ports:
- 3001:8000
environment:
MODEL: '/models/${MODEL_NAME:-llama-2-7b-chat.bin}'
MODEL_DOWNLOAD_URL: '${MODEL_DOWNLOAD_URL:-https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GGML/resolve/main/nous-hermes-llama-2-7b.ggmlv3.q4_0.bin}'
N_GQA: '${N_GQA:-1}'
USE_MLOCK: 1
cap_add:
- IPC_LOCK
- SYS_RESOURCE
command: '/bin/sh /rocm/run.sh'
deploy:
resources:
reservations:
devices:
- driver: amdgpu
count: 1
capabilities: [gpu]

llama-gpt-ui:
# TODO: Use this image instead of building from source after the next release
# image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
build:
context: ./ui
dockerfile: Dockerfile
ports:
- 3000:3000
restart: on-failure
environment:
- 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
- 'OPENAI_API_HOST=http://llama-gpt-api-rocm-ggml:8000'
- 'DEFAULT_MODEL=/models/${MODEL_NAME:-llama-2-7b-chat.bin}'
- 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
- 'WAIT_HOSTS=llama-gpt-api-rocm-ggml:8000'
- 'WAIT_TIMEOUT=${WAIT_TIMEOUT:-3600}'
46 changes: 46 additions & 0 deletions docker-compose-rocm-gguf.yml
@@ -0,0 +1,46 @@
version: '3.6'

services:
llama-gpt-api-rocm-gguf:
build:
context: ./rocm
dockerfile: gguf.Dockerfile
restart: on-failure
volumes:
- './models:/models'
- './rocm:/rocm'
ports:
- 3001:8000
environment:
MODEL: '/models/${MODEL_NAME:-code-llama-2-7b-chat.gguf}'
MODEL_DOWNLOAD_URL: '${MODEL_DOWNLOAD_URL:-https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q4_K_M.gguf}'
N_GQA: '${N_GQA:-1}'
USE_MLOCK: 1
cap_add:
- IPC_LOCK
- SYS_RESOURCE
command: '/bin/sh /rocm/run.sh'
deploy:
resources:
reservations:
devices:
- driver: amdgpu
count: 1
capabilities: [gpu]

llama-gpt-ui:
# TODO: Use this image instead of building from source after the next release
# image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
build:
context: ./ui
dockerfile: Dockerfile
ports:
- 3000:3000
restart: on-failure
environment:
- 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
- 'OPENAI_API_HOST=http://llama-gpt-api-rocm-gguf:8000'
- 'DEFAULT_MODEL=/models/${MODEL_NAME:-code-llama-2-7b-chat.gguf}'
- 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
- 'WAIT_HOSTS=llama-gpt-api-rocm-gguf:8000'
- 'WAIT_TIMEOUT=${WAIT_TIMEOUT:-3600}'
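The two new ROCm compose files mirror the CUDA ones for AMD GPUs. A `--with-rocm`-style wrapper flag is not shown in this diff, so the sketch below assumes the files are started directly with docker compose:

```
# Sketch only: bring up the GGUF API with AMD ROCm support by pointing
# docker compose at the new file (use docker-compose-rocm-ggml.yml for GGML).
docker compose -f docker-compose-rocm-gguf.yml up --build
```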
18 changes: 18 additions & 0 deletions docker-compose-ui.yml
@@ -0,0 +1,18 @@
version: '3.6'

services:
llama-gpt-ui:
# TODO: Use this image instead of building from source after the next release
# image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
build:
context: ./ui
dockerfile: Dockerfile
ports:
- 3000:3000
restart: on-failure
environment:
- 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
- 'OPENAI_API_HOST=http://llama-gpt-api:8000'
- 'MODEL_MANAGER_ENDPOINT=http://host.docker.internal:3002'
- 'DEFAULT_MODEL=/models/${MODEL_NAME:-llama-2-7b-chat.bin}'
- 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
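With the UI service commented out of the main compose files (as in `docker-compose.yml` below), this standalone file appears to be how the web UI is started on its own. A minimal sketch, assuming it is used directly alongside whichever API compose file is already running:

```
# Sketch only: run the web UI from its own compose file; it serves port 3000
# and talks to the API and model-manager endpoints configured above.
docker compose -f docker-compose-ui.yml up --build
```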
32 changes: 16 additions & 16 deletions docker-compose.yml
@@ -19,19 +19,19 @@ services:
- IPC_LOCK
command: '/bin/sh /api/run.sh'

llama-gpt-ui:
# TODO: Use this image instead of building from source after the next release
# image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
build:
context: ./ui
dockerfile: Dockerfile
ports:
- 3000:3000
restart: on-failure
environment:
- 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
- 'OPENAI_API_HOST=http://llama-gpt-api:8000'
- 'DEFAULT_MODEL=/models/${MODEL_NAME:-llama-2-7b-chat.bin}'
- 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
- 'WAIT_HOSTS=llama-gpt-api:8000'
- 'WAIT_TIMEOUT=${WAIT_TIMEOUT:-3600}'
# llama-gpt-ui:
# # TODO: Use this image instead of building from source after the next release
# # image: 'ghcr.io/getumbrel/llama-gpt-ui:latest'
# build:
# context: ./ui
# dockerfile: Dockerfile
# ports:
# - 3000:3000
# restart: on-failure
# environment:
# - 'OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXX'
# - 'OPENAI_API_HOST=http://llama-gpt-api:8000'
# - 'DEFAULT_MODEL=/models/${MODEL_NAME:-llama-2-7b-chat.bin}'
# - 'NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=${DEFAULT_SYSTEM_PROMPT:-"You are a helpful and friendly AI assistant. Respond very concisely."}'
# - 'WAIT_HOSTS=llama-gpt-api:8000'
# - 'WAIT_TIMEOUT=${WAIT_TIMEOUT:-3600}'