docs: add sycl
mudler committed Feb 1, 2024
1 parent 6bb8e36 commit a953219
Showing 2 changed files with 64 additions and 30 deletions.
81 changes: 52 additions & 29 deletions docs/content/docs/features/GPU-acceleration.md
@@ -15,9 +15,45 @@ This section contains instructions on how to use LocalAI with GPU acceleration.
For acceleration on AMD or Metal hardware there are no specific container images; see the [build]({{%relref "docs/getting-started/build#Acceleration" %}}) section.
{{% /alert %}}

## Model configuration

Depending on the model architecture and the backend used, there may be different ways to enable GPU acceleration. The model you intend to use must be configured with a YAML config file. For example, for `llama.cpp` workloads a configuration file might look like this (where `gpu_layers` is the number of layers to offload to the GPU):

```yaml
name: my-model-name
# Default model parameters
parameters:
  # Relative to the models path
  model: llama.cpp-model.ggmlv3.q5_K_M.bin

context_size: 1024
threads: 1

f16: true # enable with GPU acceleration
gpu_layers: 22 # GPU Layers (only used when built with cublas)

```
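
Once the model is configured and LocalAI is running (see the acceleration sections below), a quick way to check the setup is to send a request through the OpenAI-compatible API and watch the startup logs for the VRAM usage lines shown later in this page. A minimal sketch, assuming the server listens on port 8080 as in the docker examples below and that the model name matches the config above:

```bash
# Query the model defined above via the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model-name",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```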

For diffusers, the configuration might look like this instead:

```yaml
name: stablediffusion
parameters:
  model: toonyou_beta6.safetensors
backend: diffusers
step: 30
f16: true
diffusers:
  pipeline_type: StableDiffusionPipeline
  cuda: true
  enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
  scheduler_type: "k_dpmpp_sde"
```
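
Similarly, a request sketch for the diffusers configuration above, assuming the server is reachable on port 8080; the endpoint mirrors the OpenAI images API:

```bash
# Generate an image with the stablediffusion model defined above
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stablediffusion",
    "prompt": "a cute baby sea otter",
    "size": "512x512"
  }'
```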

## CUDA(NVIDIA) acceleration

### Requirements

Requirement: nvidia-container-toolkit (installation instructions [1](https://www.server-world.info/en/note?os=Ubuntu_22.04&p=nvidia&f=2) [2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
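
To confirm that the toolkit is wired up before starting LocalAI, a quick check can help (any small image works; `ubuntu` is used here only as an example):

```bash
# Should print the nvidia-smi GPU table if nvidia-container-toolkit is installed correctly
docker run --rm --gpus all ubuntu nvidia-smi
```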

@@ -74,37 +110,24 @@ llama_model_load_internal: total VRAM used: 1598 MB
llama_init_from_file: kv self size = 512.00 MB
```

## Intel acceleration (SYCL)

### Requirements

Requirement: [Intel oneAPI Base Toolkit](https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html)

To use SYCL, use the container images tagged with `sycl`.

The image list is on [quay](https://quay.io/repository/go-skynet/local-ai?tab=tags):

- SYCL tags: `master-sycl`, `v1.40.0-sycl`, ...
- SYCL + FFmpeg tags: `master-sycl-ffmpeg`, `v1.40.0-sycl-ffmpeg`, ...
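
For example, to pull one of the SYCL-enabled images listed above (the tag is illustrative; check quay for the current list):

```bash
docker pull quay.io/go-skynet/local-ai:master-sycl-ffmpeg
```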

### Notes

In addition to the commands used to run LocalAI normally, you need to pass `--device /dev/dri` to docker, for example:

```bash
docker run --rm -ti --device /dev/dri -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -v $PWD/models:/models quay.io/go-skynet/local-ai:v1.40.0-sycl
```
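
If the container cannot see the GPU, it is worth checking that the Intel render device exists on the host and is visible inside the container; a small sketch (device names such as `renderD128` may differ per system):

```bash
# The DRI render node must exist on the host for --device /dev/dri to work
ls -l /dev/dri
# Verify the same devices are visible from inside a container
docker run --rm --device /dev/dri ubuntu ls -l /dev/dri
```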

13 changes: 12 additions & 1 deletion docs/content/docs/getting-started/build.md
@@ -83,7 +83,7 @@ Here is the list of the variables available that can be used to customize the build:

| Variable | Default | Description |
| ---------------------| ------- | ----------- |
| `BUILD_TYPE` | None | Build type. Available: `cublas`, `openblas`, `clblas`, `metal`, `hipblas`, `sycl_f16`, `sycl_f32` |
| `GO_TAGS` | `tts stablediffusion` | Go tags. Available: `stablediffusion`, `tts`, `tinydream` |
| `CLBLAST_DIR` | | Specify a CLBlast directory |
| `CUDA_LIBPATH` | | Specify a CUDA library path |
@@ -225,6 +225,17 @@ make BUILD_TYPE=clblas build

To specify a CLBlast directory, set `CLBLAST_DIR`.

#### Intel GPU acceleration

Intel GPU acceleration is supported via SYCL.

Requirements: [Intel oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html) (see also the [llama.cpp SYCL setup instructions](https://github.com/ggerganov/llama.cpp/blob/d71ac90985854b0905e1abba778e407e17f9f887/README-sycl.md?plain=1#L56))

```
make BUILD_TYPE=sycl_f16 build # for float16
make BUILD_TYPE=sycl_f32 build # for float32
```
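
The SYCL build expects the oneAPI compilers to be on the PATH; a sketch assuming the toolkit is installed in its default location:

```bash
# Load the Intel oneAPI environment (compilers, oneMKL) into the current shell
source /opt/intel/oneapi/setvars.sh
make BUILD_TYPE=sycl_f16 build
```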

#### Metal (Apple Silicon)

```
