diff --git a/docs/content/docs/features/GPU-acceleration.md b/docs/content/docs/features/GPU-acceleration.md
index cef08e71fb62..7d1f7e2e81ef 100644
--- a/docs/content/docs/features/GPU-acceleration.md
+++ b/docs/content/docs/features/GPU-acceleration.md
@@ -15,9 +15,45 @@ This section contains instruction on how to use LocalAI with GPU acceleration.
 
 For accelleration for AMD or Metal HW there are no specific container images, see the [build]({{%relref "docs/getting-started/build#Acceleration" %}})
 {{% /alert %}}
 
-### CUDA(NVIDIA) acceleration
-#### Requirements
+## Model configuration
+
+Depending on the model architecture and backend used, there might be different ways to enable GPU acceleration. You need to configure the model you intend to use with a YAML config file. For example, for `llama.cpp` workloads a configuration file might look like this (where `gpu_layers` is the number of layers to offload to the GPU):
+
+```yaml
+name: my-model-name
+# Default model parameters
+parameters:
+  # Relative to the models path
+  model: llama.cpp-model.ggmlv3.q5_K_M.bin
+
+context_size: 1024
+threads: 1
+
+f16: true # enable with GPU acceleration
+gpu_layers: 22 # GPU Layers (only used when built with cublas)
+
+```
+
+For diffusers, the configuration might look like this instead:
+
+```yaml
+name: stablediffusion
+parameters:
+  model: toonyou_beta6.safetensors
+backend: diffusers
+step: 30
+f16: true
+diffusers:
+  pipeline_type: StableDiffusionPipeline
+  cuda: true
+  enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
+  scheduler_type: "k_dpmpp_sde"
+```
+
+## CUDA(NVIDIA) acceleration
+
+### Requirements
 
 Requirement: nvidia-container-toolkit (installation instructions [1](https://www.server-world.info/en/note?os=Ubuntu_22.04&p=nvidia&f=2) [2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
 
@@ -74,37 +110,24 @@ llama_model_load_internal: total VRAM used: 1598 MB
 llama_init_from_file: kv self size = 512.00 MB
 ```
 
-#### Model configuration
+## Intel acceleration (sycl)
 
-Depending on the model architecture and backend used, there might be different ways to enable GPU acceleration. It is required to configure the model you intend to use with a YAML config file. For example, for `llama.cpp` workloads a configuration file might look like this (where `gpu_layers` is the number of layers to offload to the GPU):
+### Requirements
 
-```yaml
-name: my-model-name
-# Default model parameters
-parameters:
-  # Relative to the models path
-  model: llama.cpp-model.ggmlv3.q5_K_M.bin
+Requirement: [Intel oneAPI Base Toolkit](https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html)
 
-context_size: 1024
-threads: 1
+To use SYCL, use the container images with the `sycl` tag.
 
-f16: true # enable with GPU acceleration
-gpu_layers: 22 # GPU Layers (only used when built with cublas)
+The image list is available on [quay](https://quay.io/repository/go-skynet/local-ai?tab=tags):
 
-```
+
+- SYCL tags: `master-sycl`, `v1.40.0-sycl`, ...
+- SYCL + FFmpeg tags: `master-sycl-ffmpeg`, `v1.40.0-sycl-ffmpeg`, ...
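+
+Before starting the container, you can optionally check that the host exposes the GPU through the DRM device nodes that will be passed through (`/dev/dri`); a minimal sketch of such a check, assuming a standard Linux setup where the Intel GPU shows up under `/dev/dri`:
+
+```bash
+# the card*/renderD* nodes listed here are what --device /dev/dri hands to the container
+ls -l /dev/dri
+```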
 
-For diffusers instead, it might look like this instead:
+### Notes
+
+In addition to the usual commands to run LocalAI, you need to pass `--device /dev/dri` to Docker, for example:
+
+```bash
+docker run --rm -ti --device /dev/dri -p 8080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=1 -v $PWD/models:/models quay.io/go-skynet/local-ai:v1.40.0-sycl
+```
 
-```yaml
-name: stablediffusion
-parameters:
-  model: toonyou_beta6.safetensors
-backend: diffusers
-step: 30
-f16: true
-diffusers:
-  pipeline_type: StableDiffusionPipeline
-  cuda: true
-  enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
-  scheduler_type: "k_dpmpp_sde"
-```
\ No newline at end of file
diff --git a/docs/content/docs/getting-started/build.md b/docs/content/docs/getting-started/build.md
index c2a7e0eef8b3..b26a16d7e8c4 100644
--- a/docs/content/docs/getting-started/build.md
+++ b/docs/content/docs/getting-started/build.md
@@ -83,7 +83,7 @@ Here is the list of the variables available that can be used to customize the bu
 
 | Variable | Default | Description |
 | ---------------------| ------- | ----------- |
-| `BUILD_TYPE` | None | Build type. Available: `cublas`, `openblas`, `clblas`, `metal`,`hipblas` |
+| `BUILD_TYPE` | None | Build type. Available: `cublas`, `openblas`, `clblas`, `metal`, `hipblas`, `sycl_f16`, `sycl_f32` |
 | `GO_TAGS` | `tts stablediffusion` | Go tags. Available: `stablediffusion`, `tts`, `tinydream` |
 | `CLBLAST_DIR` | | Specify a CLBlast directory |
 | `CUDA_LIBPATH` | | Specify a CUDA library path |
@@ -225,6 +225,17 @@ make BUILD_TYPE=clblas build
 
 To specify a clblast dir set: `CLBLAST_DIR`
 
+#### Intel GPU acceleration
+
+Intel GPU acceleration is supported via SYCL.
+
+Requirements: [Intel oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html) (see also the [llama.cpp SYCL setup instructions](https://github.com/ggerganov/llama.cpp/blob/d71ac90985854b0905e1abba778e407e17f9f887/README-sycl.md?plain=1#L56))
+
+```
+make BUILD_TYPE=sycl_f16 build # for float16
+make BUILD_TYPE=sycl_f32 build # for float32
+```
+
 #### Metal (Apple Silicon)
 
 ```