Enabling ollama to run on Intel GPUs with SYCL backend #3278
Conversation
force-pushed from edfbd44 to 3e1cc67
@jmorganca @mxyng Could you give a review?
force-pushed from fc7bffd to 3f06c4a
I tried this on my Intel integrated GPU. I am able to build and run llama.cpp with Intel GPU support without too many problems by following this tutorial: https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md

docker run -p 8080:8080 -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card0:/dev/dri/card0 llama-cpp-sycl-server -m "/app/models/orca-2-13b.Q5_K_M.gguf" -c 512 --host 0.0.0.0 --port 8080 -ngl 41
ggml_init_sycl: GGML_SYCL_DEBUG: 0
ggml_init_sycl: GGML_SYCL_F16: no
found 4 SYCL devices:
| | | |Compute |Max compute|Max work|Max sub| |
|ID| Device Type| Name|capability|units |group |group |Global mem size|
|--|------------------|---------------------------------------------|----------|-----------|--------|-------|---------------|
| 0|[level_zero:gpu:0]| Intel(R) Iris(R) Xe Graphics| 1.3| 96| 512| 32| 26669551616|
| 1| [opencl:gpu:0]| Intel(R) Iris(R) Xe Graphics| 3.0| 96| 512| 32| 26669551616|
| 2| [opencl:cpu:0]|11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz| 3.0| 8| 8192| 64| 33336942592|
| 3| [opencl:acc:0]| Intel(R) FPGA Emulation Device| 1.2| 8|67108864| 64| 33336942592|
{"build":0,"commit":"unknown","function":"main","level":"INFO","line":2756,"msg":"build info","tid":"138119685482496","timestamp":1711301931}
{"function":"main","level":"INFO","line":2763,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"138119685482496","timestamp":1711301931,"total_threads":8}
llama_model_loader: loaded meta data with 22 key-value pairs and 363 tensors from /app/models/orca-2-13b.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = LLaMA v2
llama_model_loader: - kv 2: llama.context_length u32 = 4096
llama_model_loader: - kv 3: llama.embedding_length u32 = 5120
llama_model_loader: - kv 4: llama.block_count u32 = 40
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 13824
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 40
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 40
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 17
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32003] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32003] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32003] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 20: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q5_K: 241 tensors
llama_model_loader: - type q6_K: 41 tensors
llm_load_vocab: special tokens definition check successful ( 262/32003 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32003
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 40
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 5120
llm_load_print_meta: n_embd_v_gqa = 5120
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = Q5_K - Medium
llm_load_print_meta: model params = 13.02 B
llm_load_print_meta: model size = 8.60 GiB (5.67 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
ggml_backend_sycl_set_mul_device_mode: true
detect 1 SYCL GPUs: [0] with top Max compute units:96
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
llm_load_tensors: ggml ctx size = 0.28 MiB
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors: SYCL0 buffer size = 8694.22 MiB
llm_load_tensors: CPU buffer size = 107.43 MiB
...................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: SYCL0 KV buffer size = 400.00 MiB
llama_new_context_with_model: KV self size = 400.00 MiB, K (f16): 200.00 MiB, V (f16): 200.00 MiB
llama_new_context_with_model: SYCL_Host output buffer size = 62.51 MiB
llama_new_context_with_model: SYCL0 compute buffer size = 81.00 MiB
llama_new_context_with_model: SYCL_Host compute buffer size = 11.00 MiB
llama_new_context_with_model: graph nodes = 1324
llama_new_context_with_model: graph splits = 2

But the version of llama.cpp started by Ollama fails to detect/use my GPU:

./ollama serve
time=2024-03-24T11:21:16.096-06:00 level=INFO source=images.go:863 msg="total blobs: 6"
time=2024-03-24T11:21:16.096-06:00 level=INFO source=images.go:870 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] POST /api/pull --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST /api/generate --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST /api/chat --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST /api/create --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST /api/push --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST /api/copy --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST /api/show --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/jmorganca/ollama/server.ChatHandler (6 handlers)
[GIN-debug] GET / --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] GET /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET /api/version --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] HEAD / --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
time=2024-03-24T11:21:16.097-06:00 level=INFO source=routes.go:999 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2024-03-24T11:21:16.097-06:00 level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-03-24T11:21:16.178-06:00 level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [oneapi cpu_avx2 cpu_avx cpu]"
time=2024-03-24T11:21:16.178-06:00 level=INFO source=gpu.go:105 msg="Detecting GPU type"
time=2024-03-24T11:21:16.178-06:00 level=INFO source=gpu.go:285 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-24T11:21:16.182-06:00 level=INFO source=gpu.go:331 msg="Discovered GPU libraries: []"
time=2024-03-24T11:21:16.182-06:00 level=INFO source=gpu.go:285 msg="Searching for GPU management library librocm_smi64.so"
time=2024-03-24T11:21:16.183-06:00 level=INFO source=gpu.go:331 msg="Discovered GPU libraries: []"
time=2024-03-24T11:21:16.183-06:00 level=INFO source=gpu.go:285 msg="Searching for GPU management library libze_intel_gpu.so"
time=2024-03-24T11:21:16.187-06:00 level=INFO source=gpu.go:331 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1.3.27191.42]"
time=2024-03-24T11:21:16.209-06:00 level=INFO source=gpu.go:130 msg="Intel GPU detected"
time=2024-03-24T11:21:16.209-06:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-24T11:21:16.210-06:00 level=INFO source=routes.go:1022 msg="no GPU detected"

Probably due to the unified memory of the integrated GPU. I am not experienced with Go or with this project, but a look into the routes.go file gives me the impression that the VRAM check fails because of the unified memory:

1019 if runtime.GOOS == "linux" { // TODO - windows too
1020     // check compatibility to log warnings
1021     if _, err := gpu.CheckVRAM(); err != nil {
1022         slog.Info(err.Error())
1023     }
1024 }

Any suggestions what I could do to work around this?

Update: Poking around a little bit shows that I seem to be on the right path with the unified memory. Here it states:
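If unified memory is indeed the culprit, one possible direction is to have the VRAM check fall back to a share of system RAM for integrated GPUs instead of erroring out. A minimal sketch — `gpuMem`, `checkVRAM`, and the half-of-RAM budget are all invented for illustration, not ollama's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// gpuMem is a hypothetical view of what GPU discovery might report.
type gpuMem struct {
	dedicatedVRAM uint64 // bytes of dedicated VRAM; often 0 on iGPUs
	systemRAM     uint64 // total system RAM in bytes
	integrated    bool   // true for iGPUs sharing memory with the host
}

// checkVRAM returns the memory budget the runner may use. For a
// unified-memory integrated GPU it falls back to a conservative share
// of system RAM instead of failing with "no GPU detected".
func checkVRAM(g gpuMem) (uint64, error) {
	if g.dedicatedVRAM > 0 {
		return g.dedicatedVRAM, nil
	}
	if g.integrated && g.systemRAM > 0 {
		return g.systemRAM / 2, nil // half of unified memory, arbitrary cap
	}
	return 0, errors.New("no usable GPU memory detected")
}

func main() {
	igpu := gpuMem{dedicatedVRAM: 0, systemRAM: 32 << 30, integrated: true}
	budget, err := checkVRAM(igpu)
	fmt.Println(budget, err) // 17179869184 <nil> (i.e. 16 GiB)
}
```

A real implementation would fill these fields from the Level Zero/Sysman queries shown in the logs above rather than from hard-coded values.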
@semidark Yes, I think there is some confusion in the logic here; we will refactor later to align with the llama.cpp side.
@zhewang1-intc I just got my hands on an Intel ARC GPU. My plan is to get it installed in a system and try this PR out this week, then I'll do a code review for you.

Update: It looks like the ARC GPUs don't work on the older motherboard/CPU combos I have at the moment, so I'll need to source a newer setup to validate it. Hopefully next week I can get it up and running.
force-pushed from 9dd2b62 to 384bc56
I haven't looked through the code in depth yet, but when I tried to run this on my test system, something doesn't work correctly during discovery.
time=2024-04-01T22:05:13.407Z level=DEBUG source=gpu.go:332 msg="gpu management search paths: [/usr/lib/x86_64-linux-gnu/libze_intel_gpu.so* /usr/lib*/libze_intel_gpu.so* /home/daniel/libze_intel_gpu.so*]"
time=2024-04-01T22:05:13.408Z level=INFO source=gpu.go:360 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1.3.28202.39]"
wiring Level-Zero management library functions in /usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1.3.28202.39
dlsym: zesInit
dlsym: zesDriverGet
dlsym: zesDeviceGet
dlsym: zesDeviceGetProperties
dlsym: zesDeviceEnumMemoryModules
dlsym: zesMemoryGetProperties
dlsym: zesMemoryGetState
zesInit err: 2013265921
time=2024-04-01T22:05:13.430Z level=INFO source=gpu.go:406 msg="Unable to load oneAPI management library /usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1.3.28202.39: oneapi vram init failure: 2013265921"
I was trying to emulate a user environment (not a developer one), so I only installed the driver packages for Ubuntu.
Dockerfile (outdated):
COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
We recently cleaned up the package paths, so this is stale. Replace jmorganca with ollama.
It seems that the err code
Hi, it should work now after ggerganov/llama.cpp#6435 is merged.
force-pushed from 6b1fb68 to aaf0a67
force-pushed from e8aa4a5 to ac30dc4
force-pushed from f66d6f5 to ca1ac65
force-pushed from 64af97c to b4c7cce
@dhiltgen Hi, could you please take a review when you are free, so we can improve this PR?
Since I'm having some difficulty getting this running successfully on my test system, let me suggest we break this into 2 pieces to reduce the rebase churn so we can make incremental progress. Let's focus this PR on the base enablement with the
CC=icx
CMAKE_DEFS="${COMMON_CMAKE_DEFS} ${CMAKE_DEFS} -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_SYCL=ON -DLLAMA_SYCL_F16=OFF"
BUILD_DIR="../build/linux/${ARCH}/oneapi"
EXTRA_LIBS="-fsycl -Wl,-rpath,${ONEAPI_ROOT}/compiler/latest/lib,-rpath,${ONEAPI_ROOT}/mkl/latest/lib,-rpath,${ONEAPI_ROOT}/tbb/latest/lib,-rpath,${ONEAPI_ROOT}/compiler/latest/opt/oclfpga/linux64/lib -lOpenCL -lmkl_core -lmkl_sycl_blas -lmkl_intel_ilp64 -lmkl_tbb_thread -ltbb"
This feels like it may be problematic if the user doesn't have oneAPI installed at the exact same location as the build system. I think we'll eventually need to carry these libraries as dependency payloads like we do with CUDA and ROCm (assuming that's permitted).
As an incremental step before we expose this in the official builds, this may be OK, although I see you are copying the libraries below. Ideally we'd set this up so the user only needs the driver installed on the host, and we use the user-space library from our build to ensure things are linked properly.
The libraries copied to the ${BUILD_DIR}/bin directory contain the essential oneAPI dependencies required for running llama.cpp. There's no need for users to install oneAPI themselves; the build system handles this dependency (which means the build system itself must have oneAPI installed).
When the ollama binary executes, these dependencies are extracted to a temporary location on the user's machine. Upon detecting an Intel GPU driver, we append the path to this temporary directory (containing the oneAPI dependencies) to the LD_LIBRARY_PATH environment variable. This ensures that even if oneAPI isn't installed locally, the program can still locate the necessary libraries to function.
In that case, I don't think this rpath is what we want: it sets where the loader finds the libraries at runtime and assumes you have installed the libs in the same location as the build system. We likely want a relative path based on $ORIGIN, like we're doing with ROCm. We can tidy this up in a follow-up.
llm/server.go (outdated):
if strings.HasPrefix(servers[i], "oneapi") {
	os.Setenv("Path", os.Getenv("Path")+dir)
	slog.Debug("append oneapi lib in Path env:", os.Getenv("Path"))
}
This doesn't seem necessary. This block should cover it, and if you need additional deps wired up, use GpuInfo.DependencyPath.
Yes, you are right. I removed it, and ollama still works even if we don't source the oneAPI environment script.
force-pushed from 812e8f2 to fd5971b
@dhiltgen Is anything needed before this can be merged?
I'll merge this once we finalize the 0.1.39 release.
Hi, I am submitting this PR to enable ollama to run on Intel GPUs with SYCL as the backend. This PR was originally started by @felipeagc, who is currently unable to participate actively due to relocation.
The original PR had fallen behind the main branch, making it inconvenient for maintainers @mxyng @jmorganca @dhiltgen to review. Therefore, I rebased onto the latest main branch and opened this new pull request. I have verified that it works correctly on Ubuntu 22.04 with an ARC 770 GPU.
I am not very familiar with this project yet, so I welcome any guidance and assistance from the community. Let's work together to make ollama support Intel GPU platforms. cc: @hshen14 @kevinintel @airMeng
UPDATE: works well on Windows 10 + ARC 770
UPDATE: works well on the oneAPI Docker image (oneapi-basekit, Ubuntu 22.04) + ARC 770