
Enabling ollama to run on Intel GPUs with SYCL backend #3278

Merged: 1 commit merged into ollama:main on May 28, 2024

Conversation

@zhewang1-intc (Contributor) commented Mar 21, 2024

Hi, I am submitting this PR to enable ollama to run on Intel GPUs with SYCL as the backend. This PR was originally started by @felipeagc, who is currently unable to actively participate due to relocation.
The original PR had fallen behind the main branch, making it inconvenient for the maintainers @mxyng @jmorganca @dhiltgen to review. Therefore, I rebased onto the latest main branch and opened this new pull request. I have verified that it works correctly on Ubuntu 22.04 with an ARC 770 GPU.
I am not very familiar with this project yet, so I welcome any guidance and assistance from the community. Let's work together to make ollama support Intel GPU platforms. cc: @hshen14 @kevinintel @airMeng

UPDATE: works well on Windows 10 + ARC 770
UPDATE: works well in the oneAPI Docker image (oneapi-basekit, Ubuntu 22.04) + ARC 770

@airMeng commented Mar 21, 2024

@jmorganca @mxyng Could you give a review?

@semidark commented Mar 24, 2024

I tried this on my Intel integrated GPU. I am able to build and run llama.cpp with Intel GPU support without too many problems by following this tutorial: https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md

docker run -p 8080:8080 -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card0:/dev/dri/card0 llama-cpp-sycl-server -m "/app/models/orca-2-13b.Q5_K_M.gguf" -c 512 --host 0.0.0.0 --port 8080 -ngl 41
ggml_init_sycl: GGML_SYCL_DEBUG: 0
ggml_init_sycl: GGML_SYCL_F16: no
found 4 SYCL devices:
|  |                  |                                             |Compute   |Max compute|Max work|Max sub|               |
|ID|       Device Type|                                         Name|capability|units      |group   |group  |Global mem size|
|--|------------------|---------------------------------------------|----------|-----------|--------|-------|---------------|
| 0|[level_zero:gpu:0]|                 Intel(R) Iris(R) Xe Graphics|       1.3|         96|     512|     32|    26669551616|
| 1|    [opencl:gpu:0]|                 Intel(R) Iris(R) Xe Graphics|       3.0|         96|     512|     32|    26669551616|
| 2|    [opencl:cpu:0]|11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz|       3.0|          8|    8192|     64|    33336942592|
| 3|    [opencl:acc:0]|               Intel(R) FPGA Emulation Device|       1.2|          8|67108864|     64|    33336942592|
{"build":0,"commit":"unknown","function":"main","level":"INFO","line":2756,"msg":"build info","tid":"138119685482496","timestamp":1711301931}
{"function":"main","level":"INFO","line":2763,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"138119685482496","timestamp":1711301931,"total_threads":8}
llama_model_loader: loaded meta data with 22 key-value pairs and 363 tensors from /app/models/orca-2-13b.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 5120
llama_model_loader: - kv   4:                          llama.block_count u32              = 40
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 13824
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 40
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 40
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 17
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32003]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32003]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32003]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  18:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  19:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  20:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   81 tensors
llama_model_loader: - type q5_K:  241 tensors
llama_model_loader: - type q6_K:   41 tensors
llm_load_vocab: special tokens definition check successful ( 262/32003 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32003
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 5120
llm_load_print_meta: n_head           = 40
llm_load_print_meta: n_head_kv        = 40
llm_load_print_meta: n_layer          = 40
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 5120
llm_load_print_meta: n_embd_v_gqa     = 5120
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 13824
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 13B
llm_load_print_meta: model ftype      = Q5_K - Medium
llm_load_print_meta: model params     = 13.02 B
llm_load_print_meta: model size       = 8.60 GiB (5.67 BPW) 
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
ggml_backend_sycl_set_mul_device_mode: true
detect 1 SYCL GPUs: [0] with top Max compute units:96
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
llm_load_tensors: ggml ctx size =    0.28 MiB
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors:      SYCL0 buffer size =  8694.22 MiB
llm_load_tensors:        CPU buffer size =   107.43 MiB
...................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      SYCL0 KV buffer size =   400.00 MiB
llama_new_context_with_model: KV self size  =  400.00 MiB, K (f16):  200.00 MiB, V (f16):  200.00 MiB
llama_new_context_with_model:  SYCL_Host  output buffer size =    62.51 MiB
llama_new_context_with_model:      SYCL0 compute buffer size =    81.00 MiB
llama_new_context_with_model:  SYCL_Host compute buffer size =    11.00 MiB
llama_new_context_with_model: graph nodes  = 1324
llama_new_context_with_model: graph splits = 2

But the version of llama.cpp started by Ollama fails to detect/use my GPU.

./ollama serve
time=2024-03-24T11:21:16.096-06:00 level=INFO source=images.go:863 msg="total blobs: 6"
time=2024-03-24T11:21:16.096-06:00 level=INFO source=images.go:870 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/jmorganca/ollama/server.ChatHandler (6 handlers)
[GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
time=2024-03-24T11:21:16.097-06:00 level=INFO source=routes.go:999 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2024-03-24T11:21:16.097-06:00 level=INFO source=payload_common.go:106 msg="Extracting dynamic libraries..."
time=2024-03-24T11:21:16.178-06:00 level=INFO source=payload_common.go:145 msg="Dynamic LLM libraries [oneapi cpu_avx2 cpu_avx cpu]"
time=2024-03-24T11:21:16.178-06:00 level=INFO source=gpu.go:105 msg="Detecting GPU type"
time=2024-03-24T11:21:16.178-06:00 level=INFO source=gpu.go:285 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-24T11:21:16.182-06:00 level=INFO source=gpu.go:331 msg="Discovered GPU libraries: []"
time=2024-03-24T11:21:16.182-06:00 level=INFO source=gpu.go:285 msg="Searching for GPU management library librocm_smi64.so"
time=2024-03-24T11:21:16.183-06:00 level=INFO source=gpu.go:331 msg="Discovered GPU libraries: []"
time=2024-03-24T11:21:16.183-06:00 level=INFO source=gpu.go:285 msg="Searching for GPU management library libze_intel_gpu.so"
time=2024-03-24T11:21:16.187-06:00 level=INFO source=gpu.go:331 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1.3.27191.42]"
time=2024-03-24T11:21:16.209-06:00 level=INFO source=gpu.go:130 msg="Intel GPU detected"
time=2024-03-24T11:21:16.209-06:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-24T11:21:16.210-06:00 level=INFO source=routes.go:1022 msg="no GPU detected"

Probably due to the unified memory of the integrated GPU. I am not experienced with Go or this project, but a look into the routes.go file gives me the impression that the VRAM check fails because of the unified memory:

1019         if runtime.GOOS == "linux" { // TODO - windows too
1020                 // check compatibility to log warnings
1021                 if _, err := gpu.CheckVRAM(); err != nil {
1022                         slog.Info(err.Error())
1023                 }
1024         }

Any suggestions on what I could do to work around this?


Update:

Poking around a little bit shows that I seem to be on the right path with the unified memory:
https://github.com/zhewang1-intc/ollama/blob/a6a05dce6d066239db6dcdca7accb494eafbb3ef/gpu/gpu.go#L234

Here it states: // TODO - better handling of CPU based memory determiniation
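
To illustrate what I mean, here is a rough Go sketch of a possible fallback for unified-memory iGPUs (purely hypothetical: the function names and the 50% share are my own guesses, not anything from this PR):

// Hypothetical sketch: if an integrated GPU shares system memory, report a
// slice of system RAM instead of failing the VRAM check outright.
package main

import (
	"fmt"
	"log"

	"golang.org/x/sys/unix"
)

// systemMemoryBytes returns total and free system RAM in bytes (Linux only).
func systemMemoryBytes() (total, free uint64, err error) {
	var info unix.Sysinfo_t
	if err := unix.Sysinfo(&info); err != nil {
		return 0, 0, err
	}
	unit := uint64(info.Unit)
	return uint64(info.Totalram) * unit, uint64(info.Freeram) * unit, nil
}

// iGPUMemoryMiB treats at most half of system RAM as usable by the iGPU.
// The 50% cap is only a guess; a real heuristic would need tuning.
func iGPUMemoryMiB() (totalMiB, freeMiB uint64, err error) {
	total, free, err := systemMemoryBytes()
	if err != nil {
		return 0, 0, err
	}
	return (total / 2) >> 20, (free / 2) >> 20, nil
}

func main() {
	t, f, err := iGPUMemoryMiB()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("unified-memory iGPU fallback: %d MiB total, %d MiB free\n", t, f)
}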

@airMeng commented Mar 24, 2024

@semidark Yes, I think there is some confusion in the logic here; we will refactor it later to align with the llama.cpp side.

@dhiltgen (Collaborator) commented Mar 25, 2024

@zhewang1-intc I just got my hands on an Intel ARC GPU. My plan is to get it installed in a system and try this PR out this week, then I'll do a code review for you.

Update: Looks like the ARC GPUs don't work on the older mobo/CPU combos I have at the moment, so I'll need to source a newer setup to validate it. Hopefully next week I can get it up and running.

@zhewang1-intc force-pushed the rebase_ollama_main branch 2 times, most recently from 9dd2b62 to 384bc56 on March 27, 2024 at 00:49
@dhiltgen (Collaborator) left a comment:

I haven't looked through the code in depth yet, but when I tried to run this on my test system, something didn't work correctly during discovery:

time=2024-04-01T22:05:13.407Z level=DEBUG source=gpu.go:332 msg="gpu management search paths: [/usr/lib/x86_64-linux-gnu/libze_intel_gpu.so* /usr/lib*/libze_intel_gpu.so* /home/daniel/libze_intel_gpu.so*]"
time=2024-04-01T22:05:13.408Z level=INFO source=gpu.go:360 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1.3.28202.39]"
wiring Level-Zero management library functions in /usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1.3.28202.39
dlsym: zesInit
dlsym: zesDriverGet
dlsym: zesDeviceGet
dlsym: zesDeviceGetProperties
dlsym: zesDeviceEnumMemoryModules
dlsym: zesMemoryGetProperties
dlsym: zesMemoryGetState
zesInit err: 2013265921
time=2024-04-01T22:05:13.430Z level=INFO source=gpu.go:406 msg="Unable to load oneAPI management library /usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1.3.28202.39: oneapi vram init failure: 2013265921"

I was trying to emulate a user environment (not a developer one), so I only installed the driver packages for Ubuntu.

Dockerfile (outdated), comment on lines 57 to 58:
COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
@dhiltgen (Collaborator) commented:

We recently cleaned up the package paths, so this is stale. Replace jmorganca with ollama.

@zhewang1-intc (Contributor, Author) commented:


It seems that the error code 2013265921 indicates the ZE_RESULT_ERROR_UNINITIALIZED error in Level Zero, which means the driver is not initialized.
Please make sure that the Intel GPU driver is installed correctly on your machine.
You can refer to the README-sycl in llama.cpp to set up the environment, and make sure that the device check and the examples in the README-sycl work normally, e.g.:
ZES_ENABLE_SYSMAN=1 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
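
For reference, 2013265921 is 0x78000001, which is ZE_RESULT_ERROR_UNINITIALIZED in the Level Zero headers. A tiny illustrative Go snippet to decode the value from the log (the constant is copied from the spec; everything else is just for demonstration):

// Decode the zesInit return code seen in the log above.
package main

import "fmt"

const zeResultErrorUninitialized = 0x78000001 // ZE_RESULT_ERROR_UNINITIALIZED (ze_result_t)

func main() {
	code := uint32(2013265921) // value reported by ollama's oneAPI discovery
	fmt.Printf("zesInit returned 0x%08X\n", code)
	if code == zeResultErrorUninitialized {
		fmt.Println("=> ZE_RESULT_ERROR_UNINITIALIZED: driver not initialized or not installed")
	}
}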

@airMeng commented Apr 3, 2024

Hi, this should work now after ggerganov/llama.cpp#6435 was merged.

@zhewang1-intc (Contributor, Author) commented:

@dhiltgen Hi, could you please take a look when you are free so we can improve this PR?

@dhiltgen (Collaborator) commented May 23, 2024

Since I'm having some difficulty successfully getting this running on my test system, let me suggest we break this into 2 pieces to reduce the rebase churn so we can make incremental progress. Let's focus this PR on the base enablement with the ./gpu/* updates and ./server/* changes so that this can be built from source if you have the libraries on your host, but otherwise is dormant code and not yet part of the binary release we ship. Then create a new PR for the ./docs/* and ./scripts/* changes which we can merge later once things are well tested and solid across a broad set of distros with ~zero manual steps required by the user beyond installing the GPU Driver. I'll take a review pass through this PR focused on the GPU and server changes.

CC=icx
CMAKE_DEFS="${COMMON_CMAKE_DEFS} ${CMAKE_DEFS} -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_SYCL=ON -DLLAMA_SYCL_F16=OFF"
BUILD_DIR="../build/linux/${ARCH}/oneapi"
EXTRA_LIBS="-fsycl -Wl,-rpath,${ONEAPI_ROOT}/compiler/latest/lib,-rpath,${ONEAPI_ROOT}/mkl/latest/lib,-rpath,${ONEAPI_ROOT}/tbb/latest/lib,-rpath,${ONEAPI_ROOT}/compiler/latest/opt/oclfpga/linux64/lib -lOpenCL -lmkl_core -lmkl_sycl_blas -lmkl_intel_ilp64 -lmkl_tbb_thread -ltbb"
@dhiltgen (Collaborator) commented:

This feels like it may be problematic if the user doesn't have oneapi installed at the exact same location as the build system. I think we'll eventually need to carry these libraries as dependency payloads like we do with cuda and rocm. (assuming that's permitted)

As an incremental step before we expose this in the official builds, this may be OK, although I see you are copying the libraries below. Ideally we'd like to set this up so the user only needs the driver installed on the host, and we use the user-space library from our build to ensure things are linked properly.

@zhewang1-intc (Contributor, Author) commented:

The libraries copied to the ${BUILD_DIR}/bin directory contain the essential oneAPI dependencies required for running llama.cpp. There's no need for users to install oneAPI themselves; the build system handles this dependency, which means only the build system needs oneAPI installed.

When the ollama binary executes, these dependencies are extracted to a temporary location on the user's machine. Upon detecting an Intel GPU driver, we append the path to this temporary directory (containing the oneAPI dependencies) to the LD_LIBRARY_PATH environment variable. This ensures that even if oneAPI isn't installed locally, the program can still locate the necessary libraries to function.
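
Roughly, the idea looks like this simplified Go sketch (not the actual ollama code; the directory layout and the binary name are placeholders):

// Simplified sketch: after the bundled oneAPI libraries are extracted to a
// temporary directory, prepend that directory to LD_LIBRARY_PATH for the
// llama.cpp server subprocess so the dynamic loader can resolve them.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// withOneapiLibs returns the subprocess environment with the extracted
// library directory prepended to LD_LIBRARY_PATH.
func withOneapiLibs(extractDir string) []string {
	libDir := filepath.Join(extractDir, "oneapi") // placeholder extraction layout
	ldPath := libDir
	if cur := os.Getenv("LD_LIBRARY_PATH"); cur != "" {
		ldPath = libDir + string(os.PathListSeparator) + cur
	}
	// exec.Cmd uses the last value when a key appears twice, so appending works.
	return append(os.Environ(), "LD_LIBRARY_PATH="+ldPath)
}

func main() {
	cmd := exec.Command("llama-server-binary") // placeholder name
	cmd.Env = withOneapiLibs(os.TempDir())
	fmt.Println("would launch with:", cmd.Env[len(cmd.Env)-1])
}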

@dhiltgen (Collaborator) commented:

In that case, I don't think this rpath is what we want, since it hard-codes where the dynamic loader looks for the libraries at runtime and assumes the libs are installed in the same location as on the build system. We likely want a relative path based on $ORIGIN, like we're doing with ROCm. We can tidy this up in a follow-up.

llm/server.go (outdated), comment on lines 227 to 230:
if strings.HasPrefix(servers[i], "oneapi") {
os.Setenv("Path", os.Getenv("Path")+dir)
slog.Debug("append oneapi lib in Path env:", os.Getenv("Path"))
}
@dhiltgen (Collaborator) commented:

This doesn't seem necessary. This block should cover it, and if you need additional deps wired up, use GpuInfo.DependencyPath.
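
Roughly something like this hypothetical Go sketch, where the discovery code records the extraction directory instead of the server mutating Path (the struct below only mirrors the idea; it is not the real GpuInfo definition):

// Hypothetical sketch: record where the bundled oneAPI libraries live so the
// existing dependency-path plumbing can add them to the subprocess environment.
package main

import (
	"fmt"
	"path/filepath"
)

// gpuInfo is an illustrative stand-in for ollama's GpuInfo type.
type gpuInfo struct {
	Library        string
	DependencyPath string
}

func discoverOneAPI(payloadDir string) gpuInfo {
	return gpuInfo{
		Library:        "oneapi",
		DependencyPath: filepath.Join(payloadDir, "oneapi"), // extracted runtime libs
	}
}

func main() {
	info := discoverOneAPI("/tmp/ollama-payloads") // placeholder extraction dir
	fmt.Printf("%+v\n", info)
}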

@zhewang1-intc (Contributor, Author) commented:

Yes, you are right. I removed it, and ollama still works even if we don't source the oneAPI-related env script.

@airMeng commented May 27, 2024

@dhiltgen Anything else needed before this is merged?

@dhiltgen (Collaborator) commented May 28, 2024

I'll merge this once we finalize the 0.1.39 release

@dhiltgen dhiltgen merged commit 646371f into ollama:main May 28, 2024
15 checks passed