
Linux Kernel 6.18 support? #875

@plumlis

Description


I have been running Ollama on an ARL-H U258H laptop with 64 GB of RAM on Fedora 43, using Intel IPEX.
I was using Intel's officially recommended Docker image for inference:

https://hub.docker.com/r/intelanalytics/ipex-llm-inference-cpp-xpu

This setup had been working reliably before.
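For context, the container was launched roughly along these lines (a sketch rather than the exact command; the image tag, mounted model path, and extra options are assumptions based on a typical Intel GPU container setup):

    # Sketch: run the IPEX-LLM inference image with the Intel GPU passed through via /dev/dri
    docker run -itd \
        --net=host \
        --device=/dev/dri \
        -v "$HOME/models":/models \
        --shm-size=16g \
        --name ipex-llm-ollama \
        intelanalytics/ipex-llm-inference-cpp-xpu:latest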

A few days ago I upgraded my system kernel to Linux 6.18, and since then Ollama models can no longer be loaded when using IPEX. The main error I see is:

“Native API failed. Native API returns: 39 (UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY)”

get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
time=2025-12-14T00:04:07.339+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"

[GIN] 2025/12/14 - 00:04:06 | 200 |      23.915µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/12/14 - 00:04:06 | 200 |  125.603561ms |       127.0.0.1 | POST     "/api/show"
Native API failed. Native API returns: 39 (UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY)
Exception caught at file:/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:405, func:operator()
SYCL error: CHECK_TRY_ERROR(ctx->stream->memset( (char *)tensor->data + original_size, 0, padded_size - original_size).wait()): Exception caught in this line of code.
  in function ggml_backend_sycl_buffer_init_tensor at /home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:405
/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:115: SYCL error

From the behavior, it looks like the GPU is detected and accessible, but device memory allocation always fails.
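For what it's worth, setting ZES_ENABLE_SYSMAN=1 as the warning above suggests and then listing the SYCL devices is a quick way to confirm the device is still enumerated (a minimal sketch, assuming the oneAPI sycl-ls tool is available inside the container):

    # Enable Level Zero Sysman so free-memory queries are supported, as the log warning suggests
    export ZES_ENABLE_SYSMAN=1
    # List the SYCL devices the runtime can see
    sycl-ls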

At first I suspected the Docker image might be too old, so I tried several alternatives:

Building and running Ollama manually against the local runtime

Running via ramalama (a llama.cpp-based container with Intel GPU support)

None of these worked on kernel 6.18.
For ramalama, the error message is:

:: initializing oneAPI environment ...
   entrypoint.sh: BASH_VERSION = 5.3.0(1)-release
   args: Using "$@" for setvars.sh arguments: llama-server -m /models/model.gguf --host 0.0.0.0 --port 8080
:: compiler -- latest
:: mkl -- latest
:: tbb -- latest
:: umf -- latest
:: oneAPI environment initialized ::
 
Abort was called at 70 line in file:
/builddir/build/BUILD/intel-compute-runtime-25.31.34666.3-build/compute-runtime-25.31.34666.3/shared/source/helpers/bindless_heaps_helper.cpp

I have also updated to the latest intel-compute-runtime available in the Fedora repositories, but the issue persists.

Importantly, if I boot back into kernel 6.17, everything works correctly again. Both the Intel IPEX Docker image and ramalama run without issues on 6.17.

I checked the Linux 6.18 changelog and did not notice any obvious changes related to GPU memory management or Intel Xe that would clearly explain this behavior.

I have already opened an issue on the IPEX side as well (intel/ipex-llm#13334), but I wanted to ask here:
Is kernel 6.18 currently unsupported or unvalidated for IPEX / compute-runtime based workloads?
Or is there a known incompatibility between Linux 6.18 and the current Intel GPU userspace stack?

I also noticed discussions mentioning that Intel GPU support has been merged upstream into Ollama, but from what I can see this appears to be limited to experimental Vulkan support, not the IPEX / Level Zero path.

Any clarification on the expected kernel support status would be greatly appreciated.
