[Metal] Context init optimization opportunity: metal library is compiled for every llama context #12199

@iboB

This should probably be addressed in ggml rather than in llama.

This is the observed call stack:

llama_init_from_model 
  -> ggml_backend_dev_init 
    -> ggml_backend_metal_device_init 
      -> ggml_metal_init 
        -> device.newLibraryWithSource

(This only applies when the library is compiled from source at runtime, as with the default GGML_METAL_EMBED_LIBRARY.)

For every context, the exact same code is compiled again. This seems avoidable. I'm not a Metal expert, but there must be a way to cache the compiled library and reuse it for subsequent contexts.
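
To illustrate the kind of caching meant here, below is a minimal sketch in plain C. It is not ggml code: `fake_library`, `compile_library`, `library_acquire`, and `library_release` are hypothetical names, and `compile_library` is only a stand-in for the expensive `device.newLibraryWithSource` step. The idea is a process-wide, reference-counted handle that is compiled on first use and shared by every later context.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled Metal library handle. */
typedef struct {
    int id;
} fake_library;

/* Counts how often the expensive compile step actually ran. */
static int g_compile_count = 0;

/* Hypothetical stand-in for device.newLibraryWithSource. */
static fake_library * compile_library(void) {
    static fake_library lib;
    g_compile_count++;
    lib.id = g_compile_count;
    return &lib;
}

/* Process-wide cache: one shared library plus a reference count. */
static pthread_mutex_t g_lib_mutex = PTHREAD_MUTEX_INITIALIZER;
static fake_library *  g_lib       = NULL;
static int             g_lib_refs  = 0;

/* Called from context init: compiles only if nothing is cached yet. */
fake_library * library_acquire(void) {
    pthread_mutex_lock(&g_lib_mutex);
    if (g_lib == NULL) {
        g_lib = compile_library();
    }
    g_lib_refs++;
    fake_library * lib = g_lib;
    pthread_mutex_unlock(&g_lib_mutex);
    return lib;
}

/* Called from context teardown: drops the cache with the last user. */
void library_release(void) {
    pthread_mutex_lock(&g_lib_mutex);
    assert(g_lib_refs > 0);
    if (--g_lib_refs == 0) {
        g_lib = NULL; /* real code would release the library object here */
    }
    pthread_mutex_unlock(&g_lib_mutex);
}
```

With this shape, two contexts calling `library_acquire()` get the same handle and the compile step runs once. A real implementation would presumably need one cache entry per Metal device, and would have to decide whether the library should outlive the last context.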

Labels: Apple Metal, good first issue
