It's likely that this should be addressed in ggml rather than in llama.
This is the observed call stack:
```
llama_init_from_model
 -> ggml_backend_dev_init
   -> ggml_backend_metal_device_init
     -> ggml_metal_init
       -> device.newLibraryWithSource
```
(This applies when the Metal source is compiled at runtime, as with the default GGML_METAL_EMBED_LIBRARY.)
For every context, the exact same source is compiled again. This seems like something that can be avoided. I'm not a Metal expert, but there must be a way to cache the compiled library and reuse it for subsequent contexts.
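For illustration, here is a minimal sketch of what such a cache could look like. This is an assumption about a possible fix, not ggml's actual code: the helper name `cached_metal_library` is hypothetical, and it assumes a single device and a single shader source per process.

```objc
#import <Metal/Metal.h>

// Purely illustrative sketch, not the actual ggml implementation:
// cache the compiled MTLLibrary so that only the first context pays
// the compilation cost. cached_metal_library() is a made-up helper.
static id<MTLLibrary> cached_metal_library(id<MTLDevice> device, NSString *source) {
    static id<MTLLibrary> library = nil;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        NSError *error = nil;
        library = [device newLibraryWithSource:source
                                       options:[MTLCompileOptions new]
                                         error:&error];
        if (library == nil) {
            NSLog(@"Metal shader compilation failed: %@", error);
        }
    });
    // Every subsequent context reuses the already-compiled library.
    return library;
}
```

A process-wide singleton like this ignores multi-GPU setups; a real fix would presumably key the cache by device and decide when, if ever, to release the library.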