It's likely that this should be addressed in ggml rather than in llama.
This is the observed call stack:
```
llama_init_from_model
 -> ggml_backend_dev_init
   -> ggml_backend_metal_device_init
     -> ggml_metal_init
       -> device.newLibraryWithSource
```
(This applies when the Metal source is compiled at runtime, as with the default GGML_METAL_EMBED_LIBRARY.)
For every context, the exact same source is compiled again. This seems like something that can be avoided. I'm not a Metal expert, but there must be a way to cache the compiled library and reuse it for subsequent contexts.
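For illustration, here is a minimal sketch of what such a cache could look like. This is an assumption about a possible fix, not ggml's actual code: the helper name `cached_metal_library` is hypothetical, and it assumes a single device and a single shader source per process.

```objc
#import <Metal/Metal.h>

// Purely illustrative sketch, not the actual ggml implementation:
// cache the compiled MTLLibrary so that only the first context pays
// the compilation cost. cached_metal_library() is a made-up helper.
static id<MTLLibrary> cached_metal_library(id<MTLDevice> device, NSString *source) {
    static id<MTLLibrary> library = nil;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        NSError *error = nil;
        library = [device newLibraryWithSource:source
                                       options:[MTLCompileOptions new]
                                         error:&error];
        if (library == nil) {
            NSLog(@"Metal shader compilation failed: %@", error);
        }
    });
    // Every subsequent context reuses the already-compiled library.
    return library;
}
```

A process-wide singleton like this ignores multi-GPU setups; a real fix would presumably key the cache by device and decide when, if ever, to release the library.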