I’d like to have a proxy model that points to whichever model is currently loaded, and if no model is loaded, falls back to loading a specific default one.
To clarify further:
I have the Hermes Agent running, and every time it needs to perform a task, it loads its default model, gpt-oss-20b.
However, I also want to use a specific model (e.g., Qwen3-Coder) for working on pi.dev. Since my machine can only load one model at a time, if Hermes kicks in while I’m using another model (like Qwen3-Coder), it loads its own model (gpt-oss-20b) and displaces mine, which is problematic.
So, I’d like to configure Hermes to use a “fake” proxy model on llama-swap that, instead of loading a model itself, simply points to whichever model is already loaded (e.g., Qwen3-Coder). If no model is currently loaded, it should load the default one (e.g., gpt-oss-20b).
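To make the behavior I have in mind concrete, here is a minimal Python sketch of the resolution rule. It is not llama-swap code or configuration: `resolve_proxy_model`, the `running` list, and `DEFAULT_MODEL` are hypothetical names used only to illustrate what the proxy model should do.

```python
from typing import Sequence

# Assumed fallback model, taken from the example above.
DEFAULT_MODEL = "gpt-oss-20b"


def resolve_proxy_model(running: Sequence[str], default: str = DEFAULT_MODEL) -> str:
    """Pick the real model that a request to the proxy model should use.

    `running` is a hypothetical list of model names that are currently loaded.
    """
    if running:
        # Something is already loaded: reuse it instead of swapping it out.
        return running[0]
    # Nothing is loaded: fall back to the default model.
    return default


if __name__ == "__main__":
    print(resolve_proxy_model(["Qwen3-Coder"]))  # Hermes reuses the loaded model
    print(resolve_proxy_model([]))               # nothing loaded, use the default
```

With this rule, a Hermes request arriving while Qwen3-Coder is loaded would be served by Qwen3-Coder, and gpt-oss-20b would only be loaded when nothing else is running.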
Is that possible?