update message processing #5126
Conversation
Force-pushed bcffa81 to 8087a37
Force-pushed 78a13b7 to 8473b76
	break
	}
}
// chatPrompt accepts a list of messages and returns the prompt and images that should be used for the next chat turn.
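The doc comment above describes the contract: the full conversation goes in, and a single prompt string plus any attached images come out. A minimal sketch of that shape, with illustrative types (the real `api.Message` and prompt rendering in the PR differ):

```go
package main

import "fmt"

// ChatMessage is an illustrative stand-in for a chat message carrying
// a role, text content, and optional image payloads.
type ChatMessage struct {
	Role    string
	Content string
	Images  [][]byte
}

// chatPromptSketch flattens the conversation into one prompt string and
// collects the images the runner should receive for the next turn.
// This is a sketch of the contract, not the PR's implementation.
func chatPromptSketch(msgs []ChatMessage) (string, [][]byte) {
	var prompt string
	var images [][]byte
	for _, m := range msgs {
		prompt += fmt.Sprintf("<|%s|>%s\n", m.Role, m.Content)
		images = append(images, m.Images...)
	}
	return prompt + "<|assistant|>", images
}

func main() {
	p, imgs := chatPromptSketch([]ChatMessage{
		{Role: "user", Content: "describe this", Images: [][]byte{{0xFF}}},
	})
	fmt.Println(p, len(imgs))
}
```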
Very nice! Thank you.
Force-pushed 31d734e to 68c3ec8
Force-pushed 8844279 to d982a4c
Last comment about the response code in an error case, but otherwise this is looking ready to me.
Force-pushed d982a4c to 3d8f174
Force-pushed dbfc9da to 0d82aa0
This looks good, excited for this one
server/routes.go (Outdated)
case err != nil:
	c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": err.Error()})
	return
func (s *Server) scheduleRunner(ctx context.Context, name string, caps []Capability, requestOpts map[string]any, keepAlive *api.Duration) (*runnerRef, error) {
Passing in caps is weird here – shouldn't the caller do that before scheduling the runner?
the caller doesn't have access to the model so it can't check capabilities
It did before this change.
Could we load the model and check the capabilities before deciding to schedule the model here, and then pass in the Model? It seems like a precondition the handler should check. Ideally this function goes away and we do something like scheduler.Schedule(model, opts) from the handler, which would be a lot clearer.
maybe the function is misnamed. this doesn't schedule anything until some checks pass, including capabilities. it aggregates the model- and options-related setup that chat, generate, and embeddings all have to perform before scheduling a runner
This is looking good. Love that messages are becoming a first-class part of the templating.
Force-pushed 056ba91 to eed3f96
Overall looks great – small comment on the templating so we don't overcommit to helpers yet unless we really have to
LGTM. Will test this a bunch this evening (no need to block merging) but it looks great
ensure runtime model changes (template, system prompt, messages, options) are captured on model updates without needing to reload the server
this change modifies the way messages are processed before handing off to the llm. there are a few areas worth mentioning:
messages are now a first-class component of the template. template rendering will only fall back to the previous iterative template if messages are unsupported by the template. however, new models should still implement the previous prompt/response template for compatibility with older ollama versions
the generate endpoint has been updated to use messages for prompt templating but the end result should be the same
the chat endpoint has been updated to preprocess incoming messages
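The fallback behavior described above can be sketched with Go's `text/template`, which Ollama's templates build on. Detecting `.Messages` with a substring check is a crude stand-in for walking the parse tree, and the field names are illustrative:

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Message is an illustrative chat message with a role and content.
type Message struct {
	Role    string
	Content string
}

// renderPrompt sketches the fallback: if the template references
// .Messages, render the whole conversation in one pass; otherwise fall
// back to the older iterative style that fills .Prompt per user turn.
func renderPrompt(tmplText string, msgs []Message) (string, error) {
	tmpl, err := template.New("p").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if strings.Contains(tmplText, ".Messages") {
		// messages-aware template: one render over the full history
		err = tmpl.Execute(&sb, map[string]any{"Messages": msgs})
		return sb.String(), err
	}
	// legacy template: render once per user turn
	for _, m := range msgs {
		if m.Role != "user" {
			continue
		}
		if err := tmpl.Execute(&sb, map[string]any{"Prompt": m.Content}); err != nil {
			return "", err
		}
	}
	return sb.String(), nil
}

func main() {
	out, _ := renderPrompt("{{ range .Messages }}[{{ .Role }}] {{ .Content }}\n{{ end }}",
		[]Message{{Role: "user", Content: "Hi"}})
	fmt.Print(out)
}
```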