Ollama: Pull models automatically at startup #1554
Conversation
ThomasVitale commented on Oct 17, 2024:
- Introduce support for Ollama model auto-pull at startup time
- Enhance support for Ollama model auto-pull at run time
- Update documentation about integrating with Ollama and managing models
- Adopt Builder pattern in Ollama Model classes for better code readability
- Unify Ollama model auto-pull functionality in production and test code
- Improve integration tests for Ollama with Testcontainers
```java
public OllamaChatModel(OllamaApi ollamaApi, OllamaOptions defaultOptions,
        FunctionCallbackContext functionCallbackContext) {
    this(ollamaApi, defaultOptions, functionCallbackContext, List.of());
}

private ChatModelObservationConvention observationConvention = DEFAULT_OBSERVATION_CONVENTION;
```
The argument list grew so much that I didn't want to add even more overloaded constructors. Instead, I introduced a Builder to help make this whole initialisation code more readable.
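To illustrate the design choice, here is a minimal, hypothetical sketch of the Builder idea: the collaborator types are simplified stand-ins, and the field and method names are assumptions for illustration, not the actual Spring AI API surface.

```java
// Hypothetical, simplified sketch of a Builder for OllamaChatModel.
// The record types below are stand-ins; the real builder has more fields.
import java.util.List;

record OllamaApi(String baseUrl) {}
record OllamaOptions(String model) {}

final class OllamaChatModel {
    final OllamaApi api;
    final OllamaOptions options;
    final List<String> toolNames;

    private OllamaChatModel(Builder b) {
        this.api = b.api;
        this.options = b.options;
        this.toolNames = b.toolNames;
    }

    static Builder builder() { return new Builder(); }

    // Each setter returns the builder itself, so construction reads as a
    // fluent chain instead of a long positional constructor call.
    static final class Builder {
        private OllamaApi api;
        private OllamaOptions options = new OllamaOptions("llama3.1");
        private List<String> toolNames = List.of();

        Builder ollamaApi(OllamaApi api) { this.api = api; return this; }
        Builder defaultOptions(OllamaOptions o) { this.options = o; return this; }
        Builder toolNames(List<String> names) { this.toolNames = names; return this; }
        OllamaChatModel build() { return new OllamaChatModel(this); }
    }
}

class BuilderDemo {
    public static void main(String[] args) {
        OllamaChatModel model = OllamaChatModel.builder()
            .ollamaApi(new OllamaApi("http://localhost:11434"))
            .defaultOptions(new OllamaOptions("qwen2.5:3b"))
            .build();
        System.out.println(model.options.model());
    }
}
```

Compared with overloaded constructors, every optional collaborator gets a named setter with a sensible default, so call sites only mention what they customise.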
```java
public OllamaEmbeddingModel(OllamaApi ollamaApi, OllamaOptions defaultOptions) {
    this(ollamaApi, defaultOptions, ObservationRegistry.NOOP);
}

private EmbeddingModelObservationConvention observationConvention = DEFAULT_OBSERVATION_CONVENTION;

public OllamaEmbeddingModel(OllamaApi ollamaApi, OllamaOptions defaultOptions,
```
Here too, I introduced a Builder.
```diff
@@ -954,13 +957,15 @@ public record ProgressResponse(
  * Download a model from the Ollama library. Cancelled pulls are resumed from where they left off,
  * and multiple calls will share the same download progress.
  */
-public ProgressResponse pullModel(PullModelRequest pullModelRequest) {
-    return this.restClient.post()
+public Flux<ProgressResponse> pullModel(PullModelRequest pullModelRequest) {
```
Using the streaming option is preferable because it gives us continuous status updates on the download, which we can log to keep the user up to date. It also makes it easy to define timeouts and retries.
```java
 */
public enum PullModelStrategy {

    /**
```
When pulling models, there are two options here: always (guarantees the latest model version) and when_missing (faster, but the local model could be stale).
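A self-contained sketch of what those strategies could look like as an enum; the two strategy names mirror the PR description, while the NEVER no-op fallback and the `shouldPull` helper are assumptions added for illustration.

```java
// Illustrative sketch of the pull strategies described above.
// ALWAYS and WHEN_MISSING come from the PR text; NEVER and the
// shouldPull helper are hypothetical additions for this example.
import java.util.Set;

enum PullStrategySketch {
    ALWAYS,       // pull on every startup: guarantees the latest model version
    WHEN_MISSING, // pull only if absent locally: faster, but may be stale
    NEVER;        // never pull automatically

    boolean shouldPull(String model, Set<String> locallyAvailable) {
        return switch (this) {
            case ALWAYS -> true;
            case WHEN_MISSING -> !locallyAvailable.contains(model);
            case NEVER -> false;
        };
    }
}

class StrategyDemo {
    public static void main(String[] args) {
        Set<String> local = Set.of("qwen2.5:3b");
        System.out.println(PullStrategySketch.WHEN_MISSING.shouldPull("qwen2.5:3b", local)); // false
        System.out.println(PullStrategySketch.ALWAYS.shouldPull("qwen2.5:3b", local));       // true
    }
}
```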
```diff
-ollamaApi.pullModel(new OllamaApi.PullModelRequest(model));
-logger.info("Completed pulling the '{}' model", model);
+var ollamaModelManager = new OllamaModelManager(ollamaApi);
+ollamaModelManager.pullModel(model, PullModelStrategy.WHEN_MISSING);
```
The integration test setup now uses the same auto-pull functionality as the production code.
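A hypothetical, simplified model manager showing the shared auto-pull path in isolation; the class and method names echo the diff above, but the in-memory pull logic here is a stand-in for the real Ollama API calls.

```java
// Hypothetical, simplified sketch of a shared model manager so that tests
// and production code take the same auto-pull path. The in-memory set is a
// stand-in for querying the local Ollama instance.
import java.util.HashSet;
import java.util.Set;

class ModelManagerSketch {
    private final Set<String> localModels = new HashSet<>();
    private int pullCount = 0;

    enum PullStrategy { ALWAYS, WHEN_MISSING }

    // Stand-in for calling the real Ollama pull endpoint.
    private void doPull(String model) {
        pullCount++;
        localModels.add(model);
    }

    void pullModel(String model, PullStrategy strategy) {
        boolean missing = !localModels.contains(model);
        if (strategy == PullStrategy.ALWAYS || (strategy == PullStrategy.WHEN_MISSING && missing)) {
            doPull(model);
        }
    }

    int pullCount() { return pullCount; }
}

class ManagerDemo {
    public static void main(String[] args) {
        var manager = new ModelManagerSketch();
        manager.pullModel("qwen2.5:3b", ModelManagerSketch.PullStrategy.WHEN_MISSING);
        manager.pullModel("qwen2.5:3b", ModelManagerSketch.PullStrategy.WHEN_MISSING); // no-op: already local
        System.out.println(manager.pullCount()); // 1
    }
}
```

With WHEN_MISSING, repeated test runs against a warm Testcontainers instance skip the download entirely, which is exactly the benefit of unifying the two code paths.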
```diff
@@ -52,23 +52,24 @@ class OllamaChatModelFunctionCallingIT extends BaseOllamaIT {

 private static final Logger logger = LoggerFactory.getLogger(OllamaChatModelFunctionCallingIT.class);

-private static final String MODEL = OllamaModel.LLAMA3_1.getName();
+private static final String MODEL = "qwen2.5:3b";
```
I got many failures with llama3.1, even after refining the prompt. With qwen2.5:3b I got all green results, and it's even a smaller model (so better for integration testing).
```java
            logger.info("Pulling the '{}' model - Status: {}", modelName,
                    progressResponses.get(progressResponses.size() - 1).status());
        }
    })
    .takeUntil(progressResponses -> progressResponses.get(0) != null
            && progressResponses.get(0).status().equals("success"))
```
Here we continuously print out the current status of the download, and the whole operation is configured with timeout and retry.
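An illustrative end-to-end sketch of that streaming loop, written against Project Reactor (which the diff's `Flux` return type implies). The `ProgressResponse` record and the stubbed `pullModel` source are stand-ins; the specific timeout and retry values are assumptions, not the PR's actual configuration.

```java
// Illustrative Reactor sketch of the streaming pull loop: log each progress
// event, stop at "success", and guard the pull with a timeout and retries.
// Requires reactor-core; ProgressResponse and pullModel are stand-ins.
import java.time.Duration;
import reactor.core.publisher.Flux;

public class PullProgressDemo {

    record ProgressResponse(String status) {}

    // Stand-in for OllamaApi#pullModel returning a stream of progress events.
    static Flux<ProgressResponse> pullModel(String model) {
        return Flux.just(
                new ProgressResponse("pulling manifest"),
                new ProgressResponse("downloading"),
                new ProgressResponse("success"));
    }

    public static void main(String[] args) {
        String model = "qwen2.5:3b";
        ProgressResponse last = pullModel(model)
            // Keep the user up to date while the download runs.
            .doOnNext(p -> System.out.println(
                    "Pulling the '" + model + "' model - Status: " + p.status()))
            // Stop the stream as soon as the server reports success.
            .takeUntil(p -> "success".equals(p.status()))
            // Fail the pull if no progress event arrives within the window...
            .timeout(Duration.ofMinutes(5))
            // ...and retry the whole pull a couple of times on failure.
            .retry(2)
            .blockLast();
        System.out.println("Final status: " + last.status());
    }
}
```

Because `timeout` watches the gap between progress events rather than the total duration, a slow but steadily progressing download is not cut off, while a stalled one fails fast and is retried.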
```java
@ConfigurationProperties(OllamaInitializationProperties.CONFIG_PREFIX)
public class OllamaInitializationProperties {

    public static final String CONFIG_PREFIX = "spring.ai.ollama.init";
```
The naming tries to be consistent with other similar Spring Boot features, like spring.sql.init to initialise database schemas.
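For context, a sketch of how such a prefix is typically used in application configuration; only the `spring.ai.ollama.init` prefix comes from the snippet above, so the key name and value here are assumptions for illustration.

```properties
# Hypothetical example: the spring.ai.ollama.init prefix is from the PR;
# the exact key and value below are illustrative assumptions.
spring.ai.ollama.init.pull-model-strategy=when_missing
```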
Great stuff @ThomasVitale, thanks.
rebased and merged at 8eef6e6
@tzolov thanks! I like the idea of the list of models. I'll work on a follow-up PR, including also a couple of improvements to add a bit more flexibility.