Releases: ollama/ollama
v0.4.0
Llama 3.2 Vision
Ollama 0.4 adds support for Llama 3.2 Vision. After upgrading or downloading Ollama, run:

```shell
ollama run llama3.2-vision
```

For the larger, 90B version of the model, run:

```shell
ollama run llama3.2-vision:90b
```
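Beyond the CLI, vision models can also be used through Ollama's REST API, which accepts base64-encoded images on a chat message. A minimal sketch of building such a request body with the standard library (only payload construction is shown, since actually posting it assumes a running Ollama server on localhost:11434):

```python
import base64
import json

def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> str:
    """Build a JSON body for POST /api/chat with one image attached."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # Ollama's chat API expects images as base64-encoded strings
                "images": [base64.b64encode(image_bytes).decode("ascii")],
            }
        ],
        "stream": False,
    }
    return json.dumps(payload)

# The image bytes here are a placeholder, not a real PNG
body = build_vision_request("llama3.2-vision", "What is in this image?", b"\x89PNG...")
```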
What's Changed
- Support for the Llama 3.2 Vision (i.e. Mllama) architecture
- Follow-on requests to vision models are now much faster
- Fixed issues where stop sequences would not be detected correctly
- Ollama can now import models from Safetensors without a Modelfile when running `ollama create my-model`
- Fixed issue where redirecting output to a file on Windows would cause invalid characters to be written
- Fixed issue where invalid model data would cause Ollama to error
Full Changelog: v0.3.14...v0.4.0
v0.3.14
New Models
- Granite 3 MoE: The IBM Granite 1B and 3B models are the first mixture of experts (MoE) Granite models from IBM designed for low latency usage.
- Granite 3 Dense: The IBM Granite 2B and 8B models are designed to support tool-based use cases and retrieval augmented generation (RAG), streamlining code generation, translation, and bug fixing.
Thank you @gabe-l-hart for contributing Granite support to Ollama!
What's Changed
- Fix crashes for AMD GPUs with small system memory
- Fix error that would occur on macOS 11 Big Sur
- Fixed issue creating models from bf16 file types
- Improve CPU performance by improving default thread counts
- IBM granite/granitemoe architecture support by @gabe-l-hart in #6760
New Contributors
- @JHubi1 made their first contribution in #6465
- @gabe-l-hart made their first contribution in #6760
Full Changelog: v0.3.13...v0.3.14
v0.3.13
New safety models
- Llama Guard 3: a series of models by Meta, fine-tuned for content safety classification of LLM inputs and responses.
- ShieldGemma: a set of instruction-tuned models from Google DeepMind for evaluating the safety of text prompt inputs and text output responses against a set of defined safety policies.
What's Changed
- Default model in quickstart is now `llama3.2`
- Fixed issue where `ollama pull` would leave connections open when encountering an error
- `ollama rm` will now stop a model if it is running prior to deleting it
New Go runner (for contributors):
Ollama is transitioning to a new Go-based subprocess model runner with improved reliability and caching. The new runner, found under the `llama` directory, replaces the previous code under `llm/ext_server`. During this transition period, the new Go runner is opt-in at build time and requires a different build approach. For more information, see development.md.
New Contributors
- @zmldndx made their first contribution in #7027
- @hidden1nin made their first contribution in #7096
- @shifragoldstone made their first contribution in #7117
Full Changelog: v0.3.12...v0.3.13
v0.3.12
New models
- Llama 3.2: Meta's Llama 3.2 goes small with 1B and 3B models.
- Qwen 2.5 Coder: The latest series of Code-Specific Qwen models, with significant improvements in code generation, code reasoning, and code fixing.
What's Changed
- Ollama now supports ARM Windows machines
- Fixed rare issue where Ollama would report a missing `.dll` file on Windows
- Fixed performance issue for Windows without GPUs
New Contributors
- @RyanMarten made their first contribution in #6841
Full Changelog: v0.3.11...v0.3.12
v0.3.11
New models
- Solar-Pro-Preview: an advanced large language model (LLM) with 22 billion parameters designed to fit into a single GPU.
- Qwen 2.5: new multilingual Qwen models pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens with support for a context window of up to 128K tokens.
- Bespoke-Minicheck: a state-of-the-art fact-checking model developed by Bespoke Labs.
- Mistral-Small: a lightweight 22B model designed for cost-effective use in tasks like translation and summarization.
- Reader-LM: a series of models that convert HTML to Markdown, useful for content conversion tasks.
What's Changed
- New `ollama stop` command to unload a running model
- Ollama will now show an error when importing a model with an invalid number of tokens in the vocabulary
- The `ollama/ollama` container image will now start running almost immediately, leading to 5s faster start times
- Fixed issue where `ollama show` would show excessive whitespace in the output
New Contributors
- @Palm1r made their first contribution in #6754
- @codefromthecrypt made their first contribution in #6744
- @ECuiDev made their first contribution in #6789
- @pepo-ec made their first contribution in #6818
- @yangfuhai made their first contribution in #6788
Full Changelog: v0.3.10...v0.3.11
v0.3.10
New models
- MiniCPM-V: A powerful, multi-modal model with leading performance on several benchmarks.
- Yi-Coder: a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters.
- DeepSeek-V2.5: An upgraded version of DeepSeek-V2 that integrates the general and coding abilities of both DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct.
What's Changed
- Fixed rare error that would occur for certain models when running `ollama show`
- CUDA 11 will now be used for older NVIDIA drivers that are not compatible with CUDA 12
- Fixed error when running `ollama create` to import Gemma 2 models from Safetensors
- The OpenAI-compatible chat and completions APIs will no longer scale `temperature` and `frequency_penalty`
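The last item affects anyone using Ollama through its OpenAI-compatible endpoint: sampling parameters are now passed through to the model as-is rather than rescaled. A sketch of building such a request body with the standard library (only construction is shown; sending it assumes a server at `localhost:11434/v1`):

```python
import json

def openai_compat_body(model: str, prompt: str,
                       temperature: float = 0.7,
                       frequency_penalty: float = 0.0) -> str:
    """Build a JSON body for POST /v1/chat/completions.

    As of this release, these sampling values reach the model unscaled.
    """
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "frequency_penalty": frequency_penalty,
    })

body = openai_compat_body("llama3.2", "Hello", temperature=0.2)
```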
New Contributors
- @rayfiyo made their first contribution in #6577
- @vimalk78 made their first contribution in #6579
- @SnoopyTlion made their first contribution in #6587
- @jonathanhecl made their first contribution in #6583
- @presbrey made their first contribution in #6508
- @yeahdongcn made their first contribution in #5847
- @openvmp made their first contribution in #6605
- @FellowTraveler made their first contribution in #5664
- @mateuszmigas made their first contribution in #5514
- @kspviswa made their first contribution in #6624
- @erkinalp made their first contribution in #6450
- @Carter907 made their first contribution in #6391
- @mitar made their first contribution in #6421
- @kangfenmao made their first contribution in #6633
- @teilomillet made their first contribution in #6099
- @fujitatomoya made their first contribution in #6430
- @1runeberg made their first contribution in #6378
- @iscy made their first contribution in #6642
- @jake83741 made their first contribution in #6644
- @ArdaGnsrn made their first contribution in #6361
- @SilasMarvin made their first contribution in #5063
- @wangqingfree made their first contribution in #6035
- @vitaly-zdanevich made their first contribution in #5714
- @Tobix99 made their first contribution in #6656
- @bean5 made their first contribution in #5888
- @ZeyoYT made their first contribution in #4957
- @imoize made their first contribution in #6681
- @nickthecook made their first contribution in #6680
Full Changelog: v0.3.9...v0.3.10
v0.3.9
What's Changed
- Fixed error that would occur when running Ollama on Linux machines with the ARM architecture
- Ollama will now show an improved error message when attempting to run unsupported models
- Fixed issue where Ollama would not auto-detect the chat template for Llama 3.1 models
- `OLLAMA_HOST` will now work with URLs that contain paths
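To illustrate what "URLs that contain paths" means here: `OLLAMA_HOST` can now point at something like a reverse proxy prefix, e.g. `https://example.com/ollama`. A minimal sketch of splitting such a value into host and path parts with the standard library (this is not Ollama's own normalization code, just an illustration of the shape of the value):

```python
from urllib.parse import urlparse

def split_host(value: str) -> tuple[str, str]:
    """Split an OLLAMA_HOST-style value into (netloc, path)."""
    # Default to http:// when no scheme is given, so urlparse sees a netloc
    if "://" not in value:
        value = "http://" + value
    parsed = urlparse(value)
    return parsed.netloc, parsed.path.rstrip("/")

host, path = split_host("https://example.com/ollama")
```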
New Contributors
- @bryanhonof made their first contribution in #6074
Full Changelog: v0.3.8...v0.3.9
v0.3.8
What's Changed
- Fixed error where the `ollama` CLI couldn't be found on the path when upgrading Ollama on Windows
New Contributors
- @seankhatiri made their first contribution in #6530
Full Changelog: v0.3.7...v0.3.8
v0.3.7
New Models
- Hermes 3: Hermes 3 is the latest version of the flagship Hermes series of LLMs by Nous Research, which includes support for tool calling.
- Phi 3.5: a lightweight AI model with 3.8 billion parameters, with performance surpassing similarly sized and larger models.
- SmolLM: A family of small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset.
What's Changed
- CUDA 12 support: improving performance by up to 10% on newer NVIDIA GPUs
- Improved performance of `ollama pull` and `ollama push` on slower connections
- Fixed issue where setting `OLLAMA_NUM_PARALLEL` would cause models to be reloaded on lower VRAM systems
- Ollama on Linux is now distributed as a `tar.gz` file, which contains the `ollama` binary along with required libraries
New Contributors
- @pamelafox made their first contribution in #6345
- @eust-w made their first contribution in #5964
Full Changelog: v0.3.6...v0.3.7
v0.3.6
What's Changed
- Fixed issue where `/api/embed` would return an error instead of loading the model when the `input` field was not provided
- `ollama create` can now import Phi-3 models from Safetensors
- Added progress information to `ollama create` when importing GGUF files
- Ollama will now import GGUF files faster by minimizing file copies
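The `/api/embed` fix above concerns requests that omit the `input` field. A sketch of guarding against that client-side when building an embedding request (only construction is shown; posting it assumes a running server, and `all-minilm` is just an example embedding model):

```python
import json

def embed_body(model: str, input_texts) -> str:
    """Build a JSON body for POST /api/embed.

    `input` may be a string or a list of strings; reject a missing value
    client-side so the server always receives a complete request."""
    if input_texts is None:
        raise ValueError("`input` is required: pass a string or list of strings")
    return json.dumps({"model": model, "input": input_texts})

body = embed_body("all-minilm", ["why is the sky blue?"])
```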
Full Changelog: v0.3.5...v0.3.6