Releases: ollama/ollama

v0.4.0

21 Oct 05:10
9d71bcc

Llama 3.2 Vision

Ollama 0.4 adds support for Llama 3.2 Vision. After upgrading or downloading Ollama, run:

ollama run llama3.2-vision

For the larger, 90B version of the model, run:

ollama run llama3.2-vision:90b
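
The vision models can also be used through Ollama's REST API by attaching base64-encoded images to a chat message. A minimal sketch of the request shape (the image bytes are a placeholder, and a local server on the default port is assumed for the actual POST):

```python
import base64
import json

# Hypothetical example: in practice, read real image bytes from disk.
image_b64 = base64.b64encode(b"<image bytes go here>").decode("ascii")

payload = {
    "model": "llama3.2-vision",
    "messages": [
        {
            "role": "user",
            "content": "What is in this image?",
            "images": [image_b64],  # base64-encoded images attached to the message
        }
    ],
    "stream": False,
}

# With a running Ollama server, POST this to http://localhost:11434/api/chat:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/chat",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.load(urllib.request.urlopen(req))["message"]["content"])
print(json.dumps(payload)[:60])
```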

What's Changed

  • Support for Llama 3.2 Vision (i.e. Mllama) architecture
  • Follow-on requests to vision models are now much faster
  • Fixed issues where stop sequences would not be detected correctly
  • Ollama can now import models from Safetensors without a Modelfile when running ollama create my-model
  • Fixed issue where redirecting output to a file on Windows would cause invalid characters to be written
  • Fixed issue where invalid model data would cause Ollama to error

Full Changelog: v0.3.14...v0.4.0

v0.3.14

17 Oct 22:31
f2890a4

IBM and Ollama have partnered to bring Granite models to Ollama!

New Models

  • Granite 3 MoE: The IBM Granite 1B and 3B models are the first mixture-of-experts (MoE) Granite models from IBM, designed for low-latency use.
  • Granite 3 Dense: The IBM Granite 2B and 8B models are designed to support tool-based use cases and retrieval-augmented generation (RAG), streamlining code generation, translation, and bug fixing.

Thank you @gabe-l-hart for contributing Granite support to Ollama!

What's Changed

  • Fix crashes for AMD GPUs with small system memory
  • Fix error that would occur on macOS 11 Big Sur
  • Fixed issue creating models from bf16 file types
  • Improve CPU performance by improving default thread counts
  • IBM granite/granitemoe architecture support by @gabe-l-hart in #6760

Full Changelog: v0.3.13...v0.3.14

v0.3.13

11 Oct 19:41

New safety models

  • Llama Guard 3: a series of models by Meta, fine-tuned for content safety classification of LLM inputs and responses.
  • ShieldGemma: a set of instruction-tuned models from Google DeepMind for evaluating the safety of text prompt inputs and text output responses against a set of defined safety policies.

What's Changed

  • Default model in quickstart is now llama3.2
  • Fixed issue where ollama pull would leave connections open when encountering an error
  • ollama rm will now stop a model if it is running prior to deleting it

New Go runner (for contributors):

Ollama is transitioning to a new Go-based subprocess model runner with improved reliability and caching. The new runner, found under the llama directory, replaces the previous code under llm/ext_server. During this transition period, the new Go runner is opt-in at build time and requires a different build process. For more information, see development.md.

Full Changelog: v0.3.12...v0.3.13

v0.3.12

23 Sep 03:39
e9e9bdb

Ollama now supports Llama 3.2

New models

  • Llama 3.2: Meta's Llama 3.2 goes small with 1B and 3B models.
  • Qwen 2.5 Coder: The latest series of Code-Specific Qwen models, with significant improvements in code generation, code reasoning, and code fixing.

What's Changed

  • Ollama now supports ARM Windows machines
  • Fixed rare issue where Ollama would report a missing .dll file on Windows
  • Fixed performance issue for Windows without GPUs

Full Changelog: v0.3.11...v0.3.12

v0.3.11

17 Sep 21:40
504a410

New models

  • Solar-Pro-Preview: an advanced large language model (LLM) with 22 billion parameters, designed to fit on a single GPU.
  • Qwen 2.5: new multilingual Qwen models pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens with support for a context window of up to 128K tokens.
  • Bespoke-Minicheck: a state-of-the-art fact-checking model developed by Bespoke Labs.
  • Mistral-Small: a lightweight 22B model designed for cost-effective use in tasks like translation and summarization.
  • Reader-LM: a series of models that convert HTML to Markdown, useful for content conversion tasks.

What's Changed

  • New ollama stop command to unload a running model
  • Ollama will now show an error when importing a model with an invalid number of tokens in the vocabulary
  • The ollama/ollama container image now starts almost immediately, cutting start times by about 5 seconds
  • Fixed issue where ollama show would show excessive whitespace in the output
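
The new ollama stop command unloads a running model from memory. A sketch of the API-side equivalent using the documented keep_alive parameter (the model name is just an example, and a local server is assumed for the actual POST):

```python
import json

# An empty generate request with keep_alive set to 0 asks the server
# to unload the model immediately -- the API-side counterpart of `ollama stop`.
payload = {"model": "llama3.1", "keep_alive": 0}

# With a running server:
# import urllib.request
# urllib.request.urlopen(urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# ))
print(json.dumps(payload))
```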

Full Changelog: v0.3.10...v0.3.11

v0.3.10

06 Sep 16:17
06d4fba

New models

  • MiniCPM-V: A powerful, multi-modal model with leading performance on several benchmarks.
  • Yi-Coder: a series of open-source code language models that deliver state-of-the-art coding performance with fewer than 10 billion parameters.
  • DeepSeek-V2.5: An upgraded version of DeepSeek-V2 that integrates the general and coding abilities of both DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct.

What's Changed

  • Fixed rare error that would occur for certain models when running ollama show
  • CUDA 11 will now be used for older NVIDIA drivers that are not compatible with CUDA 12
  • Fixed error when running ollama create to import Gemma 2 models from safetensors
  • The OpenAI-compatible chat and completions APIs will no longer scale temperature and frequency_penalty
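
Following the last change, sampling parameters sent to the OpenAI-compatible endpoints now reach the model unchanged. A minimal sketch of such a request (the model name is just an example; a local server with the model pulled is assumed):

```python
import json

# OpenAI-style request to Ollama's compatibility endpoint
# (http://localhost:11434/v1/chat/completions). As of v0.3.10,
# temperature and frequency_penalty are forwarded as-is, not rescaled.
payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "frequency_penalty": 0.5,
}
print(json.dumps(payload))
```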

Full Changelog: v0.3.9...v0.3.10

v0.3.9

31 Aug 19:26
a1cef4d

What's Changed

  • Fixed error that would occur when running Ollama on Linux machines with the ARM architecture
  • Ollama will now show an improved error message when attempting to run unsupported models
  • Fixed issue where Ollama would not auto-detect the chat template for Llama 3.1 models
  • OLLAMA_HOST will now work with URLs that contain paths
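
To illustrate the OLLAMA_HOST change, a client pointed at a host URL with a path prefix (the URL below is a hypothetical reverse-proxy setup) should keep that path when building API URLs:

```python
import os

# Hypothetical OLLAMA_HOST value with a path prefix, e.g. a reverse
# proxy that serves Ollama under /ollama.
os.environ["OLLAMA_HOST"] = "https://example.com/ollama"

# The path is preserved when constructing API endpoints:
base = os.environ["OLLAMA_HOST"].rstrip("/")
url = base + "/api/tags"
print(url)  # prints https://example.com/ollama/api/tags
```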

Full Changelog: v0.3.8...v0.3.9

v0.3.8

28 Aug 01:09
93ea924

What's Changed

  • Fixed error where the ollama CLI couldn't be found on the path when upgrading Ollama on Windows

Full Changelog: v0.3.7...v0.3.8

v0.3.7

20 Aug 17:45
0f92b19

New Models

  • Hermes 3: Hermes 3 is the latest version of the flagship Hermes series of LLMs by Nous Research, which includes support for tool calling.
  • Phi 3.5: A lightweight AI model with 3.8 billion parameters, outperforming similarly sized and larger models.
  • SmolLM: A family of small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset.

What's Changed

  • CUDA 12 support: improving performance by up to 10% on newer NVIDIA GPUs
  • Improved performance of ollama pull and ollama push on slower connections
  • Fixed issue where setting OLLAMA_NUM_PARALLEL would cause models to be reloaded on lower VRAM systems
  • Ollama on Linux is now distributed as a tar.gz file, which contains the ollama binary along with required libraries.

Full Changelog: v0.3.6...v0.3.7

v0.3.6

13 Aug 20:27
4c4fe3f

What's Changed

  • Fixed issue where /api/embed would return an error instead of loading the model when the input field was not provided.
  • ollama create can now import Phi-3 models from Safetensors
  • Added progress information to ollama create when importing GGUF files
  • Ollama will now import GGUF files faster by minimizing file copies
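
The /api/embed endpoint mentioned in the first item accepts either a single string or a list of strings as input. A request sketch (the model name is just an example, and a running local server is assumed for the actual POST):

```python
import json

# "input" may be a single string or a list of strings; the response
# contains one embedding vector per input. As of v0.3.6, omitting
# "input" loads the model rather than returning an error.
payload = {
    "model": "all-minilm",
    "input": ["why is the sky blue?", "why is grass green?"],
}
# POST to http://localhost:11434/api/embed with a running server.
print(json.dumps(payload))
```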

Full Changelog: v0.3.5...v0.3.6