Releases: ollama/ollama

v0.1.33 (Pre-release)

28 Apr 17:51 · ec1acbb

Models:

  • Llama 3: a new model by Meta, and the most capable openly available LLM to date
  • Phi 3 Mini: a new 3.8B-parameter, lightweight, state-of-the-art open model by Microsoft
  • Moondream: a small vision language model designed to run efficiently on edge devices
  • Llama 3 Gradient 1048K: a Llama 3 fine-tune by Gradient to support up to a 1M token context window
  • Dolphin Llama 3: the uncensored Dolphin model, trained by Eric Hartford and based on Llama 3, with a variety of instruction, conversational, and coding skills
  • Qwen 110B: the first Qwen model over 100B parameters in size, with outstanding performance in evaluations
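
Each of these can be run from the command line once downloaded; for example, using the llama3 and phi3 tags from the Ollama library:

ollama run llama3
ollama run phi3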

What's Changed

  • Fixed issues where the model would not terminate, causing the API to hang
  • Fixed a series of out-of-memory errors on Apple Silicon Macs
  • Fixed out-of-memory errors when running Mixtral architecture models

Experimental concurrency features

New concurrency features are coming soon to Ollama. They are available in this release for early testing:

  • OLLAMA_NUM_PARALLEL: Handle multiple requests simultaneously for a single model
  • OLLAMA_MAX_LOADED_MODELS: Load multiple models simultaneously

To enable these features, set the environment variables when starting ollama serve:

OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
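
With the server started as above, the parallel request handling can be exercised by issuing several generations at once; a minimal sketch using curl (it assumes the llama3 model has already been pulled):

# Both requests run in the background; with OLLAMA_NUM_PARALLEL set they
# are answered concurrently instead of being queued one after the other
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?"}' &
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Write a haiku about the sea."}' &
wait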

Full Changelog: v0.1.32...v0.1.33-rc5

v0.1.32

10 Apr 23:01 · fb9580d

New models

  • WizardLM 2: a state-of-the-art large language model from Microsoft AI with improved performance on complex chat, multilingual, reasoning, and agent use cases
    • wizardlm2:8x22b: large 8x22B model based on Mixtral 8x22B
    • wizardlm2:7b: fast, high-performing model based on Mistral 7B
  • Snowflake Arctic Embed: a suite of text embedding models by Snowflake, optimized for performance
  • Command R+: a powerful, scalable large language model purpose-built for RAG use cases
  • DBRX: a large 132B open, general-purpose LLM created by Databricks
  • Mixtral 8x22B: the new leading Mixture of Experts (MoE) base model by Mistral AI

What's Changed

  • Ollama will now better utilize available VRAM, leading to fewer out-of-memory errors and better GPU utilization
  • When running larger models that don't fit into VRAM on macOS, Ollama will now split the model between GPU and CPU to maximize performance
  • Fixed several issues where Ollama would hang upon encountering an error
  • Fixed issue where using quotes in OLLAMA_ORIGINS would cause an error

Full Changelog: v0.1.31...v0.1.32

v0.1.31

05 Apr 16:09

Embedding models

Ollama supports embedding models. Bring your existing documents or other data and combine them with text prompts to build RAG (retrieval-augmented generation) apps using the Ollama REST API or the Python and JavaScript libraries.
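
As a minimal sketch, an embedding can be generated over the REST API with a single request (here using the nomic-embed-text model mentioned in the v0.1.29 notes below); the "embedding" vector in the response can then be stored in a vector index for retrieval:

curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'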

New models

  • Qwen 1.5 32B: a new 32B multilingual model competitive with larger models such as Mixtral
  • StarlingLM Beta: a 7B model ranking highly on popular benchmarks, with a permissive Apache 2.0 license
  • DolphinCoder StarCoder 7B: a 7B uncensored variant of the Dolphin model family that excels at coding, based on StarCoder2 7B
  • StableLM 1.6 Chat: a new, instruction-tuned version of StableLM 1.6

What's Changed

  • Fixed issue where Ollama would hang when the prompt contained certain Unicode characters, such as emoji

Full Changelog: v0.1.30...v0.1.31

v0.1.30

26 Mar 18:19 · 756c257

Ollama now supports Cohere's Command R model.

New models

  • Command R: a large language model optimized for conversational interaction and long-context tasks
  • mxbai-embed-large: a new state-of-the-art large embedding model

What's Changed

  • Fixed various issues with ollama run on Windows:
    • History now works when pressing the up and down arrow keys
    • Right and left arrow keys now move the cursor appropriately
    • Pasting multi-line strings now works on Windows
  • Fixed issue where mounting or sharing files between Linux and Windows (e.g. via WSL or Docker) would cause errors due to : in filenames
  • Improved support for AMD MI300 and MI300X accelerators
  • Improved cleanup of temporary files, resulting in better space utilization

Important change

For filesystem compatibility, Ollama has changed model data filenames to use - instead of :. This change will be applied automatically. If downgrading to 0.1.29 or lower from 0.1.30 (on Linux or macOS only), run the following command, which uses bash parameter expansion to replace every - in each blob filename with ::

find ~/.ollama/models/blobs -type f -exec bash -c 'mv "$0" "${0//-/:}"' {} \;

Full Changelog: v0.1.29...v0.1.30

v0.1.29

10 Mar 02:24 · e87c780

AMD Preview

Ollama now supports AMD graphics cards in preview on Windows and Linux. All of Ollama's features can now be accelerated by AMD graphics cards, and support is included by default in Ollama for Linux, Windows, and Docker.

Supported cards and accelerators

Family            Supported cards and accelerators
AMD Radeon RX     7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600,
                  6950 XT, 6900 XTX, 6900 XT, 6800 XT, 6800, Vega 64, Vega 56
AMD Radeon PRO    W7900, W7800, W7700, W7600, W7500, W6900X, W6800X Duo, W6800X,
                  W6800, V620, V420, V340, V320, Vega II Duo, Vega II, VII, SSG
AMD Instinct      MI300X, MI300A, MI300, MI250X, MI250, MI210, MI200, MI100, MI60, MI50

What's Changed

  • ollama <command> -h will now show documentation for supported environment variables
  • Fixed issue where generating embeddings with nomic-embed-text, all-minilm or other embedding models would hang on Linux
  • Experimental support for importing Safetensors models using the FROM <directory with safetensors model> command in the Modelfile (see the sketch after this list)
  • Fixed issues where Ollama would hang when using JSON mode
  • Fixed issue where ollama run would error when piping output to tee and other tools
  • Fixed an issue where memory would not be released when running vision models
  • Ollama will no longer show an error message when piping to stdin on Windows
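
A minimal sketch of the experimental Safetensors import, assuming a local directory containing the model weights (the path and model name here are hypothetical):

# Write a Modelfile pointing FROM at the safetensors directory,
# then build a local model from it
cat > Modelfile <<'EOF'
FROM /path/to/safetensors-model-directory
EOF
ollama create my-imported-model -f Modelfile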

Full Changelog: v0.1.28...v0.1.29

v0.1.28

01 Mar 06:41 · 21347e1

New models

  • StarCoder2: the next generation of transparently trained open code LLMs, available in three sizes: 3B, 7B, and 15B parameters
  • DolphinCoder: a chat model based on StarCoder2 15B that excels at writing code

What's Changed

  • Vision models such as llava should now respond better to text prompts
  • Improved support for llava 1.6 models
  • Fixed issue where switching between models repeatedly would cause Ollama to hang
  • Installing Ollama on Windows no longer requires a minimum of 4GB disk space
  • Ollama on macOS will now more reliably determine available VRAM
  • Fixed issue where running Ollama in podman would not detect Nvidia GPUs
  • Ollama will correctly return an empty embedding when calling /api/embeddings with an empty prompt instead of hanging

Full Changelog: v0.1.27...v0.1.28

v0.1.27

22 Feb 22:55

Gemma

Gemma is a new, top-performing family of lightweight open models built by Google. Available in 2b and 7b parameter sizes:

  • ollama run gemma:2b
  • ollama run gemma:7b (default)

What's Changed

  • Performance improvements (up to 2x) when running Gemma models
  • Fixed performance issues on Windows without GPU acceleration; systems with AVX and AVX2 instruction sets should be 2-4x faster
  • Reduced likelihood of false-positive Windows Defender alerts

Full Changelog: v0.1.26...v0.1.27

v0.1.26

21 Feb 04:20

What's Changed

  • Support for bert and nomic-bert embedding models
  • Fixed issue where system prompt and prompt template would not be updated when loading a new model
  • Quotes around the value of OLLAMA_HOST will now be trimmed on Windows
  • Fixed duplicate button issue on the Windows taskbar menu
  • Fixed issue where the system prompt would be overridden when using the /api/chat endpoint
  • Hardened AMD driver lookup logic
  • Fixed issue where two versions of Ollama on Windows would run at the same time
  • Fixed issue where memory would not be released after a model is unloaded with modern CUDA-enabled GPUs
  • Fixed issue where AVX2 was required for GPU acceleration on Windows
  • Fixed issue where /bye or /exit would not work with trailing spaces or characters after them

Full Changelog: v0.1.25...v0.1.26

v0.1.25

14 Feb 07:21 · 42e77e2

Windows Preview

Ollama is now available on Windows in preview. Ollama on Windows makes it possible to pull, run and create large language models in a new native Windows experience. It includes built-in GPU acceleration, access to the full model library, and the Ollama API, including OpenAI compatibility.

What's Changed

  • Ollama on Windows is now available in preview
  • Fixed an issue where requests would hang after being repeated several times
  • Ollama will now correctly error when provided an unsupported image format
  • Fixed issue where ollama serve wouldn't immediately quit when receiving a termination signal
  • Fixed issues with prompt templating for the /api/chat endpoint, such as where Ollama would omit the second system prompt in a series of messages
  • Fixed issue where providing an empty list of messages would return a non-empty response instead of loading the model
  • Setting a negative keep_alive value (e.g. -1) will now correctly keep the model loaded indefinitely (see the example after this list)
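
A minimal sketch of pinning a model in memory (the model name is assumed to be pulled already): a generate request with no prompt loads the model, and the negative keep_alive keeps it loaded until the server stops:

# Load the model and keep it resident indefinitely
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'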

Full Changelog: v0.1.24...v0.1.25

v0.1.24

08 Feb 03:02 · 69f392c

OpenAI Compatibility

This release adds initial compatibility support for the OpenAI Chat Completions API.

Usage with cURL

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
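
Because this endpoint mirrors the OpenAI Chat Completions API, existing OpenAI client libraries can typically be pointed at Ollama by setting their base URL to http://localhost:11434/v1; most clients require an API key to be set, but Ollama ignores its value.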

New Models

  • Qwen 1.5: a new family of large language models by Alibaba Cloud, spanning from 0.5B to 72B parameters

What's Changed

  • Fixed issue where requests to /api/chat would hang when providing empty user messages repeatedly
  • Fixed issue on macOS where Ollama would return a missing library error after being open for a long period of time

Full Changelog: v0.1.23...v0.1.24