Releases: ollama/ollama
v0.1.33
Models:
- Llama 3: a new model by Meta, and the most capable openly available LLM to date
- Phi 3 Mini: a new 3.8B-parameter, lightweight, state-of-the-art open model by Microsoft.
- Moondream: a small vision language model designed to run efficiently on edge devices.
- Llama 3 Gradient 1048K: A Llama 3 fine-tune by Gradient to support up to a 1M token context window.
- Dolphin Llama 3: The uncensored Dolphin model, trained by Eric Hartford and based on Llama 3 with a variety of instruction, conversational, and coding skills.
- Qwen 110B: The first Qwen model over 100B parameters in size, with outstanding performance in evaluations.
What's Changed
- Fixed issues where the model would not terminate, causing the API to hang.
- Fixed a series of out of memory errors on Apple Silicon Macs
- Fixed out of memory errors when running Mixtral architecture models
Experimental concurrency features
New concurrency features are coming soon to Ollama. They are available in this release as experimental options controlled by two environment variables:
- `OLLAMA_NUM_PARALLEL`: handle multiple requests simultaneously for a single model
- `OLLAMA_MAX_LOADED_MODELS`: load multiple models simultaneously

To enable these features, set the environment variables when starting `ollama serve`:

```
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
```
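As a sketch of what `OLLAMA_NUM_PARALLEL` enables on the client side, the following standard-library Python script issues several `/api/generate` requests at once; the model name and prompts are illustrative, and a server started as above must be running on the default port:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_payload(prompt, model="llama3"):
    # Non-streaming request body for /api/generate
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(prompt):
    # Send one generate request and return the model's full response text
    req = request.Request(OLLAMA_URL, data=build_payload(prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def main():
    prompts = ["Why is the sky blue?", "Name a prime number.", "What is an LLM?"]
    # With OLLAMA_NUM_PARALLEL >= 3, these requests can be handled concurrently
    # by a single loaded model instead of queueing one after another.
    with ThreadPoolExecutor(max_workers=3) as pool:
        for answer in pool.map(generate, prompts):
            print(answer)
```

Call `main()` against a server started with the command above; without `OLLAMA_NUM_PARALLEL` set, the same requests are processed one at a time.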
New Contributors
- @sidxt made their first contribution in #3705
- @ChengenH made their first contribution in #3789
- @secondtruth made their first contribution in #3503
- @reid41 made their first contribution in #3612
- @ericcurtin made their first contribution in #3626
- @JT2M0L3Y made their first contribution in #3633
- @datvodinh made their first contribution in #3655
- @MapleEve made their first contribution in #3817
- @swuecho made their first contribution in #3810
- @brycereitano made their first contribution in #3895
- @bsdnet made their first contribution in #3889
- @fyxtro made their first contribution in #3855
- @natalyjazzviolin made their first contribution in #3962
Full Changelog: v0.1.32...v0.1.33-rc5
v0.1.32
New models
- WizardLM 2: State of the art large language model from Microsoft AI with improved performance on complex chat, multilingual, reasoning and agent use cases.
  - `wizardlm2:8x22b`: large 8x22B model based on Mixtral 8x22B
  - `wizardlm2:7b`: fast, high-performing model based on Mistral 7B
- Snowflake Arctic Embed: A suite of text embedding models by Snowflake, optimized for performance.
- Command R+: a powerful, scalable large language model purpose-built for RAG use cases
- DBRX: A large 132B open, general-purpose LLM created by Databricks.
- Mixtral 8x22B: the new leading Mixture of Experts (MoE) base model by Mistral AI.
What's Changed
- Ollama will now better utilize available VRAM, leading to fewer out-of-memory errors and better GPU utilization
- When running larger models that don't fit into VRAM on macOS, Ollama will now split the model between GPU and CPU to maximize performance.
- Fixed several issues where Ollama would hang upon encountering an error
- Fixed issue where using quotes in `OLLAMA_ORIGINS` would cause an error
New Contributors
- @sugarforever made their first contribution in #3400
- @yaroslavyaroslav made their first contribution in #3378
- @Nagi-ovo made their first contribution in #3423
- @ParisNeo made their first contribution in #3436
- @philippgille made their first contribution in #3437
- @cesto93 made their first contribution in #3461
- @ThomasVitale made their first contribution in #3515
- @writinwaters made their first contribution in #3539
- @alexmavr made their first contribution in #3555
Full Changelog: v0.1.31...v0.1.32
v0.1.31
Ollama supports embedding models. Bring your existing documents or other data, and combine them with text prompts to build RAG (retrieval-augmented generation) apps using the Ollama REST API, Python, or JavaScript libraries.
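As a minimal sketch of the retrieval half of a RAG app, the following standard-library Python snippet embeds documents via `/api/embeddings` and ranks them by cosine similarity; it assumes a local server and an embedding model such as `nomic-embed-text` has been pulled (both names are assumptions, not requirements):

```python
import json
import math
from urllib import request

def embed(text, model="nomic-embed-text", host="http://localhost:11434"):
    # POST /api/embeddings returns {"embedding": [...]} for the given prompt
    body = json.dumps({"model": model, "prompt": text}).encode()
    req = request.Request(host + "/api/embeddings", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query, docs):
    # Return the document whose embedding is most similar to the query's
    query_vec = embed(query)
    return max(docs, key=lambda d: cosine(query_vec, embed(d)))
```

The retrieved document can then be pasted into a prompt for a generation model, which is the "augmented generation" half of RAG.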
New models
- Qwen 1.5 32B: A new 32B multilingual model competitive with larger models such as Mixtral
- StarlingLM Beta: A high-ranking 7B model on popular benchmarks that includes a permissive Apache 2.0 license.
- DolphinCoder StarCoder 7B: A 7B uncensored variant of the Dolphin model family that excels at coding, based on StarCoder2 7B.
- StableLM 1.6 Chat: A new version of StableLM 1.6 tuned for instruction
What's new
- Fixed issue where Ollama would hang when using certain unicode characters in the prompt such as emojis
Full Changelog: v0.1.30...v0.1.31
v0.1.30
New models
- Command R: a Large Language Model optimized for conversational interaction and long context tasks.
- mxbai-embed-large: A new state-of-the-art large embedding model
What's Changed
- Fixed various issues with `ollama run` on Windows:
  - History now works when pressing the up and down arrow keys
  - Right and left arrow keys will now move the cursor appropriately
  - Pasting multi-line strings will now work on Windows
- Fixed issue where mounting or sharing files between Linux and Windows (e.g. via WSL or Docker) would cause errors due to having `:` in the filename
- Improved support for AMD MI300 and MI300X accelerators
- Improved cleanup of temporary files resulting in better space utilization
Important change
For filesystem compatibility, Ollama has changed model data filenames to use `-` instead of `:`. This change will be applied automatically. If downgrading to 0.1.29 or lower from 0.1.30 (on Linux or macOS only), run:

```
find ~/.ollama/models/blobs -type f -exec bash -c 'mv "$0" "${0//-/:}"' {} \;
```
New Contributors
- @alitrack made their first contribution in #3111
- @drazdra made their first contribution in #3338
- @rapidarchitect made their first contribution in #3288
- @yusufcanb made their first contribution in #3274
- @jikkuatwork made their first contribution in #3178
- @timothycarambat made their first contribution in #3145
- @fly2tomato made their first contribution in #2946
- @enoch1118 made their first contribution in #2927
- @danny-avila made their first contribution in #2918
- @mmo80 made their first contribution in #2881
- @anaisbetts made their first contribution in #2428
- @marco-souza made their first contribution in #1905
- @guchenhe made their first contribution in #1944
- @herval made their first contribution in #1873
- @Npahlfer made their first contribution in #1623
- @remy415 made their first contribution in #2279
Full Changelog: v0.1.29...v0.1.30
v0.1.29
AMD Preview
Ollama now supports AMD graphics cards in preview on Windows and Linux. All of Ollama's features can now be accelerated by AMD graphics cards, and support is included by default in Ollama for Linux, Windows, and Docker.
Supported cards and accelerators
| Family | Supported cards and accelerators |
| --- | --- |
| AMD Radeon RX | 7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600, 6950 XT, 6900 XTX, 6900 XT, 6800 XT, 6800, Vega 64, Vega 56 |
| AMD Radeon PRO | W7900, W7800, W7700, W7600, W7500, W6900X, W6800X Duo, W6800X, W6800, V620, V420, V340, V320, Vega II Duo, Vega II, VII, SSG |
| AMD Instinct | MI300X, MI300A, MI300, MI250X, MI250, MI210, MI200, MI100, MI60, MI50 |
What's Changed
- `ollama <command> -h` will now show documentation for supported environment variables
- Fixed issue where generating embeddings with `nomic-embed-text`, `all-minilm`, or other embedding models would hang on Linux
- Experimental support for importing Safetensors models using the `FROM <directory with safetensors model>` command in the Modelfile
- Fixed issues where Ollama would hang when using JSON mode
- Fixed issue where `ollama run` would error when piping output to `tee` and other tools
- Fixed an issue where memory would not be released when running vision models
- Ollama will no longer show an error message when piping to stdin on Windows
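The experimental Safetensors import above can be sketched with a minimal Modelfile; the directory path and model name here are illustrative:

```
# Modelfile
FROM ./path-to-safetensors-directory
```

The model is then built with `ollama create my-model -f Modelfile`.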
New Contributors
- @tgraupmann made their first contribution in #2582
- @andersrex made their first contribution in #2909
- @leonid20000 made their first contribution in #2440
- @hishope made their first contribution in #2973
- @mrdjohnson made their first contribution in #2759
- @mofanke made their first contribution in #3077
- @racerole made their first contribution in #3073
- @Chris-AS1 made their first contribution in #3094
Full Changelog: v0.1.28...v0.1.29
v0.1.28
New models
- StarCoder2: the next generation of transparently trained open code LLMs, available in three sizes: 3B, 7B and 15B parameters.
- DolphinCoder: a chat model based on StarCoder2 15B that excels at writing code.
What's Changed
- Vision models such as `llava` should now respond better to text prompts
- Improved support for `llava` 1.6 models
- Fixed issue where switching between models repeatedly would cause Ollama to hang
- Installing Ollama on Windows no longer requires a minimum of 4GB disk space
- Ollama on macOS will now more reliably determine available VRAM
- Fixed issue where running Ollama in `podman` would not detect Nvidia GPUs
- Ollama will correctly return an empty embedding when calling `/api/embeddings` with an empty `prompt` instead of hanging
New Contributors
- @Bin-Huang made their first contribution in #1706
- @elthommy made their first contribution in #2737
- @peanut256 made their first contribution in #2354
- @tylinux made their first contribution in #2827
- @fred-bf made their first contribution in #2780
- @bmwiedemann made their first contribution in #2836
Full Changelog: v0.1.27...v0.1.28
v0.1.27
Gemma
Gemma is a new, top-performing family of lightweight open models built by Google. Available in `2b` and `7b` parameter sizes (`7b` is the default):

```
ollama run gemma:2b
ollama run gemma:7b
```
What's Changed
- Performance improvements (up to 2x) when running Gemma models
- Fixed performance issues on Windows without GPU acceleration. Systems with AVX and AVX2 instruction sets should be 2-4x faster.
- Reduced likelihood of false positive Windows Defender alerts on Windows.
New Contributors
- @joshyan1 made their first contribution in #2657
- @pfrankov made their first contribution in #2138
- @adminazhar made their first contribution in #2686
- @b-tocs made their first contribution in #2510
- @Yuan-ManX made their first contribution in #2249
- @langchain4j made their first contribution in #1690
- @logancyang made their first contribution in #1918
Full Changelog: v0.1.26...v0.1.27
v0.1.26
What's Changed
- Support for `bert` and `nomic-bert` embedding models
- Fixed issue where system prompt and prompt template would not be updated when loading a new model
- Quotes will now be trimmed around the value of `OLLAMA_HOST` on Windows
- Fixed duplicate button issue on the Windows taskbar menu
- Fixed issue where the system prompt would be overridden when using the `/api/chat` endpoint
- Hardened AMD driver lookup logic
- Fixed issue where two versions of Ollama on Windows would run at the same time
- Fixed issue where memory would not be released after a model is unloaded with modern CUDA-enabled GPUs
- Fixed issue where AVX2 was required for GPU acceleration on Windows
- Fixed issue where `/bye` or `/exit` would not work with trailing spaces or characters after them
New Contributors
- @tristanbob made their first contribution in #2545
- @justinh-rahb made their first contribution in #2563
- @gerazov made their first contribution in #2188
- @eddumelendez made their first contribution in #2164
- @lulzshadowwalker made their first contribution in #2381
- @jakobhoeg made their first contribution in #2466
- @jdetroyes made their first contribution in #1673
- @djcopley made their first contribution in #1767
- @pythops made their first contribution in #2329
- @ttsugriy made their first contribution in #2511
- @medoror made their first contribution in #2180
- @nikeshparajuli made their first contribution in #1775
- @n4ze3m made their first contribution in #2447
Full Changelog: v0.1.25...v0.1.26
v0.1.25
Windows Preview
Ollama is now available on Windows in preview. Download it here. Ollama on Windows makes it possible to pull, run and create large language models in a new native Windows experience. It includes built-in GPU acceleration, access to the full model library, and the Ollama API including OpenAI compatibility.
What's Changed
- Ollama on Windows is now available in preview
- Fixed an issue where requests would hang after being repeated several times
- Ollama will now correctly error when provided an unsupported image format
- Fixed issue where `ollama serve` wouldn't immediately quit when receiving a termination signal
- Fixed issues with prompt templating for the `/api/chat` endpoint, such as where Ollama would omit the second system prompt in a series of messages
- Fixed issue where providing an empty list of messages would return a non-empty response instead of loading the model
- Setting a negative `keep_alive` value (e.g. `-1`) will now correctly keep the model loaded indefinitely
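As a sketch, an `/api/generate` request body that keeps the model loaded indefinitely might look like this (model and prompt are illustrative):

```json
{
  "model": "llama2",
  "prompt": "Hello!",
  "keep_alive": -1
}
```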
Full Changelog: v0.1.24...v0.1.25
v0.1.24
OpenAI Compatibility
This release adds initial compatibility support for the OpenAI Chat Completions API.
Usage with cURL
```
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
```
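The same endpoint can be called from code; here is a standard-library Python sketch mirroring the cURL request above (the model name is illustrative, and existing OpenAI client libraries can also be pointed at this base URL):

```python
import json
from urllib import request

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible base URL

def chat_payload(messages, model="llama2"):
    # Request body in the OpenAI Chat Completions format
    return json.dumps({"model": model, "messages": messages}).encode()

def chat(messages, model="llama2"):
    # POST /v1/chat/completions; the reply follows the OpenAI schema,
    # so the text lives at choices[0].message.content
    req = request.Request(BASE_URL + "/chat/completions",
                          data=chat_payload(messages, model),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

With a server running, `chat([{"role": "user", "content": "Hello!"}])` returns the assistant's reply as a string.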
New Models
- Qwen 1.5: Qwen 1.5 is a new family of large language models by Alibaba Cloud spanning from 0.5B to 72B.
What's Changed
- Fixed issue where requests to
/api/chat
would hang when providing emptyuser
messages repeatedly - Fixed issue on macOS where Ollama would return a missing library error after being open for a long period of time
Full Changelog: v0.1.23...v0.1.24