Releases: ollama/ollama
v0.1.33
Models:
- Llama 3: a new model by Meta, and the most capable openly available LLM to date
- Phi 3 Mini: a new 3.8B-parameter, lightweight, state-of-the-art open model by Microsoft.
- Moondream: a small vision language model designed to run efficiently on edge devices.
- Llama 3 Gradient 1048K: A Llama 3 fine-tune by Gradient to support up to a 1M token context window.
- Dolphin Llama 3: The uncensored Dolphin model, trained by Eric Hartford and based on Llama 3 with a variety of instruction, conversational, and coding skills.
- Qwen 110B: The first Qwen model over 100B parameters in size, with outstanding performance in evaluations.
What's Changed
- Fixed issues where the model would not terminate, causing the API to hang.
- Fixed a series of out of memory errors on Apple Silicon Macs
- Fixed out of memory errors when running Mixtral architecture models
Experimental concurrency features
New concurrency features are coming soon to Ollama. They are available in this release as experimental options controlled by two environment variables:
- `OLLAMA_NUM_PARALLEL`: handle multiple requests simultaneously for a single model
- `OLLAMA_MAX_LOADED_MODELS`: load multiple models simultaneously

To enable these features, set the environment variables when starting `ollama serve`:

```
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
```
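As a sketch of what `OLLAMA_NUM_PARALLEL` enables on the client side, the following standard-library Python script issues several `/api/generate` requests at once; the model name and prompts are illustrative, and a server started as above must be running on the default port:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_payload(prompt, model="llama3"):
    # Non-streaming request body for /api/generate
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(prompt):
    # Send one generate request and return the model's full response text
    req = request.Request(OLLAMA_URL, data=build_payload(prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def main():
    prompts = ["Why is the sky blue?", "Name a prime number.", "What is an LLM?"]
    # With OLLAMA_NUM_PARALLEL >= 3, these requests can be handled concurrently
    # by a single loaded model instead of queueing one after another.
    with ThreadPoolExecutor(max_workers=3) as pool:
        for answer in pool.map(generate, prompts):
            print(answer)
```

Call `main()` against a server started with the command above; without `OLLAMA_NUM_PARALLEL` set, the same requests are processed one at a time.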
New Contributors
- @sidxt made their first contribution in #3705
- @ChengenH made their first contribution in #3789
- @secondtruth made their first contribution in #3503
- @reid41 made their first contribution in #3612
- @ericcurtin made their first contribution in #3626
- @JT2M0L3Y made their first contribution in #3633
- @datvodinh made their first contribution in #3655
- @MapleEve made their first contribution in #3817
- @swuecho made their first contribution in #3810
- @brycereitano made their first contribution in #3895
- @bsdnet made their first contribution in #3889
- @fyxtro made their first contribution in #3855
- @natalyjazzviolin made their first contribution in #3962
Full Changelog: v0.1.32...v0.1.33-rc5
v0.1.32
New models
- WizardLM 2: State of the art large language model from Microsoft AI with improved performance on complex chat, multilingual, reasoning and agent use cases.
  - `wizardlm2:8x22b`: large 8x22B model based on Mixtral 8x22B
  - `wizardlm2:7b`: fast, high-performing model based on Mistral 7B
- Snowflake Arctic Embed: A suite of text embedding models by Snowflake, optimized for performance.
- Command R+: a powerful, scalable large language model purpose-built for RAG use cases
- DBRX: A large 132B open, general-purpose LLM created by Databricks.
- Mixtral 8x22B: the new leading Mixture of Experts (MoE) base model by Mistral AI.
What's Changed
- Ollama will now better utilize available VRAM, leading to fewer out-of-memory errors and better GPU utilization
- When running larger models that don't fit into VRAM on macOS, Ollama will now split the model between GPU and CPU to maximize performance.
- Fixed several issues where Ollama would hang upon encountering an error
- Fixed issue where using quotes in `OLLAMA_ORIGINS` would cause an error
New Contributors
- @sugarforever made their first contribution in #3400
- @yaroslavyaroslav made their first contribution in #3378
- @Nagi-ovo made their first contribution in #3423
- @ParisNeo made their first contribution in #3436
- @philippgille made their first contribution in #3437
- @cesto93 made their first contribution in #3461
- @ThomasVitale made their first contribution in #3515
- @writinwaters made their first contribution in #3539
- @alexmavr made their first contribution in #3555
Full Changelog: v0.1.31...v0.1.32
v0.1.31
Ollama supports embedding models. Bring your existing documents or other data, and combine them with text prompts to build RAG (retrieval-augmented generation) apps using the Ollama REST API, Python, or JavaScript libraries.
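As a minimal sketch of the retrieval half of a RAG app, the following standard-library Python snippet embeds documents via `/api/embeddings` and ranks them by cosine similarity; it assumes a local server and an embedding model such as `nomic-embed-text` has been pulled (both names are assumptions, not requirements):

```python
import json
import math
from urllib import request

def embed(text, model="nomic-embed-text", host="http://localhost:11434"):
    # POST /api/embeddings returns {"embedding": [...]} for the given prompt
    body = json.dumps({"model": model, "prompt": text}).encode()
    req = request.Request(host + "/api/embeddings", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query, docs):
    # Return the document whose embedding is most similar to the query's
    query_vec = embed(query)
    return max(docs, key=lambda d: cosine(query_vec, embed(d)))
```

The retrieved document can then be pasted into a prompt for a generation model, which is the "augmented generation" half of RAG.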
New models
- Qwen 1.5 32B: A new 32B multilingual model competitive with larger models such as Mixtral
- StarlingLM Beta: A high-ranking 7B model on popular benchmarks that includes a permissive Apache 2.0 license.
- DolphinCoder StarCoder 7B: A 7B uncensored variant of the Dolphin model family that excels at coding, based on StarCoder2 7B.
- StableLM 1.6 Chat: A new version of StableLM 1.6 tuned for instruction
What's new
- Fixed issue where Ollama would hang when using certain unicode characters in the prompt such as emojis
Full Changelog: v0.1.30...v0.1.31
v0.1.30
New models
- Command R: a Large Language Model optimized for conversational interaction and long context tasks.
- mxbai-embed-large: A new state-of-the-art large embedding model
What's Changed
- Fixed various issues with `ollama run` on Windows:
  - History now works when pressing the up and down arrow keys
  - Right and left arrow keys will now move the cursor appropriately
  - Pasting multi-line strings will now work on Windows
- Fixed issue where mounting or sharing files between Linux and Windows (e.g. via WSL or Docker) would cause errors due to having `:` in the filename
- Improved support for AMD MI300 and MI300X accelerators
- Improved cleanup of temporary files resulting in better space utilization
Important change
For filesystem compatibility, Ollama has changed model data filenames to use `-` instead of `:`. This change will be applied automatically. If downgrading to 0.1.29 or lower from 0.1.30 (on Linux or macOS only), run:

```
find ~/.ollama/models/blobs -type f -exec bash -c 'mv "$0" "${0//-/:}"' {} \;
```
New Contributors
- @alitrack made their first contribution in #3111
- @drazdra made their first contribution in #3338
- @rapidarchitect made their first contribution in #3288
- @yusufcanb made their first contribution in #3274
- @jikkuatwork made their first contribution in #3178
- @timothycarambat made their first contribution in #3145
- @fly2tomato made their first contribution in #2946
- @enoch1118 made their first contribution in #2927
- @danny-avila made their first contribution in #2918
- @mmo80 made their first contribution in #2881
- @anaisbetts made their first contribution in #2428
- @marco-souza made their first contribution in #1905
- @guchenhe made their first contribution in #1944
- @herval made their first contribution in #1873
- @Npahlfer made their first contribution in #1623
- @remy415 made their first contribution in #2279
Full Changelog: v0.1.29...v0.1.30
v0.1.29
AMD Preview
Ollama now supports AMD graphics cards in preview on Windows and Linux. All of Ollama's features can now be accelerated by AMD graphics cards, and support is included by default in Ollama for Linux, Windows, and Docker.
Supported cards and accelerators
| Family | Supported cards and accelerators |
| --- | --- |
| AMD Radeon RX | 7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600, 6950 XT, 6900 XTX, 6900 XT, 6800 XT, 6800, Vega 64, Vega 56 |
| AMD Radeon PRO | W7900, W7800, W7700, W7600, W7500, W6900X, W6800X Duo, W6800X, W6800, V620, V420, V340, V320, Vega II Duo, Vega II, VII, SSG |
| AMD Instinct | MI300X, MI300A, MI300, MI250X, MI250, MI210, MI200, MI100, MI60, MI50 |
What's Changed
- `ollama <command> -h` will now show documentation for supported environment variables
- Fixed issue where generating embeddings with `nomic-embed-text`, `all-minilm`, or other embedding models would hang on Linux
- Experimental support for importing Safetensors models using the `FROM <directory with safetensors model>` command in the Modelfile
- Fixed issues where Ollama would hang when using JSON mode
- Fixed issue where `ollama run` would error when piping output to `tee` and other tools
- Fixed an issue where memory would not be released when running vision models
- Ollama will no longer show an error message when piping to stdin on Windows
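The experimental Safetensors import above can be sketched with a minimal Modelfile; the directory path and model name here are illustrative:

```
# Modelfile
FROM ./path-to-safetensors-directory
```

The model is then built with `ollama create my-model -f Modelfile`.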
New Contributors
- @tgraupmann made their first contribution in #2582
- @andersrex made their first contribution in #2909
- @leonid20000 made their first contribution in #2440
- @hishope made their first contribution in #2973
- @mrdjohnson made their first contribution in #2759
- @mofanke made their first contribution in #3077
- @racerole made their first contribution in #3073
- @Chris-AS1 made their first contribution in #3094
Full Changelog: v0.1.28...v0.1.29
v0.1.28
New models
- StarCoder2: the next generation of transparently trained open code LLMs, available in three sizes: 3B, 7B and 15B parameters.
- DolphinCoder: a chat model based on StarCoder2 15B that excels at writing code.
What's Changed
- Vision models such as `llava` should now respond better to text prompts
- Improved support for `llava` 1.6 models
- Fixed issue where switching between models repeatedly would cause Ollama to hang
- Installing Ollama on Windows no longer requires a minimum of 4GB disk space
- Ollama on macOS will now more reliably determine available VRAM
- Fixed issue where running Ollama in `podman` would not detect Nvidia GPUs
- Ollama will correctly return an empty embedding when calling `/api/embeddings` with an empty `prompt` instead of hanging
New Contributors
- @Bin-Huang made their first contribution in #1706
- @elthommy made their first contribution in #2737
- @peanut256 made their first contribution in #2354
- @tylinux made their first contribution in #2827
- @fred-bf made their first contribution in #2780
- @bmwiedemann made their first contribution in #2836
Full Changelog: v0.1.27...v0.1.28
v0.1.27
Gemma
Gemma is a new, top-performing family of lightweight open models built by Google. Available in `2b` and `7b` parameter sizes (`7b` is the default):

```
ollama run gemma:2b
ollama run gemma:7b
```
What's Changed
- Performance improvements (up to 2x) when running Gemma models
- Fixed performance issues on Windows without GPU acceleration. Systems with AVX and AVX2 instruction sets should be 2-4x faster.
- Reduced likelihood of false positive Windows Defender alerts on Windows.
New Contributors
- @joshyan1 made their first contribution in #2657
- @pfrankov made their first contribution in #2138
- @adminazhar made their first contribution in #2686
- @b-tocs made their first contribution in #2510
- @Yuan-ManX made their first contribution in #2249
- @langchain4j made their first contribution in #1690
- @logancyang made their first contribution in #1918
Full Changelog: v0.1.26...v0.1.27
v0.1.26
What's Changed
- Support for `bert` and `nomic-bert` embedding models
- Fixed issue where system prompt and prompt template would not be updated when loading a new model
- Quotes will now be trimmed around the value of `OLLAMA_HOST` on Windows
- Fixed duplicate button issue on the Windows taskbar menu
- Fixed issue where the system prompt would be overridden when using the `/api/chat` endpoint
- Hardened AMD driver lookup logic
- Fixed issue where two versions of Ollama on Windows would run at the same time
- Fixed issue where memory would not be released after a model is unloaded with modern CUDA-enabled GPUs
- Fixed issue where AVX2 was required for GPU acceleration on Windows
- Fixed issue where `/bye` or `/exit` would not work with trailing spaces or characters after them
New Contributors
- @tristanbob made their first contribution in #2545
- @justinh-rahb made their first contribution in #2563
- @gerazov made their first contribution in #2188
- @eddumelendez made their first contribution in #2164
- @lulzshadowwalker made their first contribution in #2381
- @jakobhoeg made their first contribution in #2466
- @jdetroyes made their first contribution in #1673
- @djcopley made their first contribution in #1767
- @pythops made their first contribution in #2329
- @ttsugriy made their first contribution in #2511
- @medoror made their first contribution in #2180
- @nikeshparajuli made their first contribution in #1775
- @n4ze3m made their first contribution in #2447
Full Changelog: v0.1.25...v0.1.26
v0.1.25
Windows Preview
Ollama is now available on Windows in preview. Download it here. Ollama on Windows makes it possible to pull, run and create large language models in a new native Windows experience. It includes built-in GPU acceleration, access to the full model library, and the Ollama API including OpenAI compatibility.
What's Changed
- Ollama on Windows is now available in preview
- Fixed an issue where requests would hang after being repeated several times
- Ollama will now correctly error when provided an unsupported image format
- Fixed issue where `ollama serve` wouldn't immediately quit when receiving a termination signal
- Fixed issues with prompt templating for the `/api/chat` endpoint, such as where Ollama would omit the second system prompt in a series of messages
- Fixed issue where providing an empty list of messages would return a non-empty response instead of loading the model
- Setting a negative `keep_alive` value (e.g. `-1`) will now correctly keep the model loaded indefinitely
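As a sketch, an `/api/generate` request body that keeps the model loaded indefinitely might look like this (model and prompt are illustrative):

```json
{
  "model": "llama2",
  "prompt": "Hello!",
  "keep_alive": -1
}
```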
Full Changelog: v0.1.24...v0.1.25
v0.1.24
OpenAI Compatibility
This release adds initial compatibility support for the OpenAI Chat Completions API.
Usage with cURL
```
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
```
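The same endpoint can be called from code; here is a standard-library Python sketch mirroring the cURL request above (the model name is illustrative, and existing OpenAI client libraries can also be pointed at this base URL):

```python
import json
from urllib import request

BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible base URL

def chat_payload(messages, model="llama2"):
    # Request body in the OpenAI Chat Completions format
    return json.dumps({"model": model, "messages": messages}).encode()

def chat(messages, model="llama2"):
    # POST /v1/chat/completions; the reply follows the OpenAI schema,
    # so the text lives at choices[0].message.content
    req = request.Request(BASE_URL + "/chat/completions",
                          data=chat_payload(messages, model),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

With a server running, `chat([{"role": "user", "content": "Hello!"}])` returns the assistant's reply as a string.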
New Models
- Qwen 1.5: Qwen 1.5 is a new family of large language models by Alibaba Cloud spanning from 0.5B to 72B.
What's Changed
- Fixed issue where requests to
/api/chat
would hang when providing emptyuser
messages repeatedly - Fixed issue on macOS where Ollama would return a missing library error after being open for a long period of time
Full Changelog: v0.1.23...v0.1.24