
Releases: ollama/ollama

v0.1.45

15 Jun 19:11
21adf8b
Pre-release

What's Changed

  • Setting seed in the /v1/chat/completions OpenAI compatibility endpoint no longer changes temperature
  • Enhanced GPU discovery and multi-gpu support with concurrency
  • Skip searching for network devices in Linux install script
  • Report GPU configuration variables in server log
  • Update linux ROCm to v6.1.1
  • Workaround AMD Vega RX 56 SDMA support on linux
  • Fix memory prediction for deepseek v2 models
  • Speed up model loading on Windows with CUDA GPUs
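The seed fix above can be exercised with a request that sets both fields; a minimal sketch, assuming the default server address http://localhost:11434 and an illustrative model name:

```shell
# Request body for the OpenAI-compatible endpoint. With this release,
# setting "seed" no longer forces temperature to change, so both are honored.
body='{"model":"llama3","seed":42,"temperature":0.7,"messages":[{"role":"user","content":"Say hello."}]}'
echo "$body"
# To send it against a running server:
#   curl http://localhost:11434/v1/chat/completions -d "$body"
```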


Full Changelog: v0.1.44...v0.1.45-rc3

v0.1.44

13 Jun 19:38
c39761c

What's Changed

  • Fixed issue where unicode characters such as emojis would not be loaded correctly when running ollama create
  • Fixed certain cases where Nvidia GPUs would not be detected and reported as compute capability 1.0 devices

Full Changelog: v0.1.43...v0.1.44

v0.1.43

11 Jun 02:00
217f60c

Ollama honest work

What's Changed

  • New import.md guide for converting and importing models to Ollama
  • Fixed issue where embedding vectors resulting from /api/embeddings would not be accurate
  • JSON mode responses will no longer include invalid escape characters
  • Removing a model will no longer show incorrect "File not found" errors
  • Fixed issue where running ollama create would result in an error on Windows with certain file formatting
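The embeddings fix can be checked with a request like the following; a minimal sketch, assuming a local server and an illustrative model name (the response carries an "embedding" array of floats):

```shell
# Request body for the /api/embeddings endpoint, whose vectors this
# release corrects.
body='{"model":"all-minilm","prompt":"The sky is blue"}'
echo "$body"
# To send it against a running server:
#   curl http://localhost:11434/api/embeddings -d "$body"
```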


Full Changelog: v0.1.42...v0.1.43

v0.1.42

07 Jun 07:06
385a32e

New models

  • Qwen 2: a new series of large language models from Alibaba Group

What's Changed

  • Fixed issue where qwen2 would output erroneous text such as GGG on Nvidia and AMD GPUs
  • ollama pull is now faster if it detects a model is already downloaded
  • ollama create will now automatically detect prompt templates for popular model architectures such as Llama, Gemma, Phi and more.
  • Ollama can now be accessed from local apps built with Electron and Tauri, as well as from apps being developed in local HTML files
  • Update welcome prompt in Windows to llama3
  • Fixed issues where /api/ps and /api/tags would show invalid timestamps in responses
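For origins beyond the app schemes this release permits, the existing OLLAMA_ORIGINS environment variable can widen which origins may call the local API; a sketch with an illustrative value:

```shell
# Allow an extra origin to reach the local Ollama API (value illustrative);
# set this before starting the server so it takes effect.
export OLLAMA_ORIGINS="http://example.com"
echo "$OLLAMA_ORIGINS"
# ollama serve
```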


Full Changelog: v0.1.41...v0.1.42

v0.1.41

02 Jun 03:30
476fb8e

What's Changed

  • Fixed issue on Windows 10 and 11 where Ollama would encounter an error on Intel CPUs with integrated GPUs

Full Changelog: v0.1.40...v0.1.41

v0.1.40

31 May 05:49
829ff87

ollama continuing to capture bugs

New models

  • Codestral: Mistral AI’s first-ever code model, designed for code generation tasks.
  • IBM Granite Code: now in 3B and 8B parameter sizes.
  • Deepseek V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

What's Changed

  • Fixed out of memory and incorrect token issues when running Codestral on 16GB Macs
  • Fixed issue where full-width characters (e.g. Japanese, Chinese, Russian) were deleted at the end of a line when using ollama run


Full Changelog: v0.1.39...v0.1.40

v0.1.39

22 May 02:46
ad89708

New models

  • Cohere Aya 23: A new state-of-the-art, multilingual LLM covering 23 different languages.
  • Mistral 7B 0.3: A new version of Mistral 7B with initial support for function calling.
  • Phi-3 Medium: a 14B-parameter, lightweight, state-of-the-art open model by Microsoft.
  • Phi-3 Mini 128K and Phi-3 Medium 128K: versions of the Phi-3 models that support a context window size of 128K
  • Granite code: A family of open foundation models by IBM for Code Intelligence
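Making use of the 128K-context Phi-3 variants typically means raising the context window above the default; a minimal sketch using the real num_ctx parameter in a Modelfile (base model name and value are illustrative):

```shell
# Write a small Modelfile that raises the context window; num_ctx is an
# Ollama Modelfile parameter, the 32768 value here is illustrative.
printf 'FROM phi3\nPARAMETER num_ctx 32768\n' > Modelfile.sketch
cat Modelfile.sketch
# ollama create my-phi3-long -f Modelfile.sketch   # requires a running server
```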

Llama 3 import

It is now possible to import and quantize Llama 3 and its finetunes from Safetensors format to Ollama.

First, clone a Hugging Face repo with a Safetensors model:

git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
cd Meta-Llama-3-8B-Instruct

Next, create a Modelfile:

FROM .

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>

Then, create and quantize a model:

ollama create --quantize q4_0 -f Modelfile my-llama3 
ollama run my-llama3

What's Changed

  • Fixed display issues with wide characters in languages such as Chinese, Korean, Japanese, and Russian
  • Added new OLLAMA_NOHISTORY=1 environment variable that can be set to disable history when using ollama run
  • New experimental OLLAMA_FLASH_ATTENTION=1 flag for ollama serve that improves token generation speed on Apple Silicon Macs and NVIDIA graphics cards
  • Fixed error that would occur on Windows running ollama create -f Modelfile
  • ollama create can now create models from I-Quant GGUF files
  • Fixed EOF errors when resuming downloads via ollama pull
  • Added a Ctrl+W shortcut to ollama run
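The two new environment variables from this release are set before starting the relevant command; a minimal sketch:

```shell
# OLLAMA_NOHISTORY disables input history in `ollama run`;
# OLLAMA_FLASH_ATTENTION is experimental and is read by `ollama serve`.
export OLLAMA_NOHISTORY=1
export OLLAMA_FLASH_ATTENTION=1
echo "$OLLAMA_NOHISTORY $OLLAMA_FLASH_ATTENTION"
# ollama serve
```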


Full Changelog: v0.1.38...v0.1.39

v0.1.38

15 May 00:28
d1692fd

New Models

  • Falcon 2: A new 11B-parameter causal decoder-only model built by TII and trained on over 5T tokens.
  • Yi 1.5: A new high-performing version of Yi, now licensed as Apache 2.0. Available in 6B, 9B and 34B sizes.

What's Changed

ollama ps

A new command is now available: ollama ps. This command displays currently loaded models, their memory footprint, and the processors used (GPU or CPU):

% ollama ps
NAME               ID            SIZE    PROCESSOR        UNTIL
mixtral:latest     7708c059a8bb  28 GB   47%/53% CPU/GPU  Forever
llama3:latest      a6990ed6be41  5.5 GB  100% GPU         4 minutes from now
all-minilm:latest  1b226e2802db  585 MB  100% GPU         4 minutes from now

/clear

To clear the chat history for a session when running ollama run, use /clear:

>>> /clear
Cleared session context

  • Fixed issue where switching loaded models on Windows would take several seconds
  • Running /save will no longer abort the chat session if an incorrect name is provided
  • The /api/tags API endpoint will now correctly return an empty list [] instead of null if no models are installed
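The corrected empty-list shape looks like the following; a sketch of the response body (verifying it live requires a running server with no models installed):

```shell
# With no models installed, /api/tags now responds with an empty array
# rather than null.
# curl http://localhost:11434/api/tags
body='{"models":[]}'
echo "$body"
```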


Full Changelog: v0.1.37...v0.1.38

v0.1.37

12 May 01:59
41ba301

What's Changed

  • Fixed issue where models with uppercase characters in the name would not show with ollama list
  • Fixed usage string for ollama create
  • Fixed finish_reason being "" instead of null in the OpenAI-compatible chat API.


Full Changelog: v0.1.36...v0.1.37

v0.1.36

11 May 06:37

What's Changed

  • Fixed exit status 0xc0000005 error with AMD graphics cards on Windows
  • Fixed rare out of memory errors when loading a model to run with CPU

Full Changelog: v0.1.35...v0.1.36