Releases: ollama/ollama
v0.1.45
What's Changed
- Setting `seed` in the `/v1/chat/completions` OpenAI compatibility endpoint no longer changes `temperature`
- Enhanced GPU discovery and multi-gpu support with concurrency
- Skip searching for network devices in Linux install script
- Report GPU configuration variables in server log
- Update linux ROCm to v6.1.1
- Workaround AMD Vega RX 56 SDMA support on linux
- Fix memory prediction for deepseek v2 models
- Speed up model loading on Windows with CUDA GPUs
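As a sketch of the fixed behavior, a request to the OpenAI-compatible endpoint can now carry both parameters independently. The model name and values below are illustrative, and the payload is only constructed, not sent:

```python
import json

# Request body for POST /v1/chat/completions on a local Ollama server
# (default base URL http://localhost:11434/v1). "llama3" is illustrative.
payload = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Say hello."}],
    "seed": 42,          # fixes sampling for reproducible output
    "temperature": 0.7,  # no longer silently overridden when seed is set
}

body = json.dumps(payload)
print(body)
```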
New Contributors
- @jayson-cloude made their first contribution in #4972
Full Changelog: v0.1.44...v0.1.45
v0.1.44
What's Changed
- Fixed issue where unicode characters such as emojis would not be loaded correctly when running `ollama create`
- Fixed certain cases where Nvidia GPUs would not be detected and reported as compute capability 1.0 devices
Full Changelog: v0.1.43...v0.1.44
v0.1.43
What's Changed
- New import.md guide for converting and importing models to Ollama
- Fixed issue where embedding vectors resulting from `/api/embeddings` would not be accurate
- JSON mode responses will no longer include invalid escape characters
- Removing a model will no longer show incorrect `File not found` errors
- Fixed issue where running `ollama create` would result in an error on Windows with certain file formatting
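A common way to sanity-check embedding vectors returned by `/api/embeddings` is cosine similarity. A minimal pure-Python sketch, using made-up vectors in place of real API output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```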
New Contributors
- @erhant made their first contribution in #4854
- @nischalj10 made their first contribution in #4612
- @dcasota made their first contribution in #4852
- @Napuh made their first contribution in #4084
- @hughescr made their first contribution in #3782
- @jimscard made their first contribution in #3382
Full Changelog: v0.1.42...v0.1.43
v0.1.42
New models
- Qwen 2: a new series of large language models from Alibaba group
What's Changed
- Fixed issue where `qwen2` would output erroneous text such as `GGG` on Nvidia and AMD GPUs
- `ollama pull` is now faster if it detects a model is already downloaded
- `ollama create` will now automatically detect prompt templates for popular model architectures such as Llama, Gemma, Phi and more
- Ollama can now be accessed from local apps built with Electron and Tauri, as well as from apps under development in local HTML files
- Updated the welcome prompt on Windows to `llama3`
- Fixed issues where `/api/ps` and `/api/tags` would show invalid timestamps in responses
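The timestamps in those responses are RFC 3339 strings. Assuming a field such as `expires_at` (the value below is made up), they parse directly with the standard library:

```python
from datetime import datetime

# Example "expires_at" value as returned by /api/ps (illustrative).
ts = "2024-06-10T12:34:56.789012-07:00"
expires = datetime.fromisoformat(ts)
print(expires.tzinfo is not None)
```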
New Contributors
- @shoebham made their first contribution in #4766
- @kartikm7 made their first contribution in #4719
- @royjhan made their first contribution in #4822
Full Changelog: v0.1.41...v0.1.42
v0.1.41
What's Changed
- Fixed issue on Windows 10 and 11 where Ollama would encounter an error on Intel CPUs with integrated GPUs
Full Changelog: v0.1.40...v0.1.41
v0.1.40
New models
- Codestral: Mistral AI’s first-ever code model, designed for code generation tasks.
- IBM Granite Code: now in 3B and 8B parameter sizes.
- Deepseek V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
What's Changed
- Fixed out of memory and incorrect token issues when running Codestral on 16GB Macs
- Fixed issue where full-width characters (e.g. Japanese, Chinese, Russian) were deleted at the end of the line when using `ollama run`
New Contributors
- @zhewang1-intc made their first contribution in #3278
Full Changelog: v0.1.39...v0.1.40
v0.1.39
New models
- Cohere Aya 23: A new state-of-the-art, multilingual LLM covering 23 different languages.
- Mistral 7B 0.3: A new version of Mistral 7B with initial support for function calling.
- Phi-3 Medium: a 14B parameters, lightweight, state-of-the-art open model by Microsoft.
- Phi-3 Mini 128K and Phi-3 Medium 128K: versions of the Phi-3 models that support a context window size of 128K
- Granite code: A family of open foundation models by IBM for Code Intelligence
Llama 3 import
It is now possible to import and quantize Llama 3 and its finetunes from Safetensors format to Ollama.
First, clone a Hugging Face repo with a Safetensors model:
```
git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
cd Meta-Llama-3-8B-Instruct
```
Next, create a `Modelfile`:
```
FROM .
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
```
Then, create and quantize a model:
```
ollama create --quantize q4_0 -f Modelfile my-llama3
ollama run my-llama3
```
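For reference, the Go template above can be mirrored in Python to see what the rendered prompt looks like. This is a rough sketch of the template's output, not Ollama's actual template engine:

```python
def render_llama3(system, prompt):
    """Rough Python mirror of the Modelfile TEMPLATE above, up to the
    point where the assistant's response would begin."""
    out = ""
    if system:
        out += f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
    if prompt:
        out += f"<|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|>"
    out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

print(render_llama3("You are terse.", "Hi"))
```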
What's Changed
- Fixed rendering issues with wide characters in languages such as Chinese, Korean, Japanese and Russian
- Added new `OLLAMA_NOHISTORY=1` environment variable that can be set to disable history when using `ollama run`
- New experimental `OLLAMA_FLASH_ATTENTION=1` flag for `ollama serve` that improves token generation speed on Apple Silicon Macs and NVIDIA graphics cards
- Fixed error that would occur on Windows when running `ollama create -f Modelfile`
- `ollama create` can now create models from I-Quant GGUF files
- Fixed `EOF` errors when resuming downloads via `ollama pull`
- Added a `Ctrl+W` shortcut to `ollama run`
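Both settings are plain environment variables. A sketch of launching the server with flash attention enabled from Python; the subprocess call is commented out since it assumes an `ollama` binary on the PATH:

```python
import os

# Build an environment for "ollama serve" with flash attention enabled.
env = dict(os.environ, OLLAMA_FLASH_ATTENTION="1")
# import subprocess
# subprocess.Popen(["ollama", "serve"], env=env)
print(env["OLLAMA_FLASH_ATTENTION"])
```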
New Contributors
- @rapmd73 made their first contribution in #4467
- @sammcj made their first contribution in #4120
- @likejazz made their first contribution in #4535
Full Changelog: v0.1.38...v0.1.39
v0.1.38
New Models
- Falcon 2: A new 11B parameters causal decoder-only model built by TII and trained over 5T tokens.
- Yi 1.5: A new high-performing version of Yi, now licensed as Apache 2.0. Available in 6B, 9B and 34B sizes.
What's Changed
ollama ps
A new command is now available: `ollama ps`. This command displays currently loaded models, their memory footprint, and the processors used (GPU or CPU):
```
% ollama ps
NAME               ID            SIZE    PROCESSOR        UNTIL
mixtral:latest     7708c059a8bb  28 GB   47%/53% CPU/GPU  Forever
llama3:latest      a6990ed6be41  5.5 GB  100% GPU         4 minutes from now
all-minilm:latest  1b226e2802db  585 MB  100% GPU         4 minutes from now
```
/clear
To clear the chat history for a session when running `ollama run`, use `/clear`:
```
>>> /clear
Cleared session context
```
- Fixed issue where switching loaded models on Windows would take several seconds
- Running `/save` will no longer abort the chat session if an incorrect name is provided
- The `/api/tags` API endpoint will now correctly return an empty list `[]` instead of `null` if no models are installed
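The difference matters to clients: iterating over `models` now works without a null guard. A small sketch with hard-coded response bodies standing in for real API output:

```python
import json

fixed = json.loads('{"models": []}')  # response after the fix
old = json.loads('{"models": null}')  # response before the fix

names = [m["name"] for m in fixed["models"]]  # safe: empty list
print(names)
# [m["name"] for m in old["models"]] would raise TypeError (NoneType)
```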
New Contributors
- @fangtaosong made their first contribution in #4387
- @machimachida made their first contribution in #4424
Full Changelog: v0.1.37...v0.1.38
v0.1.37
What's Changed
- Fixed issue where models with uppercase characters in the name would not show with `ollama list`
- Fixed usage string for `ollama create`
- Fixed `finish_reason` being `""` instead of `null` in the OpenAI-compatible chat API
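Clients of the streaming API typically treat a non-null `finish_reason` as end-of-stream, which the empty string broke. A sketch with hand-written chunks in place of a live stream:

```python
import json

# Intermediate vs final streaming chunks (hand-written examples).
chunk_mid = json.loads('{"choices": [{"delta": {"content": "Hi"}, "finish_reason": null}]}')
chunk_end = json.loads('{"choices": [{"delta": {}, "finish_reason": "stop"}]}')

def is_done(chunk):
    # Correct now that intermediate chunks carry null, not "".
    return chunk["choices"][0]["finish_reason"] is not None

print(is_done(chunk_mid), is_done(chunk_end))
```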
New Contributors
- @todashuta made their first contribution in #4362
Full Changelog: v0.1.36...v0.1.37
v0.1.36
What's Changed
- Fixed `exit status 0xc0000005` error with AMD graphics cards on Windows
- Fixed rare out of memory errors when loading a model to run with CPU
Full Changelog: v0.1.35...v0.1.36