Releases: ollama/ollama
v0.1.9
New models
- Yi: a high-performing, bilingual model supporting both English and Chinese.
What's Changed
- JSON mode: instruct models to always return valid JSON when calling /api/generate by setting the format parameter to json
- Raw mode: bypass any templating done by Ollama by passing {"raw": true} to /api/generate
- Better error descriptions when downloading and uploading models with ollama pull and ollama push
- Fixed issue where Linux installer would encounter an error when running as the root user
- Improved progress bar design when running ollama pull and ollama push
- Fixed issue where running on a machine with less than 2GB of VRAM would be slow
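As a rough illustration of the JSON mode and raw mode items above, the sketch below builds request bodies for /api/generate in Python. The helper names (build_payload, generate) are hypothetical, and the server address is an assumption taken from the localhost:11434 example that appears later in these notes. Only the format and raw fields come from this release.

```python
import json
import urllib.request

# Assumption: a local Ollama server at the address used elsewhere in these notes.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, json_mode=False, raw=False):
    """Build a request body for /api/generate (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if json_mode:
        # JSON mode: ask the model to always return valid JSON.
        payload["format"] = "json"
    if raw:
        # Raw mode: bypass any prompt templating done by Ollama.
        payload["raw"] = True
    return payload


def generate(payload, url=OLLAMA_URL):
    """POST the payload and decode the JSON reply (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the request body; call generate(body) to actually send it.
    body = build_payload("llama2", "List three colors as a JSON array.", json_mode=True)
    print(json.dumps(body, indent=2))
```

Keeping payload construction separate from the network call makes it easy to inspect exactly what would be sent before contacting a server.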
New Contributors
- @pepperoni21 made their first contribution in #995
- @lgrammel made their first contribution in #1020
- @ej52 made their first contribution in #999
- @David-Kunz made their first contribution in #996
- @tjbck made their first contribution in #943
- @omagdy7 made their first contribution in #1029
- @upchui made their first contribution in #1034
- @kevinhermawan made their first contribution in #1043
- @amithkoujalgi made their first contribution in #1044
- @mpldr made their first contribution in #1042
- @aashish2057 made their first contribution in #992
- @nickanderson made their first contribution in #1062
Full Changelog: v0.1.8...v0.1.9
v0.1.8
New Models
- CodeBooga: A high-performing code instruct model created by merging two existing code models.
- Dolphin 2.2 Mistral: An instruct-tuned model based on Mistral. Version 2.2 is fine-tuned for improved conversation and empathy.
- MistralLite: MistralLite is a fine-tuned model based on Mistral with enhanced capabilities of processing long contexts.
- Yarn Mistral: an extension of Mistral to support a context window of up to 128K tokens
- Yarn Llama 2: an extension of Llama 2 to support a context window of up to 128K tokens
What's Changed
- Ollama will now honour large context sizes on models such as codellama and mistrallite
- Fixed issue where repeated characters would be output on long contexts
- ollama push is now much faster. 7B models will push up to ~100MB/s and large models (70B+) up to 1GB/s if network speeds permit
New Contributors
- @dloss made their first contribution in #948
- @noahgitsham made their first contribution in #983
Full Changelog: v0.1.7...v0.1.8
v0.1.7
What's Changed
- Fixed an issue when running ollama run where certain key combinations such as Ctrl+Space would lead to an unresponsive prompt
- Fixed issue in ollama run where retrieving the previous prompt from history would require two up arrow key presses instead of one
- Exiting ollama run with Ctrl+D will now put cursor on the next line
Full Changelog: v0.1.6...v0.1.7
v0.1.6
New models
- Dolphin 2.1 Mistral: an instruct-tuned model based on Mistral and trained on a dataset filtered to remove alignment and bias.
- Zephyr Beta: this is the second model in the series based on Mistral, and has strong performance that compares to and even exceeds Llama 2 70b in several categories. It’s trained on a distilled dataset, improving grammar and yielding even better chat results.
What's Changed
- Pasting multi-line strings in ollama run is now possible
- Fixed various issues when writing prompts in ollama run
- The library models have been refreshed and revamped including llama2, codellama, and more:
  - All chat or instruct models now support setting the system parameter, or SYSTEM command in the Modelfile
  - Parameters (num_ctx, etc) have been updated for library models
  - Slight performance improvements for all models
- Model storage can now be configured with OLLAMA_MODELS. See the FAQ for more info on how to configure this.
- OLLAMA_HOST will now default to port 443 when https:// is specified, and port 80 when http:// is specified
- Fixed trailing slashes causing an error when using OLLAMA_HOST
- Fixed issue where ollama pull would retry multiple times when out of space
- Fixed various out of memory issues when using Nvidia GPUs
- Fixed performance issue previously introduced on AMD CPUs
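The OLLAMA_HOST behaviour described above (port 443 for https://, port 80 for http://, trailing slashes tolerated) can be sketched roughly as follows. This is an illustration in Python, not Ollama's actual implementation. The function name resolve_host is hypothetical, and the fallback port of 11434 when no scheme is given is an assumption based on the localhost:11434 address used elsewhere in these notes.

```python
from urllib.parse import urlsplit

# Assumption: used only when the value has no scheme and no explicit port.
FALLBACK_PORT = 11434


def resolve_host(value):
    """Return (scheme, host, port) for an OLLAMA_HOST-style value (illustrative only)."""
    value = value.strip().rstrip("/")  # tolerate trailing slashes
    has_scheme = "://" in value
    parts = urlsplit(value if has_scheme else "http://" + value)
    if parts.port is not None:
        port = parts.port  # an explicit port always wins
    elif not has_scheme:
        port = FALLBACK_PORT
    elif parts.scheme == "https":
        port = 443
    else:
        port = 80
    return parts.scheme, parts.hostname, port
```

For example, resolve_host("https://example.com/") resolves to port 443, while resolve_host("127.0.0.1:8080") keeps the explicit port.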
Full Changelog: v0.1.5...v0.1.6
v0.1.5
What's Changed
- Fix an issue where an error would occur when running falcon or starcoder models
Full Changelog: v0.1.4...v0.1.5
v0.1.4
New models
- OpenHermes 2 Mistral: a new fine-tuned model based on Mistral, trained on open datasets totalling over 900,000 instructions. This model has strong multi-turn chat skills, surpassing previous Hermes 13B models and even matching 70B models on some benchmarks.
What's Changed
- Faster model switching: models will now stay loaded between requests when using different parameters (e.g. temperature) or system prompts
- starcoder, sqlcoder and falcon models now have unicode support. Note: they will need to be re-pulled (e.g. ollama pull starcoder)
- New documentation guide on importing existing models to Ollama (GGUF, PyTorch, etc)
- ollama serve will now print the current version of Ollama on start
- ollama run will now show more descriptive errors when encountering runtime issues (such as insufficient memory)
- Fixed an issue where Ollama on Linux would use CPU instead of using both the CPU and GPU for GPUs with less memory
- Fixed architecture check in Linux install script
- Fixed issue where leading whitespaces would be returned in responses
- Fixed issue where ollama show would show an empty SYSTEM prompt (instead of omitting it)
- Fixed issue where the /api/tags endpoint would return null instead of [] if no models were found
- Fixed an issue where ollama show wouldn't work when connecting remotely by using OLLAMA_HOST
- Fixed issue where GPU/Metal would be used on macOS even with num_gpu set to 0
- Fixed issue where certain characters would be escaped in responses
- Fixed ollama serve logs to report the proper amount of GPU memory (VRAM) being used
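For clients that may still talk to a server predating the /api/tags fix above, a defensive read of the response could look like the following Python sketch. The helper name model_names is hypothetical, and the response shape (a "models" key holding entries with a "name" field) is an assumption rather than something stated in these notes.

```python
import json


def model_names(tags_body):
    """Extract model names from a /api/tags response body (hypothetical helper)."""
    data = json.loads(tags_body)
    # Assumption: the list lives under "models" and each entry has a "name".
    # `or []` also covers older servers that returned null instead of [].
    models = data.get("models") or []
    return [entry["name"] for entry in models]
```

Treating null and an empty list the same way keeps calling code free of special cases.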
Note: the EMBED keyword in Modelfile is being revisited until a future version of Ollama. Join the discussion on how we can make it better.
New Contributors
- @vieux made their first contribution in #810
- @s-kostyaev made their first contribution in #801
- @ggozad made their first contribution in #794
- @awaescher made their first contribution in #811
- @deichbewohner made their first contribution in #799
Full Changelog: v0.1.3...v0.1.4
v0.1.3
What's Changed
- Improved various API error messages to be easier to read
- Improved GPU allocation for older GPUs to fix "out of memory" errors
- Fixed issue where setting num_gpu to 0 would result in an error
- Ollama for macOS will now always update to the latest version, even if earlier updates had also been downloaded beforehand
Full Changelog: v0.1.2...v0.1.3
v0.1.2
New Models
- Zephyr: a fine-tuned 7B version of Mistral that was trained on a mix of publicly available, synthetic datasets and performs as well as Llama 2 70B in many benchmarks
- Mistral OpenOrca: a 7 billion parameter model fine-tuned on top of the Mistral 7B model using the OpenOrca dataset
Examples
Ollama's examples have been updated with some new examples:
- Ask the mentors: a TypeScript, multi-user conversation app
- TypeScript LangChain: a simple example of using Ollama with LangChainJS and TypeScript.
What's Changed
- Download speeds for ollama pull have been significantly improved, from 60MB/s to over 1.5GB/s (25x faster) on fast network connections
- The API now supports non-streaming responses. Set the stream parameter to false and endpoints will return data in one single response:
  curl -X POST http://localhost:11434/api/generate -d '{ "model": "llama2", "prompt": "Why is the sky blue?", "stream": false }'
- Ollama can now be used with http proxies (using HTTP_PROXY=http://<proxy>) and https proxies (using HTTPS_PROXY=https://<proxy>)
- Fixed token too long error when generating a response
- q8_0, q5_0, q5_1, and f32 models will now use GPU on Linux
- Revise help text in ollama run to be easier to read
- Rename runner subprocess to ollama-runner
- ollama create will now show feedback when reading model metadata
- Fix not found error showing when running ollama pull
- Improved video memory allocation on Linux to fix errors when using Nvidia GPUs
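To show what the non-streaming option above changes for a client, the Python sketch below contrasts the two ways of reading a reply. It assumes that streamed replies arrive as one JSON object per line, each with a "response" fragment and a "done" flag. That shape is an assumption, not something stated in these notes, and the function names are hypothetical.

```python
import json


def join_stream(lines):
    """Concatenate the text from a streamed reply, one JSON object per line."""
    text = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        # Assumption: each chunk carries a "response" fragment and a "done" flag.
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)


def read_single(body):
    """With "stream": false the whole answer arrives in one JSON object."""
    return json.loads(body).get("response", "")
```

With "stream": false a client only needs read_single. The streaming path has to accumulate fragments until a chunk reports that it is done.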
Full Changelog: v0.1.1...v0.1.2
v0.1.1
What's Changed
- Cancellable responses: Ctrl+C will now cancel responses when running ollama run
- Exit ollama run sessions with Ctrl+D or /bye
- Improved error messages for unknown /slash commands when using ollama run
- Various improvements to the Linux install script for distro compatibility and to fix bugs
- Fixed install issues on Fedora
- Fixed issue where specifying the library/ prefix in ollama run would cause an error
- Fixed highlight color for placeholder text in ollama run
- Fixed issue where auto updater would not restart when clicking "Restart to Update"
- Ollama will now clean up subdirectories in ~/.ollama/models
- Ollama will now show a default message when ollama show results in an empty message
New Contributors
- @aaroncoffey made their first contribution in #629
- @lstep made their first contribution in #621
- @JayNakrani made their first contribution in #632
- @jimexist made their first contribution in #664
- @hallh made their first contribution in #663
Full Changelog: v0.1.0...v0.1.1
v0.1.0
Ollama for Linux
Ollama for Linux is now available, with GPU acceleration enabled out-of-the-box for Nvidia GPUs.
💯 Ollama will run on cloud servers with multiple GPUs attached
🤖 Ollama will run on WSL 2 with GPU support
😍 Ollama maximizes the number of GPU layers to load to increase performance without crashing
🤩 Ollama will support CPU only, and small hobby gaming GPUs to super powerful workstation graphics cards like the H100
Download
curl https://ollama.ai/install.sh | sh
Manual install steps are also available.
Changelog
- Ollama will now automatically offload as much of the running model as is supported by your GPU for maximum performance without any crashes
- Fix issue where characters would be erased when running ollama run
- Added a new community project by @TwanLuttik in #574
New Contributors
- @TwanLuttik made their first contribution in #574
Full Changelog: v0.0.21...v0.1.0