Releases: ollama/ollama

v0.1.9

10 Nov 05:03
5cba29b

New models

  • Yi: a high-performing, bilingual model supporting both English and Chinese.

What's Changed

  • JSON mode: instruct models to always return valid JSON when calling /api/generate by setting the format parameter to json (see the examples after this list)
  • Raw mode: bypass any templating done by Ollama by passing {"raw": true} to /api/generate
  • Better error descriptions when downloading and uploading models with ollama pull and ollama push
  • Fixed issue where Linux installer would encounter an error when running as the root user
  • Improved progress bar design when running ollama pull and ollama push
  • Fixed issue where running on a machine with less than 2GB of VRAM would be slow
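
A minimal sketch of the two new /api/generate options, assuming a local server on the default port 11434 and the llama2 model already pulled:

    # JSON mode: ask the model to return only valid JSON
    curl -X POST http://localhost:11434/api/generate -d '{
      "model": "llama2",
      "prompt": "List three primary colors as a JSON object.",
      "format": "json"
    }'

    # Raw mode: skip Ollama's prompt templating and send the prompt verbatim
    # (the [INST] tags here are illustrative of a Llama-style template)
    curl -X POST http://localhost:11434/api/generate -d '{
      "model": "llama2",
      "prompt": "[INST] Why is the sky blue? [/INST]",
      "raw": true
    }'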

Full Changelog: v0.1.8...v0.1.9

v0.1.8

04 Nov 00:06

New Models

  • CodeBooga: A high-performing code instruct model created by merging two existing code models.
  • Dolphin 2.2 Mistral: An instruct-tuned model based on Mistral. Version 2.2 is fine-tuned for improved conversation and empathy.
  • MistralLite: MistralLite is a fine-tuned model based on Mistral with enhanced capabilities of processing long contexts.
  • Yarn Mistral: an extension of Mistral supporting a context window of up to 128k tokens
  • Yarn Llama 2: an extension of Llama 2 supporting a context window of up to 128k tokens

What's Changed

  • Ollama will now honour large context sizes on models such as codellama and mistrallite (see the example after this list)
  • Fixed issue where repeated characters would be output on long contexts
  • ollama push is now much faster. 7B models will push up to ~100MB/s and large models (70B+) up to 1GB/s if network speeds permit
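
For example, a larger context window can be requested per call through the options field of /api/generate (a sketch; the workable num_ctx value depends on the model and available memory):

    curl -X POST http://localhost:11434/api/generate -d '{
      "model": "mistrallite",
      "prompt": "Summarize this document: <long text here>",
      "options": { "num_ctx": 16384 }
    }'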

Full Changelog: v0.1.7...v0.1.8

v0.1.7

28 Oct 04:06

What's Changed

  • Fixed an issue when running ollama run where certain key combinations such as Ctrl+Space would lead to an unresponsive prompt
  • Fixed issue in ollama run where retrieving the previous prompt from history would require two up arrow key presses instead of one
  • Exiting ollama run with Ctrl+D will now put the cursor on the next line

Full Changelog: v0.1.6...v0.1.7

v0.1.6

27 Oct 20:58
3a1ed9f

New models

  • Dolphin 2.1 Mistral: an instruct-tuned model based on Mistral and trained on a dataset filtered to remove alignment and bias.
  • Zephyr Beta: the second model in this series based on Mistral, with strong performance that matches and even exceeds Llama 2 70B in several categories. It’s trained on a distilled dataset, improving grammar and yielding even better chat results.

What's Changed

  • Pasting multi-line strings in ollama run is now possible
  • Fixed various issues when writing prompts in ollama run
  • The library models have been refreshed and revamped including llama2, codellama, and more:
    • All chat or instruct models now support setting the system parameter, or SYSTEM command in the Modelfile
    • Parameters (num_ctx, etc) have been updated for library models
    • Slight performance improvements for all models
  • Model storage can now be configured with OLLAMA_MODELS. See the FAQ for more info on how to configure this, and the sketch after this list.
  • OLLAMA_HOST will now default to port 443 when https:// is specified, and port 80 when http:// is specified
  • Fixed trailing slashes causing an error when using OLLAMA_HOST
  • Fixed issue where ollama pull would retry multiple times when out of space
  • Fixed various out of memory issues when using Nvidia GPUs
  • Fixed performance issue previously introduced on AMD CPUs
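
A minimal sketch of the pieces above (the model name, storage path, and hostname are hypothetical; the Modelfile uses the existing FROM, SYSTEM, and PARAMETER commands):

    # Modelfile: set a system prompt and a parameter for a library model
    FROM llama2
    SYSTEM You are a terse assistant that answers in one sentence.
    PARAMETER num_ctx 4096

Build and run it, optionally pointing Ollama at a different model store or a remote host:

    ollama create my-llama2 -f Modelfile

    # Store models outside the default location
    OLLAMA_MODELS=/data/ollama/models ollama serve

    # https:// now implies port 443 (http:// implies port 80)
    OLLAMA_HOST=https://ollama.example.com ollama run my-llama2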

Full Changelog: v0.1.5...v0.1.6

v0.1.5

24 Oct 18:09
cecf831

What's Changed

  • Fixed an issue where an error would occur when running falcon or starcoder models

Full Changelog: v0.1.4...v0.1.5

v0.1.4

20 Oct 12:56
c345b4c

New models

  • OpenHermes 2 Mistral: a new fine-tuned model based on Mistral, trained on open datasets totalling over 900,000 instructions. This model has strong multi-turn chat skills, surpassing previous Hermes 13B models and even matching 70B models on some benchmarks.

What's Changed

  • Faster model switching: models will now stay loaded between requests when using different parameters (e.g. temperature) or system prompts
  • starcoder, sqlcoder and falcon models now have unicode support. Note: they will need to be re-pulled (e.g. ollama pull starcoder)
  • New documentation guide on importing existing models to Ollama (GGUF, PyTorch, etc). See the import sketch after this list.
  • ollama serve will now print the current version of Ollama on start
  • ollama run will now show more descriptive errors when encountering runtime issues (such as insufficient memory)
  • Fixed an issue where Ollama on Linux would use CPU instead of using both the CPU and GPU for GPUs with less memory
  • Fixed architecture check in Linux install script
  • Fixed issue where leading whitespaces would be returned in responses
  • Fixed issue where ollama show would show an empty SYSTEM prompt (instead of omitting it)
  • Fixed issue where the /api/tags endpoint would return null instead of [] if no models were found
  • Fixed an issue where ollama show wouldn't work when connecting remotely by using OLLAMA_HOST
  • Fixed issue where GPU/Metal would be used on macOS even with num_gpu set to 0 (see the num_gpu example after this list)
  • Fixed issue where certain characters would be escaped in responses
  • Fixed ollama serve logs to report the proper amount of GPU memory (VRAM) being used
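
The import guide boils down to pointing a Modelfile at a local weights file; a minimal sketch for GGUF (the filename and model name are hypothetical):

    # Modelfile
    FROM ./my-model.Q4_0.gguf

Then build and run it:

    ollama create my-model -f Modelfile
    ollama run my-model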
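
And a sketch of the num_gpu option mentioned above, which controls how many layers are offloaded to the GPU; setting it to 0 keeps inference on the CPU:

    curl -X POST http://localhost:11434/api/generate -d '{
      "model": "llama2",
      "prompt": "Why is the sky blue?",
      "options": { "num_gpu": 0 }
    }'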

Note: the EMBED keyword in the Modelfile has been put on hold and is being revisited for a future version of Ollama. Join the discussion on how we can make it better.

Full Changelog: v0.1.3...v0.1.4

v0.1.3

13 Oct 23:59

What's Changed

  • Improved various API error messages to be easier to read
  • Improved GPU allocation for older GPUs to fix "out of memory" errors
  • Fixed issue where setting num_gpu to 0 would result in an error
  • Ollama for macOS will now always update to the latest version, even if earlier updates were downloaded beforehand

Full Changelog: v0.1.2...v0.1.3

v0.1.2

12 Oct 18:46

New Models

  • Zephyr: a fine-tuned 7B version of Mistral, trained on a mix of publicly available synthetic datasets, that performs as well as Llama 2 70B in many benchmarks
  • Mistral OpenOrca: a 7-billion-parameter model fine-tuned on top of Mistral 7B using the OpenOrca dataset

Examples

Ollama's examples have been updated with several new additions.

What's Changed

  • Download speeds for ollama pull have been significantly improved, from 60MB/s to over 1.5GB/s (25x faster) on fast network connections
  • The API now supports non-streaming responses. Set the stream parameter to false and endpoints will return data in one single response:
    curl -X POST http://localhost:11434/api/generate -d '{
      "model": "llama2",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'
    
  • Ollama can now be used with http proxies (using HTTP_PROXY=http://<proxy>) and https proxies (using HTTPS_PROXY=https://<proxy>). See the example after this list.
  • Fixed "token too long" error when generating a response
  • q8_0, q5_0, q5_1, and f32 models will now use GPU on Linux
  • Revised help text in ollama run to be easier to read
  • Renamed the runner subprocess to ollama-runner
  • ollama create will now show feedback when reading model metadata
  • Fixed "not found" error showing when running ollama pull
  • Improved video memory allocation on Linux to fix errors when using Nvidia GPUs
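
A sketch of the proxy usage (the proxy host and ports are hypothetical). Since the server performs registry downloads, set the variable where ollama serve runs:

    # Route downloads through an HTTP proxy
    HTTP_PROXY=http://proxy.example.com:8080 ollama serve

    # Or through an HTTPS proxy
    HTTPS_PROXY=https://proxy.example.com:8443 ollama serve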

Full Changelog: v0.1.1...v0.1.2

v0.1.1

02 Oct 21:47
1852755

What's Changed

  • Cancellable responses: Ctrl+C will now cancel responses when running ollama run
  • Exit ollama run sessions with Ctrl+D or /bye
  • Improved error messages for unknown /slash commands when using ollama run
  • Various improvements to the Linux install script for distro compatibility and to fix bugs
  • Fixed install issues on Fedora
  • Fixed issue where specifying the library/ prefix in ollama run would cause an error
  • Fixed highlight color for placeholder text in ollama run
  • Fixed issue where auto updater would not restart when clicking "Restart to Update"
  • Ollama will now clean up subdirectories in ~/.ollama/models
  • Ollama will now show a default message when ollama show returns an empty result

Full Changelog: v0.1.0...v0.1.1

v0.1.0

23 Sep 13:22
5306b02

Ollama for Linux

Ollama for Linux is now available, with GPU acceleration enabled out-of-the-box for Nvidia GPUs.

💯 Ollama will run on cloud servers with multiple GPUs attached
🤖 Ollama will run on WSL 2 with GPU support
😍 Ollama maximizes the number of GPU layers to load to increase performance without crashing
🤩 Ollama supports everything from CPU-only machines and small hobby gaming GPUs up to powerful workstation graphics cards like the H100

Download

curl https://ollama.ai/install.sh | sh

Manual install steps are also available.

Changelog

  • Ollama will now automatically offload as much of the running model as is supported by your GPU for maximum performance without any crashes
  • Fixed issue where characters would be erased when running ollama run
  • Added a new community project by @TwanLuttik in #574

Full Changelog: v0.0.21...v0.1.0