Releases · ollama/ollama

10 Nov 05:03

jmorganca

v0.1.9

5cba29b

v0.1.9

New models

Yi: a high-performing, bilingual model supporting both English and Chinese.

What's Changed

JSON mode: instruct models to always return valid JSON when calling /api/generate by setting the format parameter to json
Raw mode: bypass any templating done by Ollama by passing {"raw": true} to /api/generate
Better error descriptions when downloading and uploading models with ollama pull and ollama push
Fixed issue where Linux installer would encounter an error when running as the root user
Improved progress bar design when running ollama pull and ollama push
Fixed issue where running on a machine with less than 2GB of VRAM would be slow
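
As a rough illustration of the JSON mode and raw mode entries above, the sketch below builds request bodies for /api/generate. The helper names (build_generate_request, generate) and the localhost address are assumptions made for this example, not part of Ollama; only the format and raw fields come from the notes above.

```python
import json
import urllib.request

# Assumed local address; adjust to wherever the Ollama server is listening.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, *, json_mode=False, raw=False):
    """Build a JSON body for /api/generate (hypothetical helper)."""
    body = {"model": model, "prompt": prompt, "stream": False}
    if json_mode:
        # JSON mode: ask the model to always return valid JSON.
        body["format"] = "json"
    if raw:
        # Raw mode: bypass any templating done by Ollama.
        body["raw"] = True
    return body


def generate(body, url=OLLAMA_URL):
    """POST the body and return the decoded reply (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With a server running, something like generate(build_generate_request("llama2", "List three colors as JSON.", json_mode=True)) should return an object whose response field can itself be passed to json.loads.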

New Contributors

@pepperoni21 made their first contribution in #995
@lgrammel made their first contribution in #1020
@ej52 made their first contribution in #999
@David-Kunz made their first contribution in #996
@tjbck made their first contribution in #943
@omagdy7 made their first contribution in #1029
@upchui made their first contribution in #1034
@kevinhermawan made their first contribution in #1043
@amithkoujalgi made their first contribution in #1044
@mpldr made their first contribution in #1042
@aashish2057 made their first contribution in #992
@nickanderson made their first contribution in #1062

Full Changelog: v0.1.8...v0.1.9

Contributors

nickanderson, lgrammel, and 10 other contributors

04 Nov 00:06

jmorganca

v0.1.8

e21579a

v0.1.8

New Models

CodeBooga: A high-performing code instruct model created by merging two existing code models.
Dolphin 2.2 Mistral: An instruct-tuned model based on Mistral. Version 2.2 is fine-tuned for improved conversation and empathy.
MistralLite: MistralLite is a fine-tuned model based on Mistral with enhanced capabilities of processing long contexts.
Yarn Mistral: an extension of Mistral to support a context window of up to 128k tokens.
Yarn Llama 2: an extension of Llama 2 to support a context window of up to 128k tokens.

What's Changed

Ollama will now honour large context sizes on models such as codellama and mistrallite
Fixed issue where repeated characters would be output on long contexts
ollama push is now much faster. 7B models will push up to ~100MB/s and large models (70B+) up to 1GB/s if network speeds permit
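
For the large-context entry above, a minimal sketch of requesting a bigger context window from a client, assuming the size is passed as num_ctx inside the request's options field. The helper name with_context_size and the value 16384 are illustrative choices only.

```python
def with_context_size(body, num_ctx):
    """Return a copy of a request body with options.num_ctx set (hypothetical helper)."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be a positive number of tokens")
    # Copy the existing options so the caller's dictionary is left untouched.
    options = dict(body.get("options", {}))
    options["num_ctx"] = num_ctx
    return {**body, "options": options}


# Illustrative request: 16384 is an arbitrary example size, not a recommendation.
request_body = with_context_size(
    {"model": "mistrallite", "prompt": "Summarize the following document: ..."},
    16384,
)
```

Whether a given size is usable still depends on the model and on available memory.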

New Contributors

@dloss made their first contribution in #948
@noahgitsham made their first contribution in #983

Full Changelog: v0.1.7...v0.1.8

Contributors

dloss and noahgitsham

28 Oct 04:06

jmorganca

v0.1.7

9ec16f0

v0.1.7

What's Changed

Fixed an issue when running ollama run where certain key combinations such as Ctrl+Space would lead to an unresponsive prompt
Fixed issue in ollama run where retrieving the previous prompt from history would require two up arrow key presses instead of one
Exiting ollama run with Ctrl+D will now put the cursor on the next line

Full Changelog: v0.1.6...v0.1.7

27 Oct 20:58

jmorganca

v0.1.6

3a1ed9f

v0.1.6

New models

Dolphin 2.1 Mistral: an instruct-tuned model based on Mistral and trained on a dataset filtered to remove alignment and bias.
Zephyr Beta: the second model in the Zephyr series based on Mistral, with performance comparable to, and in several categories exceeding, Llama 2 70B. It’s trained on a distilled dataset, improving grammar and yielding even better chat results.

What's Changed

Pasting multi-line strings in ollama run is now possible
Fixed various issues when writing prompts in ollama run
The library models have been refreshed and revamped including llama2, codellama, and more:
- All chat or instruct models now support setting the system parameter, or SYSTEM command in the Modelfile
- Parameters (num_ctx, etc) have been updated for library models
- Slight performance improvements for all models
Model storage can now be configured with OLLAMA_MODELS. See the FAQ for more info on how to configure this.
OLLAMA_HOST will now default to port 443 when https:// is specified, and port 80 when http:// is specified
Fixed trailing slashes causing an error when using OLLAMA_HOST
Fixed issue where ollama pull would retry multiple times when out of space
Fixed various out of memory issues when using Nvidia GPUs
Fixed performance issue previously introduced on AMD CPUs
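
The OLLAMA_HOST rules above (port 443 for https://, port 80 for http://, trailing slashes tolerated) can be sketched as follows. This is an illustration of the described behaviour, not Ollama's own code; the function name resolve_ollama_host and the fallback to port 11434 when no scheme is given are assumptions of this sketch.

```python
from urllib.parse import urlsplit

SCHEME_PORTS = {"http": 80, "https": 443}
FALLBACK_PORT = 11434  # assumed default when no scheme is specified


def resolve_ollama_host(value):
    """Return (scheme, hostname, port) for an OLLAMA_HOST-style value."""
    value = value.strip().rstrip("/")  # ignore trailing slashes
    has_scheme = "://" in value
    parts = urlsplit(value if has_scheme else "http://" + value)
    if parts.port is not None:
        port = parts.port  # an explicit port always wins
    elif has_scheme:
        port = SCHEME_PORTS.get(parts.scheme, FALLBACK_PORT)
    else:
        port = FALLBACK_PORT
    return parts.scheme, parts.hostname, port
```

For example, resolve_ollama_host("https://example.com/") yields the https scheme, the host example.com, and port 443.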

New Contributors

@ajayk made their first contribution in #855

Full Changelog: v0.1.5...v0.1.6

Contributors

ajayk

24 Oct 18:09

jmorganca

v0.1.5

cecf831

v0.1.5

What's Changed

Fix an issue where an error would occur when running falcon or starcoder models

Full Changelog: v0.1.4...v0.1.5

20 Oct 12:56

jmorganca

v0.1.4

c345b4c

v0.1.4

New models

OpenHermes 2 Mistral: a new fine-tuned model based on Mistral, trained on open datasets totalling over 900,000 instructions. This model has strong multi-turn chat skills, surpassing previous Hermes 13B models and even matching 70B models on some benchmarks.

What's Changed

Faster model switching: models will now stay loaded between requests when using different parameters (e.g. temperature) or system prompts
starcoder, sqlcoder and falcon models now have unicode support. Note: they will need to be re-pulled (e.g. ollama pull starcoder)
New documentation guide on importing existing models to Ollama (GGUF, PyTorch, etc)
ollama serve will now print the current version of Ollama on start
ollama run will now show more descriptive errors when encountering runtime issues (such as insufficient memory)
Fixed an issue where Ollama on Linux would use CPU instead of using both the CPU and GPU for GPUs with less memory
Fixed architecture check in Linux install script
Fixed issue where leading whitespaces would be returned in responses
Fixed issue where ollama show would show an empty SYSTEM prompt (instead of omitting it)
Fixed issue where the /api/tags endpoint would return null instead of [] if no models were found
Fixed an issue where ollama show wouldn't work when connecting remotely by using OLLAMA_HOST
Fixed issue where GPU/Metal would be used on macOS even with num_gpu set to 0
Fixed issue where certain characters would be escaped in responses
Fixed ollama serve logs to report the proper amount of GPU memory (VRAM) being used
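
Clients that talk to both older and newer servers may want to tolerate either form of the /api/tags result mentioned above. A minimal sketch, assuming the response is a JSON object with a models list whose entries carry a name field; the helper name model_names is hypothetical.

```python
import json


def model_names(tags_response_text):
    """Return model names from a /api/tags response body, treating null as empty."""
    payload = json.loads(tags_response_text)
    # Older servers could send null here; newer ones send [] when nothing is installed.
    models = payload.get("models") or []
    return [model["name"] for model in models]
```

Both '{"models": null}' and '{"models": []}' produce an empty list with this helper.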

Note: the EMBED keyword in Modelfile is being revisited for a future version of Ollama. Join the discussion on how we can make it better.

New Contributors

@vieux made their first contribution in #810
@s-kostyaev made their first contribution in #801
@ggozad made their first contribution in #794
@awaescher made their first contribution in #811
@deichbewohner made their first contribution in #799

Full Changelog: v0.1.3...v0.1.4

Contributors

ggozad, vieux, and 3 other contributors

13 Oct 23:59

jmorganca

v0.1.3

832b4db

v0.1.3

What's Changed

Improved various API error messages to be easier to read
Improved GPU allocation for older GPUs to fix "out of memory" errors
Fixed issue where setting num_gpu to 0 would result in an error
Ollama for macOS will now always update to the latest version, even if earlier updates had also been downloaded beforehand

Full Changelog: v0.1.2...v0.1.3

12 Oct 18:46

jmorganca

v0.1.2

d890890

v0.1.2

New Models

Zephyr: a fine-tuned 7B version of Mistral that was trained on a mix of publicly available, synthetic datasets and performs as well as Llama 2 70B in many benchmarks
Mistral OpenOrca: a 7 billion parameter model fine-tuned on top of the Mistral 7B model using the OpenOrca dataset

Examples

New examples have been added to Ollama's examples:

Ask the mentors: a TypeScript, multi-user conversation app
TypeScript LangChain: a simple example of using Ollama with LangChainJS and TypeScript.

What's Changed

Download speeds for ollama pull have been significantly improved, from 60MB/s to over 1.5GB/s (25x faster) on fast network connections

The API now supports non-streaming responses. Set the stream parameter to false and endpoints will return data in one single response:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Ollama can now be used with http proxies (using HTTP_PROXY=http://<proxy>) and https proxies (using HTTPS_PROXY=https://<proxy>)
Fixed token too long error when generating a response
q8_0, q5_0, q5_1, and f32 models will now use GPU on Linux
Revise help text in ollama run to be easier to read
Rename runner subprocess to ollama-runner
ollama create will now show feedback when reading model metadata
Fix not found error showing when running ollama pull
Improved video memory allocation on Linux to fix errors when using Nvidia GPUs
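
To show how a client might handle both the default streamed output and the non-streaming option described above, here is a small sketch. It assumes a streamed body is a sequence of JSON objects, one per line, each carrying a response fragment and a done flag, and that a non-streamed body is a single such object on one line. The function name collect_response and the sample payloads are made up for illustration and are not real server output.

```python
import json


def collect_response(raw_body):
    """Join the generated text from a streamed or single-object /api/generate body."""
    pieces = []
    for line in raw_body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between objects
        chunk = json.loads(line)
        pieces.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the reply
    return "".join(pieces)


# Illustrative payloads only (hypothetical, shortened).
streamed = '{"response": "The ", "done": false}\n{"response": "sky", "done": false}\n{"done": true}\n'
single = '{"response": "The sky", "done": true}'
```

Both sample bodies yield the same text, so calling code does not need to care which mode the request used.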

New Contributors

@xyproto made their first contribution in #705
@konsalex made their first contribution in #741

Full Changelog: v0.1.1...v0.1.2

Contributors

xyproto and konsalex

02 Oct 21:47

jmorganca

v0.1.1

1852755

v0.1.1

What's Changed

Cancellable responses: Ctrl+C will now cancel responses when running ollama run
Exit ollama run sessions with Ctrl+D or /bye
Improved error messages for unknown /slash commands when using ollama run
Various improvements to the Linux install script for distro compatibility and to fix bugs
Fixed install issues on Fedora
Fixed issue where specifying the library/ prefix in ollama run would cause an error
Fixed highlight color for placeholder text in ollama run
Fixed issue where auto updater would not restart when clicking "Restart to Update"
Ollama will now clean up subdirectories in ~/.ollama/models
Ollama will now show a default message when ollama show results in an empty message

New Contributors

@aaroncoffey made their first contribution in #629
@lstep made their first contribution in #621
@JayNakrani made their first contribution in #632
@jimexist made their first contribution in #664
@hallh made their first contribution in #663

Full Changelog: v0.1.0...v0.1.1

Contributors

lstep, jimexist, and 3 other contributors

23 Sep 13:22

jmorganca

v0.1.0

5306b02

v0.1.0

Ollama for Linux

Ollama for Linux is now available, with GPU acceleration enabled out-of-the-box for Nvidia GPUs.

💯 Ollama will run on cloud servers with multiple GPUs attached
🤖 Ollama will run on WSL 2 with GPU support
😍 Ollama maximizes the number of GPU layers to load to increase performance without crashing
🤩 Ollama will support CPU-only machines as well as GPUs ranging from small hobby gaming cards to super powerful workstation graphics cards like the H100

Download

curl https://ollama.ai/install.sh | sh

Manual install steps are also available.

Changelog

Ollama will now automatically offload as much of the running model as is supported by your GPU for maximum performance without any crashes
Fix issue where characters would be erased when running ollama run
Added a new community project by @TwanLuttik in #574

New Contributors

@TwanLuttik made their first contribution in #574

Full Changelog: v0.0.21...v0.1.0

Contributors

TwanLuttik
