v0.1.2
New Models
- Zephyr: a fine-tuned version of Mistral 7B that was trained on a mix of publicly available synthetic datasets and performs as well as Llama 2 70B on many benchmarks
- Mistral OpenOrca: a 7-billion-parameter model fine-tuned on top of Mistral 7B using the OpenOrca dataset
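To try one of the new models, it can be pulled through Ollama's local REST API. Below is a minimal TypeScript sketch, assuming a server on the default port 11434, the model tag `zephyr`, and that the `stream` parameter added in this release also applies to `/api/pull`:

```typescript
// Minimal sketch: pull a model through the local Ollama REST API.
// Assumes the server is running on the default port 11434 and that
// /api/pull accepts the "stream" parameter added in this release.
async function pullModel(name: string): Promise<void> {
  const res = await fetch("http://localhost:11434/api/pull", {
    method: "POST",
    body: JSON.stringify({ name, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`pull failed: ${res.status} ${res.statusText}`);
  }
  console.log(await res.json()); // e.g. { "status": "success" }
}

await pullModel("zephyr"); // or "mistral-openorca"
```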
Examples
Ollama's examples have been updated with two new additions:
- Ask the mentors: a multi-user TypeScript conversation app
- TypeScript LangChain: a simple example of using Ollama with LangChainJS and TypeScript
What's Changed
- Download speeds for `ollama pull` have been significantly improved, from 60MB/s to over 1.5GB/s (25x faster) on fast network connections
- The API now supports non-streaming responses. Set the `stream` parameter to `false` and endpoints will return their data in a single response (see the TypeScript sketch after this list):
  ```
  curl -X POST http://localhost:11434/api/generate -d '{
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
  ```
- Ollama can now be used with HTTP proxies (using `HTTP_PROXY=http://<proxy>`) and HTTPS proxies (using `HTTPS_PROXY=https://<proxy>`)
- Fixed a `token too long` error when generating a response
- `q8_0`, `q5_0`, `q5_1`, and `f32` models will now use the GPU on Linux
- Revised the help text in `ollama run` to be easier to read
- Renamed the runner subprocess to `ollama-runner`
- `ollama create` will now show feedback when reading model metadata
- Fixed a `not found` error that appeared when running `ollama pull`
- Improved video memory allocation on Linux to fix errors when using Nvidia GPUs
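As a companion to the curl call above, here is a minimal TypeScript sketch of the new non-streaming mode, assuming a local server on the default port: with `stream: false`, `/api/generate` returns a single JSON object whose `response` field holds the full generated text, instead of a stream of newline-delimited chunks.

```typescript
// Minimal sketch: call /api/generate with streaming disabled.
// With "stream": false the endpoint returns one JSON object
// instead of newline-delimited JSON chunks.
interface GenerateResponse {
  model: string;
  response: string; // the full generated text
  done: boolean;
}

const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  body: JSON.stringify({
    model: "llama2",
    prompt: "Why is the sky blue?",
    stream: false,
  }),
});

const data = (await res.json()) as GenerateResponse;
console.log(data.response);
```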
New Contributors
Full Changelog: v0.1.1...v0.1.2