v0.1.2
New Models
- Zephyr: a fine-tuned version of Mistral 7B that was trained on a mix of publicly available synthetic datasets and performs as well as Llama 2 70B on many benchmarks
- Mistral OpenOrca: a 7-billion-parameter model fine-tuned on top of Mistral 7B using the OpenOrca dataset
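To try one of the new models, it can be pulled through Ollama's local REST API. Below is a minimal TypeScript sketch, assuming a server on the default port 11434, the model tag `zephyr`, and that the `stream` parameter added in this release also applies to `/api/pull`:

```typescript
// Minimal sketch: pull a model through the local Ollama REST API.
// Assumes the server is running on the default port 11434 and that
// /api/pull accepts the "stream" parameter added in this release.
async function pullModel(name: string): Promise<void> {
  const res = await fetch("http://localhost:11434/api/pull", {
    method: "POST",
    body: JSON.stringify({ name, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`pull failed: ${res.status} ${res.statusText}`);
  }
  console.log(await res.json()); // e.g. { "status": "success" }
}

await pullModel("zephyr"); // or "mistral-openorca"
```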
Examples
Ollama's examples have been updated with two new additions:
- Ask the mentors: a multi-user TypeScript conversation app
- TypeScript LangChain: a simple example of using Ollama with LangChainJS and TypeScript
What's Changed
- Download speeds for `ollama pull` have been significantly improved, from 60MB/s to over 1.5GB/s (25x faster) on fast network connections
- The API now supports non-streaming responses. Set the `stream` parameter to `false` and endpoints will return their data in a single response (see the TypeScript sketch after this list):
  ```
  curl -X POST http://localhost:11434/api/generate -d '{
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
  ```
- Ollama can now be used with HTTP proxies (using `HTTP_PROXY=http://<proxy>`) and HTTPS proxies (using `HTTPS_PROXY=https://<proxy>`)
- Fixed a `token too long` error when generating a response
- `q8_0`, `q5_0`, `q5_1`, and `f32` models will now use the GPU on Linux
- Revised the help text in `ollama run` to be easier to read
- Renamed the runner subprocess to `ollama-runner`
- `ollama create` will now show feedback when reading model metadata
- Fixed a `not found` error that appeared when running `ollama pull`
- Improved video memory allocation on Linux to fix errors when using Nvidia GPUs
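As a companion to the curl call above, here is a minimal TypeScript sketch of the new non-streaming mode, assuming a local server on the default port: with `stream: false`, `/api/generate` returns a single JSON object whose `response` field holds the full generated text, instead of a stream of newline-delimited chunks.

```typescript
// Minimal sketch: call /api/generate with streaming disabled.
// With "stream": false the endpoint returns one JSON object
// instead of newline-delimited JSON chunks.
interface GenerateResponse {
  model: string;
  response: string; // the full generated text
  done: boolean;
}

const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  body: JSON.stringify({
    model: "llama2",
    prompt: "Why is the sky blue?",
    stream: false,
  }),
});

const data = (await res.json()) as GenerateResponse;
console.log(data.response);
```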
New Contributors
Full Changelog: v0.1.1...v0.1.2