
v0.1.23

@jmorganca released this 02 Feb 06:34 · 1183 commits to main since this release


New vision models

The LLaVA model family on Ollama has been updated to version 1.6, and now includes a new 34B version:

  • ollama run llava – a new 7B LLaVA model based on Mistral
  • ollama run llava:13b – a 13B LLaVA model
  • ollama run llava:34b – a 34B LLaVA model, one of the most powerful open-source vision models available

All three models share the following improvements:

  • More permissive licenses: LLaVA 1.6 models are distributed under the Apache 2.0 license or the LLaMA 2 Community License.
  • Higher image resolution: support for up to 4x more pixels, allowing the model to grasp more details.
  • Improved text recognition and reasoning capabilities: these models are trained on additional document, chart and diagram data sets.
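
To try one of these models against a local image through the REST API, the image is sent base64-encoded in the images field. A minimal sketch, where the prompt and image contents are placeholders:

curl http://localhost:11434/api/generate -d '{
  "model": "llava",
  "prompt": "What is in this picture?",
  "images": ["<base64-encoded contents of photo.jpg>"]
}'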

keep_alive parameter: control how long models stay loaded

When making API requests, the new keep_alive parameter can be used to control how long a model stays loaded in memory:

curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "keep_alive": "30s"
}'
  • If set to a positive duration (e.g. 20m, 1h, or a number of seconds such as 30), the model will stay loaded for the provided duration
  • If set to a negative duration (e.g. -1), the model will stay loaded indefinitely
  • If set to 0, the model will be unloaded immediately once finished
  • If not set, the model will stay loaded for 5 minutes by default
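
Combining the last two behaviors, a request with no prompt and keep_alive set to 0 can be used to unload a model on demand; a minimal sketch mirroring the request above:

curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "keep_alive": 0
}'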

Support for more Nvidia GPUs

  • GeForce GTX TITAN X, 980 Ti, 980, 970, 960, 950, 750 Ti, 750
  • GeForce GTX 980M, 970M, 965M, 960M, 950M, 860M, 850M
  • GeForce 940M, 930M, 910M, 840M, 830M
  • Quadro M6000, M5500M, M5000, M2200, M1200, M620, M520
  • Tesla M60, M40
  • NVS 810

What's Changed

  • New keep_alive API parameter to control how long models stay loaded
  • Image paths can now be provided to ollama run when running multimodal models (see the example after this list)
  • Fixed issue where downloading models via ollama pull would slow down to 99%
  • Fixed error when running Ollama with Nvidia GPUs and CPUs without AVX instructions
  • Support for additional Nvidia GPUs (compute capability 5)
  • Fixed issue where system prompt would be repeated in subsequent messages
  • ollama serve will now print prompt when OLLAMA_DEBUG=1 is set
  • Fixed issue where exceeding context size would cause erroneous responses in ollama run and the /api/chat API
  • ollama run will now allow sending messages without images to multimodal models
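
As an example of the image path support mentioned above, a path included in the prompt is picked up automatically (the file name here is a placeholder):

ollama run llava "What is in this image? ./photo.jpg"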


Full Changelog: v0.1.22...v0.1.23