Releases: qualcomm/ai-hub-models

v0.56.0

19 Jun 01:24
ca623d4

Model improvements & fixes

  • Added evaluation support for Detectron2_Detection
  • Fixed dataset issue to reflect the correct on-device accuracy for BERT
  • Fixed accuracy discrepancy between torch and on-device for HuggingFace WavLM Base Plus
  • Fixed input type to avoid mismatch for VideoMAE
  • Changed the RTMDet model to allow w8a16 quantization
  • Qwen 2.5 7B VL metadata now correctly includes all context lengths
  • PiperTTS variants correctly specify the language in their description.
  • Running demo.py on-device now works with custom input shapes
  • MMMU multimodal eval dataset and evaluator were added for VLMs
  • Added the ability to run a curated 100-prompt evaluation across LLMs/VLMs in evaluate.py
  • Added performance numbers for Samsung Galaxy S26 across all models.
  • XR 2 Gen 2 published perf numbers are now measured using Samsung Galaxy S22 instead of QCS8450 (Proxy), as Proxy devices will soon be deprecated in AI Hub Workbench.
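
For readers unfamiliar with the "w8a16" notation used in the RTMDet item above: it means 8-bit weights and 16-bit activations. The following is a minimal sketch of symmetric per-tensor quantization in plain Python, showing why a wider activation bit width reduces rounding error. The function names and sample values are made up for illustration; this is not the quantization recipe AI Hub Models applies.

```python
def quantize_symmetric(values, num_bits):
    """Symmetric per-tensor quantization: returns (integers, scale)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits, 32767 for 16 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    ints = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return ints, scale


def dequantize(ints, scale):
    """Map quantized integers back to approximate real values."""
    return [i * scale for i in ints]


def max_abs_error(values, num_bits):
    """Largest round-trip error after quantizing and dequantizing."""
    ints, scale = quantize_symmetric(values, num_bits)
    restored = dequantize(ints, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical values, chosen only to illustrate the effect.
weights = [0.42, -1.30, 0.07, 0.95]
activations = [3.1, -0.002, 12.5, 0.4]  # wide dynamic range

w_ints, w_scale = quantize_symmetric(weights, 8)       # the "w8" part
a_ints, a_scale = quantize_symmetric(activations, 16)  # the "a16" part

err_a8 = max_abs_error(activations, 8)
err_a16 = max_abs_error(activations, 16)
print(f"activation error at 8 bits:  {err_a8:.6f}")
print(f"activation error at 16 bits: {err_a16:.6f}")
```

Activations often span a wider range than weights, so giving them 16 bits while keeping weights at 8 bits is a common compromise between accuracy and model size.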

v0.55.0

04 Jun 09:40
9e147e3

New Models

  • NAFNet (denoise and deblur)
  • PiperTTS (en/de/it)
  • YOLOv8-OBB
  • YOLOv11-Pose

Improvements

  • YOLO models now support exporting to dynamic batch size
  • EdgeTAM now supports video object tracking (in addition to image segmentation)
  • Link to Voice AI SDK added to website and README for relevant models

Bug Fixes

  • Add missing ddcolor dependency

Performance Numbers

  • Updated performance numbers with the latest version of AI Hub Workbench
  • Dragonwing IQ-8275 EVK is now featured on model cards as a similar device to SA7255P

v0.54.0

19 May 22:27
eb19c08

New Models

  • Pi0.5 (pi05) — Vision-language-action model

Improvements

  • Select Qualcomm devices that are not hosted on AI Hub Workbench (i.e., Dragonwing Q-7790, Dragonwing Q-8750, Dragonwing IQ-X5121, Dragonwing IQ-X7181, SA8255P ADP, SA8650P ADP) now appear on the device dropdown list for applicable models alongside metrics for a similar hosted device. This helps users to identify supported device/model pairings even when AI Hub Workbench cannot host the actual device.

Bug Fixes

  • Fix FCN-ResNet50 accuracy gap: Previous evaluation used incorrect label mapping for the 21-class VOC segmentation task, producing artificially low metrics.
  • Fix DETR models mixed precision: Switched DETR-ResNet50/101 and Conditional-DETR quantization to use fp16 as the higher precision, resolving an accuracy regression with the previous int16 recipe.
  • Fix MeloTTS empty assets in config: Model assets in config.json were serializing as empty due to a Pydantic v2 type annotation issue

v0.53.1

05 May 16:26
a3b8ab0

  • New models

  • Re-instated models

  • Improvements and bug fixes

    • Export metadata.json was missing IO specs for models released as context bin
    • Evaluation bug fixes for huggingface_wavelm_base_plus
    • The model llama_v3_2_3b_instruct_ssd now supports up to 8K context length (thanks to config changes and LM head split out into separate context binary)

v0.52.0

28 Apr 15:49
bbca4cf

New Models

  • Qwen2.5-VL-7B (handles both vision and text input; pre-compiled assets are available)
  • ResNet34-SSD

Improvements

  • Downloads from model cards are now zip files consistent with the export command (which includes metadata and auxiliary files).
  • Fix profiling issues with ddrnet23_slim and lama_dilated.
  • Add Genie App scripts to LLMs/VLMs (run with genie-app -s genie-app-script.txt)
  • Bug fixes

v0.51.0

22 Apr 00:04
a52c65d

New Models

  • CREStereo
  • DnCNN — First image de-noising model
  • Mask2Former
  • YOLOv9-Det

Reinstated Models

  • Whisper-Medium
  • RF-DETR
  • Swin family (Swin-Base, Swin-Small, Swin-Tiny, SwinV2-Base)

New Features

  • Whisper, MeloTTS, and Opus models can now be exported to run with the Voice AI runtime
  • Enhanced model metadata — get_input_spec() now carries typed preprocessing metadata (image normalization, resize info) via InputSpecEntry for programmatic access

Improvements

  • Bumped default QAIRT version to 2.45; all performance and accuracy numbers updated
  • Upgraded LLM Genie runtime to QAIRT 2.45
  • Improved mediapipe_selfie visualization to match MediaPipe reference
  • Model metadata produced by export changed from YAML to JSON and now includes LLM metadata

Bug Fixes / Cleanup

  • Fixed swinv2_base accuracy — added missing attention replacement that was causing ~39% accuracy loss
  • Fixed a bug in the Whisper demo, which now runs successfully
  • Extraneous normalization in DDRNet and PidNet demo was removed
  • Qwen 2.5 7B export now includes sequence length of 1
  • Fixed electra_bert_base_discrim_google evaluation script for batch size > 1
  • Removed ONNXRUNTIME_GENAI runtime

v0.50.2

09 Apr 20:54
15564da

Improvements:

  • All the assets and performance/accuracy numbers were updated.

v0.50.1

09 Apr 20:53
bd9f6e1

Bug Fixes:

  • Fixed Windows compatibility for Llama 3.2 3B Instruct SSD
  • In export.py, models with multiple components now support changing input size for each component separately
  • Fix issue where LiteRT was not included in the SDK / Tool versions in metadata created by export.py
  • Fixed issue where evaluate.py for electra_bert_base_discrim_google would always produce accuracy of 0
  • Fix issue where CLI commands would fail on Python 3.11+
  • Restore missing package README for display on PyPI

v0.50.0

09 Apr 20:53
df201c7

New Models & Assets

  • Qwen3-4B-Instruct-2507
  • Stereonet

Bug Fixes:

  • Fixed export of Qwen 2.5 7B Instruct
  • SD 2.1 had regressed and was returning noise instead of a natural image. This has been fixed.

Improvements:

  • Added support for PyTorch 2.11

v0.49.1

24 Mar 02:55
c7381f9

General Updates

  • Added quantized variants for DETR-ResNet50, DETR-ResNet50-DC5, DETR-ResNet101, and DETR-ResNet101-DC5
  • Removed BiseNet and BGNet due to licensing concerns
  • The Llama 3.2 3B Instruct SSD variant uses Self Speculative Decoding (SSD), an inference acceleration technique that achieves an on-target speedup with output accuracy guaranteed to be identical to the base model. Choose this variant over llama_v3_2_3b_instruct for faster token generation on supported devices.
  • Updated performance & accuracy data from latest version of AI Hub Workbench
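
The SSD item above claims a speedup with output identical to the base model. The toy sketch below illustrates why speculative decoding in general can make that guarantee under greedy decoding: drafted tokens are kept only when the base model would have produced them anyway. It uses a separate, made-up draft function and toy integer "tokens". It is generic greedy speculative decoding, not Qualcomm's SSD implementation, whose details are not described in these notes.

```python
def target_next(context):
    """Toy 'base model': the next token is a deterministic function of the context."""
    return (sum(context) * 31 + len(context)) % 50


def draft_next(context):
    """Toy drafter: agrees with the base model unless len(context) is a multiple of 3.

    It calls target_next only to keep the toy short; a real drafter is a
    cheaper predictor that does not have access to the base model's answer.
    """
    token = target_next(context)
    return (token + 1) % 50 if len(context) % 3 == 0 else token


def greedy_decode(prompt, num_tokens):
    """Reference: one base-model call per generated token."""
    out = list(prompt)
    for _ in range(num_tokens):
        out.append(target_next(out))
    return out[len(prompt):]


def speculative_decode(prompt, num_tokens, draft_len=4):
    """Draft a block of tokens, then verify the block against the base model."""
    out = list(prompt)
    verify_passes = 0
    while len(out) - len(prompt) < num_tokens:
        # 1. Cheaply draft a block of candidate tokens.
        draft = []
        for _ in range(draft_len):
            draft.append(draft_next(out + draft))
        # 2. Verify the block. On real hardware all positions are scored in one
        #    batched base-model pass, so we count one pass per block.
        verify_passes += 1
        accepted = []
        for i in range(draft_len):
            expected = target_next(out + draft[:i])
            accepted.append(expected)  # always keep the base model's token
            if draft[i] != expected:
                break                  # discard the rest of the draft
        out.extend(accepted)
    return out[len(prompt):len(prompt) + num_tokens], verify_passes


prompt = [1, 2, 3]
reference = greedy_decode(prompt, 20)
generated, passes = speculative_decode(prompt, 20)
print(generated == reference, passes)
```

Every emitted token is the one the base model selects for its position, so the result matches plain greedy decoding. The saving comes from needing fewer verification passes than generated tokens.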

Bug Fixes

  • Fixed batchnorm unfolding issue in MediaPipe Hand Gesture, enabling the model to be fully NPU-resident when quantized with TFLite
  • Fixed non-determinism in loading the BSD300 dataset. This previously caused us to report incorrect accuracy data for several super resolution models.
  • MeloTTS has been updated to work around an HTP issue with summation that produced incorrect shapes at runtime. This update is available only via the export script and is not yet available with pre-generated assets.
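
The notes do not say what caused the BSD300 loading non-determinism. One common source of this kind of bug is relying on filesystem listing order or on an unseeded random sample when choosing evaluation images. The sketch below shows a reproducible selection under that assumption. The function name and file names are hypothetical.

```python
import random


def select_eval_samples(filenames, num_samples, seed=42):
    """Pick the same subset every run, regardless of the order files are listed in."""
    ordered = sorted(filenames)  # directory listing order is not guaranteed
    rng = random.Random(seed)    # local RNG, independent of global random state
    return rng.sample(ordered, min(num_samples, len(ordered)))


# The same hypothetical files, listed in two different orders.
listing_a = ["img_003.jpg", "img_001.jpg", "img_004.jpg", "img_002.jpg"]
listing_b = list(reversed(listing_a))

subset_a = select_eval_samples(listing_a, 2)
subset_b = select_eval_samples(listing_b, 2)
print(subset_a == subset_b)
```

Sorting before sampling and using a dedicated seeded generator keeps reported accuracy comparable between runs and machines.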