Releases: qualcomm/ai-hub-models
v0.56.0
Model improvements & fixes
- Added evaluation support for Detectron2_Detection
- Fixed dataset issue to reflect the correct on-device accuracy for BERT
- Fixed accuracy discrepancy between torch and on-device for HuggingFace WavLM Base Plus
- Fixed input type to avoid mismatch for VideoMAE
- Model changes to RTMDet that allow quantization to be w8a16
- Qwen 2.5 7B VL metadata now correctly includes all context lengths
- PiperTTS variants correctly specify the language in their description.
- Running demo.py on-device now works with custom input shapes
- MMMU multimodal eval dataset and evaluator were added for VLMs
- Added the ability to run a curated 100-prompt evaluation across LLMs/VLMs in evaluate.py
- Added performance numbers for Samsung Galaxy S26 across all models.
- XR 2 Gen 2 published perf numbers are now measured using Samsung Galaxy S22 instead of QCS8450 (Proxy), as Proxy devices will soon be deprecated in AI Hub Workbench.
v0.55.0
New Models
- NAFNet (denoise and deblur)
- PiperTTS (en/de/it)
- YOLOv8-OBB
- YOLOv11-Pose
Improvements
- YOLO models now support exporting to dynamic batch size
- EdgeTAM now supports video object tracking (in addition to image segmentation)
- Link to Voice AI SDK added to website and README for relevant models
Bug Fixes
- Add missing ddcolor dependency
Performance Numbers
- Updated performance numbers with the latest version of AI Hub Workbench
- Dragonwing IQ-8275 EVK is now featured on model cards as a similar device to SA7255P
v0.54.0
New Models
- Pi0.5 (pi05) — Vision-language-action model
Improvements
- Select Qualcomm devices that are not hosted on AI Hub Workbench (i.e., Dragonwing Q-7790, Dragonwing Q-8750, Dragonwing IQ-X5121, Dragonwing IQ-X7181, SA8255P ADP, SA8650P ADP) now appear on the device dropdown list for applicable models alongside metrics for a similar hosted device. This helps users to identify supported device/model pairings even when AI Hub Workbench cannot host the actual device.
Bug Fixes
- Fix FCN-ResNet50 accuracy gap: Previous evaluation used incorrect label mapping for the 21-class VOC segmentation task, producing artificially low metrics.
- Fix DETR models mixed precision: Switched DETR-ResNet50/101 and Conditional-DETR quantization to use fp16 as the higher precision, resolving an accuracy regression with the previous int16 recipe.
- Fix MeloTTS empty assets in config: Model assets in config.json were serializing as empty due to a Pydantic v2 type annotation issue
v0.53.1
New models
- SixDRepNet (sixd_repnet)
- RangeNet-Plus-Plus (RangeNet++, rangenet_plus_plus)
Re-instated models
- HuggingFace-WavLM-Base-Plus (huggingface_wavlm_base_plus)
Improvements and bug fixes
- Export metadata.json was missing IO specs for models released as context bin
- Evaluation bug fixes for huggingface_wavlm_base_plus
- The model llama_v3_2_3b_instruct_ssd now supports up to 8K context length (thanks to config changes and LM head split out into separate context binary)
v0.52.0
New Models
- Qwen2.5-VL-7B (handles both vision and text input, pre-compiled assets are available from here)
- ResNet34-SSD
Improvements
- Downloads from model cards are now zip files consistent with the export command (which includes metadata and auxiliary files).
- Fix profiling issues with ddrnet23_slim and lama_dilated.
- Add Genie App scripts to LLMs/VLMs (run with genie-app -s genie-app-script.txt)
- Bug fixes
v0.51.0
New Models
- CREStereo
- DnCNN — First image de-noising model
- Mask2Former
- YOLOv9-Det
Reinstated Models
- Whisper-Medium
- RF-DETR
- Swin family (Swin-Base, Swin-Small, Swin-Tiny, SwinV2-Base)
New Features
- Whisper, MeloTTS, and Opus models can now be exported to run with the Voice AI runtime
- Enhanced model metadata: get_input_spec() now carries typed preprocessing metadata (image normalization, resize info) via InputSpecEntry for programmatic access
Improvements
- Bumped default QAIRT version to 2.45; all performance and accuracy numbers updated
- Upgraded LLM Genie runtime to QAIRT 2.45
- Improved mediapipe_selfie visualization to match MediaPipe reference
- Model metadata produced by export changed from YAML to JSON and now includes LLM metadata
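Since export metadata is now JSON, it can be read with the Python standard library alone. The following is a minimal sketch, not the project's own loader: the helper names load_export_metadata and top_level_keys are hypothetical, the location of metadata.json directly inside the export folder is an assumption, and no particular schema is assumed.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_export_metadata(export_dir: Path) -> Dict[str, Any]:
    """Read metadata.json from an export folder (file location is an assumption)."""
    metadata_path = export_dir / "metadata.json"
    with metadata_path.open("r", encoding="utf-8") as f:
        return json.load(f)


def top_level_keys(metadata: Dict[str, Any]) -> List[str]:
    """Return the sorted top-level keys, without assuming any schema."""
    return sorted(metadata.keys())
```

Inspecting the top-level keys first is a cautious way to discover what a given release actually writes before depending on specific fields.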
Bug Fixes / Cleanup
- Fixed swinv2_base accuracy: added missing attention replacement that was causing ~39% accuracy loss
- Fixed a bug in the Whisper demo, which now runs successfully
- Extraneous normalization in DDRNet and PidNet demo was removed
- Qwen 2.5 7B export now includes sequence length of 1
- Fixed electra_bert_base_discrim_google evaluation script for batch size > 1
- Removed ONNXRUNTIME_GENAI runtime
v0.50.2
v0.50.1
Bug Fixes:
- Fixed Windows compatibility for Llama 3.2 3B Instruct SSD
- In export.py, models with multiple components now support changing input size for each component separately
- Fix issue where LiteRT was not included in the SDK / Tool versions in metadata created by export.py
- Fixed issue where evaluate.py for electra_bert_base_discrim_google would always produce accuracy of 0
- Fix issue where CLI commands would fail on Python 3.11+
- Restore missing package README for display on PyPI
v0.50.0
v0.49.1
General Updates
- Added quantized variants for DETR-ResNet50, DETR-ResNet50-DC5, DETR-ResNet101, and DETR-ResNet101-DC5
- Removed BiseNet and BGNet due to licensing concerns
- The Llama 3.2 3B Instruct SSD variant uses Self Speculative Decoding (SSD), an inference acceleration solution that achieves on-target speedup while guaranteeing output accuracy identical to the base model. Choose this variant over llama_v3_2_3b_instruct for faster token generation on supported devices.
- Updated performance & accuracy data from latest version of AI Hub Workbench
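The claim that speculative decoding can match the base model's output exactly can be illustrated with a toy greedy version. This is a generic sketch, not the SSD implementation shipped with the model: how the SSD variant produces its draft tokens is not described in these notes, so draft here is a hypothetical stand-in, and both "models" are plain functions from a token list to the next token.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: the target model generates one token per call."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], max_new: int, k: int = 4
) -> List[int]:
    """Draft proposes up to k tokens; the target keeps only those it agrees with."""
    tokens = list(prompt)
    limit = len(prompt) + max_new
    while len(tokens) < limit:
        # Cheap draft pass: guess the next few tokens.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # Verification: a real system checks all guesses in one batched target pass;
        # here the target is called per position for clarity.
        for guess in proposal:
            expected = target(tokens)
            tokens.append(expected)  # always the target's own greedy token
            if guess != expected:
                break  # discard the remaining guesses after the first mismatch
    return tokens


def toy_target(tokens: List[int]) -> int:
    return (sum(tokens) * 31 + 7) % 50


def toy_draft(tokens: List[int]) -> int:
    # Agrees with the target except at every third position.
    guess = toy_target(tokens)
    return (guess + 1) % 50 if len(tokens) % 3 == 0 else guess
```

Because every emitted token is the one the target would have chosen, the result equals plain greedy decoding; the speedup in a real system comes from verifying several guesses in a single target pass.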
Bug Fixes
- Fixed batchnorm unfolding issue in MediaPipe Hand Gesture, enabling the model to be fully NPU-resident when quantized with TFLite
- Fixed non-determinism in loading the BSD300 dataset. This previously caused us to report incorrect accuracy data for several super resolution models.
- MeloTTS has been updated to work around an HTP issue with summation that produced incorrect shapes at runtime. This update is available only via the export script and is not yet available with pre-generated assets.
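On the BSD300 fix above: these notes do not say what caused the non-determinism, but a common source in dataset loaders is relying on unsorted directory listings or unseeded sampling. The sketch below shows one way to make file selection reproducible; the function name, the "*.jpg" pattern, and the seed parameter are illustrative assumptions, not the project's code.

```python
import random
from pathlib import Path
from typing import List


def select_images(root: Path, sample_size: int, seed: int = 0) -> List[Path]:
    """Return a reproducible subset of image files under root."""
    # Sort explicitly: directory iteration order is not guaranteed across systems.
    files = sorted(root.rglob("*.jpg"))
    if sample_size >= len(files):
        return files
    # Use a local, seeded generator so the choice does not depend on global state.
    rng = random.Random(seed)
    return sorted(rng.sample(files, sample_size))
```

With a fixed seed and an explicit sort, repeated runs evaluate the same images in the same order, so reported accuracy does not drift between runs.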